Decision Tree Complexity
Decision Trees usually have moderate time complexities
1. Quasilinear Complexity
Decision Trees have Quasilinear Complexity for training which can be represented as O(N*log(N)*P) in Big-O notation and they have Linear Complexity of O(K) for predictions where K is depth of the tree.
100+ years old algorithmic complexity notation
2. What is Big O?
Big O notation is commonly used to show time complexity or computation complexity of algorithms. It represents the worst-case scenario for computation regarding the specific algorithm. Big O arised from the work of German mathematicians Edmund Landau and Paul Bachman in late 19th century. “O” in “Big O” stands for “Order of Complexity”, a term that originated from German nomenclature of the same matter: “Ordnung der Komplexität”.
From fastest to slowest or least computationally exhaustive to most computationally exhaustive Big O Computational Complexities can be listed as:
- O(1): Constant Runtime or No Complexity
- O(n): Linear Complexity
- O(log n): Logarithmic Complexity
- O(n log n): Log-linear Complexity
- O(n^2): Quadratic Complexity. This is also the complexity of Support Vector Machines.
- O(2^n): Exponential Complexity
- O(n!): Factorial Complexity
A Kernel for Non-Linear Data
3. Data Size
Decision Trees is a quick machine learning algorithm that can be used with big data. It will handle millions of rows in a matter of a few minutes in most cases.
If data gets dimensionally too big meaning if it has 100s of features in addition to millions of rows this can be an issue since decision trees have high time complexity and won’t scale too well with dimensional data.
A Kernel for Sigmoid Data
4. Decision Tree Speed Tests
Decision Trees have O(Nlog(N)Pk) complexity where N is the rows, P the feature size and k depth of the tree.
Our tests with Decision Tree on a classification problem gave results as following:
56 features, max_depth=16
Decision Tree (100K): 9.00 seconds
Decision Tree (250K): 30.09 seconds
Decision Tree (500K): 53.95 seconds
Decision Tree (1M): 63.30 seconds
56 features, max_depth=4
Decision Tree (100K): 4.00 seconds
Decision Tree (250K): 8.20 seconds
Decision Tree (500K): 15.45 seconds
Decision Tree (1M): 18.10 seconds
2 features, max_depth=4
Decision Tree (100K): 0.4 seconds
Decision Tree (250K): 0.65 seconds
Decision Tree (500K): 1.15 seconds
Decision Tree (1M): 1.25 seconds
Please note: the tests were done with consumer tools (i7 8th Gen processor, 16GB RAM) and might not be very sensitive especially on the very quick end. Data reading and loading times are subtracted from ultimate performances.
Results demonstrate the affect of feature size as well as maximum tree depth on decision tree performance. Performance issues related to these issues can be addressed through tuning and optimization.
Overall, we can conclude that decision trees have intermediate performance results for training phase. While they are not the slowest machine learning algorithms, they can struggle when data gets too big especially dimensionally. But since the results are still tolerable and there are ways to optimize trees we can also conclude that their training performance will be adequate in most cases even with big data applications.
Aside of training, decision trees have linear time complexity for inference (prediction phase). And this makes them very fast and resource efficient during this stage. Decision trees can be suitable even for real time machine learning deployment.
We have done a few decision tree performance tests to give a better idea of how decision tree algorithms scale and what’s their runtime performance like.
There are multiple ways to improve decision tree performance by tuning its hyperparameters. You can check out the article below: