Decision Tree Disadvantages
Despite their shortcomings, decision trees remain popular because of their unique advantages. In this tutorial we will look at those shortcomings so you can make more informed decisions about model selection in machine learning projects.
1) Instability
Decision trees are not very stable models: a small change in the training data or in how the tree is constructed can cause large structural changes and lead to different predictions.
A random forest addresses this by training many decision trees on random subsets of the data and features and combining their predictions, which greatly reduces this instability, although it does not remove it entirely.
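To make the instability concrete, here is a minimal sketch of a toy single-split learner (the first step any greedy tree builder takes). The dataset, the `gini` and `best_split` helpers, and the choice of Gini impurity are all assumptions made for illustration; this is not how any particular library is implemented. Removing a single row out of eight changes which feature is chosen at the root.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Return (feature_index, threshold, weighted_gini) of the best single split."""
    best = None
    n = len(rows)
    for f in range(len(rows[0])):
        values = sorted({r[f] for r in rows})
        # Candidate thresholds are midpoints between consecutive feature values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (f, t, score)
    return best


# Hypothetical toy data: each row is (feature_0, feature_1).
rows = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 1), (6, 7), (7, 8), (8, 6)]
labels = [0, 0, 0, 0, 1, 1, 1, 0]

full_split = best_split(rows, labels)

# Drop one row (index 4) and search for the best split again.
rows_minus = rows[:4] + rows[5:]
labels_minus = labels[:4] + labels[5:]
perturbed_split = best_split(rows_minus, labels_minus)

print("all 8 rows: split on feature", full_split[0], "at", full_split[1])
print("one row removed: split on feature", perturbed_split[0], "at", perturbed_split[1])
```

With all eight rows the root splits on feature 0; with one row removed it splits on feature 1. Every split below the root depends on the one above it, so a change like this propagates through the rest of the tree.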
2) Reduced Accuracy
Decision trees are usually less accurate than random forests because a single tree cannot benefit from averaging over many differently constructed trees.
A decision tree may take different branching routes each time it is built, which means high variance. This is another point that random forests address well.
3) High Risk of Overfitting
All models can overfit, but some do so more easily.
One way to increase accuracy and lower entropy on the training data is to increase tree depth, but deeper trees are particularly prone to overfitting.
It is much easier to avoid overfitting and still get a fairly accurate model with random forests.
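The link between depth and entropy can be sketched with a few lines of Python. The labels and the particular splits below are made up for illustration. The point is only that splitting further always lets a tree push training entropy toward zero, and at the extreme (one sample per leaf) it has simply memorized the training set.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def weighted_entropy(groups):
    """Average entropy of the leaves, weighted by leaf size."""
    total = sum(len(g) for g in groups)
    return sum(len(g) / total * entropy(g) for g in groups)


# Hypothetical training labels: four of each class.
labels = [0, 0, 0, 1, 1, 1, 1, 0]

root = entropy(labels)                                       # no split at all
one_split = weighted_entropy([labels[:4], labels[4:]])       # depth 1, two leaves
fully_grown = weighted_entropy([[y] for y in labels])        # one sample per leaf

print(f"root: {root:.3f}, one split: {one_split:.3f}, fully grown: {fully_grown:.3f}")
```

Here entropy drops from 1.0 at the root to about 0.81 after one split and to 0 when every sample has its own leaf. Zero training entropy says nothing about how the tree behaves on new data.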
Decision trees can still be a good alternative to other machine learning models such as logistic regression, k-nearest neighbors (KNN) and support vector machines, as long as you keep their shortcomings in mind and know how to address them.
There are numerous ways to get more out of decision trees. In fact, decision trees are the building blocks of some of the most widely used ML algorithms, such as XGBoost and random forests.
How to make decision trees more stable and robust?
1- Power of ensembling
Ensembling is a machine learning technique that aggregates the predictions of multiple models, either of the same kind or of different kinds. By averaging or voting across models, ensembling can reduce instability (variance) and overfitting, and in the case of boosting also bias, while increasing prediction accuracy and robustness.
Random forest is a machine learning model that ensembles decision trees, making them more stable, more accurate and less prone to overfitting.
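A rough way to see why aggregating helps is to compute how often a majority vote is right. The sketch below assumes every model is correct with the same probability and that the models make their errors independently. That is an idealization: trees in a real forest are correlated, so actual gains are smaller. The function name `majority_vote_accuracy` is just an illustrative choice.

```python
from math import comb


def majority_vote_accuracy(n_models, p_correct):
    """Probability that a strict majority of n independent models is correct.

    Assumes each model is right with probability p_correct, independently
    of the others (an idealized assumption, see the text above).
    """
    needed = n_models // 2 + 1
    return sum(
        comb(n_models, k) * p_correct**k * (1 - p_correct) ** (n_models - k)
        for k in range(needed, n_models + 1)
    )


for n in (1, 3, 25):
    print(f"{n:>2} models at 70% each -> majority vote correct "
          f"{majority_vote_accuracy(n, 0.7):.3f} of the time")
```

Under these assumptions, three models that are each right 70% of the time give a majority vote that is right 78.4% of the time, and the figure keeps rising as more models are added.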
How can I address overfitting with decision trees?
2- Decision tree depth
You can lower the maximum tree depth to reduce the risk of overfitting. A tree that is too shallow may underfit, though, which limits its predictive power. Other ways to constrain a single tree include requiring a minimum number of samples per leaf and pruning the tree after training. Beyond that, you may want to switch to a more robust machine learning model such as a random forest.
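As a sketch of what limiting depth looks like in practice, the snippet below uses scikit-learn (an assumption, since the tutorial does not name a library) on a synthetic dataset. The dataset settings, the depth of 3, and the `results` dictionary are arbitrary choices for illustration; the scores you get will depend on your data.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, noisy classification data (flip_y randomly flips 10% of labels).
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=4, flip_y=0.1, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

results = {}
for depth in (None, 3):  # None lets the tree grow until every leaf is pure
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(X_train, y_train)
    results[depth] = {
        "depth": tree.get_depth(),
        "train": tree.score(X_train, y_train),
        "test": tree.score(X_test, y_test),
    }
    print(f"max_depth={depth}: actual depth={results[depth]['depth']}, "
          f"train acc={results[depth]['train']:.3f}, "
          f"test acc={results[depth]['test']:.3f}")
```

A large gap between training and test accuracy for the unrestricted tree is the usual sign of overfitting. `DecisionTreeClassifier` also exposes `min_samples_leaf` and `ccp_alpha` (cost-complexity pruning) if you would rather constrain the tree in other ways than depth.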
Can decision tree accuracy be increased?
3- Boosting & Bagging
Boosting and bagging applied to decision trees can produce very accurate machine learning models. Random forest is an adaptation of bagging that also randomizes the features considered at each split, and XGBoost (Extreme Gradient Boosting) is an implementation of gradient boosting with decision trees. Both are known to be powerful and accurate models that often work well with little tuning, hence their popularity.
XGBoost and random forests have been popular choices among winning Kaggle competitors.
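A quick way to try both families is with scikit-learn, sketched below on synthetic data. Note the assumptions: scikit-learn's `GradientBoostingClassifier` is used here as a stand-in for XGBoost (which lives in the separate `xgboost` package as `XGBClassifier`), and the dataset and hyperparameters are arbitrary. The snippet only shows how to fit and compare the models; it makes no promise about which one wins on your data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(
    n_samples=500, n_features=10, n_informative=4, flip_y=0.1, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

models = {
    "single tree": DecisionTreeClassifier(random_state=0),
    "random forest (bagging)": RandomForestClassifier(n_estimators=100, random_state=0),
    # Stand-in for XGBoost: scikit-learn's own gradient boosting implementation.
    "gradient boosting": GradientBoostingClassifier(n_estimators=100, random_state=0),
}

# Fit each model on the same split and record its held-out accuracy.
scores = {
    name: model.fit(X_train, y_train).score(X_test, y_test)
    for name, model in models.items()
}
for name, score in scores.items():
    print(f"{name}: test accuracy = {score:.3f}")
```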
What is boosting?
Boosting: Trains trees successively, with each new tree focusing on the errors of the trees before it, and combines their predictions as a weighted sum.
- e.g.: XGBoost
What is bagging?
Bagging: Trains trees independently on random (bootstrap) samples of the data, and in random forests on random subsets of features as well, then averages their predictions or takes a majority vote.
- e.g.: Random Forests
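The difference between the two definitions above can be sketched in plain Python with one-split regression trees ("stumps") on a single feature. Everything here is a simplified assumption for illustration: the helper names, the toy sine data, and the use of squared-error residual fitting for boosting. It is not the XGBoost or random forest algorithm, and with one feature there is no feature subsampling to show.

```python
import math
import random


def fit_stump(xs, ys):
    """Fit a depth-1 regression tree by minimizing squared error."""
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    if best is None:  # all x values identical: fall back to the mean
        mean = sum(ys) / len(ys)
        return lambda x: mean
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm


def bagging(xs, ys, n_trees, rng):
    """Train stumps independently on bootstrap samples, then average them."""
    models = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(xs)) for _ in xs]  # sample rows with replacement
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: sum(m(x) for m in models) / len(models)


def boosting(xs, ys, n_trees, learning_rate=0.5):
    """Train stumps one after another, each on the residuals left so far."""
    models = []
    residuals = list(ys)
    for _ in range(n_trees):
        stump = fit_stump(xs, residuals)
        models.append(stump)
        residuals = [r - learning_rate * stump(x) for x, r in zip(xs, residuals)]
    return lambda x: learning_rate * sum(m(x) for m in models)


def mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


rng = random.Random(0)
xs = [i / 10 for i in range(40)]
ys = [math.sin(x) + rng.gauss(0, 0.1) for x in xs]

bagged = bagging(xs, ys, n_trees=20, rng=rng)
boosted_1 = boosting(xs, ys, n_trees=1)
boosted_20 = boosting(xs, ys, n_trees=20)
print("bagging  (20 stumps) train MSE:", round(mse(bagged, xs, ys), 4))
print("boosting ( 1 stump ) train MSE:", round(mse(boosted_1, xs, ys), 4))
print("boosting (20 stumps) train MSE:", round(mse(boosted_20, xs, ys), 4))
```

In `bagging` the loop iterations do not depend on each other, so the trees could be trained in parallel. In `boosting` each stump needs the residuals left by the previous ones, so training is inherently sequential.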
Isn't it possible to have the best of both worlds: interpretability and accuracy?
4- Neural-Backed Decision Trees
An exciting recent development is Neural-Backed Decision Trees. These models aim to reach neural-network-level predictive accuracy while preserving interpretability.
You can check out this research paper if you are interested in Neural-Backed Decision Trees:
You can also read more on the website of the paper's author, Alvin Wan: