
Decision Tree Tuning

Decision Trees can be improved by tuning their hyperparameters.


1- criterion

criterion: {"gini", "entropy"}, default="gini"
The criterion parameter selects the function used to assess split quality in decision trees.

entropy and gini are the two options for classification, and gini is the default. In most cases they give similar results, so the slightly faster gini option is usually preferred.

gini is based on Gini impurity, while entropy is based on information gain. Gini impurity measures the likelihood that a randomly chosen sample would be misclassified given the class distribution in the node.

Here are some code examples in Python:

from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

DTC = DecisionTreeClassifier(criterion="entropy")
DTR = DecisionTreeRegressor(criterion="poisson")

DecisionTreeClassifier Criterion
  • gini
  • entropy
DecisionTreeRegressor Criterion
  • mse (renamed squared_error in newer scikit-learn versions)
  • mae (renamed absolute_error in newer scikit-learn versions)
  • poisson
  • friedman_mse
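
To see how little the choice usually matters in practice, here is a minimal sketch (the iris dataset and the train/test split are just assumptions for illustration): fit one classifier per criterion and compare test accuracy.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# load a small toy dataset and hold out a test split
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# fit one tree per criterion and compare test accuracy
for crit in ("gini", "entropy"):
    DTC = DecisionTreeClassifier(criterion=crit, random_state=0)
    DTC.fit(X_train, y_train)
    print(crit, DTC.score(X_test, y_test))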

 


2- splitter

splitter: {"best", "random"}, default="best"

The splitter parameter defines the strategy used to choose the split at each node. best is the default; random can also be chosen, in which case the tree picks the best of a set of randomly drawn splits instead of searching for the overall best split.

random has some benefits here:

  • It can lower the likelihood of overfitting, since it introduces randomness into the model rather than always chasing the features with the best splits.
  • It also improves training performance slightly, since there is no need to compute and compare the best split for every feature.

best offers some benefits too:

  • It is useful when there are many features and little prior knowledge about which ones to go after, since the tree will find the strongest splits on its own.

DTC = DecisionTreeClassifier(splitter="random")
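
As a rough illustration of the speed difference, the sketch below fits one tree per splitter strategy and times the fit; the synthetic dataset from make_classification is just an assumption chosen for the example.

from time import perf_counter
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# synthetic dataset with many features so the split search is noticeable
X, y = make_classification(n_samples=5000, n_features=50, random_state=0)

for strategy in ("best", "random"):
    DTC = DecisionTreeClassifier(splitter=strategy, random_state=0)
    start = perf_counter()
    DTC.fit(X, y)
    print(strategy, round(perf_counter() - start, 3), "seconds")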


3- max_depth

max_depth sets the maximum depth of the tree.

With the default value None, nodes are expanded until all leaves are pure or until all leaves contain fewer than min_samples_split samples.

This is probably the most frequently tuned decision tree parameter.

The tree learns more as depth increases, but it also becomes more likely to overfit.

If you're having overfitting issues, a good practice is to start with a relatively high value, then lower it and retrain until the overfitting is addressed.

On the other hand, lowering max_depth more than necessary means losing prediction accuracy for no good reason.

The ideal maximum depth of a decision tree changes from model to model, and it takes experience to master this parameter. A good practice is to take an iterative approach: look at the train/test results and adjust max_depth accordingly, as in the sketch after the example below.

DTC = DecisionTreeClassifier(max_depth=8)
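
A minimal sketch of that iterative approach, assuming the breast cancer dataset that ships with scikit-learn (any dataset would do): fit one tree per depth and watch how the gap between train and test accuracy evolves.

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# compare train and test accuracy as depth grows; a widening gap signals overfitting
for depth in (2, 4, 6, 8, None):
    DTC = DecisionTreeClassifier(max_depth=depth, random_state=0)
    DTC.fit(X_train, y_train)
    print(depth, DTC.score(X_train, y_train), DTC.score(X_test, y_test))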


4- min_samples_split

This value is 2 by default and it stands for the minimum number of samples required to split an internal node. In other words, if a node contains fewer samples than this value, it will not be split any further and becomes a leaf.

min_samples_split can be increased to address overfitting and avoid splits on overly specific patterns, but if you increase it too much prediction will again suffer. The best value changes for every dataset and decision tree model, but research indicates it is likely to fall between 1 and 40 samples.

DTC = DecisionTreeClassifier(min_samples_split=3)


5- min_samples_leaf

This is the minimum number of samples allowed to end up in a leaf (external) node. For example, with min_samples_leaf=5, a candidate split that would leave only 3 samples in a leaf won't be allowed.

It applies to both branches of a split: each resulting leaf must contain at least this many samples.

It takes the value 1 by default.

Both min_samples_split and min_samples_leaf can be used to avoid overfitting: if they are too small the decision tree will tend to overfit, while if they are increased too much learning will suffer and so will prediction outcomes. The sketch after the example below compares the two regimes.

DTC = DecisionTreeClassifier(min_samples_leaf=2)
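
A quick sketch of that trade-off, again assuming the breast cancer dataset bundled with scikit-learn: compare the default settings against stricter min_samples_split / min_samples_leaf values and look at tree size and test accuracy.

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# stricter split/leaf limits usually give a smaller, less overfit tree
for params in ({}, {"min_samples_split": 20, "min_samples_leaf": 10}):
    DTC = DecisionTreeClassifier(random_state=0, **params)
    DTC.fit(X_train, y_train)
    print(params, DTC.get_n_leaves(), DTC.score(X_test, y_test))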

Limiting the number of features considered in splits

6- max_features

max_features is the maximum number of features considered when looking for a split. It takes None by default, which means the number of features considered is not restricted and equals the total feature count.

If given an integer, that is the maximum number of features considered. It can also take string values such as "sqrt" and "log2" (and "auto" in older scikit-learn versions), which are applied to the feature count accordingly.

This is another very useful parameter to adjust.

It will prevent overfitting to a degree when lowered, since fewer features are considered at each split.

On the other hand, it can also be used to speed up training. Especially when the dataset is high-dimensional with many features, it is computationally heavy to evaluate all of them at every split. Choosing a lower value such as "sqrt" or an integer addresses that cost.

It can also take a float value, in which case the maximum number of features is calculated as float × feature count. The sketch after the example below shows how these options resolve to concrete numbers.

DTC = DecisionTreeClassifier(max_features="sqrt")
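
A small sketch, assuming the 30-feature breast cancer dataset from scikit-learn: fit one tree per option and read the fitted max_features_ attribute, which holds the resolved number of features considered per split.

from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)  # 30 features

for value in ("sqrt", "log2", 0.5, 10):
    DTC = DecisionTreeClassifier(max_features=value, random_state=0).fit(X, y)
    # max_features_ is the resolved number of features considered at each split
    print(value, DTC.max_features_)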

Different weights for classes

7- class_weight

This parameter is used to assign a weight to each output class in classification.

It takes the value 1 for each class by default, but custom weights can be assigned by passing a dictionary that maps class labels to weights, such as {0: 0.8, 1: 0.9}.

In multi-output scenarios a list of dictionaries is provided, one per output column, for example [{0: 1, 1: 1}, {0: 1, 1: 5}, {0: 1, 1: 1}, {0: 1, 1: 1}].

It can be highly useful for addressing bias when the data is not well balanced.

Samples from classes with a lower weight count for less when split quality is evaluated, which shifts the model's attention toward the higher-weighted classes.

from sklearn.tree import DecisionTreeClassifier

# give class 1 five times the weight of class 0
DTC = DecisionTreeClassifier(class_weight={0: 1, 1: 5})
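
Scikit-learn also accepts class_weight="balanced", which computes weights inversely proportional to class frequencies in the training data; a minimal sketch on a hypothetical imbalanced dataset:

from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# imbalanced toy dataset: roughly 90% class 0 and 10% class 1
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)

# "balanced" sets each class weight to n_samples / (n_classes * class count)
DTC = DecisionTreeClassifier(class_weight="balanced", random_state=0)
DTC.fit(X, y)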

Summary

Decision Trees offer ample hyperparameter tuning opportunities. Once you discover these settings, decision trees become more powerful and offer more value.

You will want to experiment with and master decision tree parameters, because there is no single rule: parameter values change based on the dataset and project needs. A grid search sketch follows below.
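
One common way to run that experimentation systematically is a grid search; here is a minimal sketch using scikit-learn's GridSearchCV over the parameters covered above (the dataset and the grid values are assumptions chosen for illustration).

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# small grid over the parameters covered above
param_grid = {
    "criterion": ["gini", "entropy"],
    "max_depth": [4, 6, 8, None],
    "min_samples_leaf": [1, 5, 10],
    "max_features": [None, "sqrt"],
}
search = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)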

In this Decision Tree Tutorial, we covered the most common decision tree parameters and how they can be tweaked to create a machine learning model that fits your needs.