Updated October 6, 2023

Introduction to Decision Tree Hyperparameters

A decision tree is a machine learning algorithm used for two tasks: classification and regression. Decision trees are also the building blocks of many ensemble learning algorithms. A hyperparameter is a parameter whose value is set before training and used to control the learning process, as opposed to the split rules that the tree learns from the data. The decision tree has plenty of hyperparameters that need fine-tuning to derive the best possible model. A tuning method searches the hyperparameter space for the set of values that optimizes the architecture of our model and reduces its generalization error.
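To make the distinction between hyperparameters and learned parameters concrete, here is a minimal sketch. It assumes scikit-learn's `DecisionTreeClassifier`, whose constructor arguments match the hyperparameter names used in this article, and uses the bundled iris dataset purely as a stand-in.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Hyperparameters are chosen before training and passed to the constructor.
clf = DecisionTreeClassifier(max_depth=3, min_samples_leaf=5, random_state=0)

# get_params() returns the hyperparameter settings; nothing has been learned yet.
params = clf.get_params()
print(params["max_depth"], params["min_samples_leaf"])  # 3 5

# Fitting learns the model's own parameters: the split features and thresholds.
clf.fit(X, y)
print("depth:", clf.get_depth(), "leaves:", clf.get_n_leaves())
print("features used by the first three nodes:", clf.tree_.feature[:3])
```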

Various Decision Tree Hyperparameters

Given below are the various decision tree hyperparameters:

1. max_depth

As its name suggests, the max_depth hyperparameter is the maximum depth that we allow the tree to grow to. The deeper we allow it to grow, the more complex our model becomes. For training error, it is easy to see what happens: as we increase max_depth, training error will generally keep going down. Testing error behaves differently. If we set max_depth too high, the decision tree might simply overfit the training data instead of capturing the useful patterns, which causes the testing error to increase. Setting it too low is not good either, because the decision tree might then have too little flexibility to capture the patterns and interactions in the training data.

This also causes the testing error to increase, which is a case of underfitting. So we have to find the right max_depth through hyperparameter tuning: either grid search or random search can arrive at the best possible value of max_depth.
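The trade-off described above, and the grid-search fix, can be sketched as follows. This assumes scikit-learn; the breast cancer dataset, the 75/25 split, and the candidate depths 1 to 10 are illustrative choices, not values from the article.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Compare training and test accuracy as the tree is allowed to grow deeper.
for depth in [1, 2, 4, 8, None]:  # None = grow until the leaves are pure
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(X_train, y_train)
    print(f"max_depth={depth}: train={tree.score(X_train, y_train):.3f} "
          f"test={tree.score(X_test, y_test):.3f}")

# Choose max_depth by 5-fold cross-validation on the training set only.
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"max_depth": list(range(1, 11))},
    cv=5,
)
search.fit(X_train, y_train)
print("best max_depth:", search.best_params_["max_depth"])
print("held-out accuracy:", round(search.score(X_test, y_test), 3))
```

For random search, scikit-learn's `RandomizedSearchCV` takes `param_distributions` and `n_iter` in place of `param_grid`, and samples that many candidate settings instead of trying every one.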

2. min_samples_split

A decision tree has multiple nodes: internal nodes, which split further into child nodes, and leaf nodes, which do not. The min_samples_split hyperparameter specifies the minimum number of samples required to split an internal node. It accepts two kinds of values: an integer gives the minimum number of samples directly, while a fraction gives the minimum as a percentage of the training samples (in scikit-learn, a float is converted to ceil(min_samples_split * n_samples)).
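A minimal sketch, assuming scikit-learn, that tries an integer and a fractional min_samples_split and then checks that every internal node of the last tree holds at least the converted threshold. The dataset and the values 2, 20, and 0.1 are illustrative.

```python
import math

from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)  # 569 samples

for value in [2, 20, 0.1]:
    tree = DecisionTreeClassifier(min_samples_split=value, random_state=0).fit(X, y)
    print(f"min_samples_split={value}: {tree.get_n_leaves()} leaves")

# For the float 0.1, the effective threshold is ceil(0.1 * 569) = 57 samples.
threshold = math.ceil(0.1 * X.shape[0])
t = tree.tree_                    # the last tree fitted (min_samples_split=0.1)
internal = t.children_left != -1  # leaf nodes have children_left == -1
print("threshold:", threshold, "smallest internal node:", t.n_node_samples[internal].min())
```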

3. min_samples_leaf

Let us look at another parameter, min_samples_leaf. As we have already seen, a leaf node is a node without any children, so it cannot be split any further. min_samples_leaf is the minimum number of samples required at a leaf node: a split is only allowed if it leaves at least min_samples_leaf samples in each of the resulting child nodes.

For example, suppose we start with 10,000 samples and reach a node that holds just 100 samples. Splitting it further into tiny leaves would tend to overfit the training data. With min_samples_leaf set to 100, such a node can no longer be split, because each child would need at least 100 samples, so by using this hyperparameter sensibly we can avoid overfitting. This parameter is similar to min_samples_split; however, it describes the minimum number of samples at the leaves, which sit at the bottom of the tree, rather than at the node being split.
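The 10,000-sample example can be checked with a short sketch. It assumes scikit-learn and uses synthetic data from `make_classification` as a stand-in for a real dataset; after fitting with min_samples_leaf=100, it reads the sample count of every leaf from the fitted tree structure.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic data sized to match the 10,000-sample example (illustrative only).
X, y = make_classification(n_samples=10_000, n_features=20, random_state=0)

tree = DecisionTreeClassifier(min_samples_leaf=100, random_state=0).fit(X, y)

t = tree.tree_
is_leaf = t.children_left == -1        # leaf nodes have no left child
leaf_sizes = t.n_node_samples[is_leaf]  # training samples that reached each leaf
print("leaves:", is_leaf.sum(), "smallest leaf:", leaf_sizes.min())
```

Since every leaf must hold at least 100 of the 10,000 samples, this tree can never have more than 100 leaves.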

4. max_features

This hyperparameter sets the number of features to consider when looking for the best split. We can specify an integer to give the number of features considered at each split, or a fraction to give the percentage of features to consider. There are also named options: "sqrt" and "log2" consider the square root and the base-2 logarithm of the number of features, respectively, and None considers all of the features.
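The following sketch, assuming scikit-learn, shows how each option translates into a concrete number of features on a 30-feature dataset. It reads the fitted `max_features_` attribute, which holds the inferred value.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)  # 30 features

considered = {}
for value in [None, "sqrt", "log2", 10, 0.5]:
    tree = DecisionTreeClassifier(max_features=value, random_state=0).fit(X, y)
    # max_features_ is the number of features actually examined at each split.
    considered[value] = tree.max_features_
print(considered)  # {None: 30, 'sqrt': 5, 'log2': 4, 10: 10, 0.5: 15}
```

The named options and fractions are truncated to whole numbers: sqrt(30) ≈ 5.48 becomes 5, log2(30) ≈ 4.91 becomes 4, and 0.5 × 30 = 15.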

5. min_weight_fraction_leaf

Another decision tree hyperparameter is min_weight_fraction_leaf. It is the minimum fraction of the total sample weight (the sum of sample_weight over all input samples) required at a leaf node; when no sample_weight is provided, all samples have equal weight. This gives us a way to deal with class imbalance. Class balancing can be done by sampling an equal number of samples from each class, or by normalizing the sum of the sample weights for each class to the same value.

Once the classes are balanced with weights, weight-based criteria such as min_weight_fraction_leaf are less biased toward the dominant classes than criteria that are not aware of the sample weights, such as min_samples_leaf. If the samples are weighted, it is easier to optimize the structure of the tree with min_weight_fraction_leaf, which ensures that every leaf node contains at least a fraction of the overall sum of the weights.
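Here is a minimal sketch of that workflow, assuming scikit-learn. Synthetic imbalanced data stands in for a real dataset, and the threshold 0.05 is an arbitrary example value. The weights are normalized so that each class has the same total weight, and the sketch then checks the weight fraction held by each leaf.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils.class_weight import compute_sample_weight

# Imbalanced synthetic data: roughly 90% class 0 and 10% class 1 (illustrative only).
X, y = make_classification(n_samples=2_000, weights=[0.9, 0.1], random_state=0)

# "balanced" gives each class the same total weight, so the minority class counts equally.
sample_weight = compute_sample_weight(class_weight="balanced", y=y)

tree = DecisionTreeClassifier(min_weight_fraction_leaf=0.05, random_state=0)
tree.fit(X, y, sample_weight=sample_weight)

t = tree.tree_
is_leaf = t.children_left == -1
# Fraction of the overall weight sum that ends up in each leaf.
leaf_fraction = t.weighted_n_node_samples[is_leaf] / sample_weight.sum()
print("leaves:", is_leaf.sum(), f"smallest weight fraction: {leaf_fraction.min():.3f}")
```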

6. random_state

This hyperparameter is not really something to tune, so let us see when and why we need to set random_state. Many new students are confused when changing random_state changes their accuracy. This happens because the decision tree algorithm is greedy: at each node it examines the candidate features in a random order, and when max_features is smaller than the number of features it considers only a random subset of them. This random selection is driven by a pseudo-random number generator that takes the random_state value as its seed, so a particular random_state might happen to pick good features. If the model's accuracy changes substantially with random_state, that is a sign something is wrong with the model, for example that it is unstable or overfitting. Hence we set random_state to make results reproducible rather than treating it as a hyperparameter to tune.
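A minimal sketch, assuming scikit-learn, of both points. First, fixing the seed reproduces the same tree, compared here through the text dump from `export_text`. Second, the spread of cross-validated accuracy over a few seeds gives a rough stability check. The helper `tree_rules`, the dataset, and the five seeds are illustrative choices.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_breast_cancer(return_X_y=True)

def tree_rules(seed):
    """Hypothetical helper: fit a tree with the given seed and return its rules as text."""
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=seed).fit(X, y)
    return export_text(tree)

# The same seed reproduces exactly the same tree.
print(tree_rules(0) == tree_rules(0))  # True

# How much does cross-validated accuracy move when only the seed changes?
scores = [
    cross_val_score(DecisionTreeClassifier(random_state=s), X, y, cv=5).mean()
    for s in range(5)
]
print(f"CV accuracy across seeds: min={min(scores):.3f}, max={max(scores):.3f}")
```

A wide gap between the minimum and maximum is the kind of instability described above, and it calls for a more constrained tree rather than a search for a lucky seed.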

7. class_weight

The class_weight hyperparameter assigns a weight to each output class. When the algorithm calculates impurity to choose the split at each node, the samples in the resulting child nodes are weighted by class_weight, so samples of a more heavily weighted class count for more. A common approach is to start from the distribution of our classes. Then, depending on which class the tree leans toward, we can increase the weight of the other class so that the algorithm penalizes mistakes on one class relative to the other. If class_weight is not set, every class has weight one; the "balanced" option sets each class weight inversely proportional to its frequency.
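A minimal sketch, assuming scikit-learn, that reproduces the "balanced" formula n_samples / (n_classes * count_of_class) by hand, checks it against `compute_class_weight`, and shows both ways of passing class_weight. The synthetic 80/20 data and the manual weights {0: 1.0, 1: 4.0} are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils.class_weight import compute_class_weight

# Imbalanced synthetic data, roughly 80% class 0 and 20% class 1 (illustrative only).
X, y = make_classification(n_samples=1_000, weights=[0.8, 0.2], random_state=0)

# "balanced": weight of each class = n_samples / (n_classes * count_of_that_class).
classes = np.unique(y)
manual = len(y) / (len(classes) * np.bincount(y))
library = compute_class_weight(class_weight="balanced", classes=classes, y=y)
print(dict(zip(classes.tolist(), manual.round(3).tolist())))

# Either let the library compute the weights or pass an explicit {class_label: weight}.
tree_balanced = DecisionTreeClassifier(class_weight="balanced", random_state=0).fit(X, y)
tree_manual = DecisionTreeClassifier(class_weight={0: 1.0, 1: 4.0}, random_state=0).fit(X, y)
```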

Conclusion

In this article, we conclude that the first five hyperparameters of the decision tree are the important ones and play a crucial role in building the most accurate model. We have seen how these hyperparameters work, how to tune them, and how they interact with each other. It is not necessary to tune every hyperparameter.