Updated March 13, 2023
Introduction to Decision Tree Limitations
Decision tree models are analytical models that are simple to comprehend, visualize, execute, and score, and they require minimal data pre-processing. They are supervised learning systems in which the input is repeatedly split into distinct groups based on specified factors. They also have limitations, which this article discusses. Decision trees are generally simple to understand when the tree contains few decisions and consequences, but that simplicity fades as the tree grows. Typical practical difficulties include the inability to measure some attribute values, the high cost and complexity of such measurements, and the fact that not all attributes are available at the same time.
Limitations of Decision Trees
The main limitations are described below.
1. Not Good for Regression
Logistic regression is a statistical analysis approach that uses independent features to try to predict the probability of an outcome. On high-dimensional datasets, the model may overfit the training set: it overstates the accuracy of predictions on the training set and then fails to predict results accurately on the test set.
This is most common when the model is trained on a small amount of training data with a large number of features. Regularization should be considered on high-dimensional datasets to reduce overfitting, although it adds a hyperparameter that must be tuned. If the regularization strength is too high, the model may underfit the training data.
Complex correlations are difficult to capture with logistic regression. This approach is readily outperformed by more powerful and complicated algorithms such as Neural Networks.
Because logistic regression (see the figure above) has a linear decision surface, it cannot solve nonlinear problems. In real-world circumstances, linearly separable data is uncommon. As a result, non-linear features must be transformed, for example by adding features so that the data becomes linearly separable in a higher-dimensional space.
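The idea of adding a feature so that the data becomes linearly separable can be illustrated with a small sketch. The toy data and the helper `separable_by_threshold` are assumptions made for this example, not part of any library. In one dimension a linear decision surface reduces to a single threshold, so the helper simply checks whether one threshold can separate the classes.

```python
def separable_by_threshold(values, labels):
    """Return True if one threshold on a single feature separates two classes.

    In one dimension a linear decision surface is just a threshold, so this
    checks whether the class label changes at most once along the sorted axis.
    Assumes equal feature values never carry different labels.
    """
    ordered = [label for _, label in sorted(zip(values, labels))]
    changes = sum(1 for a, b in zip(ordered, ordered[1:]) if a != b)
    return changes <= 1


# Hypothetical toy data: class 1 when |x| >= 2, class 0 otherwise.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [1 if abs(x) >= 2 else 0 for x in xs]

raw_ok = separable_by_threshold(xs, ys)                       # original feature x
squared_ok = separable_by_threshold([x * x for x in xs], ys)  # added feature x^2
print(raw_ok, squared_ok)
```

On the raw feature no single threshold works, because class 1 sits on both sides of class 0. On the squared feature one threshold is enough.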
2. Overfitting Problem
Decision-tree learners can create overly complicated trees that do not generalize well beyond the training data. This is referred to as overfitting. Some of the important techniques to avoid this problem are:
- Establishing the minimum number of samples required at a leaf node
- Setting the maximum depth of the tree
If we continue to grow the tree, each row of the input data table may end up as its own final rule. On the training data, the model will perform admirably, but it will perform poorly on the test data. Overfitting sets in once the tree passes a certain level of complexity, and it is quite likely to occur in a very large tree.
Decision tree learners make an effort to avoid overfitting. Trees are nearly always stopped before they reach the depth at which each leaf node contains only observations from one class or a single observation. There are several methods for determining when to stop growing the tree.
- If a leaf node becomes pure at any point during the growth process, no further subtree is grown from that node. Other leaf nodes can still be split to continue growing the tree.
- When the decrease in tree impurity is relatively slight. This user input parameter stops the growth when a split lowers the impurity by only a very small amount, say 0.001 or less.
- When there are only a few observations remaining in the leaf node. This stops the growth when the sample is too small for a further split to be reliable. A common rule of thumb based on the Central Limit Theorem treats around 30 mutually independent observations as a large sample. This can serve as a general guide, but because we typically work with multi-dimensional observations that may be correlated, this user input parameter should be higher than 30, say 50 or 100 or more.
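The stopping rules above, together with the maximum-depth and minimum-samples settings, can be sketched in a small pure-Python tree grower. This is a minimal illustration under stated assumptions, not how any particular library implements it. The names `grow`, `best_split`, `max_depth`, `min_samples_leaf`, and `min_impurity_decrease` are chosen for this example, Gini impurity is assumed as the impurity measure, and the impurity decrease is measured within the node without any weighting by node size.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels, min_samples_leaf):
    """Best (feature, threshold, weighted_gini), or None if no split is allowed."""
    best = None
    for f in range(len(rows[0])):
        values = sorted({r[f] for r in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            # Stopping rule: too few observations would remain in a leaf.
            if min(len(left), len(right)) < min_samples_leaf:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[2]:
                best = (f, t, score)
    return best


def grow(rows, labels, depth=0, max_depth=3, min_samples_leaf=1,
         min_impurity_decrease=0.001):
    """Return a leaf label or a nested (feature, threshold, left, right) tuple."""
    majority = Counter(labels).most_common(1)[0][0]
    impurity = gini(labels)
    # Stopping rules: pure node, or maximum depth reached.
    if impurity == 0.0 or depth >= max_depth:
        return majority
    split = best_split(rows, labels, min_samples_leaf)
    # Stopping rules: no admissible split, or the impurity drop is too small.
    if split is None or impurity - split[2] < min_impurity_decrease:
        return majority
    f, t, _ = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    params = dict(max_depth=max_depth, min_samples_leaf=min_samples_leaf,
                  min_impurity_decrease=min_impurity_decrease)
    return (f, t,
            grow([r for r, _ in left], [y for _, y in left], depth + 1, **params),
            grow([r for r, _ in right], [y for _, y in right], depth + 1, **params))


# Hypothetical toy data with a single feature.
rows = [[x] for x in range(1, 9)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
tree = grow(rows, labels)                         # one split, two pure leaves
unsplit = grow(rows, labels, min_samples_leaf=5)  # no split keeps 5 per leaf
```

With the defaults, the tree splits once and both children are pure, so growth stops there. Requiring five samples per leaf on eight rows leaves no admissible split, so the result is a single leaf.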
3. Expensive
The cost of creating a decision tree is high, since the values of each candidate field must be sorted at every node. Some algorithms consider combinations of several fields at the same time, resulting in even higher costs. Pruning methods are also expensive because of the large number of candidate subtrees that must be produced and compared.
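To make the sorting cost concrete, here is a rough sketch of a split search on one numeric field with binary 0/1 labels. The function name `best_threshold` and the use of Gini impurity are assumptions for illustration. The sort dominates the work, and a real learner would repeat it for every field at every node.

```python
def best_threshold(values, labels):
    """Best split of one numeric field for 0/1 labels: sort, then one scan."""
    pairs = sorted(zip(values, labels))   # O(n log n), repeated per field per node
    n = len(pairs)
    total_pos = sum(y for _, y in pairs)
    left_pos = 0
    best = (None, float("inf"))           # (threshold, weighted Gini)
    for i in range(1, n):
        left_pos += pairs[i - 1][1]
        if pairs[i - 1][0] == pairs[i][0]:
            continue                      # no threshold fits between equal values
        left_n, right_n = i, n - i
        p_left = left_pos / left_n
        p_right = (total_pos - left_pos) / right_n
        # For two classes, Gini impurity is 2 * p * (1 - p).
        score = (left_n * 2 * p_left * (1 - p_left)
                 + right_n * 2 * p_right * (1 - p_right)) / n
        if score < best[1]:
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2, score)
    return best


threshold, score = best_threshold([3, 1, 4, 2], [1, 0, 1, 0])
```

Keeping running class counts lets the scan evaluate every candidate threshold in a single pass after the sort.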
4. Independence Between Samples
Each training example must be independent of the other samples in the dataset. If samples are related in some manner, the model will effectively give those related training instances more weight. As a result, matched data or repeated measurements should not be used as training data.
5. Unstable
Decision trees can be unstable, because slight changes in the data can result in an entirely different tree being constructed. Using decision trees within an ensemble helps to mitigate this difficulty.
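A small sketch can show this instability. The toy data and the helper `root_split` are hypothetical and written only for this illustration, with Gini impurity assumed as the split criterion. Removing a single observation changes which feature is chosen at the root, so every split beneath it would change as well.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def root_split(rows, labels):
    """Return (feature, threshold) of the split with the lowest weighted Gini."""
    best = None
    for f in range(len(rows[0])):
        values = sorted({r[f] for r in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best[1], best[2]


# Hypothetical toy data: (feature 0, feature 1) -> class.
rows = [(0, 8), (1, 1), (2, 2), (3, 9), (5, 5), (6, 6), (7, 7)]
labels = [1, 0, 0, 0, 1, 1, 1]

full = root_split(rows, labels)

# Drop a single observation: the point (3, 9) with class 0.
keep = [i for i, r in enumerate(rows) if r != (3, 9)]
reduced = root_split([rows[i] for i in keep], [labels[i] for i in keep])

print(full, reduced)
```

On the full data the root splits on feature 0 at 4.0. After one row is removed, the root splits on feature 1 at 3.5.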
6. Greedy Approach
To form a binary tree, the input space must be partitioned. The greedy algorithm used for this is recursive binary splitting. It is a numerical procedure in which the values are lined up and candidate split points are tried one after another. The data is split according to the best split found at the current step, and only that path is followed for the subsequent splits. However, a split that looks worse now could lead to more informative splits later; thus, the greedy choice may not produce the best tree.
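An XOR-style dataset is a common way to illustrate this weakness. In the sketch below, which uses hypothetical toy data and assumes Gini impurity as the criterion, every first split a greedy learner can try yields no impurity decrease, even though two consecutive splits separate the classes perfectly.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_gain(rows, labels, feature, threshold):
    """Impurity decrease from splitting on rows[i][feature] <= threshold."""
    left = [y for r, y in zip(rows, labels) if r[feature] <= threshold]
    right = [y for r, y in zip(rows, labels) if r[feature] > threshold]
    weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
    return gini(labels) - weighted


# XOR-style toy data: the class depends on both features jointly.
rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 1, 1, 0]

# Every first split a greedy learner can try has zero gain...
gains = [split_gain(rows, labels, f, 0.5) for f in (0, 1)]


def two_level_tree(row):
    """Hand-built depth-2 tree: split on feature 0, then on feature 1."""
    if row[0] <= 0.5:
        return 0 if row[1] <= 0.5 else 1
    return 1 if row[1] <= 0.5 else 0


# ...yet two consecutive splits classify every point correctly.
predictions = [two_level_tree(r) for r in rows]
```

A learner that judges each split only by its immediate gain sees no reason to prefer either feature here, and one that also requires a minimum impurity decrease would stop without splitting at all.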
7. Predictions Are Not Smooth or Continuous
As shown in the diagram below, decision tree forecasts are neither smooth nor continuous but piecewise constant approximations.
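The piecewise constant behaviour can be reproduced with a tiny one-feature regression tree. This is a minimal sketch, assuming squared error as the split criterion. The functions `fit` and `predict` are written for this example and do not come from a library.

```python
def fit(xs, ys, depth):
    """Fit a 1-D regression tree; returns a leaf mean or (threshold, left, right)."""
    values = sorted(set(xs))
    if depth == 0 or len(values) < 2:
        return sum(ys) / len(ys)
    best = None
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        sides = ([y for x, y in zip(xs, ys) if x <= t],
                 [y for x, y in zip(xs, ys) if x > t])
        # Sum of squared errors around the mean of each side.
        sse = sum(sum((y - sum(s) / len(s)) ** 2 for y in s) for s in sides)
        if best is None or sse < best[0]:
            best = (sse, t)
    t = best[1]
    left = [(x, y) for x, y in zip(xs, ys) if x <= t]
    right = [(x, y) for x, y in zip(xs, ys) if x > t]
    return (t,
            fit([x for x, _ in left], [y for _, y in left], depth - 1),
            fit([x for x, _ in right], [y for _, y in right], depth - 1))


def predict(tree, x):
    """Walk down the tree until a leaf (a plain number) is reached."""
    while isinstance(tree, tuple):
        threshold, left, right = tree
        tree = left if x <= threshold else right
    return tree


# The target is the smooth line y = x, sampled at eight points.
xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [float(x) for x in xs]
tree = fit(xs, ys, depth=2)
predictions = [predict(tree, x) for x in xs]
```

Although the target is a straight line, a depth-2 tree can output only four distinct values, and the prediction jumps from one value to the next at each threshold.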
Conclusion – Decision tree limitations
We have described the limitations of decision trees above. These problems can outweigh the benefits, especially in large and complicated trees, which limits the use of a single tree as a decision-making tool. To get around these constraints, we can employ Random Forest, which does not rely on a single tree. It grows a forest of trees and then makes a decision based on the number of votes cast. Random Forest uses bagging, which is one of the ensemble learning approaches. Because many real-world relationships are non-linear, linear models cannot always be relied on in machine learning. Tree models such as Random Forest and decision trees are good at dealing with non-linearity.
In other words, an overfit decision tree attempts to learn everything possible from the training data, including noise and outliers. While this maximizes accuracy on the training data, it has a negative impact on predictions for future data, which does not share the same noise.
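The voting idea behind bagging can be sketched with one-split trees (stumps) trained on bootstrap samples. This is a simplified illustration and not a full Random Forest, which also samples features at each split and usually grows deeper trees. The names `fit_stump` and `bagged_predict`, the toy data, and the fixed seed are assumptions made for this example.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def fit_stump(xs, ys):
    """Return a predict function for a one-split tree on a single feature."""
    majority = Counter(ys).most_common(1)[0][0]
    values = sorted(set(xs))
    best = None
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = len(left) * gini(left) + len(right) * gini(right)
        if best is None or score < best[0]:
            best = (score, t,
                    Counter(left).most_common(1)[0][0],
                    Counter(right).most_common(1)[0][0])
    if best is None:                      # only one distinct value: constant leaf
        return lambda x: majority
    _, t, left_label, right_label = best
    return lambda x: left_label if x <= t else right_label


def bagged_predict(xs, ys, x_new, n_trees=25, seed=0):
    """Train stumps on bootstrap samples and return the majority vote."""
    rng = random.Random(seed)
    n = len(xs)
    votes = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # sample with replacement
        stump = fit_stump([xs[i] for i in idx], [ys[i] for i in idx])
        votes.append(stump(x_new))
    return Counter(votes).most_common(1)[0][0]


# Hypothetical toy data: class 0 for x in 0..9, class 1 for x in 10..19.
xs = list(range(20))
ys = [0] * 10 + [1] * 10
low_vote = bagged_predict(xs, ys, 2)
high_vote = bagged_predict(xs, ys, 17)
```

Each stump sees a slightly different sample and may pick a different threshold, but the majority vote smooths out those differences.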