Introduction to Decision Tree Types
Decision tree types can be based either on the target variable or on the data mining problem. Here, we will look at decision tree types based on the data mining problem. Formally, a decision tree is defined over a database D = {t1, t2, …, tn}, where each ti denotes a tuple described by a set of attributes A = {A1, A2, …, Am}, together with a set of classes C = {c1, c2, …, ck}.
A decision tree is a binary tree that has the following properties:
- First, each internal node of the decision tree is labelled with an attribute Ai.
- Second, each edge is labelled with a predicate that can be applied to the attribute of its parent node.
- Finally, each leaf node is labelled with a class cj, as the sketch after this list illustrates.
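A minimal sketch of this structure in Python follows; the class names, the classify helper, and the tiny example tree are my own illustrations, not part of any library.

```python
# Hypothetical sketch of the decision tree structure described above.

class LeafNode:
    def __init__(self, label):
        self.label = label          # class c_j assigned to this leaf

class DecisionNode:
    def __init__(self, attribute, predicate, left, right):
        self.attribute = attribute  # attribute A_i tested at this node
        self.predicate = predicate  # predicate applied to the attribute value
        self.left = left            # child followed when the predicate holds
        self.right = right          # child followed otherwise

def classify(node, tuple_t):
    """Walk the tree from the root until a leaf is reached."""
    while isinstance(node, DecisionNode):
        value = tuple_t[node.attribute]
        node = node.left if node.predicate(value) else node.right
    return node.label

# Example: classify points with a single test on attribute "x0".
tree = DecisionNode("x0", lambda v: v <= -12,
                    LeafNode("red"), LeafNode("green"))
print(classify(tree, {"x0": -20, "x1": 3}))   # -> "red"
```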
Decision Tree Types in Data Mining
There are two types of decision trees in data mining:
- Classification Decision Tree
- Regression Decision Tree
Here, we will see both types of decision trees based on the data mining problems.
1. Classification Decision Tree
In general, a decision tree is a binary tree that recursively splits the dataset until we are left with pure leaf nodes, that is, nodes containing data of only one class. A classification decision tree has two kinds of nodes: decision nodes and leaf nodes.
A decision node contains a condition used to split the data, and a leaf node tells us the class of a new data point. If a child node is pure, it does not need a condition, and we do not split it further. Once every branch ends in a leaf node, we can classify any point in the dataset.
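As a quick illustration, here is a minimal sketch of fitting such a classifier with scikit-learn's DecisionTreeClassifier, assuming scikit-learn is installed; the toy data below is made up for illustration only.

```python
from sklearn.tree import DecisionTreeClassifier

X = [[-20, 1], [-15, 3], [-13, 0],   # points of class "red"
     [5, 2], [8, 4], [10, 1]]        # points of class "green"
y = ["red", "red", "red", "green", "green", "green"]

clf = DecisionTreeClassifier(criterion="entropy")  # split by information gain
clf.fit(X, y)

print(clf.predict([[-18, 2]]))   # a new data point lands in a "red" leaf
```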
How do we split the data set in a classification decision tree?
In other words, how does our model decide on the optimal split?
Let’s start at the root.
At the root, we have the whole dataset. Suppose we compare two candidate splits. The first is x1 <= 4, and its child nodes correspond to the first pair of yellow and red dotted regions. The second is x0 <= -12, and its child nodes correspond to the second pair. Since our goal is to get pure leaf nodes, we should go for the second split, because it produces a left child containing red points only. But how can the computer make the same choice? The answer lies in information theory: the model chooses the split that maximizes the information gain. To calculate information gain, we first need a way to measure the information contained in a state. Look at the root state: the numbers of red and green points are equal, so this state has the highest impurity and uncertainty. The way to quantify this is entropy.
Entropy = −Σ p_i log2(p_i), where p_i is the probability of class i in the node.
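A small helper that computes this quantity from a node's class labels is sketched below; the function name is my own, used only for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy of a node given the class labels of the points it holds."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

print(entropy(["red", "red", "green", "green"]))  # 1.0, maximum impurity
print(entropy(["red", "red", "red", "red"]))      # 0.0, a pure node
```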
If entropy is high, we are very uncertain about the class of a randomly picked point. Using this formula, we calculate the entropy of the remaining four states. A node with zero entropy, containing points of a single class, is a pure node. Now, to find the information gain of a split, we subtract the weighted average entropy of the child nodes from the entropy of the parent node.
We then choose the split that gives the greater information gain.
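The sketch below follows this description; the two candidate splits and their child labels are hypothetical examples, not taken from a real dataset.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, left, right):
    """Parent entropy minus the size-weighted entropy of the two children."""
    n = len(parent)
    children = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - children

parent = ["red", "red", "green", "green"]
# x0 <= -12 isolates the red points in the left child: maximum gain.
print(information_gain(parent, ["red", "red"], ["green", "green"]))  # 1.0
# A split that leaves both children mixed gains nothing.
print(information_gain(parent, ["red", "green"], ["red", "green"]))  # 0.0
```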
2. Regression Decision Tree
How do we solve a regression problem using a decision tree? The basic concept is the same as for the decision tree classifier: we recursively split the data using a binary tree until we are left with pure leaf nodes. There are just two differences: how we define impurity and how we make a prediction. In a regression tree, the prediction for a new point is the mean target value of the leaf it lands in.
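Here is a minimal regression sketch using scikit-learn's DecisionTreeRegressor, assuming scikit-learn is installed; the toy data and the max_depth setting are made up for illustration.

```python
from sklearn.tree import DecisionTreeRegressor

X = [[0.5, 1], [0.8, 3], [0.2, 2],   # points with low target values
     [2.0, 1], [2.5, 4], [3.0, 2]]   # points with high target values
y = [1.0, 1.2, 0.9, 5.0, 5.3, 4.8]

reg = DecisionTreeRegressor(max_depth=2)  # splits chosen by variance reduction
reg.fit(X, y)

# The prediction for a new point is the mean target value of its leaf.
print(reg.predict([[0.4, 2]]))
```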
How do we split the data set in a regression decision tree?
The most important part is how we split the dataset.
Here we have the whole dataset, and the task is to find the best splitting condition. We will examine two candidate conditions. The first is x0 <= 1; in this case, the split looks like the first pair of dotted lines. The second is x1 <= 2; in this case, the split looks like the second pair of dotted lines. Remember that points satisfying the condition go to the left child and the rest go to the right. Now, which is the better split? To decide, we calculate which split decreases the impurity of the child nodes the most, and for that we compute the variance reduction. Yes, in the context of regression, we use variance as the measure of impurity.
Focus on the complete dataset; we compute the variance of the whole dataset using this formula: Variance = (1/n) Σ (y_i − ȳ)², where y_i is the target value of the i-th point in the node, ȳ is the mean target value, and n is the number of points.
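A small helper computing this impurity measure is sketched below; the function name and the sample target values are my own illustrations.

```python
def variance(targets):
    """Mean squared deviation of the target values from their mean."""
    n = len(targets)
    mean = sum(targets) / n
    return sum((y - mean) ** 2 for y in targets) / n

print(variance([1.0, 1.2, 5.0, 5.3]))  # high variance: an impure node
print(variance([5.0, 5.1, 5.0, 4.9]))  # low variance: nearly pure
```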
Remember, a higher variance means higher impurity. So we first compute the variance of the root node, then the variance of each child dataset produced by the split, and then the variance reduction: we subtract the weighted variance of the child nodes from the variance of the parent node, where the weights are the relative sizes of the children with respect to the parent. If we compute the variance reduction for both conditions, x0 <= 1 and x1 <= 2, we find that the reduction for the first split is much larger than for the second. This tells us that the first split decreases the impurity much more, so we conclude that we should choose the first one.
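The sketch below follows this procedure; the parent targets and the two candidate partitions are made-up numbers standing in for the x0 <= 1 and x1 <= 2 splits.

```python
def variance(targets):
    n = len(targets)
    mean = sum(targets) / n
    return sum((y - mean) ** 2 for y in targets) / n

def variance_reduction(parent, left, right):
    """Parent variance minus the size-weighted variance of the children."""
    n = len(parent)
    children = (len(left) / n) * variance(left) + (len(right) / n) * variance(right)
    return variance(parent) - children

parent = [1.0, 1.2, 0.9, 5.0, 5.3, 4.8]
# Candidate 1 separates low targets from high targets cleanly:
print(variance_reduction(parent, [1.0, 1.2, 0.9], [5.0, 5.3, 4.8]))
# Candidate 2 leaves both children mixed, so it reduces variance far less:
print(variance_reduction(parent, [1.0, 5.0, 0.9], [1.2, 5.3, 4.8]))
```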
Conclusion
Decision trees can be divided both on the basis of the target variable and on the basis of the data mining problem. We have seen the two types, classification and regression trees, how each splits the data, and how the optimal split is chosen for a given condition.