EDUCBA

Decision Tree Types

Introduction to Decision Tree Types

Decision trees can be categorized either by the type of target variable or by the data mining problem they solve. This article categorizes them by data mining problem. Formally, a decision tree is defined over a database D = {t1, t2, …, tn}, where each ti is a tuple described by the attribute set A = {A1, A2, …, Am}, together with a set of classes C = {c1, c2, …, ck}.

A decision tree is a tree (a binary tree in the examples that follow) with the following properties:

  • First, each internal node of the decision tree is marked with an attribute Ai.
  • Second, each edge is labelled with a predicate that can be applied to the attribute associated with its parent node.
  • Finally, each leaf node is labelled with a class cj.
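
The three properties above can be sketched as a small data structure. This is a minimal illustration, not a definitive implementation: the names `Leaf`, `DecisionNode`, and `classify`, the attributes `x0` and `x1`, and the thresholds in the sample tree are all hypothetical and chosen only for demonstration.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Leaf:
    label: str  # class c_j assigned to every tuple that reaches this leaf


@dataclass
class DecisionNode:
    attribute: str                      # attribute A_i tested at this internal node
    predicate: Callable[[float], bool]  # edge predicate: True -> left edge, False -> right edge
    left: Union["DecisionNode", Leaf]
    right: Union["DecisionNode", Leaf]


def classify(node, row):
    """Follow the edge predicates from the root until a leaf is reached."""
    while isinstance(node, DecisionNode):
        value = row[node.attribute]
        node = node.left if node.predicate(value) else node.right
    return node.label


# Hypothetical two-level tree, for illustration only.
tree = DecisionNode(
    attribute="x0",
    predicate=lambda v: v <= -12,
    left=Leaf("red"),
    right=DecisionNode(
        attribute="x1",
        predicate=lambda v: v <= 4,
        left=Leaf("green"),
        right=Leaf("red"),
    ),
)

print(classify(tree, {"x0": -20.0, "x1": 0.0}))
print(classify(tree, {"x0": 3.0, "x1": 1.0}))
```

Storing the predicate as a function keeps the sketch general: a numeric threshold and a categorical membership test fit the same structure.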

Decision Tree Types in Data Mining

There are two types of decision trees in data mining:

  • Classification Decision Tree
  • Regression Decision Tree

Here, we will see both types of decision trees based on the data mining problems.

1. Classification Decision Tree

In general, a decision tree is a binary tree that recursively splits the dataset until we are left with pure leaf nodes, that is, nodes whose data belongs to only one class. A classification decision tree has two kinds of nodes: decision nodes and leaf nodes.

A decision node contains a condition used to split the data, and a leaf node decides the class of a new data point. If a child node is pure, meaning all of its points belong to one class, it needs no condition and we do not split it further. Once every branch ends in a leaf node, we can classify any data point by following the conditions from the root down to a leaf.

How do we split the data set in a classification decision tree?

The key question is how the model decides on the optimal split.

Let’s start at the root.

Figure: data set in classification

At the root, we have the whole dataset. We compare two candidate splits. The first is x1 <= 4, and its child nodes are shown by the first pair of yellow and red dotted lines. The second is x0 <= -12, and its child nodes are shown by the next pair of yellow and red dotted lines. Since our goal is to get pure leaf nodes, we should choose the second split, because it produces a left child containing red points only. But how can a computer reach the same decision? The answer lies in information theory: the model chooses the split that maximizes the information gain. To calculate information gain, we first need to measure the information contained in a state. Look at the root state: it contains the same number of red points and green points, so it has the highest impurity and uncertainty. We quantify this with entropy.

$$\text{Entropy} = -\sum_i p_i \log(p_i)$$

where $p_i$ is the probability of class $i$.
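
As a rough sketch, the entropy of a node can be computed from its class labels. The function name `entropy` and the sample labels are hypothetical. The formula above does not state a logarithm base; base 2 is assumed here so that a two-class node with equal counts has entropy 1.

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy of a list of class labels (log base 2 is an assumption)."""
    n = len(labels)
    if n == 0:
        return 0.0
    # p_i is estimated as the fraction of points in the node that carry class i.
    return sum(-(count / n) * math.log2(count / n) for count in Counter(labels).values())


root = ["red"] * 5 + ["green"] * 5   # equal class counts: highest uncertainty
pure = ["red"] * 4                   # a single class: no uncertainty

print(entropy(root))
print(entropy(pure))
```

A mixed node falls between these two extremes, and the closer its class counts are to equal, the higher its entropy.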

If entropy is high, we are very uncertain about the class of a randomly picked point. Using this formula, we calculate the entropy of the four child states produced by the two candidate splits. A node with zero entropy, the minimum possible value, is a pure node. To find the information gain of a split, we subtract the weighted, combined entropy of the child nodes from the entropy of the parent node.

$$IG = E(\text{parent}) - \sum_i w_i \, E(\text{child}_i)$$

Here w_i is the relative size of child i with respect to the parent. We choose the split that gives the greater information gain.
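
The comparison of the two candidate splits can be sketched as follows. The data points are invented for illustration and are not the points from the article's figure; only the two conditions, x1 <= 4 and x0 <= -12, come from the text. The helper names `information_gain` and `split_labels` are hypothetical, and log base 2 is an assumption.

```python
import math
from collections import Counter


def entropy(labels):
    n = len(labels)
    if n == 0:
        return 0.0
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted


def split_labels(points, feature, threshold):
    """Points satisfying feature <= threshold go left, the rest go right."""
    left = [p["label"] for p in points if p[feature] <= threshold]
    right = [p["label"] for p in points if p[feature] > threshold]
    return left, right


# Made-up dataset with four red and four green points.
points = [
    {"x0": -15, "x1": 2, "label": "red"},
    {"x0": -14, "x1": 6, "label": "red"},
    {"x0": -13, "x1": 3, "label": "red"},
    {"x0": -5, "x1": 1, "label": "green"},
    {"x0": 0, "x1": 5, "label": "green"},
    {"x0": 4, "x1": 7, "label": "green"},
    {"x0": 6, "x1": 2, "label": "red"},
    {"x0": 8, "x1": 3, "label": "green"},
]

parent = [p["label"] for p in points]
gain_x1 = information_gain(parent, split_labels(points, "x1", 4))
gain_x0 = information_gain(parent, split_labels(points, "x0", -12))

best = max([("x1 <= 4", gain_x1), ("x0 <= -12", gain_x0)], key=lambda t: t[1])
print(best[0])
```

On this made-up data, x0 <= -12 yields a left child with red points only, so it has the larger information gain, mirroring the reasoning in the text.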

2. Regression Decision Tree

How do we solve a regression problem using a decision tree? The basic concept is the same as for the decision tree classifier: we recursively split the data using a binary tree until we are left with pure leaf nodes. There are just two differences: how we define impurity and how we make a prediction.

Figure: Regression Decision Tree

How do we split the data set in a regression decision tree?

How we split the dataset is the most important part.

Figure: split the data set

Here, we have the whole dataset, and the task is to find the best splitting condition. We examine two candidate conditions. The first is x0 <= 1, in which case the split looks like the first dotted lines. The second is x1 <= 2, in which case the split looks like the second dotted lines. Remember that points satisfying the condition go to the left and the rest go to the right. Now, which is the better split? To find out, we need to calculate which split decreases the impurity of the child nodes the most. For that, we compute the variance reduction, because in the context of regression we use variance as the measure of impurity.

Focus on the complete dataset; we are going to compute the variance of the whole dataset using this formula:

$$\mathrm{Var} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \bar{y})^2$$

Remember, a higher variance means a higher impurity. First, compute the variance of the root node, then compute the variance of each child dataset produced by the split. The variance reduction is the variance of the parent node minus the weighted, combined variance of the child nodes, where each weight is the relative size of the child with respect to the parent. If we compute the variance reduction for both conditions, x0 <= 1 and x1 <= 2, we find that the variance reduction of the first split is much larger than that of the second. This tells us that the first split decreases the impurity much more than the second one, so we choose the first split.
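
The variance reduction calculation can be sketched as follows. The rows are invented for illustration and do not come from the article's figure; only the two conditions, x0 <= 1 and x1 <= 2, come from the text. The helper names `variance`, `variance_reduction`, and `split_targets` are hypothetical.

```python
def variance(ys):
    """Population variance: the mean squared deviation from the mean."""
    n = len(ys)
    if n == 0:
        return 0.0
    mean = sum(ys) / n
    return sum((y - mean) ** 2 for y in ys) / n


def variance_reduction(parent, children):
    """Parent variance minus the size-weighted variance of the child nodes."""
    n = len(parent)
    weighted = sum(len(child) / n * variance(child) for child in children)
    return variance(parent) - weighted


def split_targets(rows, feature, threshold):
    """Rows satisfying feature <= threshold go left, the rest go right."""
    left = [r["y"] for r in rows if r[feature] <= threshold]
    right = [r["y"] for r in rows if r[feature] > threshold]
    return left, right


# Made-up dataset: the target y is low when x0 <= 1 and high otherwise.
rows = [
    {"x0": 0.0, "x1": 1, "y": 10},
    {"x0": 0.5, "x1": 3, "y": 11},
    {"x0": 1.0, "x1": 5, "y": 9},
    {"x0": 2.0, "x1": 1, "y": 30},
    {"x0": 3.0, "x1": 2, "y": 31},
    {"x0": 4.0, "x1": 4, "y": 29},
]

parent = [r["y"] for r in rows]
vr_x0 = variance_reduction(parent, split_targets(rows, "x0", 1))
vr_x1 = variance_reduction(parent, split_targets(rows, "x1", 2))

print(vr_x0, vr_x1)
```

On this made-up data the x0 <= 1 split separates the low targets from the high ones, so its variance reduction is far larger than that of x1 <= 2, which mixes both groups in each child.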

Conclusion

Decision trees can be categorized on either basis: by target variable or by data mining problem. In this article, we covered the two types by data mining problem, classification and regression, how each splits the data, and how each chooses the optimal split among the candidate conditions.
