Create Decision Tree

By Priya Pedamkar

Introduction to Create Decision Tree

The following article provides an outline for creating a decision tree. With the rapid growth in the amount of data generated by information systems, there is a pressing need for classifiers that can handle large data sets with low computational complexity, and the decision tree is one of the most important approaches for representing such classifiers. In other words, the data are structured using a divide-and-conquer strategy. A decision tree is a framework that quantifies the values and probabilities of outcomes at each node, helping decision-makers pick the correct prediction out of noisy or conflicting data.

What is a Decision Tree?

A decision tree is a hierarchical structure in which each node splits the data set according to a condition on one of its attributes. The aim is to construct an optimal tree that predicts the value of a target (response) variable using simple decision rules (if-then-else statements). The approach is a supervised learning method, used mostly for classification problems, and is considered a very effective predictive model. Decision trees are used in application domains such as game theory, artificial intelligence, machine learning and data mining, and in areas like security and medicine.
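
As a quick illustration of those if-then-else rules, here is a minimal sketch using scikit-learn and its bundled iris data set (neither of which the article itself prescribes); a fitted tree can be printed directly as a rule set:

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text
iris = load_iris()
# Fit a shallow tree; each internal node tests one attribute, each leaf is a class.
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)
print(export_text(clf, feature_names=list(iris.feature_names)))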

How to Create a Decision Tree?

A decision tree is built in a simple top-down manner. It consists of nodes forming a directed tree: a root node with no incoming edges, and all other nodes, called decision nodes (internal nodes corresponding to attributes and leaf nodes corresponding to class labels), each with exactly one incoming edge. The main goal is to minimize the generalization error by finding the optimal split at each node of the tree.

An example of a decision tree is worked through below with a sample data set. The goal is to predict whether Profit is Down or Up using the attributes Life, Competition and Type. Here the decision tree variables are categorical.

Data Set:

Life   Competition   Type       Profit
Old    Yes           Software   Down
Old    No            Software   Down
Old    No            Hardware   Down
Mid    Yes           Software   Down
Mid    Yes           Hardware   Down
Mid    No            Hardware   Up
Mid    No            Software   Up
New    Yes           Software   Up
New    No            Hardware   Up
New    No            Software   Up

In the above data set, Life, Competition and Type are the predictors, and the attribute Profit is the target. There are various algorithms for building a decision tree; the classic one used here is ID3, which follows a greedy search approach. The resulting tree can be read as a set of decision (inference) rules in disjunctive normal form.
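
For the worked calculations below, the table can be held in a small Python structure (the layout and the name DATA are our own choices; the values are copied from the table above):

# Columns: (Life, Competition, Type, Profit)
DATA = [
    ("Old", "Yes", "Software", "Down"),
    ("Old", "No",  "Software", "Down"),
    ("Old", "No",  "Hardware", "Down"),
    ("Mid", "Yes", "Software", "Down"),
    ("Mid", "Yes", "Hardware", "Down"),
    ("Mid", "No",  "Hardware", "Up"),
    ("Mid", "No",  "Software", "Up"),
    ("New", "Yes", "Software", "Up"),
    ("New", "No",  "Hardware", "Up"),
    ("New", "No",  "Software", "Up"),
]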

Decision Tree:

[Figure: the decision tree built from the sample data set]

Initially, every training attribute is a candidate for the root. The order of priority for placing attributes at the root is decided by the approach described below; this process, known as attribute selection, identifies which attribute becomes the decision node at each level. Building the tree involves two steps: tree construction and tree pruning. The data are split at every decision node.

Information Gain:

Information gain is the reduction in entropy obtained by splitting the data on an independent variable (attribute). At each step, the decision tree picks the attribute with the highest information gain.

Gain(S, A) = Entropy(S) − Σ (|Sv| / |S|) × Entropy(Sv), summed over the values v of attribute A, where Sv is the subset of S with A = v.

Entropy:

Entropy is a measure, over a finite set, of the randomness in the data, i.e. how unpredictable the outcome is. If all the samples have the same value, the entropy is zero; if the samples are split evenly between two values, the entropy is one.
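
A minimal sketch of this measure in Python (the helper name entropy is our own) is:

import math
def entropy(labels):
    # Shannon entropy of a list of class labels: 0.0 when every label is the
    # same, 1.0 when a two-class sample is split evenly.
    total = len(labels)
    result = 0.0
    for label in set(labels):
        p = labels.count(label) / total
        result -= p * math.log2(p)
    return result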

Entropy for the Class:

Entropy(class) = −(p / (p + n)) log2(p / (p + n)) − (n / (p + n)) log2(n / (p + n))

where p is the number of records with Profit = 'Up' (profit, say 'yes') and n is the number with Profit = 'Down' (loss, say 'no').

For the sample data set p = 5 and n = 5, so

Entropy(class) = −(5/10) log2(5/10) − (5/10) log2(5/10) = 0.5 + 0.5 = 1

Therefore, entropy = 1.
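
Continuing the Python sketch above, the same value can be checked directly:

profits = [row[3] for row in DATA]   # 5 "Up" and 5 "Down" labels
print(entropy(profits))              # prints 1.0, matching the calculation above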

Once the class entropy has been calculated, the next step is to decide which attribute should become the root node.

Entropy of Life:

Entropy(Life) = Σ ((pv + nv) / (p + n)) × I(pv, nv), summed over the values v of Life, where I(pv, nv) is the entropy of the subset with that value.

According to the data set, for the Life attribute we have Old = 3 down and 0 up, Mid = 2 down and 2 up, and New = 3 up and 0 down with respect to the Profit label.

Entropy(Life) = (3/10) × 0 + (4/10) × 1 + (3/10) × 0 = 0.4

Life   pi   ni   I(pi, ni)
Old     0    3    0
Mid     2    2    1
New     3    0    0

Gain = Class Entropy – Entropy of Life = 1 – 0.4 = 0.6
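
Using the entropy helper and the DATA records from the snippets above (the helper names are our own), the attribute entropies and gains can be computed directly:

def subset(data, attr_index, value):
    # Rows whose attribute at position attr_index equals value.
    return [row for row in data if row[attr_index] == value]
def info_gain(data, attr_index, target_index=3):
    # Class entropy minus the weighted entropy of the split on the attribute.
    base = entropy([row[target_index] for row in data])
    remainder = 0.0
    for value in set(row[attr_index] for row in data):
        part = subset(data, attr_index, value)
        remainder += len(part) / len(data) * entropy([row[target_index] for row in part])
    return base - remainder
print(round(info_gain(DATA, 0), 2))   # Life        -> 0.6
print(round(info_gain(DATA, 1), 2))   # Competition -> 0.12
print(round(info_gain(DATA, 2), 2))   # Type        -> 0.0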

Entropy of Competition:

Competition   pi   ni   I(pi, ni)
Yes            1    3   0.811
No             4    2   0.918

Entropy(Competition) = (4/10) × 0.811 + (6/10) × 0.918 ≈ 0.875

Gain = Class Entropy – Entropy of Competition = 1 – 0.875 ≈ 0.12

Since Life has the highest information gain, it becomes the root node. The problem now arises in the Life = Mid branch, where Up and Down are equally likely, so its entropy is 1. The gain calculation is therefore repeated on this subset for the remaining attributes: splitting on Competition separates the Mid records perfectly (Yes gives Down, No gives Up), whereas the Type attribute again has entropy 1 and gain 0. With Competition as the next decision node, the tree is complete and classifies the Mid records correctly.
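
Repeating the gain calculation in code on the Life = "Mid" subset (continuing the sketch above) confirms this choice:

mid_rows = subset(DATA, 0, "Mid")
print(round(info_gain(mid_rows, 1), 2))   # Competition -> 1.0 (perfect split)
print(round(info_gain(mid_rows, 2), 2))   # Type        -> 0.0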

Advantages and Disadvantages of Decision Tree

Given below are the advantages and disadvantages:

Advantages:

  • They are easy to understand, and the rules generated are flexible. They require little effort for data preparation.
  • A visual approach to represent decisions and outcomes is very helpful.
  • Decision trees can handle a training data set that contains errors and missing values.
  • They can handle both discrete and numerical attributes, and they work with categorical and continuous variables for input and output.
  • They are a useful tool in the business domain, where decisions have to be taken after analyzing data under given conditions.

Disadvantages:

  • Learners can grow a tree that is too complex for the training data; this is termed overfitting and is a common problem with decision tree models (see the pruning sketch after this list).
  • The preferred values are categorical; if a variable is continuous, discretizing it loses information and makes the tree error-prone, and the computational cost of evaluating splits grows quickly.
  • A large number of class labels leads to complex calculations and low prediction accuracy on the dataset.
  • The information gain criterion used in the decision tree algorithm is biased toward attributes with many distinct categorical values.
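
One common way to limit overfitting, mentioned in the first point above, is to prune the tree. A hedged sketch with scikit-learn (the article does not prescribe a library, and the parameter values here are purely illustrative):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
# max_depth pre-prunes by capping the depth of the tree; ccp_alpha applies
# cost-complexity pruning after the tree has been grown.
pruned = DecisionTreeClassifier(max_depth=3, ccp_alpha=0.01, random_state=0)
pruned.fit(X_train, y_train)
print(pruned.score(X_test, y_test))   # accuracy on the held-out split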

Conclusion

To conclude, decision trees provide a practical and easy-to-understand learning method and are well known as efficient tools for machine learning, since they perform well on large datasets in a short time. Tree learning uses a statistical approach to draw generalized conclusions from data. It should now be clearer why decision trees are used in predictive modeling and why they are such a powerful tool for data scientists.

Recommended Articles

This is a guide to creating a decision tree. Here we discuss how to create a decision tree, along with its advantages and disadvantages. You can also go through our other suggested articles to learn more –

  1. Decision Tree in R
  2. Decision Tree Algorithm
  3. Artificial Intelligence Tools
  4. What is Decision Tree?