EDUCBA

EDUCBA

MENUMENU
  • Free Tutorials
  • Free Courses
  • Certification Courses
  • 360+ Courses All in One Bundle
  • Login
Home Data Science Data Science Tutorials Data Structures Tutorial Decision Tree Algorithm
Secondary Sidebar
Software Testing Tutorial
  • Advance
    • Cyclomatic Complexity
    • Decision Table Testing
    • Decision Tree Algorithm
    • What is Continuous Integration
    • Mantis Bug Tracker
    • Equivalence Partitioning
    • Gantt Chart Software
    • Acceptance Testing Types
    • Load testing tools
    • Install TestNG
    • Install Unity
    • Defect Management Process
    • Test Plan Template
    • Testing Interview Questions
    • Testing of Mobile application
    • What is Test Automation Frameworks
    • Test Automation Framework
    • Application of Automation
    • Test Automation Process
    • Automation Testing Roles and Responsibilities
    • What is Instruction Cycle?
    • What is Cucumber?
    • 15 Best Popular Bug Reporting Tools
    • What is Automated Testing?
    • Software Maintenance Types
    • Types of Penetration Testing
    • Software Reliability
    • Best Gantt Chart Software
    • Code Coverage
    • Branch Coverage
    • Decision Coverage
    • Statement Coverage
    • What is Test Case
    • Types of Test Case
    • What is Test Scenario
    • Formal Review
    • Alpha Beta Pruning
    • What is Cyclomatic Complexity?
    • Test Coverage
    • How to Write Test Case
    • Testing Documentation
    • Performance Testing Life Cycle
    • Test Harness
    • Test Strategy
    • Software Incident Management
    • What is Debugging
    • What is Defect?
    • Listeners in TestNG
  • Basics
    • What is Software Testing
    • Careers in Software Testing
    • Defect Life Cycle in Software Testing
    • Bug Life Cycle
    • Levels of Software Testing
    • Software Testing Life Cycle
    • Software Tester Work
    • Software Testing Principles
    • Software Testing Services
    • Testing Methodologies
    • Test Approaches
    • Grey Box Testing
    • Types of Software Testing
    • What is a Bug in Software Testing
    • Benefits of Automation Testing
    • What is Automation Testing?
    • Types of Automation
    • Typical Journey of a Software Tester
    • Automation Testing Process
    • Mobile Automation Testing
    • Automation Testing Life Cycle
    • Software Quality Assurance
    • Software Quality Assurance
    • What is Test Environment?
    • Verification and Validation Testing
  • Types of Testing
    • Adhoc Testing
    • Types of System Testing
    • Manual Testing Types
    • Unit Testing Types
    • Unit Testing Benefits
    • Agile Testing
    • What is Agile Testing
    • Acceptance Testing
    • Stress Testing Types
    • Alpha and Beta Testing
    • Application Testing
    • Automation Testing
    • Automation Testing Advantages
    • Benchmark Testing
    • Black Box Testing
    • Domain Testing
    • Dynamic Testing
    • Ecommerce Testing
    • Fuzz Testing
    • Gray Box Testing
    • GUI Testing
    • Installation Testing
    • Interface Testing
    • Interoperability Testing
    • Mainframe Testing
    • Manual Testing
    • Mutation Testing
    • Monkey Testing
    • Negative Testing
    • Penetration Testing
    • Penetration testing phases
    • Penetration testing framework
    • Protocol Testing
    • Recovery Testing
    • Regression Testing
    • Mobile Penetration Testing
    • Accessibility Testing
    • Sanity Testing
    • Scalability Testing
    • Security Testing
    • Spike Testing
    • Stability Testing
    • State Transition Testing
    • Static Testing
    • Gatling Load Testing
    • System Integration Testing
    • Structural Testing
    • Locust Load Testing
    • System Testing
    • Control Flow Testing
    • Unit Testing
    • Cypress testing
    • Volume Testing
    • Web Testing Application
    • What is Exploratory Testing
    • What is Stress Testing
    • What is Usability Testing
    • White Box Testing
    • Types of White Box Testing
    • Compatibility Testing?
    • Use Case Testing
    • Beta Testing
    • Integration Testing
    • Non Functional Testing
    • Non Functional Testing Types
    • What is Functional Testing
    • Functional testing types
    • Cookie Testing
    • Alpha Testing
    • Boundary Value Testing
    • Equivalence Class Testing
    • Glass Box Testing
    • SOA Testing
    • Smoke Testing
    • Visual Testing
    • Visual Paradigm
    • Model-Based Testing
  • Testing techniques
    • Software Testing Methodologies
    • Black Box Testing Techniques
    • Static Testing Techniques
    • Test Case Design Techniques
    • What is Static Analysis
  • Testing tools
    • Manual Testing Tools
    • Visual Testing Tools
    • Automation Testing Tools
    • Functional Testing Tools
    • GUI Testing Tools
    • Penetration Testing Tools
    • Performance Testing Tools
    • SOA Testing Tools
    • Accessibility Testing Tools
    • What is QTP
    • Regression Testing Tools
    • Security Testing Tools
    • Test Management Tools
    • Defect Management Tools
    • Code Coverage Tools
    • Test Coverage Tools
    • Defect Tracking Tools
    • Continuous Integration Tools
    • Install Bugzilla
    • Test data generation tool
    • Unit Testing Tools
    • Web Testing Tools
    • Stress Testing Tools
    • Performance Monitoring Tools
    • Mobile Testing Tools
    • Responsive Testing Tool
    • Cross Browser Testing Tools
    • Risk Based Testing
    • Database Testing Tools
    • WinRunner
    • What is Squish?
    • CubicTest
    • What is WinRM?
    • Bugzilla Tool
    • Code review tools
    • Penetration Testing Open Source Tools
  • Inteview Questions
    • Automation Testing Interview Questions
    • Manual Testing Interview Questions
    • ISTQB Interview Questions
    • Cucumber Interview Questions
    • Software Testing Interview Questions
    • Penetration Testing Interview Questions

Related Courses

Software Testing Course

Penetration Training Course

TestNG Training Course

Decision Tree Algorithm

By Priya PedamkarPriya Pedamkar

Decision Tree Algorithm

Introduction to Decision Tree Algorithm

The Decision Tree Algorithm is one of the popular supervised type machine learning algorithms that is used for classifications. This algorithm generates the outcome as the optimized result based upon the tree structure with the conditions or rules. The decision tree algorithm associated with three major components as Decision Nodes, Design Links, and Decision Leaves. It operates with the Splitting, pruning, and tree selection process. It supports both numerical and categorical data to construct the decision tree. Decision tree algorithms are efficient for large data set with less time complexity. This Algorithm is mostly used in customer segmentation and marketing strategy implementation in the business.

What is Decision Tree Algorithm?

Decision Tree Algorithm is a supervised Machine Learning Algorithm where data is continuously divided at each row based on certain rules until the final outcome is generated. Let’s take an example, suppose you open a shopping mall and of course, you would want it to grow in business with time. So for that matter, you would require returning customers plus new customers in your mall. For this, you would prepare different business and marketing strategies such as sending emails to potential customers, create offers and deals, targeting new customers, etc. But how do we know who are the potential customers? In other words, how do we classify the category of the customers? Like some customers will visit once in a week and others would like to visit once or twice in a month, or some will visit in a quarter. So decision trees are one such classification algorithm that will classify the results into groups until no more similarity is left.

In this way, the decision tree goes down in a tree-structured format.

The main components of a decision tree are:

Start Your Free Data Science Course

Hadoop, Data Science, Statistics & others

  • Decision Nodes, which is where the data is split or, say, it is a place for the attribute.
  • Decision Link, which represents a rule.
  • Decision Leaves, which are the final outcomes.

is a person fit

Working of a Decision Tree Algorithm

There are many steps that are involved in the working of a decision tree:

1. Splitting – It is the process of the partitioning of data into subsets. Splitting can be done on various factors as shown below i.e. on a gender basis, height basis, or based on class.

splitting

2. Pruning – It is the process of shortening the branches of the decision tree, hence limiting the tree depth.

tree pruning Exam

Pruning is also of two types:

  • Pre-Pruning – Here, we stop growing the tree when we do not find any statistically significant association between the attributes and class at any particular node.
  • Post-Pruning – In order to post prune, we must validate the performance of the test set model and then cut the branches that are a result of overfitting noise from the training set.

3. Tree Selection – The third step is the process of finding the smallest tree that fits the data.

Example and Illustration of Constructing a Decision Tree

Let’s say you want to play cricket on some particular day (For e.g., Saturday). What are the factors that are involved which will decide if the play is going to happen or not?

Clearly, the major factor is the climate; no other factor has that much probability as much climate is having for the play interruption.

All in One Data Science Bundle(360+ Courses, 50+ projects)
Python TutorialMachine LearningAWSArtificial Intelligence
TableauR ProgrammingPowerBIDeep Learning
Price
View Courses
360+ Online Courses | 50+ projects | 1500+ Hours | Verifiable Certificates | Lifetime Access
4.7 (86,650 ratings)

We have collected the data from the last 10 days, which is presented below:

Day Weather Temperature Humidity Wind Play?
1 Cloudy Hot High Weak Yes
2 Sunny Hot High Weak No
3 Sunny Mild Normal Strong Yes
4 Rainy Mild High Strong No
5 Cloudy Mild High Strong Yes
6 Rainy Cool Normal Strong No
7 Rainy Mild High Weak Yes
8 Sunny Hot High Strong No
9 Cloudy Hot Normal Weak Yes
10 Rainy Mild High Strong No

Let us now construct our decision tree based on the data that we have got. So we have divided the decision tree into two levels, the first one is based on the attribute “Weather”, and the second row is based on “Humidity” and “Wind”.

The below images illustrate a learned decision tree.

Weather 1 (Decision Tree Algorithm)

We can also set some threshold values if the features are continuous.

Weather 2 (Decision Tree Algorithm)

What is Entropy in Decision Tree Algorithm?

In simple words, entropy is the measure of how disordered your data is. While you might have heard this term in your Mathematics or Physics classes, it’s the same here. The reason Entropy is used in the decision tree is because the ultimate goal in the decision tree is to group similar data groups into similar classes, i.e. to tidy the data.

Let us see the below image, where we have the initial dataset, and we are required to apply a decision tree algorithm in order to group together the similar data points in one category.

After the decision split, as we can clearly see, most of the red circles fall under one class while most of the blue crosses fall under another class. Hence a decision was to classify the attributes that could be based on various factors.

Decision Split (Decision Tree Algorithm)

Now, let us try to do some math over here:

Let us say that we have got “N” sets of the item, and these items fall into two categories, and now in order to group the data based on labels, we introduce the ratio:

Decision Tree Algorithm 1

The entropy of our set is given by the following equation:

Decision Tree Algorithm 2

Let us check out the graph for the given equation:

Entropy

Above Image (with p=0.5 and q=0.5)

Advantages and Disadvantages

Given below are the advantages and disadvantages mentioned:

Advantages:

  • A decision tree is simple to understand, and once it is understood, we can construct it.
  • We can implement a decision tree on numerical as well as categorical data.
  • Decision Tree is proven to be a robust model with promising outcomes.
  • They are also time-efficient with large data.
  • It requires less effort for the training of the data.

Disadvantages:

  • Instability: Only if the information is precise and accurate, the decision tree will deliver promising results. Even if there is a slight change in the input data, it can cause large changes in the tree.
  • Complexity: If the dataset is huge with many columns and rows, it is a very complex task to design a decision tree with many branches.
  • Costs: Sometimes, cost also remains a main factor because when one is required to construct a complex decision tree, it requires advanced knowledge in quantitative and statistical analysis.

Conclusion

In this article, we saw about the decision tree algorithm and how to construct one. We also saw the big role that is being played by Entropy in the decision tree algorithm, and finally, we saw the advantages and disadvantages of the decision tree.

Recommended Articles

This has been a guide to Decision Tree Algorithm. Here we discussed the role played by entropy, working, advantages, and disadvantages. You can also go through our other suggested articles to learn more –

  1. Important Data Mining Methods
  2. Binary Search in Data Structure
  3. Guide to What is Data Science?
  4. Data Analyst Interview Questions
Popular Course in this category
Machine Learning Training (20 Courses, 29+ Projects)
  19 Online Courses |  29 Hands-on Projects |  178+ Hours |  Verifiable Certificate of Completion
4.7
Price

View Course

Related Courses

All in One Data Science Bundle (360+ Courses, 50+ projects)4.9
Oracle DBA Database Management System Training (2 Courses)4.8
SQL Training Program (7 Courses, 8+ Projects)4.7
0 Shares
Share
Tweet
Share
Primary Sidebar
Footer
About Us
  • Blog
  • Who is EDUCBA?
  • Sign Up
  • Live Classes
  • Corporate Training
  • Certificate from Top Institutions
  • Contact Us
  • Verifiable Certificate
  • Reviews
  • Terms and Conditions
  • Privacy Policy
  •  
Apps
  • iPhone & iPad
  • Android
Resources
  • Free Courses
  • Database Management
  • Machine Learning
  • All Tutorials
Certification Courses
  • All Courses
  • Data Science Course - All in One Bundle
  • Machine Learning Course
  • Hadoop Certification Training
  • Cloud Computing Training Course
  • R Programming Course
  • AWS Training Course
  • SAS Training Course

ISO 10004:2018 & ISO 9001:2015 Certified

© 2022 - EDUCBA. ALL RIGHTS RESERVED. THE CERTIFICATION NAMES ARE THE TRADEMARKS OF THEIR RESPECTIVE OWNERS.

EDUCBA
Free Data Science Course

SPSS, Data visualization with Python, Matplotlib Library, Seaborn Package

*Please provide your correct email id. Login details for this Free course will be emailed to you

By signing up, you agree to our Terms of Use and Privacy Policy.

EDUCBA Login

Forgot Password?

By signing up, you agree to our Terms of Use and Privacy Policy.

EDUCBA
Free Data Science Course

Hadoop, Data Science, Statistics & others

*Please provide your correct email id. Login details for this Free course will be emailed to you

By signing up, you agree to our Terms of Use and Privacy Policy.

EDUCBA

*Please provide your correct email id. Login details for this Free course will be emailed to you

By signing up, you agree to our Terms of Use and Privacy Policy.

Let’s Get Started

By signing up, you agree to our Terms of Use and Privacy Policy.

This website or its third-party tools use cookies, which are necessary to its functioning and required to achieve the purposes illustrated in the cookie policy. By closing this banner, scrolling this page, clicking a link or continuing to browse otherwise, you agree to our Privacy Policy

Loading . . .
Quiz
Question:

Answer:

Quiz Result
Total QuestionsCorrect AnswersWrong AnswersPercentage

Explore 1000+ varieties of Mock tests View more