Decision Tree Hyperparameters

Introduction to Decision Tree Hyperparameters

A decision tree is a machine learning algorithm used for two tasks: classification and regression. Decision trees are also the building blocks of ensemble learning algorithms. A hyperparameter is a parameter whose value controls the learning process. The decision tree has plenty of hyperparameters that need fine-tuning to derive the best possible model; tuning them reduces the generalization error, and a tuning method such as grid search or random search explores the hyperparameter space for the set of values that optimizes the architecture of the model.

Various Decision Tree Hyperparameters

Given below are the various decision tree hyperparameters:

1. max_depth

As its name suggests, max_depth is the maximum depth to which we allow the tree to grow. The deeper we allow it to grow, the more complex the model becomes. For training error, the effect is easy to see: increasing max_depth always drives training error down. Testing error behaves differently. If we set max_depth too high, the decision tree might simply overfit the training data without capturing useful patterns, causing the testing error to increase. If we set it too low, the tree has too little flexibility to capture the patterns and interactions in the training data, which also causes the testing error to increase; this is underfitting.

We therefore have to find the right max_depth through hyperparameter tuning: either a grid search or a random search may arrive at the best possible value of max_depth.
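As a minimal sketch of both points, assuming scikit-learn's DecisionTreeClassifier (whose parameter names match those discussed here) and a synthetic dataset standing in for real training data: a fitted tree never exceeds the depth we allow, and a grid search can pick the depth with the best cross-validated score.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic data stands in for a real training set.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# A fitted tree never grows deeper than the max_depth we allow.
shallow = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(shallow.get_depth())  # at most 3

# Grid search over max_depth keeps the value with the best CV score.
search = GridSearchCV(DecisionTreeClassifier(random_state=0),
                      param_grid={"max_depth": [2, 4, 6, 8, None]},
                      cv=5)
search.fit(X, y)
print(search.best_params_)
```

The candidate depths in the grid are illustrative; in practice the range should bracket where the validation score peaks.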

2. min_samples_split

A decision tree has multiple nodes: internal nodes, which are split further, and leaf nodes, which are not. The hyperparameter min_samples_split specifies the minimum number of samples an internal node must hold before it may be split. We can pass either an integer, denoting the minimum number of samples required at an internal node, or a fraction, denoting that minimum as a percentage of the training samples.
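The two ways of passing the threshold can be sketched as follows, assuming scikit-learn's DecisionTreeClassifier: with 200 training samples, an integer of 40 and a fraction of 0.2 express the same limit, so the two trees come out identical.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, random_state=0)

# An integer: an internal node needs at least 40 samples before it may split.
t_int = DecisionTreeClassifier(min_samples_split=40, random_state=0).fit(X, y)

# A float in (0, 1]: the threshold is that fraction of the training set
# (0.2 * 200 = 40 samples here), so both trees are built identically.
t_frac = DecisionTreeClassifier(min_samples_split=0.2, random_state=0).fit(X, y)

print(t_int.tree_.node_count, t_frac.tree_.node_count)
```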

3. min_samples_leaf

Another hyperparameter is min_samples_leaf. As we have already seen, a leaf node is a node without any children, so it cannot be split further; min_samples_leaf specifies the minimum number of samples a node must hold to be allowed to become a leaf.

For example, suppose we start off with 10,000 samples and reach a node holding just 100 of them. There is little point in splitting further, as we would tend to overfit the training data, so by using this hyperparameter sensibly we can avoid overfitting. The parameter is similar to min_samples_split; however, it constrains the number of samples at the leaves, the base of the tree.
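The constraint can be checked directly on a fitted tree, assuming scikit-learn's DecisionTreeClassifier: after training with a minimum leaf size, no leaf in the tree holds fewer samples than that minimum.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)

# Every leaf must hold at least 25 samples, which prunes away the tiny,
# overfit-prone leaves that unconstrained splitting would create.
tree = DecisionTreeClassifier(min_samples_leaf=25, random_state=0).fit(X, y)

# Inspect the fitted tree: leaves are nodes with no left child (-1).
t = tree.tree_
leaf_sizes = t.n_node_samples[t.children_left == -1]
print(leaf_sizes.min())  # at least 25
```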

4. max_features

This hyperparameter limits the number of features considered when looking for the best split. We can specify an integer to denote how many features to examine at each split, or a fraction to denote the percentage of features to consider. Named options are also available: sqrt and log2 take the square root or base-2 logarithm of the total feature count, and None means all features are considered at every split.
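The accepted forms can be compared side by side, assuming scikit-learn's DecisionTreeClassifier, whose fitted `max_features_` attribute reports the resolved per-split feature count: with 16 features, 0.5 resolves to 8 and both sqrt and log2 resolve to 4.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=16, random_state=0)

# Resolve each accepted form of max_features to a per-split feature count.
results = {}
for mf in [None, 4, 0.5, "sqrt", "log2"]:
    tree = DecisionTreeClassifier(max_features=mf, random_state=0).fit(X, y)
    results[mf] = tree.max_features_  # the resolved integer count

print(results)  # None -> 16, 4 -> 4, 0.5 -> 8, "sqrt" -> 4, "log2" -> 4
```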

5. min_weight_fraction_leaf

Another decision tree hyperparameter is min_weight_fraction_leaf: the minimum weighted fraction of the input samples required at a leaf node, where the weights are determined by sample_weight. One way to deal with class imbalance is to sample an equal number of samples from each class; another is to weight the samples.

Criteria such as min_samples_leaf are biased towards dominant classes because they are not aware of sample weights. With a weight-based criterion it is easier to optimize the structure of the tree when the samples are weighted: min_weight_fraction_leaf requires every leaf node to contain at least the given fraction of the overall sum of the sample weights.
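A sketch of the weight-based criterion on an imbalanced problem, assuming scikit-learn's DecisionTreeClassifier: minority samples are weighted up, and every leaf of the fitted tree then carries at least the required fraction of the total sample weight.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# An imbalanced problem: class 1 is rare (about 10% of samples).
X, y = make_classification(n_samples=400, weights=[0.9, 0.1], random_state=0)

# Weight minority samples up so both classes carry similar total weight.
w = np.where(y == 1, 9.0, 1.0)

# Every leaf must carry at least 5% of the total sample weight, so leaves
# made of a handful of low-weight samples are disallowed.
tree = DecisionTreeClassifier(min_weight_fraction_leaf=0.05, random_state=0)
tree.fit(X, y, sample_weight=w)

t = tree.tree_
leaf_weight = t.weighted_n_node_samples[t.children_left == -1]
print(leaf_weight.min() / w.sum())  # at least 0.05
```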

6. random_state

This hyperparameter is not really one to tune, so let us see when and why we need to set random_state. Many new students are confused when the model's accuracy changes with the random_state value. This happens because the decision tree algorithm is greedy and, in some configurations, repeatedly makes random selections among features (for example, to break ties between equally good splits); those selections are driven by a pseudo-random number generator that takes random_state as its seed. A particular random_state may happen to pick good features and shift the measured accuracy, but if accuracy varies substantially with random_state, something is wrong with the model. Hence we can say that random_state is not a hyperparameter to tune; it is set only to make results reproducible.
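The reproducibility point can be sketched as follows, assuming scikit-learn's DecisionTreeClassifier: two fits with the same seed on the same data produce identical trees.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=20, random_state=0)

# Fixing the seed makes training reproducible: two fits with the same
# random_state resolve every pseudo-random choice identically.
a = DecisionTreeClassifier(random_state=42).fit(X, y)
b = DecisionTreeClassifier(random_state=42).fit(X, y)

print(a.tree_.node_count == b.tree_.node_count)
print((a.predict(X) == b.predict(X)).all())
```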

7. class_weight

class_weight is a hyperparameter that associates a weight with each output class. When the algorithm calculates impurity to make the split at each node, the samples in the resulting child nodes are weighted by class_weight. We start from the distribution of our classes and then, depending on which way the tree leans, we can increase the weight of the other class so that the algorithm penalizes mistakes on samples of one class relative to the other.
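Both ways of supplying the weights can be sketched as follows, assuming scikit-learn's DecisionTreeClassifier on an imbalanced dataset: an explicit per-class dictionary, or the "balanced" mode, which computes weights inversely proportional to class frequencies.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Imbalanced data: roughly 90% class 0, 10% class 1.
X, y = make_classification(n_samples=500, weights=[0.9, 0.1], random_state=0)

# Explicit dictionary: mistakes on the rare class 1 cost nine times more.
manual = DecisionTreeClassifier(class_weight={0: 1, 1: 9}, max_depth=3,
                                random_state=0).fit(X, y)

# "balanced": weights inversely proportional to class frequencies.
auto = DecisionTreeClassifier(class_weight="balanced", max_depth=3,
                              random_state=0).fit(X, y)

print(manual.score(X, y), auto.score(X, y))
```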

Conclusion

In this article, we conclude that the first five hyperparameters of the decision tree are the important ones, and they play a crucial role in building the most accurate model. We have seen what the hyperparameters do, how they are tuned, and how they interact with each other; it is not necessary to tune every hyperparameter.

Recommended Articles

This is a guide to Decision Tree Hyperparameters. Here we discuss the introduction and the various decision tree hyperparameters. You may also have a look at the following articles to learn more –

  1. What is Decision Tree?
  2. Decision Tree in Data Mining
  3. Create Decision Tree
  4. Decision Tree Algorithm