Ensemble Techniques

By Priya Pedamkar


Introduction to Ensemble Techniques

Ensemble learning is a machine learning technique that combines the outputs of several base models to produce a single, optimized model, improving overall predictive performance. The base model most commonly used is the decision tree classifier. A decision tree works on a set of rules to produce a prediction: the rules form the internal nodes, the outcomes of each rule form the branches to its children, and the leaf nodes hold the final decisions, as shown in the example decision tree below.

[Figure 1.a: Example of a decision tree for loan approval]


The above decision tree decides whether a person/customer can be given a loan. One of the rules for loan eligibility is: if (Income = Yes && Married = No) then Loan = Yes. This is how a decision tree classifier works. We will use several such classifiers as base models and combine their outputs to build one optimum predictive model; Figure 1.b shows the overall picture of an ensemble learning algorithm.
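Below is a minimal sketch of such a loan-eligibility tree using scikit-learn. The toy records, the 1/0 encoding of Yes/No, and the column names are illustrative assumptions, not taken from the article.

import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Hypothetical applicant records: 1 = Yes, 0 = No.
data = pd.DataFrame({
    "Income":  [1, 1, 0, 0, 1, 0],
    "Married": [0, 1, 0, 1, 0, 1],
    "Loan":    [1, 0, 0, 0, 1, 0],  # approved only when Income = Yes and Married = No
})

tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(data[["Income", "Married"]], data["Loan"])

# A new applicant with Income = Yes and Married = No, mirroring the rule above.
applicant = pd.DataFrame({"Income": [1], "Married": [0]})
print(tree.predict(applicant))  # expected: [1]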

Types of Ensemble Techniques

There are different types of ensemble techniques, but our major focus will be on the following two:

  • Bagging
  • Boosting

These methods help reduce the variance and bias of a machine learning model. Let us first understand what bias and variance are. Bias is the error that arises from incorrect assumptions in our algorithm; high bias indicates that our model is too simple (underfit). Variance is the error caused by the model's sensitivity to very small fluctuations in the data set; high variance indicates that our model is highly complex (overfit). An ideal ML model strikes a proper balance between bias and variance.
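As an illustrative sketch of this trade-off (assuming scikit-learn and a synthetic data set, both illustrative choices), a depth-1 decision tree underfits (high bias), while an unrestricted tree memorizes the training split (high variance):

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for depth in (1, None):  # too simple vs. highly complex
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_tr, y_tr)
    print(f"max_depth={depth}: train accuracy = {tree.score(X_tr, y_tr):.2f}, "
          f"test accuracy = {tree.score(X_te, y_te):.2f}")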

[Figure 1.b: Overview of an ensemble learning algorithm]

Bootstrap Aggregating/Bagging

Bagging is an ensemble technique that reduces the variance of our model and hence helps avoid overfitting. Bagging is an example of a parallel learning algorithm, and it works on two principles:

  • Bootstrapping: From the original data set, different sample populations are drawn with replacement.
  • Aggregating: The results of all the classifiers are combined into a single output, using majority voting for classification and averaging for regression. One of the famous machine learning algorithms that uses the concept of bagging is the random forest; a short code sketch follows this list.
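Here is a minimal sketch of bagging with decision-tree base models, assuming scikit-learn and a synthetic data set (both illustrative choices, not specified in the article):

from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Bootstrapping: each of the 50 trees trains on a sample drawn with
# replacement; aggregating: predictions are combined by majority vote.
bagger = BaggingClassifier(
    DecisionTreeClassifier(),
    n_estimators=50,
    bootstrap=True,  # sample with replacement
    random_state=0,
)
bagger.fit(X, y)
print(bagger.score(X, y))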

Random Forest

In a random forest, each decision tree is built from a random sample drawn from the population with replacement, together with a subset of features selected from the full set of features. From this feature subset, whichever feature gives the best split is selected as the root of the decision tree. The feature subset must be chosen randomly at any cost; otherwise we will end up producing only correlated trees, and the variance of the model will not improve.

Now that we have built our model with samples taken from the population, the question is: how do we validate the model? Since we sample with replacement, not all of the samples are used; those not included in any bag are called out-of-bag (OOB) samples, and we can validate our model with them. The important parameters to consider in a random forest are the number of samples and the number of trees. Let us consider 'm' as the size of the feature subset and 'p' as the total number of features; as a thumb rule, it is ideal to choose:

  • m = √p and a minimum node size of 1 for a classification problem.
  • m = p/3 and a minimum node size of 5 for a regression problem.

The m and p should be treated as tuning parameters when we deal with a practical problem, and the training can be terminated once the OOB error stabilizes. One drawback of the random forest is that when our data set has, say, 100 features and only a couple of them are important, the algorithm will perform poorly.
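The sketch below applies these thumb rules with scikit-learn's RandomForestClassifier; the synthetic data set is an illustrative assumption:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=25, random_state=0)

forest = RandomForestClassifier(
    n_estimators=200,
    max_features="sqrt",  # m = sqrt(p), the classification thumb rule
    min_samples_leaf=1,   # minimum node size of 1 for classification
    oob_score=True,       # validate on the out-of-bag samples
    random_state=0,
)
forest.fit(X, y)
print(f"OOB accuracy: {forest.oob_score_:.3f}")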

Boosting

Boosting is a sequential learning algorithm that reduces the bias of our model, and in some supervised learning cases the variance as well. It also helps convert weak learners into strong learners. Boosting works on the principle of placing the weak learners sequentially: a weight is assigned to each data point after every round, with more weight assigned to the data points misclassified in the previous round. This sequential, weighted way of training on the data set is the key difference from bagging.

Figure 3.a shows the general approach in boosting.

[Figure 3.a: The general approach in boosting]

The final predictions are combined using weighted majority voting for classification and a weighted sum for regression. The most widely used boosting algorithm is adaptive boosting (AdaBoost).

Adaptive Boosting

The steps involved in the AdaBoost algorithm are as follows (a code sketch follows the list):

  1. For the given n data points, we define the target class y and initialize all the weights to 1/n.
  2. We fit classifiers to the data set and choose the classifier with the least weighted classification error.
  3. We assign a weight to the classifier by a thumb rule based on its accuracy: if the accuracy is more than 50%, the weight is positive, and vice versa.
  4. We update the weights of the data points at the end of each iteration, giving more weight to the misclassified points so that the next iteration classifies them correctly.
  5. After all the iterations, we get the final prediction result based on majority voting/the weighted average.
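A minimal sketch of these steps, assuming scikit-learn's AdaBoostClassifier with depth-1 trees ("stumps") as weak learners and a synthetic, illustrative data set:

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Steps 1-5 happen inside fit(): weights start at 1/n, each round fits a
# stump to the reweighted data, weights the stump by its accuracy, and
# up-weights the misclassified points for the next round.
booster = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=1),  # weak (less complex) learner
    n_estimators=100,
    random_state=0,
)
booster.fit(X, y)
print(booster.score(X, y))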

AdaBoost works efficiently with weak (less complex) learners and with high-bias classifiers. Its major advantages are that it is fast, it has no tuning parameters (similar to bagging), and it makes no assumptions about the weak learners. The technique fails to provide accurate results when:

  • There are many outliers in our data.
  • The data set is insufficient.
  • The weak learners are highly complex.

Boosted models are susceptible to noise as well. The decision trees produced as a result of boosting have limited depth and high accuracy.

Conclusion

Ensemble learning techniques are widely used to improve model accuracy; which technique to use has to be decided based on the data set. These techniques are not preferred in cases where interpretability is important, as we give up interpretability in exchange for the performance improvement. They have tremendous significance in the health care industry, where even a small improvement in performance is very valuable.

Recommended Articles

This is a guide to Ensemble Techniques. Here we discuss the basic concept and the two major types of ensemble techniques with a detailed explanation. You can also go through our other related articles to learn more –

  1. Machine Learning Techniques
  2. Team Building Techniques
  3. Data Science Algorithms
  4. Most Used Techniques of Ensemble Learning