Introduction to Bagging and Boosting
Bagging and Boosting are the two most popular ensemble methods, so before looking at them, let's first get an idea of what ensemble learning is. It is a machine learning technique in which multiple models are trained on the same dataset to obtain a prediction. After getting a prediction from each model, we combine them with model averaging techniques like weighted averaging or max voting to get the final prediction. The aim of this method is to obtain better predictions than any individual model: it improves accuracy, helps avoid overfitting, and reduces bias and variance. Two popular ensemble methods are:
- Bagging (Bootstrap Aggregating)
Bagging, also known as Bootstrap Aggregating, is used to improve accuracy and make the model generalize better by reducing variance, i.e., by avoiding overfitting. In bagging, we take multiple subsets of the training dataset, and on each subset we train a model using the same learning algorithm (a decision tree, logistic regression, etc.) to predict the output for the same set of test data. Once every model has made its prediction, we use a model averaging technique to get the final prediction output. One of the most famous techniques used in bagging is Random Forest, which combines multiple decision trees.
- Boosting
Boosting is primarily used to reduce bias and variance in supervised learning. It refers to a family of algorithms that convert weak learners (base learners) into a strong learner. A weak learner is a classifier that agrees with the actual classification only slightly better than chance, while a strong learner is a classifier that is well correlated with the actual classification. A few famous boosting techniques are AdaBoost, Gradient Boosting, and XGBoost (Extreme Gradient Boosting). So now we know what bagging and boosting are and what their roles in machine learning are.
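The max-voting idea mentioned above can be sketched in a few lines. This is a minimal illustration with made-up model outputs, not part of any particular library:

```python
from collections import Counter

def max_vote(predictions):
    """Final prediction = the class that most models voted for."""
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical outputs of five models for one test record
print(max_vote([1, 0, 1, 1, 0]))  # prints 1: three of five models said 1
```

Weighted averaging works the same way in spirit, except each model's vote is scaled by a weight before the results are combined.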
Working of Bagging and Boosting
Now let’s understand how bagging and boosting work:
To understand the working of bagging, assume we have N models and a dataset D, where m is the number of records and n is the number of features in each record, and suppose we are doing binary classification. First, we split the dataset; for now, we split it into a training set and a test set only. Let’s call the training set D_train, where m_train is the total number of training examples.
Take a sample of records from the training set and use it to train the first model, say m1. For the next model, m2, resample the training set and take another sample. We do the same thing for all N models. Since we resample the training set without removing anything from it, two or more samples may have training records in common. This technique of resampling the training dataset and providing each sample to a model is termed Row Sampling with Replacement. Suppose we have trained every model and now want to see the predictions on the test data. Since we are doing binary classification, the output can be either 0 or 1. The test dataset is passed to each model, and each model returns a prediction. Say more than N/2 of the N models predict 1; then, using a model averaging technique such as the maximum vote, the final predicted output for that test record is 1.
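The whole procedure above can be sketched from scratch. This is a toy illustration under made-up assumptions: the data is a hypothetical 1-D set labelled 1 when x > 5, and the "model" is a simple threshold rule standing in for a decision tree. Each model trains on its own bootstrap sample (row sampling with replacement), and the final answer is a max vote:

```python
import random
from collections import Counter

random.seed(0)

# Hypothetical toy data: 1-D feature x with binary label 1 when x > 5
data = [(x, int(x > 5)) for x in range(11)]

def train_stump(sample):
    """Weak model: choose the threshold t minimising errors of 'predict x > t'."""
    return min(range(11), key=lambda t: sum((x > t) != bool(y) for x, y in sample))

def predict(stumps, x):
    """Max vote across the N trained models."""
    votes = [int(x > t) for t in stumps]
    return Counter(votes).most_common(1)[0][0]

# Row sampling with replacement: each model trains on its own bootstrap sample
N = 5
stumps = [train_stump(random.choices(data, k=len(data))) for _ in range(N)]

print(predict(stumps, 8))  # prints 1: the models agree this point is above the cut
```

Note that `random.choices` draws with replacement, so a bootstrap sample can contain the same record several times while leaving other records out entirely, exactly as described above.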
In boosting, we take records from the dataset and pass them to base learners sequentially; here, a base learner can be any model. Suppose we have m records in the dataset. We pass a few records to the first base learner, BL1, and train it. Once BL1 is trained, we pass all the records through it and see how it performs. We then take only the records it classified incorrectly and use them to train the next base learner, say BL2, and in turn pass the records BL2 misclassifies on to train BL3. This continues until we reach the number of base learner models we specified. Finally, we combine the outputs of these base learners to create a strong learner; thus, the model’s prediction power gets improved. So now we know how bagging and boosting work.
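The sequential routing of misclassified records can be sketched as follows. This is a deliberately simplified toy, with hypothetical data and a weak threshold model; real boosters such as AdaBoost additionally reweight records and combine the learners by a weighted vote rather than just chaining them:

```python
# Hypothetical records: (feature x, binary label); no single threshold fits them
data = [(0, 0), (1, 0), (2, 1), (3, 1), (4, 0), (5, 0)]

def train_stump(records):
    """Weak base learner: threshold t minimising errors of 'predict x > t'."""
    return min(range(6), key=lambda t: sum((x > t) != bool(y) for x, y in records))

def misclassified(t, records):
    """Records the learner with threshold t gets wrong."""
    return [(x, y) for x, y in records if (x > t) != bool(y)]

learners, remaining = [], data
for _ in range(5):            # cap on the number of base learners
    if not remaining:         # stop early once nothing is misclassified
        break
    t = train_stump(remaining)
    learners.append(t)
    remaining = misclassified(t, remaining)  # errors go to the next learner
```

Here the first stump fits most of the data, and the second is trained only on the records the first one got wrong, mirroring the BL1 → BL2 → BL3 hand-off described above.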
Advantages and Disadvantages of Bagging
Below are the top advantages and disadvantages.
Advantages of Bagging
- The biggest advantage of bagging is that multiple weak learners can work better than a single strong learner.
- It provides stability and increases the accuracy of machine learning algorithms used in statistical classification and regression.
- It helps in reducing variance, i.e. it avoids overfitting.
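The variance-reduction claim can be checked with a small simulation. This is an idealised sketch under a made-up assumption that each model's prediction is the true value plus independent noise; averaging several such models shrinks the spread:

```python
import random
import statistics

random.seed(1)
true_value = 1.0

def noisy_model():
    """Hypothetical high-variance model: the true value plus Gaussian noise."""
    return true_value + random.gauss(0, 1.0)

# Spread of a single model vs. an average ("bag") of 10 independent models
single = [noisy_model() for _ in range(1000)]
bagged = [statistics.mean(noisy_model() for _ in range(10)) for _ in range(1000)]

print(statistics.stdev(single) > statistics.stdev(bagged))  # prints True
```

In practice bagged models are not fully independent (they share training data), so the reduction is smaller than this ideal case, but the direction of the effect is the same.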
Disadvantages of Bagging
- It may result in high bias if it is not modelled properly and thus may result in underfitting.
- Since we must use multiple models, it becomes computationally expensive and may not be suitable in various use cases.
Advantages and Disadvantages of Boosting
Below are the top advantages and disadvantages.
Advantages of Boosting
- It is one of the most successful techniques in solving the two-class classification problems.
- It is good at handling missing data.
Disadvantages of Boosting
- Boosting is hard to implement in real time due to the increased complexity of the algorithm.
- The high flexibility of these techniques results in a large number of hyperparameters that directly affect the behaviour of the model.
The main takeaway is that bagging and boosting belong to a machine learning paradigm in which we use multiple models to solve the same problem and get better performance, and that if we combine weak learners properly, we can obtain a stable, accurate and robust model. In this article, I have given a basic overview of bagging and boosting. In the upcoming articles, you will get to know the different techniques used in both. Finally, I will conclude by reminding you that bagging and boosting are among the most widely used ensemble learning techniques. The real art of improving performance lies in your understanding of when to use which model and how to tune its hyperparameters.