Introduction to AdaBoost Algorithm
The AdaBoost algorithm can be used to boost the performance of other machine learning algorithms. Machine learning has become a powerful tool for making predictions from large amounts of data, and it is now so widespread that its applications appear in our day-to-day activities. A common example is an online store suggesting products based on the items a customer bought in the past. Machine learning, often referred to as predictive analytics or predictive modeling, can be defined as the ability of computers to learn without being explicitly programmed. Instead, it uses algorithms that analyze input data to predict an output within an acceptable range.
What is AdaBoost Algorithm?
In machine learning, boosting originated from the question of whether a set of weak classifiers could be converted into a strong classifier. A weak learner or classifier is one that performs only slightly better than random guessing. Combining a large set of weak classifiers, each better than random, tends to be robust to over-fitting. A simple threshold on a single feature is generally used as the weak classifier: if the feature is above the threshold, the point is predicted to belong to the positive class; otherwise, it is predicted to belong to the negative class.
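A threshold classifier of this kind can be sketched in a few lines of Python. The function name `stump_predict` and the sample values are hypothetical, chosen only for illustration.

```python
def stump_predict(x, feature, threshold):
    """Predict +1 if the chosen feature is above the threshold, else -1."""
    return 1 if x[feature] > threshold else -1


# Hypothetical sample with two features.
sample = [2.5, 0.7]
print(stump_predict(sample, feature=0, threshold=1.0))  # prints 1
print(stump_predict(sample, feature=1, threshold=1.0))  # prints -1
```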
AdaBoost stands for ‘Adaptive Boosting’. It transforms weak learners or predictors into strong predictors in order to solve classification problems.
For classification, the final classifier can be written as:
$$F(x) = \operatorname{sign}\left(\sum_{m=1}^{M} \theta_m f_m(x)\right)$$
Here $f_m$ designates the m-th weak classifier, and $\theta_m$ represents its corresponding weight.
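As a rough sketch of this combination rule, the weak classifiers can be treated as plain functions returning +1 or -1, and the strong classifier as the sign of their weighted sum. The three stumps and their weights below are made-up values, and treating a zero score as positive is an assumption of this sketch.

```python
def strong_classify(x, weak_classifiers, weights):
    """Return the sign (+1 or -1) of the weighted sum of weak-classifier votes."""
    score = sum(w * f(x) for f, w in zip(weak_classifiers, weights))
    return 1 if score >= 0 else -1  # assumption: a tie counts as positive


# Hypothetical weak classifiers on a scalar input, with made-up weights.
classifiers = [
    lambda x: 1 if x > 2 else -1,
    lambda x: 1 if x > 5 else -1,
    lambda x: 1 if x > 8 else -1,
]
weights = [0.9, 0.4, 0.3]

print(strong_classify(6, classifiers, weights))  # votes +1, +1, -1 -> prints 1
print(strong_classify(1, classifiers, weights))  # all votes -1 -> prints -1
```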
How Does the AdaBoost Algorithm Work?
AdaBoost can be used to improve the performance of machine learning algorithms. It works best with weak learners, that is, models whose accuracy on a classification problem is only just above random chance. The most common weak learners used with AdaBoost are decision trees with a single level, also called decision stumps. A weak learner is a classifier or predictor that performs relatively poorly in terms of accuracy. Weak learners are also typically simple to compute, and many instances of them are combined through boosting to create a strong classifier.
Consider a data set containing n points:
$$x_i \in \mathbb{R}^d, \quad y_i \in \{-1, 1\}, \quad i = 1, \dots, n$$
Here -1 represents the negative class and 1 represents the positive class. The weight for each data point is initialized as:
$$w(x_i, y_i) = \frac{1}{n}, \quad i = 1, \dots, n$$
For each iteration m from 1 to M, the following steps are performed:
First, we have to select the weak classifier with the lowest weighted classification error by fitting the weak classifiers to the data set.
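The selection step can be sketched as follows, assuming uniform initial weights of 1/n and threshold classifiers as the candidate weak learners. The toy data and candidate thresholds are hypothetical.

```python
def weighted_error(predictions, labels, weights):
    """Sum the weights of the points the classifier gets wrong."""
    return sum(w for p, y, w in zip(predictions, labels, weights) if p != y)


# Hypothetical 1-D toy data with labels in {-1, +1}.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [-1, -1, 1, 1, -1]
n = len(xs)
weights = [1.0 / n] * n  # every point starts with the same weight

# Fit each candidate threshold and record its weighted error.
candidates = [1.5, 2.5, 3.5, 4.5]
errors = {}
for t in candidates:
    preds = [1 if x > t else -1 for x in xs]
    errors[t] = weighted_error(preds, ys, weights)

best_threshold = min(errors, key=errors.get)
print(best_threshold)  # prints 2.5 (only the last point is misclassified)
```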
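Then calculate the weight for the m-th weak classifier:
$$\theta_m = \frac{1}{2} \ln\left(\frac{1 - \epsilon_m}{\epsilon_m}\right)$$
where $\epsilon_m$ is the weighted classification error of that classifier.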
The weight is positive for any classifier with an accuracy higher than 50%. The weight becomes larger as the classifier becomes more accurate, and it becomes negative if the classifier has an accuracy of less than 50%. In that case, the classifier's predictions are combined with their sign inverted: inverting the sign of the predictions turns a classifier with 40% accuracy into one with 60% accuracy. So the classifier still contributes to the final prediction, even though it performs worse than random guessing. A classifier with exactly 50% accuracy, however, gets a weight of zero and contributes nothing to the final prediction.
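The sign behaviour described above can be checked numerically. This sketch assumes the usual AdaBoost formula for the classifier weight, one half of ln((1 - error) / error), where error is the weighted classification error; the function name `classifier_weight` is hypothetical.

```python
import math


def classifier_weight(error):
    """Classifier weight 0.5 * ln((1 - error) / error), for 0 < error < 1."""
    return 0.5 * math.log((1.0 - error) / error)


# Weighted error 0.1 means 90% accuracy, 0.6 means 40% accuracy, and so on.
for error in (0.1, 0.4, 0.5, 0.6):
    print(error, round(classifier_weight(error), 3))
```

The weights for errors of 0.4 and 0.6 have the same magnitude and opposite signs, which is the sign inversion described in the text, and an error of exactly 0.5 gives a weight of zero.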
For a case misclassified by a positively weighted classifier, the exponential term in the numerator of the weight update is always greater than 1, so misclassified cases receive larger weights after each iteration. Negatively weighted classifiers behave the same way, except that once the sign is inverted, the originally correct classifications become misclassifications. The final prediction is calculated by summing the weighted predictions of all the classifiers and taking the sign of the result.
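Finally, update the weight for each data point:
$$w_{m+1}(x_i, y_i) = \frac{w_m(x_i, y_i) \exp\left[-\theta_m y_i f_m(x_i)\right]}{Z_m}$$
Here $\theta_m$ is the weight of the m-th weak classifier and $Z_m$ is the normalization factor, which ensures that all instance weights sum to 1.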
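A minimal sketch of this update, assuming the standard rule in which each weight is multiplied by exp(-theta * y * f(x)) and then divided by the normalization factor. The five-point example is hypothetical.

```python
import math


def update_weights(weights, labels, predictions, theta):
    """Scale each weight by exp(-theta * y * f(x)), then normalize to sum to 1."""
    unnormalized = [
        w * math.exp(-theta * y * p)
        for w, y, p in zip(weights, labels, predictions)
    ]
    z = sum(unnormalized)  # normalization factor
    return [w / z for w in unnormalized]


# Hypothetical round: five points with equal weight, the last one misclassified.
weights = [0.2] * 5
labels = [-1, -1, 1, 1, -1]
predictions = [-1, -1, 1, 1, 1]
theta = 0.5 * math.log(0.8 / 0.2)  # classifier weight for a weighted error of 0.2

new_weights = update_weights(weights, labels, predictions, theta)
print([round(w, 3) for w in new_weights])
```

In this example the misclassified point's weight rises from 0.2 to 0.5, while each correctly classified point drops to 0.125, so the next weak classifier is pushed toward the point that was missed.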
What is AdaBoost Algorithm Used for?
AdaBoost can be used for face detection, where it is often regarded as the standard algorithm for detecting faces in images. The detector uses a rejection cascade consisting of many layers of classifiers. If any layer fails to recognize the detection window as a face, the window is rejected. The first classifiers in the cascade discard most of the negative windows, which keeps the computational cost to a minimum. Besides combining the weak classifiers, the principles of AdaBoost are also used to find the best features to use in each layer of the cascade.
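The early-exit behaviour of a rejection cascade can be illustrated with a short sketch. The layers here are stand-in functions over made-up window features (`contrast`, `symmetry`), not the features a real face detector would use.

```python
def cascade_accepts(window, layers):
    """Return True only if every layer accepts the window.

    Evaluation stops at the first layer that rejects, so later (typically
    more expensive) layers never run on windows that were already discarded.
    """
    for layer in layers:
        if not layer(window):
            return False
    return True


# Hypothetical layers, ordered from cheapest to most selective.
layers = [
    lambda w: w["contrast"] > 0.3,
    lambda w: w["symmetry"] > 0.5,
]

print(cascade_accepts({"contrast": 0.6, "symmetry": 0.8}, layers))  # prints True
print(cascade_accepts({"contrast": 0.1, "symmetry": 0.9}, layers))  # prints False
```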
Pros and Cons
One of the many advantages of the AdaBoost algorithm is that it is fast, simple, and easy to program. It is also flexible enough to be combined with any machine learning algorithm, and there are no parameters to tune except the number of boosting rounds T (written as M above). It has been extended to learning problems beyond binary classification, and it is versatile, as it can be used with text or numeric data.
AdaBoost also has a few disadvantages. Empirical evidence shows that it is particularly vulnerable to uniform noise. Weak classifiers that are too weak can lead to low margins and overfitting.
As an example, consider the admission of students to a university, where each student is either admitted or denied. Both quantitative and qualitative data can be gathered from different aspects of an application. For example, the admission result, which is yes or no, can be encoded numerically, whereas other attributes such as a student's skills or hobbies are qualitative. It is easy to come up with rules that classify the training data better than chance, such as: if the student is good at a particular subject, then she or he is admitted. It is difficult, however, to find a single highly accurate prediction rule, and this is where weak learners come into the picture.
AdaBoost helps in choosing the training set for each new classifier based on the results of the previous classifier. When combining the results, it also determines how much weight should be given to each classifier’s proposed answer. It combines the weak learners into a strong one in order to correct classification errors, and it was the first successful boosting algorithm for binary classification problems.
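Putting these pieces together, the whole procedure can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a reference implementation: it works on 1-D inputs, uses threshold stumps with either polarity as weak learners, and clamps the error to avoid dividing by zero. The function names and the toy data set are hypothetical.

```python
import math


def train_adaboost(xs, ys, thresholds, rounds):
    """Train on 1-D data; return a list of (threshold, polarity, weight) stumps."""
    n = len(xs)
    weights = [1.0 / n] * n  # uniform initial weights
    model = []
    for _ in range(rounds):
        # Pick the stump with the lowest weighted error under current weights.
        best = None
        for t in thresholds:
            for polarity in (1, -1):
                preds = [polarity if x > t else -polarity for x in xs]
                err = sum(w for w, p, y in zip(weights, preds, ys) if p != y)
                if best is None or err < best[0]:
                    best = (err, t, polarity, preds)
        err, t, polarity, preds = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # assumption: clamp to avoid log(0)
        theta = 0.5 * math.log((1 - err) / err)
        model.append((t, polarity, theta))
        # Re-weight the points so misclassified ones count more next round.
        weights = [w * math.exp(-theta * y * p)
                   for w, y, p in zip(weights, ys, preds)]
        z = sum(weights)
        weights = [w / z for w in weights]
    return model


def predict(model, x):
    """Sign of the weighted vote of all stumps (a tie counts as +1)."""
    score = sum(theta * (pol if x > t else -pol) for t, pol, theta in model)
    return 1 if score >= 0 else -1


# Hypothetical toy data that no single threshold separates.
xs = list(range(10))
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
thresholds = [x + 0.5 for x in range(9)]

model = train_adaboost(xs, ys, thresholds, rounds=3)
print([predict(model, x) for x in xs] == ys)  # prints True
```

On this toy set the best single stump misclassifies three of the ten points, while the weighted vote of three stumps classifies all of the training points correctly.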