Introduction to Machine Learning Models
This article gives an overview of the machine learning models used in practice. By definition, a machine learning model is a mathematical representation obtained by applying a specific machine learning method to data. With the extensive range of APIs available today, building a machine learning model is fairly straightforward and takes only a few lines of code. But the real skill of an applied data science professional lies in choosing the correct model based on the problem statement and cross-validation, instead of throwing data at fancy algorithms at random. In this article, we will discuss various machine learning models and how to use them effectively based on the type of problems they address.
Types of Machine Learning Models
Based on the type of task, we can classify machine learning models into the following types:
- Classification Models
- Regression Models
- Clustering
- Dimensionality Reduction
- Deep Learning etc.
1) Classification
In machine learning, classification is the task of predicting the type or class of an object from a finite number of options. The output variable for classification is always a categorical variable. For example, predicting whether an email is spam or not is a standard binary classification task. Now let’s note down some important models for classification problems.
- K-Nearest Neighbors algorithm – simple but computationally expensive at prediction time.
- Naive Bayes – Based on Bayes’ theorem.
- Logistic Regression – Linear model for binary classification.
- SVM – can be used for binary/multiclass classification.
- Decision Tree – ‘If Else’ based classifier, more robust to outliers.
- Ensembles – Combination of multiple machine learning models put together to get better results.
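To make the list above concrete, here is a minimal sketch that fits each of these classifiers with Scikit Learn. The synthetic dataset from `make_classification`, the hyperparameter values, and the choice of a random forest as the ensemble are all assumptions made for illustration, not recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data (an assumption; substitute real data).
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# One estimator per model family from the list; settings are illustrative.
classifiers = {
    "knn": KNeighborsClassifier(n_neighbors=5),
    "naive_bayes": GaussianNB(),
    "logistic_regression": LogisticRegression(max_iter=1000),
    "svm": SVC(kernel="rbf"),
    "decision_tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Fit every model on the training split and score it on the held-out split.
test_accuracy = {}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    test_accuracy[name] = clf.score(X_test, y_test)

for name, acc in sorted(test_accuracy.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:20s} {acc:.3f}")
```

All of these estimators share the same `fit` / `predict` / `score` interface, which is why swapping one model for another takes a single line.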
2) Regression
In machine learning, regression is a set of problems where the output variable can take continuous values. For example, predicting the price of an airline ticket can be considered a standard regression task. Let’s note down some important regression models used in practice.
- Linear Regression – Simplest baseline model for a regression task, works well only when the relationship between the predictors and the target is approximately linear and little or no multicollinearity is present.
- Lasso Regression – Linear regression with L1 regularization.
- Ridge Regression – Linear regression with L2 regularization.
- SVM regression
- Decision Tree Regression etc.
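The practical difference between the L1 and L2 penalties can be seen in a small experiment. The sketch below assumes a synthetic dataset from `make_regression` in which only 5 of 20 features carry signal, and the `alpha` values are arbitrary. An L1 penalty tends to set some coefficients exactly to zero, while an L2 penalty only shrinks them.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, LinearRegression, Ridge

# Synthetic data: 20 features, only 5 informative (an assumption).
X, y = make_regression(
    n_samples=200, n_features=20, n_informative=5, noise=10.0, random_state=0
)

models = {
    "linear": LinearRegression(),
    "lasso": Lasso(alpha=5.0),   # L1 penalty, alpha chosen arbitrarily
    "ridge": Ridge(alpha=1.0),   # L2 penalty, alpha chosen arbitrarily
}

# Count how many coefficients each model sets exactly to zero.
zero_coefs = {}
for name, model in models.items():
    model.fit(X, y)
    zero_coefs[name] = int(np.sum(model.coef_ == 0.0))
    print(f"{name:8s} R^2={model.score(X, y):.3f} zero coefficients={zero_coefs[name]}")
```

This sparsity is why Lasso is sometimes used as a rough feature selector, whereas Ridge is a common choice when the predictors are correlated.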
3) Clustering
In simple words, clustering is the task of grouping similar objects together. Clustering models identify similar objects automatically, without manual intervention. We cannot build effective supervised machine learning models (models that need to be trained with manually curated or labeled data) without homogeneous data, and clustering helps us achieve this in a smarter way. Following are some of the widely used clustering models:
- K means – Simple, but the result can vary from run to run with the initial choice of centroids.
- K means++ – Modified version of K means with a smarter choice of initial centroids.
- K medoids.
- Agglomerative clustering – A hierarchical clustering model.
- DBSCAN – Density-based clustering algorithm etc.
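As an illustration of the models above, the following sketch runs three of them on the same data with Scikit Learn. The synthetic blobs and the parameter values (`n_clusters=3`, `eps=0.5`, `min_samples=5`) are assumptions chosen for this toy dataset and would need tuning on real data.

```python
from sklearn.cluster import DBSCAN, AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Three synthetic groups of points (an assumption; no labels are used).
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=0)

# K means with k-means++ initialization, restarted 10 times.
kmeans = KMeans(n_clusters=3, init="k-means++", n_init=10, random_state=0)
kmeans_labels = kmeans.fit_predict(X)

# Hierarchical (agglomerative) clustering cut at 3 clusters.
agglo_labels = AgglomerativeClustering(n_clusters=3).fit_predict(X)

# DBSCAN finds the number of clusters itself; label -1 marks noise points.
dbscan_labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)
n_dbscan_clusters = len(set(dbscan_labels)) - (1 if -1 in dbscan_labels else 0)

kmeans_silhouette = silhouette_score(X, kmeans_labels)
print(f"K means silhouette: {kmeans_silhouette:.3f}")
print(f"DBSCAN clusters found: {n_dbscan_clusters}")
```

Note that K means and agglomerative clustering need the number of clusters up front, while DBSCAN needs a density threshold instead.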
4) Dimensionality Reduction
Dimensionality is the number of predictor variables used to predict the dependent variable or target. Often in real-world datasets the number of variables is too high. Too many variables also increase the risk of overfitting. In practice, not all of these variables contribute equally towards the goal, and in a large number of cases we can preserve most of the variance with fewer variables. Let’s list out some commonly used models for dimensionality reduction.
- PCA – It creates a smaller number of new variables out of a large number of predictors. The new variables are uncorrelated with each other but less interpretable.
- t-SNE – Provides a lower-dimensional embedding of higher-dimensional data points.
- SVD – Singular value decomposition is used to decompose a matrix into smaller parts for efficient calculation.
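Here is a small sketch of PCA on data where a few hidden factors drive many observed variables. The data is generated on purpose so that 10 features depend on only 2 underlying factors, which is an assumption that real datasets satisfy only approximately.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# 10 observed features built from 2 hidden factors plus a little noise.
latent = rng.normal(size=(500, 2))
mixing = rng.normal(size=(2, 10))
X = latent @ mixing + 0.05 * rng.normal(size=(500, 10))

# PCA is sensitive to scale, so standardize the features first.
X_scaled = StandardScaler().fit_transform(X)

pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X_scaled)

# Share of the total variance kept by the 2 new variables.
explained = pca.explained_variance_ratio_.sum()
# Correlation between the 2 new variables (expected to be close to 0).
component_corr = np.corrcoef(X_reduced, rowvar=False)[0, 1]

print(f"shape after PCA: {X_reduced.shape}")
print(f"variance explained: {explained:.3f}")
print(f"correlation between components: {component_corr:.2e}")
```

On real data, `explained_variance_ratio_` is a common guide for deciding how many components to keep.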
5) Deep Learning
- Multi-Layer perceptron
- Convolutional Neural Networks
- Recurrent Neural Networks
- Boltzmann machine
- Autoencoders etc.
Which Model is the Best?
We have now looked at many machine learning models, and an obvious question comes to mind: ‘Which is the best model among them?’ It depends on the problem at hand and on other associated attributes like outliers, the volume of available data, quality of data, feature engineering, etc. In practice, it is always preferable to start with the simplest model applicable to the problem and increase the complexity gradually with proper parameter tuning and cross-validation. There is a proverb in the world of data science: ‘Cross-validation is more trustworthy than domain knowledge’.
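The workflow described above could look like the following sketch: score a simple baseline with cross-validation, then tune a more flexible model and compare. The synthetic data, the decision tree as the "more complex" model, and the `max_depth` grid are assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic data (an assumption; substitute the real problem's data).
X, y = make_classification(
    n_samples=400, n_features=15, n_informative=5, random_state=42
)

# Step 1: simplest applicable model, scored with 5-fold cross-validation.
baseline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
baseline_score = cross_val_score(baseline, X, y, cv=5, scoring="accuracy").mean()

# Step 2: a more flexible model whose complexity (tree depth) is tuned
# by cross-validation rather than picked by hand.
search = GridSearchCV(
    DecisionTreeClassifier(random_state=42),
    param_grid={"max_depth": [2, 4, 8, None]},
    cv=5,
    scoring="accuracy",
)
search.fit(X, y)

print(f"baseline CV accuracy:   {baseline_score:.3f}")
print(f"tuned tree CV accuracy: {search.best_score_:.3f}")
print(f"best parameters:        {search.best_params_}")
```

If the tuned model does not clearly beat the baseline, the simpler model is usually the safer choice.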
How to Build a Model?
Let’s see how to build a simple logistic regression model using the Scikit Learn library of Python. For simplicity, we are assuming the problem is a standard classification problem and that ‘train.csv’ and ‘test.csv’ are the train and test data respectively.
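A minimal sketch of such a model is given below. The layout of the CSV files is not specified here, so the code makes several assumptions: both files contain only numeric feature columns plus a label column, the label column is called `target` (a hypothetical name), and the test file also carries labels so that accuracy can be measured. The helper `fit_and_evaluate` is introduced only for this illustration.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score


def fit_and_evaluate(train_path="train.csv", test_path="test.csv", target="target"):
    """Fit logistic regression on the train file and score it on the test file.

    Assumes numeric features and a label column named `target` in both files.
    """
    train = pd.read_csv(train_path)
    test = pd.read_csv(test_path)

    # Separate the features from the label column.
    X_train, y_train = train.drop(columns=[target]), train[target]
    X_test, y_test = test.drop(columns=[target]), test[target]

    # max_iter is raised from the default to help the solver converge.
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)

    predictions = model.predict(X_test)
    return model, accuracy_score(y_test, predictions)


# Example call, assuming both files exist in the working directory:
# model, accuracy = fit_and_evaluate("train.csv", "test.csv", target="target")
# print(f"test accuracy: {accuracy:.3f}")
```

On real data, categorical columns and missing values would need to be handled before fitting.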
In this article, we discussed the important machine learning models used for practical purposes and how to build a simple machine learning model in Python. Choosing a proper model for a particular use case is very important for obtaining a good result from a machine learning task. To compare the performance of various models, evaluation metrics or KPIs are defined for the particular business problem, and the best model is chosen for production after statistical performance checks.