EDUCBA

Machine Learning Feature Selection

By Priya Pedamkar

What is Machine Learning Feature Selection?

Feature selection is the process of identifying, within the existing feature set, the variables that most influence the target variable. It can be carried out with various algorithms or methods such as Decision Trees, Linear Regression, and Random Forest. These algorithms help us identify the most important attributes by calculating a weight for each one.

“Feature selection is selecting the most useful features to train the model among existing features”

What is Machine Learning?

Machine Learning is an emerging technology that serves as a starting point for building automated, intelligent systems. It relies on learning the patterns and trends that occurred in data over a period of time. ML has a steep learning curve, and implementations keep improving as different programming languages bring new perspectives. Classification, neural networks, clustering, and predictive modeling are the core areas of Machine Learning. Its growth has also extended to cloud services and IoT applications.

To solve a problem with machine learning, a few steps have to be carried out in sequence:

  • Collecting data points: importing the dataset into the modeling environment.
  • Feature Engineering: converting raw data into a structured format, i.e. extracting new variables from the raw data so that it is ready for model training.
  • Feature Selection: picking the most predictive features from the large number of features in the dataset.
  • Model Selection: picking the model that carries the highest weight for the prediction task.
  • Model Prediction: deriving results from the trained model.

Let’s look at feature selection, an important stage of machine learning for model prediction.

Why is Feature Selection Important in ML?

Feature selection reduces manual intervention in data analysis. It makes the features easier to interpret and ready to use. The technique helps us select the variables that correlate most strongly with the target variable. This reduces the dimensionality of the dataset and can improve the accuracy of a model built on the selected features, so model performance tends to increase.

The three main steps in carrying out feature selection are:

  • Split the data into train and validation sets, then measure the performance of each variable through cross-validation and drop the columns that do not help.
  • Remove unnecessary features, i.e. variables with low correlation to the target, which carry a low weight.
  • Build a model on the selected features using methods such as statistical approaches, cross-validation, and grid search.
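
As a rough illustration of the second step, the sketch below drops columns whose correlation with the target is weak. The article works in R; this is a plain-Python stand-in, and the function names, the 0.3 threshold, and the toy values are all assumptions made for the example, not taken from the cereal dataset.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)

def drop_low_correlation(columns, target, threshold=0.3):
    """Keep only columns whose |correlation| with the target meets the threshold."""
    return {
        name: values
        for name, values in columns.items()
        if abs(pearson(values, target)) >= threshold
    }

# Made-up toy values (assumption), not the article's cereal data.
columns = {
    "fiber": [1.0, 2.0, 3.0, 4.0, 5.0],
    "shelf": [3.0, 1.0, 3.0, 1.0, 3.0],
}
target = [0, 0, 1, 1, 1]  # e.g. 1 = healthy, 0 = not healthy
selected = drop_low_correlation(columns, target)
print(sorted(selected))
```

With these toy values only the "fiber" column survives the filter. A suitable threshold depends on the data and is usually checked against validation performance.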

How Can You Select Data Points Through Feature Selection?

Let’s take a case study: predicting the Health Type of a cereal from its nutritional composition. The dataset contains a set of cereals with various nutrients such as Fiber, Vitamins, Carbohydrates, and Potassium. Let us dig in to find out which nutrients carry the most importance as features, and see how feature selection plays an important role in model prediction. Here, we walk through the feature selection process in the R language.

Step 1: Data import to the R Environment.

[Figure: R Environment]

View of Cereal Dataset

[Figure: Cereal Dataset]

Step 2: Converting the raw data points into a structured format, i.e. Feature Engineering

[Figure: raw data points]

Step 3: Feature Selection: picking highly correlated variables for the predictive model

Step 3A: Split the data into train and validation sets

[Figure: train & validation set]
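
The split in Step 3A can be sketched as follows. The article's R code was a screenshot and is not reproduced here, so this is a Python approximation; the 70/30 ratio, the seed, and the function name are assumptions.

```python
import random

def train_validation_split(rows, validation_fraction=0.3, seed=42):
    """Shuffle a copy of the rows and split it into (train, validation)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_validation = int(round(len(shuffled) * validation_fraction))
    return shuffled[n_validation:], shuffled[:n_validation]

rows = list(range(10))  # stand-ins for dataset rows
train, validation = train_validation_split(rows)
print(len(train), len(validation))
```

Fixing the seed keeps the split reproducible, which matters when importance rankings from different runs are compared.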

Step 3B: The train set is used to find the feature importance and the error rate using the RandomForest algorithm. Note that Cereal Name and Health Type are excluded from the randomForest formula, because the algorithm does not support categorical variables with many distinct values.

[Figure: Cereal Name and Health Type]

The error rate is shown as a plot. It is plotted based on the correlation values; the red line indicates that there are many weakly correlated values.

[Figure: correlated values]

Step 3C: Rank the features by importance. Here the features are ranked according to their importance in the training set and plotted using the ggplot library. The importance is measured using MeanDecreaseGini.

MeanDecreaseGini measures how much, on average, the splits made on a feature reduce Gini impurity (i.e. increase node purity) across the trees of the forest. A higher value suggests the feature is more useful.
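
To make the idea behind this measure concrete, here is a small Python sketch of the Gini decrease for a single split. It is not the randomForest implementation, which accumulates such decreases over every split on a feature and averages over trees; the function names and labels are assumptions for illustration.

```python
from collections import Counter

def gini_impurity(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def gini_decrease(parent, left, right):
    """Impurity of the parent minus the size-weighted impurity of its children."""
    n = len(parent)
    weighted = (len(left) / n) * gini_impurity(left) \
        + (len(right) / n) * gini_impurity(right)
    return gini_impurity(parent) - weighted

parent = ["healthy", "healthy", "unhealthy", "unhealthy"]
# A split that separates the classes perfectly.
perfect = gini_decrease(parent, ["healthy", "healthy"], ["unhealthy", "unhealthy"])
# A split that leaves both children as mixed as the parent.
useless = gini_decrease(parent, ["healthy", "unhealthy"], ["healthy", "unhealthy"])
print(perfect, useless)
```

The perfect split yields the largest possible decrease for this parent node, while the uninformative split yields none, which is why features with a high accumulated decrease rank higher.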

[Figure: MeanDecreaseGini]

[Figure: graph]

In this graph, we can see that Carbohydrates, Protein, Fiber, Type of Manufacturing, and Calories are the most important features. These features contribute greatly to the model prediction and determine the Health Type of a cereal.

Step 3D: The variables are ranked by importance in descending order and the top 10 are kept. These 10 variables are treated as the selected features and used for model prediction.
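
Ranking and truncating the importance scores might look like the following Python sketch. The scores and feature names below are made up for illustration and are not the values from the article's plot.

```python
def top_features(importance, k=10):
    """Return the k feature names with the highest importance, best first."""
    ranked = sorted(importance.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:k]]

# Made-up importance scores (assumption), not the article's results.
importance = {"carbohydrates": 4.1, "protein": 3.6, "fiber": 3.2, "shelf": 0.4}
print(top_features(importance, k=3))
```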

Note: The features can also be selected according to how the model will be used and the number of features processed when computing importance. It depends on the categories and features used to solve the problem.

Step 3E: There are two ways to proceed with feature selection. One can stop here, take the most important features derived from RandomForest, and build the formula for model prediction from them.

Step 3F: Another way to narrow down the features is the StepAIC method. This method works by backward elimination and further removes weakly correlated features using logistic regression. As a result, only very highly correlated features remain in the model, which can give more accurate results.
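
The elimination loop can be sketched in Python as below. This is not R's StepAIC: the `aic_of` callback, the `toy_aic` function, and the `fit_gain` numbers are hypothetical stand-ins for actually fitting a logistic regression, used only to show how features are dropped while the AIC keeps falling.

```python
def backward_eliminate(features, aic_of):
    """Greedy backward elimination driven by a caller-supplied AIC function.

    aic_of(feature_list) must return the AIC of a model fitted on those features.
    """
    current = list(features)
    best_aic = aic_of(current)
    while len(current) > 1:
        # Try removing each remaining feature and keep the best candidate.
        candidates = [[f for f in current if f != drop] for drop in current]
        scored = [(aic_of(c), c) for c in candidates]
        candidate_aic, candidate = min(scored, key=lambda pair: pair[0])
        if candidate_aic >= best_aic:
            break  # no single removal lowers the AIC
        current, best_aic = candidate, candidate_aic
    return current, best_aic

# Hypothetical stand-in for model fitting: each feature has a made-up
# contribution to fit, and every feature costs 2 (the AIC penalty per parameter).
fit_gain = {"sodium": 10.0, "fat": 6.0, "shelf": 0.5, "weight": 1.0}

def toy_aic(feature_list):
    return 100.0 - sum(fit_gain[f] for f in feature_list) + 2 * len(feature_list)

selected, final_aic = backward_eliminate(list(fit_gain), toy_aic)
print(selected, final_aic)
```

In this toy setup the two features with small contributions are removed because their fit gain does not cover the penalty of 2 per parameter. In practice `aic_of` would refit the model on each candidate feature set.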

We achieved feature selection through the coefficients of the variables used in the method. Below is a summary of the StepAIC method for feature selection.

This summary is based on the logistic regression method.

  • A lower AIC value indicates a better trade-off between goodness of fit and model complexity.
  • Null deviance represents how well the response is predicted by a model with nothing but the intercept. A small null deviance means the intercept-only model already fits the data well. The p-value is used to check how well the model fits the data.
  • Residual deviance represents how well the response, HealthType, is predicted by the trained model when the predictors are included. Comparing it with the null deviance shows whether the predictors improve on the intercept-only model.
  • This summary is based on backward elimination in StepAIC, which removes weakly correlated variables according to their weight.
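
The quantities in this summary can be reproduced by hand for a tiny example. The Python sketch below assumes ungrouped 0/1 outcomes, so the deviance is simply minus twice the log-likelihood; the outcomes and the "fitted" probabilities are made up for illustration and do not come from the cereal model.

```python
import math

def binomial_log_likelihood(y, p):
    """Log-likelihood of 0/1 outcomes y under predicted probabilities p."""
    return sum(
        math.log(pi) if yi == 1 else math.log(1.0 - pi)
        for yi, pi in zip(y, p)
    )

def deviance(y, p):
    """For ungrouped 0/1 data the saturated log-likelihood is 0,
    so the deviance is -2 times the model log-likelihood."""
    return -2.0 * binomial_log_likelihood(y, p)

def aic(y, p, n_parameters):
    """AIC = 2k - 2 * logL, where k counts coefficients including the intercept."""
    return 2 * n_parameters - 2.0 * binomial_log_likelihood(y, p)

y = [1, 1, 1, 0, 0]
# Intercept-only model: every row gets the overall proportion of 1s.
p_null = [sum(y) / len(y)] * len(y)
# Made-up probabilities (assumption) standing in for a model with two predictors.
p_model = [0.9, 0.8, 0.7, 0.2, 0.1]

null_deviance = deviance(y, p_null)
residual_deviance = deviance(y, p_model)
model_aic = aic(y, p_model, n_parameters=3)
print(round(null_deviance, 3), round(residual_deviance, 3), round(model_aic, 3))
```

Here the residual deviance is well below the null deviance, which is the pattern to look for when the predictors add information beyond the intercept.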

So, the two features Sodium and Fat are used for modeling.

Conclusion

Feature selection changes with parameter tuning. There are many methods for selecting features; here we used two of them and saw how important it is to choose the right features and model to get good results. Feature selection plays a major role in reducing the complexity of machine learning problems.
