EDUCBA

Multiple Linear Regression

By Priya Pedamkar


Introduction to Multiple Linear Regression

In machine learning, linear regression is a supervised learning algorithm used to predict a continuous output, modeled as a line with a constant slope. This means that the model predicts values on a continuous scale and does not try to classify them into categories.

Linear regression is classified into two types:

  • Simple Linear Regression: Only one independent variable is present.
  • Multiple Linear Regression: Multiple independent variables are present.

In most applications, more than one feature is used to predict the dependent variable, so in this article we will cover multiple linear regression and see its implementation using Python.

Explanation for Multiple Linear Regression

  • Multiple linear regression (MLR) is an extension of the simple linear regression model in which multiple independent variables (features) are used to predict the dependent variable.
  • Suppose an analyst wants to predict the price of a house. With simple linear regression, the linear equation would use the area of the land as the independent variable and the price as the dependent variable. In reality, more features affect the price, such as the number of floors, the number of bedrooms, and the age of the house. To model a relationship that involves multiple independent variables, we use multiple linear regression.
  • MLR is used to determine a mathematical relationship between the multiple independent variables and the dependent variable. Once the coefficient for each independent variable is determined, the model can be used to predict the outcome. The relationship created by the model takes the form of a linear (straight-line) function that best approximates all the individual data points.

Now that we have an idea of what multiple linear regression is, let us understand the math behind it.


Equation in Multiple Linear Regression

The equations used in multiple linear regression are given below:

1. Hypothesis

The hypothesis or the model of the multiple linear regression is given by the equation:

$$h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_3 + \dots + \theta_n x_n$$

Where,

  • $x_i$ is the $i$-th feature (independent variable).
  • $\theta_i$ is the weight or coefficient of the $i$-th feature.
Note: $x_0 = 1$ and $\theta_0$ is the bias term.

This linear equation is used to approximate all the individual data points. With only one feature, the equation describes a straight line.
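As a small illustration of the hypothesis, the sketch below evaluates it for a single example in plain Python. The function name `predict`, the weights, and the house values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def predict(theta, features):
    """Evaluate h(x) = theta_0 + theta_1*x_1 + ... + theta_n*x_n for one example."""
    # Prepend x_0 = 1 so the bias term theta_0 is treated like any other weight.
    x = [1.0] + list(features)
    return sum(t * xi for t, xi in zip(theta, x))


# Hypothetical weights: bias, weight for area, weight for number of bedrooms.
theta = [50.0, 0.5, 20.0]
# Hypothetical house: area of 100 units and 3 bedrooms.
house = [100.0, 3.0]

price = predict(theta, house)
print(price)  # 50 + 0.5*100 + 20*3 = 160.0
```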

2. Cost Function

Our objective is to find the weights of the hypothesis that minimize the difference between the computed value and the actual value. This difference is termed the cost. The cost helps in assessing the overall performance of the model.

So, the cost of the multiple regression is given by the following equation:

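$$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$$

Where,

  • $J(\theta)$: Cost function
  • $m$: Number of training examples
  • $h_\theta(x^{(i)})$: Hypothesis evaluated on the $i$-th training example
  • $\theta$: Weights
  • $y^{(i)}$: Label of the $i$-th training example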
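The cost function can be sketched in a few lines of plain Python. The helper names and the toy data below are assumptions made for illustration; the data is generated from y = 1 + 2*x1 + 3*x2 so that the correct weights give a cost of exactly zero.

```python
def predict(theta, features):
    """Hypothesis h(x) with x_0 = 1 prepended for the bias term."""
    x = [1.0] + list(features)
    return sum(t * xi for t, xi in zip(theta, x))


def cost(theta, X, y):
    """J(theta) = 1/(2m) * sum of squared errors over the m training examples."""
    m = len(y)
    squared_errors = [(predict(theta, xi) - yi) ** 2 for xi, yi in zip(X, y)]
    return sum(squared_errors) / (2 * m)


# Toy data generated from y = 1 + 2*x1 + 3*x2 (hypothetical values).
X = [[1.0, 1.0], [2.0, 0.0], [0.0, 3.0]]
y = [6.0, 5.0, 10.0]

perfect_cost = cost([1.0, 2.0, 3.0], X, y)  # the true weights fit exactly
zero_cost = cost([0.0, 0.0, 0.0], X, y)     # all-zero weights predict 0 everywhere
print(perfect_cost, zero_cost)
```

With all-zero weights the errors are simply the labels, so the cost is (36 + 25 + 100) / 6.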

3. Gradient Descent

Next is gradient descent, which is used to update the weight values. Gradient descent makes use of the partial derivatives of the cost function with respect to each weight.

Gradient descent for multiple linear regression is given by:

$$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta) \quad \text{(simultaneously update for every } j = 0, \dots, n\text{)}$$

Where,

  • $\alpha$: The learning rate.
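Differentiating the cost function above with respect to each weight gives the partial derivative (1/m) * sum((h(x) - y) * x_j), which is what the update rule subtracts at every step. A minimal sketch of batch gradient descent in plain Python follows; the function name, learning rate, iteration count, and toy data are all assumptions for illustration, and in practice the learning rate usually has to be tuned.

```python
def gradient_descent(X, y, alpha=0.1, iterations=2000):
    """Fit weights by batch gradient descent on the squared-error cost."""
    m = len(y)
    rows = [[1.0] + list(row) for row in X]  # prepend x_0 = 1 for the bias
    theta = [0.0] * len(rows[0])
    for _ in range(iterations):
        # Prediction error for every training example under the current weights.
        errors = [sum(t * xj for t, xj in zip(theta, row)) - yi
                  for row, yi in zip(rows, y)]
        # Partial derivative of J with respect to each theta_j.
        gradient = [sum(e * row[j] for e, row in zip(errors, rows)) / m
                    for j in range(len(theta))]
        # Simultaneous update: every theta_j uses the same old weights.
        theta = [t - alpha * g for t, g in zip(theta, gradient)]
    return theta


# Toy data generated from y = 1 + 2*x (hypothetical values).
X = [[0.0], [1.0], [2.0], [3.0]]
y = [1.0, 3.0, 5.0, 7.0]

theta = gradient_descent(X, y)
print(theta)  # should be close to [1.0, 2.0]
```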

4. Normalization

Normalization is used to speed up the convergence of gradient descent. To do so, we need to ensure that all the features lie within a similar range. This is especially important for datasets whose features have a high standard deviation or very different ranges. So first we bring the features into the range of roughly -1 to 1.

To perform normalization, we can use feature scaling and mean normalization. It is given by the equation:

$$x_i := \frac{x_i - \mu_i}{\max(x_i) - \min(x_i)}$$

where $\mu_i$ is the mean of the $i$-th feature over the training set.
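The mean normalization formula can be applied to one feature column at a time. The sketch below uses plain Python; the function name and the sample areas are hypothetical, and the guard for a constant column is an added assumption to avoid dividing by zero.

```python
def mean_normalize(column):
    """Scale one feature column to (x - mean) / (max - min)."""
    mean = sum(column) / len(column)
    value_range = max(column) - min(column)
    if value_range == 0:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(value - mean) / value_range for value in column]


# Hypothetical house areas: mean is 1750 and the range is 1500.
areas = [1000.0, 1500.0, 2000.0, 2500.0]
scaled = mean_normalize(areas)
print(scaled)  # values are centered on 0 and span a total width of 1
```

After scaling, the values are centered on zero and the gap between the largest and smallest value is 1, so every feature ends up on a comparable scale.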

That's it. These are the main steps we need to perform to fit the parameters of a multiple linear regression model.

Implementing Multiple Linear Regression

We will now see the implementation of MLR using the scikit-learn module of Python. Since it contains optimized implementations of most models, including linear regression, it is recommended to use it instead of creating your own. To implement the code, I have used Kaggle's workspace.

The steps involved in the implementation are:

  • Import the Libraries
  • Import Dataset
  • Data Pre-Processing
  • Splitting Data Into Test and Training Set
  • Training Model
  • Evaluating Model

1. Import Libraries

  • To implement the multiple linear regression model, we will use the scikit-learn module, as it comes prepackaged with some sample datasets and useful functions.
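The original import statements are not reproduced in the text, so the following is only a plausible set of imports for the steps in this article, assuming NumPy, pandas, and scikit-learn are installed.

```python
import numpy as np    # numerical arrays and math helpers
import pandas as pd   # tabular data handling (DataFrame)

from sklearn import datasets                              # bundled sample datasets
from sklearn.linear_model import LinearRegression         # the regression model
from sklearn.metrics import mean_squared_error, r2_score  # evaluation metrics
from sklearn.model_selection import train_test_split      # train/test splitting
```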


2. Import Dataset

  • The next step is to import the dataset. Since scikit-learn already has sample datasets, we'll be using the Boston Housing dataset. This dataset consists of information about homes in Boston. The features in the dataset are used to predict the price of a house. Note that the Boston Housing loader (load_boston) was deprecated in scikit-learn 1.0 and removed in version 1.2, so this step requires an older release.

Let's create an instance of the dataset and see what it contains:

Dataset contains the following information:

  • data: The actual information on the houses.
  • target: The price of each house.
  • feature_names: The names of the features that the data contains.
  • DESCR: A text description of the dataset and its features.
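Because the Boston Housing loader is no longer available in recent scikit-learn releases, the sketch below uses the bundled diabetes dataset (`load_diabetes`) purely as a stand-in. It is a different dataset with different features, but the returned object exposes the same `data`, `target`, `feature_names`, and `DESCR` fields described above.

```python
from sklearn.datasets import load_diabetes

# Stand-in dataset: NOT the Boston Housing data used in the article.
dataset = load_diabetes()

print(dataset.keys())          # available fields of the dataset object
print(dataset.data.shape)      # (number of samples, number of features)
print(dataset.feature_names)   # column names for dataset.data
print(dataset.DESCR[:300])     # first part of the text description
```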

3. Data Pre-Processing

  • Now we will create a DataFrame object from the data, using the feature names as column headers, with the pandas library.
  • You can check the loaded data using the head() method of the pandas DataFrame.
  • We want to predict the price of the house, but our current data frame doesn't contain that information. So we add a new column that stores the target values from the dataset.
  • Normally we would perform several data pre-processing techniques, but that is a topic for another article. I will directly take the features with the stronger correlation with the price, i.e. 'LSTAT' and 'RM' (number of rooms). So, we will use these two features to perform linear regression.
  • The X variable will contain the independent values, while the Y variable will store the target values.
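The pre-processing steps above can be sketched as follows. This again uses the bundled diabetes dataset as a stand-in for the Boston data, so the two selected columns (`bmi` and `s5`) and the `TARGET` column name are assumptions picked for illustration and take the place of 'LSTAT', 'RM', and the price.

```python
import pandas as pd
from sklearn.datasets import load_diabetes

# Stand-in dataset: NOT the Boston Housing data used in the article.
dataset = load_diabetes()

# Build a DataFrame with the feature names as column headers.
df = pd.DataFrame(dataset.data, columns=dataset.feature_names)
print(df.head())  # inspect the first five rows

# Add the value we want to predict as a new column (hypothetical column name).
df["TARGET"] = dataset.target

# Keep two features as the independent variables and the target as the label.
X = df[["bmi", "s5"]]
Y = df["TARGET"]
print(X.shape, Y.shape)
```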

4. Splitting Data Into Test and Training Set

  • Now we will split the data into training and test sets. Let's keep 80% of the data for training and the remaining 20% for testing.
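An 80/20 split can be sketched with `train_test_split`. The stand-in diabetes dataset, the two chosen columns, and the `random_state` value are assumptions; fixing `random_state` only makes the split reproducible.

```python
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split

# Stand-in dataset: NOT the Boston Housing data used in the article.
dataset = load_diabetes()
columns = [dataset.feature_names.index(name) for name in ("bmi", "s5")]
X = dataset.data[:, columns]
Y = dataset.target

# test_size=0.2 keeps 20% of the rows for testing and 80% for training.
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape)
```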


5. Training Model

  • Now we need to create an instance of the model using scikit-learn. We do so by simply calling LinearRegression().
  • Once the instance is created, we can fit the model on the training data.
  • That's it, our model has been fitted. Now we can use it to make predictions by calling the predict method. We will run the prediction on the test data.
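The three training steps can be sketched together as below, again on the stand-in diabetes dataset rather than the Boston data. The variable names and `random_state` are assumptions made for illustration.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Stand-in dataset: NOT the Boston Housing data used in the article.
dataset = load_diabetes()
columns = [dataset.feature_names.index(name) for name in ("bmi", "s5")]
X = dataset.data[:, columns]
Y = dataset.target
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, random_state=42
)

model = LinearRegression()      # create the model instance
model.fit(X_train, Y_train)     # learn the weights from the training data
Y_pred = model.predict(X_test)  # predict on the held-out test data

print(model.intercept_)  # the bias term (theta_0)
print(model.coef_)       # one weight per feature
```

After fitting, `intercept_` and `coef_` hold the bias term and the per-feature weights of the hypothesis described earlier.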


6. Evaluating Model

  • Using the RMSE and R2 metrics, we will compare the predictions against the actual values. RMSE and R2 are two popular metrics for evaluating regression tasks.
  • The lower the RMSE, the better the fit, and the closer R2 is to 1, the better the model fits.
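Both metrics can be computed as sketched below. The example runs on the stand-in diabetes dataset, so its numbers will not match the values reported for the Boston data in this article. RMSE is obtained here by taking the square root of `mean_squared_error`, which works across scikit-learn versions.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Stand-in dataset: NOT the Boston Housing data used in the article.
dataset = load_diabetes()
columns = [dataset.feature_names.index(name) for name in ("bmi", "s5")]
X = dataset.data[:, columns]
Y = dataset.target
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, random_state=42
)

model = LinearRegression().fit(X_train, Y_train)
Y_pred = model.predict(X_test)

# RMSE: square root of the mean squared difference between labels and predictions.
rmse = np.sqrt(mean_squared_error(Y_test, Y_pred))
# R2: share of the variance in the labels that the model explains (at most 1).
r2 = r2_score(Y_test, Y_pred)

print("RMSE:", rmse)
print("R2:", r2)
```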


  • For our model, we got 5.66 for the RMSE and 0.67 for the R2 score.
  • You can consider trying more features to improve the accuracy of the model.

Conclusion

In this article, we covered multiple linear regression along with its math and an implementation using Python. Now that you have completed this article, I recommend that you try working on a different dataset and try to improve the accuracy.
