EDUCBA

EDUCBA

MENUMENU
  • Free Tutorials
  • Free Courses
  • Certification Courses
  • 360+ Courses All in One Bundle
  • Login
Home Data Science Data Science Tutorials R Programming Tutorial Multiple Linear Regression in R
Secondary Sidebar
R programming Tutorial
  • Regression in R
    • Simple Linear Regression in R
    • Linear Regression in R
    • Multiple Linear Regression in R
    • Logistic Regression in R
    • Poisson Regression in R
    • OLS Regression in R
    • P-Value in Regression
  • Basic
    • What is R Programming Language
    • Careers in R Programming
    • Install R
    • List of R Packages
    • Introduction of R Tools Technology
    • R Programming Language
    • DataSet in R
    • What is RStudio?
    • R-studio-Functions
    • R Packages
    • Time series?in R
    • R Data Types
    • R for data science
    • R Operators
    • R Data Frame
    • R Analytics Tool
    • R Tree Package
    • Vectors in R
  • Control statement
    • If Statement in R
    • If Else Statement in R
    • Else if in R
    • Switch Statement in R
  • Loops
    • Loops in R
    • For Loop in R
    • Nested For Loop in R
    • While Loop in R
    • Next in R
  • Chart/graphs
    • Graphs in R
    • Bar Charts in R
    • Pie Chart in R
    • Histogram in R
    • Line Graph in R
    • Plot Function in R
    • Scatterplot in R
    • R Boxplot labels
  • Anova in R
    • ANOVA in R
    • One Way ANOVA in R
    • Two Way ANOVA in R
  • Data Structure
    • R list
    • Arrays in R
    • Data Frames in R
    • Factors in R
    • R Vectors
  • Advanced
    • Statistical Analysis with R
    • R String Functions
    • Data Exploration in R
    • R CSV Files
    • KNN Algorithm in R
    • Sorting in R
    • lm Function in R
    • Hierarchical Clustering in R
    • R Normal Distribution
    • Binomial Distribution in R
    • Decision Tree in R
    • GLM in R
    • Arima Model in R
    • Linear Model in R
    • Predict Function in R
    • Survival Analysis in R
    • Standard Deviation in R
    • Statistical Analysis in R
    • Predictive Analysis?in R
    • T-test in R
    • Database in R
  • Programs
    • Functions in R
    • Boxplot in R
    • R Program Functions
    • Factorial in R
    • Random Number Generator in R
  • Interview question
    • R Interview Questions

Related Courses

R Programming Certification Course

Statistical Analysis Course Training

All in One Data Science Courses

Multiple Linear Regression in R

By Priya PedamkarPriya Pedamkar

MultipleLinear Regression in R

Introduction to Multiple Linear Regression in R

Multiple Linear Regression is one of the data mining techniques to discover the hidden pattern and relations between the variables in large datasets. Multiple Linear Regression is one of the regression methods and falls under predictive mining techniques. It is used to discover the relationship and assumes the linearity between target and predictors. However, the relationship between them is not always linear. Hence, it is important to determine a statistical method that fits the data and can be used to discover unbiased results. Multiple linear regression is an extended version of linear regression and allows the user to determine the relationship between two or more variables, unlike linear regression where it can be used to determine between only two variables. In this topic, we are going to learn about Multiple Linear Regression in R.

Syntax

Lm() function is a basic function used in the syntax of multiple regression. This function is used to establish the relationship between predictor and response variables.

lm( y ~ x1+x2+x3…, data)

Start Your Free Data Science Course

Hadoop, Data Science, Statistics & others

All in One Data Science Bundle(360+ Courses, 50+ projects)
Python TutorialMachine LearningAWSArtificial Intelligence
TableauR ProgrammingPowerBIDeep Learning
Price
View Courses
360+ Online Courses | 50+ projects | 1500+ Hours | Verifiable Certificates | Lifetime Access
4.7 (86,294 ratings)

The formula represents the relationship between response and predictor variables and data represents the vector on which the formulae are being applied.

For models with two or more predictors and the single response variable, we reserve the term multiple regression. There are also models of regression, with two or more variables of response. Such models are commonly referred to as multivariate regression models. Now let’s look at the real-time examples where multiple regression model fits.

For example, a house’s selling price will depend on the location’s desirability, the number of bedrooms, the number of bathrooms, year of construction, and a number of other factors. A child’s height can rely on the mother’s height, father’s height, diet, and environmental factors.

Now let’s see the general mathematical equation for multiple linear regression

Y= a + b1x1 + b2x2 +…bnxn

  • Where Y represents the response variable
  • a, b1, b2, and bn are coefficients
  • and x1, x2, and xn are predictor variables.

Examples of Multiple Linear Regression in R

The lm() method can be used when constructing a prototype with more than two predictors. Essentially, one can just keep adding another variable to the formula statement until they’re all accounted for. In this section, we will be using a freeny database available within R studio to understand the relationship between a predictor model with more than two variables. This model seeks to predict the market potential with the help of the rate index and income level.

Example #1 – Collecting and capturing the data in R

For this example, we have used inbuilt data in R. In real-world scenarios one might need to import the data from the CSV file. Which can be easily done using read.csv.

Syntax: read.csv(“path where CSV file real-world\\File name.csv”)

Example #2 – Check for Linearity

Before the linear regression model can be applied, one must verify multiple factors and make sure assumptions are met. Most of all one must make sure linearity exists between the variables in the dataset. One of the fastest ways to check the linearity is by using scatter plots.

In our dataset market potential is the dependent variable whereas rate, income, and revenue are the independent variables. Now let’s see the code to establish the relationship between these variables.

# extracting data from freeny database
data("freeny")
# plotting the data to determine the linearity
plot(freeny, col="navy", main="Matrix Scatterplot")

Output:

Matrix Scatterplot

From the above scatter plot we can determine the variables in the database freeny are in linearity.

# Constructing a model that predicts the market potential using the help of revenue price.index
and income.level
> model <- lm(market.potential ~ price.index + income.level, data = freeny)
> model

Output:

Multiple Linear Regression in R output 2

The sample code above shows how to build a linear model with two predictors. In this example Price.index and income.level are two

predictors used to predict the market potential. From the above output, we have determined that the intercept is 13.2720, the

coefficients for rate Index is -0.3093, and the coefficient for income level is 0.1963. Hence the complete regression Equation is market

potential = 13.270 + (-0.3093)* price.index + 0.1963*income level.

model <- lm(market.potential ~ price.index + income.level, data = freeny)
model
using summary(OBJECT) to display information about the linear model
summary(model)

Output:

Multiple Linear Regression in R output 3

Summary evaluation

1. Adjusted R squared

This value reflects how fit the model is. Higher the value better the fit. Adjusted R-squared value of our data set is 0.9899

2. P-value

Most of the analysis using R relies on using statistics called the p-value to determine whether we should reject the null hypothesis or

fail to reject it. With the assumption that the null hypothesis is valid, the p-value is characterized as the probability of obtaining a

result that is equal to or more extreme than what the data actually observed. P-value 0.9899 derived from out data is considered to be

statistically significant.

3. Std.Error

The standard error refers to the estimate of the standard deviation. The coefficient of standard error calculates just how accurately the

model determines the uncertain value of the coefficient. The coefficient Standard Error is always positive. One can use the coefficient

standard error to calculate the accuracy of the coefficient calculation.

The analyst should not approach the job while analyzing the data as a lawyer would.  In other words, the researcher should not be

searching for significant effects and experiments but rather be like an independent investigator using lines of evidence to figure out

what is most likely to be true given the available data, graphical analysis, and statistical analysis.

Conclusion

In this article, we have seen how the multiple linear regression model can be used to predict the value of the dependent variable with the help of two or more independent variables. The initial linearity test has been considered in the example to satisfy the linearity. As the variables have linearity between them we have progressed further with multiple linear regression models. We were able to predict the market potential with the help of predictors variables which are rate and income.

Recommended Articles

This is a guide to Multiple Linear Regression in R. Here we discuss how to predict the value of the dependent variable by using multiple linear regression model. You may also look at the following articles to learn more –

  1. Linear Model in R
  2. R Program Functions
  3. What is Regression?
  4. Statistical Analysis Regression
  5. What is Linear Regression?
Popular Course in this category
R Programming Training (13 Courses, 20+ Projects)
  13 Online Courses |  20 Hands-on Projects |  120+ Hours |  Verifiable Certificate of Completion
4.5
Price

View Course

Related Courses

Statistical Analysis Training (15 Courses, 10+ Projects)4.9
All in One Data Science Bundle (360+ Courses, 50+ projects)4.8
0 Shares
Share
Tweet
Share
Primary Sidebar
Footer
About Us
  • Blog
  • Who is EDUCBA?
  • Sign Up
  • Live Classes
  • Corporate Training
  • Certificate from Top Institutions
  • Contact Us
  • Verifiable Certificate
  • Reviews
  • Terms and Conditions
  • Privacy Policy
  •  
Apps
  • iPhone & iPad
  • Android
Resources
  • Free Courses
  • Database Management
  • Machine Learning
  • All Tutorials
Certification Courses
  • All Courses
  • Data Science Course - All in One Bundle
  • Machine Learning Course
  • Hadoop Certification Training
  • Cloud Computing Training Course
  • R Programming Course
  • AWS Training Course
  • SAS Training Course

ISO 10004:2018 & ISO 9001:2015 Certified

© 2022 - EDUCBA. ALL RIGHTS RESERVED. THE CERTIFICATION NAMES ARE THE TRADEMARKS OF THEIR RESPECTIVE OWNERS.

EDUCBA
Free Data Science Course

SPSS, Data visualization with Python, Matplotlib Library, Seaborn Package

*Please provide your correct email id. Login details for this Free course will be emailed to you

By signing up, you agree to our Terms of Use and Privacy Policy.

EDUCBA Login

Forgot Password?

By signing up, you agree to our Terms of Use and Privacy Policy.

EDUCBA
Free Data Science Course

Hadoop, Data Science, Statistics & others

*Please provide your correct email id. Login details for this Free course will be emailed to you

By signing up, you agree to our Terms of Use and Privacy Policy.

EDUCBA

*Please provide your correct email id. Login details for this Free course will be emailed to you

By signing up, you agree to our Terms of Use and Privacy Policy.

Let’s Get Started

By signing up, you agree to our Terms of Use and Privacy Policy.

This website or its third-party tools use cookies, which are necessary to its functioning and required to achieve the purposes illustrated in the cookie policy. By closing this banner, scrolling this page, clicking a link or continuing to browse otherwise, you agree to our Privacy Policy

Loading . . .
Quiz
Question:

Answer:

Quiz Result
Total QuestionsCorrect AnswersWrong AnswersPercentage

Explore 1000+ varieties of Mock tests View more