EDUCBA

Linear Regression in R

By Priya Pedamkar

What is Linear Regression in R?

Linear regression is a supervised machine learning algorithm. R has a built-in function, lm(), that fits a linear regression model. The model describes the relationship between a continuous outcome variable Y and one or more predictor variables X. With a single predictor, the fitted equation is a straight line through the data points in two dimensions. The reliability of the estimated regression coefficients depends on the quality of the data set. A typical use is predicting an organization’s sales revenue for the next quarter for a particular product range.

Linear regression in R falls into two categories.

1. Simple Linear Regression

This is the regression where the output variable is a function of a single input variable. Representation of simple linear regression:

y = c0 + c1*x1

2. Multiple Linear Regression

This is the regression where the output variable is a function of multiple input variables. Representation of multiple linear regression with two inputs:

y = c0 + c1*x1 + c2*x2

In both of the above cases, c0, c1 and c2 are the coefficients, which represent the regression weights.
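The two forms above can be tried out on simulated data. The following is only a sketch: the variable names, the "true" coefficients (2, 3 and -1.5) and the noise level are assumptions chosen for illustration, not values from this article.

```r
# Simulate data from a known linear relationship (all values are made up).
set.seed(42)
n  <- 100
x1 <- runif(n)
x2 <- runif(n)
y  <- 2 + 3 * x1 - 1.5 * x2 + rnorm(n, sd = 0.1)

# Simple linear regression: y = c0 + c1*x1
simple_fit <- lm(y ~ x1)

# Multiple linear regression: y = c0 + c1*x1 + c2*x2
multiple_fit <- lm(y ~ x1 + x2)

coef(simple_fit)    # estimates of c0 and c1
coef(multiple_fit)  # estimates of c0, c1 and c2
```

Because the data were generated from a known equation, the estimates from the multiple regression should land close to the coefficients used in the simulation.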

Linear Regression in R

R is a very powerful statistical tool. Let’s see how regression can be performed in R and how its output values can be interpreted. First, let’s prepare a dataset to work with.

[Figure: Example 1, the employee dataset]

In this dataset, “satisfaction_score” and “year_of_Exp” are the independent variables, and “salary_in_Lakhs” is the output variable.

Referring to the above dataset, the problem we want to address here through linear regression is:

Estimating the salary of an employee based on years of experience and satisfaction score in the company.

R code:

model <- lm(salary_in_Lakhs ~ satisfaction_score + year_of_Exp, data = employee.data)
summary(model)

The output of the above code will be:

[Figure: Output 1, the summary(model) output]

The regression equation becomes:

Y = 12.29 - 1.19*satisfaction_score + 2.08*year_of_Exp
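To show how a fitted equation like this is read off a model and used, here is a minimal sketch. The article’s dataset is not reproduced in the text, so the data frame below is hypothetical: its values are invented for illustration and will not give the coefficients quoted above.

```r
# Hypothetical data: these numbers are invented, not the article's dataset.
employee.data <- data.frame(
  satisfaction_score = c(3, 5, 4, 2, 5, 3, 4, 1),
  year_of_Exp        = c(1, 2, 3, 4, 5, 6, 7, 8),
  salary_in_Lakhs    = c(12.5, 14.0, 17.2, 19.1, 21.8, 23.0, 26.1, 27.4)
)
model <- lm(salary_in_Lakhs ~ satisfaction_score + year_of_Exp, data = employee.data)

# Named vector: (Intercept), satisfaction_score, year_of_Exp
b <- coef(model)

# Predict for a new (hypothetical) employee in two equivalent ways.
new_employee <- data.frame(satisfaction_score = 4, year_of_Exp = 5)
by_hand    <- b[["(Intercept)"]] + b[["satisfaction_score"]] * 4 + b[["year_of_Exp"]] * 5
by_predict <- predict(model, newdata = new_employee)
```

Plugging the inputs into the equation by hand and calling predict() give the same number; predict() is simply the less error-prone route once there are many inputs.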

If the model should use every other column in the data as an input, the R code can be:

model <- lm(salary_in_Lakhs ~ ., data = employee.data)

However, to select only a subset of the input variables, techniques such as “Backward Elimination” and “Forward Selection” are available.
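One way to try backward elimination is R’s step() function, sketched below on simulated data. Note the assumptions: step() selects terms by AIC rather than by p-values, and the data frame sim with its columns x1, x2, x3 and y is made up for this example.

```r
# Simulated data: x3 is pure noise and unrelated to y (all values are made up).
set.seed(1)
n   <- 200
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
sim$y <- 1 + 2 * sim$x1 - 3 * sim$x2 + rnorm(n)

# Start from the full model that uses every other column as an input.
full_model <- lm(y ~ ., data = sim)

# Backward elimination: drop terms one at a time while AIC improves.
reduced_model <- step(full_model, direction = "backward", trace = 0)

formula(reduced_model)  # the predictors that survived
```

Setting direction = "forward" together with a scope argument gives forward selection instead; the inputs with a genuine effect (x1 and x2 here) should be retained either way.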

Interpretation of Linear Regression in R

The sections of the summary output are interpreted as follows:

1. Residuals

A residual is the difference between the actual response and the response predicted by the model. Every observation has one actual response and one predicted response, so there are as many residuals as observations. In our case, we have four observations, hence four residuals.
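The definition of a residual can be checked directly. This is a sketch on a hypothetical data frame (the values are invented, and it has eight rows rather than the article’s four).

```r
# Hypothetical data: these numbers are invented, not the article's dataset.
employee.data <- data.frame(
  satisfaction_score = c(3, 5, 4, 2, 5, 3, 4, 1),
  year_of_Exp        = c(1, 2, 3, 4, 5, 6, 7, 8),
  salary_in_Lakhs    = c(12.5, 14.0, 17.2, 19.1, 21.8, 23.0, 26.1, 27.4)
)
model <- lm(salary_in_Lakhs ~ satisfaction_score + year_of_Exp, data = employee.data)

# Residual = actual response - predicted response, one per observation.
manual_residuals <- employee.data$salary_in_Lakhs - fitted(model)
model_residuals  <- residuals(model)

length(model_residuals)  # equals nrow(employee.data)
```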

2. Coefficients

Next comes the coefficients section, which lists the intercept and the slopes. To predict an employee’s salary from experience and satisfaction score, one builds a model formula from the slopes and intercept. The intercept and slopes together define the line that best fits the data points.

Slope: Depicts the steepness of the line.
Intercept: The point where the line cuts the y-axis.

Let’s understand how formula formation is done based on slope and intercept.

Say intercept is 3, and the slope is 5.

So, the formula is y = 3+5x. This means if x is increased by a unit, y gets increased by 5.

a. Coefficient – Estimate: The intercept is the expected value of the output variable when all inputs are zero. In our case, the expected salary is 12.29 lakhs when satisfaction score and experience are both zero. Each slope is the change in the output variable for a unit change in that input variable, with the other inputs held constant.

b. Coefficient – Standard Error: The standard error estimates how much a coefficient estimate would vary from sample to sample. The smaller it is relative to the estimate, the more confident we can be in the relationship between that input and the output variable.

c. Coefficient – t value: This is the estimate divided by its standard error. The farther it is from zero, the stronger the evidence against the null hypothesis that the coefficient is zero, and hence for a relationship between the input and output variable. In our case, the value is well away from zero.

d. Coefficient – Pr(>|t|): This is the p-value of the t-test. The closer it is to zero, the more readily we can reject the null hypothesis. In our case, this value is near zero, so we can say there is a relationship between salary, satisfaction score and years of experience.
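The four columns described in points a to d can be pulled out of the summary and recomputed by hand. The sketch below uses a hypothetical data frame with invented values, so the numbers it produces differ from those in the article’s output.

```r
# Hypothetical data: these numbers are invented, not the article's dataset.
employee.data <- data.frame(
  satisfaction_score = c(3, 5, 4, 2, 5, 3, 4, 1),
  year_of_Exp        = c(1, 2, 3, 4, 5, 6, 7, 8),
  salary_in_Lakhs    = c(12.5, 14.0, 17.2, 19.1, 21.8, 23.0, 26.1, 27.4)
)
model <- lm(salary_in_Lakhs ~ satisfaction_score + year_of_Exp, data = employee.data)

# Matrix with columns "Estimate", "Std. Error", "t value" and "Pr(>|t|)".
coef_table <- summary(model)$coefficients

# t value = estimate / standard error
t_manual <- coef_table[, "Estimate"] / coef_table[, "Std. Error"]

# Two-sided p-value from the t distribution on the residual degrees of freedom.
p_manual <- 2 * pt(abs(t_manual), df = df.residual(model), lower.tail = FALSE)
```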


Multiple R-squared, Adjusted R-squared

R-squared is an important statistical measure of how closely the model fits the data; in our case, how well the linear regression represents the dataset.

For a model fitted with an intercept, the R-squared value always lies between 0 and 1. The formula is:

R-squared = 1 - (residual sum of squares / total sum of squares)

The closer the value to 1, the better the model describes the datasets and their variance.

However, when there is more than one input variable, the adjusted R-squared value is preferred, because plain R-squared never decreases when another input is added.
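Both measures can be recomputed by hand: R-squared is one minus the ratio of the residual sum of squares to the total sum of squares, and the adjusted version rescales it by the degrees of freedom. The sketch below assumes a hypothetical data frame with invented values.

```r
# Hypothetical data: these numbers are invented, not the article's dataset.
employee.data <- data.frame(
  satisfaction_score = c(3, 5, 4, 2, 5, 3, 4, 1),
  year_of_Exp        = c(1, 2, 3, 4, 5, 6, 7, 8),
  salary_in_Lakhs    = c(12.5, 14.0, 17.2, 19.1, 21.8, 23.0, 26.1, 27.4)
)
model <- lm(salary_in_Lakhs ~ satisfaction_score + year_of_Exp, data = employee.data)

y      <- employee.data$salary_in_Lakhs
ss_res <- sum(residuals(model)^2)   # residual sum of squares
ss_tot <- sum((y - mean(y))^2)      # total sum of squares
r2     <- 1 - ss_res / ss_tot

n <- nrow(employee.data)  # number of observations
p <- 2                    # number of input variables
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

# The same values as reported by summary().
summary(model)$r.squared
summary(model)$adj.r.squared
```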

F-Statistic

It measures the overall relationship between the input and response variables. The farther the value is above 1, the higher the confidence that such a relationship exists.

In our case, it’s 937.5, which is large considering the size of the data. Hence the null hypothesis is easier to reject.
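The F-statistic and its p-value can also be extracted programmatically. This is a sketch on a hypothetical data frame with invented values, so it will not reproduce the 937.5 quoted above.

```r
# Hypothetical data: these numbers are invented, not the article's dataset.
employee.data <- data.frame(
  satisfaction_score = c(3, 5, 4, 2, 5, 3, 4, 1),
  year_of_Exp        = c(1, 2, 3, 4, 5, 6, 7, 8),
  salary_in_Lakhs    = c(12.5, 14.0, 17.2, 19.1, 21.8, 23.0, 26.1, 27.4)
)
model <- lm(salary_in_Lakhs ~ satisfaction_score + year_of_Exp, data = employee.data)

# Named vector with elements "value", "numdf" and "dendf".
f <- summary(model)$fstatistic

# Upper-tail probability of the F distribution gives the overall p-value.
p_value <- pf(f[["value"]], f[["numdf"]], f[["dendf"]], lower.tail = FALSE)
```

Here numdf is the number of input variables and dendf is the residual degrees of freedom, which is why a small dataset needs a much larger F value to be convincing.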

To see the confidence intervals for the model’s coefficients, call confint() on the fitted model.
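A minimal sketch of computing the coefficient confidence intervals with confint() follows. The data frame is hypothetical (invented values), and the 95% level is simply the function’s default, written out here for clarity.

```r
# Hypothetical data: these numbers are invented, not the article's dataset.
employee.data <- data.frame(
  satisfaction_score = c(3, 5, 4, 2, 5, 3, 4, 1),
  year_of_Exp        = c(1, 2, 3, 4, 5, 6, 7, 8),
  salary_in_Lakhs    = c(12.5, 14.0, 17.2, 19.1, 21.8, 23.0, 26.1, 27.4)
)
model <- lm(salary_in_Lakhs ~ satisfaction_score + year_of_Exp, data = employee.data)

# One row per coefficient; columns are the lower and upper bounds.
ci <- confint(model, level = 0.95)
ci
```

An interval that excludes zero corresponds to a coefficient whose p-value is below 0.05.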

Visualization of Regression

R Code:

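# abline() needs a single-predictor fit, so plot salary against one input at a time
simple_model <- lm(salary_in_Lakhs ~ year_of_Exp, data = employee.data)
plot(salary_in_Lakhs ~ year_of_Exp, data = employee.data)
abline(simple_model)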

[Figure: scatter plot of the data with the regression line]

More data points generally give more reliable coefficient estimates, so gather as many as practical before fitting a model.

Conclusion

Linear regression is simple, easy to fit and easy to understand, yet a very powerful model. We saw how linear regression can be performed in R and how to interpret the results, which can help in refining the model. Once comfortable with simple linear regression, one should try multiple linear regression. Because linear regression is sensitive to outliers, one must also check the data for them before fitting the model.
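The sensitivity to outliers is easy to demonstrate. In the sketch below the data are made up: ten points lying exactly on y = 1 + 2x, with the last one replaced by a deliberately wrong value. Cook’s distance is one common way to flag influential points, used here as an illustration rather than as the only option.

```r
# Made-up data lying exactly on y = 1 + 2x, plus one deliberately corrupted point.
x <- 1:10
y_clean   <- 1 + 2 * x
y_outlier <- y_clean
y_outlier[10] <- 100  # hypothetical data-entry error

clean_fit   <- lm(y_clean ~ x)
outlier_fit <- lm(y_outlier ~ x)

slope_clean   <- coef(clean_fit)[["x"]]    # slope without the outlier
slope_outlier <- coef(outlier_fit)[["x"]]  # slope pulled upward by one bad point

# Cook's distance measures each observation's influence on the fit.
cooks_d <- cooks.distance(outlier_fit)
most_influential <- unname(which.max(cooks_d))  # index of the most influential point
```

A single corrupted observation more than triples the estimated slope here, which is why inspecting the data before fitting matters.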
