EDUCBA

Linear Regression in R

By Priya Pedamkar

What is Linear Regression in R?

Linear regression in R is a supervised machine learning algorithm. R has a built-in function, lm(), that fits a linear regression model and reports its results. The model describes the relationship between a continuous outcome variable Y and one or more predictor variables X. With a single predictor, the fitted equation is a straight line through the data points on a two-dimensional plot. How reliable the estimated regression coefficients are depends on the quality of the data set. A typical use is predicting an organization's sales revenue for the next quarter for a particular product range.

Linear regression in R can be categorized into two types:

1. Simple Linear Regression

This is the regression where the output variable is a function of a single input variable. Representation of simple linear regression:

y = c0 + c1*x1

2. Multiple Linear Regression

This is the regression where the output variable is a function of multiple input variables. Representation of multiple linear regression:

y = c0 + c1*x1 + c2*x2

In both of the above cases, c0, c1 and c2 are the coefficients, which represent the regression weights: c0 is the intercept, and c1 and c2 are the slopes for x1 and x2.
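
As a minimal sketch of the two forms, the snippet below fits both with lm(). It assumes R's built-in mtcars dataset (not the dataset used later in this article) purely for illustration, with mpg as the output variable and wt and hp as input variables.

```r
# Assumption: mtcars is R's built-in dataset, used here only for illustration.

# Simple linear regression: one input variable (y = c0 + c1*x1)
simple_model <- lm(mpg ~ wt, data = mtcars)

# Multiple linear regression: two input variables (y = c0 + c1*x1 + c2*x2)
multiple_model <- lm(mpg ~ wt + hp, data = mtcars)

# coef() returns c0 (the intercept) followed by one weight per input variable
print(coef(simple_model))
print(coef(multiple_model))
```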

Linear Regression in R

R is a very powerful statistical tool. Let’s see how linear regression can be performed in R and how its output values can be interpreted, starting with a dataset to work on.

Linear Regression in R - Example 1

Now we have a dataset in which “satisfaction_score” and “year_of_Exp” are the independent variables and “salary_in_Lakhs” is the output variable.

Referring to the above dataset, the problem we want to address here through linear regression is:

Estimating the salary of an employee based on their years of experience and satisfaction score in the company.

R code:

model <- lm(salary_in_Lakhs ~ satisfaction_score + year_of_Exp, data = employee.data)
summary(model)

The output of the above code will be:

Linear Regression in R - Output 1

The regression formula becomes:

Y = 12.29 - 1.19*satisfaction_score + 2.08*year_of_Exp
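
To see how the fitted equation turns inputs into a prediction, here is a small sketch that plugs values into it by hand. The coefficients (12.29, -1.19 and 2.08) are the rounded estimates from the equation above. The function name predict_salary and the input values are hypothetical, chosen only for illustration.

```r
# Rounded coefficients taken from the fitted equation in the text
intercept <- 12.29
b_satisfaction <- -1.19
b_experience <- 2.08

# Hypothetical helper: applies the regression equation to the given inputs
predict_salary <- function(satisfaction_score, year_of_Exp) {
  intercept + b_satisfaction * satisfaction_score + b_experience * year_of_Exp
}

# Hypothetical employee: satisfaction score 5 and 3 years of experience
estimate <- predict_salary(satisfaction_score = 5, year_of_Exp = 3)
print(estimate)  # 12.29 - 5.95 + 6.24 = 12.58 (in lakhs)
```

When the fitted model object is available, predict(model, newdata = ...) performs the same calculation with the unrounded coefficients.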

When there are many inputs to the model, the formula can use a dot to stand for every other column in the data. The R code can then be:

model <- lm(salary_in_Lakhs ~ ., data = employee.data)

However, if one wants to select a subset of the available input variables, techniques such as “Backward Elimination” and “Forward Selection” are available for that.
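
One way to try backward elimination is R's step() function, sketched below. This is an illustration under stated assumptions rather than the article's own procedure: it uses the built-in mtcars dataset, and step() compares candidate models by AIC.

```r
# Assumption: mtcars is R's built-in dataset, used here only for illustration.

# Start from a model that uses every other column as an input variable
full_model <- lm(mpg ~ ., data = mtcars)

# Backward elimination: step() drops inputs one at a time while AIC improves.
# trace = 0 suppresses the step-by-step log.
reduced_model <- step(full_model, direction = "backward", trace = 0)

# Input variables kept by the selection procedure
print(formula(reduced_model))
```

Forward selection works the other way around: it starts from an intercept-only model and adds inputs, using direction = "forward" together with a scope argument.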

Interpretation of Linear Regression in R

The main parts of the summary output can be interpreted as follows:

1. Residuals

This refers to the difference between the actual response and the response predicted by the model. For every data point there is one actual response and one predicted response, so there are as many residuals as observations. In our case we have four observations, hence four residuals.
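
The definition can be checked directly. The sketch below assumes R's built-in cars dataset (speed and stopping distance), not the article's data, and compares residuals computed by hand with those returned by residuals().

```r
# Assumption: cars is R's built-in dataset, used here only for illustration.
cars_model <- lm(dist ~ speed, data = cars)

# Residual = actual response - predicted response
manual_residuals <- cars$dist - fitted(cars_model)

# residuals() returns the same values, one per observation
print(head(residuals(cars_model)))
print(length(residuals(cars_model)) == nrow(cars))
```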

2. Coefficients

Going further, we find the coefficients section, which lists the intercept and the slopes. To predict the salary of an employee from experience and satisfaction score, one builds a model formula from the slopes and the intercept. The fitted values of the intercept and slopes are the ones that make the line suit the data points best.

Slope: Depicts the steepness of the line.
Intercept: The location where the line cuts the y-axis.

Let’s understand how formula formation is done based on slope and intercept.

Say intercept is 3 and the slope is 5.

So, the formula is y = 3+5x. This means if x increased by a unit, y gets increased by 5.
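
The same example can be reproduced in R. In this sketch the data are hypothetical and generated to follow y = 3 + 5x exactly, so lm() recovers the intercept and the slope.

```r
# Hypothetical data that follow y = 3 + 5x exactly
x <- 1:10
y <- 3 + 5 * x

# Fitting a line to these points recovers the intercept (3) and the slope (5)
line_fit <- lm(y ~ x)
print(coef(line_fit))

# Increasing x by one unit raises the prediction by the slope
pred <- predict(line_fit, newdata = data.frame(x = c(2, 3)))
print(unname(pred[2] - pred[1]))
```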

a. Coefficient – Estimate: Here the intercept denotes the average value of the output variable when all inputs are zero. So, in our case, the average salary is 12.29 lakhs when satisfaction score and experience are both zero. Each slope represents the change in the output variable for a unit change in the corresponding input variable, with the other inputs held fixed.

b. Coefficient – Standard Error: The standard error estimates how much a coefficient estimate would vary from sample to sample. The smaller it is relative to the estimate, the more confidence we can have in the relationship between that input and the output variable.

c. Coefficient – t value: This is the coefficient estimate divided by its standard error. The further the value is from zero, the stronger the evidence for rejecting the null hypothesis (that the coefficient is zero) and for a relationship between the output and input variable. In our case the values are away from zero as well.

d. Coefficient – Pr(>|t|): This column is the p-value, the probability of seeing a t value at least this extreme if the null hypothesis were true. The closer it is to zero, the more easily we can reject the null hypothesis. In our case this value is near zero, so we can say there exists a relationship between salary package, satisfaction score and years of experience.
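
The four columns can be pulled out of the summary and related to each other. The sketch below assumes the built-in mtcars dataset rather than the article's data; it recomputes the t value and the p-value from the estimate and its standard error.

```r
# Assumption: mtcars is R's built-in dataset, used here only for illustration.
mt_model <- lm(mpg ~ wt + hp, data = mtcars)

# Coefficient table with columns Estimate, Std. Error, t value, Pr(>|t|)
coef_table <- coef(summary(mt_model))
print(coef_table)

# The t value is the estimate divided by its standard error
t_manual <- coef_table[, "Estimate"] / coef_table[, "Std. Error"]

# The p-value is the two-sided tail probability of that t value
p_manual <- 2 * pt(abs(t_manual), df = df.residual(mt_model), lower.tail = FALSE)
```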

Multiple R-squared, Adjusted R-squared

R-squared is an important statistical measure of how closely the model fits the data, that is, in our case, how well the linear regression represents the dataset.

The R-squared value always lies between 0 and 1. The formula is:

R-squared = 1 - (residual sum of squares / total sum of squares)

The closer the value is to 1, the better the model describes the dataset and its variance.

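However, when more than one input variable comes into the picture, the adjusted R-squared value is preferred, because plain R-squared never decreases when an input is added while the adjusted value penalizes inputs that do not improve the fit.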
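
Both measures can be computed by hand and compared with what summary() reports. The sketch below assumes the built-in mtcars dataset for illustration; the variable names are hypothetical.

```r
# Assumption: mtcars is R's built-in dataset, used here only for illustration.
mt_model <- lm(mpg ~ wt + hp, data = mtcars)

# R-squared = 1 - (residual sum of squares / total sum of squares)
ss_res <- sum(residuals(mt_model)^2)
ss_tot <- sum((mtcars$mpg - mean(mtcars$mpg))^2)
r_squared <- 1 - ss_res / ss_tot

# Adjusted R-squared penalizes the number of input variables p
n <- nrow(mtcars)  # number of observations
p <- 2             # number of input variables (wt and hp)
adj_r_squared <- 1 - (1 - r_squared) * (n - 1) / (n - p - 1)

# Compare the manual values with the ones stored in the summary
print(c(r_squared, summary(mt_model)$r.squared))
print(c(adj_r_squared, summary(mt_model)$adj.r.squared))
```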

F-Statistic

It is a strong measure of the relationship between the input and response variables. It tests the null hypothesis that all slope coefficients are zero; the further the value is above 1, the higher the confidence that a relationship exists between the input and output variables.

In our case it is 937.5, which is large considering the size of the data. Hence the rejection of the null hypothesis gets easier.
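
The F-statistic and its p-value can also be read from the summary object. This sketch assumes the built-in mtcars dataset, so its numbers differ from the article's 937.5.

```r
# Assumption: mtcars is R's built-in dataset, used here only for illustration.
mt_model <- lm(mpg ~ wt + hp, data = mtcars)

# Named vector holding the F value and its two degrees of freedom
f_stat <- summary(mt_model)$fstatistic
print(f_stat)

# p-value of the overall F-test: upper tail of the F distribution
p_value <- pf(f_stat["value"], f_stat["numdf"], f_stat["dendf"],
              lower.tail = FALSE)
print(unname(p_value))
```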

If someone wants to see the confidence intervals for the model’s coefficients, R’s confint() function returns them, for example confint(model).
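
As a brief sketch of confint() in use, the snippet below assumes the built-in mtcars dataset rather than the article's data and requests 95% intervals.

```r
# Assumption: mtcars is R's built-in dataset, used here only for illustration.
mt_model <- lm(mpg ~ wt + hp, data = mtcars)

# One row per coefficient, with the lower and upper bounds of the interval
ci <- confint(mt_model, level = 0.95)
print(ci)
```

An interval that does not contain zero points the same way as a small p-value for that coefficient.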

Visualization of Regression

R Code:

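# Scatter plot of salary against years of experience
plot(salary_in_Lakhs ~ year_of_Exp, data = employee.data)
# abline() draws one straight line, so fit a one-predictor model for this plot
abline(lm(salary_in_Lakhs ~ year_of_Exp, data = employee.data))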

Visualization Regression in R

It’s always better to gather more data points before fitting a model.

Conclusion

Linear regression is simple, easy to fit and easy to understand, yet it is a very powerful model. We saw how linear regression can be performed in R. We also interpreted the results, which can help in optimizing the model. Once one gets comfortable with simple linear regression, one should try multiple linear regression. Also, since linear regression is sensitive to outliers, one should check the data for them before fitting the model.

© 2020 - EDUCBA. ALL RIGHTS RESERVED. THE CERTIFICATION NAMES ARE THE TRADEMARKS OF THEIR RESPECTIVE OWNERS.
