Linear Regression in Excel (Table of Contents)
Introduction to Linear Regression in Excel
Linear regression is a statistical technique/method used to study the relationship between two continuous quantitative variables. In this technique, independent variables are used to predict the value of a dependent variable. If there is only one independent variable, then it is a simple linear regression, and if a number of independent variables are more than one, then it is multiple linear regression. Linear Regression models have a relationship between dependent and independent variables by fitting a linear equation to the observed data. Linear refers to the fact that we use a line to fit our data. The dependent variables used in regression analysis are also called the response or predicted variables, and independent variables are also called explanatory variables or predictors.
A linear regression line has an equation of the kind: Y= a + bX;
- X is the explanatory variable,
- Y is the dependent variable,
- b is the slope of the line,
- a is the y-intercept (i.e. the value of y when x=0).
The least-squares method is generally used in linear regression that calculates the best fit line for observed data by minimizing the sum of squares of deviation of data points from the line.
Methods for Using Linear Regression in Excel
This example teaches you the methods to perform Linear Regression Analysis in Excel. Let’s look at a few methods.
Method #1 – Scatter Chart with a Trendline
Let us say we have a dataset of some individuals with their age, bio-mass index (BMI), and the amount spent by them on medical expenses in a month. Now with an insight into the individuals’ characteristics like age and BMI, we wish to find how these variables affect the medical expenses, and hence use these to carry out regression and estimate/predict the average medical expenses for some specific individuals. Let us first see how only age affects medical expenses. Let us see the dataset:
Amount on medical expenses= b*age + a
- Select the two columns of the dataset (x and y), including headers.
- Click on ‘Insert’ and expand the dropdown for ‘Scatter Chart’ and select ‘Scatter’ thumbnail (first one)
- Now a scatter plot will appear, and we would draw the regression line on this. To do this, right-click on any data point and select ‘Add Trendline.’
- Now in the ‘Format Trendline’ pane on the right, select ‘Linear Trendline’ and ‘Display Equation on Chart’.
- Select ‘Display Equation on Chart’.
We can improvise the chart as per our requirements, like adding axes titles, changing the scale, color and line type.
After Improvising the chart, this is the output we get.
Method #2 – Analysis ToolPak Add-In Method
Analysis ToolPak is sometimes not enabled by default, and we need to do it manually. To do so:
- Click on the ‘File’ menu.
After that, click on ‘Options’.
- Select ‘Excel Add-Ins’ in the ‘Manage’ box, and click on ‘Go.’
- Select ‘Analysis ToolPak’ -> ‘OK’
This will add ‘Data Analysis’ tools to the ‘Data’ tab. Now we run the regression analysis:
- Click on ‘Data Analysis’ in the ‘Data’ tab
- Select ‘Regression’ -> ‘OK’
- A regression dialog box will appear. Select the Input Y range and Input X range (medical expenses and age, respectively). In the case of multiple linear regression, we can select more columns of independent variables (like if we wish to see the impact of BMI as well on medical expenses).
- Check the ‘Labels’ box to include headers.
- Choose the desired ‘output’ option.
- Select the ‘residuals’ checkbox and click ‘OK.
Now our regression analysis output will be created in a new worksheet, stating the Regression Statistics, ANOVA, residuals and coefficients.
- Regression Statistics tells how well the regression equation fits the data:
- Multiple R is the correlation coefficient that measures the strength of a linear relationship between two variables. It lies between -1 and 1, and its absolute value depicts the relationship strength with a large value indicating a stronger relationship, a low value indicating negative and zero value indicating no relationship.
- R Square is the Coefficient of Determination used as an indicator of goodness of fit. It lies between 0 and 1, with a value close to 1 indicating that the model is a good fit. In this case, 0.57=57% of y-values are explained by the x-values.
- Adjusted R Square is R Square adjusted for a number of predictors in the case of multiple linear regression.
- Standard Error depicts the precision of regression analysis.
- Observations depict the number of model observations.
- Anova tells the level of variability within the regression model.
This is generally not used for simple linear regression. However, the ‘Significance F values’ indicate how reliable our results are, with a value greater than 0.05 suggesting to choose another predictor.
- Coefficients are the most important part used to build regression equation.
So, our regression equation would be: y= 16.891 x – 355.32. This is the same as that done by method 1 (scatter chart with a trendline).
Now, if we wish to predict average medical expenses when age is 72:
So y= 16.891 * 72 -355.32 = 860.832
So this way, we can predict values of y for any other values of x.
- Residuals indicate the difference between actual and predicted values.
The last method for regression is not so commonly used and requires statistical functions like slope (), intercept (), correl (), etc., to carry out regression analysis.
Things to Remember About Linear Regression in Excel
- Regression analysis is generally used to see if there is a statistically significant relationship between two sets of variables.
- It is used to predict the value of the dependent variable based on the values of one or more independent variables.
- Whenever we wish to fit a linear regression model to a group of data, then the range of data should be carefully observed. If we use a regression equation to predict any value outside this range (extrapolation), it may lead to wrong results.
This is a guide to Linear Regression in Excel. Here we discuss how to do Linear Regression in Excel along with practical examples and a downloadable excel template. You can also go through our other suggested articles –
- Excel Regression Analysis
- Linear Programming in Excel
- Linear Interpolation in Excel
- Statistics in Excel