Excel Regression Analysis (Table of Contents)
- Regression Analysis in Excel
- Explanation of Regression Mathematically
- How to Perform Linear Regression in Excel?
Regression Analysis in Excel
Linear regression is a statistical technique that examines the linear relationship between a dependent variable and one or more independent variables.
- Dependent Variable (aka response/outcome variable): Is the variable of your interest and which you wanted to predict based on the information available of Independent variable(s).
- Independent Variable (aka explanatory/predictor variable): Is/are the variable(s) on which response variable is depend. Which means these are the variables using which response variable can be predicted.
Linear relationship means the change in an independent variable(s) causes a change in the dependent variable.
There are basically two types of linear relationships as well.
- Positive Linear Relationship: When independent variable increases, the dependent variable increases too.
- Negative Linear Relationship: When independent variable increases, the dependent variable decreases.
These were some of the pre-requisites before you actually proceed towards regression analysis in excel.
There are two basic ways to perform linear regression in excel using:
- Regression tool through Analysis ToolPak
- Scatter chart with trendline
There is actually one more method which is using manual formula’s to calculate linear regression. But why should you go for it when excel does calculations for you?
Therefore, we are going to talk about the two methods discussed above only.
Suppose, you have a data of height and weight of 10 individuals. If you plot this information through a chart, let’s see what it gives.
As the above screenshot shows, the linear relationship can be found in Height and Weight through the graph. Don’t get much involved in graph now, we are anyhow going to dig it deep in the second portion of this article.
Explanation of Regression Mathematically
We have a mathematical expression for linear regression as below:
Y = aX + b + ε
- Y is a dependent variable or response variable.
- X is an independent variable or predictor.
- a is the slope of the regression line. Which represents that when X changes, there is a change in Y by “a” units.
- b is intercepting. It is the value Y takes when the value of X is zero.
- ε is the random error term. Occurs because the predicted value of Y will never be exactly the same to the actual value for given X. This error term, we don’t need to worry about. As there are some software’s which do the calculation of this error term in the backend for you. Excel is one of that software.
In that case, the equation becomes,
Y = aX + b
Which can be represented as:
Weight = a*Height + b
We’ll try to find out the values of these a and b using methods we have discussed above.
How to Perform Linear Regression in Excel?
The further article explains the basics of regression analysis in excel and shows a few different ways to do linear regression in Excel.
#1 – Regression Tool Using Analysis ToolPak in Excel
For our example, we’ll try to fit regression for Weight values (which is dependent variable) with the help of Height values (which is an independent variable).
- In the excel spreadsheet, click on Data Analysis (present under Analysis Group) under Data.
- Search out for Regression. Select it and press ok.
- Use the following inputs under Regression pane which opens up.
- Input Y Range: Select the cells which contain your dependent variable (in this example B1:B11)
- Input X Range: Select the cells which contain your independent variable (in this example A1:A11).
- Check the box named Labels if your data have column names (in this example we have column names).
- Confidence Level is set to 95% by default, which can be changed as per users requirements.
- Under Output options, you can customize where you want to see the output of regression analysis in Excel. In this case, we want to see the output on the same sheet. Therefore, given range accordingly.
- Under Residuals option, you have optional inputs like Residuals, Residual Plots, Standardized Residuals, Line Fit Plots which you can select as per your need. In this case, check Residuals checkbox so that we can see the dispersion between predicted and actual values.
- Under Normal Probability option, you can select Normal Probability Plots which can help you check the normality of predictors. Click on OK.
- Excel will compute Regression analysis for you in a fraction of seconds.
Till here, it was easy and not that logical. However, interpreting this output and make valuable insights from it is a tricky task.
One important part of this entire output is R Square/ Adjusted R Square under SUMMARY OUTPUT table. Which provides information, how good our model is fit. In this case, R Square value is 0.9547. Which interprets that the model has a 95.47% accuracy (good fit). Or in another language, information about Y variable is explained 95.47% by X variable.
The other important part of the entire output is a table of coefficients. It gives values of coefficients which can be used to build the model for future predictions.
Now our, regression equation for prediction becomes:
Weight = 0.6746*Height – 38.45508 (Slope value for Height is 0.6746… and Intercept is -38.45508…)
Did you get what you have defined? You have defined a function in which you now just have to put the value of Height and you’ll get the Weight value.
#2 – Regression Analysis Using Scatterplot with Trendline in Excel
Now, we’ll see how in excel we can fit a regression equation on a scatterplot itself.
- Select your entire two columned data (including headers).
- Click on Insert and select Scatter Plot under graphs section as shown in the image below.
- See the output graph.
- Now, we need to have a least squared regression line on this graph. To add this line, right click on any of the data points on the graph and select Add Trendline option.
- It will enable you to have a trendline of least square of regression like below.
- Under the Format Trendline option, check the box for Display Equation on Chart.
- It enables you to see the equation of least squared regression line on the graph.
This is the equation using which we can predict the weight values for any given set of Height values.
Things to Remember About Regression Analysis in Excel
- You can change the layout of trendline under Format Trendline option in scatter plot.
- It is always recommended to have a look at residual plots while you are doing regression analysis using Data Analysis ToolPak in Excel. It gives you a better understanding of the spread of the actual Y values and estimated X values.
- Simple Linear Regression in excel does not need ANOVA and Adjusted R Square to check. These features can be taken into consideration for Multiple Linear Regression. Which is beyond the scope of this article.
This has been a guide to Regression Analysis in Excel. Here we discuss how to do Regression Analysis in Excel along with excel examples and downloadable excel template. You can also go through our other suggested articles –