## Introduction to Regression

Let us first understand what regression is and why we use it. Regression is a type of predictive modeling technique in which we find the relationship between one or more independent variables and a dependent variable. It is mainly used for time series modeling, forecasting, and finding causal relationships between variables.

Why do we use regression? Consider an example: to estimate house prices based on data collected in past years, we can use a regression model to fit a curve. Based on this curve, we can predict the prices of houses. Regression analysis also lets us compare the effects of variables measured on different scales, and it helps identify the impact, or strength, of an independent variable on the dependent variable.

### What is Regression?

Regression is a method to determine the statistical relationship between a dependent variable and one or more independent variables. A change in the dependent variable is associated with changes in the independent variables. Regression can be broadly classified into two major types:

- Linear Regression
- Logistic Regression

### Types of Regression

Regression has seven types, but the most commonly used are Linear and Logistic Regression. These are the basic and simplest modeling algorithms. We will discuss both of them in detail here.

#### 1. Linear Regression

- The simplest case of linear regression is to find a relationship between a single independent input variable (a single feature) and a dependent output variable using a linear model (i.e., a line). This is called Bivariate Linear Regression.
- On the other hand, when a linear model represents the relationship between a dependent output variable and multiple independent input variables, it is called Multivariate Linear Regression.
- The dependent variable is continuous, and the independent variables may or may not be continuous. We find the relationship between them with the help of the best-fit line, which is also known as the regression line. The equation of a line is,

`y = m * x + b`

Where,

- **x:** Independent Variable
- **y:** Dependent Variable
- **m:** Slope of Line
- **b:** y-Intercept

To find the best-fit line, the most common method is the Least Squares Method. In this method, the regression line is calculated by minimizing the sum of squared errors between the regression line and the data points. The quality of the resulting fit is commonly evaluated using R-Squared analysis.
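The least squares calculation for a single feature can be sketched directly: the slope is the covariance of x and y divided by the variance of x, and the intercept follows from the means. The sample data here is hypothetical, purely for illustration.

```python
# A minimal sketch of the least squares method for one feature,
# using hypothetical sample data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Slope m minimizes the sum of squared residuals;
# intercept b makes the line pass through the point of means.
m = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
    sum((x - mean_x) ** 2 for x in xs)
b = mean_y - m * mean_x

print(m, b)  # 0.6 2.2 for this data
```

Any point can then be predicted as `y = m * x + b` using the fitted slope and intercept.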

It is particularly useful when the relationship between the input variables and the output is not very complex. Also, note that it is very sensitive to outliers.

**Syntax in Python:**

The Python library named sklearn (scikit-learn) contains an inbuilt implementation; we will use LinearRegression from sklearn.


Let us first install the sklearn package.

`pip install scikit-learn`

```python
from sklearn.linear_model import LinearRegression

linearReg = LinearRegression()
```

To train the model, we will use the fit() function.

`linearReg.fit(x_train, y_train)`
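Putting the pieces together, a minimal end-to-end sketch looks like this. The house-size and price values are hypothetical (chosen to lie exactly on a line so the fit is easy to verify), not real data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical training data: house size (sq. ft.) vs. price (in thousands).
x_train = np.array([[800], [1000], [1200], [1500]])
y_train = np.array([100, 125, 150, 187.5])

linearReg = LinearRegression()
linearReg.fit(x_train, y_train)

# Fitted slope (m) and intercept (b) of the regression line.
print(linearReg.coef_, linearReg.intercept_)  # slope ~0.125, intercept ~0

# Predict the price of an unseen house size.
print(linearReg.predict(np.array([[1100]])))
```

Note that scikit-learn expects the feature matrix to be 2-dimensional (samples × features), which is why each x value is wrapped in its own list.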

#### 2. Logistic Regression

- It is used when the output is categorical; it is more like a classification problem. The output can be Success/Failure, Yes/No, True/False, or 0/1. There is no need for a linear relationship between the dependent output variable and the independent input variables.
- If the output has only two possibilities, then it is called Binary Logistic Regression. If the dependent output has more than two output possibilities and there is no ordering in them, then it is called Multinomial Logistic Regression. If there is order associated with the output and there are more than two output possibilities then it is called Ordinal Logistic Regression.
- Let us take an example: suppose you want to create a model that identifies whether a breast cancer tumor is malignant (1) or benign (0). As another example, you may want to classify whether an input email is spam (1) or not spam (0).

It can be better explained by the Sigmoid function.

`hΘ (x) = sigmoid (Z)`

**Sigmoid Function:**

`sig(t) = 1 / (1 + e^(−t))`

The sigmoid function produces an S-shaped curve. As the input approaches positive infinity, the predicted value approaches 1; similarly, as it approaches negative infinity, the predicted value approaches 0.
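The behavior described above can be sketched in a few lines of Python, evaluating the sigmoid at zero and at large positive and negative inputs:

```python
import math

def sigmoid(t):
    # S-shaped curve mapping any real number into the interval (0, 1).
    return 1 / (1 + math.exp(-t))

print(sigmoid(0))    # 0.5, the midpoint of the curve
print(sigmoid(10))   # very close to 1
print(sigmoid(-10))  # very close to 0
```

In logistic regression, this output is interpreted as the probability of the positive class, and a threshold (commonly 0.5) converts it into a 0/1 prediction.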

**Syntax in Python:**

For the implementation of logistic regression in Python, there is an inbuilt function available in the scikit-learn library. First install scikit-learn using pip install.

```python
from sklearn.linear_model import LogisticRegression

logisticRegr = LogisticRegression()
```

To train the model, we will use the fit() function.

`logisticRegr.fit(x_train, y_train)`
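A minimal end-to-end sketch, analogous to the linear case, might look like the following. The single-feature training data is hypothetical and deliberately well separated so the classes are easy to learn.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical binary training data: one feature, labels 0 or 1.
x_train = np.array([[0.5], [1.0], [1.5], [3.0], [3.5], [4.0]])
y_train = np.array([0, 0, 0, 1, 1, 1])

logisticRegr = LogisticRegression()
logisticRegr.fit(x_train, y_train)

# Predict class labels for new inputs.
print(logisticRegr.predict(np.array([[1.0], [3.8]])))

# Predict class probabilities (the sigmoid output for each class).
print(logisticRegr.predict_proba(np.array([[2.25]])))
```

The `predict_proba` call exposes the underlying sigmoid probabilities, while `predict` applies the 0.5 threshold to return hard 0/1 labels.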

### Conclusion

It is necessary to choose the right regression model based on the dependent and independent variables of your data and its dimensionality. Before selecting any model, it is necessary to explore the data. To compare the goodness of fit of different models, various evaluation metrics can be used, such as R-Squared, Root Mean Square Error, Confusion Matrix, F1 score, etc.
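The metrics mentioned above are all available in scikit-learn's `sklearn.metrics` module. A brief sketch, using hypothetical predictions for both a regression and a classification task:

```python
import numpy as np
from sklearn.metrics import (r2_score, mean_squared_error,
                             confusion_matrix, f1_score)

# Hypothetical regression targets and predictions.
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.8, 5.1, 7.2, 8.9])

print(r2_score(y_true, y_pred))                     # closer to 1 is better
print(np.sqrt(mean_squared_error(y_true, y_pred)))  # RMSE, lower is better

# Hypothetical classification labels and predictions.
c_true = [0, 1, 1, 0, 1]
c_pred = [0, 1, 0, 0, 1]
print(confusion_matrix(c_true, c_pred))  # rows: true class, cols: predicted
print(f1_score(c_true, c_pred))          # harmonic mean of precision/recall
```

R-Squared and RMSE suit continuous (linear regression) outputs, while the confusion matrix and F1 score suit categorical (logistic regression) outputs.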

### Recommended Articles

This is a guide to What is Regression? Here we discuss what regression is, along with its two main types, in detail. You can also go through our other related articles to learn more –