## Introduction to Logistic Regression in Python

As today’s world is full of data exchange everywhere such as messaging, calling, browsing the Internet, etc all involve data exchange. So Data Science has 70% of its problems for classifying such data. There are many classification techniques or algorithms in machine learning applications. Logistic regression is one of the regression methods to solve such classification problems such as spam detection, predictions in diabetes or any other medical concepts, etc.

Logistic regression is the most widely used algorithm of machine learning for solving classification problems. In Python Logistic regression is one of the predictive analysis techniques. This regression estimates the relationship between one dependent variable and the independent variable and the output variable is dichotomous ie it has only two possible classes.

### Logistic Regression Working in Python

Logistic regression uses log function to predict the probability of occurrences of events. Logistic regression has the output variable also referred to as the dependent variable which is categorical and it is a special case of linear regression. So the linear regression equation can be given as

**y = β**

_{0 + }β_{1}X_{1}+β_{2}X_{2}+……+β_{n}X_{n}**Where**

**y:**is the dependent variable or target variable**X**independent variables._{1}, X_{2}……X_{n:}**β**the y – intercept._{0:}**β**the slope._{1:}

The sigmoid function is given by

- p = 1 / 1+ e
^{-y}

Apply this sigmoid function to the linear regression equation

- p = 1 / 1+ e
^{–}^{β0 + β1X1+β2X2+……+βnXn}

The sigmoid function also is known as the logistic function which gives an “S” shaped curve that can map the real values between 0 and 1. In this curve y predicted is 1 if it is positive infinity and 0 if it is negative infinity. If the sigmoid function value is above 0.5 its outcome will be 1 (Yes) and if its value is blown 0.5 then its outcome is 0 (No).

In this logistic regression, the dependent variable follows Bernoulli distribution and estimation is done through maximum likelihood. The maximum livelihood is maximization method which determines the parameters that are likely to produce the observed data that can be used to predict the data needed in the normal distribution.

### Logistic Regression Techniques

Below are the techniques:

#### 1. Binary Logistic Regression

Binary Logistic Regression has only two possible outputs as the name itself suggests. The target values can only be either yes or no, 1 or 0, true or false, etc. Let us consider an example say a student’s result according to the number of hours he/she studies. The result can be in only two values either Pass or Fail. This can be implemented in Python as follows:

4.8 (6,961 ratings)

View Course

**Example:**

**1.** Firstly gather data from which we want to determine how many students have passed. So for this, we have an independent variable as the number of hours of study and the dependent variable is in two possible values pass or fail.

The below table is the data set to be considered.

Number of hours of Study (hours) |
Result of Student (Pass) |

0.5 | 0 |

0.75 | 0 |

1.00 | 0 |

1.25 | 0 |

1.5 | 1 |

1.75 | 1 |

2.00 | 1 |

**2.** Secondly import the libraries in Python program for data analysis such as pandas to create DataFrame, sklearn to build a logistic regression model and seaborn to display the result in the form of the matrix known as a confusion matrix.

- import pandas as pd
- from sklearn import metrics
- import seaborn as sb

**3.** Thirdly build DataFrame this can be obtained by the table of data set given

- Students ={ ‘hours’ : [0.5, 0.75, 1.00, 1.25, 1.5, 1.75, 2.0], ‘result’ : [0, 0, 0, 0, 1, 1, 1] }
- df = pd.DataFrame(student, columns = [ ‘Hours’ , ‘Pass’])
- print (df)

Now creating logistic regression as

- X= df[‘Hour’]
- Y = df[‘Pass’]

Apply Logistic regression on data set using LogisticRegreassion() Function and obtain a confusion matrix to print the accuracy and predict how many students have been passed.

#### 2. Multinomial Logistic Regression

It has 3 or more output values or target values that are not in order such as classifying as cat, dog, elephant, predicting which food is healthier between veg, non-veg, vegan, etc. Multinomial logistic regression has more than 3 outputs which are of types that have no quantitative significance. If logistic regression is used for classifying multiple tasks then it is known as multinomial logistic regression.

**Example:**

**1.** Firstly import Python machine learning libraries like

- Pandas for data analysis
- Numpy for numerical calculation
- Sklearn which consists machine learning algorithms like linear_model, metrics, train_test_split
- Plotly for displaying plots according to data analysis.

**2.** Secondly either use data set or download it and then create headers as columns from the given data set. To print the loaded dataframe, columns, and header name.

**3.** Thirdly we need to understand the given data set and determine the relationship between single features with all target types. Then apply LogisticRegression() function and obtain the accuracy of the values which is calculated using the metrics method.

#### 3. Ordinal Logistic Regression

It also has 3 or more target values but the name itself suggests the output values are in order. In this, the categories of target values are in order not like multinomial unordered. We can divide the target values of the test as very poor, poor, good, very good which can also be given as 0, 1, 2, 3.

**Example:**

Predicting the movie rating from 0 to 1. Firstly we need to import packages to save data in DataFrame by importing Pandas, sklearn, mord package to import LogisticRegression packages.

- Import Pandas as pd
- df = pd.read_csv(‘https://archive.ics.uci.edu/ml/movie_rating.csv’, sep=’;’)
- Then predict the magnitude accuracy and category prediction and find its accuracy.

### Conclusion

Logistic Regression is an algorithm to analyze data and find the unique data among the given data to predict the data which is needed to solve the data wrangling problem. There are different techniques in this topic given as binary, multinomial and ordinal Logistic Regression. Logistic Regression with a binary that gives two target values, multinomial Regression which gives 3 or more target values but not in order where ordinal have ordered target values.

### Recommended Articles

This is a guide to Logistic Regression in Python. Here we discuss the introduction, how its work and techniques for Logistic Regression. You can also go through our other related articles to learn more –