Updated March 31, 2023
What is Logistic Regression in Python
Today’s world is full of data exchange everywhere, such as messaging, calling, browsing the Internet, etc., all involve data exchange. So Data Science has 70% of its problems for classifying such data. There are many classification techniques or algorithms in machine learning applications. Logistic regression is one of the regression methods to solve classification problems such as spam detection, predictions in diabetes or any other medical concepts, etc.
Logistic regression is the most widely used machine learning algorithm for solving classification problems. In Python, Logistic regression is one of the predictive analysis techniques. This regression estimates the relationship between one dependent variable and the independent variable, and the output variable is dichotomous, i.e. it has only two possible classes.
Logistic Regression Working in Python
Logistic regression uses the log function to predict the probability of occurrences of events. Logistic regression has the output variable, also referred to as the dependent variable, which is categorical, and it is a special case of linear regression. So the linear regression equation can be given as
Where
- y: is the dependent variable or target variable
- X1, X2……Xn: independent variables.
- β0: the y-intercept.
- β1: the slope.
The sigmoid function is given by
- p = 1 / 1+ e-y
Apply this sigmoid function to the linear regression equation.
- p = 1 / 1+ e– β0 + β1X1+β2X2+……+βnXn
The sigmoid function also is known as the logistic function, which gives an “S” shaped curve that can map the real values between 0 and 1. In this curve, y predicted is 1 if it is positive infinity and 0 if it is negative infinity. If the sigmoid function value is above 0.5, its outcome will be 1 (Yes), and if its value is blown 0.5, then its outcome is 0 (No).
In this logistic regression, the dependent variable follows the Bernoulli distribution, and estimation is done through maximum likelihood. The ultimate livelihood is the maximization method which determines the parameters that are likely to produce the observed data that can be used to predict the data needed in the normal distribution.
Logistic Regression Techniques
Below are the techniques:
1. Binary Logistic Regression
Binary Logistic Regression has only two possible outputs, as the name suggests. The target values can be yes or no, 1 or 0, true or false, etc. Let us consider an example: a student’s result according to the number of hours he/she studies. The result can be in only two values, either Pass or Fail. This can be implemented in Python as follows:
Example:
1. Firstly, gather data from which we want to determine how many students have passed. So for this, we have an independent variable as the number of study hours, and the dependent variable is in two possible values, pass or fail.
The below table is the data set to be considered.
Number of hours of Study (hours) | Result of Student (Pass) |
0.5 | 0 |
0.75 | 0 |
1.00 | 0 |
1.25 | 0 |
1.5 | 1 |
1.75 | 1 |
2.00 | 1 |
2. Secondly, import the libraries in the Python program for data analysis, such as pandas to create DataFrame, sklearn to build a logistic regression model, and seaborn to display the result in the form of the matrix known as a confusion matrix.
- import pandas as pd
- from sklearn import metrics
- import seaborn as sb
3. Thirdly, build DataFrame this can be obtained by the table of data set given
- Students ={ ‘hours’ : [0.5, 0.75, 1.00, 1.25, 1.5, 1.75, 2.0], ‘result’ : [0, 0, 0, 0, 1, 1, 1] }
- df = pd.DataFrame(student, columns = [ ‘Hours’ , ‘Pass’])
- print (df)
Now creating logistic regression as
- X= df[‘Hour’]
- Y = df[‘Pass’]
Apply Logistic regression on the data set using LogisticRegreassion() Function and obtain a confusion matrix to print the accuracy and predict how many students have passed.
2. Multinomial Logistic Regression
It has 3 or more output values or target values that are not in order, such as classifying as a cat, dog, or elephant, predicting which food is healthier between veg, non-veg, vegan, etc. Multinomial logistic regression has more than 3 outputs of types with no quantitative significance. If logistic regression is used for classifying multiple tasks, it is known as multinomial logistic regression.
Example:
1. Firstly, import Python machine learning libraries like
- Pandas for data analysis
- Numpy for numerical calculation
- Sklearn, which consists of machine learning algorithms like linear_model, metrics, train_test_split
- Plotly for displaying plots according to data analysis.
2. Secondly, either use the data set or download it and create headers as columns from the given data set. To print the loaded data frame, columns, and header name.
3. Thirdly, we need to understand the given data set and determine the relationship between single features with all target types. Then apply LogisticRegression() function and obtain the accuracy of the values, which is calculated using the metrics method.
3. Ordinal Logistic Regression
It also has 3 or more target values, but the name suggests the output values are in order. The categories of target values are in order, not like multinomial unordered. We can divide the test’s target values as very poor, poor, good, and very good, which can also be given as 0, 1, 2, 3.
Example:
We are predicting the movie rating from 0 to 1. Firstly we need to import packages to save data in DataFrame by importing Pandas, Sklearn, mord is a Python package to import LogisticRegression packages.
- Import Pandas as pd
- df = pd.read_csv(‘https://archive.ics.uci.edu/ml/movie_rating.csv’, sep=’;’)
- Then predict the magnitude accuracy and category prediction and find its accuracy.
Conclusion
Logistic Regression is an algorithm to analyze data and find the unique data among the given data to predict the data needed to solve the data wrangling problem. There are different techniques in this topic, given as binary, multinomial, and ordinal Logistic Regression. Logistic Regression with a binary that gives two target values, and multinomial Regression gives 3 or more target values but not in order, where ordinal have ordered target values.
Recommended Articles
This is a guide to Logistic Regression in Python. Here we discuss the introduction, its works,s and techniques for Logistic Regression. You can also go through our other related articles to learn more –