Updated April 1, 2023
Introduction to Statsmodels Python
The statsmodels package for Python lets us explore data, estimate a range of statistical models, and run statistical tests on those models. It belongs to the scientific Python stack and is used in data analysis, statistics, and data science. It can be considered complementary to SciPy's stats module.
This article gives an overview of statsmodels, explains why you might use it, walks through installation and basic usage, and then works through linear regression and other examples.
Overview of Statsmodels Python
The library is built on top of NumPy and SciPy, integrates with pandas for data handling, and uses patsy to provide an R-like formula interface. Its graphics functions are based on matplotlib. Several other Python libraries use statsmodels as their statistical backend.
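To make the R-like formula interface concrete, here is a minimal sketch. The data is synthetic and the column names x and y are hypothetical, chosen only for illustration. The formula string is parsed into a design matrix, and the pandas DataFrame supplies the columns by name.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data with hypothetical column names, for illustration only
rng = np.random.default_rng(0)
df = pd.DataFrame({"x": rng.uniform(1.0, 10.0, size=50)})
df["y"] = 2.0 + 3.0 * np.log(df["x"]) + rng.normal(scale=0.1, size=50)

# R-style formula: the response is on the left of "~", the regressors on the right.
# An intercept is included automatically, and NumPy functions can be used inline.
formula_fit = smf.ols("y ~ np.log(x)", data=df).fit()

# The estimated coefficients are labeled by the terms of the formula
print(formula_fit.params)
```

Because the data was generated with an intercept of 2 and a slope of 3, the printed estimates should land close to those values.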
The models module of scipy.stats was originally written by Jonathan Taylor. It was part of SciPy for some time but was later removed. During the Google Summer of Code 2009, the code was corrected, tested, and improved, and it was then released as a new, separate package named statsmodels.
The statsmodels team continues active development, adding new statistical models, plotting tools, and statistical tests.
Why StatsModels?
Statsmodels was designed with serious statistical work in mind, so it offers a depth of statistical modeling and testing that more general-purpose libraries do not aim for. Its formula interface and summary output closely resemble R, which makes it a good fit for statistical analysis and an easy entry point to Python for developers who already program in R.
Install StatsModels
Before installing statsmodels, make sure the following prerequisites are available. The versions listed are the minimums for an older release; recent statsmodels releases require newer versions, so check the installation notes for the release you install:
- NumPy 1.6 or later
- Pandas 0.12 or later
- Cython 0.24 or later
- Patsy 0.2.1 or later
- SciPy 0.11 or later
- Python 2.6 or later
Once the prerequisites are in place, you can install statsmodels from the terminal. Using pip, enter the following command:
pip install statsmodels
pip downloads statsmodels together with any missing dependencies and reports when the installation has completed.
Alternatively, you can install statsmodels with conda:
conda install statsmodels
conda resolves the dependencies, asks for confirmation, and then installs the package.
Once either installation completes, the statsmodels package is ready for use.
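A quick way to confirm the installation is to import the package and print its version. This is a small check, not an official verification step:

```python
import statsmodels

# If the import succeeds, the package is installed; the version string
# shows which release pip or conda selected
installed_version = statsmodels.__version__
print("statsmodels version:", installed_version)
```

If the import raises ModuleNotFoundError, the package was most likely installed into a different Python environment than the one running the script.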
Using StatsModels
Once statsmodels is installed, you can use it in a Python program by importing it at the top of the file. The conventional entry point is the statsmodels.api module:
import statsmodels.api as sm
After that, the models, datasets, and tools of statsmodels are available in that program through the sm name.
Example of Statsmodels Python
Let us consider a straightforward example to show how the package is used. The steps are to import the necessary packages, load the data, create the regression model, and fit it. In this example, we take the natural log of one of the regressors. The final step is to inspect the results. We will write the following Python code:
# Importing the required packages
import numpy as np
import statsmodels.api as statsEducba
import statsmodels.formula.api as statsEducbaModel
# Loading the Guerry dataset from the R HistData package (downloaded over the network)
sampleEducbaData = statsEducba.datasets.get_rdataset("Guerry", "HistData").data
# Fitting an OLS regression, using the natural log of one of the regressors
sampleOutputRes = statsEducbaModel.ols('Lottery ~ Literacy + np.log(Pop1831)', data=sampleEducbaData).fit()
# Showing the summary of the fitted model
print(sampleOutputRes.summary())
Running the code prints the OLS regression summary, which lists the estimated coefficients with their standard errors along with fit statistics such as R-squared.
Linear Regression StatsModels
Now that you have seen the basics of statsmodels, we can fit a linear regression using the array-based interface rather than a formula. The steps are the same as in the example above, with one addition: a constant column must be added to the explanatory variables so that the OLS model includes an intercept. Let us go straight to the code and then look at what it does:
# Importing the necessary packages
import numpy as educbaSampleNumpy
import statsmodels.api as educbaSampleStats
# Loading the Spector dataset bundled with statsmodels
educba_data = educbaSampleStats.datasets.spector.load()
# Adding a constant column to the explanatory variables (as the last column)
educba_data.exog = educbaSampleStats.add_constant(educba_data.exog, prepend=False)
# Creating the OLS model and fitting it
educbaModel = educbaSampleStats.OLS(educba_data.endog, educba_data.exog)
res = educbaModel.fit()
# Printing the summary of the statistical results to the console
print(res.summary())
The summary printed to the console reports the coefficient estimates, standard errors, and overall fit statistics for the model, so the details of the result are easy to read.
Let us take one more example, this time using generalized linear models (GLMs). Statsmodels supports GLM estimation for the one-parameter exponential families. The example below fits a Gamma family model:
# Importing the statsmodels API
import statsmodels.api as educba_stat
# Loading the Scotland dataset bundled with statsmodels
educba_sample_data = educba_stat.datasets.scotland.load()
# Adding a constant column to the explanatory variables
educba_sample_data.exog = educba_stat.add_constant(educba_sample_data.exog)
# Creating a GLM with the Gamma family and its default link function
educba_GAMA = educba_stat.GLM(educba_sample_data.endog, educba_sample_data.exog, family=educba_stat.families.Gamma())
# Fitting the model
educba_gamma_results = educba_GAMA.fit()
# Printing the summary
print(educba_gamma_results.summary())
The printed summary reports the coefficient estimates and fit statistics of the Gamma GLM.
Conclusion
With statsmodels, we can carry out statistical analysis and fit models in just a few lines of code, and the summary output presents the results in a clear, readable form.