Logistic Regression: Scikit Learn vs Statsmodels

Question

I am trying to understand why the output from logistic regression of these two libraries gives different results.

I am using the dataset from UCLA idre tutorial, predicting admit based on gre, gpa and rank. rank is treated as categorical variable, so it is first converted to dummy variable with rank_1 dropped. An intercept column is also added.

py
from patsy import dmatrices
from sklearn.linear_model import LogisticRegression
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("https://stats.idre.ucla.edu/stat/data/binary.csv")
y, X = dmatrices('admit ~ gre + gpa + C(rank)', df, return_type = 'dataframe')
X.head()
>  Intercept  C(rank)[T.2]  C(rank)[T.3]  C(rank)[T.4]  gre   gpa
0          1             0             1             0  380  3.61
1          1             0             1             0  660  3.67
2          1             0             0             0  800  4.00
3          1             0             0             1  640  3.19
4          1             0             0             1  520  2.93

# Output from scikit-learn
model = LogisticRegression(fit_intercept = False)
mdl = model.fit(X, y)
model.coef_
> array([[-1.35417783, -0.71628751, -1.26038726, -1.49762706,  0.00169198,
     0.13992661]]) 
# corresponding to predictors [Intercept, rank_2, rank_3, rank_4, gre, gpa]

# Output from statsmodels
logit = sm.Logit(y, X)
logit.fit().params
> Optimization terminated successfully.
     Current function value: 0.573147
     Iterations 6
Intercept      -3.989979
C(rank)[T.2]   -0.675443
C(rank)[T.3]   -1.340204
C(rank)[T.4]   -1.551464
gre             0.002264
gpa             0.804038
dtype: float64

The output from statsmodels is the same as shown on the idre website, but I am not sure why scikit-learn produces a different set of coefficients. Does it minimize some different loss function? Is there any documentation that states the implementation?

The results with leaving the constant term out won't reproduce the Scikit results either, since I checked it. It must be the regularization. — , Mar 01 '20 at 17:23

score 59 · Accepted Answer · edited Dec 06 '20 at 12:25

59

Your clue to figuring this out should be that the parameter estimates from the scikit-learn estimation are uniformly smaller in magnitude than the statsmodels counterpart. This might lead you to believe that scikit-learn applies some kind of parameter regularization. You can confirm this by reading the scikit-learn documentation.

There is no way to switch off regularization in scikit-learn, but you can make it ineffective by setting the tuning parameter C to a large number. Here is how that works in your case:

# module imports
from patsy import dmatrices
import pandas as pd
from sklearn.linear_model import LogisticRegression
import statsmodels.discrete.discrete_model as sm
read in the data & create matrices
df = pd.read_csv("http://www.ats.ucla.edu/stat/data/binary.csv")
y, X = dmatrices('admit ~ gre + gpa + C(rank)', df, return_type = 'dataframe')
sklearn output
model = LogisticRegression(fit_intercept = False, C = 1e9)
mdl = model.fit(X, y)
model.coef_
sm
logit = sm.Logit(y, X)
logit.fit().params

UPDATE: As correctly pointed out in the comments below, now you can switch off the relularization in scikit-learn by setting penalty='none' (see the docs).

edited Dec 06 '20 at 12:25

desertnaut

288

answered Mar 25 '16 at 22:18

tchakravarty

8,722

Thank you very much for the explanation! With this regularized result, I was trying to duplicate the result using the glmnet package in R, but could not get the same coefficient. glmnet has a slightly different cost function comparing to sklearn, but even if I set alpha=0 in glmnet (meaning only use l2-penalty) and set 1/(N*lambda)=C, I still do not get the same result? – hurrikale Mar 26 '16 at 13:04
My intuition is that if I divide both terms of the cost function in glmnet by lambda and set the new constant in font of the log-likelihood, which is 1/(N*lambda) equal to that in sklearn, the two cost functions become identical, or am I missing something? – hurrikale Mar 26 '16 at 13:06
@hurrikale Ask a new question and link it here, and I will take a look. – tchakravarty Mar 26 '16 at 13:11
Thanks! I posted the question here. – hurrikale Mar 26 '16 at 13:55
3

I think the best way to switch off the regularization in scikit-learn is by setting penalty='none'. – Nzbuu Jul 17 '19 at 15:30
As Nzbuu points out, the newest version of sklearn finally does allow to completely switch off regularization. – Ben Reiniger Aug 29 '19 at 20:40
Now above-mentioned dataset could be found at https://stats.idre.ucla.edu/stat/data/binary.csv – banderlog013 Apr 28 '20 at 11:03
Does the addition of penalty='none' mean that the documentation here is outdated, since it only mentions the "setting C to a very high value" way of switching off regularization? – Quinn Culver May 04 '21 at 19:28

score 10 · Answer 2 · edited May 04 '22 at 04:59

What tripped me up:

disable sklearn regularization LogisticRegression(C=1e9)
add statsmodels intercept sm.Logit(y, sm.add_constant(X)) OR disable sklearn intercept LogisticRegression(C=1e9, fit_intercept=False)
sklearn returns probability for each class so model_sklearn.predict_proba(X)[:, 1] == model_statsmodel.predict(X)
use of predict function model_sklearn.predict(X) == (model_statsmodel.predict(X) > 0.5).astype(int)

I'm now seeing the same results in both libraries.

score 2 · Answer 3 · answered Sep 07 '17 at 18:52

2

Another difference is that you've set fit_intercept=False, which effectively is a different model. You can see that Statsmodel includes the intercept. Not having an intercept surely changes the expected weights on the features. Try the following and see how it compares:

model = LogisticRegression(C=1e9)

answered Sep 07 '17 at 18:52

brian dalessandro

37

3

It is the exact opposite actually - statsmodels does not include the intercept by default. See the SO threads Coefficients for Logistic Regression scikit-learn vs statsmodels and scikit-learn & statsmodels - which R-squared is correct?, as well as the answer below. – desertnaut May 26 '20 at 12:44
@desertnaut you're right statsmodels doesn't include the intercept by default. Here the design matrix X returned by dmatrices includes a constant column of 1's (see output of X.head()). Then even though both the scikit and statsmodels estimators are fit with no explicit instruction for an intercept (the former through intercept=False, the latter by default) both models effectively have an intercept, which can be seen by inspecting the outputs carefully. – rmwenz Jan 04 '21 at 02:29

Logistic Regression: Scikit Learn vs Statsmodels

3 Answers3

read in the data & create matrices

sklearn output

sm

Linked

Related