Logistic Regression: Scikit Learn vs glmnet

Question

I am trying to reproduce the results of sklearn's logistic regression using the glmnet package in R.

From the sklearn logistic regression documentation, it is trying to minimize the cost function under l2 penalty $$\min_{w,c} \frac12 w^Tw + C\sum_{i=1}^N \log(\exp(-y_i(X_i^Tw+c)) + 1)$$

According to the glmnet vignettes, its implementation minimizes a slightly different cost function $$\min_{\beta, \beta_0} -\left[\frac1N \sum_{i=1}^N y_i(\beta_0+x_i^T\beta)-\log(1+e^{(\beta_0+x_i^T\beta)})\right] + \lambda[(1-\alpha)||\beta||_2^2/2+\alpha||\beta||_1]$$

Rearranging the second equation and setting $\alpha=0$ gives $$\lambda\min_{\beta, \beta_0} \frac1{N\lambda} \sum_{i=1}^N \left[-y_i(\beta_0+x_i^T\beta)+\log(1+e^{(\beta_0+x_i^T\beta)})\right] + ||\beta||_2^2/2$$

which differs from the sklearn cost function only by a factor of $\lambda$ if we set $\frac1{N\lambda}=C$, so I expected the same coefficient estimates from the two packages. But they are different. I am using the dataset from the UCLA idre tutorial, predicting admit based on gre, gpa and rank. There are 400 observations, so $C=1$ corresponds to $\lambda = 1/(400 \cdot 1) = 0.0025$.
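As a sanity check on the algebra above, here is a minimal sketch in plain Python. It assumes sklearn's formula uses labels recoded to {-1, +1} while glmnet's uses {0, 1}; the function names are made up for illustration and are not part of either library. It confirms that the two per-observation losses agree and computes the $\lambda$ that corresponds to a given $C$.

```python
import math

def sklearn_loss(y01, eta):
    """log(1 + exp(-y * eta)), with the 0/1 label recoded to -1/+1."""
    y_pm = 2 * y01 - 1
    return math.log1p(math.exp(-y_pm * eta))

def glmnet_loss(y01, eta):
    """-(y * eta - log(1 + exp(eta))), with the label kept as 0/1."""
    return -(y01 * eta - math.log1p(math.exp(eta)))

def lambda_from_C(C, n_samples):
    """glmnet lambda matching sklearn's C under the rearrangement above."""
    return 1.0 / (n_samples * C)

# The two loss forms agree for both classes at a few linear-predictor values.
for y in (0, 1):
    for eta in (-2.0, 0.0, 1.5):
        assert math.isclose(sklearn_loss(y, eta), glmnet_loss(y, eta))

lam = lambda_from_C(C=1.0, n_samples=400)
print(lam)  # 0.0025
```

So the data-fit terms match; any remaining discrepancy has to come from how the penalty is applied (which coefficients, and on which scale).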

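# python sklearn
import pandas as pd
from patsy import dmatrices
from sklearn.linear_model import LogisticRegression
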
df = pd.read_csv("https://stats.idre.ucla.edu/stat/data/binary.csv")
y, X = dmatrices('admit ~ gre + gpa + C(rank)', df, return_type = 'dataframe')
X.head()
>  Intercept  C(rank)[T.2]  C(rank)[T.3]  C(rank)[T.4]  gre   gpa
0          1             0             1             0  380  3.61
1          1             0             1             0  660  3.67
2          1             0             0             0  800  4.00
3          1             0             0             1  640  3.19
4          1             0             0             1  520  2.93

model = LogisticRegression(fit_intercept = False, C = 1)
mdl = model.fit(X, y)
model.coef_
> array([[-1.35417783, -0.71628751, -1.26038726, -1.49762706,  0.00169198,
     0.13992661]]) 
# corresponding to predictors [Intercept, rank_2, rank_3, rank_4, gre, gpa]


> # R glmnet
> library(data.table)  # provides fread
> library(glmnet)
> df = fread("https://stats.idre.ucla.edu/stat/data/binary.csv")
> X = as.matrix(model.matrix(admit~gre+gpa+as.factor(rank), data=df))[,2:6]
> y = df[, admit]
> mylogit <- glmnet(X, y, family = "binomial", alpha = 0)
> coef(mylogit, s = 0.0025)
6 x 1 sparse Matrix of class "dgCMatrix"
                    1
(Intercept)      -3.984226893
gre               0.002216795
gpa               0.772048342
as.factor(rank)2 -0.530731081
as.factor(rank)3 -1.164306231
as.factor(rank)4 -1.354160642

The R output is fairly close to logistic regression without regularization, as can be seen here. Am I missing something, or doing something obviously wrong?

Update: I also tried the LiblineaR package in R for the same fit and got yet another set of estimates (liblinear is also the solver used by sklearn):

> fit = LiblineaR(X, y, type = 0, cost = 1)
> print(fit)
$TypeDetail
[1] "L2-regularized logistic regression primal (L2R_LR)"
$Type
[1] 0
$W
            gre          gpa as.factor(rank)2 as.factor(rank)3 as.factor(rank)4         Bias
[1,] 0.00113215 7.321421e-06     5.354841e-07     1.353818e-06      9.59564e-07 2.395513e-06

Update 2: turning off standardization in glmnet gives:

> mylogit <- glmnet(X, y, family = "binomial", alpha = 0, standardize = F)
> coef(mylogit, s = 0.0025)
6 x 1 sparse Matrix of class "dgCMatrix"
                     1
(Intercept)      -2.8180677693
gre               0.0034434192
gpa               0.0001882333
as.factor(rank)2  0.0001268816
as.factor(rank)3 -0.0002259491
as.factor(rank)4 -0.0002028832

TomDLT · Answer 1 · score 10 · 2016-03-30T15:20:53.527

Dougal's answer is correct: you regularize the intercept in sklearn but not in R. Make sure you use solver='newton-cg', since the default solver ('liblinear') always regularizes the intercept.

cf https://github.com/scikit-learn/scikit-learn/issues/6595
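A minimal sketch of the difference, using synthetic data as a stand-in (an assumption; this is not the UCLA admissions set, and the true coefficients below are arbitrary). Fit (a) passes the intercept as an explicit column of ones with fit_intercept=False, so it is penalized like any other weight; fit (b) lets the estimator handle the intercept with solver='newton-cg', which leaves it out of the L2 penalty.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data (hypothetical coefficients, fixed seed).
rng = np.random.default_rng(0)
n = 400
X = rng.normal(size=(n, 3))
true_logit = -2.0 + X @ np.array([1.0, -0.5, 0.25])
y = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-true_logit))).astype(int)

C = 1.0

# (a) Intercept as an explicit column of ones: penalized with the other weights.
X_with_ones = np.column_stack([np.ones(n), X])
penalized = LogisticRegression(C=C, fit_intercept=False, solver="newton-cg")
penalized.fit(X_with_ones, y)

# (b) Intercept fitted by the estimator with newton-cg: not penalized.
unpenalized = LogisticRegression(C=C, fit_intercept=True, solver="newton-cg")
unpenalized.fit(X, y)

b_pen = penalized.coef_[0, 0]
b_unpen = unpenalized.intercept_[0]
print("penalized intercept:  ", b_pen)
print("unpenalized intercept:", b_unpen)
```

The penalized intercept should come out pulled toward zero relative to the unpenalized one. Setup (b) is the one to compare against glmnet, which does not penalize its intercept.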

edited Mar 30 '16 at 15:20

answered Mar 29 '16 at 13:11


Setting solver='newton-cg' made the results from sklearn and statsmodels consistent. Thanks a lot. – irene Jan 13 '19 at 05:19

score 9 · Answer 2 · answered Mar 27 '16 at 06:33


sklearn's logistic regression doesn't standardize the inputs by default, which changes the meaning of the $L_2$ regularization term; probably glmnet does.

Since your gre term is on a much larger scale than the other variables, this changes the relative cost of putting weight on each variable.

Note also that by including an explicit intercept term in the features, you're regularizing the intercept of the model. This is generally not done, since it means that your model is no longer covariant to shifting all the labels by a constant.
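To make the scaling point concrete, here is a small NumPy sketch with made-up data and made-up weights (all names and numbers are hypothetical). It assumes standardization means centering each column and dividing by its population standard deviation, and shows how weights fitted on standardized inputs map back to the original scale. The linear predictor is identical in both parameterizations, but an L2 penalty applied to the standardized weights is not the same as one applied to the original-scale weights.

```python
import numpy as np

def to_original_scale(w_std, b_std, mean, scale):
    """Map weights fitted on z = (x - mean) / scale back to the x scale."""
    w = w_std / scale
    b = b_std - np.dot(mean, w)
    return w, b

# Two columns on very different scales, loosely imitating gre and gpa.
rng = np.random.default_rng(0)
X = np.column_stack([rng.normal(600, 100, 50), rng.normal(3.4, 0.4, 50)])
mean = X.mean(axis=0)
scale = X.std(axis=0)  # population standard deviation (ddof=0)
Z = (X - mean) / scale

# Hypothetical weights, as if fitted on the standardized columns.
w_std, b_std = np.array([0.3, 0.4]), -1.0
w, b = to_original_scale(w_std, b_std, mean, scale)

# Same predictions either way, but very different penalty sizes.
same_predictor = np.allclose(Z @ w_std + b_std, X @ w + b)
penalty_std = 0.5 * np.sum(w_std ** 2)
penalty_orig = 0.5 * np.sum(w ** 2)
print(same_predictor, penalty_std, penalty_orig)
```

On the original scale the gre weight is tiny, so it is barely penalized, while on the standardized scale both weights are penalized comparably. To compare the two packages, both need to penalize on the same scale.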


Danica


glmnet allows turning off standardization of the inputs, but then the estimated coefficients differ even more; please see above. Also, I explicitly included an intercept term in sklearn because glmnet includes one automatically, so this was to make sure that the inputs to both models are the same. – hurrikale Mar 27 '16 at 15:57

@hurrikale I think glmnet is probably not regularizing the intercept, but sklearn is. Drop the intercept column from X and pass fit_intercept=True (the default) to LogisticRegression. There's probably something else going on as well, though. – Danica Mar 27 '16 at 16:08
I tried what you suggested and still got different sets of coefficients: [-1.873, -0.0606, -1.175, -1.378, 0.00182, 0.2435] for sklearn and [-2.8181, 0.0001269, -0.0002259, -0.00020288, 0.00344, 0.000188] for glmnet, in the order [Intercept, rank_2, rank_3, rank_4, gre, gpa]. My concern is that they differ both in magnitude and in the sign of the effect on the probability, so without knowing why they differ, it is hard to choose one to interpret. And if there is by any chance a bug in one of the implementations, it is particularly important that I know which one to rely on. – hurrikale Mar 27 '16 at 20:51

score 0 · Answer 3 · answered Dec 11 '19 at 23:09

You should also pass the L1_wt=0 argument along with alpha in the fit_regularized() call.

This code in statsmodels:

import statsmodels.api as sm
res = sm.GLM(y, X, family=sm.families.Binomial()).fit_regularized(alpha=1/(y.shape[0]*C), L1_wt=0)

is equivalent to the following code from sklearn:

from sklearn import linear_model
clf = linear_model.LogisticRegression(C = C)
clf.fit(X, y)

Hope it helps!
