32

I am using Python's scikit-learn to train and test a logistic regression.

scikit-learn returns the regression's coefficients of the independent variables, but it does not provide the coefficients' standard errors. I need these standard errors to compute a Wald statistic for each coefficient and, in turn, compare these coefficients to each other.

I have found one description of how to compute standard errors for the coefficients of a logistic regression (here), but it is somewhat difficult to follow.

If you happen to know of a simple, succinct explanation of how to compute these standard errors and/or can provide me with one, I'd really appreciate it! I don't mean specific code (though please feel free to post any code that might be helpful), but rather an algorithmic explanation of the steps involved.

Gyan Veda

4 Answers

39

The standard errors of the model coefficients are the square roots of the diagonal entries of the covariance matrix. Consider the following:

  • Design matrix:

$\textbf{X = }\begin{bmatrix} 1 & x_{1,1} & \ldots & x_{1,p} \\ 1 & x_{2,1} & \ldots & x_{2,p} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n,1} & \ldots & x_{n,p} \end{bmatrix}$ , where $x_{i,j}$ is the value of the $j$th predictor for the $i$th observation.

(NOTE: This assumes a model with an intercept.)

  • $\textbf{V = } \begin{bmatrix} \hat{\pi}_{1}(1 - \hat{\pi}_{1}) & 0 & \ldots & 0 \\ 0 & \hat{\pi}_{2}(1 - \hat{\pi}_{2}) & \ldots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \ldots & \hat{\pi}_{n}(1 - \hat{\pi}_{n}) \end{bmatrix}$ , where $\hat{\pi}_{i}$ represents the predicted probability of class membership for observation $i$.

The covariance matrix can be written as:

$(\textbf{X}^{T}\textbf{V}\textbf{X})^{-1}$

This can be implemented with the following code:

import numpy as np
from sklearn import linear_model

# Initiate logistic regression object
logit = linear_model.LogisticRegression()
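# NOTE: sklearn's LogisticRegression applies an L2 penalty by default (C=1.0);
# the covariance formula below assumes an unpenalized maximum-likelihood fit,
# so consider passing a very large C (e.g. C=1e9) to approximate the plain MLE.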

# Fit model. Let X_train = matrix of predictors, y_train = vector of the response variable.
# NOTE: Do not include a column for the intercept when fitting the model.
resLogit = logit.fit(X_train, y_train)

# Calculate matrix of predicted class probabilities.
# Check resLogit.classes_ to make sure that sklearn ordered your classes as expected
predProbs = resLogit.predict_proba(X_train)

# Design matrix -- add column of 1's at the beginning of your X_train matrix
X_design = np.hstack([np.ones((X_train.shape[0], 1)), X_train])

# Initiate matrix of 0's, fill diagonal with each predicted observation's variance
V = np.diagflat(np.prod(predProbs, axis=1))

# Covariance matrix
# Note that the @-operator does matrix multiplication in Python 3.5+, so if you're running
# Python 3.5+, you can replace the covLogit-line below with the more readable:
# covLogit = np.linalg.inv(X_design.T @ V @ X_design)
covLogit = np.linalg.inv(np.dot(np.dot(X_design.T, V), X_design))
print("Covariance matrix: ", covLogit)

# Standard errors
print("Standard errors: ", np.sqrt(np.diag(covLogit)))

# Wald statistic (coefficient / s.e.) ^ 2
logitParams = np.insert(resLogit.coef_, 0, resLogit.intercept_)
print("Wald statistics: ", (logitParams / np.sqrt(np.diag(covLogit))) ** 2)

All that being said, statsmodels will probably be a better package to use if you want access to a LOT of "out-of-the-box" diagnostics.

AllanLRH
j_sack
  • To avoid memory issues and to account for the singular-matrix case, you could update your code as follows - V = np.product(predProbs, axis=1); covLogit = np.linalg.pinv(np.dot(X_design.T * V, X_design)) – steadyfish May 01 '19 at 19:27
  • Why not use np.sqrt(np.diagonal(np.cov(x,y))) directly? – Alex Moore-Niemi Apr 28 '20 at 01:05
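
Spelled out, the memory-friendlier variant suggested in the comment above could look like the following sketch, reusing X_design and predProbs from the code in this answer and keeping the per-observation variances as a 1-D array instead of building the dense n-by-n matrix V:

import numpy as np

# Per-observation variances pi_i * (1 - pi_i), kept as a 1-D array
v = np.prod(predProbs, axis=1)

# (X^T V X)^{-1} via broadcasting; pinv also tolerates a (near-)singular matrix
covLogit = np.linalg.pinv((X_design.T * v) @ X_design)
print("Standard errors: ", np.sqrt(np.diag(covLogit)))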
14

Does your software give you a parameter covariance (or variance-covariance) matrix? If so, the standard errors are the square root of the diagonal of that matrix. You probably want to consult a textbook (or google for university lecture notes) for how to get the $V_\beta$ matrix for linear and generalized linear models.
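
For the ordinary linear model, for example, a minimal sketch of that recipe could look like the following (here `X` and `y` are placeholder names for a design matrix with an intercept column and the response vector); the logistic-regression answer above simply swaps $\hat{\sigma}^{2}(\textbf{X}^{T}\textbf{X})^{-1}$ for $(\textbf{X}^{T}\textbf{V}\textbf{X})^{-1}$:

import numpy as np

# OLS coefficients, then V_beta = sigma^2 * (X^T X)^{-1}
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / (X.shape[0] - X.shape[1])  # residual variance estimate
V_beta = sigma2_hat * np.linalg.inv(X.T @ X)

# Standard errors are the square roots of the diagonal of V_beta
print(np.sqrt(np.diag(V_beta)))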

generic_user
    I haven't been able to find anything online for the generalized linear model case (maybe I don't know the right search terms?). Help? – Kevin H. Lin Dec 12 '14 at 03:56
  • Here is one that I found after a few minutes of googling. My advice is to first understand how the parameter variance is calculated in a basic linear model. Once you get that, the extension to GLMs is easier. All the same, knowing how to calculate it and knowing how to get it in a software package aren't the same thing. www.sagepub.com/upm-data/21121_Chapter_15.pdf – generic_user Dec 12 '14 at 16:14
7

If you're interested in doing inference, then you'll probably want to have a look at statsmodels. Standard errors and common statistical tests are available. Here's a logistic regression example.
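
For instance, a minimal sketch with statsmodels' Logit, assuming the same X_train and y_train arrays as in the question (note that statsmodels does not add an intercept automatically, hence add_constant):

import statsmodels.api as sm

# Add an explicit intercept column and fit by maximum likelihood
X_design = sm.add_constant(X_train)
result = sm.Logit(y_train, X_design).fit()

print(result.params)                      # coefficients (intercept first)
print(result.bse)                         # standard errors
print((result.params / result.bse) ** 2)  # Wald statistics
print(result.summary())                   # full table with z-values and p-values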

jseabold
  • Thanks for the recommendation! I'll look into statsmodels. Too bad that scikit-learn doesn't provide this sort of output. – Gyan Veda Mar 11 '14 at 15:11
  • Yeah. It's just usually not the goal of machine learning-type toolboxes to provide tools for (frequentist) hypothesis tests. If you run into data-size constraints that don't work well in statsmodels but do work in scikit-learn, I'd be interested to hear about them on github. – jseabold Mar 12 '14 at 16:46
  • @jseabold However, if you want to get some ad hoc notion of feature importance in logistic regression, you cannot just read off the effect sizes (the coefficients) without thinking about their standard errors. So even if you're not doing a frequentist test, and you just want some indication of effect sizes and robustness, the sklearn lack of variance output is challenging. – ely Aug 09 '15 at 18:18
1

Building upon the fantastic work of @j_sack, I have two suggestions for his code:

  1. Since predict_proba returns one probability per class, the result is a two-dimensional array with one column per class, which is why you should select the column for the class of interest, e.g.:

predProbs[:,0]

(check class of interest with resLogit.classes_)

  2. The diagonal matrix should not contain the raw predicted probabilities but $w_{ii} = \hat{\pi}_{i}(1 - \hat{\pi}_{i})$, e.g.:

    V = np.diagflat(predProbs[:, 0] * (1 - predProbs[:, 0]))

Artur