"scale" in logistic regression

Question

I am translating some R code, chiefly logistic regression work, to Python's statsmodels package, and I came across the following warning in the statsmodels documentation:

WARNING: Loglikelihood and deviance are not valid in models where scale is equal to 1 (i.e., Binomial, NegativeBinomial, and Poisson). If variance weights are specified, then results such as loglike and deviance are based on a quasi-likelihood interpretation. The loglikelihood is not correctly specified in this case, and statistics based on it, such AIC or likelihood ratio tests, are not appropriate.

What is this "scale", and what is the statistical reason why scale=1 invalidates the likelihood ratio test that I want to use and have used in R? (Was it even valid when I did it in R?)

I'm on mobile currently, and I can't find a canonical answer on this site for explaining this right now, but the scale and scale being 1 touches on the issue of what is commonly called overdispersion or underdispersion. Looking for those phrases here and elsewhere will help explain that. — Mark White, Apr 01 '20 at 22:25
@MarkWhite The quote makes it sound like scale=1 in all binomial, negative binomial, and Poisson models. Certainly there isn't overdispersion in all of those models no matter what, right? — Dave, Apr 01 '20 at 22:29
Okay, upon reading more, it seems like those are models where scale=1 is possible (maybe just likely) instead of assured, and we can call a statsmodels function to calculate the scale for a fitted model and check if scale=1. — Dave, Apr 01 '20 at 22:33
I think, better phrasing would be models/families where scale=1 under maximum likelihood assumptions. By default, those models impose scale=1, but that can be changed to account for excess dispersion or heterogeneous variance. However, if scale is not assumed to be 1, then we can only have QMLE and not MLE. — Josef, Apr 02 '20 at 00:42
@Josef I see what I did. The usual is that scale=1. Something in statsmodels.genmod.generalized_linear_model.GLM does QMLE, which is not valid for such cases. I am not sure where that happens, as I have gotten statsmodels logistic regressions to give me the same results as logistic regression in R, and a post on SO may be warranted, but statsmodels didn't break statistics with the quote I found. Phew! — Dave, Apr 02 '20 at 00:50
In statsmodels we can get QMLE for any family using the scale or cov_type options. In R there are separate families for it. The default in both is MLE. Note, as in the answer by AdamO and the quote, likelihood-based inference is not valid in general outside of MLE. However, Wald inference, p-values and confidence intervals are still valid as long as we use the appropriate QMLE standard errors. — Josef, Apr 02 '20 at 01:10
@Josef So does statsmodels default to scale=1 for a "usual" logistic regression? — Dave, Apr 02 '20 at 01:19
yes, scale is also an attribute of the results instance that you can check to verify — Josef, Apr 02 '20 at 01:51
@Josef That seems contradictory. Scale seems like a value we specify. I hear you telling me that the function calculates it (which the documentation confirms). Please clarify. All I want is a regular logistic regression. — Dave, Apr 02 '20 at 01:56
For Binomial, NegativeBinomial and Poisson, scale is fixed at 1 by default, because that's the likelihood assumption. For other families like Gaussian, scale is estimated. However, there is an option to specify or ask for estimation of scale also for Binomial, NegativeBinomial and Poisson. In that case those turn into QMLE. It's just two-in-one models. — Josef, Apr 02 '20 at 02:58
Scale for a Gaussian would be part of estimating the variance, so that makes sense. Thanks a bunch! — Dave, Apr 02 '20 at 02:59

AdamO · Answer 1 · 2020-04-06T15:31:18.563

Logistic regression has one canonical parameter, the log odds. So if you use GLM as a maximum likelihood procedure, the linear model for the response is:

$$ \text{logit} \left( Pr(Y = 1) = \mu\right) = \beta_0 + \beta_1 X_1 + \ldots + \beta_p X_p$$

Where $g(\mu) = \text{logit}(\mu) = \log(\mu/(1-\mu))$ is called the "link function".

Consequently, the mean-variance relationship is given by $V = \frac{\partial}{\partial \eta} g^{-1}(\eta)$, where $\eta = g(\mu)$ is the linear predictor. In this case that is $\mu(1-\mu)$, the readily recognizable variance of a Bernoulli random variable.
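
To spell out that step, write $\eta$ for the linear predictor, so that $\mu = g^{-1}(\eta)$. Differentiating the inverse logit gives:

$$ \mu = g^{-1}(\eta) = \frac{e^{\eta}}{1 + e^{\eta}}, \qquad \frac{\partial \mu}{\partial \eta} = \frac{e^{\eta}}{(1 + e^{\eta})^2} = \frac{e^{\eta}}{1 + e^{\eta}} \cdot \frac{1}{1 + e^{\eta}} = \mu (1 - \mu) $$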

GLMs are estimated by Fisher Scoring.
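
As a rough illustration of what Fisher scoring does for this model, here is a minimal sketch for a logistic regression with an intercept and a single covariate, using only the standard library. The function names `expit` and `fisher_scoring_logit` and the simulated data are assumptions made for this example; this is not how any particular package implements it.

```python
import math
import random


def expit(eta):
    """Inverse logit link."""
    return 1.0 / (1.0 + math.exp(-eta))


def fisher_scoring_logit(x, y, n_iter=25, tol=1e-10):
    """Fit logit(mu) = b0 + b1 * x by Fisher scoring (hypothetical helper)."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        s0 = s1 = 0.0            # score vector
        i00 = i01 = i11 = 0.0    # Fisher information (2x2, symmetric)
        for xi, yi in zip(x, y):
            mu = expit(b0 + b1 * xi)
            w = mu * (1.0 - mu)  # variance function mu * (1 - mu)
            r = yi - mu
            s0 += r
            s1 += r * xi
            i00 += w
            i01 += w * xi
            i11 += w * xi * xi
        # Update: beta <- beta + I^{-1} * score, with the 2x2 inverse written out.
        det = i00 * i11 - i01 * i01
        d0 = (i11 * s0 - i01 * s1) / det
        d1 = (-i01 * s0 + i00 * s1) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


# Simulated data; true coefficients 0.5 and 1.0 are illustrative assumptions.
random.seed(0)
x = [random.gauss(0.0, 1.0) for _ in range(2000)]
y = [1 if random.random() < expit(0.5 + 1.0 * xi) else 0 for xi in x]

b0_hat, b1_hat = fisher_scoring_logit(x, y)
print(b0_hat, b1_hat)
```

With the canonical logit link the expected and observed information coincide, so Fisher scoring and Newton-Raphson take the same steps here.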

A scale family of distributions is any family of probability densities where given $X$ being a member of that family, $Y = \phi X$ is still a member of that same family (someone correct me with a formal definition here). The most famous example is the normal distribution.
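
One standard way to write this down (a textbook form, not taken from the original post) is that a scale family generated by a density $f$ consists of the densities

$$ f_\phi(y) = \frac{1}{\phi} f\left(\frac{y}{\phi}\right), \qquad \phi > 0 $$

For example, if $X \sim N(0, \sigma^2)$, then $\phi X \sim N(0, \phi^2 \sigma^2)$, which is again normal. By contrast, if $X$ takes values in $\{0, 1\}$, then $\phi X$ takes values in $\{0, \phi\}$, so for $\phi \neq 1$ it is no longer Bernoulli.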

The Bernoulli density is not a scale family. That means if you want to estimate a generalization of the logistic model where the linear model for the response is still given by:

$$ \text{logit} \left( \mu \right) = \beta_0 + \beta_1 X_1 + \ldots + \beta_p X_p$$

but the variance is given by:

$$ V = \phi^2 \mu (1-\mu)$$

you need to use non-standard GLM estimation, or you need to use quasilikelihood and calculate the dispersion parameter $\phi$ as a nuisance parameter, using the deviance residuals. This estimating equation is no longer a maximum likelihood procedure, but it has many MLE-like properties. Wedderburn accordingly named the approach quasilikelihood in 1974.
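
To make the dispersion estimate concrete, here is a minimal sketch using only the standard library. It assumes grouped binomial data in which each cluster has its own success probability drawn from a Beta(2, 2) distribution, which is one simple way to induce within-cluster correlation; the cluster sizes and all variable names are made up for illustration. It uses the Pearson statistic divided by the residual degrees of freedom rather than the deviance residuals mentioned above; both are common choices. In the notation of this answer, the quantity estimated is $\phi^2$.

```python
import random

random.seed(1)
n_clusters, m = 2000, 20  # number of clusters, Bernoulli trials per cluster

# Each cluster has its own success probability, so trials within a cluster
# are positively correlated once the cluster effect is ignored.
counts = []
for _ in range(n_clusters):
    p_k = random.betavariate(2, 2)
    counts.append(sum(random.random() < p_k for _ in range(m)))

# Intercept-only binomial model: the MLE of mu is the pooled proportion.
mu_hat = sum(counts) / (n_clusters * m)

# Pearson chi-square against the binomial variance m * mu * (1 - mu).
pearson_x2 = sum(
    (y_k - m * mu_hat) ** 2 / (m * mu_hat * (1.0 - mu_hat)) for y_k in counts
)
df_resid = n_clusters - 1  # one estimated parameter
phi2_hat = pearson_x2 / df_resid

print("pooled proportion:", mu_hat)
print("estimated dispersion:", phi2_hat)
```

An estimate well above 1 indicates that the counts vary more than the binomial variance allows, which is the situation the quasibinomial model is meant for.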

edited Apr 06 '20 at 15:31

answered Apr 01 '20 at 23:11

What I'm gathering from this is that, for the usual logistic regression, the usual maximum likelihood estimation and likelihood ratio test work just fine (thank goodness). This is likely (pun intended) to warrant its own question, but what would I do to assess if the generalized logistic model is warranted? – Dave Apr 01 '20 at 23:13
@Dave qMLE has some of the properties of MLE, but not all. You should explore R more closely. For instance, there's no implementation of the qLRT because it can be dicey. It is not a "generalized" logistic model. It is a quasibinomial model. It's warranted when the variance of the response is not equal to, but proportional to the fitted probability times one minus the fitted probability. – AdamO Apr 01 '20 at 23:18
Why would the qMLE come up in the usual logistic regression? – Dave Apr 01 '20 at 23:43
Josef answered my question in the comment, but now I have another. How can a binary response variable deviate from the usual variance of $\mu (1-\mu )$? (I am content to post this as its own question if you think it warrants such a post.) – Dave Apr 02 '20 at 00:57
@Dave most often undetected correlated data. – AdamO Apr 02 '20 at 15:20
Does that apply even when the conditional distribution is Bernoulli, or just when the conditional distribution has $2$+ flips of the coin (so $\text{Binom}(n \ge 2,\; p)$)? – Dave Oct 06 '21 at 21:15
