
I'm doing a logistic regression, which I understand I can do by simply saying

$$ \operatorname{logit}(Y)=\beta_0+\beta_1 x+\varepsilon $$

where $\varepsilon$ is normally distributed around $0$. Then we can use the usual OLS methodology to fit the $\beta$s, and when we set $\varepsilon =0$, this gives us our best estimate $\widehat{\operatorname{logit}(Y)}$.

My question is: how can we find $\hat Y$ from here? I don't think it is as simple as $\hat Y=\operatorname{logit}^{-1}\left(\widehat{\operatorname{logit}(Y)}\right)$, because in the analogous log-linear case I know that $\hat Y=\exp\left(\widehat{\log(Y)}+\frac{1}{2}\sigma^2\right)$.
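The log-normal correction quoted above can be checked numerically. The sketch below assumes $\log(Y) \sim N(\mu, \sigma^2)$ with illustrative values $\mu = 1$, $\sigma = 0.5$ (not from the question), and compares the naive back-transform $e^{\mu}$, the corrected mean $e^{\mu + \sigma^2/2}$, and a Gauss-Hermite quadrature estimate of $E(Y)$.

```python
import math
import numpy as np

def normal_expectation(g, mu, sigma, deg=40):
    """Approximate E[g(Z)] for Z ~ N(mu, sigma^2) with Gauss-Hermite quadrature."""
    nodes, weights = np.polynomial.hermite.hermgauss(deg)
    # Substituting z = mu + sqrt(2) * sigma * x turns the normal density
    # into the Hermite weight exp(-x^2), up to a factor of 1 / sqrt(pi).
    z = mu + math.sqrt(2.0) * sigma * nodes
    return float(np.sum(weights * g(z)) / math.sqrt(math.pi))

# Illustrative parameters for log(Y) ~ N(mu, sigma^2).
mu, sigma = 1.0, 0.5
naive = math.exp(mu)                       # back-transform of the mean of log(Y)
corrected = math.exp(mu + 0.5 * sigma**2)  # closed-form log-normal mean
numerical = normal_expectation(np.exp, mu, sigma)
print(naive, corrected, numerical)
```

The quadrature agrees with the closed form, and both exceed the naive back-transform.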

I looked up the logit-normal distribution (https://en.wikipedia.org/wiki/Logit-normal_distribution), but it says that there is no analytical solution for its mean. I think I must be missing something, because what good is logistic regression if not to estimate $Y$?
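Although the logit-normal mean has no closed form, it is easy to approximate numerically. The sketch below assumes $\operatorname{logit}(Y) \sim N(\mu, \sigma^2)$ with illustrative values $\mu = 1$, $\sigma = 1.5$, and uses the same Gauss-Hermite quadrature idea; `expit` is a small helper defined here for $\operatorname{logit}^{-1}$.

```python
import math
import numpy as np

def expit(z):
    """Inverse logit: maps the real line to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def logit_normal_mean(mu, sigma, deg=80):
    """Approximate E[expit(Z)] for Z ~ N(mu, sigma^2) by Gauss-Hermite quadrature."""
    nodes, weights = np.polynomial.hermite.hermgauss(deg)
    z = mu + math.sqrt(2.0) * sigma * nodes
    return float(np.sum(weights * expit(z)) / math.sqrt(math.pi))

# Illustrative parameters for logit(Y) ~ N(mu, sigma^2).
mu, sigma = 1.0, 1.5
naive = float(expit(mu))              # inverse logit of the mean of logit(Y)
mean_y = logit_normal_mean(mu, sigma)
print(naive, mean_y)
```

For $\mu > 0$ the mean is pulled toward $0.5$, so it is not equal to $\operatorname{logit}^{-1}(\mu)$.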



Your understanding of logistic regression has some errors.

The logistic regression equation is

$$ \operatorname{logit}(E(Y))=\beta_0+\beta_1 x $$

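Notice that there is no random term on the right-hand side. The linear part models the logit of the expected value of $Y$ exactly, so the fitted mean is simply $\hat Y = \widehat{E(Y)} = \operatorname{logit}^{-1}(\hat\beta_0+\hat\beta_1 x)$, with no correction term.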

The randomness comes from how $Y$ disperses around its expectation. To write the model explicitly in your style, you would have to write something like

$$ Y \mid x \sim \operatorname{Bernoulli}\left(p = \operatorname{logit}^{-1}(\beta_0+\beta_1 x) \right) $$
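A quick simulation illustrates that, under this model, the mean of $Y$ at a fixed $x$ is $\operatorname{logit}^{-1}(\beta_0+\beta_1 x)$ directly. The coefficients $\beta_0 = -1$, $\beta_1 = 2$ and the point $x_0 = 1$ are hypothetical, chosen only for illustration.

```python
import numpy as np

def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical coefficients and covariate value, for illustration only.
beta0, beta1 = -1.0, 2.0
x0 = 1.0
p = float(expit(beta0 + beta1 * x0))  # E(Y | x0) under the model

rng = np.random.default_rng(0)
y = rng.binomial(1, p, size=200_000)  # independent draws of Y | x0 ~ Bernoulli(p)
print(p, y.mean())
```

The sample mean of the 0/1 draws matches $p$ up to Monte Carlo error; no variance correction is involved.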

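As a consequence, you cannot use OLS technology to fit a logistic regression. (For a binary $Y$, $\operatorname{logit}(Y)$ is $\pm\infty$ at every observation, so there is not even a finite response to regress on.) Logistic regressions are instead fit by maximum likelihood using iterative optimization, usually based on Newton's method, which for this model is equivalent to iteratively reweighted least squares.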
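A minimal sketch of the Newton iteration for a single-predictor logistic regression, written with NumPy only. The data are simulated from hypothetical coefficients $(-0.5, 1.5)$; production routines (for example R's `glm`) use the equivalent iteratively reweighted least squares form with extra safeguards that this sketch omits.

```python
import numpy as np

def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_newton(x, y, max_iter=25, tol=1e-10):
    """Maximize the Bernoulli log-likelihood of logit(E(Y)) = b0 + b1 * x by Newton's method."""
    X = np.column_stack([np.ones_like(x), x])  # design matrix with intercept column
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = expit(X @ beta)                    # current fitted E(Y | x)
        grad = X.T @ (y - p)                   # gradient (score) of the log-likelihood
        w = p * (1.0 - p)                      # Bernoulli variances
        info = X.T @ (X * w[:, None])          # X^T W X = negative Hessian
        step = np.linalg.solve(info, grad)     # Newton step
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Simulated data from hypothetical "true" coefficients.
rng = np.random.default_rng(42)
n = 5000
x = rng.normal(size=n)
true_beta = np.array([-0.5, 1.5])
y = rng.binomial(1, expit(true_beta[0] + true_beta[1] * x))

beta_hat = fit_logistic_newton(x, y)
y_hat = expit(beta_hat[0] + beta_hat[1] * x)  # fitted E(Y | x), no correction term
print(beta_hat)
```

At convergence the score is zero, which with an intercept implies the fitted probabilities sum to the observed number of ones.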

Matthew Drury