Are we estimating the Bernoulli parameter in Logistic Regression?

Question

In logistic regression, we often use maximum likelihood to estimate the parameter vector $\boldsymbol{\beta}$ that parametrizes the logistic equation. My confusion stems from the following:

We know that logistic regression models the conditional probability of $Y$ given $X$, i.e. $P(Y = 1 \mid X)$ in the binary case.
We also know that, in the binary case, $Y \mid X$ follows a Bernoulli distribution, $Y \mid X \sim \text{Ber}(p)$.
Now my confusion is this: after maximum likelihood estimation we obtain a set of “optimal” parameters $\boldsymbol{\beta}$. Is the parameter found the same as $p$, the parameter of the Bernoulli distribution? I keep thinking that, since the likelihood of $Y$ given $X$ is Bernoulli, we should be finding the $p$ that maximises the likelihood of the data.

——

An attempt to answer this: finding $\boldsymbol{\beta}$ is equivalent to finding $p$ for the conditional distribution of $Y$ given a particular value of $X$. So they are the same.

EDIT: To clarify my question: by the definition of maximum likelihood, we are finding the parameter that maximises the likelihood under the conditional distribution of $Y \mid X$, which is Bernoulli. So my instinct is that the parameter should be $p$, but of course we end up finding $\boldsymbol{\beta}$. I understand that the logistic model is linear in the log odds, with coefficients $\boldsymbol{\beta}$. What I fail to reconcile is whether maximum likelihood, by definition, returns the parameter $p$ or $\boldsymbol{\beta}$, or whether it does not matter here because $\boldsymbol{\beta}$ and $p$ are linked.

It is possible to get a logistic regression coefficient estimate of $-2$. Is such a value a legitimate probability parameter for a Bernoulli distribution? // You’re on the right track to think that estimating $\beta$ relates to estimating the conditional Bernoulli probability parameters, but the values are not the same. — Dave, Mar 15 '23 at 12:41
Thanks @Dave, I was a bit loose with my question; they are definitely not the same. However, I have edited my question to ask whether we are following the definition of maximum likelihood, as I got confused trying to follow strictly what it says (i.e. finding the distribution parameter that maximises the likelihood). — nan, Mar 15 '23 at 13:00

Tim · Accepted Answer · 2023-03-15T13:15:29.667


The logistic regression model is a kind of generalized linear model, so it consists of the linear predictor

$$ \eta = \boldsymbol{\beta}X $$

which we pass through the inverse of the link function $g$ (for logistic regression $g$ is the logit link, so $g^{-1}$ is the logistic function) to obtain $p$, the conditional mean of the Bernoulli distribution

$$ E[Y|X] = p = g^{-1}(\eta) $$

since $Y$ is binary, we have

$$ Y|X \sim \mathsf{Bernoulli}(p) $$

so $\boldsymbol{\beta} \ne p$, but $g^{-1}(\boldsymbol{\beta}X) = p$. Logistic regression predicts the mean of the Bernoulli distribution.
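A minimal Python sketch of $p = g^{-1}(\eta)$ for a single predictor with an intercept. The coefficient values (including $\beta_1 = -2$, echoing Dave's comment) and the helper names `inverse_logit` and `logit` are hypothetical choices for illustration. It shows that a coefficient can be any real number, while the Bernoulli parameter it produces always lands in $(0, 1)$.

```python
import math

def inverse_logit(eta: float) -> float:
    """Logistic function g^{-1}: maps any real eta into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

def logit(p: float) -> float:
    """Logit link g: maps p in (0, 1) back to the real line (log odds)."""
    return math.log(p / (1.0 - p))

# Hypothetical coefficients: beta1 = -2 is not a valid probability,
# but it is a perfectly good regression coefficient.
beta0, beta1 = 0.5, -2.0
x = 1.0

eta = beta0 + beta1 * x        # linear predictor: 0.5 - 2.0 = -1.5
p = inverse_logit(eta)         # Bernoulli parameter for this x

print(f"eta = {eta}, p = {p:.4f}")   # eta = -1.5, p = 0.1824
print(f"logit(p) = {logit(p):.4f}")  # recovers eta: -1.5000
```

The coefficients live on the log-odds scale; only after passing through $g^{-1}$ do we get a valid Bernoulli parameter.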

Regarding your comment, in maximum likelihood we estimate the parameters $\boldsymbol{\beta}$ of our model as

$$ \hat{\boldsymbol{\beta}} = \underset{\boldsymbol{\beta}}{\operatorname{arg\,max}} \; \mathsf{Bernoulli}\big(y \,|\, g^{-1}(\boldsymbol{\beta}X) \big) $$

(forgive me for the slight abuse of notation). Here $p$ is a function of $X$ and $\boldsymbol{\beta}$, rather than a standalone parameter. Nothing in the definition of maximum likelihood prohibits us from doing this.
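To make this concrete, here is a sketch that maximises the Bernoulli log-likelihood directly over $(\beta_0, \beta_1)$, assuming a single predictor plus intercept and made-up toy data. The optimiser only ever updates $\boldsymbol{\beta}$; each observation's Bernoulli parameter $p_i$ is computed from it and is never a free parameter. Newton-Raphson is used here as one standard choice, not necessarily what any particular library does, and the function names are hypothetical.

```python
import math

def inverse_logit(eta):
    return 1.0 / (1.0 + math.exp(-eta))

def fit_logistic(xs, ys, iters=25):
    """Maximise the Bernoulli log-likelihood over (beta0, beta1) by Newton-Raphson.

    p_i is not a free parameter: p_i = inverse_logit(beta0 + beta1 * x_i).
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        # Score (gradient) and negative Hessian of the log-likelihood.
        g0 = g1 = 0.0
        h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = inverse_logit(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        # Newton step: beta += H^{-1} g, solving the 2x2 system explicitly.
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (-h01 * g0 + h00 * g1) / det
    return b0, b1

def log_likelihood(b0, b1, xs, ys):
    total = 0.0
    for x, y in zip(xs, ys):
        p = inverse_logit(b0 + b1 * x)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total

# Toy data (made up); the classes overlap, so the MLE is finite.
xs = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
ys = [0, 0, 1, 0, 1, 0, 1, 1]

b0, b1 = fit_logistic(xs, ys)
print("beta_hat =", (round(b0, 3), round(b1, 3)))
print("fitted p_i =", [round(inverse_logit(b0 + b1 * x), 3) for x in xs])
```

At convergence the score equations $\sum_i (y_i - p_i) = 0$ and $\sum_i x_i (y_i - p_i) = 0$ should hold up to rounding, and every fitted $p_i$ lies strictly in $(0, 1)$.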

edited Mar 15 '23 at 13:15

answered Mar 15 '23 at 12:42


Thank you for the detailed answer. To clarify my question: by the definition of maximum likelihood, we are finding the parameter that maximises the likelihood under the conditional distribution of $Y \mid X$, which is Bernoulli. So my instinct is that the parameter should be $p$, but of course we end up finding $\boldsymbol{\beta}$. I understand that the model is linear in the log odds with coefficients $\boldsymbol{\beta}$; what I fail to reconcile is whether we are still following the definition. – nan Mar 15 '23 at 12:58

@nan nothing in the definition of maximum likelihood says that the parameter needs to be $p$. I'll edit for clarity. – Tim Mar 15 '23 at 13:09
Thanks @Tim. To recap: $Y \mid X$ follows a Bernoulli distribution with parameter $p$, where $p$ is a function of $X$ and $\beta$; more concretely, we can easily show that $p = \sigma(\beta^T \mathbf{x})$. What I understand is that we can simply maximise the likelihood over the parameters $\beta$ inside $p$, which serves the same purpose. Also, what does the notation $\mathsf{Bernoulli}(y \,|\, \cdot)$ in your answer refer to? – nan Mar 15 '23 at 13:36

@nan that's correct. – Tim Mar 15 '23 at 13:45

Henry · Answer 2 · 2023-03-15T13:58:07.920


Logistic regression tries to fit a model such as $$p(x_i)=\frac{1}{1+e^{-(\beta_0+\beta_1 x_i)}}$$ or equivalently with the log-odds $$\log_e\left(\frac{p(x_i)}{1-p(x_i)}\right)=\beta_0+\beta_1 x_i$$ to estimate $\beta_0$ and $\beta_1$ from the data, typically by using maximum likelihood methods: with the data $\{(x_i,y_i)\}$ where $y_i\in \{0,1\}$, you find the $\beta_0$ and $\beta_1$ which maximise $\prod_i p(x_i)^{y_i}(1-p(x_i))^{1-y_i}$ .

Here $p(x_i)$ is indeed a Bernoulli parameter in $(0,1)$, varying with $x_i$. You are trying to fit this with the logistic model.

$\beta_0$ and $\beta_1$ are not Bernoulli parameters, and can each take any real value.
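The maximisation described above can be sketched with a deliberately crude grid search over $(\beta_0, \beta_1)$ that evaluates the product $\prod_i p(x_i)^{y_i}(1-p(x_i))^{1-y_i}$ directly. The toy data, grid range and step size are assumptions for illustration only; in practice one would maximise the log-likelihood with a proper optimiser.

```python
import math

def likelihood(b0, b1, xs, ys):
    """prod_i p(x_i)^{y_i} (1 - p(x_i))^{1 - y_i} under the logistic model."""
    result = 1.0
    for x, y in zip(xs, ys):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
        # y is 0 or 1, so only one of the two factors is not 1.
        result *= p if y == 1 else (1.0 - p)
    return result

# Hypothetical toy data with y_i in {0, 1}.
xs = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
ys = [0, 0, 1, 0, 1, 0, 1, 1]

# Crude grid over (beta0, beta1) in [-3, 3] x [-3, 3], step 0.05.
grid = [i * 0.05 for i in range(-60, 61)]
best = max(((b0, b1) for b0 in grid for b1 in grid),
           key=lambda b: likelihood(b[0], b[1], xs, ys))
print("approximate MLE (beta0, beta1):", best)
```

The search moves only $\beta_0$ and $\beta_1$, which may take any real value; each $p(x_i)$ follows from them and stays in $(0, 1)$.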

edited Mar 15 '23 at 13:58

answered Mar 15 '23 at 13:38


Since $p(x)$ is indeed a Bernoulli parameter, can I say that using maximum likelihood to find the beta coefficients is functionally the same as finding the Bernoulli parameters that maximise the likelihood of observing the sequence of Bernoulli trials? – nan Mar 15 '23 at 13:43

@nan I have adjusted my answer to clarify some points, in particular that the observations are the $y_i$ ($0$ or $1$) and you want to model them against the $x_i$. You are trying to estimate $\beta_0$ and $\beta_1$ for the logistic model. This is equivalent to finding estimates of the Bernoulli parameters, constrained by the logistic model; otherwise the maximum likelihood result is the unhelpfully overfitted one saying $p(x_i)=1$ when $y_i=1$ and $p(x_i)=0$ when $y_i=0$ (with the natural interpretation, the proportion of $y_i=1$ at each $x$, if you have duplicated values of $x_i$). – Henry Mar 15 '23 at 14:03
Thanks, I understand better now. So my reasoning is roughly correct: estimating the betas is equivalent to estimating the Bernoulli parameters, since each Bernoulli parameter here is a function of the betas. Also, when you say "constrained by the logistic model", what does that mean exactly? – nan Mar 15 '23 at 14:11
@nan - it is similar to linear regression, where you cannot just draw a squiggle joining all the dots, because the $\hat y_i= \beta_0+ \beta_1 x_i$ model constrains you to a straight line (or flat hyperplane). Here you are constrained to the logistic model $p(x_i)=\frac{1}{1+e^{-(\beta_0+\beta_1 x_i)}}$. – Henry Mar 15 '23 at 16:11
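A small sketch of the contrast Henry draws between the unconstrained ("saturated") fit and the logistic one, using hypothetical toy data and helper names. Without the logistic constraint, the maximum-likelihood Bernoulli parameter at each distinct $x$ is just the proportion of ones observed there, which for distinct $x_i$ collapses to $p(x_i) = y_i$ and a likelihood of 1. Any logistic curve keeps every $p(x_i)$ strictly inside $(0, 1)$, so its log-likelihood is strictly negative.

```python
import math
from collections import defaultdict

def bernoulli_loglik(ps, ys):
    """log of prod_i p_i^{y_i} (1 - p_i)^{1 - y_i}, computed term by term."""
    total = 0.0
    for p, y in zip(ps, ys):
        prob = p if y == 1 else 1.0 - p
        total += math.log(prob) if prob > 0 else float("-inf")
    return total

def saturated_probs(xs, ys):
    """Unconstrained MLE: one free Bernoulli parameter per distinct x value,
    estimated by the proportion of y == 1 among observations at that x."""
    counts = defaultdict(lambda: [0, 0])  # x -> [number of ones, total]
    for x, y in zip(xs, ys):
        counts[x][0] += y
        counts[x][1] += 1
    return [counts[x][0] / counts[x][1] for x in xs]

xs = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
ys = [0, 0, 1, 0, 1, 0, 1, 1]

sat = saturated_probs(xs, ys)
print(sat)                        # [0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 1.0], i.e. p_i = y_i
print(bernoulli_loglik(sat, ys))  # 0.0, i.e. likelihood 1

# Duplicated x values: the saturated estimate is the proportion of ones.
print(saturated_probs([0.0, 0.0, 1.0], [0, 1, 1]))  # [0.5, 0.5, 1.0]

# A logistic curve (beta = (0, 1) chosen arbitrarily) keeps every p_i in (0, 1).
logistic = [1.0 / (1.0 + math.exp(-(0.0 + 1.0 * x))) for x in xs]
print(bernoulli_loglik(logistic, ys) < 0)  # True
```

The saturated fit has as many free parameters as distinct $x$ values, whereas the logistic model ties all of them to just $\beta_0$ and $\beta_1$.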
