In logistic regression, we usually estimate the parameter vector $\boldsymbol{\beta}$ of the logistic function by maximum likelihood. My confusion stems from the following:
- We know that logistic regression models the conditional probability of $Y$ given $X$, i.e. $P(Y = 1 \mid X)$ in the binary case.
- We also know that, in the binary case, the conditional distribution of $Y$ given $X$ is Bernoulli: $Y \mid X \sim \text{Ber}(p)$.
- My confusion is this: maximum likelihood estimation gives us a set of “optimal” parameters $\boldsymbol{\beta}$. Is the parameter found the same as $p$, the parameter of the Bernoulli distribution? My mind is fixated on the idea that, since the likelihood of $Y$ given $X$ is Bernoulli, we should be finding the $p$ that maximises the likelihood of the data.
---
My attempt at an answer: finding $\boldsymbol{\beta}$ is equivalent to finding $p$ for the conditional distribution of $Y$ at a given value of $X$, so they are the same.
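Written out under the standard logistic model, the link I have in mind is the following. The notation $x_i$, $y_i$, $p_i$ for the $i$-th of $n$ independent observations is my own assumption here, not taken from a particular reference:

$$p_i = P(Y_i = 1 \mid X = x_i) = \frac{1}{1 + e^{-x_i^\top \boldsymbol{\beta}}}, \qquad L(\boldsymbol{\beta}) = \prod_{i=1}^{n} p_i^{y_i} (1 - p_i)^{1 - y_i}.$$

Read this way, each observation has its own Bernoulli parameter $p_i$, and every $p_i$ is a function of the single vector $\boldsymbol{\beta}$, so the likelihood can be written as a function of $\boldsymbol{\beta}$ alone.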
EDIT: To clarify my question: by definition, maximum likelihood finds the parameter that maximises the likelihood under the conditional distribution $Y \mid X$, which is Bernoulli. So my instinct is that the parameter should be $p$, yet we end up finding $\boldsymbol{\beta}$. I understand that the logistic model is linear in the log-odds with coefficients $\boldsymbol{\beta}$. What I fail to reconcile is whether, following that definition, maximum likelihood is returning the parameter $p$ or $\boldsymbol{\beta}$, or whether the distinction does not matter in this context because $\boldsymbol{\beta}$ and $p$ are linked.
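To make the question concrete, here is a minimal pure-Python sketch of the procedure as I understand it. Everything in it is an assumption for illustration: the toy data, the single-feature model with an intercept, the helper names (`sigmoid`, `log_likelihood`, `fit`), and the use of plain gradient ascent in place of whatever optimiser a real library would use.

```python
import math

def sigmoid(z):
    """Logistic function: maps a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(beta, xs, ys):
    """Bernoulli log-likelihood, where each p_i is computed from beta and x_i."""
    b0, b1 = beta
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(b0 + b1 * x)  # Bernoulli parameter for this observation
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total

def fit(xs, ys, lr=0.1, steps=5000):
    """Gradient ascent on the mean log-likelihood; only beta is updated."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        residuals = [y - sigmoid(b0 + b1 * x) for x, y in zip(xs, ys)]
        g0 = sum(residuals) / n                               # d/d b0
        g1 = sum(r * x for r, x in zip(residuals, xs)) / n    # d/d b1
        b0 += lr * g0
        b1 += lr * g1
    return b0, b1

# Hypothetical toy data (not linearly separable, so the estimate stays finite).
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [0, 0, 1, 0, 1, 0, 1, 1]

beta_hat = fit(xs, ys)

# The fitted Bernoulli parameters are derived from beta_hat, one per x value.
p_hat = [sigmoid(beta_hat[0] + beta_hat[1] * x) for x in xs]

print("beta_hat:", beta_hat)
for x, p in zip(xs, p_hat):
    print(f"x = {x:.1f} -> estimated P(Y = 1 | x) = {p:.3f}")
```

In this sketch the optimiser only ever touches the two entries of `beta_hat`; the values in `p_hat` are computed from them afterwards and differ from one `x` to the next, which is exactly the relationship between $\boldsymbol{\beta}$ and $p$ that I am trying to pin down.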