
I am trying, without success, to calculate the log-likelihood of the most basic logistic regression model - a constant probability model (i.e. only $\beta_0 \ne 0$).

For the simplest model with 1 coefficient (i.e. constant probability):

$$ E(y) = \pi = \frac{1}{1 + e^{-\beta_0}} $$

The maximum likelihood estimate of $\pi$ is $y/n$, where $y$ is the number of successes in $n$ trials. This means that my log-likelihood is:

$$ \begin{aligned} \ln(L(y_1, \dots, y_n)) &= \ln \left[ \left(\frac{y}{n}\right)^{y}\left(1 - \frac{y}{n} \right)^{n-y} \right] \\ &= y[\ln(y) - \ln(n)] + (n-y)\ln\left(1 - \frac{y}{n}\right) \\ &= y\ln(y) - y\ln(n) + (n-y)[\ln(n - y) - \ln(n)] \\ &= y\ln(y) - y\ln(n) - n\ln(n) + y\ln(n) + (n-y) \ln(n-y) \\ &= y\ln(y) - n\ln(n) + (n-y)\ln(n-y) \end{aligned} $$
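
A quick numerical sanity check of that simplification, with arbitrary illustrative values $y = 5$ and $n = 20$ (not real data):

    y = 5; n = 20                                   # arbitrary counts, only to check the algebra
    log((y/n)^y * (1 - y/n)^(n - y))                # direct form
    y * log(y) - n * log(n) + (n - y) * log(n - y)  # simplified form; same value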

An example can be found in the book Introduction to Linear Regression Analysis by Montgomery, p. 426, as given below.

A 1959 article in the journal Biometrics presents data concerning the proportion of coal miners who exhibit symptoms of severe pneumoconiosis and the number of years of exposure. The response variable of interest, $y$, is the proportion of miners who have severe symptoms. A reasonable probability model for the number of severe cases is the binomial, so we will fit a logistic regression model to the data.

    exposure = c(5.8, 15.0, 21.5, 27.5, 33.5, 39.5, 46.0, 51.5)
    cases = c(0, 1, 3, 8, 9, 8, 10, 5)
    miners = c(98, 54, 43, 48, 51, 38, 28, 11)

    y = sum(cases)
    n = sum(miners)
    print(y * log(y) - n * log(n) + (n - y) * log(n - y))

The above prints -135.0896. However, the log-likelihood of a constant probability model is:

    m0 = glm(cbind(cases, miners - cases) ~ 1, family = binomial(link = 'logit'))
    logLik(m0)

The above prints -39.8646.
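
For what it is worth, the fitted intercept reproduces the same estimate $\hat\pi = y/n$:

    plogis(coef(m0))  # inverse-logit of the fitted intercept, about 0.1186
    y / n             # the same value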

I don't understand where I'm going wrong with this.

s5s

1 Answer


A likelihood is a strange thing: it is not a probability and does not need to sum or integrate to $1$. In fact a likelihood is only defined up to proportionality, with a positive multiplicative constant that becomes an additive constant when you take the logarithm. So you can find the ratios of likelihoods of different parameter values, i.e. the differences of log-likelihoods, and you can find which value of the parameter maximises the likelihood, but the likelihood itself is not fixed absolutely; in Bayesian analysis the constant of proportionality cancels out.
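
Here is a minimal sketch of that point, reusing $y$ and $n$ from the question's code: adding a constant to the log-likelihood changes neither the maximising value of the parameter nor any log-likelihood difference.

    p_grid <- seq(0.01, 0.99, by = 0.001)
    ll  <- y * log(p_grid) + (n - y) * log(1 - p_grid)  # log-likelihood without the binomial coefficient
    ll2 <- ll + lchoose(n, y)                            # same thing plus an additive constant
    p_grid[which.max(ll)]    # about 0.119, close to y/n = 0.1186
    p_grid[which.max(ll2)]   # the same maximiser
    max(ll) - ll[1]          # a log-likelihood difference...
    max(ll2) - ll2[1]        # ...unchanged by the constant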

If you just look at the overall figures from your data, you get $\frac{y}{n}=\frac{44}{371}\approx 0.1186$, and this is the maximum likelihood estimate of the proportion, $\hat p$. In R:

    phat <- y/n 

The differences in the log-likelihoods come from binomial coefficients, which involve $y$ and $n$ but not $\hat p$. Here are three ways of calculating the log-likelihood, giving three different values:

  • Yours: $$\log_e\left(\hat p^y (1-\hat p)^{n-y}\right)$$

    log(phat^y * (1-phat)^(n-y))
    # -135.0896
    
  • the binomial: $$\log_e\left({n \choose y} \hat p^y (1-\hat p)^{n-y}\right) = \log_e{n \choose y}+\log_e\left(\hat p^y (1-\hat p)^{n-y}\right)$$

    log(choose(n,y) * phat^y * (1-phat)^(n-y))
    # -2.749837 
    
  • the binomials for each set of miners as used by glm: $$\log_e\left(\prod\limits_i{\text{miners}_i \choose \text{cases}_i} \hat p^{\text{cases}_i} (1-\hat p)^{\text{miners}_i-\text{cases}_i}\right)\\ = \left(\sum\limits_i \log_e {\text{miners}_i \choose \text{cases}_i}\right) + \log_e\left(\hat p^y (1-\hat p)^{n-y}\right)$$

    log(prod(choose(miners,cases) * phat^cases * (1-phat)^(miners-cases)))
    # -39.8646
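
For comparison, R's dbinom, which includes the binomial coefficient, gives the second and third values directly:

    dbinom(y, n, phat, log = TRUE)                # the single binomial: -2.749837
    sum(dbinom(cases, miners, phat, log = TRUE))  # per-group binomials: -39.8646, matching logLik(m0)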
    

These are all log-likelihoods from the same data. The difference between the first and second is log(choose(n,y)), about $132.3398$; the difference between the first and third is sum(log(choose(miners,cases))), about $95.22504$. Neither of these constants involves $\hat p$.
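
Those two constants can be computed directly; lchoose gives the log of the binomial coefficient:

    lchoose(n, y)                 # log(choose(n, y)), about 132.3398
    sum(lchoose(miners, cases))   # about 95.22504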

Henry
  • 39,459
  • https://stats.stackexchange.com/questions/97515/what-does-likelihood-is-only-defined-up-to-a-multiplicative-constant-of-proport is related – Henry May 28 '23 at 23:28