
Inspired by the question regarding Bessel's correction, I wonder whether there is a general rule regarding the applicability of maximum likelihood for parameter estimation.

My guess is that the parameters estimated by maximising the likelihood in linear regression are optimal because the estimates are independent of one another. It seems intuitive to me, although not 'obvious', that the same holds for all generalised linear models.

On the other hand, the estimate of the population variance depends on the estimate of the mean, and this seems somehow to affect the maximum-likelihood estimate (I'm aware of the math showing the need for the correction, but it's still not intuitively clear to me).

So, is it correct to say that the independence of the parameter estimates is a requirement for maximum likelihood? If not, what conditions need to be satisfied?
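
To make the variance point concrete, here is a minimal simulation sketch (the normal population, parameter values, and sample size below are arbitrary choices for the illustration) of the behaviour I mean:

```python
# Minimal sketch: for a normal sample, the ML estimate of the mean is unbiased,
# while the ML estimate of the variance (dividing by n) is biased downward and
# Bessel's correction (dividing by n - 1) removes the bias.
# The population parameters and sample size are arbitrary, chosen for illustration.
import numpy as np

rng = np.random.default_rng(0)
true_mu, true_sigma2 = 5.0, 4.0
n, n_reps = 10, 100_000

samples = rng.normal(true_mu, np.sqrt(true_sigma2), size=(n_reps, n))

mean_hat = samples.mean(axis=1)            # ML estimate of the mean
var_mle = samples.var(axis=1, ddof=0)      # ML estimate of the variance (divide by n)
var_bessel = samples.var(axis=1, ddof=1)   # Bessel-corrected estimate (divide by n - 1)

print("average mean estimate      :", mean_hat.mean())    # ~ 5.0  (unbiased)
print("average ML variance        :", var_mle.mean())     # ~ 4.0 * (n - 1) / n = 3.6 (biased)
print("average corrected variance :", var_bessel.mean())  # ~ 4.0  (unbiased)
```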

Igor F.
  • I’m not sure what your question is. A maximum likelihood estimate is optimal in the sense that it maximizes the likelihood function given a sample of data. If you get a sample $X$, somehow compute from it an estimate $\hat \theta$ of the parameter $\theta$, and plugging $\hat \theta$ into the likelihood function together with $X$ maximizes that function, then $\hat \theta$ is a maximum likelihood estimate. – mhdadk Sep 19 '22 at 11:30
  • In the frequentist world in which maximum likelihood estimation is used, the parameters don't have distributions, so it is not meaningful to refer to them as dependent or independent of each other. So I gather you are referring to independence of parameter estimates. There is no need for that either, for MLE to be optimal. The main problem with MLE is that it is equivalent to a Bayesian posterior mode where all values of all parameters are equally likely before knowing the data. This is a very unreasonable assumption. – Frank Harrell Sep 19 '22 at 11:33
  • @FrankHarrell Thanks, I corrected the wording. I believe I get your point about the main problem with MLE (at least I hope so), but, if I do, I don't see a connection to my question. The ML estimate of population variance is not suboptimal because of the prior equal-likelihood assumption, but because it is related to the mean estimate. Or are the two somehow connected? – Igor F. Sep 19 '22 at 13:04
  • Are you asking when is maximum likelihood unbiased? Because the simple answer is practically never. – seanv507 Sep 19 '22 at 13:14
  • "Maximum-likelihood estimators have no optimum properties for finite samples, in the sense that (when evaluated on finite samples) other estimators may have greater concentration around the true parameter-value.[16] However, like other estimation methods, maximum likelihood estimation possesses a number of attractive limiting properties: As the sample size increases to infinity, sequences of maximum likelihood estimators have these properties..." – Wikipedia, Maximum likelihood – seanv507 Sep 19 '22 at 13:44
  • Maybe relevant: https://stats.stackexchange.com/questions/92097/why-maximum-likelihood-and-not-expected-likelihood/449782#449782 – kjetil b halvorsen Sep 19 '22 at 14:07
  • @seanv507 So sample mean as the estimate for the population mean is a rare exception? I was under the impression that linear regression computes the sample mean, conditioned on the predictors and, consequently, that the parameter estimates are unbiased. Am I wrong? – Igor F. Sep 20 '22 at 07:30
  • So linear regression is a rare exception (and I guess that includes the sample mean as an intercept-only model): https://en.wikipedia.org/wiki/Ordinary_least_squares#Finite_sample_properties. Note that you need to condition on X or assume X is deterministic. – seanv507 Sep 20 '22 at 07:59
  • @IgorF. we use MLE all the time when two of the parameter estimates are correlated (e.g., two regression coefficients for two collinear Xs) so don't get hung up on that. – Frank Harrell Sep 20 '22 at 12:12
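
As an editorial illustration of the exchange above (the sample mean and the OLS / ML coefficients in linear regression being unbiased once we condition on the design X), here is a minimal simulation sketch; the coefficients, design, and noise level are arbitrary choices, not taken from the comments:

```python
# Minimal sketch: with a fixed design matrix X (i.e., conditioning on X) and
# normal errors, the OLS coefficients, which coincide with the ML estimates,
# average out to the true coefficients across replications (unbiasedness).
# All numbers below are arbitrary illustration values.
import numpy as np

rng = np.random.default_rng(1)
n, n_reps = 50, 20_000
true_beta = np.array([1.0, 2.0, -0.5])  # intercept and two slopes (assumed for the demo)

# Fixed design matrix, kept the same across all replications.
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])

beta_hats = np.empty((n_reps, len(true_beta)))
for r in range(n_reps):
    y = X @ true_beta + rng.normal(scale=1.0, size=n)    # fresh noise each replication
    beta_hats[r] = np.linalg.lstsq(X, y, rcond=None)[0]  # OLS = ML estimate under normal errors

print("average coefficient estimates:", beta_hats.mean(axis=0))  # ~ [1.0, 2.0, -0.5]
```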

0 Answers