Inspired by the question about Bessel's correction, I wonder whether there is a general rule for when maximum likelihood is applicable to parameter estimation.
My guess is that the parameters estimated by maximising the likelihood in linear regression are optimal because they are independent. It seems intuitive to me, although not 'obvious', that the same holds for all generalised linear models.
On the other hand, the estimate of the population variance depends on the estimate of the mean, and this dependence seems somehow to bias the maximum likelihood estimate (I'm aware of the math showing the need for the correction, but it's still not intuitively clear to me).
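To make the variance case concrete, here is a minimal simulation sketch of the effect I mean. It assumes normally distributed data, and the function name `simulate_variance_estimates`, the sample size, the number of trials and the seed are all just illustrative choices of mine. It averages the maximum likelihood variance estimate (dividing by n, around the estimated mean) and the Bessel-corrected one (dividing by n - 1) over many small samples.

```python
import random


def simulate_variance_estimates(n=5, trials=20000, mu=0.0, sigma=1.0, seed=0):
    """Average the MLE and Bessel-corrected variance estimates over many samples.

    Assumes i.i.d. normal data with true variance sigma**2; all defaults are
    illustrative, not taken from any particular reference.
    """
    rng = random.Random(seed)
    mle_sum = 0.0
    corrected_sum = 0.0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        mean_hat = sum(sample) / n  # estimated mean, not the true mu
        ss = sum((x - mean_hat) ** 2 for x in sample)  # squared deviations
        mle_sum += ss / n              # maximum likelihood estimate
        corrected_sum += ss / (n - 1)  # Bessel-corrected estimate
    return mle_sum / trials, corrected_sum / trials


mle_avg, corrected_avg = simulate_variance_estimates()
print(f"average MLE variance estimate:       {mle_avg:.3f}")
print(f"average corrected variance estimate: {corrected_avg:.3f}")
```

With a true variance of 1 and n = 5, the first average should come out close to (n - 1)/n = 0.8 and the second close to 1, which is the bias I am asking about.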
So, is it correct to say that independence of the parameter estimates is a requirement for maximum likelihood to be applicable? If not, what conditions need to be satisfied?