In a recent lecture I was told that, in order for the maximum likelihood estimate to be valid, the log likelihood needs to go to minus infinity as the parameter approaches the boundary of the parameter space. But I don't understand why this is essential. Suppose instead the log likelihood levels off toward a finite asymptote at the boundary. Then the parameter value that maximizes the likelihood is still the maximum likelihood estimate, right?
(+1). Gosh: so if I perform ML fitting of a Normal distribution to my data and limit the possible values of the SD to the range from $10^{-1000}$ to $10^{1000}$ and the mean to the range $\pm 10^{1000},$ I guess my estimates will no longer be valid ... . :-). Since those endpoints are beyond the range of IEEE double-precision floating point, that must mean nobody can trust any of the statistical software that runs on standard computing devices. It must be time for all of us to pull out that old abacus (it's on the shelf with the slide rule) and go back to doing the calculations by hand. – whuber Oct 28 '13 at 21:09
The usual argument for the asymptotic normality of the ML estimator uses an assumption that the true value of the parameter is in the interior of the parameter space. Presumably, the assumption you are talking about is used to prove this interiority. The condition you mention is definitely not essential, in the sense of being necessary. – Bill Oct 28 '13 at 21:24
What is the parameter space, what is the parameter in question, and which distribution? What you were told is missing a lot of critical information needed to assess its validity. – Alecos Papadopoulos Oct 29 '13 at 03:06
1 Answer
in order for the maximum likelihood estimate to be valid, the log likelihood needs to go to minus infinity as the parameter goes to the boundary
This is equivalent to saying that the likelihood needs to go to 0 at the boundary of the parameter space in order for the estimate to be valid.
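As a concrete illustration, take the Normal model already raised in the comments (this is just an example, not anything specific to your lecture): for a sample $x_1,\dots,x_n$ that is not all equal, the log-likelihood is

$$\ell(\mu,\sigma) = -\frac{n}{2}\log(2\pi) - n\log\sigma - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2,$$

which tends to $-\infty$ as $\sigma \to 0^+$, as $\sigma \to \infty$, and as $|\mu| \to \infty$. So here the condition holds and the maximizer sits in the interior of the parameter space.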
Well first of all, you can restrict the parameter space to values that all have a positive likelihood and still obtain a valid estimate.
Secondly, even if you use, say, $(-\infty,\infty)$ as the parameter space, you don't come close to the boundary in practice: any off-the-shelf optimisation package starts from some (often random) initialisation and then approaches the optimum, typically the minimum of the negative log-likelihood, with a method such as gradient descent or conjugate gradients. In either case you almost never end up near the boundary of the parameter space, so I don't quite understand why the boundaries matter in the first place.
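Here is a minimal sketch of that point, assuming numpy and scipy are available (the data and starting values are made up for illustration): fitting a Normal by numerically minimising the negative log-likelihood lands well inside the parameter space.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=500)   # synthetic data, for illustration only

def neg_log_likelihood(params):
    # Parameterise as (mu, log sigma) so sigma stays positive without explicit bounds.
    mu, log_sigma = params
    sigma = np.exp(log_sigma)
    return (0.5 * x.size * np.log(2 * np.pi)
            + x.size * log_sigma
            + 0.5 * np.sum(((x - mu) / sigma) ** 2))

result = minimize(neg_log_likelihood, x0=np.array([0.0, 0.0]), method="BFGS")
mu_hat, sigma_hat = result.x[0], np.exp(result.x[1])
print(mu_hat, sigma_hat)   # ends up near (3, 2), far from any boundary
```

The optimiser converges to roughly $\hat\mu \approx 3$ and $\hat\sigma \approx 2$, nowhere near the edges of the parameter space.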
And even if you head toward the boundary on purpose, at some point you will hit the limits of your machine's floating-point representation. I can guarantee you that at that point, you haven't really gotten anywhere near $-\infty$. :)
Personally, I find the underflow problems that arise when computing sums and products of very small likelihoods, and the log-sum-exp trick used to avoid them, a far more interesting and noteworthy issue; unlike reaching the boundaries of the parameter space, it actually matters a lot in practice.
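For instance (a minimal sketch with made-up numbers, assuming numpy), averaging likelihoods that are each around $e^{-1000}$ underflows to exactly zero in double precision, while the log-sum-exp trick keeps everything finite:

```python
import numpy as np

# Per-observation or per-component log-likelihoods; values this small are routine.
log_liks = np.array([-1000.0, -1001.0, -1002.0])

# Naive averaging on the raw likelihood scale underflows to 0.0:
naive = np.mean(np.exp(log_liks))

# Log-sum-exp: subtract the maximum before exponentiating, then add it back.
m = np.max(log_liks)
log_mean = m + np.log(np.mean(np.exp(log_liks - m)))   # approx -1000.69

print(naive, log_mean)
```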