In this answer (https://stats.stackexchange.com/a/279111/151862), Glen_b says:
The reason is that the d.f. parameter [of a t distribution] is very hard to estimate well from data, particularly if you're also estimating the scale parameter. Indeed you can often end up with either silly estimates or unstable estimates (e.g. from a ridge in parameter space)
I'm assuming he is talking about the geometry of the likelihood as a function of the parameters. Can someone explain why exactly a ridge makes things harder to estimate?
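To make the question concrete, here is a small numerical sketch (my own illustration, not from Glen_b's answer) that exhibits the ridge: simulate data from a t distribution with df = 5 and evaluate the log-likelihood over a grid of (df, scale). Profiling out the scale shows the likelihood is nearly flat over a huge range of df values, while it drops off sharply in the scale direction at any fixed df.

```python
import numpy as np
from scipy.stats import t

rng = np.random.default_rng(0)
x = rng.standard_t(df=5, size=200)  # data simulated from a t with df = 5

def loglik(df, scale):
    # log-likelihood of a location-zero scaled t distribution
    return t.logpdf(x / scale, df).sum() - len(x) * np.log(scale)

dfs = np.linspace(2, 50, 60)
scales = np.linspace(0.5, 2.0, 60)
L = np.array([[loglik(d, s) for s in scales] for d in dfs])

# Profile log-likelihood: for each df, maximize over the scale.
# Larger df can be traded off against a larger scale, so the profile
# is nearly flat -- a ridge in (df, scale) space.
profile = L.max(axis=1)
ridge_range = profile.max() - profile.min()   # variation ALONG the ridge
cross_range = L[0].max() - L[0].min()         # variation ACROSS it (fixed df)
print(ridge_range, cross_range)
```

On my understanding, this is exactly the geometry Glen_b describes: the data barely distinguish (df = 10, scale s) from (df = 50, scale s'), so the maximizer slides along the ridge and the df estimate is unstable.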
R and Mathematica are obviously different. In statistics, one major purpose of optimization is to estimate parameters: one is not directly interested in what the particular value of the objective function might be. (It does play a role in assessing the uncertainty of the estimates.) – whuber Jun 29 '17 at 21:24