In my reading on maximum likelihood estimation, the examples all work with samples from KNOWN distributions (e.g. binomial, Poisson, etc.). I wonder how I can connect this to my knowledge of machine learning.
In machine learning, it's always the log-likelihood that gets maximized, usually written as P(data|theta). However, we often don't know the underlying distribution of the data (the data might be text or arbitrary numeric values), so can I say that optimization algorithms are used to estimate the best parameters when the underlying population distribution is unknown?
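To make my question concrete, here is a minimal sketch of the case I *do* understand: when the distribution is assumed known (Gaussian here, with synthetic data purely for illustration), the log-likelihood gives a concrete objective that a generic optimizer can maximize.

```python
# Sketch: MLE as numerical optimization, ASSUMING a Gaussian model.
# The Gaussian assumption is what supplies the objective function here.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.5, size=500)  # synthetic sample

def neg_log_likelihood(params):
    mu, log_sigma = params            # optimize log(sigma) so sigma stays > 0
    sigma = np.exp(log_sigma)
    return -np.sum(norm.logpdf(data, loc=mu, scale=sigma))

result = minimize(neg_log_likelihood, x0=np.array([0.0, 0.0]))
mu_hat, sigma_hat = result.x[0], np.exp(result.x[1])
print(mu_hat, sigma_hat)  # should land close to the true (2.0, 1.5)
```

My confusion is about what replaces the `norm.logpdf` line when no such distributional assumption is available.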
But as far as I know, optimization only works when the objective function satisfies certain conditions that make it optimizable (e.g. it has a global or local minimum/maximum). So how do we handle MLE when there is no obvious objective function and the underlying probability distribution is unknown? Or is this not even a problem, because such a setting already fails the conditions for gradient descent? A sketch of my current understanding of the supervised case follows.
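Here is how I currently picture it, and I'd like to know if this is right: choosing a model family (logistic regression below, i.e. an assumed Bernoulli likelihood with a sigmoid link) is what supplies the objective, and because that negative log-likelihood is differentiable (and in this case convex), gradient descent applies. The synthetic data, step size, and iteration count are arbitrary choices for the sketch.

```python
# Sketch: once a model family is chosen, the negative log-likelihood IS the
# objective. Logistic regression = MLE under an assumed Bernoulli likelihood.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
true_w = np.array([1.0, -2.0])
y = (rng.random(200) < 1 / (1 + np.exp(-X @ true_w))).astype(float)

w = np.zeros(2)
lr = 0.1
for _ in range(500):
    p = 1 / (1 + np.exp(-X @ w))   # model's predicted P(y=1 | x, w)
    grad = X.T @ (p - y) / len(y)  # gradient of the mean negative log-likelihood
    w -= lr * grad                 # one gradient-descent step

print(w)  # approaches true_w as the sample size grows
```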
TIA.