The likelihood of observing the dataset we have (for some mean and variance) can be written as the product of the likelihoods of the individual data points (since the rows are independent). With MLE, we find the mean and variance that maximize this likelihood. For every value we predict, we set the mean of the Gaussian distribution equal to the predicted value and then evaluate the pdf at the ground-truth value. This gives us the probability of observing values near the ground truth. If this probability is high, it is more likely that our predictions will be close to the ground-truth values; if it is low, it is unlikely that they will be. Based on the value we get, we adjust the model parameters to make better predictions and repeat the procedure. We then choose the parameters that maximize the probability of observing the ground-truth values.
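As a minimal sketch of this loop (assuming Python with NumPy/SciPy; the arrays `y_true` and `y_pred` and the fixed noise scale `sigma` are made-up illustration values, not from the original post):

```python
import numpy as np
from scipy.stats import norm

# Hypothetical ground-truth targets and a model's current predictions.
y_true = np.array([2.1, 0.4, 3.3, 1.8])
y_pred = np.array([2.0, 0.5, 3.0, 2.0])
sigma = 1.0  # assumed fixed noise standard deviation

# Per-point likelihood: a Gaussian pdf centred at each prediction,
# evaluated at the corresponding ground-truth value.
likelihoods = norm.pdf(y_true, loc=y_pred, scale=sigma)

# Independence lets us multiply the per-point likelihoods; in practice
# we maximize the log-likelihood (a sum) for numerical stability.
log_likelihood = np.sum(norm.logpdf(y_true, loc=y_pred, scale=sigma))
print(likelihoods, log_likelihood)
```

An optimizer would then adjust the model parameters (and hence `y_pred`) to increase `log_likelihood`, which is the same as maximizing the product of the per-point densities.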

  • We maximize the likelihood function of a sample with respect to the parameters. Since probability density functions do not represent probabilities (a density is a probability per unit--hence the term density), a PDF can be greater than $1$ at certain points (but it must integrate to one!), and therefore the likelihood, being a product of such densities, may also be greater than $1$--so "maximizing the probability of observing the ground truth" is not quite precise enough to describe what is happening when we perform MLE, at least for continuous RVs (see the sketch after these comments). – Nap D. Lover Sep 01 '23 at 19:59
  • To clear up some of these issues, please see https://stats.stackexchange.com/questions/2641 – whuber Sep 08 '23 at 22:03
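To make the first comment's point concrete, here is a tiny illustrative check (assuming SciPy; the choice of a narrow $\sigma = 0.1$ is arbitrary) showing a density value greater than $1$:

```python
from scipy.stats import norm

# A narrow Gaussian (sigma = 0.1) has density well above 1 at its mean:
# 1 / (0.1 * sqrt(2 * pi)) ≈ 3.989, yet the density still integrates to 1.
print(norm(loc=0.0, scale=0.1).pdf(0.0))  # ≈ 3.989
```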

0 Answers