
I am trying to understand why likelihoods are not PDFs using links such as "What is the reason that a likelihood function is not a pdf". However, I am getting more confused.

For instance, the likelihood of $N$ i.i.d. normals can be written as $$ \prod_{n=1}^N \mathcal N(x_n\mid\mu,\sigma^2) = \frac{1}{Z^N}\exp\left(-\frac{1}{2} (\mathbf X-\mu\mathbf{1})^\top (\sigma^2 I)^{-1}(\mathbf X-\mu\mathbf{1})\right) $$ where $Z = \sqrt{2\pi\sigma^2}$ and $\mathbf X = (x_1,\dots,x_N)^\top$.

This is equivalent to the PDF of an MVN $\mathcal N(\mu\mathbf{1},\sigma^2 I)$ evaluated at $\mathbf X$.

This may also be the reason why maximizing the likelihood or the MVN PDF of the data with covariance matrix $\sigma^2 I$ yields the same set of solutions (particularly for linear regression).

Please help me clarify this. Is the reason the likelihood is not considered a PDF a matter of context? (The likelihood is meant to be a function that is maximized with respect to the parameters.)

Is the equivalence between the likelihood and the MVN above just a coincidence? In fact, I think any likelihood can be integrated to $1$ with a suitable normalizing constant, unless the likelihood itself does not converge under the integral.
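As a quick numerical sketch of the equivalence in the question (my own check, using NumPy; the helper names `norm_pdf` and `mvn_pdf_iso` are just for illustration), the product of $N$ univariate normal densities and the isotropic MVN density agree at every parameter value:

```python
import numpy as np

def norm_pdf(x, mu, sigma):
    # Univariate normal density N(x | mu, sigma^2)
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def mvn_pdf_iso(x, mu, sigma):
    # Multivariate normal density with mean mu*1 and covariance sigma^2 * I
    n = len(x)
    quad = np.sum((x - mu) ** 2) / sigma ** 2
    return np.exp(-0.5 * quad) / (2 * np.pi * sigma ** 2) ** (n / 2)

rng = np.random.default_rng(0)
x = rng.normal(2.0, 1.5, size=5)  # fixed "observed" data

# Evaluate both expressions at the same (arbitrary) parameter values
lik = np.prod(norm_pdf(x, mu=1.0, sigma=2.0))
mvn = mvn_pdf_iso(x, mu=1.0, sigma=2.0)
assert np.isclose(lik, mvn)  # identical as functions of (mu, sigma)
```

Since the two expressions agree pointwise in $(\mu,\sigma)$, maximizing either over the parameters gives the same solutions.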

  • Your $\prod^N \mathcal N(x_n|\mu,\sigma^2)$ is a possible pdf for $\mathbf X$ since it is non-negative and integrates to $1$ over all possible values of $\mathbf X$. It could not be a pdf for $(\mu, \sigma^2)$ as its integral over all possible values of $(\mu, \sigma^2)$ is not $1$. – Henry Jun 20 '23 at 08:17
  • @Henry did you also mean that $P(\mu,\sigma^2|\mathbf X)$ cannot be considered a pdf since it does not integrate to $1$ over all possible values of $\mathbf X$? – user1176663 Jun 20 '23 at 08:22
  • $P(\mu,\sigma^2\mid \mathbf X)$ is the wrong notation here for a likelihood. I am saying $\Lambda(\mu,\sigma^2 \mid \mathbf X)$ is not a pdf for $(\mu,\sigma^2)$ and it does not integrate to $1$ over all possible values of $(\mu, \sigma^2)$ – Henry Jun 20 '23 at 08:39
  • @Henry can I safely conclude that the PDF of the MVN evaluated at $\mathbf X$ and treating $\mu,\sigma$ as the argument variables makes it a likelihood? – user1176663 Jun 20 '23 at 13:06
  • Yes - that is essentially what the first sentence of Sextus Empiricus's answer says. – Henry Jun 20 '23 at 13:37
  • You might find the discussion here helpful. It includes a pretty clear illustration of why a likelihood function is not itself a density while being defined in terms of (a collection of) densities. – Glen_b Dec 08 '23 at 00:04

1 Answer


The likelihood inverts the relationship between the random data and the parameters of the distribution from which the data are sampled. Due to the symmetry of the expression $(x-\mu)$, the likelihood function resembles the pdf.

$$f(x,\mu) = \frac{1}{\sqrt{2 \pi}} e^{-\frac{1}{2} (x-\mu)^2 } = \frac{1}{\sqrt{2 \pi}} e^{-\frac{1}{2} (x^2-2x\mu +\mu^2) }$$

This function has the same shape when you switch $\mu$ and $x$ in the equation. (But note that having the shape of a pdf does not make it a pdf.)
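The symmetry can be checked numerically (a minimal sketch of my own; `f` simply codes up the expression above):

```python
import numpy as np

def f(x, mu):
    # The expression above: x and mu only enter through (x - mu)^2
    return np.exp(-0.5 * (x - mu) ** 2) / np.sqrt(2 * np.pi)

# Swapping the roles of x and mu leaves the value unchanged
assert np.isclose(f(1.3, 0.4), f(0.4, 1.3))
```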


For many other distributions the likelihood function does not have the same shape as the pdf. An example is the following pdf of $\hat{\theta}$ given the distribution parameter $\theta$:

$${\hat\theta \sim \mathcal{N}(\mu=\theta, \sigma^2=1+\theta^2/3)}$$

or

$$f(\hat\theta, \theta ) = \frac{1}{\sqrt{2 \pi (1+\theta^2/3)}} \exp \left[ \frac{-(\theta-\hat\theta)^2}{2(1+\theta^2/3)} \right] $$

Imagine this probability density function $f(\hat \theta , \theta)$ plotted as a function of $\theta$ and $\hat \theta$. This is done as a surface plot and a contour plot in the image below.

[Figure: surface and contour plots of $f(\hat\theta, \theta)$ as a function of $\theta$ and $\hat\theta$, with the likelihood slice projected on the right]

You can see the likelihood function as a particular slice of the function for a fixed value of the observed $\hat{\theta}$ (also projected on the right), and it does not have the same shape as a normal distribution.
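This slice also fails to be a density in $\theta$ in a stronger sense (my own numerical check, not part of the original answer): for fixed $\hat\theta$, the tail of $f(\hat\theta,\theta)$ decays only like $1/|\theta|$, so its integral over $\theta$ grows without bound as the integration range widens.

```python
import numpy as np

def f(theta_hat, theta):
    # The pdf from the answer, read as a likelihood in theta
    var = 1 + theta ** 2 / 3
    return np.exp(-(theta - theta_hat) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

areas = []
for half_width in (10.0, 100.0, 1000.0):
    theta, step = np.linspace(-half_width, half_width, 400001, retstep=True)
    areas.append(np.sum(f(2.0, theta)) * step)  # crude Riemann sum over theta

# The "area under the likelihood" keeps growing with the range:
# it cannot be normalized into a pdf for theta
assert areas[0] < areas[1] < areas[2]
```

This is also a counterexample to the idea that any likelihood can be rescaled to integrate to $1$.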

Other parts of the images, like the red and green curves and dots, are details that relate to the question "The basic logic of constructing a confidence interval". The image shows how the boundaries of a confidence interval relate to particular points of the pdf/cdf at different values of the parameter, and how it differs from a likelihood interval.


  • Then the equivalence is indeed a coincidence due to the symmetry? The likelihood assumes fixed values for the $x_i$ and is a function of the parameters, while the MVN PDF is the density of the $x_i$ (random variables) given a fixed set of parameters? In this sense, usage context also plays a part here. Sorry for the long comment, but I need a deeper understanding of this. – user1176663 Jun 20 '23 at 08:00
  • @user1176663 the likelihood is a function of probabilities, but that does not yet make it a probability distribution. – Sextus Empiricus Jun 20 '23 at 08:09
  • A similar difference occurs with the fiducial distribution, which is a function of probability densities but does not behave like a probability density distribution. https://stats.stackexchange.com/a/592783/ – Sextus Empiricus Jun 20 '23 at 08:11
  • In my answer I sort of ignored the expression in the question, the PDF of $n$ i.i.d. variables. If you just consider the location parameter, then the likelihood resembles the shape of the pdf. For the standard deviation $\sigma$ this is not the case. – Sextus Empiricus Jun 20 '23 at 14:28
  • That's a really nice illustration there! Can you please expand on it? (What are the green vs. red dots? Why are there 2 pdfs but only 1 likelihood graph? And what does the middle 3D plot represent?) I wish there were a similar tool to illustrate the relationship between pdf and likelihood. – HeyJude Dec 07 '23 at 17:46
  • @HeyJude I have updated the answer, adding some more information about the graph. – Sextus Empiricus Dec 07 '23 at 18:36
  • (+1) Great stuff, but red and green together are problematic for many people. Orange and blue would work better, or red and blue. – Nick Cox Dec 07 '23 at 19:21