Regarding "non-PhD" in the question title, please answer this question for the audience of people with a solid understanding of probability distributions but no knowledge of the nuances of different statistical paradigms (e.g., frequentism, Bayesianism, likelihoodism).
After reading through the existing Stack Exchange questions on this topic, I am confused: several top-voted answers seem to contradict each other, and they also seem to contradict a highly regarded machine learning book.
Here is what I've gathered; please comment on what is correct and what is wrong. If something is correct under one statistical paradigm and wrong under another, please say so instead of giving only one paradigm's view. Please only comment if you are an expert on this topic, as this seems to be a contentious topic.
Firstly, I'll state that:
- likelihood is not a PMF / PDF, as it does not sum / integrate to 1 over $\theta$
- discrete / continuous random variables have probabilities / probability densities
So no need to expend energy there.
Secondly, both Wikipedia and a forum top comment (Macro) agree that probability (density) and likelihood produce the same value, given some data X and some parameters $\theta$:
Wikipedia:
$\mathcal{L}(\theta|x) = P(X=x|\theta)$
$\mathcal{L}(\theta|x) = f(x|\theta)$
Macro:
the likelihood is not the probability of the parameter value being correct or anything like that - it is the probability (density) of the data given the parameter value
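To check my own reading of these statements, here is a minimal sketch. The binomial model, the numbers (`n = 10`, `x_obs = 3`) and the function names are my own assumptions for illustration, not taken from any of the quoted answers. It evaluates the same formula in two ways: as a PMF over the data with $\theta$ fixed, and as a likelihood over $\theta$ with the data fixed.

```python
from math import comb

def binom_pmf(x, n, theta):
    """P(X = x | theta) for X ~ Binomial(n, theta)."""
    return comb(n, x) * theta**x * (1 - theta)**(n - x)

def likelihood(theta, x, n):
    """L(theta | x): the same formula, read as a function of theta with x fixed."""
    return binom_pmf(x, n, theta)

n, x_obs = 10, 3  # hypothetical data: 3 successes in 10 trials

# Fixed theta, sum over all possible data values: a PMF, so the total is 1.
pmf_total = sum(binom_pmf(x, n, 0.3) for x in range(n + 1))

# Fixed data, integrate over theta in [0, 1] with a simple midpoint rule.
steps = 100_000
lik_area = sum(likelihood((i + 0.5) / steps, x_obs, n) for i in range(steps)) / steps

print(pmf_total)  # total probability over x
print(lik_area)   # area under the likelihood over theta
```

If I have this right, at any single pair ($\theta$, x) the two functions return the same number, but the area under the likelihood over $\theta$ comes out to roughly 1/11 rather than 1 in this example, which is what I meant by the first bullet above.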
These contradict two other forum top comments, which say that likelihood is proportional to, but not necessarily equal to, probability:
Although it seems like we have simply re-written the probability function, a key consequence of this is that the likelihood function does not obey the laws of probability (for example, it's not bound to the [0, 1] interval). However, the likelihood function is proportional to the probability of the observed data.
In your class they introduced the likelihood as being equal to the conditional probability $\mathcal{L}(\theta;x) = f(x;\theta)$, but this was just a simplification. The likelihood does not need to be equal and it is proportional.
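My tentative reading of "proportional" in these two quotes is that any factor not depending on $\theta$ can be dropped without changing what the likelihood says about $\theta$. The sketch below tests that reading on an assumed binomial example (the model, the data and the function names are mine, purely for illustration): it compares the full PMF formula with a version that omits the binomial coefficient.

```python
from math import comb

n, x_obs = 10, 3  # hypothetical data: 3 successes in 10 trials

def full_likelihood(theta):
    """Binomial PMF evaluated at the observed data, as a function of theta."""
    return comb(n, x_obs) * theta**x_obs * (1 - theta)**(n - x_obs)

def kernel_likelihood(theta):
    """Same function with the theta-free constant comb(n, x_obs) dropped."""
    return theta**x_obs * (1 - theta)**(n - x_obs)

grid = [i / 1000 for i in range(1, 1000)]  # theta values strictly inside (0, 1)

# The ratio of the two versions is the same constant at every theta.
ratios = [full_likelihood(t) / kernel_likelihood(t) for t in grid]

# Both versions are maximised at the same theta on this grid.
mle_full = max(grid, key=full_likelihood)
mle_kernel = max(grid, key=kernel_likelihood)

print(min(ratios), max(ratios))
print(mle_full, mle_kernel)
```

In this example the two versions differ only by the constant `comb(10, 3)`, and they share the same maximiser. Whether this is what the quoted answers mean by "proportional but not equal" is exactly what I would like confirmed.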
I was tempted to just take the answers by hello_there_andy, ars, and Sextus Empiricus and move on, but I wrote this question so that we could have a clear comparison of these top answers.
Thirdly, Aurélien Géron's "Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow" states:
Given a statistical model with some parameters θ, the word “probability” is used to describe how plausible a future outcome x is (knowing the parameter values θ), while the word “likelihood” is used to describe how plausible a particular set of parameter values θ are, after the outcome x is known.
This seems to contradict the current Wikipedia definition, which includes this statement:
The likelihood function does not specify the probability that $\theta$ is the truth, given the observed sample X=x.
As well as part of Macro's answer:
the likelihood is not the probability of the parameter value being correct or anything like that
Maybe I am wrong in equating "plausibility" with "probability".
If I am wrong here, is it correct to say that likelihood:
- is representative of the plausibility of $\theta$, given X, while not being the probability of $\theta$, given X?
- is proportional (or equal...) to the probability of data x occurring, given parameters $\theta$?