I see that the Radon-Nikodym derivative is the ratio of probability measures, $dP/dQ$. How is this different, in general, from a likelihood ratio of two continuous distributions? I understand the RN-definition broadly applies for discrete/continuous/mixture densities, but beyond that is there any difference?
2 Answers
If $dx$ is Lebesgue measure, then it dominates both measures because they correspond to continuous random variables, and one of the properties of RN derivatives (assuming $P \ll Q$, so that the left-hand side exists) is $$ \frac{dP}{dQ} = \frac{\frac{dP}{dx}}{\frac{dQ}{dx}}. $$ The numerator is the density of $P$, and the denominator is the density of $Q$. This is the second property listed on Wikipedia.
So yes, the likelihood ratio is just a particular case. If these two measures were for discrete random variables, then you would replace $dx$ with the counting measure, and you would get a ratio of probability mass functions.
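To make the change-of-measure reading concrete, here is a minimal numerical sketch. The two measures, $P = N(1,1)$ and $Q = N(0,1)$, the interval $[0.5, 2]$, and all function names are assumptions chosen for illustration. It computes $dP/dQ$ as a ratio of Lebesgue densities and then checks the defining property $P(A) = \int_A \frac{dP}{dQ}\,dQ$ by averaging over samples drawn from $Q$ only.

```python
import math
import random

def normal_pdf(x, mu, sigma):
    """Lebesgue density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def normal_cdf(x, mu, sigma):
    """CDF of N(mu, sigma^2) at x, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

# Hypothetical example measures: P = N(1, 1), Q = N(0, 1).
P_MU, P_SIGMA = 1.0, 1.0
Q_MU, Q_SIGMA = 0.0, 1.0

def dP_dQ(x):
    """Radon-Nikodym derivative dP/dQ, computed as a ratio of densities."""
    return normal_pdf(x, P_MU, P_SIGMA) / normal_pdf(x, Q_MU, Q_SIGMA)

def p_of_interval_via_q(a, b, n=200_000, seed=0):
    """Estimate P([a, b]) as the Q-average of dP/dQ * 1_[a, b], sampling from Q only."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.gauss(Q_MU, Q_SIGMA)
        if a <= x <= b:
            total += dP_dQ(x)
    return total / n

a, b = 0.5, 2.0
estimate = p_of_interval_via_q(a, b)
exact = normal_cdf(b, P_MU, P_SIGMA) - normal_cdf(a, P_MU, P_SIGMA)
print(f"estimate from Q-samples: {estimate:.4f}, exact P([a, b]): {exact:.4f}")
```

The two printed numbers should agree up to Monte Carlo error. For the discrete case, the same check would use probability mass functions in place of `normal_pdf`.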
In order for $dP/dQ$ to make sense, we need $P \ll Q$. Why is this the case? All we seem to have is that $P\ll dx$ and $Q\ll dx$, where $dx$ is Lebesgue measure. – Satana Feb 22 '23 at 02:09
Because that is the Radon-Nikodym theorem: density (i.e. $dP/dQ$) existence iff absolute-continuity (sometimes called domination in certain textbooks). This is the foundational assumption, and the Lebesgue bit is an added/secondary assumption. – Taylor Feb 22 '23 at 03:52
In Probability Theory, density functions are generally defined as Radon-Nikodym derivatives themselves, $\frac{dP}{dQ}$.
The likelihood function interprets these densities (R-N derivatives) as a function of the parameters, given some observed outcome. More explicitly, let $X$ be an absolutely continuous random variable with density $f(\cdot\mid\theta)$. Then, $$\mathcal{L}(\theta \mid x) = f(x \mid \theta).$$ In other words, the likelihood function evaluates the density at the observed $x$, viewed as a function of the parameters $\theta$. It equals the probability of observing $x$ only when $X$ is discrete.
The likelihood ratio is meant to assess the goodness-of-fit of two statistical models (with different parameters) given the same set of observations $x$, not two entirely different distributions. More explicitly, let $\Theta$ be the set of all possible parameters, and consider some subsets $\Theta_0, \Theta_1 \subseteq \Theta$. The likelihood ratio test statistic is then $$\mathcal{L}(\Theta_0,\Theta_1) = -2\log\frac{\sup_{\theta\in\Theta_0} \mathcal{L}(\theta)}{\sup_{\theta\in\Theta_1}\mathcal{L}(\theta)},$$ which is used to test the null hypothesis $\theta\in\Theta_0$.
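As a small illustration of the $-2\log$ ratio of suprema above, here is a sketch for one specific, assumed setup: i.i.d. $N(\mu, \sigma^2)$ observations with known $\sigma$, $\Theta_0 = \{\mu_0\}$ and $\Theta_1 = \mathbb{R}$. The data and function names are made up for the example. In this setup the supremum over $\Theta_1$ is attained at the sample mean, and the statistic reduces to $n(\bar{x}-\mu_0)^2/\sigma^2$, which the code checks numerically.

```python
import math

def log_likelihood(mu, xs, sigma=1.0):
    """Log-likelihood of N(mu, sigma^2) for i.i.d. observations xs."""
    n = len(xs)
    ss = sum((x - mu) ** 2 for x in xs)
    return -0.5 * n * math.log(2.0 * math.pi * sigma ** 2) - ss / (2.0 * sigma ** 2)

def lr_statistic(xs, mu0=0.0, sigma=1.0):
    """-2 log(sup over Theta_0 / sup over Theta_1), with Theta_0 = {mu0}, Theta_1 = R."""
    mu_hat = sum(xs) / len(xs)  # maximizer over Theta_1 is the sample mean
    return -2.0 * (log_likelihood(mu0, xs, sigma) - log_likelihood(mu_hat, xs, sigma))

xs = [0.3, -0.1, 0.8, 1.2, 0.4]  # made-up observations
stat = lr_statistic(xs)

# Closed form n * (xbar - mu0)^2 / sigma^2, here with mu0 = 0 and sigma = 1.
closed_form = len(xs) * (sum(xs) / len(xs)) ** 2
print(stat, closed_form)
```

Working on the log scale avoids underflow when the likelihood is a product over many observations.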
"Radon-Nikodym derivatives themselves" yes but then you should switch the notation instead of implicitly calling $Q$ Lebesgue measure – Taylor Apr 13 '20 at 16:14
Also, densities are not probability mass functions. They cannot be interpreted that way. – Taylor Apr 13 '20 at 16:14
Finally, the last expression is not the likelihood ratio, but it is a function of the likelihood ratio. The reason it is transformed is that when it is written in this way, it is asymptotically $\chi^2$ distributed, but this is irrelevant at the moment. – Taylor Apr 13 '20 at 16:15