
I'm trying to understand this answer on Earth Mover's Distance, especially the first sentence (below), without having deep statistical knowledge. I think the word "marginals" is the biggest stumbling block. I can generally only find definitions for it as an adjective, and Andrew Gelman notes that the statistical use of "marginal" is the opposite of its use in other fields.

$\DeclareMathOperator\EMD{\mathrm{EMD}} \DeclareMathOperator\E{\mathbb{E}} \DeclareMathOperator\N{\mathcal{N}} \DeclareMathOperator\tr{\mathrm{tr}} \newcommand\R{\mathbb R}$The earth mover's distance can be written as $\EMD(P, Q) = \inf \E \lVert X - Y \rVert$, where the infimum is taken over all joint distributions of $X$ and $Y$ with marginals $X \sim P$, $Y \sim Q$.

Further, what does "marginal $X \sim P$" convey compared to just $X \sim P$?

A plain English version of the whole where clause would be great, too.

xan
  • 8,898

2 Answers

7

$$ \begin{array}{|c|cc|l|} \hline X \backslash Y & 0 & 1 & \\ \hline 0 & 0.1 & 0.2 & 0.3 \\ 1 & 0.3 & 0.4 & 0.7 \\ \hline & 0.4 & 0.6 & 1 \\ \hline \end{array} $$ The table above (rows indexed by $X$, columns by $Y$) means \begin{align} \Pr(X=0\ \&\ Y=0) = 0.1 & & & \Pr(X=0\ \&\ Y=1) = 0.2 \\ \Pr(X=1\ \&\ Y=0) = 0.3 & & & \Pr(X=1\ \&\ Y=1) = 0.4 \end{align} The right margin shows the row sums: $0.1+0.2=0.3$ and $0.3+0.4=0.7$.

The bottom margin shows the column sums: $\begin{array}{r} 0.1 \\ {} \underline{+\ 0.3} \\ {} = 0.4 \end{array}\ \ \ $ and $\begin{array}{r} 0.2 \\ {} \underline{+\ 0.4} \\ {}=0.6 \end{array}$

Thus we have $\Pr(X=0) = 0.3$ and $\Pr(X=1) = 0.7$. That is the marginal distribution of $X$.

And $\Pr(Y=0) = 0.4$ and $\Pr(Y=1) = 0.6$. That is the marginal distribution of $Y$.
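
To see the same bookkeeping in code, here is a minimal sketch in Python (using numpy; the array is just the joint table above) that recovers both marginals by summing across each margin:

```python
import numpy as np

# Joint distribution of (X, Y): rows are X = 0, 1; columns are Y = 0, 1.
joint = np.array([[0.1, 0.2],
                  [0.3, 0.4]])

# Marginal of X: sum each row (i.e. sum out Y).
p_x = joint.sum(axis=1)   # -> [0.3, 0.7]

# Marginal of Y: sum each column (i.e. sum out X).
p_y = joint.sum(axis=0)   # -> [0.4, 0.6]

print("P(X=0), P(X=1):", p_x)
print("P(Y=0), P(Y=1):", p_y)
```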

  • Thanks for the simple explanation. If you can continue a little, how is a marginal distribution different from a distribution? That is, is "marginal" just a description of the source, or does it affect how the distribution might be used? – xan Mar 04 '17 at 14:20
  • 2
    @xan, that's a good question (you could edit it into the body of your question above). The way I would describe it is that it implies you are thinking about the situation differently: $X$, say, is a subset of a larger set of variables, but for your present purposes (whatever they may be), you are integrating out the others to focus on $X$. – gung - Reinstate Monica Mar 04 '17 at 16:54
  • 2
    @xan : A marginal distribution is of course a distribution. And "the marginal distribution of $X$" is the same thing as what would usually be meant by "the distribution of $X$." The only occasion for the use of the word "marginal" is in contexts where one wants to explicitly distinguish it from the joint or conditional distributions. – Michael Hardy Mar 04 '17 at 17:47
  • @MichaelHardy +1 – Taylor Mar 04 '17 at 19:32
5

In this case "marginal" is short for marginal distribution. If you have a distribution for a few random variables, that's usually termed the "joint" (distribution). When you look at individual components of this collection of random variables, you usually sum or integrate out the undesired portion, and what's left is called the "marginal."

If an arbitrary joint distribution/measure is $$ L(A,B) = \mathbb{P}(X \in A, Y \in B), $$ then the marginals of $X$ and $Y$, respectively, would be $$ P(A) = L(A,\Omega) \hspace{10mm} \text{and} \hspace{10mm} Q(B) = L(\Omega,B), $$ where $\Omega$ is the whole set of values that $X$ and $Y$ can take, so that $\{X \in \Omega\}$ and $\{Y \in \Omega\}$ are sure events and only the condition on the other variable remains.

If $X$ and $Y$ are continuous, then $L(dx,dy) = \ell(x,y)\,dx\,dy$, $P(dx) = p(x)\,dx$ and $Q(dy) = q(y)\,dy$, and the marginal densities would be $$ p(x) = \int \ell(x,y)\,dy \hspace{10mm} \text{and} \hspace{10mm} q(y) = \int \ell(x,y)\,dx. $$

If $X$ and $Y$ are discrete, then $L(dx,dy) = \ell(x,y)$, $P(dx) = p(x)$ and $Q(dy) = q(y)$, and the marginal probability mass functions would be $$ p(x) = \sum_y \ell(x,y) \hspace{10mm} \text{and} \hspace{10mm} q(y) = \sum_x \ell(x,y). $$
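
As a concrete check, plug in the $2\times 2$ joint table from the other answer, i.e. $\ell(0,0)=0.1$, $\ell(0,1)=0.2$, $\ell(1,0)=0.3$, $\ell(1,1)=0.4$. The sums give $$ p(0)=0.1+0.2=0.3, \quad p(1)=0.3+0.4=0.7, \quad q(0)=0.1+0.3=0.4, \quad q(1)=0.2+0.4=0.6, $$ which are exactly the row and column margins of that table.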

So this metric takes the infimum over all possible joint distributions $L$ with these fixed marginals, which we are calling $P$ and $Q$.
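
To make "infimum over all possible $L$s" concrete, here is a minimal sketch in Python (using scipy, and borrowing the marginals $P=(0.3, 0.7)$ and $Q=(0.4, 0.6)$ from the $2\times 2$ example in the other answer) that solves the discrete problem as a small linear program over couplings and compares it with scipy's built-in 1-D Wasserstein distance:

```python
import numpy as np
from scipy.optimize import linprog
from scipy.stats import wasserstein_distance

# Support points and marginals (taken from the 2x2 example above).
x = np.array([0.0, 1.0]); p = np.array([0.3, 0.7])   # X ~ P
y = np.array([0.0, 1.0]); q = np.array([0.4, 0.6])   # Y ~ Q

# Cost of moving a unit of mass from x_i to y_j.
cost = np.abs(x[:, None] - y[None, :])                # shape (2, 2)

# Decision variable: the joint table L[i, j] >= 0, flattened to length 4.
# Constraints: row sums of L equal p, column sums of L equal q.
n, m = cost.shape
A_eq, b_eq = [], []
for i in range(n):                     # marginal of X: sum_j L[i, j] = p[i]
    row = np.zeros((n, m)); row[i, :] = 1
    A_eq.append(row.ravel()); b_eq.append(p[i])
for j in range(m):                     # marginal of Y: sum_i L[i, j] = q[j]
    col = np.zeros((n, m)); col[:, j] = 1
    A_eq.append(col.ravel()); b_eq.append(q[j])

res = linprog(cost.ravel(), A_eq=np.array(A_eq), b_eq=np.array(b_eq),
              bounds=(0, None), method="highs")

print("EMD via linear program:", res.fun)
print("Best coupling L:\n", res.x.reshape(n, m))
print("scipy wasserstein_distance:", wasserstein_distance(x, y, p, q))
```

Both should come out to $0.1$: the cheapest coupling keeps as much mass in place as possible and moves the remaining $0.1$ of probability from the point $1$ to the point $0$.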

Taylor
  • 20,630