
I encountered a strange issue when performing Mahalanobis distance matching. Let's say I have one treated unit with the following values on two variables: $T:(17, 4)$. I have two control units with values $A:(16, 3)$ and $B:(15, 3)$. Intuitively, it seems like $T$ should be closer to $A$ than to $B$, and that would be true regardless of any affine transformation of the variables. The covariance matrix for my variables (which is computed in the full sample that includes many more observations) is $$ \Sigma = \begin{bmatrix}25 & 9\\9 & 5\end{bmatrix} $$

When I compute the Mahalanobis distance $d(.,.)$, I find that $d(T,A) = 0.522$ and $d(T,B)=0.452$; that is, $B$ is closer to $T$ than $A$ is on the Mahalanobis distance. This doesn't make much sense to me; intuitively, what is going on here?

Some R code to play around with:

T <- c(17, 4)
A <- c(16, 3)
B <- c(15, 3)

S <- matrix(c(25, 9, 9, 5), nrow = 2)

# mahalanobis() returns squared distances, so take the square root
mahalanobis(T, A, S) |> sqrt()
#> [1] 0.522233

mahalanobis(T, B, S) |> sqrt()
#> [1] 0.452267
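
For what it's worth, the same numbers fall out of the definition directly; the small helper below (d_mahal is just an illustrative name) evaluates the quadratic form in $\Sigma^{-1}$ with solve():

# Mahalanobis distance from its definition: sqrt((x - y)' S^{-1} (x - y))
d_mahal <- function(x, y, S) {
  diff <- x - y
  sqrt(drop(t(diff) %*% solve(S) %*% diff))
}

d_mahal(T, A, S)
#> [1] 0.522233

d_mahal(T, B, S)
#> [1] 0.452267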

Noah
  • The correlation between the two dimensions is very high, creating a "ridge" that runs more or less diagonally. $(16,3)$ is off to one side of the ridge, so it is less probable (Gaussian) or farther away (Mahalanobis) than $(17,4)$, which is closer to the "ridgeline". – jbowman Jun 28 '23 at 00:31
  • @jbowman The question is why $(15,3)$ is closer to $(17,4)$ on the Mahalanobis distance than $(16,3)$ is. Surely $(15,3)$ is even less probable and farther away from $(17,4)$, no? – Noah Jun 28 '23 at 01:25
  • Why not draw a picture? More specifically, draw the points $T,A,$ and $B$ on a pair of axes. Then, with $T$ at the center, start drawing concentric ovals that represent the level sets of the Mahalanobis distance with respect to $T$. You should then find that $B$ lies on an oval that is inside the oval that $A$ lies on. – mhdadk Jun 28 '23 at 01:54
  • No. It's farther away if $S$ is diagonal, but $S$ is far from diagonal. Look at it this way: if $x_1 = 16$, we expect $x_2 = 3.6$, but it's not - which increases the distance. If $x_1 = 17$, we expect $x_2 = 4.2$, which is pretty close. – jbowman Jun 28 '23 at 02:01
  • @mhdadk That sounds like a great demonstration, maybe you should make that an answer! I don't know what "level sets" are so hopefully your answer will explain that. I still don't understand how 15 could ever be closer to 17 than 16 is to 17 in any coordinate system, and that's why I have asked the question. – Noah Jun 28 '23 at 02:04
  • It's not that 15 is closer to 17, it's that the pair (17,4) is closer than the pair (16,3). You can't just look at one dimension, you have to look at both. – jbowman Jun 28 '23 at 02:06
  • @jbowman It sounds like you think I'm asking why $B$ is closer to $A$ than $T$ is to $A$, but I'm not. I'm asking why $B$ is closer to $T$ than $A$ is to $T$, despite each coordinate of $B$ being at least as far from the corresponding coordinate of $T$ as $A$'s is. It would be really helpful if you posted an answer explaining your thoughts. – Noah Jun 28 '23 at 02:07
  • I know what you're asking, and the accepted answer illustrates, in a picture that is worth 1000+ words, what I was trying to say :) – jbowman Jun 28 '23 at 02:32
  • You can easily and definitively answer this question yourself, Noah, simply by carrying out the graphical operations sketched at https://stats.stackexchange.com/a/62147/919. – whuber Jun 28 '23 at 14:21
  • I'm just remaking jbowman's point a little differently, but notice that for a jointly Gaussian $(X,Y)$ with covariance $\Sigma$ and mean $(17,4)$, the conditional mean of $X$ given $Y = 3$ is $17 + (9/5)(3-4) = 15 + 1/5.$ Of course, $15$ is much closer to this conditional mean than $16$, so given $Y = 3,$ $X = 15$ is much more likely than $X = 16.$ The ordering of Mahalanobis distances reflects this fact. – stochasticboy321 Jun 30 '23 at 08:18
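
stochasticboy321's conditional-mean argument is easy to check numerically; a minimal sketch, assuming the jointly Gaussian framing of that comment (cond_mean_x is just an illustrative helper):

# E[X | Y = y] for a bivariate Gaussian with mean mu and covariance S
cond_mean_x <- function(y, mu, S) mu[1] + S[1, 2] / S[2, 2] * (y - mu[2])

cond_mean_x(3, mu = c(17, 4), S = S)
#> [1] 15.2

# 15 (B's first coordinate) is much closer to 15.2 than 16 (A's) is,
# which is why B ends up closer to T in the Mahalanobis metric.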

3 Answers


"Why not draw a picture?" asks @mhdadk. Why not indeed?

Here are contours of the Mahalanobis distance / Gaussian likelihood centred at $T: (17, 4)$ (open circle), together with the two points $A: (16, 3)$ and $B: (15, 3)$ (filled circles). You can see that the point at $(15, 3)$ is closer than the one at $(16, 3)$ in this metric.

[Figure: nested Mahalanobis-distance ellipses centred at T, with A and B shown as filled points]

# Reuses S and T from the question's code
library(ellipse)

# Set up the plot region using the outermost (60%) contour
plot(ellipse(S, centre = T, level = 0.6), type = "n")

# T as an open circle, A and B as filled circles
points(c(16, 15, 17), c(3, 3, 4), pch = c(19, 19, 1))

# Shade nested Mahalanobis contours (probability levels 0.6 down to 0.1) around T
for (lev in seq(0.6, 0.1, by = -0.1)) {
  polygon(ellipse(S, centre = T, level = lev), col = "#AA00AA20", border = NA)
}
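
To attach numbers to these contours: for a bivariate normal, the squared Mahalanobis distance follows a chi-squared distribution with 2 degrees of freedom, so the probability level of the ellipse passing through each point can be computed directly (a quick check, reusing S, T, A, and B from the question):

# Probability content of the ellipse through each point;
# B sits on a smaller (inner) contour than A
pchisq(mahalanobis(A, center = T, cov = S), df = 2)  # about 0.127 (A's contour)
pchisq(mahalanobis(B, center = T, cov = S), df = 2)  # about 0.097 (B's contour)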
Thomas Lumley

A very qualitative but visual explanation: since your covariance matrix is not diagonal, the distribution of the samples is tilted with respect to the coordinate axes. Both $(17,4)$ and $(15,3)$ lie close to its main axis and are thus close to each other under the Mahalanobis distance. By contrast, $(16,3)$ lies closer to the "outside" of the sample cloud.

[Figure: tilted sample cloud with the points $(17,4)$, $(16,3)$, and $(15,3)$ marked]
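
One way to reproduce such a cloud is to simulate from a bivariate normal with the question's covariance matrix; a minimal sketch using MASS::mvrnorm, with the mean placed at $T$ purely for illustration:

library(MASS)
set.seed(1)

# Simulate a tilted cloud with the question's covariance, centred at T
cloud <- mvrnorm(n = 2000, mu = c(17, 4), Sigma = matrix(c(25, 9, 9, 5), nrow = 2))

plot(cloud, col = "grey", xlab = "x1", ylab = "x2", asp = 1)

# Mark T (open circle) and the two controls A and B (filled circles)
points(rbind(c(17, 4), c(16, 3), c(15, 3)), pch = c(1, 19, 19))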

Camille Gontier

The other answers provided excellent graphical explanations. In this answer, I'll provide a numerical one.

Given a covariance matrix $\Sigma$, the Mahalanobis distance between two points $x$ and $y$ in $\mathbb R^n$ is $$\sqrt{(x-y)^T \Sigma^{-1}(x-y)}$$ Because every covariance matrix is positive semi-definite, its square root always exists and is symmetric. Let's call this square root $\Sigma^{1/2}$, so that $\Sigma = \Sigma^{1/2}\Sigma^{1/2}$. Let's also assume that $\Sigma$ is positive-definite, so that $\Sigma^{-1}$ exists and equals $\Sigma^{-1/2}\Sigma^{-1/2}$, where $\Sigma^{-1/2}$ is the inverse of $\Sigma^{1/2}$. The Mahalanobis distance then becomes \begin{align} \sqrt{(x-y)^T \Sigma^{-1}(x-y)} &= \sqrt{(x-y)^T\Sigma^{-1/2}\Sigma^{-1/2}(x-y)} \\ &= \sqrt{\left(\Sigma^{-1/2}(x-y)\right)^T\left(\Sigma^{-1/2}(x-y)\right)} \\ &= \sqrt{z^T z} \\ &= \|z\|_2 \end{align} where $z = \Sigma^{-1/2}(x-y)$. The Mahalanobis distance is therefore nothing but the Euclidean norm of the "whitened" version of $x-y$: if $x-y$ has covariance $\Sigma$, then $z = \Sigma^{-1/2}(x-y)$ has covariance $I$.

Going back to our case, let $$\Sigma = \begin{bmatrix}25 & 9\\9 & 5\end{bmatrix}$$ Then (computed using MATLAB's sqrtm function), $$\Sigma^{1/2} \approx \begin{bmatrix}4.8091 & 1.3683\\1.3683 & 1.7686\end{bmatrix}$$ and $$\Sigma^{-1/2} \approx \begin{bmatrix}0.2666 & -0.2063\\-0.2063 & 0.7250\end{bmatrix}$$ We can then compute $z_A = \Sigma^{-1/2}(T - A)$ and $z_B = \Sigma^{-1/2}(T - B)$ as \begin{align} T - A &= \begin{bmatrix}17 \\ 4\end{bmatrix} - \begin{bmatrix}16 \\ 3\end{bmatrix} = \begin{bmatrix}1 \\ 1\end{bmatrix} \\ z_A &= \Sigma^{-1/2}(T - A) \approx \begin{bmatrix}0.0604 \\ 0.5187\end{bmatrix} \\ T - B &= \begin{bmatrix}17 \\ 4\end{bmatrix} - \begin{bmatrix}15 \\ 3\end{bmatrix} = \begin{bmatrix}2 \\ 1\end{bmatrix} \\ z_B &= \Sigma^{-1/2}(T - B) \approx \begin{bmatrix}0.3270 \\ 0.3125\end{bmatrix} \end{align} By computing $\|z_A\|$ and $\|z_B\|$, we can see that $\|z_A\| > \|z_B\|$.
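
The same numbers can be reproduced in R with an eigendecomposition of $\Sigma$ (a sketch; the sqrtm function in the expm package would give the same matrices):

S <- matrix(c(25, 9, 9, 5), nrow = 2)

# Symmetric inverse square root of S from its eigendecomposition
e <- eigen(S, symmetric = TRUE)
S_inv_sqrt <- e$vectors %*% diag(1 / sqrt(e$values)) %*% t(e$vectors)

z_A <- S_inv_sqrt %*% c(1, 1)  # whitened T - A
z_B <- S_inv_sqrt %*% c(2, 1)  # whitened T - B

sqrt(sum(z_A^2))
#> [1] 0.522233
sqrt(sum(z_B^2))
#> [1] 0.452267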

mhdadk