What is the point of measuring statistical distance?

Question

On pg. 378 of "Cryptography with Tamperable and Leaky Memory", Kalai et al. claim two probability distributions are $e(k)$ close if the distance between them is at most $e(k)$.

What is significance of two distributions X and Y being "close to" or "far from" each other? Why would anybody care, especially in cryptography?

I think i may know the answer to this question. But I'm not 100%. If X and Y have a statistical distance that is e(k)-close then depending on the value of "k", X and Y could be "statistically indistinguishable", which is a good thing in cryptography because you don't want an adversary to be able to easily guess which distribution you sampled a variable from. — user1068636, Oct 14 '13 at 20:49
Have you consulted the Wikipedia article on "statistical distance", which points out there are many different distances? Which definition does your reference use? — whuber, Oct 14 '13 at 20:59
@whuber - I think Kolmogorov–Smirnov statistical distance. But I'm not 100% sure. — user1068636, Oct 14 '13 at 21:32
Why would anyone in cryptography be interested? To take a punt: suppose you were interested in finding a particular string, from a big space of strings. Exploring the whole space would take a long time. Imagine you knew that the string was likely to be in a particular area of the space - that's cut your search time down a lot. That's what having a probability distribution that was close to the true probability distribution would get you. — conjectures, Feb 16 '14 at 17:15

score 1 · Answer 1 · 2014-08-07T01:33:25.003

The standard game that's played in cryptographic proofs is to say:

Pretend you can break (within reasonable time bounds) a cryptosystem using some procedure. Breakage here just means subverting some desirable property of the cryptosystem that you're trying to demonstrate holds.
Then show that using this procedure you can solve some really hard math problem that's assumed to be unsolvable by creating an instance of the cryptographic protocol that encodes the really hard math problem and whose breakage will give you the answer to the really hard math problem.
Show that the procedure won't be able to tell the difference between a real version of the cryptographic protocol and the one you've created to get it to solve the really hard math problem.
Conclude that since nobody knows how to solve the math problem it's unlikely that someone can break the cryptosystem.

I imagine that the distance between distributions relates to point 3. If the procedure can distinguish between the simulated problem and a real problem then your encoding of the math problem hasn't really accomplished much, since the procedure may only work on real instances.

score 0 · Answer 2 · answered Oct 15 '13 at 18:23

Remember that continuous probability distributions can be represented analytically, as a curve in the plane. Suppose we have two curves represented by f(x) and g(x). One way to define the distance between them is the greatest value of the absolute value of f(x)-g(x), the distance between the two ordinates at the abscissa x. If this value is small, then the functions are close. Otherwise, nothing is guaranteed. This forms the "statistical distance."

Often, we want to see if our guess of the true probability distribution is correct. One way of doing this is to estimate, using computational methods, the estimated density of the sample. Then, using the test above, we can see if our guess is correct.

This is just reason I can think of.

score 0 · Answer 3 · answered Dec 14 '13 at 22:28

If you look at their definition given in [7], it says,

$\mathrm{dist}(X,Y) = \max_{A \subseteq S} \left| \mathrm{Pr}(X \in A) - \mathrm{Pr}(Y \in A)\right|$

where $X$ and $Y$ are random variables over $S$.

This is a discrete case of the total variation distance.

What is the point of measuring statistical distance?

3 Answers3