
Suppose that I have a probability vector $p$, e.g. of size 10, and that I draw a multinomial sample of size $n$ from $p$. Does there exist a closed-form formula for the expected total variation distance between the sample's empirical frequency vector and $p$, in terms of $p$ and $n$?

Note: I know that simulation-based estimates are possible, but here I am asking for a closed-form formula for the expected TVD, or alternatively for other distances/divergences.
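For reference, here is a minimal Monte Carlo sketch of the simulation-based estimate the question alludes to (my own illustration, not part of the question); the vector `p` and sample size `n` are arbitrary example values:

```python
import numpy as np

# Monte Carlo estimate of E[TVD] between the empirical frequency
# vector of a multinomial(n, p) sample and the true vector p.
rng = np.random.default_rng(0)
p = np.full(10, 0.1)   # example probability vector of size 10
n = 100                # multinomial sample size
reps = 10_000          # number of simulated samples

counts = rng.multinomial(n, p, size=reps)        # reps x 10 count matrix
tvd = 0.5 * np.abs(counts / n - p).sum(axis=1)   # TVD for each replicate
print(tvd.mean())                                # Monte Carlo estimate of E[TVD]
```

The factor 0.5 is the usual normalization that makes TVD half the $L_1$ distance between the two probability vectors.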

Meta Note: I am a newbie to this site. I thought that as an author of this question I could upvote answers without the need for privileges, but I don't seem to be able to...

Caldym
  • Short answer: no. As far as alternative distances go, the chi-squared statistic is useful, its expectation can be computed (it is one less than the length of $p$), and under well-known conditions its distribution is closely approximated by a chi-squared distribution. Is that the kind of result you are seeking? – whuber Aug 31 '23 at 19:38
  • If anticipated sample sizes are of about 13 or less for a probability vector of size 10, then executing code to obtain the exact mean is doable. (For n=13 on an old desktop PC it takes about a minute to obtain the exact mean.) But for sample sizes larger than that, simulations are the way to go. – JimB Sep 01 '23 at 19:20
  • Thanks, people, for your answers! I was actually seeking a result that would help me get some idea of how many samples from a known multinomial distribution would be sufficient to approximate that distribution according to Total Variation Distance (TVD), without having to conduct a simulation. The simple/beautiful result mentioned by @whuber ("one less than the length of $p$") about the expectation of the chi-squared statistic (not distribution) is very nice. I wonder if people would be interested to see a proof (which is pretty simple, in fact) in some other post. – Caldym Sep 12 '23 at 20:27
  • A proof might be nice. There is a one-liner because it's immediate (from the facts that the variance of a Binomial$(k,p)$ variable is $kp(1-p)$ and its mean is $kp$) that the expectation of each value's contribution to the chi-squared statistic is $kp_i(1-p_i)/(kp_i)=1-p_i$ and, because $\mathbf p$ is a probability vector, the sum of these expectations is one less than the length of $\mathbf p.$ – whuber Sep 12 '23 at 20:38
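The identity in the comments above (expected Pearson chi-squared statistic = number of cells minus one, for any sample size) is easy to check numerically; here is a small sketch with an arbitrary example vector `p`:

```python
import numpy as np

# Numeric check: E[chi-squared statistic] = len(p) - 1 for a
# multinomial sample, regardless of the sample size n.
rng = np.random.default_rng(1)
p = np.array([0.5, 0.2, 0.15, 0.1, 0.05])  # example vector, 5 cells
n = 50
reps = 200_000

counts = rng.multinomial(n, p, size=reps)
expected = n * p
chi2 = ((counts - expected) ** 2 / expected).sum(axis=1)  # Pearson statistic
print(chi2.mean())  # should be close to len(p) - 1 = 4
```

Each cell contributes $n p_i(1-p_i)/(n p_i) = 1-p_i$ in expectation, and these sum to $5 - 1 = 4$ here.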

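For small $n$ and short $p$, the exact expected TVD that JimB mentions can be computed by enumerating every possible count vector; a sketch of that approach (my own illustration of the idea, exponential in the length of $p$, so feasible only for tiny cases):

```python
from math import factorial

def exact_expected_tvd(p, n):
    """Exact E[TVD] between empirical frequencies and p, by summing
    TVD * pmf over every multinomial outcome (small cases only)."""
    k = len(p)

    def compositions(total, parts):
        # All count vectors of length `parts` summing to `total`.
        if parts == 1:
            yield (total,)
            return
        for first in range(total + 1):
            for rest in compositions(total - first, parts - 1):
                yield (first,) + rest

    mean_tvd = 0.0
    for counts in compositions(n, k):
        coef = factorial(n)
        prob = 1.0
        for c, pi in zip(counts, p):
            coef //= factorial(c)   # multinomial coefficient, built stepwise
            prob *= pi ** c
        prob *= coef                # pmf of this count vector
        tvd = 0.5 * sum(abs(c / n - pi) for c, pi in zip(counts, p))
        mean_tvd += prob * tvd
    return mean_tvd

print(exact_expected_tvd([0.5, 0.3, 0.2], 4))
```

As a sanity check, for $p=(0.5,0.5)$ and $n=1$ the two outcomes each have probability $0.5$ and TVD $0.5$, so the exact expectation is $0.5$.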
0 Answers