For a fixed value $m$, draw $k$ samples from a normal distribution and select the one, say $X$, that is closest to $m$. What distribution does $X$ follow? It seems similar to an extreme value distribution, but I cannot figure it out.
1 Answer
Let's solve this for all distributions, normal or not.
To this end, let the distribution function be $F$ and let $\epsilon \ge 0$ be any possible distance to $m.$ The event "$X$ is within distance $\epsilon$ of $m$" is the interval $X\in[m-\epsilon, m+\epsilon].$ According to the definition of $F,$ this can be expressed as
$$\Pr(|X-m|\le \epsilon) = F(m+\epsilon) - F(m-\epsilon) + \Pr(X=m-\epsilon).$$ (For a Normal distribution, or any continuous distribution, that last term is zero and can be ignored.)
The chance this does not occur is its complement,
$$\Pr(|X-m|\gt \epsilon) = 1- \Pr(|X-m|\le \epsilon).$$
For a random sample of $n$ independent values, these probabilities multiply (that's the definition of independence). Consequently, the chance that all values in the sample are farther than $\epsilon$ from $m$ is
$$\Pr(|X_i-m|\gt \epsilon\ \forall i) = \left[1- \Pr(|X-m|\le \epsilon)\right]^n.$$
Its complement therefore is the chance that at least one of the $X_i$ is within distance $\epsilon$ of $m.$ This is precisely the distribution function of the nearest distance. Writing $E = \min|X_i-m|$ for that distance, we have found
$$F_E(\epsilon) = \Pr(E\le \epsilon) = 1 - \left[1- \Pr(|X-m|\le \epsilon)\right]^n.$$
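A simulation is one way to check this formula. The following is only a sketch for the Normal case: the function name `p_nearest`, the seed, and the values of $m$, $n$, and $\epsilon$ are illustrative choices, not part of the derivation.

```r
# Closed-form distribution function of the nearest distance E = min |X_i - m|
# for Normal samples. The distribution is continuous, so the Pr(X = m - eps)
# term vanishes.
p_nearest <- function(eps, m, n, mu = 0, sigma = 1) {
  p_within <- pnorm(m + eps, mu, sigma) - pnorm(m - eps, mu, sigma)
  1 - (1 - p_within)^n
}

set.seed(17)                    # arbitrary seed, for reproducibility only
m <- 1; n <- 5; eps <- 0.25     # illustrative values
n_sim <- 1e5

# Each column is one sample of size n; record the nearest distance per sample.
x <- matrix(rnorm(n * n_sim), nrow = n)
nearest <- apply(abs(x - m), 2, min)

# The two numbers should agree up to Monte Carlo error.
c(simulated = mean(nearest <= eps), formula = p_nearest(eps, m, n))
```

With $10^5$ simulated samples the standard error of the simulated proportion is below $0.002$, so agreement to two decimal places is what one would expect.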
This is a fully general answer. When $F$ is continuous at $m\pm\epsilon$ and has a density function $f$, though, we can (a) drop the term $\Pr(X=m-\epsilon)$ and (b) differentiate the expression to obtain a density for $E,$
$$f_E(\epsilon) = \frac{\mathrm d}{\mathrm{d}\epsilon} F_E(\epsilon) = n\left[1 - F(m+\epsilon) + F(m-\epsilon)\right]^{n-1} \left(f(m+\epsilon) + f(m-\epsilon)\right).$$
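A quick numerical check can guard against sign slips in a derivative like this one. The sketch below codes $F_E$ and its derivative $n\left[1 - F(m+\epsilon) + F(m-\epsilon)\right]^{n-1}\left(f(m+\epsilon) + f(m-\epsilon)\right)$ for the Normal case, then confirms that the density integrates to $1$ and matches a finite difference of $F_E$. The names `p_nearest` and `d_nearest` and the values of $m$, $n$, and $\epsilon$ are hypothetical, chosen only for illustration.

```r
# Distribution function and density of the nearest distance for Normal samples.
p_nearest <- function(eps, m, n, mu = 0, sigma = 1) {
  1 - (pnorm(m - eps, mu, sigma) +
       pnorm(m + eps, mu, sigma, lower.tail = FALSE))^n
}
d_nearest <- function(eps, m, n, mu = 0, sigma = 1) {
  n * (pnorm(m - eps, mu, sigma) +
       pnorm(m + eps, mu, sigma, lower.tail = FALSE))^(n - 1) *
    (dnorm(m - eps, mu, sigma) + dnorm(m + eps, mu, sigma))
}

m <- 2; n <- 20                 # illustrative values

# (1) A density must integrate to 1 over eps >= 0.
total <- integrate(d_nearest, 0, Inf, m = m, n = n)$value

# (2) A central finite difference of the distribution function should
#     reproduce the density.
eps <- 0.3; h <- 1e-5
fd <- (p_nearest(eps + h, m, n) - p_nearest(eps - h, m, n)) / (2 * h)

c(integral = total, finite_difference = fd, density = d_nearest(eps, m, n))
```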
Here are some plots of $f_E$ for various sample sizes from the standard Normal distribution.
It all makes sense: as you look from left to right, the sample size increases and therefore the chance of being close to any given $m$ increases. As $m$ increases from $0$ (the mode) to $4$ (far out into the right tail), the chance of being close to $m$ decreases, so the typical nearest distance to $m$ grows.
In a similar fashion you can write the (more complicated) formula for the signed distance between the nearest $X$ and $m.$ Adding $m$ to this will produce a distribution of the nearest $X,$ if that's what you want.
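For a continuous $F$, one way to get there directly is to note that the nearest value equals $x$ when one of the $n$ sample values falls at $x$ and the other $n-1$ all lie farther from $m$ than $|x-m|$. That argument suggests the density $n\,f(x)\left[1 - F(m+|x-m|) + F(m-|x-m|)\right]^{n-1}$. The sketch below implements it for the Normal case and compares it with a simulation; the function name `d_nearest_value`, the seed, and the parameter values are hypothetical choices for illustration.

```r
# Density of the sample value nearest to m (assumes a continuous distribution).
d_nearest_value <- function(x, m, n, mu = 0, sigma = 1) {
  d <- abs(x - m)                                  # distance of x from m
  p_farther <- pnorm(m - d, mu, sigma) +
    pnorm(m + d, mu, sigma, lower.tail = FALSE)    # chance one value is farther than d
  n * dnorm(x, mu, sigma) * p_farther^(n - 1)
}

set.seed(17)                    # arbitrary seed, for reproducibility only
m <- 1; n <- 5; n_sim <- 1e5    # illustrative values
x <- matrix(rnorm(n * n_sim), nrow = n)

# For each sample (column), pick the value whose distance to m is smallest.
idx <- apply(abs(x - m), 2, which.min)
nearest_value <- x[cbind(idx, seq_len(n_sim))]

# Compare Pr(nearest value <= m) from the simulation with the integrated density.
c(simulated = mean(nearest_value <= m),
  formula   = integrate(d_nearest_value, -Inf, m, m = m, n = n)$value)
```

The density has a kink at $x = m$, which is why the integral is taken up to $m$ rather than across it.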
This is the R code used to generate the figure. It implements $F_E$ as pnormclosest and $f_E$ as dnormclosest. They are readily modified to handle any distribution $F$ by replacing pnorm and dnorm with the distribution and density functions of $F$, respectively.
# Distribution function of the nearest distance to m in a Normal sample of size n.
pnormclosest <- function(x, m, n=1, mu=0, sigma=1) {
  1 - (pnorm(m-x, mu, sigma) + pnorm(m+x, mu, sigma, lower.tail=FALSE))^n
}
# Density of the nearest distance (the derivative of pnormclosest in x).
dnormclosest <- function(x, m, n=1, mu=0, sigma=1) {
  n * (pnorm(m-x, mu, sigma) + pnorm(m+x, mu, sigma, lower.tail=FALSE))^(n-1) *
    (dnorm(m-x, mu, sigma) + dnorm(m+x, mu, sigma))
}

ns <- c(1, 2, 20, 100)  # sample sizes, one panel each
ms <- c(0, 1, 2, 4)     # target values m, one curve each
par(mfrow = c(1, length(ns)))
for (n in ns) {
  for (m in ms) curve(dnormclosest(x, m, n), 0, 3, ylim=c(0,2), add=m != 0,
                      lwd=2, lty=abs(m)+1, col=hsv(abs(m)/(max(abs(ms))+1), .9, .8),
                      xlab="Distance", ylab="Density",
                      main=paste0("Sample size ", n))
  legend("topright", bty="n", title="m", legend=ms, lty=abs(ms)+1, lwd=2,
         col=hsv(abs(ms)/(max(abs(ms))+1), .9, .8))
}
par(mfrow=c(1,1))
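As a sketch of that modification, the distribution and density functions can be passed in as arguments instead of being hard-coded. The names `pclosest` and `dclosest` and the arguments `pfun` and `dfun` are hypothetical, and the Exponential example is only an illustration. The upper tail is written as `1 - pfun(...)` on the assumption that a user-supplied distribution function may not accept `lower.tail`.

```r
# Hypothetical generalization: pass the distribution function `pfun` and the
# density `dfun` as arguments; their extra parameters travel through `...`.
pclosest <- function(x, m, n = 1, pfun = pnorm, ...) {
  1 - (pfun(m - x, ...) + 1 - pfun(m + x, ...))^n
}
dclosest <- function(x, m, n = 1, pfun = pnorm, dfun = dnorm, ...) {
  n * (pfun(m - x, ...) + 1 - pfun(m + x, ...))^(n - 1) *
    (dfun(m - x, ...) + dfun(m + x, ...))
}

# Example: nearest distance to m = 1 among 10 draws from an Exponential
# distribution with rate 2.
curve(dclosest(x, m = 1, n = 10, pfun = pexp, dfun = dexp, rate = 2), 0, 1.5,
      xlab = "Distance", ylab = "Density")
```

With the defaults `pfun = pnorm` and `dfun = dnorm`, these should reduce to pnormclosest and dnormclosest for the standard Normal distribution.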
- Just curious, do you have $Pr(|X-m|>\epsilon)$ instead of $Pr(|X-m|<\epsilon)$? I mean $Pr(|X-m|<\epsilon)\implies Pr(-\epsilon<X-m<\epsilon) = F(m+\epsilon) - F(m-\epsilon)$, which is what you have. On the other hand, $Pr(|X-m|>\epsilon)\implies P(X-m>\epsilon \cup X-m<-\epsilon) = 1-F(m+\epsilon)+F(m-\epsilon)$ – Onyambu Dec 29 '21 at 18:12
- That's really helpful! Appreciate it! – Rafa Zhang Dec 31 '21 at 08:44
- @r.e.s. That's correct, thank you: the formula in the code is correct (as the plots help demonstrate), but was reproduced in my post with the wrong sign. I'll fix that. – whuber Dec 31 '21 at 16:58
