I am wondering whether the sampling distributions of summary statistics (quantiles, min, max) of a dataset drawn from different distributions (such as the normal and the exponential) follow the distribution of the dataset itself. I plotted histograms and did not see any consistent pattern across datasets. Any help would be appreciated.
You are looking for order statistics. The Wikipedia entry gives you the result for an underlying exponential distribution, and the proposed duplicate treats an underlying normal distribution. – Stephan Kolassa Sep 21 '20 at 15:04
Does this answer your question? Approximate order statistics for normal random variables – Stephan Kolassa Sep 21 '20 at 15:04
1 Answer
Let $X_1, \dots, X_n$ be a random sample from a continuous distribution with density function $f(x)$ that is continuous and nonzero at the $p$th percentile $x_p$ $(0 < p < 1).$ If $k/n \rightarrow p$ (with $k-np$ bounded), then the sequence of order statistics $X_{k:n}$ is asymptotically normal with mean $x_p$ and variance $c^2/n,$ where $c^2 = p(1-p)/[f(x_p)]^2.$ [From Bain & Engelhardt, 1992, 2e, Duxbury, p. 244.]
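As a rough illustration, the variance formula in the theorem can be evaluated directly for the median ($p = 0.5$) with $n = 100.$ This is only a sketch: `asymp_sd` is a hypothetical helper name (not part of base R), and it assumes the quantile and density functions are passed in as `qfun` and `dfun`.

```r
# Asymptotic standard deviation of the sample p-th quantile,
# sqrt(p * (1 - p) / n) / f(x_p), where x_p is the population quantile.
# `asymp_sd` is a hypothetical helper, not a base R function.
asymp_sd <- function(p, n, qfun, dfun) {
  x_p <- qfun(p)                      # population quantile x_p
  sqrt(p * (1 - p) / n) / dfun(x_p)   # c / sqrt(n)
}

sd_norm <- asymp_sd(0.5, 100, qnorm, dnorm)  # median of standard normal
sd_exp  <- asymp_sd(0.5, 100, qexp, dexp)    # median of exponential, rate 1
print(c(normal = sd_norm, exponential = sd_exp))
```

With these inputs the helper gives about 0.1253 for the normal case (that is, $\sqrt{\pi/2}/10$) and 0.1 for the exponential case, since $f(\log 2) = 1/2$ makes $c^2 = 1.$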
So for 'nice' distributions (with no discontinuities or 0-gaps) such as exponential or normal there is a "Central Limit Theorem" for quantiles (except the max and min).
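To see why the maximum is excluded, here is a small sketch for the exponential case. It relies on the standard result that the maximum of $n$ independent rate-1 exponentials, centred by $\log n,$ converges to a Gumbel distribution with CDF $\exp(-e^{-x})$ rather than to a normal one. The seed, the sample sizes, and the grid `x` are arbitrary choices for illustration.

```r
set.seed(2020)  # arbitrary seed, chosen only for reproducibility
n <- 100

# Maxima of 10^4 exponential samples of size n, centred by log(n)
M <- replicate(10^4, max(rexp(n))) - log(n)

# Compare the empirical CDF of the centred maxima with the Gumbel CDF
emp <- ecdf(M)
x <- c(-1, 0, 1, 2)
round(cbind(x, empirical = emp(x), gumbel = exp(-exp(-x))), 3)
```

The two CDF columns should agree closely, and the limiting shape is skewed, so no normal approximation applies to the maximum here.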
In particular, the median of a moderately large sample from $\mathsf{Norm}(\mu,\sigma)$ is approximately normal. [With 100,000 iterations results should be accurate to about 2 significant digits, but $n=100$ is too small for perfect convergence of results being simulated.]
```r
set.seed(912)
h = replicate(10^5, median(rnorm(100)))
mean(h); sd(h)
# [1] 0.0006078384
# [1] 0.1243622
```
And for an exponential population with mean 1 (median $\log(2) = 0.6931472$):
```r
set.seed(912)
H = replicate(10^5, median(rexp(100)))
mean(H); sd(H)
# [1] 0.6982845
# [1] 0.09972337
```
Because of skewness, medians of exponentials converge to the (symmetrical) normal distribution somewhat more slowly.
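One way to check the slower convergence numerically is to compare the sample skewness of the two sets of simulated medians. The sketch below uses a simple moment-based skewness; `skew` is a hypothetical helper, and 10^4 iterations are used only to keep it quick.

```r
set.seed(912)

# Simple moment-based sample skewness (hypothetical helper, not base R)
skew <- function(x) mean((x - mean(x))^3) / sd(x)^3

sk_norm <- skew(replicate(10^4, median(rnorm(100))))  # medians of normal samples
sk_exp  <- skew(replicate(10^4, median(rexp(100))))   # medians of exponential samples
print(c(normal = sk_norm, exponential = sk_exp))
```

The skewness for normal medians should be near zero, while the value for exponential medians should be clearly positive at $n = 100.$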
```r
par(mfrow=c(1,2))  # two panels side by side
hist(h, prob=TRUE, col="skyblue2", main="Medians of Normal")
curve(dnorm(x, mean(h), sd(h)), add=TRUE, col="red")  # fitted normal density
hist(H, prob=TRUE, col="skyblue2", main="Medians of Exponential")
curve(dnorm(x, mean(H), sd(H)), add=TRUE, col="red")  # fitted normal density
par(mfrow=c(1,1))  # restore single-panel layout
```
