
Let's assume I sample many means from some distribution (one that has a finite first raw moment). The distribution of those means should resemble the normal distribution.

I heard that the same works for medians, the variance, the range, and any other unbiased statistic.

So I draw a sample, calculate the statistic, and repeat this many times; the resulting values of the statistic form an approximately normal distribution.
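To make this concrete, here is a minimal sketch of that procedure (my own illustration, assuming an exponential parent distribution and a sample size of 50):

```python
import random
import statistics

# Illustration: draw many samples from a skewed (exponential) parent
# distribution and look at the distribution of their sample means.
# By the CLT, that sampling distribution should be close to normal.
random.seed(0)

n = 50          # size of each sample
reps = 20_000   # number of repeated samples

means = [statistics.fmean(random.expovariate(1.0) for _ in range(n))
         for _ in range(reps)]

# For Exp(1), the sample mean has expectation 1 and sd 1/sqrt(n).
mu = statistics.fmean(means)
sd = statistics.stdev(means)
print(round(mu, 2), round(sd, 2))  # close to 1.0 and 1/sqrt(50) ~ 0.14
```

A histogram of `means` would look close to a bell curve even though each individual draw comes from a heavily skewed distribution.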

But what about the samples themselves? Not their summary measures, but the raw data in those samples? If I sum them together (not the sum of the elements within each sample; I am not asking about sampling sums), will that resemble normality too, for some large N?

Because I'm confused. Wikipedia says in the first sentence: "when independent random variables are summed up, their properly normalized sum tends toward a normal distribution even if the original variables themselves are not normally distributed."

And then, a few sentences below, they immediately move on to sample means (in their case, to the standardized statistic, to obtain N(0,1)), not to sums of distributions.

But what if I don't want sample means, but the data itself? Or are they somehow related? What is the Central Limit Theorem about, then: the sampled statistics or the sums of distributions? Or are both called by the same name?

Why am I asking? Because I read that we observe the normal distribution so often in nature because many variables act together, and summing them produces normality.
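Here is a small simulation of that idea (my own sketch, assuming "act additively" means each observed value is the sum of many small, independent, non-normal factors):

```python
import random

# Sketch (assumed setup): each observed value is the SUM of many small,
# independent, non-normal "factors" -- here, heavily skewed 0/1 shocks.
# The resulting data look approximately bell-shaped even though each
# factor on its own is far from normal.
random.seed(1)

k = 200         # number of additive factors per observation
n_obs = 10_000  # number of observed values

def one_observation():
    # each factor is 1 with probability 0.1, else 0: very skewed alone
    return sum(1 if random.random() < 0.1 else 0 for _ in range(k))

data = [one_observation() for _ in range(n_obs)]

mean = sum(data) / n_obs                           # ~ k * 0.1 = 20
var = sum((x - mean) ** 2 for x in data) / n_obs   # ~ k * 0.1 * 0.9 = 18
print(round(mean, 1), round(var, 1))
```

Each observation here is Binomial(200, 0.1), which a histogram would show as roughly symmetric and bell-shaped, in line with the "many additive factors" explanation.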

  • Re "I heard:" be careful. The range, in particular, is usually not going to be approximated by a Normal distribution. Re "sum them together:" since the sum is directly proportional to the mean, sampling distributions of sums look just like sampling distributions of means. Re "data itself:" although it's unclear what you're trying to ask, note that the distribution of a large sample should closely resemble the distribution of its parent process or population, which could be anything. Re "in nature:" it is truly rare to observe a Normal distribution in any natural quantity. – whuber Jun 22 '22 at 19:11
  • In light of these remarks, it sounds like you would benefit from searching out our higher-voted posts on the Central Limit Theorem and its relatives. You can also look at the highest-voted posts with the central-limit-theorem tag. – whuber Jun 22 '22 at 19:12
  • Thank you. By "in nature" I mean lots of symmetric and bell-shaped distributions for various variables: not strictly normal, but resembling it, and not purely skewed like incomes. I should have said "approximately normal", bell-shaped. And I was told by a teacher that this is because many samples generated by natural phenomena are "summed" together in nature and many factors affect them additively, which forms symmetric distributions. I was curious whether this is about the CLT. The Wikipedia article literally starts with it: "their properly normalized sum tends toward a normal distribution". Is this incorrect, then? – Kiraaaaa Jun 22 '22 at 19:16
  • Maybe it should not be called "sums" but, I don't know, "combinations" of those samples? What should I call it if some phenomenon produces data many, many times, and when we measure and visualise the values we observe near-normality? Maybe I should say "mixtures"? My teacher said exactly "they act additively", but I don't know what that means precisely. – Kiraaaaa Jun 22 '22 at 19:21
  • It depends on how accurate you need your approximations to be. Perceptive data analysts (such as John Tukey) have long noted that most real-world data are predictably non-Normal in several ways. For instance, they almost invariably contain "outlying" data to which the Normal approximation assigns astronomically small probabilities. Because the CLT concerns a limit, whether that limit is a decent approximation in any given case is always a matter for investigation: it cannot be assumed. As I wrote before, your "combinations" needn't ever get close to Normal. – whuber Jun 22 '22 at 19:21
  • And if I replace "normality" with "approximate symmetry" and "bell-shaped"? So many things, when analysed, show this pattern. Similarly, we have skewness approximated by log-normality, where, as others say, the "natural factors act multiplicatively". Is this about multiplying the distributions? That's why I am trying to find an explanation for why "additiveness" results in "approximate normality" in nature and "multiplicativeness" in log-normality. I thought that the CLT says something about summing many different distributions, which finally may resemble normality. Thank you for your explanation. – Kiraaaaa Jun 22 '22 at 19:24
  • Under this link I read whuber saying: "For the same reasons Gaussian distributions may appear "in nature" as sums of many small near-independent perturbations, none of which need be Gaussian". I had this in mind when asking whether this is done by the CLT. https://stats.stackexchange.com/questions/204578/whats-the-story-behind-the-log-normal-distribution/295171#295171 – Kiraaaaa Jun 22 '22 at 20:21
  • I found the answer: the convolution (sum) of N independent distributions will converge in distribution to the normal one. https://rpubs.com/Shevek/convolution – Kiraaaaa Jun 23 '22 at 01:27
  • That statement contains two notable errors (some of which occur in your source and others of which misinterpret the source). First, it's not always true: there are fairly stringent requirements on the variances of those distributions. Second, that convolution usually does not converge: it has to be standardized first. These are important points that you can learn by reading some of the posts we have referred you to. – whuber Jun 23 '22 at 12:33
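whuber's point about standardization can be checked with a quick sketch (my own illustration, using Uniform(0,1) summands): the raw sum of N variables has a spread that grows like sqrt(N), so it cannot converge in distribution, whereas the standardized sum settles near N(0,1).

```python
import random
import statistics

# The raw sum S_N of N iid variables has sd sigma*sqrt(N), which grows
# without bound. The STANDARDIZED sum (S_N - N*mu) / (sigma*sqrt(N))
# has mean 0 and sd 1 for every N, and its shape tends to N(0,1).
random.seed(2)

mu, sigma = 0.5, (1 / 12) ** 0.5   # mean and sd of Uniform(0,1)

def standardized_sum(n):
    s = sum(random.random() for _ in range(n))
    return (s - n * mu) / (sigma * n ** 0.5)

reps = 5_000
for n in (5, 50, 500):
    z = [standardized_sum(n) for _ in range(reps)]
    # mean ~ 0 and sd ~ 1 for every n; the raw sums' sds would instead
    # be sigma * sqrt(n), i.e. 1.0, 2.0, 6.5 here and growing.
    print(n, round(statistics.fmean(z), 2), round(statistics.stdev(z), 2))
```

Note this sketch uses identically distributed summands with finite variance; as the comment above says, for non-identical summands additional conditions (such as Lindeberg's or Lyapunov's) are required.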

0 Answers