2

This question was inspired by the comments on my other question. I proved the proposition below some time ago and could not find any issue with it.

It consists of two parts:

1. Where does my proof of $\overline{\xi_n}=\frac{\xi_1+...+\xi_n}{n}\xrightarrow{d} \mathcal{N}(a, \frac{\sigma^2}{n})$ break down?

2. If this proposition is incorrect, why is the fact so widely used in the many lecture notes I have found, and how should one correctly interpret such corollaries of the CLT? I will probably ask this as a separate post once I understand the first part.

First part.

I will assume we know from the CLT that if $\xi_1,\ldots,\xi_n$ are independent, identically distributed random variables with $\mathbb{E}\xi_1=a$ and $\operatorname{Var}\xi_1=\sigma^2$, then

$\frac{\xi_1+...+\xi_n - na}{\sqrt{n\sigma^2}} \xrightarrow{d} \eta,$ where $\eta \sim \mathcal{N}(0,1)$, so by definition of convergence in distribution,

$\forall x \in \mathbb{R}$, $\mathbb{P}[\frac{\xi_1+...+\xi_n - na}{\sqrt{n\sigma^2}}\leq x] \xrightarrow{n\to+\infty} \mathbb{P}[\eta \leq x].$

So we have $\mathbb{P}[\overline{\xi_n}\leq x]=\mathbb{P}[\frac{\sigma}{\sqrt{n}}\frac{\xi_1+...+\xi_n-na}{\sqrt{n\sigma^2}}+a \leq x] = \mathbb{P}[\frac{\xi_1+...+\xi_n - na}{\sqrt{n\sigma^2}}\leq\frac{x-a}{\sigma}\sqrt{n}]$.

But then, $\mathbb{P}[\frac{\xi_1+...+\xi_n - na}{\sqrt{n\sigma^2}}\leq\frac{x-a}{\sigma}\sqrt{n}] \xrightarrow{n\to+\infty}\mathbb{P}[\eta \le \frac{x-a}{\sigma}\sqrt{n}] = \mathbb{P}[\frac{\sigma\eta}{\sqrt{n}}+a\leq x]$

But we know that if $\eta \sim \mathcal{N}(0,1)$, then $\frac{\sigma\eta}{\sqrt{n}}+a$ has the same distribution as $\zeta \sim \mathcal{N}(a, \frac{\sigma^2}{n})$.

This means that $\mathbb{P}[\overline{\xi_n}\leq x] \xrightarrow{n\to+\infty} \mathbb{P}[\zeta\leq x]$ (I denote this proposition by (*)), and by definition of convergence in distribution,

$\overline{\xi_n}=\frac{\xi_1+...+\xi_n}{n}\xrightarrow{d} \mathcal{N}(a, \frac{\sigma^2}{n})$ (I denote this proposition by (**)).
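For concreteness, here is a small simulation comparing $\mathbb{P}[\overline{\xi_n}\leq x]$ with the CDF of $\mathcal{N}(a, \frac{\sigma^2}{n})$ at a fixed $x$ (a minimal Python sketch, not part of the proof; the exponential distribution with $a=\sigma=1$ is an arbitrary non-normal choice):

```python
# Sketch: compare P[mean(xi_1..xi_n) <= x] with the CDF of N(a, sigma^2/n) at x.
# Exponential xi_i with a = sigma = 1 -- an arbitrary non-normal example.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
a, sigma, x, reps = 1.0, 1.0, 1.2, 10_000

for n in [10, 100, 1_000]:
    means = rng.exponential(scale=a, size=(reps, n)).mean(axis=1)
    empirical = np.mean(means <= x)
    claimed = norm.cdf(np.sqrt(n) * (x - a) / sigma)  # CDF of N(a, sigma^2/n) at x
    print(f"n={n:5d}  empirical ~ {empirical:.3f}   N(a, sigma^2/n) CDF = {claimed:.3f}")

# The two columns agree closely at every finite n, yet both drift toward 1 as n
# grows (for x > a), rather than settling at a single limiting value.
```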

  • Your statement (1) is mathematical nonsense because the "$n$" on the right hand side is undefined. This problem emerges only in the last line of your work. – whuber Mar 26 '23 at 13:20
  • @whuber Thank you for the answer. Can you describe it in more detail? I understand there is a problem connected with real analysis, I assume. Maybe some examples could help me understand my errors, or links to books/discussions where this has already been discussed. – perepelart Mar 26 '23 at 13:27
  • After taking the limit as $n\to \infty$, there cannot be an $n$ in the limiting result. So it is not correct to write $\frac{\xi_1+...+\xi_n}{n}\xrightarrow{d} \mathcal{N}(a, \frac{\sigma^2}{n})$. – StubbornAtom Mar 26 '23 at 14:27
  • @StubbornAtom, thanks for your answer. So by definition $\xi_n \xrightarrow{d} \mathcal{L}_{\eta} \ \text{iff} \ P(\xi_n \leq x) \xrightarrow{n\to+\infty}P(\eta \leq x).$ On the right hand side of this definition we have just $F_{\eta}(x)$, and it cannot depend on $n$ because it is a fixed function. Am I right? – perepelart Mar 26 '23 at 15:26
  • You are mixing up what the CLT says with the weak law of large numbers (WLLN). The latter provides the correct result: the sample mean converges to $\mathcal N(a,0)$ which statistically-illiterate people refer to as "the constant $a$" or "the mean $a$". – Dilip Sarwate Mar 26 '23 at 16:04
  • @DilipSarwate, the WLLN says that $\overline{\xi} \xrightarrow{P} \mathbb{E}[\xi_1]$. How can it help when the $\xi_i$ are not normal random variables? – perepelart Mar 26 '23 at 16:36
  • Hint: As you yourself write in your question, $\mathbb E\xi_1 = a$, and so, $\mathbb E[\xi_i]$ also has value $a$, unless you are thinking that the square brackets mean what is nowadays called the floor function. Yes, the WLLN says that the sample mean converges to the constant $a$ (the population mean) but you are hung up on normality and what is the normal random variable to which the sample mean is converging, and so I gave the answer that I thought would most please you: it is the normal random variable with mean $a$ and variance $0$, called the constant $a$ by people like myself. – Dilip Sarwate Mar 26 '23 at 19:20
  • @DilipSarwate, thank you for your answer, but a normal random variable cannot have zero variance, by definition, so the constant $a$ does not have a normal distribution. I opened Joseph K. Blitzstein and Jessica Hwang, Introduction to Probability, Second Edition, page 235, and this book agrees with my definition of a normal random variable. – perepelart Mar 26 '23 at 21:32
  • There are good reasons, at times though not always (and certainly not in books titled Introduction to Probability), to regard a constant $a$ as an (honorary) normal random variable with variance $0$. Read the extensive set of comments following this answer to understand why such choices sometimes make sense. – Dilip Sarwate Mar 27 '23 at 03:47
  • @DilipSarwate, thank you for your answer. So in this case, do we need to say that a normal random variable is defined via its characteristic function rather than its density? I know that for multivariate normal random variables it is common practice to use the characteristic function as the definition, but I have never seen this done in the univariate case. – perepelart Mar 27 '23 at 06:11
  • To try to make the point others are making: forget random variables, just let $x_n = n + 1 / n$ be deterministic. It would not make sense to say that $\lim_{n \to \infty} x_n = n$ because $n$ went to infinity so it can't appear on the right-hand-side. Conversely, it would make sense to say that $(x_n - n) \to 0$ as $n \to \infty$. This is what the CLT is doing: we can't say $\bar \xi_n \to N(a, \sigma^2 / n)$ because the right-hand-side doesn't make sense, so we instead look at $\sqrt n(\bar \xi_n - a)$, which is analogous to looking at $(x_n - n)$ in the example I gave above. – guy Mar 29 '23 at 03:46
  • @guy, thanks! I hope I now understand what everybody was trying to explain to me. – perepelart Mar 29 '23 at 10:37

2 Answers

3

Background: This answer is prompted by the discussion between the OP and myself in the comments on the OP's question.

The title of the OP's question asks

Does the sample mean always have a normal distribution?

to which the answer is a resounding NO. The sample mean of $n$ i.i.d. random variables $X_1, X_2, \ldots, X_n$ with finite mean $\mu$ and finite variance $\sigma^2$ almost never has a normal distribution; the sole exception, the case that requires the use of "almost", is when the $X_k$'s are themselves i.i.d. normal random variables!

Turning to the OP's multiple misconceptions about what the Central Limit Theorem (CLT) actually says, the CLT is not a result about the sample mean $S_n =\displaystyle \frac 1n \sum_{k=1}^n X_k$ at all, but about a related quantity $\displaystyle\frac{1}{\sqrt{n}} \sum_{k=1}^n (X_k-\mu) = \sqrt{n} (S_n-\mu)$. Note that $S_n$ is a random variable with mean $\mu$ and variance $\displaystyle \frac{\sigma^2}{n}$ and thus $\sqrt{n} (S_n-\mu)$ is a zero-mean random variable with variance $\sigma^2$. The simplest form of the CLT claims that for large values of $n$, $P(\sqrt{n} (S_n-\mu)\leq x)$ has value approximately $\Phi\left(\dfrac{x}{\sigma}\right)$ (where $\Phi(\cdot)$ is the CDF of a standard normal random variable), and that the approximation gets better as $n$ increases. In fact, the limit as $n \to \infty$ of $P(\sqrt{n} (S_n-\mu)\leq x)$ is $\Phi\left(\dfrac{x}{\sigma}\right)$.
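This convergence is easy to see numerically. Below is a minimal sketch (not part of the argument); it assumes exponential $X_k$ with $\mu = \sigma = 1$, an arbitrary non-normal choice:

```python
# Sketch: empirical P(sqrt(n)*(S_n - mu) <= x) versus the limit Phi(x/sigma).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
mu, sigma, x, reps = 1.0, 1.0, 0.5, 20_000

for n in [5, 50, 500]:
    s_n = rng.exponential(scale=mu, size=(reps, n)).mean(axis=1)  # sample means S_n
    z = np.sqrt(n) * (s_n - mu)
    print(f"n={n:4d}  P(Z_n <= {x}) ~ {np.mean(z <= x):.3f}   Phi(x/sigma) = {norm.cdf(x / sigma):.3f}")
```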

But the OP doesn't care about the random variable $\sqrt{n} (S_n-\mu)$ at all; the OP wants to study the asymptotic behavior of $P(S_n \leq x)$, and so let's do just that. We have \begin{align} P(S_n \leq x)&= P(S_n-\mu \leq x-\mu)\\ &= P(\sqrt{n}(S_n-\mu) \leq \sqrt{n}(x-\mu))\\ &\approx \Phi\left(\dfrac{\sqrt{n}(x-\mu)}{\sigma}\right)\\ &= \Phi\left(\dfrac{x-\mu}{\frac{\sigma}{\sqrt{n}}}\right). \end{align}

So, it is appropriate to aver that for large $n$, the CDF of the sample mean $S_n$ is well-approximated by the CDF of a $\mathcal N\left(\mu, \dfrac{\sigma^2}{n}\right)$ random variable. What is not correct to conclude is what the OP has been insisting on all along in the face of numerous attempts by many people trying to correct his misconception: that in the limit as $n \to \infty$, the CDF of $S_n$ is converging to the CDF of a $\mathcal N\left(\mu, \dfrac{\sigma^2}{n}\right)$ random variable. Notice that

  • If $x<\mu$, then $$\lim_{n\to\infty}P(S_n \leq x) = \lim_{n\to\infty} \Phi\left(\dfrac{\sqrt{n}(x-\mu)}{\sigma}\right)$$ where the argument of $\Phi(\cdot)$ is diverging to $-\infty$ as $n \to \infty$ and so for all $x <\mu$, $\lim_{n\to\infty}P(S_n \leq x)=0.$

  • If $x>\mu$, then $$\lim_{n\to\infty}P(S_n \leq x) = \lim_{n\to\infty} \Phi\left(\dfrac{\sqrt{n}(x-\mu)}{\sigma}\right)$$ where the argument of $\Phi(\cdot)$ is diverging to $\infty$ as $n \to \infty$ and so for all $x > \mu$, $\lim_{n\to\infty}P(S_n \leq x)=1$.

We conclude that in the limit as $n\to \infty$, the CDF of $S_n$ has value $0$ for $x < \mu$, and value $1$ for $x>\mu$.

The limit distribution of $S_n$ is the constant $\mu$ and not a normal distribution at all, except insofar as the constant $\mu$ can be regarded as a degenerate normal random variable with mean $\mu$ and variance $0$.
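A simulation makes the degenerate limit visible (a sketch under the same arbitrary assumption of exponential $X_k$ with $\mu = 1$; the Gamma identity in the code is only a memory-saving shortcut for drawing sample means):

```python
# Sketch: P(S_n <= x) for x on either side of mu = 1, as n grows.
import numpy as np

rng = np.random.default_rng(2)
mu, reps = 1.0, 100_000

for n in [100, 1_000, 10_000]:
    # The sum of n i.i.d. Exponential(scale=mu) variables is Gamma(n, mu),
    # so sample means can be drawn directly without storing n columns.
    s_n = rng.gamma(shape=n, scale=mu, size=reps) / n
    below = np.mean(s_n <= 0.9)  # x < mu: tends to 0
    above = np.mean(s_n <= 1.1)  # x > mu: tends to 1
    print(f"n={n:6d}  P(S_n <= 0.9) ~ {below:.4f}   P(S_n <= 1.1) ~ {above:.4f}")
```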

Dilip Sarwate
  • Thanks for your answer. I knew that if we have normal random variables $\xi_1,...,\xi_n$, we can show that $\frac{\xi_1+...+\xi_n}{n}$ has a normal distribution using characteristic functions and induction. In my proof I was trying to say something about $\xi_i$ with an arbitrary distribution. So far three people have pointed out an error in my proof, but I still can hardly see exactly where it is. Will I be right if I say that my error is in the transition from (*) to (**) and that I misused the definition of convergence in distribution? – perepelart Mar 27 '23 at 06:51
  • If I describe the error in the transition from (*) to (**) as "So by definition $\xi_n \xrightarrow{d} \mathcal{L}_\eta$ iff $P(\xi_n \leq x) \xrightarrow{n \to +\infty} P(\eta \leq x)$. On the right hand side of this definition we have just $F_\eta(x)$, and it cannot depend on $n$ because it is a fixed function", will I be right? I need to understand my errors so as not to repeat them in the future. – perepelart Mar 27 '23 at 07:20
  • Dilip Sarwate, many thanks for such a detailed answer! I wasn't insisting on being right; I just wanted to fully understand exactly where in my reasoning I made a mistake. – perepelart Mar 29 '23 at 10:21
2

For a fixed $x$, statements of the form $F_{X_n}(x) \overset{n \to \infty}\longrightarrow F_X(g_n(x))$ or $F_{X_n}(x) \overset{n \to \infty}\longrightarrow F_{h_n(X)}(x)$, where $g_n$ and $h_n$ depend on $n$, do not adhere to the concept of pointwise convergence (excluding points of discontinuity) of a sequence of CDFs and thus don't describe convergence in distribution.


If $a$ and $\sigma^2$ are finite, the Lindeberg-Lévy CLT gives $$ \sqrt{n}\left(\bar{\xi}_n - a\right) \overset{d}{\underset{n \to \infty}\longrightarrow} \mathop{\mathcal N}\left(0, \sigma^2\right), $$ where $\bar{\xi}_n = \frac{1}{n}\sum_{i=1}^n\xi_i$.

To show that this implies that $\bar{\xi}_n$ converges in distribution (or, equivalently, in probability) to $a$ as $n$ tends to infinity, we can rewrite $\bar{\xi}_n$ as follows: $$ \bar{\xi}_n = \underbrace{\frac{1}{\sqrt{n}}}_{{\underset{n \to \infty}\longrightarrow}0} \underbrace{\sqrt{n}\left(\bar{\xi}_n - a\right)}_{\overset{d}{\underset{n \to \infty}\longrightarrow} \mathop{\mathcal N}\left(0, \sigma^2\right)} + a. $$ Now, applying Slutsky's theorem, we have $$ \bar{\xi}_n \overset{d}{\underset{n \to \infty}\longrightarrow} 0 + a = a \iff \bar{\xi}_n \overset{p}{\underset{n \to \infty}\longrightarrow} a. $$
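The convergence in probability can also be checked by simulation (a minimal sketch; exponential $\xi_i$ with $a = 1$ is an arbitrary choice, and the Gamma identity is only a shortcut for drawing sample means):

```python
# Sketch: P(|mean - a| > eps) -> 0, i.e. the sample mean converges in probability to a.
import numpy as np

rng = np.random.default_rng(3)
a, eps, reps = 1.0, 0.05, 100_000

for n in [100, 1_000, 10_000]:
    # Sum of n i.i.d. Exponential(scale=a) variables is Gamma(n, a).
    means = rng.gamma(shape=n, scale=a, size=reps) / n
    print(f"n={n:6d}  P(|mean - a| > {eps}) ~ {np.mean(np.abs(means - a) > eps):.4f}")
```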

What is often used as an approximation is to act as if

$$ \sqrt{n}\left(\bar{\xi}_n - a\right) \sim \mathop{\mathcal N}\left(0, \sigma^2\right) $$

would hold for a finite sample size $n$. Under that assumption $$ \bar{\xi}_n \sim \mathop{\mathcal N}\left(a, \frac{\sigma^2}{n}\right) $$ is true and can be understood as an approximation to the finite-sample distribution of $\bar{\xi}_n$.
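One way to quantify the quality of this approximation is the Kolmogorov–Smirnov distance between simulated sample means and the approximating normal (a sketch; exponential $\xi_i$ with $a = \sigma = 1$ is an arbitrary choice):

```python
# Sketch: KS distance between simulated sample means and N(a, sigma^2/n).
import numpy as np
from scipy.stats import norm, kstest

rng = np.random.default_rng(4)
a, sigma, reps = 1.0, 1.0, 20_000

for n in [5, 50, 500]:
    means = rng.exponential(scale=a, size=(reps, n)).mean(axis=1)
    ks = kstest(means, norm(loc=a, scale=sigma / np.sqrt(n)).cdf).statistic
    print(f"n={n:4d}  KS distance to N(a, sigma^2/n) ~ {ks:.4f}")

# The distance shrinks as n grows: the approximation improves even though
# the approximating distribution itself changes with n.
```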

statmerkur
  • @statmerkur, I would be grateful if you could check my explanation of the error in my proof and say whether I identified it correctly. And if you have time (I wanted to ask another question about it anyway), can you clarify what it means for a random variable to be approximately normal? I guess one could use $\mathcal{O}$-notation to state formally what "approximately" means, but I cannot see exactly how it would be used. – perepelart Mar 27 '23 at 12:25
  • @perepelart hope my edit answers your first question. It's better to put your second question in a separate post. – statmerkur Mar 28 '23 at 11:13
  • @statmerkur, thanks, I will open another question then! – perepelart Mar 28 '23 at 11:25