
From my readings (Wikipedia, books, and YouTube videos), I gather that there are means of observations (the frequency-weighted values of the random variables), and also means across different samples, i.e. "means of means".

I have read all the basic math and solved some problems. However, the formula for the standard error, defined as

The standard deviation for the normal distribution of the samples' means

escapes my understanding.

The formula is

$$ s_m = \frac{s_o}{\sqrt{N}} $$

where $s_m$ is the mean of samples' means, and $s_o$ is the mean of observations. Where does this come from, exactly?

Mah Neh
  • Where did you get that "$s_m$ is the mean of samples' means, and $s_o$ is the mean of observations"? That's surely wrong; the standard error formula is a relation between standard deviations/variances. – Firebug Mar 19 '23 at 21:08
  • From the book Statistics Without Tears, @Firebug, but it is not working for me :( I am extremely confused now; he certainly uses that term – Mah Neh Mar 23 '23 at 11:35

2 Answers

  1. The terms 'mean of observations' and 'mean of different samples' are a bit confusing. Generally, both 'observations' and 'samples' refer to realisations of a random variable.
  2. We rarely use $s_{.}$ to denote the mean of something. In the formula you mentioned, $s_m$ is the standard deviation of the sample mean and $s_o$ is the standard deviation of the underlying random variable (which does not need to be Gaussian).
  3. I think you are confused by the term 'standard error'. The standard error of the sample mean is just the standard deviation of the sample mean. When we measure the spread of an estimator, we usually say 'standard error' instead of 'standard deviation'.

Back to your question: where does the formula come from exactly? Suppose $X_1,\dots,X_N$ follow a probability distribution $F(x)$ with common variance $\sigma^2$, and that the $X_i$ are mutually independent. The standard error of $\overline{X}$ is

$$\begin{align*}
\mathrm{SE}(\overline{X}) &= \sqrt{\mathbb{V}(\overline{X})}\\
&= \sqrt{\mathbb{V}\!\left(\frac{\sum_{i=1}^{N}X_i}{N}\right)}\\
&= \sqrt{\frac{1}{N^2}\,\mathbb{V}\!\left(\sum_{i=1}^{N}X_i\right)}\\
&= \sqrt{\frac{1}{N^2}\sum_{i=1}^{N}\mathbb{V}(X_i)} \qquad (X_i \text{ mutually independent})\\
&= \sqrt{\frac{1}{N^2}\,N\sigma^2}\\
&= \frac{\sigma}{\sqrt{N}}.
\end{align*}$$
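A quick simulation can make this concrete. The sketch below is illustrative only: the normal population, its mean and SD, the sample size $N$, and the number of replications are arbitrary choices, not anything prescribed by the derivation above. It draws many samples of size $N$, takes each sample's mean, and compares the standard deviation of those means against $\sigma/\sqrt{N}$:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma, N, reps = 2.0, 25, 100_000  # population SD, sample size, replications

# Draw `reps` samples of size N, then take each sample's mean (one mean per row)
means = rng.normal(loc=5.0, scale=sigma, size=(reps, N)).mean(axis=1)

print(means.std())         # empirical SD of the sample means
print(sigma / np.sqrt(N))  # theoretical SE = sigma / sqrt(N) = 0.4
```

With these numbers the empirical value comes out very close to the theoretical $2/\sqrt{25} = 0.4$.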

ccy
  • Why do you assume all the means ($X_i$) have the same variance? – Mah Neh Mar 19 '23 at 20:34
  • $X_i$ is not the mean of anything, but just a random variable. The reason they have the same variance is that $X_1,\dots,X_N$ are samples from the same distribution. In fact, you are right that they don't necessarily need to have the same variance: suppose $X_i\sim\mathrm{N}(\theta,\sigma_i^2)$ and $X_i$ is independent of $X_j$; then $\mathbb{V}(\overline{X})=\frac{\sum_{i=1}^{N}\sigma_i^2}{N^2}$ (see the sketch after this comment thread). – ccy Mar 20 '23 at 01:08
  • It appears that you may not be familiar with the notation commonly used in statistics. Typically, $X_1$ denotes a random variable drawn from a probability distribution, while $\overline{X}$ represents the sample mean, calculated as $\frac{1}{N}\sum_{i=1}^{N}X_i$. If you are referring to the "mean of means," I believe you mean $\mathbb{E}(\overline{X})$. – ccy Mar 20 '23 at 01:18
  • But what does the variance of $X_i$ mean, in the sense of: what are you comparing it to? – Mah Neh Mar 23 '23 at 11:34
  • $X_i$ is a random variable from some probability distribution, so $\mathbb{V}(X_i)$ is the variance of the underlying distribution of $X_i$. It can be expressed as $\mathbb{V}(X_i)=\mathbb{E}[(X_i-\mathbb{E}(X_i))^2]$, which is the average of the squared difference between $X_i$ and its mean. Very simply put, it measures how large the deviation from the mean typically is. – ccy Mar 24 '23 at 18:13
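To illustrate the unequal-variance case ccy mentions in the comments above, here is a minimal sketch (the per-variable standard deviations and replication count are hypothetical choices) that checks $\mathbb{V}(\overline{X})=\sum_{i=1}^{N}\sigma_i^2/N^2$ empirically for independent Gaussians:

```python
import numpy as np

rng = np.random.default_rng(2)
sigmas = np.array([1.0, 2.0, 3.0, 4.0])  # unequal per-variable SDs
N, reps = len(sigmas), 200_000

# Each row is one realisation of (X_1, ..., X_N) with X_i ~ N(0, sigma_i^2)
draws = rng.normal(loc=0.0, scale=sigmas, size=(reps, N))
means = draws.mean(axis=1)

print(means.var())               # empirical variance of X-bar
print((sigmas**2).sum() / N**2)  # sum(sigma_i^2) / N^2 = 30/16 = 1.875
```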

Using the properties (1) $Var(aX) = a^2\,Var(X)$ and (2) $Var(X+Y)=Var(X)+Var(Y)$ (assuming independence between $X$ and $Y$), it is easy to show that

$$Var(\overline{X})=Var\left(\frac{1}{n}\sum_{i=1}^nX_i\right)=\frac{1}{n^2}\sum_{i=1}^nVar(X_i)$$

Since the $X_i$ are samples from the same distribution, $Var(X_i)=\sigma^2$, so

$$Var(\overline{X})=\frac{1}{n^2}\sum_{i=1}^n\sigma^2=\frac{\sigma^2}{n}$$

Translating into finite-sample estimates:

$$s_m^2=\frac{s_o^2}{n}$$
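As a quick illustrative check of this finite-sample version (the exponential population, sample size, and replication count below are arbitrary choices): compute $s_o$ from a single sample, divide by $\sqrt{n}$, and compare against the empirical SD of many simulated sample means:

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 50, 100_000
pop_scale = 3.0  # Exponential(scale=3) has SD sigma = 3

# One sample: plug-in estimate s_m = s_o / sqrt(n)
x = rng.exponential(scale=pop_scale, size=n)
s_m = x.std(ddof=1) / np.sqrt(n)

# Many samples: empirical SD of the sample means
means = rng.exponential(scale=pop_scale, size=(reps, n)).mean(axis=1)

print(s_m)          # estimate from a single sample
print(means.std())  # ~ sigma / sqrt(n) = 3 / sqrt(50) ≈ 0.424
```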

Firebug
  • What is $X_i$ here, the value of the $i$-th observation of the random variable $X$? Then what would $Var(X_i)$ mean? – Mah Neh Mar 18 '23 at 22:56
  • @MahNeh Yes. $Var(X_i)$ is the variance of $X_i$, a sample from a distribution with variance $\sigma^2$ – Firebug Mar 18 '23 at 23:10
  • I see, now I understand. Actually, $X_i$ is the $i$-th "mean of means", which is the sample in this case. However, my problem deducing it this way is that I see no reason to assume all variances are the same, at all. Do you? – Mah Neh Mar 19 '23 at 00:20
  • @MahNeh the variances are the same because the $X_i$ come from the same distribution – Firebug Mar 19 '23 at 00:35
  • No, the $X_i$ are the means of means, and they do not necessarily have the same variance; why do you say so? For example, Wikipedia: "Homoscedasticity, or homogeneity of variances, is an assumption of equal or similar variances in different groups being compared." – Mah Neh Mar 19 '23 at 20:27
  • @MahNeh the $X_i$ are samples from the same distribution. They aren't means or anything, just samples from a distribution. Read the linked question or the Wikipedia article. Mentioning homoskedasticity was a mistake by me, because it's implied since the $X_i$ are samples from the same distribution. – Firebug Mar 19 '23 at 21:07