
For example, to estimate the population mean $\mu$, I am given two sample means $\bar{x}_1$ and $\bar{x}_2$ from two (independent) data sets of $N_1$ and $N_2$ observations respectively.

Without access to the data sets, I am told that the sample standard deviations are $s_1$ and $s_2$ and asked which sample mean should be chosen.

I would say that the two estimates are not comparable. Is that correct? (my 1st question) My reasoning is as follows. $$s^2 = \frac{1}{N-1}\sum_i (x_i - \bar{x})^2$$ where $\bar{x} = \frac{1}{N}\sum_i x_i$. Expecting the estimator to be unbiased means that I am assuming, for example, $x_i=\mu + e_i$ with $\mathbb{E}[e_i]=0$.

Substituting the model gives $\bar{x} - \mu = \frac{1}{N}\sum_i e_i$ and therefore $s^2 = \frac{1}{N-1}\sum_i (\mu+e_i - \bar{x})^2 = \frac{1}{N-1}\sum_i (e_i + (\mu - \bar{x}))^2$.

I conclude that a smaller $s$ does not imply a smaller $(\mu-\bar{x})^2$.
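The identity above can be checked numerically. The following is only a minimal sketch: it assumes Gaussian errors and uses arbitrary illustrative values for $\mu$, the error scale and $N$ (none of them come from the actual data sets), and `sample_summary` is a hypothetical helper. It confirms that $s^2$ computed from the observations equals $s^2$ computed from the errors alone, and it counts how often, among pairs of simulated samples, the one with the smaller $s^2$ has the larger $(\bar{x}-\mu)^2$.

```python
import random
import statistics

def sample_summary(mu, sigma, n, rng):
    """Draw n observations x_i = mu + e_i and return
    (s^2 from the x_i, s^2 from the e_i, squared error of the sample mean)."""
    errors = [rng.gauss(0.0, sigma) for _ in range(n)]
    xs = [mu + e for e in errors]
    s2_from_x = statistics.variance(xs)      # sample variance, N - 1 denominator
    s2_from_e = statistics.variance(errors)  # same quantity, computed from errors only
    sq_error_of_mean = (statistics.fmean(xs) - mu) ** 2
    return s2_from_x, s2_from_e, sq_error_of_mean

# Illustrative values only (assumptions, not taken from the question).
rng = random.Random(0)
results = [sample_summary(mu=5.0, sigma=2.0, n=20, rng=rng) for _ in range(1000)]

# s^2 does not depend on mu: both computations agree up to rounding.
max_gap = max(abs(a - b) for a, b, _ in results)

# Pair up consecutive samples and count how often the sample with the
# smaller s^2 has the larger squared error of its mean.
pairs = list(zip(results[0::2], results[1::2]))
disagreements = sum(
    1 for (s2_a, _, err_a), (s2_b, _, err_b) in pairs
    if (s2_a < s2_b) != (err_a < err_b)
)
fraction_disagreeing = disagreements / len(pairs)

print(f"max |s2_from_x - s2_from_e| = {max_gap:.2e}")
print(f"fraction of pairs where smaller s^2 has larger error: {fraction_disagreeing:.2f}")
```

Both samples in each pair share the same error scale here, so this only illustrates the within-sample argument; it says nothing about two teams whose measurement noise genuinely differs.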

A related question (my 1-bis question): is there any way to assess the "correctness" of the two estimates?

My second question: what should I do to obtain a better estimate from the means and variances of the two data sets?
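For concreteness, here is a sketch of two standard ways to combine the summaries. It assumes both sample means are unbiased for the same $\mu$ and that the data sets are independent. The function names and the numbers in the usage lines are hypothetical. The first function weights by sample size, which reproduces the mean of the pooled observations. The second weights each mean by the inverse of its estimated squared standard error $s_i^2/N_i$.

```python
import math

def combine_by_size(mean1, n1, mean2, n2):
    """Mean of the pooled observations, recovered from the two sample means."""
    return (n1 * mean1 + n2 * mean2) / (n1 + n2)

def combine_by_inverse_variance(mean1, s1, n1, mean2, s2, n2):
    """Inverse-variance weighted mean and its estimated standard error.

    Each sample mean is weighted by 1 / (s_i^2 / N_i), the reciprocal of its
    estimated squared standard error.
    """
    var_of_mean1 = s1 ** 2 / n1
    var_of_mean2 = s2 ** 2 / n2
    w1 = 1.0 / var_of_mean1
    w2 = 1.0 / var_of_mean2
    combined = (w1 * mean1 + w2 * mean2) / (w1 + w2)
    standard_error = math.sqrt(1.0 / (w1 + w2))
    return combined, standard_error

# Hypothetical summary statistics, for illustration only.
pooled_mean = combine_by_size(10.0, 100, 12.0, 25)
ivw_mean, ivw_se = combine_by_inverse_variance(10.0, 2.0, 100, 12.0, 4.0, 25)
print(pooled_mean, ivw_mean, ivw_se)
```

The weights use $s_i/\sqrt{N_i}$ rather than $s_i$ alone, so a data set with a larger $s$ can still carry more weight if it has many more observations.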

Rokai
  • Welcome to Cross Validated! What are the two data sets, and how do they relate to $\mu?$ For instance, if $\mu$ is the average weight of an elephant, yet the two data sets measure the number of hot dogs I eat on the 4th of July and the number of spoons each house has in my neighborhood, then neither data set is informative about $\mu.$ – Dave Mar 08 '24 at 14:21
  • @Dave the two data sets are observations of $\mu$, collected on two different days by two different teams using their own equipment. The two data sets relate to $\mu$ in the sense that the sample mean is expected to be an unbiased and consistent estimator. – Rokai Mar 08 '24 at 14:48
  • And what is it that you want to do, pick a team for future measurements? – Dave Mar 08 '24 at 14:56
  • @Dave something like that. Team 1 claimed that they were better because $s_1$ is smaller than $s_2$, which sounds odd IMHO. – Rokai Mar 08 '24 at 15:16
  • What troubles you about that claim? – Dave Mar 08 '24 at 15:31
  • @Dave I think the claim is not valid because the standard deviation and variance of the observations say nothing about the accuracy of the estimate. – Rokai Mar 08 '24 at 15:36
  • Why not? What is the formula for standard error of the mean? – Dave Mar 08 '24 at 15:38
  • Doesn't https://stats.stackexchange.com/questions/12251 answer your question? https://stats.stackexchange.com/questions/243922 provides further insight. – whuber Mar 08 '24 at 15:49
  • @Dave $s^2 = \frac{1}{N-1}\sum_i (x_i - \bar{x})^2$ where $\bar{x} = \frac{1}{N}\sum_i x_i$. Expecting the estimator to be unbiased, I am assuming (for example) $x_i=\mu + e_i$ with $\mathbb{E}[e_i]=0$. Substituting the model, $\bar{x} - \mu = \frac{1}{N}\sum_i e_i$ and therefore $s^2 = \frac{1}{N-1}\sum_i (\mu+e_i - \bar{x})^2 = \frac{1}{N-1}\sum_i (e_i + (\mu - \bar{x}))^2$. A smaller $s$ does not imply a smaller $(\mu-\bar{x})^2$. – Rokai Mar 08 '24 at 15:57
  • @whuber Thanks, it seems that they answer my second question (how to combine the two data sets) but not the first one (which one is better). But let me read them carefully before coming back to you. – Rokai Mar 08 '24 at 15:58

0 Answers