
Introductory books often define the sample proportion as follows: if $X = (x_1,...,x_n)$ is a sample of size $n$ consisting of $0$s and $1$s, then the sample proportion is $\hat{p} = \frac{\sum_{k=1}^{n}x_k}{n}$.

We define our sample $\xi=(\xi_1,...,\xi_n)$ as a sample in which each random variable has a Bernoulli distribution with unknown parameter $0 < p < 1$. The sample proportion in this case is, by definition, just the sample mean, $\frac{\sum_{k=1}^{n}\xi_k}{n}$.

$\mathbb{E}[\xi_1] = p$, $\mathbb{V}ar[\xi_1] = p-p^2$.
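For reference, a short derivation of these two moments. The last line additionally assumes that the $\xi_k$ are independent, which is not stated explicitly above.

$$\mathbb{E}[\xi_1] = 1\cdot p + 0\cdot(1-p) = p, \qquad \mathbb{E}[\xi_1^2] = \mathbb{E}[\xi_1] = p \quad (\text{since } \xi_1^2 = \xi_1),$$

$$\mathbb{V}ar[\xi_1] = \mathbb{E}[\xi_1^2] - \left(\mathbb{E}[\xi_1]\right)^2 = p - p^2,$$

$$\mathbb{V}ar\left[\frac{\sum_{k=1}^{n}\xi_k}{n}\right] = \frac{1}{n^2}\sum_{k=1}^{n}\mathbb{V}ar[\xi_k] = \frac{p-p^2}{n}.$$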

I want to understand the formal derivation of the confidence interval for this statistic. As we know from the central limit theorem, $\frac{\xi_1+...+\xi_n}{n}\xrightarrow{d} \mathcal{N}(p, \frac{p-p^2}{n})$, and we can get confidence intervals for $p$ from this.

My other question received a thorough answer on this point, and I now understand why one cannot use such notation and what people mean when they say that the distribution is "approximately" normal.
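For concreteness, the interval that is usually built from this normal approximation (commonly called the Wald interval) can be sketched as follows. This is a minimal illustration rather than a formal derivation: it assumes independent draws, plugs $\hat{p}$ into the variance $p-p^2$, and the function name `wald_interval` is made up for this example.

```python
import math
from statistics import NormalDist


def wald_interval(sample, confidence=0.95):
    """Approximate confidence interval for p from a 0/1 sample.

    Assumes independent Bernoulli(p) draws and a sample large enough
    for the normal approximation to be reasonable.
    """
    n = len(sample)
    p_hat = sum(sample) / n  # sample proportion = sample mean of 0/1 data
    # Two-sided standard normal quantile, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Plug-in standard error: sqrt((p_hat - p_hat^2) / n).
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


# Hypothetical data: 30 ones and 70 zeros.
sample = [1] * 30 + [0] * 70
lower, upper = wald_interval(sample)
print(f"p_hat = {sum(sample) / len(sample):.2f}, "
      f"95% interval = ({lower:.4f}, {upper:.4f})")
```

The interval can fall outside $[0, 1]$ or collapse to a point when $\hat{p}$ is near $0$ or $1$, which is one reason the approximation needs care for small samples.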

So only one question remains:

Is "sample proportion" just the synonym for the "sample mean" in the case where our sample comes from a Bernoulli distribution?

To be clearer: we say that $\overline{\xi}= \frac{\sum_{k=1}^{n}\xi_k}{n}$ is a sample proportion iff $\xi_i \sim \operatorname{Bern}(p)$ for all $1 \leq i \leq n$, where $\xi = (\xi_1, ...,\xi_n)$.

I just don't understand why, for example, John A. Rice, on page 214 of his "Mathematical Statistics and Data Analysis, Third Edition", introduces both the sample mean and the sample proportion without saying that the sample proportion is just a sample mean in a particular case.

  • Your question is ungrammatical and therefore unclear. Could you rephrase it? – whuber Mar 25 '23 at 20:35
  • @whuber, Thanks for your response. I tried to make it clearer. If it's still unclear, let me know and I will give it another try. – perepelart Mar 25 '23 at 20:51
  • There are problems in proving convergence when the point to which a sequence is to converge is a moving target. When you put the sample size $n$ on the right side of your claim about the central limit theorem, you are moving the point of convergence every time you change the sample size, so your exact expression of the central limit theorem is not correct. (People get this wrong all the time, and the shame of it is that, if you simulate some data, you are likely to find this to hold approximately. However, for doing mathematical statistics, this expression is not correct.) – Dave Mar 25 '23 at 20:55
  • As @Dave says. It is reasonable to say $\frac{\xi_1+...+\xi_n}{n}\to p$ with convergence both in probability and almost surely thanks to the law of large numbers, and to say $\text{Var}\left( \frac{\xi_1+...+\xi_n}{n}\right) = \frac{p-p^2}{n}$. If you want the central limit theorem, then it is $\sqrt{n}\left(\frac{\xi_1+...+\xi_n}{n}-p\right) \xrightarrow{d} \mathcal{N}(0, p-p^2)$ – Henry Mar 25 '23 at 21:10
  • @Dave, Henry, thanks for the answers. I need time to think about it. I always thought I had proved this corollary of the CLT correctly, but what you say sounds logical to me. Maybe I need to open another question where I demonstrate how I proved it and try to understand where the error in my proof is. But what about sample proportion and sample mean? I couldn't find a definition of "sample proportion" in my books, while I see it widely used in "practical statistics texts" where nobody cares about the rigour of definitions. – perepelart Mar 25 '23 at 21:19
  • Re "Is "sample proportion" just the synonym for the "sample mean"": that's exactly how sample proportions are introduced in some elementary textbooks, such as Freedman et al. "Statistics." Take a look. Note especially that the definition of a proportion needn't refer to any kind of probability mechanism whatsoever: a proportion is a property of a dataset. – whuber Mar 31 '23 at 13:59
  • @whuber, thanks for your answer. I opened the book "Statistics, 4th edition by David Freedman, Robert Pisani, Roger Purves" and couldn't find a definition of sample proportion. Did I open the wrong book, or do they call the sample proportion something else? – perepelart Mar 31 '23 at 15:59
  • They like to use "percentages." In the 3rd edition, the account of sample percentages begins at chapter 20, "Chance errors in sampling." Look up "sample percentages, defined" in the index. It ought to point to the next chapter 21, "The accuracy of percentages." – whuber Mar 31 '23 at 19:07
  • @whuber, thanks! I got it: a sample proportion is a sample mean for data composed of zeros and ones. To define it we don't even need to assume any probability distribution behind the data. – perepelart Mar 31 '23 at 19:46
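The conclusion reached in the comments can be checked on a plain dataset, with no probability model involved. The sketch below uses made-up data and hypothetical helper names (`proportion_by_count`, `mean_of_indicators`); it computes the proportion once by counting and once as the mean of 0/1 indicators, using exact fractions so the comparison is not affected by rounding.

```python
from fractions import Fraction


def proportion_by_count(data, value):
    """Proportion of items in `data` equal to `value`, found by counting."""
    return Fraction(data.count(value), len(data))


def mean_of_indicators(data, value):
    """Mean of the 0/1 indicators marking which items equal `value`."""
    indicators = [1 if item == value else 0 for item in data]
    return Fraction(sum(indicators), len(indicators))


# Hypothetical dataset: nothing random is assumed about how it arose.
colors = ["red", "blue", "red", "green", "red"]

by_count = proportion_by_count(colors, "red")
by_mean = mean_of_indicators(colors, "red")
print(by_count, by_mean, by_count == by_mean)
```

Both computations give the same number because recoding the data as zeros and ones turns counting into summing.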
