
I have $X_1,\dots,X_n,X_{n+1}\overset{iid}{\sim}F_X(x)$, where $F_X$ has a finite mean $\mu$ and variance $\sigma^2$.

If I calculate $\bar X_n = \dfrac{1}{n}\sum_{i=1}^n X_i$ and $S^2_n = \dfrac{1}{n-1}\sum_{i=1}^n\left(X_i - \bar X_n\right)^2$ based on the first $n$ observations, I am able to use those, along with $n$ and $X_{n+1}$, to calculate $S^2_{n+1} = \dfrac{1}{(n+1)-1}\sum_{i=1}^{n+1}\left(X_i - \bar X_{n+1}\right)^2$ based on all $n+1$ observations.

Does this make $(\bar X_n, S^2_n, n, X_{n+1})$ a sufficient statistic for $\sigma^2?$ If not, is my function of those four values a sufficient statistic for $\sigma^2?$

Intuitively, I say this should be the case, since I have as much information to estimate $\sigma^2$ by having $(\bar X_n, S^2_n, n, X_{n+1})$ as I do from having all of the $X_i$ values, but I struggle to formally prove this or even begin to prove it.
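The update I have in mind can be sketched in Python (a minimal sketch; the helper name `update_mean_var` and the test data are mine, and the variance update is the standard Welford-style identity):

```python
import statistics

def update_mean_var(xbar, s2, n, x_new):
    """Given the mean and unbiased variance of n observations,
    return the mean and unbiased variance after adding x_new."""
    xbar_new = (n * xbar + x_new) / (n + 1)
    # Welford-style update of the sum of squared deviations
    m2_new = (n - 1) * s2 + (x_new - xbar) * (x_new - xbar_new)
    return xbar_new, m2_new / n  # divide by (n + 1) - 1

# Start from the first two observations and fold in the rest.
data = [1.0, 2.0, 4.0, 8.0]
xbar, s2, n = statistics.mean(data[:2]), statistics.variance(data[:2]), 2
for x in data[2:]:
    xbar, s2 = update_mean_var(xbar, s2, n, x)
    n += 1

print(abs(xbar - statistics.mean(data)) < 1e-12)   # True
print(abs(s2 - statistics.variance(data)) < 1e-12)  # True
```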

Dave
    Do you intend to update $\bar{X}$ with observation $n+1$? – krkeane Apr 21 '22 at 15:54
  • 1
    @krkeane Part of my calculation of $S_{n+1}^2$ for estimating $\sigma^2$ involves calculating $\bar X_{n+1} = \dfrac{n\bar X_n + X_{n+1}}{n+1}$, yes. – Dave Apr 21 '22 at 15:56
  • 3
    Do you have a parametric form for $F_{X}\left(x\right)$? Sufficiency depends upon the distribution you are seeking to characterize. https://en.wikipedia.org/wiki/Sufficient_statistic – krkeane Apr 21 '22 at 15:57
  • @krkeane I'm taking $\sigma^2$ to be $\mathbb E\left[\left(X -\mathbb E\left[X\right]\right)^2\right]$, not as a parameter of, say, a Gaussian distribution, so I think the answer is that I don't have a particular parametric form in mind. Why should we need a particular parametric form, though? Even if $\sigma^2$ isn't a function of the parameters (since there are no particular parameters), it is a property of the distribution that can be estimated like any other. – Dave Apr 21 '22 at 16:02
  • 1
    Okay, so I think you are trying to estimate a population central moment from a sample statistic. Your algorithm sounds adequate and perhaps optimal. I haven't done enough theoretical statistics to say if it's sufficient, or if sufficiency applies to population statistics as opposed to parameters of a distribution. – krkeane Apr 21 '22 at 16:09
  • @krkeane: The concept of sufficiency is the same in non-parametric families: a statistic is sufficient when the distribution of $X$ conditional on the value of the sufficient statistic doesn't depend on the particular distribution in the family from which $X$ arises. – Scortchi - Reinstate Monica Apr 22 '22 at 18:10
  • @Scortchi-ReinstateMonica - in the context of parametric distributions, eg the normal distribution, location and scale parameters uniquely identify the distribution, and $\sum x_i$, $\sum x_i^2$ are sufficient statistics. In the context of an empirical distribution, it seems you may not even know what statistics characterize the distribution. I'm thinking real versus deep fake images for instance. You can match any observed statistic in a synthesized distribution (eg Zhu Wu Mumford FRAME), but how do you know the number of statistics that characterize an empirical distribution? – krkeane Apr 22 '22 at 18:21
  • 1
    @krkeane: Even for parametric families, it's only in special cases that one, two, or any fixed number of statistics are sufficient regardless of the sample size. But assuming merely that observations are i.i.d. implies the order they come in is irrelevant to inference about the distribution; & the set of unordered observations is sufficient. This is clearly a paltry degree of data reduction compared with your example, but has important consequences nevertheless (I've edited my answer to mention one). – Scortchi - Reinstate Monica Apr 23 '22 at 07:35

1 Answer


No: your argument would apply equally well to any family of distributions, not just the family of distributions with finite mean & variance, & it's easy to come up with counterexamples where the sample variance is not a component of the sufficient statistic (e.g. the family of gamma distributions having various scales & shapes, for which the sample arithmetic & geometric means are jointly sufficient). Sufficient statistics of fixed dimension are updateable (see When if ever is a median statistic a sufficient statistic? for why the sample median can never be sufficient) but the converse doesn't follow.

With i.i.d. samples from the non-parametric family you specify, the order statistic $(X_{(1)}, \ldots, X_{(n)})$ is minimal sufficient—only the order of the observations lacks information about the distribution from which they arise. It's also complete: consequently, the sample mean and variance, while not sufficient themselves, as functions of the order statistic are not only unbiased estimators of their population analogues, but the unique uniformly minimum-variance unbiased estimators.


If you know $(\bar X_n, S^2_n)$ is sufficient for a sample of size $n$, then $(\bar X_{n+1}, S^2_{n+1})$ is sufficient for a sample of size $n+1$. If you can show the latter statistic is a function of $(\bar X_n, S^2_n, X_{n+1})$, which is trivial, it follows that $(\bar X_n, S^2_n, X_{n+1})$ is also sufficient, as @whuber points out.
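For concreteness, that function can be written out explicitly (a standard identity, verifiable by expanding the sums of squared deviations):

$$\bar X_{n+1}=\frac{n\bar X_n+X_{n+1}}{n+1},\qquad S^2_{n+1}=\frac{1}{n}\left[(n-1)S^2_n+\frac{n}{n+1}\left(X_{n+1}-\bar X_n\right)^2\right].$$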

  • If I update the question to say that $S_n^2$ is sufficient for $\sigma^2$ when the sample size is $n$, do I have a sufficient statistic in $(\bar X_n, S^2_n, n, X_{n+1})$ when $X_{n+1}$ gets added to the sample? – Dave Apr 21 '22 at 16:30
  • 2
    Since you can compute $S_{n+1}^2$ from those statistics, a fortiori they are sufficient if $S_{n+1}^2$ is. They're likely not minimal sufficient. – whuber Apr 21 '22 at 17:33