6

Suppose we are interested in estimating $X^k$, and we have access to independent unbiased estimators $Y_i$ for $i \in \{1,2,\dots, k\}$, i.e., $\mathbb{E}[Y_i] = X$. A straightforward way to construct an unbiased estimator of $X^k$ is to define $Y^{(k)} := \prod_{i=1}^k Y_i$; then $\mathbb{E}[Y^{(k)}]=X^k$ by independence.
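For concreteness, here is a minimal Monte Carlo sketch of this baseline (the Poisson data model and all constants are just illustrative assumptions, not part of my actual setup):

```python
import numpy as np

# Illustrative sketch: target X = 2, and each Y_i is the mean of M
# Poisson(X) draws, so E[Y_i] = X.  The product of k independent
# copies Y_1 * ... * Y_k is then unbiased for X**k.
rng = np.random.default_rng(0)
X, k, M, reps = 2.0, 3, 20, 100_000

# One row per replication, giving one realisation of (Y_1, ..., Y_k).
Y = rng.poisson(X, size=(reps, k, M)).mean(axis=2)
prod_est = Y.prod(axis=1)

print(prod_est.mean())  # ~ X**k = 8, up to Monte Carlo error
```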

What other methods are there for estimating $X^k$ using only $c < k$ estimators? Is it even possible in this general setup? Is there a name for this type of problem?

fool
  • 2,440
  • 1
    I'm reasonably confident you can't do this with $c<k$, but I haven't yet been able to find a proof. It's pretty clear for $k=2$: you can't get an unbiased estimate of a variance from one observation. – Thomas Lumley Dec 24 '20 at 05:08
  • 2
    @ThomasLumley: Unless I am misunderstanding the question, I don't think $k$ is the number of observations, so it seems possible in theory to do this (assuming sufficient data). – Ben Dec 24 '20 at 07:21
  • 2
    @Thomas On the contrary, in realistic settings you can obtain an unbiased estimate of the variance from a single observation. For example, a single observation of a Poisson$(\lambda)$ variable estimates the variance $\lambda$ and is unbiased. – whuber Dec 24 '20 at 14:37
  • 1
    Concerning the question: what exactly do you mean by "access to"? For instance, if that means you can compute $Y_i$ for any subset of your data, then if you have at least $k$ observations you can apply $Y_1$ to any $k$-subset of them and compute the product, etc. – whuber Dec 24 '20 at 14:41
  • Thanks ThomasLumley and whuber for pointing out the ambiguity. I have added a description for $Y_i$s. In short, $Y_i$s are always the means of $M$ random variables (data). We have access to several independent data sets, hence multiple $Y_i$s, but data are expensive to obtain, thus we want to use fewer sets. – fool Dec 24 '20 at 20:31
  • Now I wonder if a bootstrap approach would be appropriate here... – fool Dec 24 '20 at 20:43
  • I reverted my edits and created a separate question that gives more context to the problem. – fool Dec 24 '20 at 22:38
  • 1
    @whuber. Yes and if you know the variance is 7 you can get an unbiased estimator with zero observations. It wasn't clear to me that the original post was interested in making those sorts of assumptions; maybe I misread it. – Thomas Lumley Dec 24 '20 at 23:01
  • 1
The question is too vague as the role of $X$ in the model is not specified. There are cases when a single observation is enough to estimate $X^k$. Take an $\mathcal Exp(X)$ for instance. – Xi'an Dec 25 '20 at 07:53

2 Answers

2

The obvious analogy here is to use independent estimators $Y_1,...,Y_c$ with expectations:

$$\mathbb{E}(Y_i) = X^{p_i} \quad \quad \quad \sum p_i = k.$$

In theory this is possible so long as you have a method to construct the required estimators for this problem. The independence requirement will generally make it impractical, since in most cases it will require you to partition your dataset and form the estimators separately. Consequently, while it is possible to get an unbiased estimator by this method, it will tend to be a poor estimator (with high variance), since each factor of the estimator uses only a small portion of the data.


An example using IID data: Suppose you have IID data $X_1,...,X_n$ from some distribution with mean $\mu$ and finite variance $\sigma^2 < \infty$, and you want to estimate $\mu^k$. The sample mean of a part of size $m$ is unbiased for the first power of the mean, and since $\mathbb{E}(\bar{X}^2) = \mu^2 + \sigma^2/m$, subtracting the estimated variance of the sample mean gives an unbiased estimator of the second power:

$$\mathbb{E}(\bar{X}) = \mu \quad \quad \quad \mathbb{E}\Big( \bar{X}^2 - \frac{S^2}{m} \Big) = \mu^2 + \frac{\sigma^2}{m} - \frac{\sigma^2}{m} = \mu^2.$$

Now, suppose we partition our data into $c < k$ parts where we have at least two data points in each part (so that we can form the sample variance). Denote these partition parts by $\boldsymbol{X}_{(1)},...,\boldsymbol{X}_{(c)}$, let $m_1,...,m_c$ be their sizes, and denote statistics using these samples with the corresponding subscripts. Now, choose some values $p_1,...,p_c \in \{ 1,2 \}$ with $p_1 + \cdots + p_c = k$ and form the estimator:

$$\text{Est} \equiv \prod_{i:p_i = 1} \bar{X}_{(i)} \times \prod_{i:p_i = 2} \Big( \bar{X}_{(i)}^2 - \frac{S_{(i)}^2}{m_i} \Big).$$

The partition of the data means that the parts are independent, so we have:

$$\begin{align} \mathbb{E}(\text{Est}) &= \prod_{i:p_i = 1} \mathbb{E}(\bar{X}_{(i)}) \times \prod_{i:p_i = 2} \mathbb{E}\Big( \bar{X}_{(i)}^2 - \frac{S_{(i)}^2}{m_i} \Big) \\[6pt] &= \prod_{i:p_i = 1} \mu \times \prod_{i:p_i = 2} \mu^2 \\[6pt] &= \prod_{i} \mu^{p_i} \\[6pt] &= \mu^{\sum p_i} \\[6pt] &= \mu^k. \end{align}$$

Note that although this estimator is unbiased, it will tend to have high variance when the number of partition pieces is large, since each piece then uses only a small share of the data.
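Here is a minimal simulation sketch of this construction (the normal data model, part sizes, and exponents are arbitrary illustrative choices):

```python
import numpy as np

# Illustrative sketch of the partition estimator: exponents p_i in {1, 2}
# summing to k = 5, using only c = 3 < k independent parts of the data.
rng = np.random.default_rng(1)
mu, sigma, k, reps = 1.5, 1.0, 5, 400_000
p = [1, 2, 2]      # sum(p) = k = 5, c = 3 factors
m = [10, 10, 10]   # at least two points per part

ests = np.ones(reps)
for p_i, m_i in zip(p, m):
    part = rng.normal(mu, sigma, size=(reps, m_i))  # an independent part
    xbar = part.mean(axis=1)
    if p_i == 1:
        ests *= xbar                       # unbiased for mu
    else:
        s2 = part.var(axis=1, ddof=1)      # unbiased sample variance
        ests *= xbar**2 - s2 / m_i         # unbiased for mu**2

print(ests.mean(), mu**k)  # the two should agree up to Monte Carlo error
```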

Ben
  • 124,856
  • Hi Ben, thank you for this answer! Btw, I updated the question for clarity, and revamped the notation. Naively, I want to try a bootstrap method that leverages the full (in your example) data set instead of using partitions. Would that still yield an unbiased estimator and reduce variance? – fool Dec 24 '20 at 20:58
  • Well, it is unclear to me how you will preserve independence of the estimators if they share data points. Do you have a proposal for how to do this? – Ben Dec 24 '20 at 21:48
  • Also, your updated question is inconsistent with your original question, and renders the present answer inapplicable. Please revert to your original question --- see guidance here. – Ben Dec 24 '20 at 21:52
  • Sorry maybe I missed something, but $Y_i$s are still independent after the edit. For example, $X_{1,1},\dots,X_{1,M}$s are independent of $X_{2,1},\dots,X_{2,M}$ and so on. So $Y_i$s are still independent of each other. This was the original setup as well. The edits are mostly a change of notations with a little bit more context of what the actual estimator is (it is sample mean of some other random variables). – fool Dec 24 '20 at 22:09
  • Your edited question specifies that all estimators are now sample means, which is inconsistent with the original question and this answer. – Ben Dec 24 '20 at 22:12
  • 1
    Ah understood. I will revert the edits and post a separate question. Sorry about that and thanks for pointing this out! Edit: This rollback feature is amazing. – fool Dec 24 '20 at 22:14
  • 1
    @user228809: Bootstrap is based on a single sample and hence the resulting estimators are dependent. – Xi'an Dec 25 '20 at 09:11
1

To illustrate the point that the answer depends on the underlying statistical model: If $Y\sim\mathcal E(1/X)$, an exponential variable with rate $1/X$ (and hence mean $X$), then $$\mathbb E[Y^k]= X^k\Gamma(k+1)$$ meaning that $Y^k/\Gamma(k+1)=Y^k/k!$ is an unbiased estimator of $X^k$, based on a single observation. This extends to Gamma variables, obviously.
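A quick simulation check of this identity (the values of $X$ and $k$ are arbitrary illustrative choices):

```python
import numpy as np
from math import factorial

# Illustrative sketch: one Exp(1/X) observation per replication;
# Y**k / k! is then unbiased for X**k.
rng = np.random.default_rng(2)
X, k, reps = 2.0, 3, 1_000_000

Y = rng.exponential(scale=X, size=reps)  # numpy's scale = mean = X, i.e. rate 1/X
est = Y**k / factorial(k)

print(est.mean(), X**k)  # ~ 8, up to Monte Carlo error
```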

Xi'an
  • 105,342