
I have a discussion at work on this topic and want to leverage the wisdom of the stats crowd :-)

Suppose $X\sim \text{Normal}(\mu, \sigma)$ is observed and that $y_i = f(x_i)$, where $Y\sim \text{Bernoulli}(p)$. Note that $f(\cdot)$ is some function translating $X$ to zeros and ones. Let $y$ be the vector containing the observed $y_i$ for $i=1,2,\ldots,n$.

The goal is to compute confidence intervals for the observed values $\bar{y}$. Today, I do this by basic bootstrapping of $y$ and computing the confidence intervals with the reverse percentile interval method.
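For concreteness, here is a minimal stdlib-only sketch of the basic (reverse percentile) bootstrap described above; the binary sample `y`, the confidence level, and the number of resamples are made up for illustration.

```python
import random
import statistics

def basic_bootstrap_ci(y, n_boot=2000, alpha=0.05, seed=0):
    """Basic (reverse percentile) bootstrap CI for the mean of y."""
    rng = random.Random(seed)
    theta_hat = statistics.mean(y)
    # Resample with replacement and collect the bootstrap means, sorted.
    reps = sorted(
        statistics.mean(rng.choices(y, k=len(y))) for _ in range(n_boot)
    )
    lo_q = reps[int((alpha / 2) * n_boot)]
    hi_q = reps[int((1 - alpha / 2) * n_boot) - 1]
    # Reverse percentile: reflect the bootstrap quantiles about theta_hat.
    return 2 * theta_hat - hi_q, 2 * theta_hat - lo_q

# Illustrative binary sample (made up): 30 observations of y_i = f(x_i).
y = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1] * 3
lo, hi = basic_bootstrap_ci(y)
```

The interval should straddle the sample proportion $\bar{y} = 0.6$ of this toy sample.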

  1. Is information/uncertainty from $x$ included via this computation method?
  2. Is information lost by not including the uncertainty from $x$ directly?
  3. What is the expected difference in confidence intervals between bootstrapping $x$ vs $y$ for $\bar{x}$ and $\bar{y}$?
  • This is opaque: how are $Y$ and $X$ related? Are you trying to say $Y = f(X)$? Assuming this, what are your data? Do you observe $X$ or not? What form of bootstrapping do you use? What form of a bootstrap CI do you compute? What do you mean by "$\hat Y$"? Would that perhaps be $p$? – whuber Sep 27 '22 at 14:30
  • Great questions. I've edited the post and corrected my notation; thank you for flagging, whuber! To answer your questions: yes, $X$ is observed and from it we compute $y$ with the function $f(\cdot)$. I use basic bootstrapping with the reverse percentile interval to compute the CI. I want to understand the difference in the CI when computing it from the observed $x$ vs the observed $y$ ($= f(x)$). – Sweetbabyjesus Sep 27 '22 at 19:52
  • Your question and your comment don't seem in sync: the question states only that $Y$ is observed. If $X$ is observed, you don't need to observe $Y$ unless you know $f$--do you? Confidence intervals don't apply to observed values: they apply to parameters of your probability model. Are you perhaps trying to construct a CI for $E(f(X))$ given a sample $X$? – whuber Sep 27 '22 at 19:58
  • Do you know $\mu$ and $\sigma$? If so, you can calculate the unconditional probability that $y = 0$ (or $1$ as the case may be) directly, without bootstrapping a sample. As @whuber comments, confidence intervals don't apply to observed values, so we are left guessing what you are really trying to do... – jbowman Sep 27 '22 at 20:01
  • @whuber, yes, $f(\cdot)$ is known and I want to construct a CI for $E(f(X))$ given a sample $X$. Then I want to compare it to the CI for $E(Y)$ given a sample $Y$. What is the difference between those CIs when bootstrapping sample $Y$ directly vs sample $X$? E.g., I expect the CI for $E(Y)$ to contain less information, given that $y_i$ is binary (vs continuous $x_i$). However, is the CI for $E(Y)$ dependent on $\sigma_x$? Thanks for correcting my jargon, this helps a lot! :) – Sweetbabyjesus Sep 27 '22 at 20:07
  • I have a sense the answer will depend on $f.$ What can you tell us about that function? And could you explain why you are thinking of bootstrapping, given you have such a strong assumption about the distribution of $X$? Again, what kind of bootstrap and bootstrap CI do you have in mind? – whuber Sep 27 '22 at 20:14
  • $f(x_i) = 1$ if $x_i \ge T$, else $f(x_i) = 0$. In reality the distribution of $X$ is unknown, therefore I use the empirical bootstrap to compute the CI with the reverse percentile interval for the estimate $\bar{y}$. I use the Normal distribution for simplicity. Perhaps the core question is how the CI for $\bar{y}$ changes when resampling $x$ vs $y$? Thank you for your patience. – Sweetbabyjesus Sep 27 '22 at 20:35

1 Answer


The answer depends on $f$ and its relationship with the distribution of $X.$

Ultimately, provided you use a reasonable procedure to compute a confidence interval for $p = E[f(X)],$ what matters is the sampling distribution of $Y = f(X).$ Let's study that.

I have formulated three innocent-looking binary functions $f$ for examination. I plot them in relationship to the standard Normal distribution with $\mu=0,$ $\sigma=1,$ so that the blue portions under the Normal curve are where $f=1$ and therefore the total area (in blue) is the value of $p$ in each case.

[Figure: the three binary functions $f_1,$ $f_2,$ $f_3$ plotted against the standard Normal density; the blue areas under the curve mark where $f=1.$]

$f_1$ at the left indicates when $X$ is relatively large in magnitude. $f_2$ in the middle is a thresholding function, indicating when $X\gt 0.$ $f_3$ at the right indicates whether the integral part of $X$ is odd. In all three cases, $p = 1/2.$
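The three functions can be sketched and checked by simulation. The answer does not state the tail threshold used for $f_1;$ here $0.6745$ (the upper quartile of the standard Normal) is assumed, since that choice makes $p = 1/2$ for $f_1$ as well.

```python
import math
import random

Z75 = 0.6744897501960817  # standard Normal upper quartile (assumed threshold)

def f1(x):  # |X| is relatively large
    return 1 if abs(x) > Z75 else 0

def f2(x):  # thresholding at zero
    return 1 if x > 0 else 0

def f3(x):  # the integer part of X is odd
    return 1 if math.floor(x) % 2 == 1 else 0

# Monte Carlo check that p = E[f(X)] = 1/2 in all three cases.
rng = random.Random(1)
xs = [rng.gauss(0.0, 1.0) for _ in range(200_000)]
ps = [sum(map(f, xs)) / len(xs) for f in (f1, f2, f3)]
```

Each entry of `ps` should land very close to $1/2.$ (For $f_3,$ the reflection $x \mapsto -x$ swaps the odd and even intervals, so symmetry forces $p = 1/2.$)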

The maximum likelihood estimate of $p$ based on $X$ is going to work well (except, perhaps, for tiny samples). It is obtained from a dataset $(x_i), i=1,2,\ldots, n$ by estimating

$$\hat\mu = \frac{1}{n}\sum_{i=1}^n x_i$$

and

$$\hat\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^n (x_i - \hat\mu)^2}.$$

The MLE of $p$ is then

$$\hat p = \int_{-\infty}^{\infty} f(x) \phi(x;\hat\mu, \hat\sigma)\, \mathrm{d}x$$

where $\phi$ is the Normal density with the given parameters.
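A sketch of this plug-in MLE, using a simple midpoint rule for the integral; the sample and its Normal mean are made up. For the thresholding function the integral has the closed form $\Phi(\hat\mu/\hat\sigma),$ which serves as a check.

```python
import math
import random

def p_hat_mle(xs, f, grid=20_000, width=8.0):
    """MLE of p = E[f(X)]: plug the Normal MLEs into the integral of f * phi."""
    n = len(xs)
    mu = sum(xs) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in xs) / n)  # MLE, divisor n
    # Midpoint rule over mu +/- width * sigma.
    lo = mu - width * sigma
    dx = 2 * width * sigma / grid
    total = 0.0
    for i in range(grid):
        x = lo + (i + 0.5) * dx
        total += f(x) * math.exp(-0.5 * ((x - mu) / sigma) ** 2)
    return total * dx / (sigma * math.sqrt(2 * math.pi))

f2 = lambda x: 1.0 if x > 0 else 0.0

rng = random.Random(2)
xs = [rng.gauss(0.3, 1.0) for _ in range(25)]  # illustrative sample
p_numeric = p_hat_mle(xs, f2)

# Closed form for the thresholding function: Phi(mu_hat / sigma_hat).
mu = sum(xs) / len(xs)
sigma = math.sqrt(sum((x - mu) ** 2 for x in xs) / len(xs))
p_closed = 0.5 * (1 + math.erf(mu / (sigma * math.sqrt(2))))
```

The numerical integral and the closed form should agree to several decimal places.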

A reasonable estimator (also the MLE) of $p$ based on $Y$ is the usual sample proportion,

$$\hat p_0 = \frac{1}{n} \sum_{i=1}^n Y_i = \frac{1}{n}\sum_{i=1}^n f(X_i).$$

The values of this estimator are discrete: they are limited to $\{0, 1/n, 2/n, \ldots, (n-1)/n, 1\}.$

To compare two estimators (or, almost equivalently, two confidence interval methods), we study how much they vary from one sample to another. In the histograms below, a narrow histogram will be superior to a broad one.

Anyone contemplating a bootstrap for anything other than a toy problem has a largish sample -- surely $n=25$ or larger. Here, then, are the results of simulating a thousand standard Normal samples of $X$ of size $25$ and estimating $p$ for these three binary functions.

First, $f_1:$

[Figure: histograms of the simulated sampling distributions of $\hat p$ (based on $X$) and $\hat p_0$ (based on $Y$) for $f_1.$]

You can see the discreteness of $\hat p_0.$ Its sampling variance is $(0.100 / 0.064)^2 \approx 2.4$ times greater than the sampling variance of $\hat p.$ This means the estimate based on $Y$ requires a sample size about $2.4$ times greater than an estimate based on $X$ to achieve the same precision.

Next, the thresholding function $f_2:$

[Figure: histograms of the simulated sampling distributions of $\hat p$ and $\hat p_0$ for the thresholding function $f_2.$]

The spreads are similar, but the estimate based on $X$ remains superior.
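A sketch of this simulation for $f_2$ alone, exploiting the closed form $\hat p = \Phi(\hat\mu/\hat\sigma)$ for the $X$-based MLE; the number of replications is chosen for speed, not to reproduce the figure exactly.

```python
import math
import random
import statistics

def phi_cdf(z):
    """Standard Normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

f2 = lambda x: 1 if x > 0 else 0

rng = random.Random(3)
n, reps = 25, 2000
p_hat_x, p_hat_y = [], []
for _ in range(reps):
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    mu = statistics.fmean(xs)
    sigma = math.sqrt(sum((x - mu) ** 2 for x in xs) / n)  # MLE
    p_hat_x.append(phi_cdf(mu / sigma))   # estimate based on X
    p_hat_y.append(sum(map(f2, xs)) / n)  # sample proportion based on Y

var_x = statistics.pvariance(p_hat_x)
var_y = statistics.pvariance(p_hat_y)
```

Both estimators center near $p = 1/2,$ but `var_x` comes out smaller than `var_y`, matching the histograms: the $X$-based estimate remains superior even for this simple threshold.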

Finally, the integer-part function $f_3:$

[Figure: histograms of the simulated sampling distributions of $\hat p$ and $\hat p_0$ for the integer-part function $f_3.$]

The estimate based on $X$ is almost one hundred times better than the estimate based on $Y:$ what you can accomplish with $25$ observations of $X$ requires over two thousand observations of $Y.$

($f_3$ was inspired by Method 8 at https://stats.stackexchange.com/a/117711/919. The analysis there shows that unless $\hat\sigma \ll 1,$ the estimate of $p$ is going to be extremely close to $0.5.$ The point is that although $X$ might be uncertain, $E[f(X)]$ scarcely varies in many circumstances.)
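This insensitivity is easy to verify numerically: with $\sigma = 1,$ the value of $E[f_3(X)]$ stays pinned at $1/2$ no matter how $\mu$ varies, whereas with a small $\sigma$ it swings to the extremes. The particular values of $\mu$ and $\sigma$ below are chosen only for illustration.

```python
import math

def f3(x):
    """Indicator that the integer part of x is odd."""
    return 1.0 if math.floor(x) % 2 == 1 else 0.0

def p_of(mu, sigma, grid=40_000, width=8.0):
    """Midpoint-rule integral of f3 against the Normal(mu, sigma) density."""
    lo = mu - width * sigma
    dx = 2 * width * sigma / grid
    total = 0.0
    for i in range(grid):
        x = lo + (i + 0.5) * dx
        total += f3(x) * math.exp(-0.5 * ((x - mu) / sigma) ** 2)
    return total * dx / (sigma * math.sqrt(2 * math.pi))

# With sigma = 1, p barely moves as mu varies:
ps_wide = [p_of(mu, 1.0) for mu in (0.0, 0.3, 0.7, 1.2)]
# With sigma = 0.1, p is nearly 0 or 1 depending on where mu sits:
p_even = p_of(0.5, 0.1)  # mass concentrated in [0, 1), where floor(x) is even
p_odd = p_of(1.5, 0.1)   # mass concentrated in [1, 2), where floor(x) is odd
```

Every entry of `ps_wide` is within a fraction of a percent of $0.5,$ while `p_even` and `p_odd` sit near $0$ and $1$ respectively.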

The intuition suggested by these examples is that when $f$ has a detailed structure, a detailed understanding of $X$ leads to much better information about $p$ than mere examination of the values of $f.$

This intuition continues to work well when you explore other functions or vary the parameters $\mu$ and $\sigma$ or modify the distribution of $X$ or choose alternative estimation (or confidence interval) procedures. We may summarize these results loosely but memorably:

Discretizing the values of a random variable loses information.

whuber