6

This question relates to How to estimate $P(x\le0)$ from $n$ samples of $x$?

One way to make this estimate is to use estimates $\hat\mu$ and $\hat\sigma$ and compute from those

$$ \hat p = \Phi \left(- \frac{\hat \mu}{\hat \sigma} \right)$$

What is an estimate for the variance of $\hat p$ (as function of $n$)?

  • While going through my old questions I have accepted my own answer here, but improvements can be made. For anybody that sees improvement, feel free to add them. – Sextus Empiricus Feb 22 '23 at 21:17

1 Answers1

6

The variable $\frac{\hat\mu}{\hat\sigma}$

This follows a non-central t-distribution scaled by $\sqrt{n}$ and has approximately the following variance (see a related question: What is the formula for the standard error of Cohen's d )

\begin{array}{crl} \text{Var}\left(\frac{\hat\mu}{\hat\sigma}\right) &=& \frac{1}{n}\left(\frac{\nu(1+n(\mu/\sigma)^2)}{\nu-2} - \frac{n(\mu/\sigma)^2 \nu}{2} \left(\frac{\Gamma((\nu-1)/2)}{\Gamma(\nu/2)}\right)^2 \right) \\ &\approx& \frac{1+\frac{1}{2}(\mu/\sigma)^2}{n} \end{array}

The transformed variable $\hat p = \Phi \left(- \frac{\hat \mu}{\hat \sigma} \right)$

This can be related to the variance above by using a Delta approximation. The variance scales with the slope/derivative of the transformation. So you get approximately

$$\begin{array}{} \text{Var}(\hat{p}) &\approx& \text{Var}\left( \frac{\hat\mu}{\hat\sigma}\right) \phi\left(- \frac{\hat \mu}{\hat \sigma} \right)^2 \\ &\approx& \frac{1}{n} \cdot \phi\left(- \frac{\hat \mu}{\hat \sigma} \right)^2 \cdot\left({1+ \frac{1}{2} (\mu/\sigma)^2} \right) \end{array}$$

Or by approximation

$$\begin{array}{} \text{Var}(\hat{p}) &\approx& c \cdot \frac{p}{n} \end{array}$$

where the factor $c = (1+\frac{1}{2}x^2)\phi(x)^2/\Phi(-x)$ is like:

factor c

Code for comparison:

The code below shows that this Delta approximation works reasonably for $\mu/\sigma = 2$. But for higher values of this parameter, the difference is large if $n$ is small, and a higher-order approximation should be made.

comparison of estimate with simulation

### settings
set.seed(1)
mu = 2
sig = 1
d = mu/sig
n = 10
p <- pnorm(-d)

function to simulate the sample and estimating it

sample = function(n) { x <- rnorm(n,mu,sig) d = mean(x)/var(x)^0.5 p_est <- 1-pnorm(d) p_est }

perform the simulation 1000 times

#smp <- replicate(1000,sample(n)) #var(smp) ### simulation variance #dnorm(d)^2 * (1 + 0.5 * d^2)/n ### formula variance

simulate for a range of n

n_rng = 10:100

simulated variance

v1 <- sapply(n_rng, FUN = function(x) { smp <- replicate(1000,sample(x)) ### simulate return(var(smp)) ### compute variance })

estimated variance

v2 <- dnorm(d)^2 * (1/n_rng) * (1 + 0.5 * d^2)

estimated variance with higher precision

tmu <- dsqrt(n_rng) nu <- n_rng-1 fc <- gamma((nu-1)/2)/gamma(nu/2) v3 <- dnorm(d)^2 (1/n_rng) * ( nu/(nu-2)(1+tmu^2) - tmu^2nu/2 * fc^2 )

plot results

plot(n_rng,v2, ylim = range(c(v1,v2)), log = "xy", type = "l", main = "compare simulated variance \n with estimated variance", xlab = "n", ylab = expression(hat(p))) lines(n_rng,v3, col = 1, lty = 2) points(n_rng,v1, col = 1, bg = 0 , pc = 21, cex = 0.7)

  • +1. But simulations show that unless $\mu/\sigma$ is small, this estimate of the variance is biased high and the bias is substantial until $n$ is relatively large (in the hundreds). The sampling distribution of the estimate is extraordinarily skewed for small $n,$ too, suggesting a more detailed analysis would be worthwhile. – whuber Feb 26 '21 at 14:21
  • @whuber I have less time now to go into that. But I will add my code for simulations and comparisons for others to play with . – Sextus Empiricus Feb 26 '21 at 14:23