
I have a very symmetric distribution with a kurtosis of 10 and a sample size of more than 100. Here is the histogram: https://ibb.co/ws7vBjd

This histogram was obtained by asking participants in the control group to rate a difference in sharpness of two identical X-Ray images, using a continuous scale from -50 to 50 (including fractional values). A great majority of participants didn't see any difference and gave a score of 0 for the difference in sharpness (this is the largest center bin), but a few did say that they see a small difference (these are the smaller bins).

The same question was asked of the test group, where the two images did differ in sharpness, and the distribution obtained from the test group was normal.

  1. What is the simplest way (least complicated) to find a confidence interval for the mean of such a leptokurtic distribution?
peter56

2 Answers

2

I suggest fitting a Lambert W x Gaussian distribution and estimating the parameters, standard errors, etc. by maximum likelihood. You can use the LambertW R package to do this (https://github.com/gmgeorg/LambertW and on CRAN).

See "Location parameter estimation in $\alpha$-stable distributions" for a simulation study with examples of very heavy-tailed data and excellent properties of the location estimate.

Or see "Comparing estimators of location of the Cauchy distribution" for an example of Cauchy-distributed data and estimates of location.

1

@Georg (+1) has given one possible answer. But you may not consider his answer as the 'simplest' possibility.

Consider the fictitious data below, 1000 observations which come from a symmetric, leptokurtic distribution.

summary(x)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  42.33   49.30   49.99   50.04   50.77   57.67 
hist(x, prob=T, col="skyblue2")

[Figure: histogram of the sample x]

Suppose we do not know the mean or distribution of the population. Because the sample seems roughly symmetric, we could assume the population mean and median are the same. Then the confidence interval $(49.97, 50.07)$ from the one-sample Wilcoxon signed-rank test in R might be considered a 95% CI for $\mu.$ (Strictly, the interval is for the pseudomedian, which equals both the mean and the median when the distribution is symmetric.)

wilcox.test(x, conf.int=T)$conf.int
[1] 49.96674 50.07208
attr(,"conf.level")
[1] 0.95
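To see what this interval is centered on, here is a minimal sketch that computes the Hodges-Lehmann estimate (the median of all pairwise Walsh averages) by hand and compares it with the point estimate reported by wilcox.test. It assumes the fictitious Laplace sample described in the Note, regenerated with the same seed; the names s, walsh, hl and wt are only illustrative.

```r
# Rebuild the fictitious Laplace sample (same seed and calls as in the Note).
set.seed(2022)
y <- rexp(1000)
b <- sample(c(-1, 1), 1000, replace = TRUE)
x <- y * b + 50

# Walsh averages: (x_i + x_j)/2 for all pairs with i <= j.
s <- outer(x, x, "+") / 2
walsh <- s[lower.tri(s, diag = TRUE)]

# Hodges-Lehmann estimate of the pseudomedian.
hl <- median(walsh)

# wilcox.test() reports its own point estimate together with the CI.
wt <- wilcox.test(x, conf.int = TRUE)
c(hodges_lehmann = hl, wilcox_estimate = unname(wt$estimate))
wt$conf.int
```

For a sample this large, wilcox.test uses a normal approximation, so the two estimates should agree closely but need not match to every digit.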

Also, a reasonably simple 95% nonparametric bootstrap CI $(49.95, 50.12)$ for $\mu,$ nearly the same as above, can be obtained as shown below. If you know about bootstrap confidence intervals, you may consider this method sufficiently simple for your purposes.

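a.obs = mean(x)   # observed sample mean
# bootstrap distribution of (resampled mean - observed mean)
d = replicate(2000, mean(sample(x, 1000, replace=TRUE)) - a.obs)
LU = quantile(d, c(.975,.025))   # upper quantile first, then lower
a.obs-LU   # basic bootstrap CI, printed as (lower, upper)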
   97.5%     2.5% 
49.95388 50.12392 
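The interval above is a "basic" bootstrap interval: it subtracts quantiles of the resampled deviations from the observed mean. The percentile method instead cuts 2.5% from each tail of the resampled means themselves. The sketch below computes both side by side, assuming the same fictitious Laplace sample; boot_mean_ci is a hypothetical helper written for this illustration, not a function from any package.

```r
# Rebuild the fictitious Laplace sample (same seed and calls as in the Note).
set.seed(2022)
y <- rexp(1000)
b <- sample(c(-1, 1), 1000, replace = TRUE)
x <- y * b + 50

# Hypothetical helper: percentile and basic bootstrap CIs for the mean.
boot_mean_ci <- function(x, B = 2000, level = 0.95) {
  a.obs <- mean(x)
  alpha <- 1 - level
  # Means of B resamples of the same size, drawn with replacement.
  boot.means <- replicate(B, mean(sample(x, length(x), replace = TRUE)))
  q <- unname(quantile(boot.means, c(alpha / 2, 1 - alpha / 2)))
  ci <- rbind(percentile = q,                  # tail cut-offs of the resampled means
              basic = 2 * a.obs - rev(q))      # same quantiles reflected about a.obs
  colnames(ci) <- c("lower", "upper")
  ci
}

ci <- boot_mean_ci(x)
ci
```

For a roughly symmetric bootstrap distribution, as here, the two intervals should nearly coincide; they can differ noticeably when the bootstrap distribution is skewed.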

Note: The fictitious sample x used in the illustrations above was sampled from a Laplace (double exponential) distribution with mean $\mu = 50,$ using R:

set.seed(2022)
y = rexp(1000);  b = sample(c(-1,1), 1000, rep=T)
x = y*b + 50
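As a quick check that such a sample really is leptokurtic, here is a sketch that computes simple moment-based sample kurtosis and skewness in base R. The Laplace distribution has kurtosis 6, against 3 for a normal distribution; the variable names z, kurt and skew are only illustrative.

```r
# Rebuild the fictitious Laplace sample (same seed and calls as above).
set.seed(2022)
y <- rexp(1000)
b <- sample(c(-1, 1), 1000, replace = TRUE)
x <- y * b + 50

z <- x - mean(x)                       # centered observations
kurt <- mean(z^4) / mean(z^2)^2        # moment estimator of kurtosis (not excess)
skew <- mean(z^3) / mean(z^2)^1.5      # moment estimator of skewness
c(kurtosis = kurt, skewness = skew)
```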
BruceET
  • BruceET, both solutions you indicated look very simple. What would be the justification for performing a nonparametric bootstrap? How would one know that a nonparametric bootstrap is an appropriate solution? Also, you wrote "If you know about bootstrap confidence intervals"; what exactly do I need to know about them? Aren't they calculated by rejecting 2.5% of the obtained values in each tail if the interval is 95%? – peter56 Mar 22 '22 at 08:48
  • BruceET, can you please respond to the question in my previous comment? – peter56 Mar 23 '22 at 15:32
  • You need to get more familiar with bootstrap. Chatting in comments is not the venue for that. – BruceET Mar 23 '22 at 16:00