How can I improve my quantile estimation for a multimodal distribution (a mixture of Gaussians)? I want to estimate the values between the 5% and 95% confidence intervals (CI), where the CIs apply only to the leftmost and rightmost parts of the multimodal distribution. Further, the number of Gaussians in the mixture is not known a priori.

Currently, I am using quantile estimation (from a histogram) to find the values between the 5% and 95% CIs, but I am not confident in the accuracy of this method. Is there a more reliable method that I can use to estimate these values? Any suggestions or resources would be greatly appreciated.

  • What do you mean by “multimodal Gaussian distribution”? There's no such thing, Gaussian is unimodal. – Tim Dec 26 '22 at 10:10
  • @Tim as a sum of several gaussians shifted by some interval $x$ – Daniel Wiczew Dec 26 '22 at 10:12
  • Sum of Gaussians is a (regular) Gaussian. Do you mean a mixture of Gaussians? – Tim Dec 26 '22 at 10:15
  • @Tim Yes, mixture of gaussians, but the number of gaussians is not known a priori – Daniel Wiczew Dec 26 '22 at 10:17
  • What do you need this interval for? – Tim Dec 26 '22 at 10:18
  • @Tim To estimate a bound for outliers that are outside of the range 5% - 95% – Daniel Wiczew Dec 26 '22 at 10:18
  • What are you using the histogram for? No histogram is needed for computing empirical quantiles. – Christian Hennig Dec 26 '22 at 10:40
  • What do you mean by "CI"? Confidence intervals? If so, you use the term wrongly. – Christian Hennig Dec 26 '22 at 10:41
  • If what you want is a method to determine the 5% and 95% quantiles of a distribution of which you assume, potentially wrongly, that it's a Gaussian mixture without knowing the number of components, I doubt that you can do better than using the empirical quantiles. You could estimate the mixture parameters and determine the number of components by BIC, but there is much uncertainty in this. – Christian Hennig Dec 26 '22 at 10:45
  • @ChristianHennig What about estimating it by the inverse of the CDF? – Daniel Wiczew Dec 26 '22 at 11:17
  • Presumably, "using quantile estimation (from a histogram)" means something like a nonparametric estimator using the order statistics. If so, https://stats.stackexchange.com/questions/99829 explains how to assess the likely error in the estimates. If instead you are using something like a KDE to estimate quantiles, the problem is harder because the results depend (potentially strongly) on the shape of the kernel and the width you choose. But none of this is justifiable for outlier identification: you likely would be throwing out good data with the bad. – whuber Dec 26 '22 at 17:21
  • @DanielWiczew "What about estimating it by inverse of CDF ?" That's fine but doesn't need a histogram. – Christian Hennig Dec 26 '22 at 17:57
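
The suggestions in the comments can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not a definitive implementation: the three-component mixture, the sample size of 2000, and the helper names `empirical_quantile`, `binom_cdf_table`, and `quantile_ci` are all made up for the example. It computes the 5% and 95% empirical quantiles directly from the sorted data (no histogram, no assumption about the number of components) and attaches a distribution-free confidence interval to each quantile using order statistics, which is valid for continuous distributions.

```python
import math
import random


def empirical_quantile(sorted_x, p):
    """Empirical p-quantile by linear interpolation between order statistics."""
    n = len(sorted_x)
    h = (n - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, n - 1)
    return sorted_x[lo] + (h - lo) * (sorted_x[hi] - sorted_x[lo])


def binom_cdf_table(n, p):
    """Return [P(B <= k) for k = 0..n] with B ~ Binomial(n, p), 0 < p < 1."""
    cdf, total = [], 0.0
    for k in range(n + 1):
        # pmf evaluated in log space so that large n does not overflow
        log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                   + k * math.log(p) + (n - k) * math.log1p(-p))
        total += math.exp(log_pmf)
        cdf.append(min(total, 1.0))
    return cdf


def quantile_ci(sorted_x, p, conf=0.95):
    """Distribution-free CI (X_(l), X_(u)) for the p-quantile.

    The number B of observations below the true quantile is Binomial(n, p),
    so the interval covers the quantile with probability P(l <= B <= u - 1).
    """
    n = len(sorted_x)
    half = (1.0 - conf) / 2.0
    cdf = binom_cdf_table(n, p)
    if cdf[0] > half or cdf[n - 1] < 1.0 - half:
        raise ValueError("sample too small for this quantile and confidence level")
    # 1-based ranks: largest l with P(B <= l-1) <= half,
    # smallest u with P(B <= u-1) >= 1 - half
    l = sum(1 for k in range(n) if cdf[k] <= half)
    u = 1 + sum(1 for k in range(n) if cdf[k] < 1.0 - half)
    return sorted_x[l - 1], sorted_x[u - 1]


# Hypothetical data: equal-weight mixture of three Gaussians (mean, sd).
random.seed(0)
components = [(-4.0, 1.0), (0.0, 0.5), (5.0, 1.5)]
data = sorted(random.gauss(*random.choice(components)) for _ in range(2000))

q05 = empirical_quantile(data, 0.05)
q95 = empirical_quantile(data, 0.95)
ci05 = quantile_ci(data, 0.05)
ci95 = quantile_ci(data, 0.95)

print("5% quantile:", q05, "95% CI:", ci05)
print("95% quantile:", q95, "95% CI:", ci95)
```

The width of each interval gives a direct sense of how uncertain the quantile estimate is at the given sample size. As noted in the comments, a point falling outside the 5% to 95% range is not thereby shown to be an outlier.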