I have some samples taken from a normal distribution of unknown $\mu$ and $\sigma$, and I know someone took away the top and bottom $p$ percent of the original samples ($p$ is known).

Is there a formula or a simple algorithm to estimate the parameters $\hat\mu$ and $\hat\sigma$ of the original, untruncated normal distribution?

If possible, as $p$ approaches zero, I'd like the formula for $\hat\mu$ to tend towards the sample mean and the formula for $\hat\sigma$ to tend towards the typical sample standard deviation estimate:

$$\hat\sigma = \sqrt{\frac{1}{N-1} \sum_{i=1}^N \left(x_i - \bar{x}\right)^2}$$
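
For reference, a minimal sketch of one large-$N$ correction that has this limiting behaviour. It is an approximation, not a derived optimal estimator: it treats the retained samples as draws from a normal truncated at its $p$ and $1-p$ quantiles, for which the variance is $\sigma^2\left(1 - \frac{2c\,\varphi(c)}{1-2p}\right)$ with $c = \Phi^{-1}(1-p)$. The function name `corrected_estimates` is hypothetical, and `p` is assumed to be given as a fraction per tail (0.05 for 5 percent), not as a percentage.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean, stdev


def corrected_estimates(retained, p):
    """Rescale the trimmed-sample standard deviation (large-N approximation).

    retained: samples left after trimming (at least two values).
    p: fraction removed from EACH tail, e.g. 0.05 (an assumption about units).
    """
    if not 0 <= p < 0.5:
        raise ValueError("p must satisfy 0 <= p < 0.5")
    mu_hat = fmean(retained)   # trimmed mean; symmetric trimming keeps it centred
    s = stdev(retained)        # usual N-1 formula applied to the retained samples
    if p == 0:
        return mu_hat, s       # nothing was trimmed, so nothing to correct
    nd = NormalDist()          # standard normal
    c = nd.inv_cdf(1 - p)      # cut-off in standard units
    # Variance of a standard normal truncated to [-c, c].
    shrink = 1 - 2 * c * nd.pdf(c) / (1 - 2 * p)
    return mu_hat, s / sqrt(shrink)


if __name__ == "__main__":
    random.seed(0)
    full = sorted(random.gauss(3.0, 1.5) for _ in range(5000))
    kept = full[250:-250]      # drop 5% from each end
    print(corrected_estimates(kept, 0.05))
```

As $p \to 0$ the cut-off $c$ grows without bound, $c\,\varphi(c) \to 0$, and the correction factor tends to 1, so the estimate falls back to the ordinary sample standard deviation.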

  • Related: Maximum likelihood estimators for a truncated distribution. Not a duplicate, though, since that thread looks at truncation at specified limits $a$ and $b$, not symmetric truncation of a given percentage $p$ of observations. I suspect that some kind of iterative EM algorithm would be helpful. – Stephan Kolassa Jul 07 '22 at 10:17
  • @StephanKolassa Sounds like it should be possible to apply the answer there to this specific problem so thanks. It'll probably tend towards the population standard deviation formula as $p \to 0$, instead of the sample one, but it'll still be much better than what I have now. – relatively_random Jul 07 '22 at 13:54
  • There are many ways to approach this. If you are looking for simplicity rather than optimality, estimate $\mu$ as the median of the data and estimate $\sigma$ as a (readily computable) multiple of the difference between two symmetric order statistics (such as the IQR when $p$ is 25% or less). It's almost as simple and near-optimal to use the intercept and slope of a Normal probability plot for $\hat\mu$ and $\hat\sigma.$ For large $N$ the difference between this and a sample from a truncated distribution is negligible, permitting you to use MLE. So: what are your requirements? – whuber Jul 07 '22 at 14:11
  • @whuber Thanks. I know how to use just median and IQR, but my data is quantized so I don't like the results I get sometimes. The trimmed mean works much better for me so I wanted to know whether I could get something similar for estimating the standard deviation. The normal probability plot sounds great, I'll give it a go tomorrow. As for my requirements, I don't need much precision, just robustness to outliers (which is where trimming comes in), some consistency with $p=0$ and something to "smooth out" the quantization. For now, I use trimmed mean + IQR so almost anything is better. – relatively_random Jul 07 '22 at 14:40
  • You are basically telling us that your data are not Normal! If your initial assumption is true, you wouldn't have any repeating values. So, how does this quantization come about? This sounds like an essential detail to describe in your question. – whuber Jul 07 '22 at 15:19
  • @whuber I said the data was normal because that usually makes things simpler. I honestly didn't think it was relevant; I just mentioned it as a side comment about my motivations. I apologize. – relatively_random Jul 07 '22 at 18:20
  • Please edit your post to reflect your actual problem. Right now, it risks getting an answer that is inconsistent with your comments, and then both you and the respondent will be disappointed. – whuber Jul 07 '22 at 19:24
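
The Normal probability plot idea from whuber's comment can be sketched roughly as follows. This is an illustration under stated assumptions, not code from the thread: `p` is a fraction per tail, the same number of samples was removed from each end, the original sample size is reconstructed from the retained count, and Blom's plotting positions $(i - 0.375)/(N + 0.25)$ are used (one common choice among several). The function name `probability_plot_fit` is hypothetical. The retained samples are treated as the middle order statistics of the original sample and regressed on the matching standard normal quantiles; the intercept estimates $\mu$ and the slope estimates $\sigma$.

```python
import random
from statistics import NormalDist, fmean


def probability_plot_fit(retained, p):
    """Least-squares fit of retained order statistics against normal quantiles.

    retained: samples left after trimming (at least two distinct values).
    p: fraction removed from EACH tail, e.g. 0.1 (an assumption about units).
    Returns (mu_hat, sigma_hat) = (intercept, slope).
    """
    if not 0 <= p < 0.5:
        raise ValueError("p must satisfy 0 <= p < 0.5")
    xs = sorted(retained)
    n = len(xs)
    # Assumed reconstruction: n = N * (1 - 2p), with k samples cut per tail.
    k = (round(n / (1 - 2 * p)) - n) // 2
    big_n = n + 2 * k
    nd = NormalDist()  # standard normal
    # Retained values are order statistics k+1 .. k+n of the original sample.
    zs = [nd.inv_cdf((i - 0.375) / (big_n + 0.25))
          for i in range(k + 1, k + n + 1)]
    z_bar, x_bar = fmean(zs), fmean(xs)
    s_xz = sum((z - z_bar) * (x - x_bar) for z, x in zip(zs, xs))
    s_zz = sum((z - z_bar) ** 2 for z in zs)
    sigma_hat = s_xz / s_zz                # slope
    mu_hat = x_bar - sigma_hat * z_bar     # intercept
    return mu_hat, sigma_hat


if __name__ == "__main__":
    random.seed(0)
    full = sorted(random.gauss(10.0, 2.0) for _ in range(2000))
    kept = full[200:-200]                  # drop 10% from each end
    print(probability_plot_fit(kept, 0.1))
```

Because the plotting positions are symmetric about 1/2, the quantiles average to zero and the intercept coincides with the trimmed mean, which matches the estimator the asker already prefers. With `p = 0` this is the ordinary probability-plot fit on the full sample. Heavily quantized data will produce ties among the order statistics; the regression still runs, but how well it smooths them is not something this sketch checks.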
