
The Freedman-Diaconis rule says that the optimal bin width of a histogram is $$ \text{Bin Size} = 2 \cdot \text{IQR}(x) \cdot n^{-1/3}$$ where $x$ is the data, $\text{IQR}(x)$ is its interquartile range, and $n$ is the number of observations.
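For concreteness, here is a minimal Python sketch of that formula (Python rather than R, purely for illustration). The function name `freedman_diaconis_width` and the toy data are hypothetical, and the quartiles use the "inclusive" convention of `statistics.quantiles`, which is only one of several common IQR definitions; other conventions give slightly different widths.

```python
import statistics

def freedman_diaconis_width(x):
    """Return 2 * IQR(x) * n**(-1/3) for a sequence of numbers x."""
    n = len(x)
    # Quartiles under the "inclusive" convention (an assumption; conventions differ).
    q1, _, q3 = statistics.quantiles(x, n=4, method="inclusive")
    return 2 * (q3 - q1) * n ** (-1 / 3)

# Hypothetical toy data: IQR is 3.5 and n = 8, so n**(-1/3) is 0.5.
data = [1, 2, 3, 4, 5, 6, 7, 8]
width = freedman_diaconis_width(data)
print(width)  # about 3.5
```

Note that the rule fixes only the width of the bins, not where the first bin starts.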

Using this rule, can we infer the shape of the distribution? I know that, depending on the bin width, histograms can be misleading. But does using the bin width above fix this problem?
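One way to probe this is to hold the bin width fixed and move only the point where the bins start. The sketch below does that in Python on made-up data; the helper `bin_counts`, the width of 1, and the two origins are all hypothetical choices for illustration, not something the rule prescribes.

```python
import math
from collections import Counter

def bin_counts(x, width, origin):
    """Count observations in bins [origin + k*width, origin + (k+1)*width)."""
    idx = Counter(math.floor((v - origin) / width) for v in x)
    lo, hi = min(idx), max(idx)
    # Report every bin between the lowest and highest occupied one.
    return [idx.get(k, 0) for k in range(lo, hi + 1)]

# Hypothetical data, same bin width in both cases; only the origin changes.
data = [0.9, 1.1, 1.9, 2.1, 2.9, 3.1]
counts_a = bin_counts(data, width=1.0, origin=0.0)
counts_b = bin_counts(data, width=1.0, origin=0.5)
print(counts_a)  # [1, 2, 2, 1]
print(counts_b)  # [2, 2, 2]
```

With the same width, one origin suggests a peak in the middle and the other suggests a flat distribution, so a width rule alone does not pin down the apparent shape.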

proton
  • To begin understanding why the answer is in the negative, you might consider what happens to the histogram as you vary the start point of the bins, which is not specified by this rule of thumb. But why narrow your question so much? If you are truly interested in learning about the shape of a set of data, why do you limit yourself to one of the least precise and most arbitrary tools available (the histogram)? Do you have some need or constraint that compels you to use histograms for this purpose? – whuber Mar 12 '13 at 15:58
  • @whuber: No, just curious. Also, ks.test always returns D=1, so I am comparing methods. I think the qqplot seems the best. – proton Mar 12 '13 at 16:09
  • That indicates you are not using the KS test correctly. – whuber Mar 12 '13 at 16:13
  • @whuber: I think I am using it correctly. I have my data set sample, a list of 40 numbers. Then I use ks.test(sample,pnorm), ks.test(sample,pexp), etc., and get D=1. Maybe I should set a seed? – proton Mar 12 '13 at 16:16
  • Sure, it executes. But that version tests whether your sample came from a standard normal distribution. That's certainly not the case (unless you standardized your data beforehand, but that renders the test's p-value invalid). All you have learned is that your data obviously don't have a mean of zero, a standard deviation of 1, and an approximately Normal distribution. – whuber Mar 12 '13 at 16:18
  • @whuber: I see. So you need to specify the parameters beforehand? – proton Mar 12 '13 at 16:21
  • Absolutely. Moreover, you're not allowed to estimate them from the data (that would be cheating :-). There are versions of the KS test that don't require you to specify the parameters or estimate them, but I'm not sufficiently familiar with the implementations in R to recommend one. – whuber Mar 12 '13 at 16:25
  • To follow up @whuber's comment about the start point, rather than just the widths, see this – Glen_b Mar 13 '13 at 06:05
  • If you do a KS test with parameter estimation, you have to account for the effect on the distribution of the test statistic. In that case it's often called a Lilliefors test. [Lilliefors originally used simulation to get the distribution of these tests in the case of the normal (1967) and the exponential (1969), though his sample sizes were (perhaps unsurprisingly) fairly small, so the significance levels he got were only approximate.] But I wouldn't recommend this as a way to test normality, and I certainly wouldn't suggest using hypothesis tests in this way to infer a distribution! – Glen_b Mar 13 '13 at 06:17
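The D=1 result discussed in the comments can be reproduced with a small hand-rolled sketch. It is written in Python rather than R, it computes only the one-sample KS statistic D (no p-value), and everything in it is hypothetical: the function `ks_statistic`, the seven-point sample, and the mean of 55 and standard deviation of 3, which are assumed to have been specified in advance rather than estimated from the data.

```python
from statistics import NormalDist

def ks_statistic(x, cdf):
    """One-sample KS statistic: largest gap between the empirical CDF and cdf."""
    xs = sorted(x)
    n = len(xs)
    d = 0.0
    for i, v in enumerate(xs, start=1):
        f = cdf(v)
        # The empirical CDF jumps from (i-1)/n to i/n at v; check both sides.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

# Hypothetical sample that sits far from zero.
sample = [50, 52, 53, 55, 56, 58, 60]

# Against a standard normal, every point lies where the CDF is essentially 1.
d_standard = ks_statistic(sample, NormalDist().cdf)

# Against a normal whose parameters were fixed beforehand (assumed values).
d_specified = ks_statistic(sample, NormalDist(mu=55, sigma=3).cdf)

print(d_standard)
print(d_specified)
```

Here `d_standard` comes out at 1 simply because the sample is nowhere near mean 0 and standard deviation 1, while `d_specified` is far smaller. If the parameters were estimated from the same sample instead, the usual KS null distribution would no longer apply, which is the point made about the Lilliefors test.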

0 Answers