I'm trying to test whether a dataset follows Benford's Law (https://en.wikipedia.org/wiki/Benford%27s_law), which basically says what proportion of values in a data set we'd expect to have each first significant digit (i.e. to start with) 1, 2, ..., 9.
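For reference, the expected proportions come from the standard Benford formula P(d) = log10(1 + 1/d). A minimal sketch of that calculation follows; the function name `benford_proportions` is just an illustrative choice, not something from a library.

```python
import math

def benford_proportions():
    """Expected proportion of each first significant digit d = 1..9 under Benford's Law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

# Print each digit with its expected proportion, rounded to three decimals.
for d, p in benford_proportions().items():
    print(d, round(p, 3))
```

The nine proportions sum to 1, since the logarithms telescope to log10(10).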
Here's some actual data.
FSD:       1      2      3      4      5      6      7      8      9
Benford:   0.301  0.176  0.125  0.097  0.079  0.067  0.058  0.051  0.046
Observed:  0.305  0.179  0.126  0.098  0.077  0.064  0.057  0.049  0.046
As you can see, the observed data is SO close to what Benford expects. I'm trying to argue that Benford is a good model for expectations, but the standard chi-squared test says it is not a good match, since this particular observed data set has over 25,000 points. Essentially, the large size of my data set makes the frequency differences look huge. Yet obviously, Benford's Law is a perfect model for this data.
My question: is it statistically correct to do chi-squared with the proportions instead of the frequencies? I know it can be done (I read "Can chi square be used to compare proportions?"), but I'm more concerned that reviewers of my paper will say that's incorrect.
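To make the proportions-versus-frequencies issue concrete, here is a small sketch. It relies only on algebra: the Pearson statistic computed on counts equals the sample size times the same sum computed on proportions. The function names and the sample sizes in the loop are hypothetical, chosen for illustration, and the proportions are the rounded values from the table above.

```python
# Benford and observed proportions, rounded to three decimals as in the table.
benford = [0.301, 0.176, 0.125, 0.097, 0.079, 0.067, 0.058, 0.051, 0.046]
observed = [0.305, 0.179, 0.126, 0.098, 0.077, 0.064, 0.057, 0.049, 0.046]

def stat_on_proportions(obs_p, exp_p):
    """Pearson-style sum computed directly on proportions (does not depend on n)."""
    return sum((o - e) ** 2 / e for o, e in zip(obs_p, exp_p))

def stat_on_counts(obs_p, exp_p, n):
    """Usual Pearson chi-squared statistic after scaling proportions to counts."""
    obs_counts = [o * n for o in obs_p]
    exp_counts = [e * n for e in exp_p]
    return sum((o - e) ** 2 / e for o, e in zip(obs_counts, exp_counts))

w2 = stat_on_proportions(observed, benford)

# Hypothetical sample sizes: the count-based statistic grows linearly with n.
for n in (1_000, 25_000, 196_000):
    print(n, round(stat_on_counts(observed, benford, n), 2), round(n * w2, 2))
```

Because the proportion-based sum is the count-based statistic divided by n, it does not follow a chi-squared distribution with 8 degrees of freedom, so looking it up in the chi-squared table would not give a valid p-value. It may still be usable as a descriptive measure of how far the data are from Benford.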
For 8 degrees of freedom and with frequencies from my sample size on this data I shared (19,500), the p value is 0.33.
– Jay Feb 09 '15 at 16:04
59693 35077 24655 19209 15136 12488 11184 9509 9048 (I can't get the formatting right, but those are frequencies for 1-9 respectively)
Sample size is 196,000
– Jay Feb 09 '15 at 16:17
Not that you can solve this problem for me - but thanks :)
– Jay Feb 09 '15 at 16:36
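For anyone wanting to reproduce the test, the frequencies posted in the comments can be run through a goodness-of-fit check along these lines. This is a sketch in plain Python rather than any particular statistics package; the helper names are illustrative, and `chi2_sf_even_df` uses the closed-form upper tail of the chi-squared distribution, which only holds for even degrees of freedom (8 here).

```python
import math

# First-digit frequencies for digits 1..9, as posted in the comments.
observed = [59693, 35077, 24655, 19209, 15136, 12488, 11184, 9509, 9048]

def benford_expected(n):
    """Expected counts for digits 1..9 under Benford's Law for a sample of size n."""
    return [n * math.log10(1 + 1 / d) for d in range(1, 10)]

def chi_square_stat(obs, exp):
    """Pearson chi-squared goodness-of-fit statistic."""
    return sum((o - e) ** 2 / e for o, e in zip(obs, exp))

def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for chi-squared X; valid only for even df."""
    if df % 2 != 0:
        raise ValueError("closed form used here requires even degrees of freedom")
    half = x / 2
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i          # builds (x/2)^i / i!
        total += term
    return math.exp(-half) * total

n = sum(observed)
stat = chi_square_stat(observed, benford_expected(n))
p_value = chi2_sf_even_df(stat, df=len(observed) - 1)
print(n, round(stat, 1), p_value)
```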