How to interpret pvalue for Chi² goodness-of-fit test for Poisson distribution

Question

I'm a bit confused about the p-values I get when checking my own data against a theoretical Poisson distribution, following e.g. this post: https://stats.stackexchange.com/a/78175/66544

I compare my own data against theoretical Poisson-distributed data whose parameter λ is estimated from my empirical data. I do this with a classical Pearson Chi² goodness-of-fit test, where the null hypothesis is that there is no difference between the two distributions, i.e. that my data are indeed Poisson distributed. If the p-value of such a test is < 0.05, we'd conclude that the empirical data don't follow a Poisson distribution.
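For reference, a sketch of the Pearson goodness-of-fit statistic and its p-value in standard notation. The symbols are introduced here for illustration, not taken from the linked post: $O_k$ and $E_k$ are the observed and expected counts for the value $y_k$, $K$ is the number of categories, $n$ is the sample size, and one extra degree of freedom is lost because $\lambda$ is estimated by the sample mean (matching `df <- length(x)-1-1` in the example). Because $p$ is an upper-tail probability, it shrinks as $X^2$ grows.

$$
X^2 = \sum_{k=1}^{K} \frac{(O_k - E_k)^2}{E_k},
\qquad
E_k = n \, \Pr(Y = y_k \mid \hat\lambda) = n \, \frac{e^{-\hat\lambda}\,\hat\lambda^{y_k}}{y_k!},
\qquad
\hat\lambda = \bar{y}
$$

$$
p = \Pr\!\left(\chi^2_{K-1-1} \ge X^2\right)
$$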

Now, here is where I get confused: if I'm not mistaken, the more my empirical data deviate from the theoretical Poisson distribution, the larger the Chi² value gets. If I feed that into the p-value function, I get a high p-value, which speaks for NOT rejecting the null hypothesis. This is the opposite of what I'd expect, namely that large deviations from the theoretical distribution should make me tend to reject the null hypothesis.

I'm obviously missing something, but I'm not sure where I went wrong.

Here's an illustrative example in R.

```r
set.seed(2)
empirical_data <- rpois(100, 5)
x <- table(empirical_data)
#>  1  2  3  4  5  6  7  8  9 10 11 
#>  4  6 18 22 13 11 12  4  4  5  1
set.seed(2)
theoretical_data <- dpois(0:10, lambda = mean(empirical_data)) * length(empirical_data)
cbind(x,
      theoretical_data,
      PearsonRes = (x - theoretical_data) / sqrt(theoretical_data),
      ContribToChisq = (x - theoretical_data)^2 / theoretical_data)
#>     x theoretical_data PearsonRes ContribToChisq
#> 1   4        0.6604527  4.1092896     16.8862613
#> 2   6        3.3154724  1.4743316      2.1736536
#> 3  18        8.3218357  3.3549297     11.2555530
#> 4  22       13.9252051  2.1638677      4.6823233
#> 5  13       17.4761325 -1.0707307      1.1464643
#> 6  11       17.5460370 -1.5627479      2.4421811
#> 7  12       14.6801843 -0.6995180      0.4893254
#> 8   4       10.5277893 -2.0118590      4.0475766
#> 9   4        6.6061878 -1.0139820      1.0281595
#> 10  5        3.6847847  0.6851581      0.4694416
#> 11  1        1.8497619 -0.6247976      0.3903720
chisq <- sum((x - theoretical_data)^2 / theoretical_data)
#> [1] 45.01131
df <- length(x) - 1 - 1
#> [1] 9
pchisq(chisq, df)
#> [1] 0.9999991
```

Now, here is what happens if I simply use a smaller Chi² value (which would mean LESS deviation from a Poisson distribution):

```r
# The p-value gets lower, which brings me closer to rejecting the null hypothesis.
pchisq(5, 9)
#> [1] 0.1656917
```
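For comparison, a small sketch that evaluates both tails of a chi-squared distribution with 9 degrees of freedom at the two statistics used above (45.01131 and 5). It only uses `pchisq()` and its `lower.tail` argument; the variable names are illustrative.

```r
# Chi-squared statistics from the question and their degrees of freedom
stats <- c(45.01131, 5)
df <- 9

# Default pchisq(): lower tail Pr(X <= stat)
lower_tail <- pchisq(stats, df)
# Upper tail Pr(X > stat): the goodness-of-fit p-value
upper_tail <- pchisq(stats, df, lower.tail = FALSE)

print(data.frame(stat = stats, lower_tail = lower_tail, p_value = upper_tail))
```

The two tails sum to 1, so the upper tail for 45.01131 is roughly 1 - 0.9999991 (about 9e-7) and for 5 it is about 1 - 0.1656917 = 0.834: the larger statistic gives the smaller p-value.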

You get a couple of details wrong. a) You don't align the observed data and the theoretical data correctly: x starts at 1 while theoretical_data starts at 0. b) More importantly, pchisq returns the lower-tail probability Pr{X <= x} while the p-value for the chi-squared test is the upper-tail probability Pr{X > x}. Try pchisq(stat, df, lower.tail = FALSE) or 1 - pchisq(stat, df). — dipetkov, May 08 '22 at 14:40
Ah, right regarding a). That's an oversight on my end, but it shouldn't change the overall picture. b) is indeed what I got wrong. It's so obvious; I'm a bit embarrassed that I missed it. — deschen, May 08 '22 at 17:46
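A minimal sketch of the test with both fixes from the comments applied: (a) observed and expected counts refer to the same values 0, 1, ..., max(empirical_data), and (b) the p-value is taken from the upper tail. The names `observed`, `expected`, `probs` and `k_max` are illustrative. Lumping Pr(Y ≥ k_max) into the last category, so that the expected counts sum to n, is an assumption of this sketch rather than something the comments prescribe.

```r
set.seed(2)
empirical_data <- rpois(100, 5)

n <- length(empirical_data)
lambda_hat <- mean(empirical_data)
k_max <- max(empirical_data)

# (a) Tabulate over 0, 1, ..., k_max so that observed and expected counts
#     line up, keeping values with zero observations
observed <- as.vector(table(factor(empirical_data, levels = 0:k_max)))

# Expected probabilities for 0, ..., k_max - 1, plus a tail bin ">= k_max"
# (assumption of this sketch); no observation exceeds k_max, so the last
# observed count already equals the count for ">= k_max"
probs <- c(dpois(seq_len(k_max) - 1, lambda_hat),
           ppois(k_max - 1, lambda_hat, lower.tail = FALSE))
expected <- n * probs

chisq <- sum((observed - expected)^2 / expected)
df <- length(observed) - 1 - 1   # one more df lost because lambda was estimated

# (b) The p-value is the upper-tail probability Pr(X > chisq)
p_value <- pchisq(chisq, df, lower.tail = FALSE)
print(c(chisq = chisq, df = df, p_value = p_value))
```

Unlike the original code, a larger statistic now produces a smaller p-value, which is the behaviour the question expected.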
