I'm a bit confused about the p-values I get when checking my own data against a theoretical Poisson distribution. Following e.g. this post: https://stats.stackexchange.com/a/78175/66544
I compare my own data against the counts expected under a theoretical Poisson distribution whose parameter is estimated from my empirical data. I do this by setting up a classical Chi² deviance test, where the null hypothesis is that there is no difference between the two distributions, i.e. that my data are indeed Poisson distributed. If the p-value for such a test is < 0.05, we'd conclude that the empirical data don't follow a Poisson distribution.
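To be explicit about the statistic I mean, here it is written out as I understand it. The symbols are my own notation and not taken from the linked post: $O_i$ is the observed count in category $i$, $E_i$ the count expected under the fitted Poisson distribution, and $k$ the number of categories. I'm assuming the Pearson form, with one extra degree of freedom subtracted because the Poisson mean is estimated from the data.

```latex
\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i},
\qquad
\mathrm{df} = k - 1 - 1
```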
Where I get confused is this: if I'm not mistaken, the more my empirical data deviate from the theoretical Poisson distribution, the larger the Chi² value gets. If I feed that into the p-value function, I get a high p-value, which speaks for NOT rejecting the null hypothesis. This is the opposite of what I'd expect, namely that large deviations from the theoretical distribution should lead me to reject the null hypothesis.
I'm obviously missing a point, but I'm not sure where I went wrong.
Here's an illustrative example in R.
set.seed(2)
empirical_data <- rpois(100, 5)
x <- table(empirical_data)
 1  2  3  4  5  6  7  8  9 10 11
 4  6 18 22 13 11 12  4  4  5  1
set.seed(2)
theoretical_data <- dpois(0:10, lambda = mean(empirical_data)) * length(empirical_data)
cbind(x,
      theoretical_data,
      PearsonRes = (x - theoretical_data) / sqrt(theoretical_data),
      # squared Pearson residual, i.e. each category's contribution to Chi²
      ContribToChisq = (x - theoretical_data)^2 / theoretical_data)
    x theoretical_data PearsonRes ContribToChisq
1   4        0.6604527  4.1092896     16.8862613
2   6        3.3154724  1.4743316      2.1736536
3  18        8.3218357  3.3549297     11.2555530
4  22       13.9252051  2.1638677      4.6823233
5  13       17.4761325 -1.0707307      1.1464643
6  11       17.5460370 -1.5627479      2.4421811
7  12       14.6801843 -0.6995180      0.4893254
8   4       10.5277893 -2.0118590      4.0475766
9   4        6.6061878 -1.0139820      1.0281595
10  5        3.6847847  0.6851581      0.4694416
11  1        1.8497619 -0.6247976      0.3903720
chisq <- sum((x - theoretical_data)^2/theoretical_data)
[1] 45.01131
df <- length(x)-1-1
[1] 9
pchisq(chisq, df)
[1] 0.9999991
Now, what happens if I just decrease the Chi² value (which would mean LESS deviation from a Poisson distribution):
# p-value gets lower, which brings me closer to rejecting the null hypothesis.
pchisq(5, 9)
[1] 0.1656917
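For reference, here is a small sketch of the two tail probabilities that pchisq can return, in case the choice of tail is related to my confusion. As far as I understand the documentation, pchisq(q, df) gives the lower tail P(X <= q) by default, and lower.tail = FALSE gives the upper tail P(X > q). The variable names chisq_stat, p_lower and p_upper are just mine for illustration, and the statistic is the rounded value from the example above.

```r
# Statistic and degrees of freedom copied (rounded) from the example above.
chisq_stat <- 45.01131
df <- 9

# Default call: lower tail, P(X <= chisq_stat), i.e. the CDF at chisq_stat.
p_lower <- pchisq(chisq_stat, df)

# Upper tail: P(X > chisq_stat).
p_upper <- pchisq(chisq_stat, df, lower.tail = FALSE)

# The two tails are complementary, so they should sum to 1.
print(c(lower = p_lower, upper = p_upper))
print(all.equal(p_lower + p_upper, 1))
```

The lower tail grows as the statistic grows, while the upper tail shrinks, so the two calls move in opposite directions for the same change in Chi².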