I have the following values from an experiment:

   A   B
X 64  20
Y 62  11

I ran a chi-square test on these data using the following code:

from scipy.stats import chisquare
a, b, c, d = 64, 20, 62, 11  # cell counts from the table above, row by row
pval = chisquare([a,b], [c,d])[1]
print(pval)

Output is:

0.006421123271652286

This seems clearly significant (<0.05).

I then calculated the odds ratio and its confidence interval from the same data using the following formulae:

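import math
import numpy as np

OR = (a*d) / (b*c)                            # odds ratio
se = math.sqrt((1/a)+(1/b)+(1/c)+(1/d))       # standard error of log(OR)
lower  = np.exp(math.log(OR) - 1.96*se)       # lower 95% limit
upper  = np.exp(math.log(OR) + 1.96*se)       # upper 95% limit
print(OR, lower, upper)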

Output is:

0.5677  0.2514   1.2819

(The confidence interval agrees with the online calculator at https://select-statistics.co.uk/calculators/confidence-interval-calculator-odds-ratio/ )

So the confidence interval clearly includes 1, whereas I expected it to lie entirely on one side of 1, since the p-value was clearly significant.

I have the following questions:

  1. Where is the error and how can I correct it?

  2. Would you call these data statistically significant?

  3. What test can I use so that the p-value and the confidence interval agree?

Thanks for your help.

rnso

2 Answers

The chisquare function tests observed counts against expected counts (a goodness-of-fit test). That's not what you intend: you're testing a contingency table. Use the chi2_contingency function, which takes a table (nested array) as input and returns:

chi2: float
    The test statistic.

p: float
    The p-value of the test.

dof: int
    Degrees of freedom.

expected: ndarray, same shape as observed
    The expected frequencies, based on the marginal sums of the table.

(https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.chi2_contingency.html)
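To see where the 0.0064 came from, here is a rough sketch of what the original call computed: it treated row Y as the *expected* counts for row X. The variable names are mine, and I compute the statistic by hand rather than calling chisquare, because I believe recent SciPy versions reject observed and expected counts whose totals differ (84 vs 73 here).

```python
import numpy as np
from scipy.stats import chi2

observed = np.array([64, 20])   # row X, passed as f_obs in the question
expected = np.array([62, 11])   # row Y, (mis)used as f_exp in the question

# Goodness-of-fit statistic: sum of (observed - expected)^2 / expected
stat = float(np.sum((observed - expected) ** 2 / expected))

# Upper-tail p-value with (number of cells - 1) degrees of freedom
p_gof = float(chi2.sf(stat, df=len(observed) - 1))

print(stat, p_gof)
```

Almost all of the statistic comes from the second cell (20 vs 11), which is why the p-value looks so small even though the two rows are not very different as proportions.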

The correct analysis gives a p-value of 0.24:

>>> from scipy.stats import chi2_contingency
>>> chi2_contingency([[64,20],[62,11]])
(1.3719790003937939, 0.24147215490328422, 1, array([[ 67.41401274,  16.58598726],
       [ 58.58598726,  14.41401274]]))
>>> 
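On the third question, one option (my suggestion, not something from the SciPy docs) is a Wald test on the log odds ratio. It uses the same standard error as the confidence interval in the question, so its p-value is below 0.05 essentially when the 95% interval excludes 1. The sketch below puts it next to chi2_contingency with and without Yates' continuity correction; all variable names are illustrative.

```python
import math
from scipy.stats import chi2_contingency, norm

table = [[64, 20], [62, 11]]
(a, b), (c, d) = table

# Chi-square test of independence; correction=True (Yates) is the default
_, p_yates, _, _ = chi2_contingency(table)
_, p_plain, _, _ = chi2_contingency(table, correction=False)

# Wald test on log(OR), using the same standard error as the CI
odds_ratio = (a * d) / (b * c)
se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
z = math.log(odds_ratio) / se
p_wald = 2 * norm.sf(abs(z))          # two-sided p-value

lower = math.exp(math.log(odds_ratio) - 1.96 * se)
upper = math.exp(math.log(odds_ratio) + 1.96 * se)

print(p_yates, p_plain, p_wald)
print(odds_ratio, lower, upper)
```

All three p-values are above 0.05 and the interval contains 1, so the test and the confidence interval tell the same story once the table is analysed as a contingency table.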

The answer from abstrusiosity is of course correct. I just wanted to add that you can also use the chisq.test() function in R to calculate the chi-square statistic and its associated p-value. If you have a contingency table (a chi-square test in more than one dimension), use chisq.test() like this:

x <- matrix(c(64, 62, 20, 11), ncol = 2)

chisq.test(x)

Result:

X-squared = 1.372, df = 1, p-value = 0.2415
COOLSerdash
  • Welcome to Stats.Stackexchange.com. You are correct and R is a more standard language for statistical work than Python. – rnso Jan 29 '23 at 11:20