I understand what the `margins=True` option in `pd.crosstab` does, but I don't understand why it would influence the outcome of `chi2_contingency`. Here is an example:
#data_crosstab:
srm       no  yes  All
version
<V4      132  105  237
V4        29   24   53
All      161  129  290
chi2_contingency(data_crosstab, correction=False)
#yields
(0.016817770389843306,
0.9999648428969145,
4,
array([[131.57586207, 105.42413793, 237. ],
[ 29.42413793, 23.57586207, 53. ],
[161. , 129. , 290. ]]))
#while
#data_crosstab:
srm       no  yes
version
<V4      132  105
V4        29   24
chi2_contingency(data_crosstab, correction=False)
#yields
(0.016817770389843306,
0.896816958766594,
1,
array([[131.57586207, 105.42413793],
[ 29.42413793, 23.57586207]]))
I see that the degrees of freedom (DOF) differ, but I really don't understand the role of the `margins` option. Thanks!
In another [answer](https://stats.stackexchange.com/questions/103876/what-does-conditioning-on-the-margins-of-mean), I found out that margins should be used when the margins are fixed. I assumed that is my case, so I am not sure I understand why `margins=False` would give me the correct result. – Chiara Feb 23 '23 at 14:05
`chi2_contingency` automatically takes care of calculating the margins for you, which is why you have to pass the table without the margins (the parameter `margins=False` in the crosstab function makes sure that the generated table you pass does not contain the margins; the margins will be calculated under the hood by `chi2_contingency()`). – J-J-J Feb 23 '23 at 14:24
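To make the point in the comment above concrete, here is a minimal sketch that rebuilds the table from the counts in the question and runs the test both ways. The raw data frame `df` and its column names are assumptions made for illustration (the original data are not shown). When the `All` row and column are included, `chi2_contingency` treats them as one more category each, so the table is 3x3 and the degrees of freedom become (3-1)*(3-1) = 4 instead of (2-1)*(2-1) = 1. The statistic itself does not change, because every `All` cell equals its expected value and contributes zero; only the p-value, which depends on the degrees of freedom, is affected.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Counts taken from the question; df is a hypothetical reconstruction
# of the raw data with one row per observation.
counts = {("<V4", "no"): 132, ("<V4", "yes"): 105,
          ("V4", "no"): 29, ("V4", "yes"): 24}
rows = [(version, srm) for (version, srm), n in counts.items() for _ in range(n)]
df = pd.DataFrame(rows, columns=["version", "srm"])

# Table of observed counts only (margins=False is the default).
observed = pd.crosstab(df["version"], df["srm"])
# Same table with the "All" totals appended as an extra row and column.
with_margins = pd.crosstab(df["version"], df["srm"], margins=True)

chi2, p, dof, expected = chi2_contingency(observed, correction=False)
chi2_m, p_m, dof_m, expected_m = chi2_contingency(with_margins, correction=False)

print("without margins:", chi2, p, dof)    # dof is 1
print("with margins:   ", chi2_m, p_m, dof_m)  # dof is 4
```

If this runs as intended, the first line should match the second result in the question and the second line should match the first one, which suggests that the table without margins is the one to pass to the test.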