14

I've read that the chi square test is useful to see if a sample is significantly different from a set of expected values.

For example, here is a table of results of a survey regarding people's favourite colours (n=15+13+10+17=55 total respondents):

red, blue, green, yellow

15, 13, 10, 17

A chi square test can tell me if this sample is significantly different from the null hypothesis of equal probability of people liking each colour.

Question: Can the test be run on the proportions of total respondents who like a certain colour? Like below:

red, blue, green, yellow

0.273, 0.236, 0.182, 0.309

Where, of course, $0.273 + 0.236 + 0.182 + 0.309=1$.

If the chi square test is not suitable in this case, what test would be?

Edit: I tried @Roman Luštrik's answer below and got the following output. Why am I not getting a p-value, and why does R say "Chi-squared approximation may be incorrect"?

chisq.test(c(0, 0, 0, 8, 6, 2, 0, 0), p = c(0.406197174, 0.088746395, 
             0.025193306, 0.42041479, 0.03192905, 0.018328576, 
             0.009190708, 0))
    Chi-squared test for given probabilities

data:  c(0, 0, 0, 8, 6, 2, 0, 0) 
X-squared = NaN, df = 7, p-value = NA

Warning message:
In chisq.test(c(0, 0, 0, 8, 6, 2, 0, 0), p = c(0.406197174, 
       0.088746395,  :
  Chi-squared approximation may be incorrect  

hpy
  • 639
  • 1
    In the second case, are you assuming you know the total sample size? Or not? – cardinal Apr 16 '11 at 22:04
  • 1
    @cardinal: yes I do know the total sample size. – hpy Apr 17 '11 at 00:30
  • 4
    then just multiply the proportions by the total sample size to transform into a table of counts, and apply the chi-sq. method corresponding to your first example. – Aaron Apr 17 '11 at 00:53
  • I suspect you are asking about the "goodness of fit" test (using the chi square). Its use was explained below. Cheers, Tal – Tal Galili Apr 18 '11 at 04:42

6 Answers

12

Correct me if I'm wrong, but I think this can be done in R using this command

chisq.test(c(15, 13, 10, 17))
    Chi-squared test for given probabilities

data:  c(15, 13, 10, 17) 
X-squared = 1.9455, df = 3, p-value = 0.5838

This assumes proportions of 1/4 each. You can modify the expected values via the argument p, for example if you think people may prefer (for whatever reason) one color over the others.

chisq.test(c(15, 13, 10, 17), p = c(0.5, 0.3, 0.1, 0.1))
    Chi-squared test for given probabilities

data:  c(15, 13, 10, 17) 
X-squared = 34.1515, df = 3, p-value = 1.841e-07
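For comparison, the same goodness-of-fit tests can be reproduced in Python with `scipy.stats.chisquare`, SciPy's analogue of R's `chisq.test` for given probabilities (a sketch, assuming SciPy is available):

```python
import numpy as np
from scipy.stats import chisquare

observed = np.array([15, 13, 10, 17])

# Default: equal expected proportions (1/4 each)
stat, p = chisquare(observed)
print(stat, p)  # ≈ 1.9455, 0.5838

# Unequal expected proportions, mirroring the second R call
f_exp = observed.sum() * np.array([0.5, 0.3, 0.1, 0.1])
stat2, p2 = chisquare(observed, f_exp=f_exp)
print(stat2, p2)  # ≈ 34.1515, 1.841e-07
```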

  • @Roman Luštrik : I tried your suggestion above, but was told that my chi square approximation might be incorrect; in fact the p-value is "NA". What does that mean? I have appended the results to my original question. Thanks. – hpy Apr 18 '11 at 22:35
  • 2
    I suspect you're seeing this because of some low cell counts (some books I've read suggest a min. of 5 per cell). Maybe someone more knowledgeable on the subject can chip in? – Roman Luštrik Apr 18 '11 at 23:48
  • 1
    Also notice that you can get a p-value if you make the last of your probabilities more than zero (but the warning still remains). – Roman Luštrik Apr 18 '11 at 23:50
  • 1
    Ott & Longnecker (An introduction to statistical methods and data analysis, 5th edition) state, on page 504, that each cell should be at least five, to use the approximation comfortably. – Roman Luštrik Apr 18 '11 at 23:55
  • 1
    @penyuan : You should've mentioned that you have quite a few zero counts. Roman is right, using a chi-square in this case just doesn't work, for the reasons he mentioned. – Joris Meys Apr 19 '11 at 00:40
  • @Joris Meys : is there an alternative to the chi square test in this case? Thanks. – hpy Apr 19 '11 at 04:20
  • 1
    @penyuan : I added an answer giving you some options. – Joris Meys Apr 19 '11 at 22:22
11

Using the extra information you gave (namely that quite a few of the values are 0), it's pretty obvious why your call returns nothing. For one thing, you have a probability that is 0, so:

  • $e_i$ in the solution of Henry is 0 for at least one $i$
  • $np_i$ in the solution of @probabilityislogic is 0 for at least one $i$

which makes the division impossible. Now, saying that $p=0$ means that the outcome is impossible. If so, you might as well erase it from the data (see the comment of @cardinal). If you mean highly improbable, a first 'solution' might be to increase that 0 probability by a very small number.

Given :

    X <- c(0, 0, 0, 8, 6, 2, 0, 0)
    p <- c(0.406197174, 0.088746395, 0.025193306, 0.42041479, 
           0.03192905, 0.018328576, 0.009190708, 0)

You could do:

    p2 <- p + 1e-6
    chisq.test(X, p = p2, rescale.p = TRUE)

(Note that the probabilities must be passed via the named argument p and rescaled to sum to 1; calling chisq.test(X, p2) with a second positional argument would instead run a test of independence on the two vectors, which is not what you want.) R still warns that the chi-squared approximation may be incorrect, so the resulting p-value cannot be trusted. In any case, one should avoid the chi-square test in these borderline cases. A better alternative is a bootstrap: calculate an adapted test statistic and compare the sample value with the distribution obtained from the bootstrap.

In R code this could be (step by step) :

    # The function to calculate the adapted statistic.
    # We add 0.5 to the expected value to avoid dividing by 0
    Statistic <- function(o, e){
        e <- e + 0.5
        sum(((o - e)^2)/e)
    }

    # Set up the bootstraps, based on the multinomial distribution
    n <- 10000
    bootstraps <- rmultinom(n, size = sum(X), prob = p)

    # calculate the expected values
    expected <- p * sum(X)

    # calculate the statistic for the sample and the bootstraps
    ChisqSamp <- Statistic(X, expected)
    ChisqDist <- apply(bootstraps, 2, Statistic, expected)

    # calculate the p-value
    p.value <- sum(ChisqSamp < ChisqDist)/n
    p.value

This gives a p-value of 0, which is much more in line with the difference between observed and expected. Mind you, this method assumes your data is drawn from a multinomial distribution. If this assumption doesn't hold, the p-value doesn't hold either.
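The same parametric bootstrap can be sketched in Python with NumPy (an illustration of the procedure above, not the original R code; the seed is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed for reproducibility

X = np.array([0, 0, 0, 8, 6, 2, 0, 0])
p = np.array([0.406197174, 0.088746395, 0.025193306, 0.42041479,
              0.03192905, 0.018328576, 0.009190708, 0])
p = p / p.sum()  # normalise: the listed values sum to 1 only up to rounding

def statistic(o, e):
    # add 0.5 to the expected counts to avoid dividing by zero
    e = e + 0.5
    return np.sum((o - e) ** 2 / e)

n_boot = 10000
expected = p * X.sum()

# bootstrap samples drawn from the hypothesised multinomial
boots = rng.multinomial(X.sum(), p, size=n_boot)

chisq_sample = statistic(X, expected)
chisq_dist = np.array([statistic(b, expected) for b in boots])

# p-value: fraction of bootstrap statistics at least as extreme as the sample's
p_value = np.mean(chisq_dist >= chisq_sample)
print(p_value)  # essentially 0, in line with the R result
```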

Joris Meys
  • 8,832
  • 1
    You might reconsider your first statement, which I do not believe is correct. If $p_i = 0$ for some $i$ and the observed counts are zero (which they better be), then this just reduces to a submodel. The effect is that the number of degrees of freedom is reduced by one for each $i$ such that $p_i = 0$. For example, consider testing for uniformity of a six-sided die (that is $p_i = 1/6$ for $i \leq 6$). But, suppose we (strangely) decide to record the number of times that the numbers $1,\ldots,10$ show up. Then, the chi-square test is still valid; we just sum over the first six values. – cardinal Apr 20 '11 at 12:24
  • @cardinal : I just described the data, where the expected value is 0 but the observed doesn't have to be. It's what the OP gave us (although on second thought it does indeed sound rather unrealistic). Hence adding a little bit to the probability to make the outcome highly improbable instead of impossible will help, but even then the chi-square is invalid in this case, due to the large number of table cells with counts less than 5 (as demonstrated by the code). I added the consideration to my answer, thx for the pointer. – Joris Meys Apr 20 '11 at 13:10
  • 1
    yes, I'd say if $p_i = 0$, but you observe a count for that cell, then you've got more serious problems on your hands, anyways. :) – cardinal Apr 20 '11 at 13:40
9

The chi-square test is good as long as the expected counts are large; usually above 10 is fine. Below this, the $\frac{1}{E(x_{i})}$ part tends to dominate the test statistic. An exact test statistic is given by:

$$\psi=\sum_{i}x_{i}\log\left(\frac{x_{i}}{np_{i}}\right)$$

Where $x_{i}$ is the observed count in category $i$ ($i\in \{\text{red, blue, green, yellow}\}$ in your example), $n$ is your sample size ($55$ in your example), and $p_i$ is the hypothesis you wish to test; the most obvious is $p_i=p_j$ for all $i,j$ (all probabilities equal). One can show that the chi-square statistic satisfies

$$\chi^{2}=\sum_{i}\frac{(x_{i}-np_{i})^{2}}{np_{i}}\approx 2\psi$$

In terms of the observed frequencies $f_{i}=\frac{x_{i}}{n}$ we get:

$$\psi=n\sum_{i}f_{i}\log\left(\frac{f_{i}}{p_{i}}\right)$$ $$\chi^{2}=n\sum_{i}\frac{(f_{i}-p_{i})^{2}}{p_{i}}$$

(Note that $\psi$ is effectively the KL divergence between the observed values and the hypothesis.) You may be able to see intuitively why $\psi$ is better for small $p_{i}$: it does have a $\frac{1}{p_{i}}$, but it also has a log function, absent from the chi-square, which "reins in" the extreme values caused by small expected counts. Now, the "exactness" of this $\psi$ statistic is not that it has an exact chi-square distribution; it is exact in a probability sense. The exactness comes about in the following manner, from Jaynes (2003), Probability Theory: The Logic of Science.

If you have two hypotheses $H_{1}$ and $H_{2}$ (i.e. two sets of $p_i$ values) that you wish to test, with test statistics $\psi_{1}$ and $\psi_{2}$ respectively, then $\exp\left(\psi_{1}-\psi_{2}\right)$ gives you the likelihood ratio for $H_{2}$ over $H_{1}$, and $\exp\left(\frac{1}{2}\chi_{1}^{2}-\frac{1}{2}\chi_{2}^{2}\right)$ gives an approximation to this likelihood ratio.

Now if you choose $H_{2}$ to be the "sure thing" or "perfect fit" hypothesis, then $\psi_{2}=\chi^{2}_{2}=0$, and thus both the chi-square and the $\psi$ statistic tell you "how far" any single hypothesis is from one which fits the observed data exactly.

Final recommendation: use the $\chi^{2}$ statistic when the expected counts are large, mainly because most statistical packages will report it easily. If some expected counts are small, say $np_{i}<10$, then use $\psi$, because the chi-square is a bad approximation in that case: the small cells will dominate the chi-square statistic.
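To see the two statistics side by side on the survey counts from the question, here is a small NumPy sketch computing $\psi$ and $\chi^2$ under the equal-probability hypothesis:

```python
import numpy as np

x = np.array([15, 13, 10, 17])  # observed counts
n = x.sum()                     # 55
p = np.full(4, 0.25)            # null hypothesis: equal preference

# psi = sum of x_i * log(x_i / (n * p_i))
psi = np.sum(x * np.log(x / (n * p)))

# Pearson chi-square statistic
chi2 = np.sum((x - n * p) ** 2 / (n * p))

print(2 * psi, chi2)  # close to each other, since all expected counts are large
```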

  • 1
    I don't quite follow the "exact" terminology. Perhaps that is particular to Jaynes' work. Your $\psi$ is the log-likelihood-ratio test statistic though, and so $2 \psi$ is asymptotically distributed as a $\chi^2$ distribution by Wilks' theorem. Also, $\chi^2 - 2 \psi \to 0$ in probability, which by Slutsky's theorem is enough to conclude that $\chi^2$ has the same asymptotic distribution as $2\psi$. Finally, it turns out that $\chi^2$ is the score test statistic in this problem as well, which provides another connection between the two test statistics. – cardinal Apr 17 '11 at 01:53
  • Also, Agresti (Categorical Data Analysis, 2nd ed., p. 80) claims that $\chi^2$ actually converges to a chi-squared distribution faster than $2 \psi$, which seems at odds with your recommendation. :) – cardinal Apr 17 '11 at 01:55
  • 1
    @cardinal - you are focused on the distribution of the statistic. What I am saying is that the likelihood ratio is already exact using $\psi$; you can compare hypotheses directly using it, rather than a p-value based on its distribution. So if $\psi=\ln(2)$ then this means that the "perfect fit" is twice as likely as the hypothesised fit - and I just realised my odds ratio is the "wrong way around" – probabilityislogic Apr 17 '11 at 02:03
  • I didn't quite follow that last comment, but I think the odds ratio you use is correct. – cardinal Apr 17 '11 at 02:07
  • @cardinal - they have the same asymptotic distribution, but different finite sample distributions, which is why I said use chi-square when the sample size is large, and $\psi$ when the sample size is small. One can show by an easy example that $\psi$ is way better than $\chi^2$ in the small-expected-count case. – probabilityislogic Apr 17 '11 at 02:08
  • That depends on how you use $\psi$. What Agresti is claiming is that if you use the reference chi-square distribution for inference (which nearly any practitioner will almost certainly do), then the Pearson $\chi^2$ will give you close to the nominal rate for smaller samples. Of course, how useful that is to you depends on the context; you might pay a price in lost power or in other areas. – cardinal Apr 17 '11 at 02:11
  • consider the following example. You have 2 people, both with different hypothesis, one says equal probs $H_{1}:p_1=p_2=p_3=\frac{1}{3}$, another says two dominant ones $H_{2}:p_1=p_2=0.499,p_3=0.002$. Suppose you observed $x_1=x_2=14$ and $x_3=1$. $\psi$ supports $H_{2}$ over $H_{1}$ by a large amount, but $\chi^2$ goes the other way, which is clearly wrong - the data support $H_{2}$ over $H_{1}$ just by looking at them, no test is required. But $\chi^2$ fails in this case. – probabilityislogic Apr 17 '11 at 02:23
  • I don't follow. The OP is talking about goodness of fit testing and this is what this version of Pearson's $\chi^2$ statistic is geared towards. In the example you give, $\chi^2 \approx 11.65$. Using a chi-square with two degrees of freedom as the reference distribution gives a $p$-value of $p \approx 0.0031$, i.e., a highly significant value under the null assumption of uniformity. – cardinal Apr 17 '11 at 02:37
  • Yes, but the chi squared under the alternative $H_{2}$ is $\chi^2\approx 15$, suggesting that a uniform is a better fit to the data than the one proposed. this is where $\chi^2$ fails. – probabilityislogic Apr 17 '11 at 02:44
  • Using $\psi$ you get $\psi_1\approx 35$ and $\psi_2\approx 8$ – probabilityislogic Apr 17 '11 at 02:46
  • The version of Pearson's chi-square that you are using is not for comparing two fixed distributions against one another. Maybe that is where you are confused. Seeing that single count for $x_3$ is heavy evidence in both cases of a lack of fit. – cardinal Apr 17 '11 at 02:53
  • Yes, but if Pearson's chi-squared is to be some "universal" GOF statistic (which many people claim), then it should be able to tell you whether one hypothesis is a better fit to the data than another. How else can you define "goodness-of-fit"? – probabilityislogic Apr 17 '11 at 02:59
  • Easy. If you're considering two fixed distributions, as you are in your example, use the likelihood test and invoke Neyman-Pearson. – cardinal Apr 17 '11 at 03:00
  • @cardinal, and what if the two fixed distributions are my $H_1$ and my $H_2$? Chi-squared is not good here as a GOF measure. I would go further and say that you could not produce a single example where $\chi^2$ gives the right conclusion and $\psi$ gives the wrong one. – probabilityislogic Apr 17 '11 at 03:10
  • Then the log likelihood ratio statistic is most powerful and, as hinted at above, Pearson's $\chi^2$ is irrelevant. Which has been exactly my point. :) – cardinal Apr 17 '11 at 03:13
  • @cardinal - so you're saying that I should remove any reference to the $\chi^2$ test in my answer? I think this is unwise because $\chi^2$ is closely connected to $\psi$, and almost every statistician knows what a $\chi^2$ test is. They are both hypothesis tests against the "sure thing" hypothesis - but chi-square is not transient, whereas psi is transient. So you don't need to invent another test when using $\psi$ either in specific hypothesis tests or as a GOF measure. – probabilityislogic Apr 17 '11 at 03:22
  • No, quite the contrary. I think your reference to, and discussion of, $\chi^2$ in the answer is good and entirely relevant. It addresses what the OP is asking about. However, the tangent you took when you introduced two separate fixed distributions to compare against is where the use of $\chi^2$ is not well-motivated. Indeed, I think that $\chi^2$ is essentially irrelevant in that context. That is not the problem that particular statistic is trying to address. I'm not sure what you mean by "transient" above, either. – cardinal Apr 17 '11 at 03:30
  • As regards your final comment regarding $\psi$, it suffers from exactly the same "defect" as $\chi^2$ in the example in your comment: It is not the appropriate statistic to test two fixed distributions against one another. However, considering differences does work due to the fact that $\psi_1 - \psi_2$ is the log-likelihood statistic for $H_1$ vs. $H_2$ which, by Neyman-Pearson is optimal. (Maybe that's what you were getting at with the term "transient"??) – cardinal Apr 17 '11 at 03:34
  • @cardinal - both $\psi$ and $\chi^2$ are already hypothesis tests of the null against the perfect fit hypothesis, because $\psi_2=\chi^2_2=0$ in this case. This is what a GOF statistic really is - and you can always show this (a GOF test can be shown to be a hypothesis test with an implicit alternative). And by transient, I mean if A is a "worse fit" compared to C by an amount $X$ and B is a "worse fit" compared to C by $X-a$ for some $a>0$ then A should also be a "worse fit" compared to B by $a$ (can think of this as "consistent ranking" in some sense). – probabilityislogic Apr 17 '11 at 04:23
  • Maybe you meant transitive? Pearson $\chi^2$ can be motivated in a number of ways. One is as a Taylor series approximation to $\psi$, another is as a score test, which is essentially a way to get around having to do a Wald test based on a maximum likelihood estimator. In either case, you can think of Pearson $\chi^2$ as a local linearization of the desired statistic. Furthermore, an asymptotic distribution as opposed to the true distribution is used. So, putting those together, of course it won't be transitive in the sense you describe. – cardinal Apr 17 '11 at 04:36
  • But, that is all still a bit beside the point for the example you give. The $\psi$ is special because $\exp(\psi_1 - \psi_2)$ for two fixed alternatives is the likelihood ratio, which you know is optimal by Neyman-Pearson. In that sense, it is "special". – cardinal Apr 17 '11 at 04:38
  • Also the "perfect fit" analogy is a bit contrived in this case as you're implicitly comparing a very specific parametric form to a nonparametric one. – cardinal Apr 17 '11 at 04:41
  • @cardinal - yes I do mean transitive :) lol. my bad. And my point is as before - that you focus on the distribution of the statistic - which puts the attention into data sets that might have been observed, but were not actually observed. What should be the focus is how well the data is fit by the hypothesis - for the data set that we actually have - why should we care if an observed count for "red" might have been $12$, when we know that it is $15$? This may be important for something like prediction, but not for testing how well a hypothesis fits the data. – probabilityislogic Apr 17 '11 at 04:50
  • @cardinal - the perfect fit is the perfect fit within the beroulli/binomial class. If it was to all alternatives - then the way the observed data enters into the test would be different – probabilityislogic Apr 17 '11 at 04:52
  • I'm not sure what that means, but I'm likely just tired. There are only three free parameters in the OP's model. You can't get a likelihood of one (i.e., "perfect fit") for arbitrary data with a sample size larger than three. In fact, the best you can do is the MLE and that is precisely what both $\psi$ and $\chi^2$ are implicitly constructed to test against. – cardinal Apr 17 '11 at 05:10
  • @cardinal - That's what I meant by "perfect fit" - where you set "observed"="expected". but you are only testing GOF within a class of alternatives - not to everything. So we are basically in agreement that $\psi$ and $\chi^2$ are already hypothesis tests for the hypothesis against the "perfect fit" - I think this is also called the "saturated model" in some contexts. – probabilityislogic Apr 17 '11 at 05:26
  • I think two different concepts are being conflated in your latest comment. – cardinal Apr 17 '11 at 05:37
7

The p-value will vary with the total sample size even if the proportions remain the same. This can be seen in the following example, with the OP's proportions and varying sample sizes:

from statsmodels.stats.proportion import proportions_chisquare

countlist = [273, 236, 182, 309]
nobslist = [1000, 1000, 1000, 1000]
res = proportions_chisquare(countlist, nobslist)
print("P=", res[1])

countlist = [27.3, 23.6, 18.2, 30.9]
nobslist = [100, 100, 100, 100]
res = proportions_chisquare(countlist, nobslist)
print("P=", res[1])

countlist = [2.73, 2.36, 1.82, 3.09]
nobslist = [10, 10, 10, 10]
res = proportions_chisquare(countlist, nobslist)
print("P=", res[1])

countlist = [.273, .236, .182, .309]
nobslist = [1, 1, 1, 1]
res = proportions_chisquare(countlist, nobslist)
print("P=", res[1])

The four p-values printed by the above code are all different:

P= 3.3202983952938086e-10
P= 0.19436113917526665
P= 0.9252292201159897
P= 0.9973200264790189

Only the first of these is significant (P < 0.05).

rnso
  • 10,009
6

The test statistic for Pearson's chi-square test is

$$\sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$

where $k$ is the number of categories. If you write $o_i = \dfrac{O_i}{n}$ and $e_i = \dfrac{E_i}{n}$ to obtain proportions, where $n=\sum_{i=1}^{k} O_i$ is the sample size and $\sum_{i=1}^{k} e_i =1$, then the test statistic is equal to

$$n \sum_{i=1}^{k} \frac{(o_i - e_i)^2}{e_i}$$

so a test of the significance of the observed proportions depends on the sample size, much as one would expect.
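The identity is easy to verify numerically on the counts from the question (a Python sketch):

```python
import numpy as np

O = np.array([15, 13, 10, 17])  # observed counts
n = O.sum()                     # 55
E = np.full(4, n / 4)           # expected counts under uniformity

o, e = O / n, E / n             # the same data as proportions

count_form = np.sum((O - E) ** 2 / E)
prop_form = n * np.sum((o - e) ** 2 / e)

print(count_form, prop_form)  # identical, ≈ 1.9455
```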

Henry
  • 39,459
3

Yes, you can test the null hypothesis

"H0: prop(red) = prop(blue) = prop(green) = prop(yellow) = 1/4"

using a chi-square test that compares the observed proportions of the survey (0.273, ...) to the expected proportions (1/4, 1/4, 1/4, 1/4).
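Concretely, and following the comment suggestion to multiply the proportions by the known total, a SciPy sketch (assuming n = 55, as the OP confirms):

```python
import numpy as np
from scipy.stats import chisquare

props = np.array([0.273, 0.236, 0.182, 0.309])
n = 55  # the known total number of respondents

counts = props * n               # back to (approximate) counts
stat, pval = chisquare(counts)   # equal expected proportions by default
print(stat, pval)  # ≈ the result from the raw counts (15, 13, 10, 17)
```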

rnso
  • 10,009
  • Just to confirm, it will also work with expected proportions that are unequal to each other? – hpy Apr 17 '11 at 00:31
  • 4
    the test won't be meaningful unless you know the full sample size. Proportions of 1.0 / 0.0 / 0.0 / 0.0 mean very different things if they are from a sample of size 1 as opposed a sample of size 100. – Aaron Apr 17 '11 at 00:52
  • Yes, I DO know the total sample size. – hpy Apr 17 '11 at 14:08