The issue here is not (only) about the Fisher exact test versus the chi-squared test.
The issue is that the Fisher exact test should primarily be used for tables with *both* margins fixed.
You do not have a situation with both margins fixed. See this question for more about it: Most appropriate statistical test for count data (2x2 contingency)
What you might use instead is Barnard's test, which involves a nuisance parameter that is varied to maximize the p-value. For a 2x3 table, I am not sure whether implementations exist in statistical packages/computer languages, but you can compute it manually (as below).
I would actually not analyze the 'other' category at all and instead just use a 2x2 table. The single observation you have in the 'other' category adds little information, and including it makes the analysis more complex without adding much power.
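Collapsing to a 2x2 table is then straightforward, a sketch (note that the `Barnard` package mentioned in the comment is an assumption, not part of base R):

```r
### example data: the first row is the sparse 'other' category
M = matrix(c(0,1,
             5,4,
             7,3), byrow = TRUE, ncol = 2)
M2 = M[-1, ]      ### drop the 'other' row, leaving a 2x2 table
chisq.test(M2)    ### chi-squared test on the remaining 2x2 table
### Barnard's test for 2x2 tables is implemented e.g. in the 'Barnard' package
### (assumed installed): Barnard::barnard.test(M2[1,1], M2[1,2], M2[2,1], M2[2,2])
```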
Below is a little code that computes Barnard's test for 2x3 tables. When I apply a correction for the fitting that occurs by varying the nuisance parameter, the result is more or less similar to the chi-squared test; at low p-values it may be slightly more accurate.

Warning note: I have intuitively added a correction for the fitting of the nuisance parameter, but I have no idea whether this is standard practice, whether it has been studied, or what the established approach is. The reason for the correction is that without it Barnard's test is very conservative, because it uses the worst-case value of the nuisance parameter. The idea is to treat the resulting p-value as coming from a chi-squared distributed variable whose degrees of freedom were one too high.
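As a sketch of that idea (my own heuristic, not an established procedure), the correction maps the conservative p-value back through a chi-squared distribution with one degree of freedom fewer:

```r
### heuristic correction: treat the conservative p-value as coming from a
### chi-squared variable whose degrees of freedom were one too high
correct_p = function(p, df) {
  q = qchisq(1-p, df)       ### quantile implied by the conservative p-value
  1-pchisq(q, df-1)         ### re-evaluate with one degree of freedom fewer
}
```

The simulation at the end applies a transform of this shape directly inline.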
### Example data
M = matrix(c(0,1,
             5,4,
             7,3), byrow = TRUE, ncol = 2)
### Test for 2x3 tables with fixed rows
Barnard = function(M) {
  observed_lambda = sum(M[,1])/sum(M)
  lambda_test = seq(observed_lambda*0.8, min(observed_lambda*1.2, 1), 0.001) ### this is the nuisance parameter
  p_vt = c() ### this will contain the p-values as a function of the nuisance parameter
  ### compute p-values for different values of the nuisance parameter
  for (p_i in lambda_test) {
    n = rowSums(M)
    p_value = 0
    p_obs = prod(dbinom(M[,1], n, rep(p_i,3))) ### this is the probability of the observation
    ### compute in a nested loop all potential alternative observations and sum the probabilities of more extreme cases
    for (k1 in 0:n[1]) {
      for (k2 in 0:n[2]) {
        for (k3 in 0:n[3]) {
          pt = prod(dbinom(c(k1,k2,k3), n, rep(p_i,3)))
          if (pt <= p_obs) {p_value = p_value + pt}
        }
      }
    }
    p_vt = c(p_vt, p_value)
  }
  #plot(lambda_test, p_vt, type = "l")
  return(max(p_vt)) ### choose the maximum of the p-values
}
chisq.test(M)
Barnard(M)
### Simulations to compare Barnard's test with the chi-squared test
set.seed(1)
m = 10000
p_chi = rep(0,m)
p_Barnard = rep(0,m)
### repeat many times
for (i in 1:m) {
  n = c(1,9,10)
  x = rbinom(3, n, 0.5)
  M = cbind(x, n-x)
  p_chi[i] = chisq.test(M)$p.value
  p_Barnard[i] = 1-pchisq(qchisq(1-Barnard(M), sum(n)), sum(n-1)) ### a correction because we fit a parameter, which reduces the residuals and increases the p-value
}
plot(c(1:m)/m, p_chi[order(p_chi)], xlab = "quantile", ylab = "empirical quantile", pch = 20, cex = 0.7,
ylim = c(0,0.1), xlim = c(0,0.1))
points(c(1:m)/m, p_Barnard[order(p_Barnard)], col = 2, pch = 20, cex = 0.7)
lines(c(0,1),c(0,1))
legend(0,0.1, c("chi-squared", "Barnard's"), col = c(1,2), pch = 20, cex = 0.7)