Suppose we have two groups of patients, a control group and a treatment group. Each patient was asked a question whose answer is yes or no. Since I come from the biological sciences, where I mainly used regression methods, I would use logistic regression. A friend of mine from the social sciences would analyze these data with a chi-square test or Fisher's exact test. Now I'm wondering whether there are reasons to prefer one method over the other.

To clarify my question, I created a dataset and analyzed it with several methods. The R code is shown below. Here are the results (odds ratios and p-values) of the analyses:

                                odds ratio      p-value     comment
    manually (formula)               0.474        0.324     see details below
    chi-square                       -            0.524     warning: approximation may be incorrect
    chi-square (simulated)           -            0.460
    Fisher's exact test              0.477        0.465
    logistic regression              0.474        0.324     exactly what I get manually
    Bayes logistic regression        0.526        0.356     shrinkage effect of the prior

As you can see, the results are broadly similar. But suppose I want to publish results obtained with logistic regression and a reviewer asks why I didn't use the "classical" chi-square test or Fisher's exact test, or vice versa?

Here is how I created the data set and ran the analyses:

library(arm)
set.seed(12345)

size of groups

n <- 50 # control
m <- 60 # treatment

Create data

df0 <- data.frame(group = c(rep("ctrl",n), rep("treat",m)) , out = c(rbinom(n=n, 1, 0.1), rbinom(n=m,1,0.03)) )

Tabulate

with(df0,table(group, out, useNA='ifany'))

           out
    group    0  1
      ctrl  45  5
      treat 57  3

Convert to matrix for using Fisher's test or Chi-square test

mx0 <- with(df0,table(group, out, useNA='ifany'))

Are there expected values lower 5?

round(chisq.test(mx0)$exp,1)

           out
    group      0   1
      ctrl  46.4 3.6
      treat 55.6 4.4

Chi-square test

chisq.test(mx0)$p.value # 0.5242444
chisq.test(mx0, simulate.p.value=TRUE)$p.value # 0.4602699

Fisher's exact test

fisher.test(mx0)$p.value # 0.4646205
fisher.test(mx0)$estimate # 0.4769107
fisher.test(mx0)$conf.int[1:2] # 0.0703001 2.6013230

manually (OR and CI):

(or <- prod(diag(mx0)) / prod(diag(apply(mx0,2,rev))) ) # 0.4736842

standard error of the log odds ratio: square root of (1/a + 1/b + 1/c + 1/d):

se.ln <- sqrt(sum(1/mx0))
(cil <- exp(log(or) - qnorm(0.975)*se.ln)) # 0.1074239
(ciu <- exp(log(or) + qnorm(0.975)*se.ln)) # 2.088704
(t <- abs(log(or)/se.ln))
2*pnorm(t, lower.tail=FALSE) # 0.323628

Logistic regression

(fit1 <- summary(glm(out ~ group, data=df0, family="binomial"))$coef)

                  Estimate Std. Error    z value     Pr(>|z|)
    (Intercept) -2.1972246  0.4714045 -4.6610172 3.146504e-06
    grouptreat  -0.7472144  0.7570094 -0.9870609 3.236128e-01

exp(fit1[2,"Estimate"]) # 0.4736842

(fit2 <- summary(bayesglm(out ~ group, data=df0, family="binomial"))$coef)

                  Estimate Std. Error    z value     Pr(>|z|)
    (Intercept) -2.2326782  0.4645022 -4.8066039 1.535157e-06
    grouptreat  -0.6415988  0.6947115 -0.9235471 3.557222e-01

exp(fit2[2,"Estimate"]) # 0.5264501

Amin.A
  • 81
giordano
  • 1,009
  • 1
    What are the specific objectives of your study? Also, could you state the hypotheses? –  Nov 09 '18 at 05:07

1 Answer

This is an interesting question. While there are subtle differences between the approaches, they will often give similar answers. Note, however, that the standard logistic regression analysis relies on asymptotic (large-sample) approximations, so with few observations the other methods may be preferable, or you could bootstrap the logistic regression.
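
One way to act on the bootstrap suggestion is sketched below. It is only an illustration under stated assumptions: the 2x2 counts (45/5 in the control group, 57/3 in the treatment group) are chosen to reproduce the odds ratio of 0.474 reported in the question; the helper names `fit_log_or` and `boot_log_or`, the seed, and the number of resamples are arbitrary; and rows are resampled within each group so that the group sizes stay fixed. With so few events, some resamples contain no events in one group. In that case `glm` returns a very large coefficient together with a warning (suppressed here), so the percentile interval can be extremely wide.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible

# Assumed counts, chosen to match the odds ratio 0.474 in the question
df <- data.frame(
  group = rep(c("ctrl", "treat"), times = c(50, 60)),
  out   = c(rep(c(0, 1), times = c(45, 5)),
            rep(c(0, 1), times = c(57, 3)))
)

# Log odds ratio from the ordinary (asymptotic) logistic regression
fit_log_or <- function(data) {
  fit <- suppressWarnings(glm(out ~ group, data = data, family = binomial))
  unname(coef(fit)["grouptreat"])
}

# One bootstrap replicate: resample rows within each group, then refit
boot_log_or <- function(data) {
  rows <- split(seq_len(nrow(data)), data$group)
  idx  <- unlist(lapply(rows, function(i) sample(i, replace = TRUE)))
  fit_log_or(data[idx, ])
}

obs_log_or <- fit_log_or(df)
boot       <- replicate(1000, boot_log_or(df))

exp(obs_log_or)                       # observed odds ratio
exp(quantile(boot, c(0.025, 0.975)))  # percentile bootstrap interval for the OR
```

If the percentile interval turns out to be far wider than the Wald interval, that is itself a sign that the asymptotic approximation is strained for data this sparse.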

  • 3
    They are all valid, although Fisher's exact test is usually overly conservative. You should choose one a priori and stick with it. I wouldn't switch based on a reviewer's comments. – David Lane Jun 06 '17 at 02:42
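
The conservativeness mentioned in the comment above can be examined with a small simulation. The sketch below is an assumption-laden illustration, not part of the original thread: it uses the group sizes from the question, an assumed common probability `p0 = 0.07` of a "yes" answer in both groups (so the null hypothesis is true), and an arbitrary seed and number of simulated trials. It estimates how often Fisher's exact test rejects at the 5% level.

```r
set.seed(2)    # arbitrary seed for this sketch
n     <- 50    # control group size, as in the question
m     <- 60    # treatment group size, as in the question
p0    <- 0.07  # assumed common probability of "yes" under the null
n_sim <- 2000  # number of simulated trials (arbitrary)

# Simulate trials with no true group difference and record Fisher's p-value
p_fisher <- replicate(n_sim, {
  x1  <- rbinom(1, n, p0)  # "yes" answers in the control group
  x2  <- rbinom(1, m, p0)  # "yes" answers in the treatment group
  tab <- matrix(c(n - x1, m - x2, x1, x2), nrow = 2)
  fisher.test(tab, conf.int = FALSE)$p.value
})

mean(p_fisher < 0.05)  # empirical rejection rate; the nominal level is 0.05
```

A rejection rate clearly below 0.05 would be consistent with the comment. The exact figure depends on the assumed `p0` and on the group sizes.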