
I have a statistics problem. I'm looking at whether variation in the concentration of a certain substance affects how strongly insects are drawn to it. I tested this by baiting insect traps with the substance at different concentrations. The traps were set out in pairs, with the two traps in each pair baited at different concentrations. The data I have look a bit like this:

More : Less
   6 : 0  
   1 : 0   
   3 : 1  
   1 : 0  
  15 : 3  

What I need to do is to show that insects prefer higher concentrations of bait.

Pooling the data and using $\chi^{2}$ is an option, but I don't know if it's the best one. Trapping was done at various locations over one month, so there may be confounding factors involved.

Thanks in advance!

EDIT: That is only a small but representative portion of my full data. In total I have around 40 pairs of values to compare and around 300 insects caught.

2 Answers


If the traps were all independent, this wouldn't be a variation on a $\chi^2$ test; it would just be one. But because they're used in pairs, and therefore not independent, what you're looking for is a variation on McNemar's test. Unfortunately, any modification of that test for more than a 2×2 table will still suffer here because the counts in the Less column are so small.
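For reference, the pooled $\chi^2$ the question mentions is a one-liner in R. Here's a minimal sketch using the column totals of the sample table above (26 vs. 4); keep in mind that pooling discards exactly the pairing just discussed.

pooled <- c(More = 26, Less = 4)  # column totals from the question's table
chisq.test(pooled)                # goodness of fit against a 50:50 split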

Your effect is so strong I'd be tempted to just report the data. What's wrong with that? When the data are very strong and there's little to no noise in the effect then it's hard to see why you wouldn't just report what you found and state it as fact. Statistics aren't as powerful as you think and don't really make this compelling data story any better. I fear they'd just be used to hide the small sample problem.

If you really, really want a probability then resampling is probably your best bet here. You could perform a permutation or randomization test. Calculate the mean difference between your More and Less conditions. Then scramble up the samples randomly, maintaining your pairing, and calculate new mean differences. Do that on a computer thousands of times and figure out where the difference you found falls in the distribution of differences you sampled. The probability of an effect that large or larger will be your p-value to report.

Here's some base R that does it.

dat <- matrix(c(6, 1, 3, 1, 15, 0, 0, 1, 0, 3), ncol = 2)
n <- nrow(dat)
eff <- diff(colSums(dat))  # observed Less - More difference
samps <- rowSums(dat)      # total insects caught at each location
nsamp <- 5000

# simulate nsamp replications of your experiment under the null
y <- replicate(nsamp, {
    # under the null each insect lands in either trap of its pair with
    # equal probability, so draw each more-trap count as a binomial
    more <- rbinom(n, samps, 0.5)
    # of course, the rest is in the less trap
    less <- samps - more
    # calculate the effect (less - more might be backwards of what you
    # want, but it's what diff() did above for the original effect, so
    # we keep calculating in the same direction)
    sum(less) - sum(more)
})
# two-sided p-value: simulated effects as large as, or larger than, the observed one
sum(abs(y) >= abs(eff)) / nsamp

That p-value is the probability of the data coming out with an effect as large as, or larger than, the one you got, given the null hypothesis and an assumption of a representative sample (always implicit). Think about it as considering what would happen if the null were true. The traps would just randomly catch insects. Imagine you caught as many insects as you did, in as many locations, and then see how that distribution falls at random across the traps. If the effect you found would be unlikely when the null is true, we conclude that the null is not true.

Alternatively, one could resample the distribution of per-pair effects with replacement. By doing this one can bootstrap a confidence interval for the effect.

# get each pair's separate effect (Less - More, as before)
effs <- dat[, 2] - dat[, 1]
nsamp <- 1000
# bootstrap nsamp replications of your experiment
y <- replicate(nsamp, {
    # resample pairs from the observed distribution of effects
    effSamp <- sample(effs, replace = TRUE)
    # get the total effect for this bootstrap sample
    sum(effSamp)
})
# 95% percentile confidence interval from the bootstrap distribution
quantile(y, c(0.025, 0.975))
John
  • I have several issues with your answer, @John. First of all, you seem to confuse the bootstrap and permutation testing: the method where you scramble the labels and assign them randomly is permutation, not the bootstrap. TBC – StasK Jul 15 '14 at 12:43
  • Second, in your first block of code, you seem to be treating the total number of insects, samps, as fixed. I don't think this is what you need or should do here. If you bootstrap, I would argue you should bootstrap the whole site, and this is a small enough problem that you can fully enumerate all $5^5=3125$ possible bootstrap samples. – StasK Jul 15 '14 at 12:44
  • Ah yes, the former would be a randomization test... in this case maybe could have been a complete permutation test as you suggest. I didn't write it that way because I'm hoping the questioner adds a sample or two (which will quickly put it out of reach). If you argue that the whole site should be bootstrapped together then are you arguing that they're independent samples? – John Jul 15 '14 at 13:03
  • They are coupled samples, and that's the design of the experiment that needs to be retained in the bootstrap scheme. – StasK Jul 15 '14 at 13:06
  • Perhaps I'm missing what you're saying. I sample from the total in each pair and randomly assign insects to trap within the pair. What are you suggesting? – John Jul 15 '14 at 13:14

One uber-robust and uber-conservative approach would be the non-parametric Wilcoxon signed-rank test or the matched-pairs sign test. In the first test, you assume identical distributions of the two groups; in the second, you explicitly incorporate the coupling of the two distributions.

. signrank trapped0 = trapped1

Wilcoxon signed-rank test

        sign |      obs   sum ranks    expected
-------------+---------------------------------
    positive |        0           0         7.5
    negative |        5          15         7.5
        zero |        0           0           0
-------------+---------------------------------
         all |        5          15          15

unadjusted variance       13.75
adjustment for ties       -0.13
adjustment for zeros       0.00
                     ----------
adjusted variance         13.63

Ho: trapped0 = trapped1
             z =  -2.032
    Prob > |z| =   0.0422

. signtest trapped0 = trapped1

Sign test

        sign |    observed    expected
-------------+------------------------
    positive |           0         2.5
    negative |           5         2.5
        zero |           0           0
-------------+------------------------
         all |           5           5

One-sided tests:
  Ho: median of trapped0 - trapped1 = 0 vs.
  Ha: median of trapped0 - trapped1 > 0
      Pr(#positive >= 0) =
         Binomial(n = 5, x >= 0, p = 0.5) =  1.0000

  Ho: median of trapped0 - trapped1 = 0 vs.
  Ha: median of trapped0 - trapped1 < 0
      Pr(#negative >= 5) =
         Binomial(n = 5, x >= 5, p = 0.5) =  0.0313

Two-sided test:
  Ho: median of trapped0 - trapped1 = 0 vs.
  Ha: median of trapped0 - trapped1 != 0
      Pr(#positive >= 5 or #negative >= 5) =
         min(1, 2*Binomial(n = 5, x >= 5, p = 0.5)) =  0.0625

Basically, both tests say that all five differences falling on the same side (higher counts for the "More" condition) would happen with probability $1/2^5 = 1/32$ if the null of no difference were true. However, these tests may not have much power with a sample size of 5, and you can get stronger results by making stronger assumptions.
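If you'd rather run these in R, both tests are available in base R. A minimal sketch on the five pairs above (note that wilcox.test handles zeros and ties a little differently than Stata, so its signed-rank p-value may not match exactly):

more <- c(6, 1, 3, 1, 15)
less <- c(0, 0, 1, 0, 3)
# Wilcoxon signed-rank test on the paired counts
wilcox.test(more, less, paired = TRUE)
# sign test: all five differences are positive, so test 5 successes
# out of 5 trials against p = 0.5 with an exact binomial test
binom.test(sum(more > less), length(more), p = 0.5)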

Since you are dealing with counts, an appropriate tool for your problem is a generalized linear model, namely a Poisson model for counts. Denoting the sites as block and the conditions as treat, I get

Poisson regression                                Number of obs   =         10
                                                  LR chi2(5)      =      47.17
                                                  Prob > chi2     =     0.0000
Log likelihood =  -11.51985                       Pseudo R2       =     0.6718

------------------------------------------------------------------------------
     trapped |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       block |
          2  |   .1666667   .1800206    -1.66   0.097     .0200653    1.384368
          3  |   .6666667   .4303315    -0.63   0.530     .1881311    2.362419
          4  |   .1666667   .1800206    -1.66   0.097     .0200653    1.384368
          5  |          3   1.414214     2.33   0.020     1.190861    7.557558
             |
     1.treat |        6.5    3.49106     3.49   0.000     2.268531    18.62438
       _cons |         .8   .4953114    -0.36   0.719     .2377266    2.692168
------------------------------------------------------------------------------

That is, the "More" condition attracts, on average, 6.5 times as many insects as the "Less" condition, with a 95% confidence interval of [2.27, 18.62] for the ratio. The p-value here, about 0.0005, is much stronger than those of the non-parametric tests.

Stata code:

clear
input trapped block treat
6 1 1
0 1 0
1 2 1
0 2 0
3 3 1
1 3 0
1 4 1
0 4 0
15 5 1
3 5 0
end
poisson trapped i.block i.treat, irr
testparm i.treat
reshape wide trapped, i(block) j(treat)
signrank trapped0 = trapped1
signtest trapped0 = trapped1
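For completeness, here is a sketch of the same Poisson model in base R; exponentiating the coefficients and their Wald intervals reproduces the IRR column above.

trapped <- c(6, 0, 1, 0, 3, 1, 1, 0, 15, 3)
block   <- factor(rep(1:5, each = 2))
treat   <- factor(rep(c(1, 0), times = 5))
fit <- glm(trapped ~ block + treat, family = poisson)
summary(fit)
exp(cbind(IRR = coef(fit), confint.default(fit)))  # Wald CIs on the ratio scale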
StasK
  • I had considered the sign test, but saw the N and immediately thought it wasn't big enough, forgetting it is big enough for the paired signed-rank test. I think it's a good idea to report the consistency of the result across trap locations regardless of the statistic reported, and I probably should have put that in my answer. As an aside, it's probably unwise to call the p-value "stronger" in the Poisson case; you don't seem to be framing p-values as a measure of evidence. On the other hand, Poisson is probably the best way to model this problem parametrically. – John Jul 17 '14 at 16:24
  • Hi, I've been trying to get this to work, but just can't figure out how to get the code to work in R. Could someone give me another hand with that please? – user2487380 Jul 26 '14 at 22:23