
In R, the function chisq.test has an option simulate.p.value. The help page gives little explanation, but it refers to Hope (1968).

Could you please give me a "for dummies" explanation of how p-values are estimated when simulate.p.value is set to TRUE?

I have a basic understanding of Monte Carlo procedures, so there is no need for a long introduction to that subject.

Remi.b
  • Have you read and thought about the linked paper? You will likely get better answers if you can point to a specific point in the paper or its connection to chisq.test() you have problems with. As it stands, your question reads like "I don't want to read the paper, someone please read it and explain it to me", which is not a good fit to CV. – Stephan Kolassa May 24 '17 at 06:29
  • @Stephan Actually the paper by Hope is not really enlightening on the specifics of simulating chi-square p-values -- it's a fairly general paper about Monte-Carlo hypothesis testing. – Glen_b May 24 '17 at 10:54

1 Answer


The basic idea is to fix the margins and simulate from the set of tables with those margins.

Consider the 2x2 table:

  2   1
  0   4

The margins are:

 x   x   3
 x   x   4
 2   5

The possible tables with those margins are:

 0  3     1  2      2  1
 2  2     1  3      0  4

and their probabilities under the hypothesis of no association can be computed: with both margins fixed, a single cell (the top left, say) determines the rest of the table, and that cell has a hypergeometric distribution.
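
As a small illustration of that hypergeometric calculation, here is a sketch for the margins above. The variable names are mine, chosen for illustration; `dhyper` is the base R hypergeometric density.

```r
# Margins of the example table: row totals 3 and 4, column totals 2 and 5
row_totals <- c(3, 4)
col_totals <- c(2, 5)

# Values the top-left cell can take while all other cells stay non-negative
x <- max(0, row_totals[1] - col_totals[2]):min(row_totals[1], col_totals[1])

# Hypergeometric probabilities: row_totals[1] draws from a population holding
# col_totals[1] items of one kind and col_totals[2] of the other
probs <- dhyper(x, m = col_totals[1], n = col_totals[2], k = row_totals[1])

data.frame(top_left = x, probability = probs)
```

For these margins the three tables have probabilities 2/7, 4/7 and 1/7, in the order they are listed above.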

Consequently we can simulate from that distribution over $2\times 2$ tables under the null, compute the distribution of any statistic we wish, and so obtain p-values in the usual fashion for a sampled null distribution*. The case for $r\times c$ tables is an extension of this, but hopefully this is enough to get the idea. [There's some discussion of ways that $r \times c$ tables might be simulated in "How to simulate effectiveness of treatment in R?", and gung gives a discussion of the situation where you don't have both margins fixed.]

Note that in R this simulation is done using the algorithm of Patefield (1981) [1], as explained in the help. (The function r2dtable will simulate such tables for you, if you want to check the performance of the chi-square against some other statistic under fixed margins.)
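
To make the procedure concrete, here is a rough sketch that reproduces the simulated p-value by hand for the example table. The helper `chisq_stat`, the seed and `B = 2000` are my own choices for illustration. The `(1 + count) / (B + 1)` form and the small tolerance in the comparison reflect my reading of the `chisq.test` source, so treat them as an assumption rather than a specification.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# Observed table (filled column by column): rows are (2, 1) and (0, 4)
obs <- matrix(c(2, 0, 1, 4), nrow = 2)

# Pearson chi-square statistic without continuity correction
chisq_stat <- function(tab) {
  expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
  sum((tab - expected)^2 / expected)
}

B <- 2000  # number of simulated tables

# Random tables with the observed margins (Patefield's algorithm)
sims <- r2dtable(B, rowSums(obs), colSums(obs))
sim_stats <- vapply(sims, chisq_stat, numeric(1))
obs_stat <- chisq_stat(obs)

# Share of simulated statistics at least as large as the observed one;
# the tolerance guards against floating-point ties
p_manual <- (1 + sum(sim_stats >= obs_stat * (1 - 1e-9))) / (B + 1)

# Built-in version for comparison
p_builtin <- chisq.test(obs, simulate.p.value = TRUE, B = B)$p.value

c(manual = p_manual, builtin = p_builtin)
```

The two values come from different random draws, so they will not match exactly, but both should land close to the exact conditional value for this table, which is 1/7.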

* It's also possible to generate all tables (if your tables aren't too big) and get an exact permutation test. Clever algorithms exist for looking only at the tables in the tail of some statistic, and doing so quite efficiently, which makes exact tests feasible for surprisingly large tables (surprising to me at least, considering the scale of the combinatorial explosion).
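
For the small example, the full enumeration mentioned in this footnote can be sketched as below. This is the brute-force version only, not one of the efficient tail algorithms, and the helper names are mine.

```r
# Observed table (filled column by column): rows are (2, 1) and (0, 4)
obs <- matrix(c(2, 0, 1, 4), nrow = 2)
r <- rowSums(obs)
cc <- colSums(obs)

# Pearson chi-square statistic without continuity correction
chisq_stat <- function(tab) {
  expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
  sum((tab - expected)^2 / expected)
}

# Every 2x2 table with these margins, indexed by its top-left cell a
a_vals <- max(0, r[1] - cc[2]):min(r[1], cc[1])
tables <- lapply(a_vals, function(a) {
  matrix(c(a, cc[1] - a, r[1] - a, r[2] - cc[1] + a), nrow = 2)
})

# Null probability and statistic of each table
probs <- dhyper(a_vals, m = cc[1], n = cc[2], k = r[1])
stats <- vapply(tables, chisq_stat, numeric(1))

# Exact p-value: total probability of tables at least as extreme as observed
p_exact <- sum(probs[stats >= chisq_stat(obs) * (1 - 1e-9)])
p_exact
```

Here only the observed table itself is that extreme, so the exact p-value equals its probability, 1/7.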

[1] Patefield, W. M. (1981)
"Algorithm AS159. An efficient method of generating r x c tables with given row and column totals"
Applied Statistics, 30, 91-97.

Glen_b
  • Clear and easy to read answer +1! Thank you. If I may ask a follow-up question: in a paper, how would you refer to such a test? A permutation chi-square test? – Remi.b May 24 '17 at 16:40
  • Oh, probably that way, yes. If I was not so sure the meaning would be clear to a potential audience, I might say something like "permutation test using the chi-square statistic with fixed margins" and then use the shorter phrase thereafter. I'd include a reference to Patefield if I did it using the R functions. – Glen_b May 24 '17 at 22:16