
I'm brand new to statistics and am studying the math behind split testing (A/B and multivariate). I've learned how to calculate $\chi^2$ with given test data, and I understand how to translate this into a probability via a table, but I'd like to be able to calculate the probability myself. I've read through a couple of explanations online, but I'm not getting it.

Does anyone know of a resource or book that breaks this down?

Macro
Lenwood

1 Answer


The $p$-value is the area under the $\chi^2$ density to the right of the observed test statistic. Therefore, to calculate the $p$-value by hand you need to calculate an integral.

In particular, a $\chi^2$ random variable with $k$ degrees of freedom has probability density

$$f(x;\,k) = \begin{cases} \frac{x^{(k/2)-1} e^{-x/2}}{2^{k/2} \Gamma\left(\frac{k}{2}\right)}, & x \geq 0; \\ 0, & \text{otherwise}. \end{cases}$$

Suppose you observe a test statistic $\lambda$. Then, the $p$-value corresponding to $\lambda$ is

$$ p = \int_{\lambda}^{\infty} \frac{x^{(k/2)-1} e^{-x/2}}{2^{k/2} \Gamma\left(\frac{k}{2}\right)} dx $$

After trying to evaluate this integral by hand, it may become clear to you why people use tables (and computers) for calculating such things.
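If a computer is allowed, the tail integral above can be approximated directly. The following is only a minimal sketch, not a production routine: the function names `chi2_pdf` and `chi2_pvalue_numeric`, the use of composite Simpson's rule, the number of subintervals, and the cutoff that stands in for $\infty$ are all assumptions made for illustration.

```python
import math


def chi2_pdf(x, k):
    """Chi-square density with k degrees of freedom, valid for x > 0."""
    # Work on the log scale so large k does not overflow the gamma function.
    log_f = ((k / 2 - 1) * math.log(x) - x / 2
             - (k / 2) * math.log(2) - math.lgamma(k / 2))
    return math.exp(log_f)


def chi2_pvalue_numeric(lam, k, n=20000):
    """Approximate the upper-tail area beyond lam with Simpson's rule.

    n must be even. The upper limit is an assumed cutoff, chosen far
    enough into the tail that the neglected area is negligible.
    """
    if lam <= 0:
        return 1.0  # the whole density lies to the right of 0
    upper = max(lam, k) + 20 * math.sqrt(2 * k) + 60
    h = (upper - lam) / n
    total = chi2_pdf(lam, k) + chi2_pdf(upper, k)
    for i in range(1, n):
        weight = 4 if i % 2 else 2  # Simpson weights 1, 4, 2, 4, ..., 4, 1
        total += weight * chi2_pdf(lam + i * h, k)
    return total * h / 3


# The tabulated 5% critical value for 1 degree of freedom is 3.841,
# so this should print a value close to 0.05.
print(chi2_pvalue_numeric(3.841, 1))
```

A convenient sanity check is $k = 2$, where the density is $\tfrac{1}{2}e^{-x/2}$ and the integral has the closed form $p = e^{-\lambda/2}$.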

Edit: (This was in the comments but seemed important enough to add here) Note that you can write the $p$-value using special functions:

$$ p = 1 - \frac{\gamma(k/2,\, \lambda/2)}{\Gamma(k/2)} $$

where $\gamma(\cdot,\cdot)$ is the lower incomplete gamma function.
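Here $\gamma(a, x) = \int_0^x t^{a-1} e^{-t}\, dt$, which follows from the substitution $t = x/2$ in the integral for $p$. One possible way to evaluate the ratio $\gamma(a, x)/\Gamma(a)$ without a quadrature routine is the standard power series $\gamma(a, x) = e^{-x} x^{a} \sum_{n \ge 0} \frac{x^n}{a(a+1)\cdots(a+n)}$. The sketch below assumes that approach; the function names, tolerance, and term limit are illustrative choices, and the series converges slowly when $x$ is much larger than $a$.

```python
import math


def regularized_lower_gamma(a, x, tol=1e-14, max_terms=10000):
    """Series approximation of gamma(a, x) / Gamma(a) for a > 0."""
    if x <= 0:
        return 0.0
    term = 1.0 / a          # n = 0 term of the series
    total = term
    denom = a
    for _ in range(max_terms):
        denom += 1.0
        term *= x / denom   # next term: multiply by x / (a + n)
        total += term
        if abs(term) < abs(total) * tol:
            break
    # Multiply by x^a * e^(-x) / Gamma(a), computed on the log scale.
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))


def chi2_pvalue(lam, k):
    """p-value for an observed chi-square statistic lam with k degrees of freedom."""
    return 1.0 - regularized_lower_gamma(k / 2, lam / 2)


# For k = 2 the exact answer is exp(-lam / 2), which gives a quick check.
print(chi2_pvalue(5.0, 2), math.exp(-2.5))
```

Because the result is formed as one minus a number close to one, very small $p$-values lose relative precision with this approach; a routine for the upper incomplete gamma function would be the usual remedy.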

Macro
  • Thanks for your reply, Macro; your response is helpful. I think the place I'm stuck is the gamma function. I can push through the integration (though my calculus is rusty), but I don't understand the gamma function. I'll do some research on that. My goal is to build a web page that does these calculations. I know that a handful of these exist already; my main goal in doing this is to fully understand the math behind the results. – Lenwood Mar 23 '12 at 15:01
  • Well, if you're hoping to find an exact solution that doesn't involve special functions (e.g. the gamma function), I think you're going to remain stuck. The actual solution is $p = 1 - \frac{ \gamma(k/2, \lambda/2) }{\Gamma(k/2)}$, where $\gamma(\cdot,\cdot)$ denotes the lower incomplete gamma function. Both $\Gamma$ and $\gamma$ are defined in terms of integrals, so any calculator you program for your website will need to involve a numerical integration routine. – Macro Mar 23 '12 at 15:16
  • This has been a really fun exercise, and I've learned a ton in the process. I've built my web page, which calculates the probability through linear interpolation. Thanks for your help, Macro. – Lenwood Mar 31 '12 at 14:16
  • No problem, Lenwood. What kind of error do you incur? Do you have a link to the website? – Macro Mar 31 '12 at 22:26
  • I've been running the page on my own system, but I'll put it up for a few days. You can see it here, http://chisq.nfshost.com/. As I've spent more time with this, I realize that my values for $\chi^2$ are consistent with other calculators that I've seen, but not with R. I believe the difference is in the calculation of expected values. I'll post that as a separate question. – Lenwood Apr 05 '12 at 14:45