5

In the dark ages, we would map the results of a Student's t-test to a null hypothesis probability p by looking up T and degrees of freedom in a table to get an approximate result.

What is the mathematical algorithm that generates that table? I.e., how can I write a function to compute a precise p given an arbitrary T and df?

The reason I ask is that I'm writing a piece of embedded software that continually monitors hundreds of populations with hundreds of samples each, and raises an alert if successive snapshots of a given population come to differ significantly. Currently it uses a crude z-score comparison, but it would be nice to use a more valid test.

  • 1
    As explained in comments at http://stats.stackexchange.com/questions/57847/formula-to-calculate-a-t-distribution, there are trade-offs between computation time, storage, and programming complexity. What are your preferences concerning those? What is the anticipated range of degrees of freedom? How accurate do the calculations need to be? – whuber Oct 14 '13 at 21:21
  • 1
    @whuber I'm afraid I don't see any implementations of the actual formula in that question. There is a link to Wikipedia's article on the Student's t, but I can't find the formula for the density function there. There is also a link to an R builtin, but my software package is written de novo in embedded C. Basically I just need a pointer at somewhere that clearly describes the math and I can handle the programming from there. – Crashworks Oct 14 '13 at 21:26
  • I did not refer you to that thread for its formulas--you are correct, it unfortunately lacks any--but to point out that there are myriad ways to compute the t distribution. It's not really a math question, but one of scientific programming. For instance, in your situation an attractive solution might be to store a few tables and interpolate within them, because then you won't have to program any kind of numerical integration routines. If you don't disclose your engineering constraints and objectives, you will reduce the opportunity to learn about such options. – whuber Oct 14 '13 at 21:35
  • @whuber Well, that brings me back to the question title. If I were to build a table offline that I then cubic-interpolate at runtime, how can I build that table? – Crashworks Oct 14 '13 at 21:49
  • For small integer d.f. you can do integration by parts. For larger d.f. you might do it by numerical integration, or by identifying a suitable approximation for the cdf (or some equivalent), some of which are in published algorithms. – Glen_b Oct 14 '13 at 22:09
  • @Glen_b Where can I find one of those published algorithms? I lack the necessary statistics background to know the right words to punch into Google Scholar. – Crashworks Oct 14 '13 at 22:15
  • 2
    Some algorithms for the cdf of the t are based on the incomplete beta function (which is a commonly used function in various parts of mathematics or physics). Plain googling on algorithm cdf|"distribution function" student t turns up plenty of references within the pages linked (e.g. here), such as Abramowitz and Stegun's Handbook of Mathematical Functions (which gives some small-d.f.-exact and approximate calculations), and various other books and papers. – Glen_b Oct 14 '13 at 23:02
  • 3
    If you want the noncentral t (e.g. for power calculations) a standard reference is Lenth, R. V. 1989. "Algorithm AS 243: Cumulative distribution function of the noncentral t distribution". Applied Statistics, 38, 185-189. – Glen_b Oct 14 '13 at 23:02

1 Answer

3

While it's possible to do it recursively for fixed degrees of freedom (write the cdf for a given d.f. in terms of the cdf for lower degrees of freedom; the integrals for the two lowest integer d.f. can be done directly), I've never seen anyone implement it that way.

Some algorithms for the cdf of the $t$ are based on the incomplete beta function (which is a commonly used function in various parts of mathematics or physics).

There are some for the inverse cdf (quantile function) based on ratios of polynomials.

Plain googling on algorithm cdf|"distribution function" student t turns up plenty of references within the pages linked (e.g. here), such as Abramowitz and Stegun's Handbook of Mathematical Functions (which gives some small-d.f.-exact and approximate calculations), and various other books and papers.

If you want the noncentral t (e.g. for power calculations) a standard reference is Lenth, R. V. 1989. "Algorithm AS 243: Cumulative distribution function of the noncentral t distribution". Applied Statistics, 38, 185-189.

However, if you're doing many of these, hypothesis tests may not suit your purposes. Something more like a measure of effect size might be better.

Glen_b
  • 282,281