
I am looking for a formula that generates the critical values for a t-test, so that I don't have to rely on a static table of critical values.

I am programming in Java, not R, so the qt() function is not what I am looking for.

Any help is appreciated.

RBI
  • What kind of formula do you want? There are dozens, ranging from inverting the integral to all kinds of power series and other approximations. How accurate does it need to be and what ranges of arguments does it need to handle? – whuber Mar 15 '13 at 17:56
  • I am just writing a general-purpose stats library, so I guess I would want it as accurate as possible while still considering run-time complexity. The arguments would simply be the degrees of freedom and the alpha value. Could you point me toward one or two of these formulas? All my Google searches lead to Excel and general t-test tutorials. Thanks. – RBI Mar 15 '13 at 18:12
  • You might want to have a look at this open-source Java math library – user603 Mar 15 '13 at 20:39

1 Answer


The CDF of the Student t distribution is essentially an incomplete beta function, which is defined in terms of an integral; the critical values you want are obtained by inverting it. A good resource for basic formulas for the common functions in statistics is the Numerical Recipes site (nr.com). The older editions, which include code in Fortran and C, are online. For the CDF of the Student t, see section 6.4 of the C version (which should port easily to Java and other C-like languages), p. 228; the incomplete beta code is given on the immediately preceding pages.
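A minimal Java sketch of that route follows. It assumes the continued-fraction evaluation of the regularized incomplete beta function I_x(a, b) in the style of Numerical Recipes (modified Lentz method) and a Lanczos approximation for ln Gamma. For t >= 0 it uses P(T > t) = (1/2) I_x(df/2, 1/2) with x = df / (df + t^2). Instead of an analytic inverse, it finds the critical value by bracketing and bisection on that tail probability, which is simple but slower than a dedicated inverse. The class and method names are hypothetical, df may be fractional, and a very large df will eventually hit the iteration cap of the continued fraction.

```java
// Sketch (hypothetical names): Student t tail probabilities via the regularized incomplete
// beta function, with critical values found by bisection rather than an analytic inverse.
public final class StudentT {

    // Lanczos coefficients (g = 7, n = 9) for ln Gamma
    private static final double[] LANCZOS = {
        0.99999999999980993, 676.5203681218851, -1259.1392167224028,
        771.32342877765313, -176.61502916214059, 12.507343278686905,
        -0.13857109526572012, 9.9843695780195716e-6, 1.5056327351493116e-7};
    private static final double TINY = 1e-300;

    /** ln Gamma(x) for x > 0; uses the reflection formula below 0.5. */
    static double logGamma(double x) {
        if (x < 0.5) {
            return Math.log(Math.PI / Math.sin(Math.PI * x)) - logGamma(1.0 - x);
        }
        x -= 1.0;
        double sum = LANCZOS[0];
        for (int i = 1; i < LANCZOS.length; i++) sum += LANCZOS[i] / (x + i);
        double t = x + 7.5; // x + g + 0.5
        return 0.5 * Math.log(2.0 * Math.PI) + (x + 0.5) * Math.log(t) - t + Math.log(sum);
    }

    private static double guard(double v) { return Math.abs(v) < TINY ? TINY : v; }

    /** Continued fraction for I_x(a, b), modified Lentz method (converges fast for x < (a+1)/(a+b+2)). */
    private static double betaContinuedFraction(double x, double a, double b) {
        double qab = a + b, qap = a + 1.0, qam = a - 1.0;
        double c = 1.0, d = 1.0 / guard(1.0 - qab * x / qap), h = d;
        for (int m = 1; m <= 10_000; m++) {
            int m2 = 2 * m;
            double aa = m * (b - m) * x / ((qam + m2) * (a + m2)); // even-numbered coefficient
            d = 1.0 / guard(1.0 + aa * d);
            c = guard(1.0 + aa / c);
            h *= d * c;
            aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2)); // odd-numbered coefficient
            d = 1.0 / guard(1.0 + aa * d);
            c = guard(1.0 + aa / c);
            double delta = d * c;
            h *= delta;
            if (Math.abs(delta - 1.0) < 1e-14) return h;
        }
        throw new ArithmeticException("continued fraction did not converge (df too large?)");
    }

    /** Regularized incomplete beta I_x(a, b); y = 1 - x is passed separately to avoid cancellation. */
    static double regularizedBeta(double x, double y, double a, double b) {
        if (x <= 0.0) return 0.0;
        if (y <= 0.0) return 1.0;
        double front = Math.exp(logGamma(a + b) - logGamma(a) - logGamma(b)
                + a * Math.log(x) + b * Math.log(y));
        if (x < (a + 1.0) / (a + b + 2.0)) return front * betaContinuedFraction(x, a, b) / a;
        return 1.0 - front * betaContinuedFraction(y, b, a) / b; // symmetry I_x(a,b) = 1 - I_y(b,a)
    }

    /** P(T > t) for t >= 0: half of I_x(df/2, 1/2) with x = df / (df + t^2). */
    static double upperTail(double t, double df) {
        double t2 = t * t;
        return 0.5 * regularizedBeta(df / (df + t2), t2 / (df + t2), 0.5 * df, 0.5);
    }

    /** CDF P(T <= t); df may be fractional. */
    static double cdf(double t, double df) {
        double tail = upperTail(Math.abs(t), df);
        return t >= 0.0 ? 1.0 - tail : tail;
    }

    /** t* with P(T > t*) = q for 0 < q < 0.5, by bracketing then bisection to full double precision. */
    static double upperCriticalValue(double q, double df) {
        if (!(df > 0.0)) throw new IllegalArgumentException("df must be positive");
        if (!(q > 0.0 && q < 0.5)) throw new IllegalArgumentException("q must lie in (0, 0.5)");
        double lo = 0.0, hi = 1.0;
        while (upperTail(hi, df) > q) { // double hi until the root is bracketed
            lo = hi;
            hi *= 2.0;
            if (hi > 1e150) throw new ArithmeticException("critical value out of range");
        }
        while (true) {
            double mid = 0.5 * (lo + hi);
            if (mid <= lo || mid >= hi) return mid; // lo and hi are adjacent doubles
            if (upperTail(mid, df) > q) lo = mid; else hi = mid;
        }
    }

    /** Critical value at significance level alpha; a two-sided test puts alpha/2 in each tail. */
    static double criticalValue(double alpha, double df, boolean twoSided) {
        return upperCriticalValue(twoSided ? alpha / 2.0 : alpha, df);
    }

    public static void main(String[] args) {
        for (double df : new double[] {1, 2, 10, 30, 2.5, 1e6}) {
            System.out.printf("df = %s, two-sided alpha = 0.05: %.6f%n", df, criticalValue(0.05, df, true));
        }
    }
}
```

For df = 10 the two-sided 5% value should agree with the familiar table entry of about 2.228. Plain bisection costs roughly 50 or more CDF evaluations per critical value; a Newton step from a good starting guess could replace it if speed matters, but either version should be treated as untested until it has been checked against reference values.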

Some words of experience: porting (or writing) numerical code is fraught with subtle perils and can introduce the nastiest kind of bug: one that gives only slightly wrong answers and so goes undetected. It is essential to create an exhaustive test suite. Consider tabulating a large array of values to high precision using reliable software such as R or Mathematica (not a spreadsheet!) and setting up a direct comparison of those values with yours, before you even begin coding. Really stretch the code by evaluating it at the extremes: supply enormous values for the degrees of freedom, fractional degrees of freedom (you will need this capability for other computations anyway, such as the Satterthwaite t-test), extremely small positive values for the limit of integration, and so on. This matters especially for t distributions because they are often the foundation of other calculations, such as noncentral t distributions, and the algorithms in those calculations may pass surprising argument values during their numerical searches. Robust, defensive coding is essential.
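The tabulate-and-compare step can be automated with a small harness. The Java sketch below (class and method names are hypothetical) checks any candidate critical-value function, passed in as a DoubleBinaryOperator of (upper-tail probability q, df), against rows of {q, df, expected value} using a relative tolerance, and counts NaN as a failure. In real use the expected column would be tabulated with R or Mathematica and would include enormous and fractional df; so that the sketch runs on its own, it only uses df = 1, where the exact critical value is cot(pi q). It also reproduces the kind of bug described above: tan(pi (p - 1/2)) with p = 1 - q is the same quantity on paper, but near pi/2 a double cannot resolve the small remaining angle, so at q = 1e-13 it is wrong in about the fourth significant digit while agreeing to roughly 15 digits at q = 0.025.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.DoubleBinaryOperator;

// Sketch (hypothetical names): compare a candidate critical-value function with reference rows.
public final class CriticalValueCheck {

    /** Describes every row {q, df, expected} whose relative error exceeds relTol; NaN always fails. */
    static List<String> mismatches(DoubleBinaryOperator criticalValue, double[][] cases, double relTol) {
        List<String> failures = new ArrayList<>();
        for (double[] row : cases) {
            double q = row[0], df = row[1], expected = row[2];
            double actual = criticalValue.applyAsDouble(q, df);
            double relErr = Math.abs(actual - expected) / Math.abs(expected);
            if (!(relErr <= relTol)) { // negated test so that NaN counts as a failure
                failures.add(String.format("q=%g df=%g expected=%.17g actual=%.17g relErr=%.3g",
                        q, df, expected, actual, relErr));
            }
        }
        return failures;
    }

    /** Exact df = 1 (Cauchy) upper critical value, written as cot(pi*q) to stay accurate for tiny q. */
    static double cauchyExact(double q) {
        return 1.0 / Math.tan(Math.PI * q);
    }

    /** Same quantity as tan(pi*(p - 1/2)) with p = 1 - q: correct on paper, inaccurate in the far tail. */
    static double cauchyNaive(double q) {
        double p = 1.0 - q;
        return Math.tan(Math.PI * (p - 0.5));
    }

    public static void main(String[] args) {
        // Stand-in for a table generated with R or Mathematica: exact df = 1 values only.
        double[] tails = {0.25, 0.025, 1e-13};
        double[][] cases = new double[tails.length][];
        for (int i = 0; i < tails.length; i++) {
            cases[i] = new double[] {tails[i], 1.0, cauchyExact(tails[i])};
        }
        // Candidate under test: the naive formula for df = 1, NaN for anything else.
        DoubleBinaryOperator candidate = (q, df) -> df == 1.0 ? cauchyNaive(q) : Double.NaN;
        for (String failure : mismatches(candidate, cases, 1e-9)) {
            System.out.println("MISMATCH " + failure);
        }
    }
}
```

Running main should print a single MISMATCH line, for q = 1e-13. Swapping the lambda for your own implementation, and the rows for high-precision reference values, turns this into the direct comparison suggested above.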

You might consider using a well-tested library instead. :-)

whuber