When calculating the degrees of freedom to use in a t-test for a difference in means, allowing for unequal variances: This answer suggests that we should (when possible) correct the estimated standard deviations used in the Satterthwaite equation that computes the degrees of freedom $\nu$ for the t-test. A comment disagrees.
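For reference, with sample variances $s_X^2, s_Y^2$ and sample sizes $n_X, n_Y$, the Welch–Satterthwaite estimate is

$$\nu \approx \frac{\left(s_X^2/n_X + s_Y^2/n_Y\right)^2}{\dfrac{\left(s_X^2/n_X\right)^2}{n_X - 1} + \dfrac{\left(s_Y^2/n_Y\right)^2}{n_Y - 1}}.$$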
Of course this is only material when we are working with very small sample sizes, and only practical when we can find a correction term for estimates of standard deviation.
If we assume that our random variables are normally distributed, then we can unbias estimates of standard deviation by applying the correction factor $c_G(n) = \sqrt{\frac{n-1}{2}}\,\,\frac{\Gamma\left(\frac{n-1}{2}\right)}{\Gamma\left(\frac{n}{2}\right)}$.
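For example, at $n = 4$ the factor is $c_G(4) = \sqrt{\tfrac{3}{2}}\,\frac{\Gamma(3/2)}{\Gamma(2)} = \sqrt{\tfrac{3}{2}} \cdot \frac{\sqrt{\pi}/2}{1} \approx 1.085$, and it decays toward 1 as $n$ grows.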
If the variables are indeed normal, does incorporating that correction factor improve our calculations?
I attempted to answer this by running a batch of simulated experiments on the difference of means of independent normal variables $X$ and $Y$: computing the confidence interval (CI) on the difference using $\nu$ both with and without the correction factor, and counting the frequency with which the true difference lies inside the CI. (Python code follows.) But even with sample sizes as small as $n = 4$, over 100,000 iterations I can't find any simulation parameters for which the CI coverage deviates from the desired confidence by more than 1%, and the difference in coverage between the two formulas never exceeds 0.3%. So if there is a difference, I haven't succeeded in finding it with this approach.
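Concretely, each interval is $(\bar X - \bar Y) \pm t_{\nu,(1+\gamma)/2}\sqrt{s_X^2/n_X + s_Y^2/n_Y}$, which is what `t.interval` computes below; only $\nu$ changes between the two variants.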
# Simulation to check confidence interval on difference of estimated means
import math
import numpy as np
from scipy.stats import t
nsim = 100_000 # Number of simulations to run
# Random variable parameters for simulation:
nX, nY = (8, 4)
meanX, meanY = (1, 2)
sigmaX, sigmaY = (1, .3)
gamma = 0.9 # Confidence interval coverage
def cG(n):
    # Correction factor c_G(n), computed via log-gamma for numerical stability
    return math.exp(math.lgamma((n-1)/2) - math.lgamma(n/2) - math.log(math.sqrt(2/(n-1))))
cGx = cG(nX)
cGy = cG(nY)
count_contains_true = 0 # Number of simulations in which base CI contains true difference (meanX-meanY)
count_corrected_contains_true = 0 # Same, but with nu computed from bias-corrected standard deviations
for i in range(nsim):
    x = np.random.normal(size=nX, loc=meanX, scale=sigmaX)
    y = np.random.normal(size=nY, loc=meanY, scale=sigmaY)
    muX = np.mean(x)
    muY = np.mean(y)
    varX = np.var(x, ddof=1)
    varY = np.var(y, ddof=1)
    eX = varX / nX  # Squared standard error of each sample mean
    eY = varY / nY
    # Welch-Satterthwaite degrees of freedom from the raw variance estimates
    nu = (eX + eY)**2 / (eX**2 / (nX-1) + eY**2 / (nY-1))
    CI = t.interval(gamma, df=nu, loc=(muX - muY), scale=np.sqrt(eX + eY))
    if CI[0] < meanX - meanY < CI[1]:
        count_contains_true += 1
    # Compute nu using the correction factor; the CI scale is left unchanged
    eXc = cGx**2 * eX
    eYc = cGy**2 * eY
    nu = (eXc + eYc)**2 / (eXc**2 / (nX-1) + eYc**2 / (nY-1))
    CI = t.interval(gamma, df=nu, loc=(muX - muY), scale=np.sqrt(eX + eY))
    if CI[0] < meanX - meanY < CI[1]:
        count_corrected_contains_true += 1
print(f'Over {nsim:,} simulations the {gamma:.0%} confidence interval contained the true parameter:\n'
      f'\t\t{count_contains_true / nsim:.1%} of the time for the base CI\n'
      f'\t\t{count_corrected_contains_true / nsim:.1%} of the time for the corrected CI')
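One caveat on resolution: a coverage estimate from $m$ independent runs has binomial Monte Carlo standard error $\sqrt{p(1-p)/m}$, which is about 0.095% at $p = 0.9$ and $m = 100{,}000$, so each coverage figure above carries roughly ±0.1% of simulation noise. A minimal sketch of that calculation (the function name is mine, not part of the script above):

import math

def coverage_se(p, m):
    # Binomial standard error of an estimated coverage probability from m runs
    return math.sqrt(p * (1 - p) / m)

print(f'{coverage_se(0.9, 100_000):.4%}')  # prints 0.0949%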