When calculating the degrees of freedom to use in a t-test for a difference in means, allowing for unequal variances: This answer suggests that we should (when possible) correct the estimated standard deviations used in the Satterthwaite equation that computes the degrees of freedom $\nu$ for the t-test. A comment disagrees.
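For reference, with sample variances $s_X^2, s_Y^2$ and sample sizes $n_X, n_Y$, the Welch–Satterthwaite estimate is

$$\nu \approx \frac{\left(s_X^2/n_X + s_Y^2/n_Y\right)^2}{\dfrac{\left(s_X^2/n_X\right)^2}{n_X - 1} + \dfrac{\left(s_Y^2/n_Y\right)^2}{n_Y - 1}}.$$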
Of course this is only material when we are working with very small sample sizes, and only practical when we can find a correction term for estimates of standard deviation.
If we assume that our random variables are normally distributed, then we can unbias estimates of standard deviation by applying the correction factor $c_G(n) = \sqrt{\frac{n-1}{2}}\,\,\frac{\Gamma\left(\frac{n-1}{2}\right)}{\Gamma\left(\frac{n}{2}\right)}$.
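For example, at $n = 4$ the factor is $c_G(4) = \sqrt{\tfrac{3}{2}}\,\frac{\Gamma(3/2)}{\Gamma(2)} = \sqrt{\tfrac{3}{2}} \cdot \frac{\sqrt{\pi}/2}{1} \approx 1.085$, and it decays toward 1 as $n$ grows.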
If the variables are indeed normal, does incorporating that correction factor improve our calculations?
I attempted to answer this by running a batch of simulated experiments on the difference of means of independent normal variables $X$ and $Y$: computing the confidence interval (CI) on the difference using $\nu$ both with and without the correction factor, and counting the frequency with which the true difference lies inside the CI. (Python code follows.) But even with sample sizes as small as $n = 4$, over 100,000 iterations I can't find any simulation parameters for which the CI coverage deviates from the desired confidence by more than 1%, and the difference in coverage between the two formulas never exceeds 0.3%. So if there is a difference, I haven't succeeded in finding it with this approach.
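Concretely, each interval is $(\bar X - \bar Y) \pm t_{\nu,(1+\gamma)/2}\sqrt{s_X^2/n_X + s_Y^2/n_Y}$, which is what `t.interval` computes below; only $\nu$ changes between the two variants.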
# Simulation to check confidence interval on difference of estimated means
import math
import numpy as np
from scipy.stats import t
nsim = 100_000 # Number of simulations to run
# Random variable parameters for simulation:
nX, nY = (8, 4)
meanX, meanY = (1, 2)
sigmaX, sigmaY = (1, .3)
gamma = 0.9 # Confidence interval coverage
def cG(n):
    # Correction factor c_G(n), computed via log-gamma for numerical stability
    return math.exp(math.lgamma((n-1)/2) - math.lgamma(n/2) - math.log(math.sqrt(2/(n-1))))
cGx = cG(nX)
cGy = cG(nY)
count_contains_true = 0 # Number of simulations in which base CI contains true difference (meanX-meanY)
count_corrected_contains_true = 0 # Same, but with nu computed from bias-corrected standard deviations
for i in range(nsim):
    x = np.random.normal(size=nX, loc=meanX, scale=sigmaX)
    y = np.random.normal(size=nY, loc=meanY, scale=sigmaY)
    muX = np.mean(x)
    muY = np.mean(y)
    varX = np.var(x, ddof=1)
    varY = np.var(y, ddof=1)
    eX = varX / nX  # Squared standard error of each sample mean
    eY = varY / nY
    # Welch-Satterthwaite degrees of freedom from the raw variance estimates
    nu = (eX + eY)**2 / (eX**2 / (nX-1) + eY**2 / (nY-1))
    CI = t.interval(gamma, df=nu, loc=(muX - muY), scale=np.sqrt(eX + eY))
    if CI[0] < meanX - meanY < CI[1]:
        count_contains_true += 1
    # Compute nu using the correction factor; the CI scale is left unchanged
    eXc = cGx**2 * eX
    eYc = cGy**2 * eY
    nu = (eXc + eYc)**2 / (eXc**2 / (nX-1) + eYc**2 / (nY-1))
    CI = t.interval(gamma, df=nu, loc=(muX - muY), scale=np.sqrt(eX + eY))
    if CI[0] < meanX - meanY < CI[1]:
        count_corrected_contains_true += 1
print(f'Over {nsim:,} simulations the {gamma:.0%} confidence interval contained the true parameter:\n'
      f'\t\t{count_contains_true / nsim:.1%} of the time for the base CI\n'
      f'\t\t{count_corrected_contains_true / nsim:.1%} of the time for the corrected CI')
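One caveat on resolution: a coverage estimate from $m$ independent runs has binomial Monte Carlo standard error $\sqrt{p(1-p)/m}$, which is about 0.095% at $p = 0.9$ and $m = 100{,}000$, so each coverage figure above carries roughly ±0.1% of simulation noise. A minimal sketch of that calculation (the function name is mine, not part of the script above):

import math

def coverage_se(p, m):
    # Binomial standard error of an estimated coverage probability from m runs
    return math.sqrt(p * (1 - p) / m)

print(f'{coverage_se(0.9, 100_000):.4%}')  # prints 0.0949%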