In short
Assume there is a normally distributed population, from which I take a sample of $N_1=5$ elements. I then obtain a separate number $x$ from somewhere. I want to test whether $x$ came from the same distribution.
Is a two-sample $t$-test with the assumption of common variance, and sample sizes 5 and 1 respectively, appropriate here? Can one of the samples in a $t$-test have size 1? I've read about "extremely" small sample sizes, but never $N=1$. Having both samples of size 1 would obviously be silly, but I feel that the variability observed among the 5 elements of the first sample makes it possible to assess whether $x$ "fits in well" among those values.
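If I write out the pooled statistic with $N_2=1$, the single observation contributes nothing to the pooled variance (it has zero degrees of freedom), so, unless I'm mistaken, the test reduces to $t = (x - \bar{x}) / \left(s\sqrt{1 + 1/N_1}\right)$ with $N_1 - 1$ degrees of freedom, i.e. to checking whether $x$ lies within a $t$-based prediction interval for a new observation. Here is a minimal sketch of what I have in mind (the function name is my own invention; I compute the statistic by hand because, as far as I can tell, `scipy.stats.ttest_ind` does not cope with a size-1 sample):

```python
import numpy as np
from scipy import stats

def single_obs_t_test(sample, x):
    """Pooled two-sample t-test where the second "sample" is the single
    observation x; equivalent to checking whether x falls inside the
    t-based prediction interval of the first sample."""
    sample = np.asarray(sample, dtype=float)
    n = len(sample)
    m = sample.mean()
    s = sample.std(ddof=1)                  # sample standard deviation
    t = (x - m) / (s * np.sqrt(1 + 1 / n))  # pooled statistic with N2 = 1
    df = n - 1                              # the lone observation adds no df
    p = 2 * stats.t.sf(abs(t), df)          # two-sided p-value
    return t, p
```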
Background
I am analyzing a machine learning process that, at its end, spits out an evaluation metric (describing how well the trained model performs on the test set). The training process is stochastic due to random initialization, random shuffling of examples before each epoch, and random data augmentation.
I am performing ablations to compare how changing hyperparameters affects the evaluation metric. Sometimes the difference is small, sometimes larger. To get an idea of whether these differences are meaningful or merely variation induced by the randomness of the training process, I first need to measure how noisy the training process is: I have repeated the main experiment with a fixed configuration 5 times, so the differences in the final evaluation metric result only from noise.
Then I make a change to the configuration, train and evaluate the model, and obtain a single value. I want to be able to say something quantitative about how meaningfully different this value is from the 5 others.
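For example (with made-up numbers, not my real metrics), the comparison would look like this, reusing the sketch above:

```python
baseline = [0.712, 0.705, 0.718, 0.709, 0.714]  # 5 repeats of the fixed config (hypothetical)
changed = 0.731                                 # single run of the modified config (hypothetical)

t, p = single_obs_t_test(baseline, changed)
print(f"t = {t:.2f} on {len(baseline) - 1} df, p = {p:.3f}")
```

With only 4 degrees of freedom, the two-sided 5% critical value is about 2.78, so presumably only fairly large deviations would register as significant.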
Is there any value in attempting a formal hypothesis-testing approach, or should I just rely on informally eyeballing the numbers?
Do note that obtaining one value costs about a day of computing, which can be put to better use than repeating the same training job over and over. It is therefore not realistic to obtain more than 5 values for the baseline configuration, nor more than a single value for any other configuration. Out of necessity, I am willing to assume that the training procedure yields evaluation metrics with the same variance for all configurations.