Someone I know is using linear regression to estimate a difference in means for parental stress, given a binary explanatory variable $X_i \in \{0,1\}$. They have found a small but significant effect.
They have a sample of 3000 parents, but it is very uneven: 2700 parents have $X_i=0$ and 300 have $X_i=1$, i.e., a 90/10 split. They are concerned that this could be a problem and are considering randomly drawing 300 out of the 2700 to get balanced groups. I have argued against this: the sample mean of 2700 parents will be closer to the population mean than the mean of just 300, and the standard error of the group mean shrinks as the group gets larger.
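To make that argument concrete, here is a small sketch of my own (assuming, for simplicity, equal standard deviations of 1 in both groups): the standard error of the difference in means is smaller if we keep all 3000 observations than if we throw 2400 of them away.

se_diff = function(n1, n2, sd1=1, sd2=1) sqrt(sd1^2/n1 + sd2^2/n2) # SE of the difference in means
se_diff(2700, 300) # keep the full, unbalanced sample: ~0.061
se_diff(300, 300)  # subsample down to 300/300:        ~0.082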
However, I have now read a very interesting old thread discussing this topic here on StackExchange, which raises the point that the power of a t-test is higher for balanced samples, i.e., the risk of committing a type II error is smaller with a 50/50 split:
How should one interpret the comparison of means from different sample sizes?
The accepted answer in the thread above shows how the power of a t-test is higher with a 50/50 sample than with 75/25 or 90/10 splits, all three with $N=100$. Increasing the power of the t-test is, as far as I can tell, the only reason for insisting on equal group sizes. I now want to revisit this topic and ask whether that result is still relevant for larger samples, such as $N=3000$.
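As I understand it, the reason balance helps is that, for a fixed total $N = n_1 + n_2$ and equal variances, the standard error of the difference in means, $\sigma\sqrt{1/n_1 + 1/n_2}$, is minimized when $n_1 = n_2$, so a balanced design extracts the most power from a given number of observations. But that argument holds the total $N$ fixed, whereas subsampling would shrink $N$ from 3000 to 600.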
The following R code is lifted from the thread linked above, with some alterations to include larger samples. The original post is here.
set.seed(9) # To get some of the same numbers
# as in the previous thread
power1090 = vector(length=10000) # Storing the p-values from each
power5050 = vector(length=10000) # simulated test to keep track of how many
power100900 = vector(length=10000) # are 'significant'
power3002700 = vector(length=10000)
for(i in 1:10000){ # Running the following procedure 10k times
n1a = rnorm(10, mean=0, sd=1) # Drawing 2 samples of sizes 90/10 from 2 normal
n2a = rnorm(90, mean=.5, sd=1) # distributions w/ dif means, but equal SDs
n1b = rnorm(50, mean=0, sd=1) # Same, but the samples are 50/50
n2b = rnorm(50, mean=.5, sd=1)
n1c = rnorm(100, mean=0, sd=1) # A 90/10 sample, with more observations
n2c = rnorm(900, mean=.5, sd=1)
n1d = rnorm(300, mean=0, sd=1) # A 90/10 sample with 3000 total observations
n2d = rnorm(2700, mean=.5, sd=1)
power1090[i] = t.test(n1a, n2a, var.equal=T)$p.value # here t-tests are run &
power5050[i] = t.test(n1b, n2b, var.equal=T)$p.value # the p-values are stored
power100900[i] = t.test(n1c, n2c, var.equal=T)$p.value # for each version
power3002700[i] = t.test(n1d, n2d, var.equal=T)$p.value
}
mean(power1090<.05) # The power for a 90/10 sample is about 32%.
[1] 0.3203
mean(power5050<.05) # For the 50/50 sample, the power increases to 70%.
[1] 0.7001 # This is clearly an improvement.
mean(power100900<.05) # But with much larger samples, the power is close
[1] 0.9967 # to 100%, even with uneven samples.
mean(power3002700<.05)
[1] 1
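For what it's worth, the same numbers can also be checked analytically rather than by simulation. This is my own addition and assumes the pwr package, whose pwr.t2n.test() computes the power of a two-sample t-test with unequal group sizes for a standardized difference d:

library(pwr) # assuming the 'pwr' package is installed
pwr.t2n.test(n1=10,  n2=90,   d=.5)$power # roughly matches the simulated 32%
pwr.t2n.test(n1=50,  n2=50,   d=.5)$power # ~70%
pwr.t2n.test(n1=100, n2=900,  d=.5)$power # ~99.7%
pwr.t2n.test(n1=300, n2=2700, d=.5)$power # effectively 100%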
The results show that a 50/50 sample is better than a 90/10 sample at $N=100$. But as the number of observations grows, the power of the t-test approaches 100%, even with a 90/10 split.
This leads me back to my initial opinion: there is no reason to reduce a sample to produce even groups, as long as the samples are sufficiently large. Does the community agree?