How to Simulate T-Values for Two Different Samples

Question

Motivating Question

I recently had a discussion with somebody who said that a t-value of 2.1 from a 10-participant sample is more impressive than the same t-value from a 100-participant sample. The argument was essentially this: because converting a t-value to Cohen's d involves $n$, the same t-value implies a larger effect size for 10 participants than for 100. This seems fairly inaccurate to me for a lot of reasons, namely the lack of reliability/power of the estimate along with the issue of generalizability, but by an extreme technicality it is "correct" based on this very specific criterion.
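
For reference, the relationship behind that argument can be written out directly. The sketch below assumes an independent two-sample t-test with pooled variance, in which case d = t * sqrt(1/n1 + 1/n2). The helper name `t_to_d` is made up for illustration and is not part of any package.

```r
# Convert a pooled-variance two-sample t statistic to Cohen's d.
# Assumption: independent groups and an equal-variance (Student) t-test.
t_to_d <- function(t, n1, n2) {
  t * sqrt(1 / n1 + 1 / n2)
}

d_small <- t_to_d(2.1, n1 = 5,  n2 = 5)    # 10 participants in total
d_large <- t_to_d(2.1, n1 = 50, n2 = 50)   # 100 participants in total
print(round(c(small = d_small, large = d_large), 2))
```

With t fixed at 2.1, this gives a d of about 1.33 for 5 per group and 0.42 for 50 per group, which is the "technicality" in question: the same t corresponds to a larger standardized difference when n is small.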

Simulating the Data

Setting that aside, I tried simulating two samples to answer this question for myself, one with 10 participants and one with 100. I tried setting the SD fairly high to emulate how high variation affects the distribution and thus effect size of the populations.

#### Load Libraries ####
library(tidyverse)
library(effectsize)
library(ggpubr)
#### Set Seed ####
set.seed(123)
#### Simulate Small Sample ####
group <- rep(c("ctrl",
               "treat"),
             c(5,5))
response <- rnorm(n=10,
                  mean=50,
                  sd=35)
small.sample <- tibble(group,
                       response)
#### Simulate Larger Sample ####
large.group <- rep(c("ctrl",
                     "treat"),
                   c(50,50))
large.response <- rnorm(n=100,
                  mean=50,
                  sd=35)
large.sample <- tibble(large.group,
                       large.response)
large.sample
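#### Cohens D ####
# Formula interface: response on the left, grouping variable on the right
cohens_d(response ~ group,
         data = small.sample)
cohens_d(large.response ~ large.group,
         data = large.sample)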
#### Plot Them ####
small.plot <- small.sample %>% 
  ggplot(aes(x=group,
             y=response))+
  geom_violin(fill="lightblue",
              alpha = .5)+
  geom_point()+
  labs(x="Small Sample Groups",
       y="Small Sample Response",
       title = "Violin Plots of Small Sample")+
  theme_bw()+
  scale_x_discrete(labels=c("Control",
                             "Treatment"))
small.plot
large.plot <- large.sample %>% 
  ggplot(aes(x=large.group,
             y=large.response))+
  geom_violin(fill="lightblue",
              alpha = .5)+
  geom_point()+
  labs(x="Large Sample Groups",
       y="Large Sample Response",
       title = "Violin Plots of Large Sample")+
  theme_bw()+
  scale_x_discrete(labels=c("Control",
                            "Treatment"))
large.plot
ggarrange(small.plot,
          large.plot)

Cohen Values

Looking at the Cohen's d values obtained, I get this for the small sample:

Cohen's d |        95% CI
-------------------------
0.24      | [-1.01, 1.47]

And this for the larger sample:

Cohen's d |        95% CI
-------------------------
0.05      | [-0.35, 0.44]

Immediately you can see that the effect size for the small sample is larger, but the confidence interval is also much wider.

Plotting Distribution

The violin plots also show why this happens:

Here the data is fairly evenly distributed in the large sample, whereas the small sample has some extreme values impacting the distribution, and thus likely impacting the effect sizes because of the differences in groups.

Unresolved

My main question remaining is how to simulate the exact t value specified: 2.1. I assume there is a more direct way of simulating this, but I have no clue how that can be accomplished in R. I feel like there is some way by messing with the mean/SD in some respect but I'm not sure how to accomplish that. Any advice would be great.

In order to run a simulation study, you want to use randomly-selected data many times. In your case, you want to get rid of set.seed(), and then wrap your whole function in a for() loop. You'll need a vector to save the results. Maybe 10,000 iterations ? — Sal Mangiafico, Nov 11 '22 at 13:42
That makes sense actually. But I'm still not entirely sure how to go about that, mainly because I am fairly inexperienced with for loops that are more complicated than the most basic examples. — Shawn Hemelstrand, Nov 11 '22 at 13:45
Rereading the question, I'm not sure how the t value plays into the question relative to the Cohen's d . But you could add the t value with e.g. t.test(Response1, Response2)$statistic. — Sal Mangiafico, Nov 11 '22 at 14:35

Sal Mangiafico · Accepted Answer · 2022-11-12T13:22:19.103

The following is R code to explore the difference in Cohen's d for two samples of different size, both selected from the same population. (Here, the population is normal with mean = Mean and sd = SD, and the per-group sample sizes are SmallN and LargeN.)

If I were going to conduct this study, I would probably randomize the sample size, and then plot Cohen's d vs. Sample size. This would tell you if there is a sample size at which Cohen's d stays in a reasonable range.

if(!require(effectsize)){install.packages("effectsize")}
if(!require(FSA)){install.packages("FSA")}
library(effectsize)
N      = 1000
SmallN =  5
LargeN = 50
Mean   = 50
SD     = 35
SmallCD = rep(NA, N)
LargeCD = rep(NA, N)
###################
for(i in 1:N){
  # Two small groups drawn from the same population
  SmallResponse1 = rnorm(n=SmallN, mean=Mean, sd=SD)
  SmallResponse2 = rnorm(n=SmallN, mean=Mean, sd=SD)
  SmallCD[i] = cohens_d(SmallResponse1, SmallResponse2)$Cohens_d
  # Two large groups drawn from the same population
  LargeResponse1 = rnorm(n=LargeN, mean=Mean, sd=SD)
  LargeResponse2 = rnorm(n=LargeN, mean=Mean, sd=SD)
  LargeCD[i] = cohens_d(LargeResponse1, LargeResponse2)$Cohens_d
  # Progress indicator
  if(i%%(N/100)==0){cat(".")}
}
Data = data.frame(CohensD = c(SmallCD, LargeCD),
                  Group = factor(c(rep("Small", length(SmallCD)),
                            rep("Large", length(LargeCD)))))
plot(CohensD ~ Group, data=Data)
library(FSA)
Summarize(CohensD ~ Group, data=Data)
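
As suggested in the comments, the t statistic can be recorded in the same kind of loop with t.test(...)$statistic. The following is a minimal sketch of that idea rather than part of the code above. It assumes equal group sizes and a Student t-test (var.equal = TRUE), and it computes the pooled-SD Cohen's d by hand in base R so that the block runs without extra packages. The names `tval` and `dval` are illustrative.

```r
set.seed(1)          # only so that this sketch is reproducible
N <- 1000            # number of simulated experiments
n <- 5               # observations per group (SmallN in the code above)
tval <- rep(NA, N)
dval <- rep(NA, N)

for (i in 1:N) {
  x <- rnorm(n, mean = 50, sd = 35)
  y <- rnorm(n, mean = 50, sd = 35)
  # Student t-test, so that t and the pooled-SD d use the same variance
  tval[i] <- unname(t.test(x, y, var.equal = TRUE)$statistic)
  # Cohen's d with pooled SD, computed by hand (valid for equal group sizes)
  dval[i] <- (mean(x) - mean(y)) / sqrt((var(x) + var(y)) / 2)
}

# Every simulated d equals its t multiplied by sqrt(2 / n)
print(all.equal(dval, tval * sqrt(2 / n)))
```

Under these assumptions, d and t carry the same information for a fixed n, so plotting one against the other gives a straight line through the origin with slope sqrt(2 / n).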

###########################################

Addendum:

Here's some code for my other idea, that plots Cohen's d vs. sample size.

if(!require(effectsize)){install.packages("effectsize")}
library(effectsize)
N      = 1000
MinN   =  4
MaxN   = 200
Mean   = 50
SD     = 35
CD = rep(NA, N)
SampleSize = rep(NA, N) 
###################
for(i in 1:N){
  # runif() returns a non-integer value, so round it to get a whole sample size
  ThisN         = round(runif(1, min=MinN, max=MaxN))
  Response1     = rnorm(n=ThisN, mean=Mean, sd=SD)
  Response2     = rnorm(n=ThisN, mean=Mean, sd=SD)
  CD[i]         = cohens_d(Response1, Response2)$Cohens_d
  SampleSize[i] = ThisN
  if(i%%(N/100)==0){cat(".")}
}
plot(CD ~ SampleSize)
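
The question also asks how to obtain a t value of exactly 2.1, which the loops above do not do. One possible approach, offered as a sketch and not as the only way, is to draw both groups at random and then shift one group so that the mean difference gives the requested t. Shifting a group changes its mean but not its variance, so the pooled SD stays the same. The function name `simulate_exact_t` and its arguments are hypothetical, and the code assumes equal group sizes and a Student t-test (var.equal = TRUE).

```r
# Hypothetical helper: draw two groups, then shift the treatment group so
# that the Student t statistic equals target_t exactly.
simulate_exact_t <- function(n, target_t, mu = 50, sigma = 35) {
  ctrl  <- rnorm(n, mean = mu, sd = sigma)
  treat <- rnorm(n, mean = mu, sd = sigma)
  sp    <- sqrt((var(ctrl) + var(treat)) / 2)   # pooled SD (equal n)
  # Mean difference that yields the requested t for this pooled SD
  needed_diff <- target_t * sp * sqrt(2 / n)
  # Re-center treat on the control mean, then add the needed difference
  treat <- treat - mean(treat) + mean(ctrl) + needed_diff
  data.frame(group    = rep(c("ctrl", "treat"), each = n),
             response = c(ctrl, treat))
}

set.seed(123)
small <- simulate_exact_t(n = 5,  target_t = 2.1)   # 10 participants
large <- simulate_exact_t(n = 50, target_t = 2.1)   # 100 participants

# treat is passed first so that the statistic is positive
t_of <- function(dat) {
  unname(t.test(dat$response[dat$group == "treat"],
                dat$response[dat$group == "ctrl"],
                var.equal = TRUE)$statistic)
}
print(c(small = t_of(small), large = t_of(large)))
```

Both data sets then have t = 2.1 by construction, up to floating-point error, and can be passed to cohens_d() or plotted to compare what the same t looks like at the two sample sizes.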

This is an excellent answer. Just a couple questions. First, what does the if(i%%(N/100)==0){cat(".")} part do as well as the cat functions after? Second, what does runif do in this situation? — Shawn Hemelstrand, Nov 12 '22 at 02:46
runif() generates random values from a uniform distribution. So, it's analogous to rnorm(). ... The if() ... cat() code will print a period to the screen as the code loops through, printing 100 periods in total when N is a multiple of 100. — Sal Mangiafico, Nov 12 '22 at 02:53
Ah that makes sense now. This essentially gets at my question, so I thank you Sal. — Shawn Hemelstrand, Nov 12 '22 at 02:54
