
Motivating Question

I recently had a discussion with someone who argued that a t-value of 2.1 from a 10-participant sample is more impressive than the same t-value from a 100-participant sample. Their argument was essentially this: because converting a t-value into Cohen's d involves $n$, the same t implies a larger d when $n$ is small, and therefore a larger effect. This seems inaccurate to me for several reasons, chiefly the low reliability (and power) of a small-sample estimate and the issue of generalizability. On that one narrow criterion, though, it is technically "correct".
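To make the argument concrete: for an independent-samples Student's t-test, $t = (\bar{x}_1 - \bar{x}_2) / (s_p\sqrt{1/n_1 + 1/n_2})$ and the pooled-SD Cohen's d is $d = (\bar{x}_1 - \bar{x}_2)/s_p$, so $d = t\sqrt{1/n_1 + 1/n_2}$. The sketch below assumes equal groups (5/5 and 50/50) and an equal-variance t-test, neither of which the discussion specified, and uses a hypothetical helper `t_to_d_pooled()`. It shows both halves of the disagreement: the same t = 2.1 implies a much larger d for the small sample, but a less convincing p-value.

```r
# Hypothetical helper: pooled-SD Cohen's d implied by a Student's t statistic
# (assumes an equal-variance, independent-samples t-test)
t_to_d_pooled <- function(t, n1, n2) t * sqrt(1 / n1 + 1 / n2)

t_val <- 2.1
d_small <- t_to_d_pooled(t_val, 5, 5)    # 10 participants, 5 per group
d_large <- t_to_d_pooled(t_val, 50, 50)  # 100 participants, 50 per group

# Two-sided p-values for the same t, with df = n1 + n2 - 2
p_small <- 2 * pt(t_val, df = 8, lower.tail = FALSE)
p_large <- 2 * pt(t_val, df = 98, lower.tail = FALSE)

round(c(d_small = d_small, d_large = d_large), 2)  # about 1.33 and 0.42
c(p_small = p_small, p_large = p_large)            # p_small > 0.05, p_large < 0.05
```

The larger d for n = 10 comes entirely from the $\sqrt{1/n_1 + 1/n_2}$ factor; it says nothing about how precisely that d is estimated.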

Simulating the Data

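Setting that aside, I simulated two samples to answer this question for myself: one with 10 participants and one with 100, each split evenly into a control and a treatment group. In both samples, every response is drawn from the same normal distribution (mean 50), so the true effect is zero. I set the SD fairly high (35) to emulate how high variation affects the distribution, and thus the effect size, of the populations.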
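One point worth checking about the high SD: Cohen's d divides the mean difference by the pooled SD, so multiplying every response by the same constant cancels out, and when both groups come from one population the SD chosen should not change d at all. The sketch below checks this in base R. `pooled_d()` is a hypothetical helper implementing the classic pooled-SD formula, which is assumed (not confirmed here) to match effectsize's default `cohens_d()`. It reuses `set.seed(123)` and 10 standard normal draws, rescaled once with SD = 35 and once with SD = 1.

```r
# Hypothetical helper: classic pooled-SD Cohen's d (first group minus second)
pooled_d <- function(x, y) {
  nx <- length(x); ny <- length(y)
  sp <- sqrt(((nx - 1) * var(x) + (ny - 1) * var(y)) / (nx + ny - 2))
  (mean(x) - mean(y)) / sp
}

set.seed(123)
z <- rnorm(10)        # standard normal draws
ctrl_idx <- 1:5       # first five = control, last five = treatment

# The same draws with mean 50 and two very different SDs
wide   <- 50 + 35 * z
narrow <- 50 +  1 * z

d_wide   <- pooled_d(wide[ctrl_idx],   wide[-ctrl_idx])
d_narrow <- pooled_d(narrow[ctrl_idx], narrow[-ctrl_idx])

c(d_wide = d_wide, d_narrow = d_narrow)  # identical: the SD cancels out of d
```

Because `rnorm(n, mean, sd)` computes `mean + sd * z` from the same standard normal stream, `wide` should reproduce the small-sample responses simulated below, so `d_wide` should come out at about 0.24. What drives the size of d under a null effect is the group size, not the SD.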

#### Load Libraries ####
library(tidyverse)
library(effectsize)
library(ggpubr)

#### Set Seed ####
set.seed(123)

#### Simulate Small Sample ####
# 5 control + 5 treatment; all 10 responses come from the same N(50, 35^2)
group <- rep(c("ctrl", "treat"), c(5, 5))
response <- rnorm(n = 10, mean = 50, sd = 35)
small.sample <- tibble(group, response)

#### Simulate Larger Sample ####
# 50 control + 50 treatment, drawn from the same distribution
large.group <- rep(c("ctrl", "treat"), c(50, 50))
large.response <- rnorm(n = 100, mean = 50, sd = 35)
large.sample <- tibble(large.group, large.response)
large.sample

#### Cohen's d ####
# Numeric responses first, grouping vector second (control minus treatment)
cohens_d(response, group)
cohens_d(large.response, large.group)

#### Plot Them ####
small.plot <- small.sample %>%
  ggplot(aes(x = group, y = response)) +
  geom_violin(fill = "lightblue", alpha = .5) +
  geom_point() +
  labs(x = "Small Sample Groups", y = "Small Sample Response",
       title = "Violin Plots of Small Sample") +
  theme_bw() +
  scale_x_discrete(labels = c("Control", "Treatment"))
small.plot

large.plot <- large.sample %>%
  ggplot(aes(x = large.group, y = large.response)) +
  geom_violin(fill = "lightblue", alpha = .5) +
  geom_point() +
  labs(x = "Large Sample Groups", y = "Large Sample Response",
       title = "Violin Plots of Large Sample") +
  theme_bw() +
  scale_x_discrete(labels = c("Control", "Treatment"))
large.plot

# Show both plots side by side
ggarrange(small.plot, large.plot)

Cohen's d Values

Looking at the Cohen's d values obtained, I get this for the small sample:

Cohen's d |        95% CI
-------------------------
0.24      | [-1.01, 1.47]

And this for the larger sample:

Cohen's d |        95% CI
-------------------------
0.05      | [-0.35, 0.44]

The effect size for the small sample is larger, but its confidence interval is also much wider, and both intervals include zero.
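The difference in interval width follows mostly from sample size. The sketch below uses a common large-sample normal approximation to the standard error of d, $SE(d) \approx \sqrt{(n_1+n_2)/(n_1 n_2) + d^2/(2(n_1+n_2))}$. This is an assumption for illustration; effectsize may compute its intervals differently (for example from the noncentral t distribution), so the numbers need not match the printed output exactly. The helpers `se_d()` and `approx_ci()` are hypothetical.

```r
# Hypothetical helpers: normal-approximation standard error and CI for d
se_d <- function(d, n1, n2) sqrt((n1 + n2) / (n1 * n2) + d^2 / (2 * (n1 + n2)))

approx_ci <- function(d, n1, n2, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)
  d + c(-1, 1) * z * se_d(d, n1, n2)
}

ci_small <- approx_ci(0.24, 5, 5)    # d reported for the 10-participant sample
ci_large <- approx_ci(0.05, 50, 50)  # d reported for the 100-participant sample

round(ci_small, 2)  # about -1.00 to 1.48
round(ci_large, 2)  # about -0.34 to 0.44

# Ratio of widths: close to sqrt(100 / 10), i.e. roughly 3.16
diff(ci_small) / diff(ci_large)
```

With ten times as many participants, the interval shrinks by roughly $\sqrt{10}$, which is why the small sample's d is so much less informative despite being larger.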

Plotting Distribution

The violin plots also show why this happens:

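[Figure: side-by-side violin plots titled "Violin Plots of Small Sample" and "Violin Plots of Large Sample", showing Control vs. Treatment responses]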

Here the data are fairly evenly spread in the large sample, whereas in the small sample a few extreme values shape each group's distribution. With only five observations per group, those values likely pull the group means apart and so inflate the effect size.

Unresolved

My main remaining question is how to simulate data that produce exactly the specified t-value of 2.1. I assume there is a more direct way to do this, but I have no idea how to accomplish it in R. I suspect it involves manipulating the means and SDs in some way, but I'm not sure how. Any advice would be great.
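One way to do this, sketched below under stated assumptions: standardize each group's random draws with `scale()` so every group has a sample mean of exactly 0 and sample SD of exactly 1, rescale them to the chosen SD, and then shift the treatment group by the mean difference that the t formula requires. With both groups' SDs equal to `sd`, the pooled SD is also `sd`, so $t = \Delta / (sd\sqrt{2/n})$ gives $\Delta = t \cdot sd\sqrt{2/n}$. This assumes equal group sizes and Student's (equal-variance) t-test; `simulate_exact_t()` and `t_of()` are hypothetical names.

```r
# Hypothetical helper: simulate two equal-sized groups whose Student's t
# statistic (treatment minus control) equals t_target exactly
simulate_exact_t <- function(n_per_group, t_target, mean = 50, sd = 35) {
  # scale() forces sample mean 0 and sample SD 1 within each group
  z_ctrl  <- as.numeric(scale(rnorm(n_per_group)))
  z_treat <- as.numeric(scale(rnorm(n_per_group)))

  # Both groups have SD `sd`, so the pooled SD is `sd`; solve t for the shift
  mean_diff <- t_target * sd * sqrt(2 / n_per_group)

  data.frame(
    group    = rep(c("ctrl", "treat"), each = n_per_group),
    response = c(mean + sd * z_ctrl, mean + mean_diff + sd * z_treat)
  )
}

# Hypothetical helper: Student's t for treatment minus control
t_of <- function(df) {
  unname(t.test(df$response[df$group == "treat"],
                df$response[df$group == "ctrl"],
                var.equal = TRUE)$statistic)
}

set.seed(123)
small <- simulate_exact_t(5, 2.1)   # 10 participants
large <- simulate_exact_t(50, 2.1)  # 100 participants

c(small = t_of(small), large = t_of(large))  # both equal 2.1
```

The individual values are still random, but the group means and SDs are pinned, so the t statistic is fixed. Cohen's d for such data is $t\sqrt{2/n}$, about 1.33 with 5 per group and 0.42 with 50 per group.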

Answer


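The following R code explores how Cohen's d differs between small and large samples when both groups are drawn from the same population (here, a normal distribution with mean = Mean and sd = SD, with per-group sample sizes SmallN and LargeN). Because the two groups share a population, the true effect is zero, so any spread in d is pure sampling variability.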

If I were going to conduct this study, I would probably randomize the sample size and then plot Cohen's d against sample size. This would show whether there is a sample size above which Cohen's d stays within a reasonable range.

if(!require(effectsize)){install.packages("effectsize")}
if(!require(FSA)){install.packages("FSA")}

library(effectsize)

N = 1000        # number of simulated studies
SmallN = 5      # per-group size for the small samples
LargeN = 50     # per-group size for the large samples
Mean = 50
SD = 35

SmallCD = rep(NA, N)
LargeCD = rep(NA, N)

###################

for(i in 1:N){

  # Both groups come from the same population, so the true d is 0
  SmallResponse1 = rnorm(n=SmallN, mean=Mean, sd=SD)
  SmallResponse2 = rnorm(n=SmallN, mean=Mean, sd=SD)
  SmallCD[i] = cohens_d(SmallResponse1, SmallResponse2)$Cohens_d

  LargeResponse1 = rnorm(n=LargeN, mean=Mean, sd=SD)
  LargeResponse2 = rnorm(n=LargeN, mean=Mean, sd=SD)
  LargeCD[i] = cohens_d(LargeResponse1, LargeResponse2)$Cohens_d

  if(i%%(N/100)==0){cat(".")}   # progress: 100 dots in total

}

Data = data.frame(CohensD = c(SmallCD, LargeCD), Group = factor(c(rep("Small", length(SmallCD)), rep("Large", length(LargeCD)))))

plot(CohensD ~ Group, data=Data)

library(FSA)

Summarize(CohensD ~ Group, data=Data)
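As a cross-check on this simulation that needs neither effectsize nor FSA: for two equal groups of size n from one normal population, the pooled-SD d equals $t\sqrt{2/n}$, where t follows a central t distribution with $df = 2n - 2$ and variance $df/(df-2)$. So the SD of d under the null should be about $\sqrt{2/n}\sqrt{df/(df-2)}$. The base-R sketch below compares that with simulated values; `pooled_d()`, `sim_d()` and `theory_sd()` are hypothetical helpers, and `pooled_d()` is assumed to match effectsize's default d.

```r
# Hypothetical helper: classic pooled-SD Cohen's d
pooled_d <- function(x, y) {
  sp <- sqrt(((length(x) - 1) * var(x) + (length(y) - 1) * var(y)) /
               (length(x) + length(y) - 2))
  (mean(x) - mean(y)) / sp
}

# Simulate `reps` null studies with n_per_group observations in each group
sim_d <- function(n_per_group, reps = 2000, mean = 50, sd = 35) {
  replicate(reps, pooled_d(rnorm(n_per_group, mean, sd),
                           rnorm(n_per_group, mean, sd)))
}

# Theoretical SD of d under the null, equal group sizes n
theory_sd <- function(n) sqrt(2 / n) * sqrt((2 * n - 2) / (2 * n - 4))

set.seed(123)
d_small <- sim_d(5)
d_large <- sim_d(50)

# Theory gives about 0.730 (5 per group) and 0.202 (50 per group)
round(rbind(simulated = c(small = sd(d_small), large = sd(d_large)),
            theory    = c(small = theory_sd(5), large = theory_sd(50))), 3)
```

Under a true effect of zero, the typical size of |d| is set by n alone, so small samples routinely produce larger-looking effects.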

###########################################

Addendum:

Here's some code for my other idea, that plots Cohen's d vs. sample size.

if(!require(effectsize)){install.packages("effectsize")}

library(effectsize)

N = 1000
MinN = 4
MaxN = 200
Mean = 50
SD = 35

CD = rep(NA, N)
SampleSize = rep(NA, N)

###################

for(i in 1:N){

  # Per-group sample size drawn uniformly between MinN and MaxN; rounded so
  # the recorded size matches the number of values rnorm() actually draws
  ThisN = round(runif(1, min=MinN, max=MaxN))

  Response1 = rnorm(n=ThisN, mean=Mean, sd=SD)
  Response2 = rnorm(n=ThisN, mean=Mean, sd=SD)
  CD[i] = cohens_d(Response1, Response2)$Cohens_d
  SampleSize[i] = ThisN

  if(i%%(N/100)==0){cat(".")}   # progress: 100 dots in total

}

plot(CD ~ SampleSize)

Sal Mangiafico
  • This is an excellent answer. Just a couple of questions. First, what do the if(i%%(N/100)==0){cat(".")} part and the cat() call inside it do? Second, what does runif() do in this situation? – Shawn Hemelstrand Nov 12 '22 at 02:46
  • runif() generates random values from a uniform distribution. So, it's analogous to rnorm(). ... The if() ... cat() code will print a period to the screen as the code loops through, printing 100 periods no matter the size of N. – Sal Mangiafico Nov 12 '22 at 02:53
  • Ah, that makes sense now. This essentially gets at my question, so I thank you, Sal. – Shawn Hemelstrand Nov 12 '22 at 02:54