I hope someone can help me with this simple question:

I want to derive Cohen's d from raw data.

I know Cohen's d is just (mean1 - mean2)/pooledSD. What I want is to derive it directly from the raw data, in such a way that I work on the standardised scale.

I have tried to calculate the pooled mean and pooled SD and then standardise each observation by (obs1-pooled mean)/pooledSD and do the same for all observations.

The Cohen's d and the difference calculated using the method above are very similar (almost the same) but not exactly the same.

Please can anyone confirm whether Cohen's d is mathematically the same as standardising the observations and then calculating the difference between the group means of the standardised observations?

a) Calculating Cohen's d from aggregated data:

  • grand mean: 21.875
  • mean0: 21
  • mean1: 22.75
  • pooled SD: 3.068953936
  • Cohen's d: -0.570226871

b) Calculating Cohen's d from raw data using the field (mpg - grand mean)/pooled SD:

  • mean_1: -0.285113435
  • mean_2: 0.285113435
  • difference between 1 and 2: -0.570226871

I need to know whether approach a) and approach b) are mathematically equivalent.

I really appreciate your help!

Xavier
  • Can you show us the formula you've used? Are your groups paired or unpaired, and is the design balanced or unbalanced? That can affect the formula you use. – Huy Pham Apr 18 '19 at 12:51

1 Answer


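EDIT: I've heavily edited my answer because I misinterpreted the question and then couldn't do basic maths. The OP wanted to standardise the data using the grand mean and the pooled SD, take the mean of the standardised values in each group, and then subtract one group mean from the other. So, with $i$ indexing the observations in group 1, $j$ indexing those in group 2, $n$ observations in each group, and GRAND the grand mean: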

$$\begin{aligned}
&\frac{1}{n}\sum_i\frac{x_i-GRAND}{S_{pooled}}-\frac{1}{n}\sum_j\frac{x_j-GRAND}{S_{pooled}}\\
&=\frac{1}{n S_{pooled}}\left[\sum_i (x_i-GRAND)-\sum_j (x_j-GRAND)\right]\\
&=\frac{1}{n S_{pooled}}\left[\sum_i x_i-n\,GRAND-\sum_j x_j+n\,GRAND\right]\\
&=\frac{\sum_i x_i-\sum_j x_j}{n S_{pooled}}\\
&=\frac{1}{n}\left(\sum_i x_i-\sum_j x_j\right)\frac{1}{S_{pooled}}\\
&=\frac{\frac{1}{n}\sum_i x_i-\frac{1}{n}\sum_j x_j}{S_{pooled}}
\end{aligned}$$

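which is the formula for Cohen's d: the grand mean cancels, leaving the difference between the two group means divided by the pooled SD.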
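
The derivation above uses the same $n$ for both groups. As a sketch of the unbalanced case, assume group sizes $n_1$ and $n_2$ and write $\bar{x}_1$, $\bar{x}_2$ for the group means (these symbols are my own additions, not from the question). The grand mean is subtracted from every observation as a constant, so it still cancels:

$$\begin{aligned}
&\frac{1}{n_1}\sum_i\frac{x_i-GRAND}{S_{pooled}}-\frac{1}{n_2}\sum_j\frac{x_j-GRAND}{S_{pooled}}\\
&=\frac{1}{S_{pooled}}\left[\left(\bar{x}_1-GRAND\right)-\left(\bar{x}_2-GRAND\right)\right]\\
&=\frac{\bar{x}_1-\bar{x}_2}{S_{pooled}}
\end{aligned}$$

So the equivalence should not depend on the groups being balanced, as long as the same $S_{pooled}$ is used in both calculations.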

Using the data provided in R:

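    # Raw data from the question: 12 untreated cars followed by 12 treated cars
    mpg <- c(20,23,21,25,18,17,18,24,20,24,23,19,
             24,25,21,22,23,18,17,28,24,27,21,23)
    treated <- as.factor(c(rep(0, 12), rep(1, 12)))

    # Pooled SD of the two groups (not the SD of the whole sample)
    sd_pooled <- sqrt((11 * var(mpg[1:12]) + 11 * var(mpg[13:24])) / 22)

    # Difference between the group means of the standardised observations
    mean((mpg[1:12] - mean(mpg)) / sd_pooled) -
      mean((mpg[13:24] - mean(mpg)) / sd_pooled)
    #> [1] -0.5829654

    # Same estimate from the effsize package
    library(effsize)
    cohen.d(mpg, treated)
    #> Cohen's d
    #>
    #> d estimate: -0.5829654 (medium)
    #> 95 percent confidence interval:
    #>      lower      upper
    #> -1.4474169  0.2814861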
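
To see where the -0.570 in the question comes from, here is a minimal sketch that runs the same standardise-then-difference calculation twice: once with the SD of the whole sample and once with the pooled SD of the two groups. The helper name `std_mean_diff` is hypothetical, made up for this illustration; the data are the same 24 values as above.

```r
# Data from the answer above: first 12 values are group 0, last 12 are group 1
mpg <- c(20,23,21,25,18,17,18,24,20,24,23,19,
         24,25,21,22,23,18,17,28,24,27,21,23)
g0 <- mpg[1:12]
g1 <- mpg[13:24]

# Hypothetical helper: standardise every observation with the SD `s`,
# then take the group-0 mean minus the group-1 mean of the standardised values
std_mean_diff <- function(s) {
  z <- (mpg - mean(mpg)) / s
  mean(z[1:12]) - mean(z[13:24])
}

# SD of all 24 values, ignoring group membership
sd_whole <- sd(mpg)

# Pooled SD: within-group variances weighted by their degrees of freedom
sd_pooled <- sqrt(((length(g0) - 1) * var(g0) + (length(g1) - 1) * var(g1)) /
                    (length(g0) + length(g1) - 2))

d_whole  <- std_mean_diff(sd_whole)   # about -0.570, the figure in the question
d_pooled <- std_mean_diff(sd_pooled)  # about -0.583, Cohen's d
```

The two results differ only through the denominator: the whole-sample SD also absorbs the between-group difference, so it is slightly larger than the pooled SD and gives a slightly smaller effect size.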
Huy Pham
  • No, I did not do that. What I did was i−(11∑+11∑). This made all my values standardised. Then I took the mean of the standardised values in group 1 and the mean of the standardised values in group 2. My question is whether Cohen's d is the same as the difference between the means of the standardised values. – Xavier Apr 18 '19 at 15:43
  • OK fair enough, I will remove my answer, but can you maybe edit your question a little to show us what you've done? I can help but I'm going to need more information. It's not really clear what you've done at this stage. – Huy Pham Apr 18 '19 at 15:45
  • Well, it took a while to type, but hope that helps. – Huy Pham Apr 18 '19 at 16:22
  • I have edited my question with an example. Please could you check whether both approaches are equal? I simply do not know how to write equations here. Thanks in advance. – Xavier Apr 18 '19 at 16:47
  • Actually, sorry (I had to delete my last comment), looking back over it again, it looks like you're doing it the second way and I'm not sure if it works. Sorry, but I can't read your data. If you can arrange it in a reproducible way I can put it through R and do the manual sums. Otherwise, as I've said, there's the effsize package in R, or I can even forward a spreadsheet if you'd like. Sorry, this is about as far as I feel comfortable commenting without something I can paste into R. – Huy Pham Apr 18 '19 at 17:21
  • Thank you, I have edited again and put it as an image. What I need to know is whether approach a and b are equivalent. Thanks!!!! – Xavier Apr 18 '19 at 18:17
  • OK, so from that data, Cohen's d is -0.5829654. Have you been using the SD of the whole sample as the 'pooled SD', or have you been using sqrt(((n1-1)*var1+(n2-1)*var2)/(n1+n2-2))? I've recreated your numbers, and -0.570 happens when you just use the SD of the whole sample (3.068954), which is not the same as the 'pooled SD' of the two groups (3.001893). The good news is that the group 1 mean of (mpg-grand)/SDpooled minus the group 2 mean of (mpg-grand)/SDpooled DOES equal Cohen's d, which is what you were after in the first place! But I think your issues might be coming from using the wrong 'pooled SD'. – Huy Pham Apr 18 '19 at 19:03
  • I made an error in my maths: the second way, the one you were describing, does equal Cohen's d. It would've saved so much time if I had seen it from the start! Still, good to check it through properly. So I'll be deleting this answer if you are satisfied. But yeah, be careful about what you're using as the pooled SD. – Huy Pham Apr 18 '19 at 19:07
  • Thank you, yes, I know that the pooled SD is sqrt(((n1-1)*var1+(n2-1)*var2)/(n1+n2-2)). But my question was more whether I could work out a way to obtain Cohen's d from raw data instead of from aggregated data. Thanks for all your help!!!! – Xavier Apr 18 '19 at 20:11
  • Thanks for putting up with me, we got there in the end. Well, if you found it helpful, I've edited the answer instead of deleting it. The maths did add up, so it's not just a coincidence that they are equal. I just didn't spot it the first time round. – Huy Pham Apr 18 '19 at 21:04