Analysis of one group of subjects (data provided)

Question

I have one group ($n = 39$) of subjects pre- and post-tested on a continuous variable. I also have a gender variable coded $0$ and $1$.

How can I analyze these data to detect any change from pre- to post-test while controlling for the gender variable?

Here is my data in R:

set.seed(62)
pre = rnorm(39)
post = rnorm(39, 3)
gender = rbinom(39, 1, .6)

I just added a set.seed() so that the data are generated the same way every time; otherwise people may get conflicting results. — Penguin_Knight, Jan 04 '19 at 19:14
No, because the question does not include any research question or what is the definition of "best." — Penguin_Knight, Jan 04 '19 at 19:17
One approach would be to compute the difference and use it as the dependent variable, with gender as the independent variable. Another approach could be a multi-level model using a "long" form where each case is present twice, once for each time point. Then run a mixed effects model using time point and gender as the independent variables, controlling for clustering at the id level. You may also want to look into any interaction as well (i.e., does the magnitude of change depend on gender?). — Penguin_Knight, Jan 04 '19 at 19:41
@Penguin_Knight, amazing! can you possibly demonstrate any of these in R? — rnorouzian, Jan 04 '19 at 19:42

StatsStudent · Accepted Answer · 2019-01-05T15:50:37.750

I think the most straightforward approach (keep in mind there are several ways to skin this cat) is to perform a simple regression analysis on the difference in pre- and post-test scores with a single independent dummy variable for gender.

So, for example, you'd preprocess your data to obtain $y_{diff_i}=y_{pre_i}-y_{post_i}$ for the $i$th subject, $i = 1, \dots, 39$ (with this sign convention, a negative difference means the score rose from pre to post). Then simply fit the model:

$y_{diff_i}=\beta_0+\beta_{gender}X_{gender_i}+\epsilon_i$

If you are interested in testing for a gender effect, you'd test whether:

$H_0: \beta_{gender}=0$ vs. $H_1: \beta_{gender}\ne0$.

Then, assuming there's no gender effect, to determine if there was a statistically significant difference between the pre and post tests, you simply test whether:

$H_0: \beta_0=0$ vs. $H_1: \beta_0\ne0$.
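For the no-gender-effect case, here is a minimal sketch on the question's simulated data. It assumes that gender is dropped from the model entirely, which is only one reading of "no gender effect". Under that assumption the intercept test is the same thing as a paired t-test. The names `fit0`, `t_lm` and `tt` are mine and purely illustrative.

```r
# Simulated data copied from the question
set.seed(62)
pre  <- rnorm(39)
post <- rnorm(39, 3)

# Same sign convention as the answer: pre minus post
y_diff <- pre - post

# Intercept-only model (assumption: gender dropped from the model);
# the intercept is then the mean pre-post difference
fit0 <- lm(y_diff ~ 1)
t_lm <- coef(summary(fit0))["(Intercept)", "t value"]

# Paired t-test on the same two vectors
tt <- t.test(pre, post, paired = TRUE)

# The two t statistics agree up to floating-point error
all.equal(t_lm, unname(tt$statistic))
```

In the model that keeps gender, the intercept is instead the mean difference for the group coded 0, so the two tests are not interchangeable there.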

If you do find an important gender effect, you'd want to estimate the average pre-post change for each gender separately with the use of a contrast. Assuming men are coded 1 and women are coded 0, the estimated pre-post change for men becomes:

$\hat{\beta}_0+\hat{\beta}_{gender}$

and the average pre-post test change for women is estimated by just $\hat{\beta}_0$.

So continuing with your toy example, in R this could be fit as follows:

> y_diff=pre-post
> myfit<-lm(y_diff~gender)
> summary(myfit)

Call:
lm(formula = y_diff ~ gender)

Residuals:
    Min      1Q  Median      3Q     Max 
-4.1334 -0.9911  0.4681  1.1861  2.4989 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  -3.1081     0.4209  -7.385 8.75e-09 ***
gender       -0.3158     0.5480  -0.576    0.568    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.683 on 37 degrees of freedom
Multiple R-squared:  0.008892,  Adjusted R-squared:  -0.01789 
F-statistic: 0.332 on 1 and 37 DF,  p-value: 0.568
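The per-gender estimates described earlier ($\hat{\beta}_0$ for the group coded 0 and $\hat{\beta}_0+\hat{\beta}_{gender}$ for the group coded 1) can be pulled out of the fitted object. This is a rough sketch, not the only way to do it. The helper names `b`, `V`, `est0`, `est1`, `se1`, `newdat` and `pred` are my own.

```r
# Simulated data copied from the question
set.seed(62)
pre    <- rnorm(39)
post   <- rnorm(39, 3)
gender <- rbinom(39, 1, .6)

y_diff <- pre - post
myfit  <- lm(y_diff ~ gender)

b <- coef(myfit)   # estimated coefficients
V <- vcov(myfit)   # their estimated covariance matrix

# Estimated pre-post difference for the group coded 0
est0 <- b[["(Intercept)"]]

# Estimated pre-post difference for the group coded 1
est1 <- b[["(Intercept)"]] + b[["gender"]]

# Standard error of the sum: Var(b0) + Var(b1) + 2 * Cov(b0, b1)
se1 <- sqrt(V["(Intercept)", "(Intercept)"] +
            V["gender", "gender"] +
            2 * V["(Intercept)", "gender"])

# The same two estimates, with confidence intervals, via predict()
newdat <- data.frame(gender = c(0, 1))
pred   <- predict(myfit, newdata = newdat, interval = "confidence")
pred
```

With a single binary predictor these two estimates are just the group means of `y_diff`, which makes for an easy sanity check.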

Of course, you'll need to carry out the usual model diagnostics and such, but I think you are familiar with those details.
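As a starting point for those diagnostics, here is a minimal sketch of a few numeric checks. Which checks are appropriate is a judgment call. The ones below (Shapiro-Wilk on the residuals, an F test comparing the two group variances, Cook's distance) are my choice and not something the model requires. The names `res`, `sw`, `vt` and `max_cook` are illustrative.

```r
# Simulated data copied from the question
set.seed(62)
pre    <- rnorm(39)
post   <- rnorm(39, 3)
gender <- rbinom(39, 1, .6)

y_diff <- pre - post
myfit  <- lm(y_diff ~ gender)

# Residuals from the fitted model
res <- residuals(myfit)

# Normality of the residuals (Shapiro-Wilk test)
sw <- shapiro.test(res)
sw$p.value

# Equality of variances between the two gender groups (F test)
vt <- var.test(y_diff ~ gender)
vt$p.value

# Largest Cook's distance, as a rough screen for influential cases
max_cook <- max(cooks.distance(myfit))
max_cook

# The usual graphical checks are also available:
# plot(myfit, which = 1:2)  # residuals vs fitted, normal Q-Q
```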

Other possible methods include fitting a model and adjusting for the pre-test score as well as gender:

$y_{post_i}=\beta_0+\beta_{gender}X_{gender_i}+\beta_{pre}y_{pre_i}+\epsilon_i$. See the first comment by rolando2 for more info on a model like this.
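A minimal sketch of that pre-test-adjusted model on the question's toy data follows. The object names `fit_ancova` and `fit_offset` are mine. The second fit is only there to show how the two approaches relate: fixing the pre-test coefficient at 1 (via `offset()`) models post minus pre, so it reproduces the difference-score model with the signs flipped.

```r
# Simulated data copied from the question
set.seed(62)
pre    <- rnorm(39)
post   <- rnorm(39, 3)
gender <- rbinom(39, 1, .6)

# Post-test score adjusted for gender and pre-test score
fit_ancova <- lm(post ~ gender + pre)
summary(fit_ancova)

# Difference-score model from the answer (pre minus post)
y_diff <- pre - post
myfit  <- lm(y_diff ~ gender)

# Same model with the pre-test coefficient fixed at 1:
# this is a regression of (post - pre) on gender
fit_offset <- lm(post ~ gender + offset(pre))

# Its coefficients are the negatives of the difference-score fit
all.equal(unname(coef(fit_offset)), -unname(coef(myfit)))
```

So the difference-score analysis can be seen as the special case $\beta_{pre}=1$, while the adjusted model lets the data estimate $\beta_{pre}$.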

A plausible approach, but see https://pareonline.net/getvn.asp?v=14&n=6 for a brief account of why this approach is not necessarily the best. — rolando2, Jan 04 '19 at 21:23
+1 @rolando2. I agree very much that this is one of many approaches that could be used. To be sure, each approach has its pros and cons. I purposefully selected this approach because - as stated in my opening - I think it's the most straightforward approach. This is especially so if the OP may not have more advanced skills to carry out something like a mixed effects model/repeated measures ANCOVA, a generalized estimating equations model, or a number of others of greater complexity. Thanks for posting the link. I think many will find this quite beneficial. — StatsStudent, Jan 04 '19 at 21:34
@user158565, no coding is needed for gender as it's binary as provided in the user's toy example so zero/one is sufficient. My previous answer only indicated how to handle the case if there were no gender effects. I've modified this accordingly - good catch. — StatsStudent, Jan 05 '19 at 14:56
Thank you so much! I see lots of comments about the use of multi-level model in this case. Can you possibly show a quick demonstration of that approach as well? — rnorouzian, Jan 05 '19 at 18:11
