Change Score or Regressor Variable Method - Should I regress $Y_1$ over $X$ and $Y_0$ or $(Y_1-Y_0)$ over $X$

Question

I have data on investment preferences from one year before Covid and from during the Covid lockdown.

A simple t-test shows some changes. I want to assess whether these changes are particularly strong for specific demographics (e.g., older individuals ($X_1$), individuals with lower income ($X_2$), etc.).

Should I include the initial level of my dependent variable in the regressions? Basically, if I want to use OLS regressions to investigate which independent variables correlate with the change in my dependent variable, which model is preferable?

Model 1 (apparently called Change Score Method): $(Y_2 - Y_1) = \beta_1 X_1 + \beta_2 X_2$

Model 2 (apparently called Regressor Variable Method): $Y_2 = \beta_1 X_1 + \beta_2 X_2 + \beta_3 Y_1$

Thank you so much for your help - Any reference would also be much appreciated!

Maybe a dup: https://stats.stackexchange.com/questions/3466/best-practice-when-analysing-pre-post-treatment-control-designs — kjetil b halvorsen, Jul 21 '20 at 01:33

rnso · Accepted Answer · 2022-08-12T18:06:08.860


Both methods have been used; see here for an example. Which one to use depends on the question you want to answer. If you mainly want to talk about "change", you can use

(Y2-Y1) ~ X1 + X2            # (1)

The baseline (Y1) should not be added as a predictor to the equation above: Y1 is itself part of the difference (Y2-Y1), so the two are correlated by construction. See the comments below by @EdM and here.
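To illustrate the point about baseline and difference, here is a minimal simulation sketch. The data are entirely hypothetical: Y1 and Y2 are drawn as independent standard normal noise, so any correlation between Y1 and (Y2-Y1) comes only from Y1 appearing in the difference.

```python
import math
import random

random.seed(0)  # fixed seed so the run is reproducible
n = 5000

# Hypothetical data: baseline and follow-up are pure, unrelated noise.
y1 = [random.gauss(0, 1) for _ in range(n)]
y2 = [random.gauss(0, 1) for _ in range(n)]
change = [b - a for a, b in zip(y1, y2)]


def pearson(u, v):
    """Sample Pearson correlation of two equal-length lists."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


r_levels = pearson(y1, y2)      # close to 0 by construction
r_change = pearson(y1, change)  # theory: -1/sqrt(2), roughly -0.71
print(f"corr(Y1, Y2)      = {r_levels:.2f}")
print(f"corr(Y1, Y2 - Y1) = {r_change:.2f}")
```

Even though Y1 tells us nothing about Y2 in this setup, it is strongly negatively correlated with the change score.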

On the other hand, if you want to discuss factors affecting "final value", you can use

Y2 ~ X1 + X2 + Y1            # (2)
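A short piece of algebra shows how (1) and (2) relate. It uses the question's notation and, like the question, leaves out the intercept and error term. Subtracting $Y_1$ from both sides of Model 2 gives

$$
Y_2 - Y_1 = \beta_1 X_1 + \beta_2 X_2 + (\beta_3 - 1)\, Y_1 ,
$$

so the change score model (1) is the special case of (2) in which $\beta_3$ is fixed at 1 rather than estimated.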

However, since the repeated measurements (Y1 and Y2, at two time points) are taken on the same subjects, a mixed model is also often used (including interactions, as commented by @dbwilson below):

Y ~ X1 + X2 + time + X1*time + X2*time + (1|subject)

The following simplified formula is effectively the same as the one above:

Y ~ X1*time + X2*time + (1|subject)            # (3)

There is another method commonly used, especially in biomedical literature: "Percent change", i.e.

(100*(Y2-Y1)/Y1) ~ X1 + X2            # (4)

It is not correct to keep Y1 as a predictor variable in this last method as there will be strong correlation between baseline and percent change.

I think this last method (percent change) is most understandable.
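As a small sketch of how the percent change response in (4) could be computed before fitting, consider the following. The rows and the helper name `percent_change` are made up for illustration. Note that the quantity is undefined when the baseline is zero, so such subjects need separate handling.

```python
# Hypothetical wide-format rows: (subject, y1, y2).
rows = [("a", 50.0, 60.0), ("b", 80.0, 72.0), ("c", 0.0, 5.0)]


def percent_change(y1, y2):
    """Return 100 * (y2 - y1) / y1, or None when the baseline is zero."""
    if y1 == 0:
        return None  # undefined: cannot divide by a zero baseline
    return 100.0 * (y2 - y1) / y1


pct = {subject: percent_change(y1, y2) for subject, y1, y2 in rows}
print(pct)
```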

See here for more information on this topic.

Edit: For equation 3, the data should be in long format, with columns subject, x1, x2, time and y. Hence y1 and y2 sit in two different rows (same subject, x1 and x2, but different y and time values). For the other equations, the data should be in wide format, with columns subject, x1, x2, y1, y2 (one row per subject; the subject column is ignored here).
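The reshaping described in this edit, and the link between equations (1) and (3), can be sketched in plain Python. Everything here is an assumption for illustration: the simulated data, the single predictor x1, the 0/1 coding of time, and the small `ols` helper, which solves the normal equations directly. The sketch fits only the fixed-effects part of (3) by ordinary least squares and omits the random intercept. For balanced two-wave data with no missing values, I believe the fixed-effect point estimates match those of the random-intercept model, but the standard errors do not, so in practice a mixed-model routine should be used.

```python
import random


def ols(X, y):
    """Least-squares coefficients via the normal equations (Gauss-Jordan)."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


random.seed(1)
# Hypothetical wide data: one row per subject (subject, x1, y1, y2).
wide = []
for subject in range(200):
    x1 = random.gauss(0, 1)
    u = random.gauss(0, 1)  # subject-specific level shared by y1 and y2
    y1 = 2.0 + 0.5 * x1 + u + random.gauss(0, 1)
    y2 = 2.5 + 0.8 * x1 + u + random.gauss(0, 1)
    wide.append((subject, x1, y1, y2))

# Equation (1): change score regressed on x1, using the wide format.
b_change = ols([[1.0, x1] for _, x1, _, _ in wide],
               [y2 - y1 for _, _, y1, y2 in wide])

# Reshape to long format: two rows per subject, time = 0 (before), 1 (after).
long_rows = []
for subject, x1, y1, y2 in wide:
    long_rows.append((subject, x1, 0, y1))
    long_rows.append((subject, x1, 1, y2))

# Fixed-effects part of equation (3): intercept, x1, time, x1*time.
b_long = ols([[1.0, x1, t, x1 * t] for _, x1, t, _ in long_rows],
             [y for _, _, _, y in long_rows])

print("change-score slope on x1:", round(b_change[1], 6))
print("x1*time coefficient     :", round(b_long[3], 6))
```

In the long format, Y1 is simply the y value in the rows with time = 0. The time coefficient plays the role of the change-score intercept, and the x1*time coefficient plays the role of the change-score slope on x1.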

edited Aug 12 '22 at 18:06

answered Jul 21 '20 at 01:16


Thank you so much for this detailed answer. In the end, given that I was mostly interested in change, I used (Y2-Y1) ~ X1 + X2. It is, however, interesting to see the last two methods you propose. Thank you again! – L. M. Jul 21 '20 at 10:30
Regressing the difference against the initial value is not a good idea. See this answer and its links and this answer to the question "What are the worst (commonly adopted) ideas/principles in statistics?" – EdM Jul 21 '20 at 11:21
I have added a note regarding this in answer above. – rnso Jul 21 '20 at 11:30
In the mixed model, the X1*time and X2*time interactions estimate the same effects as the X1 and X2 coefficients in the change score model. The code, however, should be Y ~ X1 + X2 + time + X1*time + X2*time + (1|subject). – dbwilson Jul 21 '20 at 11:55
I have added this in answer above with your reference. – rnso Jul 21 '20 at 12:35
I don't understand why Equation 3 is correct here. Does Y equal Y2? I assume the variable Time is a centered dummy code (e.g., -1, 1), so where does the base rate get controlled for? I can't figure out where Y1 is in this model. – aarsmith Aug 11 '22 at 22:40
I have added clarification on this in my answer above. – rnso Aug 12 '22 at 18:06
