I have longitudinal data in which a categorical response is collected at two time points. I was wondering whether it's possible to include the baseline categorical response as a predictor in a logistic regression model. The variables I have are Y1 = response at time 1; Y2 = response at time 2; X1 = age; X2 = a derived variable. The model would therefore look like `Y2 = a + b*Y1 + c*X1 + d*X2`. Could you please tell me whether this model is mathematically correct? Thanks in advance.
1 Answer
Assuming that you are talking about a 2-level categorical response (e.g., 2-alternative forced choice, let's say "No=0/Yes=1"), that model is "mathematically correct." The question is whether the model represents what you intend. Your model says that the log-odds of a "Yes" response at Time 2 has a contribution proportional to whether the choice was "Yes" at Time 1 (plus additive contributions proportional to age and to your "derived variable"). If that's what you intend, then go with it.
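To make the log-odds interpretation concrete, here is a minimal sketch in plain Python. The coefficient values and the function names (`prob_yes_at_time2`, `odds`) are hypothetical, chosen only for illustration and not estimated from any data. It shows that, with age and the derived variable held fixed, a "Yes" at Time 1 multiplies the odds of "Yes" at Time 2 by exp(b).

```python
import math

def prob_yes_at_time2(y1, age, derived, a, b, c, d):
    """P(Y2 = 1) under the model logit(P) = a + b*Y1 + c*X1 + d*X2."""
    log_odds = a + b * y1 + c * age + d * derived
    return 1.0 / (1.0 + math.exp(-log_odds))

def odds(p):
    """Convert a probability to odds."""
    return p / (1.0 - p)

# Hypothetical coefficients (assumed for illustration, not fitted values).
a, b, c, d = -2.0, 1.5, 0.02, 0.5

# Same age and derived variable, differing only in the Time 1 response.
p_no = prob_yes_at_time2(0, 40, 1.0, a, b, c, d)
p_yes = prob_yes_at_time2(1, 40, 1.0, a, b, c, d)

# The ratio of the two odds equals exp(b), whatever the other covariates are.
odds_ratio = odds(p_yes) / odds(p_no)
print(round(p_no, 3), round(p_yes, 3), round(odds_ratio, 3), round(math.exp(b), 3))
```

In practice the coefficients would come from fitting the model to your data with whatever logistic regression routine you normally use; the sketch only illustrates what the fitted b means.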
EdM
Thank you for your response. Yes, the model represents what I intend and is formulated in a way that makes sense. I was worried about whether there are any hiccups in terms of losing statistical properties or violating any assumptions. Yes, the response is binary as well. – curiousmind Jul 15 '20 at 04:11
@curiousmind Including a baseline value as a predictor of future values is common practice. It's not good as a predictor of changes in values. See this link for some brief discussion. If your data are unbalanced there might be some bias in this approach, but bias is often a problem in observational studies. Furthermore, there's also a bias if you leave out any predictor associated with outcome from a logistic regression model, so including the baseline value makes sense on balance. – EdM Jul 17 '20 at 16:28
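The omitted-predictor point in the last comment can be checked with a small exact calculation. This is only a sketch under assumed values: the coefficients `a`, `b`, `g` and the 50/50 binary predictor `Z` are hypothetical, and `Z` is taken to be independent of `X` so that no confounding is involved. Averaging over the omitted `Z` pulls the log-odds ratio for `X` toward zero.

```python
import math

def expit(x):
    """Inverse logit: maps log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

def logit(p):
    """Maps a probability to log-odds."""
    return math.log(p / (1.0 - p))

# Hypothetical full model: logit P(Y=1 | X, Z) = a + b*X + g*Z
a, b, g = -1.0, 1.0, 2.0
p_z = 0.5  # assumed P(Z = 1); Z is independent of X

def marginal_prob(x):
    """P(Y=1 | X=x) after averaging over the omitted predictor Z."""
    return (1 - p_z) * expit(a + b * x) + p_z * expit(a + b * x + g)

# Log-odds ratio for X when Z is left out of the model.
marginal_log_or = logit(marginal_prob(1)) - logit(marginal_prob(0))
print("conditional log-odds ratio for X:", b)
print("log-odds ratio for X with Z omitted:", round(marginal_log_or, 3))
```

With these assumed numbers the log-odds ratio with `Z` omitted comes out smaller than `b`, even though `Z` is unrelated to `X`. That is the sense in which leaving out a predictor associated with the outcome biases the remaining coefficients in a logistic model.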