I would like to investigate the effect of a drug on survival with Cox regression. The drug can be administered either once or twice. In the univariate Cox regression, I see that a shorter time interval between the two drug administrations has an effect. I would also like to examine this effect in the multivariable Cox regression analysis. However, since only 50% of the patients were given the drug twice, I do not have the same number of patients as for the other variables. In the strict sense, this is not "missing data", as I know that this variable (time interval between drug administration) has no value for the patients with one dose. How can I deal with this? Or can I simply enter 0 as the time interval? I'm sure I'm not the first person to have this problem or question. In my search, however, I have only ever come across threads with the question of how to deal with "missing data". I look forward to any answers from you. Thank you very much!
-
Welcome to CrossValidated! First of all, do you know what software you will be using? Second, do you know why the other 50% of subjects were not given the drug twice? – jmarkov Nov 16 '23 at 11:33
-
I want to use R for the analysis, but I meant the question in general. In detail: the immunity (survival) conferred by a vaccine is to be investigated over time. The vaccine was administered either once or twice. Thank you in advance! @jmarkov – CoxRegressionNewbie2023 Nov 16 '23 at 12:07
-
Right, so the number of times it was administered is random? I mean, a subject could have received it twice just by chance? – jmarkov Nov 16 '23 at 12:22
-
This is correct. – CoxRegressionNewbie2023 Nov 16 '23 at 12:42
1 Answer
This situation is essentially what's described on this page with respect to loan amounts for individuals who didn't apply for loans: in your case, data on the interval between administrations can't exist for those who only got one administration.
The solution is similar: create a "dummy" predictor variable with a value of 0 for those who only got 1 administration, and a value of 1 for those who got 2 administrations (like the no-loan/loan dummy variable in the answer to the related question). Set the time interval between administrations to 0 for those with only 1 administration, and to the actual value for the others.
The Cox coefficient for the dummy variable will be the log-hazard ratio comparing those with a (hypothetical) 0 interval between 2 doses to those with only 1 dose. The coefficient(s)* for the time interval will only apply to those who received 2 doses.
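The coding described above could look roughly like this in R. This is only a sketch on simulated data: the column names (`two_doses`, `interval_days`, `interval0`) and all numbers are made up for illustration and are not from the question.

```r
library(survival)

set.seed(1)
n <- 200

# Hypothetical data: two_doses is 1 if a second dose was given, else 0;
# interval_days is the gap between doses (NA when there was no second dose)
two_doses     <- rbinom(n, 1, 0.5)
interval_days <- ifelse(two_doses == 1, runif(n, 14, 90), NA)

# Simulated event and censoring times, only so that the example runs
lp         <- -0.8 * two_doses + 0.01 * ifelse(two_doses == 1, interval_days, 0)
event_time <- rexp(n, rate = 0.05 * exp(lp))
cens_time  <- rexp(n, rate = 0.02)

d <- data.frame(
  time          = pmin(event_time, cens_time),
  status        = as.integer(event_time <= cens_time),  # 1 = event, 0 = censored
  two_doses     = two_doses,
  interval_days = interval_days
)

# The interval is "not applicable" rather than missing for one-dose subjects:
# set it to 0 for them and keep the actual value for the others
d$interval0 <- ifelse(d$two_doses == 1, d$interval_days, 0)

# Dummy for receiving 2 doses plus the interval (linear here for simplicity)
fit <- coxph(Surv(time, status) ~ two_doses + interval0, data = d)
summary(fit)
```

Because `interval0` has no `NA` values, all subjects stay in the model, including those with a single dose.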
After you fit the model, it will be most useful to compare predictions from representative scenarios rather than just relying on the values of the coefficients.
*You probably want to model that time interval flexibly, for example with a regression spline.
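As a rough sketch of the last two points, one could fit the interval with a natural spline and then compare a few representative scenarios. The data are again simulated, and the variable names, knot positions, and scenario values are arbitrary assumptions chosen for illustration. The knots are given explicitly because half of the `interval0` values are 0, so quantile-based default knots could end up at the boundary.

```r
library(survival)
library(splines)

set.seed(2)
n <- 300

# Hypothetical data with the coding described above:
# interval0 is 0 for one-dose subjects, the actual interval otherwise
two_doses  <- rbinom(n, 1, 0.5)
interval0  <- ifelse(two_doses == 1, runif(n, 14, 90), 0)
event_time <- rexp(n, rate = 0.05 * exp(-0.8 * two_doses + 0.01 * interval0))
cens_time  <- rexp(n, rate = 0.02)

d <- data.frame(
  time      = pmin(event_time, cens_time),
  status    = as.integer(event_time <= cens_time),
  two_doses = two_doses,
  interval0 = interval0
)

# Natural spline for the interval, with knots fixed by hand (assumed values)
fit <- coxph(
  Surv(time, status) ~ two_doses +
    ns(interval0, knots = c(40, 65), Boundary.knots = c(0, 90)),
  data = d
)

# Representative scenarios: one dose, and two doses 21, 42, or 84 days apart
scen <- data.frame(two_doses = c(0, 1, 1, 1),
                   interval0 = c(0, 21, 42, 84))

# Linear predictors, then hazard ratios relative to the one-dose scenario
lp <- predict(fit, newdata = scen, type = "lp")
hr <- exp(lp - lp[1])
cbind(scen, hazard_ratio = hr)
```

Taking differences of linear predictors means the result does not depend on how `predict()` centers them, and the comparison is easier to read than the individual spline coefficients.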
-
@CoxRegressionNewbie If this answer meets your needs, could you accept it by clicking the check mark? Thanks – Peter Flom Nov 30 '23 at 20:48