I am conducting a study in which I analyse the time to opt out around the end of a free trial. Individuals sign up for a free membership trial period. From the data I can tell that the hazard of opting out is low for most of the trial and increases about one month before the trial period ends. Therefore, I have chosen one month before the end of the trial period as my time zero. Of course, I still have data on the individuals who opted out before that point, about 12% of the sample. I'm not sure whether I should call them left-censored or left-truncated, though. The definition of left censoring implies that I do not know when they opted out, but I do know that. On the other hand, the definition of truncation entails that I do not see those observations at all (they are hypothetical). So neither definition fits my situation perfectly. What should I call these observations?
In this situation there's no problem with using onTrialPeriod as a time-varying predictor in your model. You can keep other covariates fixed in time if that's appropriate; just copy them over into the second data row (now with onTrialPeriod = FALSE) for each individual who remained beyond the trial period. That way you have at most 2 data rows for each individual. It probably makes sense to include the trial-period duration as a covariate, and to include interactions of covariates with onTrialPeriod. That uses all of your data most efficiently and allows for different associations of covariates with survival depending on whether the trial period is in effect.
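A minimal sketch of that data layout, in Python for illustration only: each member is split into at most two (start, stop] rows in counting-process format, with onTrialPeriod switching from True to False at the end of the trial and the fixed covariates copied into the second row. Time is measured here from the start of the free membership. The `Member` and `Row` classes and the fields `trial_days`, `exit_day` and `age` are hypothetical. Rows shaped like this are what counting-process Cox software expects, for example `Surv(start, stop, event)` in R's survival package.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Member:
    member_id: int
    trial_days: float  # length of this member's free-trial period (hypothetical field)
    exit_day: float    # day of opt-out, or last day observed if still enrolled
    opted_out: bool    # True if exit_day is an observed opt-out, False if censored
    age: float         # example fixed covariate (hypothetical)


@dataclass
class Row:
    member_id: int
    start: float
    stop: float
    event: int           # 1 = opted out at `stop`, 0 = no event in this interval
    onTrialPeriod: bool  # the time-varying predictor
    trial_days: float    # trial length kept as a fixed covariate
    age: float


def split_at_trial_end(members: List[Member]) -> List[Row]:
    """Split each member's follow-up at the trial end into at most two rows."""
    rows: List[Row] = []
    for m in members:
        if m.exit_day <= m.trial_days:
            # Opted out (or was censored) during the trial: a single trial row.
            rows.append(Row(m.member_id, 0.0, m.exit_day, int(m.opted_out),
                            True, m.trial_days, m.age))
        else:
            # Still enrolled at the trial end: a trial row with no event ...
            rows.append(Row(m.member_id, 0.0, m.trial_days, 0,
                            True, m.trial_days, m.age))
            # ... plus a post-trial row with the fixed covariates copied over.
            rows.append(Row(m.member_id, m.trial_days, m.exit_day, int(m.opted_out),
                            False, m.trial_days, m.age))
    return rows


members = [
    Member(1, trial_days=30, exit_day=12, opted_out=True, age=40),
    Member(2, trial_days=30, exit_day=45, opted_out=True, age=35),
    Member(3, trial_days=14, exit_day=60, opted_out=False, age=28),
]
for row in split_at_trial_end(members):
    print(row)
```

Interactions such as age by onTrialPeriod would then be specified in the model formula rather than built into the data rows.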
There was a similar recent question here. If you don't care about the actual duration of free trial periods but only whether someone opted in, then you could use a binomial regression for the opt-in/no-opt-in choice and separately model survival during the post-trial period.
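A rough sketch of that two-part approach, again in Python and under simplifying assumptions: the binary outcome is taken here to be whether a member opted out before the trial ended, every member is assumed to be followed at least to the trial end, and with no covariates the binomial model reduces to a simple proportion. The post-trial part restarts the clock at the trial end and uses a hand-rolled Kaplan-Meier estimate. With covariates you would instead fit a logistic regression (for example R's `glm(..., family = binomial)`) and a Cox model. The function names and the `(exit_day, opted_out, trial_days)` record layout are hypothetical.

```python
from typing import List, Tuple

# Hypothetical record layout: (exit_day, opted_out, trial_days), in days from membership start.
Record = Tuple[float, bool, float]


def trial_opt_out_share(records: List[Record]) -> float:
    """Intercept-only binomial MLE: share of members who opted out during the trial."""
    during = sum(1 for exit_day, opted_out, trial_days in records
                 if opted_out and exit_day <= trial_days)
    return during / len(records)


def post_trial_kaplan_meier(records: List[Record]) -> List[Tuple[float, float]]:
    """Kaplan-Meier curve for members still enrolled at trial end, clock reset to 0 there."""
    post = [(exit_day - trial_days, opted_out)
            for exit_day, opted_out, trial_days in records if exit_day > trial_days]
    surv = 1.0
    curve = []
    for t in sorted({u for u, event in post if event}):
        at_risk = sum(1 for u, _ in post if u >= t)              # still enrolled just before t
        events = sum(1 for u, event in post if u == t and event)  # opt-outs at t
        surv *= 1 - events / at_risk
        curve.append((t, surv))
    return curve


records = [(12, True, 30), (45, True, 30), (60, False, 30), (40, True, 30), (50, True, 30)]
print(trial_opt_out_share(records))  # 0.2
print(post_trial_kaplan_meier(records))
```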
With respect to the suggestion from @Henry in a comment, recall that a survival function in general is just 1 minus a cumulative distribution function, S(t) = 1 - F(t). If the distribution function is defined for negative values, then the survival function is defined over negative values too. Standard survival-analysis software assumes that all time values are non-negative and will balk at negative times, but there's no theoretical limitation on that. Left censoring is sometimes coded with -Inf as the start time in the counting-process data format, consistent with possible negative survival times.
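To illustrate that negative times pose no conceptual problem, here is a small Python sketch of an empirical survival function S(t) = 1 - F_n(t) = P(T > t) on a shifted time axis, with time zero placed one month (assumed here to be 30 days) before the end of the trial, so that early opt-outs get negative times. It assumes every opt-out time is observed (no censoring), and the helpers `shift_origin` and `empirical_survival` are hypothetical. For rank-based methods such as Kaplan-Meier or the Cox model, which depend only on the ordering of the times, adding a constant to all times to make them non-negative for standard software leaves the results unchanged apart from the relabelled time axis.

```python
from bisect import bisect_right
from typing import Callable, List


def shift_origin(days: List[float], trial_days: float, lead: float = 30) -> List[float]:
    """Re-express days since membership start relative to `lead` days before trial end."""
    origin = trial_days - lead
    return [d - origin for d in days]


def empirical_survival(times: List[float]) -> Callable[[float], float]:
    """Return S(t) = 1 - F_n(t): the share of observed times strictly greater than t."""
    ordered = sorted(times)
    n = len(ordered)

    def survival(t: float) -> float:
        # bisect_right counts the observations <= t, i.e. n * F_n(t).
        return 1 - bisect_right(ordered, t) / n

    return survival


# Opt-out days since membership start for a 60-day trial (illustrative numbers only).
shifted = shift_origin([10, 40, 45, 55, 70], trial_days=60)
print(shifted)  # [-20, 10, 15, 25, 40]
S = empirical_survival(shifted)
for t in (-30, -20, 0, 20, 40):
    print(t, S(t))
```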
time = 0 to the start of the free membership. That gets around the problem completely. If different individuals are offered different trial-period lengths, then you can include that as a predictor in the model. - EdM Feb 07 '23 at 18:25