I am fitting a discrete-time hazard model in which the outcome is time to dropout from treatment during a 12-week clinical trial examining the effect of an agonist drug on days of illicit drug use. I am examining the influence of nine factors on whether and when participants drop out of treatment. The outcome is a discrete numeric variable, weeks in the trial (min = 1, max = 12).
Based on chapter 12 of Applied Longitudinal Data Analysis: Modeling Change and Event Occurrence by Singer and Willett, I am fitting the discrete-time hazard model with a complementary log-log (cloglog) link function. It is essentially a logistic regression model with a cloglog link instead of the usual logit. I chose this link because the outcome is discrete only for measurement reasons, not because dropout is genuinely discrete: participants could have dropped out on any day between week markers, but we only know the week in which they dropped out.
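For reference, the model described here can be written roughly as follows. This is a sketch under assumptions: the symbols $h$, $\alpha_j$, $\beta_k$ and $X_k$ are notation introduced for illustration, and it assumes each of the nine predictors enters as a single term (a factor with more than two levels would contribute more than one $\beta$).

$$
\operatorname{cloglog}\bigl(h(t_{ij})\bigr) = \log\bigl(-\log\bigl(1 - h(t_{ij})\bigr)\bigr) = \alpha_1 D_{1ij} + \dots + \alpha_{12} D_{12ij} + \beta_1 X_{1i} + \dots + \beta_9 X_{9i}
$$

Here $h(t_{ij})$ is the probability that participant $i$ drops out in week $j$ given that they have not dropped out earlier. Inverting the link for a linear predictor $\eta$ gives the hazard back:

$$
h = 1 - \exp\bigl(-\exp(\eta)\bigr)
$$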
Based on what I read in the book, I created a person-period dataset that looks like this (I have left out all but one of the predictor columns, as they wouldn't fit on the display).
You can see participant 5 dropped out three weeks into the trial, and hence has three rows, with the event occurring on the third, whereas participant 6 stayed the entire 12 weeks and hence has twelve rows with no event.
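A minimal sketch of how a person-period dataset of this shape could be built in R. The person-level data frame `personDF` and its columns `id`, `weeks` and `dropped` are hypothetical names made up for illustration; only `event` and `D1` to `D12` correspond to columns used in my model.

```r
# Hypothetical person-level data: one row per participant.
# Column names (id, weeks, dropped) are illustrative, not from the real dataset.
personDF <- data.frame(
  id      = c(5, 6),
  weeks   = c(3, 12),  # last week observed in the trial
  dropped = c(1, 0)    # 1 = dropped out, 0 = stayed to the end (right-censored)
)

# Expand to one row per participant per week at risk
ppDF <- do.call(rbind, lapply(seq_len(nrow(personDF)), function(i) {
  p <- personDF[i, ]
  data.frame(
    id     = p$id,
    period = seq_len(p$weeks),
    # event is 1 only in the final observed week of a participant who dropped out
    event  = as.integer(seq_len(p$weeks) == p$weeks & p$dropped == 1)
  )
}))

# Add one indicator column per week: D1, ..., D12
for (j in 1:12) ppDF[[paste0("D", j)]] <- as.integer(ppDF$period == j)
```

With these two toy participants, the first contributes three rows with the event on the third, and the second contributes twelve rows with no event.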
I ran the model in R with the syntax shown below. For non-R users (along with my apologies for parochialism), the model has reference-level coding and includes all 12 indicator columns, one for each discrete time period, as well as the nine predictors (the -1 is R syntax for 'remove the intercept').
modLogLog <- glm(formula = event ~ D1 + D2 + D3 + D4 + D5 + D6 + D7 + D8 + D9 + D10 + D11 + D12 + gender + group + isi_tot + sf_pain + cpq_tot + qcq_tot + dass_tot + grams_per_day + durationRegUse_dec - 1,
family=binomial(link = "cloglog"),
data = treatDF_PP)
The output of the model looks like this
That was a lot of preamble, I know. But my question relates to the coefficient for D12. For one thing, it corresponds to a tiny hazard (1.82e-08 after reversing the complementary log-log transformation via 1-exp(-exp(foo))), and it is conspicuously non-significant (p = 0.98), which is especially noticeable next to the significant coefficients for the other time periods.
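The back-transformation mentioned above can be sketched as a small helper. The function names `inv_cloglog` and `cloglog` are made up for illustration, and the input value is arbitrary rather than a coefficient from my output. In a model with covariates, back-transforming a time-period coefficient on its own gives the hazard for a participant whose other predictors are all zero (or at their reference level).

```r
# Inverse of the complementary log-log link: linear predictor -> hazard
inv_cloglog <- function(eta) 1 - exp(-exp(eta))

# Forward link, used here only to check the round trip
cloglog <- function(p) log(-log(1 - p))

eta_example <- -2                  # arbitrary illustrative value
h_example   <- inv_cloglog(eta_example)

# Base R ships the same inverse link via make.link()
h_base <- make.link("cloglog")$linkinv(eta_example)
```

The more negative the coefficient, the closer the back-transformed hazard gets to zero.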
At first I was concerned that I had made some drastic mistake creating the person-period dataset, but I checked and everything looked correct. But this morning at about 4am it came to me: the coefficient might be caused by all participants who stayed in treatment to week 12 being right-censored at week 12. That would mean that, during the week 12 "window", the hazard of dropout is effectively zero (i.e. the dropout 'event' cannot occur in this time period, which means no 1's in the event column for anyone who has a 1 in the D12 column).
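One way to probe this idea is with toy data. The sketch below uses made-up counts and only three periods, with no covariates, so it is not my real data or my real model. The helper `make_period` and the objects `toyPP` and `toyFit` are hypothetical names. The last period is given zero events so that the resulting coefficient can be compared with the others.

```r
# Toy person-period data: all counts below are invented for illustration
make_period <- function(j, n_at_risk, n_events) {
  data.frame(period = j,
             event  = rep(c(1, 0), times = c(n_events, n_at_risk - n_events)))
}
toyPP <- rbind(make_period(1, 100, 20),
               make_period(2,  80, 16),
               make_period(3,  64,  0))  # final period: no events at all

# Indicator columns D1..D3, mirroring D1..D12 in the real model
for (j in 1:3) toyPP[[paste0("D", j)]] <- as.integer(toyPP$period == j)

# Events per period: a zero here flags the issue before any model is fitted
events_by_period <- tapply(toyPP$event, toyPP$period, sum)

# Same model form as above, without covariates
toyFit <- glm(event ~ D1 + D2 + D3 - 1,
              family = binomial(link = "cloglog"),
              data = toyPP)
summary(toyFit)$coefficients
```

In this toy setup the estimates for D1 and D2 recover the observed hazard of 0.2 on the cloglog scale, while the estimate for D3 drifts to a large negative value with a very large standard error, because no finite coefficient maximises the likelihood when a period contains no events. The same tabulation applied to the real data would be `tapply(event, period, sum)` on whatever column identifies the week.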
Which brings me to my two-part question:
Was my 4am epiphany right? Is the out-of-place coefficient due to no events occurring during that time period?
and, more importantly
Should I remove the D12 column as a predictor from my model because no events can occur in that week?
Insights much appreciated.


D12 column containing zero events. But treating all people at study end as right-censored still causes the issue I ran into, which is that everyone remaining in the risk set is right-censored at that point (i.e. 0% hazard of dropout in that time window), which leads to those crazy coefficients. – llewmills Jun 22 '22 at 22:40