I have a simple Cox proportional hazards model fit to a small made-up data set. There are three groups, and the data are generated so that the time-to-event is larger in group 1 (6 is added to every time in that group) than in the other two. Please note that this code is copied from another poster's example (see the following link: KM versus Cox model):
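library(survival)                 # provides coxph() and Surv()

times <- rexp(30)                 # 30 exponential event times
status <- rep(1, 30)              # every subject has an event (no censoring)
groups <- rep(c(1,2,3), 10)
dat <- data.frame(groups, times, status)
# add 6 to every time in group 1, so group 1 has much longer times
dat[dat$groups==1, "times"] <- dat[dat$groups==1, "times"]+6
mod1 <- coxph(Surv(times, status) ~ groups, dat)   # groups used as a numeric predictor
summary(mod1)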
That is fine, but the problem appears when I explicitly make the 'groups' variable a categorical variable:
dat$groups <- factor(dat$groups)   # categorical; level 1 becomes the reference group
mod1<- coxph(Surv(times, status) ~ groups, dat)
summary(mod1)
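A minimal sketch of why the two fits are not the same model. It uses a hypothetical toy predictor `g` and made-up coefficients, not estimates from this data. With a numeric predictor the design matrix has a single column. That forces the log-HR of group 3 versus group 1 to be exactly twice the log-HR of group 2 versus group 1. The factor version uses two dummy columns with group 1 as the reference level, so it leaves both log-HRs free.

```r
# Hypothetical toy predictor: one subject per group
g <- c(1, 2, 3)

# Design matrices without the intercept (a Cox model has none; it is absorbed
# into the baseline hazard)
X_num <- model.matrix(~ g)[, -1, drop = FALSE]          # one column: g
X_fac <- model.matrix(~ factor(g))[, -1, drop = FALSE]  # dummies for groups 2 and 3
print(X_num)
print(X_fac)

# Made-up coefficients, for illustration only
beta_num <- 0.5
beta_fac <- c(0.5, 1.7)

# Linear predictors, expressed as log-HR relative to group 1
eta_num <- drop(X_num %*% beta_num)
eta_num <- eta_num - eta_num[1]        # 0.0, 0.5, 1.0: forced to be linear in g
eta_fac <- drop(X_fac %*% beta_fac)    # 0.0, 0.5, 1.7: each group gets its own value
print(eta_num)
print(eta_fac)
```

The numeric coding is the more restrictive model. It is not "less sensible", but it cannot let groups 2 and 3 each move arbitrarily far from group 1.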
I get sensible results when the group variable is numeric, but not when it is converted to a factor. If anything, it seems the model should be MORE sensible when the groups are categorical.
What is happening? Why do I get the following warning when I run the model with a categorical 'groups' variable?
In coxph.fit(X, Y, istrat, offset, init, control, weights = weights, :
Loglik converged before variable 1,2 ; coefficient may be infinite.
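A minimal sketch for checking the likely cause, assuming the warning comes from complete separation. The code regenerates data the same way as the question, checks whether every group-1 time is longer than every time in groups 2 and 3, and fits both codings. The seed is an arbitrary addition for reproducibility, and the names `separated`, `fit_fac` and `fit_num` are hypothetical. Because an exponential(1) draw exceeds 6 with probability $e^{-6} \approx 0.0025$, all 20 times in groups 2 and 3 stay below 6 in about 95% of simulated data sets. Group 1's times are always at least 6, so separation is the usual outcome.

```r
library(survival)

set.seed(1)  # arbitrary seed, only for reproducibility
times  <- rexp(30)
status <- rep(1, 30)
groups <- rep(c(1, 2, 3), 10)
dat <- data.frame(groups, times, status)
dat[dat$groups == 1, "times"] <- dat[dat$groups == 1, "times"] + 6

# Complete separation: does every group-1 time exceed every time in groups 2 and 3?
separated <- min(dat$times[dat$groups == 1]) > max(dat$times[dat$groups != 1])
print(separated)

# Factor coding: groups 2 and 3 are each compared with group 1. Under separation
# the partial likelihood keeps increasing as both coefficients grow, so coxph
# typically stops with very large estimates and the "may be infinite" warning.
fit_fac <- coxph(Surv(times, status) ~ factor(groups), data = dat)
print(coef(fit_fac))

# Numeric coding: one slope for all three groups. Groups 2 and 3 are interleaved
# in time, so no single slope separates them and the estimate stays finite.
fit_num <- coxph(Surv(times, status) ~ groups, data = dat)
print(coef(fit_num))
```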
When the predictor is numeric, the estimated coefficient is the log HR for each one-unit increase in the group number; when it is a factor, the coefficients are the log HRs for groups 2 and 3 relative to the reference group (group 1).
– Aaron Dec 28 '22 at 16:36
Can you please provide any additional clarification on this point? It is possible that if I understood the score equation better this confusion would go away, but I am still a little stuck on this issue.
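Here is a sketch of the score-equation argument. It assumes no censoring (every status is 1), no tied times, and that every event in groups 2 and 3 happens before the earliest group-1 time, which is what adding 6 to group 1 almost always produces here.

For subject $i$, let $x_i = (x_{i2}, x_{i3})^\top$ be the dummy variables for groups 2 and 3 (group 1 is the reference). Let $\beta = (\beta_2, \beta_3)^\top$, and let $R_i$ be the risk set at subject $i$'s event time. The log partial likelihood and its score are

$$
\ell(\beta) = \sum_i \Big[ x_i^\top \beta - \log \sum_{j \in R_i} e^{x_j^\top \beta} \Big],
\qquad
U(\beta) = \frac{\partial \ell}{\partial \beta} = \sum_i \big[ x_i - \bar{x}_i(\beta) \big],
\qquad
\bar{x}_i(\beta) = \frac{\sum_{j \in R_i} x_j \, e^{x_j^\top \beta}}{\sum_{j \in R_i} e^{x_j^\top \beta}}.
$$

Take the direction $d = (1, 1)^\top$, so $d^\top x_j = 1$ for a subject in group 2 or 3 and $0$ for a subject in group 1:

$$
d^\top U(\beta) = \sum_i \left[ d^\top x_i - \frac{\sum_{j \in R_i} (d^\top x_j) \, e^{x_j^\top \beta}}{\sum_{j \in R_i} e^{x_j^\top \beta}} \right].
$$

Consider each type of event in turn:

- An event in group 2 or 3 contributes $1$ minus a weighted average that is strictly below $1$, because all group-1 subjects are still at risk with $d^\top x_j = 0$ and positive weight. Its term is therefore strictly positive.
- A group-1 event has only group-1 subjects at risk, so its term is $0 - 0 = 0$.

Hence $d^\top U(\beta) > 0$ for every $\beta$. The log partial likelihood keeps increasing along $(1, 1)$, the score equation $U(\beta) = 0$ has no solution, and Newton-Raphson pushes $\beta_2$ and $\beta_3$ toward $+\infty$. That is what "Loglik converged before variable 1,2 ; coefficient may be infinite" reports.

With the numeric coding there is only one coefficient. As $\beta \to -\infty$, $U(\beta)$ tends to a positive limit, because the group-1 subjects at risk have the smallest value. As $\beta \to +\infty$, it tends to a negative limit whenever some group-2 event occurs while a group-3 subject is still at risk. So the score crosses zero at a finite estimate.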