I am trying to fit a Cox proportional hazards regression model to case-cohort data (where cases are oversampled, so the sample is not representative of the population). I am using the cch() function from the survival package to fit a Prentice-weighted Cox model:
fit <- cch(Surv(followuptime, event) ~ A1 + A2 + A3, data = datadf, stratum = NULL, subcoh = ~subcoh, id = ~id, cohort.size = 4512, method = "Prentice")
A1, A2 and A3 are metabolites, and I have 812 of them in total. I would like to train the model on a training subset of the data and then use a predict function to get a "score" for each subject in the validation set. I later want to split the validation subjects according to whether their score is above or below the median and plot a Kaplan-Meier curve for each group. However, I cannot find a predict method for a cch object. Is there an alternative way to do this?
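To make the goal concrete, here is a minimal sketch of what I am after, written against the nwtco data that ships with survival rather than my own data. It assumes that coef() on a cch object returns the fitted log hazard ratios, that a hand-computed linear predictor is an acceptable "score", and that the cohort size for the training part can be approximated by scaling the full cohort size by the split fraction. The 70/30 split and the names train, valid, score and group are just placeholders. I also realise the unweighted Kaplan-Meier curves reflect the oversampling of cases, so I am not sure this is statistically sound.

```r
library(survival)

# Build a case-cohort data set from nwtco: all relapses plus the random subcohort.
cc <- nwtco[nwtco$rel == 1 | nwtco$in.subcohort, ]
cc$subcoh <- as.numeric(cc$in.subcohort)
cc$age <- cc$age / 12

# Hypothetical 70/30 split into training and validation sets.
set.seed(1)
in_train <- runif(nrow(cc)) < 0.7
train <- cc[in_train, ]
valid <- cc[!in_train, ]

# Assumption: training cohort size = full cohort size scaled by the training fraction.
fit <- cch(Surv(edrel, rel) ~ age + stage + histol, data = train,
           subcoh = ~subcoh, id = ~seqno,
           cohort.size = round(nrow(nwtco) * mean(in_train)),
           method = "Prentice")

# No predict() method is used here: the score is the linear predictor X %*% beta.
beta <- coef(fit)
valid$score <- drop(as.matrix(valid[, names(beta)]) %*% beta)

# Split at the median score and fit (unweighted) Kaplan-Meier curves per group.
valid$group <- ifelse(valid$score > median(valid$score), "high", "low")
km <- survfit(Surv(edrel, rel) ~ group, data = valid)
plot(km, lty = 1:2, xlab = "Follow-up time", ylab = "Relapse-free proportion")
```

With my data the covariates would be the metabolite columns instead of age, stage and histol.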
I have also looked at the coxphw package, which implements Prentice weights and has a predict function. However, I am unclear on how it works in the context of case-cohort data, given that it has no argument to indicate sub-cohort membership or the cohort size:
fit <- coxphw(Surv(followuptime, event) ~ A1 + A2 + A3, data = datadf, template = "AHR")
I would appreciate any advice on this. I was also wondering how one would approach cross-validating such a model. I have read a lot of the literature and I am unclear on what to do next, as I am struggling to find methods/functions/packages that fit what I need.
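For the cross-validation part, this is the kind of loop I have in mind, again sketched on the nwtco data from survival. It is only an illustration under assumptions I have not verified: folds are drawn at random without stratifying on case status, the training cohort size in each fold is taken as (k - 1)/k of the full cohort size, and the out-of-fold score is the hand-computed linear predictor. The names k, fold, cv_score and cv_group are placeholders.

```r
library(survival)

# Case-cohort data set from nwtco: all relapses plus the random subcohort.
cc <- nwtco[nwtco$rel == 1 | nwtco$in.subcohort, ]
cc$subcoh <- as.numeric(cc$in.subcohort)
cc$age <- cc$age / 12

# Randomly assign each subject to one of k folds.
set.seed(1)
k <- 5
fold <- sample(rep(seq_len(k), length.out = nrow(cc)))
cc$cv_score <- NA_real_

for (j in seq_len(k)) {
  train <- cc[fold != j, ]
  # Assumption: the training cohort is (k - 1)/k of the full cohort.
  fit <- cch(Surv(edrel, rel) ~ age + stage + histol, data = train,
             subcoh = ~subcoh, id = ~seqno,
             cohort.size = round(nrow(nwtco) * (k - 1) / k),
             method = "Prentice")
  beta <- coef(fit)
  # Score the held-out fold with coefficients estimated without it.
  held_out <- as.matrix(cc[fold == j, names(beta)])
  cc$cv_score[fold == j] <- drop(held_out %*% beta)
}

# Pool the out-of-fold scores and split at the median.
cc$cv_group <- ifelse(cc$cv_score > median(cc$cv_score), "high", "low")
table(cc$cv_group, cc$rel)
```

Each subject is scored by a model that never saw that subject, but I do not know whether pooling scores across folds like this is appropriate for case-cohort data.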
Information on the dataset: My dataset is from a case-cohort study. Cases (n=98 in total) and a random sub-cohort (n=325) were included in the study. There were 12 cases in the sub-cohort of 325. As such, the final dataset included 301 controls and 98 cases (total n=399).
Thank you so much!