I am performing a coxph survival analysis and want to split my data into a training set and a test set. During training I would like to identify the "best" genes that predict survival. Since the test set has to be completely blind, how should I select the training set? I can randomly sample rows and use the rows that were not sampled as the test set. However, if I repeat the random sampling several times, all the rows in the dataset will eventually be used for training, and the test set is no longer unique. Any suggestions on how to do this?
df.t <- structure(list(hsa_miR_105_5p = c(3.58497328179801, 5.73145238130165,
1.19037294682376, -1.28586123284671, 1.27004401721869, 0.958088884635556
), hsa_miR_17_3p = c(1.21345556145455, 4.71642723353062, 5.87616915208789,
0.776249937585565, 4.86437477300888, 1.71876771352689), hsa_miR_3916 = c(6.74863569372315,
3.23155618956527, -0.105259761381448, -1.28586123284671, 4.60953338597123,
2.95060221832751), hsa_miR_1295a = c(-1.35668910756094, 0.147551018264645,
2.44220202218853, -1.28586123284671, 5.47367734142336, -0.135507425889107
)), row.names = c("86", "175", "217", "394", "444", "618"), class = "data.frame")
Time <- structure(c(1796, 1644.04166666667, 606.041666666667, 1327.04166666667,
665, 2461), class = "difftime", units = "days")
Status <- c(0L, 0L, 1L, 0L, 1L, 0L)
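library(survival)  # provides coxph() and Surv()
# Fit one univariate Cox model per miRNA and capture the printed summaries
cox.out <- capture.output(for (i in colnames(df.t)) {
  print(summary(coxph(as.formula(paste0("Surv(Time, Status) ~ ", i)), data = df.t)))
})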
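One way to keep the test set unique, sketched here under assumptions rather than offered as a definitive recipe, is to draw the split exactly once with a fixed seed and then do any repeated resampling (for gene selection or tuning) inside the training rows only. The helper name `split_once`, the 70/30 fraction, and the seed are hypothetical choices made for illustration.

```r
# Hypothetical helper: draw the train/test split a single time, reproducibly.
split_once <- function(n, train_frac = 0.7, seed = 1) {
  set.seed(seed)  # fixed seed: the same rows are held out on every run
  train_idx <- sort(sample.int(n, size = floor(train_frac * n)))
  list(train = train_idx, test = setdiff(seq_len(n), train_idx))
}

idx <- split_once(n = 6)  # 6 rows, matching the example data in the question

# Any repeated resampling is then restricted to the training rows,
# so the held-out rows never influence gene selection.
set.seed(2)
boot_train <- replicate(5, sample(idx$train, replace = TRUE), simplify = FALSE)
```

With only a handful of events, a purely random split can leave the test set with no events at all, so stratifying the split on `Status` may be worth considering.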
rms package, which includes several measures for Cox model validation and calibration. The glmnet package uses partial likelihood deviance as a measure for hyper-parameter selection in LASSO, a modeling approach I strongly recommend for your application. – EdM Dec 05 '20 at 17:38
statistic if you use that function. I sense that you are moving very quickly at looking for particular functions. Based on my own experience, I think that it will be in your best interest to step back a bit and first devote some serious study to the issues of evaluating Cox models and resampling. Then you will be better equipped to determine just what functions you need to implement to meet the goals of your project. You will also be able to respond intelligently to questions from reviewers when you go to publish. – EdM Dec 05 '20 at 17:56
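The LASSO approach mentioned in the comments could look roughly like the sketch below. It uses simulated data, since the six-row example is too small for cross-validation: the matrix `x`, the gene names, the effect sizes, and the 140/60 split are all assumptions for illustration, not values from the question. `cv.glmnet` with `family = "cox"` tunes the penalty by cross-validated partial likelihood deviance on the training rows only, and the held-out rows are used once at the end.

```r
library(survival)
library(glmnet)

set.seed(42)
n <- 200; p <- 20
# Simulated expression matrix (hypothetical stand-in for real miRNA data)
x <- matrix(rnorm(n * p), n, p,
            dimnames = list(NULL, paste0("gene", seq_len(p))))
# Simulated outcome: only gene1 and gene2 affect the hazard
lp_true    <- x[, "gene1"] - x[, "gene2"]
event_time <- rexp(n, rate = exp(lp_true) / 1000)
cens_time  <- rexp(n, rate = 1 / 1500)
time   <- pmin(event_time, cens_time)
status <- as.integer(event_time <= cens_time)
y <- cbind(time = time, status = status)  # response format accepted for family = "cox"

# Single held-out split, drawn once
train <- sample.int(n, 140)
test  <- setdiff(seq_len(n), train)

# Penalty chosen by 5-fold cross-validation within the training set
cvfit <- cv.glmnet(x[train, ], y[train, ], family = "cox", nfolds = 5)
b <- as.matrix(coef(cvfit, s = "lambda.min"))
selected <- rownames(b)[b[, 1] != 0]  # genes with non-zero coefficients

# Evaluate once on the untouched test rows (higher risk score = shorter survival)
lp_test <- as.numeric(predict(cvfit, newx = x[test, ], s = "lambda.min"))
cstat <- concordance(Surv(time[test], status[test]) ~ lp_test,
                     reverse = TRUE)$concordance
```

Here `selected` holds the genes retained by the penalty and `cstat` is the test-set concordance. Both depend on the simulated data and the seed, so they say nothing about real miRNA measurements.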