
I have done WGCNA and, based on the modules that showed association with my trait of interest, narrowed down key modules, from which I proceeded with a Cox regression model.

The steps I followed in R are as follows:

library(glmnet)

fit.coxnet <- glmnet(x_tr, y_tr, family = "cox", alpha = 0.95, lambda = lambda_seq)
plot(fit.coxnet, xvar = "lambda")
plot(fit.coxnet)

cv.coxnet <- cv.glmnet(x_tr, y_tr, family = "cox", type.measure = "C", alpha = 0.95)
plot(cv.coxnet)

co_mat1 <- as.matrix(coef(cv.coxnet, s = "lambda.min"))
co_mat1[co_mat1[, 1] != 0, ]

where x_tr and y_tr are both from the training split.
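
For reference, the nonzero coefficients printed by the last line can be collected into a data frame of gene IDs and coefficients roughly like this (a simplified sketch; the column name in my actual output below differs):

nz <- co_mat1[co_mat1[, 1] != 0, ]                             ## named vector of nonzero coefficients
co_df <- data.frame(gene = names(nz), coefficient = unname(nz))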

For Module1, I get the co_mat1 output as follows:

structure(list(gene = c("ENSG00000006659", "ENSG00000008853", 
"ENSG00000070808", "ENSG00000084734", "ENSG00000086570", "ENSG00000091490", 
"ENSG00000100228", "ENSG00000105672", "ENSG00000106772", "ENSG00000108179", 
"ENSG00000116985", "ENSG00000124196", "ENSG00000125740", "ENSG00000127412", 
"ENSG00000127946", "ENSG00000128578", "ENSG00000138622", "ENSG00000140836", 
"ENSG00000149635", "ENSG00000153208", "ENSG00000159228", "ENSG00000160999", 
"ENSG00000164086", "ENSG00000166866", "ENSG00000166922", "ENSG00000171885", 
"ENSG00000177138", "ENSG00000185338", "ENSG00000186510", "ENSG00000205856", 
"ENSG00000213214"), `co_mat1[co_mat1[, 1] != 0, ]` = c(0.111315518531421, 
0.000571117486479822, 0.0891201630635949, 0.0598057712435711, 
-0.131144750854546, 0.182391613168578, 0.19326085436214, -0.191567837804389, 
0.00796721001734388, 0.085953634941934, 0.00554035926198626, 
0.0776288760670583, -0.187116328081864, -0.0327269478695253, 
0.216471914721977, -0.176956226796014, 0.0230560752481754, -0.109709077697175, 
0.170102961363829, -0.00023210509664439, 0.275551962171425, -0.0235573355772408, 
0.389779353352752, -0.0143858241673411, 0.00550239038776184, 
-0.102658476410967, -0.0673222763256406, 0.0582474104970146, 
0.0386658549097694, 0.0852155443694458, 0.0923302099305247)), row.names = c(NA, 
-31L), class = "data.frame")

The above is for one module, which I subsequently tested on the test data set I had split from the main data set, and I also tested it on the validation data sets.

I followed the same process for another module and got something like this; the coefficients for Module2 are:

structure(list(gene = c("ENSG00000080546", "ENSG00000087589", 
"ENSG00000110002", "ENSG00000131482", "ENSG00000137843", "ENSG00000144821", 
"ENSG00000149218", "ENSG00000157388", "ENSG00000170522", "ENSG00000174500", 
"ENSG00000178104", "ENSG00000188522"), `co_mat1[co_mat1[, 1] != 0, ]` = c(0.0554274278043017, 
0.120477299631229, 0.088005988354015, 0.366288261878325, -0.0369065377791416, 
0.425559760303767, -0.0949215442527281, -0.157374794133298, -0.165132912096827, 
0.0604885449805194, 0.405129979244698, 0.357046644274689)), row.names = c(NA, 
-12L), class = "data.frame")

So far, what I have read and understood is that I can compare these using ROC curves. The doubt I have is how to compare which one is better, and with which data: is it the test data set or the validation data sets?

For the ROC part I have tried something like this:

library(ROCR)
library(rms)
library(pROC)
library(survival)
library(ggplot2)

# fit.cph <- cph(Surv(OS_MONTHS, Status) ~ turqoise_module, data = df2,
#                x = TRUE, y = TRUE, surv = TRUE)

fit <- coxph(Surv(OS_MONTHS, Status) ~ turqoise_module, data = df2)
fit

pred_scores <- predict(fit, newdata = df1, type = "risk")
aa <- as.matrix(y_te)
ab <- as.data.frame(aa)

roc_obj <- roc(ab$status, pred_scores)
ggroc(roc_obj, legacy.axes = TRUE) + ggtitle("ROC Curve for Cox Model")

where df2 is the data prepared using the solution that was suggested: I take the coefficients and then categorize based on the median of those genes. This gives me one ROC curve.
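
Roughly, the turqoise_module variable was built along these lines (a simplified sketch, not my exact code; it reuses the co_df table sketched above, and the median split is a stand-in for what I actually did):

expr <- as.matrix(df2[, co_df$gene])                       ## samples x selected genes
module_score <- as.numeric(expr %*% co_df$coefficient)     ## weighted expression score
df2$turqoise_module <- ifelse(module_score > median(module_score), "high", "low")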

Now, if I have to compare the two models, what is the statistical way (and the R way) to tell whether Module1 is better than Module2, or vice versa?

Fundamental doubt

pred_scores <- predict(fit, newdata = df1, type = "risk")

The above code works only for the data I had split into training and test sets; when I tried it with the validation set, it failed due to dimension issues. My question: am I doing it the right way?

PesKchan

1 Answer


Frank Harrell explains here that measures based on ROC curves aren't very sensitive ways to compare different models. Also, unless you have a few tens of thousands of events, you don't have enough cases to use train/validation/test splits reliably, as Harrell explains here in general. (The power of survival models depends on the number of events, not on the number of cases as in linear regression.)

Furthermore, in your generation of df2 you seem to have ignored an important warning in the answer to which you link:

Transformation of continous data to categorical means an information loss...

From those perspectives, you are not "doing it the right way."

One approach would be to take a bootstrap sample (of the same size, but with replacement) from the full data set, perform the entire modeling process (cross-validation to get the penalty, etc.) with each of the modules on the bootstrap sample, and compare the 2 models' abilities to predict results on the entire data set. Repeat for a few hundred bootstrap samples, and combine the results.*
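
A minimal sketch of that idea, under assumptions that are mine rather than from your question: x1 and x2 are expression matrices restricted to the Module1 and Module2 genes, y is a two-column matrix with "time" and "status" columns (the glmnet "cox" format), and the C-index on the full data set is used as the performance measure.

library(glmnet)
library(survival)

set.seed(2023)
nBoot <- 200
cstatDiff <- numeric(nBoot)

for (b in seq_len(nBoot)) {
  idx <- sample.int(nrow(x1), replace = TRUE)

  ## redo the whole penalized modeling process on the bootstrap sample
  cv1 <- cv.glmnet(x1[idx, ], y[idx, ], family = "cox", type.measure = "C", alpha = 0.95)
  cv2 <- cv.glmnet(x2[idx, ], y[idx, ], family = "cox", type.measure = "C", alpha = 0.95)

  ## evaluate each model's linear predictor on the entire data set
  lp1 <- drop(predict(cv1, newx = x1, s = "lambda.min"))
  lp2 <- drop(predict(cv2, newx = x2, s = "lambda.min"))

  c1 <- concordance(Surv(y[, "time"], y[, "status"]) ~ lp1, reverse = TRUE)$concordance
  c2 <- concordance(Surv(y[, "time"], y[, "status"]) ~ lp2, reverse = TRUE)$concordance
  cstatDiff[b] <- c2 - c1
}

quantile(cstatDiff, probs = c(0.025, 0.975))   ## does this interval include 0?

If the resulting interval for the difference in C-index excludes 0, that is evidence that one module predicts better on these data.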

Finally, if your model doesn't also include outcome-associated clinical variables, it probably won't be very useful. See this recent report on better ways to use LASSO and other approaches for combining genomic data with clinical data for survival analysis: David Wissel, Daniel Rowson, and Valentina Boeva, "Systematic comparison of multi-omics survival models reveals a widespread lack of noise resistance," Cell Reports Methods 3, 100461 (2023).

In response to comment

The document you cite includes a different bootstrap-resampling approach for comparing two "modules." If you have precalculated continuous scores for all samples based on each of your 2 modules, then you can use that approach very simply.

Identify the interquartile range for each of your modules' scores. Call them something like IQR1 and IQR2.

IQR1 <- IQR(module1)
IQR2 <- IQR(module2)

Set up a null vector to store the results of the bootstrapping.

resultStore <- NULL

Set a random seed for reproducibility.

set.seed(97675)

Then do the bootstrapping. Repeat the following a large number of times, say 999 (a consolidated loop putting these steps together is sketched after the list).

  • Take a bootstrap sample from the full data set. That can be done from your dataSet simply via:
bootSample <- dataSet[sample.int(nrow(dataSet),replace=TRUE),]
  • Do your Cox fits on bootSample for each of the modules.
bootFit1 <- coxph(Surv(time, status) ~ module1 + otherCovariates, data = bootSample)
bootFit2 <- coxph(Surv(time, status) ~ module2 + otherCovariates, data = bootSample)
  • Get the log-hazard-ratio across the IQR for each module's scores, and their difference:
logHR1 <- coef(bootFit1)[[1]] * IQR1 ## module1 is the first coefficient
logHR2 <- coef(bootFit2)[[1]] * IQR2 ## module2 is the first coefficient
logHRdiff <- logHR2 - logHR1
  • Append that difference to the result-storage vector
resultStore <- c(resultStore, logHRdiff)
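
Putting those steps together (and reusing IQR1, IQR2, resultStore, and the seed set above), the whole loop might look like this sketch; module1, module2, and otherCovariates are placeholders, as in the fits above.

library(survival)

for (i in 1:999) {
  ## bootstrap sample of the full data set
  bootSample <- dataSet[sample.int(nrow(dataSet), replace = TRUE), ]

  ## Cox fit for each module on the bootstrap sample
  bootFit1 <- coxph(Surv(time, status) ~ module1 + otherCovariates, data = bootSample)
  bootFit2 <- coxph(Surv(time, status) ~ module2 + otherCovariates, data = bootSample)

  ## log hazard ratio across each module's IQR, and their difference
  logHR1 <- coef(bootFit1)[[1]] * IQR1
  logHR2 <- coef(bootFit2)[[1]] * IQR2
  resultStore <- c(resultStore, logHR2 - logHR1)
}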

After your multiple bootstrap samples, model fits, and calculations, your resultStore has an estimate of the distribution of the differences between log-hazard ratios for the two modules if you had applied the modules to new data samples. Put them in order:

resultStore <- resultStore[order(resultStore)]

and if you did 999 bootstrap samples then the 25th and 975th in order give you 95% confidence limits for the difference between the modules. If a difference of 0 is outside those confidence limits, then you have evidence that one of the modules is superior.
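
In code, with 999 bootstrap samples that check is simply:

resultStore[c(25, 975)]    ## 95% percentile limits for the log-HR difference
## approximately equivalent: quantile(resultStore, probs = c(0.025, 0.975))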

That seems to be the approach used in the link you provided. With continuous scores, you might be advised to use a regression spline rather than a linear fit in the Cox model, so you'd have to adapt the logHR calculations to get the actual logHR between the 25th and 75th percentiles of the corresponding module's scores. There also are potentially better ways to evaluate the bootstrapping results. And this doesn't deal with potential randomness in how you developed the score to start, which the approach I suggested earlier could address. But it might be good enough for your purpose.
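
For that spline adaptation, one way to get the log HR between the 25th and 75th percentiles is sketched below, assuming splines::ns for the spline (my choice, not specified above) and the same placeholder names:

library(survival)
library(splines)

splineFit <- coxph(Surv(time, status) ~ ns(module1, df = 3) + otherCovariates,
                   data = bootSample)

## linear predictors at the 25th and 75th percentiles of the module1 score,
## holding the other covariates fixed (copied here from one template row)
q <- quantile(bootSample$module1, probs = c(0.25, 0.75))
newDat <- bootSample[c(1, 1), ]
newDat$module1 <- q
lp <- predict(splineFit, newdata = newDat, type = "lp")
logHR1 <- lp[2] - lp[1]    ## log HR across the IQR for module1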


*You also might try to compare Akaike Information Criterion (AIC) values for the 2 models as built on the full data set, but that requires some extra care, as explained on this page.
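
A minimal sketch of that AIC comparison, again with the placeholder names used above:

library(survival)

fullFit1 <- coxph(Surv(time, status) ~ module1 + otherCovariates, data = dataSet)
fullFit2 <- coxph(Surv(time, status) ~ module2 + otherCovariates, data = dataSet)

AIC(fullFit1, fullFit2)    ## lower AIC suggests a better trade-off of fit and complexity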

EdM
  • 'Transformation of continuous data to categorical means an information loss...' You had mentioned this earlier; I followed a few papers that did it a similar way and chose to play safe given my limited stats background. I will try to follow the Cell Reports Methods paper and the AIC approach you suggested, and perhaps I will come back with questions if I have doubts, which I will. – PesKchan Apr 25 '23 at 18:13
  • In this paper https://static-content.springer.com/esm/art%3A10.1038%2Fs41375-019-0604-8/MediaObjects/41375_2019_604_MOESM1_ESM.pdf the final figure uses the ROC approach; that is what I tried to follow/copy – PesKchan Apr 25 '23 at 18:14
  • 1
    @PesKchan although that document does show ROC curves at the end, the statistical method for comparing the 2 types of model described in "Supplementary Note2" is a bootstrap-based approach similar to what I suggest. – EdM Apr 25 '23 at 18:25
  • I did see that, but I was not able to follow it. Again my question, about "a bootstrap-based approach similar to what I suggest": is there any R-based example that would be easier for people like me to follow? – PesKchan Apr 25 '23 at 19:15
  • I don't have words to thank you for showing me how to do it; thank you again so very much – PesKchan Apr 26 '23 at 16:03
  • Regarding logHR2 <- coef(bootFit1)[[1]] * IQR2: is this supposed to be bootFit1 or bootFit2? I think bootFit2 – PesKchan Apr 27 '23 at 11:32
  • @PesKchan yes, sorry, will edit answer. – EdM Apr 27 '23 at 13:10
  • I will show you my full code for what I'm doing, to make sure I understood what you told me – PesKchan Apr 27 '23 at 15:38