This was one the [question][1] which was helped and suggested by [EdM][2] how to do it in R way.
So far what I have done is this
The dataset6 is my complete dataset which include all my genes and the Scores of the two modules which I have already calculated based on the regression coefficient already obtained.
dataset6 = inner_join(surv_obj1,dataset5) %>% cbind(M1[c(32)]) %>% cbind(M2[c(42)])
dim(dataset6)
##Modules : all the genes and the module scores
M1 <- read.csv("Module_turq.txt",sep = "\t")
names(M1)[32] = "turquoise"
M2 <- read.csv("module_red.txt",sep = "\t")
names(M2)[42] = "red"
As an example the M1 looks something like this
head(M1)
ENSG00000006659 ENSG00000008853 ENSG00000070808 ENSG00000084734 ENSG00000086570 ENSG00000091490 ENSG00000100228 ENSG00000105672 ENSG00000106772 ENSG00000108179
1 -0.6790895 9.838845 3.8723699 2.0334803 7.408578 11.63426 5.388844 3.403330 8.402316 12.10071
2 -0.7223128 9.216762 1.9638908 0.2942262 6.407777 11.65133 4.751099 2.862639 8.371253 10.56136
3 -0.6791654 8.901460 1.9595495 1.1313903 8.457811 11.66191 4.814653 2.939381 9.843061 11.04008
4 -0.7388880 9.249982 2.8520751 0.9272291 7.241343 11.37203 4.344087 3.680633 9.009311 11.64196
5 -0.6837360 10.312524 1.3708872 1.3626295 7.800494 11.00893 5.046930 3.630544 7.975663 11.56495
6 0.2083037 8.045902 0.9177488 0.7799186 9.234063 11.16630 5.218017 4.294078 6.153577 11.39511
ENSG00000116985 ENSG00000124196 ENSG00000125740 ENSG00000127412 ENSG00000127946 ENSG00000128578 ENSG00000138622 ENSG00000140836 ENSG00000149635 ENSG00000153208
1 8.337326 0.6028046 14.95574 3.676078 12.195097 6.266345 0.7590498 10.134101 1.07506924 9.153512
2 7.050741 0.1714888 13.19622 3.044690 11.475854 7.258213 0.9300694 11.619774 0.38973280 10.189657
3 7.675785 0.8536641 15.04202 3.275292 11.331764 6.495812 3.3460528 10.565562 -0.06284153 9.095599
4 8.114534 0.4120677 13.72461 2.954629 11.965255 6.803049 0.6115331 9.842173 0.67297629 9.193858
5 8.852637 0.2637794 14.80603 3.040592 11.733995 6.085176 1.3058149 11.195194 -0.07398686 8.680911
6 8.150247 1.5651071 13.70370 3.678386 9.989683 6.363244 1.2811180 10.964695 0.88423334 10.820407
ENSG00000159228 ENSG00000160999 ENSG00000164086 ENSG00000166866 ENSG00000166922 ENSG00000171885 ENSG00000177138 ENSG00000185338 ENSG00000186510 ENSG00000205856
1 9.890364 8.853479 10.935791 3.737294 4.531792 0.9634707 0.2378286 6.563547 2.408185 1.1550716
2 7.085259 7.590380 9.670493 4.879658 4.407784 0.4286926 2.2786092 4.555640 2.336665 0.6929619
3 8.294819 8.290893 9.657836 5.134109 3.646726 2.3294132 2.1727927 5.381531 3.271168 0.8168658
4 9.153176 7.964677 10.564849 3.630441 4.645231 1.3016413 0.5430753 5.313173 2.586458 1.1699673
5 7.814855 8.173153 11.026781 4.127289 4.826615 0.4942343 0.6896831 6.481789 1.758393 3.5270260
6 6.933262 8.273962 9.209629 4.554088 4.788554 2.2729740 0.9930626 6.183071 2.182287 0.3210672
ENSG00000213214 turquoise
1 0.9801103 8.135249
2 0.8521615 6.033414
3 1.5692256 6.050110
4 0.4066649 7.211567
5 2.3361480 6.996665
6 0.4935823 5.172616
Extract colnames to create formula
# Generate formula string
predictor_names1 <- colnames(M1) # exclude the first two columns (time and status)
formula_str1 <- paste("Surv(OS_MONTHS, Status) ~", paste(predictor_names1, collapse = " + "))
# Fit the Cox proportional hazards regression model
my_formula1 <- as.formula(formula_str1)
predictor_names2 <- colnames(M2) # exclude the first two columns (time and status)
formula_str2 <- paste("Surv(OS_MONTHS, Status) ~", paste(predictor_names2, collapse = " + "))
Fit the Cox proportional hazards regression model
my_formula2 <- as.formula(formula_str2)
Running bootstrap
for (i in 1:1000) {
Generate a bootstrap sample
bootSample <- dataset6[sample.int(nrow(dataset6),replace=TRUE),]
Calculate the IQR for each module
IQR1 <- IQR(M1$turquoise)
IQR2 <- IQR(M2$red)
Fit the Cox proportional hazards regression model for each module
bootFit1 <- coxph(my_formula1,data = bootSample)
bootFit2 <- coxph(my_formula2,data = bootSample)
Calculate the logHR difference between the two modules
logHR1 <- coef(bootFit1)[[2]] * IQR1
logHR2 <- coef(bootFit2)[[2]] * IQR2
logHRdiff <- logHR2 - logHR1
Store the result
if (i == 1) {
resultStore <- logHRdiff
} else {
resultStore <- c(resultStore, logHRdiff)
}
}
Sort the results
resultStore <- resultStore[order(resultStore)]
Calculate confidence limits for logHR difference
conf_limits <- quantile(resultStore, probs = c(0.025, 0.975))
Print the confidence limits
conf_limits
In the above code to find the IQR for the both the modules I used the module scores which is basically turquoise for first module and red for the second module
Before that I also tried something like this where I tried to find IQR of each column I ran this
IQR1 <- M1 %>%
select_if(is.numeric) %>%
summarise(across(everything(), IQR))
IQR2 <- M2 %>%
select_if(is.numeric) %>%
summarise(across(everything(), IQR))
it didn't work
As suggested in the my previous question
if you did 999 bootstrap samples then the 25th and 975th in order give you 95% confidence limits for the difference between the modules. If a difference of 0 is outside those confidence limits, then you have evidence that one of the modules is superior.
To do the above I ran this which in the above code
# Calculate confidence limits for logHR difference
conf_limits <- quantile(resultStore, probs = c(0.025, 0.975))
Print the confidence limits
conf_limits
Output
2.5% 97.5%
-1.814380 2.446325
So what would be the output interpretation since it contains 0 there both the model are basically same or similar ?
Coxph model output
Call:
coxph(formula = my_formula1, data = bootSample)
coef exp(coef) se(coef) z p
ENSG00000006659 -0.08480 0.91869 0.18335 -0.463 0.643702
ENSG00000008853 -0.71237 0.49048 0.45977 -1.549 0.121284
ENSG00000070808 0.22058 1.24680 0.18767 1.175 0.239853
ENSG00000084734 0.15579 1.16858 0.25058 0.622 0.534113
ENSG00000086570 -0.78152 0.45771 0.33290 -2.348 0.018894
ENSG00000091490 0.30301 1.35392 0.26720 1.134 0.256792
ENSG00000100228 0.82649 2.28528 0.32949 2.508 0.012128
ENSG00000105672 -0.57979 0.56001 0.22165 -2.616 0.008901
ENSG00000106772 0.21472 1.23952 0.21201 1.013 0.311153
ENSG00000108179 0.59817 1.81879 0.42130 1.420 0.155654
ENSG00000116985 -0.16206 0.85039 0.28520 -0.568 0.569867
ENSG00000124196 0.69766 2.00905 0.33448 2.086 0.036994
ENSG00000125740 -0.26467 0.76746 0.21396 -1.237 0.216094
ENSG00000127412 -0.09850 0.90620 0.22435 -0.439 0.660640
ENSG00000127946 0.33344 1.39575 0.26910 1.239 0.215323
ENSG00000128578 -0.33829 0.71299 0.21157 -1.599 0.109822
ENSG00000138622 0.02457 1.02488 0.37422 0.066 0.947643
ENSG00000140836 0.03179 1.03230 0.28102 0.113 0.909943
ENSG00000149635 0.71572 2.04565 0.23854 3.000 0.002696
ENSG00000153208 0.22391 1.25095 0.20699 1.082 0.279368
ENSG00000159228 0.18013 1.19737 0.22749 0.792 0.428466
ENSG00000160999 -0.56201 0.57006 0.28997 -1.938 0.052600
ENSG00000164086 1.17011 3.22234 0.44175 2.649 0.008078
ENSG00000166866 -0.34015 0.71166 0.28358 -1.200 0.230328
ENSG00000166922 0.48040 1.61673 0.30426 1.579 0.114354
ENSG00000171885 -0.37376 0.68814 0.27954 -1.337 0.181199
ENSG00000177138 0.11922 1.12661 0.30421 0.392 0.695140
ENSG00000185338 0.76505 2.14910 0.22498 3.400 0.000673
ENSG00000186510 0.47312 1.60500 0.23003 2.057 0.039711
ENSG00000205856 0.22150 1.24794 0.32990 0.671 0.501960
ENSG00000213214 0.06063 1.06250 0.22885 0.265 0.791064
turquoise NA NA 0.00000 NA NA
Likelihood ratio test=152.9 on 31 df, p=< 2.2e-16
n= 146, number of events= 92
Why the turquoise is NA?
[1]: How to compare two signature for prediction models?
[2]: https://stats.stackexchange.com/users/28500/edm
Cox model use only the module score and omit all of the individual gene-expression values that go into creating it.thank you for clarifying it I was kind of stuck and in doubt what I implemented here thinking if i understood your point or not – PesKchan Apr 28 '23 at 19:30