This is the paper I am trying to implement with my own set of genes, which I narrowed down from a WGCNA analysis.
It's both a conceptual and an R question (how to do it in R), because I lack conceptual clarity.
For a single gene I'm clear on what to do: I can split my patient group into low and high expression of that gene and then do a survival analysis, as shown in this post.
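For the single-gene case, a minimal sketch of a two-group split at the median is shown below. The median is only one common cut-off; the Biostars example further down uses z-score cut-offs instead. `expr` is simulated and stands in for one gene's expression across patients.

```r
# Minimal sketch: split patients into Low/High by the median of one gene.
# 'expr' is a simulated stand-in for one gene's expression across patients.
set.seed(42)
expr <- setNames(rnorm(20, mean = 5, sd = 1), paste0("Patient", 1:20))

# Patients at or above the median are 'High', the rest 'Low'
group <- ifelse(expr >= median(expr), "High", "Low")
group <- factor(group, levels = c("Low", "High"))

table(group)
```

The resulting factor can then be used as the grouping variable in the survival analysis.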
Based on the paper cited above, I would like to highlight this figure, in which they show LSC17 (Low, Standard and High). LSC17 is a combination of 17 genes that they use to classify the leukemia cohort.
How should I go about this in R? I can't figure it out, because here I have 17 genes instead of 1. How should I set a filter to categorize my patient samples as Low/Standard/High?
As shown in the Biostars post:
survplotdata <- coxdata[, c('Time.RFS', 'Distant.RFS',
                            'X203666_at', 'X205680_at')]
colnames(survplotdata) <- c('Time.RFS', 'Distant.RFS',
                            'CXCL12', 'MMP10')
# set Z-scale cut-offs for high and low expression
highExpr <- 1.0
lowExpr <- -1.0
survplotdata$CXCL12 <- ifelse(survplotdata$CXCL12 >= highExpr, 'High',
                              ifelse(survplotdata$CXCL12 <= lowExpr, 'Low', 'Mid'))
survplotdata$MMP10 <- ifelse(survplotdata$MMP10 >= highExpr, 'High',
                             ifelse(survplotdata$MMP10 <= lowExpr, 'Low', 'Mid'))
# relevel the factors to have Mid as the reference level
survplotdata$CXCL12 <- factor(survplotdata$CXCL12,
                              levels = c('Mid', 'Low', 'High'))
survplotdata$MMP10 <- factor(survplotdata$MMP10,
                             levels = c('Mid', 'Low', 'High'))
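The two `ifelse()` blocks apply the same rule to each gene. A small helper (the name `categorize_z` is hypothetical) can apply it to any number of gene columns at once. It z-scores each column first, which assumes the values are not already on a z-scale; drop the `scale()` step if they are. The data frame is simulated.

```r
# Minimal sketch: one High/Mid/Low rule applied to many gene columns.
# 'expr_df' is simulated; gene column names are hypothetical.
set.seed(1)
expr_df <- data.frame(matrix(rnorm(30 * 4), ncol = 4,
                             dimnames = list(NULL, paste0("gene", 1:4))))

categorize_z <- function(x, high = 1.0, low = -1.0) {
  z <- as.numeric(scale(x))  # z-score across patients; skip if already z-scaled
  grp <- ifelse(z >= high, "High", ifelse(z <= low, "Low", "Mid"))
  factor(grp, levels = c("Mid", "Low", "High"))  # Mid as reference level
}

gene_cols <- paste0("gene", 1:4)
expr_df[gene_cols] <- lapply(expr_df[gene_cols], categorize_z)
table(expr_df$gene1)
```

This still gives one category per gene for each patient, not one category per gene set.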
Here they list two genes, CXCL12 and MMP10, but each gene is categorized separately.
So my question is: how do I do this for a group of genes together?
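One common way to handle a gene set, sketched here as an assumption rather than as the paper's exact method, is to collapse the genes into a single score per patient and then categorize that score as if it were one gene. If the paper provides per-gene coefficients, the score would be a weighted sum; check its methods for whether the coefficients apply to scaled or unscaled expression (the `standardize` argument switches between the two). The function name `gene_set_score` and the weights used below are hypothetical. With no weights, the score is the mean z-score of the genes.

```r
# Minimal sketch: collapse a gene set into one score per patient.
# expr: matrix with rows = genes, columns = patients.
# weights: optional per-gene weights (hypothetical); default = equal weights.
gene_set_score <- function(expr, genes, weights = NULL, standardize = TRUE) {
  sub <- expr[genes, , drop = FALSE]
  # z-score each gene across patients
  if (standardize) sub <- t(scale(t(sub)))
  # equal weights by default, which gives the mean (z-)score of the genes
  if (is.null(weights)) weights <- rep(1 / length(genes), length(genes))
  stopifnot(length(weights) == length(genes))
  # row g is multiplied by weights[g] (recycling), then summed per patient
  colSums(sub * weights)
}

# Usage on simulated data
set.seed(7)
expr <- matrix(rnorm(6 * 10), nrow = 6,
               dimnames = list(paste0("gene", 1:6), paste0("P", 1:10)))
score_mean <- gene_set_score(expr, paste0("gene", 1:3))
score_wt   <- gene_set_score(expr, paste0("gene", 1:3),
                             weights = c(0.5, -0.2, 0.9))  # made-up weights
```

The resulting numeric vector can be cut into Low/Standard/High exactly like a single gene's expression.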
Here is a dummy data frame:
set.seed(123)
nr1 = 4; nr2 = 8; nr3 = 6; nr = nr1 + nr2 + nr3
nc1 = 6; nc2 = 8; nc3 = 10; nc = nc1 + nc2 + nc3
mat = cbind(rbind(matrix(rnorm(nr1*nc1, mean = 1, sd = 0.5), nrow = nr1),
                  matrix(rnorm(nr2*nc1, mean = 0, sd = 0.5), nrow = nr2),
                  matrix(rnorm(nr3*nc1, mean = 0, sd = 0.5), nrow = nr3)),
            rbind(matrix(rnorm(nr1*nc2, mean = 0, sd = 0.5), nrow = nr1),
                  matrix(rnorm(nr2*nc2, mean = 1, sd = 0.5), nrow = nr2),
                  matrix(rnorm(nr3*nc2, mean = 0, sd = 0.5), nrow = nr3)),
            rbind(matrix(rnorm(nr1*nc3, mean = 0.5, sd = 0.5), nrow = nr1),
                  matrix(rnorm(nr2*nc3, mean = 0.5, sd = 0.5), nrow = nr2),
                  matrix(rnorm(nr3*nc3, mean = 1, sd = 0.5), nrow = nr3))
)
mat = mat[sample(nr, nr), sample(nc, nc)] # randomly shuffle rows and columns
rownames(mat) = paste0("gene", seq_len(nr))
colnames(mat) = paste0("LSC", seq_len(nc)) # columns are patients
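Transposing the data frame (%>% comes from dplyr/magrittr and rownames_to_column() from tibble):
library(dplyr)
library(tibble)
mat2 <- mat %>% as.data.frame() %>% t() %>% as.data.frame() %>% rownames_to_column("Patient")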
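If you would rather not load dplyr and tibble, the same patients-as-rows table can be built in base R. The 2 x 3 matrix here is a made-up example.

```r
# Base-R alternative: patients as rows, genes as columns.
# 'mat' here is a tiny hypothetical genes x patients matrix.
mat <- matrix(1:6, nrow = 2,
              dimnames = list(c("gene1", "gene2"), c("LSC1", "LSC2", "LSC3")))
mat2 <- data.frame(Patient = colnames(mat), t(mat), stringsAsFactors = FALSE)
rownames(mat2) <- NULL  # keep patient IDs only in the Patient column
mat2
```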
My dummy data frame has 18 rows (genes) in total, and I would like to categorize all my columns (patients) based on the combination of 5 rows (genes), e.g. row1, row2, row3, row4 and row5, as Low / Standard / High for that five-gene set. How do I do this in R?
Any suggestions or help would be really appreciated.
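Applied to data shaped like the dummy matrix, a minimal sketch could look as follows. It assumes an equal-weight combination (the mean z-score of gene1 to gene5) and reuses the +/-1 SD cut-offs from the Biostars example on the re-standardized combined score, with 'Standard' in place of 'Mid'. The matrix is a simulated stand-in with the same shape and names as `mat`; with your own data, skip the simulation lines.

```r
# Simulated stand-in: 18 genes (rows) x 24 patients (columns), named like 'mat'.
set.seed(123)
mat <- matrix(rnorm(18 * 24, mean = 0.5, sd = 0.5), nrow = 18,
              dimnames = list(paste0("gene", 1:18), paste0("LSC", 1:24)))

genes <- paste0("gene", 1:5)                  # the 5-gene set to combine
z <- t(scale(t(mat[genes, , drop = FALSE])))  # z-score each gene across patients
score <- as.numeric(scale(colMeans(z)))       # one standardized score per patient

# Same +/-1 cut-offs as the per-gene example, now on the combined score
group <- ifelse(score >= 1, "High", ifelse(score <= -1, "Low", "Standard"))
group <- factor(group, levels = c("Standard", "Low", "High"))

result <- data.frame(Patient = colnames(mat), Score = score, Group = group)
table(result$Group)
```

Because these cut-offs are in SD units, the group sizes depend on the data; a quantile-based split gives roughly equal-sized groups instead.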

quantile(LSC17, probs = c(0.3333, 0.6666)) – David B, Mar 21 '23 at 17:05
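The quantile suggestion works on a single numeric score per patient, not directly on the 17-gene matrix. `quantile()` with probabilities of 1/3 and 2/3 returns two cut points, and `cut()` turns them into three groups. In the sketch below the score is simulated; in practice it would be the combined gene-set score.

```r
# Minimal sketch: tertile split of a per-patient score into Low/Standard/High.
# 'score' is simulated here; in practice it is the combined gene-set score.
set.seed(2023)
score <- setNames(rnorm(24), paste0("LSC", 1:24))

cuts <- quantile(score, probs = c(1/3, 2/3))  # two tertile boundaries
group <- cut(score, breaks = c(-Inf, cuts, Inf),
             labels = c("Low", "Standard", "High"))

table(group)
```

With 24 patients and no tied scores, each group gets 8 patients.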