PAM50 gene expression classification

Question

I'm looking at running the PAM50 classifier on RNA-Seq data from 138 breast cancer samples. However, the vignette for the R package that implements it (genefu) is not much help for processing RNA-Seq counts, because its examples use pre-processed, publicly available RNA-Seq data.

Starting from counts, is there a way to pre-process the data so that I can run intrinsic.cluster.predict in R/genefu with the built-in signatures? (I probably won't have enough power to split the set into training and test sets.)
Advice on the best parameter choices for intrinsic.cluster.predict would also be very helpful.
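
One common way to put counts on a log scale loosely comparable to log-transformed microarray intensities is to convert them to log2 counts-per-million (CPM) with a small prior count, then median-centre each gene across samples before comparing against centroids. The Python sketch below shows those two steps under stated assumptions: the function names, the prior count of 1 and the choice of median centring are illustrative, this simplified log-CPM is not identical to edgeR's `cpm(log=TRUE)`, and none of it is the genefu API. Median centring also makes each sample's values depend on the composition of the whole cohort.

```python
import math
from statistics import median

def log2_cpm(counts, prior_count=1.0):
    """counts: list of samples, each a list of raw counts in the same gene order.
    Returns a simplified log2 counts-per-million for every sample."""
    result = []
    for sample in counts:
        lib_size = sum(sample)
        result.append([math.log2((c + prior_count) / lib_size * 1e6) for c in sample])
    return result

def median_center_genes(matrix):
    """Subtract each gene's median across samples (matrix is samples x genes)."""
    n_genes = len(matrix[0])
    medians = [median(row[j] for row in matrix) for j in range(n_genes)]
    return [[row[j] - medians[j] for j in range(n_genes)] for row in matrix]

# Hypothetical counts for four genes in three samples (each library totals 1000 reads).
counts = [
    [100, 0, 50, 850],
    [200, 10, 40, 750],
    [150, 5, 60, 785],
]
centred = median_center_genes(log2_cpm(counts))
```

After centring, every gene has a median of zero across the cohort, so each value reads as "above or below the typical sample" rather than as an absolute expression level.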

For differential expression (I have pre- and post-surgery pairs), I used conditional quantile normalisation to get normalised log2(RPKM) values for running through R/edgeR, so I also have these available if that is the best way to go.
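
For reference, RPKM rescales a gene's read count by its length in kilobases and by the library size in millions of mapped reads. The log2(RPKM) values mentioned above are this quantity on a log2 scale. A minimal Python sketch follows. The pseudocount of 0.5 is an assumption added only to keep zero counts finite, and the sketch does not reproduce the GC-content and length corrections that conditional quantile normalisation applies.

```python
import math

def rpkm(count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene length per million mapped reads."""
    return count * 1e9 / (gene_length_bp * total_mapped_reads)

def log2_rpkm(count, gene_length_bp, total_mapped_reads, pseudocount=0.5):
    # Hypothetical pseudocount added to the raw count so log2(0) never occurs.
    return math.log2(rpkm(count + pseudocount, gene_length_bp, total_mapped_reads))

# A 2 kb gene with 400 reads in a library of 20 million mapped reads:
print(rpkm(400, 2000, 20_000_000))  # 10.0
```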

EDIT: Looking at the genefu vignette, the examples are all based on microarray data (which is essentially log2(expression) + offset), so would adding an offset to the log2(RPKM) values improve PAM50 prediction?
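
One property bears on the offset question. Assume the classifier assigns each sample to the centroid with the highest Spearman correlation, which is how the original PAM50 nearest-centroid classifier was described; check which correlation method the genefu model you use is configured with. Under that assumption, adding the same constant to every gene in a sample leaves its ranks unchanged, and therefore also its correlations and its nearest centroid. The Python sketch below demonstrates this with made-up centroid values (not the real PAM50 centroids). It says nothing about per-gene offsets or rescaling, which can change the result.

```python
def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    # Spearman correlation is the Pearson correlation of the ranks.
    return pearson(ranks(x), ranks(y))

def nearest_centroid(sample, centroids):
    """Name of the centroid with the highest Spearman correlation to the sample."""
    return max(centroids, key=lambda name: spearman(sample, centroids[name]))

# Made-up centroids over five genes (hypothetical, not PAM50 values).
centroids = {
    "A": [2.0, 1.0, 0.0, -1.0, -2.0],
    "B": [-2.0, -1.0, 0.0, 1.0, 2.0],
}
sample = [1.5, 0.2, 0.4, -0.9, -1.1]
shifted = [v + 10.0 for v in sample]  # the same constant offset on every gene

print(nearest_centroid(sample, centroids), nearest_centroid(shifted, centroids))  # A A
```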
