I'm planning to run the PAM50 classifier on RNA-Seq data from 138 breast cancer samples. However, the vignette for the relevant R package (genefu) is not much help for processing RNA-Seq counts, because it works from pre-processed, publicly available data.

  1. Starting from counts, is there a way to pre-process the data for running intrinsic.cluster.predict in R/genefu using the built-in signatures? (I probably won't have enough power to split the samples into training and test sets.)
  2. What are the best choices for the parameters of intrinsic.cluster.predict?

For differential expression (I have pre- and post-surgery pairs), I have used conditional quantile normalisation to get normalised log2(RPKM) values for running through R/edgeR, so I also have these available if that is the best way to go.


EDIT: Looking at the genefu vignette, it is all based on microarray data (which is essentially log2(expression) + offset). Will adding an offset to the log2(RPKM) values improve prediction for PAM50?
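
To make the offset question concrete, here is a minimal sketch of the arithmetic in Python with NumPy. It does not use genefu. The function names, toy counts, gene lengths, pseudocount and offset value are all hypothetical and chosen only for illustration. It computes log2(RPKM) from raw counts, adds a constant offset on the log scale, and shows that per-gene median centring removes that constant. Whether genefu's own standardisation options behave the same way on real data is an assumption I have not verified.

```python
import numpy as np

def log2_rpkm(counts, gene_lengths_bp, pseudocount=1.0, offset=0.0):
    """Return log2(RPKM + pseudocount) + offset for a genes x samples count matrix.

    All argument names are hypothetical; 'offset' is a constant added on the
    log scale, mimicking the 'log2(expression) + offset' form of array data.
    """
    counts = np.asarray(counts, dtype=float)
    # Gene lengths in kilobases, as a column so they broadcast across samples.
    lengths_kb = np.asarray(gene_lengths_bp, dtype=float)[:, None] / 1e3
    # Library size per sample in millions of reads.
    lib_size_millions = counts.sum(axis=0, keepdims=True) / 1e6
    rpkm = counts / lengths_kb / lib_size_millions
    return np.log2(rpkm + pseudocount) + offset

def median_center(expr):
    """Subtract each gene's median across samples (rows are genes)."""
    return expr - np.median(expr, axis=1, keepdims=True)

# Toy data: 3 genes x 2 samples (made-up numbers, not real measurements).
toy_counts = np.array([[100, 200],
                       [300, 600],
                       [600, 200]])
toy_lengths_bp = np.array([1000, 2000, 4000])

plain = log2_rpkm(toy_counts, toy_lengths_bp)
shifted = log2_rpkm(toy_counts, toy_lengths_bp, offset=5.0)

# The offset shifts every value by the same constant ...
offset_is_constant = np.allclose(shifted - plain, 5.0)
# ... so it cancels once each gene is centred on its own median.
centering_removes_offset = np.allclose(median_center(plain), median_center(shifted))
```

In this toy case a constant log-scale offset changes nothing after gene-wise centring, so the question may really be about how the scale and distribution of log2(RPKM) compare with array intensities rather than about the offset itself.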

M__
user36196

0 Answers