I have RNA-seq raw count data for 50 samples: 20 normal and 30 tumor. After differential expression analysis I got 30 DEGs. I want to make a violin plot showing the expression of each gene, so I transformed the counts to logCPM.

counts:

Genes   Tumor1  Tumor2  Normal1 Normal2
RP11-351J23.1   0   5   6   0
MIR17HG 989 896 8   0
RP11-563N12.2   0   0   0   0
LINC04393   0   2   16  0
RP11-336A10.4   0   0   0   0
DRH6-AB1    53  13  39  9
RP11-115J16.1   0   0   50  6
LINC70518   2   65  0   0

library(edgeR)
logCPM <- cpm(counts, prior.count=2, log = TRUE)

                Tumor1   Tumor2  Normal1  Normal2
RP11-351J23.1 11.84477 13.09301 15.71337 11.84477
MIR17HG      19.84847 19.79600 16.10350 11.84477
RP11-563N12.2 11.84477 11.84477 11.84477 11.84477
LINC04393     11.84477 12.47723 17.06532 11.84477
RP11-336A10.4 11.84477 11.84477 11.84477 11.84477
DRH6-AB1      15.72257 14.03896 18.32772 19.19283
RP11-115J16.1 11.84477 11.84477 18.68262 18.61229
LINC70518     12.44599 16.08368 11.84477 11.84477

I want to make a violin plot for each gene showing its expression in tumors and normals from this matrix.

I first transformed the counts to logCPM, then tried the following:

ggplot2.violinplot(data=logCPM, xName='RP11-351J23.1',
    groupName='Tumor', position=position_dodge(0.8), 
    backgroundColor="white", groupColors=c('#999999','#E69F00'),
    legendPosition="top")
llrs
beginner
  • From what package is this ggplot2.violinplot function? Could you upload the image you get and why it doesn't meet your requirements? – llrs Jul 17 '18 at 13:24
  • "easyGgplot2" is the package. I didn't get any plot. Something went wrong, but there was no error either. – beginner Jul 17 '18 at 13:27
  • Could you post your sessionInfo then? And is there any reason why you use this function and not the base ggplot2 functions? – llrs Jul 17 '18 at 13:28

1 Answer

You'll want to use long-form data for everything:

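library(dplyr)
library(tidyr)    # gather() comes from tidyr, not dplyr
library(ggplot2)
# cpm() returns a matrix; convert it so a gene column can be added
logCPM = as.data.frame(logCPM)
logCPM$gene = row.names(logCPM)
d = logCPM %>% gather(Sample, logCPM, -gene)
# Derive the group from each sample name instead of recycling a short vector
d$group = ifelse(grepl("^Tumor", d$Sample), "Tumor", "Normal")
geneOfInterest = d %>% filter(gene == 'RP11-351J23.1')
ggplot(geneOfInterest, aes(x=group, y=logCPM)) + geom_violin()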

That's a simple example; you can tweak it to meet your needs.
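To show every gene at once rather than filtering to one, the same long-form data can be faceted by gene. The following is a minimal sketch, not a definitive implementation. It assumes the sample names start with "Tumor" or "Normal", uses only the first three genes of the logCPM matrix from the question, and swaps in tidyr's pivot_longer() in place of gather(); the names wide, long, and p are illustrative.

```r
library(tidyr)
library(ggplot2)

# Excerpt of the logCPM matrix from the question (first three genes only)
logCPM <- matrix(
  c(11.84477, 13.09301, 15.71337, 11.84477,
    19.84847, 19.79600, 16.10350, 11.84477,
    11.84477, 11.84477, 11.84477, 11.84477),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("RP11-351J23.1", "MIR17HG", "RP11-563N12.2"),
                  c("Tumor1", "Tumor2", "Normal1", "Normal2"))
)

# cpm() returns a matrix, so convert it before adding a gene column
wide <- as.data.frame(logCPM)
wide$gene <- rownames(wide)

# Reshape to one row per gene/sample pair
long <- pivot_longer(wide, cols = -gene,
                     names_to = "Sample", values_to = "logCPM")

# Assumption: sample names start with "Tumor" or "Normal"
long$group <- ifelse(grepl("^Tumor", long$Sample), "Tumor", "Normal")

# One panel per gene, each with its own y-axis range
p <- ggplot(long, aes(x = group, y = logCPM, fill = group)) +
  geom_violin() +
  facet_wrap(~ gene, scales = "free_y")
```

Printing p draws the panels. With only two samples per group the violins are not informative; the full set of 20 normal and 30 tumor samples is needed for a meaningful shape.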

Devon Ryan
  • I used your code with the logCPM from my question, but I got a warning and an error. Running logCPM$gene = row.names(logCPM) gives: Warning message: In logCPM$gene = row.names(logCPM) : Coercing LHS to a list. Then d = logCPM %>% gather(Sample, logCPM, -gene) gives: Error in UseMethod("gather_") : no applicable method for 'gather_' applied to an object of class "list" – beginner Jul 17 '18 at 13:43
  • Make sure logCPM is a data.frame – Devon Ryan Jul 17 '18 at 13:45
  • Yes, but in the plot I can see only Tumor; there is no Normal. https://imgur.com/a/LnxG4DN Also, how do I add asterisks to denote a significant difference? – beginner Jul 17 '18 at 13:55
  • You somehow filtered out the normal samples. You can figure out how that happened. You can add characters with geom_text(), please see the ggplot documentation. – Devon Ryan Jul 17 '18 at 13:59
  • I followed the code, but the data frame "d" looks like this:
        gene           Sample  logCPM    group
      1 RP11-351J23.1  Tumor1  11.84477  Tumor
      2 MIR137HG       Tumor1  19.84847  Tumor
      3 RP11-563N12.2  Tumor1  11.84477  Normal
      4 LINC00393      Tumor1  11.84477  Normal
      5 RP11-336A10.4  Tumor1  11.84477  Tumor
      6 DLX6-AS1       Tumor1  15.72257  Tumor
      7 RP11-115J16.1  Tumor1  11.84477  Normal
      8 LINC00518      Tumor1  12.44599  Normal
    – beginner Jul 17 '18 at 14:16
  • Ah, you'll need to change the d$group assignment line so that each row's group matches its Sample. – Devon Ryan Jul 17 '18 at 14:22
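
On the asterisk question raised in the comments, geom_text() can place a significance label above the violins. The sketch below is only an illustration under stated assumptions: the values are simulated (not real data) for a single hypothetical gene, the Wilcoxon test stands in for whatever test produced the DEGs, and the cut-offs for the stars are a common convention rather than anything from this thread. In practice the adjusted p-value from the differential analysis would normally be reused instead of running a new test.

```r
library(ggplot2)

set.seed(1)
# Simulated logCPM values for one hypothetical gene (not real data)
d <- data.frame(
  group  = rep(c("Normal", "Tumor"), times = c(20, 30)),
  logCPM = c(rnorm(20, mean = 12, sd = 1), rnorm(30, mean = 15, sd = 1))
)

# Illustrative test only; normally reuse the adjusted p-value from the DE analysis
p_value <- wilcox.test(logCPM ~ group, data = d)$p.value

# Conventional star thresholds (an assumption, adjust as needed)
stars <- if (p_value < 0.001) "***" else
         if (p_value < 0.01)  "**"  else
         if (p_value < 0.05)  "*"   else "ns"

# Place the label midway between the two groups, just above the tallest value
label <- data.frame(x = 1.5, y = max(d$logCPM) + 0.5, text = stars)

p <- ggplot(d, aes(x = group, y = logCPM)) +
  geom_violin() +
  geom_text(data = label, aes(x = x, y = y, label = text),
            inherit.aes = FALSE)
```

Printing p draws the plot. For a faceted plot of several genes, the label data frame would need one row per gene plus a gene column so that each star lands in the right panel.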