Most Popular

1500 questions
5
votes
2 answers

Difference between samtools mark duplicates and samtools remove duplicates?

What is the difference between samtools mark duplicates and remove duplicates? Is it necessary to mark duplicates before removing duplicates with samtools?
5
votes
1 answer

Correcting for noise in RT-qPCR gene expression data

I have a training set of RT-qPCR gene expression data (not run in triplicate) for a batch of samples with two phenotypes $A$ and $B$ on which I've trained a "logistic regression classifier". I also have another smaller set of samples which have been…
Set
  • 241
  • 1
  • 8
5
votes
1 answer

No counts for added gene in cellranger (scRNA-seq)

I have a set of scRNA-seq samples enriched with FACS for cells expressing a specific gene reporter (TdTomato). In particular the gene I want to report has positive counts in the resulting matrix for 97% of cells. I followed CellRanger documentation…
gc5
  • 1,783
  • 18
  • 32
5
votes
2 answers

Reduce number of transcripts in a highly variable de novo transcriptome assembly

I have a de novo assembly using both multiple SRA and locally sequenced transcriptomes. I started with 270M PE reads from 9 tissues. Here are the assembly stats generated with TrinityStats.pl: ################################ ## Counts of…
LinuxBlanket
  • 309
  • 1
  • 10
5
votes
0 answers

Annotating splice junctions from tophat/STAR output

Is there a way to annotate the splice junctions in tophat/STAR output? By annotate, I mean: can I tell whether a junction was involved in an alternative splicing event, say a skipped exon, MXE or retained intron...? I did some research; looks like…
5
votes
1 answer

Why do NEBNext indexing primers have sequence between the p5 oligo and index?

In a previous post I asked Why do NEB adapters have non-complementary sequence? Since then, I realized that there is some other sequence in the p5 indexing primer, as well as in the p7 indexing primer. Here is a diagram of the NEBNext protocol. The…
conchoecia
  • 3,141
  • 2
  • 16
  • 40
5
votes
2 answers

Spearman correlation for large dataset

I have two datasets (DataA and DataB) and I want to find the Spearman correlation between genes and also pull out the gene names (stored in the first column of each dataset) in R. I am using fread from data.table to read the file and cor.test to find Rho and…
user98059
  • 347
  • 3
  • 11
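As a rough illustration of the computation the excerpt above describes, here is a minimal Python sketch of Spearman's rho between every gene in one dataset and every gene in another. The question itself uses R (fread and cor.test); this is not that code. The function names and the dict-of-lists input layout (gene name mapped to values over the same samples) are assumptions made for the example, and no p-value is computed.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman_rho(x, y):
    """Spearman's rho is the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def gene_pair_rhos(data_a, data_b):
    """Hypothetical helper: rho for every (gene in A, gene in B) pair.

    data_a and data_b map gene names to value lists over the same samples,
    so the gene names travel with the result as dictionary keys.
    """
    return {
        (gene_a, gene_b): spearman_rho(values_a, values_b)
        for gene_a, values_a in data_a.items()
        for gene_b, values_b in data_b.items()
    }
```

The all-pairs loop grows with the product of the two gene counts, so for a large dataset a vectorised approach (rank each matrix once, then correlate) would likely be needed; the sketch only shows the logic.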
5
votes
1 answer

Macs2 peak calling?

I have paired-end ChIP-seq data with 101 bp reads and 2 biological replicates for each one. I have done peak calling with macs2 but I have some questions about it. I also got a warning: WARNING @ Thu, 07 Jun 2018 17:06:05: #2 Since the d (197)…
star
  • 153
  • 3
5
votes
1 answer

What causes the difference in total length of assembled contigs and scaffolds in SOAPdenovo2?

I use SOAPdenovo2 to assemble a large genome (4.8G) using ~20X paired-end reads. The total length of the contigs is 6.3G while the total length of the scaffolds is 2.7G. Note that this is a haploid genome, so there is no issue of heterozygosity for…
5
votes
1 answer

Viral Metagenomics

I am analyzing viral metagenomics data (Illumina MiSeq) for the first time. I have used Ray (reference below) for de novo viral genome assembly, but I haven't done metagenomics analysis before. I know that there are some tools like MetaVelvet…
L R Joshi
  • 719
  • 3
  • 11
5
votes
3 answers

KEGG FTP vs KEGG API

I was reading the KEGG plea and I found that it doesn't forbid using the KEGG API. What, then, does the FTP server license for personal/academic use provide that is not covered by the API? Or could I download the whole database via the API? PS: I…
llrs
  • 4,693
  • 1
  • 18
  • 42
5
votes
4 answers

Specific cell type identification in Single Cell Sequencing

In order to define which cell is of which type we need to identify a set of rules, for instance neurons should express one of the following: Thy1, Rbfox3, MAP2, Camk2b, Gad1, Cck, Reln, and should not express any of the following: cd45, Tmem119,…
Nikita Vlasenko
  • 2,558
  • 3
  • 26
  • 38
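The kind of rule described in the excerpt above (express at least one positive marker, express none of the negative markers) can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name, the per-cell dict input and the `min_count` threshold are hypothetical, and the negative marker list is truncated in the excerpt, so only the genes actually shown there are included.

```python
# Marker sets copied from the excerpt. The negative list is cut off in the
# source ("cd45, Tmem119,…"), so it is incomplete here by construction.
NEURON_POSITIVE = {"Thy1", "Rbfox3", "MAP2", "Camk2b", "Gad1", "Cck", "Reln"}
NEURON_NEGATIVE = {"cd45", "Tmem119"}


def matches_rule(expression, positive, negative, min_count=1):
    """Return True if a cell expresses >= 1 positive and no negative marker.

    expression: dict mapping gene name -> count for a single cell (assumed layout).
    min_count:  assumed threshold at which a gene counts as "expressed".
    Gene names are compared exactly, so capitalisation must match the marker sets.
    """
    expressed = {gene for gene, count in expression.items() if count >= min_count}
    has_positive = bool(expressed & positive)
    has_negative = bool(expressed & negative)
    return has_positive and not has_negative


# Usage: one cell with a positive marker only, one that also has a negative marker.
cell_a = {"Thy1": 3, "Tmem119": 0}
cell_b = {"Thy1": 3, "Tmem119": 2}
print(matches_rule(cell_a, NEURON_POSITIVE, NEURON_NEGATIVE))
print(matches_rule(cell_b, NEURON_POSITIVE, NEURON_NEGATIVE))
```

A hard threshold like this is sensitive to dropout in single-cell data, which is part of what the question is asking about; the sketch shows only the rule logic, not a recommended classifier.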
5
votes
2 answers

Smallest group size for differential expression in limma (bulk RNA-Seq)

I am reading Smyth et al. (ref. 1). I want to run differential expression analysis on a bulk RNA-Seq dataset in which each group is composed of 2 samples. The paper states that: Genes must be expressed in at least one…
gc5
  • 1,783
  • 18
  • 32
5
votes
4 answers

Pathway level analysis of single-cell gene expression

I'm looking for single-cell specific methods to construct (using gene expression data) new features that express pathway "level" or "activity", and then use these for clustering cells. One example for bulk RNA-seq is PLAGE, implemented in the GSVA R…
Peter
  • 2,634
  • 15
  • 33
5
votes
2 answers

Error creating indices using STAR

I am trying to index the wheat genome using STAR with the following command: STAR --runMode genomeGenerate --genomeFastaFiles Triticum_aestivum.TGACv1.dna_sm.toplevel.fa --runThreadN 28 But I am getting the following error: terminate called after throwing an…