Most Popular

1500 questions
6
votes
1 answer

Read counts from BAM file

I have a few BAM files generated by an Ion Torrent Server (AmpliSeq) and aligned to the hg19 genome. I want to extract read counts from the BAM files, and I know that "featureCounts" can be used for this. But before getting the read counts, to view…
stack_learner
  • 1,262
  • 14
  • 26
6
votes
1 answer

Is there a more elegant solution to the bwa-mem: paired reads have different names error?

I'm currently trying to run bwa-mem on Influenza substrains using the following command:
~/bwa mem h5n1_1_cons.fa h5n1_1_read1.fq h5n1_1_read2.fq
h5n1_1_cons.fa is the consensus sequence for substrain h5n1_1, and the fq files are paired end reads…
sgav
  • 63
  • 1
  • 4
6
votes
2 answers

Estimate the length of poly-A tails from randomly-primed RNAseq data

So a poly-A tail is a long chain of adenine nucleotides that is added to a messenger RNA (mRNA) molecule during RNA processing to increase the stability of the molecule. For my project, I would like to estimate the length of poly-A tails from…
ShanZhengYang
  • 1,691
  • 1
  • 14
  • 20
6
votes
2 answers

Transferring genomic features on new coordinates

I have a eukaryotic genome for which an updated sequence for a chromosome was recently obtained. I want to map RNAseq reads on the genome (and perform other downstream analyses) and would like to use the most up-to-date information possible (so the…
BioNaab
  • 61
  • 1
6
votes
1 answer

Pseudo-temporal ordering in heterogeneous populations

What's the most widely accepted tool for doing pseudo-temporal ordering from scRNAseq data? Also, is there a way to separate differential expression that occurs based on "cell identity" or maybe more accurately cell type fate from that which arises…
Alon Gelber
  • 173
  • 1
  • 5
6
votes
4 answers

Why does total RNA-seq usually yield a low mapping rate?

Maybe this is a silly question, but I'm really wondering why we usually get low mapping rates if we map total RNA-seq, but not poly(A)-enriched (in particular, for human, mouse, and zebrafish datasets)? Doesn't the genome fasta file contain…
kaka01
  • 111
  • 1
  • 6
6
votes
2 answers

How to determine a protein's cellular location based on its sequence?

I am wondering about the appropriate workflow to determine a protein's cellular location based on its sequence. Let's say I have a sequence like this from a fasta…
Cleb
  • 743
  • 7
  • 18
6
votes
1 answer

Verify a predicted protein in one genome in a different genome of the same species

I have two genome assemblies of the same non-model species, call them Assembly 1 (generated from Illumina data) and Assembly 2 (generated from PacBio data). For Assembly 1, I also have predicted proteome data, generated with EVM. Say there is a…
aechchiki
  • 2,676
  • 11
  • 34
6
votes
1 answer

Download data from the Human Microbiome Project via ascp

I have asked this question on Biostars, but I am trying here as well, since people working with "omics" data might be able to help. I think my issue relates to understanding how large data storage on online servers works. I am trying to download large data…
CAsimonet
  • 61
  • 1
6
votes
3 answers

Somatic tumor only variant calling?

I'm evaluating possibilities for somatic tumor variant calling without paired-normal samples. I'm aware of the consequences of working without a normal sample. All the popular variant calling tools, such as Strelka and VarScan, require a normal sample. What are…
SmallChess
  • 2,699
  • 3
  • 19
  • 35
6
votes
2 answers

PLINK clump behavior on missing SNPs?

I have a long list of autoimmune-associated SNPs, and I want to boil it down so that I get one SNP representing each LD block. I chose to use PLINK's --clump option for this. I'm roughly following this tutorial (but analyzing my own data). I don't…
eric_kernfeld
  • 380
  • 1
  • 11
6
votes
1 answer

What is this 5-column sequencing format from 2009?

There is high-throughput sequencing data here, and I don't know what format it is in. It was submitted in 2009, and the description says the following: Library strategy: ncRNA-Seq; Library source: transcriptomic; Library selection: size…
bli
  • 3,130
  • 2
  • 15
  • 36
6
votes
2 answers

Genome assembly from error-prone reads

I understand how to assemble a genome from error-free reads. I implemented it like this: Construct a directed overlap graph with reads as vertices and edges as maximum overlap between two vertices. Edges represent the length of overlapping maximum…
gagro
  • 63
  • 3
6
votes
3 answers

Can exons be located outside of the coding sequence?

I have a gff file like this (I edited the name): scaffold_x source exon 2987526 2987805 . - . name "foobar";transcriptId 68892 scaffold_x source CDS 2987526 2987805 . - 1 name "foobar";proteinId 68892;exonNumber 5 scaffold_x …
Cleb
  • 743
  • 7
  • 18
6
votes
2 answers

How can I output an identity matrix in progressiveMauve?

I'm just getting started with the Mauve aligner and I'm finding the documentation a bit lacking. I'm using the progressiveMauve tool from the command line and would like to output an identity matrix file. The set of output files it generates by…
JaredL
  • 161
  • 3