Most Popular
1500 questions
6
votes
1 answer
Read counts from BAM file
I have a few BAM files generated by an Ion Torrent Server (AmpliSeq) and aligned to the hg19 genome. I want to extract read counts from the BAM files, and I know that "featureCounts" can be used for this. But before getting the read counts, to view…
stack_learner
6
votes
1 answer
Is there a more elegant solution to the bwa-mem "paired reads have different names" error?
I'm currently trying to run bwa-mem on Influenza substrains using the following command:
~/bwa mem h5n1_1_cons.fa h5n1_1_read1.fq h5n1_1_read2.fq
h5n1_1_cons.fa is the consensus sequence for substrain h5n1_1, and the fq files are paired-end reads…
sgav
6
votes
2 answers
Estimate the length of poly-A tails from randomly-primed RNAseq data
A poly-A tail is a long chain of adenine nucleotides that is added to a messenger RNA (mRNA) molecule during RNA processing to increase its stability.
For my project, I would like to estimate the length of poly-A tails from…
ShanZhengYang
6
votes
2 answers
Transferring genomic features to new coordinates
I have a eukaryotic genome for which an updated sequence for a chromosome was recently obtained. I want to map RNAseq reads to the genome (and perform other downstream analyses) and would like to use the most up-to-date information possible (so the…
BioNaab
6
votes
1 answer
Pseudo-temporal ordering in heterogeneous populations
What's the most widely accepted tool for doing pseudo-temporal ordering from scRNAseq data? Also, is there a way to separate differential expression that occurs based on "cell identity", or maybe more accurately cell type fate, from that which arises…
Alon Gelber
6
votes
4 answers
Why does total RNA-seq usually yield a low mapping rate?
Maybe this is a silly question, but I'm really wondering why we usually get low mapping rates when we map total RNA-seq data, but not poly(A)-enriched data (in particular, for human, mouse, and zebrafish datasets)?
Doesn't the genome fasta file contain…
kaka01
6
votes
2 answers
How to determine a protein's cellular location based on its sequence?
I am wondering about the appropriate workflow to determine a protein's cellular location based on its sequence.
Let's say I have a sequence like this from a fasta…
Cleb
6
votes
1 answer
Verify a predicted protein in one genome in a different genome of the same species
I have two genome assemblies of the same non-model species, call them Assembly 1 (generated from Illumina data) and Assembly 2 (generated from PacBio data).
For Assembly 1, I also have predicted proteome data, generated with EVM. Say there is a…
aechchiki
6
votes
1 answer
Download data from the Human Microbiome Project via ascp
I have asked this question on Biostars, but I am trying here as well, since people working with "omics" data might be able to help. I think my issue relates to understanding how large data storage on online servers works. I am trying to download large data…
CAsimonet
6
votes
3 answers
Somatic tumor-only variant calling?
I'm evaluating possibilities for somatic tumor variant calling without paired-normal samples. I'm aware of the consequences of not having a normal sample.
All the popular variant calling tools, such as Strelka and VarScan, require a normal sample.
What are…
SmallChess
6
votes
2 answers
PLINK clump behavior on missing SNPs?
I have a long list of autoimmune-associated SNPs, and I want to boil it down so that I get one SNP representing each LD block. I chose to use PLINK's --clump option for this. I'm roughly following this tutorial (but analyzing my own data).
I don't…
eric_kernfeld
6
votes
1 answer
What is this 5-column sequencing format from 2009?
There is high throughput sequencing data here, and I don't know what format it is in.
It was submitted in 2009, and the description says the following:
Library strategy: ncRNA-Seq
Library source: transcriptomic
Library selection: size…
bli
6
votes
2 answers
Genome assembly from error-prone reads
I understand how to assemble a genome from error-free reads. I implemented it like this:
Construct a directed overlap graph with reads as vertices and edges as
the maximum overlap between two vertices. Edges represent the length of
overlapping maximum…
gagro
6
votes
3 answers
Can exons be located outside of the coding sequence?
I have a GFF file like this (I edited the name):
scaffold_x source exon 2987526 2987805 . - . name "foobar";transcriptId 68892
scaffold_x source CDS 2987526 2987805 . - 1 name "foobar";proteinId 68892;exonNumber 5
scaffold_x …
Cleb
6
votes
2 answers
How can I output an identity matrix in progressiveMauve?
I'm just getting started with the Mauve aligner and I'm finding the documentation a bit lacking. I'm using the progressiveMauve tool from the command line and would like to output an identity matrix file. The set of output files it generates by…
JaredL