Highest Voted Questions - Bioinformatics Stack Exchange

6

votes

1 answer

Read counts from BAM file

I have a few BAM files generated by an Ion Torrent Server (AmpliSeq) and aligned to the hg19 genome. I want to extract read counts from the BAM files, and I know that "featureCounts" can be used for this. But before getting the read counts, to view…

asked Dec 14 '17 at 09:41

stack_learner

1,262
14
26

6

votes

1 answer

Is there a more elegant solution to the bwa-mem: paired reads have different names error?

I'm currently trying to run bwa-mem on Influenza substrains using the following command: ~/bwa mem h5n1_1_cons.fa h5n1_1_read1.fq h5n1_1_read2.fq. Here, h5n1_1_cons.fa is the consensus sequence for substrain h5n1_1, and the fq files are paired-end reads…

asked Nov 28 '17 at 21:50

sgav

63
1
4

6

votes

2 answers

Estimate the length of poly-A tails from randomly-primed RNAseq data

So a poly-A tail is a long chain of adenine nucleotides that is added to a messenger RNA (mRNA) molecule during RNA processing to increase the stability of the molecule. For my project, I would like to estimate the length of poly-A tails from…

asked Nov 23 '17 at 21:21

ShanZhengYang

1,691
1
14
20

6

votes

2 answers

Transferring genomic features on new coordinates

I have a eukaryotic genome for which an updated sequence for a chromosome was recently obtained. I want to map RNAseq reads on the genome (and perform other downstream analyses) and would like to use the most up-to-date information possible (so the…

asked Nov 22 '17 at 10:43

BioNaab

61
1

6

votes

1 answer

Pseudo-temporal ordering in heterogeneous populations

What's the most widely accepted tool for doing pseudo-temporal ordering from scRNAseq data? Also, is there a way to separate differential expression that occurs based on "cell identity", or maybe more accurately cell-type fate, from that which arises…

scrnaseq

asked May 16 '17 at 20:22

Alon Gelber

173
1
5

6

votes

4 answers

Why does total RNA-seq usually yield a low mapping rate?

Maybe this is a silly question, but I'm really wondering why we usually get low mapping rates if we map total RNA-seq, but not poly(A)-enriched (in particular, for human, mouse, and zebrafish datasets)? Doesn't the genome fasta file contain…

asked Nov 21 '17 at 10:54

kaka01

111
1
6

6

votes

2 answers

How to determine a protein's cellular location based on its sequence?

I am wondering about the appropriate workflow to determine a protein's cellular location based on its sequence. Let's say I have a sequence like this from a fasta…

asked Nov 15 '17 at 20:31

Cleb

743
7
18

6

votes

1 answer

Verify a predicted protein in one genome in a different genome of the same species

I have two genome assemblies of the same non-model species, call them Assembly 1 (generated from Illumina data) and Assembly 2 (generated from PacBio data). For Assembly 1, I also have predicted proteome data, generated with EVM. Say there is a…

asked Nov 13 '17 at 12:10

aechchiki

2,676
11
34

6

votes

1 answer

Download data from the Human Microbiome Project via ascp

I have asked this question on Biostars, but I am trying here as well, as people working with "omics" data might be able to help. I think my issue relates to understanding how large data storage on online servers works. I am trying to download large data…

data-download

asked Nov 06 '17 at 00:28

CAsimonet

61
1

6

votes

3 answers

Somatic tumor only variant calling?

I'm evaluating possibilities for somatic tumor variant calling without paired-normal samples. I'm aware of the consequences of not having a normal sample. All the popular variant calling tools, such as Strelka, VarScan, etc., require a normal sample. What are…

asked Oct 31 '17 at 23:30

SmallChess

2,699
3
19
35

6

votes

2 answers

PLINK clump behavior on missing SNPs?

I have a long list of autoimmune-associated SNPs, and I want to boil it down so that I get one SNP representing each LD block. I chose to use PLINK's --clump option for this. I'm roughly following this tutorial (but analyzing my own data). I don't…

asked Oct 31 '17 at 16:01

eric_kernfeld

380
1
11

6

votes

1 answer

What is this 5-column sequencing format from 2009?

There is high-throughput sequencing data here, and I don't know what format it is in. It was submitted in 2009, and the description says the following: Library strategy: ncRNA-Seq; Library source: transcriptomic; Library selection: size…

asked Oct 31 '17 at 14:58

bli

3,130
2
15
36

6

votes

2 answers

Genome assembly from error-prone reads

I understand how to assemble a genome from error-free reads. I implemented it like this: construct a directed overlap graph with reads as vertices and the maximum overlap between two reads as edges. Edges represent the length of overlapping maximum…

asked Oct 22 '17 at 12:20

gagro

63
3

6

votes

3 answers

Can exons be located outside of the coding sequence?

I have a gff file like this (I edited the name):
scaffold_x source exon 2987526 2987805 . - . name "foobar";transcriptId 68892
scaffold_x source CDS 2987526 2987805 . - 1 name "foobar";proteinId 68892;exonNumber 5
scaffold_x …

asked Oct 20 '17 at 07:29

Cleb

743
7
18

6

votes

2 answers

How can I output an identity matrix in progressiveMauve?

I'm just getting started with the Mauve aligner and I'm finding the documentation a bit lacking. I'm using the progressiveMauve tool from the command line and would like to output an identity matrix file. The set of output files it generates by…

sequence-alignment

asked May 29 '17 at 00:01

JaredL

161
3
