Highest Voted Questions - Bioinformatics Stack Exchange

7

votes

2 answers

Extracting the CIGAR string from a BAM via Python?

Is there a standard method in Python to extract a CIGAR string from the BAM? There are great libraries which parse the CIGAR, e.g. https://pypi.python.org/pypi/cigar/0.1 >>> c = Cigar('10M20S10M') >>> c.mask_left(10).cigar '10S20S10M' >>>…

asked Mar 09 '18 at 18:09

ShanZhengYang

1,691
1
14
20

7

votes

2 answers

Stranded vs. unstranded library preparation protocols in RNAseq

I've been reading this paper lately: Sailfish Enables Alignment-Free Isoform Quantification from RNA-Seq Reads Using Lightweight Algorithms I don't really understand the second paragraph under Quantification in the Online Methods section. I quote…

asked Mar 07 '18 at 03:57

Paghillect

309
1
7

7

votes

2 answers

Missing genes and normalisation of RSEM output using EBSeq

Without going into too much background, I just joined up with a lab as a bioinformatics intern while I'm completing my masters degree in the field. The lab has data from an RNA-seq they outsourced, but the only problem is that the only data they…

asked Jun 01 '17 at 22:57

J0HN_TIT0R

541
1
4
7

7

votes

6 answers

bed file with N regions of GRCh38 reference?

How do I obtain a bed file with the list of N-regions of GRCh38 reference? This is, the regions where the sequence is a stretch of Ns.

asked Feb 13 '18 at 09:45

719016

2,324
13
19

7

votes

2 answers

How to write a hash function for canonical kmers?

This question is a follow up from How do kmer counters determine which kmer is 'canonical'?. In that question we learned that kmer counting programs use a 2-bit hash function to internally represent canonical kmers as they are being counted. Now I…

asked Feb 08 '18 at 00:06

conchoecia

3,141
2
16
40

7

votes

3 answers

How to subset a VCF by chromosome and keep the header?

I would like to subset a VCF which only has chromosome 2. The problem with using various grep commands, e.g. grep -w '^#\|^2' my.vcf > my_new.vcf or if there's a 'chr' prefix grep -w '^#\|^chr2' my.vcf > my_new.vcf is that this will remove the…

asked Jan 25 '18 at 18:30

EB2127

1,413
2
10
23

7

votes

1 answer

Imputing missing genotypes from separate genotyping panels

What is the current standard for imputing missing genotypes between two genotyping panels? I have two populations genotyped using two different panels (A & B), and I would like to impute all the genotypes in population B for those positions on used…

asked Jun 01 '17 at 07:27

Greg

831
6
12

7

votes

1 answer

ethnicity check either from bam or vcf files

What tool could I use to check the ethnicity of a human bam or vcf file? I would like to use the results as a QC check to know whether a given sample or set of samples match the ethnicity information annotated in the meta data or not. Seen so…

asked Jan 09 '18 at 09:41

719016

2,324
13
19

7

votes

2 answers

Is this a low complexity region in our human genome?

I have a screenshot where many of my reads were aligned to a region that I suspect a low complexity region. Although you can't see, all those reads are clipped in the cigar strings. Sample cigar strings: 64S32M29S, 74S32M18S etc... Consequently, the…

genome

asked Jan 09 '18 at 06:36

SmallChess

2,699
3
19
35

7

votes

2 answers

Lisp in bioinformatics

After many years of programming in Python, C/C++ and other mainstream languages, I am looking for a new language of pleasure, mainly as a part of mental hygiene. Since I really admire the ideas behind Lisp, including the original paper from the…

reproducibility

asked Dec 28 '17 at 18:02

Kkk

71
1

7

votes

1 answer

How to remove all BAM read groups from all reads (not just the header)?

I have problem with one my BAMs---it appears to have invalid read groups. Normally when I have such a problem, I remove all the read groups from the BAM header as follows: samtools view -H your.bam | grep -v "^@RG" | samtools reheader -…

asked Dec 23 '17 at 11:43

EB2127

1,413
2
10
23

7

votes

2 answers

How to get all PDB homologs from Uniprot (mapping + BLAST)?

I'd like to create a dataset consisting of all sequences which are either present in the PDB, or whose homolog is present in the PDB. In other words, any sequence in the PDB or any sequence related to it. The similiarity margin is to be very wide,…

asked Dec 12 '17 at 13:33

Zubo

179
3

7

votes

1 answer

low-memory high-speed replacement for Picard MarkDuplicates

I am running Picard MarkDuplicates with the following parameters below. On the file described, it takes about 41.6Gb of RAM memory and about 20-25 minutes to compute (only uses 1 core AFAICS). java -Xmx54890m -jar picard.jar MarkDuplicates \ …

asked Dec 11 '17 at 15:45

719016

2,324
13
19

7

votes

1 answer

How to trim adapter sequences from GSE65360 in order to map the reads?

I am trying to map single-cell chromatin accessibility data from doi:10.1038/nature14590, obtained using scATAC-seq, to the reference genome. Example paired-end reads are here. The reads contain various adapter sequences, which the authors "trimmed…

asked Dec 05 '17 at 22:35

firefly

73
5

7

votes

2 answers

Which tools can detect chimeric RNA (fusion genes) from WGS or RNA-Seq data?

Given WGS data or RNA-seq data, which tools can I use to detect gene fusions?

asked May 31 '17 at 10:05

L42

281
2
6

Most Popular