Most Popular

1500 questions
14
votes
7 answers

Is there a public RESTful API for gnomAD?

I currently find Harvard's RESTful API for ExAC extremely useful, and I was hoping that a similar resource is available for gnomAD. Does anyone know of a public-access API for gnomAD, or of any plans to integrate gnomAD into the Harvard API?
Pasted
  • 243
  • 2
  • 5
14
votes
3 answers

How do you write a .gz fastq file with Biopython?

How do you write a .gz (or .bgz) fastq file using Biopython? I'd rather avoid a separate system call. The typical way to write an ASCII .fastq is done as follows: for record in SeqIO.parse(fasta, "fasta"): SeqIO.write(record, fastq,…
Mark Ebbert
  • 1,354
  • 10
  • 22
14
votes
10 answers

How to simulate NGS reads, controlling sequence coverage?

I have a FASTA file with 100+ sequences like this: >Sequence1 GTGCCTATTGCTACTAAAA ... >Sequence2 GCAATGCAAGGAAGTGATGGCGGAAATAGCGTTA ...... I also have a text file like this: Sequence1 40 Sequence2 30 ...... I would like to simulate next-generation…
SmallChess
  • 2,699
  • 3
  • 19
  • 35
14
votes
3 answers

What is a quick way to find the reverse complement in bash?

I have a DNA sequence for which I would like to quickly find the reverse complement. Is there a quick way of doing this on the bash command line using only GNU tools?
winni2k
  • 2,266
  • 11
  • 28
14
votes
1 answer

Better aligner than bowtie2?

Bowtie2 is probably the most widely used aligner because of its speed. Burrows-Wheeler (BW) algorithms (including bwa) tend to be faster. However, they have limitations when it comes to aligning very short reads (e.g. gRNA). Also, setting maximum…
user345394
  • 675
  • 6
  • 20
14
votes
1 answer

Biopython phylogenetic tree: replace branch tip labels with sequence logos

Having recently constructed a lot of phylogenetic trees with the TreeConstruction module from Biopython's Phylo package, I've been asked to replace the branch tip labels with the corresponding sequence logos (which I have in the same folder). I…
14
votes
3 answers

Generic HMM solvers in bioinformatics?

Hidden Markov models (HMMs) are used extensively in bioinformatics, and have been adapted for gene prediction, protein family classification, and a variety of other problems. Indeed, the treatise by Durbin, Eddy and colleagues is one of the defining…
Daniel Standage
  • 5,080
  • 15
  • 50
14
votes
1 answer

Why would someone use a CRAM instead of a BAM?

I had this question from a graduate student yesterday, and I was stuck. What should I say? Why use a CRAM instead of a BAM? When is it a good idea to use a CRAM instead of a BAM? When is it a bad idea?
EB2127
  • 1,413
  • 2
  • 10
  • 23
14
votes
3 answers

How can I improve the yield of MinION sequencing runs?

This is a frequently-asked question within the nanopore community. Oxford Nanopore currently claims that they are able to generate run yields of 10-15 gigabases (e.g. see here and here), and yet it's more common to see users only managing in the 1-5…
gringer
  • 14,012
  • 5
  • 23
  • 79
14
votes
2 answers

How does the BWA-MEM algorithm assign its mapping qualities?

Is there any resource (paper, blog post, GitHub gist, etc.) describing the BWA-MEM algorithm for assigning mapping qualities? I vaguely remember having seen a formula for SE reads somewhere, which looked like $C \cdot (s_1 - s_2) / s_1$, where $s_1$…
Karel Břinda
  • 1,909
  • 9
  • 19
14
votes
2 answers

Is there a standard k-mer count file format?

I am doing a research project involving calculating k-mer frequencies and I am wondering if there is any standard file format for storing k-mer counts.
Jon Deaton
  • 399
  • 2
  • 10
14
votes
1 answer

Are soft-clipped bases used for variant calling in samtools + bcftools?

If there are soft-clipped bases specified in the CIGAR string for a read in a SAM/BAM file, will these be used for variant calling in a samtools + bcftools workflow? The GATK HaplotypeCaller, for example, has an explicit option…
mattm
  • 754
  • 7
  • 19
14
votes
3 answers

How to make a distinction between the "classical" de Bruijn graph and the one described in NGS papers?

In computer science, a de Bruijn graph has (1) m^n vertices representing all possible sequences of length n over m symbols, and (2) directed edges connecting vertices that differ by a shift of n-1 elements (the successor having the new element at the…
Leo Martins
  • 669
  • 4
  • 11
13
votes
8 answers

Fast way to count number of reads and number of bases in a fastq file?

I am looking for a tool, preferably written in C or C++, that can quickly and efficiently count the number of reads and the number of bases in a compressed fastq file. I am currently doing this using zgrep and awk: zgrep . foo.fastq.gz | awk…
terdon
  • 10,071
  • 5
  • 22
  • 48
13
votes
2 answers

Mapping drug names to ATC codes

I'm interested in working with the medication information provided by the UK Biobank. To get these data into a usable form, I would like to map them to ATC codes. Since many of the drugs listed in the data showcase include dosage information,…
Greg
  • 831
  • 6
  • 12