Most Popular
1500 questions
14
votes
7 answers
Is there a public RESTful API for gnomAD?
I currently find Harvard's RESTful API for ExAC extremely useful, and I was hoping that a similar resource is available for gnomAD.
Does anyone know of a public-access API for gnomAD, or of any plans to integrate gnomAD into the Harvard API?
Pasted
- 243
- 2
- 5
14
votes
3 answers
How do you write a .gz fastq file with Biopython?
How do you write a .gz (or .bgz) fastq file using Biopython?
I'd rather avoid a separate system call.
The typical way to write an ASCII .fastq is as follows:
for record in SeqIO.parse(fasta, "fasta"):
    SeqIO.write(record, fastq,…
Mark Ebbert
- 1,354
- 10
- 22
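A minimal sketch for the question above, assuming the goal is simply to get FASTQ text into a gzip stream without a system call. It uses only the standard-library `gzip` module opened in text mode; `write_fastq_gz` and the `(name, sequence, quality)` tuple layout are hypothetical names chosen for illustration, not part of Biopython. The same kind of text-mode handle can, as far as I know, also be passed to a writer that accepts a file handle, such as the `SeqIO.write` call in the excerpt.

```python
import gzip


def write_fastq_gz(records, path):
    """Write (name, sequence, quality) tuples to a gzip-compressed FASTQ file."""
    # "wt" opens the gzip stream in text mode, so plain strings can be written.
    with gzip.open(path, "wt") as handle:
        for name, seq, qual in records:
            if len(seq) != len(qual):
                raise ValueError(f"sequence/quality length mismatch in {name}")
            # One FASTQ record is four lines: header, sequence, separator, quality.
            handle.write(f"@{name}\n{seq}\n+\n{qual}\n")
```

Note that this produces ordinary gzip output; block-gzipped (.bgz) files need a BGZF writer, which this sketch does not cover.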
14
votes
10 answers
How to simulate NGS reads, controlling sequence coverage?
I have a FASTA file with 100+ sequences like this:
>Sequence1
GTGCCTATTGCTACTAAAA ...
>Sequence2
GCAATGCAAGGAAGTGATGGCGGAAATAGCGTTA
......
I also have a text file like this:
Sequence1 40
Sequence2 30
......
I would like to simulate next-generation…
SmallChess
- 2,699
- 3
- 19
- 35
14
votes
3 answers
What is a quick way to find the reverse complement in bash?
I have a DNA sequence for which I would like to quickly find the reverse complement. Is there a quick way of doing this on the bash command line using only GNU tools?
winni2k
- 2,266
- 11
- 28
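The question above asks for a bash solution with GNU tools; the sketch below is not that, but shows the same reverse-then-translate idea in Python for comparison. `reverse_complement` is a hypothetical helper name, and the table only covers A, C, G, T and N in either case (other IUPAC codes pass through unchanged, which is an assumption of this sketch).

```python
# Translation table mapping each base to its complement (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    # Complement every base, then reverse the whole string.
    return seq.translate(_COMPLEMENT)[::-1]
```

For example, `reverse_complement("ATGC")` gives `"GCAT"`.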
14
votes
1 answer
Better aligner than bowtie2?
Bowtie2 is probably the most widely used aligner because of its speed. Burrows-Wheeler (BW) algorithms (including bwa) tend to be faster. However, they have limitations when it comes to aligning very short reads (e.g. gRNA). Also, setting maximum…
user345394
- 675
- 6
- 20
14
votes
1 answer
Biopython Phylogenetic Tree replace branch tip labels by sequence logos
Having recently constructed a lot of phylogenetic trees with the TreeConstruction module from Biopython's Phylo package, I've been asked to replace the branch tip labels with the corresponding sequence logos (which I have in the same folder). I…
Boris Schnider
- 143
- 6
14
votes
3 answers
Generic HMM solvers in bioinformatics?
Hidden Markov models (HMMs) are used extensively in bioinformatics, and have been adapted for gene prediction, protein family classification, and a variety of other problems. Indeed, the treatise by Durbin, Eddy and colleagues is one of the defining…
Daniel Standage
- 5,080
- 15
- 50
14
votes
1 answer
Why would someone use a CRAM instead of a BAM?
I had this question from a graduate student yesterday, and I was stuck.
What should I say? Why use a CRAM instead of a BAM?
When is it a good idea to use a CRAM instead of a BAM?
When is it a bad idea?
EB2127
- 1,413
- 2
- 10
- 23
14
votes
3 answers
How can I improve the yield of MinION sequencing runs?
This is a frequently asked question within the nanopore community. Oxford Nanopore currently claims that they are able to generate run yields of 10-15 gigabases (e.g. see here and here), and yet it's more common to see users only managing in the 1-5…
gringer
- 14,012
- 5
- 23
- 79
14
votes
2 answers
How does the BWA-MEM algorithm assign its mapping qualities?
Is there any resource (paper, blog post, GitHub gist, etc.) describing the BWA-MEM algorithm for assigning mapping qualities? I vaguely remember seeing a formula for SE reads somewhere, which looked like
$C * (s_1 - s_2) / s_1,$
where $s_1$…
Karel Břinda
- 1,909
- 9
- 19
14
votes
2 answers
Is there a standard k-mer count file format?
I am doing a research project involving calculating k-mer frequencies and I am wondering if there is any standard file format for storing k-mer counts.
Jon Deaton
- 399
- 2
- 10
14
votes
1 answer
Are soft-clipped bases used for variant calling in samtools + bcftools?
If there are soft-clipped base pairs specified in the CIGAR string for a read in a SAM/BAM file, will these be used for variant calling in a samtools + bcftools workflow?
The GATK HaplotypeCaller, for example, has an explicit option…
mattm
- 754
- 7
- 19
14
votes
3 answers
How to make a distinction between the "classical" de Bruijn graph and the one described in NGS papers?
In computer science, a de Bruijn graph has (1) m^n vertices representing all possible sequences of length n over m symbols, and (2) directed edges connecting nodes that differ by a shift of n-1 elements (the successor having the new element at the…
Leo Martins
- 669
- 4
- 11
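To make the "classical" definition in the question above concrete, here is a small sketch that enumerates the graph directly. `classical_de_bruijn` is a hypothetical name, the alphabet is assumed to consist of single-character symbols, and the sketch assumes the new element is appended at the end of the successor (the excerpt is cut off at that point).

```python
from itertools import product


def classical_de_bruijn(alphabet, n):
    """Return {vertex: [successors]} for length-n sequences over the alphabet."""
    # All m^n possible sequences of length n are vertices, seen in data or not.
    vertices = ["".join(p) for p in product(alphabet, repeat=n)]
    # Each successor drops the first symbol and appends one new symbol,
    # so a vertex and its successor overlap by n-1 symbols.
    return {v: [v[1:] + s for s in alphabet] for v in vertices}
```

With `classical_de_bruijn("AC", 2)` there are 4 vertices, and `"AC"` points to `"CA"` and `"CC"`. The graph here is defined by the alphabet and n alone, with no reads involved.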
13
votes
8 answers
Fast way to count number of reads and number of bases in a fastq file?
I am looking for a tool, preferably written in C or C++, that can quickly and efficiently count the number of reads and the number of bases in a compressed fastq file. I am currently doing this using zgrep and awk:
zgrep . foo.fastq.gz |
awk…
terdon
- 10,071
- 5
- 22
- 48
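The question above asks for a fast C or C++ tool; the sketch below is not that, just a plain-Python baseline showing the counting logic. It assumes strict four-line FASTQ records with no blank lines and no wrapped sequences, and `count_fastq_gz` is a hypothetical helper name.

```python
import gzip


def count_fastq_gz(path):
    """Return (number of reads, number of bases) for a gzipped FASTQ file."""
    reads = 0
    bases = 0
    # Stream the file line by line so memory use stays constant.
    with gzip.open(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # the second line of each record holds the sequence
                reads += 1
                bases += len(line.rstrip("\r\n"))
    return reads, bases
```

Something like `reads, bases = count_fastq_gz("foo.fastq.gz")` would be the typical call. Expect this to be slower than a compiled tool on large files.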
13
votes
2 answers
Mapping drug names to ATC codes
I'm interested in working with the medication information provided by the UK Biobank. In order to get these into a usable form, I would like to map them to ATC codes. Since many of the drugs listed in the data showcase include dosage information,…
Greg
- 831
- 6
- 12