Highest Voted Questions - Bioinformatics Stack Exchange

12

votes

5 answers

How to download FASTA sequences from NCBI using the terminal?

I have the following accession numbers for the 10 chromosomes of the Theobroma cacao genome: NC_030850.1, NC_030851.1, NC_030852.1, NC_030853.1, NC_030854.1, NC_030855.1, NC_030856.1, NC_030857.1, NC_030858.1, NC_030859.1. I need to download these FASTA files using…

asked Nov 06 '19 at 16:08

MudithMMBc

361
1
2
9
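
For the download question above, one possible starting point is the NCBI E-utilities `efetch` endpoint. The sketch below only builds the request URLs for the ten listed accessions and does not contact NCBI. It assumes the `nuccore` database and plain-text FASTA output are what is wanted; the helper name `efetch_fasta_url` is hypothetical.

```python
from urllib.parse import urlencode

# Public E-utilities efetch endpoint.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

# The ten accessions from the question: NC_030850.1 through NC_030859.1.
ACCESSIONS = [f"NC_0308{n}.1" for n in range(50, 60)]


def efetch_fasta_url(accession: str) -> str:
    """Build an efetch URL requesting one record as plain-text FASTA.

    Assumption: the records live in the 'nuccore' database.
    """
    params = {
        "db": "nuccore",
        "id": accession,
        "rettype": "fasta",
        "retmode": "text",
    }
    return f"{EFETCH}?{urlencode(params)}"


# One URL per chromosome; nothing is downloaded here.
urls = [efetch_fasta_url(acc) for acc in ACCESSIONS]
```

Each URL could then be handed to a terminal tool such as `curl` or to `urllib.request.urlopen`, with a short pause between requests to stay within NCBI's rate limits.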

11

votes

1 answer

Which quality score encoding does PacBio use?

Do you know which quality score encoding PacBio uses now? I know some of their file formats have changed in the past year or two, but I haven't found much on their quality score encoding. The most recent answer I found is from 2012, where one user…

asked Jun 23 '17 at 20:03

Mark Ebbert

1,354
10
22

11

votes

1 answer

What is the best method to estimate a phylogenetic tree from a large dataset of >1000 loci and >100 species?

I have a large phylogenomic alignment of >1000 loci (each locus is ~1000bp), and >100 species. I have relatively little missing data (<10%). I want to estimate a maximum-likelihood phylogenetic tree from this data, with measures of statistical…

asked Jun 10 '17 at 03:57

roblanf

962
7
15

11

votes

2 answers

Do variant calls change when you call from CRAM?

We're considering switching our storage format from BAM to CRAM. We work with human cancer samples, which may have very low prevalence variants (i.e. not diploid frequency). If we use lossy CRAM to save more space, how much will variants called from…

asked Jun 08 '17 at 14:54

morgantaschuk

530
4
9

11

votes

5 answers

How to convert species names into common names?

I’m trying to find common names from a list of scientific names (not all will have them though). I was attempting to use taxize in R but it aborts if it doesn’t find an entry in EOL and I don’t know a way around this other than manually editing the…

asked Jun 08 '17 at 10:38

Daniel Mead

113
1
4

11

votes

1 answer

How to read and interpret a gene expression quantification file?

I have a gene expression quantification file from TCGA that contains the following lines: ENSG00000242268.2 591.041000514 ENSG00000270112.3 0.0 ENSG00000167578.15 62780.6543066 ENSG00000273842.1 0.0 ENSG00000078237.5 …

asked Jun 08 '17 at 02:25

0x90

1,437
9
18
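
As a rough illustration for the quantification file question above, the excerpt looks like alternating gene identifiers and numeric values. The sketch below parses that layout into a dictionary. It assumes whitespace-separated identifier/value pairs and uses only the four complete pairs visible in the excerpt; the function names are hypothetical, and the sketch makes no claim about what unit the values are in.

```python
from typing import Dict


def parse_quantification(text: str) -> Dict[str, float]:
    """Parse whitespace-separated 'gene_id value' pairs into a dict."""
    tokens = text.split()
    if len(tokens) % 2 != 0:
        raise ValueError("expected an even number of tokens (id/value pairs)")
    return {gene: float(value) for gene, value in zip(tokens[::2], tokens[1::2])}


def strip_version(gene_id: str) -> str:
    """Drop the trailing '.N' version suffix from an Ensembl gene ID."""
    return gene_id.split(".", 1)[0]


# The four complete pairs shown in the question excerpt.
sample = (
    "ENSG00000242268.2 591.041000514 "
    "ENSG00000270112.3 0.0 "
    "ENSG00000167578.15 62780.6543066 "
    "ENSG00000273842.1 0.0"
)

values = parse_quantification(sample)
unversioned = {strip_version(gene): value for gene, value in values.items()}
```

Stripping the version suffix is often convenient when joining against annotation tables keyed by unversioned Ensembl IDs.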

11

votes

3 answers

How to extract RNA sequence and secondary structure restraints from a PDB file

I'm trying to find a programmatic way to automatically extract the following information from a PDB file: (1) RNA sequence; (2) secondary structure restraints in bracket format, e.g. . (( . ( . ) . )). Does software exist that can take a PDB file as input…

asked Jun 07 '17 at 08:24

Peter

353
1
8

11

votes

3 answers

What is the index fastq file (sample_I*.fastq.gz) generated when demultiplexing Illumina paired-end runs?

What is the index fastq file that comes with some Illumina sequencing datasets? (The samplename_I*.fastq.gz file.) For example, I recently received some 10X Chromium reads for two libraries sequenced on the same lane. This was a 2x150 sequencing…

asked Oct 08 '18 at 19:20

conchoecia

3,141
2
16
40

11

votes

1 answer

Can I create a CRAM file with a relative reference path?

I’m trying to create a CRAM file that stores its path to the FASTA reference as a relative path, rather than an absolute path, so that I can move the files around. Unfortunately I can’t get this to work; I was expecting the following to work: ⟩⟩⟩…

asked Aug 03 '18 at 11:35

Konrad Rudolph

4,845
14
45

11

votes

4 answers

What are the pros and cons of the different basecallers in Oxford Nanopore Technologies sequencing?

What are the pros and cons of the different basecallers in Oxford Nanopore Technologies sequencing? I am about to start a MinION run on my laptop. What should I consider when choosing my basecaller? Can I let MinION do its sequencing and generate…

asked Feb 04 '18 at 18:02

Biomagician

2,459
16
30

11

votes

1 answer

Changing the record id in a FASTA file using BioPython

I have the following FASTA file, original.fasta: >foo GCTCACACATAGTTGATGCAGATGTTGAATTCACTATGAGGTGGGAGGATGTAGGGCCA I need to change the record id from foo to bar, so I wrote the following code: from Bio import SeqIO original_file =…

asked May 31 '17 at 13:40

BioGeek

496
5
15
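
To make the record-id question above concrete, here is a minimal sketch of the same renaming done on the raw FASTA text with the standard library only. It is not the BioPython code from the question (which is truncated in the excerpt); the function name `rename_record` is hypothetical, and the sketch assumes the record id is the header text up to the first space.

```python
def rename_record(fasta_text: str, old_id: str, new_id: str) -> str:
    """Rename records whose id equals old_id, keeping any description."""
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            # Split the header into the id and the rest of the line.
            record_id, sep, description = line[1:].partition(" ")
            if record_id == old_id:
                line = ">" + new_id + sep + description
        out.append(line)
    return "\n".join(out) + "\n"


# The record from the question excerpt.
original = ">foo\nGCTCACACATAGTTGATGCAGATGTTGAATTCACTATGAGGTGGGAGGATGTAGGGCCA\n"
renamed = rename_record(original, "foo", "bar")
```

Working on the header line directly sidesteps the separate id and description fields that a parsed record object carries, at the cost of not validating the sequence.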

11

votes

1 answer

State of the art mutation simulation software

There are many features affecting mutation probabilities, e.g. CpG mutations are 10-fold more likely than other types of mutations. Is there a model (preferably with software) which can take two aligned genomic regions, estimate parameters of the…

asked Dec 08 '17 at 09:54

Iakov Davydov

2,695
1
13
34

11

votes

1 answer

Why does a very strong BLAST hit get lost when I change the num_alignments, num_descriptions or max_target_seqs parameters?

Disclaimer: This is a self-answered question for documentation purposes, adapted from the following GitHub gist, especially from users terrycojones and peterjc, as well as sujaikumar, who raised the issue. I have a strange situation. I have…

asked Nov 14 '17 at 15:21

voiDnyx

401
2
12

11

votes

3 answers

Converting a VCF into a FASTA given a reference with Python or R

I am interested in converting a VCF file into a FASTA file given a reference sequence with Python or R. Samtools/BCFtools (Heng Li) provides a Perl script, vcfutils.pl, which does this with the function vcf2fq (lines 469-528). This script has been…

asked Nov 12 '17 at 10:27

ShanZhengYang

1,691
1
14
20

11

votes

1 answer

Quantifying reads mapping to multiple loci

I have been using STAR for our RNA-Seq samples. The final.out log file reports percentage of uniquely mapped reads along with percentage of reads that map to multiple loci (less than or equal to 10) and percentage of reads mapping to too many loci…

asked May 30 '17 at 09:05

rightskewed

991
8
17
