Most Popular
1500 questions
12
votes
5 answers
How to download FASTA sequences from NCBI using the terminal?
I have the following accession numbers for the 10 chromosomes of the Theobroma cacao genome.
NC_030850.1
NC_030851.1
NC_030852.1
NC_030853.1
NC_030854.1
NC_030855.1
NC_030856.1
NC_030857.1
NC_030858.1
NC_030859.1
I need to download these FASTA files using…
MudithMMBc
- 361
- 1
- 2
- 9
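One possible approach to the question above, as a minimal sketch rather than the accepted answer: NCBI's E-utilities `efetch` endpoint returns FASTA text for a nucleotide accession. The snippet below only builds the request URLs for the ten accessions listed, so they can be passed to `curl` or `wget` in the terminal. The helper name `efetch_fasta_url` and the choice of `db=nuccore` are assumptions for illustration.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities efetch endpoint.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

# The ten accessions from the question: NC_030850.1 through NC_030859.1.
ACCESSIONS = [f"NC_{n:06d}.1" for n in range(30850, 30860)]


def efetch_fasta_url(accession: str) -> str:
    """Build an efetch URL requesting one accession as plain-text FASTA.

    Hypothetical helper; db="nuccore" is an assumption for RefSeq
    chromosome records.
    """
    query = urlencode(
        {"db": "nuccore", "id": accession, "rettype": "fasta", "retmode": "text"}
    )
    return f"{EFETCH_URL}?{query}"


if __name__ == "__main__":
    # Print one URL per accession; each can be fetched with curl or wget.
    for acc in ACCESSIONS:
        print(efetch_fasta_url(acc))
```

Each printed URL can then be downloaded separately, for example by redirecting the output of `curl` into a file named after the accession.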
11
votes
1 answer
Which quality score encoding does PacBio use?
Do you know which quality score encoding PacBio uses now? I know some of their file formats have changed in the past year or two, but I haven't found much on their quality score encoding.
The most recent answer I found is from 2012, where one user…
Mark Ebbert
- 1,354
- 10
- 22
11
votes
1 answer
What is the best method to estimate a phylogenetic tree from a large dataset of >1000 loci and >100 species?
I have a large phylogenomic alignment of >1000 loci (each locus is ~1000bp), and >100 species. I have relatively little missing data (<10%).
I want to estimate a maximum-likelihood phylogenetic tree from this data, with measures of statistical…
roblanf
- 962
- 7
- 15
11
votes
2 answers
Do variant calls change when you call from CRAM?
We're considering switching our storage format from BAM to CRAM. We work with human cancer samples, which may have very low-prevalence variants (i.e. not at diploid frequency).
If we use lossy CRAM to save more space, how much will variants called from…
morgantaschuk
- 530
- 4
- 9
11
votes
5 answers
How to convert species names into common names?
I’m trying to find common names from a list of scientific names (not all will have them though).
I was attempting to use taxize in R but it aborts if it doesn’t find an entry in EOL and I don’t know a way around this other than manually editing the…
Daniel Mead
- 113
- 1
- 4
11
votes
1 answer
How to read and interpret a gene expression quantification file?
I have a gene expression quantification file from TCGA that contains the following lines:
ENSG00000242268.2 591.041000514
ENSG00000270112.3 0.0
ENSG00000167578.15 62780.6543066
ENSG00000273842.1 0.0
ENSG00000078237.5 …
0x90
- 1,437
- 9
- 18
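As a rough illustration of reading the kind of file quoted in the question above, the sketch below parses whitespace-separated lines of a versioned Ensembl gene ID and a numeric value. It assumes only the two-column layout visible in the excerpt; the unit of the second column depends on the workflow that produced the file and is not assumed here. The function names are hypothetical.

```python
from typing import Dict


def parse_quantification(text: str) -> Dict[str, float]:
    """Map each gene ID to its numeric value.

    Lines that do not have exactly two whitespace-separated fields are skipped.
    """
    values: Dict[str, float] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue
        gene_id, value = fields
        values[gene_id] = float(value)
    return values


def strip_version(gene_id: str) -> str:
    """Drop the version suffix, e.g. ENSG00000242268.2 -> ENSG00000242268."""
    return gene_id.split(".", 1)[0]


# The four complete lines shown in the question excerpt.
sample = """ENSG00000242268.2 591.041000514
ENSG00000270112.3 0.0
ENSG00000167578.15 62780.6543066
ENSG00000273842.1 0.0
"""

if __name__ == "__main__":
    for gene_id, value in parse_quantification(sample).items():
        print(strip_version(gene_id), value)
```

Stripping the version suffix is often useful when joining these IDs against an annotation table that lists unversioned gene IDs.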
11
votes
3 answers
How to extract RNA sequence and secondary structure restraints from a PDB file?
I'm trying to find a programmatic way to automatically extract the following information from a PDB file:
RNA sequence
Secondary structure restraints in bracket format, e.g. . (( . ( . ) . ))
Does software exist that can take a PDB file as input…
Peter
- 353
- 1
- 8
11
votes
3 answers
What is the index fastq file (sample_I*.fastq.gz) generated when demultiplexing Illumina paired-end runs?
What is the index fastq file that comes with some Illumina sequencing datasets? (The samplename_I*.fastq.gz file.)
For example, I recently received some 10X Chromium reads for two libraries sequenced on the same lane. This was a 2x150 sequencing…
conchoecia
- 3,141
- 2
- 16
- 40
11
votes
1 answer
Can I create a CRAM file with a relative reference path?
I’m trying to create a CRAM file that stores its path to the FASTA reference as a relative path, rather than an absolute path, so that I can move the files around. Unfortunately I can’t get this to work; I was expecting the following to work:
⟩⟩⟩…
Konrad Rudolph
- 4,845
- 14
- 45
11
votes
4 answers
What are the pros and cons of the different basecallers in Oxford Nanopore Technology Sequencing?
What are the pros and cons of the different basecallers in Oxford Nanopore Technology Sequencing?
I am about to start a MinION run on my laptop. What should I consider when choosing my basecaller? Can I let MinION do its sequencing and generate…
Biomagician
- 2,459
- 16
- 30
11
votes
1 answer
Changing the record id in a FASTA file using BioPython
I have the following FASTA file, original.fasta:
>foo
GCTCACACATAGTTGATGCAGATGTTGAATTCACTATGAGGTGGGAGGATGTAGGGCCA
I need to change the record id from foo to bar, so I wrote the following code:
from Bio import SeqIO
original_file =…
BioGeek
- 496
- 5
- 15
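For comparison with the BioPython approach the question asks about, here is a standard-library-only sketch of the same task: replacing the record ID `foo` with `bar` in FASTA text. It does not use `Bio.SeqIO` and is not the asker's code; `rename_fasta_id` is a hypothetical helper that rewrites only the first token of each matching header line.

```python
def rename_fasta_id(fasta_text: str, old_id: str, new_id: str) -> str:
    """Return FASTA text with headers whose ID equals old_id renamed to new_id.

    The ID is taken to be the first whitespace-delimited token after '>'.
    Any description following the ID is kept unchanged.
    """
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            parts = line[1:].split(None, 1)
            if parts and parts[0] == old_id:
                parts[0] = new_id
                line = ">" + " ".join(parts)
        out.append(line)
    return "\n".join(out) + "\n"


# The record shown in the question excerpt.
original = (
    ">foo\n"
    "GCTCACACATAGTTGATGCAGATGTTGAATTCACTATGAGGTGGGAGGATGTAGGGCCA\n"
)

if __name__ == "__main__":
    print(rename_fasta_id(original, "foo", "bar"), end="")
```

Because the header is edited as text, sequence lines pass through untouched, including any line wrapping in the input.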
11
votes
1 answer
State of the art mutation simulation software
There are many features affecting mutation probabilities, e.g. CpG mutations are 10-fold more likely than other types of mutations.
Is there a model (preferably with software) which can take two aligned genomic regions, estimate parameters of the…
Iakov Davydov
- 2,695
- 1
- 13
- 34
11
votes
1 answer
Why does a very strong BLAST hit get lost when I change the num_alignments, num_descriptions or max_target_seqs parameters?
Disclaimer: This is a self-answered question for documentation purposes, adapted from the following GitHub gist, drawing especially on users terrycojones and peterjc, as well as sujaikumar, who raised the issue.
I have a strange situation. I have…
voiDnyx
- 401
- 2
- 12
11
votes
3 answers
Converting a VCF into a FASTA given a reference with Python or R
I am interested in converting a VCF file into a FASTA file given a reference sequence with Python or R.
Samtools/BCFtools (Heng Li) provides a Perl script, vcfutils.pl, which does this in the function vcf2fq (lines 469-528).
This script has been…
ShanZhengYang
- 1,691
- 1
- 14
- 20
11
votes
1 answer
Quantifying reads mapping to multiple loci
I have been using STAR for our RNA-Seq samples. The final.out log file reports percentage of uniquely mapped reads along with percentage of reads that map to multiple loci (less than or equal to 10) and percentage of reads mapping to too many loci…
rightskewed
- 991
- 8
- 17