Most Popular
1500 questions
10
votes
2 answers
Extract sequence context of high-degree nodes in assembly graphs
I often use metaSPAdes to assemble short reads from human microbiomes. My simplified understanding of short-read de Bruijn graph assemblers is that they fail where ambiguous paths cannot be resolved. While it can be said that these points of failure…
acvill
- 613
- 3
- 12
9
votes
2 answers
Run kallisto iteratively across many samples
I am on a Mac using UNIX. I am trying to use the kallisto quant command on all files in a directory (instead of manually entering them). Because I'm running the analysis against the same index file, I first enter the following:
./kallisto index -i…
ZincFingers
- 301
- 3
- 7
9
votes
3 answers
PDB format: remark number for free text
I would like to add free text to PDB files that I'm processing with my tool, rna-pdb-tools. Someone pointed out that the way I'm doing it right now is not correct (https://github.com/mmagnus/rna-pdb-tools/issues/48).
I use HEADER right now which is not…
Marcin Magnus
- 676
- 3
- 11
9
votes
1 answer
Aligning many long sequences
I'm faced with having to align many (several hundred) bacterial genomes, each millions of bases long. Obviously, this is beyond normal alignment techniques and it's unclear to me what the best practice is for such…
agapow
- 788
- 3
- 11
9
votes
3 answers
Visualisation of long read RNA-Seq splicing
I have a dataset of Oxford Nanopore cDNA reads. Many of my reads are full-length or close to full-length transcripts, and I am interested in examining alternative splicing. For this, I would like to begin by visualising my reads and comparing…
Scott Gigante
- 2,133
- 1
- 13
- 32
9
votes
1 answer
Questions regarding Nanopore sequencing analysis
I am new to Nanopore sequencing analysis and have a couple of questions:
How do I know if my fast5 file is a multi-read or single-read file?
Is there a way to combine all the fast5 files into a single fast5 file?
How…
aayushraman
- 91
- 2
9
votes
5 answers
Converter between PDB or mmCIF and MMTF
I'd like to test MMTF, a new format for storing biomolecular structures which is promoted by RCSB as a more compact alternative to mmCIF and PDB.
From MMTF FAQ:
How do I convert a PDBx/mmCIF file to an MMTF file?
The BioJava library contains…
marcin
- 1,261
- 7
- 14
9
votes
3 answers
Running Snakemake in one single conda env
I have been experimenting a lot lately with Snakemake, and I love it. Recently I also switched to using conda (--use-conda) in the way that is advertised. However, I have some issues with it, mostly related to the way we work. I work in a lab that…
Freek
- 563
- 4
- 11
9
votes
2 answers
How do I generate a variant list (i.e. VCF file) using Illumina reads from a human genome?
This is a problem I have to solve frequently, and I'd be interested in knowing what other methods people use to solve the same problem.
About twice a year, I get asked to determine variants from Illumina reads, usually from either mouse or human.…
gringer
- 14,012
- 5
- 23
- 79
9
votes
5 answers
How can longest isoforms (per gene) be extracted from a FASTA file?
Is there a convenient way to extract the longest isoforms from a transcriptome FASTA file? I found some scripts on Biostars, but none of them run as-is and I'm having difficulty getting them to work.
I'm aware that the longest isoforms aren't…
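A minimal sketch of one way to approach the question above, in plain Python with no external libraries. How a gene ID is derived from a FASTA header depends entirely on the assembler or annotation, so the `gene_of` callback is a hypothetical helper the caller must supply; `read_fasta` and `longest_isoforms` are illustrative names, not part of any existing tool.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def longest_isoforms(records, gene_of):
    """Keep the longest sequence per gene.

    gene_of is a caller-supplied (hypothetical) function mapping a
    header to a gene ID; ties keep the first isoform encountered.
    """
    best = {}
    for header, seq in records:
        gene = gene_of(header)
        if gene not in best or len(seq) > len(best[gene][1]):
            best[gene] = (header, seq)
    return best
```

For example, if isoforms were named like `g1_i1` and `g1_i2` (an assumed convention), `gene_of` could be `lambda h: h.split("_i")[0]`, and `read_fasta` could be fed an open file handle.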
ZincFingers
- 301
- 3
- 7
9
votes
1 answer
Linear models of complex diseases
In transcriptomics, a popular framework for analyzing differences between groups, whether experimental conditions or diseases, is the linear model (limma is a popular choice).
For instance, we have a disease D with three stages as defined by clinicians, A, B and…
llrs
- 4,693
- 1
- 18
- 42
9
votes
2 answers
How to use Python to count k-mers?
I have some FASTQ sequence files and a FASTA file for some regions I'm interested in.
I would like to:
Build an index for the FASTA file
Use the index to count the number of k-mers occurring in my sequence files
I know how to do this in many k-mer…
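The two steps listed in the question above can be sketched in plain Python as follows. This is only an illustration under stated assumptions: sequences are already loaded as strings (FASTA/FASTQ parsing is left out), only the forward strand is considered, and the function names `build_kmer_index` and `count_indexed_kmers` are made up for this example rather than taken from any k-mer counting tool.

```python
from collections import Counter


def build_kmer_index(sequences, k):
    """Collect every k-mer found in the regions of interest into a set."""
    index = set()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            index.add(seq[i:i + k])
    return index


def count_indexed_kmers(reads, index, k):
    """Count occurrences in the reads of k-mers that are in the index."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if kmer in index:
                counts[kmer] += 1
    return counts
```

A set-based index keeps membership checks constant-time on average, but memory grows with the number of distinct k-mers, so this approach suits a small set of regions rather than a whole genome.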
SmallChess
- 2,699
- 3
- 19
- 35
9
votes
2 answers
calling diploid SNVs from long reads
I'd like to call diploid SNV variants from long-read data (~80 PacBio SMRT cells).
I have generated a draft reference genome for an individual from a heterozygous (~4%) species (Canu+Haplomerger2).
I can use this reference for some…
aechchiki
- 2,676
- 11
- 34
9
votes
4 answers
Double-counting coverage of overlapped read pairs
EDIT: I do not want to make any modifications to the mapped reads, I simply want to ignore one read in a read pair if they overlap the same region.
I used samtools depth to calculate the depth of coverage for samples in the whole exome region using…
d_kennetz
- 631
- 5
- 17
9
votes
2 answers
samtools depth print out all positions
I am trying to use samtools depth (v1.4) with the -a option and a bed file listing the human chromosomes chr1-chr22, chrX, chrY, and chrM to print out the coverage at every position:
cat GRCh38.karyo.bed | awk '{print $3}' | datamash sum…
719016
- 2,324
- 13
- 19