Most Popular
1500 questions
4
votes
1 answer
Tree Building Algorithm that treats gaps as deletions
I'm part of a nanopore sequencing experiment that will sequence several generations of viruses. The intent is to perform directed evolution by putting selective pressure on these viruses and tracking the various mutations that occur. See here for…
CCranney
- 51
- 3
4
votes
1 answer
How can I get the duration of the events in Nanopore?
PacBio has the concept of IPD (InterPulse Duration), which is the time between two consecutive detected pulses in the raw signal. I have been trying to extract this value from Nanopore data, but I cannot find it in the .fast5 files. But this info…
Marjan
- 309
- 1
- 6
4
votes
0 answers
Error occurrence after merging files with bcftools: wrong number of fields?
This question was also asked on Biostars
I have multiple VCF files of CASES and CONTROLS variants annotated by VEP, SnpEff and SnpSift.
First pair of VCFs -> variants only | CASES and CONTROLS
Second pair of VCFs -> variants + SnpEff | CASES and…
L.Diago
- 141
- 2
4
votes
2 answers
What can be the reason for getting negative branch lengths after BEAST analysis?
BEAST2 is currently being used for tree reconstruction prior to phylogeographic analysis. The sample size and loci are described below.
I thought that BEAST/BEAST2 does not allow negative branch lengths at the level of the prior distribution, so I was…
Vovin
- 435
- 2
- 10
4
votes
1 answer
On the same strand, for the same gene, can exons be overlapping?
I want to get a set of exon regions for each protein coding gene.
I extracted a set of relevant information (chromosome, start, end, gene ID, gene name, gene type, exon number and exon ID) from a GTF using:
awk '$3 == "exon"' annotation.gtf \
|…
gc5
- 1,783
- 18
- 32
4
votes
1 answer
samtools view command not found error
When I tried to use samtools to split a bam file based on different chromosomes, I used this command:
samtools view input.bam -b chr21 | chr21.bam
However, I get error messages like this:
-bash: chr21.bam: command not found
[W::hts_idx_load3] The…
Scott XU
- 135
- 1
- 5
4
votes
0 answers
How to summarize multiple exon copy numbers into copy number of the corresponding gene
I have a matrix, sample by exon, containing a copy number value for each pair (sample, exon). I would like to generate a second matrix, sample by gene, where the copy number of the exons is consolidated in a single value per gene.
Due to the nature…
gc5
- 1,783
- 18
- 32
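A minimal sketch of one way the exon-to-gene consolidation described above could be set up, assuming the matrix is held as nested dictionaries and that a per-gene median is an acceptable summary. The excerpt does not say which statistic is wanted, so the median, the function name `summarize_by_gene`, and the `exon_to_gene` mapping are illustrative assumptions, not the asker's method.

```python
from statistics import median


def summarize_by_gene(exon_cn, exon_to_gene, reducer=median):
    """Collapse a sample-by-exon table into a sample-by-gene table.

    exon_cn      -- {sample: {exon_id: copy_number}} (hypothetical layout)
    exon_to_gene -- {exon_id: gene_id} (hypothetical mapping, e.g. from a GTF)
    reducer      -- function applied to the exon values of one gene;
                    the median is an assumption, not a recommendation
    """
    gene_cn = {}
    for sample, exons in exon_cn.items():
        # Group the exon-level values of this sample by their gene.
        grouped = {}
        for exon_id, value in exons.items():
            grouped.setdefault(exon_to_gene[exon_id], []).append(value)
        # Reduce each group of exon values to a single value per gene.
        gene_cn[sample] = {gene: reducer(values) for gene, values in grouped.items()}
    return gene_cn
```

Passing a different `reducer` (for example `max` or `statistics.mean`) changes how disagreeing exons within one gene are resolved, which is the real design decision here.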
4
votes
1 answer
BWA mem | samtools view: Intermittent parsing error
Update
The issue was that bwa was running out of memory and failing, but that error wasn't floating to the top (see @Steve's answer, below). I was getting an error from samtools. I should have been able to figure that out, but hopefully this post…
Mark Ebbert
- 1,354
- 10
- 22
4
votes
1 answer
Programmatic way of accessing the FASTA sequence similarity tool (or similar) in Python
I am looking for a tool that performs a sequence similarity algorithm for proteins. More specifically, I am looking for something that would be usable in Python (or anything else usable in the command line actually) which has a similar output to the…
CubeHead
- 425
- 2
- 8
4
votes
1 answer
LD Score Regression Derivation hard to follow
I am trying to understand the derivations from Sullivan et al. (2015) in the Supplementary Material. There, it is mentioned in the first page that the least squares estimate of the j-th SNP effect, considering the polygenicity linear model $φ=Xβ +…
Vasilis Lemonidis
- 93
- 5
4
votes
1 answer
Co-occurrence networks in Metagenomics studies
I have recently acquired some 16S metagenomics data, and was wondering if anyone can speak of the potential limitations, challenges as well as advantages to conducting a network-based study on metagenomics, such as what is done w.r.t co-occurrence…
h3ab74
- 836
- 5
- 14
4
votes
2 answers
Arrange ggplot Figure for scRNA-seq data
I have generated a ggplot for 8 single-cell libraries, with the purpose of visualizing the tSNE facet plot by sample, colored by cell type -- with percentages. The best I could get to is this -
however, it looks too crowded, and I also want the…
ShaniS
- 51
- 1
4
votes
1 answer
Is the information stored in the PDB files in the Protein Data Bank complete or incomplete?
I know that there are thousands of PDB files stored in Protein Data Banks.
Are all these files complete in terms of the information they store?
If YES, do the individual files get updated from time to time?
If YES, why/when do they get updated?
user366312
- 654
- 2
- 14
4
votes
1 answer
Assign multiple taxids to a sequence when constructing a local BLAST database
I recently had a script fail due to poor handling of BLAST output. The BLAST -outfmt staxids field usually returns a single taxid, but occasionally it returns two or more taxids separated by a semicolon, such as 556514;701533. Fixing the script to…
Daniel Standage
- 5,080
- 15
- 50
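A minimal sketch of defensive parsing for the `staxids` field mentioned above, assuming tab-separated BLAST output. The column position, the treatment of `N/A` as "no taxid", and the function names are assumptions for illustration; the excerpt only states that the field can hold several taxids separated by semicolons.

```python
def parse_staxids(field):
    """Split a staxids value such as '556514;701533' into a list of ints."""
    taxids = []
    for token in field.split(";"):
        token = token.strip()
        # Assumption: 'N/A' or an empty token means no taxid was assigned.
        if token and token != "N/A":
            taxids.append(int(token))
    return taxids


def taxids_by_query(lines, staxids_column=2):
    """Collect the set of subject taxids seen for each query.

    staxids_column is a hypothetical 0-based index; it depends on the
    -outfmt string that was actually used.
    """
    hits = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        hits.setdefault(fields[0], set()).update(parse_staxids(fields[staxids_column]))
    return hits
```

Always returning a list, even for a single taxid, means downstream code does not need a special case for the multi-taxid rows.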
4
votes
1 answer
Updating the GFF3 + FASTA to GenBank code
I'm trying to convert a GFF3 file and a FASTA file into a GenBank (.gbk) file for use in Mauve. I've found a solution, but the code is outdated:
"""Convert a GFF and associated FASTA file into GenBank format.
Usage:
gff_to_genbank.py
raysteven
- 51
- 7