Most Popular
1500 questions
4
votes
2 answers
Motivation behind the neighbor-joining distance matrix recomputation
In each iteration of the neighbour-joining method of phylogenetic tree construction, after joining the nearest neighbours, the (additive) distance matrix $D_{m \times m}$ is recomputed as
$$Q_{ij} = (m-2)d_{ij} - \Bigg( \sum_{k \neq i} d_{ik} \Bigg)…
Gogis
- 143
- 4
4
votes
1 answer
Passing custom distance matrix to T_Coffee
I am passing a custom distance matrix to T_Coffee (written in C), as per the docs, but can't seem to get it to work.
The documentation says to pass a matrix like this using BLAST…
sin tribu
- 221
- 1
- 5
4
votes
0 answers
PAM50 gene expression classification
I'm looking at running the PAM50 classifier on RNA-Seq data from 138 breast cancer samples. However, the R package (genefu) that's useful for this does not have a particularly helpful vignette when it comes to processing the RNA-Seq counts, because it uses…
user36196
- 291
- 1
- 6
4
votes
1 answer
What does the number mean in an HGVSp annotation?
Let's take the example p.Arg452Pro that I got from an annotated VCF file that contains an HGVSp column.
What does 452 mean in this case?
Theoretically, is it possible to have the same HGVSp on the same gene but on a different transcript level…
user324810
- 1,115
- 5
- 21
4
votes
1 answer
Is there a tool that can perform a read-group-aware mpileup from a single file?
I would like to perform a samtools mpileup from a single file that contains thousands of read groups with different SM tags. I could split the BAM by read group using samtools split and then perform the mpileup, but splitting into thousands of…
winni2k
- 2,266
- 11
- 28
4
votes
2 answers
Downloading all COI sequences from BOLD fails
I have metabarcoding sequence data (COI) from bulk animal samples (including Arthropoda, Nematoda, Annelida and Mollusca) and I want to BLAST all of these sequences. I used the following command to do this: blastn -remote -db nt -query COI_all.fasta…
Robvh
- 133
- 1
- 10
4
votes
1 answer
CCP4 file to a Python 3 numpy array or similar workaround
I would like to merge several CCP4-formatted density maps (and do a few minor things).
Ideally, I would like to open the CCP4/MRC files as numpy arrays in Python 3 and save the resulting array as a CCP4 file. CCTBX can open ccp4 maps as numpy…
Matteo Ferla
- 4,234
- 5
- 19
4
votes
1 answer
conda doesn't install latest version of snakemake
I would like to use the --default-resources parameter for profiles which is available in later versions of snakemake.
To install snakemake, I created a new conda environment:
conda create -n snakemake -c conda-forge -c bioconda snakemake
The…
mrhd
- 363
- 1
- 7
4
votes
2 answers
How can I classify the 3 clades (S, G, V) of the coronavirus without using protein data?
On GISAID, the coronavirus is classified into 4 clades (S, G, V, Other).
I downloaded around 1,000 complete genomes of the coronavirus from GISAID and I would like to classify each one as belonging to one of the 4 clades (S, G, V, Other).
On the top…
yuval
- 141
- 5
4
votes
1 answer
How did researchers derive the Ramachandran "validation" contours?
I'm a beginner in structural biology and, for fun, calculated the torsional angles of some 100,000 proteins. Here is my Ramachandran plot:
When I went to look for "canonical" Ramachandran plots, I discovered most researchers overlay a "contour"…
batlike
- 141
- 3
4
votes
1 answer
Comparing phylogenetic models with different datasets
I'm a linguist interested in phylogenetic tree inference using language data. I'm posting here because I'm using Bayesian phylogenetic methods in my work (probably using BEAST and/or RevBayes). For the purposes of this question, please accept the…
JaydenM-C
- 149
- 3
4
votes
1 answer
What is a proper way for random subsampling of metagenomic data?
Let's say we have a metagenomic sample consisting of paired-end FASTQ files with 10,000,000 DNA reads collected by shotgun sequencing.
How would one make a random subsample of this metagenomic sample with, for example, 1,000,000 reads? I…
Remy
- 53
- 4
4
votes
1 answer
What is the aim of insertion codes in the PDB file format?
In PDB files we regularly see insertion codes (ic) in a few proteins. Could anybody please tell me why they have been added to the structure and what their purpose is?
user7355
- 41
- 2
4
votes
3 answers
Do additional peaks in percent GC of PacBio gDNA reads indicate contamination?
I have two sets of PacBio reads from genomic DNA of an Aspergillus species that were made from separate preps of the culture. One of them has two additional peaks at 38% and 60% in the percent GC histogram produced by FastQC. Do these additional…
brian
- 41
- 1
4
votes
3 answers
What are the different kinds of bioluminescent genes?
I know of the common green glow gene, but I forgot its name, and I also know that some algae glow blue. There are so many types of bioluminescent organisms, so I am wondering which species have which genes and what other genes are associated with their…
TristanSC90
- 53
- 6