Most Popular

1500 questions
9
votes
3 answers

A new paper suggests the coronavirus has "Uncanny similarity of unique inserts in the 2019-nCoV spike protein to HIV-1" - What does this mean?

Quote: We found 4 insertions in the spike glycoprotein (S) which are unique to the 2019-nCoV and are not present in other coronaviruses. Importantly, amino acid residues in all the 4 inserts have identity or similarity to those in the HIV-1 gp120…
SurpriseDog
9
votes
1 answer

Large-scale gRNA design for a CRISPR screen

What are the best tools to design gRNAs in a high-throughput way for a CRISPR screen, e.g. targeting all protein-coding genes in a genome? I would like to take into account possible off-target effects, as well as to allow for flexibility in the PAM…
Sarah Carl
8
votes
2 answers

How to isolate genes from whole genomes for phylogenetic tree analysis?

I have 446 whole Klebsiella pneumoniae genomes I want to build a phylogenetic tree from. After reading about constructing phylogenetic trees, it seems the only option for large numbers of genomes is to isolate a gene with low variability from…
Daniel Harris
8
votes
3 answers

How to quickly determine mutations in a read of a SAM file?

After DNA sequencing, I generated a SAM file by aligning a FASTQ file. Before using well-known variant calling programs (e.g. Annovar), I want to pick some reads and see what kinds of mutations they contain. Is there any quick way of…
user345394
8
votes
3 answers

A tool or webserver for building PSSM matrix

I have some protein sequences and I want to build a position-specific scoring matrix (PSSM) for them and then upload this PSSM to NCBI PSI-BLAST. I used the CHAPS program for this purpose, but uploading the output PSSM gave me an error in NCBI PSI-BLAST.…
Sara
8
votes
3 answers

How can I read FCS files using open source libraries?

FCS is a patented data format used for storing flow cytometry data. The most recent version is FCS3.1. There is some documentation on the format, but there is no information on how to read these files. There are some R packages and a MATLAB code to…
WYSIWYG
8
votes
3 answers

How to convert the .vcf (imputed) file with GT:GP format to GT:DS?

I have genotype data from impute2 output in .gen format (imputed to 1000G P3). The file has genotype posterior probabilities (GP: 3 values per variant). I have converted .gen to .vcf using qctools, and the .vcf file has GT:GP format. I need to…
Nilufer
8
votes
3 answers

How to merge SAM files together while adding read groups

I have three sequencing libraries of a single individual mapped to a reference using bwa-mem. I would like to merge the three unsorted .sam files I have so that I can call variants and obtain heterozygosity estimates using atlas. Atlas requires one input mapping…
Kamil S Jaron
8
votes
4 answers

How to do `bedtools intersection` using pandas alone?

I have two pandas DataFrames, using Python 3.x: import pandas as pd dict1 = {0:['chr1','chr1','chr1','chr1','chr2'], 1:[1, 100, 150, 900, 1], 2:[100, 200, 500, 950, 100], 3:['feature1', 'feature2', 'feature3', 'feature4', 'feature4'], …
EB2127
8
votes
5 answers

What is the standard way to work with a diploid reference genome? Complementary strands?

At the moment, the standard reference genomes (e.g. hg19, hg38) are haploid genomes. We know that the human genome is diploid, so a diploid reference would naturally be the more correct representation of the human genome. More and more biologists are…
ShanZhengYang
8
votes
1 answer

Publicly available, free, complete database for antibiotic names and classes?

This is a tough one I think: is there a publicly available, up-to-date, free, complete database for antibiotic names and classes? I am specifically looking for information like, e.g., cefpirome (is a) cephalosporin. I've looked in a couple of…
BaCh
8
votes
2 answers

Getting a "system is computationally singular" error in sleuth

I am analysing 142 samples belonging to 6 batches. Additionally, those samples belong to 72 strains, which means that for most of the strains there are two samples. I could fit simple models (for strain and batches for instance), but when I get to…
mgalardini
8
votes
3 answers

What are the ways to keep track of branches in the analysis?

I'm going through an RNA-seq pipeline in R/Bioconductor and want to try multiple parameters at subsequent steps, for example, running clustering with different settings, running RegressOut or not on unwanted effects etc. That's a lot of "versions",…
Peter
8
votes
1 answer

What is the etymology of "Entrez ID"?

Ever since I first saw NCBI gene names called "Entrez ID", I have wondered where the name comes from. Such a weird name! Does anybody know where it originates? My hypothesis is: in French, "entrez" can be translated to "please enter"…
francoiskroll
8
votes
1 answer

Lower mapping rates in salmon v0.13 compared to previous versions

Hi there :) Thanks for the tool! I recently updated to the new salmon (from 0.8... it's been a couple of years) and I noticed that my mapping percentages changed dramatically between the two versions. For example, using the default settings in v0.8, I…