Most Popular

1500 questions
5
votes
2 answers

How do population genetics people define a population?

How do population genetics people define a population? Do they define it as a layperson would, say as Africans, Americans, and so on? Or is there a more scientific way of doing so? For example, I think defining one's population as one's allele frequencies…
Haohan Wang
  • 521
  • 3
  • 8
5
votes
1 answer

What is the correct way of dealing with the analysis of data from flow cytometry?

I would like to detect a change in the expression of a molecule present on a cell type by flow cytometry. Assume I am able to detect, using an antibody, a signal that represents the amount of the molecule I'm interested in. Also assume that the…
gabt
  • 348
  • 2
  • 13
5
votes
1 answer

Python package for NNI neighbors

I am working on protein sequence data files to reconstruct a phylogenetic tree, and I need to generate all NNI-neighbours of the tree (two trees are NNI-neighbours if one can be transformed into the other by one nearest neighbour interchange…
Sidra Younas
  • 503
  • 2
  • 13
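
The NNI-neighbour definition in the excerpt above can be illustrated with a minimal sketch. It is not taken from the question or its answers. It assumes a rooted, strictly binary tree written as nested 2-tuples with string leaves; the function name `nni_neighbours` is hypothetical, and no attempt is made to merge neighbours that are equivalent once the tree is treated as unrooted.

```python
def nni_neighbours(tree):
    """Return every tree one NNI move away from a rooted binary tree.

    Assumption: leaves are strings, internal nodes are (left, right) tuples.
    """
    if not isinstance(tree, tuple):
        return []  # a leaf has no internal edge to rearrange

    left, right = tree
    neighbours = []

    # Internal edge between this node and its left child (a, b):
    # exchange the sibling subtree `right` with b, then with a.
    if isinstance(left, tuple):
        a, b = left
        neighbours.append(((a, right), b))
        neighbours.append(((right, b), a))
        # Moves on edges deeper inside the left subtree.
        neighbours.extend((n, right) for n in nni_neighbours(left))

    # Internal edge between this node and its right child (c, d):
    # exchange the sibling subtree `left` with c, then with d.
    if isinstance(right, tuple):
        c, d = right
        neighbours.append((c, (left, d)))
        neighbours.append((d, (c, left)))
        # Moves on edges deeper inside the right subtree.
        neighbours.extend((left, n) for n in nni_neighbours(right))

    return neighbours


if __name__ == "__main__":
    for neighbour in nni_neighbours((("A", "B"), ("C", "D"))):
        print(neighbour)
```

Each internal edge contributes two rearrangements, so a rooted binary tree with n leaves yields 2(n - 2) neighbours under this representation.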
5
votes
3 answers

How to automate NCBI genome download

I need to download the GenBank files (.gbff) of all completely assembled cyanobacterial genomes from NCBI (RefSeq or INSDC FTP data). I think the steps are: find the completely assembled genomes; find the GenBank file URL based on the…
Arijit Panda
  • 285
  • 1
  • 8
5
votes
2 answers

Counting letters in phylip alignment columns with Biopython

I have been using Python 3.6 and Biopython 1.72 to work with protein data files. I am using a protein sequence file (phylip format), for example: 14 678 Zebrafish LSSCGVVSGD LISVILPASS LEETQTSSAA AHQTHTDQQA GGSHVSSSSS Fugu LASCGIVSGD…
Sidra Younas
  • 503
  • 2
  • 13
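
As a rough illustration of what counting letters per alignment column involves, here is a minimal sketch in plain Python. It is not the answer given to the question. The helper name `column_counts` is hypothetical, the first two toy sequences are the opening residues shown in the excerpt, and the third is invented for the example.

```python
from collections import Counter


def column_counts(sequences):
    """Count how often each residue occurs in every alignment column.

    Assumption: `sequences` is a list of aligned strings of equal length.
    """
    lengths = {len(seq) for seq in sequences}
    if len(lengths) > 1:
        raise ValueError("aligned sequences must all have the same length")
    # zip(*sequences) walks the alignment column by column.
    return [Counter(column) for column in zip(*sequences)]


if __name__ == "__main__":
    toy_alignment = ["LSSCG", "LASCG", "LSTCG"]  # third row is made up
    for index, counts in enumerate(column_counts(toy_alignment)):
        print(index, dict(counts))
```

With Biopython, the sequences could instead come from an alignment read with `Bio.AlignIO.read(path, "phylip")`, passing `str(record.seq)` for each record; that step is left out here so the sketch runs without extra dependencies.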
5
votes
1 answer

At what stage of a transcriptome assembly is it better to filter out contaminant reads?

I'm trying to assemble a bivalve transcriptome. Since bivalves are filter feeders, their transcriptomes tend to be highly contaminated by bacteria, algae and whatnot. Since I pooled several transcriptomes, I have a high amount of reads (>2B reads).…
LinuxBlanket
  • 309
  • 1
  • 10
5
votes
3 answers

Sampling haplotypes

I am trying to simulate the genomes of different people. I have data (VCF files) for various genes from the 1000K Gene project. I want to simulate different whole genomes, i.e. generate a new population by combining the real haplotypes I have. I am wondering…
Kozolovska
  • 241
  • 1
  • 4
5
votes
1 answer

Should PCA be standardized for gene expression?

This is a theory/good practice question more than a technical one. If samples are being plotted on a PCA projection of gene expression data, I'm wondering whether it is standard (and if so, why) to center and scale the PCs. The reason I ask is that…
5
votes
1 answer

How to interpret PCA output statistically and biologically?

How can I interpret the PCA results statistically for biological data? I have used the FactoMineR and factoextra libraries for PCA. Scripts used: library(FactoMineR) res.PCA = PCA(df, scale.unit=TRUE, ncp=4, graph=F ) par(mfrow=c(1,2)) plot.PCA(res.PCA,…
Dendrobium
  • 187
  • 3
5
votes
2 answers

How to scale the size of a heat map and the font size of its row names?

I have an expression data matrix (120 × 15; 120 genes and 15 samples). My heatmap looks blurred, and the row names (gene names) are too small to read. How can I improve my script? Here is the example data: df<-structure(list(X_T0 =…
Kynda
  • 95
  • 1
  • 1
  • 6
5
votes
1 answer

Simplest way to work out structural variant type?

In VCF 4.2, a structural variant (SV) can be described with the BND keyword in SVTYPE. For example, the following is an insertion (from https://samtools.github.io/hts-specs/VCFv4.2.pdf): #CHROM POS ID REF ALT QUAL…
SmallChess
  • 2,699
  • 3
  • 19
  • 35
5
votes
2 answers

How to run MaxQuant in command line mode?

MaxQuant is a software package for mass spectrometry and proteomics. There is a Windows version and a Linux version. To run it on Linux you have to use a program called Mono. I think it is developed by Microsoft, which I find quite nice from…
Soerendip
  • 1,295
  • 11
  • 22
5
votes
4 answers

tRNAscan-SE error: FATAL: Unable to find /usr/local/bin/cmsearch executable

I have downloaded tRNAscan-SE from here. After decompressing and untarring the file, I installed it using: ./configure make make install When I type tRNAscan-SE --help, I get the help page: tRNAscan-SE 2.0 (December 2017) FATAL: No sequence file(s)…
Biomagician
  • 2,459
  • 16
  • 30
5
votes
1 answer

Find paralogs in a draft genome

We generated a (diploid, chordate, highly heterozygous) genome using PacBio and we wanted to see whether it contains lineage-specific duplications (paralogs, basically). The genome is not in Ensembl yet. The only data we have at the moment…
aechchiki
  • 2,676
  • 11
  • 34
5
votes
2 answers

No variant found using GATK 4.0 HaplotypeCaller

I am doing variant calling on RNA-seq datasets from wheat, which is hexaploid. The binary alignment (BAM) files were created using STAR version 2.6.0c, and variant calling was done using GATK 4.0 HaplotypeCaller. The whole pipeline is as…