Most Popular

1500 questions
4
votes
1 answer

Counting hexamers in FASTA sequences and identifying their structure (and interruptions)

I have a lot of FASTA files, each one with thousands of reads containing the hexameric motif "CCCTCT". The hexameric motif is highly continuous in most cases but interruptions may occur. I need to count the hexameric motif keeping the read ID and…
Amaru
  • 41
  • 2
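One way to approach the hexamer question above can be sketched as follows. This is a minimal sketch, not the accepted answer: it assumes plain FASTA text, exact (mismatch-free) copies of the motif, and uses hypothetical helper names (`parse_fasta`, `motif_structure`). Runs of contiguous motif copies are collapsed into `(CCCTCT)n`, and anything between runs is reported verbatim as an interruption.

```python
import re
from typing import Iterator, Tuple

MOTIF = "CCCTCT"  # motif named in the question


def parse_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (read_id, sequence) pairs from FASTA-formatted text."""
    read_id, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if read_id is not None:
                yield read_id, "".join(chunks)
            # Keep only the first token of the header as the read ID.
            read_id, chunks = line[1:].split()[0], []
        else:
            chunks.append(line.upper())
    if read_id is not None:
        yield read_id, "".join(chunks)


def motif_structure(seq: str, motif: str = MOTIF) -> Tuple[int, str]:
    """Return (total exact motif copies, structure string).

    Hypothetical helper: contiguous runs become '(MOTIF)n' and the
    bases between runs are kept as-is to show the interruptions.
    """
    pattern = re.compile(f"((?:{re.escape(motif)})+)")
    parts, total, pos = [], 0, 0
    for match in pattern.finditer(seq):
        if match.start() > pos:
            parts.append(seq[pos:match.start()])  # interruption
        copies = len(match.group(1)) // len(motif)
        total += copies
        parts.append(f"({motif}){copies}")
        pos = match.end()
    if pos < len(seq):
        parts.append(seq[pos:])  # trailing non-motif bases
    return total, " ".join(parts)


if __name__ == "__main__":
    example = ">read1 demo\nCCCTCTCCCTCTCCCTCTAG\nCCCTCTCCCTCT\n"
    for rid, seq in parse_fasta(example):
        print(rid, *motif_structure(seq), sep="\t")
```

Because the scan only recognizes exact copies, a single-base insertion or deletion shows up as an interruption followed by a new run; whether that is the desired behaviour depends on how interruptions are defined for the data.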
4
votes
3 answers

Variant vs Allele vs SNP

Coming from a CS background. Reading through the Wikipedia page, these all sound like the same thing: Variant, Allele, and SNP. Variant/Allele/SNP: Some gene locus that differs from the ideal human. For example, if 99% of humans have a T at some…
4
votes
0 answers

How to get phylogenetic tree from multiple genes?

I constructed a phylogenetic tree using a single gene (for example, secA). I had to gather the same gene sequence for all the required species from the public NCBI database and then constructed the tree after multiple sequence alignment. I used the MEGA X…
abelfit
  • 73
  • 4
4
votes
4 answers

Unable to open .bam file in C++ using SeqAn due to 'seqan::UnknownExtensionError'

I am trying to open .bam files in C++ to extract reads occurring at specific scaffolds and loci. I essentially want to call "samtools view sample.bam -o sample.sam scaffold:pos-pos" from C++. I have tried system("samtools view sample.bam -o…
annabelperry
  • 199
  • 1
  • 9
4
votes
1 answer

How do I get GO annotations for a list of UniProt IDs?

I have a list of UniProt ids that I want to get Gene Ontology annotations for. I need this information because I want this high-level information as an input to a neural network. The model I wish to develop is inspired by this paper:…
ChemBot
  • 43
  • 2
4
votes
1 answer

Use Kallisto in Galaxy

I want to use Kallisto for sequence alignment in Galaxy. Its description is: a program for quantifying abundances of transcripts from bulk and single-cell RNA-Seq data, or more generally of target sequences using high-throughput sequencing…
Zahrae
  • 63
  • 3
4
votes
1 answer

Programmatically retrieve Metadata from SRA Run Selector

I previously asked a question about how to retrieve the Accession List associated with a SRA project. The answer was: esearch -db sra -query 'PRJNA491191[bioproject]' | efetch -format runinfo where PRJNA491191 is the bioproject that I'm interested…
4
votes
1 answer

How do you convert Raw Alignment Score to Bit Score?

I'm coding a pipeline where I make a lot of pairwise alignments, and I end up with raw alignment scores. But, I really need to look at my results in terms of bit scores. I know that the formula is: $S' = (\lambda S - \ln K) / \ln 2$ But, I don't know what…
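The formula quoted in the question above can be sketched in a few lines. This is only an illustration: `bit_score` is a hypothetical helper name, and λ and K are the Karlin-Altschul statistical parameters, which depend on the scoring matrix and gap penalties in use, so the numbers in the usage lines are placeholders rather than recommended values.

```python
import math


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Convert a raw alignment score S to a bit score S'.

    Implements S' = (lambda * S - ln K) / ln 2.
    lam and k must come from the scoring system actually used;
    they are not derived here.
    """
    if lam <= 0 or k <= 0:
        raise ValueError("lambda and K must both be positive")
    return (lam * raw_score - math.log(k)) / math.log(2)


if __name__ == "__main__":
    # Placeholder parameters for illustration only.
    example_lambda, example_k = 0.267, 0.041
    for raw in (50, 100, 200):
        print(raw, round(bit_score(raw, example_lambda, example_k), 1))
```

Since the conversion is linear in the raw score, ranking alignments by bit score gives the same order as ranking by raw score as long as λ and K are held fixed.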
4
votes
1 answer

Annotating a .vcf with centimorgan information

Some programs (e.g. shapeit4) automatically annotate an INFO tag into a .vcf file which gives the cumulative genetic distance in cM between each SNP: ##fileformat=VCFv4.2 ##FILTER= ##fileDate=10/11/2020 -…
user438383
  • 1,679
  • 1
  • 8
  • 21
4
votes
2 answers

Use same output in two processes in Nextflow DSL2

This is my workflow: pre_align() pre_align.out.single_fastqs.view() get_fq_info(pre_align.out.single_fastqs) align_bwa(get_fq_info.out.fq_info) align_bowtie2(get_fq_info.out.fq_info) where I want to use the same output from get_fq_info as input…
aerijman
  • 645
  • 5
  • 14
4
votes
1 answer

Bruker MALDI-TOF bacteria species identification scoring algorithm

Just wondering whether anyone can point me to a research paper which describes how the scoring values are generated by the Bruker software for species identification of bacteria in MALDI-TOF MS. For example, there are countless papers describing the…
There
  • 151
  • 3
4
votes
0 answers

How can I use statistics to compare microbial phenotypes?

Note: this question has also been asked on Biostars. I am currently trying to create a theoretical argument that a microbe's phenotype can affect gene expression in its host. I have 5 species of microbes, each with a different COG (Cluster of…
4
votes
1 answer

How is the odds ratio of disease risk conferred by a 1-standard deviation increase in PRS calculated?

One standard deviation from the mean is commonly used to calculate a polygenic risk score for GWAS, e.g. human genetic disease. Why is this a common metric, for example why not 2-SD or 1.96 SD as in the normal distribution and what is the…
Ramiro Magno
  • 165
  • 1
  • 7
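For the PRS question above, the usual arithmetic can be sketched as follows, assuming a logistic regression of disease status on the score. Under that assumption, if β is the log-odds coefficient per raw PRS unit and σ is the standard deviation of the PRS in the reference sample, the odds ratio for a 1-SD increase is exp(β·σ); if the PRS was standardized before fitting, σ = 1 and the odds ratio is simply exp(β). The function name `odds_ratio_per_sd` is hypothetical.

```python
import math


def odds_ratio_per_sd(beta_per_unit: float, prs_sd: float = 1.0) -> float:
    """Odds ratio for a one-standard-deviation increase in PRS.

    Assumes a logistic model: logit(p) = alpha + beta_per_unit * PRS.
    prs_sd is the SD of the PRS in the sample used for scaling;
    leave it at 1.0 when the PRS was standardized before fitting.
    """
    if prs_sd <= 0:
        raise ValueError("prs_sd must be positive")
    return math.exp(beta_per_unit * prs_sd)


if __name__ == "__main__":
    # Illustrative numbers only, not taken from any study.
    print(odds_ratio_per_sd(beta_per_unit=0.35))
    print(odds_ratio_per_sd(beta_per_unit=2.0, prs_sd=0.2))
```

Under the same model, the odds ratio for k standard deviations is the per-SD odds ratio raised to the power k, so reporting per 1 SD loses no information relative to 2 SD or 1.96 SD.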
4
votes
1 answer

Why did expression-based subtyping of breast cancer gain much more acceptance than others?

This may not be an entirely technical question but rather an academic one. But the technique behind the application is within the scope of bioinformatics. So I will try to ask here: In each cancer type, there have been tons of papers that…
unicorn
  • 211
  • 1
  • 4
4
votes
2 answers

Why do molecular generation models maximize “penalized logP” as a measure of drug-likeness?

I found that Lipinski's rule of five states that Log P (octanol-water partition coefficient, lipophilicity measure) usually should not exceed 5. Many papers about drug discovery machine learning models describe maximizing "penalized logP",…
Slowpoke
  • 143
  • 3