Highest Voted Questions - Bioinformatics Stack Exchange

4

votes

1 answer

Counting hexamers in fasta sequence and identify its structure (and interruptions)

I have a lot of FASTA files, each one with thousands of reads containing the hexameric motif "CCCTCT". The hexameric motif is highly continuous in most cases but interruptions may occur. I need to count the hexameric motif keeping the read ID and…

asked Nov 02 '21 at 00:29

Amaru

41
2
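As a rough illustration of the task described in the excerpt above, here is a minimal Python sketch that counts tandem copies of "CCCTCT" per read and records where the repeat is interrupted. The function names `read_fasta` and `motif_structure` and the output notation `(CCCTCT)n` are assumptions made for this example; the truncated question may require a different output format.

```python
import re
from typing import Iterable, Iterator, List, Tuple

MOTIF = "CCCTCT"  # the hexamer named in the question


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (read_id, sequence) pairs from FASTA-formatted lines."""
    read_id, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if read_id is not None:
                yield read_id, "".join(chunks)
            # Keep only the first token of the header as the read ID.
            read_id, chunks = line[1:].split()[0], []
        else:
            chunks.append(line.upper())
    if read_id is not None:
        yield read_id, "".join(chunks)


def motif_structure(seq: str, motif: str = MOTIF) -> Tuple[int, List[str]]:
    """Return (total motif count, structure).

    The structure lists tandem runs as "(MOTIF)n" and every non-motif
    segment (interruptions and flanks) as the raw bases.
    """
    # Splitting on a capturing group keeps the matched runs at odd indices.
    parts = re.split(f"((?:{re.escape(motif)})+)", seq)
    total, structure = 0, []
    for i, part in enumerate(parts):
        if not part:
            continue
        if i % 2 == 1:
            n = len(part) // len(motif)
            total += n
            structure.append(f"({motif}){n}")
        else:
            structure.append(part)
    return total, structure
```

A possible use, assuming a hypothetical file `reads.fasta`, is to loop over `read_fasta(open("reads.fasta"))` and print the read ID, the count, and `" ".join(structure)` for each read. Only exact, non-overlapping copies are counted, so a copy with a single-base substitution is reported as an interruption.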

4

votes

3 answers

Variant vs Allele vs SNP

Coming from a CS background. Reading through the Wikipedia page, these all sound like the same thing: Variant, Allele, and SNP. Variant/Allele/SNP: Some gene locus that differs from the ideal human. For example, if 99% of humans have a T at some…

genetics

asked Sep 17 '21 at 04:00

Matthaeus Gaius Caesar

141
1
3

4

votes

0 answers

How to get phylogenetic tree from multiple genes?

I constructed a phylogenetic tree using a single gene (for example, secA). I had to gather the same gene sequence for all the required species from the public NCBI database and then constructed the tree after multiple sequence alignment. I used the MEGA X…

asked Sep 02 '21 at 08:04

abelfit

73
4

4

votes

4 answers

Unable to open .bam file in C++ using SeqAn due to 'seqan::UnknownExtensionError'

I am trying to open .bam files in C++ to extract reads occurring at specific scaffolds and loci. I essentially want to call "samtools view sample.bam -o sample.sam scaffold:pos-pos" from C++. I have tried system("samtools view sample.bam -o…

asked Jun 27 '21 at 13:18

annabelperry

199
1
9

4

votes

1 answer

How do I get GO annotations for a list of UniProt IDs?

I have a list of UniProt IDs that I want to get Gene Ontology annotations for. I need these annotations as high-level input to a neural network. The model I wish to develop is inspired by this paper:…

asked Jun 27 '21 at 13:14

ChemBot

43
2

4

votes

1 answer

use Kallisto in galaxy

I want to use Kallisto for sequence alignment in Galaxy. Its description is: a program for quantifying abundances of transcripts from bulk and single-cell RNA-Seq data, or more generally of target sequences using high-throughput sequencing…

asked Jun 22 '21 at 11:48

Zahrae

63
3

4

votes

1 answer

Programmatically retrieve Metadata from SRA Run Selector

I previously asked a question about how to retrieve the Accession List associated with an SRA project. The answer was: esearch -db sra -query 'PRJNA491191[bioproject]' | efetch -format runinfo where PRJNA491191 is the bioproject that I'm interested…

asked Jun 08 '21 at 13:22

An Ignorant Wanderer

545
4
12

4

votes

1 answer

How do you convert Raw Alignment Score to Bit Score?

I'm coding a pipeline where I make a lot of pairwise alignments, and I end up with raw alignment scores. But, I really need to look at my results in terms of bit scores. I know that the formula is: $S' = (\lambda S - \ln K) / \ln 2$ But, I don't know what…

asked May 29 '21 at 21:49

Peter Danilov

41
6
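The formula quoted in the excerpt above can be written as a small Python function. This is only a sketch: the name `bit_score` and its parameter names are chosen for this example, and the λ and K values must come from the statistics of the scoring system actually used (substitution matrix and gap penalties), which the excerpt does not give.

```python
import math


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Convert a raw alignment score S to a bit score S'.

    Implements S' = (lambda * S - ln K) / ln 2, where lambda and K are
    the Karlin-Altschul parameters of the scoring system.
    """
    if lam <= 0 or k <= 0:
        raise ValueError("lambda and K must both be positive")
    return (lam * raw_score - math.log(k)) / math.log(2)


# Placeholder parameters for illustration only; substitute the values
# reported for your own scoring matrix and gap penalties.
example = bit_score(100, lam=0.267, k=0.041)
```

Because λ and K depend on the scoring scheme, bit scores computed with mismatched parameters are not comparable across runs.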

4

votes

1 answer

Annotating a .vcf with centimorgan information

Some programs (e.g. shapeit4) automatically annotate an INFO tag into a .vcf file which gives the cumulative genetic distance in cM between each SNP: ##fileformat=VCFv4.2 ##FILTER= ##fileDate=10/11/2020 -…

asked May 24 '21 at 13:19

user438383

1,679
1
8
21

4

votes

2 answers

use same output in two processes in nextflow dsl2

This is my workflow: pre_align() pre_align.out.single_fastqs.view() get_fq_info(pre_align.out.single_fastqs) align_bwa(get_fq_info.out.fq_info) align_bowtie2(get_fq_info.out.fq_info) where I want to use the same output from get_fq_info as input…

nextflow

asked May 22 '21 at 12:01

aerijman

645
5
14

4

votes

1 answer

Bruker MALDI-TOF bacteria species identification scoring algorithm

Just wondering whether anyone can point me to a research paper which describes how the scoring values are generated by the Bruker software for species identification of bacteria in MALDI-TOF MS. For example, there are countless papers describing the…

asked May 17 '21 at 19:46

There

151
3

4

votes

0 answers

How can I use statistics to compare microbial phenotypes?

Note: this question has also been asked on Biostars. I am currently trying to create a theoretical argument that a microbe's phenotype can affect gene expression in its host. I have 5 species of microbes, each with a different COG (Cluster of…

asked May 13 '21 at 19:20

pythonbeginner44

81
2

4

votes

1 answer

How is the odds ratio of disease risk conferred by a 1-standard deviation increase in PRS calculated?

One standard deviation from the mean is commonly used to calculate a polygenic risk score for GWAS, e.g. human genetic disease. Why is this a common metric, for example why not 2-SD or 1.96 SD as in the normal distribution and what is the…

gwas

asked May 02 '21 at 23:41

Ramiro Magno

165
1
7

4

votes

1 answer

Why did expression based subtyping of breast cancer gain much more acceptance than others

This may not be an entirely technical question but rather an academic one. But the technique behind the application is within the scope of bioinformatics, so I will try asking here: In each cancer type, there have been tons of papers that…

asked Mar 31 '21 at 05:44

unicorn

211
1
4

4

votes

2 answers

Why do molecular generation models maximize “penalized logP” as a measure of drug-likeliness?

I found that Lipinski's rule of five states that log P (octanol-water partition coefficient, a measure of lipophilicity) usually should not exceed 5. Many papers on machine learning models for drug discovery describe maximizing "penalized logP",…

asked Mar 29 '21 at 17:21

Slowpoke

143
3
