Most Popular
1500 questions
4
votes
1 answer
Counting hexamers in FASTA sequences and identifying their structure (and interruptions)
I have many FASTA files, each with thousands of reads containing the hexameric motif "CCCTCT". The motif is usually repeated contiguously, but interruptions may occur. I need to count the hexameric motif keeping the read ID and…
Amaru
- 41
- 2
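For the hexamer-counting question above, here is a minimal sketch that assumes only exact, in-frame copies of "CCCTCT" count as motif and that everything else (flanks, interruptions, mutated copies) is reported as "other". The helper names `read_fasta`, `motif_segments` and `structure_string` are hypothetical, not from any library.

```python
import re
from typing import Iterable, Iterator, List, Tuple

MOTIF = "CCCTCT"

def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (read_id, sequence) pairs; the ID is the first word after '>'."""
    read_id, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if read_id is not None:
                yield read_id, "".join(chunks)
            read_id, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line.upper())
    if read_id is not None:
        yield read_id, "".join(chunks)

def motif_segments(seq: str, motif: str = MOTIF) -> List[Tuple[str, int, int]]:
    """Split seq into ("motif", start, n_copies) runs of exact tandem copies
    and ("other", start, n_bases) stretches (flanks or interruptions)."""
    run = re.compile(f"(?:{re.escape(motif)})+")
    segments, last_end = [], 0
    for m in run.finditer(seq):
        if m.start() > last_end:
            segments.append(("other", last_end, m.start() - last_end))
        segments.append(("motif", m.start(), (m.end() - m.start()) // len(motif)))
        last_end = m.end()
    if last_end < len(seq):
        segments.append(("other", last_end, len(seq) - last_end))
    return segments

def structure_string(seq: str, motif: str = MOTIF) -> str:
    """Compact notation such as "(CCCTCT)3 A (CCCTCT)2"."""
    parts = []
    for kind, start, size in motif_segments(seq, motif):
        # Motif runs become "(MOTIF)n"; other stretches are printed verbatim.
        parts.append(f"({motif}){size}" if kind == "motif" else seq[start:start + size])
    return " ".join(parts)
```

Looping over `read_fasta(open(path))` and printing each read ID with its total copy count and `structure_string(seq)` gives one row per read. A read that starts mid-motif will show a short "other" prefix, which is a deliberate simplification.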
4
votes
3 answers
Variant vs Allele vs SNP
I come from a CS background. Reading through the Wikipedia pages, these all sound like the same thing: variant, allele, and SNP.
Variant/Allele/SNP: some gene locus that differs from the ideal human.
For example, if 99% of humans have a T at some…
Matthaeus Gaius Caesar
- 141
- 1
- 3
4
votes
0 answers
How to get phylogenetic tree from multiple genes?
I constructed a phylogenetic tree from a single gene (e.g. secA). I gathered that gene's sequence for all the required species from the public NCBI database and built the tree after a multiple sequence alignment. I used the MEGA X…
abelfit
- 73
- 4
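For the multi-gene phylogeny question above, one common route is the concatenation (supermatrix) approach: align each gene separately, then join the per-gene alignments species by species before building a single tree. Coalescent-based species-tree methods are an alternative not shown here. This sketch assumes each alignment is a dict of species name to aligned sequence; the function name and example sequences are hypothetical.

```python
from typing import Dict, List, Tuple

def concatenate_alignments(
    alignments: List[Dict[str, str]], gap: str = "-"
) -> Tuple[Dict[str, str], List[Tuple[int, int]]]:
    """Build a supermatrix from per-gene alignments (species -> aligned sequence).

    Species missing from a gene get an all-gap block of that gene's length.
    Also returns 1-based (start, end) coordinates of each gene in the supermatrix.
    """
    species = sorted({sp for aln in alignments for sp in aln})
    pieces: Dict[str, List[str]] = {sp: [] for sp in species}
    partitions, offset = [], 0
    for aln in alignments:
        lengths = {len(s) for s in aln.values()}
        if len(lengths) != 1:
            raise ValueError("each gene alignment must be non-empty with equal-length rows")
        length = lengths.pop()
        for sp in species:
            pieces[sp].append(aln.get(sp, gap * length))
        partitions.append((offset + 1, offset + length))
        offset += length
    return {sp: "".join(parts) for sp, parts in pieces.items()}, partitions
```

The returned coordinates can be turned into a partition file so that tree-building programs that support partitioned models can fit a separate substitution model per gene.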
4
votes
4 answers
Unable to open .bam file in C++ using SeqAn due to 'seqan::UnknownExtensionError'
I am trying to open .bam files in C++ to extract reads occurring at specific scaffolds and loci. I essentially want to call "samtools view sample.bam -o sample.sam scaffold:pos-pos" from C++. I have tried system("samtools view sample.bam -o…
annabelperry
- 199
- 1
- 9
4
votes
1 answer
How do I get GO annotations for a list of UniProt IDs?
I have a list of UniProt IDs that I want to get Gene Ontology annotations for. I need this high-level information as input to a neural network. The model I wish to develop is inspired by this paper:…
ChemBot
- 43
- 2
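For the GO-annotation question above, a minimal sketch assuming the annotations come from a GAF (GO Annotation File) such as those distributed by the GO Consortium or UniProt-GOA. In GAF, lines starting with "!" are comments, column 2 holds the object ID (the UniProt accession for UniProtKB rows), column 4 the qualifier and column 5 the GO ID. Only direct annotations are collected; propagating to ancestor terms in the GO graph is not done. The multi-hot encoding for the neural network input is an illustrative choice, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Set

def go_terms_from_gaf(lines: Iterable[str], wanted: Set[str]) -> Dict[str, Set[str]]:
    """Map UniProt accessions in `wanted` to their directly annotated GO IDs."""
    terms: Dict[str, Set[str]] = defaultdict(set)
    for line in lines:
        if line.startswith("!") or not line.strip():
            continue  # header/comment or blank line
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 5:
            continue
        db, accession, qualifier, go_id = cols[0], cols[1], cols[3], cols[4]
        if "NOT" in qualifier.split("|"):
            continue  # negative annotation: the protein is NOT associated with the term
        if db == "UniProtKB" and accession in wanted:
            terms[accession].add(go_id)
    return dict(terms)

def multi_hot(terms: Dict[str, Set[str]], accessions: Sequence[str],
              vocabulary: Sequence[str]) -> List[List[int]]:
    """One row per accession, one 0/1 column per GO ID in `vocabulary`."""
    index = {go_id: i for i, go_id in enumerate(vocabulary)}
    matrix = []
    for accession in accessions:
        row = [0] * len(vocabulary)
        for go_id in terms.get(accession, set()):
            if go_id in index:
                row[index[go_id]] = 1
        matrix.append(row)
    return matrix
```

Accessions with no annotations simply get an all-zero row, so the matrix keeps the same order as the input ID list.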
4
votes
1 answer
Use Kallisto in Galaxy
I want to use Kallisto for sequence alignment in Galaxy. Its description is:
a program for quantifying abundances of transcripts from bulk and single-cell RNA-Seq data, or more generally of target sequences using high-throughput sequencing…
Zahrae
- 63
- 3
4
votes
1 answer
Programmatically retrieve Metadata from SRA Run Selector
I previously asked a question about how to retrieve the Accession List associated with an SRA project. The answer was:
esearch -db sra -query 'PRJNA491191[bioproject]' | efetch -format runinfo
where PRJNA491191 is the bioproject that I'm interested…
An Ignorant Wanderer
- 545
- 4
- 12
4
votes
1 answer
How do you convert Raw Alignment Score to Bit Score?
I'm coding a pipeline where I make a lot of pairwise alignments, and I end up with raw alignment scores. But I really need to look at my results in terms of bit scores.
I know that the formula is:
$S' = \frac{\lambda S - \ln K}{\ln 2}$
But I don't know what…
Peter Danilov
- 41
- 6
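For the bit-score question above, the formula translates directly into code. This sketch assumes you already have the Karlin–Altschul parameters λ and K for your exact scoring scheme (matrix, gap penalties, composition); BLAST prints them as Lambda and K in its report, but other aligners may not, and they must match the scores being converted. The E-value helper uses the standard relation E = m·n·2^(−S′) for query length m and database length n; function names are hypothetical.

```python
import math

def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Normalized bit score S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)

def e_value(bits: float, m: int, n: int) -> float:
    """Expected number of chance hits with bit score >= bits in an m x n search space."""
    return m * n * 2.0 ** (-bits)
```

Because λ and K are tied to the scoring system, bit scores from alignments run with different matrices or gap penalties are comparable only after this conversion, which is the main reason to compute them.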
4
votes
1 answer
Annotating a .vcf with centimorgan information
Some programs (e.g. shapeit4) automatically add an INFO tag to a .vcf file giving the cumulative genetic distance (in cM) at each SNP:
##fileformat=VCFv4.2
##FILTER=
##fileDate=10/11/2020 -…
user438383
- 1,679
- 1
- 8
- 21
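For the centimorgan question above, the usual idea is linear interpolation along a genetic map. This sketch assumes you have a map of (bp position, cM) pairs for the same chromosome and genome build as the VCF, sorted by position; reading the map file and writing the INFO tag are left out. Clamping positions outside the map to its end values is an assumption, and other tools may extrapolate instead. The function name is hypothetical.

```python
from bisect import bisect_right
from typing import List, Sequence

def interpolate_cm(map_bp: Sequence[int], map_cm: Sequence[float],
                   positions: Sequence[int]) -> List[float]:
    """Linearly interpolate genetic positions (cM) for variant positions (bp).

    map_bp must be sorted ascending; positions outside the map are clamped
    to the first/last map value.
    """
    if len(map_bp) != len(map_cm) or len(map_bp) < 2:
        raise ValueError("need at least two matching map points")
    out = []
    for pos in positions:
        if pos <= map_bp[0]:
            out.append(float(map_cm[0]))
        elif pos >= map_bp[-1]:
            out.append(float(map_cm[-1]))
        else:
            i = bisect_right(map_bp, pos)  # map_bp[i-1] <= pos < map_bp[i]
            x0, x1 = map_bp[i - 1], map_bp[i]
            y0, y1 = map_cm[i - 1], map_cm[i]
            out.append(y0 + (y1 - y0) * (pos - x0) / (x1 - x0))
    return out
```

The per-SNP values are cumulative, so the distance between two SNPs is just the difference of their interpolated cM values.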
4
votes
2 answers
Use the same output in two processes in Nextflow DSL2
This is my workflow:
pre_align()
pre_align.out.single_fastqs.view()
get_fq_info(pre_align.out.single_fastqs)
align_bwa(get_fq_info.out.fq_info)
align_bowtie2(get_fq_info.out.fq_info)
where I want to use the same output from get_fq_info as input…
aerijman
- 645
- 5
- 14
4
votes
1 answer
Bruker MALDI-TOF bacteria species identification scoring algorithm
Just wondering whether anyone can point me to a research paper which describes how the scoring values are generated by the Bruker software for species identification of bacteria in MALDI-TOF MS. For example, there are countless papers describing the…
There
- 151
- 3
4
votes
0 answers
How can I use statistics to compare microbial phenotypes?
Note: this question has also been asked on Biostars
I am currently trying to create a theoretical argument that a microbe's phenotype can affect gene expression in its host. I have 5 species of microbes, each with a different COG (Cluster of…
pythonbeginner44
- 81
- 2
4
votes
1 answer
How is the odds ratio of disease risk conferred by a 1-standard deviation increase in PRS calculated?
A one-standard-deviation increase is commonly used to report the effect of a polygenic risk score (PRS) derived from GWAS, e.g. for human genetic disease.
Why is this a common metric? For example, why not 2 SD, or 1.96 SD as in the normal distribution, and what is the…
Ramiro Magno
- 165
- 1
- 7
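For the PRS question above, a minimal sketch assuming a logistic regression of case/control status on the raw PRS has already been fitted (not shown), giving a log-odds coefficient β per unit of PRS. Since the log-odds are linear in the PRS, the odds ratio per k standard deviations is exp(β·SD·k) = OR₁ₛᴅᵏ; equivalently, standardizing the PRS before fitting makes β itself the per-SD log-odds. Under this model, reporting 1 SD rather than 2 or 1.96 SD is a choice of unit, not a different analysis. Function names are hypothetical.

```python
import math
from statistics import mean, stdev
from typing import List, Sequence

def standardize(scores: Sequence[float]) -> List[float]:
    """Z-score the PRS so that one unit equals one sample standard deviation."""
    mu, sd = mean(scores), stdev(scores)
    return [(s - mu) / sd for s in scores]

def odds_ratio_per_sd(beta_per_unit: float, prs_sd: float, n_sd: float = 1.0) -> float:
    """OR for an n_sd-standard-deviation increase in PRS under a logistic model."""
    return math.exp(beta_per_unit * prs_sd * n_sd)
```

For example, the 2-SD odds ratio is simply the square of the 1-SD odds ratio, so either can be recovered from the other.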
4
votes
1 answer
Why did expression-based subtyping of breast cancer gain much more acceptance than others?
This may not be an entirely technical question but rather an academic one. However, the technique behind the application is within the scope of bioinformatics, so I will ask it here:
In each cancer type, there have been tons of papers that…
unicorn
- 211
- 1
- 4
4
votes
2 answers
Why do molecular generation models maximize “penalized logP” as a measure of drug-likeness?
I found that Lipinski's rule of five states that logP (the octanol-water partition coefficient, a measure of lipophilicity) should usually not exceed 5.
Many papers on machine learning models for drug discovery report maximizing "penalized logP",…
Slowpoke
- 143
- 3
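For the penalized logP question above, here is a sketch of one definition commonly used in molecular-generation benchmarks: logP minus the synthetic accessibility (SA) score minus a ring penalty equal to how far the largest ring exceeds six atoms. It assumes logP, the SA score and the ring sizes are computed elsewhere (for example with a cheminformatics toolkit); some implementations also z-score each term against training-set statistics before combining them, which is omitted here. Function names are hypothetical.

```python
from typing import Sequence

def cycle_penalty(ring_sizes: Sequence[int]) -> int:
    """Number of atoms by which the largest ring exceeds 6 (0 if no large ring)."""
    largest = max(ring_sizes, default=0)
    return max(0, largest - 6)

def penalized_logp(logp: float, sa_score: float, ring_sizes: Sequence[int]) -> float:
    """logP - SA score - ring penalty (unnormalized variant)."""
    return logp - sa_score - cycle_penalty(ring_sizes)
```

Written this way, the objective grows without bound as logP grows, which is exactly the tension with Lipinski's logP ≤ 5 guideline raised in the question.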