Most Popular
1500 questions
7 votes, 1 answer
wtdbg2: practical implications of k-mer fsize and psize choice
I am using wtdbg2 2.3 to assemble a human genome (sequenced on PromethION from a cell line). I filtered out reads with low average quality, and now I am trying to determine the parameters that will optimize overall assembly quality.
The wtdbg2…
Daniel Standage
- 5,080
- 15
- 50
7 votes, 2 answers
How to check if indels in VCF files are left or right aligned?
I downloaded a VCF file from dbSNP, and I'm curious whether the indels in the file are left-aligned against the GRCh37 genome. The documentation doesn't say anything.
How can we tell if a VCF file has left or right aligned indels?
SmallChess
- 2,699
- 3
- 19
- 35
7 votes, 2 answers
Get the mapping statistics of a single read from a BAM file
I have a BAM file, and I have a read ID. What is the simplest way to get mapping statistics of that read in human-readable format?
E.g. I might want: % identity of aligned bases; number of insertions and deletions; number of bases aligned to the…
roblanf
- 962
- 7
- 15
7 votes, 2 answers
How to find the bound form of an enzyme structure?
For my undergraduate research I'm looking for a database that gives the bound form of a particular protein structure. Is there a database that provides such data? So far I've found the following proteins related to my work:
Ascorbate…
Loch23
- 91
- 4
7 votes, 4 answers
Given a transcription factor, what genes does it regulate?
I have a list of transcription factors, and I am interested in finding out which genes might be transcribed as a result of the formation of transcription factor complexes with each of those transcription factors.
Any ideas on good databases? I've seen…
jaslibra
- 524
- 2
- 9
7 votes, 1 answer
Multithread fastq processing with kseq.h in C++11?
Background
I am using the wonderful kseq.h as my fastq/fasta parser for a C++ project. My actual implementation uses something like this, where I am counting kmers of each sequence and performing a bunch of other processing on every read…
conchoecia
- 3,141
- 2
- 16
- 40
7 votes, 2 answers
Truncating branch length values of Phylogenetic tree with biopython
I have been using biopython 1.72 to display my phylogenetic tree files.
When I use 'Phylo.draw(pars_tree, branch_labels=lambda c: c.branch_length)' to also display branch lengths on the tree, the tree displays the branch lengths as including…
Sidra Younas
- 503
- 2
- 13
7 votes, 3 answers
How to perform functional analysis on a gene list in R?
From an RNA-seq experiment I have about 17,000 gene IDs for 2 sample conditions, arranged according to their log2 fold changes when compared to a control. I need to annotate these, but I've never done annotation before and am wondering how to do this…
J0HN_TIT0R
- 541
- 1
- 4
- 7
7 votes, 2 answers
How to correct alpha, and not p-values themselves, for visualization purposes
I have a set of differentially methylated/expressed/whatever entities with p-values attached (example below).
entity_name p-value magnitude
entity1 0.04459 0.68
entity2 0.02283 0.99
...
entity_n 0.78 …
Ben D.
- 397
- 1
- 10
7 votes, 2 answers
What methods should I use from PythonCyc API to query metabolites in BioCyc database?
I am using the PythonCyc API to write a query for metabolites in BioCyc. The purpose of this API is to communicate with BioCyc's database software, Pathway Tools. Pathway Tools is written in Lisp; therefore, PythonCyc creates a bridge between…
astridmarilyn
- 71
- 3
7 votes, 2 answers
Spliced vs. unspliced ratios for transcripts in RNA-seq data
Is there a computational tool for measuring what percentage of RNA is spliced in an RNA-seq experiment?
I'm not particularly interested in complicated analyses that give ratios for all possible alternative splicing variations. I'd rather have a…
Jessime Kirk
- 181
- 4
7 votes, 1 answer
What unit do I get on the y-axis of a metagene profile plot?
I start with a sorted and indexed BAM file ("mapped.bam") representing the mapping of small reads on a reference genome, and a BED file ("genes.bed") containing the coordinates of a set of features of interest (let's say they are genes), for which I…
bli
- 3,130
- 2
- 15
- 36
7 votes, 5 answers
What motif finding software is available for multiple sequences ~10Kb?
I have around 3,000 short sequences, each approximately 10 kb long. What are the best ways to find the motifs among all of these sequences? Is there a particular software or method recommended?
There are several ways to do this. My goal would be to:
(1)…
ShanZhengYang
- 1,691
- 1
- 14
- 20
7 votes, 1 answer
How can I create my own GO association file (gaf)?
This question is based on a question on BioStars posted >2 years ago by user jack.
It describes a very frequent problem of generating GO annotations for non-model organisms. While it is based on some specific format and single application…
Michael
- 173
- 11
7 votes, 2 answers
Why are TPMs per 10k or 100k in many scRNA-seq studies?
I noticed that many scRNA-seq papers normalize TPMs to 10k or 100k as opposed to 1M (as the abbreviation defines them). It doesn't really matter since you are just moving the decimal point, so why mess with an established convention?
Additionally,…
burger
- 2,179
- 10
- 21