Most Popular
1500 questions
6
votes
3 answers
Fast filtering of intervals not falling within a certain distance from known genes
I would like to filter a BED file of intervals, i.e. in the format:
chr1 13800 14301
chr1 15500 16001
chr1 19400 19901
chr1 22800 23301
In particular, I want to filter out intervals that fall far from known genes, let's say…
gc5
- 1,783
- 18
- 32
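One way the filtering described in this question could be sketched in plain Python is shown below. The distance cutoff is elided in the excerpt, so `max_dist` is a hypothetical parameter, and the gene coordinates are made-up toy values, not real annotations. The sketch sorts genes by start per chromosome and keeps a running maximum of gene ends, so each interval is checked with one binary search.

```python
from bisect import bisect_right
from collections import defaultdict


def build_gene_index(genes):
    """Group (chrom, start, end) genes by chromosome, sorted by start.

    For each chromosome, store the sorted starts and a running maximum of
    the gene ends, so a prefix of genes can be summarised by one number.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in genes:
        by_chrom[chrom].append((start, end))
    index = {}
    for chrom, spans in by_chrom.items():
        spans.sort()
        starts = [s for s, _ in spans]
        prefix_max_end = []
        running = -1
        for _, e in spans:
            running = max(running, e)
            prefix_max_end.append(running)
        index[chrom] = (starts, prefix_max_end)
    return index


def near_gene(index, chrom, start, end, max_dist):
    """True if some gene on chrom lies within max_dist bases of [start, end)."""
    if chrom not in index:
        return False
    starts, prefix_max_end = index[chrom]
    # Genes starting no further than max_dist past the interval's end.
    i = bisect_right(starts, end + max_dist)
    # Among those, at least one must end no earlier than start - max_dist.
    return i > 0 and prefix_max_end[i - 1] >= start - max_dist


def filter_intervals(intervals, genes, max_dist):
    """Keep only the intervals that fall within max_dist of any gene."""
    index = build_gene_index(genes)
    return [iv for iv in intervals if near_gene(index, *iv, max_dist)]


# Intervals from the question; the genes and the cutoff are toy assumptions.
intervals = [
    ("chr1", 13800, 14301),
    ("chr1", 15500, 16001),
    ("chr1", 19400, 19901),
    ("chr1", 22800, 23301),
]
toy_genes = [("chr1", 14000, 14500), ("chr1", 30000, 31000)]
kept = filter_intervals(intervals, toy_genes, max_dist=1000)
```

With these toy values, the first two intervals are kept and the last two are dropped. For large files, a dedicated interval tool would likely be a better fit than this sketch.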
6
votes
1 answer
Entrez or Ensembl gene IDs?
I'm using several datasets that are encoded using either Entrez or Ensembl IDs to specify genes, and need to decide on which to standardise on.
Are there any major reasons to use one over the other?
Which tools and conversion tables are best to use…
user657
6
votes
1 answer
How do I rewrite a read group using pysam?
I am trying to rewrite a SAM/BAM file with altered read group entries using pysam. In this simplified version, I want to take a BAM file, rewrite the SM tags in all read groups to the same string, and copy the alignments with the new header into a new…
mattm
- 754
- 7
- 19
6
votes
2 answers
Transcript Coordinate Ranges to Genomic Coordinates
I have 2 GFF3 files:
Features using transcript IDs as the landmarks, i.e. "CDS" feature types using coordinates from transcript space.
Features using chromosome IDs as the landmarks, i.e. "exon" feature types using coordinates from chromosome…
Nathan S. Watson-Haigh
- 407
- 3
- 11
6
votes
1 answer
What's the scaling for HOMER metagenes?
I'm trying to use HOMER to make a metagene profile over gene bodies using a bedgraph file I've generated. The problem is that every time I do, I get really weird scaling on the y-axis. I should be getting average values across the gene body on the…
bioinform_noob
- 101
- 4
6
votes
2 answers
How to achieve blast results according to the intuitive interpretation of `-max_target_seqs`?
Very recently, the BLAST parameter -max_target_seqs n got a lot of attention. Instead of the intuitive interpretation (return the best n sequences), the parameter asks BLAST to return the first n sequences that pass the e-value threshold. This is…
Kamil S Jaron
- 5,542
- 2
- 25
- 59
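To make the two interpretations in this excerpt concrete, here is a toy illustration. It is not a model of BLAST internals: the hit list, the sequence names, and both function names are invented for the example, and it only contrasts "first n that pass the threshold" with "best n".

```python
def first_n_passing(hits, n, evalue_threshold):
    """Return the first n hits, in encounter order, that pass the threshold."""
    passing = [h for h in hits if h[1] <= evalue_threshold]
    return passing[:n]


def best_n(hits, n, evalue_threshold):
    """Return the n passing hits with the lowest e-values."""
    passing = [h for h in hits if h[1] <= evalue_threshold]
    return sorted(passing, key=lambda h: h[1])[:n]


# Hypothetical (name, e-value) pairs in the order they are encountered.
toy_hits = [("seqA", 1e-5), ("seqB", 1e-30), ("seqC", 1e-2), ("seqD", 1e-50)]

first_two = first_n_passing(toy_hits, n=2, evalue_threshold=1e-3)
best_two = best_n(toy_hits, n=2, evalue_threshold=1e-3)
```

In this toy list the two functions disagree: the first returns seqA and seqB, while the second returns seqD and seqB. Under the intuitive reading, one workaround would be to request more hits than needed and sort them afterwards.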
6
votes
1 answer
State of the art in predicting Translation Initiation Sites
I'm working on a university project on predicting Translation Initiation Sites in human DNA. I searched the net for papers and documentation to get guidelines and inspiration, but I am not sure I was able to find the state of the art in…
user548
6
votes
1 answer
GATK documentation for required depth to reliably call heterozygous mutation in diploid organism?
I'm looking for official GATK documentation (or a recent manuscript) that defines a general recommendation/requirement for sequencing depth to reliably call a heterozygous point mutation in a diploid organism (WGS). In this case, I'm working with…
Mark Ebbert
- 1,354
- 10
- 22
6
votes
4 answers
Convert SRA to FastA
I'm trying to get the FastA files for some accessions (like NC_001416.1). I did not manage to find an FTP server or direct link to these files (I want to get them from the command line with wget, not from a web browser). But I found an "equivalent" file…
Poshi
- 221
- 1
- 8
6
votes
1 answer
What are some good practices to follow during EPIC DNA methylation data analysis?
I recently got some EPIC DNA methylation data and was wondering what good practices I should follow.
I am interested in knowing about normalization and differential analysis. Thank you.
deepseas
- 163
- 1
- 6
6
votes
2 answers
Sequence alignment using Markov Model
I am learning about applying a Markov model to sequence alignment. The prof says that the transition probabilities from a gap-residue alignment to a residue-gap alignment and vice versa are both 0. Is there any biological/mathematical reason behind…
Zeyuan
- 163
- 3
6
votes
1 answer
Gathering data on bacterial organism growth conditions
Let's say I have a list of organism names like the example below:
Achromobacter xylosoxidans
Acidithiobacillus ferrooxidans
Chloroflexus aurantiacus
Clostridium perfringens
Aquaspirillum arcticum
The goal is to identify the optimal growth conditions…
Sam
- 61
- 1
6
votes
1 answer
Getting all NNI trees of a parsimony tree
I have an aligned protein sequence file which I have been using to reconstruct a parsimonious tree. I am currently using the NNITreeSearcher._get_neighbors method from Biopython 1.72, but it only finds the single best-scoring tree.
def _nni(self,…
Sidra Younas
- 503
- 2
- 13
6
votes
4 answers
Finding gene length using Ensembl ID
I want to find the lengths of a list of Homo sapiens genes that are reported in the GEO database. I have gathered the Ensembl IDs of those genes.
I understand this information can be parsed from the start and end positions given in a GTF file…
Natasha
- 125
- 2
- 10
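The GTF approach mentioned in this question could look roughly like the sketch below. It assumes a standard nine-column, tab-separated GTF with `gene` feature rows and a `gene_id "..."` attribute; the function name and the toy IDs are hypothetical. GTF coordinates are 1-based and inclusive, so the span is `end - start + 1`. Note that this gives the genomic span of the gene, introns included, not a transcript or exonic length.

```python
import re

# Matches the gene_id attribute in the ninth GTF column.
GENE_ID = re.compile(r'gene_id "([^"]+)"')


def gene_lengths(gtf_lines, wanted_ids):
    """Map each wanted gene ID to its genomic span (end - start + 1)."""
    wanted = set(wanted_ids)
    lengths = {}
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue  # only use gene-level rows
        match = GENE_ID.search(fields[8])
        if match and match.group(1) in wanted:
            # GTF is 1-based and end-inclusive.
            lengths[match.group(1)] = int(fields[4]) - int(fields[3]) + 1
    return lengths


# Toy GTF rows with made-up IDs and coordinates.
toy_gtf = [
    "#!toy header",
    'chr1\ttoy\tgene\t1000\t1999\t.\t+\t.\tgene_id "TOY_GENE_1"; gene_name "toyA";',
    'chr1\ttoy\texon\t1000\t1200\t.\t+\t.\tgene_id "TOY_GENE_1";',
    'chr2\ttoy\tgene\t500\t509\t.\t-\t.\tgene_id "TOY_GENE_2";',
]
toy_lengths = gene_lengths(toy_gtf, ["TOY_GENE_1", "TOY_GENE_2", "TOY_GENE_3"])
```

IDs that are absent from the file (here `TOY_GENE_3`) are simply missing from the result, which makes it easy to spot unmatched identifiers.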
6
votes
2 answers
How can I test my hypotheses computationally?
I have single-cell RNA-seq data on about 2000 cells at 9 time points. I have clustered my cells at each time point with Seurat. I am seeing that at some time points I have 3 clusters, while at other time points I have 2 clusters of cells (please look at…
Zizogolu
- 2,148
- 11
- 44