Highest Voted Questions - Bioinformatics Stack Exchange

6

votes

3 answers

Fast filtering of intervals not falling within a certain distance from known genes

I would like to filter a BED file with intervals, i.e. in the format of: chr1 13800 14301 chr1 15500 16001 chr1 19400 19901 chr1 22800 23301 In particular, I want to filter out intervals that fall far from known genes, let's say…

asked Oct 28 '18 at 00:02

gc5

1,783
18
32
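For the interval-filtering question above, a minimal sketch of the idea in plain Python could look like the following. It is not taken from any of the answers. The `max_dist` threshold and the gene list are assumptions (the excerpt cuts off before stating a distance), and the pairwise scan favors clarity over speed.

```python
from collections import defaultdict


def parse_bed(lines):
    """Yield (chrom, start, end) from BED-like lines; extra columns are ignored."""
    for line in lines:
        if line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        if len(fields) < 3:
            continue
        yield fields[0], int(fields[1]), int(fields[2])


def interval_distance(a_start, a_end, b_start, b_end):
    """Gap in bases between two intervals; 0 if they overlap or touch."""
    return max(0, b_start - a_end, a_start - b_end)


def filter_near_genes(intervals, genes, max_dist):
    """Keep intervals lying within max_dist bases of any gene on the same chromosome.

    max_dist is a hypothetical parameter chosen by the caller.
    """
    genes_by_chrom = defaultdict(list)
    for chrom, start, end in genes:
        genes_by_chrom[chrom].append((start, end))

    kept = []
    for chrom, start, end in intervals:
        nearby = any(
            interval_distance(start, end, g_start, g_end) <= max_dist
            for g_start, g_end in genes_by_chrom.get(chrom, ())
        )
        if nearby:
            kept.append((chrom, start, end))
    return kept
```

For large inputs, sorting the genes per chromosome and using a binary search, or an interval-aware tool, would be the more realistic choice.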

6

votes

1 answer

Entrez or Ensembl gene IDs?

I'm using several datasets that are encoded using either Entrez or Ensembl IDs to specify genes, and need to decide on which to standardise on. Are there any major reasons to use one over the other? Which tools and conversion tables are best to use…

asked Jun 07 '17 at 05:25

user657

6

votes

1 answer

How do I rewrite a read group using pysam?

I am trying to rewrite a SAM/BAM file with altered read group entries using pysam. In this simplified version, I want to take a BAM, and rewrite the SM tags in all read groups to the same string and copy the alignments with new header into a new…

asked Oct 09 '18 at 18:42

mattm

754
7
19
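For the read-group question above, one possible shape of a solution is sketched below. It is not the accepted answer. It assumes a pysam version that provides `AlignmentHeader.to_dict()`, and the function names `set_sample_name` and `rewrite_bam` are made up for illustration. The header edit is kept separate from the file I/O so it can be checked without a BAM file.

```python
import copy


def set_sample_name(header_dict, new_sample):
    """Return a copy of a SAM header dict with every @RG SM tag set to new_sample."""
    new_header = copy.deepcopy(header_dict)
    for read_group in new_header.get("RG", []):
        read_group["SM"] = new_sample
    return new_header


def rewrite_bam(in_path, out_path, new_sample):
    """Copy all alignments from in_path to out_path under the edited header."""
    import pysam  # third-party; imported here so the helper above works without it

    with pysam.AlignmentFile(in_path, "rb") as src:
        header = set_sample_name(src.header.to_dict(), new_sample)
        with pysam.AlignmentFile(out_path, "wb", header=header) as dst:
            # until_eof=True reads every record, so no index is required
            for read in src.fetch(until_eof=True):
                dst.write(read)
```

Because only the SM values change and the reference sequences stay the same, the alignments themselves are written back unmodified.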

6

votes

2 answers

Transcript Coordinate Ranges to Genomic Coordinates

I have 2 GFF3 files: (1) features using transcript IDs as the landmarks, i.e. "CDS" feature types using coordinates from transcript space; (2) features using chromosome IDs as the landmarks, i.e. "exon" feature types using coordinates from chromosome…

asked Oct 05 '18 at 04:23

Nathan S. Watson-Haigh

407
3
11
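The core of the transcript-to-genomic conversion asked about above is walking the exons in transcript order and subtracting their lengths. The sketch below illustrates that step only, assuming 1-based inclusive coordinates as in GFF3. The function name and the exon list format are hypothetical, and GFF3 parsing is left out.

```python
def transcript_to_genomic(pos, exons, strand="+"):
    """Map a 1-based transcript position to a 1-based genomic position.

    exons: list of (start, end) genomic coordinates, 1-based inclusive.
    strand: "+" or "-"; on "-" the transcript runs from the highest exon down.
    """
    if pos < 1:
        raise ValueError("transcript positions are 1-based")

    # Put exons in the order the transcript traverses them.
    ordered = sorted(exons, reverse=(strand == "-"))

    remaining = pos
    for start, end in ordered:
        length = end - start + 1
        if remaining <= length:
            if strand == "+":
                return start + remaining - 1
            return end - remaining + 1
        remaining -= length

    raise ValueError("position lies beyond the end of the transcript")
```

Converting a range such as a CDS then amounts to mapping both ends and splitting the result at any exon boundaries in between.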

6

votes

1 answer

What's the scaling for HOMER metagenes?

I'm trying to use HOMER to make a metagene profile over gene bodies using a bedgraph file I've generated. The problem is that every time I do, I get really weird scaling on the y-axis. I should be getting average values across the gene body on the…

asked Jun 07 '17 at 03:39

bioinform_noob

101
4

6

votes

2 answers

How to achieve blast results according to the intuitive interpretation of `-max_target_seqs`?

Very recently a BLAST parameter -max_target_seqs n got a lot of attention. Instead of the intuitive interpretation (return the best n sequences) the parameter asks BLAST to return the first n sequences that pass the e-value threshold. This is…

blast

asked Sep 28 '18 at 09:54

Kamil S Jaron

5,542
2
25
59
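One way to approximate the intuitive reading described in the BLAST question above is to request many more hits than needed and rank them afterwards. The sketch below assumes the hits are already available in the standard 12-column tabular format (`-outfmt 6`, with e-value and bit score in the last two columns). The function name `best_hits` is made up, and this is an illustration rather than the approach from the answers.

```python
from collections import defaultdict


def best_hits(tabular_lines, n):
    """Return the n best distinct subjects per query, ranked by bit score.

    Ties on bit score are broken by the smaller e-value.
    """
    rows_by_query = defaultdict(list)
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        rows_by_query[query].append((bitscore, evalue, subject))

    best = {}
    for query, rows in rows_by_query.items():
        ranked = []
        # A subject can appear in several rows (one per HSP); keep its best row only.
        for _, _, subject in sorted(rows, key=lambda r: (-r[0], r[1])):
            if subject not in ranked:
                ranked.append(subject)
            if len(ranked) == n:
                break
        best[query] = ranked
    return best
```

This only reorders hits that the search already reported, so it cannot recover a sequence that was dropped before the output stage.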

6

votes

1 answer

State of the art in predicting Translation Initiation Sites

I'm working on a university project on predicting Translation Initiation Sites in human DNA. I searched the net for papers and documentation to get guidelines and inspiration, but I am not sure that I was able to find the state of the art in…

asked Jun 06 '17 at 23:44

user548

6

votes

1 answer

GATK documentation for required depth to reliably call heterozygous mutation in diploid organism?

I'm looking for official GATK documentation (or a recent manuscript) that defines a general recommendation/requirement for sequencing depth to reliably call a heterozygous point mutation in a diploid organism (WGS). In this case, I'm working with…

asked Sep 24 '18 at 16:02

Mark Ebbert

1,354
10
22

6

votes

4 answers

Convert SRA to FastA

I'm trying to get the FastA files for some accessions (like NC_001416.1). I did not manage to find an FTP server or direct link to these files (I want to get them from the command line with wget, not from a web browser). But I found an "equivalent" file…

asked Sep 19 '18 at 12:57

Poshi

221
1
8

6

votes

1 answer

What are some good practices to follow during EPIC DNA methylation data analysis?

I recently got some EPIC DNA methylation data and I was wondering what are some good practices to follow? I am interested in knowing about normalization and differential analysis. Thank you.

asked Jun 06 '17 at 20:58

deepseas

163
1
6

6

votes

2 answers

Sequence alignment using Markov Model

I am learning about applying a Markov model to sequence alignment. The prof says that the transition probabilities from a gap-residue alignment to a residue-gap alignment and vice versa are both 0. Is there any biological/mathematical reason behind…

asked Sep 11 '18 at 08:30

Zeyuan

163
3

6

votes

1 answer

Gathering data on bacterial organism growth conditions

Let's say I have a list of organism names like the example below: Achromobacter xylosoxidans, Acidithiobacillus ferrooxidans, Chloroflexus aurantiacus, Clostridium perfringens, Aquaspirillum arcticum. The goal is to identify the optimal growth conditions…

asked Aug 31 '18 at 18:59

Sam

61
1

6

votes

1 answer

Getting all NNI trees of a parsimony tree

I have an aligned protein sequence file which I have been using for reconstructing a parsimonious tree. I am currently using the NNITreeSearcher._get_neighbors method from Biopython 1.72, but it is a way to find only the single best-scoring tree. def _nni(self,…

asked Aug 30 '18 at 21:24

Sidra Younas

503
2
13

6

votes

4 answers

Finding gene length using ensembl ID

I want to find the lengths of a list of Homo sapiens genes that are reported in the GEO database. I have gathered the Ensembl IDs of those genes. I understand this information can be parsed from the start and end positions given in a GTF file…

asked Aug 26 '18 at 05:08

Natasha

125
2
10
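The GTF-based approach mentioned in the gene-length question above can be sketched as follows. It assumes a GTF with `gene` feature rows and a `gene_id "..."` attribute, and it reports genomic span (end minus start plus one, since GTF coordinates are 1-based and inclusive), which is not the same as transcript or exonic length. The function name is hypothetical.

```python
import re

GENE_ID = re.compile(r'gene_id "([^"]+)"')


def gene_lengths(gtf_lines, wanted_ids=None):
    """Return {gene_id: genomic span in bases} for 'gene' rows of a GTF.

    wanted_ids: optional collection of IDs to keep; None keeps every gene.
    """
    lengths = {}
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue
        match = GENE_ID.search(fields[8])
        if match is None:
            continue
        gene_id = match.group(1)
        if wanted_ids is not None and gene_id not in wanted_ids:
            continue
        # GTF start and end are both inclusive, hence the +1.
        lengths[gene_id] = int(fields[4]) - int(fields[3]) + 1
    return lengths
```

If the IDs in hand carry version suffixes and the GTF does not (or the reverse), they would need to be normalized before the lookup.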

6

votes

2 answers

How can I test my hypotheses computationally

I have single-cell RNA-seq data on about 2000 cells at 9 time points. I have clustered my cells at each time point with Seurat. I see that at some time points I have 3 clusters, while at other time points I have 2 clusters of cells (please look at…

asked Aug 22 '18 at 16:31

Zizogolu

2,148
11
44
