Most Popular

1500 questions
6 votes, 3 answers

Fast filtering of intervals not falling within a certain distance from known genes

I would like to filter a BED file of intervals, i.e. in the format:
chr1 13800 14301
chr1 15500 16001
chr1 19400 19901
chr1 22800 23301
In particular, I want to filter out intervals that fall far from known genes, let's say…
gc5
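Below is a minimal pure-Python sketch of one way to do this. It assumes the genes are available as (chrom, start, end) tuples in BED coordinates. It also assumes "near" means the interval overlaps the gene extended by max_dist bp on each side. The helper names build_windows and near_genes are hypothetical. The padded gene windows are merged once per chromosome, so each query interval needs a single binary search rather than a scan over all genes.

```python
from bisect import bisect_left
from collections import defaultdict


def build_windows(genes, max_dist):
    """Pad each gene by max_dist bp on both sides and merge overlapping windows.

    genes: iterable of (chrom, start, end), 0-based half-open (BED style).
    Returns {chrom: (starts, ends)} with sorted, non-overlapping windows.
    """
    padded = defaultdict(list)
    for chrom, start, end in genes:
        padded[chrom].append((max(0, start - max_dist), end + max_dist))

    windows = {}
    for chrom, wins in padded.items():
        wins.sort()
        merged = [list(wins[0])]
        for start, end in wins[1:]:
            if start <= merged[-1][1]:
                # Overlapping or touching: extend the current window.
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        windows[chrom] = ([w[0] for w in merged], [w[1] for w in merged])
    return windows


def near_genes(intervals, windows):
    """Yield the intervals that overlap at least one padded gene window."""
    for chrom, start, end in intervals:
        if chrom not in windows:
            continue
        starts, ends = windows[chrom]
        # Windows are sorted and disjoint, so only the last window starting
        # before `end` can overlap [start, end).
        k = bisect_left(starts, end)
        if k > 0 and start < ends[k - 1]:
            yield chrom, start, end


intervals = [("chr1", 13800, 14301), ("chr1", 15500, 16001),
             ("chr1", 19400, 19901), ("chr1", 22800, 23301)]
genes = [("chr1", 14000, 14500), ("chr1", 21000, 22000)]  # hypothetical genes
kept = list(near_genes(intervals, build_windows(genes, max_dist=1000)))
print(kept)  # [('chr1', 13800, 14301), ('chr1', 22800, 23301)]
```

For large files, `bedtools window -a intervals.bed -b genes.bed -w 1000 -u` does a comparable job from the command line. The `-u` flag reports each interval once if any gene falls within the window.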
6 votes, 1 answer

Entrez or Ensembl gene IDs?

I'm using several datasets that are encoded using either Entrez or Ensembl IDs to specify genes, and I need to decide which one to standardise on. Are there any major reasons to use one over the other? Which tools and conversion tables are best to use…
user657
6 votes, 1 answer

How do I rewrite a read group using pysam?

I am trying to rewrite a SAM/BAM file with altered read group entries using pysam. In this simplified version, I want to take a BAM, rewrite the SM tags in all read groups to the same string, and copy the alignments with the new header into a new…
mattm
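Here is a minimal sketch. It assumes pysam 0.14 or later, where `AlignmentHeader.to_dict()` is available, and that only the SM field of each @RG line should change. The header is edited as a plain dict, so that step can be checked without a BAM file. `set_sample_names` and `rewrite_bam` are hypothetical helper names.

```python
import copy


def set_sample_names(header_dict, sample):
    """Return a copy of a pysam header dict with every @RG SM set to `sample`."""
    new_header = copy.deepcopy(header_dict)  # leave the input header untouched
    for read_group in new_header.get("RG", []):
        read_group["SM"] = sample
    return new_header


def rewrite_bam(in_path, out_path, sample):
    """Copy all alignments from in_path to out_path under the edited header."""
    import pysam  # imported here so set_sample_names works without pysam

    with pysam.AlignmentFile(in_path, "rb") as bam_in:
        header = set_sample_names(bam_in.header.to_dict(), sample)
        with pysam.AlignmentFile(out_path, "wb", header=header) as bam_out:
            for read in bam_in:
                bam_out.write(read)
```

Because the @SQ lines are not touched, the reference IDs stored in each record stay valid under the new header. Only the @RG sample names differ.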
6 votes, 2 answers

Transcript Coordinate Ranges to Genomic Coordinates

I have 2 GFF3 files: (1) features using transcript IDs as the landmarks, i.e. "CDS" feature types using coordinates from transcript space; (2) features using chromosome IDs as the landmarks, i.e. "exon" feature types using coordinates from chromosome…
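The sketch below covers only the core coordinate arithmetic, under a few assumptions:

- coordinates are 1-based and inclusive, as in GFF3;
- each transcript's exons are available as (start, end) pairs on the chromosome;
- transcript position 1 is the first base of the transcript in its 5' to 3' direction.

`transcript_range_to_genomic` is a hypothetical helper. A CDS range that spans an intron comes back as several genomic segments.

```python
def transcript_range_to_genomic(tx_start, tx_end, exons, strand):
    """Map a transcript-space range to genomic segments.

    tx_start, tx_end: 1-based inclusive positions along the transcript.
    exons: (start, end) pairs, 1-based inclusive genomic coordinates.
    strand: "+" or "-".
    Returns (start, end) genomic segments in transcript order.
    """
    # Walk the exons in the transcript's 5' to 3' direction.
    ordered = sorted(exons, reverse=(strand == "-"))
    segments = []
    offset = 0  # transcript bases contained in the exons already walked
    for start, end in ordered:
        length = end - start + 1
        # Part of the requested range that falls in this exon (transcript space).
        lo = max(tx_start, offset + 1)
        hi = min(tx_end, offset + length)
        if lo <= hi:
            if strand == "+":
                segments.append((start + (lo - offset) - 1,
                                 start + (hi - offset) - 1))
            else:
                # On the minus strand, transcript position 1 is the exon's end.
                segments.append((end - (hi - offset) + 1,
                                 end - (lo - offset) + 1))
        offset += length
    return segments


print(transcript_range_to_genomic(5, 15, [(100, 109), (200, 209)], "+"))
# [(104, 109), (200, 204)]
```

Each resulting segment could then be written out as a separate GFF3 line on the chromosome landmark. Splitting the range per exon is what keeps CDS features from spanning introns.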
6 votes, 1 answer

What's the scaling for HOMER metagenes?

I'm trying to use HOMER to make a metagene profile over gene bodies using a bedgraph file I've generated. The problem is that every time I do, I get really weird scaling on the y-axis. I should be getting average values across the gene body on the…
6 votes, 2 answers

How can I get BLAST results that follow the intuitive interpretation of `-max_target_seqs`?

Very recently the BLAST parameter -max_target_seqs n got a lot of attention. Instead of the intuitive interpretation (return the best n sequences), the parameter asks BLAST to return the first n sequences that pass the e-value threshold. This is…
Kamil S Jaron
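One common workaround is to run BLAST with a generous `-max_target_seqs` and tabular output (`-outfmt 6`), then do the ranking yourself. The sketch below assumes the default 12 outfmt 6 columns, with the e-value in column 11 and the bit score in column 12. It keeps the best HSP per subject and returns the top n subjects per query by bit score. `best_hits` is a hypothetical name. It can only rank the hits BLAST actually reported, so `-max_target_seqs` still needs to be comfortably larger than n.

```python
from collections import defaultdict


def best_hits(lines, n):
    """Return {query: [subject, ...]} with the top-n subjects by bit score.

    lines: BLAST -outfmt 6 lines with the default 12 columns
    (e-value in column 11, bit score in column 12).
    """
    best = defaultdict(dict)  # query -> {subject: (bitscore, evalue)}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        # A subject can have several HSPs; keep only its best-scoring one.
        previous = best[query].get(subject)
        if previous is None or bitscore > previous[0]:
            best[query][subject] = (bitscore, evalue)

    result = {}
    for query, subjects in best.items():
        # Highest bit score first; ties broken by lower e-value, then by name.
        ranked = sorted(subjects.items(),
                        key=lambda item: (-item[1][0], item[1][1], item[0]))
        result[query] = [subject for subject, _ in ranked[:n]]
    return result
```

For example, output from `blastn -query q.fa -db mydb -outfmt 6 -max_target_seqs 500 -out hits.tsv` could be passed as `best_hits(open("hits.tsv"), 10)`. The file and database names here are placeholders.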
6 votes, 1 answer

State of the art in predicting Translation Initiation Sites

I'm working on a university project on predicting Translation Initiation Sites in human DNA. I searched the net for papers and documentation to get guidelines and inspiration, but I am not sure that I found the state of the art in…
user548
6 votes, 1 answer

GATK documentation for required depth to reliably call heterozygous mutation in diploid organism?

I'm looking for official GATK documentation (or a recent manuscript) that defines a general recommendation/requirement for sequencing depth to reliably call a heterozygous point mutation in a diploid organism (WGS). In this case, I'm working with…
Mark Ebbert
6 votes, 4 answers

Convert SRA to FastA

I'm trying to get the FastA files for some accessions (like NC_001416.1). I did not manage to find an FTP server or direct link to these files (I want to get them from the command line with wget, not from a web browser). But I found an "equivalent" file…
Poshi
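Accessions such as NC_001416.1 are RefSeq nucleotide records rather than SRA runs. One option is NCBI's E-utilities efetch endpoint, which returns FASTA text from a plain URL that wget can fetch as well. Below is a minimal sketch using only the Python standard library. `efetch_fasta_url` and `download_fasta` are hypothetical helper names.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(accession):
    """Build an efetch URL that returns the nucleotide record as FASTA text."""
    params = {"db": "nuccore", "id": accession,
              "rettype": "fasta", "retmode": "text"}
    return EFETCH + "?" + urlencode(params)


def download_fasta(accession, path):
    """Fetch the FASTA record over the network and write it to `path`."""
    with urlopen(efetch_fasta_url(accession)) as response, open(path, "wb") as out:
        out.write(response.read())


print(efetch_fasta_url("NC_001416.1"))
# https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=NC_001416.1&rettype=fasta&retmode=text
```

The printed URL can be given to `wget -O NC_001416.1.fa "<url>"` directly. When fetching many accessions, keep to NCBI's request-rate limits.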
6 votes, 1 answer

What are some good practices to follow during EPIC DNA methylation data analysis?

I recently got some EPIC DNA methylation data and I was wondering what some good practices to follow are. I am particularly interested in normalization and differential analysis. Thank you.
deepseas
6 votes, 2 answers

Sequence alignment using Markov Model

I am learning about applying Markov models to sequence alignment. The prof says that the transition probabilities from a gap-residue alignment to a residue-gap alignment and vice versa are both 0. Is there any biological/mathematical reason behind…
Zeyuan
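For reference, consider the three-state pair HMM in Durbin et al., Biological Sequence Analysis. It has a state M for a residue-residue match and states X and Y for the two gap types, with δ the gap-open probability, ε the gap-extend probability, and τ the end probability. Its transition matrix has the shape below. The zeros in the X-to-Y and Y-to-X entries are the transitions the question refers to, and each row sums to 1. The matrix shows the structure but does not by itself explain the modelling choice.

```latex
\[
\begin{array}{c|cccc}
      & M                  & X           & Y           & \text{End} \\ \hline
  M   & 1 - 2\delta - \tau & \delta      & \delta      & \tau \\
  X   & 1 - \varepsilon - \tau & \varepsilon & 0       & \tau \\
  Y   & 1 - \varepsilon - \tau & 0       & \varepsilon & \tau
\end{array}
\]
```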
6 votes, 1 answer

Gathering data on bacterial organism growth conditions

Let's say I have a list of organism names like the example below: Achromobacter xylosoxidans, Acidithiobacillus ferrooxidans, Chloroflexus aurantiacus, Clostridium perfringens, Aquaspirillum arcticum. The goal is to identify the optimal growth conditions…
Sam
6 votes, 1 answer

Getting all NNI trees of a parsimony tree

I have an aligned protein sequence file which I have been using to reconstruct a parsimony tree. I am currently using the NNITreeSearcher._get_neighbors method from Biopython 1.72, but it only finds the single best-scoring tree. def _nni(self,…
Sidra Younas
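Here is a minimal sketch that enumerates every NNI neighbor of an unrooted binary tree, independent of Biopython. The tree is assumed to be a hypothetical adjacency dict mapping each node name to the set of its neighbors; leaves have degree 1 and internal nodes degree 3. Each internal edge u-v separates four subtrees a, b | c, d. Swapping b with c, or b with d, gives the two alternative topologies, so a tree with n leaves has 2(n - 3) NNI neighbors. Scoring each neighbor (for example by parsimony) is left out.

```python
def nni_neighbors(adj):
    """Return all NNI neighbors of an unrooted binary tree.

    adj: {node: set of adjacent nodes}; node names must be sortable.
    Each neighbor is returned as a new adjacency dict; `adj` is not modified.
    """
    neighbors = []
    done = set()
    for u in adj:
        for v in adj[u]:
            # Only internal edges (both ends of degree 3), each visited once.
            if len(adj[u]) != 3 or len(adj[v]) != 3 or (v, u) in done:
                continue
            done.add((u, v))
            a, b = sorted(x for x in adj[u] if x != v)
            c, d = sorted(x for x in adj[v] if x != u)
            # Swapping b with c gives the split {a, c} | {b, d};
            # swapping b with d gives the split {a, d} | {b, c}.
            for x in (c, d):
                new = {node: set(nbrs) for node, nbrs in adj.items()}
                new[u].discard(b)
                new[b].discard(u)
                new[v].discard(x)
                new[x].discard(v)
                new[u].add(x)
                new[x].add(u)
                new[v].add(b)
                new[b].add(v)
                neighbors.append(new)
    return neighbors
```

Returning whole adjacency dicts keeps the sketch simple. For large trees, it would be cheaper to yield the swapped edge and re-score the tree incrementally.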
6 votes, 4 answers

Finding gene length using Ensembl IDs

I want to find the lengths of a list of Homo sapiens genes reported in the GEO database. I have gathered the Ensembl IDs of those genes. I understand this information can be parsed from the start and end positions given in the GTF file…
Natasha
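Here is a minimal sketch, assuming an Ensembl or GENCODE GTF in which each gene has a line with feature type "gene" and a gene_id attribute. Version suffixes such as ".5" are stripped so the IDs match unversioned Ensembl IDs. The result is the genomic span, end - start + 1, because GTF coordinates are 1-based and inclusive. If "length" should mean exonic or transcript length, the exons would need to be merged instead. `gene_lengths` is a hypothetical helper name.

```python
import re

GENE_ID = re.compile(r'gene_id "([^"]+)"')


def gene_lengths(gtf_lines, wanted=None):
    """Return {unversioned gene_id: genomic span} from GTF "gene" lines.

    wanted: optional set of unversioned Ensembl IDs to keep.
    """
    lengths = {}
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # header / comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue
        match = GENE_ID.search(fields[8])
        if not match:
            continue
        gene_id = match.group(1).split(".")[0]  # drop version suffix, e.g. ".5"
        if wanted is None or gene_id in wanted:
            # GTF start/end are 1-based and inclusive.
            lengths[gene_id] = int(fields[4]) - int(fields[3]) + 1
    return lengths
```

Passing an open file handle as `gtf_lines` streams the annotation line by line. This avoids loading a whole human GTF into memory.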
6 votes, 2 answers

How can I test my hypotheses computationally?

I have single-cell RNA-seq data on about 2000 cells at 9 time points. I have clustered the cells at each time point with Seurat. At some time points I see 3 clusters, while at other time points I see 2 clusters of cells (please look at…
Zizogolu