Highest Voted Questions - Bioinformatics Stack Exchange

8

votes

4 answers

Working with old genome builds

Is working with and relying on old genome builds still valid? For example, NCBI36/hg18. Would results from papers based on old builds require LiftOver and re-analysis to be useful? A bit of context: this is related to another post, where we have aCGH…

asked May 31 '17 at 20:47

zx8754

1,042
8
22

8

votes

1 answer

Comparison of gene set enrichment statistics

I am performing a gene set enrichment analysis to determine if particular gene sets are coherently up- or down-regulated. I have seen several statistics for computing a p-value of GSEA-style enrichment. I'm particularly interested in the differences…

asked Dec 01 '17 at 18:28

Nuclear Hoagie

246
1
4

8

votes

2 answers

Comparing two genome annotations

I have a one-dimensional array (the human genome). I also have two annotations for it, which we can think of as different sets of peaks (nucleosomes and secondary structures). How can we find correlation and/or causation between these annotations? Currently…

asked Nov 17 '17 at 19:26

D M

181
1

8

votes

1 answer

What's a good ontology for drug names?

I have ... a database with patient phenotypes, stored as HPO terms; genetic data in whatever format I need. I want ... to store drug names in a way that won't make my life difficult; if it's feasible, to make them so that you could draw links…

asked Nov 17 '17 at 11:36

Algy Taylor

183
4

8

votes

3 answers

Have DNA motifs 6-12bp long, trying to get conservation scores

I have about 200 short nucleotide motifs (6-12 bp in length) from the human genome, and I'm trying to see how conserved they are across vertebrates. I was thinking that I'd need to make a bed file for each motif that lists all of its occurrences in…

asked May 30 '17 at 21:18

Eric Brenner

132
6

8

votes

3 answers

Least present k-mers in the human genome

What are the least present k-mers in the human genome at different sizes? Starting with k=4 and going up in size until k=10, what are the k-mers least seen (or not at all) in the human genome? I am only interested in the reference human genome, so I…

asked Nov 15 '17 at 10:22

719016

2,324
13
19
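For the k-mer question above, a minimal sketch of one way to rank k-mers by frequency is shown here. It assumes the sequence fits in memory as a plain string, counts only the forward strand, and skips any window containing a non-ACGT character such as N. The function names `kmer_counts` and `least_present` and the toy sequence are hypothetical, chosen for illustration only.

```python
from collections import Counter
from itertools import product


def kmer_counts(sequence, k):
    """Count every k-mer over ACGT in `sequence`, including k-mers seen zero times."""
    sequence = sequence.upper()
    # Pre-populate all 4**k possible k-mers so absent ones are reported with count 0.
    counts = Counter({"".join(p): 0 for p in product("ACGT", repeat=k)})
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        if kmer in counts:  # windows containing N or other ambiguity codes are skipped
            counts[kmer] += 1
    return counts


def least_present(sequence, k, n=5):
    """Return the n rarest k-mers as (kmer, count) pairs, ties broken alphabetically."""
    counts = kmer_counts(sequence, k)
    return sorted(counts.items(), key=lambda kv: (kv[1], kv[0]))[:n]


if __name__ == "__main__":
    toy = "ACGTACGTNNACGA"  # toy sequence (assumption), not real genome data
    for k in range(2, 4):
        print(k, least_present(toy, k, n=3))
```

For a whole reference genome at k=10, the table has only 4**10 entries, so memory is dominated by the sequence itself; a dedicated k-mer counter would likely be faster than this pure-Python loop.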

8

votes

4 answers

How can the cell line contribution be estimated from RNASeq data?

Using laser-capture microdissection, a group of cells stained with the marker of interest was sequenced. In another cohort of patients (this is all human liver tissue), the whole tissue was sequenced (RNA-seq in both cases). Can I estimate…

asked May 30 '17 at 07:30

llrs

4,693
1
18
42

8

votes

2 answers

How to measure or assign hydrophobicity score values to individual amino acids of a PDB structure?

I want to measure the hydrophobicity of each amino acid within a PDB structure file. Since I have the PDB file, I want to consider the 3D information, rather than sequence-only measures such as GRAVY. I am not interested in measuring the SASA (solvent…

asked Oct 11 '17 at 15:48

Aalawlx

517
4
12

8

votes

3 answers

How to select the most representative pathways from a gene enrichment analysis?

I have performed an enrichment analysis on a cluster of genes. The output is a list of pathways and their p-values (the pathways are selected because p-value < 0.05). The list is still quite long, so I want to reduce it. For that purpose I have a…

pathway

asked May 26 '17 at 14:06

llrs

4,693
1
18
42

8

votes

2 answers

Converting Ensembl Gene IDs to Entrez Gene IDs through biomart

Well, I'm trying to convert a list of human genes referenced by Ensembl Gene IDs to Entrez Gene IDs. I have been advised to use biomart. I tried to get a kind of conversion table for all human genes. I don't know if my settings are wrong, but I…

asked Sep 06 '17 at 10:54

floatingpurr

315
1
2
7

8

votes

1 answer

Interpreting Integrative Genomics Viewer (IGV)

I was following a tutorial on "Tuxedo Genome Guided Transcriptome Assembly Workshop" and was wondering how to interpret the following: From what I understand from 'Color Legends', the color blue represents something that is below normal and red…

asked Sep 05 '17 at 20:05

AlwaysTrying44

435
2
9

8

votes

1 answer

Convert local alignments to spliced alignments in SAM file

I mapped RNA reads to a reference genome, using LAST in split mode, and converted the MAF alignment to SAM with maf-convert. My problem is that the transcripts are not reported in a spliced manner, meaning that a transcript_ID is reported several…

asked Aug 30 '17 at 13:24

aechchiki

2,676
11
34

8

votes

1 answer

Extracting expression data from GSE dataset downloaded from GEO

I have downloaded the GSE16146 dataset from GEO using the GEOquery R package. I would like to extract the "Data table" from the downloaded GSE16146.
> library("GEOquery")
> GSE16146 <- getGEO("GSE16146")
> Table(GSE16146)[1:5,]
This returns the following error: >…

asked Aug 30 '17 at 07:15

panbar

81
1
4

8

votes

4 answers

Introduce errors in reference transcripts according to external dataset error model

I would like to modify some reference transcripts from Ensembl (D. melanogaster) to introduce a controlled rate of random errors in the sequences. The idea would be to introduce random base substitutions in these sequences, no indels for now,…

asked Aug 22 '17 at 16:39

aechchiki

2,676
11
34
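For the error-introduction question above, a minimal sketch of random base substitution without indels is shown here. It assumes a single uniform per-base error rate, which is a simplification: the question asks for an error model taken from an external dataset, and that would replace the flat `rate` with position- or base-specific probabilities. The function name `introduce_substitutions` and the example sequence are hypothetical.

```python
import random


def introduce_substitutions(sequence, rate, rng=None):
    """Return a copy of `sequence` with each A/C/G/T base substituted with probability `rate`.

    Substitutions only (no indels), so the output has the same length as the input.
    Characters other than uppercase A, C, G, T (e.g. N) are left unchanged.
    """
    if rng is None:
        rng = random.Random()
    bases = "ACGT"
    out = []
    for base in sequence:
        if base in bases and rng.random() < rate:
            # Choose uniformly among the three bases that differ from the original.
            out.append(rng.choice([b for b in bases if b != base]))
        else:
            out.append(base)
    return "".join(out)


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the run is reproducible
    original = "ACGTACGTACGTACGTACGT"  # toy sequence (assumption), not a real transcript
    mutated = introduce_substitutions(original, rate=0.1, rng=rng)
    print(original)
    print(mutated)
```

Passing an explicit `random.Random` instance keeps the introduced errors reproducible across runs, which helps when comparing tools on the same mutated transcripts.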

8

votes

2 answers

Which measure should be used in a PCA of RNA-seq data? TPM or counts?

I'm trying to understand the magnitude of batch effects in my RNA-seq samples, and I was wondering which expression units are more suitable to draw a PCA. I'm thinking of either counts or TPM, but things like rlog or vst could work…

rna-seq

asked Aug 18 '17 at 10:13

mgalardini

977
7
18
