Most Popular
1500 questions
8
votes
4 answers
Working with old genome builds
Is working with and relying on old genome builds still valid?
For example NCBI36/hg18. Would results from papers based on old builds require LiftOver and re-analysis to be useful?
A bit of context: this is related to another post, where we have aCGH…
zx8754
- 1,042
- 8
- 22
8
votes
1 answer
Comparison of gene set enrichment statistics
I am performing a gene set enrichment analysis to determine if particular gene sets are coherently up- or down-regulated. I have seen several statistics for computing a p-value of GSEA-style enrichment. I'm particularly interested in the differences…
Nuclear Hoagie
- 246
- 1
- 4
8
votes
2 answers
Comparing two genome annotations
I have a one-dimensional array (the human genome). I also have two annotations for it; we can think of them as different peaks (nucleosomes and secondary structures). How can we find correlation and/or causation between these annotations? Currently…
D M
- 181
- 1
8
votes
1 answer
What's a good ontology for drug names?
I have ...
A database with patient phenotypes in, stored as HPO terms
Genetic data in whatever format I need
I want ...
To store drug names in a way that won't make my life difficult
If it's feasible, to make them so that you could draw links…
Algy Taylor
- 183
- 4
8
votes
3 answers
Have DNA motifs 6-12bp long, trying to get conservation scores
I have about 200 short nucleotide motifs (6-12 bp in length) from the human genome, and I'm trying to see how conserved they are across vertebrates.
I was thinking that I'd need to make a bed file for each motif that lists all of its occurrences in…
Eric Brenner
- 132
- 6
8
votes
3 answers
Least present k-mers in the human genome
What are the least present k-mers in the human genome at different sizes?
Starting with k=4 and going up in size until k=10, what are the k-mers least seen (or not at all) in the human genome? I am only interested in the reference human genome, so I…
719016
- 2,324
- 13
- 19
8
votes
4 answers
How can the cell line contribution be estimated from RNASeq data?
Using laser-capture microdissection, a group of cells stained with the marker of interest was sequenced. In another cohort of patients (this is all human liver tissue), the whole tissue was sequenced (RNA-seq in both cases).
Can I estimate…
llrs
- 4,693
- 1
- 18
- 42
8
votes
2 answers
How to measure or assign hydrophobicity score values to individual amino acids of a PDB structure?
I want to measure the hydrophobicity of each amino acid within a PDB structure file. Since I have the PDB file, I want to consider the 3D information rather than sequence-only measures such as GRAVY.
I am not interested in measuring the SASA (solvent…
Aalawlx
- 517
- 4
- 12
8
votes
3 answers
How to select the most representative pathways from a gene enrichment analysis?
I have performed an enrichment analysis on a cluster of genes. The output is a list of pathways and their p-values (the pathways are selected because p-value < 0.05). The list is still quite long, so I want to reduce it. For that purpose I have a…
llrs
- 4,693
- 1
- 18
- 42
8
votes
2 answers
Converting Ensembl Gene IDs to Entrez Gene IDs through biomart
Well, I'm trying to convert a list of human genes referenced by Ensembl Gene IDs to Entrez Gene IDs. I have been advised to use biomart.
I tried to get a kind of conversion table for all human genes. I don't know if my settings are wrong, but I…
floatingpurr
- 315
- 1
- 2
- 7
8
votes
1 answer
Interpreting Integrative Genomics Viewer (IGV)
I was following a tutorial on "Tuxedo Genome Guided Transcriptome Assembly Workshop" and was wondering how to interpret the following:
From what I understand from 'Color Legends', the color blue represents something that is below normal and red…
AlwaysTrying44
- 435
- 2
- 9
8
votes
1 answer
Convert local alignments to spliced alignments in SAM file
I mapped RNA reads to a reference genome using LAST in split mode, and converted the MAF alignment to SAM with maf-convert.
My problem is that the transcripts are not reported in a spliced manner, meaning that a transcript_ID is reported several…
aechchiki
- 2,676
- 11
- 34
8
votes
1 answer
Extracting expression data from GSE dataset downloaded from GEO
I have downloaded GSE16146 dataset from GEO using GEOquery R package.
I would like to extract "Data table" from downloaded GSE16146.
>library("GEOquery")
>GSE16146 <- getGEO("GSE16146")
>Table(GSE16146)[1:5,]
This returns the following error:
>…
panbar
- 81
- 1
- 4
8
votes
4 answers
Introduce errors in reference transcripts according to external dataset error model
I would like to modify some reference transcripts from Ensembl (D. melanogaster) to introduce a controlled rate of random errors in the sequences. The idea would be to introduce random base substitutions in these sequences, no indels for now,…
aechchiki
- 2,676
- 11
- 34
8
votes
2 answers
Which measure should be used in a PCA of RNA-seq data? TPM or counts?
I'm trying to understand the magnitude of batch effects in my RNA-seq samples, and I was wondering which expression units are more suitable to draw a PCA. I'm thinking of either counts or TPM, but things like rlog or vst could work…
mgalardini
- 977
- 7
- 18