Most Popular

8
votes
4 answers

Working with old genome builds

Is working with and relying on old genome builds still valid? For example, NCBI36/hg18. Would results from papers based on old builds require LiftOver and re-analysis to be useful? A bit of context: this is related to another post, where we have aCGH…
zx8754
  • 1,042
  • 8
  • 22
8
votes
1 answer

Comparison of gene set enrichment statistics

I am performing a gene set enrichment analysis to determine if particular gene sets are coherently up- or down-regulated. I have seen several statistics for computing a p-value of GSEA-style enrichment. I'm particularly interested in the differences…
Nuclear Hoagie
  • 246
  • 1
  • 4
8
votes
2 answers

Comparing two genome annotations

I have a one-dimensional array (the human genome). I also have two annotations for it, which we can think of as different sets of peaks (nucleosomes and secondary structures). How can we find correlation and/or causation between these annotations? Currently…
D M
  • 181
  • 1
8
votes
1 answer

What's a good ontology for drug names?

I have ... a database with patient phenotypes, stored as HPO terms; genetic data in whatever format I need. I want ... to store drug names in a way that won't make my life difficult; if it's feasible, to make them so that you could draw links…
Algy Taylor
  • 183
  • 4
8
votes
3 answers

Have DNA motifs 6-12bp long, trying to get conservation scores

I have about 200 short nucleotide motifs (6-12 bp in length) from the human genome, and I'm trying to see how conserved they are across vertebrates. I was thinking that I'd need to make a BED file for each motif that lists all of its occurrences in…
Eric Brenner
  • 132
  • 6
8
votes
3 answers

Least present k-mers in the human genome

What are the least present k-mers in the human genome at different sizes? Starting with k=4 and going up in size until k=10, what are the k-mers least seen (or not at all) in the human genome? I am only interested in the reference human genome, so I…
719016
  • 2,324
  • 13
  • 19
8
votes
4 answers

How can the cell line contribution be estimated from RNASeq data?

Using laser-capture microdissection, a group of cells stained with the marker of interest was sequenced. In another cohort of patients (this is all human liver tissue), the whole tissue was sequenced (RNA-seq in both cases). Can I estimate…
llrs
  • 4,693
  • 1
  • 18
  • 42
8
votes
2 answers

How to measure or assign hydrophobicity score values to individual amino acids of a PDB structure?

I want to measure the hydrophobicity of each amino acid within a PDB structure file. Since I have the PDB file, I want to consider the 3D information rather than sequence-only measures such as GRAVY. I am not interested in measuring the SASA (solvent…
Aalawlx
  • 517
  • 4
  • 12
8
votes
3 answers

How to select the most representative pathways from a gene enrichment analysis?

I have performed an enrichment analysis on a cluster of genes. The output is a list of pathways and their p-values (the pathways are selected because p-value < 0.05). The list is still quite long, so I want to reduce it. For that purpose I have a…
llrs
  • 4,693
  • 1
  • 18
  • 42
8
votes
2 answers

Converting Ensembl Gene IDs to Entrez Gene IDs through biomart

Well, I'm trying to convert a list of human genes referenced by Ensembl Gene IDs to Entrez Gene IDs. I have been advised to use biomart. I tried to get a kind of conversion table for all human genes. I don't know if my settings are wrong, but I…
floatingpurr
  • 315
  • 1
  • 2
  • 7
8
votes
1 answer

Interpreting Integrative Genomics Viewer (IGV)

I was following a tutorial on "Tuxedo Genome Guided Transcriptome Assembly Workshop" and was wondering how to interpret the following: From what I understand from 'Color Legends', the color blue represents something that is below normal and red…
AlwaysTrying44
  • 435
  • 2
  • 9
8
votes
1 answer

Convert local alignments to spliced alignments in SAM file

I mapped RNA reads to a reference genome, using LAST in split mode, and converted the MAF alignment to SAM with maf-convert. My problem is that the transcripts are not reported in a spliced manner, meaning that a transcript_ID is reported several…
aechchiki
  • 2,676
  • 11
  • 34
8
votes
1 answer

Extracting expression data from GSE dataset downloaded from GEO

I have downloaded the GSE16146 dataset from GEO using the GEOquery R package. I would like to extract the "Data table" from the downloaded GSE16146.
> library("GEOquery")
> GSE16146 <- getGEO("GSE16146")
> Table(GSE16146)[1:5,]
This returns the following error: >…
panbar
  • 81
  • 1
  • 4
8
votes
4 answers

Introduce errors in reference transcripts according to external dataset error model

I would like to modify some reference transcripts from Ensembl (D. melanogaster) to introduce a controlled rate of random errors in the sequences. The idea would be to introduce random base substitutions in these sequences, no indels for now,…
aechchiki
  • 2,676
  • 11
  • 34
8
votes
2 answers

Which measure should be used in a PCA of RNA-seq data? TPM or counts?

I'm trying to understand the magnitude of batch effects in my RNA-seq samples, and I was wondering which expression units are more suitable to draw a PCA. I'm thinking of either counts or TPM, but things like rlog or vst could work…
mgalardini
  • 977
  • 7
  • 18