
As the question states, I am interested in an analysis similar to Gene Set Enrichment Analysis (ranked gene sets) but focused on locus-level data instead of genes.

To explain in greater detail: I have a set of genomic coordinates from DNA methylation data, each scored by its contribution to the components of a non-negative matrix factorization (NMF). I want to better understand the biology that these NMF components represent. I know I can use locus overlap analysis tools like GREAT or LOLA to tell me which GO terms/functions are enriched in a particular BED file, but that type of analysis ignores the scores/ranks of the loci in the BED file, which a GSEA-like analysis would take into account.

Does anyone know of any tools/methods for analyzing functional enrichment while taking advantage of feature/locus ranks/scores with locus-level data?

I know I probably didn't explain this very clearly, so please let me know if I can clarify anything. Thanks.

Reilstein

2 Answers


I tinkered with a program called GSEA-SNP quite a few years back, which claims to apply a similar ranking procedure to SNPs. It first links SNPs to genes, then runs an algorithm similar to GSEA. There's a bit more detail in the paper.
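The "link loci to genes, then rank" idea can be sketched as below. This is a minimal illustration and not GSEA-SNP's actual procedure: the `collapse_to_genes` helper, the locus and gene names, and the choice of keeping the maximum score per gene are all assumptions made for the example.

```python
def collapse_to_genes(locus_scores, locus_to_genes):
    """Summarize locus scores per gene by keeping the highest score.

    locus_scores: dict of locus id -> score.
    locus_to_genes: dict of locus id -> list of linked gene names.
    Taking the maximum is an assumption; a mean or median would also work.
    """
    gene_scores = {}
    for locus, score in locus_scores.items():
        for gene in locus_to_genes.get(locus, ()):
            if gene not in gene_scores or score > gene_scores[gene]:
                gene_scores[gene] = score
    return gene_scores


# Hypothetical toy data: three loci linked to two genes.
locus_scores = {"cg1": 0.9, "cg2": 0.4, "cg3": 0.7}
locus_to_genes = {"cg1": ["GENE_A"], "cg2": ["GENE_A", "GENE_B"], "cg3": ["GENE_B"]}

gene_scores = collapse_to_genes(locus_scores, locus_to_genes)
# Genes sorted by decreasing score, ready for a gene-level ranked test.
ranked_genes = sorted(gene_scores, key=gene_scores.get, reverse=True)
```

The resulting gene ranking could then be passed to any ordinary ranked gene-set test. Loci with no linked gene are dropped silently here, which may or may not be acceptable for a given dataset.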

Unfortunately, I've moved away from this area in recent years, so I don't know whether anyone has done the legwork to generalise the algorithm to sets of any type, where each item in a set has a score from a continuous (but not necessarily normal) distribution.

gringer
  • This is a step in the right direction and better than anything I had been able to find, so thank you! I haven't been able to find a non-GWAS version of these tools, but these versions might work for my application. Thanks! – Reilstein Jan 05 '18 at 19:06

You can use GSEA-like methods for any type of data. The test asks whether the members of a group (a gene set) tend to cluster toward one end of a ranking derived from another variable.
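To make that concrete, here is a minimal sketch of a GSEA-style running-sum statistic applied directly to ranked loci. It is a simplified illustration, not the reference GSEA implementation: the function names, the toy locus identifiers, and the two-sided permutation scheme (which redraws the set at random) are assumptions for the example.

```python
import random


def enrichment_score(ranked_ids, scores, locus_set, weight=1.0):
    """Signed maximum deviation of a weighted running sum over the ranking.

    ranked_ids: locus ids sorted by decreasing score.
    scores: dict of locus id -> score.
    locus_set: ids annotated with the function being tested.
    """
    hits = [i for i in ranked_ids if i in locus_set]
    n_miss = len(ranked_ids) - len(hits)
    hit_total = sum(abs(scores[i]) ** weight for i in hits)
    if not hits or n_miss == 0 or hit_total == 0:
        return 0.0
    running, best = 0.0, 0.0
    for i in ranked_ids:
        if i in locus_set:
            running += abs(scores[i]) ** weight / hit_total  # step up on a hit
        else:
            running -= 1.0 / n_miss  # step down on a miss
        if abs(running) > abs(best):
            best = running
    return best


def permutation_p_value(ranked_ids, scores, locus_set, n_perm=1000, seed=0):
    """Two-sided empirical p-value from random sets of the same size."""
    rng = random.Random(seed)
    observed = enrichment_score(ranked_ids, scores, locus_set)
    k = len(set(ranked_ids) & locus_set)
    extreme = 0
    for _ in range(n_perm):
        null_set = set(rng.sample(ranked_ids, k))
        if abs(enrichment_score(ranked_ids, scores, null_set)) >= abs(observed):
            extreme += 1
    return observed, (extreme + 1) / (n_perm + 1)


# Hypothetical toy data: six scored loci, two of them in the tested set.
scores = {"chr1:100": 0.9, "chr1:250": 0.8, "chr2:40": 0.7,
          "chr2:900": 0.2, "chr3:15": 0.1, "chr3:77": 0.05}
ranked = sorted(scores, key=scores.get, reverse=True)
top_set = {"chr1:100", "chr1:250"}
es, p = permutation_p_value(ranked, scores, top_set, n_perm=200)
```

A positive score means the set sits near the top of the ranking and a negative score means it sits near the bottom. With many sets, the p-values would also need correcting for multiple testing.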

So you first need to find where the functions you are interested in are located in the genome, i.e. associate a range or position with a given function. Using this association, you can then test whether the ranking of the loci is associated with a specific function annotated to a given region.
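The association step could look roughly like the sketch below, which groups loci by the function of every annotated region they fall in. The `build_locus_sets` helper, the probe-style ids, the coordinates, and the function labels are hypothetical, and BED-style 0-based half-open intervals are assumed.

```python
from collections import defaultdict


def build_locus_sets(loci, annotations):
    """Group loci by the function of each annotated region containing them.

    loci: dict of locus id -> (chromosome, 0-based position).
    annotations: list of (chromosome, start, end, function) tuples using
    half-open [start, end) coordinates, as in BED files.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end, function in annotations:
        by_chrom[chrom].append((start, end, function))

    sets = defaultdict(set)
    for locus_id, (chrom, pos) in loci.items():
        # Linear scan per chromosome: fine for a sketch, slow for real data.
        for start, end, function in by_chrom.get(chrom, []):
            if start <= pos < end:
                sets[function].add(locus_id)
    return dict(sets)


# Hypothetical toy data.
loci = {"cg1": ("chr1", 150), "cg2": ("chr1", 500), "cg3": ("chr2", 20)}
annotations = [
    ("chr1", 100, 200, "immune response"),
    ("chr1", 400, 600, "cell cycle"),
    ("chr2", 0, 50, "immune response"),
]
locus_sets = build_locus_sets(loci, annotations)
```

Each resulting set plays the role of a gene set in the ranked test. For genome-scale data, an interval tree or a dedicated tool such as bedtools would be a better fit than the nested loop used here.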

I haven't seen anything like that implemented, but you could do it yourself, using biomaRt to retrieve where a gene (or a function) is located and fgsea to test which functions are associated with your score.

llrs