
I'm a bit confused about how to retrieve all the genes that are associated with a specific Gene Ontology (GO) term.

Let's say I need to find all the genes that are associated with this GO entry:

GO:0030098 lymphocyte differentiation

This is the graph representing the parents of my GO term. On the same page I find the Child Terms, which are the children of my term.

Now, I can download the GOA file (for the organism I'm interested in) containing the GO associations and grep my term to extract a list of associated genes but, as far as I understand, this list does not cover the whole graph that actually describes my GO term.

Should I also get the genes in the child terms of my term? Or in the parent terms? If so, I would then have to go back to the graph and find the parents/children of the parents/children I found, and so on, until the whole graph has been visited.

Is this the correct approach? Or should I only focus on those genes related to my GO:0030098, and that's it?

I found this: Retrieving a list of human genes having GO associations, and this, but I don't think they address my question, since they do not describe the parent-child relationship.

Also I found this, on the GO FAQ, but what I need to do is quite the opposite.

Summarising: I need to find the genes associated with a specific GO term. Should I also look at its parents or children, or not?

Any clues?

llrs
gabt

1 Answer


You can use BioMart to filter by your GO term and get the genes as attributes. BioMart is ontology-aware, so it will pull out all genes associated with your term and with any of its child terms.

There is no need to look up the GO child terms as BioMart already deals with this. A term may have more than one parent term, but if it's associated with a gene then everything about that term is associated with the gene. The term is not associated with the gene via parent X or parent Y, it means that both are relevant. So if the child term is associated with the gene, then the parent term is by proxy.
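To make the propagation rule concrete, here is a minimal sketch in R of what an ontology-aware lookup does conceptually. It is not how BioMart is implemented; the child terms, the gene names, and the get_descendants helper are hypothetical placeholders invented for illustration, and only GO:0030098 comes from the question.

```r
# Toy ontology: each term maps to its direct children.
# All child/grandchild names are hypothetical placeholders, not real GO terms.
children <- list(
  "GO:0030098"   = c("child_1", "child_2"),
  "child_1"      = c("grandchild_1"),
  "child_2"      = c("grandchild_1"),  # a term can have more than one parent
  "grandchild_1" = character(0)
)

# Toy direct annotations, like the ones you would get by grepping a GOA file.
annotations <- data.frame(
  gene = c("geneA", "geneB", "geneC", "geneD"),
  term = c("GO:0030098", "child_1", "grandchild_1", "unrelated_term"),
  stringsAsFactors = FALSE
)

# Collect a term plus all of its descendants by walking down the graph.
get_descendants <- function(term, children) {
  visited <- character(0)
  queue <- term
  while (length(queue) > 0) {
    current <- queue[1]
    queue <- queue[-1]
    if (current %in% visited) next      # skip terms reached via another parent
    visited <- c(visited, current)
    queue <- c(queue, children[[current]])
  }
  visited
}

terms <- get_descendants("GO:0030098", children)
genes <- unique(annotations$gene[annotations$term %in% terms])
print(genes)
# [1] "geneA" "geneB" "geneC"
```

Note that the walk only goes down to children, never up to parents: a gene annotated to a child term is implicitly annotated to the term of interest, whereas genes annotated only to a parent term are not necessarily relevant to it.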

If you're not already familiar with BioMart, there's a help video.

Here's a BioMart query that gets you exactly what you need in human using the web interface

If you want to work with biomaRt, the R interface to BioMart, here's some documentation, and this is the same query as above:

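library(biomaRt)
# Connect to the Ensembl genes mart for human
ensembl <- useEnsembl(biomart = "ensembl", dataset = "hsapiens_gene_ensembl")
# 'go_parent_term' is used as a filter rather than an attribute, so that genes
# annotated to the term itself or to any of its child terms are returned.
# Available names can be checked with listFilters(ensembl) and listAttributes(ensembl).
getBM(attributes = c("ensembl_gene_id", "external_gene_name", "go_id", "name_1006"),
      filters = "go_parent_term",
      values = "GO:0030098",
      mart = ensembl)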
Emily_Ensembl
  • OK, so you're suggesting that what I should look for, in GO, is the children of my GO term. But a child of my term can contain genes from other parents, can't it?

    according to this link http://www.geneontology.org/page/ontology-structure it seems that some kind of association may appear between different branches.

    – gabt Nov 08 '18 at 09:33
  • No, don't worry about looking in GO. Just put the one term you have into BioMart. It will find all genes associated with your term and all genes associated with its children.

    A term may have more than one parent term, but if it's associated with a gene then everything about that term is associated with the gene. The term is not associated with the gene via parent X or parent Y, it means that both are relevant. So if the child term is associated with the gene, then the parent term is by proxy.

    – Emily_Ensembl Nov 08 '18 at 10:11
  • @Emily_Ensembl please [edit] the comments to your answer (the comments might be deleted and there are very useful comments). Thanks for answering this – llrs Nov 08 '18 at 10:27
  • Why would the comments be deleted? – Emily_Ensembl Nov 08 '18 at 10:28
  • Emily, one more detail. Is the example query you sent equivalent to this piece of code (after changing the mart and the GO entry)? http://127.0.0.1:19280/library/biomaRt/doc/biomaRt.html#retrieve-all-entrezgene-identifiers-and-hugo-gene-symbols-of-genes-which-have-a-map-kinase-activity-go-term-associated-with-it. – gabt Nov 08 '18 at 11:07
  • I don't have permission to connect to that. Can you send it as a proper URL? – Emily_Ensembl Nov 08 '18 at 11:10
  • @Emily_Ensembl Because on Stack Exchange comments are meant for clarifying and improving questions and answers, so their content is expected to be incorporated into the relevant question or answer. Also, they are considered "second-class citizens"... – llrs Nov 08 '18 at 11:11
  • Emily, if you click on it, it does not work, but if you copy and paste it into a new tab, it definitely will. Otherwise, on the page that opens, I'm focusing on section 4.6, Retrieve all entrezgene identifiers and HUGO gene symbols of genes which have a “MAP kinase activity” GO term associated with it. – gabt Nov 08 '18 at 11:18
  • @gabrielet the first part of your link (http://127.0.0.1:19280) points to a local port on your machine! We can't access your machine, so we cannot see the code you are mentioning – llrs Nov 08 '18 at 11:20
  • Oops! Sorry Llopis and Emily, you're right. This should work https://bioconductor.org/packages/release/bioc/vignettes/biomaRt/inst/doc/biomaRt.html#retrieve-all-entrezgene-identifiers-and-hugo-gene-symbols-of-genes-which-have-a-map-kinase-activity-go-term-associated-with-it. – gabt Nov 08 '18 at 11:40
  • Yes, that's it. But change the attributes to whatever you want to see in your results table. – Emily_Ensembl Nov 08 '18 at 11:44