I need a command in R to retrieve all human genes associated with a Gene Ontology entry. I tried to look for it online but did not find it.

Daniel Standage
  • @DanielStandage I am not sure this question is about the human genome. – llrs Sep 20 '17 at 07:06
  • @Llopis When he says "retrieve all human genes" it definitely makes me think of the human genome. :-) – Daniel Standage Sep 20 '17 at 16:15
  • @Daniel yes, but the question is not about the human genome. Or should we also use gene as tag here ? Maybe we use different definitions of the tag – llrs Sep 20 '17 at 16:18

2 Answers

Here's an example for the mouse genome:

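library(org.Mm.eg.db)
# Look up the name and symbol of every mouse gene annotated with this GO term
select(org.Mm.eg.db, keys = "GO:0048406", columns = c("GENENAME", "SYMBOL"), keytype = "GO")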

You get output like:

                                                    GENENAME SYMBOL
1                                     pregnancy zone protein    Pzp
2 nerve growth factor receptor (TNFR superfamily, member 16)   Ngfr
3 nerve growth factor receptor (TNFR superfamily, member 16)   Ngfr
4                                             neurotrophin 3   Ntf3
5             neurotrophic tyrosine kinase, receptor, type 1  Ntrk1
6            furin (paired basic amino acid cleaving enzyme)  Furin
7              proprotein convertase subtilisin/kexin type 6  Pcsk6
8                                                 sortilin 1  Sort1
9                                      alpha-2-macroglobulin    A2m
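Since the question asks about human genes, here is a minimal sketch of the same lookup against the human annotation package. It assumes org.Hs.eg.db is installed from Bioconductor and reuses the GO ID from the mouse example purely for illustration; the variable names are my own. The "GOALL" keytype also picks up genes annotated to descendant terms, which may or may not be what you want.

```r
library(org.Hs.eg.db)  # human annotation package from Bioconductor

go_id <- "GO:0048406"  # same term as the mouse example; swap in your own

# Genes annotated directly with the term
direct <- AnnotationDbi::select(org.Hs.eg.db,
                                keys = go_id,
                                columns = c("ENTREZID", "SYMBOL", "GENENAME"),
                                keytype = "GO")

# Genes annotated with the term or any of its descendant terms
with_children <- AnnotationDbi::select(org.Hs.eg.db,
                                       keys = go_id,
                                       columns = c("ENTREZID", "SYMBOL", "GENENAME"),
                                       keytype = "GOALL")

# One row per annotation, so collapse to unique gene symbols
unique(direct$SYMBOL)
```

Calling select() with the AnnotationDbi:: prefix avoids clashes if another package that exports a select() function (dplyr, for example) is loaded.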

Edit: Given your comment that you want instead a list of all genes with a GO term associated with them (for human):

library(org.Hs.eg.db) # Install it from bioconductor yourself
unlist(as.list(org.Hs.egSYMBOL)[mappedkeys(org.Hs.egGO)])

The mappedkeys() call returns the keys (Entrez gene IDs) of all entries in the GO bimap that have a valid GO mapping. Those keys are then used to subset the symbol bimap (after converting it to a list).

Devon Ryan

The following biomaRt code should help. It downloads the Ensembl gene ID, HGNC symbol and GO ID for every human gene:

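# biocLite() is no longer supported by current Bioconductor releases; use BiocManager instead
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("biomaRt")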
library("biomaRt")
ensembl = useMart("ensembl", dataset = "hsapiens_gene_ensembl")
#listAttributes(ensembl)
mapping <- getBM(attributes = c("ensembl_gene_id", "hgnc_symbol", "go_id"), mart = ensembl)
head(mapping)
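To get from the full table to the genes for a single GO entry, one option is to subset the data frame in base R. The sketch below uses a tiny stand-in for mapping so that it runs without a network connection; its rows are made-up placeholders, not real annotations, and target and hits are names chosen here for illustration.

```r
# Stand-in for the data frame returned by getBM(); the rows are made-up
# placeholders, not real annotations.
mapping <- data.frame(
  ensembl_gene_id = c("ENSG_A", "ENSG_A", "ENSG_B", "ENSG_C"),
  hgnc_symbol     = c("GENE1", "GENE1", "GENE2", "GENE3"),
  go_id           = c("GO:0000001", "GO:0000002", "GO:0000001", "GO:0000003"),
  stringsAsFactors = FALSE
)

target <- "GO:0000001"  # hypothetical term of interest

# Keep the rows annotated with the target term, then drop duplicate genes
hits <- unique(mapping[mapping$go_id == target,
                       c("ensembl_gene_id", "hgnc_symbol")])
hits
```

With the real mapping object from getBM(), only the last two statements are needed.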
arup