4

I have a list of about 50k 'Protein IDs' from Reactome. Is there a simple way to get all the corresponding 'Pathway IDs' for each protein? What is the best service to use? (I'm guessing I can use the Reactome API, but I don't necessarily want to hit that 50k times...).

Protein IDs from Reactome look like this:

R-HSA-49155
R-HSA-199420

The corresponding Reactome 'Pathway IDs' for those Protein IDs would be:

R-HSA-49155  R-HSA-110331
R-HSA-49155  R-HSA-110330
R-HSA-49155  R-HSA-110357
R-HSA-199420 R-HSA-1660499
R-HSA-199420 R-HSA-202424
R-HSA-199420 R-HSA-199418
Dan
  • 3
    Please [edit] your question and give us a few IDs as examples we can use to test our approaches. And why don't you want to use the API? This sort of thing is precisely what APIs are for. You can always limit it to only N requests per minute or whatever they need. – terdon Jul 06 '17 at 09:09
  • Yes, there is a way in R to do this with the "reactome.db" library, but it is unclear from your question what these protein IDs are and what organism you use. You'll probably have to convert your protein IDs to Entrez Gene (eg) IDs first; if you use human data you'll need, for example, the "org.Hs.eg.db" library. – benn Jul 06 '17 at 10:47
  • 3
    I must say that I tried to answer this question using reactome.db from Bioconductor and couldn't. The current version (1.59.1) doesn't provide the Reactome ID for a given Entrez ID, only the pathway IDs. I submitted a question to the support site. – llrs Jul 07 '17 at 09:43
  • 1
    This should be an easy task for biomaRt in R; however, I found out that it uses the reactome_gene IDs as if they were pathway IDs, so it doesn't work properly. – benn Jul 11 '17 at 10:24
  • Thanks for trying... I guess the API is the way to go? I just don't like issuing 50k web-requests, although I guess it isn't really a problem. Just created a bounty for the lulz. – Dan Jul 11 '17 at 15:29

2 Answers

6

If you don't mind hitting it 50k times and are OK with python3...

from urllib import request
import json

def getPathways(proteinID):
    """Return the stable IDs of events that consume or produce proteinID."""
    baseURL = 'http://reactome.org/ContentService/data/query'
    PathwayIDs = set()
    try:
        response = request.urlopen('{}/{}'.format(baseURL, proteinID)).read().decode()
        data = json.loads(response)
        # Either key may be absent, so fall back to an empty list.
        for key in ('consumedByEvent', 'producedByEvent'):
            for event in data.get(key, []):
                PathwayIDs.add(event['stId'])
    except Exception:
        # Invalid or missing IDs are silently ignored. Catching Exception
        # rather than using a bare except keeps Ctrl-C working in a long loop.
        pass
    return PathwayIDs

Usage would then be something like:

ids = ['R-HSA-49155', 'R-HSA-199420', '']
for rid in ids:
    for pid in getPathways(rid):
        print("{}\t{}".format(rid, pid))

Which would produce:

R-HSA-49155     R-HSA-110239
R-HSA-49155     R-HSA-110240
R-HSA-49155     R-HSA-110238
R-HSA-49155     R-HSA-110356
R-HSA-199420    R-HSA-8948800
R-HSA-199420    R-HSA-6807106
R-HSA-199420    R-HSA-6807206
R-HSA-199420    R-HSA-6807126
R-HSA-199420    R-HSA-8847968
R-HSA-199420    R-HSA-8850997
R-HSA-199420    R-HSA-2321904
R-HSA-199420    R-HSA-8948775
R-HSA-199420    R-HSA-8944497
R-HSA-199420    R-HSA-6807134
R-HSA-199420    R-HSA-8850945
R-HSA-199420    R-HSA-8873946

Note that this will silently ignore invalid or missing IDs such as '' (that's what the try and except above are for). Note also that these are different pathway IDs from the ones you provided in your example. The main reason is that the protein IDs you showed are not always involved in the pathway IDs you showed (in my example, they always are).
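
If 50k back-to-back requests are a concern, the loop can be throttled, as suggested in the comments on the question. Below is a minimal sketch, assuming a lookup function with the same shape as getPathways above (takes one ID, returns a set of IDs). The names map_ids and delay are hypothetical, and the default pause is an arbitrary illustrative value, not a documented Reactome limit.

```python
import time

def map_ids(ids, lookup, delay=0.2, sleep=time.sleep):
    """Yield (query ID, result ID) pairs, pausing between lookups.

    lookup: any function taking one ID and returning an iterable of IDs.
    delay:  seconds to wait between requests (arbitrary default,
            not a documented Reactome limit).
    sleep:  injectable so the pause can be replaced when testing.
    """
    seen = set()
    for rid in ids:
        if not rid or rid in seen:
            continue  # skip blanks and duplicates to save requests
        seen.add(rid)
        if len(seen) > 1:
            sleep(delay)  # throttle every request after the first
        for pid in sorted(lookup(rid)):
            yield rid, pid
```

Iterating over map_ids(your_ids, getPathways) and printing each pair tab-separated gives the same kind of output as above, just with a pause between requests and no repeated lookups for duplicate IDs.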

Devon Ryan
  • This looks great, but for some reason fails here: http://plantreactome.gramene.org/ContentService/ when I query with this ID: R-OPU-1129769. However, if I use that ID I do get lists of pathways here: http://plantreactome.gramene.org/ContentService/data/pathways/low/entity/R-OPU-1129769?speciesId=9007655

    Sorry for using human reactome as an example when I wanted data from plant reactome (I assumed it would work equally well in both places).

    – Dan Jul 19 '17 at 08:49
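
The low-level-pathway endpoint from the comment above could be wrapped along these lines. This is only a sketch: it assumes the endpoint returns a JSON list of pathway objects that each carry an 'stId' field, which has not been checked against the Plant Reactome instance, and the helper names (pathways_url, parse_pathways, low_level_pathways) are hypothetical.

```python
import json
from urllib import request
from urllib.parse import quote, urlencode

# Base URL taken from the comment above.
BASE = 'http://plantreactome.gramene.org/ContentService'

def pathways_url(entity_id, species_id=None, base=BASE):
    """Build the data/pathways/low/entity URL for one entity ID."""
    url = '{}/data/pathways/low/entity/{}'.format(base, quote(entity_id))
    if species_id is not None:
        url += '?' + urlencode({'speciesId': species_id})
    return url

def parse_pathways(payload):
    """Extract stable IDs from a response body.

    Assumption: the body is a JSON list of objects with an 'stId' key;
    entries without one are skipped.
    """
    return [p['stId'] for p in json.loads(payload)
            if isinstance(p, dict) and 'stId' in p]

def low_level_pathways(entity_id, species_id=None):
    """Fetch and parse; network errors propagate to the caller."""
    with request.urlopen(pathways_url(entity_id, species_id)) as resp:
        return parse_pathways(resp.read().decode())
```

Keeping URL construction and parsing separate from the network call makes it easy to check both against a saved response before issuing thousands of requests.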
3

The Reactome website offers mapping files (UniProt, Ensembl, etc.) on its download page, but unfortunately not for the protein IDs you are using (stable identifiers).

I contacted their helpdesk, and they sent me a file mapping all protein IDs to their pathways, which is exactly what you need. I have asked whether they would add it to their download page as well, but I'm not sure they will. Meanwhile, you can ask them for the file yourself, or get it from my Google Drive.

I assume you know how to get your Protein IDs and pathways from this file, using e.g., R?
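
For anyone who prefers Python over R, a lookup against such a mapping file might look like the sketch below. The file layout is an assumption (tab-separated, protein ID in the first column, pathway ID in the second), so adjust the column indices to whatever the file actually contains. load_mapping is a hypothetical helper name.

```python
import csv
from collections import defaultdict

def load_mapping(lines, wanted=None, protein_col=0, pathway_col=1):
    """Map protein ID -> set of pathway IDs from tab-separated lines.

    lines:  an open file or any iterable of text lines.
    wanted: optional set of protein IDs to keep; None keeps everything.
    The column positions are assumptions about the file layout.
    """
    mapping = defaultdict(set)
    for row in csv.reader(lines, delimiter='\t'):
        if len(row) <= max(protein_col, pathway_col):
            continue  # skip blank or truncated rows
        protein, pathway = row[protein_col], row[pathway_col]
        if wanted is None or protein in wanted:
            mapping[protein].add(pathway)
    return dict(mapping)
```

Passing the 50k IDs as a set via wanted keeps memory use down and makes each membership check constant-time, so the whole file is processed in a single pass.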

benn