
How can I manipulate a protein-interaction network graph from the STRING database using the STRINGdb Bioconductor package and R?

I have downloaded the whole graph for Homo sapiens from STRING, which has about 20,000 proteins.

  1. How do I read the file using that package?
  2. How do I filter out the things I don't need? For example, suppose I want to keep only tumor data.
Iakov Davydov
A M

1 Answer

I think the easiest way is to download the graph using STRINGdb.

library(STRINGdb)
string_db <- STRINGdb$new(version="10", species=9606,
                          score_threshold=400, input_directory="" )
full.graph <- string_db$get_graph()
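
If you would rather work from the file you already downloaded, a minimal sketch along these lines may help. It assumes the links file is space-separated with the columns protein1, protein2 and combined_score; the inline text below is a toy stand-in for the real file, and the name local.graph is only illustrative.

```r
library(igraph)

# Toy stand-in for a downloaded STRING links file (assumed layout:
# space-separated columns protein1, protein2, combined_score).
links_text <- "protein1 protein2 combined_score
9606.ENSP00000000233 9606.ENSP00000272298 490
9606.ENSP00000000233 9606.ENSP00000253401 198
9606.ENSP00000272298 9606.ENSP00000253401 850"

# For a real file, pass its path as the first argument instead of text = ...
links <- read.table(text = links_text, header = TRUE,
                    stringsAsFactors = FALSE)

# keep only interactions at or above the chosen confidence threshold
links <- links[links$combined_score >= 400, ]

# build an undirected graph; combined_score becomes an edge attribute
local.graph <- graph_from_data_frame(links, directed = FALSE)

# count proteins and interactions
vcount(local.graph)
ecount(local.graph)
```

If the real file lists each pair of proteins in both directions, calling simplify() on the resulting graph should collapse the duplicate edges.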

Now you can use igraph to manipulate the graph. Let's assume you want to take the 200 proteins with the highest degree, i.e. the largest number of edges.

library(igraph)

# see how many proteins you have
vcount(full.graph)

# find the top 200 proteins with the highest degree
top.degree.vertices <- names(tail(sort(degree(full.graph)), 200))

# extract the relevant subgraph
top.subgraph <- induced_subgraph(full.graph, top.degree.vertices)

# count the number of proteins in it
vcount(top.subgraph)

How to get disease-specific genes?

There's no GO annotation for cancer or Alzheimer's disease; diseases are outside the scope of the GO Consortium.

Instead, you can use the KEGG pathway annotation, manually select a list of relevant GO terms, or take a gene list from one of the published papers. For example, annotation term 05200 corresponds to the KEGG cancer pathway. You can easily retrieve the proteins associated with that annotation:

cancer.pathway.proteins <-
    string_db$get_term_proteins('05200')$STRING_id

And then perform subgraphing as described above.
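
As a rough sketch of that subgraphing step, the toy example below uses a small hand-made graph in place of full.graph and a hypothetical ID vector in place of cancer.pathway.proteins. It intersects the ID list with the vertex names first, on the assumption that some annotated proteins may be absent from the graph, in which case induced_subgraph would fail on the unknown names.

```r
library(igraph)

# toy graph standing in for full.graph
toy.graph <- graph_from_literal(A - B, B - C, C - D, D - A, A - C)

# hypothetical ID list standing in for cancer.pathway.proteins;
# "Z" is deliberately not a vertex of the graph
selected.proteins <- c("A", "C", "D", "Z")

# keep only the IDs that are actually present in the graph
present <- intersect(selected.proteins, V(toy.graph)$name)

# extract the subgraph induced by the remaining IDs
sub.graph <- induced_subgraph(toy.graph, present)

# count proteins and interactions in it
vcount(sub.graph)
ecount(sub.graph)
```

With the real data, the same pattern would apply with full.graph and cancer.pathway.proteins substituted for the toy objects.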

Alternatively, you can try to compute an enrichment score for every gene given its neighbors (the way enrichment is shown on the string-db website) and keep only the genes with the top enrichment scores. The get_ppi_enrichment_full or get_ppi_enrichment functions will probably help you do that.

Iakov Davydov