4

I assume that each person on Facebook is represented as a node (of a Graph) in Facebook, and relationship/friendship between each person(node) is represented as an edge between the involved nodes.

Given that there are millions of people on Facebook, how is the Graph stored?

nvmme
  • 143
  • 4

3 Answers3

6

Strange as it sounds, graphs and graph databases are typically implemented as linked lists. As alluded to here, even the most popular/performant graph database out there (neo4j), is secretly using something akin to a doubly-linked list.

Representing a graph this way has a number of significant benefits, but also a few drawbacks. Firstly, representing a graph this way means that you can do edge-based insertions in near-constant time. Secondly, this means that traversing the graph can happen extremely quickly, if we're only looking to either step up or down a linked list.

The biggest drawback of this though comes from something sometimes called The Justin Bieber Effect, where nodes with a large number of connections tend to be extremely slow to evaluate. Imagine having to traverse a million semi-redundant links every time someone linked to Justin Bieber.

I know that the awesome folks over at Neo4j are working on the second problem, but I'm not sure how they're going about it, or how much success they've had.

indico
  • 4,219
  • 20
  • 21
3

Having worked with Facebook data a bit (harvested from Facebook users) we stored it just as a pair of values: USER_ID, FRIEND_USER_ID.

But I guess your questions is a bit deeper? You can store it in different ways, depending on your research question. One interesting option is triads for example - http://mypersonality.org/wiki/doku.php?id=list_of_variables_available#triads

LauriK
  • 591
  • 2
  • 6
1

When I worked with social network data, we stoted the "friendship" relation in a database in the table Friends(friend_a, friend_b, ...) with a B-Tree index on (friend_a, friend_b) plus also some partitioning.

In our case it was a little bit different since the graph was directed, so it wasn't really "friendship", but rather "following/follower" relationship. But for friendship I would just store two edges: both (friend_a, friend_b) and (friend_b, friend_a)

We used MySQL to store the data, if it matters, but I guess it shouldn't.

Alexey Grigorev
  • 2,880
  • 1
  • 13
  • 19