Facebook's Huge Database

Question

I assume that each person on Facebook is represented as a node (of a Graph) in Facebook, and relationship/friendship between each person(node) is represented as an edge between the involved nodes.

Given that there are millions of people on Facebook, how is the Graph stored?

score 6 · Accepted Answer · answered Aug 04 '14 at 15:59

Strange as it sounds, graphs and graph databases are typically implemented as linked lists. As alluded to here, even the most popular/performant graph database out there (neo4j), is secretly using something akin to a doubly-linked list.

Representing a graph this way has a number of significant benefits, but also a few drawbacks. Firstly, representing a graph this way means that you can do edge-based insertions in near-constant time. Secondly, this means that traversing the graph can happen extremely quickly, if we're only looking to either step up or down a linked list.

The biggest drawback of this though comes from something sometimes called The Justin Bieber Effect, where nodes with a large number of connections tend to be extremely slow to evaluate. Imagine having to traverse a million semi-redundant links every time someone linked to Justin Bieber.

I know that the awesome folks over at Neo4j are working on the second problem, but I'm not sure how they're going about it, or how much success they've had.

score 3 · Answer 2 · answered Aug 03 '14 at 20:21

Having worked with Facebook data a bit (harvested from Facebook users) we stored it just as a pair of values: USER_ID, FRIEND_USER_ID.

But I guess your questions is a bit deeper? You can store it in different ways, depending on your research question. One interesting option is triads for example - http://mypersonality.org/wiki/doku.php?id=list_of_variables_available#triads

score 1 · Answer 3 · answered Aug 04 '14 at 09:48

When I worked with social network data, we stoted the "friendship" relation in a database in the table Friends(friend_a, friend_b, ...) with a B-Tree index on (friend_a, friend_b) plus also some partitioning.

In our case it was a little bit different since the graph was directed, so it wasn't really "friendship", but rather "following/follower" relationship. But for friendship I would just store two edges: both (friend_a, friend_b) and (friend_b, friend_a)

We used MySQL to store the data, if it matters, but I guess it shouldn't.

Facebook's Huge Database

3 Answers3