20

Let $G$ be an unweighted undirected graph with $n$ vertices and $m$ edges. Is it possible to preprocess $G$ and produce a data structure of size $m \cdot \mathrm{polylog}(n)$ so that it can answer queries of the form "distance between $u$ and $v$" in time O(n)?

The problem seems too basic to be unsolved.

ilyaraz
  • 1,569
  • 18
  • 33
  • 1
    In response to your final remark about “too basic to be unsolved”: If the query must be answered in a constant time, it is indeed well-studied. But the point of your question is that you allow O(n) time for a query (instead of O(1) or trivial O(m)). Although I think that it is an interesting question, I do not think that the question is of fundamental importance. – Tsuyoshi Ito Oct 23 '11 at 21:41
  • 1
    @TsuyoshiIto -- I don't see why the question loses its "fundamental importance" if it allows super-constant but sub-linear query time. For example, suppose I can solve the above problem with the constraint that distance queries can be answered in $O(n^{1-\varepsilon})$ time for some $\varepsilon > 0$, and the processing time is at most $O(n^{3 - \varepsilon})$ -- wouldn't this give me a sub-cubic algorithm for computing shortest paths? I personally think that this is a very interesting question. – Rachit Dec 14 '11 at 16:48
  • I don't know there is a general way or not, but there is a good way in bounded treewidth graphs, see Shortest Path query in bounded treewidth graphs – Saeed Dec 26 '11 at 11:44
  • Also if $m = \Omega(n^2)$ you can create a shortest path trees (starting from each node) and then answer shortest path query (by related root) in $O(n)$, or by similar way may be you can construct a data structure with less memory size. – Saeed Dec 26 '11 at 11:52

2 Answers2

9

This is a very interesting question. At a high level, you are asking whether one can preprocess a graph such that shortest path queries become independent of the density of the graph, without using much extra space -- interesting, but as you say, unresolved.

If you are happy with approximate distances, here is a way to get a $2$-approximation. Let $G$ be a weighted undirected graph with $n$ nodes and $m$ edges. It is shown in the following paper that for approximate distance queries, designing data structures for graphs with $m$ edges is no harder than graphs in which each node has degree bounded by $m/n$:

R. Agarwal, P. B. Godfrey, S. Har-Peled, Approximate distance queries and compact routing in sparse graphs, INFOCOM 2011

So, assume that $G$ is a $m/n$-degree bounded graph. Sample $\alpha = O(m/n)$ nodes uniform randomly; call these landmark nodes. During the preprocessing phase, store the distance from each landmark node to each other node in the graph; this requires $O(m)$ space. For each node $u$, store its closest landmark node $\ell(u)$. Also, store the graph within the data structure, say as an adjacency list.

When queried for distance between $u$ and $v$, grow balls around both nodes -- ball of node $w$ is defined as the set of nodes that are strictly closer to $w$ than to its closest landmark node, say $\ell(w)$. It can be shown that the size of each ball is $O(n^2/m)$, in expectation. Let $\Gamma(u) = B(u) \cup N(B(u))$, where $B(u)$ is the ball of node $u$ and $N(B(u))$ is the set of neighbors of nodes in $B(u)$. It can be shown that the size of $\Gamma(u)$ is $O(n)$, in expectation.

Answering the query: if $\Gamma(u) \cap \Gamma(v) \neq \emptyset$, return $\min_{x \in \Gamma(u) \cap \Gamma(v)} \{d(u, x) + d(v, x)\}$; else if $d(u, \ell(u)) \leq d(v, \ell(v))$, return $d(u, \ell(u)) + d(\ell(u), v)$; else return $d(v, \ell(v)) + d(\ell(v), u)$. It is easy to show that this is a $2$-approximation.

In terms of query time, note that growing balls takes $O(n)$ time for a $m/n$-degree bounded graph; constructing $\Gamma(u)$ and $\Gamma(v)$ given respective balls takes $O(n)$ time (since neighbors are stored within the data structure); and checking whether $\Gamma(u) \cap \Gamma(v)$ is empty or not also takes $O(n)$ time.

The above bounds are in expectation; I think it is easy to derandomize the construction. Unfortunately, this technique does not seem to allow getting approximation better than $2$. Its a very interesting question though ....

Rachit
  • 838
  • 6
  • 17
  • The above technique does not exploit the fact that your input graph is unweighted; may be there is something interesting that you can do by exploiting that fact but I can't think of a way to retrieve exact distances. For instance, it is possible to reduce the query time to $O(n^2/m)$ and get distance bounded by $2d+1$, where $d$ is the exact distance between $u$ and $v$. – Rachit Dec 22 '11 at 10:12
  • I realized that I don't understand why it is a 2-approximation. Thorup-Zwick in the same situations gives 3-approx. – ilyaraz Jan 19 '12 at 01:04
  • @ilyaraz: Note that Thorup-Zwick scheme does not require to check $\Gamma(u) \cap \Gamma(v)$ (and hence, answers queries in almost constant time). See the paper that I mentioned above for stretch $2$ proof. – Rachit Jan 19 '12 at 05:41
4

What you need is called a "distance oracle". Unfortunately, I am not very familiar with distance oracles, so I can only refer you to the seminal paper due to Thorup and Zwick:

Mikkel Thorup and Uri Zwick. Approximate distance oracles. STOC '01, 2001.

Here is an excerpt from the abstract:

Let $G = (V, E)$ be an undirected weighted graph with $|V| = n$ and $|E| = m$. Let $k$ be an integer. We show that $G = (V, E)$ can be preprocessed in $O(kmn^{1/k})$ expected time, constructing a data structure of size $O(kn^{1+1/k})$, such that any subsequent distance query can be answered, approximately, in $O(k)$ time. The approximate distance returned is of stretch at most $2k - 1$, i.e., the quotient obtained by dividing the estimated distance by the actual distance lies between 1 and $2k - 1$. [...] The space requirement of our algorithm is [...] essentially optimal.

According to their results, what you request is basically doable even for weighted graphs: choosing $k=1$ yields a distance oracle of size $O(n^2)$ obtained in expected time $O(mn)$, which can answer your shortest path queries with $1$-stretch in $O(1)$ time!

Distance oracles is a very well researched field, so you will be able to dig further I believe.

Gabor Retvari
  • 367
  • 1
  • 9
  • 3
    Journal version: JACM 2005. – Tsuyoshi Ito Oct 23 '11 at 20:48
  • 3
    Using $O(n^2)$ space one can naively store a look-up table. So, this paper (which I was aware of) is irrelevant here. – ilyaraz Oct 23 '11 at 21:11
  • 1
    Fair enough. The result most close to what you request AFAIK is for graphs with average degree $\Theta(\log n)$ that gives stretch-2 paths with $O(n^{3/2})$ space in $O(\sqrt{n})$ query time. (R. Agarwal, P. Godfrey, and S. Har-Peled, "Approximate Distance Queries and Compact Routing in Sparse Graphs", INFOCOM 2011.) – Gabor Retvari Oct 23 '11 at 22:00
  • Using Bourgain's embedding of metrics into $\ell_1$, one can come up with an oracle with space $O(n \log^2 n)$, query time $O(\log^2 n)$ and multiplicative error $O(\log n)$. – ilyaraz Oct 24 '11 at 05:49