18

Suppose $T$ is an constant-degree tree whose structure we do not know. The problem is to output the tree $T$ by asking queries of the form: "Does the node $x$ lie on the path from node $a$ to node $b$?". Assume that each query can be answered in constant time by an oracle. We know the value of $n$, the number of nodes in the tree. The objective is to minimize the time taken to output the tree in terms of $n$.

Does there exist an $o(n^2)$ algorithm for the above problem?

Assume that the degree of any node in $T$ is at most 3.


What I know

Bounded diameter case is easy. If the diameter of the tree is $D$, then we can get a divide-and-conquer algorithm:

Any binary tree has a good separator that divides the tree into components of size no less than 1/3n.

  1. Pick any vertex x. If it is a good separator label that and recurse.
  2. Find all the 3 neighbors of x.
  3. Move in the direction of the neighbor which has the largest number of nodes. Repeat Step 2 with the neighbor.

Since finding the separator takes at most $D$ steps, we get a $O(nD\log n)$ algorithm.

An $O(n\;\log^2 n)$ randomized algorithm. (moved from comments below)

Pick two vertices x and y randomly. With 1/9 probability they will lie on the opposite sides of a separator. Pick the middle node of the path from $x$ to $y$. See if it is a separator, if not do binary search.

It takes $O(n\;\log n)$ expected time to find the separator. So we get a $O(n\;\log^2 n)$ randomized algorithm.


Background. I learnt about this problem from a friend who works in probabilistic graphical models. The above problem roughly corresponds to learning the structure of a junction tree using an oracle which, given three random variables X,Y and Z, can tell the value of mutual information between X and Y given the value of Z. If the value is close to zero, we may assume that Z lies on the path from X to Y.

Jagadish
  • 1,955
  • 21
  • 27

2 Answers2

5

No. The following simple adversary strategy implies that any algorithm to reconstruct an $n$-node tree requires at least $\binom{n-1}{2} = n(n-1)/2$ "betweenness" queries.

Arbitrarily label the nodes $0,1,2,\dots,n-1$. The adversary answers all queries as though the tree is a star with vertex $0$ in the center; think of $0$ as the root and the other nodes as its children.

Between?(a,x,b)
    if x=0 return TRUE else return FALSE

Now suppose the algorithm halts after performing less than $n(n-1)/2$ queries. Then there must be two vertices $y$ and $z$, neither equal to zero, such that the algorithm has not queried any permutation of the triple $(0,y,z)$. If the algorithm claims that the tree is not a star with center $0$, the adversary reveals its input, proving the algorithm wrong. The adversary then reveals that $x$ is actually the only child of $y$, proving the algorithm wrong again.

Update: Oops, just noticed the degree constraint. Fortunately, this is not a major hurdle. Replace node $0$ with your favorite binary tree, with the other $n-1$ nodes as leaves in some unknown order, and then reveal this subtree to the reconstruction algorithm. Reconstructing the resulting $(2n-3)$-node binary tree still requires at least $n(n-1)/2$ queries. Equivalently, reconstructing an $m$-node binary tree requires at least $(m+3)(m+2)/8$ queries. (I'm sure a more subtle construction would improve the constant.) As Jagadish points out, this generalization doesn't work; queries about internal nodes in the tree impose an ordering on the leaves, which reduced the number of necessary queries.

Jeffε
  • 23,129
  • 10
  • 96
  • 163
  • My question is about constant-degree trees. This argument doesn't work for that case, right? – Jagadish Mar 19 '12 at 14:53
  • @Jagadish: Yes, the argument still works; see my update – Jeffε Mar 19 '12 at 14:56
  • @Jeffe I don't understand the modification. Is the longest diameter among the graphs which the adversary keeps O(log n) then? – Jagadish Mar 19 '12 at 15:11
  • Yes. To put it more simply: The adversary has either a complete binary tree with depth $\lg n$, or a complete binary tree with one leaf removed and reinserted as a child of another leaf. – Jeffε Mar 19 '12 at 15:19
  • @Jeffe You argument seems to imply Omega(n^2) bound for a randomized algorithm too...but there is an O(n log n) randomized algorithm for this case. – Jagadish Mar 19 '12 at 15:21
  • If the diameter of the tree is D, then I have an algorithm that runs in O(nD log n), so your argument doesn't hold. In the star-graph case, the adversary had n^2 graphs to start with but in the modification that's not the case, right? – Jagadish Mar 19 '12 at 15:29
  • 2
    @Jagadish: (1) I do not think that this proof of a lower bound works for randomized algorithms. The adversary can still construct a failing example, but that does not contradict the hypothesis that the randomized algorithm works correctly with high probability. (2) By the way, it seems that you asked the question knowing the answer. What did you do that for? – Tsuyoshi Ito Mar 19 '12 at 15:37
  • @Jagadish: Oops, you're right; my argument for constant-degree trees is broken. Queries about triples (a,x,b) where x is in the replacement tree reveal information about the left-to-right ordering of the leaves. – Jeffε Mar 19 '12 at 15:51
  • @Jagadish: Do you have a reference for the randomized O(n log n)-time algorithm? (Is that only for bounded-degree trees?) – Jeffε Mar 19 '12 at 15:53
  • @JɛffE I'm not sure about $O(n \log n)$, but something like $O(n^{\log_2{3}})$ is possible (for bounded degree). Pick each vertex independently with probability 1/2, construct an induced tree recursively, then use its center as "pivot" to split the tree into 2 almost equal parts and construct them recursively (the split can be done in linear time with some technicality, if I am not mistaken). P.S. I mean, that's obviously $o(n^2)$ so the question sounds strange. +1 to JɛffE's comment. – Dmytro Korduban Mar 19 '12 at 16:12
  • @JɛffE Pick two vertices randomly. With nearly 1/8 probability they will be leaves (since it is a complete binary tree with extra layer of leaves). So the median of the path connecting them will be the root. It takes O(n) expected time to label the root after which we recurse. So we get a O(n log n) randomized algorithm for this case. – Jagadish Mar 19 '12 at 18:38
  • @Tsuyoshi I meant a randomized algorithm for the case when we know the tree to be an almost complete binary tree (JɛffE's construction). Sorry if my earlier comment looked like I knew a O(n log n) randomized algorithm for the actual problem. – Jagadish Mar 19 '12 at 18:54
  • 2
    I see. Thanks for the explanation, and also thanks for editing the question! – Tsuyoshi Ito Mar 19 '12 at 19:41
  • @JɛffE I think the above randomized algorithm can be modified to work for binary trees: Pick two vertices $x$ and $y$ randomly. With 1/9 probability they will lie on opposite sides of a separator. Pick the middle node of the path from $x$ to $y$. See if it is a good separator, if not do binary search. This takes $O(n \log n)$ expected time to find the separator. So we get a $O(n;\log^2 n)$ randomized algorithm. Anyway, my question is about the deterministic case, so this probably doesn't say much.. – Jagadish Mar 20 '12 at 06:32
  • 4
    If you have a randomized algorithm, then you have an algorithm. Determinism is overrated. – Jeffε Mar 20 '12 at 08:54
  • I agree. But I'm unable to find an explicit deterministic algorithm. – Jagadish Mar 20 '12 at 13:34
  • 1
    This problem reminds me of sorting/matching nuts and bolts. A randomized algorithm that runs in $O(n \log n)$ time with high probability is easy — it's just randomized quicksort. There is a deterministic $O(n\log n)$-time algorithm, but it is seriously nontrivial. – Jeffε Mar 21 '12 at 08:28
  • The previous work mentioned in that paper looks more readable. In Alon et.al's paper (which achieves $O(n;\log^4 n)$-time) this sentence caught my eye: "The algorithm constructs `deterministic samples' by applying expander graphs in an interesting way." Finding the median nut is definitely harder than finding the median of $n$ numbers. So I'm wondering if there is a paper that explains how to find the median of $n$ numbers in $o(n;\log n)$ time using expander graphs. – Jagadish Mar 22 '12 at 18:54
5

Anindya Sen and I have a paper in ALT '13 where we give an $\tilde O(n \sqrt{n})$ algorithm for this problem. We don't know if a better algorithm is possible.

Jagadish
  • 1,955
  • 21
  • 27