6

In the Hamiltonian Path problem we are given a graph $G=(V,E)$ and two distinct vertices $s,t$, and we ask whether there is a path from $s$ to $t$ that traverses every other vertex exactly once. Clearly, the brute-force algorithm, which enumerates all possible vertex orderings, runs in $O(n!)$ time.
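For concreteness, here is a minimal brute-force sketch in Python (my own illustration, not from any paper): it tries every ordering of the intermediate vertices, which is exactly the $O(n!)$ behaviour mentioned above.

```python
# Brute force: try every ordering of the vertices other than s and t.
from itertools import permutations

def has_hamiltonian_path(adj, s, t):
    """adj maps each vertex to the set of its neighbours (undirected graph)."""
    inner = [v for v in adj if v not in (s, t)]
    for order in permutations(inner):          # (n-2)! candidate orderings
        path = [s, *order, t]
        if all(path[i + 1] in adj[path[i]] for i in range(len(path) - 1)):
            return True
    return False

# Example: the 4-cycle s-a-b-t has the Hamiltonian path s, a, b, t.
adj = {'s': {'a', 't'}, 'a': {'s', 'b'}, 'b': {'a', 't'}, 't': {'b', 's'}}
print(has_hamiltonian_path(adj, 's', 't'))     # True
```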

There is an elegant D&C (divide & conquer) style algorithm in Gurevich and Shelah (1987) which is stated to run in $O(2^{2n}\times2^{O(l)})$ time with $l=\log_2 n$. But after checking the proof several times and performing the analysis in a different way, I believe that the correct running time is rather $O(2^{2n}\times n^{O(l)})$. Interestingly, the latter expression also appears once in that paper; perhaps one of the two is a typo, but it makes things really confusing for me.

In the following, I first give some pointers to the paper, then present the main idea of the D&C algorithm and my own analysis.

  • The paper: "Expected Computation Time for Hamiltonian Path Problem" by Gurevich and Shelah
  • The algorithm is called HPA3 and appears on pages 499 and 500.
  • P499, 3rd line from the bottom: ...run-time...$2^{2n}$ times a polynomial in $n$
  • P500, point (b): $T(n)=2^{2n}\times n^{O(l)}$ (which contradicts the statement above)
  • P500, ...choose a $c$ such that $T(n)=2^{2n+cl}$....

Main idea of D&C. Very simple: guess the middle vertex (call it $m$) of the path from $s$ to $t$, then try all possible subsets $A\subset V$ with $|A|=(n-3)/2$, assuming the path looks like $s,A,m,(V-A),t$. This divides the input instance into two half-size instances, and it yields the recurrence relation $T(n)\leq (n-2)\cdot 2^{n-3}\cdot 2\,T((n-3)/2)$ (the factor $2$ accounts for the two half-size subproblems), as in the sketch below.
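Here is a hedged Python sketch of this D&C idea (my own reconstruction of the scheme, not the paper's HPA3 verbatim); `hp` decides whether some path from `s` to `t` covers exactly the vertex set `inner`.

```python
# Divide & conquer: guess the middle vertex m, then split the remaining
# inner vertices into a first half A and a second half rest - A.
from itertools import combinations

def hp(adj, s, t, inner):
    """Is there a path s -> ... -> t visiting each vertex of inner exactly once?"""
    if not inner:
        return t in adj[s]
    if len(inner) == 1:
        (v,) = inner
        return v in adj[s] and t in adj[v]
    for m in inner:                                   # guess the middle vertex
        rest = inner - {m}
        for A in combinations(rest, len(rest) // 2):  # guess the first half
            A = frozenset(A)
            if hp(adj, s, m, A) and hp(adj, m, t, rest - A):
                return True
    return False

adj = {'s': {'a', 't'}, 'a': {'s', 'b'}, 'b': {'a', 't'}, 't': {'b', 's'}}
print(hp(adj, 's', 't', frozenset({'a', 'b'})))       # True
```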

My analysis. To simplify, let $n$ denote the number of vertices other than $s$ and $t$. Then
$$T(n)\leq n\,2^n\,T(n/2) \leq 2^{n(1+1/2+1/4+\cdots)}\times\left(n\cdot\frac{n}{2}\cdot\frac{n}{4}\cdots 1\right) \leq 2^{2n}\times\frac{n^{\log n+1}}{2^{1+2+\cdots+\log n}} = 2^{2n}\times n^{(\log n+1)/2},$$
which is $2^{2n}\times n^{O(\log n)}$, not $2^{2n}\times 2^{O(\log n)}$.
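As a quick numerical sanity check of this unrolling (my own toy script, not part of the original question), one can iterate the simplified recurrence with $T(1)=1$ and test it against the $O(4^n n^{\log n})$ bound discussed below:

```python
# Unroll T(n) = n * 2^n * T(n/2) exactly (for n a power of two) and
# compare against the bound 4^n * n^(log2 n).
import math

def T(n):
    return 1 if n <= 1 else n * 2**n * T(n // 2)

for n in (4, 16, 64, 256):
    l = round(math.log2(n))
    print(n, T(n) <= 4**n * n**l)   # expect True for every n
```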

The authors prove this bound by induction, but to me the proof looks wrong. Could anyone make a judgement?

[edit 07/19]: I realized that asymptotically $n^{\log n}<2^n$, but the left-hand side still grows faster than any polynomial in $n$, no?

[edit 07/20]: As the comments and answers state, we can basically conclude that there is an error in the paper and that the correct complexity is $O(4^n n^{\log n})$. However, this can also be viewed as "$O((4\times\epsilon)^n)$ for any $\epsilon>1$", so let's say the complexity is infinitely close to $4^n$ from above.

P.S. For any constant $c>1$, $n^{\log n}=c^{(\log_c n)\log n}\leq \max\!\left(c^{\log^2 n},c^{\log_c^2 n}\right)<c^n$ asymptotically, since $(\log_c n)(\log n)\leq\max\!\left(\log^2 n,\log_c^2 n\right)=o(n)$.

Leo
  • Yes, this is a typo, the actual running time of their algorithm is $4^n n^{\log{n}}$ (see, e.g., "Exact Algorithms for Exact Satisfiability and Number of Perfect Matchings" by Björklund and Husfeldt, "A Taxonomic Introduction to Exact Algorithms" by Husfeldt, or http://cstheory.stackexchange.com/a/11557/7454). – Alex Golovnev Jul 20 '16 at 08:06
  • Thanks @AlexGolovnev for providing evidence; the summary in your post is very nice. I just realized that the complexity $O(4^n n^{\log n})$ can also be viewed as "$O((4+\epsilon)^n)$ for any $\epsilon>0$", so let's say the complexity is infinitely close to $4^n$ from above, so their typo is not so far from the truth... :) – Leo Jul 20 '16 at 08:50

2 Answers

6

Just a note: the $n^{\log n}$ factor can be replaced by a polynomial, without much trouble. Here's one way to do it, not at all optimized...

Imagine there is a global graph $G$ with vertex set $[n]$, and a current induced subgraph $G' = (V',E')$ of $G$, with $n'=|V'|$. Instead of guessing the middle vertex along with each of the ${n' \choose \lfloor n'/2\rfloor}$ subset choices for $S \subseteq V'$, one can recurse only on the choice of $S$. Assume the recursive calls return $n \times n$ matrices $A_S$ and $A_{V'-S}$ such that:

  • $A_S[i,j]$ equals the minimum cost of a path that starts at some vertex $i \in V$ (not necessarily in $S$), traverses all vertices of $S$ exactly once (and only those), then ends at some vertex $j \in V$ (not necessarily in $S$).
  • $A_{V'-S}[i,j]$ is defined similarly.

The algorithm should then return the $n\times n$ matrix $A_{G'}$, where for all $i,j=1,\ldots,n$, $A_{G'}[i,j] = \min_{S \subseteq V'} \min_{k\in V'} (A_S[i,k]+ A_{V'-S}[k,j])$. Note that we do not have to store all these matrices; we could for example maintain $O(1)$ matrices per recursion layer (and the recursion depth is $O(\log n)$, since the subproblem size halves at each level). After we compute the matrix product $A'[i,j] = \min_k (A_S[i,k]+ A_{V'-S}[k,j])$, we can simply take the minimum of $A'[i,j]$ with the current $A_{G'}[i,j]$ over all $i,j$, then erase $A_S$ and $A_{V'-S}$ before going to the next subset $S$. Thus the space usage is $poly(n)$.
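For what it's worth, here is a hedged Python sketch (my addition, not code from this answer) of the combining step: a $(\min,+)$ product of $A_S$ and $A_{V'-S}$, folded entrywise into $A_{G'}$. The recursion producing $A_S$ is omitted, and $k$ ranges over all $n$ indices; in a full implementation the entries for irrelevant vertices would be $\infty$, so this matches taking $\min_{k\in V'}$.

```python
# (min,+) combination of two cost matrices, with infinity for "no path".
INF = float('inf')

def min_plus(A, B):
    """C[i][j] = min over k of A[i][k] + B[k][j]."""
    n = len(A)
    return [[min(A[i][k] + B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def fold_in(A_G, A_new):
    """Entrywise minimum: keep the best cost over all subset choices S."""
    for i in range(len(A_G)):
        for j in range(len(A_G)):
            A_G[i][j] = min(A_G[i][j], A_new[i][j])

A = [[0, 1], [INF, 0]]
B = [[0, 2], [INF, 0]]
print(min_plus(A, B))   # [[0, 1], [inf, 0]]
```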

Now the time recurrence is $T(n',n) = {n' \choose \lfloor n'/2\rfloor} \cdot (T(\lfloor n'/2\rfloor,n) + T(\lceil n'/2\rceil,n) + poly(n))$ for all $n' \leq n$ and $T(1,n) = poly(n)$, which yields $T(n',n) \leq 4^{n'} \cdot n^k$ for some $k \geq 1$. To verify this, note that the RHS of the recurrence is upper bounded by

$$2^{n'+1}\, T(n'/2,n) + 2^{n'} \cdot poly(n) \;\leq\; 2^{n'+1} \cdot \big(4^{n'/2} (n/2)^k\big) + 2^{n'} n^k \;=\; 2 \cdot 4^{n'} (n/2)^k + 2^{n'} n^k \;\leq\; 4^{n'}\, n^k/2^{k-1} + 2^{n'} n^k,$$

which is $\leq 4^{n'} n^k$ for $k \geq 3$ (for example).
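As a sanity check on this recurrence (my own toy script, with $poly(n)$ arbitrarily instantiated as $n^3$), one can evaluate $T$ numerically and test the $4^{n'}\cdot n^k$ bound with $k=3$:

```python
# Evaluate the two-parameter recurrence and check T(n, n) <= 4^n * n^3.
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def T(m, n):
    """T(m, n) for subproblem size m, with poly(n) taken to be n**3."""
    if m <= 1:
        return n**3
    return comb(m, m // 2) * (T(m // 2, n) + T((m + 1) // 2, n) + n**3)

for n in (8, 16, 32, 64):
    print(n, T(n, n) <= 4**n * n**3)   # expect True for every n
```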

Ryan Williams
  • It's a nice idea, but I have some trouble with the recurrence: if I understood well, for each already computed $A_S$ and $A_{V'-S}$ we need to compute $A'$ (which takes $(n'/2)^3$) and then update $A_{G'}$ (which takes $(n'/2)^2$), and these costs should be a multiplicative instead of an additive factor in the final recurrence. – Leo Aug 09 '16 at 07:34
  • Yes, there was a misplaced parenthesis. Updated my answer, hope it's clearer. – Ryan Williams Aug 09 '16 at 17:07
  • It's now clear to me. It's interesting to see that this approach is faster than the old one. While the framework is still divide & conquer, the use of matrices has some taste of dynamic programming: by using more memory (matrices) we gain time. Maybe the result is worth mentioning somewhere in the literature :) – Leo Aug 17 '16 at 09:25
4

It does look like their proof of the time bound has a minor error, though that part of the theorem statement itself looks right.

To adapt their proof, instead of picking $c$ so that for large $n$,

$2^{n + b\log(n)} \cdot 2^{(2n + c\log(n))/2 + 1} \le 2^{n + c\log(n)}$,

you want to pick $c$ to have

$2^{n + b\log(n)} \cdot 2^{2(n/2 + 1) + c\log(n/2+1)} \le 2^{2n + c\log(n)}$

for large $n$. (The "$/2 + 1$" gets pre-composed with $2n + c\log(n)$, not post-composed.) This overall gives a time bound of $2^{2n + c\log(n)}$. Of course, only $c = \Omega(\log(n))$ can achieve this, but a choice of $c = O(\log(n))$ does work. This gives the stated time bound of $2^{2n}\cdot n^{O(\log(n))}$.
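To spell out the arithmetic behind the choice of $c$ (my own computation, not from the paper or the proof above): comparing exponents, the corrected inequality asks for

$$n + b\log(n) + 2(n/2 + 1) + c\log(n/2 + 1) \;\le\; 2n + c\log(n),$$

which rearranges to $b\log(n) + 2 \le c\,\big(\log(n) - \log(n/2 + 1)\big) = c\,(1 - O(1/n))$. So any $c \ge b\log(n) + 3$ suffices for large $n$, matching the $c = \Theta(\log(n))$ choice above.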

Andrew Morgan
  • In fact they state the complexity as $2^{2n}2^{O(\log(n))}$, both in the introduction and at P499, 3rd line from the bottom. However, they did write $2^{2n}n^{O(\log(n))}$ in Theorem 3.1, point (b), and the proof concludes that $T(n)=2^{2n}2^{O(\log(n))}$... So you see the confusion! :P

    Also, they "choose $c$ as a constant". I agree that the right complexity is what you stated, not theirs.

    – Leo Jul 20 '16 at 07:02