9

Let $A\in \mathbb{R}^{n\times n}$ be symmetric and positive definite, and suppose it takes $m$ units of work to multiply a vector by $A$. It is well known that running the CG algorithm on $A$ with condition number $\kappa$ requires $\mathcal{O}(m\sqrt{\kappa})$ units of work.

Of course, being a big-$\mathcal{O}$ statement, this is only an upper bound, and the CG algorithm can always terminate in zero steps with a lucky initial guess.

Do we know whether there exist a RHS and an (unlucky) initial guess that will require $\Theta(\sqrt{\kappa})$ steps? Put another way, is the worst-case work-complexity of CG really $\Theta(m\sqrt{\kappa})$?

This question arose when I tried to determine whether the benefit of a preconditioner (lower $\kappa$) outweighs its cost (higher $m$). Right now I am working with toy problems, and I would like to have a better idea before I implement anything in a compiled language.
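For concreteness, this is the kind of toy experiment I am running (just a sketch with `scipy.sparse.linalg.cg`; the 1D Laplacian and the Jacobi preconditioner below are stand-ins for my actual problem):

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import cg

# Toy SPD system: 1D Laplacian, whose condition number grows like O(n^2).
n = 500
A = diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csr")
b = np.random.default_rng(0).standard_normal(n)

def count_iters(M=None):
    """Run CG to the default tolerance and return the iteration count
    (one matrix-vector product with A per iteration)."""
    counter = {"k": 0}
    cg(A, b, M=M, maxiter=10 * n,
       callback=lambda xk: counter.__setitem__("k", counter["k"] + 1))
    return counter["k"]

# Jacobi preconditioner as a cheap stand-in; with a constant diagonal it does
# not actually lower kappa here, which is exactly the cost/benefit question.
M_jacobi = diags(1.0 / A.diagonal())

print("plain CG:  ", count_iters())
print("Jacobi PCG:", count_iters(M_jacobi))
```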

fred
  • 1,000
  • 6
  • 12
  • 5
    You could presumably construct a pessimal initial guess by running the CG algorithm "backwards" and putting suitable energy into each of the $A$-orthogonal search directions so that the algorithm requires all the steps. – origimbo Mar 17 '16 at 16:11

2 Answers

9

The answer is a resounding yes. The convergence rate bound of $(\sqrt{\kappa}-1) / (\sqrt{\kappa}+1)$ is sharp over the set of symmetric positive definite matrices with condition number $\kappa$. In other words, knowing nothing more about $A$ than its condition number, CG really can take $\sim\sqrt{\kappa}$ iterations to converge. Loosely speaking, the upper bound is attained if the eigenvalues of $A$ are uniformly distributed (i.e. "peppered") within an interval with condition number $\kappa$.
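To translate the rate into an iteration count, recall the standard $A$-norm error bound for CG, $$\|x_k - x_\star\|_A \le 2\left(\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\right)^{k} \|x_0 - x_\star\|_A,$$ so reducing the error by a factor $\epsilon$ takes roughly $k \gtrsim \tfrac{1}{2}\sqrt{\kappa}\log(2/\epsilon)$ iterations when $\kappa$ is large, since $\log\frac{\sqrt{\kappa}+1}{\sqrt{\kappa}-1} \approx 2/\sqrt{\kappa}$.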

Here's a more rigorous statement. Deterministic versions are more involved but work using the same principles.

Theorem (Worst-case choice of $A$). Pick any random orthogonal matrix $U$, let $\lambda_1,\ldots,\lambda_n$ be $n$ real numbers sampled uniformly from the interval $[1,\kappa]$, and let $b=[b_1;\ldots;b_n]$ have entries sampled i.i.d. from the standard Gaussian. Define $$A=U\mathrm{diag}(\lambda_1,\ldots,\lambda_n)U^T.$$ Then in the limit $n\to\infty$, conjugate gradients will converge with probability one to an $\epsilon$-accurate solution of $Ax=b$ in no fewer than $\Omega(\sqrt{\kappa}\log\epsilon^{-1})$ iterations.

Proof. The standard proof is based on optimal Chebyshev polynomial approximations, using techniques found in a number of places, such as Greenbaum's book or Saad's book.
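For what it's worth, here is a small numerical illustration of this construction (a sketch, not part of the proof; the dimensions, seeds, and tolerances are arbitrary choices). Sampling the spectrum uniformly in $[1,\kappa]$ as in the theorem, the iteration count should scale like $\sqrt{\kappa}$ rather than $\kappa$:

```python
import numpy as np
from scipy.sparse.linalg import cg
from scipy.stats import ortho_group

def worst_case_iters(n, kappa, seed=0):
    """Sample A = U diag(lam) U^T and b as in the theorem, run CG to the
    default tolerance, and return the number of iterations."""
    rng = np.random.default_rng(seed)
    U = ortho_group.rvs(n, random_state=seed)
    lam = rng.uniform(1.0, kappa, size=n)   # spectrum "peppered" in [1, kappa]
    A = (U * lam) @ U.T                     # A = U diag(lam) U^T
    b = rng.standard_normal(n)
    counter = {"k": 0}
    cg(A, b, maxiter=10 * n,
       callback=lambda xk: counter.__setitem__("k", counter["k"] + 1))
    return counter["k"]

# Quadrupling kappa should roughly double the iteration count (sqrt scaling).
for kappa in (400, 1600):
    print(kappa, worst_case_iters(n=500, kappa=kappa))
```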

Richard Zhang
  • 2,485
  • 15
  • 26
  • 1
    The bound is not sharp, as the answer explains later. If the eigenvalues are not uniformly distributed, CG converges faster, since it is not a stationary iteration. Thus, we need to know more about the matrix. – Guido Kanschat Mar 20 '16 at 03:39
  • @GuidoKanschat: Good point, and I've fixed the statement to clarify that sharpness is attained over all $A$ with condition $\kappa$. – Richard Zhang Mar 21 '16 at 15:50
  • The proof boils down to minimizing $|p(A)|$ in the space of order-$k$ polynomials satisfying $p(0)=1$. Equivalently this is $\min_p \max_{\lambda\in\Lambda(A)} |p(\lambda)|$. In the stated limit, $\Lambda(A)\to[1,\kappa]$, and the solution for the minimax problem is then the Chebyshev polynomial, whose error converges as $\sim\sqrt{\kappa}$ – Richard Zhang Mar 21 '16 at 21:27
0

Taking this as my original question: Do we know if there exists a RHS and an initial (unlucky) guess that will require $\Theta(\sqrt{\kappa})$ steps?

The answer to the question is "no". The idea for this answer comes from Guido Kanschat's comment.

Claim: For any given condition number $\kappa$, there exists a matrix $A$ with that condition number for which the CG algorithm will terminate in at most two steps (for any given RHS and initial guess).

Consider $A\in \mathbb{R}^{n\times n} $ where $A=\mathrm{diag}(1,\kappa,\kappa,\ldots, \kappa)$. Then the condition number of $A$ is $\kappa$. Let $b\in \mathbb{R}^n$ be the RHS, and denote the eigenvalues of $A$ as $\lambda_i$ where $$\lambda_i = \left\{\begin{array}{ll}1 & i=1\\ \kappa & i\not= 1 \end{array} \right. . $$

We first consider the case where $x^{(0)} \in \mathbb{R}^n$, the initial guess, is zero. Denote $x^{(2)}\in \mathbb{R}^n$ as the second estimate of $A^{-1}b$ from the CG algorithm. We show that $x^{(2)} =A^{-1}b$ by showing $\langle x^{(2)}-A^{-1}b, A(x^{(2)}-A^{-1}b)\rangle =0$. Indeed, we have

\begin{align*} \langle x^{(2)}-A^{-1}b, A(x^{(2)}-A^{-1}b)\rangle &= \left\| x^{(2)}-A^{-1}b \right\|_A^2 \\ &=\min_{p\in \mathrm{poly}_{1} } \left\| (p(A)-A^{-1}) b \right\|_A^2\\ &=\min_{p\in \mathrm{poly}_{1} } \sum_{i=1}^n (p(\lambda_i) - \lambda_i^{-1})^2 \lambda_i b_i^2 \\ &\le \sum_{i=1}^n (\widehat{p}(\lambda_i) - \lambda_i^{-1})^2 \lambda_i b_i^2 = 0 \end{align*}

where we use the first-order polynomial $\widehat{p}$ defined by $\widehat{p}(x)= (1+\kappa-x)/\kappa$, which satisfies $\widehat{p}(1)=1=\lambda_1^{-1}$ and $\widehat{p}(\kappa)=1/\kappa$, so every term of the sum vanishes. Since the $A$-norm of the error is nonnegative and bounded above by zero, it must be zero, which proves the case $x^{(0)}= 0$.

If $x^{(0)} \not = 0$, then $x^{(2)}= \overline{x^{(2)}}+ x^{(0)}$, where $\overline{x^{(2)}}$ is the second estimate of the CG algorithm with $b$ replaced by $\overline{b} = b-A x^{(0)}$. So we have reduced this case to the previous one.
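For completeness, here is a small numerical check of the claim with a plain textbook CG loop (a sketch; with this spectrum the residual should drop to roundoff level after the second step for any RHS and any initial guess):

```python
import numpy as np

def cg_residual_norms(A, b, x0, num_steps):
    """Plain textbook CG; returns the residual norm after each step."""
    x = x0.astype(float).copy()
    r = b - A @ x
    p = r.copy()
    norms = []
    for _ in range(num_steps):
        Ap = A @ p
        alpha = (r @ r) / (p @ Ap)
        x = x + alpha * p
        r_new = r - alpha * Ap
        norms.append(np.linalg.norm(r_new))
        if norms[-1] == 0.0:            # exact termination (rare in floating point)
            break
        beta = (r_new @ r_new) / (r @ r)
        p = r_new + beta * p
        r = r_new
    return norms

rng = np.random.default_rng(0)
n, kappa = 100, 1e6
A = np.diag(np.r_[1.0, np.full(n - 1, kappa)])   # A = diag(1, kappa, ..., kappa)
b = rng.standard_normal(n)
x0 = rng.standard_normal(n)                      # arbitrary nonzero initial guess

print(cg_residual_norms(A, b, x0, 3))            # tiny after the second step
```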

fred
  • 1,000
  • 6
  • 12