9

Let $A\in \mathbb{R}^{n\times n}$ be symmetric and positive definite, and suppose it takes $m$ units of work to multiply a vector by $A$. It is well known that running the CG algorithm on $A$ with condition number $\kappa$ requires $\mathcal{O}(m\sqrt{\kappa})$ units of work.

Of course, being a big-$\mathcal{O}$ statement, this is only an upper bound, and the CG algorithm can always terminate in zero steps with a lucky initial guess.

Do we know whether there exist a RHS and an (unlucky) initial guess that will require $\Theta(\sqrt{\kappa})$ steps? Put another way, is the worst-case work-complexity of CG really $\Theta(m\sqrt{\kappa})$?

This question arose when I tried to determine whether the benefit of a preconditioner (lower $\kappa$) outweighs its cost (higher $m$). Right now I am working with toy problems, and I would like to have a better idea before I implement anything in a compiled language.
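For concreteness, this is the kind of toy experiment I am running (just a sketch with `scipy.sparse.linalg.cg`; the 1D Laplacian and the Jacobi preconditioner below are stand-ins for my actual problem):

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import cg

# Toy SPD system: 1D Laplacian, whose condition number grows like O(n^2).
n = 500
A = diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csr")
b = np.random.default_rng(0).standard_normal(n)

def count_iters(M=None):
    """Run CG to the default tolerance and return the iteration count
    (one matrix-vector product with A per iteration)."""
    counter = {"k": 0}
    cg(A, b, M=M, maxiter=10 * n,
       callback=lambda xk: counter.__setitem__("k", counter["k"] + 1))
    return counter["k"]

# Jacobi preconditioner as a cheap stand-in; with a constant diagonal it does
# not actually lower kappa here, which is exactly the cost/benefit question.
M_jacobi = diags(1.0 / A.diagonal())

print("plain CG:  ", count_iters())
print("Jacobi PCG:", count_iters(M_jacobi))
```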

fred
  • 1,000
  • 6
  • 12
  • 5
    You could presumably construct a pessimal initial guess by running the CG algorithm "backwards" and putting suitable energy into each of the $A$-orthogonal search directions so that the algorithm requires all the steps. – origimbo Mar 17 '16 at 16:11

2 Answers

9

The answer is a resounding yes. The convergence rate bound of $(\sqrt{\kappa}-1) / (\sqrt{\kappa}+1)$ is sharp over the set of symmetric positive definite matrices with condition number $\kappa$. In other words, knowing nothing more about $A$ than its condition number, CG really can take $\sim\sqrt{\kappa}$ iterations to converge. Loosely speaking, the upper bound is attained if the eigenvalues of $A$ are uniformly distributed (i.e. "peppered") within an interval with condition number $\kappa$.
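To translate the rate into an iteration count, recall the standard $A$-norm error bound for CG, $$\|x_k - x_\star\|_A \le 2\left(\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\right)^{k} \|x_0 - x_\star\|_A,$$ so reducing the error by a factor $\epsilon$ takes roughly $k \gtrsim \tfrac{1}{2}\sqrt{\kappa}\log(2/\epsilon)$ iterations when $\kappa$ is large, since $\log\frac{\sqrt{\kappa}+1}{\sqrt{\kappa}-1} \approx 2/\sqrt{\kappa}$.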

Here's a more rigorous statement. Deterministic versions are more involved but work using the same principles.

Theorem (Worst-case choice of $A$). Pick any random orthogonal matrix $U$, let $\lambda_1,\ldots,\lambda_n$ be $n$ real numbers sampled uniformly from the interval $[1,\kappa]$, and let $b=[b_1;\ldots;b_n]$ have entries sampled i.i.d. from the standard Gaussian. Define $$A=U\mathrm{diag}(\lambda_1,\ldots,\lambda_n)U^T.$$ Then in the limit $n\to\infty$, conjugate gradients will converge with probability one to an $\epsilon$-accurate solution of $Ax=b$ in no fewer than $\Omega(\sqrt{\kappa}\log\epsilon^{-1})$ iterations.

Proof. The standard proof is based on optimal Chebyshev polynomial approximations, using techniques found in a number of places, such as Greenbaum's book or Saad's book.
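For what it's worth, here is a small numerical illustration of this construction (a sketch, not part of the proof; the dimensions, seeds, and tolerances are arbitrary choices). Sampling the spectrum uniformly in $[1,\kappa]$ as in the theorem, the iteration count should scale like $\sqrt{\kappa}$ rather than $\kappa$:

```python
import numpy as np
from scipy.sparse.linalg import cg
from scipy.stats import ortho_group

def worst_case_iters(n, kappa, seed=0):
    """Sample A = U diag(lam) U^T and b as in the theorem, run CG to the
    default tolerance, and return the number of iterations."""
    rng = np.random.default_rng(seed)
    U = ortho_group.rvs(n, random_state=seed)
    lam = rng.uniform(1.0, kappa, size=n)   # spectrum "peppered" in [1, kappa]
    A = (U * lam) @ U.T                     # A = U diag(lam) U^T
    b = rng.standard_normal(n)
    counter = {"k": 0}
    cg(A, b, maxiter=10 * n,
       callback=lambda xk: counter.__setitem__("k", counter["k"] + 1))
    return counter["k"]

# Quadrupling kappa should roughly double the iteration count (sqrt scaling).
for kappa in (400, 1600):
    print(kappa, worst_case_iters(n=500, kappa=kappa))
```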

Richard Zhang
  • 2,485
  • 15
  • 26
  • 1
    The bound is not sharp, as the answer explains later. If the eigenvalues are not uniformly distributed, CG converges faster, since it is not a stationary iteration. Thus, we need to know more about the matrix. – Guido Kanschat Mar 20 '16 at 03:39
  • @GuidoKanschat: Good point, and I've fixed the statement to clarify that sharpness is attained over all $A$ with condition $\kappa$. – Richard Zhang Mar 21 '16 at 15:50
  • The proof boils down to minimizing $|p(A)|$ in the space of order-$k$ polynomials satisfying $p(0)=1$. Equivalently this is $\min_p \max_{\lambda\in\Lambda(A)} |p(\lambda)|$. In the stated limit, $\Lambda(A)\to[1,\kappa]$, and the solution for the minimax problem is then the Chebyshev polynomial, whose error converges as $\sim\sqrt{\kappa}$ – Richard Zhang Mar 21 '16 at 21:27
0

Taking this as my original question: Do we know if there exists a RHS and an initial (unlucky) guess that will require $\Theta(\sqrt{\kappa})$ steps?

The answer to the question is "no". The idea for this answer comes from Guido Kanschat's comment.

Claim: For any given condition number $\kappa$, there exists a matrix $A$ with that condition number for which the CG algorithm will terminate in at most two steps (for any given RHS and initial guess).

Consider $A\in \mathbb{R}^{n\times n} $ where $A=\mathrm{diag}(1,\kappa,\kappa,\ldots, \kappa)$. Then the condition number of $A$ is $\kappa$. Let $b\in \mathbb{R}^n$ be the RHS, and denote the eigenvalues of $A$ as $\lambda_i$ where $$\lambda_i = \left\{\begin{array}{ll}1 & i=1\\ \kappa & i\not= 1 \end{array} \right. . $$

We first consider the case where $x^{(0)} \in \mathbb{R}^n$, the initial guess, is zero. Denote $x^{(2)}\in \mathbb{R}^n$ as the second estimate of $A^{-1}b$ from the CG algorithm. We show that $x^{(2)} =A^{-1}b$ by showing $\langle x^{(2)}-A^{-1}b, A(x^{(2)}-A^{-1}b)\rangle =0$. Indeed, we have

\begin{align*} \langle x^{(2)}-A^{-1}b, A(x^{(2)}-A^{-1}b)\rangle &= \left\| x^{(2)}-A^{-1}b \right\|_A^2 \\ &=\min_{p\in \mathrm{poly}_{1} } \left\| (p(A)-A^{-1}) b \right\|_A^2\\ &=\min_{p\in \mathrm{poly}_{1} } \sum_{i=1}^n (p(\lambda_i) - \lambda_i^{-1})^2 \lambda_i b_i^2 \\ &\le \sum_{i=1}^n (\widehat{p}(\lambda_i) - \lambda_i^{-1})^2 \lambda_i b_i^2 = 0 \end{align*}

where we use the first-order polynomial $\widehat{p}$ defined by $\widehat{p}(x)= (1+\kappa-x)/\kappa$, which satisfies $\widehat{p}(1)=1=\lambda_1^{-1}$ and $\widehat{p}(\kappa)=1/\kappa$, so every term of the sum vanishes. Since the $A$-norm of the error is nonnegative and bounded above by zero, it must be zero, which proves the case $x^{(0)}= 0$.

If $x^{(0)} \not = 0$, then $x^{(2)}= \overline{x^{(2)}}+ x^{(0)}$, where $\overline{x^{(2)}}$ is the second estimate of the CG algorithm with $b$ replaced by $\overline{b} = b-A x^{(0)}$. So we have reduced this case to the previous one.
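For completeness, here is a small numerical check of the claim with a plain textbook CG loop (a sketch; with this spectrum the residual should drop to roundoff level after the second step for any RHS and any initial guess):

```python
import numpy as np

def cg_residual_norms(A, b, x0, num_steps):
    """Plain textbook CG; returns the residual norm after each step."""
    x = x0.astype(float).copy()
    r = b - A @ x
    p = r.copy()
    norms = []
    for _ in range(num_steps):
        Ap = A @ p
        alpha = (r @ r) / (p @ Ap)
        x = x + alpha * p
        r_new = r - alpha * Ap
        norms.append(np.linalg.norm(r_new))
        if norms[-1] == 0.0:            # exact termination (rare in floating point)
            break
        beta = (r_new @ r_new) / (r @ r)
        p = r_new + beta * p
        r = r_new
    return norms

rng = np.random.default_rng(0)
n, kappa = 100, 1e6
A = np.diag(np.r_[1.0, np.full(n - 1, kappa)])   # A = diag(1, kappa, ..., kappa)
b = rng.standard_normal(n)
x0 = rng.standard_normal(n)                      # arbitrary nonzero initial guess

print(cg_residual_norms(A, b, x0, 3))            # tiny after the second step
```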

fred
  • 1,000
  • 6
  • 12