
In this previous thread the following multiplicative way to combine symmetric preconditioners $P_1$ and $P_2$ for the symmetric system $Ax=b$ was suggested: \begin{align} P_\text{combo}^{-1} :=& P_1^{-1} + P_2^{-1}(I - A P_1^{-1})\\ =&P_1^{-1} + P_2^{-1} - P_2^{-1} A P_1^{-1}. \end{align}

This combined preconditioner is not symmetric. However, I have tried using it in conjugate gradient anyway in several different contexts, and the method always seems to converge just fine. Why is this?
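For concreteness, here is a small self-contained check (random SPD matrices of my own choosing, separate from the examples below) that $P_\text{combo}^{-1}$ really is nonsymmetric even though $A$, $P_1$, and $P_2$ are all symmetric:

% Illustration only: P_combo^{-1} is nonsymmetric even for symmetric A, P1, P2
n = 5;
A  = randn(n); A  = A*A'   + n*eye(n);   % symmetric positive definite
P1 = randn(n); P1 = P1*P1' + n*eye(n);   % symmetric preconditioner 1
P2 = randn(n); P2 = P2*P2' + n*eye(n);   % symmetric preconditioner 2
Pinv = inv(P1) + inv(P2) - P2\(A/P1);    % P_combo^{-1} formed explicitly
norm(Pinv - Pinv', 'fro')                % nonzero, so P_combo^{-1} is not symmetric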


Example 1: Random matrices.

% Testing multiplicative combination of preconditioners
n = 500;
[U,S,~] = svd(randn(n,n)); A = U*S*U';           % symmetric positive definite test matrix
[W1,S1,~] = svd(randn(n,n)); noise1 = W1*S1*W1'; % random symmetric perturbations
[W2,S2,~] = svd(randn(n,n)); noise2 = W2*S2*W2';
P1 = A + 0.5 * noise1;   % two symmetric preconditioners: noisy versions of A
P2 = A + 0.5 * noise2;

solve_P = @(x) P1\x + P2\x - P2\(A*(P1\x));      % apply P_combo^{-1} to x

b = randn(n,1);
x_true = A\b;
pcg(A,b,1e-6,n);              % no preconditioner
pcg(A,b,1e-6,n,P1);           % P1 alone
x = pcg(A,b,1e-6,n,solve_P);  % combined (nonsymmetric) preconditioner
disp(['relative error= ', num2str(norm(x_true - x)/norm(x_true),3)])

Here's the output I get:

pcg converged at iteration 127 to a solution with relative residual 9.9e-07.
pcg converged at iteration 62 to a solution with relative residual 6.8e-07.
pcg converged at iteration 51 to a solution with relative residual 8.1e-07.
relative error= 4.23e-07

Example 2: Combining multigrid with incomplete Cholesky for solving the Poisson equation.

n = 50; N = n^2;
A = gallery('poisson',n); % Laplacian on n-by-n grid, zero Dirichlet BC

L = ichol(A);                  % incomplete Cholesky factor
solve_P1 = @(x) L'\(L\x);      % apply (L*L')^{-1} to x
% Combinatorial multigrid: http://www.cs.cmu.edu/~jkoutis/cmg.html
solve_P2 = cmg_sdd(A); 

solve_P = @(x) solve_P1(x) + solve_P2(x) - solve_P1(A*solve_P2(x)); % combined preconditioner (P1 and P2 swap roles relative to the formula above)

b = randn(N,1);
x_true = A\b;
pcg(A,b,1e-6,N);                 % no preconditioner
pcg(A,b,1e-6,N,solve_P1);        % incomplete Cholesky alone
pcg(A,b,1e-6,N,solve_P2);        % multigrid alone
x = pcg(A,b,1e-6,N,solve_P);     % combined (nonsymmetric) preconditioner
disp(['relative error= ', num2str(norm(x_true - x)/norm(x_true),3)])

For this I get the results:

pcg converged at iteration 131 to a solution with relative residual 8.4e-07.
pcg converged at iteration 44 to a solution with relative residual 6e-07.
pcg converged at iteration 19 to a solution with relative residual 7e-07.
pcg converged at iteration 12 to a solution with relative residual 4.7e-07.
relative error= 5.2e-07

Notes:

  • I've also seen the same qualitative behavior on matrices arising in more complicated/realistic situations.
  • Given an incorrect solution to $Ax=b$, the error $e$ and the residual $r$ are related by the error equation $Ae=r$. One can view this combo preconditioner as approximately solving the original equation using $P_1$ instead of $A$, then approximately solving the error equation with $P_2$ instead of $A$, then adding the approximate error back to correct the original approximate solution.
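As a small illustration of that last point (reusing A, P1, P2, and b from Example 1 above), applying $P_\text{combo}^{-1}$ to $b$ is exactly one approximate solve with $P_1$ followed by a $P_2$-based correction of the error equation:

x1 = P1\b;                               % approximately solve A x = b, using P1 in place of A
r  = b - A*x1;                           % residual of that approximate solution
e  = P2\r;                               % approximately solve the error equation A e = r, using P2
x2 = x1 + e;                             % corrected approximation
norm(x2 - (P1\b + P2\b - P2\(A*(P1\b)))) % agrees with P_combo^{-1} b up to roundoff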
– Nick Alger

2 Answers


In short, orthogonalization of the Krylov vectors occurs with respect to the operator, but not with respect to the preconditioner.

Alright, so say we want to solve $Ax=b$ with preconditioner $B$. The preconditioned CG iteration is basically:

\begin{align*}
\hat{v}_1=\tilde{v}_1 &= Bb\\
v_1 &= \tilde{v}_1 / c_1\\
\\
\hat{v}_i &= BAv_{i-1}\\
\tilde{v}_i &= \hat{v}_i-\sum\limits_{j=1}^{i-1} \langle Av_j,\hat{v}_i\rangle v_j\\
v_i &= \tilde{v}_i/c_i\\
c_i &= \sqrt{\langle A\tilde{v}_i,\tilde{v}_i\rangle}
\end{align*}
(The normalization constants $c_i$ are taken in the $A$-inner product, so that the $v_i$ come out $A$-orthonormal.) In other words

  • $\hat{v}_i$ - New Krylov vector
  • $\tilde{v}_i$ - Orthogonalized Krylov vector
  • $v_i$ - Normalized Krylov vector

Now, what's interesting from this is that we get a number of properties. The two that really matter for you are

  1. $\textrm{span}\{v_i\}_{i=1}^m = \textrm{span}\{(BA)^{i-1}(Bb)\}_{i=1}^m$
  2. $V^TAV = I$

The first property follows from how $\hat{v}_i$ is generated. We're subtracting things from it and normalizing, but we're basically just getting a bunch of $BAv_{i-1}$ where $v_1$ is a scaling of $Bb$. The second property follows from $\tilde{v}_i$. Here, we do not use $B$ in the orthogonalization. Orthogonalization depends purely on $A$ being symmetric positive definite.

Now, say we want to solve the system $Ax=b$. If we choose $x$ in our Krylov space, $x = V\beta$, we need to solve $$ AV\beta = b. $$ However, this system is overdetermined (more equations than unknowns), so we impose the Galerkin condition $V^T(AV\beta - b) = 0$, which gives $$ V^TAV\beta = V^Tb. $$ From above, we know that $V^TAV=I$, so we really have $$ \beta = V^Tb, $$ or $$ x = VV^Tb. $$ Note that this holds regardless of the preconditioner $B$. In fact, we could take random vectors and $A$-orthogonalize them against each other and the same would hold. Really, the only thing that $B$ affects is the Krylov space. As long as $B$ helps cluster the eigenvalues of $A$, the algorithm performs well. Though, if $B$ is nonsymmetric, we may have to look at something like the pseudospectra to figure that out. I'm not entirely sure. Mostly, it comes down to clustering the spectrum.
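To make this concrete, here is a rough, self-contained MATLAB sketch of the iteration above (the particular nonsymmetric $B$ and all variable names are my own choices for illustration; this is not how pcg is implemented internally). It builds the $A$-orthonormal basis $V$ with a nonsymmetric $B$, then checks that $V^TAV\approx I$ and that $x=VV^Tb$ still solves the system once the space is full:

% Sketch of the iteration above with a deliberately nonsymmetric B (illustration only)
n = 30;
A = randn(n); A = A*A' + n*eye(n);       % symmetric positive definite operator
B = inv(A) + 0.1*randn(n);               % nonsymmetric "preconditioner"
b = randn(n,1);

V = zeros(n,0);                          % columns will be the A-orthonormal v_i
vhat = B*b;                              % \hat{v}_1
for i = 1:n
    vtil = vhat - V*(V'*(A*vhat));       % A-orthogonalize against previous v_j
    vtil = vtil - V*(V'*(A*vtil));       % second pass, for numerical robustness
    v = vtil/sqrt(vtil'*A*vtil);         % normalize in the A-inner product
    V = [V, v];
    vhat = B*(A*v);                      % next Krylov vector \hat{v}_{i+1}
end

norm(V'*A*V - eye(n), 'fro')             % ~0: A-orthonormality holds regardless of B
x = V*(V'*b);                            % Galerkin solution once the space is full
norm(A*x - b)/norm(b)                    % ~0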

Do we lose anything with a nonsymmetric $B$? Probably. Normally, in CG, we have properties like $\langle Br_i,r_j\rangle = 0$ for all $i\neq j$. I'm pretty sure that no longer holds. Also, we normally have $\langle BAv_i,Av_j\rangle=0$ for $|i-j|\geq 2$. That may or may not hold. Really, I'd be suspicious of any result that involves $B$.

Anyway, I'm pretty sure it's possible to code CG in a way that'd break everything with a nonsymmetric $B$. However, if we code it using the equations above, there's nothing that really prohibits it from working with a nonsymmetric $B$. I do it all the time and it seems to work fine.

– wyer33

Read http://www.sciencedirect.com/science/article/pii/S1877050915010492 "Nonsymmetric Preconditioning for Conjugate Gradient and Steepest Descent Methods"

– user25364
    Welcome to SciComp.SE! Answers should not just contain a link, since these can go away at any time or -- as in this case -- bounce off a paywall. Could you summarize the contents of this paper? Otherwise this is more of a comment than an answer. – Christian Clason Sep 09 '17 at 14:54