First off, there are basically no scenarios where one would ever actually compute and store $A^{-1}$ in memory, even for small problems. An LU factorization offers both superior efficiency and stability: once you have the factors, each solve is just a pair of cheap triangular solves, and you avoid the extra error that comes from explicitly forming the inverse. That said, there are a few important reasons why iterative methods can beat factorization methods on certain problems.
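For concreteness, here is a minimal SciPy sketch of the recommended pattern: factor once, then reuse the factors for each solve, rather than ever forming $A^{-1}$. The matrix here is an arbitrary well-conditioned stand-in, not your actual operator.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

rng = np.random.default_rng(0)
n = 1000
A = rng.standard_normal((n, n)) + n * np.eye(n)  # arbitrary well-conditioned matrix
b = rng.standard_normal(n)

# Preferred: factor once (O(n^3)), then each solve is two cheap
# triangular solves (O(n^2) per right-hand side).
lu, piv = lu_factor(A)
x = lu_solve((lu, piv), b)

# Avoid: forming the inverse explicitly. It costs more and is less
# numerically stable than solving with the factors.
x_inv = np.linalg.inv(A) @ b
print(np.linalg.norm(x - x_inv))  # small here, but the inverse route
                                  # degrades as A becomes ill-conditioned
```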
One reason, which you have already mentioned, is that for problems involving discretizations of PDEs, $A$ is typically very sparse. This means that the memory requirements for storing the LU factors of $A$ can be prohibitive compared to storing $A$ itself, since the factors suffer from fill-in and are typically far denser. However, even if you have enough memory, it is often still more efficient to use iterative methods. The complexity of computing an LU factorization is $\mathcal{O}(n^3)$, which becomes intractable very quickly. Your problem has $n\approx 10^9$, so I feel confident in saying that computing and storing an LU factorization would be prohibitive. To my knowledge, there are no direct methods that outperform iterative methods on problems of this size. Optimized dense LU factorization routines will outperform iterative methods until about $n\approx 10^4$, and sparse direct methods can push that a couple more orders of magnitude higher, but your problem is at the point where iterative methods are really the only feasible choice in my opinion.
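To see the fill-in effect concretely, here is a small sketch using SciPy's SuperLU wrapper on a standard 2-D Laplacian model problem (my choice of test matrix, not anything from your setup); the factors hold far more nonzeros than $A$ itself.

```python
import scipy.sparse as sp
from scipy.sparse.linalg import splu

# Model problem: 2-D Laplacian on an m-by-m grid, a typical sparse
# PDE discretization with ~5 nonzeros per row.
m = 200
T = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(m, m))
A = sp.kronsum(T, T).tocsc()  # n = m^2 = 40,000 unknowns

lu = splu(A)  # sparse direct (SuperLU) factorization
print("nnz(A)          =", A.nnz)
print("nnz(L) + nnz(U) =", lu.L.nnz + lu.U.nnz)  # far larger: fill-in
```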
On the other hand, the most performant iterative solvers used in practice (Krylov subspace methods) can compute $A^{-1}b$ to the desired precision in $\mathcal{O}(n)$ total work, given proper algorithm selection and preconditioning. This is best-case behavior, where $A$ is sparse enough that a matvec costs $\mathcal{O}(n)$ operations (or structured enough, e.g., Toeplitz or circulant, that FFT-based matvecs come close to that) and the preconditioner is good enough that the number of iterations required for convergence is roughly independent of $n$. This can be achieved for some elliptic problems with proper multigrid preconditioners. However, we are not always so lucky in practice. With poor preconditioning, the number of Krylov iterations can scale like $\mathcal{O}(n)$ ($n$ iterations is the worst case, but convergence in exact arithmetic is then guaranteed), which makes the entire algorithm $\mathcal{O}(n^2)$. Where you end up between $\mathcal{O}(n)$ and $\mathcal{O}(n^2)$ is almost entirely down to preconditioning, regardless of $b$ (ignoring trivial cases where $b$ happens to be extremely close to an eigenvector of $A$).
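As an illustration of the preconditioned-Krylov pattern, here is a sketch using conjugate gradients on the same 2-D Laplacian model problem, with an algebraic multigrid preconditioner from the third-party PyAMG package (assumed installed; classical Ruge-Stüben AMG is just one reasonable choice here).

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import cg
import pyamg  # third-party: pip install pyamg (assumed available)

# Model problem: 2-D Laplacian on an m-by-m grid (elliptic, SPD).
m = 200
T = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(m, m))
A = sp.kronsum(T, T).tocsr()
b = np.ones(A.shape[0])

# Build an algebraic multigrid hierarchy and use it as a preconditioner.
ml = pyamg.ruge_stuben_solver(A)
M = ml.aspreconditioner(cycle='V')

its = []
x, info = cg(A, b, M=M, callback=lambda xk: its.append(1))
print(f"converged={info == 0} in {len(its)} iterations")
# With a good multigrid preconditioner the iteration count stays small
# and roughly flat as m (and hence n = m^2) grows; without M it grows
# like O(m) for this problem.
```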
Even though iterative methods may be the best choice for computing $A^{-1}b$ once, you are correct that the cost of the Krylov iterations may become a burden if you have multiple $b$'s. Here is a recent post discussing a similar issue of solving $Ax=b$ for large systems and many different $b$'s. There are many helpful suggestions in the comments there for ways to speed up subsequent iterative solves using the solutions of previous solves (Krylov subspace recycling, deflation spaces, starting iterations at the previous solution; see the sketch below for the last of these). There are also methods known as block Krylov methods that are designed for multiple right-hand sides and are very efficient (see the papers linked here), but I think the best piece of advice for a problem of this size is to invest the time in finding the best preconditioner you can.
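The cheapest of those tricks to try is warm-starting: pass the previous solution as the initial guess. A minimal sketch, again on a model Laplacian with a hypothetical sequence of slowly varying right-hand sides:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import cg

# Model 2-D Laplacian again (a stand-in for the real operator).
m = 200
T = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(m, m))
A = sp.kronsum(T, T).tocsr()
n = A.shape[0]

rng = np.random.default_rng(0)
b = rng.standard_normal(n)
x = np.zeros(n)  # cold start for the first solve
for k in range(5):
    b = b + 0.01 * rng.standard_normal(n)  # hypothetical slowly varying RHS
    its = []
    # x0=x warm-starts the iteration at the previous solution, which
    # typically cuts the iteration count when consecutive b's are close.
    x, info = cg(A, b, x0=x, callback=lambda xk: its.append(1))
    print(f"solve {k}: converged={info == 0}, iterations={len(its)}")
```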