
I am aware that inverting a matrix to solve a linear system is not a good idea, since it is neither as accurate nor as efficient as solving the system directly or using an LU, Cholesky, or QR decomposition.

However, I have not been able to verify this with a practical example. I have tried this code (in MATLAB)

M   = 500;    
A   = rand(M,M);
A   = real(expm(1i*(A+A.')));
b   = rand(M,1);

x1  = A\b;
x2  = inv(A)*b;

disp(norm(b-A*x1))
disp(norm(b-A*x2))

and the residuals are always of the same order (about $10^{-13}$).

Could someone provide a practical example in which inv(A)*b is much less accurate than A\b?
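(A sketch of one way such an example can be constructed, in Python/NumPy rather than MATLAB: build an ill-conditioned $A$ from random orthogonal factors, and take $b = Ay$ so the true solution has moderate norm. With a random $b$ the solution norm grows like the condition number and the two residuals look similar, which is what the experiment above shows; with $b = Ay$ the difference appears.)

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100

# Ill-conditioned test matrix: random orthogonal factors around a
# logarithmically decaying spectrum, so cond(A) is about 1e12.
Q1, _ = np.linalg.qr(rng.standard_normal((n, n)))
Q2, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Q1 @ np.diag(np.logspace(0, -12, n)) @ Q2

# Pick b = A @ y so the true solution y has norm O(1).
y = rng.standard_normal(n)
b = A @ y

x1 = np.linalg.solve(A, b)   # backward stable (LU with partial pivoting)
x2 = np.linalg.inv(A) @ b    # explicit inverse

r1 = np.linalg.norm(b - A @ x1)
r2 = np.linalg.norm(b - A @ x2)
print(r1, r2)  # r2 is typically many orders of magnitude larger than r1
```

The construction of $A$ and the choice $b = Ay$ are illustrative assumptions, not part of the original experiment; they follow the intuition that the inverse-based residual scales like $\kappa(A)\,\epsilon\,\|b\|$ while the backward-stable residual scales like $\epsilon\,\|A\|\,\|x\|$.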

------Question update------

Thank you for your answers. However, suppose that we have to solve a system $Ax = b$ $n$ times, where $A$ is always the same matrix. Assume that

- $A$ is full, and thus $A^{-1}$ requires the same memory storage as $A$.

- The condition number of $A$ is small, so $A^{-1}$ can be computed accurately.

In that case, would it not be more efficient to compute $A^{-1}$ once rather than to use an LU decomposition? For example, I have tried this MATLAB code:

%Set A and b:
M           = 1000; 
A           = rand(M,M);
A           = real(expm(1i*(A+A.')));
b           = rand(M,1);

%Times we solve the system:
n           = 3000;

%Performing LU decomposition:
disp('Performing LU decomposition')
tic
[L,U,P]     = lu(A);
toc
fprintf('\n')

%Solving the system n times with LU decomposition:
optsL.LT    = true;   %Options for linsolve
optsU.UT    = true;
disp('Solving the system n times using LU decomposition')
tic
for ii=1:n
    x1      = linsolve(U, linsolve(L,P*b,optsL) , optsU);
end
toc
fprintf('\n')

%Computing inverse of A:
disp('Computing inverse of A')
tic
Ainv        = inv(A);
toc
fprintf('\n')

%Solving the system n times with Ainv:
disp('Solving the system n times with A inv')
tic
for ii=1:n
    x2  = Ainv*b;
end
toc
fprintf('\n')

disp('Residuals')
disp(norm(b-A*x1))
disp(norm(b-A*x2))

disp('Condition number of A')
disp(cond(A))

For a matrix with condition number of about 450, the residuals are $O(10^{-11})$ in both cases, but solving the system n times takes 19 seconds with the LU decomposition, whereas using the inverse of A it only takes 9 seconds.
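(The same factor-once, solve-many pattern can be sketched in Python with SciPy; `lu_factor`/`lu_solve` keep the factorization around so that each subsequent solve is $O(n^2)$, just like multiplying by the stored inverse. The test matrix here is an assumption chosen to be well conditioned.)

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

rng = np.random.default_rng(1)
n = 500
A = rng.standard_normal((n, n)) + n * np.eye(n)  # well-conditioned test matrix
b = rng.standard_normal(n)

lu, piv = lu_factor(A)       # O(n^3), done once
x1 = lu_solve((lu, piv), b)  # O(n^2) per right-hand side

Ainv = np.linalg.inv(A)      # also O(n^3), roughly 3x the flops of LU
x2 = Ainv @ b                # O(n^2) per right-hand side

r1 = np.linalg.norm(b - A @ x1)
r2 = np.linalg.norm(b - A @ x2)
print(r1, r2)  # both tiny here, since A is well conditioned
```

Whether the matrix-vector product with the stored inverse or the pair of triangular solves is faster in practice depends on the BLAS and on per-call overhead, which is exactly the effect the timings above exhibit.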

Manu
  • The MATLAB help page for inv gives a good example; look under the section titled Solve Linear System. – GoHokies Mar 17 '17 at 12:28
  • By the way, what is the condition number of your $A$ matrix? I don't have MATLAB on my PC at work so I can't check, but I presume it's small enough for you to get an accurate inverse... – GoHokies Mar 17 '17 at 12:34
  • I had a look at Trefethen and Bau (exercise 21.4), and they describe it purely in terms of computational cost, $2n^3$ flops vs. $\frac23 n^3$ flops. So even if you find the residuals similar (have you tried checking more poorly conditioned matrices, as in GoHokies's comment?), the unnecessary computational cost alone is probably worth the advice. – Kirill Mar 17 '17 at 22:33
  • Your matrix is much too small and too well-conditioned for this comparison. Not that there aren't relevant problems where you have such matrices, but the received opinion that you shouldn't invert is meant for a different setting (e.g., the one mentioned by Chris Rackauckas in his answer). In fact, for small and -- certifiably -- well-conditioned matrices, computing the inverse may indeed be the better option. An extreme case would be 3x3 rotation (or, more realistically, affine transformation) matrices. – Christian Clason Mar 18 '17 at 14:58
  • It's just that inverting where you shouldn't may lead to catastrophic results, while solving when you could invert may just take a bit longer -- hence the categorical formulation of the imperative. – Christian Clason Mar 18 '17 at 15:02
  • Just to put a face to things: in the context of discretization of PDEs, a 16k x 16k matrix with condition number ~1e4 is considered a small and very nicely behaved toy example. (Again, this is not the only interesting context, but one so prevalent that it has pretty much dominated numerical linear algebra for the second half of the 20th century -- including the received wisdom.) – Christian Clason Mar 18 '17 at 15:09
  • If you need to repeatedly solve Ax=b with the same A and it's small enough to take the inverse, you can instead save the LU factorization and reuse that. – Chris Rackauckas Mar 19 '17 at 05:00
  • Instead of using a random $b$, try to take one for which $|Ab| \ll |A|$; this should change things, intuitively. – Federico Poloni Mar 19 '17 at 11:28
  • @Kirill It's a different use case here. Check the code: OP is computing [L, U] = lu(A) or inv(A) one time only, and then re-using it to solve 3000 different linear systems. So the cost of computing the factorization/inverse doesn't matter; it's only the cost to apply it that is multiplied by 3000. Under these conditions, I can easily imagine inv(A)*b beating those linsolves; first because matrix multiplication is easier to optimize for high performance and parallelize than two repeated linsolves, and second because those linsolves probably do a bazillion constant-time checks. – Federico Poloni Mar 19 '17 at 11:36
  • @FedericoPoloni I agree, you're right, but the question changed, so my comment makes less sense now; the original question was literally comparing A\b with inv(A)*b. – Kirill Mar 19 '17 at 17:23
  • Thank you all for your answers. The matter is now much clearer to me. There is an article (http://link.springer.com/article/10.1007/s10915-006-9112-x) that compares the cost of using LU and linsolve vs. using inv(A) for solving a system many times; it may also be of interest. – Manu Mar 20 '17 at 10:31
  • Just a remark: MATLAB's inv actually computes the LU decomposition of $A$, as explained here: "inv performs an LU decomposition of the input matrix (or an LDL decomposition if the input matrix is Hermitian). It then uses the results to form a linear system whose solution is the matrix inverse inv(X)." – GoHokies Mar 20 '17 at 11:02

2 Answers


Here is a quick, very practical example related to memory usage in PDEs. When one discretizes the Laplace operator $\Delta u$, for example in the heat equation

$$ u_t = \Delta u + f(t,u), $$

one ends up with a sparse matrix $A$, and a method-of-lines discretization then solves

$$ u_t = Au + f(t,u). $$

The canonical 1D example is the Strang matrix. Implicit methods will need to invert $A$, or some form of $I-\gamma A$. For a discretization with 5 points in space, let's see what happens to this operator. We can generate it easily in Julia using SpecialMatrices.jl:

julia> using SpecialMatrices
julia> Strang(5)
5×5 SpecialMatrices.Strang{Float64}:
 2.0  -1.0   0.0   0.0   0.0
-1.0   2.0  -1.0   0.0   0.0
 0.0  -1.0   2.0  -1.0   0.0
 0.0   0.0  -1.0   2.0  -1.0
 0.0   0.0   0.0  -1.0   2.0

This special matrix is tridiagonal (many other discretizations are banded, and thus still sparse). This means it can be stored using only 3 arrays, or in this case it can be "lazy" (i.e., no array is needed and you can have a "pseudo-array" type which generates the values on demand). This means that even for very large spatial discretizations with $n$ points, sparse matrix formats can store this in $\mathcal{O}(3n)$ memory (and lazy formats can do it in $\mathcal{O}(1)$!).
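(A minimal sketch of the "lazy" idea, in Python rather than Julia: the Strang matrix never needs to be stored at all, since its action on a vector can be computed directly from the stencil $2v_i - v_{i-1} - v_{i+1}$.)

```python
import numpy as np

def strang_matvec(v):
    """Apply the n x n Strang matrix to v in O(n) time and O(1) extra
    storage: (A v)[i] = 2 v[i] - v[i-1] - v[i+1], with zero boundaries."""
    out = 2.0 * v
    out[1:] -= v[:-1]
    out[:-1] -= v[1:]
    return out

# Check against the explicitly assembled tridiagonal matrix
n = 5
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
v = np.arange(n, dtype=float)
print(strang_matvec(v))  # same as A @ v
```

This matrix-free matvec is exactly the building block that iterative solvers need, which is why they scale to discretizations where even the sparse matrix would be inconvenient to store.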

However, let's say we want to invert the matrix.

julia> inv(collect(Strang(5)))
5×5 Array{Float64,2}:
 0.833333  0.666667  0.5  0.333333  0.166667
 0.666667  1.33333   1.0  0.666667  0.333333
 0.5       1.0       1.5  1.0       0.5
 0.333333  0.666667  1.0  1.33333   0.666667
 0.166667  0.333333  0.5  0.666667  0.833333

Notice that the inverse of this sparse matrix is dense. Thus, if we do not know the analytical solution of the inverse (which is true for most sparse matrices arising from PDE discretizations), the amount of memory that will be required for the inverse is $\mathcal{O}(n^2)$. This means that the inverse will take up a massive amount of extra memory, and your memory limit will thus be determined by the size of the dense inverse matrix.

Instead, direct solvers like \ and iterative solvers like those in IterativeSolvers.jl will solve $Ax=b$ without ever computing $A^{-1}$, and thus only require the memory to store $A$. This can greatly expand the size of the PDEs you can solve.
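(The same contrast can be seen in Python with SciPy, as a sketch: the tridiagonal matrix stores $3n-2$ nonzeros, `spsolve` works directly on the sparse format, and the explicitly computed inverse comes out fully dense.)

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import spsolve

n = 200
# 1D Strang matrix: 2 on the diagonal, -1 on the off-diagonals
A = diags([-1.0, 2.0, -1.0], offsets=[-1, 0, 1], shape=(n, n), format="csc")
print(A.nnz)  # 3n - 2 = 598 stored entries

b = np.ones(n)
x = spsolve(A, b)  # sparse direct solve; A^{-1} is never formed
print(np.linalg.norm(A @ x - b))

Ainv = np.linalg.inv(A.toarray())  # the inverse is dense: ~n^2 entries
print(np.count_nonzero(Ainv) / n**2)
```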

As others have mentioned, condition number and numerical error are another reason, but the fact that the inverse of a sparse matrix is dense already makes it very clearly a bad idea.

Chris Rackauckas

There are several main reasons to prefer solving a linear system over using the inverse. Briefly:

  • problems with the condition number (@GoHokies' comment)
  • problems in the sparse case (@ChrisRackauckas' answer)
  • efficiency (@Kirill's comment)

Anyway, as @ChristianClason remarked in the comments, there can be cases where using the inverse is a good option.

The note/article by Alex Druinsky and Sivan Toledo, How Accurate is inv(A)*b?, contains some considerations about this problem.

According to the paper, the principal reason for the general preference for solving the linear system lies in these two estimates ($x$ is the true solution): $$ \text{inverse:} \quad \| x_V - x \| \leq O(\kappa^2(A)\,\epsilon_{\text{machine}}), \qquad \text{backward stable (LU, QR, ...):} \quad \| x_{\text{backward-stable}} - x \| \leq O(\kappa(A)\,\epsilon_{\text{machine}}). $$

The estimate for the inverse can be improved under some conditions on the inverse (see Theorem 1 in the paper), but $x_V$ can be conditionally accurate and still not backward stable.

The paper shows the cases when this happens ($V$ is the computed inverse):

(1) $V$ is not a good right inverse, or

(2) $V$ is such a poor left inverse that $||x_V||$ is much smaller than $||x||$, or

(3) the projection of $b$ on the left singular vectors of $A$ associated with small singular values is small.

So whether or not to use the inverse depends on the application; you can check the article and see whether your case satisfies the conditions for backward stability, or whether you do not need it.

In general, in my opinion, it is safer to solve the linear system.

Mauro Vanzetto