16

I have a linear system of equations of size $m \times m$, where $m$ is large. However, the variables that I'm interested in are just the first $n$ variables ($n$ is small compared to $m$). Is there a way I can approximate the solution for the first $n$ values without having to solve the entire system? If so, would this approximation be faster than solving the full linear system?

Paul
  • 12,045
  • 7
  • 56
  • 129
  • 3
    Not unless your forcing function is also restricted to the first n variables. If it is, you can form the Schur complement, though it is likely dense. If your original operator is sparse, it might not be worth it. – Jack Poulson Feb 06 '12 at 23:43
  • 1
    I suppose that you could use Gaussian elimination starting from the lower right corner of the matrix. This would be roughly twice as fast as regular Gaussian elimination if you only care about the first few elements and stop halfway through. I don't know how it would compare to iterative methods. – Dan Feb 07 '12 at 02:10
  • @Dan: I'm not quite sure I understand the approach you propose... could you elaborate? – Paul Feb 07 '12 at 02:16
  • I'll put it another way: reverse the order of the variables so you're looking for the last variables instead of the first. Then, use Gaussian elimination to put the matrix into row-echelon form, but not reduced row-echelon form. Then, only complete backsubstitution for the unknowns you care about. – Dan Feb 07 '12 at 04:03
  • I can explain it in more detail in chat. – Dan Feb 07 '12 at 04:07
  • Uhm, I guess that solving for only a (very) small subset of the variables is one of the few cases where Cramer's rule (http://en.wikipedia.org/wiki/Cramer%27s_rule) might be of some use. I hear it can be implemented reasonably efficiently on parallel systems. – OscarB Feb 07 '12 at 08:48
  • @Dan: You are suggesting that he form the Schur complement system, which I described above. – Jack Poulson Feb 07 '12 at 16:05
  • 4
    @OscarB: Please no. Cramer's rule is an atrocity in floating point arithmetic. I have never heard of it being used for serious calculations, and it takes a decent amount of thought to avoid factorial complexity, where it is still not competitive with Gaussian elimination. – Jack Poulson Feb 07 '12 at 16:06
  • Can model order reduction be used in this case? – Paul Feb 07 '12 at 16:17
  • @Paul: Could you clarify your question? When you say you have a linear system of equations of size $n \times n$ with $n$ large, I'm assuming that you care about the first $k$ variables, with $k < n$. – Geoff Oxberry Feb 08 '12 at 03:35
  • @GeoffOxberry: Yes. Actually, $k \ll n$. – Paul Feb 08 '12 at 03:55
  • 1
    @Paul: Most model order reduction is used in the context of large ODE or DAE systems. Sometimes, the reduction methodologies are motivated by ODE or DAE systems that arise from the discretization of PDEs. I haven't seen model reduction used on purely algebraic equations. (If you have, please send me references, because I'm doing my thesis on model reduction methods and would be very interested to see it.) If you'd like, I could sketch out what model reduction might look like if we treat algebraic equations as a degenerate case of a differential-algebraic equation system. – Geoff Oxberry Feb 08 '12 at 04:37
  • @GeoffOxberry: Actually, I am very interested in model order reduction too... perhaps a degenerate case might shed some light in both respects (linear systems and MOR). – Paul Feb 08 '12 at 04:48
  • 1
    @JackPoulson - do you mind summarizing your comment as an answer? I think it's the most correct solution and I don't want it lost in the comments. – Aron Ahmadia Feb 11 '12 at 11:56

4 Answers

17

Forming the Schur complement

Suppose that you have permuted and partitioned your matrix into the form

$$A=\left(\begin{array}{cc}A_{11} & A_{12} \\ A_{21} & A_{22}\end{array}\right),$$

such that $A_{22}$ contains your degrees of freedom of interest and is much smaller than $A_{11}$. One can then form the Schur complement

$$ S_{22} := A_{22} - A_{21} A_{11}^{-1} A_{12},$$

either through a partial right-looking LU factorization or the explicit formula, and then $S_{22}$ can be understood in the following sense:

$$S_{22} x = y \;\;\rightarrow\;\; \left(\begin{array}{cc}A_{11} & A_{12}\\ A_{21} & A_{22}\end{array}\right) \left(\begin{array}{c}\star\\ x\end{array}\right)=\left(\begin{array}{c}0\\ y\end{array}\right),$$

where $\star$ represents the 'uninteresting' portion of the solution. Thus, provided a right-hand side which is only nonzero in the degrees of freedom of the Schur complement $S_{22}$, we need only solve against $S_{22}$ in order to get the portion of the solution corresponding to those degrees of freedom.
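
As a concrete illustration, here is a minimal dense sketch of this construction (the sizes, the random test matrix, and the variable names are assumptions made only for this example; the interesting unknowns are ordered last to match the partitioning above):

```python
import numpy as np

# Illustrative sizes: N total unknowns, n unknowns of interest (assumed values).
N, n = 1000, 10
rng = np.random.default_rng(0)

# A well-conditioned (diagonally dominant) test matrix, partitioned so that the
# last n rows/columns correspond to the degrees of freedom of interest.
A = rng.standard_normal((N, N)) + N * np.eye(N)
A11, A12 = A[:N - n, :N - n], A[:N - n, N - n:]
A21, A22 = A[N - n:, :N - n], A[N - n:, N - n:]

# Right-hand side that is nonzero only in the interesting block.
b = np.zeros(N)
b[N - n:] = rng.standard_normal(n)

# Schur complement S22 = A22 - A21 A11^{-1} A12 via the explicit formula.
S22 = A22 - A21 @ np.linalg.solve(A11, A12)

# Solving the small n x n system gives the interesting part of the solution.
x_interesting = np.linalg.solve(S22, b[N - n:])

# Sanity check against a full solve.
x_full = np.linalg.solve(A, b)
print(np.allclose(x_interesting, x_full[N - n:]))    # expect True
```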

Computational complexity in unstructured dense case

Setting $N$ to the height of $A$ and $n$ to the height of $A_{22}$, then the standard method for computing $S_{22}$ is to first factor $L_{11} U_{11} := A_{11}$ (let's ignore pivoting for now) in roughly $2/3 (N-n)^3$ work, then to form

$$ S_{22} := A_{22} - (A_{21} U_{11}^{-1})(L_{11}^{-1} A_{12}) = A_{22} - A_{21} A_{11}^{-1} A_{12}$$

using two triangle solves requiring $n(N-n)^2$ work each, and then performing the update to $A_{22}$ in $2n^2 (N-n)$ work.

Thus, the total work is roughly $2/3 (N-n)^3 + 2n(N-n)^2 + 2n^2 (N-n)$. When $n$ is very small, $N-n \approx N$, so the cost can be seen to be roughly $2/3 N^3$, which is the cost of a full factorization.

The benefit is that, if there is a very large number of right-hand sides to be solved with the same system of equations, then $S_{22}$ could potentially be reused a large number of times, where each solve would only require $2n^2$ work (rather than $2N^2$ work) if $S_{22}$ is factored.
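
A small hedged sketch of that reuse pattern (the $n \times n$ matrix below is just a random stand-in for an already-computed $S_{22}$):

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

# Random stand-in for an already-computed n x n Schur complement (assumption).
n = 10
rng = np.random.default_rng(1)
S22 = rng.standard_normal((n, n)) + n * np.eye(n)

# Factor once...
S22_lu = lu_factor(S22)

# ...then every additional right-hand side costs only ~2n^2 work in the
# triangular solves, instead of ~2N^2 with a factorization of the full matrix.
for _ in range(1000):
    y = rng.standard_normal(n)
    x_interesting = lu_solve(S22_lu, y)
```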

Computational complexity in the (typical) sparse case

If your sparse system arose from some type of finite-difference or finite-element approximation, then sparse-direct solvers will almost certainly be able to exploit some of the structure; 2d systems can be solved with $O(N^{3/2})$ work and $O(N \log N)$ storage, while 3d systems can be solved with $O(N^2)$ work and $O(N^{4/3})$ storage. The factored systems can then be solved with the same amount of work as the storage requirements.

The point of bringing up the computational complexities is that, if $n \approx \sqrt{N}$ and you have a 2d system, then since the Schur complement will likely be dense, the solve complexity given the factored Schur complement will be $O(n^2) = O(N)$, which is only missing a logarithmic factor versus solving the full system! In 3d, it requires $O(N)$ work instead of $O(N^{4/3})$.

It is thus important to keep in mind that, in your case where $n=\sqrt{N}$, there will only be significant savings if you're working in several dimensions and have many right-hand sides to solve.
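
For the sparse case, the same construction might look as follows; the 2-D Laplacian test problem, the choice $n \approx \sqrt{N}$, and all names are assumptions for this sketch only:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu

# Illustrative 2-D Laplacian on a g x g grid; the last n unknowns are the
# degrees of freedom of interest, with n ~ sqrt(N) as discussed above.
g = 40
N, n = g * g, g
I = sp.identity(g, format="csr")
T = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(g, g), format="csr")
A = (sp.kron(I, T) + sp.kron(T, I)).tocsc()

A11 = A[:N - n, :N - n].tocsc()
A12 = A[:N - n, N - n:].toarray()
A21 = A[N - n:, :N - n]
A22 = A[N - n:, N - n:].toarray()

# Sparse LU of A11, then n sparse triangular solves form the (generally dense)
# n x n Schur complement.
lu11 = splu(A11)
S22 = A22 - A21 @ lu11.solve(A12)

# Each right-hand side restricted to the interesting block now needs only one
# small dense solve.
y = np.ones(n)
x_interesting = np.linalg.solve(S22, y)
```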

Jack Poulson
  • 7,599
  • 32
  • 40
  • 1
    This is a great summary of the Schur complement method and when it is computationally efficient to use it! – Paul May 18 '12 at 13:54
13

As others have pointed out, this is difficult to do with a direct solver. That said, it isn't that hard to do with iterative solvers. To this end, note that most iterative solvers in one way or another minimize the error with respect to some norm. Oftentimes, this norm is induced by the matrix itself, but sometimes it is just the $l_2$ vector norm. But that doesn't have to be the case: you can choose the norm in which you want to minimize the error (or residual), and you could, for example, choose a norm in which you weigh the components you care about with 1 and all others with $10^{-12}$, i.e. something like $\|x\|^2 = \sum_{i=1}^5 x_i^2 + 10^{-24}\sum_{i=6}^N x_i^2$, with the corresponding scalar product. Then write all steps of the iterative solver with respect to this norm and scalar product, and you get an iterative solver that pays significantly more attention to the vector elements you care about than to the others.

The question, of course, is whether you need fewer iterations than with the norm/scalar product that weighs all components equally. But that should indeed be the case: say you only care about the first five vector elements. Then you should need at most five iterations to reduce the error by a factor of $10^{12}$, since five iterations is what is needed for the $5 \times 5$ system that describes them. That's not a proof, but I'm pretty certain that you should indeed get away with a far smaller number of iterations if the weight in the norm ($10^{-12}$ above) is smaller than the tolerance with which you want to solve the linear system iteratively.
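
One simple way to experiment with this idea (not part of the original answer; all sizes, weights, and names below are assumptions for this sketch) is to note that the weighted residual norm $\|D(b - Ax)\|_2$, with $D$ the diagonal matrix of weights, is exactly what GMRES minimizes over its Krylov subspace when applied to the row-scaled system $DAx = Db$. Keep the caveats raised in the comments below in mind: a small weighted residual does not imply that the components of interest are accurate.

```python
import numpy as np
from scipy.sparse.linalg import gmres

# Illustrative problem (assumed sizes and test matrix).
N, k = 200, 5
rng = np.random.default_rng(0)
A = rng.standard_normal((N, N)) + 2 * N * np.eye(N)
b = rng.standard_normal(N)

# Weight the k components of interest with 1 and all others with a tiny value,
# mimicking the norm suggested above.
d = np.full(N, 1e-6)
d[:k] = 1.0

# GMRES on the row-scaled system minimizes the weighted residual norm.
DA = d[:, None] * A
x_weighted, info = gmres(DA, d * b, restart=N)   # full GMRES for this small example

# Compare the components of interest against a full solve; the discrepancy
# illustrates the accuracy concerns discussed in the comments.
x_full = np.linalg.solve(A, b)
print(info, np.abs(x_weighted[:k] - x_full[:k]).max())
```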

Paul
  • 12,045
  • 7
  • 56
  • 129
Wolfgang Bangerth
  • 55,373
  • 59
  • 119
  • 2
    Hmm, good point. I would be interested in seeing a real example, as I somewhat worry about the effects of only attempting to resolve a few degrees of freedom; even though the residual might be small, perhaps the norm of the error is still quite large (due to effectively ignoring most of the operator). – Jack Poulson Feb 11 '12 at 21:42
  • Intuitively, this only seems to work if the components of the very small system truly dominate the answer in an L2 (or the norm you understand your error to be measured in) sense. Otherwise, I think Jack's concern is valid, but I'd definitely be interested in even seeing a numerical proof of this... – Aron Ahmadia Feb 12 '12 at 08:09
  • One would have to make sure to use a method that minimizes the error, not the residual. I think MinErr might be a good starting point. – Wolfgang Bangerth Feb 12 '12 at 17:10
  • @WolfgangBangerth: I am not familiar with MINERR: is this the main reference? – Jack Poulson Feb 13 '12 at 03:03
  • Aw, I suppose I should know but I don't. MINERR was just something I seemed to recall, but I can't tell anymore where I heard about it... – Wolfgang Bangerth Feb 15 '12 at 14:25
  • 1
    Even that is not enough, because you will be inaccurate. You cannot get a few components accurately using this weighting. – Matt Knepley Mar 03 '12 at 18:29
  • I agree with Matt, this will not work at all for most problems because your weighting completely misses the low energy modes which account for most of the error. To see this, consider a long elastic beam clamped at one end. You can't sample the displacement at the free end of the beam without getting the global deformation correct. A locally weighted residual would only get the local deformation correct. Depending on the problem, you might be able to use a multilevel/adaptive method to accurately resolve the long-range modes, but only resolve the local modes in the region of interest. – Jed Brown Mar 04 '12 at 15:34
  • More precisely, with all methods I am familiar with, you only have two choices. In the first, you minimize an operator norm of the error (the $A$-norm for CG, the $A^TA$ norm for GMRES) in which you can scale the operator to target some part you are interested in at the expense of making the norm weaker. This may "converge" faster, but since the norm is weaker, the solution will be less accurate at a given tolerance. This is the effect you see when implementing Dirichlet boundary conditions with penalties and then using unpreconditioned residual. – Jed Brown Mar 04 '12 at 16:26
  • With a preconditioned residual, the scaling does not affect convergence or approximation space, so it does not change the approximation or the convergence rate. Alternatively, you work with an error-minimizing method like SYMMLQ in which case your scaling only affects the approximation space (which, again, you probably undo by preconditioning), and have no control over convergence on those selected modes. @JackPoulson, the paper you cite that uses the term MINERR refers to Szyld & Widlund 1993 for the method, but it does not use that name. – Jed Brown Mar 04 '12 at 16:33
6

The model reduction approach

Since Paul asked, I'll talk about what happens if you use projection-based model reduction methods on this problem. Suppose that you could come up with a projector $\mathbf{P}$ such that the range of $\mathbf{P}$, denoted $\mathcal{R}(\mathbf{P})$, contains the solution to your linear system $\mathbf{Ax} = \mathbf{b}$, and has dimension $k$, where $k$ is the number of unknowns for which you wish to solve in a linear system.

A singular value decomposition of $\mathbf{P}$ will yield the following partitioned matrix:

$$\mathbf{P} = \left[ \begin{array}{cc}\mathbf{V} & * \end{array} \right]\left[\begin{array}{cc}\mathrm{diag}(\mathbf{1}_{k}) & \mathbf{0} \\ \mathbf{0} & \mathbf{0}\end{array}\right]\left[\begin{array}{c} \mathbf{W}^{T} \\ *\end{array}\right].$$

The matrices obscured by stars matter for other things (like estimating error, etc.), but for now, we'll avoid dealing with extraneous details. It follows that

$$\mathbf{P} = \mathbf{VW}^{T}$$

is a full rank decomposition of $\mathbf{P}$.

Essentially, you'll solve the system

$$\mathbf{PAx} = \mathbf{Pb}$$

in a clever way, because $\mathbf{V}$ and $\mathbf{W}$ also have the property that $\mathbf{W}^{T}\mathbf{V} = \mathbf{I}$. Multiplying both sides of $\mathbf{PAx} = \mathbf{Pb}$ by $\mathbf{W}^{T}$ and letting $\mathbf{y} = \mathbf{V}\widehat{\mathbf{x}}$ be an approximation for $\mathbf{x}$ yields

$$\mathbf{W}^{T}\mathbf{A}\mathbf{V}\widehat{\mathbf{x}} = \mathbf{W}^{T}\mathbf{b}.$$

Solve for $\widehat{\mathbf{x}}$, premultiply it by $\mathbf{V}$, and you have $\mathbf{y}$, your approximation for $\mathbf{x}$.
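
A minimal sketch of this procedure, using the Galerkin-style choice $\mathbf{V} = \mathbf{W} =$ the leading $k$ left singular vectors of $\mathbf{A}$ (so that $\mathbf{W}^{T}\mathbf{V} = \mathbf{I}$); the sizes, the test matrix, and this particular choice of projector are assumptions made only for the example:

```python
import numpy as np

# Illustrative sizes and test matrix (assumed).
N, k = 300, 10
rng = np.random.default_rng(0)
A = rng.standard_normal((N, N)) + N * np.eye(N)
b = rng.standard_normal(N)

# One possible projector P = V W^T with V = W = the leading k left singular
# vectors of A, so that W^T V = I.
U, s, Vt = np.linalg.svd(A)
V = W = U[:, :k]

# Reduced k x k system  W^T A V x_hat = W^T b,  then lift:  y = V x_hat.
x_hat = np.linalg.solve(W.T @ A @ V, W.T @ b)
y = V @ x_hat

# y approximates x = A^{-1} b up to the projection error discussed below.
x = np.linalg.solve(A, b)
print(np.abs(y - x).max())
```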

Why the Schur complement approach is probably better

For starters, you have to pick $\mathbf{P}$ somehow. If the solution to $\mathbf{Ax} = \mathbf{b}$ is in $\mathcal{R}(\mathbf{P})$, then $\mathbf{y} = \mathbf{x}$, and $\mathbf{y}$ isn't an approximation. Otherwise, $\mathbf{y} \neq \mathbf{x}$, and you introduce some approximation error. This approach doesn't really leverage at all the structure you mentioned wanting to exploit. If we pick $\mathbf{P}$ such that its range is the standard unit basis in the coordinates of $\mathbf{x}$ you want to calculate, the corresponding coordinates of $\mathbf{y}$ will have errors in them. It's not clear how you'd want to pick $\mathbf{P}$. You could use an SVD of $\mathbf{A}$, for instance, and select $\mathbf{P}$ to be the product of the first $k$ left singular vectors of $\mathbf{A}$ and the adjoint of the first $k$ right singular vectors of $\mathbf{A}$, assuming that singular vectors are arranged in decreasing order of singular value. This choice of projector would be equivalent to performing proper orthogonal decomposition on $\mathbf{A}$, and it would minimize the L$_{2}$-error in the approximate solution.

In addition to introducing approximation errors, this approach also introduces three extra matrix multiplies on top of the linear solve of the smaller system and the work needed to calculate $\mathbf{V}$ and $\mathbf{W}$. Unless you're solving the same linear system a lot, only changing the right-hand side, and $\mathbf{P}$ is still a "good" projection matrix for all of those systems, those extra costs will probably make solving the reduced system more expensive than solving your original system.

The drawbacks are much like those of JackPoulson's approach, except that you're not quite leveraging the structure that you mentioned.

Geoff Oxberry
  • 30,394
  • 9
  • 64
  • 127
4

The long answer is...sort of.

You can rearrange your system of equations so that the rightmost $k$ columns correspond to the variables you wish to solve for.

Step 1: Perform Gaussian elimination so that the matrix is upper triangular.
Step 2: Solve by back-substitution for only the last $k$ variables, which, after the reordering, are the ones you are interested in.

This will save you the computational cost of having to solve for the remaining $n-k$ variables via back-substitution, which could be worth it if $n$ is as large as you say. Keep in mind that a fair amount of work will still have to be done for step 1.

Also, keep in mind that restricting the order in which you perform back-substitution may restrict the form of the matrix (it takes away the ability to exchange columns), which could possibly lead to an ill-conditioned system, but I am not sure about that - just something to keep in mind.
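
A hedged sketch of steps 1 and 2 (the test matrix, the sizes, and the use of SciPy's LU routines are assumptions for this example; the $k$ unknowns of interest are assumed to have already been reordered to the last positions, and $n$ is the total number of unknowns as above):

```python
import numpy as np
from scipy.linalg import lu_factor, solve_triangular

# Illustrative problem; the k unknowns of interest occupy the last k positions.
n, k = 500, 5
rng = np.random.default_rng(0)
A = rng.standard_normal((n, n)) + n * np.eye(n)
b = rng.standard_normal(n)

# Step 1: LU factorization with partial pivoting, P A = L U (the O(n^3) part).
lu, piv = lu_factor(A)

# Apply the row swaps recorded in piv to b, then forward-substitute L y = P b.
pb = b.copy()
for i, p in enumerate(piv):
    pb[i], pb[p] = pb[p], pb[i]
y = solve_triangular(lu, pb, lower=True, unit_diagonal=True)

# Step 2: U is upper triangular, so the last k equations of U x = y involve
# only the last k unknowns; back-substitute on the trailing k x k block only.
x_last = solve_triangular(lu[n - k:, n - k:], y[n - k:], lower=False)

# Sanity check against a full solve.
x_full = np.linalg.solve(A, b)
print(np.allclose(x_last, x_full[n - k:]))    # expect True
```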

Geoff Oxberry
  • 30,394
  • 9
  • 64
  • 127
drjrm3
  • 2,139
  • 2
  • 19
  • 22
  • Gaussian elimination requires $O(n^3)$ work, but backward substitution only requires $O(n^2)$. Thus, as $n$ grows larger, the percentage of time spent in the triangle solve becomes vanishingly small. – Jack Poulson Feb 08 '12 at 03:35
  • which is why the answer is "sort of" instead of "yes" =) – drjrm3 Feb 08 '12 at 13:08
  • It makes sense that it can be done this way... However, the bulk of the computation in a Gaussian elimination is in the forward-elimination phase, yielding an $O(n^3)$ complexity despite the truncated backward-substitution phase. I was hoping there was a faster method... – Paul Feb 08 '12 at 20:16