A number of numerical problems are easy to solve when the condition number $\kappa$ of the problem is low. For instance, the iteration complexity of the conjugate gradient method scales as $O(\sqrt{\kappa})$.
However:
- computing the condition number is numerically unstable, and the estimate becomes meaningless for large problems
- a system that is well conditioned for all practical purposes can have an infinite condition number.
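As a baseline for the $O(\sqrt{\kappa})$ claim above, a minimal hand-rolled CG on a diagonal SPD system shows the scaling directly (the size, spectrum, and tolerance below are arbitrary illustrative choices, not from any particular problem):

```python
import numpy as np

def cg_iters(kappa, n=500, tol=1e-6):
    """Iterations CG needs on a diagonal SPD system with spectrum in [1, kappa]."""
    d = np.linspace(1.0, kappa, n)  # eigenvalues of the diagonal matrix A
    b = np.ones(n)
    x = np.zeros(n)
    r = b - d * x
    p = r.copy()
    rs = r @ r
    for k in range(1, 10 * n):
        Ap = d * p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) <= tol * np.linalg.norm(b):
            return k
        p = r + (rs_new / rs) * p
        rs = rs_new
    return 10 * n

# Quadrupling kappa should roughly double the iteration count
for kappa in (100, 400, 1600):
    print(kappa, cg_iters(kappa))
```

On this spectrum the iteration counts roughly double each time $\kappa$ quadruples, consistent with the $\sqrt{\kappa}$ bound.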
Are there alternatives that address these problems? I came across one stable measure which ties the performance of gradient descent to the density of eigenvalues near $0$.
- Has it been described in the literature?
- Is there an analogue for conjugate gradient or other solvers?
For gradient descent applied to solving $Ax=b$ via least-squares minimization, you can show that if the density $f(x)$ of the squared singular values of $A$ diverges as $\sim x^p$ for $x\to 0$, then the residual sum of squares (RSS) after $t\to \infty$ steps decays as $\sim \frac{1}{t^{p+2}}$.
For the precise meaning of $\sim$, see Theorem 2.3 of Coqueret's "probabilistic Laplace Transforms" paper.
For instance, if $A$ is a product of many matrices with IID random entries, this density diverges as $x^{-1}$ (i.e. $p=-1$), so you would expect the RSS $h(t)$ to decay as $O(t^{-1})$, which is confirmed by the simulations below on a product of 100 random $1000\times 1000$ matrices.
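A scaled-down sketch of such a simulation (much smaller than the 100 products of $1000\times 1000$ matrices above, purely so it runs quickly; the seed, sizes, and planted solution are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# Product of k IID Gaussian matrices (scaled-down version of the experiment)
n, k = 200, 20
A = np.eye(n)
for _ in range(k):
    A = A @ (rng.standard_normal((n, n)) / np.sqrt(n))
A /= np.linalg.norm(A, 2)          # normalize so that ||A|| = 1

x_star = rng.standard_normal(n)    # planted solution with O(1) components
b = A @ x_star
x = np.zeros(n)

checkpoints = (10, 100, 1000)
rss = {}
for s in range(1, max(checkpoints) + 1):
    x -= A.T @ (A @ x - b)         # gradient step on 0.5*||Ax - b||^2, step size 1
    if s in checkpoints:
        rss[s] = float(np.sum((A @ x - b) ** 2))

for s in checkpoints:
    print(s, rss[s])               # RSS should shrink roughly like C / s
```

Even at this small scale the RSS decays roughly an order of magnitude per decade of steps, matching $O(t^{-1})$ up to noise from the finite spectrum.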
Here is how you could show this theoretically:
- Assume step size $\alpha=1$ and $\|A\|=1$ without loss of generality
- This gives the RSS after $t/2$ steps as $\sum_i h_i(1-h_i)^{t}$, where $h_i$ is the $i$-th squared singular value of $A$
- Use bound techniques like here to replace $(1-h_i)^t$ with $\exp(-t h_i)$
- Introduce the density $g(x)=x f(x)$ to express the RSS as $h(t)=E_g[\exp(-t x)]$
- $h(-t)$ is the moment-generating function of the density $g(x)$; use Tauberian theorems to recover the behavior of the RSS as $t\to \infty$ from the behavior of $g$ near $0$
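The last step can be sanity-checked by Monte Carlo: sample $h_i$ from a density $f(x)\propto x^p$ on $(0,1]$ and compare $E_g[\exp(-tx)] = E_f[x\exp(-tx)]$ against $t^{-(p+2)}$ (the exponent $p=-1/2$ and the sample size here are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

p = -0.5        # f(x) ~ x^p near 0 (any p > -1 is normalizable on (0, 1])
m = 200_000

# Inverse-CDF sampling from f(x) = (p+1) * x^p on (0, 1]: F^{-1}(u) = u^(1/(p+1))
h = rng.uniform(size=m) ** (1.0 / (p + 1.0))

def h_of_t(t):
    # E_g[exp(-t x)] with g(x) = x f(x), estimated as E_f[x exp(-t x)]
    return float(np.mean(h * np.exp(-t * h)))

for t in (10.0, 100.0, 1000.0):
    print(t, h_of_t(t), t ** -(p + 2))   # the two columns track each other up to a constant
```

Each tenfold increase in $t$ shrinks the estimate by roughly $10^{p+2}$, as the Tauberian argument predicts.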
> […] rcond-like estimate) and computing things like the density of eigenvalues around zero is probably going to be more difficult and expensive than whatever iterative algorithm you otherwise were planning to apply. Unless you can assume a specific underlying structure for $A$ (like your example) where you can derive something analytical. – Mikael Öhman Aug 21 '23 at 00:30