I would like to use Python to compute the determinant of a large symmetric matrix where all off-diagonal entries are known. The diagonal entries can vary. Since I need to compute the determinant many times with different diagonal entries, it seems a waste of time for the computation to repeat the multiplications of all those known entries over and over again. Is there a way to pre-compute the determinant so that it becomes a function of the diagonal entries with some precomputed prefactors?
I doubt there’s a way, unless you know more about the symmetric part. For example, if there’s a way of completing the diagonal elements that makes the symmetric part a low-rank matrix, you can effectively replace the computation of the determinant by that of computing the determinant of a matrix of size equal to this rank. – Amit Hochman Mar 25 '24 at 15:21
This is not an answer to your question, but a comment: Computing the determinant of large matrices in general is impractical (and often meaningless, too, if the matrix results from approximating infinite-dimensional operators). Update formulas therefore only help you if you can even compute the initial determinant, which as mentioned is often too expensive to be practical. – Wolfgang Bangerth Mar 26 '24 at 15:16
5 Answers
I cannot say whether this will reduce the computation, but I will bring in some known categories of matrices; moreover, I may be missing some known property of them that helps further.
I will use the property ${\rm det}(AB) = {\rm det}(A)\cdot {\rm det}(B)$. We have a symmetric matrix $M$. Then we can decompose it as $$M= D + H,$$ where $D$ is diagonal with the diagonal entries, and $H$ is a symmetric "hollow" matrix, because it has a zero main diagonal. Then $$H = T_L + T_U$$
where $T_L$ is a strictly lower triangular matrix and $T_U$ is a strictly upper triangular matrix ("strictly" because they have a zero diagonal). In passing, I note that $T_L = T_U^T$, and that $T_L-T_U$ and $T_U-T_L$ are skew-symmetric matrices.
Being strictly triangular, they are nilpotent. So (for $I$ the identity matrix), we have $${\rm det}(I + T_L) = {\rm det}(I + T_U) = 1.$$
Moreover, assuming all diagonal entries are nonzero so that $D^{-1}$ exists, $D^{-1}T_L$ and $D^{-1}T_U$ are also strictly triangular and hence nilpotent.
Going back to $M$ we have
$$M = D + T_L + T_U = D(I + D^{-1}T_L + D^{-1}T_U),$$
so $${\rm det}(M) = {\rm det}(D) \cdot {\rm det}(I + D^{-1}T_L + D^{-1}T_U).$$
Maybe computing the determinant of $(I + D^{-1}T_L + D^{-1}T_U)$ each time is more efficient (it has ones on its diagonal).
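For what it's worth, here is a minimal NumPy sketch verifying this factorization on a random symmetric matrix (all names are mine; note that both determinant calls still cost $O(n^3)$, so this only checks the identity, not a speedup):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
H = rng.standard_normal((n, n))
H = H + H.T                       # symmetric
np.fill_diagonal(H, 0.0)          # hollow: zero main diagonal (the fixed part)
d = rng.uniform(1.0, 2.0, n)      # the varying diagonal entries

M = np.diag(d) + H
lhs = np.linalg.det(M)
# det(M) = det(D) * det(I + D^{-1} H); dividing row i of H by d[i] gives D^{-1} H
rhs = np.prod(d) * np.linalg.det(np.eye(n) + H / d[:, None])
print(np.allclose(lhs, rhs))      # True
```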
---
Also, a further elaboration, possibly helpful (or not), is the following:
Since $D^{-1}T_L$ is nilpotent, we have
$$N_L \equiv I + D^{-1}T_L \implies {\rm det}(N_L) = 1,$$
and
$$N_L^{-1} = \sum_{j=0}^{k_L}\left(-D^{-1}T_L\right)^j,$$ where $k_L$ is the index of the nilpotent matrix $D^{-1}T_L$, namely the smallest integer for which $(D^{-1}T_L)^{k_L} = 0$.
So we can write
$${\rm det}(M) = {\rm det}(D) \cdot {\rm det}(I + N_L^{-1}D^{-1}T_U),$$
in case there are routines that exploit nilpotent structure, for example to compute the inverse of the matrix $N_L$ cheaply.
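For illustration, a small NumPy sketch of that inverse via the finite Neumann series (names are mine; the series needs at most $n-1$ terms beyond the identity, since $(D^{-1}T_L)^n = 0$ for an $n\times n$ strictly triangular matrix):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
T_L = np.tril(rng.standard_normal((n, n)), k=-1)  # strictly lower triangular
d = rng.uniform(1.0, 2.0, n)
B = T_L / d[:, None]                               # D^{-1} T_L, nilpotent

# Finite Neumann series: (I + B)^{-1} = I - B + B^2 - ...; terminates because B^n = 0
N_inv = np.eye(n)
term = np.eye(n)
for _ in range(n - 1):
    term = -term @ B
    N_inv += term

print(np.allclose(N_inv, np.linalg.inv(np.eye(n) + B)))  # True
```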
Your problem is very interesting, and here's my stab at it. Let $D$ be a diagonal matrix, which is unknown and changes, and $A$ be your symmetric matrix of known off-diagonal entries, which does not change. Let us assume that $A$ is invertible. Then, you want to find
$$ \text{det}(A+D) $$
Let there be only two non-zero entries in $D$, namely $d_{11}$ and $d_{22}$. Then,
$$ D = d_1d_1^T + d_2d_2^T $$
where $$ d_1 = \sqrt{d_{11}}e_1 \text{ and } d_2 = \sqrt{d_{22}}e_2 $$ (assuming $d_{11}, d_{22} \geq 0$; for negative entries one can apply the lemma below to $d_{11}e_1e_1^T$ directly, since it holds for any rank-one update)
where $e_1$ has a $1$ in its first entry and zeros everywhere else, and similarly for $e_2$.
Now, by the matrix determinant lemma (see this answer),
$$ \text{det}(A+d_1d_1^T) = \text{det}(A)(1+d_1^TA^{-1}d_1) $$
Repeating the same procedure, we get
$$ \text{det}(A + d_1d_1^T + d_2d_2^T) = \text{det}(A+d_1d_1^T)(1+d_2^T(A+d_1d_1^T)^{-1}d_2) $$
You can add more rank one updates as necessary.
I assume that you have some means of computing $\text{det}(A)$ efficiently. Then, in order for this procedure to be efficient, we only need to compute $d_1^TA^{-1}d_1$ and $d_2^T(A+d_1d_1^T)^{-1}d_2$ efficiently. Since $d_1 = \sqrt{d_{11}}e_1$, this requires computing $(A^{-1})_{11}$ and, in the second case, $\left((A+d_1d_1^T)^{-1}\right)_{22}$ efficiently.
This paper, and references therein, might help. The first line is promising: "The focus of this paper is fast algorithms for extracting the diagonal of the inverse of a given matrix"
Or, having computed the diagonal entries of $A^{-1}$, you might want to investigate whether you can modify the Sherman-Morrison-Woodbury formula to compute $\left((A+d_1d_1^T)^{-1}\right)_{22}$ efficiently.
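To illustrate, here is a NumPy sketch of the whole scheme (the function name `det_plus_diag` is mine, and it assumes $A$ and every intermediate update stay nonsingular). Note that applying all $n$ rank-one updates costs $O(n^3)$, so this pays off only when few diagonal entries change between evaluations:

```python
import numpy as np

def det_plus_diag(d, A_inv, det_A):
    """det(A + diag(d)) via n rank-one updates (matrix determinant lemma),
    keeping the inverse current with Sherman-Morrison."""
    Ainv = A_inv.copy()
    det = det_A
    for i in range(len(d)):
        factor = 1.0 + d[i] * Ainv[i, i]   # 1 + d_i * (A^{-1})_{ii}
        det *= factor                      # determinant lemma
        # Sherman-Morrison update of the inverse for A <- A + d_i e_i e_i^T
        Ainv -= d[i] * np.outer(Ainv[:, i], Ainv[i, :]) / factor
    return det

rng = np.random.default_rng(2)
n = 6
A = rng.standard_normal((n, n)); A = A + A.T
np.fill_diagonal(A, 0.0)                            # known off-diagonal part
A_inv, det_A = np.linalg.inv(A), np.linalg.det(A)   # one-time precomputation
d = rng.uniform(0.5, 1.5, n)
print(np.allclose(det_plus_diag(d, A_inv, det_A),
                  np.linalg.det(A + np.diag(d))))   # True
```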
Hope this helps.
In general, the determinant is a multilinear polynomial in the diagonal entries, with $2^N$ coefficients. Fully precomputing the determinant as a function of the diagonal entries is therefore infeasible.
A better solution is likely possible only if the diagonal entries at which you want to evaluate the determinant have some kind of structure or relation to each other. NNN's answer gives a solution if only one or two entries change between evaluations, using the matrix inverse. Changing all entries iteratively with this method is asymptotically equivalent to, and in practice slower than, a full recomputation of the determinant.
I don't think you can use this to speed things up for a general dense symmetric matrix $M$.
One option is to note that the determinant of a matrix is equal to the product of its eigenvalues. If $A$ and $B$ are two diagonalizable matrices that commute, then they are simultaneously diagonalizable (https://math.stackexchange.com/questions/1374836/eigenvalues-of-the-sum-of-two-matrices-one-diagonal-and-the-other-not) and you have $$\det(A+B) = \det(P\Lambda_AP^{-1}+P\Lambda_BP^{-1}) = \det(\Lambda_A +\Lambda_B) = \prod_{i=1}^n(\lambda_{A,i}+\lambda_{B,i}).$$
In your case you can split $M$ as $M = D + E$ where $D$ is diagonal. The two matrices are symmetric, hence diagonalizable, but in the general case they do not commute (see https://math.stackexchange.com/a/1698051/463794), so you cannot use this except in trivial cases, e.g. if the diagonal matrix is just a rescaling of the identity (see the sketch below).
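In that trivial case the payoff is real: precompute the eigenvalues of the fixed hollow part once, and every shifted determinant becomes an $O(n)$ product. A minimal NumPy sketch (names are mine):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50
H = rng.standard_normal((n, n)); H = H + H.T
np.fill_diagonal(H, 0.0)          # the fixed hollow part

lam = np.linalg.eigvalsh(H)       # one-time O(n^3) eigendecomposition

def det_shifted(c):
    """det(c*I + H) in O(n) per evaluation."""
    return np.prod(c + lam)

c = 3.7
print(np.isclose(det_shifted(c), np.linalg.det(c * np.eye(n) + H)))  # True
```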
$\newcommand{\sgn}{\operatorname{sgn}}$
Another option is to directly consider the determinant definition for a matrix $M=A+B$: \begin{align} \det M &= \sum_{\sigma \in S_n} \sgn(\sigma)M_{1,\sigma(1)}\ldots M_{n,\sigma(n)} \\ &= \sum_{\sigma \in S_n} \sgn(\sigma)(A_{1,\sigma(1)}+B_{1,\sigma(1)}) \ldots (A_{n,\sigma(n)}+B_{n,\sigma(n)}) \\ &= \sum_{\sigma \in S_n} \sgn(\sigma) \sum_{r=0}^n \sum_{\alpha \in \mathcal{C}^n_r} A_{\alpha_1,\sigma(\alpha_1)}\ldots A_{\alpha_r,\sigma(\alpha_r)} B_{\alpha^c_1,\sigma(\alpha^c_1)} \ldots B_{\alpha^c_{n-r},\sigma(\alpha^c_{n-r})} \end{align}
In the above, $\mathcal{C}^n_r = \{\alpha\in\{1,\ldots,n\}^r\,:\, \alpha_i<\alpha_j \text{ for } i<j\}$ is the set of strictly increasing index vectors of length $r$ with entries in $\{1,\ldots,n\}$. Since we can choose $r$ elements out of $n$ in ${n \choose r}$ ways, this is also the size of this set. Summing over $0\leq r \leq n$, the number of summands in $\sum_{r=0}^n\sum_{\alpha}$ is $2^n$, since $$2^n = (1+1)^n = \sum_{r=0}^n {n \choose r},$$ even without accounting for the sum over $S_n$. The fact that $A$ is diagonal means that only terms $A_{\alpha_{i},\alpha_i}$ survive, which decreases the number of permutations we need to consider in the sum over $S_n$ from $n!$ to $(n-r)!$, namely $$\det[A+B] = \sum_{r=0}^n \sum_{\alpha\in\mathcal{C}^n_{r}} \sum_{\sigma\in S_{n-r}} \sgn(\sigma)A_{\alpha_1,\alpha_1}\ldots A_{\alpha_r,\alpha_r}B_{\alpha^c_1,\sigma(\alpha^c_1)}\ldots B_{\alpha^c_{n-r}, \sigma(\alpha^c_{n-r})}.$$
This amounts to $$\sum_{r=0}^n{n \choose r}(n-r)! = \sum_{r=0}^n \frac{n!}{r!} = \sum_{r=0}^n n(n-1)\ldots (n-r+1) > n!$$ operations, which is useless, since even computing the determinant from scratch via the Leibniz formula requires only $O(n!)$ operations; e.g. see https://www.wolframalpha.com/input?i=sum_%7Br%3D0%7D%5En+n%21%2Fr%21. In the best-case scenario, where $B$ is also diagonal, you still get $2^n$ operations and terms that you need to store, which is again prohibitive. Of course, if $A$ and $B$ were just diagonal one could directly compute $\prod_{i=1}^n (A_{ii}+B_{ii})$, which is of complexity $O(n)$ (but that is not your case); it's just that when you open up the parentheses this becomes highly unpleasant. In either case, the above expansion seems to be only of theoretical significance for double-digit $n$. You could argue that sparse matrices will also cancel out a lot of terms, but even there you still have the bound of $2^n$ for diagonal matrices, so things are not nice for a practical algorithm. I think my derivation above is equivalent to the formula here: https://math.stackexchange.com/a/2679211/463794.
What looks to be an alternative (and probably better) approach is also available here: https://math.stackexchange.com/a/3758359/463794 and https://maa.org/sites/default/files/0746834207570.di020741.02p0009e.pdf
$$\det(A + B) = \sum_{r=0}^n\sum_{\alpha\in\mathcal{C}^n_{r}}\sum_{\beta\in\mathcal{C}^n_{r}}(-1)^{s(\alpha)+s(\beta)} \det(A[\alpha, \beta]) \det(B[\alpha^c, \beta^c]),$$
where $A[\alpha,\beta]$ denotes the submatrix of $A$ with rows $\alpha$ and columns $\beta$, and $s(\alpha)$ is the sum of the entries of $\alpha$. For a diagonal matrix $A$ this simplifies to:
$$\det(A + B) = \sum_{r=0}^n\sum_{\alpha\in\mathcal{C}^n_{r}}\det(A[\alpha, \alpha]) \det(B[\alpha^c, \alpha^c]).$$
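To make the simplification concrete, here is a sketch that precomputes all principal minors of $B$ once and then evaluates $\det(\mathrm{diag}(d)+B)$ for any diagonal; this is exactly the "precomputed prefactors" the question asks for, but it is only feasible for small $n$ (all names are mine):

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(4)
n = 8
B = rng.standard_normal((n, n)); B = B + B.T
np.fill_diagonal(B, 0.0)                     # fixed hollow part
all_idx = frozenset(range(n))

# One-time precomputation: det(B[S, S]) for every index subset S (2^n minors).
minor = {}
for r in range(n + 1):
    for S in combinations(range(n), r):
        minor[frozenset(S)] = np.linalg.det(B[np.ix_(S, S)]) if S else 1.0

def det_diag_plus_B(d):
    """det(diag(d) + B) = sum over subsets S of prod(d[i], i not in S) * det(B[S, S])."""
    return sum(np.prod([d[i] for i in all_idx - S]) * m for S, m in minor.items())

d = rng.uniform(0.5, 1.5, n)
print(np.isclose(det_diag_plus_B(d), np.linalg.det(np.diag(d) + B)))  # True
```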
The scaling is again fairly bad, even if you assume that you have precomputed the principal submatrix determinants of $B$ (of which there are $2^n$, mind you, and the submatrix determinants themselves are not free to compute). My guess is that practically relevant methods estimate the determinant as the product of the eigenvalues, because that seems like a much more efficient way to go about it. If you use some Krylov subspace method to estimate the eigenvalues and the shift $D$ is not too large, maybe you could reuse the Krylov subspaces or guesses from previous iterations.
This paper "Determinant Approximation" by Ipsen and Lee seems to be more practically relevant than the above. It proposes a method for approximating the determinant and compares to other methods so you can go through the references and see what you get.
Depending on which algorithm you intend to use to calculate your determinant, you may have a couple of options from a purely computational perspective.
The first option would be rather difficult to implement correctly:
- You could use constant folding to generate a function that calculates the determinant based on your input diagonal (a SymPy sketch follows this list).
- This would be rather cutting-edge code generation: you would have to take the AST of your function, apply your matrix as input to your function as if it were a set of constants (the known off-diagonal entries) and variables (the diagonal), perform some form of loop unrolling, and then constant propagation where lookups of constant matrix values are replaced with the constant values themselves.
- For example, if you have an operation `A[15,1] + A[16,1]` where `A[15,1] = 2` and `A[16,1] = 24`, then you end up with the expression `2 + 24`, which further folds to `26`. However, if your operation is `A[15,15] + A[16,15]` and `A[16,15] = 32`, then your expression is folded to `A[15,15] + 32`.
- In theory your result would be a function whose constant operations have been folded wherever possible, leaving only operations that involve your diagonal input variables.
- One benefit of this is that you can see how complex your output function is and if the optimization is worth it.
- This is essentially writing a customized compiler for your operation, hence the complexity and difficulty in implementing this right.
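In Python, rather than writing a custom compiler, this kind of folding can be prototyped with SymPy: keep the known off-diagonal entries as concrete numbers, leave the diagonal symbolic, expand the determinant once, and compile the result. A sketch (names are mine; only viable for small $n$, since the folded polynomial has up to $2^n$ terms):

```python
import numpy as np
import sympy as sp

n = 4                                   # symbolic expansion: small n only
rng = np.random.default_rng(5)
H = rng.standard_normal((n, n)); H = H + H.T
np.fill_diagonal(H, 0.0)                # known off-diagonal entries

d = sp.symbols(f"d0:{n}")               # the diagonal stays symbolic
M = sp.Matrix(H) + sp.diag(*d)
det_expr = sp.expand(M.det())           # all products of known entries fold into constants

det_fn = sp.lambdify(d, det_expr, "numpy")   # plain function of the diagonal only

vals = rng.uniform(0.5, 1.5, n)
print(np.isclose(det_fn(*vals), np.linalg.det(H + np.diag(vals))))  # True
```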
The second option is far simpler but has some performance and complexity costs which may or may not be worth the effort:
- You can adapt your program to memoize all functions in your algorithm.
- That way already-calculated intermediate values can simply be reused.
- As a bonus, if two diagonals are similar, some of the memoized output can be reused.
But there are several drawbacks to this approach:
- There is a cost to memoizing values: either increased memory expenditure if the values are stored in RAM, or decreased performance if values are stored to disk. Looking up memoized values in an index also incurs a performance cost.
- It is difficult to quantify how efficient memoization is unless statistics are logged on each memoized operation.
- Finding a match in your index can be tricky if floating-point values are involved. Approximate matching may be preferable, since insisting on exact matches can lead to redundant calculations.
This leads to a unique set of complexities:
- Your choice of algorithm would determine which functions to memoize.
- A "naive" approach would mean that every operation in your algorithm would need to be memoized. But doing this on every addition and multiplication would likely incur unnecessary overhead.
- Instead, each major function should be memoized. For example, if a decomposition operation breaks the original matrix into smaller matrices, these smaller pieces may have been previously calculated and yield known results.
- This means that an entire matrix would act as the lookup key for your memoized values, so an efficient scheme is needed to find matching matrices in a database (a minimal sketch follows this list).
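Here is a minimal sketch of whole-matrix memoization using `functools.lru_cache`, with the matrix's raw bytes as the lookup key (names are mine); it also illustrates the exact-match limitation noted above:

```python
import numpy as np
from functools import lru_cache

@lru_cache(maxsize=None)
def _det_from_bytes(buf, shape):
    """Memoized worker; the raw bytes of the matrix are the (hashable) key."""
    M = np.frombuffer(buf, dtype=np.float64).reshape(shape)
    return np.linalg.det(M)

def det_memo(M):
    # Arrays are not hashable, so hash their bytes instead. This only hits on
    # bit-exact matches: floats differing in the last bit miss the cache.
    M = np.ascontiguousarray(M, dtype=np.float64)
    return _det_from_bytes(M.tobytes(), M.shape)

rng = np.random.default_rng(6)
M = rng.standard_normal((50, 50))
det_memo(M)                           # computed and cached
det_memo(M.copy())                    # cache hit: identical bytes
print(_det_from_bytes.cache_info())   # hits=1, misses=1
```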
Both approaches require complicated implementations, so validating the correctness of your optimizations is also tricky. Calculations that look correct on a sample of inputs may yield entirely incorrect results elsewhere if there are bugs in the code, and such bugs can hide if the input samples don't exercise the optimization in question. This is more likely to be encountered in the memoization solution.
Now, if you are planning to use GPUs in your calculations, it may be difficult to take advantage of the above optimizations.
I hope this helps even in some small way!