Least-squares for a diagonal matrix

Question

This is a follow-up to a different question I asked with more detail.

For $v\in\mathbb{R}^n$, denote $D_v\in\mathbb{R}^n$ as the diagonal matrix with elements in $v$. Given a "tall" matrix $B\in\mathbb{R}^{n\times m}$, I would like to solve the following optimization problem: $$\min_{v\in\mathbb{R}^n} \|X-B^\top D_v B\|_{\mathrm{Fro}}^2$$ Assuming I calculcated it properly, first-order optimality gives the linear system $(BB^\top\circ BB^\top)v=(BX\circ B)\mathbb{1}$, where $\circ$ denotes the elementwise (Hadamard) product and $\mathbb{1}\in\mathbb{R}^n$ is the vector of all ones. I have checked that this system is invertible for my application.

The problem is, the matrix $BB^\top\circ BB^\top$ is very large relative to the size of $B$. I can afford to take the SVD of $B$ (and that of $X$) but not to construct this large, dense matrix.

Is there anything I can do to solve this system directly without resorting to an iterative solver? If I have to do it iteratively, what is the fastest iteration for systems that come from this least-squares problem?

Is the invertibility of $BB^T \circ BB^T$ guaranteed? I tried making up some random $B$'s in matlab with rank $r$, and the rank of the Hadamard matrix seems to go like $r+1 \text{ -choose- }2$. — Nick Alger, Sep 28 '14 at 20:41

Richard Zhang · Accepted Answer · 2014-11-08T02:28:44.213

The closed-form solution is available by projecting the problem into the space spanned by B. To see this, note that we have

$$\min_{v\in\mathbb{R}^{n}}\|X-BD_{v}B^{T}\|_{F},$$ but if we introduce $\tilde{X}$ such that $X=B\tilde{X}B^{T}$, then the optimization is reduced $$\arg\min_{v\in\mathbb{R}^{n}}\|B\tilde{X}B^{T}-BD_{v}B^{T}\|_{F}\equiv\arg\min_{v\in\mathbb{R}^{n}}\|\tilde{X}-D_{v}\|_{F},$$ and clearly the best we can do is to set $v=\mathrm{diag}(\tilde{X})$, where the “diag” operator isolates the diagonal of its input matrix.

Now, to obtain $\tilde{X}$, assuming that $B$ has full column rank, is a simple matter of performing a projection onto the range space of $B$, $$\tilde{X}=(B^{T}B)^{-1}B^{T}XB(B^{T}B)^{-1}.$$ Proof that $X=B\tilde{X}B^{T}$ is obtained by substitution (check this). If $B^TB$ is singular then use the pseudo-inverse. Combined, we have proved the following result.

Theorem 1. The optimization has the following closed form solution

$$v=\mathrm{diag}[(B^{T}B)^{-1}B^{T}XB(B^{T}B)^{-1}].$$

Finally, it's worth noting that you shouldn't be forming the matrix explicitly if computational load is a concern. Instead, you should take a cholesky factorization of $X$, perform one size of the matrix-matrix product, and then implicitly compute the diagonal.

Least-squares for a diagonal matrix

1 Answers1

Linked