7

Define the element-wise thresholding operator $T_\tau(\cdot)$ with threshold $\tau$ as $$ [T_\tau(X)]_{i,j} = \begin{cases} X_{i,j} &\mbox{if } |X_{i,j}| \ge \tau, \\ 0 & \mbox{if } |X_{i,j}| < \tau. \end{cases} $$ Clearly, $T_\tau(X)$ can be computed in quadratic $O(n^2)$ time for some $n\times n$ matrix $X$.

Question: Suppose that the thresholded matrix $T_\tau(X)$ contains only $O(n)$ nonzero elements. Is it possible to compute $T_\tau(X)=T_\tau(UV^T)$ in linear $O(n)$ time (on a serial computer) from a low-rank factorization $U,V\in\mathbb{R}^{n\times r}$ of $X=UV^T$? (Take the rank $r$ to be an absolute constant, i.e. $r=O(1)$)


Some insights: Let the rows of $U$ and $V$ be written as $u_{1},\ldots,u_{n}$ and $v_{1},\ldots,v_{n}$, as in $$ U=\begin{bmatrix}u_{1}^{T}\\ u_{2}^{T}\\ \vdots\\ u_{n}^{T} \end{bmatrix},\qquad V=\begin{bmatrix}v_{1}^{T}\\ v_{2}^{T}\\ \vdots\\ v_{n}^{T} \end{bmatrix}. $$ Then we have $X_{i,j}=u_{i}^{T}v_{j}$. By Hölder's inequality, we have $$ |u_{i}^{T}v_{j}|\le\|u_{i}\|_{p}\|v_{j}\|_{q}\qquad\frac{1}{p}+\frac{1}{q}=1. $$ So it seems like we can just compute the $p$-norm of all rows of $U$ and $V$ in $O(n)$ time, and threshold the rows directly. However, this approach seems to be extremely conservative.

Richard Zhang
  • 2,485
  • 15
  • 26
  • Does $T_\tau(X)$ has one non-zero value per row/column due to some special structure? – Tolga Birdal Jan 24 '18 at 07:41
  • @TolgaBirdal in practice, yes, because our choice of $X$ tends to be diagonally dominant. However, let's ignore this structure for now and ask if the low-rank property has any use at all. – Richard Zhang Jan 24 '18 at 18:01

1 Answers1

1

Some improvement to the conservative approach that is proposed in the question. I am not sure if you already implicitly took this into account while describing direct filtering of rows using Hölder's inequality.


Sorry for the change of notations ($U\mapsto A$ and $V\mapsto B$) for the low-rank representation, but it makes sense to use $U$ and $V$ for the other things in the following derivations. They would make use of the low-rank property and the low-rank representation of the original matrix, in particular.

For a matrix $X=AB^T$ (represented by a low-rank factorization), one can compute its SVD (economic form) in $\mathcal O\left((n+m)r^2+r^3\right)$ operations. Here, $X\in\mathbb F^{m\times n}$, of rank $r$.

  1. Compute QR decompositions of $A=Q_AR_A$ and $B=Q_BR_B$ in $\mathcal O(mr^2)$ and $\mathcal O(nr^2)$ operations, respectively.
  2. Compute matrix $P=R_AR_B^T$ and its SVD $P=U_P\Sigma V_P^T$ in $\mathcal O(r^3)$ operations since $P,R_A,R_B\in \mathbb F^{r\times r}$. Notice, that the diagonal matrix of singular values $\Sigma$ does not only correspond to $P$, but to the original matrix $X$, as well.
  3. Compute matrices $U=Q_AU_P$ and $V=Q_BV_P$ in $\mathcal O(mr^2)$ and $\mathcal O(nr^2)$ operations, respectively.

Now, we got an SVD decomposition of the original matrix $X=U\Sigma V^T$ since $$ X=AB^T=Q_AR_A(Q_BR_B)^T=\underbrace{Q_AR_A}_A\underbrace{R_B^TQ_B^T}_{B^T}=\underbrace{Q_AU_P}_{U}\Sigma \underbrace{V_PQ_B^T}_{V^T}=U\Sigma V^T $$

Now, we can use Hölder's inequality on

  • $A$ and $B$ (original proposal in the question)
  • $U\Sigma$ and $V^T$
  • $U$ and $\Sigma V^T$
  • $U\sqrt{\Sigma}$ and $\sqrt{\Sigma}V^T$ (which is the most balanced)

This will allow to improve the estimates and filter our more things, especially if $A$ and $B$ are imbalanced. Technically, $r$ different checks can be made without ruining the required complexity; however, I don't think that would make much impact in practice.

Anton Menshov
  • 8,672
  • 7
  • 38
  • 94