
I have the following situation: I have a sequence of vectors $x_1, x_2, \dots$ and for each one I want to compute the product $Ax_i$, where $A$ is fixed at the outset. Although there is no information about the structure of the $x_i$, $A$ typically has a particular pattern in which many values are repeated, and I would like to compute these products as fast as possible.

One example of $A$ looks like this:

[image: an example of the pattern of $A$]

Here the white regions are 0.

I wonder if there is some way of storing information about $A$, or modifying it somehow, that would allow me to reduce the number of operations for each product. For rows that are all 0 this is trivial – one can simply store the indices of such rows. It is also possible to record which rows are duplicated so as to reuse row computations. I have also considered reordering the rows of the matrix so as to minimize the mean difference between consecutive rows and only computing the difference at each row. This seems to run into problems for the more complicated patterns, however.

I was wondering whether there are any known methods for these kinds of problems.

Edit: another idea I had is that, since the number of unique values in the matrix is fairly low, one could decompose the product as $Ax = A_1x + A_2x + \dots + A_nx$, where each $A_i$ contains only one unique value, but I'm still not sure whether this can provide any advantage for this problem.

  • By breaking the matrix into blocks: 1) if a block has all rows the same, you can multiply the block with a particular vector once and reuse that partial dot product for every row in the block. 2) If a block has all columns the same, the optimization by D.W. works. 3) If you can batch-process a bunch of vectors, you can do matrix multiplication on the GPU. 4) If you decompose into $A_1, A_2, \ldots$ then you can write each as a constant times a binary matrix, and multiplication by a binary matrix requires only additions, so you need only one multiplication per distinct value. – Mar 15 '18 at 00:56
  • If there are $d$ distinct values in a row of the matrix, and $d$ is much less than the total number of elements in a row (which often looks to be the case here), you need only $d$ multiplications to calculate the dot product of that row with the $x$ vector, since multiplication distributes over addition: e.g., $ax_{i1} + bx_{i2} + bx_{i3} + ax_{i4} + ax_{i5} = a(x_{i1} + x_{i4} + x_{i5}) + b(x_{i2} + x_{i3})$. – j_random_hacker Mar 14 '18 at 19:32
  • 2
    I can see some blocks of columns that, in many rows, are identical (e.g., a block of columns that is all-yellow in many rows). For a given vector $x_i$, if you compute the sum of the elements of $x_i$ in that block, then you can use that to speed up things for those rows. – D.W. Mar 14 '18 at 18:44
  • 1
    To get an idea: are you currently using BLAS? – Mauro Vanzetto Mar 15 '18 at 14:32
  • @MauroVanzetto not using any linalg libraries. Everything is stored as builtin C/C++ vectors/arrays – Slug Pue Mar 15 '18 at 14:45
  • 1
    And how do you compute the product now? Let me make a practical point: using BLAS, directly or indirectly through another library, lets you exploit your hardware in a near-optimal way (something very difficult to achieve with a custom matrix-vector product). So with BLAS you may achieve a big speedup with limited effort. – Mauro Vanzetto Mar 15 '18 at 14:57
  • Why not use Cuthill–McKee or Reverse Cuthill–McKee (RCM) to permute the rows and/or columns and obtain a band matrix? In Python, use reverse_cuthill_mckee. In MATLAB, use symrcm. – Rodrigo de Azevedo Mar 15 '18 at 16:15
  • @MauroVanzetto currently this is done by plain nested looping, going row-major through the matrix – Slug Pue Mar 15 '18 at 16:18