I actually wrote the original code in Matlab for A*B, both A and B sparse. Pre-allocation of space for the result was indeed the interesting part. We observed what Godric points out -- that knowing the number of nonzeros in AB is as costly as computing AB.
We did the initial implementation of sparse Matlab around 1990, before the Edith Cohen paper that gave the first practical, fast way to estimate the size of AB accurately.
We put together an inferior size estimator, and if we ran out of space in mid-computation, we doubled the allocation and copied the partially computed result.
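A minimal sketch of that grow-and-copy fallback, in C rather than the original Matlab internals (the `nzbuf` struct and function names are illustrative, not the actual implementation):

```c
#include <stdlib.h>

/* Growable buffer for the nonzeros of the result: start from a rough
   estimate, and when it runs out mid-computation, double the allocation
   and copy the partial result (realloc does the copy for us). */
typedef struct {
    double *val;   /* nonzero values            */
    int    *row;   /* corresponding row indices */
    size_t  nnz;   /* nonzeros stored so far    */
    size_t  cap;   /* current allocated space   */
} nzbuf;

static int nzbuf_init(nzbuf *b, size_t cap0) {
    b->val = malloc(cap0 * sizeof *b->val);
    b->row = malloc(cap0 * sizeof *b->row);
    b->nnz = 0;
    b->cap = cap0;
    return (b->val && b->row) ? 0 : -1;
}

static int nzbuf_push(nzbuf *b, int i, double x) {
    if (b->nnz == b->cap) {                    /* out of space mid-computation */
        size_t ncap = b->cap * 2;              /* double the allocation        */
        double *nv = realloc(b->val, ncap * sizeof *nv);
        int    *nr = realloc(b->row, ncap * sizeof *nr);
        if (!nv || !nr) return -1;
        b->val = nv;
        b->row = nr;
        b->cap = ncap;
    }
    b->row[b->nnz] = i;
    b->val[b->nnz] = x;
    b->nnz++;
    return 0;
}
```

The cost of the copies is amortized: doubling means each nonzero is moved O(1) times on average, so a bad initial estimate costs a constant factor, not a quadratic blowup.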
I don't know what's in Matlab now.
Another possibility would be to compute AB one column at a time. Each column can be stored temporarily in a sparse accumulator (see the sparse Matlab paper for an explanation of these), and space allocated to hold the exactly known size of the result column. The result would be in scattered compressed sparse column form -- each column in CSC but no intercolumn contiguity -- using 2 vectors of length numcols (col start, col length), rather than one, as metadata. It's a storage form that may be worth a look;
it has another strength -- you can grow a column without reallocating the whole matrix.
Comments:

– Stefano M (Jul 17 '12 at 20:29): […] `nnz(A*B)` in advance? Is it not feasible to just start with a rough estimate and then `realloc` while computing the matrix product, if necessary? A model implementation may be found in the 2nd chapter of the sparse backslash book by Tim Davis.

– Recker (Jul 17 '12 at 20:40): […] the `realloc` method, but my GPU has limited memory, and even though dynamic memory allocation is possible in CUDA, one must allocate the chunk of memory in advance to use `malloc` and `free` on the GPU. Further, I don't want to use page-locked memory (which might incur extra PCI-E transfers) that can potentially slow down the computations. As one of the answers has already mentioned, I am thinking of launching a kernel which will simulate `A*B` just for the sake of calculating `nnz(A*B)`, and then use it to allocate the exact memory for the resultant matrix.

– Stefano M (Jul 17 '12 at 21:11): […] `A*B` is the way to go... not only for allocating memory, but also for an efficient implementation of the product itself. Good luck.
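The "simulate the multiply first" idea from the comments is the classic symbolic pass: traverse the same row patterns the numeric multiply would touch, counting first touches per result column, without computing any values. A hedged sequential sketch (the function name and the column-index marking trick are illustrative, not taken from any particular library; a CUDA version would assign columns of B to thread blocks):

```c
/* Count nnz(A*B) without computing values, so the exact result size can
   be allocated up front. A and B are in CSC pattern form (colptr/rowind).
   mark : workspace of length = rows of A, initialized to -1; storing the
          current column index j avoids re-clearing it between columns. */
static long symbolic_nnz(int n_bcols,
                         const int *Acolptr, const int *Arowind,
                         const int *Bcolptr, const int *Browind,
                         int *mark)
{
    long nnz = 0;
    for (int j = 0; j < n_bcols; j++) {
        for (int q = Bcolptr[j]; q < Bcolptr[j + 1]; q++) {
            int k = Browind[q];                    /* B(k,j) is nonzero      */
            for (int p = Acolptr[k]; p < Acolptr[k + 1]; p++) {
                int i = Arowind[p];                /* A(i,k) hits C(i,j)     */
                if (mark[i] != j) {                /* first touch in column j */
                    mark[i] = j;
                    nnz++;
                }
            }
        }
    }
    return nnz;
}
```

Note the tradeoff the answer mentions: this pass does essentially the same traversal work as the multiply itself (minus the floating-point), which is why knowing nnz(AB) exactly is about as costly as computing AB.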