7

I need to do a large number of matrix operations in Matlab, but one of the matrices is 4-D and my normal instincts for correct vectorization are failing me. Right now I'm using a loop. Maybe somebody can fix my dumb brain?

for k=1:kNum
   for l=1:lNum
      L(k,l,:) = squeeze(M(k,l,:,:))\squeeze(P(k,l,:));
   end
end
Colin K
  • 285
  • 1
  • 9
  • I wasn't sure if this should go to SO, but I checked the FAQ and it left me feeling that this is on topic here. If not I apologize. Feel free to migrate it. – Colin K May 05 '12 at 22:01
  • 1
    It's on topic, don't worry about it :) – Aron Ahmadia May 06 '12 at 06:28
  • Just to be clear, you're doing a backsolve in each case? How big are the two trailing dimensions of M? I'm not sure vectorization is going to help you here... – Aron Ahmadia May 06 '12 at 15:49
  • @aron: Yes, exactly. The first two dimensions are purely organizational, rather than mathematical. The trailing dimensions of M are something like [100 7]. I see what you mean about vectorization not helping, since each solve is unrelated; but in my experience, if you can get Matlab to do the looping "under the hood" it is usually much faster. – Colin K May 06 '12 at 16:38
  • In case it is at all enlightening, I will mention: each solve is operating on a time series of values of one pixel in an image (actually, its fourier transform). That operation is parameterized by the position of that pixel. – Colin K May 06 '12 at 16:47

1 Answers1

8

I don't think that it is possible to vectorize this expression very much. Generally, vectorization is used with assignments to vectors along with functions which output seperate results for each item, for instance

for i = 1:N
   A(i) = sin(i)
end

is equal to

i = 1:N
A(i) = sin(i)

because sin does elementwise sine of the elements. When solving linear equation sets using mldivide/"backslash operator" the solution depends on all the elements in the vector - you can't just do element wise mldivide unless the matrix has special properties.

For optimizing this loop, you could consider using parfor-loops (the looping variables are normal numbers incremented by one, so it should be as simple as doing "parfor" instead of "for). This requires the parallel computing toolbox. You could also unroll the loops to get better performance, if the numbering of the indices is somewhat predictible.

Or, you could try to write an .mex-file doing the same operation. In C/Fortran looping is as fast as / faster than MATLAB vectorization.

Another thing to consider is reordering the memory access: Since matrices are just long arrays accessed using a [i + j*dim1 + k*dim1*dim2] pattern it is important to ensure that the inner loop corresponds to the memory which is stored sequentially. For 2D matrices, at least, doing

for k=1:kNum
   for l=1:lNum
       L(k,l,:) = squeeze(M(k,l,:,:))\squeeze(P(k,l,:));

is slower than

for l= 1:lNum
   for k=1:kNum
       L(k,l,:) = squeeze(M(k,l,:,:))\squeeze(P(k,l,:));

since the elements corresponding to the k-index are stored sequentially. Mathworks has some information on it, but it is not really something specific to MATLAB per se, any somewhat advanced text on arrays for C or FORTRAN should go into more details. There is also some cache stuff going on under the hood here.

edit: While not useful here as the right hand side and left hand side is different in each solution, if you are doing something like

for i = 1:N
    solution(:,i) = A\b(:,i)
end

you might as well do

solution = A\b
moyner
  • 824
  • 5
  • 12
  • 1
    Maybe you have a different idea of what vectorization means than do I. I realize that each solve involves all the elements of the vector. But imagine that I wanted to multiply a bunch of vectors, $b_i$, by a matrix, A. I could put all my vectors in a matrix B, and then just do AB. This is a lot faster than looping over my vectors and doing $Ab_i$ in a loop, even though each vector-matrix multiply is unrelated to the others. I am looking for a way to extend this to higher dimension. – Colin K May 06 '12 at 21:34
  • Good tip though about the sequential access. +1 :) – Colin K May 06 '12 at 21:35
  • I understand what you're getting at, and I have updated my answer for a similar case. You could do some optimizations whenever you want to solve a linear system for many different right hand sides, but for your specific problem I don't see any more obvious optimizations. – moyner May 07 '12 at 08:32