0

this is my first time asking a question so please forgive me if I do something wrong.

I'm trying to code Strassen's algorithm in matlab and it seems to work, but it is very slow (depending on the cut-off, it can already take over a second for 64x64 matrices).

It's slower than the naive implementation (with 3 loops). Is there something I'm doing horribly wrong? Below is the function, input is already of correct size (2^n)

function [ D ] = ActualStrassen( PaddedA, PaddedB )

   [m n] = size(PaddedA);

   if n < 5
      D = PaddedA*PaddedB;

   else
      A11 = PaddedA( 1:n/2     , 1:n/2    );
      A12 = PaddedA( 1:n/2     , n/2+1:end);
      A21 = PaddedA( n/2+1:end , 1:n/2    );
      A22 = PaddedA( n/2+1:end , n/2+1:end);
      B11 = PaddedB( 1:n/2     , 1:n/2    );
      B12 = PaddedB( 1:n/2     , n/2+1:end);
      B21 = PaddedB( n/2+1:end , 1:n/2   );
      B22 = PaddedB( n/2+1:end , n/2+1:end);

      M1=ActualStrassen( A11 + A22 , B11 + B22);
      M2=ActualStrassen( A21 + A22 , B11      );
      M3=ActualStrassen( A11       , B12 - B22);
      M4=ActualStrassen( A22       , B21 - B11);
      M5=ActualStrassen( A11 + A12 , B22      );
      M6=ActualStrassen( A21 - A11 , B11 + B12);
      M7=ActualStrassen( A12 - A22 , B21 + B22);

      D11= M1 + M4 - M5 + M7;
      D12= M3 + M5;
      D21= M2 + M4;
      D22= M1 - M2 + M3 + M6;

      D = [D11, D12; D21, D22];

   end
end
Hoki
  • 11,264
  • 1
  • 22
  • 42
salicum
  • 1
  • 2
  • 1
    I suggest you count the number of (elemental) additions and multiplications your code actually uses to compute the product of two 64*64 arrays and compare with the number for the 'naive' `O(n^3)` algorithm. Then consider the cost, in time, of all those recursive calls to `ActualStrassen` and the allocation of memory they require. Then read around a bit on SO and look at answers to similar questions, for example http://stackoverflow.com/questions/13559928/why-is-my-strassens-matrix-multiplication-slow?rq=1 That refers to a C++ implementation but much of the accepted answer is relevant for you – High Performance Mark Oct 04 '14 at 23:21
  • So ok, my computer is fast, but `tic; A = rand(64); B = rand(64); D = ActualStrassen(A,B); toc` gives `Elapsed time is 0.052790 seconds.`. We're very far from a second. – Jonathan H Oct 05 '14 at 00:49
  • with matlab, the internal routines are heavily optimized. The number of statements executed is likely to have as much effect as what is done in them. Speed wise, the inbuilt matrix multiply is almost certainly goimg to be the fatest. – camelccc Oct 05 '14 at 10:43

0 Answers0