I've just (to my embarrassment) encountered a BLAS-like extension of the matrix-matrix product routine gemm in Intel MKL: gemm3m. This routine (in its complex flavours cgemm3m and zgemm3m) performs matrix-matrix multiplication of complex-valued matrices using fewer real arithmetic operations than the conventional algorithm.
The gemm3m documentation claims that it

> ...reduces the time spent in matrix operations by 25%, resulting in significant savings in compute time for large matrices.
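For context, the savings come from a Karatsuba-style identity. The usual "3M" formulation (my own sketch of the textbook scheme, not a quote from the MKL docs) forms the complex product with three real matrix multiplications instead of four:

$$
T_1 = A_1 B_1,\qquad T_2 = A_2 B_2,\qquad T_3 = (A_1 + A_2)(B_1 + B_2),
$$
$$
C_1 = T_1 - T_2,\qquad C_2 = T_3 - T_1 - T_2.
$$

One checks directly that $T_3 - T_1 - T_2 = A_1 B_2 + A_2 B_1$, the imaginary part of the product.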
Looking at the provided error analysis in the Application Notes, I don't see anything "criminal":
$$
\hat{C}=\hat{C}_1+i\hat{C}_2=\text{fl}\big((A_1+iA_2)(B_1+iB_2)\big),\qquad C=C_1+iC_2=AB,
$$
$$
\|\hat{C}_1-C_1\|\leq 2(n+1)\,u\,\|A\|_\infty\|B\|_\infty+\mathcal O(u^2),
$$
$$
\|\hat{C}_2-C_2\|\leq 4(n+4)\,u\,\|A\|_\infty\|B\|_\infty+\mathcal O(u^2),
$$
where $A,B,C\in\mathbb C^{n\times n}$ are complex matrices, $A_{1,2},B_{1,2},C_{1,2}\in\mathbb R^{n\times n}$ are their real and imaginary parts, respectively, and $i=\sqrt{-1}$; $\hat{C}\in\mathbb C^{n\times n}$ and $\hat{C}_{1,2}\in\mathbb R^{n\times n}$ are the results of the floating-point operations on $A$ and $B$ according to the gemm3m matrix-matrix multiplication algorithm. Here $|u|<\epsilon_\text{mach}$, provided the floating-point arithmetic is IEEE-754 and no underflow/overflow happens.
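To see these normwise bounds in action, here is a minimal NumPy sketch of the 3M scheme (my own reconstruction of the textbook algorithm, not MKL's actual implementation), compared against the conventional product:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

A1, A2 = A.real, A.imag
B1, B2 = B.real, B.imag

# 3M: three real matrix products instead of four
T1 = A1 @ B1
T2 = A2 @ B2
T3 = (A1 + A2) @ (B1 + B2)
C3m = (T1 - T2) + 1j * (T3 - T1 - T2)

# conventional (4M) product for comparison
C = A @ B

# normwise relative error of 3M vs the conventional product
err = np.linalg.norm(C3m - C, np.inf) / (
    np.linalg.norm(A, np.inf) * np.linalg.norm(B, np.inf)
)
print(err)
```

For matrices of this size the printed error is of order $n\,u$, consistent with the bounds above: normwise, 3M is about as accurate as the conventional algorithm.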
So, is there any catch to using zgemm3m instead of regular zgemm? Is there a situation where I should avoid zgemm3m?
> If `a=0.1; b=13e-10; c=0.3; d=31e-10;`, then in double precision (with Octave) `a*d+b*c = 7.000000000000001e-10`, whereas `(a+b)*(c+d)-a*c-b*d = 6.999999983771085e-10`, which is less accurate. – wim Jul 31 '19 at 09:19
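The comment's scalar example can be checked against an exact rational reference. A small Python sketch (mine, for illustration) comparing the direct imaginary part `a*d + b*c` with the 3M-style `(a+b)*(c+d) - a*c - b*d`:

```python
from fractions import Fraction

a, b, c, d = 0.1, 13e-10, 0.3, 31e-10

direct = a * d + b * c                        # imaginary part, conventional (4M) style
three_m = (a + b) * (c + d) - a * c - b * d   # imaginary part, 3M style

# exact reference computed from the rational values of the stored doubles
exact = Fraction(a) * Fraction(d) + Fraction(b) * Fraction(c)

err_direct = abs(Fraction(direct) - exact)
err_3m = abs(Fraction(three_m) - exact)
print(direct, three_m)
```

The subtraction in the 3M form cancels the large terms `(a+b)*(c+d)` and `a*c`, so the small imaginary part loses many significant digits even though the *normwise* error bound above is still satisfied: the bound is relative to `||A|| ||B||`, not to the magnitude of the imaginary part itself.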