
I have a piece of inline GCC assembly code for ARM, pasted below:

inline void example(int *dst, int src)
{
    int t2,t3,t4,t5,t6,t7,t8;
    int table_copy = (int)(&table[0]);
    asm volatile(
            "ldr    %[t8], [%[dstptr]]              \n"
            "and    %[t2], %[t1], #255              \n"
            "mvn    %[t4], %[t8]                    \n"
            "and    %[t3], %[t4], #255              \n"
            "smulbb %[t4], %[t2], %[t3]             \n"
            "add    %[t3], %[t8], %[t4], lsr #8     \n"
            "and    %[t4], %[t3], #255              \n"
            "ldr    %[t6], [%[table], %[t4], lsl #1]\n"
            "strb   %[t4], [%[dstptr], #0]          \n"
            "and    %[t3], %[t8], #0xff0000         \n"
            "smlabb %[t7], %[t2], %[t6], %[t2]      \n"
            "and    %[t6], %[t1], #0xff0000         \n"
            "lsr    %[t2], %[t8], #8                \n"
            "rsb    %[t4], %[t3], %[t6], lsr #16    \n"
            "lsr    %[t8], %[t8], #24               \n"
            "lsl    %[t5], %[t1], #16               \n"
            "smulbb %[t6], %[t7], %[t4]             \n"
            "rsb    %[t5], %[t2], %[t5], lsr #24    \n"
            "rsb    %[t4], %[t8], %[t1], lsr #24    \n"
            "smulbb %[t1], %[t7], %[t4]             \n"
            "smulbb %[t4], %[t5], %[t7]             \n"
            "add    %[t3], %[t3], %[t6], lsr #15    \n"
            "add    %[t1], %[t8], %[t1], lsr #15    \n"
            "strb   %[t3], [%[dstptr], #2]          \n"
            "strb   %[t1], [%[dstptr], #3]          \n"
            "add    %[t2], %[t2], %[t4], lsr #15    \n"
            "strb   %[t2], [%[dstptr], #1]          \n"

        :[t1]"+r"(src), "=m"(*dst), // outputs
        [t2]"=r"(t2), [t3]"=r"(t3), [t4]"=r"(t4), [t5]"=r"(t5), [t6]"=r"(t6), [t7]"=r"(t7), [t8]"=r"(t8) // temporaries, not really used as output
        :"m"(*dst), [dstptr]"r"(dst), [table]"r"(table_copy) // inputs
        :);
}

I'm using the local variables t2..t8 as temporaries so that the compiler can allocate registers as it sees fit when inlining the function. The problem is that some of these temporaries are assigned the same register as an input operand (such as dstptr). Below is a fragment of the compiled code: the first line shows that dstptr was assigned register r2, but the fourth line shows that t3 is also in r2. Similarly, table is in r3 (last line), the same register that t2 is written to in the second line.

0x5762c  <+0x01e4>        00 50 92 e5  ldr  r5, [r2]
0x57630  <+0x01e8>        ff 30 01 e2  and  r3, r1, #255    ; 0xff
0x57634  <+0x01ec>        05 e0 e0 e1  mvn  lr, r5
0x57638  <+0x01f0>        ff 20 0e e2  and  r2, lr, #255    ; 0xff
0x5763c  <+0x01f4>        83 02 6e e1  smulbb   lr, r3, r2
0x57640  <+0x01f8>        2e 24 85 e0  add  r2, r5, lr, lsr #8
0x57644  <+0x01fc>        ff e0 02 e2  and  lr, r2, #255    ; 0xff
0x57648  <+0x0200>        8e 90 93 e7  ldr  r9, [r3, lr, lsl #1]

The code compiles correctly if I mark dstptr and table as output constraints, like this:

        :[t1]"+r"(src), "=m"(*dst),[dstptr]"+r"(dst), [table]"+r"(table_copy),
        [t2]"=r"(t2), [t3]"=r"(t3), [t4]"=r"(t4), [t5]"=r"(t5), [t6]"=r"(t6), [t7]"=r"(t7), [t8]"=r"(t8)
        :"m"(*dst)
        :);

Should this really be necessary? Is the compiler free to assign an input operand and an output operand to the same register?

The code compiled correctly with gcc 7, but fails with gcc 9.

  • Yes, GNU C inline asm is designed for wrapping single asm instructions which read all their inputs before writing any of their outputs. If that's not the case, you need to declare some/all outputs (and "+r" RMW operands) as early clobber; see the linked duplicates. You just got lucky with gcc7. – Peter Cordes Aug 12 '21 at 13:19
  • Ok, that explains much. Thanks for pointing me towards the duplicate. – Andrzej Szombierski Aug 12 '21 at 13:37
  • https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html "Use the ‘&’ constraint modifier (see Modifiers) on all output operands that must not overlap an input. Otherwise, GCC may allocate the output operand in the same register as an unrelated input operand, on the assumption that the assembler code consumes its inputs before producing outputs." – Nate Eldredge Aug 12 '21 at 19:10
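Following the comments above, the fix is to mark every output that is written before the asm has finished reading its inputs as early-clobber: "=&r" for a plain output, "+&r" for a read-modify-write operand. In the question's function that would apply to t2..t8 and also to [t1], since t1 is overwritten by an smulbb before the final strb instructions read dstptr. The sketch below is a minimal, hypothetical illustration rather than the asker's code: add_xor and check_add_xor are invented names, the ARM path assumes 32-bit ARM in A32 or Thumb-2 state, and an x86-64 path plus a plain-C fallback are included only so the file builds on other targets.

```c
#include <assert.h>

/* Hypothetical helper (not from the question): returns (a + b) ^ c.
 * The scratch value tmp is written before input c is read, so tmp must be
 * early-clobber ("=&r"); without the '&', GCC may place tmp in the same
 * register as c, and the first instruction would then destroy c. */
static inline unsigned int add_xor(unsigned int a, unsigned int b, unsigned int c)
{
    unsigned int tmp, out;
#if defined(__arm__) && (!defined(__thumb__) || defined(__thumb2__))
    /* 32-bit ARM (A32 or Thumb-2). */
    __asm__(
        "add %[tmp], %[a], %[b]\n\t"   /* writes tmp while c is still unread */
        "eor %[out], %[tmp], %[c]\n\t" /* out is written only after every input is read */
        : [tmp] "=&r"(tmp), [out] "=r"(out)
        : [a] "r"(a), [b] "r"(b), [c] "r"(c));
#elif defined(__x86_64__)
    /* Two-operand x86: out is also written (by the second movl) before c
     * is read, so it needs '&' as well. */
    __asm__(
        "movl %[a], %[tmp]\n\t"
        "addl %[b], %[tmp]\n\t"
        "movl %[tmp], %[out]\n\t"
        "xorl %[c], %[out]\n\t"
        : [tmp] "=&r"(tmp), [out] "=&r"(out)
        : [a] "r"(a), [b] "r"(b), [c] "r"(c)
        : "cc");
#else
    /* Portable fallback without inline asm. */
    tmp = a + b;
    out = tmp ^ c;
#endif
    (void)tmp;
    return out;
}

/* Compare the asm result against plain C for one input triple. */
static void check_add_xor(unsigned int a, unsigned int b, unsigned int c)
{
    assert(add_xor(a, b, c) == ((a + b) ^ c));
}
```

For example, add_xor(1u, 2u, 4u) returns 7 on every target. Note that [out] on ARM stays a plain "=r": it is written by the last instruction, after all inputs have been consumed, which is exactly the case the default (non-early-clobber) assumption is designed for.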
