I have a piece of GCC inline assembly for ARM, pasted below:
inline void example(int *dst, int src)
{
int t2,t3,t4,t5,t6,t7,t8;
int table_copy = (int)(&table[0]);
asm volatile(
"ldr %[t8], [%[dstptr]] \n"
"and %[t2], %[t1], #255 \n"
"mvn %[t4], %[t8] \n"
"and %[t3], %[t4], #255 \n"
"smulbb %[t4], %[t2], %[t3] \n"
"add %[t3], %[t8], %[t4], lsr #8 \n"
"and %[t4], %[t3], #255 \n"
"ldr %[t6], [%[table], %[t4], lsl #1]\n"
"strb %[t4], [%[dstptr], #0] \n"
"and %[t3], %[t8], #0xff0000 \n"
"smlabb %[t7], %[t2], %[t6], %[t2] \n"
"and %[t6], %[t1], #0xff0000 \n"
"lsr %[t2], %[t8], #8 \n"
"rsb %[t4], %[t3], %[t6], lsr #16 \n"
"lsr %[t8], %[t8], #24 \n"
"lsl %[t5], %[t1], #16 \n"
"smulbb %[t6], %[t7], %[t4] \n"
"rsb %[t5], %[t2], %[t5], lsr #24 \n"
"rsb %[t4], %[t8], %[t1], lsr #24 \n"
"smulbb %[t1], %[t7], %[t4] \n"
"smulbb %[t4], %[t5], %[t7] \n"
"add %[t3], %[t3], %[t6], lsr #15 \n"
"add %[t1], %[t8], %[t1], lsr #15 \n"
"strb %[t3], [%[dstptr], #2] \n"
"strb %[t1], [%[dstptr], #3] \n"
"add %[t2], %[t2], %[t4], lsr #15 \n"
"strb %[t2], [%[dstptr], #1] \n"
:[t1]"+r"(src), "=m"(*dst), // outputs
[t2]"=r"(t2), [t3]"=r"(t3), [t4]"=r"(t4), [t5]"=r"(t5), [t6]"=r"(t6), [t7]"=r"(t7), [t8]"=r"(t8) // temporaries, not really used as output
:"m"(*dst), [dstptr]"r"(dst), [table]"r"(table_copy) // inputs
:);
}
I'm using the local variables t2..t8 as temporaries so that the compiler can allocate registers as it sees fit when the function is inlined. The problem is that some of these locals are assigned to the same register as an input operand (such as dstptr). Below is a fragment of the compiled code. The first line shows that dstptr was assigned to register r2, but the fourth line shows that t3 is also in r2, so dstptr is overwritten before the later strb instructions use it. Likewise, the second and last lines show that t2 and table share r3.
0x5762c <+0x01e4> 00 50 92 e5 ldr r5, [r2]
0x57630 <+0x01e8> ff 30 01 e2 and r3, r1, #255 ; 0xff
0x57634 <+0x01ec> 05 e0 e0 e1 mvn lr, r5
0x57638 <+0x01f0> ff 20 0e e2 and r2, lr, #255 ; 0xff
0x5763c <+0x01f4> 83 02 6e e1 smulbb lr, r3, r2
0x57640 <+0x01f8> 2e 24 85 e0 add r2, r5, lr, lsr #8
0x57644 <+0x01fc> ff e0 02 e2 and lr, r2, #255 ; 0xff
0x57648 <+0x0200> 8e 90 93 e7 ldr r9, [r3, lr, lsl #1]
The code compiles correctly if I declare dstptr and table as read-write ("+r") output operands instead of inputs, like this:
:[t1]"+r"(src), "=m"(*dst),[dstptr]"+r"(dst), [table]"+r"(table_copy),
[t2]"=r"(t2), [t3]"=r"(t3), [t4]"=r"(t4), [t5]"=r"(t5), [t6]"=r"(t6), [t7]"=r"(t7), [t8]"=r"(t8)
:"m"(*dst)
:);
Should this really be needed? Is the compiler free to alias input and output parameters?
The original version compiled correctly with gcc 7, but gcc 9 generates the overlapping register allocation shown above.
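For comparison, here is a minimal sketch of the early-clobber modifier ("&"), which the GCC documentation describes for output operands that are written before the asm has finished reading its inputs. Without it, GCC may place an output in the same register as an unrelated input. The function low_byte_product and its plain C fallback are hypothetical and not part of my code above. The asm path is only taken when building for ARM (non-Thumb) mode, and I am assuming the same pattern carries over to t2..t8.

```c
/* Hypothetical helper, for illustration only: computes (x & 255) * (~x & 255).
   tmp is written by the first instruction while x is still read by the
   second one, the same situation as the t2..t8 temporaries above. */
static inline int low_byte_product(int x)
{
    int tmp, result;
#if defined(__arm__) && !defined(__thumb__)
    __asm__ (
        "and %[tmp], %[x], #255      \n\t"  /* tmp written here ...          */
        "mvn %[res], %[x]            \n\t"  /* ... but x is still read here  */
        "and %[res], %[res], #255    \n\t"
        "mul %[res], %[tmp], %[res]  \n\t"
        : [tmp] "=&r" (tmp),   /* "&": must not share a register with an input */
          [res] "=&r" (result)
        : [x] "r" (x)
    );
#else
    /* Portable fallback so the sketch also builds on non-ARM targets. */
    tmp = x & 255;
    result = tmp * (~x & 255);
#endif
    return result;
}
```

Applied to the function above, this would mean writing [t2]"=&r"(t2) through [t8]"=&r"(t8) and leaving dstptr and table as plain "r" inputs, instead of turning them into "+r" outputs.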