I'm learning assembler for MicroPython (ARM Thumb2 instruction set for PyBoard).
Is there a quicker way to check the sign (positive/negative) of an FPU register (s0) than this?
@micropython.asm_thumb
def float_array_abs(r0, r1):
    label(LOOP)
    vldr(s0, [r0, 0])
    vmov(r2, s0) # 1
    cmp(r2, 0) # 2
    itt(mi) # 3
    vneg(s0, s0)
    vstr(s0, [r0, 0])
    add(r0, 4)
    sub(r1, 1)
    bgt(LOOP)
This works, but it doesn't seem like the 'right' solution (I'm not sure the sign of r2 always matches the sign of s0), and I suspect it must be possible in fewer than two instructions.
UPDATE 1:
Based on the comments (thanks) I have improved the speed of the code further:
@micropython.asm_thumb
def float_array_abs1(r0, r1):
    label(LOOP)
    ldr(r2, [r0, 0])
    cmp(r2, 0) # this works for some reason
    bge(SKIP)
    vmov(s0, r2)
    vneg(s0, s0)
    vstr(s0, [r0, 0]) # this can be skipped if not negative
    label(SKIP)
    add(r0, 4)
    sub(r1, 1)
    bgt(LOOP)
But it still leaves the question: is this a robust way of determining the sign of an FP value?
For reference here are the byte representations of four float values on my system:
-1.0 0xbf800000
-0.0 0x80000000
0.0 0x00000000
1.0 0x3f800000
I guess if this is hardware dependent then I shouldn't be relying on this to determine the sign...
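As a sanity check on the four bit patterns listed above, here is a minimal sketch in ordinary Python (not MicroPython assembler). It assumes the values are stored as IEEE 754 single-precision floats, which is what the patterns suggest. The helper names `float_bits` and `as_signed_word` are made up for illustration. It shows that the top bit of the word is the sign bit, so reading the same 32 bits as a signed integer gives a negative number exactly when that bit is set, including for -0.0.

```python
import struct


def float_bits(x):
    """Return the 32-bit pattern of x when stored as a single-precision float."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def as_signed_word(bits):
    """Reinterpret a 32-bit pattern as a signed integer, as cmp(r2, 0) would see it."""
    return bits - (1 << 32) if bits & 0x80000000 else bits


for value in (-1.0, -0.0, 0.0, 1.0):
    bits = float_bits(value)
    # Columns: float value, raw bit pattern, "integer view is negative"
    print(value, hex(bits), as_signed_word(bits) < 0)
```

Under that assumption, the integer comparison is really a test of bit 31, not of the numeric value.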
I think this might be the 'proper' way to do it (i.e. proper FPU comparison):
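@micropython.asm_thumb
def float_array_abs2(r0, r1):
    mov(r2, 0)
    vmov(s1, r2)  # s1 = 0.0, the value to compare against
    label(LOOP)
    vldr(s0, [r0, 0])
    vcmp(s0, s1)
    vmrs(APSR_nzcv, FPSCR)  # copy the FPU comparison flags to the CPU flags
    itt(mi)
    vneg(s0, s0)
    vstr(s0, [r0, 0])
    add(r0, 4)
    sub(r1, 1)
    bgt(LOOP)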
But I timed this and it is 11% slower than the code above (float_array_abs1). So it would be nice to use the earlier code if it is a reliable solution.
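One thing to keep in mind when comparing the two approaches: a floating-point comparison against zero and a sign-bit test do not agree on -0.0, because -0.0 compares equal to 0.0. The sketch below models both tests in ordinary Python, again assuming IEEE 754 single precision; `negative_by_compare` and `negative_by_sign_bit` are hypothetical names used only here.

```python
import struct


def negative_by_compare(x):
    """Model of the FPU comparison: is x strictly less than zero?"""
    return x < 0.0


def negative_by_sign_bit(x):
    """Model of the integer test: is bit 31 of the stored word set?"""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return (bits >> 31) == 1


for value in (-1.0, -0.0, 0.0, 1.0):
    print(value, negative_by_compare(value), negative_by_sign_bit(value))
```

So a comparison-based loop would leave a stored -0.0 untouched, while a sign-bit-based loop would rewrite it as 0.0. For the other three values the two tests give the same answer.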
UPDATE 2:
@Ped7g proposed ANDing each value with 0x7FFFFFFF, which clears the sign bit (see comments).
I tested this and it does work. Here is the code:
@micropython.asm_thumb
def float_array_abs3(r0, r1):
    movwt(r3, 0x7FFFFFFF)
    label(LOOP)
    ldr(r2, [r0, 0])
    and_(r2, r3)
    str(r2, [r0, 0])
    add(r0, 4)
    sub(r1, 1)
    bgt(LOOP)
CORRECTION: It is faster than float_array_abs1 above. This appears to be the best solution, but is it robust?
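To make the masking idea easier to check off the board, here is a rough model of it in ordinary Python. It is only a sketch under the assumption that the array holds IEEE 754 single-precision values; `SIGN_MASK` and `abs_by_mask` are names invented for this example, not part of MicroPython.

```python
import struct

SIGN_MASK = 0x7FFFFFFF  # every bit set except bit 31, the sign bit


def abs_by_mask(values):
    """Clear the sign bit of each value after storing it as a 32-bit float."""
    n = len(values)
    # Store the floats as 32-bit words, like the array the assembler walks over
    words = struct.unpack("<%dI" % n, struct.pack("<%df" % n, *values))
    # Same operation as and_(r2, r3) in the loop above
    cleared = [w & SIGN_MASK for w in words]
    # Read the masked words back as floats
    return list(struct.unpack("<%df" % n, struct.pack("<%dI" % n, *cleared)))


print(abs_by_mask([-1.0, -0.0, 0.0, 1.0, -2.5]))
```

The mask leaves the exponent and mantissa bits alone, so the magnitude is unchanged and only the sign is dropped; -0.0 comes back as 0.0.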