I want to optimize my x86-64 program. How do I decide which instructions are "the best ones"? How does one measure whether one piece of assembly code is faster than another?
Example 1:
xmm0 [ 1 | 2 | 3 | 4 ]
xmm1 [ 0 | x | 0 | 0 ]
I want to move x into the place of 2. I can do
pslldq xmm1, 4 # like in the picture
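shufps xmm0, xmm0, 0x39 # rotate xmm0 down by one element (element 1 -> element 0)
movss xmm0, xmm1 # overwrite element 0 of xmm0 with element 0 of xmm1
shufps xmm0, xmm0, 0x93 # rotate xmm0 back up by one element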
or
blendps xmm0, xmm1, 0x2 # imm8 bit 1 set: take element 1 from xmm1
or
insertps xmm0, xmm1, 0x50
and so on. Which one is the fastest / easiest?
Example 2:
I want to have 1.0 in xmm0 without reading from memory. I can do
pcmpeqw xmm0, xmm0
pslld xmm0, 25
psrld xmm0, 2
or
mov eax, 0x3f800000
movd xmm0, eax
# shift to the right position ...
and so on. Which one is the fastest / easiest?
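To show why I expect the first variant of Example 2 to produce 1.0, here is a minimal scalar sketch in C of what happens to one 32-bit lane. It only models the bit arithmetic and says nothing about speed. The function names `one_bits_from_shifts` and `bits_to_float` are made up for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Scalar model of a single 32-bit lane of the
 * pcmpeqw / pslld / psrld sequence (hypothetical helper name). */
static uint32_t one_bits_from_shifts(void)
{
    uint32_t lane = 0xFFFFFFFFu; /* pcmpeqw xmm0, xmm0: all bits set  */
    lane <<= 25;                 /* pslld xmm0, 25: leaves 0xFE000000 */
    lane >>= 2;                  /* psrld xmm0, 2:  leaves 0x3F800000 */
    return lane;
}

/* Reinterpret a 32-bit pattern as an IEEE-754 single-precision float.
 * memcpy avoids the aliasing problems of a pointer cast. */
static float bits_to_float(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

Calling `bits_to_float(one_bits_from_shifts())` gives 1.0f, which is the same bit pattern (0x3f800000) that the second variant loads through eax.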