If I have a register eax filled with a dword, I can access its two lower bytes using al and ah. If I would like to access the other two bytes of the register, what instructions would I need?
- `bswap` or some rotate, for example. Also, avoid it if possible. – Jester Jan 24 '18 at 22:59
- If you have BMI2, `rorx edx, eax, 16` will copy+rotate efficiently. – Peter Cordes Jan 24 '18 at 23:43
- [Get the upper half of the EAX register](https://stackoverflow.com/q/15230268/995714) – phuclv Jan 25 '18 at 03:02
- Possible duplicate of [Get the upper half of the EAX register](https://stackoverflow.com/questions/15230268/get-the-upper-half-of-the-eax-register) – Fifoernik Jan 25 '18 at 14:39
- As @PeterCordes (I think) pointed out elsewhere, if you don't have BMI2 you can also use `shld ecx, eax, 16` to copy and get the top 16 bits into the lower 16 bits. It's efficient on Intel (1 cycle throughput, 3 cycles latency) but sucks on Ryzen (6 !! uops). – BeeOnRope Jan 25 '18 at 23:20
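To see what the instructions suggested in these comments leave in the low bits, here is a minimal C model of each one. The function names are made up for illustration and are not from the thread; each comment names the instruction the function is meant to mirror.

```c
#include <stdint.h>

/* Models `ror eax, 16` (or `rorx edx, eax, 16`): swaps the two 16-bit halves. */
uint32_t rotate16(uint32_t x) {
    return (x >> 16) | (x << 16);
}

/* Models `bswap eax`: reverses the order of the four bytes. */
uint32_t byte_swap32(uint32_t x) {
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
}

/* Models `shld ecx, eax, 16`: shifts dst left by 16 and fills its
   low 16 bits with the top 16 bits of src. */
uint32_t shld16(uint32_t dst, uint32_t src) {
    return (dst << 16) | (src >> 16);
}
```

In each case the former upper bytes end up in the low 16 bits of the result, which is the part of a register that the `al`/`ah` style names can reach.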
2 Answers
IMO the best approach is to `shr` (shift right) by 8 bits repeatedly and read AL after each shift to get the values you need. Using the AH register is discouraged by Intel's optimization manual:
> 3.5.1.12 Zero-Latency MOV Instructions
>
> In processors based on Intel microarchitecture code name Ivy Bridge, a subset of register-to-register move operations are executed in the front end (similar to zero-idioms, see Section 3.5.1.7). This conserves scheduling/execution resources in the out-of-order engine. Most forms of register-to-register MOVZX are hence zero-latency for reg32, reg8 (if not AH/BH/CH/DH).

    movzx esi, al   ; esi = eax & 0xff (original byte 0, the lowest)
    shr   eax, 8    ; eax >>= 8
    movzx ecx, al   ; ecx = original byte 1
    shr   eax, 8    ; eax >>= 8
    movzx ebx, al   ; ebx = original byte 2
    shr   eax, 8    ; eax = original byte 3 (the highest)
After this, the highest byte is in eax, the next one in ebx, the next in ecx, and the lowest byte of the original eax is in esi. This is NASM syntax; I am not familiar with MASM, so you may need some tweaks.
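The same shift-and-take-the-low-byte idea can be written in C as a quick check of which byte lands where. This is only a sketch: `unpack_bytes` is a hypothetical helper, not something from the answer, and it stores the bytes in an array (least significant first) instead of separate registers.

```c
#include <stdint.h>

/* Hypothetical helper: splits a 32-bit value into its four bytes,
   least significant byte first. */
void unpack_bytes(uint32_t eax, uint8_t out[4]) {
    for (int i = 0; i < 4; i++) {
        out[i] = (uint8_t)(eax & 0xFFu);  /* like `movzx reg, al` */
        eax >>= 8;                        /* like `shr eax, 8`    */
    }
}
```

For the value `0x141F2D72`, `out[0]` corresponds to what the assembly leaves in esi and `out[3]` to what remains in eax.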
- [Reading AH adds 1 cycle of latency on current Intel CPUs (Haswell/Skylake), but has no throughput penalty](https://stackoverflow.com/questions/45660139/how-exactly-do-partial-registers-on-haswell-skylake-perform-writing-al-seems-to). `movzx esi, al` / `movzx edi, ah` / `shr eax,16` / repeat is often good. Last I checked, gcc and clang read AH for unpacking the low 2 bytes, but don't use `shr eax,16` to get the next two. – Peter Cordes Jan 24 '18 at 23:41
- Haha thank you Peter, I kept reading your comments all day on stackoverflow, along with optimization manuals, but perhaps I missed something, really thankful for all the detailed answers and comments ;) – Antonin GAVREL Jan 24 '18 at 23:47
Without knowing the exact purpose, it is hard to determine what would be the best method, but as you can tell from the other answer and the comments, there are a few different ways to skin this cat. I'm just going to share another method I've used quite often.

    push ebp
    mov  ebp, esp
    mov  eax, 141f2d72H
    push eax

Now the contents of memory pointed to by EBP-4 (which is also ESP) are:

    72 2D 1F 14

From here there are plenty of combinations you can use to address the data as a byte or a word.

    mov al, [ebp-1]   ; AL = 14H
    mov ax, [ebp-2]   ; AX = 141FH
I'm not advocating this is a better way than the other examples, just a method I've found to work effectively for some of the stuff I do.
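To make the memory layout concrete, here is a small C sketch of the store-then-reload idea. It writes the bytes explicitly in little-endian order to mimic what `push eax` does on x86, so it does not depend on the host's byte order. All function names are illustrative assumptions, and `mem[0]` stands for the byte at `[ebp-4]`.

```c
#include <stdint.h>

/* Models `push eax`: x86 stores the least significant byte at the
   lowest address. mem[0] plays the role of [ebp-4]. */
void store_le32(uint8_t mem[4], uint32_t value) {
    mem[0] = (uint8_t)(value);
    mem[1] = (uint8_t)(value >> 8);
    mem[2] = (uint8_t)(value >> 16);
    mem[3] = (uint8_t)(value >> 24);
}

/* Models a zero-extending byte load from [ebp-1]. */
uint32_t load_top_byte(const uint8_t mem[4]) {
    return mem[3];
}

/* Models a zero-extending word load from [ebp-2]. */
uint32_t load_top_word(const uint8_t mem[4]) {
    return (uint32_t)mem[2] | ((uint32_t)mem[3] << 8);
}
```

After storing `0x141F2D72`, the array holds `72 2D 1F 14`, matching the dump above, and the two loads return the top byte and the top word.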
- Store/reload has at least 5 cycle latency (or sometimes 4 on SKL). This is normally a bad option, unless you're bottlenecked on ALU throughput. You should definitely use `movzx` loads, or 8-bit / 16-bit ALU instructions with memory operands, though, not `mov ax, [ebp-2]` – Peter Cordes Jan 25 '18 at 23:58