x86 multi-byte NOP and instruction prefix

Question

As a quick reminder, the x86 architecture defines 0x0F 0x1F [mod R/M] as a multi-byte NOP.

Now I'm looking at the specific case of an 8-byte NOP: I have got

0x0F 0x1F 0x84 0x__ 0x__ 0x__ 0x__ 0x__

where the last 5 bytes have got arbitrary values.

The third byte, [mod R/M], split up gives:


mod = 10b: argument is reg1 + a DWORD-sized displacement
reg2 = 000b: (we don't care)
reg1 = 100b: indicates that the argument is instead the SIB byte + a DWORD-sized displacement.
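
The field split above can be checked with a few shifts and masks. This is a minimal Python sketch; the helper name `split_modrm` is made up for illustration and is not part of any library:

```python
def split_modrm(byte: int) -> tuple[int, int, int]:
    """Return (mod, reg, rm) for a ModR/M byte."""
    mod = (byte >> 6) & 0b11    # bits 7..6
    reg = (byte >> 3) & 0b111   # bits 5..3 (called reg2 above)
    rm = byte & 0b111           # bits 2..0 (called reg1 above)
    return mod, reg, rm


mod, reg, rm = split_modrm(0x84)
print(f"mod={mod:02b} reg={reg:03b} rm={rm:03b}")  # mod=10 reg=000 rm=100
```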

Now, as a concrete example, if I take

0x0F 0x1F 0x84 0x12 0x34 0x56 0x78 0x9A

I've got

SIB = 0x12
displacement = 0x9A785634: a DWORD
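
As a sanity check on these two values, here is a short Python sketch that pulls the SIB fields and the little-endian displacement out of the example bytes. The variable names are only illustrative:

```python
code = bytes([0x0F, 0x1F, 0x84, 0x12, 0x34, 0x56, 0x78, 0x9A])

sib = code[3]
scale = 1 << (sib >> 6)        # bits 7..6 encode a factor of 1, 2, 4 or 8
index = (sib >> 3) & 0b111     # bits 5..3: index register number
base = sib & 0b111             # bits 2..0: base register number

# The displacement is stored little-endian, least significant byte first.
disp = int.from_bytes(code[4:8], "little")

print(scale, index, base, hex(disp))  # 1 2 2 0x9a785634
```

Register number 2 is EDX, so the memory operand here is [EDX + EDX*1 + 0x9A785634].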

Now I add the 0x66 instruction prefix to indicate that the displacement should be a WORD instead of a DWORD:

0x66 0x0F 0x1F 0x84 0x12 0x34 0x56 0x78 0x9A

I expect 0x78 0x9A to be 'cut off' and be treated as a new instruction. However, when compiling this and running objdump on the resulting executable, it still uses all 4 bytes (a DWORD) as displacement.

Am I misunderstanding the meaning of 'displacement' in this context? Or does the 0x66 prefix not have any effect on multi-byte NOP instructions?

The 0x66 prefix does not affect the size of the displacement - only the operand size. — 500 - Internal Server Error, Dec 31 '14 at 01:13
So, if the components of an instructions are: Prefix, Opcode, Mod R/M, SIB, Displacement, Immediate, then 0x66 affects the *Immediate* value? Alright, it kind of makes sense: the multi-byte NOP does not take an immediate value, and thus 0x66 does not have any effect. Thank you! — ayekat, Dec 31 '14 at 15:40
Related: [Long multi-byte NOPs: commonly understood macros or other notation](https://stackoverflow.com/q/25545470/427158) — maxschlepzig, Feb 21 '20 at 18:44

Johan · Answer 1 · 2016-04-01T17:13:08.503

The 66H prefix overrides the operand size (to 16 bits in 32-bit and 64-bit code).
It does not override the address size, which is what determines the width of the displacement. If you want that, use 67H.

Here's a list of the legacy instruction prefixes.

        F0h = LOCK  -- locks memory reads/writes
        String prefixes
        F3h = REP, REPE  
        F2h = REPNE      
        Segment overrides
        2Eh = CS
        36h = SS
        3Eh = DS
        26h = ES
        64h = FS
        65h = GS
        Operand-size override
        66h. Changes the operand size to 16-bit (in 32-bit and 64-bit code)
        Address-size override
        67h. Changes the address size to 16-bit in 32-bit code, or to 32-bit in 64-bit code
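
To make the difference between 66h and 67h concrete, the sketch below computes the length of a `0F 1F` NOP with optional 66h/67h prefixes. It is a rough illustration under stated assumptions, not a real decoder: the function name `long_nop_length` is hypothetical, only these two prefixes and this one opcode are handled, and only 32-bit and 64-bit modes are modelled.

```python
def long_nop_length(code: bytes, mode_bits: int = 32) -> int:
    """Length of a 0F 1F /0 NOP at the start of `code`.

    Handles only the 66h/67h prefixes and the 0F 1F opcode; this is an
    illustrative sketch, not a general x86 decoder.
    """
    if mode_bits not in (32, 64):
        raise ValueError("only 32-bit and 64-bit modes are modelled")
    pos = 0
    addr_override = False
    while code[pos] in (0x66, 0x67):
        addr_override |= code[pos] == 0x67
        pos += 1
    if code[pos:pos + 2] != b"\x0f\x1f":
        raise ValueError("not a 0F 1F NOP")
    modrm = code[pos + 2]
    pos += 3
    mod, rm = modrm >> 6, modrm & 0b111
    if mod == 0b11:
        return pos                      # register operand, nothing follows

    if addr_override and mode_bits == 32:
        # 16-bit addressing: no SIB byte, displacement is 0, 1 or 2 bytes.
        disp = {0: 2 if rm == 0b110 else 0, 1: 1, 2: 2}[mod]
    else:
        # 32-bit addressing (64-bit addressing uses the same layout).
        base = None
        if rm == 0b100:                 # a SIB byte follows ModR/M
            base = code[pos] & 0b111
            pos += 1
        no_base = rm == 0b101 or base == 0b101
        disp = {0: 4 if no_base else 0, 1: 1, 2: 4}[mod]
    return pos + disp


nop = bytes.fromhex("0f 1f 84 12 34 56 78 9a")
print(long_nop_length(nop))                          # 8
print(long_nop_length(b"\x66" + nop))                # 9: displacement is still 4 bytes
print(long_nop_length(b"\x67" + nop))                # 6: [SI + disp16] in 32-bit mode
print(long_nop_length(b"\x67" + nop, mode_bits=64))  # 9: 67h selects 32-bit addressing
```

With 66h the instruction just grows by the prefix byte. Only 67h in 32-bit code shortens the displacement, and it also drops the SIB byte, because 16-bit addressing has no SIB.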

However it is best not to create your own nop instructions, but stick to the recommended (multi-byte) nops.

According to AMD, the recommended multi-byte NOPs are as follows:

Table 4-9. Recommended Multi-Byte Sequence of NOP Instruction

bytes  byte sequence                  assembly

 1      90H                            NOP
 2      66 90H                         66 NOP
 3      0F 1F 00H                      NOP DWORD ptr [EAX]
 4      0F 1F 40 00H                   NOP DWORD ptr [EAX + 00H]
 5      0F 1F 44 00 00H                NOP DWORD ptr [EAX + EAX*1 + 00H]
 6      66 0F 1F 44 00 00H             66 NOP DWORD ptr [EAX + EAX*1 + 00H]
 7      0F 1F 80 00 00 00 00H          NOP DWORD ptr [EAX + 00000000H]
 8      0F 1F 84 00 00 00 00 00H       NOP DWORD ptr [EAX + EAX*1 + 00000000H]
 9      66 0F 1F 84 00 00 00 00 00H    66 NOP DWORD ptr [EAX + EAX*1 + 00000000H]

Intel does not mind up to 3 redundant prefixes, so NOPs of up to 11 bytes can be constructed like so:

 10     66 66 0F 1F 84 00 00 00 00 00H     66 66 NOP DWORD ptr [EAX + EAX*1 + 00000000H]
 11     66 66 66 0F 1F 84 00 00 00 00 00H  66 66 66 NOP DWORD ptr [EAX + EAX*1 + 00000000H]
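
If you need to fill an arbitrary gap, the sequences listed above can be combined longest-first. The following Python sketch does that; the names `RECOMMENDED_NOPS` and `nop_padding` are made up for this example, and the byte strings are simply copied from the listings above:

```python
# Byte sequences copied from the listings above, keyed by length in bytes.
RECOMMENDED_NOPS = {
    1: "90",
    2: "66 90",
    3: "0F 1F 00",
    4: "0F 1F 40 00",
    5: "0F 1F 44 00 00",
    6: "66 0F 1F 44 00 00",
    7: "0F 1F 80 00 00 00 00",
    8: "0F 1F 84 00 00 00 00 00",
    9: "66 0F 1F 84 00 00 00 00 00",
    10: "66 66 0F 1F 84 00 00 00 00 00",
    11: "66 66 66 0F 1F 84 00 00 00 00 00",
}


def nop_padding(n: int, max_len: int = 9) -> list[bytes]:
    """Split n bytes of padding into NOPs, using the longest ones first."""
    if not 1 <= max_len <= 11:
        raise ValueError("max_len must be between 1 and 11")
    nops = []
    while n > 0:
        size = min(n, max_len)
        nops.append(bytes.fromhex(RECOMMENDED_NOPS[size]))
        n -= size
    return nops


for nop in nop_padding(20):
    print(len(nop), nop.hex(" "))
```

The default `max_len=9` sticks to the table. Passing `max_len=11` also allows the versions with extra 66h prefixes.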

Of course you can also eliminate nops by prefixing normal instructions with redundant prefixes.

e.g.

rep mov reg,reg //one extra byte

or forcing the cpu to use longer versions of the same instruction.

test r8d,r8d is one byte longer than: test edx,edx

The instructions with immediate operands have short and long versions.

and edx,7 //short (8-bit immediate)
and edx,00000007h  //long (32-bit immediate)
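
To show what the two forms look like as bytes, here is a small Python sketch that encodes `and r32, imm` both ways, using opcode `83 /4 ib` for the 8-bit immediate and `81 /4 id` for the 32-bit immediate. The function name and the `force_imm32` parameter are hypothetical, chosen for this example only:

```python
import struct


def encode_and_r32_imm(reg: int, imm: int, force_imm32: bool = False) -> bytes:
    """Encode `and r32, imm` for register number reg (0 = eax .. 7 = edi)."""
    if not 0 <= reg <= 7:
        raise ValueError("reg must be a register number from 0 to 7")
    modrm = 0b11_100_000 | reg      # mod=11 (register operand), /4 selects AND
    if -128 <= imm <= 127 and not force_imm32:
        # Short form: 83 /4 ib, the imm8 is sign-extended by the CPU.
        return bytes([0x83, modrm]) + struct.pack("<b", imm)
    # Long form: 81 /4 id, a full little-endian 32-bit immediate.
    return bytes([0x81, modrm]) + struct.pack("<I", imm & 0xFFFFFFFF)


EDX = 2
print(encode_and_r32_imm(EDX, 7).hex(" "))                    # 83 e2 07
print(encode_and_r32_imm(EDX, 7, force_imm32=True).hex(" "))  # 81 e2 07 00 00 00
```

The long form is 3 bytes longer and does the same thing, which is what makes it usable as padding.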

Most assemblers will helpfully shorten all instructions for you, so you'll have to code the longer instructions yourself using db.

Interspersing these in strategic locations can help you align jump targets without having to incur delays due to the decoding or execution of a nop.

Remember that on most CPUs, executing NOPs still uses up resources.

The [yasm docs](http://www.tortall.net/projects/yasm/manual/html/manual.html#nasm-immediate) say you can write `add rax, dword 1` to force a 32bit immediate instead of imm8, but it doesn't work in practice. (Still shortened to imm8 with `yasm 1.3.0` from the Ubuntu package.) Also, you can pad with a REX prefix even if the instruction doesn't need one (without changing registers). `db 0x40` / `test edx,edx` still decodes as `test edx,edx`. A REX with no bits set only changes the meaning of an insn that uses AH/BH/CH/DH. (To SIL, etc, in some order I'd have to look up) — Peter Cordes, Apr 02 '16 at 02:21
Update: YASM: `add rax, strict dword 1` forces a 32-bit immediate. NASM: `add rax, strict qword 1` forces a 32-bit immediate. NASM and YASM each choke on the other one, complaining about sizes. But anyway, `strict` is the essential keyword in NASM syntax for immediates and addressing-mode displacements. — Peter Cordes, Sep 13 '17 at 04:11
I recommend to avoid using `rep` prefixes where not defined already as Intel tends to give them new meanings later on. For example, `rep mov` has a meaning within Intel HLE now. — fuz, Jan 10 '18 at 19:47
