Why did Intel choose 90h machine code for their NOP instruction instead of a 0?

Question

Probably just a historical question, if anyone knows.

@SimonRichter Although it is not an illegal opcode, 00, 00 (add [rax], al in x64) is almost guaranteed to cause an exception for some random binary data. – c00000fd, Nov 02 '23 at 08:01
IIRC MIPS has 0x00000000 as a NOP because of the use of zero registers. – qwr, Nov 02 '23 at 16:22

score 14 · Answer 1 · answered Oct 31 '23 at 18:01

From the Intel manual under NOP:

The one-byte NOP instruction is an alias mnemonic for the XCHG (E)AX, (E)AX instruction.

This form of XCHG is encoded as 90h+reg, where reg is the number of the register exchanged with the accumulator. (E)AX has register number 0 (CX is 1, DX is 2, BX is 3, etc.), so XCHG (E)AX, (E)AX encodes as 0x90, which is the NOP.
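As a rough illustration of the 90h+reg arithmetic described above, here is a minimal Python sketch. The table and the function name `xchg_ax_opcode` are made up for this example; the register numbers are the standard x86 ones (the 16-bit names are used for brevity).

```python
# Register numbers carried in the low 3 bits of the one-byte
# "exchange with accumulator" opcodes (standard x86 numbering).
REG_NUMBERS = {"AX": 0, "CX": 1, "DX": 2, "BX": 3,
               "SP": 4, "BP": 5, "SI": 6, "DI": 7}

XCHG_ACC_BASE = 0x90  # opcode is 90h + reg


def xchg_ax_opcode(reg: str) -> int:
    """Return the one-byte opcode for XCHG AX, <reg> (hypothetical helper)."""
    return XCHG_ACC_BASE + REG_NUMBERS[reg]


for name in REG_NUMBERS:
    print(f"XCHG AX, {name}: {xchg_ax_opcode(name):02X}h")
```

The first row of the output is XCHG AX, AX at 90h, i.e. the byte that assemblers call NOP.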

answered Oct 31 '23 at 18:01

Paweł Łukasik


3

A lot of architectures that have a no-op that isn’t a string of null bytes have similar reasoning to this: it’s almost always a single-cycle (or minimal-length) instruction that does not change state, and it’s usually cheaper to not implement the extra decoder logic for an additional opcode that will just be decoded to the same instruction. – Austin Hemmelgarn Nov 01 '23 at 01:53
12

This led to a corner case in x86-64, because XCHG in long mode without a REX prefix still defaults to 32 bits, so 0x90 looks like XCHG EAX, EAX which would not be a no-op (it would clear the top 32 bits of RAX), but it's special-cased to be NOP anyway. If you actually want XCHG EAX, EAX for some reason you have to write 0x87 0xC0, which is a valid but over-long encoding in 32-bit mode. – hobbs Nov 01 '23 at 02:41
1

@AustinHemmelgarn That was true for simpler processors; with modern processors you want the front end to know what you're doing and toss the NOP, instead of doing the work to schedule and execute it on an ALU. Now you're hoping people are doing NOPs or re-setting registers in very specific ways and checking all of them (idioms). AArch64 has an explicit NOP; it's in the HINT space since it's only going to get as far as the front end. – user71659 Nov 01 '23 at 04:02
1

@user71659: Right, modern ISAs should clearly document a single NOP encoding which will be handled efficiently. But there's no problem with choosing that encoding to be a special case of the operands for another opcode, like add x0, x0, x0 on a CPU like RISC-V where x0 is a zero register. Maybe a tiny bit in terms of the combinatorial logic in the decoders to not have a NOP also match the opcode for something else. – Peter Cordes Nov 01 '23 at 08:02
1

This was the de-facto situation on x86 for years, with it clearly documented in optimization manuals that 90h was a 1-byte NOP that would be handled efficiently. (P6 introduced longer NOP encodings). It evolved into that state with toolchains already having those NOP patterns before x86-64 had to make it explicit. – Peter Cordes Nov 01 '23 at 08:02
The newer 0Fh, 1Fh, modRM NOP instruction in the Intel architecture was specifically designed as such. I wonder why they didn't do it from the get-go and used the xchg eax, eax instead? Also, why not mov eax, eax? Also @hobbs good point about xchg eax, eax - there's clearly special handling there, as otherwise it would've cleared the high 32 bits of RAX, which it doesn't. But, for instance, if you try xchg ecx, ecx, that will clear the high 32 bits of RCX. And finally, to Peter's point, RISC is a totally different ballgame. They have that catch-all x0/w0 register for that specific purpose. – c00000fd Nov 01 '23 at 11:36
4

@c00000fd: This design decision dates back to 8086 in the late 70s. mov ax, ax is a 2-byte instruction (with the same encoding as mov eax, eax in 32-bit mode.) The only 1-byte 8086 instruction with no architectural effect was xchg ax,ax, part of the exchange-with-accumulator group of opcodes where the other register number is in the low 3 bits. (https://www.felixcloutier.com/x86/xchg) There was no real need to document 90h NOP separately, and original 8086 probably didn't run it any faster. – Peter Cordes Nov 01 '23 at 15:01
1

@c00000fd: They have that catch-all x0/w0 register for that specific purpose - not all RISCs have a zero register. 32-bit ARM doesn't. AArch64 does (xzr / wzr), but it shares a register number with x0 / w0; which one you get depends on context (the opcode and maybe something else.) Do any other RISCs use w0 as a register name at all? Because in AArch64, that's not a zero register. Also, NOPs aren't the main purpose of having a zero register, they're to avoid the need for separate opcodes for sub vs. cmp and stuff like that. And to avoid needing a mov-immediate opcode. – Peter Cordes Nov 01 '23 at 15:05
1

@c00000fd: if you're curious about design intent in the 8086 ISA, The 8086 Primer (https://stevemorse.org/8086/) is a good place to check. Steve Morse more or less designed the ISA on paper himself, and later wrote a book about how to program it, including intended use-cases of instructions like lods and loop that reveal the design intent there. (But mostly it's written for programmers learning 8086, not a discussion of design decisions. Still, if he mentions the NOP mnemonic, he might say something interesting about it.) – Peter Cordes Nov 01 '23 at 15:08
@PeterCordes I should probably read what I type. I misspoke. I meant to say "ARM" instead of "RISC". – c00000fd Nov 01 '23 at 17:18
I think you hit the nail on the head with that xchg instruction is the only single-byte opcode that allowed "not to do anything". That is probably why they chose it for the original NOP back then. – c00000fd Nov 01 '23 at 17:25
@c00000fd: Ok, well like I said, unlike many RISCs, AArch64's w0 / x0 aren't zero registers; wzr / xzr are separate registers, and whether you get them or the stack pointer for register number 31 depends on the opcode. add x0, x0, x0 isn't a NOP. AArch64 does still have some instruction encodings with no effect on the architectural state, like lsl w5, w5, #0. According to the manual (https://developer.arm.com/documentation/dui0801/g/A64-General-Instructions/LSL--immediate-), the encodable shift count range includes 0, so that bit-pattern isn't special-cased as something else. – Peter Cordes Nov 01 '23 at 17:25
1

@PeterCordes The 8086 Primer briefly mentions the NOP mnemonic on page 164, highlighting that it is a 1-byte instruction, and that it doesn't access memory. Some listed uses are as a placeholder to be filled in later (possibly by self-modifying code), or for precise timing delays. The NOP mnemonic/alias seems to be a feature of ASM-86 (the assembly language), not the official name for the instruction encoded by 0x90, since that instruction is still listed as XCHG AX, AX in the opcode table on page 266. – camerondm9 Nov 01 '23 at 21:01
@camerondm9: the opcodes 90h and 66h, 90h clearly have special handling in x64, as was pointed out by hobbs above. There's also a host of other specialized NOP instructions, added later, as the 0Fh, 1Fh opcodes. Those can have memory operands but they don't seem to access them. They are more like multi-byte space placeholders, I guess. – c00000fd Nov 05 '23 at 08:57
@PeterCordes I forgot to mention, there's also a legacy fnop with the opcode D9h, D0h for an FPU NOP. I honestly have no idea where they were going with it. – c00000fd Nov 05 '23 at 08:59
@c00000fd It makes sense that special handling for NOP instructions was added in the 25 years between the original 8086 and when the x86-64 architecture was released. The 8086 Primer was published before the instruction set had even been extended to 32 bits with the 80386, so of course it can't refer to things that came later. The whole point of referencing it was to see what the designer of the original instruction set was thinking, in response to your question: "I wonder why they didn't do it from the get-go and used the xchg eax, eax instead?" – camerondm9 Nov 12 '23 at 08:46
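The x86-64 corner case discussed in the comments above (32-bit register writes zero-extend into the full 64-bit register, yet 90h leaves RAX alone) can be modelled with a small Python sketch. This is only a toy model of the architectural effect, not of any real decoder; the function names are invented for the illustration.

```python
MASK32 = 0xFFFFFFFF


def write32(value: int) -> int:
    """Model a 32-bit register write in 64-bit mode: the upper 32 bits become zero."""
    return value & MASK32


def xchg_r32_with_itself(reg64: int) -> int:
    """Model XCHG r32, r32 on one register (e.g. the two-byte 87h form).

    The register is written with its own low 32 bits, so the write
    zero-extends and the high half is lost.
    """
    return write32(reg64)


def opcode_90(rax: int) -> int:
    """Model opcode 90h in 64-bit mode: treated as a true NOP, RAX is untouched."""
    return rax


value = 0x1122334455667788
print(f"after XCHG r32 with itself: {xchg_r32_with_itself(value):#018x}")
print(f"after 90h (NOP):            {opcode_90(value):#018x}")
```

If 90h were decoded literally as XCHG EAX, EAX, it would behave like the first function; the special case makes it behave like the second.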

score 3 · Answer 2 · answered Nov 01 '23 at 15:14

I don't think that was the reason why 0x00 didn't become a NOP, but from today's security perspective it would totally make sense to not have 0x00 as a NOP. This is because today's architectures are mostly von Neumann architectures that happily mix data in between code. My guess is that 0x00 is the most common data value as it is used to initialize newly allocated memory and most default values for integers are also probably zero.

In code injection exploits you usually inject executable code as data and then trigger a bug to make the CPU execute this injected data. If most data bytes are a NOP when (mis)used as instruction, most of the data becomes a so-called nop slide.

A nop slide is very useful for an attacker as he might not be sure at which point exactly his injected code is in memory. When everything before his injected code are NOPs, he just needs to be lucky and hit somewhere into the nop slide which will then lead the CPU to execute the attacker’s injected code at the end. That's why NOP slides can be used to defeat protections like ASLR, which loads memory at random positions.
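To make the nop slide idea concrete, here is a toy Python simulation. It assumes a made-up one-byte machine, not real x86: `NOP` advances to the next byte, `PAYLOAD` is a hypothetical marker standing in for the first byte of injected code, and every other byte is treated as faulting.

```python
NOP = 0x90
PAYLOAD = 0xEE  # hypothetical marker for the start of the injected code


def run(memory: bytes, start: int) -> str:
    """Execute the toy machine from `start` and report how it ends."""
    pc = start
    while pc < len(memory):
        byte = memory[pc]
        if byte == NOP:
            pc += 1           # slide to the next byte
        elif byte == PAYLOAD:
            return "payload"  # attacker's code reached
        else:
            return "fault"    # any other byte crashes the toy machine
    return "fault"            # ran off the end of memory


def successful_entry_points(memory: bytes) -> int:
    """Count the start offsets from which execution reaches the payload."""
    return sum(run(memory, start) == "payload" for start in range(len(memory)))


sled = bytes([NOP] * 64 + [PAYLOAD])       # padding that acts as NOPs
faulting = bytes([0x00] * 64 + [PAYLOAD])  # padding that faults in this toy model

print("entry points with NOP padding:     ", successful_entry_points(sled))
print("entry points with faulting padding:", successful_entry_points(faulting))
```

With NOP padding every offset in the buffer works as an entry point, while with faulting padding only an exact hit on the payload does, which is the attacker's advantage described above.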

I don't think all these security considerations were made at the point they designed the instruction set, but maybe they had a guess that it's not a good idea to make the default value for most data also a NOP when interpreted as an instruction.

answered Nov 01 '23 at 15:14

Fee


2

So they basically lucked out with not making a NOP a 00h opcode. – c00000fd Nov 01 '23 at 17:16
@c00000fd: Yes. 16-bit x86 00 00 is add [bx+si], al. In other modes, it's add [rax], al. The choice to have the "mode" bits of the ModR/M byte use 00 as a memory addressing mode (rather than register direct) was probably arbitrary, since 8086 doesn't have memory protection so a memory destination isn't going to fault and help you debug a bad jump target. Only with protected mode did that become an advantage vs. if it had decoded as add al, al which isn't a NOP but won't fault either. – Peter Cordes Nov 01 '23 at 17:34
On some RISCs, 00 00 00 00 runs without faulting, even if it might not be a NOP. (Although on ISAs where register number 0 is a zero register that discards writes, it is a NOP.) So it works as a NOP sled, allowing execution to reach whatever's after the padding as long as it lands somewhere in it. Modern designs like RISC-V and AArch64 avoid that, with RISC-V I think even guaranteeing that the all-zero encoding will definitely fault, not usable as an extension. It's good for ease of debugging, too, like if you let execution fall off the end of _start, it'll fault right there. – Peter Cordes Nov 01 '23 at 17:38
Many architectures require memory to be aligned to some offset. The fill bytes are then often chosen to be something that would crash when interpreted as an instruction. In the case of x86, e.g. 0xCC, which is basically a software breakpoint for a debugger. Without a debugger the program would instantly crash. With a debugger you have a nice breakpoint and can debug the root cause more easily. So, nice for debugging and prevents NOP slides. – Fee Nov 01 '23 at 21:00
@Fee: And in fact you don't need to fill with them. You just need to place them in front of functions. A long NOP slide into a breakpoint is a slow breakpoint. – Joshua Nov 02 '23 at 00:25
@Fee: IDK if you were replying to me; I only just saw it since you didn't @ notify me. Padding in Linux ELF executables to align program segments tends to be done with 00 bytes by GNU Binutils ld. And GCC chooses to pad between functions with the usual .p2align, not overriding the fill, so we get long NOPs instead of CC int3 breakpoints. (MSVC fills between functions with CC). Implicit zeros like beyond the end of a segment mapping to fill the rest of a page, or farther for the .bss section, are also zero-filled, and so is dynamically-allocated memory unless written in user-space – Peter Cordes Nov 05 '23 at 09:04
The padding between functions is probably not security-sensitive, which I assume is why nobody bothered to change it. ASLR randomization has page granularity, so the low 12 bits of a function address will always be the same for a given build. And with normal build options, will always be 16-byte aligned. So an absolute jump aiming for the start of a function might miss by a multiple of 16 bytes in a different build, but won't land in the padding leading up to it. (And long NOPs tend not to be NOPs if you start decode from the wrong offset.) – Peter Cordes Nov 05 '23 at 09:07
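The decoding of 00 00 mentioned in the comments above can be sketched in Python. This is a deliberately partial decoder, written only for this illustration: it handles opcode 00h (ADD r/m8, r8) with 16-bit addressing, and only the ModR/M modes 00 (memory, no displacement) and 11 (register direct). The function names are hypothetical.

```python
REG8 = ["al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"]

# 16-bit addressing forms for mod = 00; rm = 6 is a bare 16-bit
# displacement, which this sketch does not handle.
RM16_MOD00 = ["[bx+si]", "[bx+di]", "[bp+si]", "[bp+di]",
              "[si]", "[di]", None, "[bx]"]


def split_modrm(modrm: int) -> tuple:
    """Split a ModR/M byte into its (mod, reg, rm) bit fields."""
    return (modrm >> 6) & 0b11, (modrm >> 3) & 0b111, modrm & 0b111


def decode_add_rm8_r8(code: bytes) -> str:
    """Decode opcode 00h followed by a ModR/M byte (16-bit mode, partial)."""
    opcode, modrm = code[0], code[1]
    if opcode != 0x00:
        raise ValueError("this sketch only decodes opcode 00h")
    mod, reg, rm = split_modrm(modrm)
    if mod == 0b11:
        dest = REG8[rm]        # register-direct destination
    elif mod == 0b00 and RM16_MOD00[rm] is not None:
        dest = RM16_MOD00[rm]  # memory destination, no displacement
    else:
        raise NotImplementedError("displacement forms are not modelled")
    return f"add {dest}, {REG8[reg]}"


print(decode_add_rm8_r8(b"\x00\x00"))  # mode bits 00: memory destination
print(decode_add_rm8_r8(b"\x00\xC0"))  # mode bits 11: register destination
```

The two calls show the contrast from the comment: with mode bits 00 the all-zero pair has a memory destination, whereas mode bits 11 would give a register-only add that cannot fault.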
