Probably just a historical question, if anyone knows.
2 Answers
From the Intel manual under NOP:
The one-byte NOP instruction is an alias mnemonic for the XCHG (E)AX, (E)AX instruction.
This XCHG form is encoded as 0x90 + reg, where reg is the number of the register exchanged with the accumulator. (E)AX has an encoding of 0 (CX is 1, DX is 2, BX is 3, etc.), so XCHG (E)AX, (E)AX has the value 0x90, which is NOP.
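To make the arithmetic concrete, here is a minimal Python sketch of the 0x90 + reg rule. The helper name `xchg_with_accumulator_opcode` and the table name `REG_NUMBER` are made up for illustration; only the register numbering comes from the x86 encoding.

```python
# Register numbers that go in the low 3 bits of the opcode (16-bit names shown).
REG_NUMBER = {"ax": 0, "cx": 1, "dx": 2, "bx": 3, "sp": 4, "bp": 5, "si": 6, "di": 7}


def xchg_with_accumulator_opcode(reg: str) -> int:
    """Return the one-byte opcode for XCHG AX, reg (hypothetical helper)."""
    return 0x90 + REG_NUMBER[reg.lower()]


if __name__ == "__main__":
    for name in REG_NUMBER:
        # The entry for "ax" is the byte that assemblers call NOP.
        print(f"xchg ax, {name}: {xchg_with_accumulator_opcode(name):#04x}")
```

With `ax` as the other register the result is 0x90, so the "NOP" falls out of the encoding rather than being a separate opcode.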
A lot of architectures whose no-op isn't a string of null bytes have similar reasoning: the NOP is almost always a single-cycle (or minimal-length) instruction that does not change state, and it's usually cheaper not to implement extra decoder logic for an additional opcode that would just decode to the same instruction. – Austin Hemmelgarn Nov 01 '23 at 01:53
This led to a corner case in x86-64: XCHG in long mode without a REX prefix still defaults to 32 bits, so 0x90 looks like XCHG EAX, EAX, which would not be a no-op (it would clear the top 32 bits of RAX), but it's special-cased to be NOP anyway. If you actually want XCHG EAX, EAX for some reason, you have to write 0x87 0xC0, which is a valid but over-long encoding in 32-bit mode. – hobbs Nov 01 '23 at 02:41
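A rough Python model of that corner case, assuming only the long-mode rule that a 32-bit register write zero-extends into the full 64-bit register. The function names are invented for this sketch, and it models register values only, not real instruction decoding.

```python
MASK32 = 0xFFFF_FFFF


def write32(value: int) -> int:
    """Model a 32-bit register write in long mode: the upper 32 bits become zero."""
    return value & MASK32


def xchg_r32_with_itself(reg64: int) -> int:
    """Model e.g. xchg ecx, ecx: the low half is read and written back as a 32-bit write."""
    return write32(reg64 & MASK32)


def opcode_90_in_long_mode(rax: int) -> int:
    """Model opcode 0x90 in long mode: special-cased as a true NOP, RAX untouched."""
    return rax


if __name__ == "__main__":
    value = 0x1122_3344_5566_7788
    print(hex(xchg_r32_with_itself(value)))    # upper half cleared
    print(hex(opcode_90_in_long_mode(value)))  # unchanged
```

The two-byte encoding 0x87 0xC0 would behave like `xchg_r32_with_itself` applied to RAX, which is why 0x90 needed the special case.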
@AustinHemmelgarn That was true for simpler processors. With modern processors you want the front end to know what you're doing and toss the NOP, instead of doing the work to schedule and execute it on an ALU. Now you're hoping people write NOPs or reset registers in very specific ways (idioms), and checking for all of them. AArch64 has an explicit NOP; it's in the HINT space, since it's only going to get as far as the front end. – user71659 Nov 01 '23 at 04:02
@user71659: Right, modern ISAs should clearly document a single NOP encoding which will be handled efficiently. But there's no problem with choosing that encoding to be a special case of the operands for another opcode, like `add x0, x0, x0` on a CPU like RISC-V where `x0` is a zero register. Maybe a tiny bit of a problem in terms of the combinational logic in the decoders, versus not having the NOP also match the opcode for something else. – Peter Cordes Nov 01 '23 at 08:02
This was the de facto situation on x86 for years, with optimization manuals clearly documenting that `90h` was a 1-byte NOP that would be handled efficiently. (P6 introduced longer NOP encodings.) It evolved into that state with toolchains already using those NOP patterns before x86-64 had to make it explicit. – Peter Cordes Nov 01 '23 at 08:02
The newer `0Fh, 1Fh, modRM` NOP instruction in the Intel architecture was specifically designed as such. I wonder why they didn't do it from the get-go and used the xchg eax, eax instead? Also, why not mov eax, eax? Also, @hobbs, good point about `xchg eax, eax`: there's clearly special handling there, as otherwise it would have cleared the high 32 bits of RAX, which it doesn't. But if, for instance, you try xchg ecx, ecx, that will clear the high 32 bits of RCX. And finally, to Peter's point, RISC is a totally different ballgame. They have that catch-all `x0/w0` register for that specific purpose. – c00000fd Nov 01 '23 at 11:36
@c00000fd: This design decision dates back to 8086 in the late 70s. `mov ax, ax` is a 2-byte instruction (with the same encoding as `mov eax, eax` in 32-bit mode). The only 1-byte 8086 instruction with no architectural effect was `xchg ax,ax`, part of the exchange-with-accumulator group of opcodes where the other register number is in the low 3 bits. (https://www.felixcloutier.com/x86/xchg) There was no real need to document `90h` NOP separately, and original 8086 probably didn't run it any faster. – Peter Cordes Nov 01 '23 at 15:01
@c00000fd: "They have that catch-all x0/w0 register for that specific purpose" - not all RISCs have a zero register. 32-bit ARM doesn't. AArch64 does (xzr / wzr), but it shares a register number with the stack pointer; which one you get depends on context (the opcode and maybe something else). Do any other RISCs use `w0` as a register name at all? Because in AArch64, that's not a zero register. Also, NOPs aren't the main purpose of having a zero register; it's there to avoid the need for separate opcodes for `sub` vs. `cmp` and stuff like that, and to avoid needing a mov-immediate opcode. – Peter Cordes Nov 01 '23 at 15:05
@c00000fd: if you're curious about design intent in the 8086 ISA, The 8086 Primer (https://stevemorse.org/8086/) is a good place to check. Steve Morse more or less designed the ISA on paper himself, and later wrote a book about how to program it, including intended use-cases of instructions like `lods` and `loop` that reveal the design intent there. (But mostly it's written for programmers learning 8086, not a discussion of design decisions. Still, if he mentions the NOP mnemonic, he might say something interesting about it.) – Peter Cordes Nov 01 '23 at 15:08
@PeterCordes I should probably read what I type. I misspoke. I meant to say "ARM" instead of "RISC". – c00000fd Nov 01 '23 at 17:18
I think you hit the nail on the head: that xchg instruction was the only single-byte opcode that allowed "not doing anything". That is probably why they chose it for the original NOP back then. – c00000fd Nov 01 '23 at 17:25
@c00000fd: Ok, well like I said, unlike many RISCs, AArch64's `w0/x0` aren't zero registers; `wzr/xzr` are separate registers, and whether you get them or the stack pointer for register number 31 depends on the opcode. `add x0, x0, x0` isn't a NOP. AArch64 does still have some instruction encodings with no effect on the architectural state, like `lsl w5, w5, #0`. According to the manual (https://developer.arm.com/documentation/dui0801/g/A64-General-Instructions/LSL--immediate-), the encodable shift count range includes 0, so that bit-pattern isn't special-cased as something else. – Peter Cordes Nov 01 '23 at 17:25
@PeterCordes The 8086 Primer briefly mentions the NOP mnemonic on page 164, highlighting that it is a 1-byte instruction, and that it doesn't access memory. Some listed uses are as a placeholder to be filled in later (possibly by self-modifying code), or for precise timing delays. The NOP mnemonic/alias seems to be a feature of ASM-86 (the assembly language), not the official name for the instruction encoded by 0x90, since that instruction is still listed as `XCHG AX, AX` in the opcode table on page 266. – camerondm9 Nov 01 '23 at 21:01
@camerondm9: the opcodes `90h` and `66h, 90h` have clear special handling in x64, as hobbs pointed out above. There's also a host of other specialized NOP instructions, added later, as the `0Fh, 1Fh` opcodes. Those can have memory operands but they don't seem to access them. They are more like multi-byte space placeholders, I guess. – c00000fd Nov 05 '23 at 08:57
@PeterCordes I forgot to mention, there's also a legacy `fnop` with the opcode `D9h, D0h` for an FPU NOP. I honestly have no idea where they were going with it. – c00000fd Nov 05 '23 at 08:59
@c00000fd It makes sense that special handling for NOP instructions was added in the 25 years between the original 8086 and when the x86-64 architecture was released. The 8086 Primer was published before the instruction set had even been extended to 32 bits with the 80386, so of course it can't refer to things that came later. The whole point of referencing it was to see what the designer of the original instruction set was thinking, in response to your question: "I wonder why they didn't do it from the get-go and used the xchg eax, eax instead?" – camerondm9 Nov 12 '23 at 08:46
I don't think this was the reason 0x00 didn't become a NOP, but from today's security perspective it makes sense not to have 0x00 as a NOP. Today's architectures are mostly von Neumann architectures that happily mix data in between code. My guess is that 0x00 is the most common data value, as it is used to initialize newly allocated memory and most default values for integers are probably zero as well.
In code injection exploits you usually inject executable code as data and then trigger a bug to make the CPU execute this injected data.
If most data bytes are a NOP when (mis)used as instructions, most of the data becomes a so-called NOP slide.
A NOP slide is very useful for an attacker, who might not know exactly where in memory the injected code is.
When everything before the injected code is NOPs, the attacker just needs to be lucky and land somewhere in the NOP slide, which then leads the CPU to execute the injected code at the end.
That's why NOP slides can be used to defeat protections like ASLR, which loads code and data at randomized addresses.
I don't think all these security considerations were made when the instruction set was designed, but maybe the designers guessed that it's not a good idea to make the default value for most data also a NOP when interpreted as an instruction.
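To illustrate the argument, here is a toy Python model of a made-up machine (not x86, and not real exploit code). Everything in it is hypothetical: memory is a list of symbolic operations, `NOP` advances to the next cell, and anything else stops execution. It only compares how many landing positions reach the final cell when the fill value acts as a NOP versus when it faults.

```python
NOP, PAYLOAD, FAULT = "nop", "payload", "fault"


def run(memory, start):
    """Execute from `start`; NOPs fall through, any other operation stops the run."""
    pc = start
    while pc < len(memory):
        op = memory[pc]
        if op != NOP:
            return op
        pc += 1
    return FAULT  # ran off the end of memory


def fraction_reaching_payload(memory):
    """Fraction of all possible landing positions that end up at the payload cell."""
    hits = sum(1 for start in range(len(memory)) if run(memory, start) == PAYLOAD)
    return hits / len(memory)


if __name__ == "__main__":
    fill_is_nop = [NOP] * 15 + [PAYLOAD]
    fill_faults = [FAULT] * 15 + [PAYLOAD]
    print(fraction_reaching_payload(fill_is_nop))  # every landing position slides to the end
    print(fraction_reaching_payload(fill_faults))  # only an exact hit works
```

In this toy setup, a fill value that decodes as a NOP turns all 16 positions into usable landing spots, while a faulting fill value leaves only one.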
@c00000fd: Yes. 16-bit x86 `00 00` is `add [bx+si], al`. In other modes, it's `add [eax], al` (32-bit) or `add [rax], al` (64-bit). The choice to have the "mode" bits of the ModR/M byte use `00` as a memory addressing mode (rather than register direct) was probably arbitrary, since 8086 doesn't have memory protection so a memory destination isn't going to fault and help you debug a bad jump target. Only with protected mode did that become an advantage vs. if it had decoded as `add al, al`, which isn't a NOP but won't fault either. – Peter Cordes Nov 01 '23 at 17:34
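For anyone who wants to see where `add [bx+si], al` comes from, here is a small Python sketch that splits the ModR/M byte into its mod/reg/rm fields and decodes opcode `00` (ADD r/m8, r8) with 16-bit addressing. It deliberately handles only mod=00 (memory, no displacement) and mod=11 (register direct); the function and table names are made up for this example.

```python
# 16-bit addressing forms for rm = 0..7 when mod = 00 (rm = 6 means a bare disp16 instead).
RM16 = ["bx+si", "bx+di", "bp+si", "bp+di", "si", "di", "bp", "bx"]
REG8 = ["al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"]


def split_modrm(byte: int):
    """Split a ModR/M byte into (mod, reg, rm)."""
    return (byte >> 6) & 0b11, (byte >> 3) & 0b111, byte & 0b111


def decode_add_rm8_r8_16bit(code: bytes) -> str:
    """Decode opcode 00 /r in 16-bit mode, for mod = 00 and mod = 11 only."""
    opcode, modrm = code[0], code[1]
    if opcode != 0x00:
        raise ValueError("this sketch only handles opcode 00 (ADD r/m8, r8)")
    mod, reg, rm = split_modrm(modrm)
    if mod == 0b11:
        return f"add {REG8[rm]}, {REG8[reg]}"
    if mod == 0b00 and rm != 0b110:
        return f"add [{RM16[rm]}], {REG8[reg]}"
    raise NotImplementedError("displacement forms are left out of this sketch")


if __name__ == "__main__":
    print(decode_add_rm8_r8_16bit(b"\x00\x00"))  # add [bx+si], al
    print(decode_add_rm8_r8_16bit(b"\x00\xc0"))  # add al, al
```

The second call shows the hypothetical alternative from the comment: had mod=00 meant register direct, two zero bytes would have been `add al, al`.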
On some RISCs, `00 00 00 00` runs without faulting, even if it might not be a NOP. (Although on ISAs where register number 0 is a zero register that discards writes, it is a NOP.) So it works as a NOP sled, allowing execution to reach whatever's after the padding as long as it lands somewhere in it. Modern designs like RISC-V and AArch64 avoid that, with RISC-V I think even guaranteeing that the all-zero encoding will definitely fault, not usable as an extension. It's good for ease of debugging, too, like if you let execution fall off the end of `_start`, it'll fault right there. – Peter Cordes Nov 01 '23 at 17:38
Many architectures require code or data to be aligned to some boundary. The fill bytes are then often chosen to be something that would crash when interpreted as an instruction. In the case of x86 that is e.g. `0xCC`, which is basically a software breakpoint for a debugger. Without a debugger the program would instantly crash. With a debugger you have a nice breakpoint and can debug the root cause more easily. So, nice for debugging and prevents NOP slides. – Fee Nov 01 '23 at 21:00
@Fee: And in fact you don't need to fill with them. You just need to place them in front of functions. A long NOP slide into a breakpoint is a slow breakpoint. – Joshua Nov 02 '23 at 00:25
@Fee: IDK if you were replying to me; I only just saw it since you didn't @ notify me. Padding in Linux ELF executables to align program segments tends to be done with `00` bytes by GNU Binutils `ld`. And GCC chooses to pad between functions with the usual `.p2align`, not overriding the fill, so we get long NOPs instead of `CC` int3 breakpoints. (MSVC fills between functions with `CC`.) Implicit padding, like beyond the end of a segment mapping to fill the rest of a page, or farther for the `.bss` section, is also zero-filled, and so is dynamically-allocated memory unless written in user-space. – Peter Cordes Nov 05 '23 at 09:04
The padding between functions is probably not security-sensitive, which I assume is why nobody bothered to change it. ASLR randomization has page granularity, so the low 12 bits of a function address will always be the same for a given build. And with normal build options, function starts will always be 16-byte aligned. So an absolute jump aiming for the start of a function might miss by a multiple of 16 bytes in a different build, but won't land in the padding leading up to it. (And long NOPs tend not to be NOPs if you start decoding from the wrong offset.) – Peter Cordes Nov 05 '23 at 09:07
`00` illegal :/ – Simon Richter Nov 02 '23 at 07:02
`00, 00` or `add [rax], al` in x64 is almost guaranteed to cause an exception for some random binary data. – c00000fd Nov 02 '23 at 08:01