Undocumented instructions in x86 CPU prior to 80386?

Question

I have questions regarding some x86 instructions that were documented for the 80386 and later x86 CPUs, but not for earlier chips.

"OR reg/mem16, immed8" (0x83/1), "AND reg/mem16, immed8" (0x83/4) and "XOR reg/mem16, immed8" (0x83/6): these opcodes are "not used" on the 8086/8088 according to this manual: https://edge.edx.org/c4x/BITSPilani/EEE231/asset/8086_family_Users_Manual_1_.pdf page 4-31 (but the meaning of "not used" is unclear). Old assemblers seem to avoid generating them, even when this means a four-byte instruction instead of a three-byte one (except for AX).
XLAT with a segment override prefix: Manuals for 8086 and 80286 only mention that it uses the DS register. It was first mentioned in the 80386 manual that it accepts segment prefixes.

Does somebody have any information on what these instructions actually do on CPUs where they are undocumented (8086/8088/80286/NEC V20)?

There are plenty of resources about undocumented 8086 opcodes per se (https://www.os2museum.com/wp/undocumented-8086-opcodes-part-i/ surveys some, for example), but under the constraint that those are documented for the 80386? I doubt you’ll find much. — user3840170, Jun 05 '21 at 08:01

score 11 · Answer 1 · answered Jun 04 '21 at 23:46

They behave like aliases for the documented instructions. They are present and functionally equivalent. There's also SALC (D6 but not on the NEC), ICEBP (F1), string instructions using ES: override; and AAM and AAD accepting values other than 10, though those last two are fairly obvious.

peter ferrie

I suspect that during the design process, it wasn't clear whether it would be easier to have microcode that would treat the operand as an 8-bit value, treat the operand as a 4-bit value, or always perform the operation in a manner suitable for the value 10. Specifying the opcode as having a second-byte value of 0x0A accommodates all of those possibilities. – supercat Jun 05 '21 at 16:25

640KB · Answer 2 · 2022-12-01T15:01:31.723

Let's try it on real hardware...

I ran some of these on a vintage Turbo XT with a real V20; here are the results.

TLDR? Conclusion:

AAM imm8 - works the same as 8088, where second byte is used as divisor.
AAD imm8 - does not work the same as 8088, second byte is not used and 0x0A is always used.
LODSB with segment override - works the same as 8088, segment override for index register can be CS:[SI] or ES:[SI].
STOSB with segment override - does not accept segment overrides (which is the same as 8088).
XLAT with segment override - works the same as 8088, segment override for base address can be CS:[BX] or ES:[BX].
SALC (0xD6) - does not work like 8088. Sets AL to 0xCD with CF and AL to 0x00 with NC.

Test methodology:

First, AAM imm8 (D4 xx):

Works the same as 8088, where second byte is used as divisor.

Correct: 0x55 / 0xA = 8, 0x55 % 0xA = 5

Correct: 0x55 / 0x10 = 5, 0x55 % 0x10 = 5
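
The two results above can be reproduced with a small model of the instruction. This is only a sketch of the arithmetic, assuming AAM computes a quotient into AH and a remainder into AL; the function name `aam` is made up for illustration and nothing here emulates real hardware or flags.

```python
def aam(al, base=0x0A):
    """Model of AAM imm8 (D4 ib): returns (AH, AL) = (AL // base, AL % base)."""
    if base == 0:
        # A zero immediate means a division by zero; this sketch simply raises.
        raise ZeroDivisionError("AAM with an immediate of 0")
    al &= 0xFF
    return al // base, al % base


# The two cases from the test above, starting with AL = 0x55
print(aam(0x55, 0x0A))  # (8, 5)
print(aam(0x55, 0x10))  # (5, 5)
```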

AAD imm8 (D5 xx):

No matter the second byte, the operation always uses 0xA producing 0x0037 when AL is 0x55.
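
As a rough sketch of the difference, AAD can be modelled as AL = AL + AH * base with AH cleared afterwards. The input AH=5, AL=5 below is my assumption, chosen because it yields the 0x0037 result reported above; the function names are hypothetical.

```python
def aad(ah, al, base=0x0A):
    """Model of AAD imm8 (D5 ib): returns the new AX (AH is cleared)."""
    return (al + ah * base) & 0xFF


def aad_v20(ah, al, imm8):
    """Behaviour reported above for the V20: the immediate is ignored, base 10 is used."""
    return aad(ah, al, 0x0A)


# Assumed inputs: AH = 5, AL = 5
print(hex(aad(5, 5, 0x10)))      # 0x55  (immediate honoured as the base)
print(hex(aad_v20(5, 5, 0x10)))  # 0x37  (immediate ignored, base 10)
```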

LODSB with segment override (LODS BYTE PTR CS:[SI] in MASM syntax):

Works the same as 8088, where ES: and CS: are used.

STOSB with segment overrides:

The byte is always written to ES:DI (3000:200 in this example); the segment override is ignored. This is consistent with the documentation, though I wanted to test it since there was an earlier comment about string operations and segment overrides.

XLAT with segment override (XLAT ES:[BX] or XLAT CS:[BX] in MASM syntax):

In the above I picked an arbitrary value of 0x200 for the base address (BX) and put unique values for that offset at each segment (DS, ES and CS). So:

DS:0200 contains 01 02
CS:0200 contains 03 04
ES:0200 contains 05 06

Calling ES: XLAT when AL=0 should load the value at ES:0200 which is 05, and it does as expected. If segment override was ignored it would have loaded 01.

Likewise CS: XLAT when AL=1 should load the value at CS:0201 which is 04, and also does as expected. If segment override was ignored it would have loaded 02.

Based on this, segment overrides for XLAT on a V20 do seem to work consistently with the 808x.
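
The reasoning behind this test can be sketched with a toy memory model. The byte values are the ones listed above; the `memory` dictionary, the `xlat` function and the `honor_override` switch are illustrative assumptions, not an emulation of any real CPU.

```python
# Distinct bytes at offset 0x0200 in each segment, as set up in the test above
memory = {
    ("DS", 0x0200): 0x01, ("DS", 0x0201): 0x02,
    ("CS", 0x0200): 0x03, ("CS", 0x0201): 0x04,
    ("ES", 0x0200): 0x05, ("ES", 0x0201): 0x06,
}


def xlat(bx, al, override=None, honor_override=True):
    """Model of XLAT: AL = [seg:BX+AL], where seg is DS unless an override is honoured."""
    seg = override if (override and honor_override) else "DS"
    return memory[(seg, (bx + al) & 0xFFFF)]


# If the prefix is honoured, the override segment is read...
print(f"{xlat(0x0200, 0, 'ES'):02X}")  # 05
print(f"{xlat(0x0200, 1, 'CS'):02X}")  # 04
# ...and if it were ignored, the DS bytes would come back instead.
print(f"{xlat(0x0200, 0, 'ES', honor_override=False):02X}")  # 01
print(f"{xlat(0x0200, 1, 'CS', honor_override=False):02X}")  # 02
```

Because every segment holds a different byte at the same offset, the value that ends up in AL identifies which segment was actually used.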

Update:

Confirmed on real hardware that SALC does NOT work on V20 (lest there be any doubt).

stosb is documented as not accepting a segment override; all default uses of es in string operations cannot be overridden. (scasb is the same, movsb and cmpsb allow an override but it only affects the ds:si operand, not the es:di one.) — ecm, Jun 06 '22 at 16:20

score 9 · Answer 3 · answered Jun 05 '21 at 16:22

In the days of "no unnecessary transistors allowed", it was common to simply let the decoding logic do whatever it naturally did in cases that weren't intended to have defined behavior. The 6502 was particularly notorious. At the same time, designers often wanted the possibility of using those cases for new behaviors in the future. So, it was common to declare those cases "undefined", making the programmer responsible for any future compatibility problems even if the actual behavior was predictable.

John Doty

Whilst that's true, it doesn't address the specific question of what behaviour these opcodes have on the 8086 or 80286. – Toby Speight Jun 07 '21 at 07:21
8086 has no illegal-instruction exception; every byte sequence ran as something. But later CPUs, at least starting with 286, do have a #UD exception. But a few special undocumented opcodes like SALC don't trigger that, even in 32-bit mode on modern x86 CPUs. – Peter Cordes Apr 21 '22 at 07:44

score 6 · Answer 4 · answered Jun 04 '21 at 23:47

MAME's 8086/88/186/188/286 emulator here, V20/V30/V33/V33A emulator here, and V30MZ emulator here all support 83/1, 83/4, and 83/6, and they all don't support a segment override for XLAT (the prefix is allowed but ignored). Search for 0x83 and 0xd7 to find the implementations.

The fact that they all agree doesn't necessarily mean they're correct, since they all seem to have been forked from common code at some point. But I suppose that whoever implemented 83/x with no CPU-version test and put an explicit DS in the XLAT implementation probably knew what they were doing.

score 2 · Answer 5 · edited Jun 05 '21 at 14:23

These opcodes are "not used" on 8086/8088 according to this manual: [...] but the meaning of "not used" is unclear [...]

Err... what of 'not used' is unclear? For a CPU, its manual is holy scripture, to be taken word for word. If it says something then it must be. Assuming otherwise, even if it can be proven, is blasphemous.

Not every possible opcode combination makes sense. In this case they are simply not assigned to any functionality, because it would not make much sense for the basic 16 bit CPU. They are essentially duplicates.

The block 80..83 must be seen as one encoding case. The basic opcode is 1000.00sw, marking an ALU operation with

w defining whether it's an 8 or 16 bit operation and
s defining whether a sign extension is to be applied or not

83h is a bit special, as with a 16 bit target and a 16 bit immediate, a sign extension wouldn't make much sense, would it? So having both set (word and sign extension) marks a signed 8 bit immediate to follow instead of a 16 bit one.

w=0; s=0 -> 8 bit target and 8 bit immediate
w=0; s=1 -> 8 bit target and 8 bit immediate
w=1; s=0 -> 16 bit target and 16 bit immediate
w=1; s=1 -> 16 bit target and 8 bit immediate

The ALU operation to be used is specified in the second byte as xxxOOOxx:

000 -> ADD
001 -> OR
010 -> ADC
011 -> SBB
100 -> AND
101 -> SUB
110 -> XOR
111 -> CMP
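
The two tables above can be put together in a short decoder sketch. It follows the s/w table and the xxxOOOxx field exactly as listed; the names `ALU_OPS` and `decode_group1` are invented for this example, and whether a particular CPU actually executes each combination is a separate question.

```python
ALU_OPS = ["ADD", "OR", "ADC", "SBB", "AND", "SUB", "XOR", "CMP"]


def decode_group1(opcode, modrm):
    """Decode opcodes 80h..83h: returns (operation, target bits, immediate bits)."""
    if opcode & 0xFC != 0x80:
        raise ValueError("not an 80h..83h opcode")
    w = opcode & 1         # 0 = 8 bit target, 1 = 16 bit target
    s = (opcode >> 1) & 1  # with w=1: 1 = sign-extended 8 bit immediate follows
    operation = ALU_OPS[(modrm >> 3) & 7]  # the OOO bits of the second byte
    target_bits = 16 if w else 8
    immediate_bits = 16 if (w and not s) else 8
    return operation, target_bits, immediate_bits


print(decode_group1(0x83, 0xF3))  # ('XOR', 16, 8)
print(decode_group1(0x81, 0xE3))  # ('AND', 16, 16)
print(decode_group1(0x80, 0xC8))  # ('OR', 8, 8)
```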

So, while sign extension does make much sense for arithmetic operations, it isn't as useful for logical operations.

No matter if it's about byte handling (82h, <op> rm8,i8) or word handling (83h, <op> rm16,i16) sign extension would be of no difference to non-sign-extending opcodes (80h/81h). They are simply the same operation.

There are two oddities here:

The whole 82h opcodes are kind of redundant with the 80h range. Only the 8086 manual marks them in that single table as explicit opcodes. Later manuals (186, 286) still list the S/W encoding for arithmetic immediate operations, but no longer list the 82h opcodes (or any unassigned opcode at all). The 386 manual also drops the entire description of the S/W encoding.
Why only the logic operations within 82/83h have been marked as unused in the 8086 manual can only be speculated. It's possible that the structure of the ALU simply didn't allow the use of sign extension and bit operations at the same time.

So, why were they introduced with the 80386?

For space reasons. Regardless of target size selected, 83h always uses an 8 bit immediate. While this is not a really big thing in an 8/16 bit environment, it becomes more useful when target size depends on modal operation - not to mention substantial space savings in 32 bit code, at least as long as operations are fine with only 7 usable bits.
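
To make the size argument concrete, here is a sketch of how an assembler might choose between the two 16 bit encodings of AND BX, imm. The ModRM byte 0xE3 (register-direct, /4 = AND, BX) and the helper names are assumptions made for this illustration only.

```python
def sign_extend8(imm8):
    """Sign-extend an 8 bit value to 16 bits."""
    imm8 &= 0xFF
    return imm8 | 0xFF00 if imm8 & 0x80 else imm8


def encode_and_bx(imm16):
    """Encode AND BX, imm16, using the 83h short form when the value allows it."""
    imm16 &= 0xFFFF
    modrm = 0xE3  # mod=11, reg=100 (AND), r/m=011 (BX)
    low = imm16 & 0xFF
    if sign_extend8(low) == imm16:
        return bytes([0x83, modrm, low])               # 3 bytes
    return bytes([0x81, modrm, low, imm16 >> 8])       # 4 bytes


print(encode_and_bx(-2).hex())      # 83e3fe
print(encode_and_bx(0x1234).hex())  # 81e33412
print(encode_and_bx(0x00FC).hex())  # 81e3fc00
```

The short form applies only when the upper byte of the immediate equals the sign extension of the lower byte, which is why 0x00FC still needs the four-byte encoding.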

XLAT with a segment override prefix: Manuals for 8086 and 80286 only mention that it uses the DS register. It was first mentioned in the 80386 manual that it accepts segment prefixes.

Oh, that's an easy one. The operand of XLAT is not a memory location, which has a segment assigned, but the register AL. One can't override a non-existent segment. The ability to use a segment is simply an addition introduced with the 80386.

Does somebody have any information on what these instructions actually do on CPUs where they are undocumented (8086/8088/80286/NEC V20)?

Not really sure what that should be good for, as it's simply not defined.

edited Jun 05 '21 at 14:23

Toby Speight

answered Jun 05 '21 at 03:28

Raffzahn

That the instructions are simply not defined is the obvious conclusion from what the linked manual says, but I cannot find any evidence that is the case. Every other reference says these instructions work fine, going back to the 8088/86. Do you have an independent source for this claim that these instructions are not defined/unsupported? – Cody Gray - on strike Jun 05 '21 at 07:06
@CodyGray If the manual doesn’t define them, they are undefined by, ahem, definition. Whether an opcode is defined or not is entirely a matter of official documentation. Independent sources can report on how undefined opcodes behave in practice, but cannot actually define them. – user3840170 Jun 05 '21 at 07:47
Ah, is what is linked in the question a copy of the official manual? That wasn't clear to me. It certainly isn't a link to an official looking source. I assumed it was a third-party compilation (there were lots of those in the days before the Internet, some of them even professionally done). If the Intel manual says they are undefined, then, yeah, that'd be a pretty good clue, you're right, @user3840170. – Cody Gray - on strike Jun 05 '21 at 08:53
@CodyGray The copyright notice at the bottom says ‘Mnemonics © Intel, 1979’ (asserting copyright claim on the assembly language, meaning the document itself was probably created by Intel as well), and there is a PDF annotation next to it whose contents suggests this is the official manual. – user3840170 Jun 05 '21 at 11:08
@CodyGray It is the regular 8086 manual. See here for the very same page on Archive.org – Raffzahn Jun 05 '21 at 13:31
Why would sign extension not be useful for logical operations? If one wants to clear the bottom 2 bits of SI, being able to AND with -4 (sign-extended 0xFC) would seem like it should be more efficient than having to AND with a 16-bit operand 0xFFFC. If you want to say Intel didn't consider that such operations would be useful, that may be true, but situations where they would be useful aren't particularly obscure. – supercat Jun 07 '21 at 20:12
SALC hangs on the V20. I don't know about others. – peter ferrie Jun 11 '21 at 20:12
Some undefined operations proved to be so crucial that they saved a CPU from being absolute garbage (286 LOADALL) – DarkAtom Jul 01 '21 at 10:04
@supercat: It seems this answer is trying to explain why the 82h range of opcodes exist at all, and is saying that Intel chose to undefine them. They're fully redundant with 80h, both op r/m8, imm8 with nowhere to sign-extend into. (The 83h range with op r/m16, imm8 are useful as you say. This answer incorrectly describes 83h as op rm16, imm16, disagreeing with http://ref.x86asm.net/coder32.html#x83 and Intel's current manuals https://www.felixcloutier.com/x86/add). 82h is invalid in 64-bit mode (#UD on SKL), but is valid in 32-bit mode, undoc but decoded by CPU and objdump – Peter Cordes Apr 21 '22 at 08:22
@supercat: Oh, I see this answer is also saying the 8086 (and 286?) didn't define and r/m16, sign_extended_imm8. Weird. Yeah, clearly useful, e.g. for and bx, -2 to align a pointer or something, in 3 bytes instead of 4. Also with positive operands if you care about the FLAGS results, e.g. or bx, 1 makes it odd, and also sets SF according to the MSB. Or or si, 1 doesn't have an or sil, 1 alternative so there's a code-size cost to do 16-bit operations at all on regs that don't have a name for their low8 – Peter Cordes Apr 21 '22 at 08:28
Later documenting an instruction or feature that older CPUs actually did support is something vendors can do. I guess with 386, Intel retroactively documented the fact that 83 f3 07 runs as xor bx, 7 in 16-bit mode on existing CPUs. They've done somewhat similar things more recently, e.g. documenting that the AVX feature flag implies 16-byte aligned loads/stores are atomic. And documenting that rep nop runs as a nop on older CPUs (instead of pause), that rep bsf runs as bsf on CPUs that don't decode it as tzcnt, etc. (Before then, rep bsf was undefined, not future-proof) – Peter Cordes Apr 21 '22 at 08:36
@PeterCordes: I think a more interesting example of retroactive documentation would be with the AAD and AAM instructions. While this is just conjecture, I suspect that the people designing the instruction set didn't know whether it would take more microcode to implement base-ten operations in a manner that didn't involve the constant 0x0A, or to implement operations with a base specified in the machine code, so the opcodes were specified as having an operand which the CPU may or may not use. As for the concept in general, I wish the designers of the 6502 had gone back and... – supercat Apr 21 '22 at 15:16
...retroactively documented a few of the more interesting instruction bit patterns. There are some situations, for example, where code may need to load many values and mask them with the same value. If the output address is either in zero page, or is an I/O device where values to be processed sequentially will be sent to the same address, patterns like "LDA (zp),y / SAX output" can save three cycles per byte for ROM code versus having to use "AND zp" on each value. And the DCP opcode is simply wonderful for many tasks, even though some addressing modes... – supercat Apr 21 '22 at 15:21
@supercat: I'm only vaguely familiar with 6502 so I can follow the words, but I don't know those specific instructions. I'll take your word for it, though. – Peter Cordes Apr 21 '22 at 15:23
@PeterCordes: If one bitwise-ors the opcodes for STA and STX, the resulting opcode will store the bitwise-AND of the values in the A and X registers, since the only difference between STA and STX is whether the A or X register is gated onto the pre-charged bus. If one combines the opcodes, each bit in the bus will get cleared if the corresponding bit was clear in either register. DCP combines the opcodes for DEC and CMP, and (IMHO surprisingly, but reliably on NMOS parts) will behave as though it performs a DEC, and then performs a CMP on the result. Interestingly, while CMP has eight... – supercat Apr 21 '22 at 15:27
...addressing mode and DEC only four, seven of the addressing modes that work with CMP also work with the combined opcode. – supercat Apr 21 '22 at 15:27
BTW, do you like the observation about AAD/AAM? It's easy to imagine how something like AAD could be processed fairly efficiently by computing AL += (AH << 1); AL += (AH << 3) without needing to have the bit pattern for "ten" anywhere, but that it might be more efficient to use repurposed multiplication logic instead. – supercat Apr 21 '22 at 15:35
@supercat: Yes, AAD/AAM being originally documented as opaque 2-byte opcodes to allow alternative implementation choices, instead of as the immediate multiply / divide they now are, makes a ton of sense. And is a pretty solid example for this question. Already upvoted Peter Ferrie's answer that included that example, and your earlier comment. – Peter Cordes Apr 21 '22 at 15:55
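
The shift-and-add formulation of AAD suggested in the comments above is easy to check against a plain multiply by ten. This is only an arithmetic sketch with hypothetical function names; it says nothing about how any real microcode is written.

```python
def aad_shift_add(ah, al):
    """AL += (AH << 1); AL += (AH << 3), truncated to 8 bits."""
    al = (al + (ah << 1)) & 0xFF
    al = (al + (ah << 3)) & 0xFF
    return al


def aad_multiply(ah, al):
    """AL = AL + AH * 10, truncated to 8 bits."""
    return (al + ah * 10) & 0xFF


# Exhaustive check over all 8 bit AH/AL pairs
same = all(
    aad_shift_add(ah, al) == aad_multiply(ah, al)
    for ah in range(256)
    for al in range(256)
)
print(same)  # True
```

The two agree because 2 * AH + 8 * AH = 10 * AH, so the constant ten never has to appear as a bit pattern.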
