17

DEC, and at least some of their computers, especially those in the 18-bit and 12-bit families, had these opr instructions, which contained a number of bitfields, each encoding something like a "subinstruction". Things like

  • clear the accumulator
  • increment the accumulator
  • rotate the accumulator one place leftward
  • complement the accumulator
  • skip if the accumulator is zero

The nature of these simple operations is such that it's convenient to encode each one in some bit or bitfield of the instruction word, and to have the computer execute each one in a statically scheduled manner. My understanding is that this is because they are often used together¹, and have simple encodings.

A later computer like the Z80 or ARM7 needs to fetch, decode and execute a separate instruction to perform each of these operations, which might not be as space or time efficient.

From what I can tell, using DEC-style microcoded instructions to perform any number of simple operations on a single register has fallen out of vogue, or is at least not nearly as common on modern instruction set architectures. Why is this?


¹ Not only to load small integers into the accumulator, as in cla cll cml inc rtl to set the accumulator to 6 on the PDP-8, but also for examining or manipulating bitfields, probably long division, etc.
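The footnote's example can be made concrete with a small simulator. The following Python sketch is a simplified, hypothetical model of a PDP-8 Group 1 OPR word, not a bit-exact encoding; the mnemonics follow PDP-8 usage (iac corresponds to the "inc" above), and the fixed execution order (clears, then complements, then increment, then rotate) is what makes the combinations useful:

```python
# Hypothetical, simplified simulator of a PDP-8 Group 1 OPR microinstruction.
# The execution order (clear, complement, increment, rotate) mirrors the
# statically scheduled event sequence described above.

def opr_group1(ac, link, cla=False, cll=False, cma=False, cml=False,
               iac=False, ral=False, rtl=False):
    # Step 1: clears
    if cla: ac = 0
    if cll: link = 0
    # Step 2: complements
    if cma: ac ^= 0o7777
    if cml: link ^= 1
    # Step 3: increment (12-bit wraparound; carry toggles the link)
    if iac:
        ac += 1
        if ac > 0o7777:
            ac &= 0o7777
            link ^= 1
    # Step 4: rotate through the combined 13-bit link+AC register
    shift = 2 if rtl else (1 if ral else 0)
    if shift:
        v = (link << 12) | ac
        v = ((v << shift) | (v >> (13 - shift))) & 0o17777
        link, ac = v >> 12, v & 0o7777
    return ac, link

# "cla cll cml iac rtl": clear AC and link, set link, increment,
# then rotate the link bit into the accumulator, yielding 6.
ac, link = opr_group1(0o1234, 1, cla=True, cll=True, cml=True,
                      iac=True, rtl=True)
print(oct(ac), link)   # 0o6 0
```

Because every step runs in a fixed slot, one word performs five operations that a Z80-style machine would need separate instructions for.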

Omar and Lorraine
  • 38,883
  • 14
  • 134
  • 274
  • 3
    What you're describing are VLIW architectures - except the question you have is quite unclear - adding 'Why this' to a description isn't exactly a question. – Raffzahn Apr 12 '19 at 14:19
  • @Raffzahn I think I've identified a trend; I am asking if it's there, and if so, what's motivated it. My understanding of VLIW is that the operations are dyadic, or have variable transitivities, but on the PDP 7 et al., the operations were all strictly monadic. – Omar and Lorraine Apr 12 '19 at 14:26
  • 4
    This is a little off topic. The DEC PDP-6 had 16 variations on the Boolean operations. It used four bits out of the opcode field to specify a truth table for the corresponding Boolean operation. Thus it was able to implement 16 operations with about the same logic that it would have taken to implement just one. – Walter Mitty Apr 12 '19 at 14:36
  • @Wilson VLIW is not intrinsically tied to any kind of operation. The basic idea is that there is no (general) decoding, but each function unit that can be initiated separately gets its own mark in the instruction field. Thus the decoder stage can be removed - or at least quite simplified. – Raffzahn Apr 12 '19 at 14:48
  • To my mind, orthogonality of instruction set (a la VAX and PDP-11) beats mix'n'match OPR instructions. I want to be able to 'clear' any valid operand, so having a bit somewhere in another instruction that allows me to clear just a particular register is more valuable. It's not clear (heh heh) to me that you can do the "microcoding instruction bits" thing at the same time as supporting generalized operands. – dave Apr 12 '19 at 21:44
  • @WalterMitty Are you able to share any details about that? If it's on the PDP-6, then it would be on the PDP-10, and I'm not sure I remember anything like that from the PDP-10. – Omar and Lorraine Apr 13 '19 at 11:58
  • 2
    Yes these opcodes are in the PDP-10 as well. Open the opcode list and take a close look at opcodes 400-477. If you convert the opcodes from octal to binary, you will find four bits that provide a truth table for the operation in question. SETZ has all four of these bits set to zero, and SETO has all four set to one. AND has three zeroes and a one. – Walter Mitty Apr 13 '19 at 13:36
  • As for the Z80, it is not hard to guess why it uses a different idea: an 8-bit data bus. You don't want to fetch extra data where not necessary. ARM7 might be more strange, unless it is made backward compatible. – UncleBod Apr 14 '19 at 12:54
  • The instruction bits that I am talking about in the pdp-6 or pdp-10 are bits 3,4,5 and 6. Bit 6 is the high order bit in the third octal digit of the opcode. Bits 7 and 8 contain the mode: -,I,M, or S. Bits 0,1, and 2 have the octal value 4 for all 64 of the instructions in question. – Walter Mitty Apr 15 '19 at 15:34
  • Your example instruction with RTL (Rotate Two Left) produces 6; for 3 use RAL (Rotate AC (and Link) (One) Left). :-) – dave_thompson_085 May 21 '19 at 00:33
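The truth-table trick described in the comments above can be sketched in a few lines. This is an illustrative model only: the mapping from input pair to truth-table bit is an assumption, not the PDP-10's exact bit layout, but it shows how four opcode bits yield all 16 two-input Boolean functions from one piece of logic.

```python
# Four opcode bits serve directly as the truth table of a Boolean operation
# on two inputs, applied bitwise across the word. Bit numbering here is
# illustrative, not the exact PDP-6/10 layout.

def boolean_op(table4, a, b, width=36):
    out = 0
    for i in range(width):
        bit_a = a >> i & 1
        bit_b = b >> i & 1
        # Select one of the four truth-table bits by the input pair.
        out |= (table4 >> (bit_a * 2 + bit_b) & 1) << i
    return out

SETZ, AND, SETO = 0b0000, 0b1000, 0b1111   # example truth tables
a, b = 0o707070, 0o123456
assert boolean_op(SETZ, a, b) == 0
assert boolean_op(SETO, a, b) == (1 << 36) - 1
assert boolean_op(AND, a, b) == a & b
```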

4 Answers

20

[...] had these opr instructions, which contained many bitfields which encoded something like "subinstructions"[...]

What you describe is basically a (V)LIW instruction format - at least that's what it might be called today. That's what computers started out with: separate bits for each function to be applied to the addressed value.

DEC is somewhat of a bad example here, as its accumulator instructions are a special kind, already a cross between a clean all-over LIW format and dedicated encoding. The LIW aspect is used only for this accumulator subset.

Zuse's machines, like the Z22, might make a better example with their ability to have each and every instruction carry multiple operations.

A later computer like the Z80 or ARM7 needs to fetch, decode and execute a separate instruction to perform each of these operations,

Yes - and no. For one, not all possible combinations could be used together, resulting in illegal instructions. In fact, depending on the machine's construction, most of these combinations were illegal. And that's why dedicated instructions took over. Let's assume there are, say, 8 different operational units in the data path. Having one bit for each in the instruction word makes for easy decoding, as each bit would just be wired up to the enable line of a single function, resulting in a fast and simple machine structure.

Of these 256 combinations (one of which would be a nop), many would not make sense - think shifting left and shifting right, or adding and subtracting at the same time. By encoding only the, say, 20 useful combinations into a 5-bit field, 3 bits (almost half) could be freed - at the cost of an additional decoding stage.
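The trade-off can be sketched as follows. The control-line names and the list of "legal" combinations here are invented for illustration; the point is just that a lookup table (the extra decoding stage) maps a dense 5-bit field back onto 8 one-hot enable lines.

```python
# 8 one-hot enable lines give 256 raw combinations, but if only ~20 are
# legal they fit in a 5-bit field, decoded back through a small lookup
# table (a "decode ROM"). Names and combinations are made up.

UNITS = ["clear", "complement", "increment", "shl", "shr",
         "add", "sub", "skip_if_zero"]

LEGAL = [
    (), ("clear",), ("complement",), ("increment",),
    ("shl",), ("shr",), ("add",), ("sub",), ("skip_if_zero",),
    ("clear", "increment"), ("complement", "increment"),
    ("add", "shl"), ("add", "skip_if_zero"), ("sub", "skip_if_zero"),
    ("clear", "complement"), ("increment", "shl"),
    ("shl", "skip_if_zero"), ("shr", "skip_if_zero"),
    ("clear", "shl"), ("clear", "shr"),
]

ENCODE = {combo: i for i, combo in enumerate(LEGAL)}   # assembly time
DECODE = {i: combo for combo, i in ENCODE.items()}     # the decode "ROM"

assert len(LEGAL) <= 32   # all 20 legal combinations fit in 5 bits
code = ENCODE[("clear", "increment")]
print(f"field=0b{code:05b} -> enables {DECODE[code]}")
```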

Now, back in the old times, when machines were word-orientated (e.g. 36 bits in one word), there was plenty of space - even resulting in unused bits. No need to add a decoding stage. Even worse, doing so would slow down execution. Well, only a bit, but it would.

The situation changed when machines became byte-orientated and variable-length instruction formats were used. Here, cramming the 8 unit lines down into a single encoded 5-bit field let an operation squeeze into a byte while leaving room for more (like a register number), without the need to fetch two bytes. Heck, it even leaves 12x8 code points for other encodings/irregular instructions without needing more space.

which might not be as space or time efficient.

That's partially true for time efficiency, but not for space - space-wise it's an extreme saving, enabling more compact code. The inner workings are (or can be) still (mostly) the same, just less visible. Instead of setting a shift bit and an add bit, there's now an Add-and-Shift instruction.

Then again, by now encoding it into a single byte instead of a full 36-bit word, the CPU can fetch instructions at the same speed (byte-wide bus) or even 4 times the speed (word-sized bus) as before. So with memory always being the slowest part, tighter encoding does not only save space, but also speeds up execution - despite the additional decoding stage.

From what I can tell, [this] has fallen out of vogue, or are at least not nearly as common on modern instruction set architectures.

Not nearly as common on the surface is maybe the point here. For one, explicit VLIW instructions are still a thing (think Itanium), but more importantly, they are always an option for the internal workings of modern CPUs, where 'traditional' code is first decoded into sub-operations, which later get either recombined into LIW instructions or scheduled in parallel across different function units.

In fact, the mentioned ARM makes another good case for it vanishing. ARM traditionally had the ability to execute every instruction conditionally (much as Zuse did first). Cool when thinking in sequential execution, but a gigantic hurdle when it comes to modern CPUs with the ability to reorder instructions according to available data and function units. It makes rescheduling not just a hard task, but almost impossible. Even worse, ARM featured DEC-like condition handling, where each and every load changed the flags.

Bottom line: Just because something isn't (always) visible to the user-side programmer, doesn't mean it isn't there.

Raffzahn
  • 222,541
  • 22
  • 631
  • 918
  • 3
    The Transmeta CPUs were a somewhat recent example of CPUs that used a (proprietary) VLIW instruction set internally, and another completely different one (namely x86) externally. In the Itanium, the VLIW bundles have explicit parallelism semantics (Intel calls this Explicitly Parallel Instruction Computing (EPIC)), i.e. one VLIW bundle is two 64-bit words holding three 41-bit instructions and a 5-bit "template" that tells the CPU what kinds of instructions the three are and what the data dependencies are. – Jörg W Mittag Apr 13 '19 at 07:06
  • @Wilson Ouch. Ok, to start with, the Z4 is an original Zuse (the man himself) design, while the Z22 was conceived by Theodor Fromme (call it design lead) with much help from Heinz Zemanek and Rudolf Bodo, who both designed the Mailüfterl and made the schematics for the Z22. The idea was to design the tube-based Z22 in a way that it could be transistorized later. Which happened with the Z23. Which is quite remarkable planning for that time. ... more to follow – Raffzahn Apr 13 '19 at 13:52
  • 2
    @Wilson you asked for it: how-were Zuse Z22 Instructions Encoded? ... wasted another perfectly good day - even included some German for you to test your knowledge :)) – Raffzahn Apr 14 '19 at 00:10
  • You've got some terrific information there! I can see that this kind of instruction word easily encodes many useless combinations. Much worse than the PDP 10 for example. – Omar and Lorraine Apr 14 '19 at 07:44
  • think shifting left and shifting right – easy to fix, make it a 1-bit shift right and a two-bit shift left → all combinations are useful. Same for subtraction, it can be encoded as operand inversion, and let's have another operation "increment" and we again get four useful combinations. – Janka May 21 '19 at 11:58
  • @Janka what about, say, a processor with hardware divide and multiply? – Tommy May 21 '19 at 16:34
  • @Tommy: The opcode would need a repeat option. Conditional PC increment, +1, +2, +3 that would be useful in a lot of cases. – Janka May 21 '19 at 19:57
18

The PDP-7 was a one-address machine. All instructions occupied 18 bits. The operations that manipulated the accumulator didn't reference memory, and therefore didn't need an address. But the address bits were in the instruction anyway, because every instruction was encoded in an 18-bit word. So why not put these otherwise unused bits to work encoding more operations?

Once you get to opcodes with a variable number of operand addresses, the need to economize in this way goes away.
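As a sketch of the idea (simplified and hypothetical, not the exact PDP-7 encoding): an 18-bit word with a 4-bit opcode, an indirect bit, and a 13-bit address field, where one opcode value reinterprets the address bits as per-operation flags.

```python
# Hypothetical 18-bit instruction layout: 4-bit opcode, 1 indirect bit,
# 13-bit address. Memory-reference opcodes use the low 13 bits as an
# address; the "operate" opcode reuses them as one flag per sub-operation.

OPR = 0o17   # made-up opcode number for "operate"

def decode(word):
    opcode   = (word >> 14) & 0o17
    indirect = (word >> 13) & 1
    low13    =  word        & 0o17777
    if opcode == OPR:
        # Reuse the otherwise idle address bits as operation flags.
        flags = {name: bool(low13 >> bit & 1)
                 for bit, name in enumerate(["cla", "cma", "iac", "ral"])}
        return ("operate", flags)
    return ("memref", opcode, indirect, low13)

print(decode((OPR << 14) | 0b0101))    # operate word: cla and iac set
print(decode((0o2 << 14) | 0o1000))    # memory reference to address 0o1000
```

The same word width carries either an address or a bundle of flags, which is exactly the economy the answer describes.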

Walter Mitty
  • 6,128
  • 17
  • 36
  • 7
    To add to this, the PDP-7 is from an era when it was common for the width of the address bus to be less than the width of the data bus. In this case, you could fit a full 13-bit address into an 18-bit instruction word, which meant that you could pack an entire instruction (including the operand address) into a single word. Compare this to a CPU like the 6502 with 8-bit words and 16-bit addresses: if you can't fit an address into an instruction word then naturally they must come in extra bytes that follow the opcode byte. (continued) – Ken Gober Apr 13 '19 at 14:56
  • 5
    The flip side of being able to fit the address into the instruction word was that you wasted a lot of bits for instructions that did not need an operand address or jump address. So the PDP-7 style sub-instructions were essentially a way to use unused bits in the instruction word to encode additional instructions, allowing many more instructions to be added without the cost of widening the word size, the only caveat being that the extra instructions had to be ones that didn't need to include an address. – Ken Gober Apr 13 '19 at 15:00
  • 3
    @KenGober, I think you and I are saying the same thing, in different words. Thanks for adding a little clarity. – Walter Mitty Apr 13 '19 at 19:36
  • Worth mentioning that the HP minis had very similar stuffed instructions. – Maury Markowitz May 22 '19 at 19:16
  • The designers of the DEC machines before the PDP-11 got this from the MIT Lincoln Labs TX-0, designed by Wesley Clark. It had 2 opcode bits: store AC, add to AC, cond jump, and operate on the AC (the "microcoded" instruction). [When the machine moved from MIT LL to the MIT campus, the memory was replaced with a smaller one, so fewer address bits were needed, so the opcode size was increased, and more instructions were added.] Wes used a very similar 2-bit opcode instruction set in a later 10-bit machine at LL, the L-10. http://www.bitsavers.org/pdf/mit/tx-0/MIT_TX-0_InstructionSet.txt – Aron Insinga May 22 '23 at 15:30
3

Another good example of this basic architecture is the HP2100 series, which had a series of bit-field instructions that performed things like test-and-branch. They could combine up to eight instructions in some cases. This basic idea was relatively common in the few minicomputers I've looked at.

The reason they fell out of favor was that they require expansion of the microcode engine to store more state, and thus they make simple pipelining more difficult.

Consider the 6502; it ran at roughly 2x the speed of other CPUs of the era at any given clock rate because it was always fetching the next instruction while the last was completing. This was simple to implement even though the instructions took anywhere from 2 to 7 cycles to complete.

But imagine if those instructions are, effectively, macros. In that case you have to decode and then feed the individual sub-instructions into the pipeline. Not impossible, but certainly complex. So the question ultimately comes down to code density - the 6502 was designed as a microcontroller, so density was not super-important while gate count was. In a mini, which cost the equivalent of $100,000 but may still have had only 4k of memory, things look very different.

I'm not sure I would say these are VLIW. As I understand it, VLIW extracts parallelism by encoding different instructions for different functional units into a single instruction. This is not the same; these multi-operation instructions loop over the same functional units. It seems more in keeping with something like the 68k's microprogram, no?

Omar and Lorraine
  • 38,883
  • 14
  • 134
  • 274
Maury Markowitz
  • 19,803
  • 1
  • 47
  • 138
1

Modern CPU architectures also need to facilitate things like memory protection, so they need to implement restrictions on what machine code can and cannot affect depending on context. You do not want normal application code to be able to mess with other code in a multiuser/multitasking environment, and you certainly do not want it to be able to crash the machine wholesale.

An architecture allowing a near-arbitrary number of instruction combinations with unintuitive effects, perhaps even side effects of "illegal" combinations that arise from hardware-level conflicts, or worse, odd conditions that cause unexpected behaviour of instructions further down the line, will make this very hard.

In a security context, even something that has the side effect of probabilistically or deterministically defining the behaviour of an undefined action can be bad news - this can facilitate unwanted monitoring of or communication with other running code....

rackandboneman
  • 5,710
  • 18
  • 23