How was microcode implemented in retro processors?

Question

How was microcode implemented in retro processors such as the Z80 or 8080?

Was the microcode standard (for example a manual for the processor outlining all possible micro-instructions and the standard combinations to implement documented instructions)? If so, what are some keywords to use to find these manuals online?

Or was the microcode custom written by each manufacturer of the processor (such that for example an ADD instruction might be slightly different between manufacturers)?

Could the microcode be read and changed by a user?

I'za ignorant hack, but pretty sure the Z80 and 8080 weren't microcoded at all. Just gates, gates, and more gates. — Will Hartung, Jun 05 '18 at 18:42
Do you know any retro processors that did? (Retro because easier to understand their architecture). — Jet Blue, Jun 05 '18 at 18:48
The only microprocessors I know of were the AMD 29000 series. We used them for body scanners. The words could be as big as you wanted: we had a 192 bit instruction set. The development kit was really flaky. The 8" floppy disk was more reliable than the memory. — cup, Jun 05 '18 at 22:21
Interestingly, Babbage's Analytical Engine was microcoded. The machine read instructions from punch cards which selected which microcode to execute. The microcodes were encoded as pegs on a rotating drum (like a music box) — slebetman, Jun 06 '18 at 01:14
Back then, if there were errata you didn't upload new microcode: you coded around the bugs! :-) — ErikF, Jun 06 '18 at 01:53
I don't think "retro" is the right term for a 8080 or a Z80. It implies being made to a pattern before its time, whereas these two were (at least) of their time. I'd suggest 'early' as a better replacement for that word. — Toby Speight, Jun 06 '18 at 10:00
For a very instructive video series on how early microprocessors can work, see: https://eater.net/8bit/ — user3486184, Jun 21 '18 at 23:53
Microcode is part of the processor design, same as the actual logic gates are. The answer will be similar to - can two different manufacturers of the same processor tweak a gate here and there? — user253751, Aug 03 '18 at 00:33

score 42 · Accepted Answer · edited Jun 18 '20 at 08:29

How was microcode implemented in retro processors such as the Z80 or 8080?

None of these chips (likewise 6800 and 6502) use microcode the same way as it's used today. The decoding isn't as strictly separated from execution logic.

Example 1: 6502

The 6502, for example, has a 'rather' simple structure built from a timing circuit counting instruction cycles and an instruction register, followed by a decoder PLA where instruction plus timing information is transformed into control signals, which are fed into the execution units.

The cycle counter starts at 0 and shifts (!) through the maximum 7 states. When an instruction ends, it gets reset to zero for the next one. The 6502 PLA is essentially a one-dimensional decoder transforming the combined instruction plus state into one or more control signals for the execution units. Since the PLA allows partial decoding, one entry can fire on different instructions. For example, all instructions loading the second byte as immediate share one single PLA entry (microcode line). In comparison with textbook microcode engines, this is equivalent to a kind of compressed microcode.
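As a rough illustration of partial decoding (a sketch, not the actual 6502 PLA contents): each entry below looks only at some of the opcode bits plus the timing state, so one entry can fire for a whole group of instructions. The opcode values are the documented 6502 immediate-mode encodings; the entry names and the cycle in which they fire are assumptions made up for this example.

```python
from typing import NamedTuple


class PlaEntry(NamedTuple):
    name: str          # hypothetical control-signal name
    op_mask: int       # opcode bits this entry actually looks at
    op_value: int      # required values of those bits
    cycles: frozenset  # timing states in which the entry may fire


ENTRIES = [
    # Partial decode: only the low five bits are checked (xxx01001), so this
    # single line covers ORA/AND/EOR/ADC/LDA/CMP/SBC with an immediate operand.
    PlaEntry("fetch_immediate_operand", 0b00011111, 0b00001001, frozenset({1})),
    # Full decode: all eight bits are checked, so only ADC #imm (0x69) matches.
    PlaEntry("adc_immediate_only", 0b11111111, 0x69, frozenset({1})),
]


def decode(opcode: int, cycle: int) -> list:
    """Return the names of all entries that fire for this opcode and cycle."""
    return [
        entry.name
        for entry in ENTRIES
        if cycle in entry.cycles and (opcode & entry.op_mask) == entry.op_value
    ]


if __name__ == "__main__":
    for opcode in (0xA9, 0x69, 0xAD):  # LDA #imm, ADC #imm, LDA abs
        print(hex(opcode), decode(opcode, 1))
```

The point of the mask is that the shared line costs one PLA row instead of one row per instruction, which is the "compressed microcode" effect described above.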

Example 2: Z80

The Z80, in contrast, is based around a more complicated structure. Here the instruction is sent through a (rather) simple decoder built from a PLA. It is simple because 8080/Z80 code offers fewer variants (i.e. addressing modes) and is quite regular compared with the 6502. The decoder creates multiple lines for each instruction or instruction group, which are fed into the second stage.

Two counters are set up in parallel: one counts the Memory (or Machine) states (M1..M5), the other the Time states (T1..T5) within each of them (*1). With a new instruction the M counter gets reset to 1; with each M state the T counter gets reset.

The second stage of instruction decode is much like another PLA, but built out of single gates (*2). The gates are set up much like a grid with the timing signals (M1..M5 & T1..T5) as horizontal inputs and the PLA outputs (instructions/instruction groups) as vertical inputs. Each gate is built along a vertical line and combines its signal with one or more timing signals using AND/NOR logic (*3). The outputs of these gates are again fed into the discrete logic of the execution units.
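The two-counter scheme and the gate grid can be sketched roughly as follows. This is only a toy model: the group name, the signal names and the gate positions are hypothetical, and the only real figure used is the documented timing of LD r,n (two M states of 4 and 3 T states).

```python
def timing_states(t_counts):
    """Yield (m, t) pairs for one instruction.

    M restarts at 1 for every instruction; T restarts at 1 for every M state.
    """
    for m, t_count in enumerate(t_counts, start=1):
        for t in range(1, t_count + 1):
            yield m, t


# Hypothetical gates: each one ANDs an instruction-group line (vertical input)
# with one M line and one T line (horizontal inputs) into a control signal.
GATES = {
    ("load_immediate", 2, 1): "address_bus_from_pc",
    ("load_immediate", 2, 3): "latch_data_bus_into_register",
}


def control_trace(group, t_counts):
    """List (m, t, signal) for every timing state; signal is None if no gate fires."""
    return [(m, t, GATES.get((group, m, t))) for m, t in timing_states(t_counts)]


if __name__ == "__main__":
    for m, t, signal in control_trace("load_immediate", [4, 3]):
        print(f"M{m} T{t}: {signal}")
```

Unlike a regular PLA, only the grid positions that are actually needed get a gate, which is what the sparse dictionary stands in for here.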

There are other differences between these CPUs that can be explained due to the different structure, but less relevant here.

Long Story Short

While both machines are kind of microprogrammed, only the 6502 can somewhat be compared to what is taught today about microprogramming.

Was the microcode standard (for example a manual for the processor outlining all possible micro-instructions and the standard combinations to implement documented instructions)? If so, what are some keywords to use to find these manuals online?

Simply no.

Or was the microcode custom written by each manufacturer of the processor (such that for example an ADD instruction might be slightly different between manufacturers)?

Together with the last sentence this sounds as if you believe that there is something like a standard microcode machine and that CPUs are just different microcode files for such.

Sorry to disappoint you, but there ain't no such thing. CPUs are individual designs and microprogram engines are specific to each. That's true not only for early microprocessors like those above, but even more so today. Designing these mechanisms is the core means of enabling certain features and/or performance.

Even with a much more 'standard' CPU like the ARM1, the instruction sequencer is a rather special circuit with only 4 states that are not just counted but also repeated or skipped according to a state machine supplied by the microinstructions. Nothing like your textbook microinstruction counter. The microcode ROM itself contains just 42 (?) instructions, which are interwoven in a rather unique way.

Could the microcode be read and changed by a user?

No. Not only was there no use case but, more importantly, it would have multiplied the transistor count many-fold, and not only for early chips.

Cost of Loadable Microcode

Let's take the simpler 6502 as an example. Its PLA consists of 130x21 possible connections. That's 2730 points that each need (when set) a transistor (function). In reality, not all are populated, but for a loadable microcode each of these intersections needs not just one transistor but several: in addition to the one doing the connection, there must be a static RAM cell for its value, made of 6 transistors. Additionally, address decoders and write lines are needed. Assuming the equivalent of 10 transistors per bit is therefore a reasonable estimate.

That's 27,300 transistors, about the same as a whole 8086 CPU. Considering that the 6502 has a total of fewer than 4000 transistor functions, that's total overkill and way beyond anything possible back then.
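The estimate above is simple enough to check in a few lines. All figures are the ones quoted in this answer; the 10-transistors-per-bit factor is the answer's own rough assumption, and 4000 is used as an upper bound for the 6502.

```python
PLA_ROWS = 130
PLA_COLUMNS = 21
TRANSISTORS_PER_LOADABLE_BIT = 10       # rough assumption from the text
TOTAL_6502_TRANSISTOR_FUNCTIONS = 4000  # upper bound ("less than 4000")


def crosspoints(rows, columns):
    """Number of possible connections in the PLA grid."""
    return rows * columns


def loadable_cost(rows, columns, per_bit):
    """Estimated transistor count if every crosspoint were RAM-backed."""
    return crosspoints(rows, columns) * per_bit


COST = loadable_cost(PLA_ROWS, PLA_COLUMNS, TRANSISTORS_PER_LOADABLE_BIT)
RATIO = COST / TOTAL_6502_TRANSISTOR_FUNCTIONS

if __name__ == "__main__":
    print(f"crosspoints: {crosspoints(PLA_ROWS, PLA_COLUMNS)}")
    print(f"loadable PLA estimate: {COST} transistors")
    print(f"at least {RATIO:.1f}x the whole 6502")
```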

But why do we now have loadable microcode?

With today's incomparably huge CPUs, the relation is different. A few thousand, even ten or a hundred thousand, additional transistor functions aren't a big deal, and are more than compensated for by the ability to update late.

Fine, but 1970s mainframes also had loadable microcode?

Yes, here again the ability to update justified the additional cost (*4) - just this time it was about the ability to sell new instructions / upgrades to existing customers.

Okay, But What About Second Source and Derivatives

I was thinking more along the lines of second sources and derivatives such as these ones for the Z80. That is whether the microinstructions would be the same between them (some standard Z80 microinstruction set). But it sounds like the answer is no.

(This additional question has been taken from the comments - see below)

Yes ... err .. no ... wait, let's split that.

Second Source

A second source is usually meant to produce exactly the same chip, so customers have a fallback in case the primary source has trouble delivering (or wants a better price :). Many companies wouldn't even think of buying chips that weren't available from a second source, to avoid any lock-in. Especially in the beginning, chip manufacturers were eager to get a second source for their designs to attract large customers.

Second sourcing was usually done by exchanging not only licences and schematics, but also all information down to masks. The resulting chips were supposed to be identical in workings, down to gate level. Only differences below that, usually due to different manufacturing processes, were (somewhat) acceptable. The intention was that second source chips could be ordered and used interchangeably with original ones.

Bottom line: Yes, as it's exactly the same CPU.

Derivatives

Derivatives are a different issue. These are chips made to work like the original but extend it, either electrically (like lower voltage) (*5), or with additional functionality. When not changing much (and done by a licensee), these extensions are often done by small additions without changing the basic structure. At the other end, chips with large extensions, or those that are 'just' binary compatible, may have not only a different microcode but also greatly different internal workings. So no, rather not the same microinstructions, not even the same format.

*1 - this internal structure is also the reason for this otherwise unusual notation used throughout all Z80 documentation.

*2 - The structure is quite similar to a gate array - except that only needed inputs/functions are cast into silicon.

*3 - NMOS offers the feature to build rather complex gates with multiple inputs in a simple way - to some extent this is maybe the most unique feature of the Z80 design.

*4 - Well, for very early technology also loadable microcode was a way to speed up operation, but that's again a different story :)

*5 - Here exists a grey area between simple second sourcing and extending.

Great answer. I understand PLA to mean "Programmable Logic Array," but the microcode wasn't actually programmable, was it? Do you mean "something analogous to a PLA, but not programmable?" — Wayne Conrad, Jun 06 '18 at 02:05
@WayneConrad Well, I guess that depends on your value of programmable :)) It is not field programmable or reprogrammable, but mask programmable. To change its content only one layer has to be edited. Much the same way a ROM is done. — Raffzahn, Jun 06 '18 at 08:04
I thought IBM used the term "microcode" to really mean firmware, for their mainframes. — LawrenceC, Jun 06 '18 at 12:23
@LawrenceC Most /360ff were microprogrammed. Models often just differed by microcode, and 'hardware' upgrades were done by changing the boot floppy. Sometimes a few additional wires had to be added (wrapped) to enable otherwise already existing hardware. — Raffzahn, Jun 06 '18 at 12:37
I'm pretty sure, but cannot prove, that most 8-bit micros used mask-programmed ROM for their microcode, not any kind of PLA. — Wayne Conrad, Jun 06 '18 at 13:20
@WayneConrad a) you're assuming something I didn't state, but more important b) mind explaining the difference between a PLA and a mask-programmed ROM? :)) — Raffzahn, Jun 06 '18 at 15:22
@Raffzahn With regards to "sounds as if you believe that there is something like a standard microcode machines and CPUs are just different microcode files for such", I was thinking more along the lines of second sources and derivatives such as these ones for the Z80. That is whether the microinstructions would be the same between them (some standard Z80 microinstruction set). But it sounds like the answer is no. — Jet Blue, Jun 06 '18 at 15:54
For the difference between a PLA and a ROM: https://electronics.stackexchange.com/q/5825/3651 . The chief distinction I am thinking of here is that a PLA has fuses that are blown to program the device. A mask programmed ROM has no fuses. — Wayne Conrad, Jun 06 '18 at 16:03
@JetBlue Oh, I see. No, second sourcing was usually done by exchanging not only licences and schematics, but also all information down to masks. A second source is usually meant to produce exactly the same chip, so customers have a fallback in case the primary source has trouble delivering. I'll add this to the answer. Ok? — Raffzahn, Jun 06 '18 at 17:18
@WayneConrad: The term "PLA" is also used for mask-programmed devices that have a grid of selectively-connected transistors and (for NMOS devices) allow each output to go high only when all of the associated inputs are low. Input signals may sometimes have two rows associated with them, one of which would receive an inverted form of the other, but what's important is that the circuit is laid out as a grid. — supercat, Jun 06 '18 at 18:55
@supercat That makes sense. I learned something today. Thanks! — Wayne Conrad, Jun 06 '18 at 18:58
@WayneConrad: I think the term goes back to days when a PLA would be a circuit board with a grid of locations which would be "programmed" by soldering in transistors or diodes where required. — supercat, Jun 06 '18 at 19:03
"The cycle counter starts at 0 and shifts (!) through the maximum 7 states. " There are 8 states though. There is at least one undocumented instruction that takes as many as eight cycles. — Omar and Lorraine, Jun 13 '18 at 09:39
@Wilson: I read somewhere else that read-modify-write instructions use some other state logic. Once the main state machine reaches the point where it has a known-good memory address, the next transition to a different state will be to the "fetch next instruction" state, even though depending upon the instruction the CPU may spend zero, one, or three cycles (or arbitrarily longer if READY is low) in the state where it has a known good memory address. — supercat, Jun 20 '18 at 20:09
@Wilson: Even if one only looks at trying to support DEC ZP, DEC ZP,X, DEC ABS, and DEC ABS,X, the logic required to map a cycle number to ALU inputs would end up with a lot of duplication. Cycle #4 of DEC ZP, for example, needs to compute (fetchedValue-1) but cycle #4 of DEC ABS,X needs to compute (fetchedValue+1) for use in case a page crossing occurred. Easier to have separate logic which sits in "memory computation complete" mode for 3 cycles with the same ALU inputs each cycle, and writes the ALU output to memory on the third of those cycles. — supercat, Jun 20 '18 at 20:23
@Wilson: Thinking about it, I suspect the "kill" opcodes stall the processor because one bit pattern is used to decide whether to wait in the "memory address complete" state until the "time in read-modify-write-with-memory-complete" state counter indicates the third state, but a different bit pattern is used to trigger progression through the latter state. — supercat, Jun 20 '18 at 20:26

Omar and Lorraine · Answer 2 · 2018-06-06T06:15:57.323

There are retro computers that have readable and writeable microcode, but not the ones you mentioned in your question. And the ability to change the microcode was extremely rare in the kinds of CPUs you are thinking about.

I'll describe the way the microcode worked in the 6502, or the Decode ROM as it's usually called there (same concept, different name). It's not going to be an accurate description of 6502 internals since that's not what you're asking about, but just to give you an idea of how it could work. When the 6502 reads an instruction, it's kept in a register. There is also a three-bit counter which counts from 0 to 7, increasing by one with each clock cycle, and resetting each time the 6502 starts a new instruction. That register and the counter are both inputs to a huge ROM. The outputs from that ROM are signals like

"put the accumulator on that bus"
"get the ALU to add two numbers"
"get the output from the ALU and save it in X"
"fetch the next instruction and reset the counter"

Any number of those signals might fire each time the counter increments. In the case of a 6502, these outputs mostly go straight to the random control logic, which takes these signals, does some random stuff I don't understand, and apparently controls various other parts of the chip. If you are interested in how this might work, I'll refer you to the Visual6502 project. It's a reimplementation of every single transistor of the 6502.
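In the same simplified spirit, the lookup described above can be modelled as a table indexed by the instruction register and the three-bit counter. This is only a toy sketch: the opcode 0x01 and the ROM contents are made up for illustration, and the signal names are just the examples listed above.

```python
END = "fetch the next instruction and reset the counter"

# Hypothetical ROM contents: (instruction register, counter) -> signals that fire
DECODE_ROM = {
    (0x01, 0): ["put the accumulator on that bus"],
    (0x01, 1): ["get the ALU to add two numbers"],
    (0x01, 2): ["get the output from the ALU and save it in X", END],
}


def run_instruction(opcode):
    """Step the three-bit counter and collect the signals fired on each cycle."""
    fired = []
    for counter in range(8):  # a three-bit counter covers 0..7
        signals = DECODE_ROM.get((opcode, counter), [])
        fired.append(signals)
        if END in signals:    # this signal resets the counter for the next fetch
            break
    return fired


if __name__ == "__main__":
    for cycle, signals in enumerate(run_instruction(0x01)):
        print(cycle, signals)
```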

Or was the microcode custom written by each manufacturer of the processor (such that for example an ADD instruction might be slightly different between manufacturers)?

No, the microcode is part of the design. If the design was licensed and produced by someone else, then the microcode will usually be identical. Of course, the design might be extended to include new instructions; then microcode will be added to allow that to happen. It has also happened that some chips have been microcoded to emulate some other, completely different design.

As for the PDP-11/60, certain configurations had writeable microcode. The microcode listings are only a quick google away. And as for the PDP-10, some had a writeable control store (control store is yet another word for the same thing).

The Prime 300 I used in uni had a microcode capability. Took forever to compile. — cup, Jun 05 '18 at 22:23
LSI-11/03 (or in a box PDP-11/03) also had an option for Writable Control Store, using the socket designed for the optional EIS/FIS microcode chip but instead wiring it to a second board (also quad height) full of RAM. This is similarly on bitsavers under 1103, EK-KUV11-TM. — dave_thompson_085, Jun 26 '18 at 08:03

Brian H · Answer 3 · 2018-06-05T19:42:24.970

The canonical examples for early microcoded CISC microprocessors are probably the Intel 8086 and the Motorola 68000. Of the two, the 8086 is the simpler (29K transistors vs. 70K in the 68000). But both of these CISC processors will present challenges in terms of their complexity and in terms of finding resources about the proprietary internal microcodes.

Somewhat surprisingly, the RISC ARM1 microprocessor made use of microcode. Such an approach was normally eschewed for a RISC CPU, but it was used in the ARM1 much as it was used in the MOS 6502. So, as in the 6502 case, its use is limited and thus more approachable as a learning aid. And there is this article that looks like a pretty good jumping-off point.

Excerpt from linked article:

The ARM1's microcode is an order of magnitude smaller than other microcoded processors. The ARM1's microcode has a 42×36 microcode, for 1512 bits in total. The 8086 used a 504×21 microcode (over 10,000 bits) while the 68000 has a 544×17 microcode and 366×68 nanocode (over 34,000 bits).
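The totals in the excerpt follow directly from words times bits per word. A small sketch that recomputes them, using the dimensions exactly as quoted above (they are not independently verified here):

```python
# Sizes as quoted in the excerpt: (words, bits per word) for each ROM
ROMS = {
    "ARM1": [(42, 36)],
    "8086": [(504, 21)],
    "68000": [(544, 17), (366, 68)],  # microcode plus nanocode
}


def total_bits(roms):
    """Sum words * width over all ROMs of one processor."""
    return sum(words * width for words, width in roms)


SIZES = {cpu: total_bits(roms) for cpu, roms in ROMS.items()}

if __name__ == "__main__":
    for cpu, bits in SIZES.items():
        print(f"{cpu}: {bits} bits")
```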

edited Jun 05 '18 at 19:42

answered Jun 05 '18 at 19:28

Brian H


For gory details, the patent https://patents.google.com/patent/US4325121A describes the MC68000 microcode. – Grabul Jun 06 '18 at 00:13
ARM is not as RISCy as MIPS or SPARC. Modern ARM32 is really not a RISC, although it is a load/store machine with fixed-width instructions. The program-counter is exposed as an architectural register (which you can write to with any instruction to branch). It even has load and store-multiple instructions that take a bitmap of which registers to store, and complex addressing modes including post-increment and pre-decrement (writing back the AGU result to the register). push {r4,r5,r6,lr} is an alias for stmdb sp!, {r4,r5,r6,lr} http://www.davespace.co.uk/arm/introduction-to-arm/stack.html – Peter Cordes Jun 06 '18 at 05:00
Anyway, push and pop are microcoded even on modern ARM implementations, because they store a variable number of words depending on how many bits are set in the bitmap. And you can pop into pc to return along with restoring the rest of the registers you pushed, by using pc where you pushed lr. In MIPS you need to load a saved LR back into a register and jr lr to return. ARM is a pretty nice machine-code format that can get a lot of work done per instruction byte, like x86, while still being relatively easy to decode (unlike x86). – Peter Cordes Jun 06 '18 at 05:04
@PeterCordes: Are they really "microcoded", or is there simply a set of flip flops that hold a bit for each register indicating whether it needs to be read or written, along with some flip flops indicating the address, whether it needs to be read or written, whether it should increment or decrement, and whether it needs to also update the CPU register, and hardware that ensures that whenever registers remain to be written that operation takes precedence over the next instruction fetch? – supercat Jun 07 '18 at 15:09
@supercat: I've read that it's microcoded, but I don't remember if that was a reliable source. On a modern ARM with OoO exec + register renaming, a dedicated hardware mechanism like that seems very unlikely. It might be plausible on a modern in-order ARM. – Peter Cordes Jun 07 '18 at 20:07
@PeterCordes: To my mind, describing something as "microcoded" would suggest that the control store is an addressable memory (typically ROM) of non-trivial size, with a non-trivial portion of the address being sequenced rather than produced directly from external stimuli. According to another post on this page, the Intel 8086 microcode ROM was something like 504 words of 21 bits each. An ARM might use microcode to support things like the debug interface, but for the main instruction set the number of states for all instructions would seem rather limited. – supercat Jun 07 '18 at 21:02
@supercat: I expect it's something like modern x86: most instructions decode directly, but a few complex instructions like push / pop indirect to microcode ROM. An ARM would have less microcode than x86, but a few things probably use it. (modern x86 decodes some "simple" instructions directly to more than 1 uop, but ARM is a load/store machine so probably doesn't need to do that. That's unrelated to the point I'm trying to make about indirection to a microcode ROM or not.) – Peter Cordes Jun 07 '18 at 21:07

score 10 · Answer 4 · answered Jun 06 '18 at 18:08

I was in college when the 8086 and 6502 came out, and took a digital electronics course at that time in which I designed a simple CPU using microprogramming techniques. From studying the 8086 and 6502, I could imagine that they were microcoded, but there was no way to tell for sure from the outside. Microcode, if it existed, was all burned into the chip and was not available to the outside.

I did a lot of programming on the PDP11. There were lots of different models of the PDP11. Most were implemented with microcode, each model had a different micro architecture, but they all implemented the same PDP-11 instruction set.

I read an article back in the day about a project at (I think) Carnegie Mellon where they took a PDP11/45, removed the microcode (which was in ROM) and replaced it with RAM into which they could download custom microcode. I thought that was incredibly cool at the time, and wished I could work on that project. I ended up going to Cornell for grad school, where the CS department had bought a couple of PDP11/60s, which we used to run Unix. The 11/60 had user-programmable microcode. I did a project where I added a couple of instructions to the machine, but we had to work on the machines during off hours because we couldn't run Unix at the same time we were hacking microcode. (I don't remember whether this was just departmental policy or if there was a technical reason.) Anyway, after the department got some Vaxes, the PDP-11s weren't used much, and someone converted them into terminal servers, and someone else took the inner loop of the terminal server code and converted it into microcode, so these PDP-11s ended up being able to handle ten times as many terminals as before.

Some years later, I read about a marketing project that DEC had done to analyze the need for custom microcode. The PDP-11/60 sold well, and the number one reason people gave for buying it (instead of a different model of PDP-11) was because of the user-writable microcode. However, only 5% of the PDP-11/60s had the feature used! Cornell was in the 5%.

Now, PDP-11s were not microprocessors, at least not the ones I am talking about. But this should give a sense of what the world was like back then. Microcode was something that people who implemented CPUs cared about, and normal programmers didn't imagine that they could touch it or that they should even care. A few companies let their customers change it, but most didn't.

Don't know about PDP-11's, but C-MU had PERQ workstations, circa 1980. A PERQ had either 4K or 8K words of 48-bit wide, writeable control store. It was loaded at boot time by an auxiliary Z-80, which afterward functioned as an I/O channel. Microcode was written for two different instruction sets that I know of; one for Pascal (based on the UCSD p-code system), and one for Spice Lisp. — Solomon Slow, Jun 07 '18 at 16:40

score 5 · Answer 5 · answered Jun 06 '18 at 21:51

There is at least one documented example of one hardware design being used as two different processors via customer-specific microcode: some very small, late implementations of IBM's S/370 (e.g. those sold as PC expansion cards) used a 68000 with a different microcode as the main CPU.

rackandboneman


Would you be able to provide more information about that S/370? – Omar and Lorraine Jun 13 '18 at 09:42
There was a system called the XT/370 that was based on an IBM/XT with an S/370 core on an ISA card. This is known to definitely use the modified 68000. – rackandboneman Jun 13 '18 at 09:48
How can it be, when the System/370 has many more registers than a m68k? – Omar and Lorraine Jun 13 '18 at 10:06
No idea :) maybe there are more registers in the 68k hardware than are normally used, maybe they are mapped to RAM? – rackandboneman Jun 14 '18 at 00:55
It's been a long time since I programmed either, but IIRC the 370 had sixteen general-purpose (more or less) registers, while the M68000 had eight general purpose and eight address registers, all registers being 32 bits. – David Thornley Jun 21 '18 at 17:46
@Wilson The XT/370 used a pair of modified 68000 processors. – Alex Hajnal Jul 12 '18 at 02:15
