54

Early single-chip silicon CPUs like the Zilog Z80 or MOS 6502 did not have a multiply instruction at all. Was this because the technology to implement it did not exist at the time, was it too expensive, or was there simply no need for such an instruction (like FPUs for Amiga power users: the majority of people could and did get by without one)?

dandan78
Bartek Malysz
  • http://www.cpushack.com/2017/12/19/chip-of-the-day-trw-mpy-16aj-making-multiplication-manageable/ shows what a multiplier looked like in the 70s. – user3528438 Sep 20 '20 at 17:54
  • 1
    Have a look at how many transistors could fit on the silicon over time, and you might have an idea why – Thorbjørn Ravn Andersen Sep 21 '20 at 12:06
  • It made doing things that are naturally multiplicative much more challenging. I did a limited OCR system (that involved rotation) on a Z80. We created our spatial filters so that we could use a table driven, simple set of shift+add multiplication routines. We had a table we generated off-line that described trigonometric data (basically sine between 0 and 45 deg). If I remember correctly, there were only 4 bits per point (maybe 8) and it was an index into a table that described how points in the rotated rectangle were to be translated using the algorithm we had chosen/come up with. – Flydog57 Sep 21 '20 at 21:12
  • 4
    Lacking a hardware multiply only really affects the speed of performing a task - it doesn't prevent the task being done; it can just be done with a software routine instead. There is a bit of a parallel with FPGAs: they didn't have hardware multiply in low-cost versions until 10-15 years ago, eg Xilinx Spartan 2 vs Spartan 3. – Kevin White Sep 21 '20 at 23:15
  • Transistors were expensive, and because of transistor sizes one could not pack very many of them together. You could do multiplication with bit shifting or by using tables (which is slower). Later, people wrote fast multiplication subroutines and called them at run time. Later still, transistor cost went down and transistors could be packed into a tinier area. And no, in those days you really didn't need a multiplication instruction of your own. It is handy to have, however, since you don't need to waste cycles on other instructions that give the same result. – Natural Number Guy Sep 22 '20 at 10:42
  • 3
    I recall Motorola 68000 had a 16-bit × 16-bit multiply instruction, but it took 70 cycles. For comparison, adding two 16-bit numbers took two cycles. Multiplication is a pretty complex operation. – liori Sep 22 '20 at 11:53
  • IIRC, the multiply instruction in ARMv6 CPUs in 2008 took a small but variable number of cycles to execute, based on how many bits were set in the operands. So it took I think 4 cycles for the multiply, but it could exit early. Everything else executed in a single cycle. (Note there was no integer divide instruction at all... multiplication is hard, division is harder.) – fadden Sep 22 '20 at 15:41
  • @KevinWhite "only really affects the speed of performing a task": up to a point, the point being when the operation /has/ to complete in a given number of mSec. – Mark Morgan Lloyd Sep 22 '20 at 19:45
  • @liori If one of the numbers in question was less than 35, wouldn't it have been faster to just do repeated addition? – nick012000 Sep 23 '20 at 03:34
  • @nick012000: if the chip had space for some logic that would recognize this specific case, then maaaybe… but the logic itself might take some time as well. And if it was multiplication by compile-time constant, the programmer would usually code it by hand with bit shifts and additions. – liori Sep 23 '20 at 06:21
  • You have shifting, so there's always some multiplication support. – Polluks Sep 23 '20 at 14:11

8 Answers

63

Fast multiplier circuits as used today take enormous amounts of logic, far beyond what would have been cost-effective (or perhaps even possible) in the mid-70s for an inexpensive microprocessor. Even slow multiplier circuits (as would appear later on chips like the 6809, 68000 or 8086) use a fair bit of logic and would have added considerably to the cost, perhaps forcing a multi-chip design with all the complications that entails.
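To see why even a "slow" multiply is slow, here is a minimal sketch (in C rather than 6502/Z80 assembly, purely for illustration) of the classic shift-and-add algorithm that both microcoded multipliers and hand-written software routines boil down to: one conditional add and one shift per bit of the multiplier.

```c
#include <stdint.h>

/* Classic shift-and-add multiplication: one conditional add and one shift
 * per bit of the multiplier. This is essentially the loop a 6502 or Z80
 * software routine (or a slow microcoded multiplier) has to iterate. */
uint16_t mul8x8(uint8_t a, uint8_t b)
{
    uint16_t product = 0;
    uint16_t addend  = a;          /* widened so it can be shifted left 8 times */

    for (int bit = 0; bit < 8; bit++) {
        if (b & 1)                 /* low bit of multiplier set? */
            product += addend;     /* then add the (shifted) multiplicand */
        addend <<= 1;              /* next partial product is twice as big */
        b >>= 1;                   /* examine the next multiplier bit */
    }
    return product;
}
```

Iterating that loop bit by bit is roughly why the 68000's 16 × 16 multiply mentioned in the comments costs tens of cycles while a plain add costs two.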

The first lines of microprocessors were primarily targeted at embedded control applications where rapid multiplication is rarely needed, so that was likely a factor too.

RETRAC
  • 10
    I think the last para explains something important for the OP: these weren't considered CPUs at the time; they weren't made to power microcomputers. They were microprocessors for "smart" devices. – Owen Reynolds Sep 21 '20 at 01:38
  • I am not convinced by the "not enough logic" argument. A "slow multiplier" can be implemented by iterative shifts and conditional adds. The amount of microcode and extra logic to do this is much less than handling interrupts. And yet, many chips chose to implement interrupt handling but not multiplication. Your last paragraph is more relevant: multiplication wasn't needed for most applications. – DrSheldon Sep 23 '20 at 17:26
  • @DrSheldon The designers didn't even budget for 16-bit arithmetic on the 6502, or a DSUB to go with DAD on the 8080. Numerous things seem missing from both designs in hindsight, and multiplication is one of them, but fairly far down the list. – RETRAC Sep 23 '20 at 21:04
  • 2
    @RETRAC: The 6502 does perform 16-bit address arithmetic, including logic to skip the high-byte computation when adding or subtracting zero. – supercat Sep 29 '20 at 20:19
  • @supercat: and yet they didn't have the transistor budget to connect most of that logic up to make it available in the instruction set. Which would come before a multiplier as a design priority in most cases, I figure. – RETRAC Sep 29 '20 at 20:38
  • @RETRAC: The 6502 doesn't really have any place to keep a computed high byte, other than the top eight bits of the address latch or program counter, but its ability to combine a 16-bit address computation with an access is for many purposes more useful than the 8080's approach of requiring a separate computation step. – supercat Sep 29 '20 at 20:45
57

You don't need it

Multiplying two arbitrary bytes together has limited practical value. (If you want to multiply by a constant you can hardcode the optimal sequence of instructions to do so.)
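As a hedged illustration of "hardcode the optimal sequence" (the constant 10 is just an example I picked): multiplying by a known constant collapses into a couple of shifts and adds.

```c
#include <stdint.h>

/* Multiplying by the constant 10 without a multiply instruction:
 * x * 10 = x*8 + x*2 = (x << 3) + (x << 1).
 * On a Z80 or 6502 this becomes a short, fixed run of shifts and adds. */
uint16_t times10(uint8_t x)
{
    uint16_t w = x;                /* widen first so the shifts don't overflow */
    return (uint16_t)((w << 3) + (w << 1));
}
```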

Obviously it would be nice to have but the expense isn't worth it.

In an arcade game... you basically never need to multiply a thing. To draw lines or circles, you can use Bresenham's algorithms. For nonlinear control problems, values from 0-255 are of pretty limited accuracy and you probably want floating-point anyway.
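For the line-drawing case, here is a sketch of Bresenham's line algorithm in C (plot() is a hypothetical pixel-setting callback, not anything from a particular game): the inner loop needs only additions, subtractions and comparisons, and the doubling of the error term is just a shift.

```c
#include <stdlib.h>

/* Bresenham's line algorithm: walks the pixels of a line using only
 * additions, subtractions and comparisons -- no multiply or divide.
 * plot() is a hypothetical routine that sets one pixel. */
void draw_line(int x0, int y0, int x1, int y1,
               void (*plot)(int x, int y))
{
    int dx =  abs(x1 - x0), sx = x0 < x1 ? 1 : -1;
    int dy = -abs(y1 - y0), sy = y0 < y1 ? 1 : -1;
    int err = dx + dy;             /* error term tracks distance from the ideal line */

    for (;;) {
        plot(x0, y0);
        if (x0 == x1 && y0 == y1)
            break;
        int e2 = 2 * err;          /* doubling is a shift, not a real multiply */
        if (e2 >= dy) { err += dy; x0 += sx; }   /* step in x */
        if (e2 <= dx) { err += dx; y0 += sy; }   /* step in y */
    }
}
```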

For financial calculations (or things like pocket calculators), you want to use BCD to avoid rounding errors. For spreadsheets or graphing programs, you need floating-point.
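A small C illustration of the rounding issue (the exact printed value depends on the platform): binary floating point cannot represent 0.10 exactly, whereas an integer count of cents, like a BCD digit string, stays exact.

```c
#include <stdio.h>

/* Why binary floating point is awkward for money: 0.10 has no exact
 * binary representation, so repeated addition drifts. Keeping an
 * integer count of cents (or using BCD) avoids the problem. */
int main(void)
{
    float dollars = 0.0f;
    long  cents   = 0;

    for (int i = 0; i < 1000; i++) {
        dollars += 0.10f;          /* accumulates rounding error */
        cents   += 10;             /* exact */
    }
    printf("float: %.6f\n", dollars);                        /* slightly off from 100.000000 */
    printf("cents: %ld.%02ld\n", cents / 100, cents % 100);  /* exactly 100.00 */
    return 0;
}
```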

In microcontrollers, sometimes lookup tables are actually better for "almost multiplication" problems because you can put fudge factors in them to deal with responses of the physical system—motors or whatever.
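A hedged sketch of that idea in C (the names and the gain/fudge values here are made up for illustration): the table is built once, offline or at startup, and the hot path is a single indexed load with the calibration baked in.

```c
#include <stdint.h>

/* Table-driven "almost multiplication": instead of computing
 * drive = gain * reading at run time, precompute a 256-entry table
 * that also folds in calibration fudge factors for the motor. */
static uint8_t drive_table[256];

void build_drive_table(void)
{
    for (int reading = 0; reading < 256; reading++) {
        int drive = (reading * 3) / 2;      /* nominal gain of 1.5 */
        if (reading < 16)
            drive += 4;                     /* fudge: overcome static friction */
        if (drive > 255)
            drive = 255;                    /* clamp to 8 bits */
        drive_table[reading] = (uint8_t)drive;
    }
}

uint8_t motor_drive(uint8_t reading)
{
    return drive_table[reading];            /* one indexed load, no multiply */
}
```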

Special mention goes to Elite, which managed to do real-time 3D graphics in 1984... now that could have really used multiply and divide instructions.

Artelius
  • 10
    +1 because I have been looking for years for how to pick which pixels overlap a perfect mathematical line, but did not know it had a formal name: Bresenham's algorithm. – DKNguyen Sep 21 '20 at 01:57
  • I remember reading through the Atari 800's source code to see how diagonal lines were drawn - how could they do that without a division to figure out the slope? I remember it being very clever, but now I can't even remember how it was done. – Marc Bernier Sep 21 '20 at 19:26
  • 4
    multiplication has its uses in arcade games. Atari wouldn't have developed their math box for their arcade machines unless they needed faster mathematics than the CPU could provide – scruss Sep 21 '20 at 23:02
  • FWIW, a run-sliced Bresenham implementation has a lower per-pixel cost, but requires an integer divide during setup. Elite for the Apple II took a half-step, using their division table to simplify the error update step. Stellar 7 for the Apple II uses multiply and divide for its line clipping as well as its vertex transforms. Many games "faked" 3D (e.g. Epoch), but the free-moving wireframe games had to do the math. – fadden Sep 22 '20 at 15:35
  • "For financial calculations (or things like pocket calculators), you want to use BCD to avoid rounding errors." - BCD have benefit of convertion to/from the string. I doubt anyone is using BCD nowadays for financial calculation as opposed to storing number of cents. – Maja Piechotka Sep 23 '20 at 06:20
  • Nowadays there even exist decimal floating-point numbers as per IEEE 754-2008 (https://en.wikipedia.org/wiki/Decimal_floating_point#IEEE_754-2008_encoding), however you are right that storing integer numbers in pure BCD representation is of little usefulness nowadays. – lvd Sep 23 '20 at 15:53
  • @lvd: A decimal floating-point format may have been defined, but that doesn't imply that anyone uses it or makes any effort to implement it efficiently, since decimal floating-point isn't really suitable for accounting purposes because of magnitude-dependent silent round-off. When using proper accounting types, given a=1.0/3.0;b=0.0;, the operation sequence b+=a; b+=10000.0; b-=10000.0; b-=a; should leave b equal to zero, but when using floating-point types--whether decimal or binary--it won't. – supercat Sep 23 '20 at 16:40
11

Slow multiplication implementations built from a conventional ALU and a microprogram had another problem: the instruction takes a lot of machine cycles to execute, so many that it becomes noticeable under heavy interrupt load. And for 8-bit microprocessors (the Atari 2600 being an exception), dealing with interrupts generated by the graphics subsystem logic was very relevant.

A useful article about one of the first widely available single-chip 16 × 16 multipliers: here.

Wheelmagister
  • I remember using first the TRW parts and then the ADSP1010 MAC chip. The big problem with the TRW devices was the heat output. They had a massive lump of aluminium stuck to the top of the package to help with the cooling. They were replaced in the end by general purpose programmable DSPs. – uɐɪ Sep 21 '20 at 12:43
  • This is a great point. I suspect it's why there weren't many sophisticated microcoded instructions on any of the 8-bit processors. (The Z80's LDIR depends only on register state, for example, so it can easily be interrupted.) Something like a multiply microcode routine could fit easily enough on chip, but the logic to support interrupting it would require full exception handling, which is far too much logic for those little designs to fit. – RETRAC Oct 07 '21 at 21:11
4

It seems to have been a purely arbitrary (or pragmatic) choice by the designers, one of the main factors being the size of the microcode ROM or PLA. As an example, take the Soviet K1801VM1 CPU. Its latest modification, the VM1G, does support multiplication. The only change is the microcode contents, not even the size of the microcode ROM or PLA. For reference, look at this reverse-engineered Verilog of the CPUs: https://github.com/1801BM1/cpu11/tree/master/vm1, specifically cpu11/vm1/hdl/wbc/rtl/vm1_plm.v (two microcode versions in two modules).

Another example, though not an early one, is the MC68HC05 embedded CPU. Despite being otherwise simplistic, it supports multiplication too.

lvd
  • How do you know that the вм1г's only change is the microcode? Are the microcode listings available? – Omar and Lorraine Sep 21 '20 at 12:51
  • Yes, the listings are available. I've updated the answer with the link to the reverse-engineered к1801вм1а and к1801вм1г. – lvd Sep 21 '20 at 13:03
  • Another answer pointed out interrupt latency as a reason not to provide a slow microcoded multiply. So it's not purely arbitrary, especially if your microcode can't abort the calculation on external interrupt. – Peter Cordes Sep 22 '20 at 18:20
  • 1
    It is NOT a reason, actually. Given a slow non-interruptable multiply, you are NOT obliged to use it, especially knowing you'd need fast interrupt response :) – lvd Sep 22 '20 at 21:34
4

From the beginning of electronic computation, this was a common design decision when building a computer using minimal circuitry. The Manchester Baby, operational in 1948, had no multiplication hardware. Later, low-end minicomputers such as the PDP-8 lacked hardware multiplication. For some, like the PDP-11/20, there was an add-on peripheral for it.

John Doty
3

Prior to microprogrammed/PLA-programmed processors, it took an enormous amount of control logic to manage a simple multiply (and forget about even trying floating point). Especially with early single-chip designs, there simply was not enough chip space for the control logic.

With the invention of microprogrammed processors it became more practical to include multiply/divide and even floating point operations.

Only when graphics, etc, created a demand (and chip density improved) did hardware-ifying the operations become economically attractive.

(I worked on a microprogrammed processor for RCA in the early 70s. We were basically duplicating the 360 instruction set on an early LSI-based system, and it was a bear working out the microprogram logic.)

Hot Licks
0

Cost tradeoff vs. time to compute faster.

That is why the 8086 was the way it was, and they had the 8087 if you wanted faster math.

The Z80 was just a newer, slightly better version of the 8080.

yukfoo
  • 2
    Welcome to Retrocomputing Stack Exchange! Can you elaborate a little on this answer? (It's okay as it is, but I get the feeling you know more about this topic.) – wizzwizz4 Sep 23 '20 at 06:34
  • The 8086 does have an integer multiply instruction. The 8087 is a floating-point unit, not an ALU. – user3840170 Oct 10 '20 at 07:30
0

I want to boil this answer down to engineering decisions (as are many choices made in engineering).

A good engineer works with limits. One limit at the time was the cost of producing the chip. More transistors lead to a larger chip, which leads to lower production yield and quickly increasing costs. In order to keep costs down you remove the "expensive" parts, such as multiplication and division circuits.

Did it work? Well, rumor has it that the 6502 was used in Apple computers instead of the 6800 because it had a lower cost. So, yes, lower cost worked.

If we, playing with numbers here, assume that a 6502 with multiplication would have cost three times as much, would it have been used? Probably not, is my guess.

ghellquist
  • A generic ‘well, it was cheaper’ answer is pretty useless. I’d wager it is still cheaper to produce a CPU without a multiplication instruction now, and yet most contemporary designs do feature it. A real answer would address what factors actually went into the trade-off besides just cost. – user3840170 Oct 10 '20 at 07:32
  • @user3840170 You are of course right. But price is often one of very most important limitations. I would really like to have a luxury BMW but cannot afford it. Fortunately there are a lot of less expensive cars to buy. These, less expensive, are made with less expensive materials and less expensive components -- in effect designed towards a lower target price. The same goes for most engineering projects. Possible exception might be the Apollo program, sending people to the Moon. – ghellquist Oct 10 '20 at 09:50