The 6502 had special support for BCD arithmetic, because BCD was widely used in those days; this much it had in common with other CPUs.

But the 8080 and 6800 implemented this in the form of a 'decimal adjust after addition' (DAA) instruction. The idea is that to add two pairs of BCD digits, you first add the bytes using the ordinary binary add instruction, which also sets the half and full carry flags from the low four bits and all eight bits respectively. DAA then checks the result and the flags and performs a fixup to make the result correct for a BCD add.
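The add-then-adjust sequence described above can be sketched in Python. This is only an illustrative model, not a cycle-accurate emulation of the 8080 or 6800: the function names (`binary_add`, `decimal_adjust`, `bcd_add`, `to_bcd`) are made up for this example, it covers addition only, and it assumes both operands are valid packed BCD.

```python
def binary_add(a, b, carry_in=False):
    """Ordinary 8-bit binary add; returns (result, half_carry, carry)."""
    total = a + b + int(carry_in)
    # Half carry: carry out of the low four bits.
    half_carry = (a & 0x0F) + (b & 0x0F) + int(carry_in) > 0x0F
    return total & 0xFF, half_carry, total > 0xFF


def decimal_adjust(result, half_carry, carry):
    """DAA-style fixup of a binary sum of two packed-BCD bytes."""
    correction = 0
    # Low digit overflowed past 9, either visibly or via the half carry.
    if half_carry or (result & 0x0F) > 9:
        correction |= 0x06
    # High digit overflowed past 9, either visibly or via the full carry.
    if carry or result > 0x99:
        correction |= 0x60
        carry = True
    return (result + correction) & 0xFF, carry


def bcd_add(a, b, carry_in=False):
    """Two separate steps: binary add, then decimal adjust."""
    return decimal_adjust(*binary_add(a, b, carry_in))


def to_bcd(n):
    """Pack an integer 0..99 into one BCD byte (helper for experiments)."""
    return ((n // 10) << 4) | (n % 10)
```

For example, `bcd_add(0x38, 0x45)` first produces the binary sum 0x7D and then adjusts it to 0x83, matching 38 + 45 = 83 in decimal.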

Yet the 6502, designed by some of the inventors of the 6800, takes a different approach: it provides a BCD mode, a global flag which, when set, causes all addition and subtraction to give correct BCD results until it is cleared.
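To make the contrast concrete, here is a minimal Python sketch of the mode-flag idea. The `TinyCpu` class is hypothetical and models only an accumulator, a carry flag and a decimal flag, with method names borrowed from the 6502 mnemonics SED, CLD, CLC and ADC. It assumes valid packed-BCD operands in decimal mode and ignores the N, V and Z flags and subtraction, so it should not be read as an accurate 6502 emulation.

```python
class TinyCpu:
    """Hypothetical model: accumulator, carry flag and decimal-mode flag only."""

    def __init__(self):
        self.a = 0
        self.carry = False
        self.decimal = False

    def sed(self):
        self.decimal = True   # enter decimal mode

    def cld(self):
        self.decimal = False  # back to binary mode

    def clc(self):
        self.carry = False

    def adc(self, operand):
        """Add with carry; the same call gives binary or BCD results."""
        carry_in = int(self.carry)
        if not self.decimal:
            total = self.a + operand + carry_in
            self.a = total & 0xFF
            self.carry = total > 0xFF
            return
        # Decimal mode: correct each digit as part of the one add.
        low = (self.a & 0x0F) + (operand & 0x0F) + carry_in
        if low > 9:
            low += 6
        high = (self.a >> 4) + (operand >> 4) + (1 if low > 0x0F else 0)
        if high > 9:
            high += 6
        self.a = ((high << 4) | (low & 0x0F)) & 0xFF
        self.carry = high > 0x0F


cpu = TinyCpu()
cpu.sed()
cpu.clc()
cpu.a = 0x38
cpu.adc(0x45)  # accumulator now holds 0x83, no separate adjust step
```

The caller issues the same `adc` in both modes; the price is that the flag is global state that has to be set, cleared, and preserved across interrupts.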

This seems intuitively like a 'higher level' design, one that does more work for the programmer in order to more closely match intuition. That is particularly surprising because generally speaking the 6502 was a simpler processor that skimped on programmer comfort for the sake of lower cost.

Why did the 6502 designers choose this way of handling BCD? Was there some lesson learned after the 6800, some reason they figured this was the right way to do it after all?

user3840170
rwallace
  • 1
    Interesting question, I was recently thinking about posting a similar one. I don't know, but I imagine that the 6502 way is simpler to implement. Less state to preserve between addition and adjustment. But then you've got a horrible mode flag to save and restore when you're doing your interrupts and ting. – Omar and Lorraine Nov 09 '20 at 21:27
  • 2
    Another possibility, given the sparseness of the 6502's opcode map, would have been to fully decode the ROR and INC logic so that they'd only trigger on opcodes 011xxx10 and 111xxx10, respectively, and then use opcode bit 0 as a "BCD mode" flag. Alternatively, if all read-modify-write opcodes were fully decoded in such fashion, it would have been possible to shuffle around opcodes so that ORA/AND/EOR/CMP would have patterns 11smmms1 [where ss selects among those instructions] while 0scmmmd1 would be add or subtract (chosen by s), with c controlling carry suppression, and d controlling bcd. – supercat Nov 09 '20 at 21:44
  • 1
    @OmarL Seems to me you have exactly the same amount of state? The accumulator and carry flag exist in both kinds of CPU, so it's a choice between the half carry flag and the decimal mode flag, i.e. one extra bit of state either way. – rwallace Nov 09 '20 at 21:52
  • @supercat Right, that would be the third way to do it, have an 'add BCD' instruction. I would think that the obvious way; in particular, it would eliminate the need for any extra state. I wonder if there is some reason they didn't do it that way. – rwallace Nov 09 '20 at 21:54
  • 2
    @rwallace: I have a strong suspicion that decisions about what instructions should be included were made before decisions about encoding. There's so much empty space in the opcode map that there would have been room to have many more instructions with eight addressing modes, which could have eliminated the need to have specialized logic to have instructions that only support a subset of addressing modes. Of course, the fact that people were able to make chips and have them work at all in 1976 given the tools that existed is pretty impressive in any case. – supercat Nov 09 '20 at 23:18
  • 2
    @rwallace: Still, without in any way faulting the designers of the 6502, I wonder whether a 6502-ish chip where all opcodes whose bottom two bits weren't both clear were processed using the same eight addressing modes controlled by the next eight addressing bits (using the same logic for all of them), except that read-modify-write or store-immediate would be replaced with branches, would have required more or less silicon than the actual design where many instructions support only a subset of addressing modes? – supercat Nov 09 '20 at 23:27
  • 1
    @OmarL There were only seven flags as it was in the 6502 and that includes B which wasn't a real flag. All the flags including D fit into a single byte for the purposes of saving state. – JeremyP Nov 10 '20 at 11:04
  • 1
    @JeremyP: Maintaining the state of the D flag requires a latch formed by two transistors and two passive pull-up elements, plus another few transistors or so to allow the state to be stored or loaded by a PHP, BRK, PLP, or interrupt dispatch, as well as a few to set or clear the latch in response to SED or CLD. – supercat Nov 10 '20 at 20:52

1 Answer

From the patent, "Integrated circuit microprocessor with parallel binary adder having on-the-fly correction to provide decimal results" (US3991307A) by Peddle et al. in 1975:

It is desirable in microprocessors to improve the speed by reducing the number of cycles necessary to perform a given operation. However, it is also desirable to have a minimum number of components, and it is typically not justified to have both a binary and a decimal adder in the same microprocessor. The prior art has typically chosen the disadvantage of using two cycles to get a decimal sum or difference rather than to suffer the cost of a separate decimal adder.

This invention takes a new approach to the problem: it uses only a binary adder to get the decimal sum or difference of two numbers, but does it in a single cycle of the binary adder, thus significantly improving the speed of operation without suffering the cost of an additional decimal adder. In accordance with the invention, the binary sum of two bcd operands is corrected by suitable gating as it travels from the binary adder to another part of the microprocessors, e.g., the accumulator, so that it becomes the binary coded decimal sum or difference of the two operands by the time it reaches that other part of the microprocessor.

Cody Gray - on strike
Kelvin Sherlock
  • 8
    Aha, thanks! So that explains why they moved on from the approach of using a separate adjustment instruction, though I am still curious why a status flag rather than a separate 'add decimal' instruction. – rwallace Nov 09 '20 at 22:02
  • @rwallace The expenditure of multiple opcodes from a limited space (one for each addressing mode) was difficult to justify for a function that would be used comparatively rarely in most applications. I think this would also have complicated the microcode logic, which in most CPUs of the time was a fairly large cost, and had been carefully optimised in the 6502 for that reason. A mere two opcodes, to set and clear a status flag, was a much simpler solution that didn't add complexity to the CPU. – Chromatix Nov 10 '20 at 20:55
  • 1
    @Chromatix: The opcode space on the 6502 isn't particularly "limited". Because there are no opcodes at all of the form xxxxxx11, it would have been simple to specify that all opcodes of that form will behave as an add or subtract, using bits 2-4 to select the addressing mode, bit 5 to select carry suppression, bit 6 to select decimal mode, and bit 7 to select addition or subtraction. Doing that would have freed up opcodes x11xxx01 to place two more eight-addressing-mode instructions. – supercat Nov 11 '20 at 00:01
  • 2
    Unfortunately, the architecture was chosen to justify a patent and thereby protect the design, but did not have any real technical merit. The adder circuit ended up being the critical path in the processor, which made the 6502 slower than e.g. a z80 running at their maximum rated speed. In general, a 1MHz (typical at the time) 6502 is half as fast as a 4MHz z80 (again, typical use). A classic example of legal considerations impairing performance for all users. – d3jones Nov 11 '20 at 00:17
  • 2
    Kelvin, you should probably elaborate on why this would be important or desirable in the processor. That will also keep the answer from having no content besides copypasta + cite. – Harper - Reinstate Monica Nov 11 '20 at 04:03
  • @d3jones Now that is an interesting argument! Actually, a 1 MHz 6502 is really the same clock speed as a 4 MHz Z80 (1 MHz bus speed; the Z80 just makes visible the four internal T-states per bus cycle). The reason for the performance difference is that the Z80 has more registers and a richer instruction set, and can sometimes do 16-bit operations in one instruction. But that doesn't necessarily contradict what you are saying. The Z80 double clocks a 4-bit adder. Is it possible that the reason they were able to do this was that not supporting decimal directly in the adder simplifies the adder? – rwallace Nov 11 '20 at 06:11
  • 2
    @rwallace No, it's just that they have more clock cycles per instruction, during which the ALU is available for use. The smaller ALU is also necessary to keep costs down with the Z80's overall more complex design. – Chromatix Nov 11 '20 at 07:16
  • @d3jones: Would anything in the patent have precluded the use of opcode bit 0 instead of the D flag as a means of selecting binary or decimal mode? I think what's noteworthy is the means by which BCD arithmetic is performed, rather than the means via which it is requested. – supercat Oct 26 '21 at 22:17
  • The patent is really about performing the full BCD addition in a single clock cycle. Other processors of the era had a separate "decimal adjust" instruction, but that takes at least one more clock cycle. Using bit 0 instead of a flag really doesn't change that fact. – d3jones Oct 27 '21 at 23:31
  • @d3jones: Given that the only instructions which use BCD mode target the accumulator, I wonder how hard it would have been to make the 65C02 use two accumulator registers and two carry flags, one set of which would apply the direct result of the ALU and the second set of which would receive its value a cycle later, using dedicated cleanup circuitry, eliminating the extra cycle for BCD math. The earliest the 65C02 would need to care about the result of a BCD operation would be the third cycle of a PHA, STA, or the "third" cycle of a branch (the byte after the branch will be fetched... – supercat Jan 16 '22 at 21:41
  • ...in any case, whether the branch is taken or not, so the CPU wouldn't need to care about the flags produced by a preceding add or subtract until the start of the cycle where the next instruction was fetched). – supercat Jan 16 '22 at 21:43