24

Microsoft produced the Z80 SoftCard for the Apple II, enabling it to run CP/M and many Z80 binary programs.

This seems like an unthinkable achievement (at least to someone who had an Apple IIe but was only a child at the time).

Can you imagine, in the 1990s, plugging a Motorola 68020 card into an Intel 286 PC and running Mac programs? (Or vice versa: plugging an Intel card into a Mac and running Windows programs on that hardware.)

How can you design a computer system so that you can plug in a card, share the video RAM, and coordinate which binary instructions go to one processor and which to the other? How do you coordinate the I/O?

I've just finished reading Charles Petzold's book Code: The Hidden Language of Computer Hardware and Software. In it, Petzold explains building relays into gates, gates into logic components, and logic components into computing machines.

Having read this book, I can imagine building a CPU out of logic gates and logic components, but I can't imagine how you would switch control from one processor to another.

My question is: how did the Apple II forward binary instructions to the Z80 SoftCard running CP/M?

hawkeye
  • 6

    "(Or vice versa - plugging an Intel card into a Mac and running Windows programs on the hardware.)" That actually exists; here you can see an example.

    – ACGuy Jan 09 '17 at 11:16
  • 1
    I suspect this is too broad. A large number of computers at the time had co-processors, each taking over the hardware in its own way. Instructions weren't "forwarded" to the co-processor. Although, there were processors like the Transputer which were specifically designed to share instructions amongst themselves. – Chenmunka Jan 09 '17 at 11:28
  • 3
    I still have an IBM compatible 80486 here that has a Motorola 68040 CPU card in one of its slots and "Hardware-emulates" a Sinclair QL. So: "Yes, I can imagine that" - The card is dated 1993 – tofro Jan 09 '17 at 12:29
  • Thanks @Chenmunka - can you explain the nature of the cutover? – hawkeye Jan 09 '17 at 13:05
  • 3
    There was also an "IBM" card for the Amiga 2000 that allowed you to run DOS/Windows 3.11 apps. – cbmeeks Jan 09 '17 at 14:50
  • 2
    This seems reasonably on-topic and specific enough to attract good answers. A full discussion of the different ways a backplane or system can host multiple guest CPUs is out of scope, of course, but certainly the specifics of this card in this system at this time can be sufficiently hand-waved and the rest left up as an exercise for the reader. –  Jan 09 '17 at 16:57
  • Extending @cbmeeks a bit- The Amiga 2000, 3000 and 4000 could have an "IBM Bridgeboard" installed (A2088, A2286 and A2386) which would let you run IBM software (incl. Windows) on an Amiga. Effectively a PC on a card, if you didn't install a dedicated VGA card it would end up simply creating a window on the Amiga Workbench that was a CGA display. All I/O peripherals could be shared by the Amiga, though the mentioned models above did have some ISA slots so you could install actual PC hardware. Still, the machines were able to integrate to some degree. It worked well. – bjb Jan 09 '17 at 17:33
  • For the BBC Micro (another 6502-based machine) there was an expansion called the Z80 Second Processor which allowed CP/M to run, among other things. This was in the mid-1980s. – psmears Jan 09 '17 at 18:16
  • They've made all sorts of CPU cards. I've used a PC with an IBM mainframe (ESA/390) on a card and another PC with a Unisys mainframe (A series) on a card. –  Jan 09 '17 at 21:00
  • Also, the Commodore 128 had both a MOS 8502 (Used to run CBM Basic 7.0 or GEOS) and a Z80 CPU (that could be used for CP/M) on the same motherboard! – PhasedOut Jan 10 '17 at 18:38
  • Stellation also made a 68008 softcard for the Apple II that used a similar technique, alternate cycle DMA to allow the 68k to use the Apple II memory. – hotpaw2 Jun 22 '17 at 16:56

4 Answers

40

They both shared the same memory, so it didn't really "forward" instructions. The Z80 card stopped the 6502 from running using the DMA signals, and the system swapped between the two by writing to $CN00, where N is the slot number.

Since the memory was shared, the Z80 stuffed some values (A, X, Y, P) into the 6502 zero page ($F045 and up from the Z80 side), stored the address it wanted to call at $F3D0 ($03D0 from the 6502 side), and wrote a byte to the SoftCard.

To get back again the routine at $3C0 was called from the 6502.

Full gory details are in this copy of the Microsoft manual: http://apple2online.com/web_documents/microsoft_softcard_-_software_and_hardware_details.pdf which also contains the schematics for the card (and it's all TTL, so in theory fairly tractable).

PeterI
  • 9
    Yep. The thing that I think is key is that unlike on modern systems where the expansion bus is some kind of complex point-to-point message carrier, the bus on a machine like the Apple II is basically just most of the CPU control lines pulled out to some headers. A card has the same access to memory and peripherals as the CPU does, and can, in effect, take over from the CPU. – hobbs Jan 09 '17 at 16:46
  • 3
    On the Apple ][ the bus is a little better than that as it provides some decode help for cards but basically until PCI most buses were pretty much buffered versions of the processor signals (or backward compatible versions). – PeterI Jan 09 '17 at 17:57
  • 1
    Actually now I've had time to re-read the whole of the PDF the explanation of how the 6502/Z80 memory cycles work is pretty clear. – PeterI Jan 09 '17 at 21:09
14

TL;DR: this longish answer addresses the "mystique" of the question, i.e., the sense of wonder about how this could be possible, rather than the actual workings of the specific components.

The gory details have been given in other answers, but here is a broader outlook on the issue:

Remember that at that time, computers (certainly home computers) were, compared to today, very simple things. For example, in the days of 8-bit computers like the Atari 800XL, C64, and so on, it was not unheard of for a kid to own a big book that contained not only the complete schematic of the whole computer, but in such great detail that you could literally "see" and eventually comprehend every single detail, right down to its deepest depths.

There were many discrete parts (e.g., many logic chips of the 74xx family) which in themselves were very easy to understand. The most complex items were the CPU, the separate video chip, and the separate sound chip, but each of these was of very finite complexity. You could get a fully intuitive understanding of any of them; the complete functional and internal description of each would fit into a not-too-thick physical paper book.

In that era, if you plugged a different processor (card) into a computer and had documentation for what it did, you could readily trace the electrical connections and see where everything took place. As another example, you could retrofit one of those 8-bit computers with an enormous amount of RAM (256 KB, for an unbelievable total of 320 KB, albeit with heavy paging involved); here too, the actual mechanism was right in front of your eyes, in plain sight, even though those computers didn't come from the factory with any provision for additional RAM. One would solder the RAM expansion directly to the CPU pins (or PCB traces).

This is very different from today, where individual systems-on-a-chip are very much incomprehensible, and even a complete description of their external interface, if printed out on paper, would fill not a single book but whole libraries.

Finally, back then there was no OS to speak of; while these machines already had something vaguely akin to a BIOS, using it was almost optional (when programming at the assembler/machine-code level, which was very much achievable by kids teaching themselves from paper books). There was no user/kernel-space distinction, no memory protection, no virtualization, nothing of that kind. Even in BASIC, it would not take long before you used PEEK and POKE to manipulate all of RAM, and especially the control registers, the video chip, etc., directly and freely.

Reading this book means I can imagine building a CPU out of logic gates and logic components, but can't imagine how you would switch control from one to another.

But this is what happens every day when multiple MCUs communicate. For example, even on arguably the simplest bus around, the I2C bus, which uses only two wires, there is a very well-defined protocol for who gets to talk when, which even allows multiple masters at the same time.

Scale that up to your question, and it is a simple matter to craft a protocol where one of the CPUs more or less "shuts up" and waits until it is signaled to pick up work again. It could be as simple as each CPU having an "enable" pin, plus some simple discrete logic, probably involving a flip-flop, ensuring that only one CPU is enabled at any one time. There is certainly no magic about this; and yes, if by chance both CPUs were active at the same time, things would go wrong quickly. There was certainly no true multiprocessing in home computers back then.
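The "one flip-flop, two enable pins" idea above can be modeled in a few lines. This is purely illustrative logic, not any specific card's circuit:

```python
# Toy model of flip-flop bus arbitration: a single stored bit drives the
# two CPUs' enable lines, so exactly one CPU can ever be enabled at once.

class BusArbiter:
    def __init__(self):
        self.flipflop = 0          # 0 -> CPU A enabled, 1 -> CPU B enabled

    @property
    def enables(self):
        """The two 'enable pin' outputs, both derived from the one flip-flop."""
        return (self.flipflop == 0, self.flipflop == 1)

    def handoff(self):
        """Toggle the flip-flop: the running CPU yields, the other wakes up."""
        self.flipflop ^= 1

arb = BusArbiter()
assert arb.enables == (True, False)   # exactly one CPU enabled...
arb.handoff()
assert arb.enables == (False, True)   # ...before and after each handoff
```

Because both enable outputs are decoded from the same bit, the "both CPUs active at once" failure mode is impossible by construction, which is exactly the point of using a flip-flop here.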

AnoE
4

This is a very broad question, so a correspondingly broad answer would be simply that each processor treats the other processor's instructions as binary data, without trying to interpret them in any way. On the Apple II, the 6502 simply copies this data from the disk into the Z-80's memory, and then the Z-80 can execute it.

Shared I/O is a little more complicated. Normally, there would be some way for the two processors to communicate with each other, often using that same shared memory in conjunction with some mutual "mailbox" interrupts. Special code running on the "host" processor (the 6502) would emulate the I/O APIs that the application software running on the "guest" processor (the Z-80) is expecting to see. The lowest-level I/O driver code on the guest processor would also be replaced with custom code to support this interface.
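The mailbox scheme described above can be sketched as a small simulation. The memory layout, offsets, and command code here are invented for illustration; a real system would raise an interrupt rather than poll, and would marshal more state:

```python
# Sketch of shared-memory "mailbox" I/O: the guest CPU posts a request in
# a fixed region and sets a flag; the host CPU notices the flag, emulates
# the I/O, posts the result, and clears the flag.

RAM = bytearray(256)                        # the shared memory, simplified
MBOX_FLAG, MBOX_CMD, MBOX_ARG, MBOX_RESULT = 0, 1, 2, 3
CMD_READ_KEY = 1                            # hypothetical "read keyboard" request

def guest_request(cmd, arg=0):
    """Guest CPU (e.g. the Z-80): post a request for the host to service."""
    RAM[MBOX_CMD], RAM[MBOX_ARG] = cmd, arg
    RAM[MBOX_FLAG] = 1                      # "mailbox full"; would interrupt the host

def host_poll(keyboard_value):
    """Host CPU (e.g. the 6502): service one pending request, if any."""
    if RAM[MBOX_FLAG]:
        if RAM[MBOX_CMD] == CMD_READ_KEY:
            RAM[MBOX_RESULT] = keyboard_value   # emulate the I/O on the host side
        RAM[MBOX_FLAG] = 0                  # "mailbox empty": guest may read the result

guest_request(CMD_READ_KEY)
host_poll(keyboard_value=ord("Q"))          # host supplies the keystroke
assert RAM[MBOX_RESULT] == ord("Q")
```

The guest's low-level I/O driver is the only code that needs to know about this mailbox; everything above it sees ordinary-looking keyboard input.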

Dave Tweed
1

IIRC, the 6502 was actually 'off the bus' half of the time (it controlled the bus only when the clock was at one polarity, and ignored the bus when the clock was at the opposite polarity). This allowed the Z80 essentially full-speed access to the bus without diminishing the 6502's access; the processors essentially interleaved their access, by design.

AVRguru
  • 3
    Not quite: the 50/50 bus split was Z80 / video. The 6502 was held off the bus using the DMA signals, and the Z80 refresh cycle was used to keep the 6502's internal data latches alive (see the PDF I linked to for more details). – PeterI Jan 10 '17 at 15:34
  • To expand on @PeterI's comment, the unexpanded system design shared the bus 50/50 between the 6502 and the video. The system also relied on the video scanning to keep the dynamic memory refreshed, so the designers of the Z80 card couldn't interfere with that. Instead, the Z80 cycles replaced the 6502 cycles, and they cleverly used the Z80 memory refresh cycle that follows every opcode fetch -- a "free" idle cycle, if you will -- as an opportunity to let the 6502 run for one clock and refresh its internal dynamic logic. – Dave Tweed Jan 10 '17 at 16:53
  • @DaveTweed: Unless a Z80 card plugged into the 6502 socket, I don't see how it could stall the 6502's clock. I'd expect a CP/M card that plugged into a slot to use the READY line, which may be used to stall read cycles indefinitely provided the main CPU clock is kept running. – supercat Jun 08 '23 at 20:37
  • @supercat: I never said that anything was done to the 6502 clock. It was all done using the RDY line, keeping the 6502 from advancing its internal state while the Z80 was using the bus. – Dave Tweed Jun 09 '23 at 11:29
  • @DaveTweed: The RDY line can be used to stall the 6502 indefinitely. Some systems like the Atari 400/800 gate the actual clock to stall the 6502, which necessitates letting the 6502 run at least once every (IIRC) ten cycles to refresh its dynamic logic, but systems that use RDY to stall the CPU impose no such requirement. The 6502 might be allowed to run for the purpose of allowing a limited form of "parallel" processing, but I don't think the dynamic register refresh issue would be a factor unless the clock were stalled. – supercat Jun 09 '23 at 15:04
  • @DaveTweed: Further, allowing the 6502 to do anything useful would require a mechanism to stall the Z80 any time the 6502 performs a write cycle. If the 6502 tries to execute "INC $F0", and RDY holds it on the read from $F0, then once RDY is released the 6502 won't poll RDY again until the third following cycle. – supercat Jun 09 '23 at 15:08
  • @DaveTweed: If e.g. the Z80 card had 4K of onboard RAM for the Z80 address space $C000-$CFFF, and no means of accessing that part of the Apple's address space, letting the 6502 run once per Z80 M1 cycle would allow a 6502 loop like $00FA: BIT $C0xx / JMP $00FA" to be patched by the Z80 to operate a soft switch, but that seems a bit crude. – supercat Jun 09 '23 at 15:15
  • @supercat: OK, digging deeper (the link above no longer works, but I found a copy here. See "6502 Refresh" on page 2-33), it seems that the clock was involved. But the Apple II DMA hardware already had the logic to stop the clock, so the Z80 card did not need to access the CPU pins directly to do that. – Dave Tweed Jun 09 '23 at 16:11
  • @DaveTweed: I'd thought that DMA on the Apple II was expected to stall the CPU by having I/O cards assert RDY three cycles before they wanted to start performing DMA operations, which would allow continuous DMA operations of arbitrary length without stopping the CPU clock. I wonder why Apple didn't opt to support that approach? – supercat Jun 09 '23 at 17:44