Anyone up for decompiling some 8080 code for Kaleidoscope?

Question

I came across a series of blog posts about Kaleidoscope, a demo program for the Cromemco Dazzler graphics card. The series implements an emulator for the system so that it could run this code again, which you can see in the page. It was a four-part series, with the last entry explaining the actual workings of the code.

Unfortunately, the author stopped working on his blog before part 4 was posted. In part 3 he describes the basic layout in pseudo-code, leaving the key details as do_something. Being unfamiliar with either the 8080 or Dazzler, figuring that bit out is beyond me.

I'm wondering if anyone out there would be willing to take up the task of decompiling this into a complete pseudo-code description? The code is only ~90 lines long. I'd love to make a Swift+SpriteKit conversion.

I think it was this Dazzler demo that inspired Dan Silva to make the version that shipped with the Amiga 1000 on the EA Kaleidoscope demo disk. — Brian H, Oct 01 '22 at 20:46
@BrianH That, or the COLOR DEMO that came with Apple's 1979 DOS 3.2 Disk. In fact, I believe the programs on the disk were quite inspired by Cromemco's Dazzler Games pack, as other titles match as well. — Raffzahn, Oct 01 '22 at 22:27
BTW: The do_someting() is by far the most simple part. Just a bit of bit shifting to transform a 32x32 pixel address into a 9 bit memory address and a 1 bit upper/lower pixel selector. Much more intriguing is how he create the pseudo randomness of the waves. — Raffzahn, Oct 02 '22 at 01:09
IIRC I saw a version of this for ZX Spectrum (using attributes only) not sure anymore if it was BASIC or asm (but most likely BASIC) IIRC it was in some magazine like Bit, Zenit or Elektronika ... — Spektre, Oct 02 '22 at 09:11

Raffzahn · Accepted Answer · 2022-10-02T17:11:00.090

27

I had a look at the source and added a few comments (*1). Might not be the expected full high level abstract description, but should be still helpful as a first, tiny step to see its workings.

Doing this I think I can understand why the 4th instalment never happened: The program structure might be somewhat alien to today's programmers, expecting neat initialization and complex but clean math, supported by verbose handling of data elements :))

This is none of that. It's short, highly intervened, extreme compact and down to the bare minimum - can't imagine any compiler getting that kind of code when programmed without already implying these optimizations. It may take a second or third reading to see the beauty.

*1 - Bloated that beautiful 90 lines up to 255. Sorry. But it's at least a nice binary FFh complementing the code length of 7Fh :)

edited Oct 02 '22 at 17:11

answered Oct 02 '22 at 03:42

Raffzahn

222,541
22
631
918

I had a look at the code about half a day ago and saw right away the first part of what you've added about the ports and the board setup. I also noticed that on initial entry to the code, it was using registers (and a carry bit, as well) that were not initialized but were used in the looping process. So I also assumed that this was how it was "randomized" to start. But you did a very nice job with the rest, which I didn't take time to study yet. +1, of course! – jonk Oct 02 '22 at 07:31
1

nice comments +1 I managed to make simple C++ version from your code version have added the code as answer ... – Spektre Oct 02 '22 at 09:04
1

Nice work indeed. Line 15 should be "VID@BF = 200H", though, shouldn't it? – Michael Graf Oct 02 '22 at 09:37
1

@MichaelGraf you're right, that one slipped. Sorry. It was like 6 am when I finished that "oh lets just give it a short look" phase, so I expect a few more details to come up. Regarding structured programming (beside that I do structured assembly since about the same time that programs is dated), the code does quite well adhere to structured programming rules. There are no wild jumps. Spectre's C quite linear adaption proving that point. The comment is more about the way how intervened its data elements are. Anyone having to implement that would use way more variables and interaction. – Raffzahn Oct 02 '22 at 12:39
@jonk: On their technique of using uninitialized registers to get a "random" starting state: was that actually effective on an 8080? Even if the contents are "undefined" per spec, on a lot of machines it's more that they are predictable but undocumented. Especially from one reset to the next on the same piece of hardware. So would this really have given you a different pattern every time you ran it? – Nate Eldredge Oct 02 '22 at 16:18
@NateEldredge You may have to take into account that back then RESET wasn't just something that happened once a day (or less), usually as part of a power up, but was a pretty normal operation procedure. Thus, pressing reset might end up with the last register state when the button got pressed - which can be pretty random, even without incooperating the power up behaviour of that CPU. – Raffzahn Oct 02 '22 at 16:24
2

@Raffzahn: Yeah, I get that you would RESET all the time, e.g. just to get a fresh kaleidoscope pattern. But that's a good point about the registers possibly retaining their previous contents on a "warm" reset. I was assuming a "cold" reset where you would tend to expect something fairly consistent, unless the designers deliberately fed them with some kind of noise. – Nate Eldredge Oct 02 '22 at 16:35
1

@NateEldredge The 8080 was NMOS and simple. I'm pretty sure I read somewhere (back in the 1970's) that the registers came up random and were also unchanged over a RESET event. I'd have to dig up my original book sets to be sure. (My wife wants me to dump them, very much, and it's possible that I have done so. Must check.) – jonk Oct 02 '22 at 20:06
1

@Raffzahn - just following up now. Multiple post, my apologies... – Maury Markowitz Oct 03 '22 at 18:17
So it appears the basic logic is: inner loop 64 times, on alternate passes draw black or the current color (based on the last bit of the decremented H), each time changing X and Y. At the end of the inner loop at 64, update X, Y and the mask, decrement the color, and start the inner loop again. Is that basically it? – Maury Markowitz Oct 03 '22 at 18:18
If so, one thing I don't get: E is set to one-half of the color in H on 120, but on 183 tested against 0 to see if it's looped 64 times. On line 188 it's reset to 64 if it has. But I'm not sure I understand how that value is surviving being reset on line 120 to some other value as part of H? – Maury Markowitz Oct 03 '22 at 18:28
@MauryMarkowitz For the first comment: Yes, that's the way it is. Second one: Please note that the all registers (except A) are saved in 110 (see comment) and restored in 179.. It might help to remember that the 8080 treats the stack only as 16 bit words, i.e. register pairs, so PUSH D pushed D and E (likewise B does BC and H does HL). So these 3 push free up all registers to be used during drawing, and restored later. Zilog changed the mnemonic to be PUSH DE etc. – Raffzahn Oct 03 '22 at 23:10
@Raffzahn - Ahhhh, that double-push is not something that is obvious! YLSED. Good on Zilog! – Maury Markowitz Oct 04 '22 at 14:42

Spektre · Answer 2 · 2022-10-02T15:50:17.500

25

Thanks to @Raffzahn commented code (in his answer) I was able to port the code into simple C++ code:

//---------------------------------------------------------------------------
const int VRAM_xs=64,VRAM_ys=64;// resolution
BYTE VRAM[VRAM_ys][VRAM_xs];    // video ram
const DWORD pal[16]=            // color palette (VCL pf32bit format)
    {
    //00BBGGRR
    0x00000000,
    0x00000080,
    0x00008000,
    0x00008080,
    0x00800000,
    0x00800080,
    0x00808000,
    0x00808080,
    0x00000000,
    0x000000FF,
    0x0000FF00,
    0x0000FFFF,
    0x00FF0000,
    0x00FF00FF,
    0x00FFFF00,
    0x00FFFFFF,
    };
BYTE x=0,y=0,m=0;               // Kaleidoscope state
void Kaleidoscope()
    {
    const int xc=VRAM_xs>>1,yc=VRAM_ys>>1;  // center of screen for mirroring
    BYTE c,xx,yy,cc;
    // render
    for (c=0;c<32;c++)
        {
        // update position
        yy=y;
        y+=(x >>2)&m;
        x-=(yy>>2)&m;
        // render 4x mirrored pixels
        xx=x>>3;
        yy=y>>3;
        if (c&1) cc=c>>1; else cc=0;
        VRAM[yc-yy][xc-xx]=cc;
        VRAM[yc-yy][xc+xx]=cc;
        VRAM[yc+yy][xc-xx]=cc;
        VRAM[yc+yy][xc+xx]=cc;
        }
    x++; y++; m++;
    }
//---------------------------------------------------------------------------

You just call the Kaleidoscope() on each frame and then just visualize the content of VRAM[][] in gfx api used... In VCL I did it like this:

void TMain::draw()
    {
    if (!_redraw) return;
// clear buffer
bmp-&gt;Canvas-&gt;Brush-&gt;Color=clBlack;
bmp-&gt;Canvas-&gt;FillRect(TRect(0,0,xs,ys));

Kaleidoscope();

int x,y,xx,yy,xxx,yyy;
DWORD c;
for (yy=0,y=0;y&lt;VRAM_ys;y++,yy+=pixel_sz)
 for (xx=0,x=0;x&lt;VRAM_xs;x++,xx+=pixel_sz)
    {
    c=pal[VRAM[y][x]&amp;15];
    for (yyy=yy;yyy&lt;yy+pixel_sz;yyy++)
     for (xxx=xx;xxx&lt;xx+pixel_sz;xxx++)
      pyx[yyy][xxx]=c;
    }

// render backbuffer
Main-&gt;Canvas-&gt;Draw(0,0,bmp);
_redraw=false;
}

Where Main is app window and pyx[ys][xs] is direct pixel access to backbuffer bitmap bmp... Also BYTE,DWORD types are unsigned 8 and 32 bit integers so use whatever you have at disposal in case you do not have them or typedef them ...

Here preview:

PS I am not familiar with the Dazzler so The palette I created might be wrong and also I changed the iteration a bit (count also 0 as there where not as many black pixels as colored ones and the image tends to grow too much to my taste)...

edited Oct 02 '22 at 15:50

answered Oct 02 '22 at 08:57

Spektre

7,278
16
33

2

Looks fine to me (drawing sequence isn't the same, Colour palette is right (assuming it's RGB high to low)). The Blackness is really not that important, as the eye will find patterns anyway. Nice made. Thanks for taking the ugly job of C coding :)) Would be nice to add tat to the repository for completeness. – Raffzahn Oct 02 '22 at 12:30
@Raffzahn yes I did not do a 1:1 port for convenience (probably I differ in the storing/restoring state and next position calculation slightly so the "PRNG" behaves slightly differently), if you want You can add the code to your repositiory on your own I do not use GIT ... – Spektre Oct 02 '22 at 15:37
:)) Me neither. I just figured it would be a good way to handle it without adding a huge pile of source code to RC.SE. Thanks, will add your answer mostly verbatim for completeness. – Raffzahn Oct 02 '22 at 15:41
1

@Raffzahn I found out one difference... you probably got wrong comment ; New X = Last X - ((New Y >> 2) AND MASK) it should be also from Last Y once I repaired it in my code right after // update position comment the output is much nicer (less noisy) I updated both code and preview – Spektre Oct 02 '22 at 15:52
1

Maybe much nicer, but it's not what I see in that code. Calculation of New X starts with the content of A, which still contains New Y, shifted by two (l.101), Masked (l.103) and then subtracted from Last X (l.104..106) to be moved into B, which is X. That is, unless I missed anything (or simply suck at reading 8080. – Raffzahn Oct 02 '22 at 16:09
@Raffzahn it might be also a hidden bug in original code just like in this case Understand/Reverse simple (but good quality) TTS engine after repairing bug the sound quality improved in comparison to original code – Spektre Oct 02 '22 at 16:13
1

Not sure if I would call it a bug. The original code did work quite well without saving Last Y across the calculation of New Y. So I'd say yes, your code seems to be a useful improvement, but it doesn't reflect the original algorithm. I guess Mr. Wang had multiple constrains here. Also, like any first, having a solution is infinite more valuable than having no solution :)) – Raffzahn Oct 02 '22 at 16:20
1

Well done! So just out of curiosity - with your standard compile options just as is - how many bytes now? – davidbak Oct 02 '22 at 20:51
@davidbak Interesting I did stop me from asking the same. In theory pure 8086 16 Bit code should be quite comparable in size :)) Needs to be just for that kaleidoscope() function - otherwise the whole 8080 address space might be not enough :))) Then again, it misses the whole nibble management, as here a 4 KiB RAM is used with one pixel per byte, not two as the Dazzler does. – Raffzahn Oct 02 '22 at 21:26
@Raffzahn - what is this theory you speak of? In theory binaries compiled by a C++ compiler and linked with the toolchain's linker should include only those routines needed ... in theory. And also in theory any libs or language system routines included should be also only be as large as needed and only depend on minimal amounts of other language system routines. And so on, transitively. In theory! However, in practice ... things are different. What's the size of his resulting executable file eh? (And I know you know this about C and especially C++ vs asm!!!) – davidbak Oct 02 '22 at 21:33
(You probably meant to consider just the size of the algorithm itself. But ... IDC about that! If you have to key this program in as bytes into some hex editor before you can run it ... that's the comparison to be made!) – davidbak Oct 02 '22 at 21:35
@davidbak We're both well aware about the bloat of HLL - and libraries especial -But there's also the difference of interface. handling some Window output is simply way more complex than the Dazzler. The theoretical side I was referring to was 8086 assembly vs 8080 assembly in an otherwise similar environment. I would think the 8086 would add only minor overhead, if not save some instructions. – Raffzahn Oct 02 '22 at 21:40
1

yes! so that's the question - here's a totally capable program < 175 bytes long. And now it is 45 years later - 45 years of Moore's law, Amdahl's law, etc. etc.!!! So how far have we gotten?? Surely that same graphics that was ~175 bytes in 1976 would be .... what? An order of magnitude better at least! Thus: only 17 bytes today? Perhaps fewer?? I just want to quantify the improvement! – davidbak Oct 02 '22 at 21:51
1

@davidbak Moore's law states that the number of transistors on a microchip doubles about every two years. More transistors = more memory = more bloat! – Bruce Abbott Oct 02 '22 at 23:24
@davidbak I expect maybe half in size of 8080 code on x86 asm (MS-DOS style com executable) with 320x200x256col VGA mode as x86 can do operation without the need to move the stuff to/from accumulator and that VGA mode does not even need dissect the color to nibbles... here similar app for comparison no signal demo – Spektre Oct 02 '22 at 23:57
@davidbak More capability doesn't mean shorter code. On the contrary, more possibilities means more entropy, which means longer code. – Radvylf Programs Oct 03 '22 at 03:05
I sorry to have wasted everyone's time - but on the other hand, maybe you guys should learn to recognize a joke .... – davidbak Oct 03 '22 at 03:52
3

@davidbak hopeless. we're nerds. – Raffzahn Oct 03 '22 at 07:51

score 1 · Answer 3 · answered Oct 04 '22 at 16:35

1

Ok, here is my Swift+SpriteKit version. I'd love it if someone could check the logic in the main updateKalidescope method.

Very annoying: in assembler, adding 200 to a register containing 200 is perfectly legal and you'll get the 144 that you want. The same appears true for the C++ code above? In Swift, you have to clamp the results, because using a UInt8 will get you an overflow. sigh

A curiosity: this is built using Apple's SpriteKit template, which uses SKShapeNode. This gave <1 fps! Changing it to SKSpriteNode gives >60. It seems SKShapeNode is re-drawing (or something) every sprite even if its unchanged.

It's currently updating 60 fps, does that match the Dazzler, or would it have been 30 (or infinity?)

answered Oct 04 '22 at 16:35

Maury Markowitz

19,803
1
47
138

1

Yeah, in C++ unsigned integral types (I'm assuming that's how BYTE is defined) are specifically allowed to overflow. (Not signed types though.) Assembler is of course a free-for-all. – davidbak Oct 04 '22 at 17:42
The Dazzler version is not synchronized to anything. This is early stuff, way before any modern OS or CPU's fast enough to create speed issues. With the given value ranges and a 2 MHz 8080 it simply adds up to a decent update rate. – Raffzahn Oct 04 '22 at 18:35
@Raffzahn - if you had to guess, how many loops per second? I set it to 120 in my code and that seems OK. – Maury Markowitz Oct 07 '22 at 13:43
@MauryMarkowitz Seriously, no idea. I did link a video on the Github page showing what seems to be a genuine system (but could also be a faster one). Otherwise one can assume that a 2 MHz 8080 runs as 3-400 kOP/s, so a back of the napkin calculation would take ca 150 instructions (50 main prog+4x25 subroutine) and 64 inner loops per pixel and yield 30-40 (quadruple) pixel set per second. So 120 might be rather high, more what a 4-6 MHz CPU would do :)) Anywhere between one inner loop per two frames (30/s) or one per frame (60/s) might be more appropriate. – Raffzahn Oct 07 '22 at 13:56
1

@davidbak - it turns out there IS a way to do this in Swift, but boy is it NOT obvious! If you want overflow, you add an ampersand - X &+= 1. Ugly, but less so that manually bumping the numbers. BTW, am I right in seeing two values for black in your color list? – Maury Markowitz Oct 07 '22 at 13:56
@MauryMarkowitz Thinking of it, it may in fact have been even lower than that as the Dazzler has no memory of it's own, it's a DMA device taking over the buss to fetch data to be displayed. At 2 KiB/Frame this might be round 130,000 DMA cycles per second, or about a good 1/4th of the memory bandwidth of a 2 MHz system so I'd say 30 full inner loops, or one run every other frame would be the best simulation fitting the asynchronous nature of Kaleidoscope to today's framework. Yes, may seem slow, but keep in mind, that's mind boggling fast for everyone not grown up with bleeping video games. – Raffzahn Oct 07 '22 at 14:08

Anyone up for decompiling some 8080 code for Kaleidoscope?

3 Answers3