26

When I learned BASIC on Elektronika BK, I got fascinated with the flood fill (PAINT) operator: how does it know to get to all nooks and crannies of the shape to fill? I've devised an algorithm, tried to write my own implementation of it, and soon realized that it requires a substantial amount of scratch memory. A fine grid would be a good example of a tall order.

The next thing I did was to write a very short program like this:

10 CLS
20 COLOR 1
30 FOR I=10 TO 200 STEP 2
40 LINE (5,I)-(205,I)
50 LINE(I,5)-(I,205)
60 NEXT I
70 PAINT (5,10),2,0

The grid is drawn with color 1; the paint operation uses a point on the grid and directs to paint with color 2 using color 0 (background) as the boundary.

And indeed, it hung and required power-cycling of the computer (there was no hardware reset button).

My question is, how do other implementations of the flood fill operator in BASIC on retro-computers with limited memory behave.

The absolute best would be to paint the 2-pixel-wide grid covering the whole screen correctly; presumably, this requires the most stack space.

The second best would be to detect the stack/queue overflow and to stop with a runtime error.

Here's another complex shape, a herringbone pattern, that causes PAINT to hang:

10 CLS
20 COLOR 1
30 FOR I=2 TO 200
40 FOR J=2 TO 200
50 K=I MOD 4
60 L=J MOD 4
70 IF (K=L AND K<2) OR K*L=6 THEN PSET (I,J)
80 NEXT J
90 NEXT I
99 PAINT (0,0)

Here we draw the pattern then ask to paint the background.


Findings so far:

  • Sinclair QL sidesteps the issue by implementing a simplified operation: the fill mode must be requested before drawing the outline, and If you have more than one pair of points in your shape's outline per horizontal scanline, divide the shape into less complex parts so that this condition is met.
  • CBM BASIC 7.0 on C128 fails with an "?OUT OF MEMORY" error if the shape is too complex.
Omar and Lorraine
  • 38,883
  • 14
  • 134
  • 274
Leo B.
  • 19,082
  • 5
  • 49
  • 141
  • 1
    Many Microsoft BASIC's supported the PAINT routine, including QBasic and Amiga Basic. – Brian H May 28 '17 at 04:00
  • @Brian Sure, that's where the Elektronika BASIC, mimicking MSX BASIC, got its PAINT from, but did PAINT in the original Microsoft BASICs work in a robust way for complex shapes? – Leo B. May 28 '17 at 05:09
  • The BASIC on the Commodore 16, Plus/4, and Commodore 128 all have a PAINT command. Also, the Super Expander (VIC-20) and Super Expander 64 (C64) and Simons BASIC (C64) each have a PAINT command. – Tim Locke May 28 '17 at 11:18
  • @TimLocke OK, and how well do they work on the example(s) in the question? – Leo B. May 28 '17 at 17:59
  • Look up the so called drunkard's walk algorithm.... – rackandboneman May 29 '17 at 09:15
  • 1
    "Flood Fill" is not without ambiguities - The Sinclair QL, for example, uses an entirely different shape fill algorithm (a scanline algorithm) that ignores (and thus, overwrites) "inner" pixels in a shape completely rather than "float around" and will just fill a solid color within the boundary lines of a shape. That works with a fixed-sized buffer without any danger of stack overflow, but has other limitations (might require you to sub-divide shapes in order to have them filled properly) – tofro May 29 '17 at 09:26
  • @leo-b On the Commodore Plus/4, the PAINT command has a mode where 0=paint an area defined by the color source selected; 1=paint an area defined by any non-background color source.

    For the first program, using 0 resulted in no painting. Using 1 resulted in the entire grid, including the holes, being painted, and then the rest of the background too.

    The second program, using 1, drew the box and then painted the area around it. Using 0, it painted everything.

    – Tim Locke May 29 '17 at 17:51
  • @Tim I don't get it. For the first program, are you able to cause PAINT to repaint the whole grid to a different color, leaving the holes intact? – Leo B. May 29 '17 at 19:18
  • @leo-b For whatever reason, it does not leave the holes intact. I think it should but have no idea why it doesn't. I used the coordinates you provided, although the Plus/4 is only 160x200 in multicolor mode so some of the lines went off the edge of the screen. – Tim Locke May 29 '17 at 19:44
  • CBM BASIC 7.0 on C128 fails the first example with an "?OUT OF MEMORY" error after filling about 80% of the grid. – Brian H May 30 '17 at 01:04
  • @Brian And that's how I would expect a robust implementation to behave! Electronika BK (with about 15.5K useful RAM) hangs after less than 10%. – Leo B. May 30 '17 at 02:13
  • @rackandboneman That's what I already knew. It cannot handle paths that are 1 pixel wide. – Leo B. May 30 '17 at 21:55
  • I remember one flood fill operation on - I think - the BBC micro, In certain circumstances, it could escape the visible screen area and flood fill the entire memory. Pretty much the opposite of robust. – JeremyP Apr 26 '18 at 10:20
  • Take a look at Sharp mz800, it has very fast PIANT implementation: https://youtu.be/FDPhsAtMBRM?t=110 – Frantisek Kossuth Apr 26 '18 at 11:01
  • @FrantisekKossuth That is respectable, certainly, but it benefits from a very good video-subsystem design (for an 8-bit micro). The custom VRAM interface chip eliminates a lot of bit-twiddling that the CPU normally has to do just to figure out which colours are where, and to put them right. It also decouples the VRAM from the main system bus, so the CPU isn't slowed down by the display refresh. If that's combined with an ordinary scanline-queue algorithm, good performance can be expected. But OP's question is: can it deal with complex shapes robustly? – Chromatix May 02 '18 at 02:12
  • @FrantisekKossuth ...and the answer is that it doesn't cope at all well with the herringbone. But it can deal with a fine grid, oddly enough. – Chromatix May 02 '18 at 05:47
  • The fill routine in the Atari OS Rom is particularly poor and not very useful - in fact it has no buffering (or possibly just one entry). It potentially could find some use in procedurally generated graphics for static pics and was used as such in a few graphical adventures and other programs. – Rybags May 05 '18 at 12:46
  • Outside the scope of the question, but I'll mention it for completeness: the Apple II program "Micro Painter" had a flood fill that someone once described as a "screen etcher". If you started it in the center of the screen, it would expand outward in all directions in a diamond shape, flowing around existing objects. The Apple II's quirky hi-res layout probably made that a bit more complicated than it was for other platforms. I've always wondered what data structure was used. – fadden May 06 '18 at 16:44
  • @fadden That sounds like the "pixel queue" algorithm, which is a relatively good one from a robustness point of view. If it has a big enough queue space, it should be able to handle filling any pattern. But since the ][ has a 6502, it might use a queue limited to 256 entries, which isn't enough for high resolutions (even within the restricted meaning of "high resolution" pertaining to 8-bit micros). – Chromatix May 06 '18 at 19:31

9 Answers9

15

I didn't quite manage to replicate your test, but came up with something for the Amstrad CPC6128. A screen of 40×25 lower case ‘u’ is quite a difficult fill, but the FILL command manages it well:

amstrad cpc fill

Here's the code to do this:

10 MODE 1
20 PEN 1
30 FOR i%=1 TO 25
40 PRINT STRING$(40,"u");
50 NEXT i%
60 LOCATE 1,1
70 MOVE 10,14
80 FILL 1

The animation is a little faster than realtime, but it really does take a few seconds to fill the screen.

scruss
  • 21,585
  • 1
  • 45
  • 113
  • I think that 40x25 is too coarse a grid to elicit a stack overflow on a 128 Kb computer. How many levels of GOSUB nesting does it allow (10 GOSUB 100; 100 PRINT I; 110 I=I+1; 120 GOSUB 100)? Elektronica displays an error at line 120 after printing 3459. – Leo B. May 29 '17 at 05:31
  • 1
    Locomotive BASIC gets to 83 before exiting with a Memory full error. The Amstrad CPCs were 8-bit Z80 machines. The extra memory was bank-switched in 16K pages, and most often used in CP/M+. I had difficulty getting your single-pixel grid to fill, that's why I came up with my own test. It's not 40×25, but 320×200 pixels – scruss May 29 '17 at 13:04
  • What kind of difficulty did you have with the single-pixel grid? Doesn't it prove that the algorithm is not robust? For the Scanline fill variation of the algorithm the resolution does not matter, it's the number of obstacles" per line and per column that affects the stack depth. – Leo B. May 29 '17 at 16:47
  • Note your example is not the worst case for a recursive flood fill: Filling a patterned area makes the recursion terminate early and saves stack space. Worst case for the algorithm is a large empty rectangle being filled from one corner. – tofro May 30 '17 at 08:13
  • @LeoB. it seems that the FILL command didn't like single-pixel widths. I'll try again alter, as I may have been doing it wrongly. To @tofro's comment, the CPC has no problem filling a blank 640×200 screen (MODE 2 : MOVE 0,0 : FILL 1) from the corner, so it might not be using a recursive algorithm – scruss May 30 '17 at 12:57
  • It may still be recursive, but using an optimisation for long runs in one direction (columns, by the look of it). The Acornsoft GXR is optimised for rows. – Chromatix Apr 26 '18 at 16:30
11

The classical flood fill algorithm is a recursive algorithm that, in its most simple form, can use an enormous amount of stack space

void fill (x, y, col) {
   plot (x, y, col)
   if (point (x + 1, y) != col then fill (x + 1, y, col);
   if (point (x, y + 1) != col then fill (x, y + 1, col);
   if (point (x - 1, y) != col then fill (x - 1, y, col);       
   if (point (x, y - 1) != col then fill (x, y - 1, col);
}

There are a number of strategies to limit the amount of space the fill algorithm will use, like having only working in a specific direction and limit the recursive part or using a list instead of the stack to avoid a stack overflow and be able to stop witth an "Out of memory error", and, there are alternative algorithms that aren't recursive at all.

The Sinclair QL, for example, uses such an alternative algorithm. This requires you to "bracket" shape drawing code to be filled with FILL 1 / FILL 0 pairs and fill the shape you have drawn in that bracket with the current ink color. BASIC will then set up a buffer of x-coordinate pairs per scanline in your shape and remember such pairs where you have drawn a "left" and "right" outer shape pixel. On the closing FILL 0bracket, the algorithm will run through the buffer and simply draw horizontal lines between the coordinates found in the buffer. This needs a fixed buffer space of

typedef struct fillBufferT {
   int x1, x2;
};

fillBufferT buffer [vertical_res];

with no chance of buffer overflow at all. The downside is, that this algorithm is only capable of storing one pair of horizontal coordinates per scanline - complex shapes that need more coordinate pairs need to be sub-divided by the programmer to be properly filled.

But the algorithm as such is entirely bullet-proof and cannot lead to a runtime error. It is also obviously much faster than the recursive flood-fill, but will always fill the entire shape, regardless of "what's been inside" before.

The limitation that the Sinclair QL FILL command was not able to fill arbitrary shapes was mentioned in the manual as follows:

The FILL algorithm stores a list of points to plot rather than actually plotting them. When the figure closes there are two points on the same horizontal line. These two points are connected by a line in the current INK colour and the process repeats. Fill must always be reselected before drawing a new figure to ensure that the buffer used to store the list of points is reset.

Warning

There is an implementation restriction on FILL. FILL must not be used for re-entrant shapes (i.e. a shape which is concave). Re-entrant shapes must be split into smaller shapes which are not re-entrant and each sub-shape filled independently.

tofro
  • 34,832
  • 4
  • 89
  • 170
  • That's cute, but was there a way to draw a filled non-simply connected shape, e. g. a thick ring? – Leo B. May 29 '17 at 16:50
  • Draw two half-rings (cut vertically) and fill them. – tofro May 29 '17 at 17:16
  • Actually, even a simply-connected shape can present problems for this algorithm. Consider a two-sided fine-toothed comb. It will need multiple starting coordinates per line to work properly. – Leo B. May 29 '17 at 18:07
  • I was not saying this was ideal ;) - The QL forces you to draw and fill the comb's teeth separately. But a comb-shaped shape would be a rare worst-case scenario. Apparently, they decided for this algorithm for speed and safety and decided to have the users live with the inconveniences. – tofro May 29 '17 at 18:18
  • 1
    Then it wasn't a flood fill operator. :) – Leo B. May 29 '17 at 18:22
  • 1
    Definitely disputable and depends on your definition. But I rather have something that works and is a bit inconvenient to use than something that crashes the computer in cases (which I wouldn't call a working FF command as well ;) ). – tofro May 29 '17 at 18:36
  • Granted; then I would count this implementation as "third best": not failing but not doing what's expected either. I wonder how this peculiar behavior was explained in the user manual. – Leo B. May 29 '17 at 19:12
  • More or less like I did - If you have more than one pair of points in your shape's outline per horizontal scanline, divide the shape into less complex parts so that this condition is met. Another advantage of this method: It cannot "bleed out" if your shape isn't properly closed. – tofro May 29 '17 at 19:32
  • Edge scanning and then filling in the middle was (or became) a common solution; the normal rule is that you use it to draw only convex shapes, since the limitation is that each horizontal line may intersect the shape's boundary only exactly twice, and a convex shape obeys a superset of that rule: any straight line will intersect its boundary exactly twice. – Tommy May 29 '17 at 19:32
  • Rewriting the classical flood fill algorithm to use a queue rather than a stack (as long as any points are in the queue, take one, paint it, and add any of its neighbors that need processing) would result in a maximum queue usage of twice the smaller screen dimension. If memory serves, Delta Drawing behaved as though it used that approach. – supercat Feb 01 '18 at 22:18
  • @supercat I think I can generate a test case that requires twice the screen width for a straightforward scanline-queue algorithm. On the BBC Micro's MODE 0 (640x256) that would require a 1280-entry queue. Use a fine grid pattern, then start the fill from one of the vertical segments in the middle of the screen. It'll immediately find the two long scanlines either side; from there it'll add 320 verticals above, 320 verticals below, 319 verticals between from above, and 319 verticals between from below. Of those, only the last 319 can be avoided by brute-force scanning the queue for dupes. – Chromatix May 02 '18 at 01:04
  • @supercat I'm working on a possible way to avoid that pathological case efficiently, just as an intellectual exercise. There's also the "floor painter" algorithm which requires only constant memory but is relatively complex, and looks a bit slow. – Chromatix May 02 '18 at 01:07
  • @Chromatix: If one uses a queue and starts from the center of a dot-grid pattern, the fill will start by adding four entries for each pair of dots it moves outward, but entries will start being retired once the fill reaches the top or bottom of the screen. – supercat May 02 '18 at 14:58
  • @supercat Right, but the scanline method is more efficient than operating one pixel at a time, especially when there's more than one pixel per byte. It also matters considerably whether the queue entries are painted on enqueue or dequeue; in the latter case there can be many more duplicate entries due to intervening searches nearby. – Chromatix May 02 '18 at 19:34
  • @Chromatix: Using scanline-based variations of the flood fill method will increase the maximum queue requirements significantly (often going significantly beyond twice the larger screen dimension) but that could be mitigated by falling back to a simpler approach any time the amount of remaining queue space isn't greater than twice the smaller screen dimension. – supercat May 02 '18 at 19:48
  • @supercat Which is why I'm working on an algorithm which can use the scanline method with fewer queue entries in common cases. If it works - and especially if I can port it from C to the BBC Micro - then I'll post the results here. – Chromatix May 02 '18 at 19:59
  • @Chromatix: The approach used in Delta Drawing behaved like a queue-based version of the "classical" approach (so the fill spread out from the middle in both directions). It's simple and it has a bounded memory requirement. Using a scanline-based approach will typically require more complexity to avoid a significantly-higher worst-case memory usage, but combining a scanline-based approach with a simple-approach fallback could offer the best of both worlds. – supercat May 02 '18 at 20:07
  • Bob Bishop's Micro-Painter on the Apple II used a variation. It established a 4K circular buffer at $7000 to hold the X,Y coordinates. The buffer was seeded with the initial point, which determined the color that was being replaced. Loop: get next coordinate from front of buffer, test current pixel color; if it matches the thing we're replacing, draw a point at that coordinate, and add the four adjacent coordinates to the end of the circular buffer. The Apple II screen was small enough (esp. when treated as 140x192) that the buffer never overran. Expands evenly in all directions. – fadden Oct 24 '19 at 23:38
  • I would call the QL method 'polygon fill' rather than 'flood fill', since it's drawing a filled shape rather than filling an existing arbitrary area. – john_e Feb 24 '21 at 14:45
10

In the original BBC BASIC versions for the BBC Micro, a large range of PLOT codes were reserved for the Graphics Extension ROM, which was an optional extra. The GXR included both a flood-fill function and a horizontal line-fill. The BBC Master included half of the GXR as standard, omitting the "sprite" routines, but still included flood-fill.

In BBC BASIC V, most of the commonly used PLOT codes gained convenience keywords, including quite a few associated with GXR functionality - and including flood-fills using the FILL statement. A version of BBC BASIC V was available for late-model BBC Micros, but it is most often associated with the Acorn Archimedes which ran RiscOS. On the Archimedes, GXR functionality was available as standard.

The easiest way to run BBC BASIC V or VI (the latter uses IEEE-standard floating-point but is otherwise identical) is on a Raspberry Pi running RiscOS. BeebEm in "BBC Master" mode gives access to flood-fill via GCOL 0,128+POINT(X,Y) : PLOT &85,X,Y.

BeebEm capture

The GXR manual indicates that the flood-fill routine accounts for 512 of the 768 bytes of RAM permanently claimed by GXR as workspace. A command is provided to disable the flood-fill routine to reclaim this extra RAM, reducing GXR's overhead to one page (256 bytes). The BBC Master's version of GXR used some of the shadow RAM for workspace (as well as the framebuffer), eliminating this overhead from user programs' perspective.

With 768 bytes of RAM, the flood-fill routine is able to maintain a list of 256 pixels to process, but it is still possible to overflow this list with a complex pattern such as a fine grid; the routine exits gracefully instead of crashing in this case (tested on a BBC Master emulation). Test case:

BeebEm screenshot

The RiscOS version, being less memory-constrained, should be much more robust, and my limited tests using RiscOS 3.5 (on an emulated ARM610 CPU with 32MB RAM) appear to confirm this; none of the three tests give it any trouble at 640x480 resolution (MODE 28). It must be conceded that a 32-bit environment is very different from an 8-bit one.

Chromatix
  • 16,791
  • 1
  • 49
  • 69
  • 3
    Now that's a real flood fill; it even looks like a flood! :-) – wizzwizz4 Apr 26 '18 at 19:08
  • How about the herringbone pattern? – Leo B. May 02 '18 at 02:55
  • I was able to recreate the pattern on my emulated BBC Master, and in MODE 1 (320x256, 4c) with a centred start point, it completed the fill correctly. In MODE 0 (640x256, 2c) it halted after completing a large hexagon-shaped area. So it's not quite as tough a test as the fine grid. – Chromatix May 02 '18 at 03:39
  • 1
    @LeoB. I believe the GXR algorithm uses a 256-entry circular buffer, and when the head meets the tail it believes the queue is empty (when it's actually full). Therefore it halts and returns control instead of overwriting stuff it shouldn't. The herringbone is a tougher test than the field of U's, and nearly hits the queue capacity in Mode 1. – Chromatix May 02 '18 at 03:48
5

It's not clear to me what constitutes a correct answer here, but here's my take.

GW-BASIC 3.23 uses a queue-based algorithm which is pretty robust. It has no trouble with either of your test cases or the u's mentioned by scruss; however it's still possible to design a pattern specifically to make it use lots of memory and overflow, even though it has nearly 60K available:

10 DEFINT A-Z
20 SCREEN 1:WIDTH 40:KEY OFF:CLS
30 LINE (1,0)-(1,95)
40 LINE (1,97)-(1,191)
50 PSET (0,191)
60 LINE (3,49)-(3,143)
70 LINE (5,25)-(5,71)
80 LINE (5,121)-(5,167)
90 FOR I=0 TO 3:LINE (7,13+I*48)-(7,35+I*48):NEXT
100 FOR I=0 TO 7:LINE (9,7+I*24)-(9,17+I*24):NEXT
110 FOR I=0 TO 15:LINE (11,4+I*12)-(11,8+I*12):NEXT
120 FOR I=0 TO 31:LINE (13,2+I*6)-(13,4+I*6):NEXT
130 LINE (15,1)-(255,191),15,BF
140 FOR I=2 TO 190 STEP 2:LINE (15,I)-(255,I),0:NEXT
150 FOR I=16 TO 254 STEP 2:LINE (I,0)-(I,191),0:NEXT
1000 PAINT (0,0)
1010 PRINT INPUT$(1);

GW-BASIC fill algorithm in action

The trick to make it fail is that the left part makes it double the number of "threads" several times, and each "thread" then has to cope with the many rows of dots, which means lots of holes both up and down, each of which has to be enqueued, until it crashes.

(Note there's a GW-BASIC emulator called PC-BASIC; however, as of this writing, the flood fill algorithm it uses is not the same queue-based algorithm implemented in GW-BASIC 3.23, besides the fact that it "cheats" by having much more memory available).

But the most robust algorithm I've seen, is one for the ZX Spectrum. The system's BASIC doesn't have a FILL function, but there's a BASIC extension called Beta Basic that does. Versions up to 3.1 have the most robust fill algorithm I saw at the time in a retro computer. I couldn't make it fail no matter what I tried.

  10 FOR x=1 TO 255 STEP 2
  20 PLOT x,0
  30 DRAW 0,175
  40 NEXT x
  50 FOR x=1 TO 175 STEP 2
  60 PLOT INVERSE 1;0,x
  70 DRAW INVERSE 1;255,0
  80 NEXT x
  90 FILL 1,1

(Use graphics mode and F to enter the FILL keyword).

The price for that, however, is speed: the algorithm is somewhat slow. That's probably why it was replaced in version 4.0, even though the new one is very easy to crash with either of your test cases.

Pedro Gimeno
  • 246
  • 3
  • 4
4

The Sharp MZ-800 is a little-known Z80-based micro which has to load its BASIC interpreter from disk or tape. What it does have is a bank of dedicated VRAM behind a custom video controller, which makes plotting and querying pixels significantly easier for the CPU.

I was able to put an emulated MZ-800 into "mode 4" graphics, which provide 4 colours at 640x200, and generate a similar fine grid which stumped all previous micros at this resolution. Issuing PAINT on this grid successfully changed it from white to blue, leaving the black holes intact.

The algorithm appeared to behave differently from usual, taking care of the whole scanlines first (outwards) and the vertical bars afterwards (inwards). This suggests it is a scanline-stack algorithm which sets pixels only upon retrieving stack entries, not upon placing them. This is very wasteful of memory and easily leads to unnecessary recursion steps.

When faced with the herringbone pattern, progress became very slow indeed. After several minutes, it had filled in only a few pixels, and it then gave a "Memory capacity error". So although it coped well with one "difficult" pattern, it was not robust enough to deal with another one.

Chromatix
  • 16,791
  • 1
  • 49
  • 69
4

As has been stated the problem with standard flood fill is that the buffer size can be enormous so it was somewhat rare for the 8 bit era computers to have a bundled fill routine that didn't impose limitations. I remember playing with a diamond fill variation in the 80s - the Wiki article refers to it as "4-way fill". https://en.wikipedia.org/wiki/Flood_fill

The disadvantage I found is that it executes somewhat slower than a flood fill. But the advantage is that the memory requirement is a lot less - the maximum number of buffer entries would be about 4 times the longest possible 45 degree diagonal line supported by the bitmap mode you're using. As a starting point I used the program provided as DATA within an article in Analog Computing #16 from Feb 1984 (an Atari centred magazine). https://web.archive.org/web/20060308092706fw_/http://www.cyberroach.com:80/analog/an16/default.htm

Full copy of the magazine here though it's ~ 130 Meg and very slow to download: http://www.atarimania.com/mags/pdf/analog_no_16.pdf

In most applications a constrained flood fill isn't greatly useful. It'd be OK for something like a drawing program but not so much for a realtime game e.g. Qix. For something like a graphical adventure that does procedurally generated pics, it'd be OK since you'd obviously test and correct any anomolies during development.

I downloaded the PDF I mentioned earlier - painfully slow and the text is hard to select and when you do grab some the OCR is full of errors. So, I typed in the 6502 assembly routine for the 4-way flood fill.

Someone started a thread over at AtariAge, so grab the routine from it here (hit Spoiler button in the linked thread to expand listing):
http://atariage.com/forums/topic/278446-flood-fill-test/#entry4021426

If I get time in the next few days I'll convert to "Psuedo-code" - to make sense of it you'd need to know 6502 assembly. There's Atari specific stuff in there like pulling A to get passed parameters off the stack, and CIO calls. The parameter stuff, easy to deal with. The CIO stuff - the GET call just expects the pixel value of a given location to be returned in A so could be replaced with a sub call that performs that function, same with the PUT call. Pixel values are just that - masked and shifted so that packed/chunky values equate to what a user would expect. The routine with modification can work on other resolutions so long as they're 0-255 range. Substantial modification to the program would be needed for larger values since 2 byte entries would be needed.

Rybags
  • 41
  • 2
  • 1
    There's also a "floor painter" algorithm which operates in constant memory - about 16 bytes, or less if your screen is always smaller than 256 pixels in one or both dimensions. It's probably slower than a stack or queue algorithm when faced with a complex area to fill, but is guaranteed to complete without running out of RAM. – Chromatix May 05 '18 at 06:52
  • Your "4-way fill" seeems to be a breadth-first search of the fillable space. It performs better (with respect to memory) on large open areas than a recursive depth-first fill -- but one can construct intricate mazes where up to around every sixteenth pixel on the entire screen are in the last generation of the filling, and therefore need to be on the work list at the same time. – hmakholm left over Monica May 05 '18 at 16:34
  • @HenningMakholm: I tried to construct particularly nasty patterns to test the fill routine and was never able to make anything particularly nasty. If I was missing something, that might be an interesting puzzle. – supercat May 05 '18 at 22:51
  • 2
    @supercat: I had in mind a fractal construction like https://i.stack.imgur.com/SLmZS.png -- the black pixels are walls. The ones marked in red will all be colored in generation 219 after flooding starts from the middle of the upper edge; they number more than one seventh of all the pixels. – hmakholm left over Monica May 06 '18 at 01:45
  • @HenningMakholm Looks like a very interesting test pattern; I'll see if I can replicate it in my test suite. – Chromatix May 06 '18 at 21:37
  • @HenningMakholm: I can see from your picture that doubling both dimensions would quadruple the number of "live" pixels, but for pictures with small heights, the number of live pixels won't exceed twice the width. I wonder what the shortest picture is for each k that would make the number of live pixels exceed k times the width. – supercat May 06 '18 at 23:39
  • @supercat: You can put N of my modules (of an appropriate size) side by side and use just 2log2(N) additional rows to distribute the flood synchronously between all of them. That will allow you to construct examples of arbitrarily wide aspect ratio where still close to WH/7 pixels are live in the critical generation. E.g. as a back-of-an-envelope approximation, putting 1024 blocks of each a sixteenth of my design side by side would give you something about 50 pixels tall and 46,000 pixels wide, with roughly four times the width live in the critical generation. – hmakholm left over Monica May 07 '18 at 00:16
  • @HenningMakholm: I wonder what the graph of worst-case time complexity vs worst-case storage would look like? I think a flood fill could be implemented with O(1) storage even if the only operation one could do was permanently change cells to be indistinguishable from walls. Start by using something like the Pledge algorithm to find the outside wall of the region to fill, and then use a wall-following algorithm to paint it but leaving a minimum one-pixel path except at dead ends, which may be filled in. Iterate until one performs a full loop without painting anything. – supercat May 07 '18 at 15:19
  • At that point, if the entire path hasn't collapsed, there will be a single-pixel path around one or more islands. Filling in an arbitrary point on that path would join an island to the outside, allowing the portions of the path around it to be retired as dead ends and reducing the number of islands by one. I'm not trying to be time-efficient here, but merely show that even with a shape of arbitrary complexity the algorithm could always find something to do that would leave the region to be filled contiguous. – supercat May 07 '18 at 15:26
  • Thinking about it, if speed really isn't critical, one could do the algorithm really simply: do a right-hand wall search, filling in nodes that have three filled-in neighbors and noting the first place one finds with three open neighbors. If one does a full circuit without finding anyplace that has three open neighbors, one last circuit will finish the job. Otherwise, loop until one finds the three-open neighbor node again, turn around, paint the square one had gone through to reach it, and proceed through that square's other exit. – supercat May 07 '18 at 15:38
  • @supercat Sounds a lot like the floor-painter algorithm. The one of Wikipedia doesn't work reliably (goes into an infinite loop on the herringbone). I'm looking at a couple of alternatives, one similar, one simpler. – Chromatix May 07 '18 at 16:45
  • @Chromatix: You mean "Flood fill: fixed memory method"? Yeah, that's pretty similar. BTW, with regard to putting your modules side by side, if the height stays constant the max generation size wouldn't grow very fast with respect to the larger dimension. More interesting is that doubling the size in both dimensions quadruples the live-generation size. For N=2^K, a grid that's 4N-1 by 3N-1 could have N^2 live points. – supercat May 07 '18 at 18:45
3

Atari BASIC had a fill operation, but for some reason this was not mapped to a command called FILL, but instead one called XIO.

The background here is that they couldn't fit the BASIC into an 8k ROM. The solution was to move some of the more generic operations out of BASIC into the OS ROM, which had some left-over room. So they moved the floating point routines and graphics over to the OS, and then wrote stubs in BASIC that accessed them.

Although they mapped most of these routines back into BASIC, like PLOT and LINETO, for whatever reason the didn't do a FILL. So in order to access it, you had to use the catch-all OS command, XIO (the IO is for input/output, which is what you would normally use it for). A little mystery to be solved there...

One upside to the code being relocated to the OS was that it was also available to other programming languages. One downside was that the floating point routines were very slow, so anyone that used them also got that slow performance.

Maury Markowitz
  • 19,803
  • 1
  • 47
  • 138
  • Yes, but how robust was it? – Omar and Lorraine May 02 '18 at 16:01
  • Sounds like a similar situation to the BBC Micro. That had 16KB ROMs, one each for the MOS, BASIC, and DFS. Graphics were considered an MOS feature and accessed through the VDU driver's OSWRCH call, plus some OSBYTE calls for less-frequent queries. Floating-point calculations were considered a language feature and implemented in the BASIC ROM, so other languages had to duplicate them. – Chromatix May 02 '18 at 19:42
  • More to the point, which of the many Atari computers is this relevant to? If we know that, we can test the fill routine using an emulator and the test patterns used elsewhere in this thread. – Chromatix May 02 '18 at 19:46
  • @Chromatix - Interesting parallel with the BBC there! Also interesting that they had THREE 16k ROMs, that's what 2 years of Moore's gets you... As to the versions, any of the 8-bits had this. Do you have a test you would like me to try? – Maury Markowitz May 04 '18 at 17:34
  • @MauryMarkowitz The "field of U's" and the "fine grid" referred to in my BBC Micro answer would be a good start. The other good test seems to be the "herringbone" pattern, with 0,0 1,1 3,2 2,3 pixels set in every 4x4 block. Use the highest resolution possible, even if it doesn't support colour, for the most severe test. – Chromatix May 04 '18 at 22:44
  • @MauryMarkowitz And if you think 48KB ROM is impressive, the BBC Master came with 128KB ROM, 32KB direct-mapped RAM, 32KB shadow RAM (MOS/DFS/GXR variables and screen memory were moved there), and 64KB of "sideways" RAM which could be used to soft-load extra ROM images. You could also add a second processor with double the speed and no cycle-stealing, and its own 64KB RAM. – Chromatix May 04 '18 at 22:56
  • 1
    I managed to dig out an Atari 800XL emulator and a copy of the Atari BASIC manual. The fill routine only works to fill colour 0 with something else, so I made a "field of dots" to simulate the "fine grid", and started the fill from the middle of the screen. This made the machine reset after incompletely processing half the screen, in a manner strongly suggestive of a stack-based algorithm. – Chromatix May 04 '18 at 23:56
  • 1
    Even worse results with the herringbone pattern; it only "filled" a diagonal line to the southeast corner before resetting. Moreover, the thing throws an error if you try to plot outside the screen, instead of simply discarding the nonexistent pixels as the BBC Micro does. Not very impressive, really. – Chromatix May 05 '18 at 00:10
  • So then, not robust! – Maury Markowitz May 06 '18 at 17:43
  • As Rybags notes below, this spawned a thread on AtariAge which resulted in several improved algos. Perhaps someone might want to back-port these to other platforms? – Maury Markowitz May 14 '18 at 13:23
  • Source code to some of Atari's OS was available at the time, I remember reading through the fill and line-draw routines. I think that the fill was recursive, like @tofro describes. The line-draw was very clever (to young me at least), as they managed to do it without using floating point or division. Sadly, I can't remember exactly how, though. – Marc Bernier Feb 24 '21 at 13:57
3

The answers specific to each micro are interesting, but I thought it might be useful to compare the classical algorithms from the point of view of theoretical memory requirements. So I implemented a test harness in C and ran it on my MBP.

The test cases are:

  • Field of U's, 8x8 chars on a 640x256 canvas, starting from the top and the bottom. This is very close to the animation in my BBC Micro answer.
  • Fine grid, 320 columns and 128 rows on a 640x256 canvas. Filling begins either from the corner, from a row in the centre, or from a column in the centre. So far only the MZ-800 has passed this test convincingly, but it has a smaller screen resolution than the BBC Micro.
  • Herringbone pattern on a 640x256 canvas, starting from either the first background pixel on the top row, or one in the centre. Nothing has passed this test in the 8-bit category, except for the BBC Micro with reduced screen resolution (320x256).
  • Yin-Yang symbol rendered 768 pixels across on a 1024x1024 canvas. This provides several simple shapes which represent a plausible "intended use case" for a compact graphics library.

So far I've got six variants of classical flood-fill algorithms working correctly:

  • Pixel Queue with painting before enqueue.
  • Pixel Stack with painting before enqueue.
  • Pixel Stack with painting after dequeue (this one has the simplest implementation).
  • Scanline Queue with painting before enqueue.
  • Scanline Stack with painting before enqueue.
  • Scanline Stack with painting after dequeue.

Of these, there is a clear categorical difference between the Queue and Stack variants in terms of peak memory usage. For any of the test cases above, the Pixel Stack algorithms require such deep stacks that they would have severe difficulty fitting into an 8-bit micro. For the "grid" and "herringbone" tests, the paint-after-dequeue version additionally requires twice the stack size of the paint-before-enqueue version - as many as 81154 entries for one of the "fine grid" fills, requiring over 240KB RAM just for the stack.

Conversely, if we ignore the Yin-Yang test with its huge canvas, the Pixel Queue algorithm gets away with a 513-entry queue on the worst test case, which is starting the "fine grid" fill on a central row (starting on a column requires 512 entries). If each queue entry requires 3 bytes (one for the Y axis, two for the X), that's a hair over 1.5KB. One of the Yin-Yang fills requires 1160 entries.

The scanline variants show a similar pattern in general, but differ in detail. The Yin-Yang tests are really their forte, as even the most wasteful stack-based variant requires only 4 queue entries on those. They struggle a bit more with the more complex tests, with Scanline Queue requiring 959 entries to cope with starting the "fine grid" fill on a central column, and 381 to fill the herringbone from the centre. These queue entries would also be a bit bigger, requiring 4 bytes each if two X coordinates were packed into 3 bytes. Memory usage could thus go up to 4KB, though performance is generally improved by dealing with runs of pixels in one go.

The BBC Micro (especially the BBC Master, in which the GXR extensions came as standard) has plenty of RAM to make either of the queue-based algorithms work. However, the 6502 makes constructing queues more than 256 entries long a bit awkward, because it only has 8-bit index registers. (Multi-byte queue entries can be split across pages, so you can always have 256 of them.) To index into a longer queue, you need to explicitly construct a pointer in memory. I assume that GXR implemented a 256-entry queue for ease of implementation and for performance reasons, and because it's far more than adequate (with the Scanline Queue algorithm) for the simple, solid shapes that most people want to fill.

I also implemented the Floor Painter algorithm, which uses neither a queue nor a stack and is therefore not memory-constrained. On the Yin-Yang tests, it appears to be similar in performance to the Pixel Queue algorithm, but using less memory. On the synthetic patterns, it still uses no extra memory, but is very slow indeed. I might port this one to the RiscPC instead of the BBC Micro for that reason.

There might still be a bug in the Floor Painter; filling the herringbone from the centre appears to hang, while filling it from the corner didn't take long compared to the fine grid tests. I'll poke it some more with the debugger.

Chromatix
  • 16,791
  • 1
  • 49
  • 69
  • Incidentally, another algorithm which may be usable if there are at least four colors that would be recognized as "edges" is to use a backtracking algorithm, but initially fill each pixel with a color that indicates its direction from the previous one. When backtracking, observe the color of each pixel, paint it with the final color, and move in the direction the pixel's color had indicated. That approach will allow any size screen to be flood-filled with O(1) memory outside the screen pixel array. – supercat May 06 '18 at 23:44
  • @supercat That presupposes you conveniently have several colours in the palette exclusively for use by the fill routine. In effect, it's just another way of saying you need O(N) storage, where N is the number of pixels involved. – Chromatix May 07 '18 at 06:19
  • It's not necessary that the colors be reserved exclusively for the fill routine. All that is necessary is that they be recognizable as edge colors; as I think about it, even that might not be necessary if one is sufficiently clever. Divide colors into "fill", "edge", or "anything else". Use right-hand rule to follow the boundary of what one can fill, and paint places one goes straight with "edge" color and places where one turns right with "fill" color. When backtracking, the color of a square combined with the direction of backtracking will indicate the direction to backtrack next. – supercat May 07 '18 at 15:03
  • Operations whose job it is to change the contents of some storage in some way do not usually count that storage in the "space" requirement. Sorting algorithms, for example, are usually allowed to write and read back "array slots". While one could perform a selection sort that required O(1) space and could never read back anything it had written (e.g. producing a sorted printout from a list stored on CD-ROM) such things are very seldom required. – supercat May 07 '18 at 15:45
  • @supercat With sorting algorithms, the object is to rearrange the items, so rearranging them is allowed. With a fill routine, you either can't predict what the edge colours might be (you're given the background colour) or you can't predict the colours already present in the area to fill (you're given the edge colour). So unless there are colours specifically reserved, you can't rely on the framebuffer itself for storage. Either way, it counts as auxiliary memory because you're storing values not found in the input. – Chromatix May 07 '18 at 16:01
  • My implementation of the Floor Painter has the same bug; I guess either there is a mistake in the Wikipedia description, or the algorithm was intended to be run from the corner to guarantee termination. – Leo B. May 07 '18 at 21:09
  • @LeoB. Yes, the Wikipedia version is buggy; it fails to detect looping around any island in the herringbone pattern, never setting any marks or flags in that case. I found a broadly similar version elsewhere, and a simpler version as well, which I'm gradually converting to C in my test harness. – Chromatix May 08 '18 at 06:34
  • @Chromatix If you have a bug-free version of the floor-painter algorithm, please provide a pointer. – Leo B. Jul 26 '18 at 21:18
  • @LeoB.: I was just revisiting this post and noticed nobody had responded to floor painter. I think for simplicity and performance, one should start by expanding out walls as much as one can without affecting the topology of hallways, so as to eliminate all 2x2 unfilled areas except those which one filled and one unfilled block adjacent to each corner. This will minimize the amount of work one must do in the excess-looping parts of the exercise. After any time one does a fill, make note of one's position and direction and watch for it to recur. Also watch for the first 3- or 4-way branch... – supercat Feb 28 '20 at 06:32
  • ...after one starts looking for a loop, and make note of its position. If one re-encounters the starting point from a the other side without having seen a 3-way or 4-way branch, all that's left is a single loop that needs to be filled in, so fill in the square and proceed forward in either direction. If one re-encounters a 3-way or 4-way branch from a new direction, leave again in the original direction but fill in the first square one passes through. If one restarts loop detection any time one fills in a square, or on the first 3-way or 4-way branch one finds after filling in something... – supercat Feb 28 '20 at 06:36
  • ...I think that should avoid trouble. Note that if it's not possible to find and re-encounter a two-way spot going in opposite directions without in the interim either having encountered and re-encountered a 3-way or 4-way spot, or having found and filled in a dead end, so the only "re-encountered two-way" scenario that would be relevant would be the simple loop case. – supercat Feb 28 '20 at 06:42
1

Locomotive BASIC 1.1 on the Amstrad CPC6128 had a fairly decent FILL command. It wasn't the fastest, but then what FILL routines in BASIC were back then? It would do a decent job of filling complex shapes though.

This was one of the differences between 1.1 and the BASIC 1.0 supplied with the CPC464.

Caleb Fuller
  • 574
  • 2
  • 6