34

Most 8-bit systems had a BASIC interpreter that ran at a rate roughly commensurate with the CPU type, speed, memory bandwidth and interrupt status. Some systems, however, had interpreters that ran at a fairly dismal speed: I'm particularly thinking of Atari BASIC and Sinclair BASIC.

What design decisions contributed to making these interpreters so inefficient compared to contemporary competitors?

user3840170
scruss
  • 3
    This question is asked in counterpoint to BBC/Acorn BASIC, what made it so fast? – scruss Sep 14 '22 at 23:26
  • 2
    The other question already mentions terrible floating point math implementation for the Atari version (presumably when compared to the Microsoft BASIC used by most 8-bit systems). It wouldn't surprise me if Sinclair's BASIC also suffered from this too. – Brian H Sep 14 '22 at 23:34
  • 1
    I've read Atari BASIC has to search from the beginning of the program every time any GOTO or GOSUB is executed. They seemed to be focused on getting the required features into 8KB by the deadline and didn't have time to optimize for speed. – Tim Locke Sep 15 '22 at 00:07
  • TI BASIC is quite slow because of the hardware in the system. It runs out of the video RAM. – Tim Locke Sep 15 '22 at 00:09
  • Commodore BASIC 7 in the Commodore 128 is quite slow. Like Commodore BASIC 3.5 in the Commodore 16 and Plus/4, the BASIC ROM is in a separate bank from the RAM that holds the BASIC program so while executing, the banks have to be switched back and forth constantly. Also, because Commodore BASIC 7 had over 256 commands and functions, it had to use two bytes to represent them, which additionally slowed the interpreter. – Tim Locke Sep 15 '22 at 00:14
  • 4
    While there are exceptionally slow BASICs, I would think this question is not very focused - one could go into quite lengthy discussions about details without having any real result - just opinions. Especially when repeating assumed speed from back then. – Raffzahn Sep 15 '22 at 00:25
  • @scruss They were terribly slow even when compared to mid to late 1960's BASIC interpreters. Mostly, it was due to youthful inexperience and some bad choices (mathematical for the transcendentals and just in general.) If you want to see how it should be done well, see HP 2000F TSB. (Circa 1969, or so.) Many good choices were made with their simulation of BASIC. And it was quite fast. Even on 1960's computer tech. – jonk Sep 15 '22 at 01:37
  • Your question lacks details, which specific dialects do you mean? For example, there is no single Sinclair BASIC. Especially the ZX81 suffered (in SLOW mode) from the burden to generate its video output in software, reducing the speed to about 1/4. – the busybee Sep 15 '22 at 06:43
  • @TimLocke I'm not sure what you mean by "two bytes to represent them". That can mean 2 bytes were used to represent every command, or most commands were 1 byte, with one value being an "alternate command set" indicator to mean the following value represents a command of the second set. The latter could be much more efficient than the former. – RichF Sep 15 '22 at 06:50
  • QL SuperBasic was OK for speed by the standards of the time. – Michael Harvey Sep 15 '22 at 09:09
  • 1
    @MichaelHarvey: definitely. I translated Fortran FFT functions to it, and used it to learn about DSP. I also wrote a program to calculate the Mandelbrot set, and added a selection system to zoom in on it. Its mathematical routines were fairly fast. – chthon Sep 15 '22 at 10:17
  • 1
    @chthon - it went quicker still if you used Freddy Vaccha's Turbo compiler. Forty years on, nearly, I can confess that a pal gave me an (unlicensed) copy. The protection was a gadget with plastic prisms called a Lens-Lok which you had to use to look at a randomly scrambled character pattern on the screen and enter the correct chars to unlock the program. I could solve it by eye three times out of four, making the program perfectly usable. – Michael Harvey Sep 15 '22 at 10:33
  • 1
    @RichF: The keywords in 8-bit MS BASICs (and most others) are stored as a single byte to save space in memory. It also saves time during interpretation because keywords are pre-interpreted. One byte can only denote 127 keywords so to handle more, two bytes are required for each keyword. Having to parse two bytes per keyword takes longer than parsing one byte but it's still faster than detecting keywords in plain text. – Tim Locke Sep 15 '22 at 13:24
  • @jonk Yes, I used Basic on the HP2100A in my high school in 1972 (only 8K of core memory). It was quite fast. IIRC/AFAICT (and I could be wrong about this), one of the reasons was that it stored the program as a VM / bytecode. I say that because when I entered a program, then printed it out (on the ASR 33), it would change it slightly (e.g. unify the spacing and renumber the lines(?)). – Craig Estey Sep 16 '22 at 15:06
  • 1
    @CraigEstey They did more than 'bytecode'. They pre-compiled it such that all references to lines were replaced with address pointers to those lines (no searching, as many later BASIC incarnations stupidly forced themselves to always do every time) and replaced variables with address references to the variable table, again so no searching. And there was still more done. I am working on re-writing it, right now, for the MSP430FR5994. – jonk Sep 16 '22 at 20:32

8 Answers

33

"Atari BASIC: The Good, the Bad, and the Ugly" is an excellent summary Atari BASIC's advantages and weaknesses. To answer the Atari half of your question:

How did it get so slow? Basically because of two problems – a poor implementation of line number lookups in loops and jumps, and a poor implementation of multiply and divide.

Because Atari distributed their standard BASIC as a ROM cartridge (and later in system ROM with the XL line), it was difficult and expensive to distribute a patched version that could have fixed these problems.

Atari BASIC's designers discuss the language's internals in The Atari BASIC Source Book.

Jim Nelson
  • 3
    I don't think the ROM media was really the problem with patching. There was no internet back then, so "patches" for home PC software weren't really A Thing. However your software was initially delivered to them, bugs and all, was how the customer was going to use it. – T.E.D. Sep 15 '22 at 14:40
  • 2
    Oh hacks really *were* a thing. Software might have been on tapes and on old floppy disks with software controllers, but teenagers were disassembling copy protection and modding games as early as Apple II and BBC model B, because I remember it from 1980-82 ish. The difference was they circulated between friends, by hand, not online. (Perhaps BBSs but I wasn't around those). So yes, patches and mods were rampant, and modded software to remove bad features and limits, as well as copy protection was an established scene. Systems that only took Roms sometimes had modded Roms, but rarer. – Stilez Sep 15 '22 at 16:05
  • @Stilez - Yes, that was totally A Thing, particularly in the later Amiga/ST era. Modems became common only after '83 or so though. Before that hacks were 3rd party enthusiast things, largely distributed virally via floppies or in hobbyist magazine articles. I never heard of anyone hacking their BASIC interpreter itself, but I guess that doesn't mean people somewhere weren't doing it. – T.E.D. Sep 15 '22 at 16:14
  • 1
    @T.E.D.: Many versions of BASIC included hooks to change certain aspects of behavior, and tweaks to do so were quite commonplace, with some being supplied by system vendors. – supercat Sep 15 '22 at 16:46
  • 1
    In the early 80s, user groups, computer dealers, and fellow enthusiasts were surprisingly efficient at getting updates distributed. I was always using the latest version of Atari's disk-based DOS as it arrived, but never upgraded the BASIC cartridge that came in the box with the machine. – Jim Nelson Sep 15 '22 at 17:15
  • It's been many years since I used my Atari (like about 40 years) but there was a way to get Basic into RAM. I once did that with the intention of writing an article on it for one of the mags. I wrote some code that added some commands to Basic, and an ELSE to IF, using info from the Atari Basic Sourcebook, which was a great read. I think the only way to do it was to write the Basic ROM to disk, then start it with the Option key down and load it into the same area, then apply my patches. It was cool, but not very practical for widespread use. – mannaggia Sep 15 '22 at 19:03
  • 2
    @T.E.D. your "local computer shop" who had a relationship with a sales rep at the manufacturer was a thing, and fixes could be (and were) distributed that way if the company had an interest in doing so. – hobbs Sep 16 '22 at 02:58
27

Commodore BASIC suffered from four major performance issues:

  • It stored numbers as text in the source, so every numeric constant had to be re-parsed each time it was executed.
  • Program lines were stored as a singly linked list, so a goto or gosub that branched back to an earlier line had to start at the top of the program and walk the list one line at a time until it found the target; forward branches at least started the search from the executing line (see the sketch below).
  • Variables were also stored in a singly linked list, so every variable reference walked that list, one entry at a time, until it found the referenced variable.
  • All computations were done in software floating point. Even an update to an integer variable was converted to floating point, had the computation performed, and was then converted back to an integer.

Other than those severe performance issues, it was pretty good.
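
To make the line-search bullet concrete, here is a minimal sketch in C (not Commodore's actual code; the Line structure and find_line name are invented for illustration) of why a program stored as a singly linked list turns every backward GOTO into a line-by-line scan:

    /* Hypothetical layout: each BASIC line holds a link to the next line,
     * its line number, and its tokenized text. */
    #include <stdint.h>
    #include <stddef.h>

    typedef struct Line Line;
    struct Line {
        Line    *next;      /* link to the next line, NULL at end of program */
        uint16_t number;    /* the BASIC line number */
        uint8_t  tokens[];  /* tokenized statement text */
    };

    /* GOTO/GOSUB target lookup: a backward branch restarts at the head of
     * the list; a forward branch at least continues from the current line
     * (the refinement mentioned in the comments below). */
    static Line *find_line(Line *program, Line *current, uint16_t target)
    {
        Line *p = (target > current->number) ? current->next : program;
        while (p != NULL && p->number != target)
            p = p->next;    /* one line at a time, every time the branch runs */
        return p;           /* NULL: undefined line */
    }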

doug65536
  • 10
    Wow, that last bullet ... that's just appalling. Floating point math back then had no CPU support whatsoever, and was so slow that programs used to move heaven and earth to avoid it. I have a tough time believing anyone would have been so profligate as to do loop increments with it. – T.E.D. Sep 15 '22 at 14:46
  • 2
    @T.E.D.: It may seem absurd, but that's how things were done. The fact that stuff worked at all was sufficiently magical that there was no perceived need to make things faster. I don't think one would have needed to add very much code to a typical MS BASIC to make it so that numeric constants would be parsed as integers unless they exceeded 65535 or a decimal point was seen, nor to say that if the exponent byte of a number was zero, its value would be expressed by a two-byte integer and the sign bit, addition, subtraction, and comparisons of such values would operate... – supercat Sep 15 '22 at 16:44
  • 1
    ...on them as integers if the results would fit in two bytes plus sign, and would otherwise convert them to float. Doing that would have yielded a really huge speedup, but I can't think of any 8-bit MS-based BASIC interpreters that made any effort to do such a thing. – supercat Sep 15 '22 at 16:45
  • There were BASIC programs in those days that optimized storage of numeric arrays (that happened to be integers between 0 and 255) by putting them in a string. That wouldn't have helped computation, though. – dan04 Sep 15 '22 at 16:51
  • 2
    Also, some interpreters allowed the use of arbitrary numeric expressions as GOTO/GOSUB targets, e.g., GOTO 1000+N*100. This provided a compact way of implementing a switch statement, if you set up your line numbers correctly. But it may have impeded optimization, as you then can't distinguish lines that are potential jump targets from lines that aren't. – dan04 Sep 15 '22 at 16:56
  • Curiously, Commodore BASIC is pretty much the same speed as any other interpreter that could run on the same hardware. BASIC uses floating-point by default always, so this repeated conversion isn't too absurd – scruss Sep 16 '22 at 01:03
  • 2
    @T.E.D. that’s how the original BASIC was designed. It had no other numeric type than floating point, to not bother beginners with such things. Everything else is a non-standard extension… – Holger Sep 16 '22 at 08:10
  • 1
    This is not just Commodore BASIC: this is pretty much any Microsoft BASIC of the late '70s or early '80s. – cjs Sep 16 '22 at 09:34
  • 4
    "All computations are done as software floating point. Integer variable updates would be converted to floating point, have the computation performed, then converted back to integer.". Sooo... just like Javascript is still doing it in 2022? – Eric Duminil Sep 16 '22 at 14:36
  • 1
    @EricDuminil: In many ways, JavaScript today is like BASIC on c. 1980 microcomputers. Both were designed around making it easy to make simple, short programs ("make the monkey dance" in JS's case), while being difficult for large ones. And both became popular because an interpreter for it is available whenever you need one. – dan04 Sep 16 '22 at 15:24
  • @EricDuminil: Many Javascript implementations keep track of which objects hold integers and which hold floating-point types. From what I understand, some will even generate code for loops which will use overflow-checked integers until an overflow occurs, and then start doing calculations in floating point. – supercat Sep 23 '22 at 14:59
  • @EricDuminil Difference is, computers today are much faster, and more importantly, have dedicated hardware just for floating point math. These old computers had none of that whatsoever. So you had to use integers to simulate floating point (good luck writing that without the C standard library, it's not easy.) – puppydrum64 Dec 07 '22 at 13:58
  • @puppydrum64: Isn't it the opposite in JS? Use floating points to simulate integers? – Eric Duminil Dec 07 '22 at 20:56
  • @EricDuminil Could be, but modern computers have the hardware to help do this much more quickly than Commodore BASIC, which didn't even have hardware support for 16-bit integers, let alone floating-point numbers. So really, Commodore BASIC had to use integers to simulate floating point, and convert them back to integers to do the math, and back to floating point as their "default state." In JS, the numbers are held as floating point in hardware, skipping the first step that Commodore had to do. Also, Javascript runs on 32-bit machines making this process much quicker. – puppydrum64 Dec 08 '22 at 11:21
  • @puppydrum64: Yes. Speed isn't the only interesting factor, though. There are related problems with those fake integers. For example, you'd expect n == n + 1 to never be true for integers, in C, Java, Ruby or Python... But in JS, 2 consecutive integers are considered equal if they are large enough. – Eric Duminil Dec 08 '22 at 11:49
  • 1
    @EricDuminil Wow that seems like it's completely broken. – puppydrum64 Dec 08 '22 at 15:56
  • @puppydrum64: Exactly my point. You've understood JS Numbers. :D – Eric Duminil Dec 09 '22 at 10:19
  • @EricDuminil Floating point represents integers exactly, up to the width of the mantissa, including the hidden bit. It would be exact for integers from -(2^24) up to +(2^24) on 32 bit single precision floating point numbers. For double precision (like JS) it is exact for integers from -(2^53) up to 2^53. – doug65536 Dec 10 '22 at 01:05
  • 1
    @EricDuminil: In C, if n is a signed integer type, then not only might n==n+1 be true, but an attempt to evaluate n+1 may make inferences about the values of other objects, which may cause them to behave nonsensically in cases where evaluation of n+1 would have overflowed. – supercat Dec 11 '22 at 07:57
  • @supercat I stand corrected, thanks. Hmmm, I'll need to check a few projects I wrote in C++ :-/ – Eric Duminil Dec 11 '22 at 13:04
  • @EricDuminil: From what I understand K&R1 characterized integer overflow as having wraparound semantics. K&R2 characterized it as having "machine dependent" semantics. The authors of the Standard expected two's-complement quiet-wraparound behavior to be dominant, but would probably not have been opposed to compilers providing an option to perform substitutions like replacing x+y > x with y > 0. More aggressive compilers like gcc, however, will attempt to back-propagate assumptions about inputs based on possible overflow, in ways that aren't bound by ordinary laws of causality. – supercat Dec 11 '22 at 19:50
  • Small remark on bullet point 2: the search for GOTO/GOSUB is done from the beginning only for line numbers smaller than current line. For greater line numbers, the search starts at the current line. I've read the Applesoft source code recently and that's what's implemented. Quite sure it applies to other MS-Basics too. – Patrick Schlüter Dec 12 '22 at 09:26
  • @supercat I suspect that all the code to do integer math first, then fall back to FP on overflow, would have given up all the speed advantages of doing integer math in the first place. Integer multiplication is already expensive, and you'd have to do it once, then possibly again with FP. Also, every numeric value would be either a 16-bit integer or a FP value, and so you'd need to handle both cases everywhere: in PRINT, in function arguments, in operators, etc. For example for addition you'd have to handle int+int, int+FP, FP+int, FP+FP. – Willis Blackburn Feb 20 '23 at 05:33
  • @WillisBlackburn: On the Apple II, I wrote a little CHRGOT wedge which checks every character processed to see if it's a # and, if it is, checks whether it is being fetched by the FIN routine (formula evaluation). If so, it interprets the following characters as hex digits. The loop FOR I=256 TO 511:P=PEEK(#C030):NEXT runs 2.5 times as fast as the loop FOR I=256 TO 511:P=PEEK(-16336):NEXT despite the extra processing needed to inspect every character. Floating-point math really is S--L--O--W. – supercat Feb 20 '23 at 15:24
  • @WillisBlackburn; As for handling all cases separately, all that would be necessary would be to add code immediately before routines that can't handle integer math to check whether the exponent is zero but value bits are non-zero, and convert the value to a normalized floating-point value in that case. The main performance win would be with addition and subtraction. Next biggest would be integer multiplication in cases where one of the operands is in the range 0-255. While integer multiplication is somewhat expensive, FP multiplication is far worse. – supercat Feb 20 '23 at 16:51
  • @supercat There’s no doubt that it’s faster to parse hex chars into an integer than it is to parse base 10 digits into a float and then back to an int for PEEK. But I’m talking about the program in general, not just parsing numbers out of the program code. I suspect that having to constantly promote integers to FP, which often requires a lot of shifting in order to normalize the value, would negate the benefit of more efficient addition and subtraction when both operands are integers. – Willis Blackburn Feb 22 '23 at 04:20
  • @WillisBlackburn: Looking at the Applesoft code some more, there are a few spots in the floating-point math routines where it spends a really huge amount of its time. One of the worst is at $E8FD. The routine to shift a floating-point number right one bit is designed to perform sign extension, which is accomplished by using a ten-byte 27-cycle sequence of instructions. Splitting the shift process into a pair of functions, one of which would always shift in ones and one of which would always shift in zeroes, would save 21 cycles per loop (over 100 cycles when shifting... – supercat Feb 22 '23 at 17:15
  • ...a number to the right seven places). Another couple of good opportunities for speed up, even sticking with floating-point math, would be having the FADD check whether one of the operands is a power of two and use specialized code for that scenario, and having a separate version of QINT for use with the INT function (which would require shifting all 32 mantissa bits) and for operations that require integer values (which would only require shifting 8 or 16 bits). I think my next experiment may be to write a USR function which checks whether it is being called from... – supercat Feb 22 '23 at 17:25
  • ...one of the routines which is supposed to evaluate an expression and then convert the result to an integer and, if so, pop the intervening function call from the stack and convert its operand to an integer more efficiently than QINT. If that works, POKE USR(X),USR(Y) would be equivalent to POKE X,Y, but faster. – supercat Feb 22 '23 at 17:28
  • @supercat I've given some thought to your suggestion that exponent 0 could flag the significand as an integer. I'm actually in the process of writing a 6502 BASIC so it's an idea that I could potentially implement. Normally exponent 0 with a non-zero significand is a subnormal number, so using exponent 0 for integers would preclude its use for subnormals. Do you think that's a good tradeoff? I actually wasn't sure if I should bother with subnormals, since let's face it, a 6502 micro isn't a great platform for scientific computing, but I wound implementing them just because it was fairly easy. – Willis Blackburn Feb 23 '23 at 23:54
  • @supercat My current FP implementation is mostly IEEE 754 single-precision but with an extended 32-bit significand (including the implied 1). But before this I implemented a FP system where the binary point was after the last digit of the significand, so that all integers could be regular 32-bit numbers, in an attempt to avoid FP math on integers. The downsides were that division and especially comparisons were awkward because I had no "normal" form. So FOR loops would get an advantage from simpler addition but then lose efficiency when checking for the end of the loop. – Willis Blackburn Feb 24 '23 at 00:02
  • @WillisBlackburn: If the step and starting values of a FOR loop are integers, and the ending value can be converted to one, having a separate kind of "integer" FOR loop entry on the stack would avoid the need for expensive comparisons. Otherwise, looking some more at Microsoft FP code, I've been thinking that moving the sign bit to the bottom of the mantissa and using an explicit 1 would allow some nice simplifications, since one could simply use the bottom bit of an unpacked mantissa as the guard bit, and after doing addition one could defer normalization. Much of the time spent... – supercat Feb 24 '23 at 15:58
  • ...in floating-point math is spent shifting, and there's a lot of room for speed up there. Another thing I've pondered is whether it might be advantageous to have the floating-point accumulator use a five-byte mantissa which is shifted in multiples of eight bits, round the bottom byte based on the number of bits that are filled on the top byte, and then store the OR of the top byte and the rounded bottom byte. Loading a floating-point value would require masking the top and bottom bytes, and storing a floating-point value would require counting how many bits are used in the top byte, but... – supercat Feb 24 '23 at 16:06
  • ...a routine which checks whether the accumulator is 15 or less and then counts either how many left shifts are required to get a carry, or how many right shfits are needed to get zero, could be a lot faster than the right shifting used to add the value 1 to a value in the range 128 to 255. I'd be interested in working with someone on the design of a 6502 BASIC interpreter. I think my big interest is in efficient string handling. – supercat Feb 24 '23 at 16:17
  • @supercat Reach out to me on LinkedIn, I'm easy to find! – Willis Blackburn Feb 26 '23 at 19:25
  • @PatrickSchlüter I applied the correction, thanks. – doug65536 Mar 02 '23 at 00:30
11

Oric BASIC was slow for several reasons I can think of:

  • like most interpreted BASIC flavours, it used floating point for numeric variables by default. One could use integer types (e.g. A%), but in practice that was rarely done.
  • In so-called "high resolution" mode, graphical operations were limited to point, line, circle. There was no flood fill or ellipses for instance, you had to roll your own (better in assembly!)
  • the language doesn't provide array copy. You have to use basic loops.
  • the keyboard hardware doesn't trigger interrupts, so BASIC (using the 100 Hz timer interrupt) scans each key of the keyboard many times per second. That is necessary when you have to type text, but in games, reading the keyboard without any tricks costs about 20% of performance, even in pure assembly loops, because the interrupt stays active all the time unless it is shut down and replaced by a custom poll of just a few useful keys, or by polling at a reduced frequency (L'Aigle d'Or used the latter).

(I would be tempted to add that it was running in ROM, and that ROM memory is slower than RAM memory, but I'm not sure if it applies to this case)

Despite those issues, BASIC programs were very compact: each keyword (FOR, CLS, ...) was stored as a one-byte token, whereas coding the same thing in assembly or in C took many instructions, and thus memory. Commercial games very often used BASIC for the non-real-time parts of the game (high scores, menus, intro, level initialisation), both because it was simpler and quicker to develop and because it was smaller, reserving pure assembly for the main game loop.
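
As a rough illustration of why one-byte tokens keep programs small and dispatch cheap, here is a sketch in C; the token values and handler names are invented for the example rather than taken from Oric BASIC:

    #include <stdint.h>
    #include <stdio.h>

    enum { TOK_END = 0x00, TOK_CLS = 0x80, TOK_PRINT = 0x81 };

    static void do_cls(void)   { puts("(clear the screen)"); }
    static void do_print(void) { puts("(evaluate and print an expression)"); }

    /* One byte of program storage per keyword, one table lookup per
     * execution -- no text comparison against every keyword name. */
    static void (*const handlers[])(void) = { do_cls, do_print };

    static void run(const uint8_t *tokens)
    {
        for (const uint8_t *p = tokens; *p != TOK_END; p++)
            handlers[*p - 0x80]();
    }

    int main(void)
    {
        const uint8_t program[] = { TOK_CLS, TOK_PRINT, TOK_END };
        run(program);
        return 0;
    }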

Jean-François Fabre
  • 1
    In many MS dialects of BASIC, if one writes A%=B%+C%, the interpreter would convert both variables to floating-point format, then perform the addition, and then convert the result back to an integer. This sequence of operations would take much longer than simply using floating-point variables. Arrays of integers would take less storage than arrays of floating-point values, but that was the only time when integer types would offer any advantage. – supercat Sep 15 '22 at 21:57
  • there is also euclidian division... but that can be achieved with rounding too – Jean-François Fabre Sep 16 '22 at 06:16
  • I feel like that's incredibly difficult to do since the BASIC interpreter used the zero page liberally to increase speed, meaning that you have to know exactly what memory locations are safe to use, which can get confusing fast. Meanwhile if you stick to pure assembly you can just do whatever. – puppydrum64 Dec 07 '22 at 13:59
6

Don't blame the BASIC - blame the computer.

The ANTIC chip on Atari computers basically locks out the 1.79MHz 6502 for the entirety of each visible scan line in text mode, so the CPU gets maybe 40% of the available cycles (horizontal and vertical blanks).

The Sinclair uses its Z80 to do screen refresh and again, that leaves little for BASIC (or anything else).

d3jones
  • 1
    And that's why the ZX81 had FAST and SLOW modes. Not that FAST mode was that fast, it was just fast by comparison with SLOW mode. – Neil Sep 16 '22 at 21:59
  • Sinclair BASIC was on the ZX80, ZX81 and Spectrum but the latter used the ULA to display the screen, not the Z80. Sinclair BASIC just wasn't written for speed, it was to be compact in the small ROMs of the ZX80 and ZX81 while Nine Tiles weren't given the opportunity to speed it up for the Spectrum and its 16 KB 90%-full ROM. – TonyM Sep 21 '22 at 23:23
  • 1
    Didn't the Sinclair skip text parsing altogether by storing each BASIC instruction (e.g. PRINT,GOTO,etc) as a single byte, meaning that you had to press the specific PRINT or GOTO key to actually use those commands, but in return saving the interpreter a ton of hassle? – puppydrum64 Dec 08 '22 at 15:51
  • I disagree. Algorithmic shortcomings were a real problem with BASIC interpreters. Retro BASIC implementations had serious memory and development time constraints. It was very time consuming to implement and debug anything fancy, thus simple linear algorithms prevailed, and a lot of subroutines were reused in as many places as possible. Thus eg. only integer or only floating point numbers but not both, floats stored as text, etc. The only BASIC from back then that had excellent performance was ABC 800 BASIC II from Luxor In Motala, Sweden. That was good enough for serious business applications! – Kuba hasn't forgotten Monica Dec 10 '22 at 19:41
  • And speaking of BASIC II: I'm reverse engineering it right now and it's a tour-de-force of interpreter design. It was a structured basic with functions, while loops, "business strength" file I/O, solid type conversion functions (floats/ints/strings), BCD arithmetic in addition to integer and floating point, and the language was extendable - hardware extensions could include new commands and I/O routines in their EPROMS. Stay tuned :) – Kuba hasn't forgotten Monica Jan 02 '24 at 15:02
6

Someone linked to my earlier article on the problems in Atari BASIC, but that was written long ago and I can offer some additional insights. Curiously, by all rights Atari BASIC should have been quite fast, but a couple of decisions nuked it.

This is going to be long...

To understand where this is going, we need to consider some other examples of early microcomputer BASIC. To do so, we'll consider the following program:

10 A=A+1
20 IF A<1000 THEN 10

When one thinks of "an interpreter" one may think of a system that reads the source line by line (or statement by statement), interprets it, and runs the result. Such systems are rare, but we can consider one example, Tiny BASIC. When the user types the second line in and presses return, you would get something like this in memory:

$14 IF A<1000 THEN 10 $13

At runtime, the interpreter has to read this character by character, figure out each of the keywords, and then run them. Now compare the same line in MS:

$0014 $xx $8B A $B3 1000 $A7 10

MS tokenized the keywords, and at runtime it can separate them easily because tokens have their high bit set. This short-circuits having to read the text and parse it, at least partially. It still has to parse and look up variables and convert the numeric constants, 1000 times each. Note the $xx, which is a pointer to the next line, allowing it to do line lookups much faster than in Tiny.

Now finally, consider Atari BASIC:

$0014 $xx $xx $07 $80 $20 $0E $430100000000 $1B $410100000000

The line has been completely converted into the form that it will be when it is run. For instance, the $80 indicates the location in memory for the A variable, so there is no need to search for it. The two numeric constants have been converted to their internal format, so again, nothing has to be done at runtime, they just copy it directly into the registers. Additionally, it stores the location of the next line, as well as the next statement, which may or may not be on the same line. This allows IF statements to jump to the next statement without having to search the source for the colon.

So at this point, it would seem AB should run circles around MS. But in fact, it is about 1/3rd the speed on most benchmarks. As I noted in my earlier article, there are two main reasons for this.

The first is that they store all numeric constants in a BCD form. However, the line numbers themselves are in 16-bit int format. This means the line number has to be converted from BCD to a 16-bit int every time through the loop. Had they either (A) provided a second format for storing line numbers, or (B) stored the line numbers in BCD format, all the line lookups would immediately have improved.
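
As a rough sketch of that per-lookup cost, written in C for clarity rather than as Atari's actual 6502 routine (and simplified to ignore the exponent byte of the BCD float), converting a packed-BCD constant back into a 16-bit binary line number costs two multiply-by-tens per byte, repeated every time the branch is taken:

    #include <stdint.h>

    /* 'digits' holds packed BCD, two decimal digits per byte, most
     * significant byte first; e.g. line 1000 -> {0x10, 0x00}. */
    static uint16_t bcd_to_uint16(const uint8_t *digits, int nbytes)
    {
        uint16_t value = 0;
        for (int i = 0; i < nbytes; i++) {
            value = value * 10 + (digits[i] >> 4);   /* high nibble */
            value = value * 10 + (digits[i] & 0x0F); /* low nibble  */
        }
        return value;  /* on a 6502, each multiply-by-ten is itself a subroutine */
    }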

Now a GOTO here and there is not going to be too bad, they are not always found in loops. But here's where it goes from bad to worse: they also stored the return line in FOR/NEXT loops as a line number. So every time through a FOR/NEXT, it has to search the entire program for the matching line. This was absolutely brain-dead. There was a patch that came out sometime in the 1980s that made the change to store the address in loops, and it results in an average 50% speedup for about 30 bytes of code.
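
Here is a sketch, with made-up C structures standing in for the interpreter's run-time stack entry, of the difference that patch makes: once the frame remembers where to resume, NEXT is a pointer load instead of a whole-program line search.

    #include <stdint.h>

    typedef struct {
        uint8_t        var_index;   /* which loop variable */
        /* original design: remember only the line number of the FOR,
         * so NEXT must search the program for it on every iteration */
        uint16_t       for_line;
        /* patched design: remember the resume point directly */
        const uint8_t *resume_at;
    } ForFrame;

    /* With the stored pointer, NEXT becomes O(1): */
    static const uint8_t *do_next(const ForFrame *frame)
    {
        return frame->resume_at;
    }
    /* Without it, NEXT calls something like find_line(program, frame->for_line),
     * scanning line headers from the top of the program each time through. */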

To a lesser degree, another issue is that the BCD code was complete pants. It's possible to make performant BCD code on the 6502, although multiply and divide will always be slower. But simple stuff like A=A+1 should not be much different than in binary. Not so in AB, where the code was almost always much slower than the equivalent in MS's binary code. For instance, that BCD-to-int conversion could take a god-awful amount of time.

So in the real world, does the basic idea behind AB - and Sinclair worked the same way BTW - actually improve performance? Well for that we can look at TurboBASIC. TB was a (significantly) patched version of AB which removed some of this dumbness and added a new math package.

The results can be seen here.

Note the faicuai tests, where 100 = 100% of the performance of a C64 - as you can see Turbo runs about 65% faster in this large battery of tests.

Maury Markowitz
  • It's even worse when you consider the fact that the 6502 has no hardware support for 16-bit integers. You have to work with each half separately! (But I'm sure you knew that.) – puppydrum64 Dec 07 '22 at 14:01
  • 1
    @puppydrum64: Worse than that, the BASIC interpreter would, for each digit of a decimal constant, increment the floating-point exponent of the value processed thus far, make a copy with the exponent two higher, and then use a floating-point addition to compute the new value (original times ten). – supercat Dec 08 '22 at 15:42
  • @supercat Good grief. I couldn't figure out how to do software floating point in assembly, and these guys flex by using it for everything. Not a good thing though! – puppydrum64 Dec 08 '22 at 15:49
  • @puppydrum64: I should have specified that Microsoft's BASIC interpreter in particular does that. But it has the part of the CHRGOT routine (retrieve next byte of BASIC code) which is stored in RAM include logic to check whether each character is a digit--a placement which saves at most three cycles in cases where calling code would actually care, when the scenarios where the number is a digit are processed abominably slowly. – supercat Dec 08 '22 at 15:59
  • @puppydrum64: I'll have to experiment later, but I think I figured out a way of rewriting the multiply-by-ten routine in a manner which would make it both smaller and faster. The smaller version wouldn't be particularly close to optimal, but would be much faster than the existing routine which is pretty horrible no matter how one slices it. – supercat Dec 11 '22 at 07:59
  • @supercat - when you're done perhaps you can answer a question I've always wondered about - would a shift-by-4-bits op have really had any effect on overall performance? – Maury Markowitz Dec 11 '22 at 16:09
  • 1
    @MauryMarkowitz: In the 8-bit era, silicon cost was dominated not by transistors, but by routing. The Z80 used a 4-bit ALU internally, which would have had one input and one output that could be attached to bits 0, 4, 8, or 12 of any register, one that could be connected to bits 1, 5, 9, or 13, etc. so a 4-bit shift could have been handled efficiently (and some instructions exist for that). On most other 8-bit CPUs, adding a 4-bit shift instruction would have required adding a fair amount of extra wiring. – supercat Dec 11 '22 at 19:53
  • @Maury Markowitz Sharp pocket computers also used a BCD format for their variables in Basic. The CPU used (SC61860) had a battery of BCD instructions: arithmetic ADD/SUB in a loop, shift by 4 in a loop instructions, nybble swap instructions. – Patrick Schlüter Dec 12 '22 at 10:06
  • @PatrickSchlüter - yeah, as I looked into more and more versions, I found BCD was much more common than I initially thought. It was used by almost every non-micro implementation, like Wang and such, as well as all the calculators, cash registers and gas pumps and all the other roles micros were originally designed to serve. – Maury Markowitz Dec 12 '22 at 13:26
  • @MauryMarkowitz Game Boy kind of had it and it was pretty fast. Technically it was a 4-bit rotate rather than a 4-bit shift but you can use AND after the fact to get the desired result. – puppydrum64 Dec 12 '22 at 14:39
  • @puppydrum64 - Interesting. That prompted me to look at the Z80 specs, and sure enough, they have RLD and RRD. These seem to be the redheaded stepchildren of the Z80 opcodes, they are rarely even mentioned. – Maury Markowitz Jan 02 '23 at 19:33
  • @MauryMarkowitz Game Boy can't use those actually. I was referring to the Game Boy exclusive swap which replaces the undocumented SLL on the Z80 (which does a left shift but the new bit 0 is a 1 rather than a 0.) As for RLD and RRD they're kind of hard to find a use for, the best use I've found for them is rotating the values in an array. For example, turning an array of 00 11 22 33 into 11 22 33 00 or 33 00 11 22. – puppydrum64 Jan 03 '23 at 11:40
5

I'll add my $0.02 since I've written a few performant BASIC-like interpreters for work for the S08, ColdFire and ARM7TDMI. Everything below is generally NOT what BASIC interpreters did in the 80's.

  • tokenize everything into a token/direct/subroutine threader
  • tokenize all values into the smallest, most 'native' format
  • move line numbers into a list of pointers (out of the token stream)
  • move variables to their own region and use pointers
  • move strings and comments to their own region and use pointers
  • use the string table as the base for the string heap
  • binary search the list of line numbers on GOTO/GOSUB (see the sketch below)
  • do not use line numbers for function-calls and loops
  • lean on your parser to do all the heavy lifting, not the interpreter

What it should have been is something more like a "Shunting-Yard FORTH with Line Numbers" and it should have been able to perform close to that as well.
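
For instance, the line-number bullet above could look something like this in C (illustrative names, not taken from any particular interpreter): the parser maintains a sorted index of line numbers and code pointers, and GOTO/GOSUB resolves with a binary search instead of a linear walk.

    #include <stdint.h>
    #include <stddef.h>

    typedef struct {
        uint16_t       number;  /* BASIC line number */
        const uint8_t *code;    /* where its tokenized body starts */
    } LineEntry;

    /* Classic lower-bound binary search over the sorted index. */
    static const uint8_t *find_goto_target(const LineEntry *index, size_t count,
                                           uint16_t target)
    {
        size_t lo = 0, hi = count;
        while (lo < hi) {
            size_t mid = lo + (hi - lo) / 2;
            if (index[mid].number < target)
                lo = mid + 1;
            else
                hi = mid;
        }
        return (lo < count && index[lo].number == target) ? index[lo].code
                                                          : NULL; /* undefined line */
    }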

Renee Cousins
  • How feasible would it be to just create a BASIC compiler that turns it into ARM7TDMI code? – puppydrum64 Dec 08 '22 at 15:54
  • A subroutine-threaded FORTH can already be seen as a compiler. For performance, you would only need to add inlining and maybe some basic peephole optimization. But then you lose the biggest advantage of BASIC, the ease of debugging. – Renee Cousins Dec 08 '22 at 16:11
  • Well, some 1980s BASICs did do much of this. Some require extra bytes, and on a machine with 4k static RAM those were in very short supply. – Maury Markowitz Dec 10 '22 at 16:34
  • 1
    BASIC Four (MAI) microcomputers did most of this - in microcode. But BASIC-only computers for commercial 3rd-party software development was their entire thing and they had a good engineering staff for it. – davidbak Dec 10 '22 at 18:24
  • 1
    One extra trick I’ve seen done was variables stored in the token stream. After all, if a variable is referenced in the token stream, it exists, so fixed size values could be stored right there. For variable sized stuff (strings, variable arrays) a pointer would be stored. The symbol table was stored in a Huffman coded format. After all, runtime is time-critical, editing/listing can be at “human” speeds. – Kuba hasn't forgotten Monica Dec 10 '22 at 19:44
  • 1
    @davidbak -- that's interesting, certainly not the typical BASIC found on home computers though – Renee Cousins Dec 13 '22 at 22:59
  • @kuba-hasnt-forgotten-monica -- indeed. So something like:

    X = X + 1

    becomes this:

    [LIT] &X [STORE] &X [FETCH] [ADD] [LIT] 1 [EOL]

    Which would push the address of X, push the STORE-op, push the address of X again, call FETCH to replace the TOS with the value at X, push ADD-op, push the literal '1', execute EOL which pops the ADD-op (adding 1 to the value of X), and the STORE-op (which stores TOS back into X).

    In BASIC the first '=' is always [STORE] and subsequent '=' are [EQUALS], so it's easy to decide when to use [FETCH] -- after any [STORE]. Only the parser needs to do this.

    – Renee Cousins Dec 13 '22 at 23:26
  • @ReneeCousins - w.r.t. subroutine-threading + inlining + peephole optimization. You wouldn't necessarily lose ease of debugging. Certainly not just w/ peephole optimization. At that point you would still be doing "line at a time" BASIC "compilation". Add inlining, then, maybe. You could still keep track of dependent lines (this line got inlined here, here, and here) and recompile all when the you changed a line. But maybe you couldn't afford that on really small machines. – davidbak Dec 13 '22 at 23:43
  • 1
    Otherwise all the points in this answer are spot on. Especially the last one! – davidbak Dec 13 '22 at 23:44
  • @davidbak as long as your optimizer doesn't scope beyond one line, it should be okay and comparable to most BASICs that just dump some error on you and the line number, like "***SYNTAX ERR, LINE 1050". Some better BASICs could tell you precisely where in the line you made the error and that can get lost with optimization, but maybe I'm giving that too much importance. – Renee Cousins Dec 13 '22 at 23:48
  • In which case, it's better to sic the shunting-yard on the PARSER (or compile time in FORTH parlance) and not at RUN TIME (or in interpreter mode in FORTH parlance). Then you're truly running at the speed of FORTH and all BASIC is now is a much smarter PARSER. – Renee Cousins Dec 13 '22 at 23:50
  • @ReneeCousins - and this is especially true when you examine the original MS code. The code for, say, reading a constant from ASCII into a FP "register" is identical if you do it at parse or runtime, the only difference is that if you do it at parse time you need to write another routine to reverse it when you do a LIST. But all of that (mostly) existed too. I think they were down to the point of shaving bytes, not k, and so A=A+1 saves 5 bytes of storage compared to converting the 1 to FP at parse. With 784 bytes of program storage free, well, its adds up. – Maury Markowitz Jan 02 '23 at 19:44
2

I will add the special case of the TI-99/4A's TI-Basic, which is probably one of the worst offenders in the category of slow interpreters.

  • TI-Basic (and also TI-Extended Basic) is especially slow because the interpreter was not written in TMS-9900 machine language as one would suppose, but in GPL (Graphic Programming Language), a kind of pseudo-code (a bit like Java bytecode, UCSD P-Code or the Apple II's Sweet16) used to add a level of abstraction to the system.

  • On the base console, the Basic program was not stored in CPU addressable RAM but in the Video-RAM of the display processor. Each byte read or written required I/O commands (see the sketch below).

  • As in all the other interpreters, everything is done in floating point. The peculiarity of the TI machine is that the floating-point routines were at the same level of sophistication as the FP routines in TI's calculators, i.e. 8 bytes of storage and 13 digits of precision with a -99 to +99 exponent range (decimal storage). This gave nice calculation capacity without the precision issues of the binary floating point on other machines, but it cost quite a bit of performance.

  • Syntax and semantic checking at start. When the RUN command is invoked, the interpreter analyses the whole program to see if there are still errors in the code. For big programs this can take a lot of time (on the order of a minute).

All the other issues listed for the other interpreters also apply here: linear searches for line numbers and for variables, and garbage-generating string functions with stop-the-world garbage collection from time to time.
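
To make the video-RAM bullet concrete, here is a simplified model in C (the port names are stand-ins, not TI's actual registers or GPL code) of why keeping the program in VDP RAM hurts: every byte fetch goes through the video chip's address and data ports instead of being a plain memory read.

    #include <stdint.h>

    /* Stand-ins for the memory-mapped VDP ports; real addresses, read/write
     * modes and timing constraints are omitted. */
    static volatile uint8_t VDP_ADDR_PORT;   /* address set-up port */
    static volatile uint8_t VDP_DATA_PORT;   /* data port */

    static uint8_t vdp_read_byte(uint16_t vram_addr)
    {
        /* Two port writes to set the address, then one port read -- versus a
         * single indexed load if the program lived in CPU RAM. */
        VDP_ADDR_PORT = (uint8_t)(vram_addr & 0xFF);
        VDP_ADDR_PORT = (uint8_t)((vram_addr >> 8) & 0x3F);
        return VDP_DATA_PORT;
    }

    /* Fetching the next byte of the interpreted BASIC program: */
    static uint8_t next_program_byte(uint16_t *pc_in_vram)
    {
        return vdp_read_byte((*pc_in_vram)++);
    }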

Patrick Schlüter
  • On the base console, the Basic program was not stored in CPU addressable RAM but in the Video-RAM of the display processor -- note that there was a good reason for this: the machine had 256 bytes of CPU RAM and 16K of Video RAM. – occipita Mar 11 '23 at 20:36
  • Indeed, there was a reason for it, but I'm not sure it was a "good" one. ;-) – Patrick Schlüter Mar 13 '23 at 07:07
1

There are many things that can slow an interpreter to death. I could list many of them, but here are a few:

  1. How it manages variables in memory: whether the access method is fast or slow, and how many times the program accesses them.
  2. For numeric variables, whether and how it distinguishes between different types (bytes, words, ints, decimals).
  3. If decimals are supported, how wide they are and how they are implemented (fixed versus floating point). Fixed point is always faster than floating point, but has both less range and less resolution (see the sketch after this list).
  4. Whether the computer has a math co-processor in the form of a floating-point unit or similar, or on the other hand has to do everything with a software library. Software implementations are always slower than hardware ones on the same system. You can check this even without an interpreter: take an old PC with an 8086/8088 plus 8087 and write a program that runs a large set of floating-point operations first in software, then on the coprocessor.
  5. On microcomputers with banking systems, how efficient the banking is. For instance, the TA Alphatronic P2U had 16 KB of RAM on top of the system ROM, and for every single access to the screen it had to disable the lower RAM, access the screen, then enable the RAM again.
  6. The number of checks every interpreter command does at run time. This may be the most problematic factor when building an interpreter. Unnecessary checks will (not just may) slow the interpreter, and the problem lies not only with unneeded checks but also with necessary ones placed in the wrong spot, which en masse cause significant overhead.
  7. Basic specs of the computer: the frequency at which the processor operates, and whether it has slow DRAM that the CPU has to wait for.
  8. How many accesses are made to I/O devices and whether the CPU has to wait for them.
  9. Whether block operations are executed by DMA or the CPU does the work itself. A couple of clock cycles per operation may not seem like a lot, but with hundreds or thousands of transfers it adds up.
  10. For timers specifically, whether they are implemented with an interrupt-driven clock or by cycle counting.
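
For point 3, a minimal sketch in C of the fixed-point trade-off, using a hypothetical 16.16 format not tied to any particular BASIC: one integer multiply plus a shift, instead of unpacking, multiplying, normalising and repacking a software float.

    #include <stdint.h>
    #include <stdio.h>

    typedef int32_t fix16_16;                    /* value scaled by 65536 */

    #define TO_FIX(x)   ((fix16_16)((x) * 65536.0))
    #define TO_DBL(f)   ((double)(f) / 65536.0)

    static fix16_16 fix_mul(fix16_16 a, fix16_16 b)
    {
        /* 32x32 -> 64-bit product, then drop the extra 16 fraction bits. */
        return (fix16_16)(((int64_t)a * b) >> 16);
    }

    int main(void)
    {
        fix16_16 r = fix_mul(TO_FIX(1.5), TO_FIX(2.25));
        printf("1.5 * 2.25 = %f\n", TO_DBL(r));  /* prints 3.375 */
        return 0;
    }
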
Borg Drone