18

Imagine a modern computer where, let's say, Python is a high-level programming language that needs to be interpreted in order to execute a piece of code. You could instead write the code in C and compile it; the result is much closer to the actual hardware and therefore runs faster.

Now take a C64: if I write a piece of code in BASIC, will it be much slower than the same code in assembly? If so, is there any comparison I could make? Is it even possible to quantify the difference this way?

chicks
Bartek Malysz
  • 1
    What do you consider "close to the hardware?" On a C64 or Apple ][ you could directly PEEK at and POKE into any address in RAM. You were basically doing pointer arithmetic and writing to addresses directly. But your programs were stored as strings which were interpreted, so it was much slower than assembly. – user1118321 Sep 28 '19 at 23:03
  • 1
    Is the right metric number of assembly lines to source lines? – dashnick Sep 28 '19 at 23:11
  • 3
    What made MS-Basic (both Apple II and C64) exceptionally slow was that all handling of numbers was done through its floating point routines. So, as soon as you had a loop running over a counter in- or decrementing by 1 each iteration – the most common case –, the equivalent assembler code was about 1000 times faster. – Janka Sep 29 '19 at 00:03
  • 4
    @user1118321: Slight correction: It was common for 1980s BASIC implementations to store programs in memory in a binary "tokenized" format rather than raw ASCII strings. Still, any such intermediate language was still interpreted, slowly. – dan04 Sep 29 '19 at 01:31
  • 3
    @Janka And that's why you had Integer variables (with % suffix) in Applesoft (Apple II Basic) - after all, the first Apple II Basic ("Integer Basic") only used integer variables in the first place. Of course you had to use them, or your loops would still be slow. – dirkt Sep 29 '19 at 04:52
  • 1
    @dirkt: MS-Basic converted integer values in variables into floating point first before any operation, and back to integer before storing the result back into the variable. So using integer variables was slower than using floating point with MS-Basic. That's why Apple Integer Basic still had a use after Applesoft came out. – Janka Sep 29 '19 at 08:47
  • The code would have to be interpreted as there was no compiler in memory. And even if there was, you would have to either allocate a portion of RAM (of which there was already a limited supply) or perform some kind of swapping to (hard) disk, for which the C64 had none. – Zhro Sep 29 '19 at 13:41
  • I don't know that comparing to python is completely valid. Compiling a .py file to a .pyc during the first invocation of the script certainly speeds up subsequent invocations of the same script. The exception to this is if you have #!/bin/python as the first line of the script and run it as $PATH/script.py - this always re-interprets the script. – doneal24 Sep 29 '19 at 18:03
  • 1
    Note that much of the speed of modern interpreted languages comes from caching things in memory. In the 8-bit era this simply wasn't possible. – Thorbjørn Ravn Andersen Sep 30 '19 at 09:40
  • 2
    C64 BASIC isn't a particularly stellar implementation. While interpreted BASIC will always be slower than assembly language, other systems — particularly the BBC Micro — had far more efficient interpreters. – scruss Sep 30 '19 at 14:45
  • I would confirm a factor of ~100. I once implemented a Morse detection program in BASIC until I noticed that distinguishing only between 1, 2 and 3 was not useful to identify dots and dashes. Once written in assembler, I had values of 80 to 240, which was good enough to detect Morse code at 60 BPM. – Thomas Weller Sep 30 '19 at 19:34
  • Youtuber "TeaLeaves" has a nice comparison using a screen-filling loop in both BASIC and Assembly... https://www.youtube.com/watch?v=wlmnaNXmktw time codes 4:00 for the BASIC example and about 14:15 for the Assembly version. It's a good video throughout actually. – Wossname Oct 01 '19 at 10:25
  • I did much the same as a demonstration when I was teaching electronics and fundamentals of assembly code (in the early to mid 80s). I had an Apple ][ and I wrote a utility to change the screen to a single colour. The basic version took about 10 to 15 seconds and the assembler version was very fast (almost but not quite imperceptible to the eye). – Peter Smith Oct 01 '19 at 15:57
  • @scruss And the Acorn Archimedes. I wrote a 4 channel drumtracker on it in BASIC, which played drum samples, and speed was not an issue. – Alan B Oct 02 '19 at 08:03
  • @AlanB - the Arch was an ARM system, so it had plenty of processing to spare. The C64 and the Beeb were 65xx based and CBM BASIC is pretty slow for a 6502 BASIC interpreter – scruss Oct 02 '19 at 13:10

6 Answers

46

Yes, BASIC is much slower than assembly for many operations. For an easy example, try out this program on a Commodore 64 or emulator:

for i = 1024 to 2023 : poke i,peek(i) or 128 : next

You will see each character on the screen reverse, row by row, over the course of ten seconds. By contrast, the exact same routine in machine language inverts the entire screen in a fraction of a second; there's almost no perceptible gap between the first character and last character being inverted. (The source and a BASIC loader for it are appended below, if you want to see how it works or run it yourself.)

The two main issues that make it much slower are that each line of BASIC is read and interpreted before it's executed, and the data formats used by BASIC often have much higher overhead than the wider variety of formats one can use in machine language.

In some cases the latter is due to BASIC not using the most efficient formats it has available. For example, BASIC always uses floating point for the index of a for loop rather than having extra code to determine whether it could use integer variables instead. Thus, adding one to i in the code above ends up executing machine-language procedures to copy several bytes of data to the FAC (floating point accumulator), do the floating point addition, and copy it back out. This is many dozens of instructions, whereas a loop that meets the restrictions that allow integers to be used (as in the machine-language routine below) can do its math in a small handful of instructions.
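To give a feel for what "adding one to i" involves, here is a rough Python model of a 5-byte floating point value of the kind Commodore BASIC stores: one exponent byte and four mantissa bytes, with the sign kept where the always-set top mantissa bit would be. The layout is stated from general knowledge of the format rather than taken from this answer, the function names are made up for illustration, and the real ROM routines also handle rounding and overflow, which this sketch does not.

```python
import math


def cbm_pack(x):
    """Pack a number into 5 bytes: exponent, then 4 mantissa bytes (truncating)."""
    if x == 0:
        return bytes(5)
    m, e = math.frexp(abs(x))                    # abs(x) == m * 2**e, 0.5 <= m < 1
    mantissa = int(m * (1 << 32)) & 0x7FFFFFFF   # drop the implied top bit
    if x < 0:
        mantissa |= 0x80000000                   # the sign bit takes its place
    # Assumes the exponent fits in one byte, which holds for loop counters.
    return bytes([e + 128]) + mantissa.to_bytes(4, "big")


def cbm_unpack(b):
    """Inverse of cbm_pack."""
    if b[0] == 0:
        return 0.0
    raw = int.from_bytes(b[1:], "big")
    sign = -1.0 if raw & 0x80000000 else 1.0
    mantissa = (raw | 0x80000000) / (1 << 32)    # restore the implied top bit
    return sign * mantissa * 2.0 ** (b[0] - 128)


def next_i(stored):
    """Roughly what NEXT must do: unpack, floating point add, repack."""
    return cbm_pack(cbm_unpack(stored) + 1.0)


if __name__ == "__main__":
    i = cbm_pack(1024)
    print(i.hex(), "->", next_i(i).hex())
```

In this model 1.0 is stored as 81 00 00 00 00. Every pass through the loop has to take the five bytes apart, align and add mantissas, renormalize, and store five bytes back, where an integer counter in machine language needs a single increment instruction.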

In other cases, BASIC just doesn't support at all the kind of techniques and formats you can use in assembler. As Harper points out in a comment below, unrolling the loop in the following assembly routine would save some arithmetic and several memory lookups, probably doubling the speed of the routine. That kind of optimization is something that assembler programmers can do in the right circumstances, and you can't really work at that level at all in BASIC.


Appendix

The following is a machine language routine to invert the screen on a Commodore 64 in a way similar to how it was done in BASIC above. Note that this is deliberately not optimized; it's written instead with an eye towards clarity and generality. (For example, a simple change could make this update 32 KB, rather than just 1 KB.)

All numbers in the listing are in hexadecimal (base 16). The # in front of some of them means to load that actual number itself into the A or Y register; otherwise it's loading data from the address in memory specified by that number. In the case of the [addr],Y references, it's loading a 16-bit address from addr, adding the Y register to that value, and using the result as the memory location of the load or store. We need to do this because the Y register is only 8 bits, holding values up to only FF (255 decimal), so we need to count through all 256 values of Y four times to read and write all 1024 screen addresses. (Actually, only 1000 of them are displayed on the screen, but we do 4×256 to keep the code simple.)

00FC        addr      .equ 00fc     ; unused zero-page location

C000 A9 00  invscr:   lda  #00      ; screen RAM start low byte
C002 85 FC            sta  addr     ; store in pointer low byte
C004 A9 04            lda  #04      ; screen RAM start high byte
C006 85 FD            sta  addr+1   ; store in pointer high byte
C008 A0 00  nextpage: ldy  #00      ; set 8-bit register Y to 0
C00A B1 FC  nextchar: lda  [addr],Y ; load character from addr + Y
C00C 09 80            ora  #80      ; set bit 7 to make it inverse
C00E 91 FC            sta  [addr],Y ; store modified character
C010 C8               iny           ; increment Y
C011 D0 F7            bne  nextchar ; branch back if Y != 0
C013 E6 FD            inc  addr+1   ; increment 16-bit screen address by 256
C015 A5 FD            lda  addr+1   ; load pointer high byte
C017 C9 08            cmp  #08      ; reached end of screen?
C019 D0 ED            bne  nextpage ; if not, do the next 256 bytes
C01B 60               rts           ; return to caller

And here's a BASIC program that will load the routine; you can run it after that with sys 49152.

10 loc=49152 : rem store the routine at $c000
20 read v: if v = -1 then end
30 poke loc,v : loc = loc + 1 : goto 20
50 data 169,0,133,252,169,4,133,253
60 data 160,0,177,252,9,128,145,252,200,208,247
70 data 230,253,165,253,201,8,208,237,96
90 data -1
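
As a cross-check of the listing and the loader, here is a minimal Python sketch (not part of the original routine) that puts the same DATA bytes into a 64 KB array and interprets only the handful of opcodes the routine uses. The names `run_invscr` and `demo` are made up for illustration, and the per-opcode cycle counts are the standard 6502 timings as I know them. The sketch ignores cycles taken by the video chip, so the count is a lower bound.

```python
# DATA bytes from the BASIC loader above, without the -1 end marker.
ROUTINE = [169, 0, 133, 252, 169, 4, 133, 253,
           160, 0, 177, 252, 9, 128, 145, 252, 200, 208, 247,
           230, 253, 165, 253, 201, 8, 208, 237, 96]


def run_invscr(mem, start=0xC000):
    """Run the routine at `start` and return the CPU cycles used.

    Hypothetical helper: it knows only the opcodes in ROUTINE and tracks
    only the A and Y registers and the zero flag.
    """
    a = y = 0
    zero = False
    pc = start
    cycles = 0
    while True:
        op, arg = mem[pc], mem[pc + 1]
        pc += 1 if op in (0xC8, 0x60) else 2   # INY and RTS are one byte long
        if op == 0xA9:                         # LDA #imm
            a = arg
            zero = a == 0
            cycles += 2
        elif op == 0xA0:                       # LDY #imm
            y = arg
            zero = y == 0
            cycles += 2
        elif op == 0x85:                       # STA zp
            mem[arg] = a
            cycles += 3
        elif op == 0xA5:                       # LDA zp
            a = mem[arg]
            zero = a == 0
            cycles += 3
        elif op in (0xB1, 0x91):               # LDA (zp),Y and STA (zp),Y
            base = mem[arg] | (mem[arg + 1] << 8)
            addr = (base + y) & 0xFFFF
            if op == 0xB1:
                a = mem[addr]
                zero = a == 0
                # one extra cycle when adding Y crosses a page boundary
                cycles += 5 + ((base & 0xFF) + y > 0xFF)
            else:
                mem[addr] = a
                cycles += 6
        elif op == 0x09:                       # ORA #imm
            a |= arg
            zero = a == 0
            cycles += 2
        elif op == 0xC8:                       # INY
            y = (y + 1) & 0xFF
            zero = y == 0
            cycles += 2
        elif op == 0xE6:                       # INC zp
            mem[arg] = (mem[arg] + 1) & 0xFF
            zero = mem[arg] == 0
            cycles += 5
        elif op == 0xC9:                       # CMP #imm (zero flag only)
            zero = a == arg
            cycles += 2
        elif op == 0xD0:                       # BNE rel
            cycles += 2
            if not zero:
                target = (pc + (arg - 256 if arg > 127 else arg)) & 0xFFFF
                # taken branch: +1 cycle, +1 more if it lands on another page
                cycles += 1 + ((target ^ pc) > 0xFF)
                pc = target
        elif op == 0x60:                       # RTS
            return cycles + 6
        else:
            raise ValueError(f"unsupported opcode {op:#04x}")


def demo():
    """Load the routine at $C000, fill a fake screen, run, return (mem, cycles)."""
    mem = bytearray(0x10000)                   # 64 KB of zeroed RAM
    mem[0x0400:0x0800] = bytes(i & 0x7F for i in range(0x400))
    mem[0xC000:0xC000 + len(ROUTINE)] = bytes(ROUTINE)
    return mem, run_invscr(mem)


if __name__ == "__main__":
    mem, cycles = demo()
    print(cycles, "cycles")
```

Under these assumptions the run comes to roughly 18,500 cycles, which at the C64's clock of about 1 MHz is on the order of 20 milliseconds. It also makes visible that the routine touches all 1024 bytes from $0400 to $07FF, not just the 1000 that are displayed.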
cjs
  • 1
    And that's not even a particularly fast way to invert the screen. I can see at least a couple of optimisations. – JeremyP Sep 29 '19 at 17:55
  • 1
    @JeremyP Oh yes, there's a lot, saving clocks as well as size - starting with elimination of unneeded execution. But then again, this isn't a programming context, but an example to prove a point (maybe a bit too excessive by including a loader). It serves its purpose. – Raffzahn Sep 29 '19 at 18:17
  • Yeah, given that there's only four pages, I make one pass 00-FF and repeat the LDA/ORA/STA code 4x, with absolute,y instead of (indirect),y. But any implementation will be beyond seeing. – Harper - Reinstate Monica Sep 29 '19 at 18:27
  • My dad had a book illustrating a similar idea; it puts the C64 into graphics mode, and then clears the framebuffer. Takes something like 38 seconds in BASIC, and 10 milliseconds in assembly. (Or something like that...) That's a pretty stark difference. – MathematicalOrchid Sep 30 '19 at 08:06
  • @Harper Yeah, I was aiming for clarity and doing the loop counter in a way parallel to how BASIC does it, but your point about that kind of optimization is actually worth adding to the answer, so I expanded my answer with another paragraph to discuss that. Thanks! – cjs Sep 30 '19 at 08:20
  • @Raffzahn Including the BASIC loader didn't seem "too excessive" to me because I judged that the OP (and others at a similar level) would probably not find it terribly obvious how to enter the hex from the assembly listing themselves so they could see just how fast this is. – cjs Sep 30 '19 at 08:22
  • @CurtJ.Sampson Well, focusing on a very special case - and that in depth - might be less helpful. And I doubt that anyone with no knowledge will try that. Also, the unrolling is maybe a possible optimization, but confusing code constructions should be eliminated well before thinking about optimization. Anyway. What should be pointed out is that some of the issues mentioned (like the way FP is used) are specific to C64 (Microsoft) BASIC and should be noted as such, as the question just asks about BASIC in general. – Raffzahn Sep 30 '19 at 08:35
  • 4
    @CurtJ.Sampson I don't think including the loader is excessive, it means that anybody with a C64 emulator can type it in and try it. – JeremyP Sep 30 '19 at 09:27
  • As always making things visual helps a lot making the point. – Thorbjørn Ravn Andersen Sep 30 '19 at 09:42
  • @Raffzahn You are certainly entitled to your opinion and have your own answer in which to put it. I think you are quite wrong, however, about what kind of answer the OP and others interested in the question are looking for, and I think the vote counts provide at least some support for the direction I chose to take in my answer. I am failing to see the benefit you think the community will get from these criticisms you're making of my answer. – cjs Sep 30 '19 at 09:54
  • 1
    I find it curious in retrospect that nobody published a short machine-language routine equivalent to the Macintosh "StuffHex" toolbox call, thus allowing a VIC-20 programmer to write e.g. SYSSQ,7168,"3C42A581A599423C" to store a smiley-face character to RAM at address 7168. A lot of programs were slow to start up as a result of READing and POKEing bytes separately; a stuffhex routine would have made code faster and more compact. – supercat Sep 30 '19 at 14:59
  • The reason for the optimization I mentioned is CPU cycles. My way saves a cycle on each read and write, so about 2100 cycles to be saved, that is 32 scan lines or 4 lines of text. – Harper - Reinstate Monica Sep 30 '19 at 15:07
  • It's just occurred to me that the optimisation of using absolute addresses indexed is not necessarily a good one on the C64 because the screen RAM can be moved. – JeremyP Oct 01 '19 at 09:14
  • @CurtJ.Sampson It's once more surprising how hard you try to turn everything into a negative and then press for a fight. Taking off the blinders and reading again should reveal that I was genuinely defending your post. Further, a mature reaction to pointing out that answering a generic question with machine specific points, without marking them as such, might rather be to clarify the point, improving, not so much picking another fight. Shouldn't it? To me the goal is to improve RC.SE, no matter who's writing an answer. The overall quality is what stands for the site. – Raffzahn Oct 01 '19 at 11:22
  • @supercat Nice idea - like it. Any links for more information? It would fit this question as well – Raffzahn Oct 01 '19 at 11:26
  • @Raffzahn Again, I am unclear how your comment is intended to improve this answer. I suggest again that if you think this answer is missing something or has too much information, you tweak your own answer as appropriate. Your criticisms in comments here do not strike me as helpful. – cjs Oct 01 '19 at 12:24
16

Most implementations of BASIC for 8-bit home computers were interpreters, and in that sense they're similar to the standard versions of Python. You could typically expect simple programs to run 100 times slower in BASIC than in assembly of ordinary quality.

However, it would normally take much less time to write that program in BASIC than in assembly. For that reason, some commercial games were still written in BASIC, if the full performance of the machine wasn't needed and thus the cost of production mattered more.

Chromatix
  • 4
    Another benefit of BASIC was that the bytecode was pretty compact. Most programs would be 2x-5x smaller when written in BASIC vs. assembler, which was quite a big difference with the small RAMs back then. – jpa Sep 29 '19 at 15:45
  • 3
    @jpa But one could also use bytecode with assembly, by writing a little bytecode interpreter in assembly. This technique was regularly used when compact size was more important than speed. I can see some BASIC programs being 2-5x more compact, but not most of them, especially since the BASIC routines were usually available for use by assembler programs as well. (E.g., JSR $ABF9 to use the BASIC INPUT command from assembly.) – cjs Sep 29 '19 at 17:08
  • @CurtJ.Sampson True enough. I guess my assembly programs were always on the beginner level. – jpa Sep 29 '19 at 19:21
  • And note that for most programs performance is of little concern for most of the program. Back in the old days I wrote various programs that used assembly for small things done frequently even though the rest of the program was in BASIC. – Loren Pechtel Oct 01 '19 at 04:31
  • 3
    @Loren Which funnily enough is how much of Python code works today as well. Write most of the code in Python and call out to C code for the performance sensitive parts (i.e. usually the Python runtime or specific libraries such as numpy). – Voo Oct 01 '19 at 12:12
5

if I write a piece of code in Basic, would it be much slower than Assembly?

Well, it's interpreted. So even though it's a simple language, it'll never reach native speed - not even coming close.

If so, is there any comparison I could make?

For the most part it's like Python vs. Assembler on a PC (*1). Except, of course, that BASIC is a much less comfortable language than Python, with far less built-in functionality, so it usually takes more source code, all of which has to be interpreted, to do the same job. And it's the interpretation that makes it slow (*2).

Is it even possible to quantify it this way?

Simply no. Any quantification can only be done in relation to a concrete task, as it depends heavily on

  • Interpreter used
  • Algorithms used
  • Functions used
  • Task selected
  • Implementation thereof.

Real world examples will range between BASIC being 100 times slower (e.g. when doing bit level graphics) to almost as fast as Assembly (Like with only FP-Math). Trying to tie it to single constructions (and examples) in either language will be like judging a natural language by a single word - useless for a generalized observation. Not to mention 'good' BASIC coding vs. 'bad' Assembly.

Just picking a CPU or its assembler won't give any relation - except that BASIC will never be faster.


*1 - Assembler should be at least as fast as a C binary for the same problem.

*2 - Of course, assuming the functions/libraries used are sufficiently well implemented.

Raffzahn
  • 1
    Just between versions of BASIC there were huge differences. BBC Basic on the BBC B was in places 10-20 times faster than MS Basic on a Commodore or Apple. – tofro Sep 29 '19 at 10:30
  • 1
    Additional note for "*1": Assembler written to be well understood by humans has a good chance to be slower than the same algorithm written in C and compiled by a good compiler. Even if the C source is written in a good style. – the busybee Sep 29 '19 at 13:29
  • @thebusybee That, I'd say, still needs some support added, as I can not see any reason why Assembly code would be slowed down due to being readable. – Raffzahn Sep 29 '19 at 14:38
  • 1
    Well, because assembly programmers tend to write less optimized code while trying to keep it understandable. OTOH a compiler can analyze the lifetime of variables, fold and extract constants, unroll loops, inline functions, and so much more. The resulting code still does the same but is not straightforwardly understandable for humans. In this sense a compiler is an extremely trained expert with a perfect memory, most disciplined, and with the knowledge of many years of professional work. I still have to see such a human programmer. ;-) – the busybee Sep 29 '19 at 15:59
  • But then, carefully crafted and optimized handwritten assembly can be faster than the best optimized code of a compiler. It heavily depends on the compilers available for the target system, and of course of the experience of the programmer. – the busybee Sep 29 '19 at 16:01
  • @thebusybee In my opinion, well readable code is usually also well optimized code. Basically all points mentioned are handled the same way with an Assembler. Lifetime of variables is due to scope, constants handling is an objective for Assemblers, inlining and unrolling is what macros are for and all that is easy to be done with clear code. Like with any language it's a matter of tools and environment. It most definitely is not just threading of some static machine instructions. Having served most of my professional life doing Assembly, I can assure you such programmers are no rare sight. – Raffzahn Sep 29 '19 at 17:03
  • 1
    Well, I was thinking for example about re-use of registers, selection of the best combination (variable <-> register), including application-wide "register colouring", re-ordering of instructions, and so on. Brought to the max this will lower the understandability of code, in my experience. Introducing a lot of macros will generate a steeper learning curve for new team members. All of this is hidden behind the scene with a good compiler. But we are discussing opinions, based on experience. And I have my share by 30+ years in assembler and C for embedded and safety critical systems. – the busybee Sep 29 '19 at 18:31
  • @thebusybee Well, given, my experience was just in part embedded, the majority (>60%) was application programming. Only Assembly. Large scale and mainframe. Like >2 million source lines (comments not counted) for a single application. Wouldn't know any register colouring at that scale, would you? Learning Macros is not different from learning function/library calls - every project past hello-world got them, don't they? Benefits of reordering and such is quite machine dependant. Again, with a good setup most is as well hidden in Assembly. But yeah, YMMV - in programming more than anywhere else. – Raffzahn Sep 29 '19 at 19:20
  • In a sufficiently high-level language (C isn’t), a good compiler can sometimes do better optimizations than most assembly coders can do. – WGroleau Sep 29 '19 at 22:43
  • @thebusybee Yes, I have sometimes written short, clever routines in asm which I was proud of. And then the next day would rewrite it in favour of a longer, slower, yet much more understandable (and thus hopefully maintainable!) routine. – RETRAC Oct 02 '19 at 02:46
5

[Modern Python compared to C; C64 BASIC compared to assembly.]

is there any comparison I could make? Is it even possible to quantify it this way?

Yes, you have the right idea. That is exactly the comparison you can make.

BASIC was easier to write (don't underestimate the value of that), but "slower" to "dreadfully slower", depending on what you were doing.

Speed: BASIC vs assembler

Obviously, everything is worlds slower. Especially operations the CPU just can't do, like divide or compute a cosine. But there are more gotchas.

  • the unpredictability of the duration of certain operations. For instance sin(x) was very quick if x=90, otherwise not.
  • the dreaded "Garbage collection", where the system runs out of clear memory space to allocate for variable-length records like strings, and "defrags" RAM by repacking all existing strings to the bottom of free RAM. I have seen garbage collections take 3 seconds. That's an eternity in game time.
  • Even with tokenization (go Woz!), the language isn't very compact. That mattered on ROM cartridges, where space was money.
  • Outside of Apple Integer Basic, BASICs had bugs. And a future revision of the system could add bugs. Assembler had precious few and they were well-known.
  • If the CPU had to do extremely timing-sensitive tasks, like soft-listen to serial ports, modulate the cassette deck, make sound without a sound chip, or especially beam-riding for sprite manipulation such as in games, it was impossible in BASIC and you had to go assembler. Even the entry/exit from assembler to BASIC was too time-consuming to use in action games, unless you did it once per field.

Our Speed metaphor: modern vs RC

  • Because modern machines have loads of RAM and code space, code optimization in interpreters is much, much better today. Interpreted languages are still never as fast as C, but they do much better than you'd expect, whereas you could always count on BASIC to be balefully slower than assembly. Advantage: modern.
  • Modern processors include complex math. This means that all the gory complexities of, say, a double-wide floating point divide are done by hardware at the same speed, whether we're calling from Python or C. Older processors did complex math in software, which was a heck of a motivation for assembler programs to bypass the need for complex math. In BASIC you would ask for a cosine and wait, wait, wait... In assembler you wouldn't even try cosine, you'd just find a way to make a lookup table work. Advantage: RC.
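
The lookup-table trick from the last point can be sketched in Python as follows. The table size (256 steps per full turn, so an angle fits in one byte) and the scale factor (127, so a result fits in a signed byte) are illustrative choices of mine, not something a particular program is known to have used.

```python
import math

STEPS = 256    # angle units per full turn, so an angle fits in one byte
SCALE = 127    # results fit in a signed byte

# Built once, ahead of time; on an 8-bit machine this would be 256 bytes
# of constant data sitting in memory.
COS_TABLE = [round(SCALE * math.cos(2 * math.pi * i / STEPS))
             for i in range(STEPS)]


def fast_cos(angle):
    """cos of `angle` (in 1/256ths of a turn), scaled by 127."""
    return COS_TABLE[angle & 0xFF]


def fast_sin(angle):
    """sin(x) = cos(x - quarter turn), so one table serves both."""
    return COS_TABLE[(angle - STEPS // 4) & 0xFF]


if __name__ == "__main__":
    print(fast_cos(0), fast_cos(64), fast_cos(128), fast_sin(64))
```

At run time the "cosine" is a single indexed memory read, which is the kind of thing machine code does in a few cycles, at the price of reduced precision and 256 bytes of table.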

As for quantifying, that is a wet noodle.

A safe bet is it will be 10-100 times slower. It's very difficult to quantify more precisely, unless you have a lot of experience timing both sides' operations.

And in the pro world, that's exactly what we did. We put a quickie version up in BASIC and assembler, wrapped iteration code around it, and got out the stopwatch. When we were serious, we also timed how long the iteration code itself took with no payload.
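
The same stopwatch method translates directly to a modern interpreter. The sketch below is only an illustration of the idea (time the payload inside a loop, then time the bare loop and subtract); the function names are made up, and `time.perf_counter` stands in for the stopwatch.

```python
import time


def time_loop(payload, iterations):
    """Seconds taken to call `payload` `iterations` times."""
    start = time.perf_counter()
    for _ in range(iterations):
        payload()
    return time.perf_counter() - start


def net_time(payload, iterations=100_000):
    """Time the payload, then subtract the cost of the iteration code itself."""
    def empty():
        pass

    gross = time_loop(payload, iterations)
    overhead = time_loop(empty, iterations)   # loop plus call, no payload
    return gross - overhead


if __name__ == "__main__":
    print(net_time(lambda: sum(range(100))))
```

For very cheap payloads the difference can be lost in measurement noise (it may even come out slightly negative), which is one reason the answer says quantifying precisely is hard.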

4

The Byte Sieve benchmark, in Applesoft Basic took 2806 seconds, according to Byte Magazine, September 1981 issue, page 192. Byte Sieve in 6502 Assembly language took 13.9 seconds, according to Byte Magazine, January 1983 issue, page 292.

That's a factor of 200X between a tokenizing Basic interpreter and hand-coded assembly for the 6502.

200X is in about the right ballpark, as various other Basic interpreters that I've benchmarked (including my own Chipmunk Basic) on a bunch of different CPUs range from 40X to 500X slower than Asm or C code running the equivalent algorithm. Modern REPL language systems run faster by including a JIT compiler to machine code, or tokens for a fast VM.
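
For readers who have not seen it, the sieve benchmark is small enough to sketch. The version below follows the form in which the Byte Sieve is usually reproduced (8191 flags, one per odd number from 3 upward, normally run for ten iterations); treat those details as an assumption rather than the exact listing that was timed. The `slowdown` helper is a made-up name that simply redoes the division from the two quoted timings.

```python
SIZE = 8190  # flags[i] stands for the odd number 2*i + 3


def byte_sieve(size=SIZE):
    """Count the primes found by one pass of the sieve."""
    flags = [True] * (size + 1)
    count = 0
    for i in range(size + 1):
        if flags[i]:
            prime = i + i + 3
            # strike out the odd multiples of this prime
            for k in range(i + prime, size + 1, prime):
                flags[k] = False
            count += 1
    return count


def slowdown(basic_seconds, asm_seconds):
    """Ratio of the two timings quoted above."""
    return basic_seconds / asm_seconds


if __name__ == "__main__":
    print(byte_sieve(), "primes")
    print(round(slowdown(2806, 13.9)), "times slower")
```

One pass finds 1899 primes, and 2806 / 13.9 comes to about 202, which is where the "factor of 200X" comes from.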

hotpaw2
2

I've had experience with the TRS-80, and there were three programs I wanted to do that I simply could not get good performance in BASIC. All three programs were dealing with the screen.

The first program was to fill the screen with a single arbitrary character (if you used space, it's the same as clear screen, else I could fill it with whatever character I wanted). The naïve way is to just make a for loop for all 1024 characters and print them. This actually took around 6-8 seconds to fill the screen.

A faster way is possibly to generate a fairly long string and print it several times in quick succession. I think this would still take 2 seconds to fill the screen.

This was not acceptable, so I decided to program them in assembly/machine code. The resultant code I recall poking into a string variable (very common at the time to prevent memory clashes as there is no mmu) and the screen filled so fast that I could not time it. From memory I did do some repeated filling and timed those, and recall them to be around 1/20 of a second or faster to fill the display.

The next problem I wanted to do was fill the screen with RANDOM characters. This took almost a minute to fill the display, if I remember correctly and the string optimization doesn't help. However I did cheat in machine code and used the refresh counter as the RNG which may not be as "random" as the PRNG, but the machine code was fast enough to make my screen look like snow when called in quick succession, probably also around 1/20 of a second to fill.

The third problem is that despite the poor graphics of the TRS-80 with its 6-block, I wanted to save/restore "bitmap" graphics. If I recall correctly, the BASIC program that peek/poked into video memory and wrote/read from disk took upwards of 5 minutes to save 1KB of memory handled a byte at a time. OS calls were expensive on the TRS-80 I suppose. I did not try string optimization at the time to save on OS calls, but did write the code in assembly... which took about 5-10 seconds to complete.

Today, these pieces of code sequences are kind of childish and "simplistic". Code sequences to hash a key to access a piece of dynamic data in memory would take eons to code in assembly not to mention how error prone it was (I've crashed my TRS-80 many times getting the machine code right.) Most of the time coding in assembly simply would not give you the time to market, and sometimes won't even give you performance - keep in mind a well coded interpreted language program with the proper support libraries that well implements the functions you need will give you fairly good performance with no risk of buffer overflows.

Note that the function library bonus didn't exist as much then as it does today. BASIC pretty much maps one to one and you don't gain much speed improvement calling its functions - though one can say that if you had a program that had to continually run floating point transcendental functions (sin, cos, x^y, etc.) you may not see as much of a difference between assembly and BASIC as computing floating point natural logarithms in assembly is not much faster than calling ln() in BASIC.

Now today imagine if BASIC also had matrix multiplies, hash tables functionality, etc. as functions then - Now you can see the performance gap decrease between BASIC and assembly as more compute time is spent in the complex functions instead of parsing code.

And this issue still applies today when writing Python, Perl or Java. If you want performance, there's no excuse not to use the dedicated functions; otherwise you'll be back to BASIC-style speed problems, just as if you were to compute x^y with a for loop instead of using the builtin.
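
The x^y example is easy to try in Python. This is a small sketch with made-up function names; `timeit` does the timing, and the default arguments are arbitrary.

```python
import timeit


def power_loop(x, y):
    """x**y for a non-negative integer y, by repeated multiplication."""
    result = 1
    for _ in range(y):
        result *= x
    return result


def compare(x=1.0001, y=10_000, repeats=200):
    """Return (loop_seconds, builtin_seconds) for the same computation."""
    loop = timeit.timeit(lambda: power_loop(x, y), number=repeats)
    builtin = timeit.timeit(lambda: x ** y, number=repeats)
    return loop, builtin


if __name__ == "__main__":
    loop, builtin = compare()
    print(f"loop: {loop:.4f}s  builtin: {builtin:.6f}s")
```

On a typical interpreter the builtin should come out well ahead, because the whole computation runs inside compiled code instead of going around the interpreter loop once per multiplication.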

trash-80