37

From what I've read, the first FORTRAN compiler built a machine-code program entirely in memory; it was designed to read the entire source code of the program, and then sequentially load pieces of the compiler that would process different parts of the source code into either machine code or other information to be processed by later parts of the compiler.

Although the just-in-time compilers for most (all?) implementations of Java and .NET produce machine code directly in memory, and although Borland's language products would produce machine code directly, it seems much more common for compilers to output assembly language instead.

While it is certainly useful to have a means of getting a human-readable dump of the compiler's output, having to feed the output of a compiler through a separate assembler program would seem to substantially increase build times. While targeting assembly language would make it possible for a compiler to produce output containing forward jumps, a compiler could instead produce output targeting a much simpler "fixup" program which would expect input of the form "output the following 56 bytes, output a two-byte fixup, output the next 127 bytes, output another two-byte fixup, patch the fixup 2 records back to the value 1137, then output the next 57 bytes, etc." Processing such a fixup file would be much faster than processing an assembly-language source file, and for test builds that processing could even be deferred until load time.
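To make the kind of "fixup" stream described above concrete, here is a minimal sketch in C. The record encoding (EMIT/RESERVE/PATCH/END) is invented purely for illustration and is not taken from any real tool:

    /* Sketch of a "fixup" interpreter: copy literal code bytes, reserve
     * 2-byte slots for forward references, and patch a previously
     * reserved slot once its target is known. */
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    enum { REC_EMIT = 0, REC_RESERVE = 1, REC_PATCH = 2, REC_END = 3 };

    static uint8_t image[65536];    /* program image being built         */
    static size_t  here;            /* current output position           */
    static size_t  slots[256];      /* positions of reserved fixup slots */
    static size_t  nslots;

    /* Process one fixup stream held in memory; returns final image size. */
    size_t apply_fixups(const uint8_t *rec, size_t len)
    {
        size_t p = 0;
        while (p < len) {
            switch (rec[p++]) {
            case REC_EMIT: {                /* EMIT n, then n code bytes */
                size_t n = rec[p++];
                memcpy(image + here, rec + p, n);
                here += n;
                p += n;
                break;
            }
            case REC_RESERVE:               /* leave a 2-byte hole       */
                slots[nslots++] = here;
                here += 2;
                break;
            case REC_PATCH: {               /* PATCH back, value(lo,hi)  */
                size_t   back = rec[p++];   /* "k records back"          */
                uint16_t v = rec[p] | (rec[p + 1] << 8);
                p += 2;
                image[slots[nslots - back]]     = v & 0xFF;
                image[slots[nslots - back] + 1] = v >> 8;
                break;
            }
            case REC_END:
                return here;
            }
        }
        return here;
    }

    int main(void)
    {
        /* "...output a two-byte fixup ... patch the fixup 2 records back
         * to the value 1137..." from the paragraph above, in this toy
         * encoding (the literal code bytes are arbitrary filler).        */
        const uint8_t stream[] = {
            REC_EMIT, 3, 0x3E, 0x01, 0x00,  /* three literal code bytes  */
            REC_RESERVE,                    /* fixup slot #1             */
            REC_EMIT, 2, 0xC9, 0x00,        /* two more literal bytes    */
            REC_RESERVE,                    /* fixup slot #2             */
            REC_PATCH, 2, 0x71, 0x04,       /* patch slot #1 to 1137     */
            REC_END
        };
        printf("built %zu bytes of code\n", apply_fixups(stream, sizeof stream));
        return 0;
    }

Since the loop never parses text, a loader could just as easily run it at load time, which is the deferral suggested above for test builds.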

When did the now-ubiquitous approach of inserting an "assemble" step into code generation become commonplace, and why was it seen as worth the extra build time?

peterh
supercat
  • 5
    It's not ubiquitous. At least some compilers I used between the '90s and now compiled straight to machine code, but had an option to output assembly instead of, or in addition to, machine code if you wanted. I think this is actually more common these days but I'm interested in what the real experts have to say. – hippietrail May 26 '20 at 02:24
  • 1
    @hippietrail: Perhaps ubiquitous wasn't quite the right term, but it seems a very common way of doing things. – supercat May 26 '20 at 02:28
  • 1
    "having to feed the output of a compiler through a separate assembler program would seem like it would substantially increase build times" -- that's directly measurable with a stopwatch, if your compiler really does have separate stages. In practice, some assemblers are limited by the I/O reading the source file, whereas compilers for high-level languages might, depending on the language, have to do a bit of thinking. Especially for optimisation or where the type system is Turing complete. – Steve Jessop May 26 '20 at 03:55
  • 2
    Around the time the first Java JITs appeared, I was working on a non-JIT system which compiled Java byte code to a virtual processor byte code, then to machine code, all at install time (not runtime). This is also before LLVM, but similar idea. We also had a macroised assembler for the virtual processor. javac was way, way slower than the assembler for programs with similar function (although of course the time to write the code wasn't the same). So, in some cases the answer is definitely, "what extra build time"? – Steve Jessop May 26 '20 at 04:07
  • 4
    Of course, if you have to manually take off the tape for the compiler and thread the tape for the assembler between steps, then the runtime isn't what matters :-) – Steve Jessop May 26 '20 at 04:09
  • @SteveJessop: Many C compilers operated in a single pass, while assemblers required two passes. Additionally, if one avoided use of headers but simply included whatever declarations or prototypes one needed at the start of a file, the length of the assembly file would be longer than that of the C source, so for I/O-bound processing, the assembler would take more time than the C compiler. Turbo Pascal could also process more typical lines per second than even Turbo Assembler, which was itself a fair bit faster than competitor MASM. – supercat May 26 '20 at 15:00
  • 4
    GCC is the only notable compiler that compiles via assembly anymore. It's not ubiquitous or even common today. –  May 26 '20 at 15:07
  • @supercat: yeah, possibly I was distracted by the mention of Java JITs. If it's an axiom of the question that you're only talking about compilers which are faster than their corresponding assemblers, then it rather simplifies the domain of discourse. I suppose then that if the compiler only requires a single pass, whereas generating binary code requires two, then that in itself is an argument for not generating binary code in the compiler. – Steve Jessop May 26 '20 at 16:42
  • Or looking at it another way: if your idea for outputting binary code with unfixed-up forward jumps is a good one, then were assemblers doing it, and if not why not? Roughly what percentage build-time speed cost are you stipulating was the cost of C compilers deciding not to do it? – Steve Jessop May 26 '20 at 16:45
  • @SteveJessop: Even if a compiler would require two passes, if those passes were faster than those of an assembler, that would be a win. Further, a high-level-language compiler can use self-modifying code to handle fixups in ways that would generally be inappropriate for assemblers. For example, instead of generating a forward branch, a compiler could generate a trap instruction followed by a fixup number, and include a trap handler that would load the fixup address, replace the trap with a branch or call, and then return to the original trap address. Alternatively... – supercat May 26 '20 at 17:09
  • ...a compiler could use a buffer to hold code which occurred after a forward branch that hadn't yet been output. If e.g. jumps are four bytes and the buffer gets full when it holds 256 bytes, the compiler could generate the forward branch as "jump 260 bytes past the end of this instruction", and then output the contents of the buffer followed by "jump 4 bytes past the end of this instruction". Not great, but for many programs which are going to only be run once after they're built, and won't take too long to run, the cost at runtime may be far more than offset by the reduction in build time. – supercat May 26 '20 at 17:15
  • It would probably be easier if you just provide some concrete examples of the slow-down that you're interested in, rather than us each pointing out that in some cases it was large and in others it was small. Which compiler+assembler chain is it that ran, say 200% slower than you are confident you could have got it to by implementing your proposed scheme? Or 10%, or whatever. I don't think it makes sense to try to answer "why" by summarising the analysis done by all compiler-writers, for all slow-downs regardless of whether they're large, small, or negative. – Steve Jessop May 26 '20 at 17:19
  • @SteveJessop: The only microcomputer compilers I used back in the day were Turbo Pascal, Turbo C, Lightspeed/Think Pascal, UCSD Pascal (which had its own performance issues) and Lightspeed C. My understanding is that most of Turbo Pascal's contemporaries worked by generating assembly language and then processing that (UCSD was slow because the compiler itself was an interpreted P-code program). Even if other compilers were run on infinitely fast CPUs, the disk I/O required to build programs would have vastly exceeded the real-world build times of Turbo Pascal. – supercat May 26 '20 at 22:47
  • @SteveJessop: I also used Commodore's assembler somewhat, but once Compute Gazette published their Fast Assembler I switched to that. Minimum build times dropped from multiple minutes to essentially zero, and even decent-sized programs could build in less time than it took the Commodore to even just load the assembler and then the linker. – supercat May 26 '20 at 22:49
  • 3
    You have it back to front. Early compilers generated assembler and then ran the assembler. Integrating the assembler into the compiler was a later development. – user207421 May 27 '20 at 20:40
  • 1
    @user207421: The first FORTRAN compiler consisted of about 60 subprograms, each less than 4K words, which were run in sequence. The first part read the entire source program from punched cards into memory, stripping out unnecessary whitespace, and then loaded the next part. If I remember right, that part rearranged the lines in memory into the order in which they would need to be processed (e.g. moving all declarations ahead of everything else). Once portions of the source code were no longer needed, they could be jettisoned to make room for machine code as it was built up. – supercat May 27 '20 at 20:46
  • @user207421: Between the time the source code was loaded and compilation was complete, the only I/O the compiler performed was the loading of the code for all the different pieces of the compiler. Such a design would only be suitable for processing programs whose source code was small enough to be loaded into memory all at once, and I don't know in what exact form code was kept to deal with the fact that different parts would be generated, out of order, by different compiler phases, but I don't think it ever went to anything resembling a human-readable assembly-language format. – supercat May 27 '20 at 20:52
  • Related: Does a compiler always produce an assembly code? takes some guesses at why GCC targets asm, while other compilers (including clang) have a built-in assembler or go straight to machine code (however you want to look at it). – Peter Cordes Jun 04 '20 at 18:16

9 Answers

40

why did high-level language compilers start targeting assembly language rather than machine code

Well, the answer is probably: to avoid developing a high-level-language-to-binary converter for each language.

Issuing assembler text is much easier than issuing binary directly for at least 3 reasons:

  • writing text is easier than writing binary. The compiler doesn't have to worry about the binary representation of the mnemonics or branch computation. That makes the interface of the compiler very clear: high-level language as input, low-level language text file as output.
  • relocation is managed by the assembler, not the compiler. A binary file isn't always position independent, so there are relocation tables, and handling those tables isn't trivial. Better to let a single tool do it.
  • as you mentioned, if you suspect a compiler bug then it's better to have intermediate output with symbols than a disassembly (and disassembling a .o file usually fails on the relocated symbols; you need to disassemble the whole executable file for it to be correct)

The overhead exists, of course (the compiler must write the asm, and a separate process must parse it back), but converting assembly to binary is done in a very systematic way.
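To make the first bullet point above concrete, here is a sketch of the two routes a code generator could take for one operation, "load a local int into AX". The 8086 bytes shown (opcode 8B with a [bp+disp8] ModR/M byte) are quoted from memory and serve only to illustrate the kind of encoding detail a direct-to-binary compiler has to own:

    #include <stdint.h>
    #include <stdio.h>

    /* Text route: the assembler worries about opcodes and ModR/M bytes. */
    void emit_load_local_text(FILE *out, int bp_offset)
    {
        fprintf(out, "\tmov\tax,[bp+%d]\n", bp_offset);
    }

    /* Direct route: the compiler must know the instruction format itself. */
    size_t emit_load_local_binary(uint8_t *out, int bp_offset)
    {
        out[0] = 0x8B;                /* MOV r16, r/m16                      */
        out[1] = 0x46;                /* ModR/M: mod=01 reg=AX rm=[bp]+disp8 */
        out[2] = (uint8_t)bp_offset;  /* 8-bit displacement                  */
        return 3;                     /* bytes emitted                       */
    }

As supercat points out in the comments below, this cuts both ways: computing that ModR/M byte directly can be less work than having a separate assembler parse mov ax,[bp+12] back out of a text file.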

The costly bits are located in the compiler itself:

  • Optimizations (which cost a lot of CPU time when compiling) are done at source level, not at assembly level (well, optimizations are always possible at assembly level but those are micro/local optimizations, and not all assemblers do them).
  • Locating all include/header files and parsing them (whereas the produced assembly file is self-contained)

In terms of I/O, the assembly file is usually written to temporary storage, so it can even remain in RAM and never be written to disk (unless requested).

So it's a trade-off between efficiency and convenience. Once the assembler has been written, it can be used to assemble any file any compiler produces.

(Some Ada compilers like GNAT used to issue C code instead of assembly or binary file, also because it was easier)

Nowadays GNU compilers even add another stage: the compiler front-end produces an intermediate representation (known as GIMPLE) regardless of the language (Ada, C, C++, Fortran...), and the back-end produces the assembly from that GIMPLE representation.

Jean-François Fabre
  • In the assemblers I'm familiar with, there's always a one-to-one relationship between the written form of the instruction and the binary encoding of that instruction. So it seems the effort to emit an instruction as text versus as bits is little different. The address-fixup argument doesn't really convince me either, especially in the case where the choice of instruction depends on how far away the target is. But then again, I never implemented a production-quality compiler. – dave May 26 '20 at 00:27
  • 1
    If one is trying to keep the compiler output in a small temporary storage device, having it in a bloated text format would seem less useful than storing it as packed binary plus fixups. I could see a multi-phase build process with intermediate text output as perhaps being useful if one were using a direct-to-machine-code assembler to combine assembly-code files (as opposed to a linker), but if there's going to be a linker step after the assembler, that would turn the assembly-code processing into a separate build step. – supercat May 26 '20 at 02:25
  • @another-dave: I did most of my machine code programming on Z80s and most of my assembly programming on 680x0s. I wrote my own disassemblers for both and at least one definitely did not have a 1:1 relationship between the written form of the instruction and the binary coding of that instruction. A couple of decades have passed so I'm rusty but I think it was on the 68k and due to data sizes being implicit in some addressing modes, so that a load immediate into a register might be 16-bit or 32-bit. That's off the top of my head, but it was along those lines. – hippietrail May 26 '20 at 02:31
  • 1
    @another-dave "So it seems the effort to emit instruction as text versus instruction as bits is little different." - if that was true, why would anyone write in assembly language? In reality the assembler to machine code translation is not always trivial. – Bruce Abbott May 26 '20 at 07:40
  • assembling to machine code (without optimizations) should always produce the same result; it's specified. Of course, if the assembler optimizes by replacing instructions with equivalent ones to save cycles (I have plenty of examples in mind for the 68000) then it's different. Here we're not talking about writing asm code by hand, which is a different use case. Here we assume that the generated assembly is optimal and doesn't need further optimization. – Jean-François Fabre May 26 '20 at 08:07
  • 3
    A significant convenience of using an assembler as the target, if you're generating code directly, is that the assembler will resolve branch targets for you, thus all you need do in your code generator is have a scheme for generating unique labels.

    This can get tricky quite quickly — some architectures have variable-length branch instructions, and others have limits on distances after which you'll have to replace the instruction with a sequence of instructions. (A branch-relaxation sketch of this appears after this comment thread.)

    – al45tair May 26 '20 at 09:24
  • 1
    yes, the compiler could generate branches without a size, and the assembler could find the best branch size (if available) or replace it with an absolute jump. Apart from the jump/replacement-instruction thing, this is a micro optimization. A good compiler should not transfer the risk of branch-distance overflow to the assembler though. – Jean-François Fabre May 26 '20 at 09:27
  • 1
    @Jean-FrançoisFabre It depends on the assembler, to be honest. Some assemblers are fancier than others and have lots of pseudo-instructions to help out compiler writers (the extreme case is probably IBM's "High Level" assembler on z/OS). I've even seen the final branch resolution stage pushed into the linker(!); Lattice C used to do that IIRC. I don't think Microsoft's current compiler inherited that feature (AFAIK, the original Microsoft C was a Lattice derivative, which is why all the Lattice extensions are in the Microsoft C library.) – al45tair May 26 '20 at 09:31
  • I believe you. I only witnessed assemblers which generate real instructions. About the linker, with some option it can insert trampoline calls to resolve relative branching between objects on some platforms too (powerpc), else it just fails to link :) – Jean-François Fabre May 26 '20 at 09:46
  • @BruceAbbott - assembler versus machine coding. Assemblers make life easier for humans. A compiler can equally easily emit 0x123456 as "OP REG,DEST". – dave May 26 '20 at 11:27
  • 4
    @another-dave "A compiler can equally easily emit 0x123456 as "OP REG,DEST""_ not necessarily true. OP reg,dest could have several different encodings depending on memory location etc., that the compiler doesn't have to worry about if the assembler takes care of it. Without an assembler it needs more intimate knowledge of the encoding, which means it needs to have similar internal code as an assembler. And if you need an assembler anyway... – Bruce Abbott May 26 '20 at 16:49
  • @BruceAbbott: A typical assembler will spend the vast majority of its time reading and processing input. If e.g. i is a non-register-qualified automatic object of type int, the amount of work a compiler would have to do to determine the address-mode ModR/M byte necessary to access it would be less than what an assembler given mov ax,[bp+12] would have to do to reach that same conclusion. – supercat May 26 '20 at 17:30
  • 1
    GCC doesn't write out a text representation of the GIMPLE to a file unless you ask it to. Useful for debugging missed optimizations and other compiler internals, but doesn't create I/O overhead during normal use. For an optimizing compiler, transforming through an SSA representation internally is not really an optional step. But modern GCC (unlike clang) really does use a separate assembler that runs on a text file or pipe, normally as (aka GAS) from GNU Binutils, for the reasons you describe. Modern GCC also integrates the C preprocessor into the cc1 compiler executable, too. – Peter Cordes May 26 '20 at 17:51
  • 4
    @supercat I am not denying that direct compilation is faster, just that it needs more code (and more opportunity for bugs etc.), especially if different CPUs are targeted. I don't have much experience with 8086, but the 68000 assembler I use is ~20 times faster than the C compiler, so the time taken to parse the assembly source is insignificant compared to other stuff the compiler does. Quite important when coding on a retro machine where compilation may take minutes rather than seconds. – Bruce Abbott May 26 '20 at 18:16
  • @BruceAbbott: Did the C compiler run the preprocessor as a separate step that produced a preprocessed output file? C isn't designed very well for compiler performance, but other languages like Pascal could be processed more efficiently than assembly language. – supercat May 26 '20 at 20:51
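To illustrate the branch-target problem al45tair raised a few comments up, here is a minimal branch-relaxation sketch. The 2-byte/5-byte sizes and the ±128-byte short range are invented for the example; real assemblers iterate in much this way for span-dependent instructions:

    #include <stdbool.h>

    typedef struct {
        bool is_branch;          /* branches may need to grow            */
        int  target;             /* index of the target instruction      */
        int  size;               /* current encoded size in bytes        */
    } Insn;

    /* Grow short branches that cannot reach their target, repeating until
     * the sizes stop changing (growing one branch can push another's
     * target out of range).                                              */
    void relax(Insn *code, int n)
    {
        bool changed = true;
        while (changed) {
            changed = false;

            int addr[n + 1];                 /* address of each insn      */
            addr[0] = 0;
            for (int i = 0; i < n; i++)
                addr[i + 1] = addr[i] + code[i].size;

            for (int i = 0; i < n; i++) {
                if (!code[i].is_branch || code[i].size != 2)
                    continue;
                int disp = addr[code[i].target] - addr[i + 1];
                if (disp < -128 || disp > 127) {  /* short form won't reach */
                    code[i].size = 5;             /* switch to long form    */
                    changed = true;               /* addresses have shifted */
                }
            }
        }
    }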
19

According to this answer gcc does this because of the proliferation of different object file formats: the x86-64 processor alone uses ELF, PE/COFF, and MachO64.

But other compilers (e.g. clang) go straight to object files without using an intermediate assemble step, so I would disagree that an assemble step is "now ubiquitous".

Erik Eidt
  • 2
    Some compilers even emit C code (which can then be compiled to assembly code which can then be assembled....). The advantage of that of course is that you're also independent of the particular hardware architecture, and probably of the OS architecture too. – dave May 26 '20 at 00:30
  • 2
    @another-dave: I know Cfront, Stroustrup's original implementation of C++, used to do that. Way back when I was into compilers I recall reading that some potential C++ optimizations were not possible via C (or much harder). I know small languages still like to target C as a "portable assembly language". When I moved from assembly to C way back when I really missed some types of instructions. One was the rotate instructions. Anyway a modest number of things are harder to do in C than asm. I'm not sure if any major languages/compilers still go via C. – hippietrail May 26 '20 at 02:42
  • 2
    Comeau C++ compiler also targeted C, but it's abandonware now. – Steve Jessop May 26 '20 at 03:52
  • 3
    @hippietrail: The new MJIT compiler in the YARV Ruby VM compiles to on-disk C source code and compiles it with bog standard GCC. No fancy in-memory streaming to libgcc or whatever. Literally the simplest and stupidest possible way to implement a JIT. And it is surprisingly fast. You'd think all the overhead of serializing C to disk and invoking compiler, assembler, linker, out-of-process would make it useless for JITting, but no. I'd argue that Ruby is at least somewhat major and YARV is the most popular implementation. – Jörg W Mittag May 26 '20 at 07:47
  • 5
    @hippietrail Also keep in mind that currently we have way fewer processor architectures to deal with for general-purpose computing, which makes going native way easier compared to 25+ years ago. C as a portable assembly language in many ways meant having GCC w/ ANSI C, and not the OS-supplied K&R cc. And as a side note, GHC (the de facto standard Haskell compiler) still can do C output, primarily to help new ports. – mpdonadio May 26 '20 at 18:23
  • Your statement about Clang is a bit misleading. Clang compiles everything to LLVM bytecode which it then compiles to object code. So LLVM bytecode basically fulfills the same role which assembler fulfills in other compiler toolchains: the smallest common denominator between all supported binary formats. – Philipp May 28 '20 at 10:49
  • @Philipp, sure if you're using the textual IR form and writing that to a file, but not if you're using the in memory form. https://stackoverflow.com/a/5598021/471129 – Erik Eidt May 28 '20 at 13:49
  • I don't really see what difference it makes whether the compiler toolchain saves the assembler/bytecode to file or keeps it in-memory. My point is that it does compile the high-level code to an intermediate low-level language which it then compiles into platform-specific binary. As opposed to compiling directly from high-level to binary. – Philipp May 28 '20 at 13:59
  • 1
    @Philipp, I see. My point is that used the latter way there is no assembler, no separate process step to invoke an assembler. All compilers use some IR to communicate from the front end to the back end, but that doesn't make the back end a classic assembler. – Erik Eidt May 28 '20 at 14:02
16

Early Unix C compilers were actually a pipeline, preprocessor | compiler | optimizer | assembler > abc.o. The optimizer was an assembly optimizer: it fixed up things that the compiler took the easy way out on, like subroutine entry and exit, and decided between a short or a long jump (PDP-11s had short conditional branch instructions). Having used other OSs that required paper tape for intermediate stages, I found this quite the revelation.
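For flavour, here is a toy text-level peephole pass in the spirit of that optimizer stage: it reads assembly lines and drops a branch whose target label is defined on the very next line. The "jbr L" / "L:" syntax is only a stand-in, not real PDP-11 compiler output:

    #include <stdio.h>
    #include <string.h>

    #define MAXLINES 1000
    #define MAXLEN   128

    int main(void)
    {
        static char line[MAXLINES][MAXLEN];
        int n = 0;

        while (n < MAXLINES && fgets(line[n], MAXLEN, stdin))
            n++;

        for (int i = 0; i < n; i++) {
            char label[MAXLEN];

            /* "jbr  Lx" immediately followed by "Lx:" is a useless branch. */
            if (i + 1 < n &&
                sscanf(line[i], " jbr %127[^ \t\n]", label) == 1) {
                size_t len = strlen(label);
                if (strncmp(line[i + 1], label, len) == 0 &&
                    line[i + 1][len] == ':')
                    continue;                    /* drop the branch line     */
            }
            fputs(line[i], stdout);
        }
        return 0;
    }

The real Unix optimizer stage did far more than this, of course, but the principle, pattern-matching on a small window of emitted instructions, is the same.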

stolenmoment
  • 4
    Great profile picture. – Lars Brinkhoff May 26 '20 at 05:55
  • 7
    The optimiser in that pipeline would these days be called a "peephole optimiser"; it's the bit that does local optimisations on the generated instructions (as if looking through a "peephole", since it can't see any significant amount of context). – al45tair May 26 '20 at 09:28
  • 5
    This may actually be a big reason why - the UNIX philosophy has always had an emphasis on combining separate programs that do their own tasks. – jpa May 26 '20 at 11:36
  • 3
    gcc and clang are still pipelines. Well, they can also run pipe-less, by creating temporary files. – peterh May 26 '20 at 20:31
11

I think that some of the existing answers are using the modern state of development ecosystems to address the state of things in the "retro" time. I don't recall using anything other than a.out format until the mid-90s, and the switch was driven by shared libraries (which I wouldn't call retro). You need to think in terms of not being able to download prebuilt binaries; if you were lucky you could download source but often times you may have had to request a QIC.

In my experience (which I will admit is skewed more toward specialized systems and less toward general computing), compilers used external assemblers and linkers because they already existed, plain and simple. Debugging was slow enough with dbx/gdb, so why risk having to maintain your own when someone else had already done the work? It also meant that working towards a fully bootstrapped compiler (i.e., a compiler written in the target language) was easier, since there was less to bootstrap.

From a practical standpoint, it also meant being able to work with buggy compilers (and optimizers) by looking at the intermediate asm and patching it. And in some cases, prototype code was worked out in a high-level language, asm was generated, and then the asm was hand-optimized for cases where you could work around language semantics or where the compiler didn't "get" what you were trying to accomplish. For example, some later-generation processors with 32-bit ALUs would support 64-bit math for certain operations (maybe MC68040?) that the compiler would never output.

mpdonadio
7

Turbo Pascal was made famous specifically because it skipped the assembly step (as well as most of the linking step). In a single pass it created raw binary code located at absolute addresses, and saved a lot of time. This is one aspect that made Turbo particularly fast. Action! on the Atari was very similar.

The time was saved mostly by skipping the I/O, especially to the then glacially slow and low capacity floppy drives of the day.

Compiling to assembly removed a litany of issues from the compiler. The compiler could pretty much blindly emit opcodes and pseudo-opcodes. The assembler and linker were joined at the hip, sharing the job of managing an object file, which contained binary code, symbols, and relocation information.

Since the assembler and linker are so closely entwined, the assembler acts as a level of abstraction between the compiler and linker. This also allows the assembler and linker to diverge and improve apart from the compiler. As object file formats evolved, the compilers had at most to make minimal changes (perhaps to update the metadata manifested by assembler pseudo-ops). Whereas if compilers wrote object files directly, all of them would have to be updated as the linkers et al. improved.
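For a picture of what is being abstracted away here, below is a deliberately simplified object-module layout carrying the three ingredients just mentioned (code, symbols, and relocation information). The structures are invented for illustration; real formats (a.out, OMF, COFF, ELF, ...) differ considerably in the details:

    #include <stdint.h>

    typedef struct {
        char     name[16];       /* symbol name, e.g. "_printf"              */
        uint32_t value;          /* offset within this module, if defined    */
        uint8_t  defined;        /* 0 = external reference, 1 = defined here */
    } Symbol;

    typedef struct {
        uint32_t offset;         /* where in the code to patch               */
        uint16_t symbol;         /* index into the symbol table              */
        uint8_t  kind;           /* e.g. absolute word vs. pc-relative       */
    } Relocation;

    typedef struct {
        uint32_t    code_size, nsymbols, nrelocs;
        uint8_t    *code;        /* raw machine code, addresses unresolved   */
        Symbol     *symbols;
        Relocation *relocs;      /* applied by the linker at final layout    */
    } ObjectModule;

A compiler that stops at assembly text never has to know any of this; a compiler that writes object files directly has to track every one of these fields itself.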

Turbo Pascal was able to target the very simple system that is CP/M, with its absolute memory layout, and did not face many of the problems that a linkage step solved. Turbo's solution to code reuse was simply the include file (and they sold several Toolboxes of utility source code to incorporate directly into your applications, rather than precompiled binary code that could be linked).

It wasn't until Turbo Pascal 4 that Turbo actually started to involve a formal, conventional link step in the process (via the addition of Units).

Addendum, responding to a comment:

most practical programs would be small enough to be handled by a single-shot build.

Simply put "small enough" is solely dependent on the speed of the machine doing the build. Linking pre-compiled objects is faster than compiling source code. At some point, the time it takes to incrementally rebuild and link a final executable will be faster than recompiling everything, all the time. As machines got faster, the size of that program grew. But machines were not always fast.

Back in the day, Moria (a dungeon crawl "roguelike" game) was distributed on DECUS tapes in source and binary. The source was 22,000 lines of VAX Pascal. Our tiny VAX 11/730, on which we did a remarkable amount of daily work (with up to 10 users), simply could not compile that program before the universe achieved heat death (at least it felt that way). Were it built as a bunch of modules that were linked together, we might have had a chance to dabble with it. But on our machine, it wasn't practical.

However, on the author's machine, a VAX 8600 (far, far bigger), it was, demonstrably, not an issue. Since it wasn't an issue, he never bothered to break the program up. If he had, then maybe (maybe) we'd have had a remote chance of being able to build and iterate and play with the source code.

You also have to consider other aspects. When doing development on a large program on a PDP-11/70, my friend and I would have 3 terminal sessions open. One to run the program, one to edit the program, and one to compile the program.

We did that simply because getting in and out of the editor was glacial due to the size of our file. When it started up, the editor (on our 1200 baud terminal...) even noted "Loading xxx.yyy slowly...", and it wasn't kidding. Even then we still had to manually page blocks in and out of active memory. It would have been awful if we had to weather reloading that editor every compile cycle. Compile time alone was bad enough if a simple typo slipped in.

I can't say whether we could have done multiple source files with incremental build and link for our program or not -- we were just college students bumbling our way through it. I don't even know if it was possible with that particular dev environment (probably, but we may not have got that far into the back of the manual). But it just stands as an example that highlights how small the definition of "small enough" can truly be, and how fast one can outgrow the tools.

Oh, just how big WAS our program? 35K of source code.

All of these tools were built to facilitate productivity, and the domain of those tools was REALLY BAD hardware. It's amazing anything was accomplished at all in hindsight, but that's just looking backward with jaded eyes.

I ran the compile/assemble/link cycle on a C environment for the Atari 800 — once. It took so long as to be completely unusable.

I have a current Turbo Pascal project; it's around 1200 lines of code, in several include files. On a simulator, running a simulated 4 MHz CPU, this takes 1-2 minutes to build. But, while the CPU is simulated at 4 MHz, the I/O is my "XXX Gbps" hardware, vs a 2000 Bps (if we're lucky) floppy drive. It would be even slower on a "real machine", since it has to read all of the files and write the final .COM file each build, vs normal Turbo compiling a memory-based program into a memory-based executable. 1–2 minutes isn't bad. Human scale, it's OK. But 10 lines per second? Nothing to brag about. But in the end I have no choice because of how TP is structured and its feature set. This will not get any faster outside of porting to something else, and who knows at what point that would be.

It's not 20 minutes, thank heavens for that.

JDługosz
Will Hartung
  • Although the Turbo Pascal approach is limited to programs of a certain size, and using a separate linking step would make it possible to accommodate bigger programs than would otherwise be possible, most practical programs would be small enough to be handled by a single-shot build. My question is why the concept of single-shot builds had largely gone by the wayside until Turbo Pascal came on the scene. – supercat May 26 '20 at 18:49
  • I don't think the speed of the machine doing the build matters so much as the relative fraction of partial-build artifacts that would be usable, or the ability to hold in memory everything that's needed for each compilation pass. If your code could be partitioned into a couple of source files that don't call each other too extensively, and only one would change frequently, I think you could slash your build time by compiling one program to a CHN file with a fixed start address near the top of memory, and having the other program use BlockRead to fetch that into memory. – supercat May 27 '20 at 15:48
  • Your CHN-file program would need to start with some inline-machine-code functions that would jump to the "real" functions, as well as some that would call back to the original. The "main" program would need to include an inline-machine-code function to patch the callbacks in the CHN file as well as to functions that would chain to those in the CHN file. A little bit of a pain to set up, and you'd have to rebuild everything if your guesses as to where things should go need to be adjusted, but otherwise you could avoid rebuilding the CHN file. – supercat May 27 '20 at 15:52
  • The major reason for this being possible was that the editor+compiler+runtime was small enough to fit in memory with room to spare for the user. The trick with the runtime being reused from the compiler was probably the enabler for this. – Thorbjørn Ravn Andersen Jun 03 '20 at 18:58
  • @ThorbjørnRavnAndersen: That certainly helps with the scenario of compiling and editing code all in RAM, but I don't think it would be essential. If one were trying to design a single-shot build system on a machine with small RAM, two tape drives, and no disk I/O, having a binary format that would let a compiler intersperse code with back-patching information would help. On a Z80 system, for example, if it would only be necessary to patch 16-bit words, one could have the "write word that will be patched later" function write the offset between the start of the code and... – supercat Sep 16 '20 at 17:15
  • ...the previous such word that was written using the same style of patch. Then at the end of the code, include a list of the patch values (a loader-side sketch of this chain-walking scheme appears after this comment thread). A JMP to a forward address would take five bytes at load time rather than three (two bytes could be jettisoned once loading was complete), but if all compilation units had a global list of function entry points and shared objects and started with a jump table for all of their exported functions, and cross-module calls were output as e.g. RST 30h followed by a module number and function number, the RST handler could then... – supercat Sep 16 '20 at 17:23
  • ...find the start address for the module, index from that to find the start address of the appropriate routine, rewrite the two bytes after the RST with the proper function address, replace the RST with a call, and call it. – supercat Sep 16 '20 at 17:24
  • @supercat You can always do tricks, but why do so if you don't have to? The PolyPascal system (of which Turbo Pascal had a slightly dumber interface) was a true marvel of engineering. – Thorbjørn Ravn Andersen Sep 16 '20 at 22:37
  • @ThorbjørnRavnAndersen: Not familiar with that. The purpose of the tricks would be to allow a limited-memory build system whose I/O is limited to a couple of tape drives to process most programs with one pass where it reads source code and writes machine code, and then a second read-only pass where it loads the machine code and runs it. Most Z80 code isn't relocatable, but a code generator that uses back-patching could add support for load-time relocation relatively cheaply. – supercat Sep 16 '20 at 22:47
  • @supercat That wasn't necessary. The clever design of the runtime meant it just had to be copied to disk. To my understanding no backpatching was needed. – Thorbjørn Ravn Andersen Sep 16 '20 at 23:01
  • @ThorbjørnRavnAndersen: Yeah, but the designs required permanently allocating space for everything in the runtime. Using an approach like I described would make it possible to jettison parts of the runtime that aren't used by a particular application. – supercat Sep 17 '20 at 02:41
  • @supercat yes. That was why it was so fast and so successful. Shared common runtime between IDE and program. – Thorbjørn Ravn Andersen Sep 17 '20 at 07:12
  • @ThorbjørnRavnAndersen: If the compiler doesn't need to re-use the runtime between program builds, adding the ability to jettison parts that are unused would be useful, and the approach I was describing should be compatible with a single-shot compile/build cycle. – supercat Sep 17 '20 at 14:41
  • @ThorbjørnRavnAndersen: The ability of Turbo Pascal and similar tools to build programs in memory certainly contributed to their success, but even without such ability a single-shot build cycle could still have yielded massively better build times than the multi-pass linkers that still seem to dominate. – supercat Sep 17 '20 at 22:27
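To pin down the chain-walking idea from the comments above, here is a loader-side sketch. The exact layout (a trailer holding the offset of the last placeholder followed by the real values in last-to-first order, 0xFFFF as the end-of-chain marker, little-endian words as on the Z80) is an assumption made for illustration, not something spelled out in the comments:

    #include <stdint.h>

    static uint16_t rd16(const uint8_t *p)       { return p[0] | (p[1] << 8); }
    static void     wr16(uint8_t *p, uint16_t v) { p[0] = v & 0xFF; p[1] = v >> 8; }

    /* code:    the loaded image, with each 16-bit placeholder still holding
     *          the offset of the previous placeholder (the chain)
     * trailer: offset of the last placeholder, then one value per slot      */
    void apply_patch_chain(uint8_t *code, const uint8_t *trailer)
    {
        uint16_t at = rd16(trailer);             /* last placeholder         */
        trailer += 2;

        while (at != 0xFFFF) {                   /* 0xFFFF ends the chain    */
            uint16_t prev = rd16(code + at);     /* link to the earlier slot */
            wr16(code + at, rd16(trailer));      /* install the real value   */
            trailer += 2;
            at = prev;                           /* walk toward code start   */
        }
    }

Load-time relocation could reuse the same walk, adding the load base to each patched word instead of storing an absolute value.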
3

I don't know exactly when it started, but Wikipedia says:

The first C compiler, written by Dennis Ritchie, used a recursive descent parser, incorporated specific knowledge about the PDP-11, and relied on an optional machine-specific optimizer to improve the assembly language code it generated. In contrast, Johnson's pcc was based on a yacc-generated parser and used a more general target machine model. Both compilers produced target-specific assembly language code which they then assembled to produce linkable object modules.

Most compilers are not capable of creating all the code required to produce a complete program from high level source only, so some assembly is required anyway. If you need an assembler for producing startup files and inline assembly code etc. anyway, why not use it? Or just use an existing assembler and save work on the compiler package. This becomes even more useful when the compiler needs to target different CPUs that may have similar assembly language but quite different machine codes.

Another reason for having a separate assembly phase is that it guards against the compiler producing invalid machine code. If the compiler produces the machine code directly, then it is responsible for every detail of the encoding, which is easy to get wrong when nothing is checking it.

I have seen some real clangers in directly compiled code for the Amiga - things like incorrect encoding that crashes later CPUs, jumps into the middle of instructions, instructions with blank register lists that are effectively no-ops, and 'junk' code that was apparently meant to be for alignment - all stuff that a good assembler would have flagged (and much harder to fix when the machine code is produced by direct manipulation of bits by the compiler).

Bruce Abbott
  • I would expect Ritchie targeted assembly language because that's what compilers for other languages were already doing. Something like Turbo Pascal, however, doesn't need to include an assembler to use its runtime library, nor to allow user-code functions to include machine-code routines. I would guess that the evolution of object-file formats is responsible for the decision to output assembly language, rather than constantly-changing object-file formats, but that still leaves the question of why even small programs have to be processed that way. – supercat May 26 '20 at 19:03
  • I would guess that the way mainframe computers worked made the 'pipeline' process more attractive than trying to squeeze everything into memory at the same time. Early 'desktop' computers had relatively large memory but limited storage and slow CPUs, so it made more sense to compile direct to memory. Turbo Pascal was a rather special case. It was written in assembler for 8080 and 8086 only, developed from a cassette tape based compiler for the Nascom (Z80 kit computer). – Bruce Abbott May 26 '20 at 22:59
  • Many mainframes were tape based, and so I would think that on such systems minimizing the number of tape swaps would be desirable. If one has some jobs that support single-shot operation and some that require multiple passes, I would think the best use of computer time would be to process a tape of single-shot jobs while one transcribes card stacks for multi-pass jobs to tape, then process the multi-pass jobs tape, while transcribing single-pass jobs to tape, etc. Even if a compiler couldn't produce directly-executable code, it should be able to produce something... – supercat May 26 '20 at 23:26
  • ...that would be easy enough to process that it could be loaded into memory and apply fix-ups to itself. If one assigns indices to all of the functions exported by a program, as well as all modules, and all functions imported within each module, it should even be practical to have a loader handle a concatenation of separately-built files. For large jobs, having to maintain indices, common sections, etc. would be a pain, but for many small jobs, the reduction in build times would be worth it. – supercat May 26 '20 at 23:30
3

Could this be when the "middle-end" was created? ("Front-end" = lexing, parsing, analysis and "back-end" = compiling to machine code.) With the "middle-end" the idea was to have an Intermediate Representation of the code. That way you can break the process into encapsulated steps, with the IR as a bridge between the front and back ends.

Then you can focus on turning your IR into platform-specific code as a separate task, rather than something you need to think about from the beginning when examining the source code.
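As a toy illustration of what such an IR can look like (the three-address opcode set below is invented, not GIMPLE or LLVM IR), a front end might lower x = b + c * d into machine-independent instructions that a back end later turns into whatever assembly or machine code the target needs:

    #include <stdio.h>

    typedef enum { IR_LOAD, IR_MUL, IR_ADD, IR_STORE } IrOp;

    typedef struct {
        IrOp        op;
        const char *dst, *a, *b;   /* virtual registers or variable names */
    } IrInsn;

    int main(void)
    {
        /* Front-end output for:  x = b + c * d;  */
        IrInsn ir[] = {
            { IR_LOAD,  "t1", "c",  NULL },
            { IR_LOAD,  "t2", "d",  NULL },
            { IR_MUL,   "t3", "t1", "t2" },
            { IR_LOAD,  "t4", "b",  NULL },
            { IR_ADD,   "t5", "t4", "t3" },
            { IR_STORE, "x",  "t5", NULL },
        };
        static const char *name[] = { "load", "mul", "add", "store" };

        /* A real back end would do instruction selection, register
         * allocation, etc. here; this just prints the IR.               */
        for (size_t i = 0; i < sizeof ir / sizeof ir[0]; i++)
            printf("%-5s %s, %s%s%s\n", name[ir[i].op], ir[i].dst, ir[i].a,
                   ir[i].b ? ", " : "", ir[i].b ? ir[i].b : "");
        return 0;
    }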

From "Crafting Interpreters" by Bob Nystrom

(Image from "Crafting Interpreters" by Bob Nystrom)

You can see where the IR sits in the process of going "up" and "down" the compiler mountain.

I'm no expert; this is just a guess that ASM is being used as the IR.

3

For the record: IBM's first Fortran compiler did not necessarily produce machine code output (it depends on how you look at it). Most of the compilation process is spent building up symbolic instructions (i.e., lines of assembly) in a table called CIT (Compiled Instruction Table). Parts of the program depend on the symbolic format; for instance, section four detects basic blocks by the presence of an operation mnemonic beginning with “T”, since such instructions designated transfers. The CIT goes through many phases of refinement during compilation.

After the CIT has been fully constructed, and all the code has been compiled, an assembler built into the compiler is run to assemble the code. (I believe that it was possible to run the assembler on its own, separately from Fortran.)

Source: “Systems manual for 704 Fortran and 709 Fortran”, via http://www.softwarepreservation.org/projects/FORTRAN.

texdr.aft
0

Assembler output can help with debugging. The compiler can annotate the assembly with comments that help the programmer and the debugger relate instructions back to the higher-level language statements. Some of it is simple quality-of-life stuff like giving numbers in both decimal and hexadecimal, right up to writing the actual high-level statements in comments next to the assembly code that implements them.
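A minimal sketch of that kind of annotation: a hypothetical code generator that emits a constant load with the originating statement and both number bases in a trailing comment (the mnemonic and comment syntax are generic, not any particular compiler's output):

    #include <stdio.h>

    void emit_load_const(FILE *out, const char *reg, int value,
                         const char *source_stmt)
    {
        /* e.g. "mov ax, 1137   ; count := 1137;  (1137 = 0x471)" */
        fprintf(out, "\tmov\t%s, %d\t; %s  (%d = 0x%X)\n",
                reg, value, source_stmt, value, value);
    }

    int main(void)
    {
        emit_load_const(stdout, "ax", 1137, "count := 1137;");
        return 0;
    }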

user