
One thing that struck me about the design of COBOL was that it was surprisingly complex, particularly for the era. That is, if I were trying to squeeze a compiler into a few tens of kilobytes of memory, I would be unpleasantly surprised by the number and diversity of features that had to be implemented.

I never saw this remarked on until today, when I stumbled on 10 Most(ly dead) Influential Programming Languages:

COBOL was also enormously complex, even for today’s languages. This means that COBOL compilers lagged contemporaries on microcomputers and minicomputers...

And so they did; there was a generation of microcomputer business software that would classically have been expected to be written in COBOL, but was actually written in BASIC instead. Now, there were a number of reasons for this, such as microcomputer programmers first learning BASIC because it came with the machines, but the reason quoted above seems likely to also have been a factor.

Are there any references for the complexity of COBOL? Either quantitative (e.g. measurements of the size of a COBOL compiler at a given time, compared to compilers implemented with the same technology for other languages), or firsthand accounts of the difficulty of implementation?

Thorbjørn Ravn Andersen
rwallace
  • It might be helpful if you could specify what you mean by 'complex'. To my knowledge (somewhat limited, as I usually converted COBOL to Assembler :)), COBOL isn't complex at all. In fact it's made in a way that allows rather straightforward translation on the go. That is, at least up to COBOL-74, maybe even C85, though that added some really new ideas. Could it be that people get confused by the high number of keywords? Most of them are just syntactic sugar, enabling readability by being a kind of narrow, formalized English. – Raffzahn Dec 27 '22 at 14:07
  • @Raffzahn In this context, by complex, I mean that a compiler for the language will tend to be large, i.e. consist of many kilobytes of code, thereby having difficulty running on small machines. Are you disagreeing with the above assessment and taking the view that though there are many keywords, the language translates straightforwardly to machine code, and a compiler for it should therefore be small? – rwallace Dec 27 '22 at 15:35
  • The first thing that comes to mind (at least for me) with COBOL is not exactly complexity. Chattiness I would agree, yes, but complexity? – tofro Dec 27 '22 at 17:18
  • @rwallace Exactly. Or, as tofro puts it, 'chatty'. Just look at a COBOL program and not ... well, this doesn't fit into a comment, let me put it into an answer. – Raffzahn Dec 27 '22 at 18:21
  • I programmed in Cobol for nearly 20 years so I am a bit jaded, but I don't consider the language all that complex. But it is probably very complex for a compiler writer to adhere to the extremely detailed definition of the Cobol standard, which came out of a committee with huge and varied interests, like government. Much of the complexity is not in the language itself, but in the runtime. A "READ" verb has to handle sequential, relative, and indexed files, for example. I learned on a PDP-11 with 32k and 20 terminals, but realistically only one person at a time could compile a Cobol program. – mannaggia Dec 27 '22 at 20:37
  • At least two companies - Micro Focus and Ryan-McFarland - produced COBOL compilers for micros, starting with 8-bit CP/M systems. I supported an accounting package (Paxton Business Desk) written in COBOL for CP/M and DOS in the mid-1980s. – grahamj42 Dec 27 '22 at 22:10
  • @mannaggia "But it is probably very complex for a compiler writer to adhere to the extremely detailed definition" - quite the other way: the very fine detail of the definition simplifies compiler construction. COBOL relies on clear data definitions, which in turn allow straightforward code generation. There isn't much leeway to screw that up. – Raffzahn Dec 27 '22 at 23:43
  • I used MicroFocus Cobol at British Rail in the early 90s. The thing about the Cobol compiler that I most recall is that it is single-pass and so you can't do recursion. (I used to say - "you may curse, but not recurse!") – kpollock Dec 28 '22 at 08:22
  • @Raffzahn: I would also guess that although COBOL is verbose when written as text, programs would never be edited within RAM as text, but would instead be edited off-line using a keypunch. – supercat Dec 28 '22 at 21:02

7 Answers


No, COBOL is not complex and didn't require complex compilers.

At least not COBOL up to COBOL-74 (*1), which was the current standard when micros were introduced (mid 70s to late 80s). From the compiler's angle it's straightforward, which should result in comparatively small compilers. It does have some 'chattiness', as tofro calls it, from having many, sometimes even redundant, keywords as well as high-level features, both of which may be mistaken for complexity.

It's also important to keep in mind that COBOL wasn't created as some academic exercise, but to solve real-world tasks on real-world machines. In the late 50s this meant computers with usually less than 64 KiB and only the most basic OS.

Next, COBOL isn't just one of many languages but an early one; its concepts were not shaped by what we take as canon today. Today we tend to see programming languages as a primitive core providing certain, almost standardized constructs coming out of the ALGOL tradition (further simplified by C). COBOL wasn't created in that tradition. As Will Hartung notes, COBOL has far more in common with application-specific (4GL) languages. No surprise, as it was designed as a language for data processing, much the same way FORTRAN was designed for computing.


A Language Made for Simple Compilers

Contrary to what one may assume, COBOL is an extremely easy language to translate, not least due to its simple and straightforward structure. A program always has to be written in a fixed order, which eases translation. It goes roughly like this:

  1. Identification Division

    It essentially contains just the program name. Everything else is formalized documentation: useful to have, but nothing the compiler needs to care about (*2).

  2. Environment Division

    Configuring the compiler.

    Essentially, this is what today's compilers get handed via an endless number of hard-to-decode command line options. COBOL has it neatly wrapped up in a machine-independent format. No need to invent anything, just parse it.

    Its Configuration Section contains basic items like compiling with debug support (*3), target computer, disk space, character set, special characters (like the currency sign or separators) and so on.

    The Input-Output Section, in turn, describes which files are to be used and how they are accessed. Think of it like the definition of a database connector, which in fact it can contain as well.

  3. Data Division

    As the name says, it defines all data structures, such as

    • Record structures for
      • files,
      • databases and
      • communication
    • Global data (which can be structured as well)
    • Exchange structures between routines
    • Report data
    • Screen data

    The latter two are especially notable, as COBOL provides a nice, device-independent way to describe printouts and screen handling. No need to dig through hundreds of lines of PRINT or write() to decode what is output when. COBOL takes care of all printer and screen handling much like file I/O. One simply fills all fields of a print or screen definition (usually with a single MOVE CORRESPONDING), and COBOL does everything else, including line counting, page control, etc.

    Most other languages (4GLs less so) can only do something similar with additional report generators, which usually have rather 'notable' structures, made to somehow fit. In COBOL it's built in, with the very same syntax as any other structure.

For a compiler writer, everything so far is fast food, easily swallowed. Each of these divisions and sections can be turned straight into memory tables, later used to generate addresses as well as to reserve storage within the compiled program.

Only after everything has been said and defined does the code itself follow:

  4. Procedure Division

    COBOL code is extremely straightforward, not least because it lacks most of the flow control other (ALGOLic) languages offer. All statements are sequential, and except for nested IF there is no need to keep track of any structure, just label names. All program flow (*4) is handled by GO TO and PERFORM.

    • GO TO is just that: execution continues at the label given.
    • PERFORM fills two functions:
      • calling subroutines and
      • creating loops around subroutines.

    That capability of PERFORM is exactly the key to a compiler design that avoids nearly all the complexity of code generation in other languages:

    • In the case of a simple subroutine call, it records the return address and jumps to the paragraph's label.
    • In the case of a loop like PERFORM routine-1 n TIMES (*5), or any other variation, the same subroutine call is generated, just wrapped in a looping code block. All of this can be turned straight into assembler or machine code, without tracking any nesting level or having to remember to place loop-ending code after an unknown number of statements.

Long story short:

COBOL was designed with straightforward compilation in mind, something a very basic, and thus small, compiler can deliver. But it comes with mighty tools, which have to be provided by the OS or some other runtime.


Why not on Micros?

Interesting question. COBOL was readily available on early micros, on professional systems but also on low-end CP/M machines, e.g. as Micro Focus CIS COBOL. It was also used quite a lot. So why are there so many BASIC applications? One may think of three reasons:

  1. (Business) BASIC was already a force on low-end machines; just think of HP or Olivetti, and even more so MAI and WANG. The latter in particular had a strong standing in low-end business systems, offering BASIC at a very high level of integration.

  2. Infrastructure. COBOL draws many of its abilities from integration with sophisticated file systems. Random record access, variable-length records, indexed access and databases are nothing that systems like CP/M or other lowest-end micros offered by default. So either the COBOL package had to come with those access layers included (*6), or additional packages were needed.

  3. 'The English Publisher issue' (*7): Micro Focus (et al.) charged premium prices for their compilers, resulting in comparatively few sales. Their idea was that COBOL was something for companies porting their smaller and/or client applications down to minis/micros. Those companies had the money to pay more for a compiler than for a computer, so let's milk 'em.

Then again, in all fairness, even back then everyone was preaching that COBOL was a dead end, as good as gone before the '80s could even develop a style of their own. So a fourth reason was that no company really saw a need or a motivation to spread COBOL knowledge and support to a new generation of programmers.


Some more details

Chattiness ...

COBOL was meant to be readable, so many of its keywords are redundant or have alternate spellings. A great example is the VALUE keyword, used here to define test values for fields.

Let's assume a stock record has a field containing a marker telling whether the item is to be delivered virtually (via e-mail), as a single mailing, or can be collected with others, coded as V/S/P. In most languages one would write some equates or defines and a series of IFs to check it. COBOL allows this to be defined neatly as part of the data definition and checked in a quite readable fashion.

       01  delivery          PIC X(01).
           88  deliver-virtual  VALUE  "V".
           88  deliver-physical VALUES "P", "S".
           88  deliver-valid    VALUES "P", "S", "V".

VALUE and VALUES are exactly the same keyword and interchangeable; neither implies anything different. Just syntactic sugar for readability.

... or Not

In fact, level-88 condition names, as they are called, are a great example of COBOL being far less chatty and more high-level than other languages:

           IF NOT deliver-valid THEN SET deliver-valid TO TRUE. (1,2)
           IF deliver-virtual THEN PERFORM delivery-per-mail. (3)

Quite readable, isn't it?

(1) checks whether the record has a valid marker, i.e. any of P/S/V. If not, (2) sets it to the default way of collecting packages (the first value given, P). (3) performs a virtual delivery if needed. Heck, I have a hard time describing the workings any better than the COBOL code itself does.

Instead of 'chatty', one might rather call it sophisticated, or posh if one likes.

Now try the same in C.

Small Computers

While COBOL was designed from the beginning with strong data I/O in mind, including high-level access to sequential as well as random and indexed data, it was meant to run on small machines. Around 1960 a computer with 128 KiB was considered quite large, so any compiler had to fit into considerably less, while still leaving space for OS and I/O. Not much different from later micros, is it?

It's Not ALGOLic but Really Early

While officially presented a year after ALGOL (1960 vs. 1959), COBOL is in no way influenced by it. It doesn't know a stack, it doesn't need one, and on top of that, some of its constructs simply won't work with a stack, at least not without issues. The significance of the stack idea not yet being available, and of COBOL being built on the programming styles that came before it, can hardly be overstated.

It's Not Minimalist

ALGOL and all its descendants are based on the idea of providing a rich structure with as few different components as possible, and having them as uniform as possible, all with the goal of being as universal as possible and offering all the flexibility one could want. COBOL is anything but minimalist.

COBOL means Serious Data Shoveling

In contrast, COBOL is a rich language made for data shoveling. Nothing else, but that really well. COBOL is, for example, perfect for writing data-driven applications. Things like

  • Application front end for a database,
  • Order booking,
  • Order fulfilment,
  • Part lists,
  • Receipts printing,
  • Bank statements,

or anything else in accounting and data management often needs only a few program lines. Well, that plus a description of the data structures handled, which in turn was usually just included via COPY.

Real-world COBOL sources are more often than not shorter than the same task coded in C or any other language (except maybe RPG). Such programs mostly just say MOVE CORRESPONDING between input and output, maybe add up a few values, apply VAT and check for exceptional conditions. That's it (*8).


*1 - For the most part this includes COBOL-85, although 85 did introduce (useless) new features that made compilers more complex, like inline PERFORM (i.e. loop content no longer needed to be a dedicated routine), scope terminators for multi-statement IF, and nested programs. Bah. No one needs that!

*2 - Well, if a DATE-COMPILED paragraph is present, its comment text will be replaced by the actual compile date in the listing.

*3 - Who doesn't remember endless #ifdefs and debug macros in C? Well, COBOL has the same, except more structured. By placing a D in column 7, any source line can be declared debug code. Such lines only get compiled if debugging mode is enabled (WITH DEBUGGING MODE in the SOURCE-COMPUTER paragraph) in the Configuration Section of the Environment Division. All standardized. No need to learn whatever macros the author of a given file invented for their debug strategy.

*4 - There are further variations, but none change the basic handling; they just add more linear code blocks.

*5 - Again a plural variation, like with VALUE: one can write TIME or TIMES, all for the sake of more natural-sounding source code.

*6 - Well, the machines that offered ISAM files and so on were exactly those from MAI and WANG, which ran their Business BASIC variants, so why use COBOL for new developments?

*7 - Copyright in 18th/19th century England was about the major hurdle for readers and writers, as publishers used it to get the highest possible profit per book sold, resulting in high prices and quite small print runs. That is one of the reasons why the US (which didn't honour English copyright) and Germany (whose states didn't even recognize each other's) became a mecca for readers: cheap books resulting in widespread knowledge. They also became a goal for English authors, who were more interested in a fast overseas publication with high volume and good returns than in seeing their books nicely printed but earning no income.

*8 - Well, this 'that's it' of course relies heavily on a good data definition first.

Maury Markowitz
Raffzahn
  • Meanwhile, a lot of the value in COBOL was integration with the more complicated/expressive I/O systems of the day: various fixed and variable record formatted files (rather than the stream files we're used to today) and ISAM indexing. I've mentioned elsewhere here that Realia COBOL (and you paid the big $$$ for that) came with all of that and more, necessarily written and supported by them on top of DOS. – davidbak Dec 28 '22 at 01:51
  • I worked for a year with R.B.K. Dewar, a brilliant (and funny!) computer scientist, earlier known for SPITBOL and later for his involvement with GNU Ada (GNAT). He worked closely with and for Realia, and he greatly entertained us hard-core Ada programmers with his claims of how great COBOL was. He wrote the Realia COBOL compiler's x86 peephole optimizer in COBOL. And it was a good one! He refuted that COBOL was chatty. Instead, he extolled its really easy and terse subroutine syntax: one dot to declare a subroutine! (.) (I think that was tongue-in-cheek.) – davidbak Dec 28 '22 at 01:54
  • @davidbak As a /370 assembly programmer I'm bound by trade to state that COBOL is the worst language choice possible. Having that out of the way, there is something I really love, and your comment reminds me that it's a common feature of COBOL and Ada: well-defined data. While not really enforcing it, COBOL supports a fine-grained definition of data structures and their limits quite well. As an asm guy I'm all about data. COBOL's way is incredibly odd, but extremely helpful. Same goes for Ada, except with next to no oddities and checking added throughout. – Raffzahn Dec 28 '22 at 02:35
  • "FORTRAN was done for computing" -- aren't all programming languages for computing? Maybe you meant to say "scientific and mathematical calculations". – Barmar Dec 28 '22 at 14:36
  • @Barmar No, programming languages are for writing programs that control computers. For what purpose depends on the language. It's also what the terms were: Data Processing vs. Computing. In the 70 years since then, the 'computing' faction won the interpretation, making that term the generic one. It should be telling that you need that many words to describe it, shouldn't it? – Raffzahn Dec 28 '22 at 15:00
  • Re: chattiness: what I really liked about COBOL was the speed at which you could input programs. There were no special characters, so if you could touch type, the whole program could be in within a matter of minutes. – cup Dec 28 '22 at 15:52
  • @Raffzahn - don't forget "cybernetics", speaking of terms from quite some time ago. From my POV in the USA that was a term much favored by the Soviets and their satellites (at least you saw it all the time in translations of academic papers). – davidbak Dec 28 '22 at 18:02
  • Minor point - generally, yes, the IDENTIFICATION DIVISION is just documentation, but Cobol-85 added the IS INITIAL PROGRAM clause to PROGRAM-ID. This affects how the program runs when CALLed. Without it, if you CALL a Cobol program and it returns to the calling program, and the program does not CANCEL it before the next call, the state of all files and WORKING-STORAGE variables is maintained between CALLs. But if the CALLed program has IS INITIAL PROGRAM, they will re-initialize every time it is CALLed. Essentially an automatic CANCEL, except it remains in memory. – mannaggia Dec 28 '22 at 21:05
  • Another issue I suspect may have limited the popularity of COBOL on microcomputers is that when using a computer with a high-speed card reader, someone with a keypunch can easily edit programs whose source-code representation is vastly larger than a computer's RAM, but when using a microcomputer, editing programs bigger than RAM is a major pain. – supercat Dec 28 '22 at 21:19
  • I disagree that the things Cobol-85 added were useless - many made it much more expressive. Some were a godsend, such as inline PERFORMs, scope terminators, EVALUATE, and reference modifiers. – mannaggia Dec 28 '22 at 21:20
  • @mannaggia Do you really expect an analysis of every detail of a quite capable language in a few paragraphs, especially on not really related issues? Even more so if that part is already covered? Note the paragraph between #3 and #4. Also, how many baubles are needed to mark a tongue-in-cheek footnote? – Raffzahn Dec 28 '22 at 21:29
  • I see the ALTER command as the only complex feature of COBOL - it may not be complex to compile, but it does make understanding the code difficult. Fortunately, as far as I can see, it is not used often. – Jonathan Rosenne Jan 05 '23 at 10:52
  • @JonathanRosenne It is in fact rather easy to compile. All that's needed is a list of all paragraphs with GO TO, which is a subset of the list of paragraphs the compiler needs anyway to manage labels. And yes, the concept is rather strange from today's view, much like PERFORM A THRU F, which does require quite unusual code. Still, all of it can be done with low effort. – Raffzahn Jan 05 '23 at 11:42
  • @Raffzahn Although they are not so difficult to compile, both ALTER and PERFORM THRU make a program difficult to read, which makes a program using them complex. – Jonathan Rosenne Jan 05 '23 at 15:06
  • @JonathanRosenne No need to tell ... been there, survived it :)) – Raffzahn Jan 05 '23 at 15:34