34

In the fifties and sixties, program source code was typically stored on punch cards, one card per line.

The most common card format was the IBM 80 column by 12 row. For source code, this was commonly used as one character position per column, the first 72 columns used for actual code, the last 8 for a sequence number. (Practical application: if you dropped a deck of cards all over the floor, after you picked them up, you could get them automatically sorted by the sequence number into correct order again.)
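The recovery trick can be sketched in a few lines of Python, treating each card as an 80-character string with the sequence number in columns 73-80. The helper names `make_card` and `sort_deck` and the sample deck are illustrative assumptions, not period software.

```python
def make_card(code, seq):
    """Build one 80-column card: 72 columns of code, 8 of sequence number."""
    return f"{code:<72.72}{seq:08d}"


def sort_deck(cards):
    """Restore a shuffled deck using the sequence field in columns 73-80."""
    def sequence_number(card):
        # Pad to 80 columns, then take columns 73-80 (0-based slice 72:80).
        return int(card.ljust(80)[72:80])
    return sorted(cards, key=sequence_number)


# A "dropped" deck, picked up in the wrong order.
deck = [
    make_card("      STOP", 30),
    make_card("      PRINT 1", 20),
    make_card("      X = 1", 10),
]
for card in sort_deck(deck):
    print(card[72:], card[:72].rstrip())
```

A mechanical card sorter worked one column per pass rather than comparing whole numbers, so this is only a model of the result, not of the machine.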

In those days, computers didn't really do lowercase. Uppercase text only needs six bits per character.

That means six bits per character were left over. What were they used for, if anything?

rwallace
  • 11
    Punched cards didn't use a binary encoding system. There is nothing left over. https://en.wikipedia.org/wiki/Punched_card#IBM_80-column_punched_card_format_and_character_codes – cup Sep 09 '20 at 07:53
  • 3
    There was an alternative coding system called "column binary" where the 24 holes in two columns were a direct binary representation of three 8-bit bytes. This was used for distributing executable code on punch cards (which was a useful option even though cards were physically bulky, because of the proliferation of incompatible magnetic tape hardware and data formats). – alephzero Sep 09 '20 at 11:34
  • 19
    The rest of the column was used for structural integrity :-) – dave Sep 09 '20 at 12:09
  • 3
    Hollerith cards were used a lot later than the sixties - well into the early 1980s. – Spehro Pefhany Sep 09 '20 at 18:13
  • 1
    @SpehroPefhany they were used, but not "typically" any more. If I had classes in the 1970s that required punch cards, I'd get the code working with a time sharing terminal then have the file sent to a card punch machine. – Mark Ransom Sep 09 '20 at 18:24
  • 2
    @MarkRansom Undergrads at University of Toronto used the IBM 029 keypunches and submitted jobs (with the JCL) into hoppers in the seventies. There were other computers than the IBM 370-165 (with terminals), but all the undergrads used the keypunches. I think UofT was not particularly behind the times. – Spehro Pefhany Sep 09 '20 at 18:31
  • 5
    @MarkRansom I was punching cards in 1976 in a diploma course and again in 1979 in a postgrad summer course at UCSC. – user207421 Sep 10 '20 at 02:02
  • 1
    You can make your own digital punchcards here: https://www.masswerk.at/keypunch/ – Bobson Sep 11 '20 at 17:29
  • 1
    When I started computer science at university in 1980, we used punch cards. They didn't have serial numbers for sorting, so we took extraordinary care not to drop a pack! – Paddy Landau Oct 01 '20 at 11:44
  • 1
    Fun punch-card fact: they long predate computers. IBM long sold specialized equipment (like the type-80 sort and the type-402 accounting machine) to save and retrieve data like stock data, railroad data, and census data. The equipment worked by timing: a card is fed into the machine long-edge first. As it moves, a 'brush' either completes a circuit or doesn't for each individual hole. In the first part of the overall timing cycle, the brush either makes contact (or not) through a hole in the '0' row. Next is the '1' row, and then the '2' row. (There's also an 'x' and 'y' row at the top) – PESMITH_MSFT Oct 01 '20 at 18:45
  • 1
    For sorting, a card is directed to one of 13 hoppers based on what hole the brush detects first.

    The Bitsavers site has great scans of early manuals for these devices.

    – PESMITH_MSFT Oct 01 '20 at 18:45

9 Answers

53

TL;DR;

Punch card code is not binary but a collection of n out of m encodings.


Long Story

Yes, really a long story, so I'll only cover the main line from Hollerith to EBCDIC. There are many sidelines for special equipment, situations and as used by different manufacturers. Some covering up to 7 holes but all mostly compatible in the basic Numeric/Alpha region ... a bit like the various ISO 646 encodings :)

Punch card encoding is essentially combinatorial and based on decimal, with one hole per digit, because it grew out of numeric-only use. It also follows the way cards were laid out:

Example:

  COL 1234...
ROW  ,-------~
  12 |
  11 |
(1)0 |0000...  (Row zero is called 10 when it's about Alpha)
   1 |1111...
   2 |2222...
   3 |3333...
   4 |4444...
   5 |5555...
   6 |6666...
   7 |7777...
   8 |8888...
   9 |9999...
     '-------~

Notation: Punched characters are described as their row numbers connected by hyphens, like 12-1 marks an A.

Numbers

Numbers use a one out of ten encoding. A number gets only one hole within a column. Rows 11 and 12 were used for - and + as sign.

(Upper Case) Letters

To add alpha, a two out of twelve encoding was used (or more precisely, one out of three plus one out of nine). Each of the 26 basic (English) letters got one 'group' hole in rows 10..12, called the 'Zone', and a 'number' hole in 1..9. 3 x 9 = 27 combinations, a pleasant fit to hold 26 characters, isn't it?

  • A..I got a hole in 12 plus one in 1..9
  • J..R one in 11 plus another in 1..9, while
  • S..Z had it in 10 (0) and 2..9.

The surplus combination (27 positions minus 26 letters) was assigned to 10-1 to avoid having two adjacent holes (*1).

      RETRO
     ,-------~
  12 | X
  11 |X  XX
  10 |  X
   1 | 
   2 | 
   3 |  X
   4 | 
   5 | X
   6 |    X
   7 | 
   8 | 
   9 |X  X
     '-------~
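The digit and letter rules above are regular enough to write down as a small function. This is a minimal sketch that covers only space, 0-9 and A-Z; the names `punch` and `notation` are made up for illustration, and row "10" is written as 0.

```python
import string

# Rows in the order they are conventionally listed in the hyphen notation.
ROW_ORDER = [12, 11, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]


def punch(ch):
    """Return the set of punched rows for a space, a digit, or A-Z."""
    if ch == " ":
        return set()                              # no holes at all
    if ch.isdigit():
        return {int(ch)}                          # one out of ten
    index = string.ascii_uppercase.index(ch)      # ValueError for anything else
    if index < 9:                                 # A..I: zone 12, rows 1..9
        return {12, index + 1}
    if index < 18:                                # J..R: zone 11, rows 1..9
        return {11, index - 9 + 1}
    return {0, index - 18 + 2}                    # S..Z: zone 0, rows 2..9


def notation(rows):
    """Format a row set in the usual hyphenated form, e.g. 12-1."""
    return "-".join(str(r) for r in ROW_ORDER if r in rows)


for ch in "RETRO":
    print(ch, notation(punch(ch)))
```

The slash (0-1) and the signs on rows 11 and 12 are left out on purpose to keep the sketch to the regular part of the scheme.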

Punctuation

To add punctuation, the scheme was repeated using a three hole encoding. This time a hole in row 8 marks all punctuation, with characters as none or one out of three (10..12) plus one out of six in row 2 to 7, allowing up to 24 symbols.

      *C+=1
     ,-------~
  12 | XX
  11 |X   
  10 | 
   1 |    X
   2 | 
   3 | X
   4 |X
   5 |  
   6 |  XX  
   7 | 
   8 |X XX
   9 | 
     '-------~
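The count of 24 can be checked by enumerating the combinations. The sketch below assumes exactly the rule as stated (row 8, plus none or one of the zone rows, plus one of rows 2..7); the three symbol codes are the ones used in the example card and in the Oddities section, and nothing else is assigned.

```python
from itertools import product

ZONE_CHOICES = [None, 12, 11, 0]   # no zone hole, or one of rows 12, 11, 10 (0)
DIGIT_CHOICES = range(2, 8)        # rows 2..7


def punctuation_patterns():
    """Enumerate every hole pattern the punctuation rule allows."""
    patterns = []
    for zone, digit in product(ZONE_CHOICES, DIGIT_CHOICES):
        rows = {8, digit}          # row 8 marks the punctuation group
        if zone is not None:
            rows.add(zone)
        patterns.append(frozenset(rows))
    return patterns


patterns = punctuation_patterns()
print(len(patterns), "possible punctuation patterns")

# Codes taken from the text: * = 11-8-4, + = 12-8-6, = is 8-6.
KNOWN = {"*": {11, 8, 4}, "+": {12, 8, 6}, "=": {8, 6}}
for symbol, rows in KNOWN.items():
    assert frozenset(rows) in patterns
```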

Lower Case Letters

EBCDIC finally added lower case letters by again using 3 holes, but this time two in the group section (10..12), making it a two out of three plus one out of nine (1..9). Except for the added group hole, the encoding was exactly like the uppercase, so

  • a..i like A..I plus 10 (0)
  • j..r like J..R plus 12
  • s..z like S..Z plus 11
      Retro
     ,-------~
  12 | X XX
  11 |X XXX
  10 | XX
   1 | 
   2 | 
   3 |  X
   4 | 
   5 | X
   6 |    X
   7 | 
   8 | 
   9 |X  X
     '-------~
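Since the lower case codes are just the upper case ones plus one extra zone hole, both fit in one short function. A minimal sketch, assuming only the three rules listed above; `punch_letter` is a made-up name and row "10" is written as 0.

```python
import string

UPPER_ZONE = (12, 11, 0)    # A..I, J..R, S..Z
LOWER_EXTRA = (0, 12, 11)   # extra zone hole for a..i, j..r, s..z


def punch_letter(ch):
    """Return the punched rows for one letter, upper or lower case."""
    index = string.ascii_uppercase.index(ch.upper())
    group, offset = divmod(index, 9)
    # The third group (S..Z) starts at row 2, the other two at row 1.
    digit = offset + (2 if group == 2 else 1)
    rows = {UPPER_ZONE[group], digit}
    if ch.islower():
        rows.add(LOWER_EXTRA[group])
    return rows


for ch in "Retro":
    print(ch, sorted(punch_letter(ch)))
```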

Control Characters

Control characters were filled in with EBCDIC as well, much like punctuation, but this time with an additional hole in row 9 and using none or one out of three (10..12) plus one out of seven (1..7), producing 4 x 7 = 28 possible control codes.

Oddities

Two control characters (NUL and DS) use a five hole combination, while SPACE means no hole at all (and differs from BLANK). 12 alone has been redefined to & as + wandered over to 12-8-6.

          S
          P
          AN
          CUD
      &-/+ELS
     ,--------~
  12 |X  X X
  11 | X    X
  10 |  X  XX
   1 |  X  XX
   2 | 
   3 | 
   4 | 
   5 | 
   6 |   X
   7 | 
   8 |   X XX
   9 |     XX
     '-------~

Bottom line

Although a hole might be seen as a binary value, punch card holes are not, but represent their row.


*1 - It was later used for the slash (/).

Raffzahn
  • 2
    This is great! Even back in the early 70's when I first started using punch cards, I never knew of anyone who actually knew what the encodings were. – RBarryYoung Sep 09 '20 at 15:10
  • 9
    @RBarryYoung I learnt Fortran using 40-column hand-punched cards... you had to know the encodings. – TripeHound Sep 09 '20 at 16:43
  • 1
    @DanIsFiddlingByFirelight You're right about C as it should be 12-3, but for = it's only two holes, as my description missed the case of no hole in 10..12 for symbols. Which gives 4 groups of 6 (the mentioned 24). Sorry, my fault, changed the example as well as the description. – Raffzahn Sep 11 '20 at 20:08
  • @RBarryYoung: For those of us who didn't have a card punching machine, an HB (US #2) pencil was used to leave a significant lead pencil mark in most of the rectangle where the hole would have been punched. I remember using the DEC 029 code for a PDP 11/70 during the mid 1970s. When a punched card was read it was placed against a shiny metal backing plate & a bright light caused the plate to reflect the light through the holes to a light reader. The graphite in pencil lead was a good enough reflector of light, which enabled a pencil marked card to be read like a punched card. – Fred Oct 01 '20 at 15:10
32

Uppercase text only needs six bits per character.

The fundamental mistake that you are making is assuming that punch codes were binary numbers. They were not.

The encodings were patterns, combinations of zero, one, two, or three holes. This is a reference card in IBM 5081 format:

a reference punch card

The row numbering was somewhat odd, for historical reasons: 12, 11, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9. Notice that the IBM 5081 here doesn't number rows 12 and 11.

A complete explanation would be complex and long, and probably obscure the point. But briefly:

  • Space was zero holes.
  • The 12 one-hole patterns, through all rows, represented a couple of punctuation characters and then the digits 0 to 9.
  • The remaining patterns were a combination of a range selection code, in the "zone" rows 12, 11, and 0, and an ordinal encoded in the "digit" rows 1 to 9. The "zone" rows effectively bank-switched the meanings of the "digit" rows.
  • The 27 major two-hole patterns all had exactly one hole in the "zone" rows and another hole in the 1 to 9 "digit" rows. 3 × 9 was enough for all uppercase letters of the English alphabet plus slash.
  • Things got complex with the further patterns, which were punctuation but whose exact meanings varied over the years, and from manufacturer to manufacturer. In these patterns, "digit" row 8 was always punched, and the rest of the pattern was one hole in the "digit" rows 1 to 7 combined with zero (making two-hole patterns) or exactly one (making three-hole patterns) hole in the "zone" rows. This made for 4 × 7 combinations.

In the IBM 5081 picture, the two one-hole patterns with holes in rows 12 and 11, representing two punctuation characters, are in fact there, but presented in the middle of the three-hole patterns. The two-hole pattern for slash, with holes in rows 0 and 1, is similarly presented out of place. This makes it less obvious that there are in fact one unassigned two-hole and three unassigned three-hole patterns with row 8 punched, here.

EBCDIC looks odd to eyes used to character codes in binary. It makes a lot more sense when viewed as punch codes. There is a direct correspondence between the upper nybble of the EBCDIC code and the "zone" row pattern, and between the lower nybble of the EBCDIC code and the "digit" row pattern. There are tables showing this in detail in all three further reading items.
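The nybble correspondence is easy to check for the digits and upper case letters. The sketch below uses Python's `cp037` codec as a stand-in for "EBCDIC" (an assumption; it is one of several EBCDIC code pages) and compares each byte against the zone/digit punch rule. The function names are hypothetical.

```python
import string

# Upper nybble of the EBCDIC byte -> zone row on the card (None = no zone hole).
ZONE_BY_HIGH_NYBBLE = {0xC: 12, 0xD: 11, 0xE: 0, 0xF: None}


def split_ebcdic(ch):
    """Return (zone row, digit row) as read off the EBCDIC byte."""
    byte = ch.encode("cp037")[0]
    return ZONE_BY_HIGH_NYBBLE[byte >> 4], byte & 0x0F


def card_code(ch):
    """Return (zone row, digit row) from the punch rule for 0-9 and A-Z."""
    if ch.isdigit():
        return None, int(ch)
    group, offset = divmod(string.ascii_uppercase.index(ch), 9)
    return (12, 11, 0)[group], offset + (2 if group == 2 else 1)


for ch in string.digits + string.ascii_uppercase:
    assert split_ebcdic(ch) == card_code(ch), ch

print("A ->", hex("A".encode("cp037")[0]), split_ebcdic("A"))
```

Only 0-9 and A-Z are checked here; lower case, punctuation and control codes follow the tables in the further reading rather than this simple rule.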

Further reading

  • Douglas W. Jones. Punched Card Codes. The Punched Card Collection. University of Iowa.
  • John J. G. Savard. The Punched Card. quadibloc.com.
  • W. Wayne Black (1971). "Appendix 4: Punch Card Codes". An Introduction to On-line Computers. CRC Press. ISBN 9780677029306.
JdeBP
  • Erm ... the slash is still a two hole pattern 10-1. Also, row 1 was never used for punctuation (at least not in the basic variants). All visible in your example (granted, the ordering is a bit misleading). – Raffzahn Sep 09 '20 at 09:18
  • Hah! Misreading slash is what I get for not putting a straightedge up against the monitor. But saying that row 1 was not used for punctuation and that the punctuation character slash was 10-1 in the same breath … didn't that strike you as self-contradictory when you were writing it? (-: – JdeBP Sep 09 '20 at 09:55
  • No, as the slash is an oddity left over from character encoding. It was added before the punctuation came to exist. Putting it into the punctuation group works only in hindsight. Structurally it's a character - same way as & and - – Raffzahn Sep 09 '20 at 09:57
  • Incorrect. Slash at 0-1 was in fact not in IBM's 1935 patent, so could not have been a remnant of it. – JdeBP Sep 09 '20 at 10:11
  • Haven't said that. The point is that it belongs to the character encoding - it's an oddity among punctuation, thus rather proving the rule that row 1 is not used. – Raffzahn Sep 09 '20 at 10:12
  • IBM's 1935 patent dealing in "selection of a particular alphabetical type element under control of perforations in a single column of a record card" was for the character encoding, given in full in its description of figure 5. No slash. Slash is punctuation. – JdeBP Sep 09 '20 at 10:27
  • So what? This isn't about patents but encoding, is it? And the slash is encoded like a character, using zone 10 and number 1 – Raffzahn Sep 09 '20 at 11:09
31

Although you have many correct answers describing the nature of the coding used in punched cards, no one has touched on the mechanical properties of the cards. Regular users of punched cards in the past would be familiar with this issue, as getting cards through the mechanics of a fast card reader regularly and repeatedly was a major issue at the time.

If a card used all the holes in a vertical column (used to represent a character) then it would be much weakened and flimsy. It would not handle like a card and would very likely shred and thus jam the card reader. The design of the pattern of holes took issues like this into account, using just enough holes to convey information, but not so many holes as to remove any stiffness properties of the card.

Cards lost their stiffness due to environmental issues, such as humidity, dampness and so on. This also caused them to jam up readers. Card reader jams were a regular daily occurrence in the day.

That is a very strong reason why not all 12 x 80 holes were used in the coding.

However it was possible to do this, and some IBM machines had this capability, and created what were known as lace cards. More details are shown in Wikipedia.

Lace Card image from Wikipedia

  • 2
    Next to all punches could do this. There was always a way to punch every hole without moving. Not to mention hand punches. – Raffzahn Sep 09 '20 at 11:11
  • 2
    You don't have to DUP very many of those lace cards on an 029 keypunch before you'll be wanting a visit from the repair man. (Don't ask me how I know!) – Solomon Slow Sep 09 '20 at 12:23
  • @Raffzahn agreed. This was how we made confetti back in the day. – RBarryYoung Sep 09 '20 at 15:12
  • 1
    This is how all of us old programming geezers knew all about "chad" long before the Bush/Gore election. – davidbak Sep 09 '20 at 15:20
  • 6
    @RBarryYoung - If you visit a whole bunch of 026/029 machines in the "computer lab" of your local public university, and empty their chad holders into a big sack you have with you, and then you take your sack to school, pick open the (top-row) locker of a friend, and fill it from bottom to top with the contents of the sack (you'll need a cardboard dam with you) then you'll discover later in the day, from your friend, that the square corners on the chad make it damn hard to get the stuff out of your hair, plus, unfortunately, it can hurt when it gets in your eyes. (Don't ask me how I know!) – davidbak Sep 09 '20 at 15:23
  • 1
    @Raffzahn that's not the trick. The trick is to keep the card from self-destructing when you do that. When you get that many holes in it, it really, really likes to tear and jam. (Don't ask me how I know!) – Harper - Reinstate Monica Sep 09 '20 at 15:34
  • 1
    @davidbak Yep, been there, done that. My best friends father was a regional manager for IBM back in the 60's and we used to get all of our confetti for parades, pep rallys, etc. from them. – RBarryYoung Sep 09 '20 at 15:35
  • 3
    @SolomonSlow The Soviet GOST punch card encoding was horizontal, allowing up to 120 bytes per card. Lace cards with vertical lines denoting byte boundaries (known as "readers") were used to help reading the contents of GOST cards. I remember a handwritten note next to a duplicator: "DO NOT DUPLICATE READERS!" – Leo B. Sep 09 '20 at 18:01
  • 1
    @LeoB., perhaps elaborate a little bit more for the lay people? – akostadinov Sep 09 '20 at 19:21
  • 1
    @Harper-ReinstateMonica The very first mainframe program I did was still on punch cards - not to mention that I still operate punches at least once a month - hand punch more often. It isn't easy to tear them, at least when operating like it's designed (Ask me if you need some hints:)) – Raffzahn Sep 09 '20 at 19:22
  • 1
    @LeoB. Would it be possible to link the relevant (GOST) standards, German or English version preferred :) It's great to find more usages. – Raffzahn Sep 09 '20 at 19:27
  • 1
    @Raffzahn maybe modern cardstock is better. Back then you were under pressure to use IBM brand so you got what IBM considered good enough. A homemade lace card was definitely too flimsy to trust. – Harper - Reinstate Monica Sep 09 '20 at 19:50
  • 1
    @Harper-ReinstateMonica I guess that needs the definition for 'back then'. My experience comes from early to mid 70s equipment, when I used them first (with non IBM brand cards) and 1940s to 1970s machinery as I use today, again with (almost exclusively) non-IBM-brand cards, as that's the stock we have left. – Raffzahn Sep 09 '20 at 19:57
  • 1
    @akostadinov Some info is here; the automatic English translation is funny, though. – Leo B. Sep 09 '20 at 21:57
  • 1
    @Raffzahn The character encoding was the slightly truncated (without the 6x row) GOST 10859 with odd parity, so that 0 was represented as ▮▯▯▯▯▯▯▯, and the unfilled trailing zero bytes were ignored. I could not find an authoritative description of the format. – Leo B. Sep 10 '20 at 00:09
  • @LeoB.: the automatic translation into Spanish is undoubtedly funnier, the letters B, D, and E on the heading of the code table got translated into SI, RE and MI. – ninjalj Sep 11 '20 at 21:44
  • If you were working with a lot of data, there was a file compression utility that would punch it out in binary (lace) format. I remember the frowns on data center personnel when I'd punch out a deck in binary. The noise level was high, and the implied extra repair calls weren't welcome. Generally best to avoid binary. – Martin Ewing Oct 05 '20 at 18:21
5

The code punched into a 12-row card is not a binary code, but actually a form of extended decimal coding. Rows 0-9 are used to directly encode decimal digits, while letters and symbols are encoded as one decimal row plus one "zone row" which could be the A, B or 0 rows.

Within the IBM 1401 series, this was re-encoded as an extended-BCD code in six bits. Two of the bits record the zone row used (if any), while the other four encode the decimal rows. This encoding propagated to the tape format.
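As a rough illustration of "two zone bits plus four digit bits", here is a sketch that packs a zone row and a digit row into six bits. The specific bit assignment (zone 12 as both zone bits, 11 as the high one, 0 as the low one) is my assumption for the example and is not stated above; digit row 0 is left out because it does not map as a plain binary value.

```python
# Assumed zone-bit assignment for illustration only:
# two zone bits (high, low) followed by four digit bits (8, 4, 2, 1).
ZONE_BITS = {None: 0b00, 0: 0b01, 11: 0b10, 12: 0b11}


def bcd6(zone, digit):
    """Pack a zone row (or None) and a digit row 1-9 into a six-bit code."""
    if not 1 <= digit <= 9:
        raise ValueError("this sketch covers digit rows 1-9 only")
    return (ZONE_BITS[zone] << 4) | digit


# A is punched 12-1, S is 0-2, and a plain 7 has no zone hole.
for label, zone, digit in [("A", 12, 1), ("S", 0, 2), ("7", None, 7)]:
    print(label, format(bcd6(zone, digit), "06b"))
```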

Each machine word on the 1401 had two additional bits for a total of eight; a "word mark" bit which was used to delimit instructions and data, and a parity bit for error detection. These bits could not be encoded on a punch card.

Chromatix
4

More historical folklore...

As late as 1978, I worked on META-4 systems at Digital Scientific Corporation that still supported punched card readers, and even had to write microcode to allow the systems to be bootstrap loaded from a single punched card. These systems emulated IBM 1130 and 1800 computers, which were typically booted from a single punched card.

The card only contained 12 "bits" per column, but during the special "initial program load" (IPL) or "boot load", the 12 bits were mapped into the 16-bit "words" of the main memory at addresses 0-79, then the computer began executing the code starting at address 0. This code then read the "boot sector" - sector 0 of the primary disk drive, which contained the next sequence to load the operating system. The mapping of the 12 bits to 16 bits was pretty cute, as the instructions that could be used had to only use those "bits" that were mapped and had to have zero value bits for the 4 instruction bits not provided on the card.

Kim Crosser
2

For completeness, here is an example of a punch card in the row-order byte-based Soviet GOST encoding.

     ,--------------------------------------------------------------------------------.
  12 |  X     X   XXXX X   XX  X  X  XXX    X   XX   X  X X X X   XXXXX X   X XX   XXX|
  11 |X X XXX XX  X XX X   X XX   XXXX X      X X XXX X XX X XX   XXXX X    XX X  X  X|
  10 |  X XX  X XX    XX  X   X   XXXXX X XXX  X  X X   X  X XXX   XXXX   XXXX  XX  X |
   1 |X X XX X  X  X XX   XXXXXX   X    X      X  XX  X XX  XXX   XXXXX XXXXXXX X XXX |
   2 |XX     X    XXX X   XXXXX   XXXXX XX    X X XXX XX   XXX  XX  X   X  X X X  XX  |
   3 |X   XXXX  XX   X  X  X XX   XXXX X  X X XX    X   X  X X X  X  XX XX X XX   XXXX|
   4 |XX  X XXX X XX XXX    X XX  X     X X X X XX  XXX   XXXX  X      X  X  XX   XXXX|
   5 | X    XX X  X  XXX     X  X  X XX   XXXXX X   X XX   X  X X XXX  X   X XX XXXXXX|
   6 |X   XXXX X   XX  X  X  XXX    X X   XXXX X       X  X  X  X XX    X  X X    XXX |
   7 |                                                                                |
   8 |                                                                                |
   9 |                                                                                |
     '--------------------------------------------------------------------------------'            

It contains the text "A QUICK BROWN FOX JUMPS OVER THE LAZY DOG. PORTEZ CE VIEUX WHISKY AU JUGE BLOND QUI FUME."

Leo B.
2

It's been mentioned already that cards were also used to hold binary data (and programs), specifically with two columns representing three bytes. This must have been relatively modern, as prior to the introduction of System/360, IBM's mainframes (IBM 7090 et al.) used 6-bit characters, packed six to a 36-bit word. I used a (then very old) IBM 7094-II back in the early 1970s and I remember encountering boxes of cards holding binary data. I don't know whether the data was directly encoded (3 columns of 12 rows = 36 bits) or whether the data was encoded to ensure some anti-holes were present, ensuring the structural integrity of the card.
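The "two columns hold three bytes" arithmetic (2 x 12 holes = 24 bits) can be sketched as follows. The bit order within a column and the function names are assumptions for illustration; real column-binary formats may have ordered the rows differently.

```python
def bytes_to_columns(data):
    """Pack each group of 3 bytes into two 12-bit column values."""
    if len(data) % 3:
        raise ValueError("length must be a multiple of 3")
    columns = []
    for i in range(0, len(data), 3):
        word = int.from_bytes(data[i:i + 3], "big")   # 24 bits
        columns += [word >> 12, word & 0xFFF]         # high and low 12 bits
    return columns


def columns_to_bytes(columns):
    """Inverse of bytes_to_columns: each pair of columns gives 3 bytes."""
    out = bytearray()
    for high, low in zip(columns[0::2], columns[1::2]):
        out += ((high << 12) | low).to_bytes(3, "big")
    return bytes(out)


payload = b"ABCDEF"
cols = bytes_to_columns(payload)
print([format(c, "012b") for c in cols])
assert columns_to_bytes(cols) == payload
```

A column value of all ones is a fully punched column, which is exactly the lace-card problem described in another answer.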

0

One or two things not mentioned on this trip down memory lane.

Columns 73-80 were used for sequence numbers on Fortran (and probably other) source code cards because the IBM card readers on the 7090 series computers did not read those columns; they read the card as 24 36-bit words.

We did use binary cards on CDC 3600 computers to store compiled programs, and I remember patching such decks, sometimes actually replacing a hole with a chad from the scrap bin in the key punch.

I saw but did not use cards with circular holes; I think these were used on early Sperry Univac systems.

Mike D
  • 1
    It's not a trip down memory lane, though. It's a specific question about what the "other six bits" were used for. – JdeBP Oct 01 '20 at 23:03
-2

When I worked with keypunches and card readers, the most common issue I faced was a jammed keypunch. You could often clear a keypunch with a card saw, but sometimes one of the punches stuck down, and then you needed a service call to IBM for a repair.

One of the things I most dreaded was dropping someone's card deck. It happened to all of us, and there was really nothing to do but collect and rubber band the cards and attach a note of apology. Most users didn't mark or number the cards, so it wasn't easy to correctly order the deck.

  • 1
    Welcome to Retrocomputing! While this may be an interesting commentary on punch cards, it doesn't answer the question, which is about the extra bits in each column. – DrSheldon Oct 01 '20 at 20:04