58

The claim that programs stored dates as two ASCII (or similar) characters because computers were limited in resources seems wrong to me, because that takes more memory than one 8-bit integer would. In certain cases it is also slower.

Storing the year as one 8-bit signed integer would give (assuming 1900 + year) a range of 1772 (1900 - 128) to 2027 (1900 + 127). An unsigned 8-bit integer gives 1900 to 2155 (1900 + 255). In either case the result is a much wider range with half the storage and RAM requirements.

In addition, to perform arithmetic on the years, the software would need to convert the two characters back to an integer, whereas if the year were stored as an integer it could be loaded without conversion. This is where the "slower in certain cases" comes in. It could also make sorting by year slower.
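
For concreteness, here is a minimal C sketch of the comparison I have in mind (the specific values are only examples):

#include <stdio.h>

int main(void) {
    /* The proposal above: one unsigned byte holding (year - 1900). */
    unsigned char year_offset = 100;                 /* 1900 + 100 = 2000 */
    int year_from_byte = 1900 + year_offset;         /* usable directly, no conversion */

    /* The historical practice: two decimal digit characters, e.g. "86". */
    char year_chars[2] = { '8', '6' };
    int year_from_chars = 1900 + (year_chars[0] - '0') * 10
                               + (year_chars[1] - '0');

    printf("byte representation  -> %d\n", year_from_byte);   /* 2000 */
    printf("digit representation -> %d\n", year_from_chars);  /* 1986 */
    return 0;
}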

The only reasons I can think of for storing a year as two characters are bad programming or using a text-based storage format (such as XML or JSON, though I know those are more contemporary than the programs that had the Y2K bug). Arguably, choosing a text-based storage format is itself an example of bad programming, because it is not a good choice for a very resource-limited computer.

How many programs stored years as two characters and why?

Mr. Chem Question
  • 707
  • 1
  • 5
  • 5
  • 1
    Comments are not for extended discussion; this conversation has been moved to chat. – Chenmunka Jun 22 '20 at 16:59
  • 1
    I believe it would have been either two EBCDIC characters or more likely two BCD digits. – dave Dec 07 '21 at 18:34
  • It occurs to me that I've never once, in a long programming career, ever stored a date as text (or BCD). Always it's a count of some time unit since some base date-time. (I did once use seconds-over-the-hill, where the base date was my 40th birthday - this gave a better range in 32 bits unsigned than the usual 1970-01-01) – dave Dec 23 '21 at 02:28

16 Answers

103

Short answer: BCD wins over a single-byte integer.


The claim that programs stored dates as two ASCII or similar characters because computers were limited in resources seems wrong to me

The point wasn't about using ASCII or 'similar'; it was about using only two decimal digits. Those can be stored as two characters (not necessarily ASCII) or as two BCD digits in a single byte.

because it takes more memory than one 8-bit integer would.

Two BCD digits fit quite nicely in 8 bits - in fact, that's the very reason a byte is made of 8 bits.
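
For illustration only (the machines in question did this in hardware or microcode, not in C), packing two decimal digits into one byte and getting them back out looks like this:

#include <stdio.h>

/* Pack a two-digit year (00..99) into one byte: tens digit in the high
   nibble, units digit in the low nibble - the classic packed-BCD layout. */
unsigned char bcd_pack(unsigned year2)
{
    return (unsigned char)(((year2 / 10) << 4) | (year2 % 10));
}

unsigned bcd_unpack(unsigned char b)
{
    return (b >> 4) * 10u + (b & 0x0Fu);
}

int main(void)
{
    unsigned char y = bcd_pack(86);
    printf("packed: 0x%02X  unpacked: %u\n", (unsigned)y, bcd_unpack(y));  /* 0x86, 86 */
    return 0;
}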

Also in certain cases it's also slower.

Not really. In fact, on quite a few machines, using a single-byte integer is considerably slower than using a word - or than using BCD, for that matter.

In addition to perform arithmetic on the years the software would need to convert the two characters back to an integer whereas if it were stored as an integer it could be loaded without conversion.

That is only true, and only needed, if the CPU in question cannot handle BCD natively.

This is where the slower in certain cases comes in. In addition it could make sorting by year slower.

Why? Sorting is usually done not by year but by the full date - which in BCD is 3 bytes.

In my view the only reason I can think of for why someone would store a year as two characters is bad programming

Do you mean to say that everyone during the first 40 years of IT, up to the year 2000, was an idiot?

or using a text based storage format

Now you're coming close. Ever wondered why your terminal emulation defaults to 80 characters? Exactly: it's the width of a punch card. And punch cards don't store bytes or binary information, but characters - one column, one digit. Storage evolved from there.

And storage on mainframes was always a scarce resource - how much room do you think one can give to data when the job is to

  • handle 20k transactions per hour

on a machine with a

  • 0.9 MIPS CPU,
  • 1.5 megabytes of RAM, some
  • 2 GiB of disk storage?

All of that was there to

  • serve 300-500 concurrent users

Yes, that is what a mainframe in 1980 was.

Load always outgrew increased capabilities. And believe me, shaving off a byte from every date to use only YYMMDD instead of YYYYMMDD was a considerable (25%) gain.

(such as something like XML or JSON which I know those are more contemporary in comparison to programs that had the Y2K bug).

No one would ever have thought of such bloated formats back then.

Arguably you can say that choosing a text-based storage format is an example of bad programming because it's not a good choice for a very resource limited computer.

Been there, done that - and it is a good choice. Storing a date in BCD results in overall higher performance (a short illustration follows the list below):

  • Optimal storage (3 bytes per date).
  • No constant conversion to and from binary needed.
  • Conversion to readable form (mostly EBCDIC, not ASCII) is a single machine instruction.
  • Calculations can be done without converting and filling, by using BCD instructions.
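
As a rough illustration in C (the real systems would have done this in /370 assembly or COBOL; the field layout here is only an assumption for the sketch), a packed YYMMDD date takes three bytes, and a plain byte-wise comparison already sorts it chronologically within one implied century:

#include <stdio.h>
#include <string.h>

/* A date packed as three BCD bytes in YYMMDD order, as described above. */
typedef struct { unsigned char yy, mm, dd; } bcd_date;

static unsigned char pack2(unsigned v)            /* 0..99 -> one packed BCD byte */
{
    return (unsigned char)(((v / 10) << 4) | (v % 10));
}

static bcd_date make_date(unsigned yy, unsigned mm, unsigned dd)
{
    bcd_date d = { pack2(yy), pack2(mm), pack2(dd) };
    return d;
}

int main(void)
{
    bcd_date a = make_date(85, 12, 31);   /* 31 Dec (19)85 */
    bcd_date b = make_date(86,  1,  1);   /*  1 Jan (19)86 */

    /* Packed BCD keeps the digit order, so a byte-wise compare already
       sorts chronologically, as long as all years share the implied century. */
    printf("a sorts %s b\n", memcmp(&a, &b, 3) < 0 ? "before" : "after");
    return 0;
}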

How many programs stored years as two characters and why?

A vast majority of mainframe programs did - programs that are part of incredibly huge applications in worldwide networks. Chances are close to 100% that any financial transaction you make today still passes at some point through a /370-ish mainframe. And those are also the machines that not only come from the punch-card age and tight memory situations, but also handle BCD as a native data type.


And another story from Grandpa's vault:

For one rather large mainframe application we solved the Y2K problem not by extending records, but by extending the BCD range: the year after 99 (for 1999) became A0 (for 2000). This worked because the decade is the topmost digit. Of course, all I/O functions had to be adjusted, but that was a lesser job. Changing the data storage format would have been a gigantic task with endless chances of bugs. Also, any data conversion would have meant stopping the live system, maybe for days (there were billions of records to convert), something not possible for a mission-critical system that needs to run 24/7.
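
A minimal sketch of the decoding idea in C (the original code was of course not C; the byte values are taken from the description above and the comments below): the low nibble stays a decimal digit, while the high nibble is simply allowed to run past 9.

#include <stdio.h>

/* Decode the "extended BCD" year byte described above: the low nibble stays
   a decimal digit (0..9), but the high nibble is allowed to run past 9,
   so 0x99 (1999) is followed by 0xA0 (2000). */
static int decode_ext_bcd_year(unsigned char b)
{
    return 1900 + (b >> 4) * 10 + (b & 0x0F);
}

int main(void)
{
    printf("X'99' -> %d\n", decode_ext_bcd_year(0x99));  /* 1999 */
    printf("X'A0' -> %d\n", decode_ext_bcd_year(0xA0));  /* 2000 */
    printf("X'B8' -> %d\n", decode_ext_bcd_year(0xB8));  /* 2018 */
    return 0;
}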

We also added a virtual conversion layer, but it only kicked in for a small number of data records, for a short time, during rollover.

In the end we still had to stop shortly before midnight MEZ (CET) and restart a minute later, as management decided that this would be a good measure to avoid rollover problems. Well, their decision. And, as usual, a completely useless one, since the system ran across multiple time zones (almost around the world, from Vladivostok to French Guiana), so it passed multiple rollover points that night.

Raffzahn
  • 222,541
  • 22
  • 631
  • 918
  • 1
    @Mr.ChemQuestion With today's oversimplified language it's easy to forget what's possible with rich data types :)) Just wait, you'll get some point soon. – Raffzahn Jun 20 '20 at 03:45
  • 2
    w.r.t. "BCD native" arithmetic - there were many programs at the time - business processing applications, sometimes just report generation, sometimes weekly/monthly accounting runs - where it was faster to do the little arithmetic there was to do in BCD even without native BCD arithmetic than to convert to binary integer - large integer, remember, dealing with 6+ digits $$$ plus 2 digit cents - do the addition/subtraction - then convert back. Divides (even divide by 10) were very slow on most machines. Multiply slow too. Multi-precision made everything slower. Base conversion: Expensive! – davidbak Jun 20 '20 at 04:16
  • 1
    Re the point about economizing RAM, consider the wide range of options for "packed storage" that were built into languages like COBOL and PL/1, compared with the minimalist approach of C. – alephzero Jun 20 '20 at 10:38
  • With regards to storage: the customer paid rent for the machine, its storage (including what we would call RAM.) So more memory needed could mean entering completely different rent rates - expensive. – Stefan Skoglund Jun 20 '20 at 14:00
  • 30
    Another mini-anecdote: in the mid-'90s I was working on IBM mainframe systems, and added a new file to the system.  IIRC, each record was something like 20-30 bytes, and included one date field; I took the hit and included the century.  A couple of years later, colleagues were doing Y2K analysis, and asked me to estimate how much effort it would take to convert that file — they were amazed when I told them it was already Y2K-compliant!  Even with only 5 years to go, most of my colleagues weren't planning for the end of the century… – gidds Jun 20 '20 at 17:34
  • 10
    "by extending the BCD range" Should we expect a 2060 "millennium" bug, then? :-D – Luca Citi Jun 20 '20 at 22:38
  • 13
    @LucaCiti Exactly. 2059 would have been the last year where it worked. In fact, 2049 would have been the last, as X'F0' would have triggered another mechanism ... did I mention that space was at a premium when the system was conceived in the late 70s? Then again, it was finally retired in 2018, so it ended with X'B8', barely touching the X'Cx' range for some dates in the future (contract end date or alike). – Raffzahn Jun 20 '20 at 22:47
  • 7
    I'd just add that BCD 2-digit year is still in use in some chips. Obviously now it implies year 20XX rather than 19XX. A recent example I've dealt with is BQ32002 RTC (but it also had a single century bit, to be fair), and I worked with a few more before, including a GPS module. – Alice Jun 21 '20 at 18:56
  • 3
    As someone who has had to write code to send data to these back-end financial systems, I can confirm that BCD, 80-character-wide fixed-width files, EBCDIC, and other such "outdated" formats are alive and well in 2020. I can't recall running into any which still only used two-digit years off the top of my head, but I wouldn't be surprised if they existed. – Bobson Jun 22 '20 at 14:07
  • @Raffzahn Now I'm curious what the X'F0' trick may be... – Luca Citi Jun 23 '20 at 00:00
  • 1
    @LucaCiti Nothing special. And no trick, just a format trigger. Call it data morphology or object detection or whatever. It was simply a sign to the format manager preparing an output that the address it got for this field already contained unpacked data. So a date to be displayed (6 digits as DD.MM.YY) or date/time (10 digits as DD.MM.YY HH:MM) was not given as BCD (covering 3/5 bytes), but as EBCDIC with 6/10 bytes. After all, X'F0'..X'F9' are the numeric codepoints in EBCDIC :)) – Raffzahn Jun 23 '20 at 00:46
  • 2
    @Raffzahn Thanks for your reply. By the way, I think the extended BCD range was a very clever solution to the y2k problem! – Luca Citi Jun 23 '20 at 00:51
  • 3
    @LucaCiti It was the usual lunch result ... we had a habit of using lunch time as a generic conference call for everything from planning the next version to exchanging vacation stories. It extended more than once way past our canteen's opening hours. But the results were always useful. Of course, recorded on napkins :)) – Raffzahn Jun 23 '20 at 01:32
  • 1
    Have you considered writing a book, your knowledge needs to be immortalised for the next generation of nerds. – Neil Meyer Jun 24 '20 at 13:32
  • @NeilMeyer You're pulling my leg, aren't you? What I do is 97.5% common place supported by some Google-Fu. All the information is out there, ready at your finger tip. (It shows that I do not know answers to the real cool questions ). The rest is nostalgia of an old fart, way too long in the business. Not enough story to fill even a pocket booklet. – Raffzahn Jun 24 '20 at 13:52
    Write the book from grandpa's vault and fill it with anecdotes. I can read that all day. You seem to have lived a more interesting life than half the people with biographies. – Neil Meyer Jun 24 '20 at 14:05
    @NeilMeyer Naah, there isn't much of a story. It's an ordinary life, and I have no need to be 'famous' like those other people :)) And I got way more unfinished projects than done ones ... for example I always wanted to do stuff on the VCS. Got several projects, but what good is any game without a score, and I never understood how music works. And all musicians I asked didn't get the limitations of a VCS ... so here you go... – Raffzahn Jun 24 '20 at 14:34
  • @Raffzahn: It's been too long since I've touched the VCS. Or the NES for that matter. Both interesting platforms with unharvested potential. – supercat Apr 14 '21 at 21:25
58

I programmed in Cobol for nearly 20 years at the start of my career and it was quite common to see two digit years used. Initially this was due to storage concerns - not necessarily only memory, but disk storage also.

Cobol has no intrinsic date field.

It was very common to store dates in Cobol like:

01 EMP-HIRE-DATE.
     03 EMP-HIRE-DATE-YY    PIC 99.
     03 EMP-HIRE-DATE-MM  PIC 99.
     03 EMP-HIRE-DATE-DD   PIC 99.

If it’s 1970, you may not think your program is going to be used for another 30 years... it was more important to save those two bytes. Multiplied out by multiple fields and millions of rows, it all adds up when storage is a premium.

As 2000 started approaching, people had to start dealing with this. I worked for a company whose software dealt with contracts, and the end dates started stretching into the 2000s in the mid-'80s. As a rookie, I got to deal with fixing that.

There were two ways. First, if it was feasible, I added

03 EMP-HIRE-DATE-CC  PIC 99.

To every date field. It ended up looking like:

01 EMP-HIRE-DATE.
     03 EMP-HIRE-DATE-YR.
          05 EMP-HIRE-DATE-CC    PIC 99.
          05 EMP-HIRE-DATE-YY    PIC 99.
     03 EMP-HIRE-DATE-MM  PIC 99.
     03 EMP-HIRE-DATE-DD   PIC 99.

But in Cobol, you can’t just change that if it is a record description of a file... Cobol has fixed length records, so if you insert two bytes in the middle, you have to create a new file, read all the records in the old format, move them to the new record format, and write to the new file. Then “swap” the file back to its original name. If you are dealing with huge datasets you needed lots of disk space and time to do all that. And also rebuild all the programs.

When it wasn’t feasible, we added code when printing or displaying dates.

IF EMP-HIRE-DATE-YY > 50
   MOVE 19 TO PRINT-HIRE-DATE-CC
ELSE
   MOVE 20 TO PRINT-HIRE-DATE-CC.

This bought another 50 years...

mannaggia
  • 3,264
  • 2
  • 16
  • 15
  • 4
    Exactly as it was. – Raffzahn Jun 20 '20 at 18:38
  • 4
    The system I'm working on now was designed in the late nineties, around the same time everyone else was busy fixing Y2K. The original designers were downright fanatical about versioning every single record, no matter how trivial or how unlikely to change. Guess where that came from... – Michael Graf Jun 20 '20 at 20:30
  • 10
    Maybe worth noting that COBOL uses a base-10 format. You chose how many decimal digits you want. That seems incredible, but it's an old language, well before 8-bit bytes. In PIC 99 the 9's are place-holders and PIC means picture. It means "the picture is 2 digits, no decimal point". There's no PIC 255 -- other options are PIC 999 and PIC 9999. – Owen Reynolds Jun 20 '20 at 20:47
  • 1
    @OwenReynolds PIC S9(2) USAGE IS BINARY should give you a byte variable, shouldn't it? – Michael Graf Jun 21 '20 at 00:02
  • 6
    USAGE BINARY aka USAGE COMP aka USAGE COMPUTATIONAL could be used to take less storage but in my experience were generally not used in file record descriptions, mostly Working Storage variables. The storage of COMP varies by implementation I believe, based on word size, endianness, etc. – mannaggia Jun 21 '20 at 02:45
  • 1
    @mannaggia - when I was a COBOL programmer on the Wang VS100 and subsequent models, disk storage was so expensive that just about anything that could be COMP was COMP in our shop. Actually, reading all this about COBOL has made me realise how much I miss it, in a funny, nostalgic way. – Spratty Jun 23 '20 at 11:26
  • @Spratty Yep, and we may have too, at least dollar amounts. But even saving the 1 byte from the PIC 99 COMP from the century was worth it! The younger folk don’t realize how frugal we had to be... – mannaggia Jun 23 '20 at 13:52
  • 2
    @mannaggia Frugal? Coding sheets! Coding sheets so the code could be sanity-checked before we wasted disk space and processor cycles! I'll stop there - I'm aware I'm in danger of becoming one of the Four Yorkshiremen... – Spratty Jun 23 '20 at 14:16
27

To add to Raffzahn's answer, here's an example from a 1958 book on data processing:

[Image: standard IBM punched card with a literature reference coded]

… The names of authors are coded in columns 11-14. The first four consonants of the name form the code. The last two digits of the year of the reference are punched directly into columns 15-16. The journal, or other source for the reference, is coded in columns 17-20. …

(from Punched Cards Their Applications To Science And Industry. 2nd ed. Edited by Robert S. Casey, James W. Perry, Madeline M. Berry and Allen Kent. New York, Reinhold Publishing Corp; 1958; emphasis mine. The card image shows 52, assumed to mean 1952, in the noted columns)

This book barely touches on computer applications, since unit record usage was more prevalent at the time. Computers inherited these tabular data formats, and the physical inertia of the data format — cardboard boxes containing millions of records — meant that tabular years stuck around for a very long time. The book does have a couple of mentions of dealing with centuries, but the typical outlook can be summarized by the quotation from p. 326: “The century is neglected inasmuch as most entries for this particular file are in the 20th century”. Would you spend (waste) 2½% of the available columns in a record just to encode something you (probably) won't use?

Data, especially government data, sticks around for a long time. Consider this: the last person to receive a US Civil War pension died in May 2020. The US Civil War ended in 1865. Tabulating equipment didn't come in until the 1880s. Bytes didn't happen until 1956. JSON was this century. So data formats tend to be what the past gives you.

scruss
  • 21,585
  • 1
  • 45
  • 113
15

There are actually still RTC chips on the market which store the year only as a pair of BCD digits and have no century field. The two digits are usually packed into one byte when read, just like the fields for day-of-month, month-of-year, hour, minute, and second. The same format is used for the alarm settings.

Old software using these chips would simply prefix the year digits with 19 for display. Current systems using them would prefix 20. The problem rears its ugly head whenever the system is used in a century that it wasn't designed for - so potentially also in the year 2100.
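
Register layouts differ between chips, but reading such a year register typically boils down to something like the following sketch (not taken from any particular datasheet; the register value and the century base are assumptions made by the host firmware):

#include <stdio.h>

/* Turn the two packed-BCD digits of an RTC year register into a full year.
   The chip stores no century at all, so the host simply assumes one. */
static int rtc_full_year(unsigned char year_reg, int century_base)
{
    return century_base + (year_reg >> 4) * 10 + (year_reg & 0x0F);
}

int main(void)
{
    unsigned char year_reg = 0x24;   /* hypothetical value read from the chip */
    printf("old firmware assumes 19xx: %d\n", rtc_full_year(year_reg, 1900));  /* 1924 */
    printf("new firmware assumes 20xx: %d\n", rtc_full_year(year_reg, 2000));  /* 2024 */
    return 0;
}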

If an RTC chip existed that counted in binary instead of BCD, then it could handle years in two and a half centuries without trouble. An old machine adding 1900 to the stored year would not see any rollover trouble until after the year 2155. This would however require the host system to convert binary to decimal for display - not a problem for a PC, a minicomputer, or even an 8-bit microcomputer, but potentially a burden for extremely lightweight embedded systems.

Chromatix
  • 16,791
  • 1
  • 49
  • 69
  • Don't forget that many systems would go about this with an assumed useful range that doesn't match a century. One system I know goes by a simple 'compile date + 30 years' rule. So when compiled in 2020, all values of 50 and below get 100 added, with the result still being an offset from 1900. As a result the range 51..99 is mapped to 1951..1999, while 00..50 results in 2000..2050. All based on the assumption that the software won't survive 30 years (after the last release) and future dates are far ahead. Of course other applications use other, better-fitting strategies. – Raffzahn Jun 20 '20 at 16:26
  • 1
    I was surprised to find that most standalone RTC chips only support two-digit BCD years, seemingly to maintain a level of compatibility with the IBM PC AT's MC146818. Ranging and century assumptions will cause additional problems with leap years, too. – scruss Jun 20 '20 at 17:09
  • 2
    There is a rollover issue with RTC. The Atari 520 ST and 1040 ST use a 6301 RTC chip. The chip will rollover from year "00" to year "01" assuming year "00" is a leap year. This works for 2000, but not for 1900 or 2100. The Atari ST BIOS just adds or subtracts 80 from the year when converting from/to the RTC format to a 16 bit integer format, where the year field is year minus 1980 (same as PC-DOS). I wrote a program to support years 1980 to 2079 with the RTC. – rcgldr Jun 20 '20 at 19:38
  • 3
    @scruss: Even the RTC hardware built into some 32-bit ARM chips such as the STM 32L series uses BCD for everything. What irks me most about such things is that there are very few applications where having a hardware clock in YYMMDD format would offer any advantage whatsoever compared with having a linear counter, despite the fact that the hardware is almost certainly more expensive. – supercat Jun 20 '20 at 22:08
  • 2
    @supercat The hardware (but really software - firmware or microcode?) is marginally more expensive, but it means that retrieval is much simpler, as you don't have to decode a number of days (what's the start date? leap years?) into YYMMDD, which is what you typically want. Unless you want Unix time...then the Y2038 problem. – manassehkatz-Moving 2 Codidact Jun 21 '20 at 04:58
  • @manassehkatz-Moving2Codidact: So if when the phone is powered on the RTC says March 1, 30 minutes after midnight, and the last time the phone had been powered on was in September, what time and date should it show? – supercat Jun 21 '20 at 05:02
  • 1
    @supercat It should show whatever the RTC says. If the RTC was working correctly (I'm assuming it gets updated while the phone is off, just like in a typical computer) then that should be the correct date/time. – manassehkatz-Moving 2 Codidact Jun 21 '20 at 05:21
  • @manassehkatz-Moving2Codidact If it's a phone, it'll get the actual date when connecting to the network anyway ... and if not connecting, the date might be the least of the issues :) – Raffzahn Jun 21 '20 at 05:30
  • 1
    @Raffzahn I actually thought of that. In most modern cases, that's likely to be the case. But for example with typical wired (even if cordless, but wired base station) phones, date/time is stored in RTC and only updated from the network as a side-effect of Caller ID - no incoming calls, no date/time update. – manassehkatz-Moving 2 Codidact Jun 21 '20 at 05:45
  • 1
    @manassehkatz-Moving2Codidact: In places that observe daylight saving time, the correct time would be February 28 or 29, depending upon the year, at 11:30pm. When working with a Unix-style linear time, subtracting an hour for time change is simple. But with a BCD date, it's a lot harder. – supercat Jun 21 '20 at 07:16
  • 1
    @supercat One can apply any local correction factor to a BCD date the very same way as to a timestamp. Done that, been there - with a system that had to enter, plan and output dates for worldwide use with adjustment for either phase depending on viewpoint - a UK manager needs to see dates in another frame than an IN-based engineer working in NP supervised from SG. And don't even get me going on the added complexities of service levels and their times, as well as holidays and work hours of every party involved :) Nonetheless, it worked quite well using BCD date/time fields. – Raffzahn Jun 21 '20 at 08:09
  • 1
    @Raffzahn: Is advancing BCD date-time fields forward and backward significantly easier than converting to linear time and back? – supercat Jun 21 '20 at 14:57
  • @supercat Incrementing in BCD is simple enough that there are 74-series logic chips that can do it. You just need to reset the digit to zero and emit a carry, instead of stepping to the next higher value, when the 8 and 1 bits are set. By comparison, efficient linear time conversion requires the ability to multiply and divide. – Chromatix Jun 21 '20 at 15:16
  • 1
    @Chromatix: Sure, if one stores each digit separately, and ignores the effort required to deal with variable-length months, etc. But how is working with twelve BCD digits, some of which have to be limit-checked in pairs, easier than working with a 32-bit quantity? – supercat Jun 21 '20 at 16:28
  • @supercat It depends what sort of processing you're doing with it. Implementing the RTC hardware itself is quite simple, as evidenced by the fact that you can run an RTC chip off a button cell for several years without it losing time. The two BCD counters making up a pair of digits are simply mapped to a single byte when it comes to reading and writing them, and simple gate networks are sufficient to perform limit checking even on digit pairs. – Chromatix Jun 21 '20 at 17:51
  • 2
    @supercat If you presuppose that your computer has a sophisticated date/time processing library, then it probably does make sense to implement a straight binary counter and let the host CPU turn it into something human readable. However, most devices containing RTC hardware have extremely limited processing capability. The discrete RTC chips are most likely direct derivatives of the hardware that also goes into watches, clock radios, and suchlike. Having the RTC work in human-readable format directly is then more convenient. – Chromatix Jun 21 '20 at 17:56
  • Correction to my prior comment, the 6301 RTC assumes 00 is a leap year. It advances year 00 month 02 day 28 to year 00 month 02 day 29, then year 00 month 02 day 29 to year 00 month 03 day 01. This works for year 2000, but not 1900 or 2100. – rcgldr Jun 21 '20 at 19:06
  • @Chromatix: If one wants to do automatic daylight saving time compensation, or do almost anything with time other than display it as-is, one will need a way of adding an interval to a date, subtracting an interval from a date, and subtracting two dates to yield an interval. If one exploits a few tricks, converting a linear date to a YYYYMMDDhhmmss date is easy and efficient. Use a routine that divides a two-byte number whose upper byte is at a global and whose lower byte is targeted by a pointer, by a byte which is at least as big as the upper byte, placing the quotient at the pointer and... – supercat Jun 21 '20 at 19:13
  • @Chromatix: ...the remainder in that global. To e.g. divide a four-byte value by 60, zero the global, and then invoke that routine on each byte in sequence. The global will be left holding the residue. To convert a time to a date, use divmod32by8 to divide out 60, 60, and 24. One can then process the remaining linear date using 16-bit math. Then zero the year counter and, while the value is at least 1461, add 4 to the year and subtract 1461 from the counter. Then add two to the year if the value exceeds 730 (subtract 730, and one if it exceeds 365 (subtract 365). – supercat Jun 21 '20 at 19:17
  • @Chromatix: Ever since I started writing date-linearizing code, I've used March 1, 2000 as my epoch; if one treats March 1 as new-year day, one can always treat February as having 29 days since the year will advance after day 365 of most years. Once one has day from March 1, subtract out as many months as will fit, incrementing the month counter, and if the month is 13 or 14, subtract 12 and bump the year. Not really much harder than what would be done to increment a date. – supercat Jun 21 '20 at 19:20
  • 1
    @supercat Great! Now implement that on a 4-bit microcontroller with 256 bytes of ROM and a microamp power budget. Do you now see the problem? – Chromatix Jun 21 '20 at 19:36
  • 1
    @Chromatix: The primary reason for a specialized RTC chip or peripheral is for situations where the main CPU is fabricated with a process that is optimized for speed instead of quiescent current. An RTC which is on a separate die, or is on the same die but can be kept powered while everything else is powered down, can use much less current when the system is idle than would be possible if the main CPU had to wake up once per second to bump a counter. If one is using a slow-speed low-quiescent-current CPU, however, there would be no need to have a dedicated hardware RTC. – supercat Jun 21 '20 at 21:38
  • 1
    @Chromatix: If a project never has to do anything with an RTC value other than display it, and in particular never needs to do any sort of arithmetic with times, then BCD RTC may offer some slight advantage over a linear counter. The moment a program has to do anything else with the value, however, any advantages that the hardware-parsed date would have had evaporate. Suppose, for example, one is supposed to have a clock automatically set the day of the week when given month+day+year. Really easy when using linear time. How would you figure out the weekday for a YYMMDD date? – supercat Jun 21 '20 at 21:41
  • 2
    @Chromatix: Incidentally, a similar issue arises with zero-terminated strings. There are some use cases where they offer some advantages over alternative formats, but unless strings are of trivial length, or the only thing one will do with a string is process the characters thereof sequentially, tracking the length as a number will be better than trying to use the location of the first zero. – supercat Jun 21 '20 at 22:24
  • 1
    @Chromatix: BTW, how would one efficiently go about incrementing a YYMMDD-format date on a 4-bit micro? Even on a 4-bit micro, the logic to convert a number from up to 0x3:0xB into a decimal number up to 0x5:0x9 would seem easier than the comparison logic necessary to handle months whose lengths may have an upper nybble that is either 0x2 or 0x3. – supercat Jun 21 '20 at 22:33
  • @supercat The only month that has a day count below 30 is February, which already needs special handling due to leap years. The incrementing is done within the RTC hardware anyway, the point is that the values don't need to be manipulated just to display them. – Chromatix Jun 22 '20 at 13:41
  • 1
    @Chromatix: If one wants to do anything with values other than display them, code will be required to handle advancement across different lengths of month. How would you go about applying the end of daylight saving time if the system is powered on between 12:00am and 1:00am? And how would you compute a weekday from a YYMMDD-format date? – supercat Jun 22 '20 at 19:20
  • @supercat DST - I've seen that butchered so often with modern systems that I'm almost surprised when it is done right on any system - old or new. For example, I have received numerous calendar invites via Outlook that have a time listed "mm/dd/yyyy hh:mm STANDARD TIME" during the summer and almost always the sender has no clue and intends the same time in DST - e.g., if they say 10:00AM STANDARD they really mean 10:00AM DST. And that's on a modern system with plenty of CPU cycles and RAM to handle anything! Plus can't expect a small RTC to get DST right anyway because the formula – manassehkatz-Moving 2 Codidact Jun 22 '20 at 22:25
  • and start/end dates change over time in different locations. As far as weekday from YYMMDD - that's easy, just make your assumption (depending on the use case) as to the century and then day-of-week is trivial. – manassehkatz-Moving 2 Codidact Jun 22 '20 at 22:26
  • @manassehkatz-Moving2Codidact: The simplest way I know of to handle DST correctly is to convert dates/times to linear UTC, and then identify within IO routines whether a particular date/time combo is within the daylight saving time interval. Making this work retrospectively requires having a list of DST-rule changes, but YYMMDDhhmmss dates/times are not a convenient format for such purposes. – supercat Jun 23 '20 at 17:09
  • I've done this using YYMMDDhhmmss dates/times - build a table for the local time zone and see where the times hit. Not that big a deal. The big deal is usually figuring out what the data is supposed to be - and sometimes that means receiving data during standard time and not knowing until the next DST change whether the sending system will adjust for DST itself or not. I've seen it all... – manassehkatz-Moving 2 Codidact Jun 23 '20 at 17:22
  • Given a Unix-style time, divide by 3600, subtract the number of hours between the Epoch and 2:00am local time of the first Sunday in March in the epoch. Then divide by 24 (note the residue), get the residue mod 1461, and the residue mod 365 to get the Sunday-relative yearday. If greater than 180 and the hour residue was 23, add 1 to the yearday. Depending upon the exact daylight savings rules, check for a yearday within a certain range and you're done. It's not even necessary to have a month-length table. – supercat Jun 23 '20 at 21:48
  • @supercat This still requires a routine to perform division with remainder by a 12-bit divisor, which is not trivial. On a full-fledged PC you can do whatever you like, but on a severely cost and power limited device, working in BCD is easier overall. – Chromatix Jun 23 '20 at 23:09
  • @Chromatix: Subtract 1461 until the result goes negative. Such a loop would require 25 iterations per 100 years. Then add 365 until it goes positive (a max of four iterations). – supercat Jun 23 '20 at 23:15
  • @supercat That still takes code space and active power. I think maybe you don't understand embedded development very well. – Chromatix Jun 24 '20 at 11:56
9

Another point that hasn't yet been mentioned is that many programs would store data in whatever format it was keyed in. If data was stored as six digits YYMMDD, that's what data-entry clerks would be expected to type. For a program to store data as YYYYMMDD, it would be necessary either to have clerks enter an extra two digits per record, or to have the software compute an extra two digits based upon the two digits of the year the person typed. Keeping the information as typed, but using modular arithmetic so that 01 - 99 = 02, wouldn't necessarily be any more difficult than adding extra logic at the data-entry stage.
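
A small sketch of the modular arithmetic being described, assuming the true interval between the two years is always less than a century:

#include <stdio.h>

/* Difference in years between two two-digit years, valid as long as the
   real interval is less than 100 years. 01 - 99 comes out as 2, i.e.
   2001 - 1999, exactly as described above. */
static int year_diff(int later, int earlier)
{
    return ((later - earlier) % 100 + 100) % 100;
}

int main(void)
{
    printf("%d\n", year_diff(1, 99));    /* 2 */
    printf("%d\n", year_diff(75, 68));   /* 7 */
    return 0;
}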

supercat
  • 35,993
  • 3
  • 63
  • 159
  • 2
    Space on punch cards was limited to 80 characters. Short cuts like a two digit year number could be important for keeping a complete record on one card. – Patricia Shanahan Jun 22 '20 at 07:48
9

Additional examples:

  • In banking, many computations have historically used binary-coded decimal (BCD) as a compromise between space, speed, and consistent rounding rules. Floating point was uncommon, slow, and difficult to round off consistently. So even things like interest rates might be stored in fixed-point BCD. An interest rate might be stored as four BCD digits, in two bytes, with an implicit decimal point after the first digit. Thus, your account could pay interest rates from 0.000% to 9.999%, which seemed perfectly sufficient, because a bank would never pay 10% or more. This thinking left 1970s programmers scrambling to update data structures when high inflation and new types of accounts like Certificates of Deposit crossed that threshold.

  • When gas prices first crossed the $1/gallon threshold, many of the pumps could not be set to the actual value. Though that was largely a mechanical limitation rather than what we would consider a computing problem today, the thinking that led to the situation was largely the same: reduce costs by not bothering to enable values that were unimaginably large at design time.

  • [Apocryphal] When the first 9-1-1 emergency dispatch service was put into use in Manhattan, legend says its database schema hadn't anticipated five-digit apartment numbers, leading to delays of paramedics reaching sick or injured high-rise dwellers.

  • In the 1990s, phone numbers in the North American Numbering Plan were all of the form (AAA) BBB-CCCC, the second A was always a 0 or 1, and the first B could not be a 1. (There were other restrictions, too, but they aren't relevant here.) By exploiting the well-known restrictions, the contact management software on your PC could represent a phone number with a 32-bit value, saving disc space and RAM (at the run-time cost of a little bit swizzling to encode and decode the values); one possible packing is sketched after this list. But the soaring popularity of fax machines, pagers, modems, and cell phones drove demand for phone numbers until North America ran out of area codes, causing software publishers to scramble to update their software to use less elegant space-saving tricks.

  • It's not always about memory savings or binary-coded decimal. Sometimes it's about space on a form or a display. In high-end software for stock trading, display space is at a premium. When it first seemed the Dow Jones Industrial Average could potentially break $10,000.00, there was concern about whether these displays would show the additional digit, and whether there was room for a thousands separator. If the values were truncated or displayed in other confusing ways, would it slow down the traders who relied on these systems? Could it trigger a DOW10K panic sell-off if it confused them into thinking the value had plummeted?
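
The answer does not say which exact encoding any given product used; the following is one hypothetical packing that exploits the stated area-code restrictions to make a ten-digit number fit in 32 bits:

#include <stdint.h>
#include <stdio.h>

/* Hypothetical packing of a 1990s NANP number (AAA) BBB-CCCC into 32 bits.
   The middle digit of the area code is 0 or 1 and the first is 2..9, so
   there are only 160 possible area codes:
   packed = area_index * 10,000,000 + seven_digit_local  (max ~1.6e9 < 2^32). */
static uint32_t pack_nanp(unsigned a1, unsigned a2, unsigned a3, uint32_t local7)
{
    uint32_t area_index = (a1 - 2) * 20u + a2 * 10u + a3;   /* 0..159 */
    return area_index * 10000000u + local7;
}

static void unpack_nanp(uint32_t packed, unsigned area[3], uint32_t *local7)
{
    uint32_t area_index = packed / 10000000u;
    *local7 = packed % 10000000u;
    area[0] = area_index / 20u + 2u;
    area[1] = (area_index % 20u) / 10u;
    area[2] = area_index % 10u;
}

int main(void)
{
    uint32_t packed = pack_nanp(2, 1, 2, 5551234u);          /* (212) 555-1234 */
    unsigned area[3];
    uint32_t local;
    unpack_nanp(packed, area, &local);
    printf("packed = %u -> (%u%u%u) %07u\n",
           (unsigned)packed, area[0], area[1], area[2], (unsigned)local);
    return 0;
}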

  • The phone number part is quite interesting - and completely new to me. Do you remember any software doing so? – Raffzahn Jun 21 '20 at 19:33
  • @Raffzahn: QuickBooks for DOS. But I'm sure there were others as well. – Adrian McCarthy Jun 21 '20 at 21:09
  • 1
    The North American phone number rules were well known (in North America...) and commonly used either for compression (as noted) or as a classic example of data validation/input field formatting. – manassehkatz-Moving 2 Codidact Jun 21 '20 at 23:51
  • 1
    Another example from banking: I knew someone who had to work with an electronic funds transfer program that limited transactions to 999 999 999.99 of the monetary unit involved. A reasonable limit, because there just aren't very many billionaires, right? Well, not in USD, EUR, or GBP. But someone had to wire a large amount of money to Turkey, before its 2005 currency reform, when the exchange rate was around a million Turkish lira to a dollar. – dan04 Jun 26 '20 at 00:14
9

Microsoft Word did

I distinctly remember looking into the binary .doc format used by Microsoft Word 5.0 for DOS, because documents saved when the year was after 1999 couldn't be opened again. It turned out that the internal metadata stored the date using two ASCII digits for the year; 2000 was blindly stored as 100, overwriting one byte of an adjacent metadata field and breaking the document.

Adám
  • 669
  • 3
  • 9
8

The mainframes have been covered. Let's look at what led up to the PC which of course is where a lot of business software evolved.

Many PC programmers had previously been programmers for 8-bit systems, and had grown up on the "every cycle counts" mentality. Many programs for MS-DOS had ports for older 8-bit systems, too, so would have often followed the same internal formats.

  • Most 8-bit CPUs (and also the 16-bit CPU used in IBM PCs) had specific BCD instructions, which operated on one pair of decimal digits.
  • Most 8-bit CPUs did not have multiply or divide (or mod) instructions at all! Converting from binary to base 10 and back was a lot of work and involved loops or tables (a sketch of such a conversion loop is at the end of this answer). OK, you could use two BCD bytes for a 4-digit number, but if you wanted to, e.g., subtract one date from another, it would have required maybe five times as many assembly instructions/cycles.
  • In terms of RAM, having each date take an extra byte or two is one thing—having 5-10 extra instructions to deal with it was not worth it when you have 64KB of RAM or less. Consider that even by just typing the number "86" the program would have to add 1900 to it.
  • Assembly language instructions don't just take up space—assembly language programming is hard! Every instruction took up a line of screen space, had potential for a bug, and consumed a bit of the programmer's headspace. (Everyone was confident at working with BCD; it has now fallen by the wayside because of high-level languages.)

While we're on the topic of screen space, many programs didn't even display a leading "19" on the screen. (Hello—40 character screen width!) So there was hardly any point at all for the program to keep track of that internally.
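
To make the cost concrete, here is the kind of binary-to-decimal conversion loop such a CPU had to emulate in software, sketched in C rather than period assembly; keeping the digits in BCD avoids this work entirely:

#include <stdio.h>

/* Binary-to-decimal by repeated subtraction: the kind of loop an 8-bit CPU
   without divide had to run for every digit of every number it displayed. */
static void to_decimal(unsigned value, char out[6])
{
    static const unsigned place[5] = { 10000, 1000, 100, 10, 1 };
    for (int i = 0; i < 5; i++) {
        char digit = '0';
        while (value >= place[i]) {   /* subtract the place value until it no longer fits */
            value -= place[i];
            digit++;
        }
        out[i] = digit;
    }
    out[5] = '\0';
}

int main(void)
{
    char buf[6];
    to_decimal(1986, buf);
    printf("%s\n", buf);   /* prints 01986 */
    return 0;
}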

Artelius
  • 1,030
  • 6
  • 8
8

Back in olden times (1960s and earlier), when data came on Hollerith cards (80 columns, 12 rows), using 2 extra columns (of the 80 available) to store the fixed string "19" as part of a date seemed useless.

Having more than 80 columns of data meant that one had to use multiple cards, and lost several columns to the overhead needed to say "I'm with him", plus the extra programming effort to validate the pairing.

Hardly anybody thought their code would still be in use when "19" changed to "20".

Anybody who stores fixed field size dates as "yyyymmdd" has a Y10K problem pending.

waltinator
  • 347
  • 1
  • 4
  • 12
    It really needs to be emphasized that to everyone born prior to ~1995, 2000 wasn't just "the year after 1999", it was literally "THE YEAR TWO THOUSAND"... a futuristic utopia where we had colonies on other planets, flying cars, android servants, and a Star Trek economy. By the 1990s, it was obvious none of them were happening anytime soon, which made Y2K seem even FURTHER in the future. The fact that we had no agreed-upon demonym for "the first decade of the 21st century" made it even harder to talk about... The Fifties? Sixties? Seventies? Eighties? Nineties? Then... um... EVENT HORIZON! – Bitbang3r Jun 21 '20 at 16:23
  • I still want my flying car from Back to the Future 2, the Future day was October 21, 2015. – Richard Crossley Jun 22 '20 at 09:25
  • 2
    Quite often the comment returned by the coder would be "that will be in 20/30 years' time - I don't expect that they will still be using this program" – cup Jun 24 '20 at 05:25
  • 4
    @Bitbang3r: I (born in 1982) can attest to that. During my childhood, almost every prediction about the future, whether optimistic (flying cars, men on Mars, cure for cancer) or pessimistic (destruction of the rainforests, World War 3) was “by the year 2000”. It was The Great Deadline for everything. Then 2000 actually came, and went, and life went on as usual. – dan04 Jun 25 '20 at 23:47
8

I worked with IBM minicomputers (S/36, S/38, and AS/400) in the 1980s and 1990s. The very ugly programming language RPG was popular. It looked very different from Cobol but had similar abilities. It supported only two data types: fixed length alphanumeric and fixed precision decimal numeric. The numeric variables could be zoned which meant one digit per byte or packed which was two digits per byte. Storage of dates varied quite a lot but the most common was as 6 digit numbers rather than separate year, month, and day. Usually, but not always, these would be YYMMDD in the database so that the sorting would be sensible.

Numeric fields were always signed, so the last half of the last byte was reserved for the sign (hex F for + and hex D for -). So a 6-digit number required 4 bytes rather than 3, and the first half of the first byte was unused. This allowed a trick when the year 2000 became an issue: the unused first half of the first byte could be used as a century flag: 0 for 19xx and 1 for 20xx. The format is usually described as CYYMMDD. Systems which use this format will have a 2100 or 2900 problem, but the culprits won't be around to see it. Actually, many systems will have a problem sooner, as the 6-digit format is still popular for display and entry. Arbitrary cut-off dates, e.g. 40, are used to guess the century when 6 digits are entered. So 010195 is assumed to be 1995 but 010115 is assumed to be 2015. So a problem is due in the next couple of decades for some, but it won't be as bad as the year 2000, since at least the database can cope beyond the limit.
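
A sketch in C of how a CYYMMDD packed-decimal field as described above can be decoded (the byte values are made up for illustration; real RPG programs used the machine's built-in packed-decimal support rather than code like this):

#include <stdio.h>

/* Decode a CYYMMDD date stored as IBM packed decimal in 4 bytes.
   The nibbles are C Y Y M M D D S: C is the century flag described above
   (0 = 19xx, 1 = 20xx) and S is the sign nibble (0xF for positive). */
static void decode_cyymmdd(const unsigned char b[4], int *year, int *month, int *day)
{
    int d[8];                              /* split the 4 bytes into 8 nibbles */
    for (int i = 0; i < 4; i++) {
        d[2 * i]     = b[i] >> 4;
        d[2 * i + 1] = b[i] & 0x0F;
    }
    *year  = 1900 + d[0] * 100 + d[1] * 10 + d[2];   /* century flag, then YY */
    *month = d[3] * 10 + d[4];
    *day   = d[5] * 10 + d[6];                       /* d[7] is the sign nibble */
}

int main(void)
{
    unsigned char rec[4] = { 0x12, 0x00, 0x62, 0x1F };   /* hypothetical: 2020-06-21, positive */
    int y, m, d;
    decode_cyymmdd(rec, &y, &m, &d);
    printf("%04d-%02d-%02d\n", y, m, d);   /* 2020-06-21 */
    return 0;
}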

On BCD and performance, BCD is not necessarily much or any slower since machines intended for business use usually have hardware support. Also, conversion to and from human readable format is simpler. Many business programs will display and accept entry of data far more often than perform arithmetic on it so even if there was an arithmetic performance impact it would typically be outweighed by the I/O gains.

Not relevant to dates, but BCD is useful for money. Try adding 0.01 a hundred times in a float or double and then comparing the result to 1.00. BCD gets this right, but float and double don't. Accountants get very upset if the balance sheet is off even by a cent.
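
The experiment is easy to reproduce; exact results depend on the floating-point implementation, but with IEEE 754 binary arithmetic the naive sum does not come out as exactly 1.00:

#include <stdio.h>

int main(void)
{
    /* 0.01 has no exact binary representation, so the rounding error
       accumulates with every addition. */
    double sum = 0.0;
    for (int i = 0; i < 100; i++)
        sum += 0.01;

    printf("sum = %.17g\n", sum);                       /* typically not exactly 1 */
    printf("sum == 1.00 ? %s\n", sum == 1.00 ? "yes" : "no");

    /* The BCD / fixed-point way: count whole cents and only scale for display. */
    long cents = 0;
    for (int i = 0; i < 100; i++)
        cents += 1;
    printf("cents = %ld, i.e. exactly %ld.%02ld\n", cents, cents / 100, cents % 100);
    return 0;
}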

badjohn
  • 2,014
  • 1
  • 9
  • 22
  • All of my programming experience had been Perl with PostgreSQL on Linux then VB Script/C# on Windows with SQL Server and Oracle. About a year ago I started reading data from DB2 on an iSeries and wondered where CYYMMDD came from. This is enormously helpful! – Wildcat Matt Jun 22 '20 at 14:31
  • So, you have always had a date data type in the database. This date nonsense is from an era when there was no such thing. Not only did we worry about a few bytes but we also lacked many tools and concepts. Most people did not even feel the lack of a date data type. Dates looked like numbers so they were numbers. DB2 on iSeries now has a date data type but a lot of the software comes from the earlier era. – badjohn Jun 22 '20 at 14:36
7

The NORAD Two Line Element format remains in wide use for cataloging satellite orbits. Originally intended for punched cards, it uses a two character year. Years <57 are defined to be in the 21st century. It may be supplanted soon, though. The immediate problem is not years, but the number of trackable objects in orbit.

John Doty
  • 2,344
  • 6
  • 12
7

You may not remember the enormous market share that IBM occupied in the business computing space in the 20th century. I don't mean just the mainframe market where S/360, S/370, S/390 & their successors play, but also the much larger (in terms of units) midrange/small business market occupied by the S/3, S/32, S/34, S/36, S/38, AS/400, and their successors. IBM's midrange systems were ubiquitous in smaller businesses and in branch offices of mega-corporations around the world, and at their peak they brought in $14 billion annually to IBM.

To be successful in this target market, these systems had to be far more affordable than a mainframe. And since "inexpensive" disk storage cost around $10,000 per GB in 1990, disk capacities were small by modern standards. You might only have 13 MB -- yes, MEGABYTES -- of total disk capacity to work with online on a S/32. That's to serve a corporate operation with revenues of $250 million and up (that's IBM's definition of "small business") and running accounting, billing, payroll, inventory control, manufacturing production control, and similar applications, which are all about DOLLARS and DATES.

And these were all EBCDIC machines.

So yeah, absolutely, EVERYONE used 2-digit years on these systems prior to Y2K.

Fortunately, disks got cheaper in the late 1990s, and thousands of contract programmers put out Y2K consulting shingles at about the same time. In the 3 years from 1997 to 1999, most of the mess got cleaned up. I'm sure that here and there a dental office had billing with dates in the wrong century, or a mortgage amortization schedule showed negative dollars owed after the turn of the millennium, but the truly bad-news applications that could hurt customers on a massive scale got fixed before zero hour on December 31, 1999, 23:59:59 UTC.

telms
  • 171
  • 3
  • 2
    And Bitbang3r is absolutely correct about THE YEAR TWO THOUSAND being THE FUTURE. It didn't feel like a real year that might actually arrive in our lifetimes until 1996 or 1997. – telms Jun 22 '20 at 08:35
6

Even systems that stored the year as a byte often just printed "19" in front of the value. So the year 2000 (stored as 100) would be printed as 19100 or 1910 or 1900 depending on exactly how the number was converted from a byte to printable characters.

Of course, a lot of systems didn't even bother printing the 19; they just printed 100/01/01 or 00/01/01. Similarly, date entry systems often only accepted two digits, making it impossible to enter years beyond 1999.
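
The classic form of this bug comes straight from C's struct tm, whose tm_year field holds years since 1900:

#include <stdio.h>
#include <time.h>

int main(void)
{
    time_t now = time(NULL);
    struct tm *t = localtime(&now);

    /* tm_year counts years since 1900, so in 2000 this printed 19100. */
    printf("buggy:   19%d\n", t->tm_year);

    /* The fix is to add the century, not to paste digits in front. */
    printf("correct: %d\n", t->tm_year + 1900);
    return 0;
}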

user
  • 15,213
  • 3
  • 35
  • 69
6

Many fixed-record text formats (with or without COBOL copybook definitions) used two-digit years, and some still do to this day. So this does not only affect storage formats (as in databases) but also interchange formats. Those formats deliberately did not use binary representations, but did try to be compact.

This happened well into the '90s, when everybody knew about the upcoming millennium change.

Funny side note: many systems used the year 2020 as the cutoff, which results in an increased problem rate this year (2020), again.

Even in standardized EDI formats like X.12 or EDIFACT you see that 4-digit years are optional. The date or time period format code 2379, for example, starts with code "2" meaning DDMMYY. (And back then it was a really bad idea to allow more than one format for each instant type.) One of the major drivers for switching the EDIFACT syntax version from 3 to 4 in 1998 was the 8-digit date stamps in the UNB service segment.

Another example would be the VDA fixed-record messages used in the (German) automotive sector; formats like VDA 4912 from 1991 specify the delivery date in 6 digits (or even shorter, with week numbers). They have been in use way past 2000.

eckes
  • 160
  • 4
  • Another issue with 2019 or 2020 is that some programs may expect people to type dates without pressing enter (or, for phone-response systems, the pound key) afterward; for years prior to 2019, one could assume that if the first two digits typed were "19" or "20", they were the hundreds part of 20th-century or 21st-century years. Once there was no need to accommodate twentieth-century years, one could drop the special treatment of "19". – supercat Jun 23 '20 at 21:40
1

The Y2K issue was principally caused by the need to save disk space/storage for dates. As mentioned already, storing a date as DD/MM/YY requires less space than DD/MM/YYYY, i.e. you would be saving two bytes for every date on every record. As mentioned, when disk space was limited/expensive, this was the norm.

Prior to the Y2K activity (late '90s) I had to work on a 1980 date issue, because the systems we had in use would remove the decade part of the year and then pack the date, i.e. the date was stored as DDMMY. When this was saved as packed decimal it required only three bytes of storage. Whenever dates needed to be printed, the date was reformatted and a constant 7 added for the decade (e.g. a date saved as S31057 was printed as 310577).

In mid-1979 we started getting statement due dates appearing with a year of 1970 instead of 1980 because of the above, which needed remediating on the same basis as some Y2K fixes: check the single year digit; if it is greater than five, force a 7 for the decade, and if it is less than or equal to 5, force an 8. This bought a couple of years of extra use. Glad to say I left the company before 1985, so I don't know what materialised then.

Since that time I have met older developers who say they had the same issue with dates moving from the 1960s to the 1970s, although I guess not as many.

Martin Maly
  • 5,535
  • 18
  • 45
oldfella
  • 11
  • 1
0

To add another data point, there are systems in development today that do not use 4 digit years.

One such system uses a 7-bit number to store the year, with the assumption that 0 is the year 2000. This is done to save space during transmission. It will theoretically work until the year 2127, but I really don't know what will happen in the year 2100.

Glen Yates
  • 202
  • 2
  • 6