
Basically any computer from the mid-90s or earlier performs a slow memory check on every single boot. The more memory present, the slower that process becomes; for example: https://www.youtube.com/watch?v=A3Po8zneaLE

Why are they doing that? Modern computers, as far as I am aware, only check their memory when explicitly told to. What exactly are retro computers doing during that check that more modern computers seem to not do, and why?

  • Modern computers now have to overwrite all memory on boot to avoid cold-boot security attacks. – user71659 Oct 06 '18 at 16:10
  • And modern computers have to perform a memory check to enumerate available memory. – tofro Oct 06 '18 at 20:21
  • Not all old computers "boot" per se. Booting involves loading an operating system off floppy disk, hard disk or some other external storage. Many computers from the '80s and earlier simply power up and run code from ROM. – Jim MacKenzie Oct 08 '18 at 14:49
  • @JimMacKenzie I would speculate that the OP meant the IBM PC XT / AT by "old computers". Indeed, you are right that quite a number of home computers booted from ROM and did not perform any memory tests on cold start. However, the XT / AT were notorious for those lengthy RAM tests; I saw them a number of times myself as a teenager. Later on, BIOSes introduced an option to bypass the RAM test, but earlier models more or less forced it, if my memory serves me well. – DmytroL Aug 30 '21 at 07:58
  • @DmytroL On any such machine I've ever run, hitting "escape" lets you abort the RAM check. – Jim MacKenzie Sep 06 '21 at 01:42

2 Answers


Why are they doing that?

The most important reason is that IBM introduced that check as part of the BIOS startup code, so everyone copied it to be compatible.

The PC differed from many other machines of its era in that it thoroughly tested all installed components at power-up to make sure the configuration was operable, a practice carried over from mainframes and similar professional systems. Other machines simply initialized their components and left the user to guess what the problem was when an error occurred.

Modern computers, as far as I am aware, only check their memory when explicitly told to.

RAM became more reliable over the years. Equally important, RAM sizes increased manyfold, making a thorough memory test anything but quick. Last but not least, PC memory design split in the (late) 90s between consumer machines with (at most) error detection, as on the first PC, and professional machines with error correction (ECC). Where a consumer-grade machine simply lets the process or OS die on the user, a professional system will not only correct an incipient RAM failure but also report it, which (hopefully) leads to a preemptive RAM replacement.

What exactly are retro computers doing during that check that more modern computers seem to not do

Various bit patterns are written to RAM and read again to detect cell failure or certain kinds of crossover. The test is split into two parts: base RAM (first 16/64 KiB, *1,2) and memory above 64 KiB. On AT (286+) class machines, a third (faster) test may be used for memory above 1 MiB (*3), together with an additional test in protected mode and even more diverging POST codes.

Conventional memory (up to 1 MiB, *4) is checked in 4 KiB blocks (*5) and reported as such. The BIOS halts if there is an error in the first 16 KiB (original PC) or the first 64 KiB (XT and above).

The bit patterns used (*6) for the first 64 KiB are AA, 55, 00, FF, 01, 02, 04, 08, 10, 20, 40 and 80. They are written (and read back) in a way that detects not only single-bit failures, but also address- and data-line mismatches/failures.

For the remaining memory the sequence is shortened to AA, 55, FF, 00 and 01.

Here is a nice explanation of basic bit-walking and increment tests, similar to what the PC did, and of what such tests will show.

and why?

To alert the user to an imminent RAM problem before it bites, so that they don't lose hours of work to a flipped bit.


*1 - 16 KiB on the first series of 5150 PCs (64 KiB motherboard), 64 KiB on later ones (256 KiB motherboard and XT).

*2 - On the XT there is a separate BIOS POST code for the first 32 KiB.

*3 - The beep codes do not distinguish between above 64 KiB and above 1 MiB.

*4 - Well, in reality only up to 544 KiB on the early PCs; later PCs would go up to 640 KiB.

*5 - This looks like a hint that 4 KiB chips were expected to be used, at least during an early development stage, or that the test was copied from some other device using them.

*6 - Caveat: Bit patterns are taken from an old man's memory. To verify, browsing the BIOS would be helpful.

Raffzahn
  • Wasn't it also in some cases to check the total amount of memory? A lot of retro computers didn't have any place to save that information until the next start. – UncleBod Oct 06 '18 at 13:36
  • @UncleBod Other computers did, but not the (original) PC, as its memory size was set by switches. One switch group noted which banks were filled and another group the amount of RAM inserted. The BIOS was meant to obey these settings, not search for itself. – Raffzahn Oct 06 '18 at 13:39
  • @AndreasHartmann Maybe check this additional page: http://www.esacademy.com/en/library/technical-articles-and-documents/miscellaneous/software-based-memory-testing.html – Raffzahn Oct 06 '18 at 15:17
  • One of my old computers corrupted the Google Chrome installation on every boot, and letting the built-in BIOS RAM test run (instead of pressing the skip key) suggested a RAM error that was solved by replugging the RAM sticks. – Ferrybig Oct 06 '18 at 16:10
  • @Ferrybig Ermm ... if there's Chrome running, it might not really be an old computer in the sense of RC, might it? – Raffzahn Oct 06 '18 at 16:18
  • @Raffzahn Since "computers from the mid 90s and earlier" to me means much more than the PC, I was not only referring to the IBM PC and clones. My Sinclair QL did a memory check at each startup, IIRC. I don't think it was only to check memory for corruption. – UncleBod Oct 06 '18 at 16:24
  • @UncleBod I guess it's worth keeping in mind that there is a difference between the (somewhat) thorough memory check a PC or Mac (or professional workstation) did and the quick memory sweep most others did. Not exactly comparable, is it? – Raffzahn Oct 06 '18 at 16:39
  • Adding fuel to the fire... on a modern PC, performing a naive "write bytes, read them back" test of RAM might not even directly touch the RAM that's ostensibly being tested at all. Modern PCs have so much primary and secondary cache that a strategy which fails to explicitly take cache management into account is potentially doing nothing more than wasting the user's time by repeatedly exercising the cache while ignoring the underlying RAM. Even if the values eventually got written to "real" RAM, it might not happen until long after you seemingly read the bytes back and decided they matched. (See the sketch after these comments.) – Bitbang3r Oct 06 '18 at 18:02
  • Regarding the point on alerting the user to RAM problems before they lose hours of work, I'm pretty sure IBM used error-correcting RAM, and certainly error-detecting RAM, in the early PCs. RAM errors would cause the system to freeze (preventing the problem from becoming worse). And even if they didn't, you'd have to have pretty seriously bad luck to lose hours of work due to a single flipped bit on a system with a few dozen kilobytes of RAM to store whatever you were working on (in addition to the resident parts of the operating system plus the software you're using to do the work). – user Oct 06 '18 at 19:38
  • @MichaelKjörling The PC's RAM had a parity bit for error detection, no correction, and no (default) way to recover from a memory error. When one occurs, an NMI is issued. MS-DOS has no NMI handler, and the BIOS just displays "PARITY ERROR 1" when it's in mainboard RAM or "2" when it's on an expansion card - and of course only as long as NMI generation is enabled (port A0). In fact, to make it worse, there were tools to disable the parity check. People used them to work with unreliable setups instead of buying new RAM. – Raffzahn Oct 06 '18 at 19:59
  • @MichaelKjörling Also, I wouldn't call 640 KiB a few dozen - besides, the amount doesn't matter; if a bit flipped in your word processor's code, you might lose quite a lot even on a 64 KiB machine - after all, it's not just about the RAM content. I have no idea how long you've been into computers; my experience goes back to the 70s, and losing a day's work to some machine failure, RAM included, wasn't exactly unheard of. There is a reason memory checking - and parity - was introduced. – Raffzahn Oct 06 '18 at 20:02
  • @Raffzahn How many IBM 5150s had 640 KiB RAM installed? – user Oct 06 '18 at 20:43
  • @MichaelKjörling Well, I'd say many; even without expansion cards (the majority of PCs sold), they came with up to 256 KiB on the mainboard - that's already a bit more than 21 dozen KiB. The XT was sold with 128 KiB minimum in 1983 and 256 KiB minimum in 1985, expandable to 640 KiB on the mainboard. Keep in mind that while DOS 1 could boot with just 32 KiB (64 for any DOS above 2.0), useful work also needed some RAM for program and data. While IBM sold the PC with 64 to 256 KiB, not many bought one with less than 256 KiB - and the ones who did upgraded ASAP :)) – Raffzahn Oct 06 '18 at 21:16
  • Introducing IBM as the "inventor of the memory check" sounds a bit odd to me - basically all decent computers I know of from before the IBM PC check their memory as well. After all, this is not a PC-specific question. – tofro Nov 28 '18 at 14:56
  • @tofro Well, to me it reads as rather PC-ish. Oh, and I like the restriction to 'decent' you make, as there are quite a lot not doing any memory test, many not even checking the installed size (or its existence at all). – Raffzahn Nov 28 '18 at 20:16

In the era of the original IBM PC (early 1980s), home computers would often use many chips of RAM (eight or more) to provide the system's memory. These would either be soldered directly to the motherboard, or fitted in individual sockets. (The inline memory modules or SIMMs/DIMMs we see today, with several RAM chips soldered to a removable board, came years later.)

Memory chips can fail in a variety of different ways. For example, they might always output some fixed value, fail to retain or refresh stored data, write or read data to/from the wrong location, or something else entirely. Some errors will stop the operating system from booting correctly; others may only show up later when running your software (and potentially corrupting your important data!).

To avoid this happening, the IBM PC's BIOS runs a series of read and write tests on its memory during the Power-On Self-Test (POST), before handing over to the operating system. If an error is detected, a message is displayed on-screen which a technician can use to determine the faulty chip. (In a fully expanded IBM PC, there'd be 36 AM9016 memory chips; finding a faulty chip by trial-and-error would be time consuming.)

As mentioned in the question, the more RAM fitted to a machine, the longer it takes to test all memory locations in that RAM. Because nobody enjoys waiting for their computer to boot, the option to skip the extended memory test was included. Improved manufacturing techniques meant fewer RAM chip errors, and it was often the case that a detected RAM fault was caused by a RAM chip that had become loose in its socket, a phenomenon known as "chip creep". Frustration with this situation led to the introduction of SIMMs, which were held in position more reliably, and also saved space on the motherboard.

Because memory tests were becoming slower, and faults were becoming rarer, manufacturers changed the default to not running a memory test during POST. (A faster boot time was a marketing advantage.) The full test is still available on modern machines, usually by disabling a BIOS option named "fast boot" or similar.

Kaz
  • I wonder why the PC's memory test did all of the testing for each 16K section before advancing to the next? I would think it would have been both faster and more effective to write an odd-length pattern to all of memory, disable RAM refresh for a little while while avoiding accesses to half the rows, allow a refresh of all RAM, disable RAM refresh for a little while while avoiding accesses to the other half of the rows, then verify that everything was stored correctly, and repeat with another pattern. – supercat Jan 11 '22 at 19:39
  • Code which processes 16KB sections separately wouldn't detect situations where erroneous configuration would cause a chunk of memory to be mapped to two different addresses, but if one fills all of memory with e.g. a 53-byte pattern, then every 16KB section of RAM would end up with a different pattern in it, so any chunk of RAM that gets double-mapped would report an error when the first image is read back. – supercat Jan 11 '22 at 19:43