Many retrocomputers from the late 80s and early 90s had hardware support for audio DACs. This includes the Amiga, the classic Macintosh, and PCs with Sound Blaster add-on cards. Even though the MP3 codec is designed to perform well at fairly low bit rates (<128 Kbps), these retrocomputers can't handle MP3 playback.

Below is an excerpt from the AmigaAMP System Requirements.

Basically AmigaAMP runs fine on any stock Amiga with at least a 68030-50 MHz processor. However, if you want to be able to play MP3 without downsampling and quality loss you need at least a 68060-50. Be prepared to face heavy system load! :-) If you've got a PowerPC you can play MP3 streams in the background without noticeable system slowdown at all.

What is the critical bit of computational capability that is so lacking in the earlier machines that makes MP3 playback infeasible?

Leo B.
Brian H

4 Answers

What's lacking is mostly raw number-crunching ability. The MP3 format is relatively lightweight, and can be implemented using only fixed-point math (no FPU required), but it still takes a fair amount of computation to turn compressed frequency-domain data into uncompressed time-domain data that can be fed to a DAC. This IEEE article quotes a minimum performance of 24 MIPS, and that figure already assumes a decoder optimized for the platform.
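To make the fixed-point point concrete, here is a minimal C sketch of the kind of arithmetic an integer-only decoder relies on. The 4.28 format (28 fraction bits in a 32-bit word) and the names `fixed_t`, `fixed_from_double`, `fixed_to_double` and `fixed_mul` are illustrative assumptions, not taken from any particular decoder:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical signed 4.28 fixed-point format: 28 fraction bits,
   representable range roughly [-8, 8). */
typedef int32_t fixed_t;
#define FRAC_BITS 28

/* Convert a double to fixed point, rounding to nearest.
   Only needed when building constant tables, not in the decode loop. */
static fixed_t fixed_from_double(double x)
{
    assert(x > -7.99 && x < 7.99); /* stay safely inside the 4.28 range */
    double scaled = x * (double)((int64_t)1 << FRAC_BITS);
    return (fixed_t)(scaled >= 0 ? scaled + 0.5 : scaled - 0.5);
}

/* Convert back to double, e.g. for printing or testing. */
static double fixed_to_double(fixed_t f)
{
    return (double)f / (double)((int64_t)1 << FRAC_BITS);
}

/* Multiply two fixed-point values using integer operations only:
   widen to 64 bits, then drop the extra 28 fraction bits. */
static fixed_t fixed_mul(fixed_t a, fixed_t b)
{
    int64_t product = (int64_t)a * (int64_t)b; /* 8.56 intermediate */
    return (fixed_t)(product >> FRAC_BITS);    /* assumes arithmetic shift for negatives */
}
```

The inner loops of such a decoder only ever need something like `fixed_mul`; the `double` conversions exist only to build constant tables, which can be done ahead of time.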

Wikipedia says that the 68030 at 50 MHz performs at about 18 MIPS, while a 50 MHz 68060 would do 67 MIPS. Specific differences likely to be relevant are the much larger caches (8 KB versus 256 bytes, for each of the instruction and data caches), dual instruction pipelines that let the CPU execute two instructions at a time, and generally faster integer arithmetic circuits.
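Taking these figures at face value (MIPS is a crude yardstick, as later comments point out), the 68030-50 falls short of the quoted 24 MIPS, while the 68060-50 has nearly three times the required throughput:

$$\frac{18\ \text{MIPS}}{24\ \text{MIPS}} = 0.75, \qquad \frac{67\ \text{MIPS}}{24\ \text{MIPS}} \approx 2.8$$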

Brian H
Mark
  • MP3 decoding appears to require at least a Huffman decompression (not computationally expensive) and a Fourier transform. FFT is one of the most researched computer algorithms. Is it just that nobody ever optimized FFT for retro CPUs (hard to believe), or does really fast FFT need some sort of instruction retro CPUs are lacking? – Brian H Apr 14 '17 at 14:51
  • Some speed problems are easily solved by just throwing more memory at the problem for buffering. Retro computers seem to accommodate plenty of RAM to fully buffer uncompressed or lightly compressed audio lasting several minutes. – Brian H Apr 14 '17 at 14:53
  • @BrianH RAM is a cycle or two slower than cache - and is many, many orders of magnitude slower as cache often doesn't take one cycle. – wizzwizz4 Apr 14 '17 at 17:04
  • @BrianH Back in the day, if you wanted to do FFTs on a small system, you added a special extra processor called a DSP (Digital Signal Processor). As an example of such a product, I worked on the Neurotrac II. We used a DSP for all the heavy math (various flavors of FFT and filters on realtime and stored data) because the 68020 we used for disk and other I/O couldn't begin to keep up with 8 channels of EEG. Our current product (the CNS) uses a PC with a low-spec CPU to do 32 channels twice as fast, plus a LOT of other features the NTII didn't have. – Michael Kohne Apr 14 '17 at 17:24
  • @BrianH it's not lack of work optimizing FFT, and it's not lack of specialized instructions, it's just lack of raw processing power. MP3 decoding is both more involved than what you describe and actually doesn't involve an inverse FFT (but there is an inverse MDCT, a subband filter bank, stereo decoding, and some other interesting stuff). – hobbs Apr 14 '17 at 17:33
  • My memory from PCs is that a fast 486 could just barely decode low-bitrate MP3, while a Pentium-class machine could do it more easily. Which is in line with this answer in terms of MIPS, as well as dual pipelines. – hobbs Apr 14 '17 at 17:39
  • @hobbs The Wikipedia article on MDCT seems pretty good. The computation would seem to benefit from an FPU with hardware support for transcendental math, like the 68881. I wonder if it's actually the 68060's and the Pentium's FPUs that make MP3 feasible. Also, according to the Wikipedia article at least, you can use an inverse FFT to perform the inverse MDCT. – Brian H Apr 14 '17 at 18:28
  • @Mark Is it possible to provide evidence to more precisely say "raw floating point number-crunching ability" is the key attribute? I am looking for something relating an audio bit rate to the MFLOPS performance needed for real-time decoding, but haven't found an answer. That would draw a more definitive line around roughly when general-purpose computers became capable of MP3 decoding at common bit rates like 128 Kbps. – Brian H Apr 14 '17 at 18:36
  • @BrianH, floating-point ability is irrelevant. MP3 is designed to be decoded using fixed-point (integer) arithmetic. – Mark Apr 14 '17 at 22:08
  • Do you know of any references or sample code online that proves MP3 can be decoded without resorting to FP math? – Brian H Apr 14 '17 at 22:49
  • @BrianH, libMAD is entirely integer/fixed-point. (Most of the others probably are too, but that's the first one I looked at.) – Mark Apr 14 '17 at 23:26
  • Ah, I see MAD relies exclusively on fixed-point for the non-integer computations. I have not yet found an LGPL fixed-point MP3 decoder. – Brian H Apr 15 '17 at 04:59
  • Could some or all of the decoding take place in non-real time? For example, let the computer spend 10 minutes to perform enough of the decoding of a 3 minute mp3 such that the rest of the decoding can be completed during playback. – Robert Columbia Apr 16 '17 at 19:04
  • @RobertColumbia, from a theoretical standpoint, certainly. The problem is that a 3-minute stereo mp3 will decompress to about 32 MB of audio data, and a computer that can't decode in realtime probably doesn't have the memory to do that much buffering, either. – Mark Apr 16 '17 at 19:55
  • This IEEE article quotes a requirement of 25 MIPS: http://ieeexplore.ieee.org/document/4564625/ – pjc50 Apr 21 '17 at 12:03
  • CPUs at the time had really weak multipliers, which is why DSPs are a thing. A DSP is basically a CPU with a fast multiplier. MIPS isn't a good metric because it doesn't take into account how slow the 68k's multiplier is. – user3528438 Nov 25 '18 at 19:49
  • Although MP3 can be decoded using integer types alone, there are very few actual integer implementations - you need to choose carefully if targeting hardware without floating-point instructions. (That's why Empeg has a closed-source decoder - no other suitable integer-only decoders for its ARM SA1100). – Toby Speight Nov 27 '18 at 14:39
  • Like hobbs, I remember my 486 DX4 100 MHz could barely play MP3s (Mpxplay would hit close to 99% CPU usage for a 128 kbps file). According to Wikipedia it is 70 MIPS, which seems way more than the 25 quoted by IEEE. – tigrou Dec 11 '19 at 21:43
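Mark's buffering estimate is easy to check. Assuming CD-quality audio (44.1 kHz, 16-bit samples, stereo), a minimal C sketch of the calculation looks like this (the function name is hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Bytes of uncompressed PCM needed to hold `seconds` of audio.
   64-bit arithmetic avoids overflow for long recordings. */
static uint64_t pcm_bytes(uint64_t seconds, uint64_t sample_rate,
                          uint64_t channels, uint64_t bytes_per_sample)
{
    return seconds * sample_rate * channels * bytes_per_sample;
}
```

For a 3-minute track, `pcm_bytes(180, 44100, 2, 2)` gives 31,752,000 bytes (about 30 MiB), in line with the "about 32 MB" figure.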

MP3 is a lossy compression format for audio. It must be decompressed before playback, and by retrocomputer standards that process needs a lot of CPU time.

Modern computers both run a good two orders of magnitude faster and have CPUs with specialized "multimedia" instructions that streamline operations like the fast cosine transform. A trivial operation like "flip a byte backwards, bit-wise" can take some 30 instructions on an Amiga. A modern PC will take maybe two.
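A minimal C sketch of the byte-flipping example, assuming a CPU with no dedicated bit-reverse instruction. The loop version moves one bit per iteration, so at a few instructions per iteration (shift, mask, OR, loop counter) it plausibly lands in the 30-instruction range mentioned above; the branch-free version swaps nibbles, then bit pairs, then single bits. Both function names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Naive version: shift bits out of b and into r one at a time. */
static uint8_t reverse8_loop(uint8_t b)
{
    uint8_t r = 0;
    for (int i = 0; i < 8; i++) {
        r = (uint8_t)((r << 1) | (b & 1u)); /* append lowest bit of b to r */
        b >>= 1;
    }
    return r;
}

/* Branch-free version: three mask-and-shift swap steps. */
static uint8_t reverse8_swap(uint8_t b)
{
    b = (uint8_t)((b & 0xF0u) >> 4 | (b & 0x0Fu) << 4); /* swap nibbles   */
    b = (uint8_t)((b & 0xCCu) >> 2 | (b & 0x33u) << 2); /* swap bit pairs */
    b = (uint8_t)((b & 0xAAu) >> 1 | (b & 0x55u) << 1); /* swap adjacent  */
    return b;
}
```

For example, both functions turn `0xB4` (10110100) into `0x2D` (00101101).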

The situation is similar for JPEG: I remember my Amiga 600 (7 MHz) taking 20 minutes to display a JPEG that a PC (300 MHz) would show in the blink of an eye. The clock-speed difference alone would suggest a speed-up to something on the order of 20-30 seconds, but the multimedia instructions made all the difference.
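The clock-speed estimate works out as follows, assuming decode time scales inversely with clock rate and nothing else changes:

$$1200\ \text{s} \times \frac{7\ \text{MHz}}{300\ \text{MHz}} = 28\ \text{s}$$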

SF.
  • For further comparison I remember something like less than 10s for an A1200 with 28MHz 68030/882. 20min does sound like a hard life! – nsandersen Apr 14 '17 at 09:42
  • I think you are talking about SIMD instructions, like those I recall being marketed as "MMX" by Intel and "AltiVec" for PPC in the 90s. But I wonder what part of MP3 decoding needs SIMD instructions, and what type of instructions. – Brian H Apr 14 '17 at 15:02
  • Note that MMX came much later (1997) than the Pentium (1993). – user Apr 14 '17 at 18:02
  • @BrianH: Note early Pentiums struggled with mp3 too. – SF. Apr 14 '17 at 18:20
  • It looks like maybe it comes down to FPU performance. IIRC, the Pentium had a superior FPU to the 68060, and both were superior to the prior generation (486 and 040). – Brian H Apr 14 '17 at 18:30
  • I remember on my first Pentium system when mp3s were just becoming a thing, I had WinAmp set to play back my (128Kbps fixed bitrate) mp3s at 22khz @ 8bit instead of 44khz @ 16bit so that it would only use ~30% of the CPU and allow multitasking instead of using 80+% of the CPU to play back at full quality. – mnem Apr 14 '17 at 21:42
  • "flip a byte backwards" is the sort of thing you'd do with a lookup table on the Amiga; I think you could do that with one instruction: move.b 0(a0,d0.w),d1 but that single instruction (if reading from FAST memory) is 14 cycles on a 68000. – user3570736 Apr 18 '17 at 22:39
  • I would also guess that the incredibly slow multiply instructions on a 68000 (70 cycles!) also significantly hinder its ability to decode MP3. – user3570736 Apr 18 '17 at 22:50
  • @user3570736: If you wish to sacrifice enough memory for the look-up table. If you're operating on 8 bits, that's no big deal. If you reverse a 32-bit word, you'll likely want to shuffle the reversed bytes manually, meeting halfway between flipping the word mathematically and using a look-up table. – SF. Apr 18 '17 at 23:14
  • I remember somebody proposing a cheap and simple hardware solution to the "reverse the bits in a byte" problem: just connect a parallel input port and a parallel output port with a suitably wired cable! – alephzero Apr 19 '17 at 20:44
  • That is in the end building a specialized coprocessor ... out of 8 wires :) .... BTW, SSH logins can take M I N U T E S on 68k, 80486 and early SPARC machines.... – rackandboneman Apr 20 '17 at 13:36
  • @alephzero: On the CDP1802 processor, which is generally rather slow, one could double the speed of memory-move operations by adding a latch to the bus triggered off an I/O strobe, and a circuit that would hit the DMA-write line when that latch is written. The OUT instruction will hit I/O strobes while reading a byte of memory and incrementing the address. A DMA write request will put R0 on the address bus and increment it while issuing a write request. Use the latch to put the just-read value back on the bus and bingo--three cycles per byte copy instead of three two-cycle instructions. – supercat Jun 13 '17 at 22:33
  • @rackandboneman And for comparison, modern CPUs can hardware-accelerate the SSH encryption. Did some experiments: scp could saturate a gigabit Ethernet connection almost without the encoding CPU noticing. – Thorbjørn Ravn Andersen May 11 '20 at 16:10
  • The need to take bytes apart is probably also the real reason why graphic LCDs (as opposed to HD44780-style character parts) on embedded devices were unusual before the proliferation of ARM: handling display fonts (row and column translations) in real time is trickier than one would think on an 8051 or similar 8-bit controller.... – rackandboneman May 13 '20 at 10:44
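The look-up-table approach from the comments (one indexed table read per byte, and for a 32-bit word, byte-wise reversal combined with swapping the byte order) can be sketched in C as follows. The table and function names are hypothetical, and the table costs 256 bytes of memory:

```c
#include <assert.h>
#include <stdint.h>

static uint8_t rev_table[256];

/* Fill the table once: rev_table[v] is v with its 8 bits reversed. */
static void init_rev_table(void)
{
    for (int v = 0; v < 256; v++) {
        uint8_t r = 0;
        for (int i = 0; i < 8; i++)
            if (v & (1 << i))
                r |= (uint8_t)(0x80u >> i); /* bit i moves to bit 7 - i */
        rev_table[v] = r;
    }
}

/* Reverse a byte with a single table read. */
static uint8_t reverse8(uint8_t b)
{
    return rev_table[b];
}

/* Reverse a 32-bit word: reverse each byte via the table and
   place it in the mirrored byte position. */
static uint32_t reverse32(uint32_t w)
{
    return (uint32_t)rev_table[w & 0xFFu] << 24 |
           (uint32_t)rev_table[(w >> 8) & 0xFFu] << 16 |
           (uint32_t)rev_table[(w >> 16) & 0xFFu] << 8 |
           (uint32_t)rev_table[w >> 24];
}
```

After `init_rev_table()`, `reverse32(0x12345678u)` yields `0x1E6A2C48u`.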

As others have already answered, old CPUs do not have enough MIPS, and/or they lack instructions that would help with the MP3 decoding algorithm (remember that MP3 was standardized only in 1993, as part of MPEG-1).

A nice example of a 1992 computer that can play MP3 out of the box is the Atari Falcon030. Despite its slow main CPU, an MC68030 at 16 MHz, the Falcon also has a Motorola 56001 DSP chip (16 MIPS at 32 MHz), and it can play MP3 at 16 bit and 44.1 kHz, even while multitasking.

Two examples from YouTube:

Raffzahn
Milan Kovac
  • Welcome to Retrocomputing Stack Exchange. Thanks for the answer; it'd be good if you used [link text](https://url.org) markup to add link text for those YouTube links. Check out the [tour] to learn how Stack Exchange works. – wizzwizz4 Nov 23 '18 at 19:22
  • The Falcon is a marvel; cf. https://youtu.be/WpwlZgQPCpk — yes, that's the base hardware. Other than a RAM upgrade. – Tommy Nov 24 '18 at 04:58
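The DSP's advantage is easiest to see in a multiply-accumulate loop, the core operation of filter banks and MDCT-style transforms. The following C sketch models 24-bit fractional (Q1.23) samples, the 56001's native word size, and uses `int64_t` to stand in for its 56-bit accumulator; the names and the saturating conversion are illustrative assumptions, not Motorola's exact instruction semantics:

```c
#include <assert.h>
#include <stdint.h>

/* 24-bit signed fractional value (Q1.23), range [-1, 1), held in an int32_t. */
typedef int32_t q23_t;

/* Dot product with a wide accumulator: each Q1.23 * Q1.23 product is Q2.46,
   and int64_t has ample headroom to sum many of them without overflow. */
static int64_t mac_dot(const q23_t *x, const q23_t *h, int n)
{
    int64_t acc = 0;
    for (int i = 0; i < n; i++)
        acc += (int64_t)x[i] * h[i]; /* one multiply-accumulate per step */
    return acc;
}

/* Convert the Q2.46 accumulator back to Q1.23, saturating instead of
   wrapping when the result does not fit in 24 bits. */
static q23_t acc_to_q23(int64_t acc)
{
    int64_t v = acc >> 23; /* assumes arithmetic shift for negatives */
    const int64_t max = (1 << 23) - 1, min = -(1 << 23);
    if (v > max) v = max;
    if (v < min) v = min;
    return (q23_t)v;
}
```

On the 56001 each iteration of `mac_dot` corresponds to a single MAC instruction, which can also fetch the next pair of operands in parallel; on a 68030 the same iteration needs several instructions (loads, a multiply, an add and loop control).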

Early 68k processors had a slow, multi-cycle multiply instruction (up to 70 cycles per MULU/MULS on the 68000), so the inverse MDCT would likely be the limiting factor in terms of raw CPU time.

hotpaw2
  • So to do the math, a 7 MHz Amiga 500 would be able to do 100,000 multiplies a second (unpipelined, so unable to do anything else), which for a 44.1 kHz stereo MP3 is barely over one MUL per sample per channel! I don't know if lookup tables or shifts and adds could be used instead, but that sounds like a deal breaker to me. – user3570736 Jun 23 '17 at 06:46
  • It's not necessarily 70 cycles; for unsigned multiply it's 38 + 2n cycles, where n is the number of set bits in the multiplicand. So 70 is the worst case (the 68000 performs only 16x16->32), 38 is the best, and 54 is the average. So a 7 MHz 68000 would be able to do at least 100,000 multiplies a second. That assumes no wait states, though obviously they're not going to make a huge difference when so little of the cost is memory access. – Tommy Jul 06 '17 at 15:21
  • Whoops; forgot to cite my reference for that claim: http://oldwww.nvg.ntnu.no/amiga/MC680x0_Sections/timstandard.HTML. It provides 70 as the official figure, but the footnote conveys that 70 is a maximum and gives the formula as quoted. – Tommy Jul 06 '17 at 17:16
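The timing rule from Tommy's comment is simple enough to sketch in C. The function names are hypothetical, the cycle count excludes effective-address calculation time, and the throughput helper ignores instruction fetch, operand loads and wait states, so it gives an upper bound:

```c
#include <assert.h>
#include <stdint.h>

/* 68000 MULU.W timing: 38 + 2n clock cycles, where n is the number of
   1 bits in the 16-bit source operand (best case 38, worst case 70). */
static unsigned mulu_cycles(uint16_t src)
{
    unsigned ones = 0;
    for (; src != 0; src &= (uint16_t)(src - 1)) /* clear lowest set bit */
        ones++;
    return 38 + 2 * ones;
}

/* Upper bound on back-to-back multiplies per second at a given clock. */
static unsigned long multiplies_per_second(unsigned long clock_hz,
                                           unsigned cycles_per_mul)
{
    return clock_hz / cycles_per_mul;
}
```

At 7 MHz and the 70-cycle worst case this gives 100,000 multiplies per second; against 88,200 output samples per second for 44.1 kHz stereo, that is barely more than one multiply per sample per channel, matching the estimate in the first comment.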