9

I know they remove certain sounds, like extreme highs and lows, but how does this really work?

  • What information is decided to be thrown out? Is it just done by frequency and typical human hearing range?

  • Are other types of compression used, like redundancy removal? How do they work?

  • When the resulting file is produced, how are the sound waves encoded digitally? Is there a constant shape of the wave that's being recorded very precisely, or some other way of representing the sound digitally?

Gordon Gustafson

2 Answers

14

MPEG-1 Layer 3 is complex stuff. I recommend starting your reading here: https://web.archive.org/web/20140221055027/http://www.oreilly.com/catalog/mp3/chapter/ch02.html

Another resource I found helpful for getting started is http://www.mp3-tech.org/

In a nutshell, MP3 encoding works by taking a block of samples (say, 576 samples, which is one granule; a standard MPEG-1 Layer 3 frame holds two granules, or 1152 samples) and making a spectral model of it. Think of a spectrum analyzer showing you the current levels at various bands, only with a lot more bands.
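
To make the "spectrum analyzer" picture concrete, here is a minimal sketch that turns one 576-sample block into band levels. It uses a plain DFT for readability, whereas a real MP3 encoder uses a polyphase filterbank followed by an MDCT. The 44100 Hz sample rate, the test tone, and the function name `dft_magnitudes` are assumptions made up for the illustration.

```python
import cmath
import math

BLOCK = 576          # block length discussed in the answer
SAMPLE_RATE = 44100  # assumed CD-style sample rate


def dft_magnitudes(block):
    """Return normalized magnitudes of the first N/2 DFT bins (naive O(N^2) DFT)."""
    n = len(block)
    mags = []
    for k in range(n // 2):
        acc = sum(block[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(acc) / n)
    return mags


# Test tone chosen so that it lands exactly on bin 10 of the block.
tone_hz = 10 * SAMPLE_RATE / BLOCK
block = [math.sin(2 * math.pi * tone_hz * t / SAMPLE_RATE) for t in range(BLOCK)]

mags = dft_magnitudes(block)
peak_bin = max(range(len(mags)), key=mags.__getitem__)
peak_hz = peak_bin * SAMPLE_RATE / BLOCK
print(peak_bin, peak_hz)
```

Each entry of `mags` is the level of one narrow band; the encoder's job is then to decide how precisely each of those levels needs to be stored.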

How it does this depends on the types of sound in that frame. The encoder relies on psychoacoustic models to decide what matters. For example, masking occurs when a loud sound makes quieter sounds near it in frequency (or time) hard or impossible to hear, so the encoder can spend fewer bits on them. Our ears tend to do this anyway, which is why MP3 often gets away with it if you aren't listening carefully.
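
A toy illustration of masking, assuming a made-up rule: each band casts a masking threshold on its neighbors that starts `offset_db` below its own level and falls off by `spread_db_per_band` for every band of distance. Real psychoacoustic models are far more elaborate; the function name, both parameters, and the levels below are hypothetical.

```python
def audible_bands(levels_db, spread_db_per_band=10.0, offset_db=12.0):
    """Return indices of bands that rise above a toy masking threshold.

    The threshold for band i is the strongest "shadow" cast by any other
    band j: level_j - offset_db - spread_db_per_band * distance(i, j).
    """
    kept = []
    for i, level in enumerate(levels_db):
        threshold = max(
            (other - offset_db - spread_db_per_band * abs(i - j)
             for j, other in enumerate(levels_db) if j != i),
            default=float("-inf"),
        )
        if level > threshold:
            kept.append(i)
    return kept


# Hypothetical band levels in dB: one very loud band (index 2) and a
# moderately loud one further away (index 5).
levels_db = [20.0, 35.0, 80.0, 50.0, 30.0, 60.0]
kept = audible_bands(levels_db)
print(kept)  # [2, 5]
```

In this sketch the four quiet bands sit in the shadow of the 80 dB band, so an encoder following this rule could code them coarsely or drop them.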

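The block size also depends on the content. Granules with transients are coded as three short blocks of 192 samples each, trading frequency resolution for time resolution, while steadier granules use a single long block of 576 samples. (Less precision where it isn't needed.)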

As for which frequencies are thrown out: yes, a wide bandpass filter is applied to the sound before encoding. Beyond that, higher bitrates generally give better frequency response. You can see some graphs here: http://www.mp3-tech.org/tests/pm/MP3-128k.htm Note that the author of those pages was using program material, not test samples. That's why the bass looks so high, but you can definitely see the high-end falloff.

The artifact in MP3 that bugs my ears the most is pre-echo, or transient smear. Basically, quantization noise is spread across the whole block being coded, and since even a short block is 192 samples (a long one is 576), the sound of attacks on cymbals and such gets smeared. The block sizes don't shrink at higher bitrates, either: even at 320 kbit/s they are the same. To me, it's what really makes the difference between the original and the encoded version. Most of the rest of the encoding loss I can live with, at least at the higher bit rates.
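
Pre-echo can be reproduced with a small experiment. The sketch below is a toy, not an MP3 encoder: it uses a plain DFT on a 64-sample block (instead of the MDCT on real block sizes) and an exaggerated quantizer step, and all constants and function names are assumptions chosen for illustration. The block is silent until sample 40, yet after coarse quantization of the spectrum the decoded block contains energy before the attack.

```python
import cmath
import math

N = 64       # toy block length (real MP3 blocks are longer)
ATTACK = 40  # sample index where the sound starts
STEP = 0.5   # coarse quantizer step, exaggerated for clarity


def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec):
    n = len(spec)
    return [(sum(spec[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]


def quantize(c, step):
    """Round the real and imaginary parts to multiples of step."""
    return complex(round(c.real / step) * step, round(c.imag / step) * step)


# Silence, then a sharp decaying burst (a crude stand-in for a cymbal hit).
signal = [0.0] * ATTACK + [
    math.exp(-(t - ATTACK) / 4.0) * math.cos(1.3 * (t - ATTACK))
    for t in range(ATTACK, N)
]

decoded = idft([quantize(c, STEP) for c in dft(signal)])

original_pre_energy = sum(v * v for v in signal[:ATTACK])
pre_echo_energy = sum(v * v for v in decoded[:ATTACK])
print(original_pre_energy, pre_echo_energy)
```

The quantization error lives in the frequency domain, so when the block is transformed back it is smeared over every sample of the block, including the silent part before the attack. A shorter block confines that smear to a shorter stretch of time.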

Also, many MP3 files are encoded in joint stereo, which exploits the similarity between the left and right channels instead of coding them independently. There are two ways this is done: intensity stereo, where the higher bands are stored as a single combined signal plus panning information, and mid/side, where the sum and difference of the channels are stored in place of left and right. You can read more here: http://en.wikipedia.org/wiki/Joint_(audio_engineering)
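
Mid/side coding is easy to sketch. The version below uses one common convention (mid is the average, side is half the difference); the exact scaling varies between codecs, and the function names and sample values are made up for the illustration.

```python
def to_mid_side(left, right):
    """Convert left/right samples to mid (average) and side (half difference)."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def from_mid_side(mid, side):
    """Invert to_mid_side: left = mid + side, right = mid - side."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


# Hypothetical, mostly centered stereo material.
left = [0.50, 0.25, -0.75, 1.00]
right = [0.50, 0.25, -0.50, 0.75]

mid, side = to_mid_side(left, right)
print(mid)   # [0.5, 0.25, -0.625, 0.875]
print(side)  # [0.0, 0.0, -0.125, 0.125]
```

When the two channels are similar, the side signal is small and cheap to code, which is where the savings come from. The transform itself is lossless: `from_mid_side(mid, side)` returns the original channels.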

I hope this is helpful. Please ask more questions if anything is unclear. Also, I am no expert on this; I've just been researching it for a while. Please suggest corrections if I am wrong about anything.

Brad
  • 2
    Don't forget about the lossless compression done at the end! –  Apr 05 '11 at 03:42
  • FYI the O'Reilly link is dead. – Dragomok Jun 21 '17 at 16:19
  • 1
    @Dragomok Fixed. https://web.archive.org/web/20140221055027/http://www.oreilly.com/catalog/mp3/chapter/ch02.html – Brad Jun 21 '17 at 16:44
  • @Brad That source is imprecise, or just plain incorrect about astronomy, the JPEG format, and maybe some other things. I'd be hesitant to trust what is said about MP3s, but it's still a useful starting point. – jpaugh Jun 21 '17 at 22:11
  • @jpaugh Sure, imprecise is probably the right word, but it seems to get the point across. If there's anything inaccurate or if you have a better source, please post it! – Brad Jun 21 '17 at 23:21
  • @Brad After having read the whole chapter, I think it's useful for developing a mental model, but probably wrong about some of the specifics. (Actually, if something were more accurate, it might make it harder to develop a useful mental model, so overall, it's a pretty good beginner's source). I just wanted to leave a note for the unwary. – jpaugh Jun 22 '17 at 20:26
  • @jpaugh Please post corrections when you can, and if there's anything wrong in my answer, please comment. Thanks. – Brad Jun 22 '17 at 20:27
0

To put it simply, it throws out what we can't hear. To quote the Wikipedia article on MP3: "It uses psychoacoustic models to discard or reduce precision of components less audible to human hearing, and then records the remaining information in an efficient manner." See also http://en.wikipedia.org/wiki/Psychoacoustics