Audio normalization

Question

I have a recording in PCM format and I want to do some simple analysis.

I have some questions about what normalization is. As far as I understand, it means scaling all the amplitudes into a fixed range, e.g. [-1, 1].

The obvious way to do that is:

max_amplitude = max(array_of_amplitudes)
for amplitude in array_of_amplitudes:
   amplitude = amplitude / max_amplitude

I read about RMS normalization. Can somebody explain how it is done?

Moreover, could you please explain what the benefit of normalization is?

score 10 · Accepted Answer · answered Jul 11 '12 at 21:57

Your normalization code is incorrect. If the input signal has a big negative dip (say a sample at -5.0), max() won't detect it, and you will still have values outside [-1, 1]. Divide by the maximum of the absolute values instead, i.e. max(abs(array_of_amplitudes)). Prior to normalization, it is also recommended to remove any DC offset the signal might have.
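The corrected peak normalization could be sketched as follows. This is a minimal illustration, assuming NumPy is available and that the samples have already been loaded into an array; the function name peak_normalize and the sample values are made up for the example, and the DC offset is estimated simply as the mean of the signal.

```python
import numpy as np

def peak_normalize(samples):
    """Remove the DC offset, then scale so the largest |sample| is 1.0."""
    x = np.asarray(samples, dtype=np.float64)
    x = x - x.mean()              # DC offset estimated as the mean (assumption)
    peak = np.max(np.abs(x))      # largest magnitude, positive or negative
    if peak == 0.0:
        return x                  # silent signal: nothing to scale
    return x / peak

# Hypothetical input with a large negative dip at -5.0
signal = np.array([0.5, -5.0, 1.0, 0.25])
normalized = peak_normalize(signal)
```

Because the scale factor comes from the absolute values, the negative dip determines the gain and every output sample ends up inside [-1, 1].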

RMS normalization consists of computing the RMS (root-mean-square) level over short-term windows, taking the maximum of those values, and dividing the signal by that maximum. This won't guarantee that the result lies within [-1, 1]; you will have to clip any values outside this range. The benefit is that it is more robust to outliers in the signal. Let's say you have a relatively quiet recording, with just a short peak at 1.0 somewhere due to a soundcard driver glitch or a temporary "pop" on the microphone. Peak normalization won't change the level of the signal (it is already normalized, since the maximum is 1.0), while RMS normalization will still boost its level (and the "pop" will be clipped).
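The procedure described above might look like this in code. It is only a sketch under several assumptions: NumPy is available, the input is non-empty, the windows are non-overlapping, and the window length of 1024 samples is an arbitrary default. The name rms_normalize and the test signal are hypothetical.

```python
import numpy as np

def rms_normalize(samples, window=1024):
    """Divide by the largest short-term RMS level, then clip to [-1, 1]."""
    x = np.asarray(samples, dtype=np.float64)
    x = x - x.mean()                       # remove DC offset first
    n_windows = len(x) // window
    if n_windows == 0:
        frames = x[np.newaxis, :]          # signal shorter than one window
    else:
        # Non-overlapping windows; trailing samples that don't fill a window
        # are ignored for the level estimate only.
        frames = x[:n_windows * window].reshape(n_windows, window)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    max_rms = rms.max()
    if max_rms == 0.0:
        return x                           # silent signal
    return np.clip(x / max_rms, -1.0, 1.0)

# Quiet 440 Hz tone sampled at 8 kHz, with a single "pop" at full scale
t = np.arange(8000) / 8000.0
quiet = 0.05 * np.sin(2 * np.pi * 440 * t)
quiet[4000] = 1.0
boosted = rms_normalize(quiet, window=800)
```

In this example the quiet tone is raised substantially even though the input already peaks at 1.0, and the pop itself is clipped at 1.0, which is the trade-off described above.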

Regarding applications:

In audio recording/reproduction, normalization is important because it ensures that the full dynamic range of the output converters is used. If you play a signal peaking at 0.25 through a 16-bit DAC, you are not making use of the 2 upper bits of your converter (which will always be 0), so the quantization noise is about 12 dB higher relative to the signal (roughly 6 dB per unused bit).
In some audio classification tasks (such as emotion recognition, music genre classification, or even speech recognition), amplitude/loudness might be used as a feature. Thus, you really want all input files to be similarly "calibrated" in terms of level.
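The 12 dB figure in the first point can be checked with a few lines of arithmetic. This is just a back-of-the-envelope sketch using the peak level and word length from the example; the variable names are illustrative.

```python
import math

peak = 0.25   # peak level relative to full scale (1.0)
bits = 16     # word length of the converter in the example

# Each factor of 2 below full scale leaves one more bit unused.
unused_bits = math.log2(1.0 / peak)

# The same ratio expressed in decibels (about 6.02 dB per bit).
loss_db = 20.0 * math.log10(1.0 / peak)

effective_bits = bits - unused_bits
print(f"{unused_bits:.0f} unused bits, {loss_db:.2f} dB, "
      f"{effective_bits:.0f} effective bits")
```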

+1. Other benefits are avoiding overflow (not too common with floating point, but can happen), and analysis (you know exactly how "strong" a normalized 0.8 is, while who knows how strong/weak an unnormalized 1082 is?). — Jim Clay, Jul 11 '12 at 23:16
