Help implementing audio dynamic range compression

Question

I am trying to implement audio dynamic range compression in JavaScript (without using the Web Audio API).

There are a lot of articles for sound technicians, and some high-level documentation, but I couldn't find any helpful reference for actually implementing digital dynamic range compression.

From what I understand, there are at least 3 steps in calculating the compressed signal:

1) computing the input level
2) computing the gain to apply to the signal
3) applying the gain

I process the audio in blocks, so for 1) I was thinking of computing the RMS of one block.
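For reference, the block RMS idea could be sketched like this. It assumes samples are floats in [-1, 1]; the names `blockRms` and `toDb` and the -120 dB floor are only illustrative choices.

```javascript
// Root-mean-square level of one block of samples (floats in [-1, 1]).
function blockRms(block) {
  let sumSquares = 0;
  for (let i = 0; i < block.length; i++) {
    sumSquares += block[i] * block[i];
  }
  return block.length > 0 ? Math.sqrt(sumSquares / block.length) : 0;
}

// Convert a linear amplitude to decibels relative to full scale (1.0).
// The floor (an arbitrary choice) avoids -Infinity for silent blocks.
function toDb(amplitude, floorDb = -120) {
  if (amplitude <= 0) return floorDb;
  return Math.max(floorDb, 20 * Math.log10(amplitude));
}
```

Usage would be something like `toDb(blockRms(block))` once per block, giving the input level in dB for step 2).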

Any pointers to a good reference? Or is anybody willing to explain the steps required to implement this?

Just to be sure: you want to implement something like this: http://www.waves.com/plugins/c1-compressor, right? — Deve, Aug 29 '13 at 12:55
yes :) but much much simpler! The controls I need would be threshold, knee, ratio, attack, release. But I can start with even simpler — sebpiq, Aug 29 '13 at 13:03

score 4 · Answer 1 · answered Aug 29 '13 at 14:12

Here are some suggestions:

There are plenty of open-source implementations (SoX, Audacity, etc.). Even if you don't understand them, you might be able to translate the code from C to JavaScript.
I'm not aware of a good explanation of the process online, but there are plenty of books on the subject:
- Digital Audio Signal Processing covers this topic and is well written. (So does DAFX, but DAFX is poorly organized and its coverage is less straightforward.)
- Digital Audio with Java also covers this topic and comes with working Java code that should be easy to translate to other languages, such as JavaScript. This book has many, many flaws, but it is good for someone with no audio programming experience.

The principle is to create an envelope of the signal (controlled by attack and release), shape that envelope using some transfer function (controlled by ratio, threshold and knee), and then apply the result back to the original signal. A makeup gain stage often follows.
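To make the principle concrete, here is a minimal JavaScript sketch of that chain. It is not taken from any of the books or projects mentioned, and it makes several assumptions: a peak envelope with one-pole attack/release smoothing, a hard knee, and illustrative names (`createCompressor`, `thresholdDb`, `makeupDb`, and so on).

```javascript
// Minimal feed-forward compressor sketch (hard knee, peak envelope).
// All names and parameter choices are illustrative assumptions.
function createCompressor({ sampleRate, thresholdDb, ratio, attackMs, releaseMs, makeupDb = 0 }) {
  // One-pole smoothing coefficients derived from the time constants.
  const attackCoef = Math.exp(-1000 / (attackMs * sampleRate));
  const releaseCoef = Math.exp(-1000 / (releaseMs * sampleRate));
  const makeup = Math.pow(10, makeupDb / 20);
  let envelope = 0; // kept between calls so blocks join seamlessly

  return function process(input, output = new Float32Array(input.length)) {
    for (let i = 0; i < input.length; i++) {
      // 1) Envelope: follow |x| with the attack coefficient when rising,
      //    and with the release coefficient when falling.
      const level = Math.abs(input[i]);
      const coef = level > envelope ? attackCoef : releaseCoef;
      envelope = coef * envelope + (1 - coef) * level;

      // 2) Transfer function (hard knee), evaluated on the envelope in dB.
      const envDb = 20 * Math.log10(Math.max(envelope, 1e-6));
      const overDb = Math.max(0, envDb - thresholdDb);
      const gainDb = overDb * (1 / ratio - 1); // always <= 0

      // 3) Apply the gain (plus makeup) to the original sample.
      output[i] = input[i] * Math.pow(10, gainDb / 20) * makeup;
    }
    return output;
  };
}
```

A call such as `createCompressor({ sampleRate: 44100, thresholdDb: -20, ratio: 4, attackMs: 5, releaseMs: 100 })` returns a function that can be applied to successive `Float32Array` blocks.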

@Deve's answer suggests some possible transfer functions.

answered Aug 29 '13 at 14:12

Bjorn Roche


The thing is I don't know C, and yes that's a big obstacle when working with audio dsp, cause I cannot check existing implementations. Otherwise, thanks for the books, I'll definitely check the first one at least. – sebpiq Aug 29 '13 at 14:53
The DAFX book has matlab code. IDK if it has matlab code for a compressor. – Bjorn Roche Aug 29 '13 at 19:33
For the "Digital audio signal processing" book, comments on Amazon say that it is better for people with an electrical engineering background. Do you know of any book that focuses more on algorithms, ideally in pseudo-code? – sebpiq Sep 02 '13 at 08:54
One way or another, you are going to have to learn some new skills, I think. – Bjorn Roche Sep 02 '13 at 13:11
Yes, I know! And I am ready for that. It's just that I don't really want matlab code for example, as I don't want to pay for matlab to be able to test them. That's why I asked if you know of other books with for example pseudo code, instead of a specific language (like Java which I am really not interested in learning). – sebpiq Sep 03 '13 at 12:26

Deve · Answer 2 · 2013-09-01T11:45:55.680

For a simple start I would use a non-linear characteristic $g(x)$ that compresses your input signal:

$$ y = g\left(x\right) $$

where (as pointed out by endolith in the comments) $x$ is the envelope of the input audio signal and $y$ is the output envelope that is applied to the actual audio signal. $g(x)$ can be any function that attenuates large input values more strongly than small input values. The A-Law and $\mu$-Law functions have been developed to compress speech signals for telephony, for example. I don't know how good this sounds for music, though.
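As an illustration of such a $g(x)$, the continuous $\mu$-law curve can be written in a few lines of JavaScript. This is only a sketch: it assumes values normalized to $[-1, 1]$, the function name `muLawCompress` is made up here, and $\mu = 255$ is the value commonly used in telephony.

```javascript
// Continuous mu-law curve for values normalized to [-1, 1].
// mu = 255 is the value commonly used in telephony.
function muLawCompress(x, mu = 255) {
  return Math.sign(x) * Math.log1p(mu * Math.abs(x)) / Math.log1p(mu);
}
```

The curve is steep near zero and flat near full scale, so relative to small values, large values come out reduced.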

Another, very simple compression function would be to attenuate all amplitudes above a certain threshold $\delta$:

$$ g(x) = \begin{cases} x & \text{for } x \leq \delta \\ ax + (1-a)\delta & \text{for } x > \delta \end{cases} $$

where $a < 1$ is the attenuation. But this won't work very well: our sense of hearing is logarithmic, so the attenuation might be much too strong. This is why audio compressors work on a logarithmic scale, which leads to the same function as above, but with all values taken logarithmically and with respect to the maximum possible value. For $x > 0$:

$$ \log(g(x)) = \begin{cases} \log(x) & \text{for } x \leq \delta \\ \frac{1}{r}\log(x) + \left(1-\frac{1}{r}\right)\log(\delta) & \text{for } x > \delta \end{cases} $$
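Both curves translate almost literally into JavaScript. The sketch below assumes $x$ and $\delta$ are positive linear envelope values; the function names are invented for illustration.

```javascript
// Linear piecewise curve: slope 1 below delta, slope a (< 1) above it.
function linearKnee(x, delta, a) {
  return x <= delta ? x : a * x + (1 - a) * delta;
}

// Same idea in the log domain: above delta, the logarithm of the level
// rises with slope 1/r. Assumes x > 0 and delta > 0.
function hardKneeLog(x, delta, r) {
  if (x <= delta) return x;
  const logY = Math.log(x) / r + (1 - 1 / r) * Math.log(delta);
  return Math.exp(logY);
}

// Gain to multiply the audio samples with, given the envelope value x.
function gainFor(x, delta, r) {
  return x > 0 ? hardKneeLog(x, delta, r) / x : 1;
}
```

For example, with $\delta = 0.1$ (that is, -20 dB) and $r = 4$, a full-scale envelope of 1.0 maps to about 0.178, which is -15 dB.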

For audio compressors, $\delta$ is usually given in dB and $r$ is expressed as some ratio, e.g. 3:1 (i.e. $r=3$). Expressed linearly, this yields a power function (again for $x>0$):

$$ g(x) = \begin{cases} x & \text{for } x \leq \delta \\ \delta^{1 - 1/r}x^{1/r} & \text{for } x > \delta \end{cases} $$

This function has a "hard knee", meaning that the function $\log g(x)$ is not differentiable at $x = \delta$. For a "soft knee" you would need some smooth transition at that point. The extension of the above functions to negative $x$ is straightforward: apply $g$ to the absolute value of $x$ and multiply by the signum function, i.e. use $\operatorname{sgn}(x)\,g(|x|)$.
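One possible smooth transition is a quadratic blend between the two straight segments of the dB curve. The sketch below is one common design rather than anything prescribed above; the knee width `kneeDb` and the function name are assumptions, and levels are in dB.

```javascript
// Static curve in dB with a quadratic soft knee of width kneeDb centred on
// the threshold. kneeDb = 0 falls back to the hard knee.
function softKneeDb(inDb, thresholdDb, ratio, kneeDb) {
  const over = inDb - thresholdDb;
  if (kneeDb > 0 && 2 * Math.abs(over) <= kneeDb) {
    // Inside the knee: quadratic blend between the two straight segments.
    const t = over + kneeDb / 2;
    return inDb + (1 / ratio - 1) * t * t / (2 * kneeDb);
  }
  if (over <= 0) return inDb;          // below the knee: unchanged
  return thresholdDb + over / ratio;   // above the knee: slope 1/ratio
}
```

The gain to apply is then `softKneeDb(levelDb, ...) - levelDb` in dB, converted back with `Math.pow(10, gainDb / 20)`.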

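Attack and release have an impact on different sounds like kicks, snares and vocals. The attack time determines how quickly the compressor reaches its full gain reduction once the envelope rises above the threshold, and the release time determines how quickly the gain returns to normal after the envelope has fallen below the threshold again. Both are usually implemented by smoothing the envelope (or the gain) with separate time constants for rising and falling levels. Look-ahead, i.e. delaying the audio relative to the envelope, is only needed if the compressor should start reacting before the threshold is reached.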
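A common way to realize attack and release without any look-ahead is to smooth the target gain with a one-pole filter whose coefficient depends on the direction of change. The following is only a sketch under that assumption; `timeToCoef` and `smoothGainDb` are hypothetical names, and gains are in dB (0 means no reduction).

```javascript
// Convert a time constant in milliseconds to a one-pole smoothing coefficient.
function timeToCoef(timeMs, sampleRate) {
  return timeMs > 0 ? Math.exp(-1000 / (timeMs * sampleRate)) : 0;
}

// Smooth a per-sample target gain (in dB, <= 0). Moving toward more gain
// reduction uses the attack time, moving back toward 0 dB uses the release time.
function smoothGainDb(targetDb, sampleRate, attackMs, releaseMs) {
  const attack = timeToCoef(attackMs, sampleRate);
  const release = timeToCoef(releaseMs, sampleRate);
  const out = new Float64Array(targetDb.length);
  let g = 0; // start with no gain reduction
  for (let i = 0; i < targetDb.length; i++) {
    const coef = targetDb[i] < g ? attack : release;
    g = coef * g + (1 - coef) * targetDb[i];
    out[i] = g;
  }
  return out;
}
```

With this kind of smoothing, the gain covers roughly 63% of a sudden step after one time constant, so the target is approached gradually instead of sample by sample.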

As all amplitudes above $\delta$ are attenuated, the available dynamic range is not fully exploited. This is corrected by the so-called "make-up gain", which is just a simple multiplication of the compressed signal by a gain factor $G>1$. By first reducing the dynamic range and then amplifying the signal, compressors can make music appear "louder".
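The make-up stage itself is a single multiplication. How $G$ is chosen is a design decision; the sketch below assumes one simple heuristic (not from the text above): pick $G$ so that a full-scale level maps back to full scale after hard-knee compression. The function names are illustrative.

```javascript
// Convert a gain in dB to a linear factor.
function dbToLinear(db) {
  return Math.pow(10, db / 20);
}

// Heuristic (an assumption): make-up gain in dB such that a 0 dBFS level,
// compressed with a hard knee, ends up at 0 dBFS again.
function autoMakeupDb(thresholdDb, ratio) {
  return -thresholdDb * (1 - 1 / ratio);
}

// Multiply every sample of the compressed signal by the make-up factor G.
function applyMakeup(samples, makeupDb) {
  const G = dbToLinear(makeupDb);
  const out = new Float32Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    out[i] = samples[i] * G;
  }
  return out;
}
```

For a threshold of -20 dB and a ratio of 4:1 this heuristic gives 15 dB of make-up gain, i.e. $G \approx 5.6$.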

Thanks! Great explanation. I came to a transfer function close to the one you gave, but mine didn't include the threshold (I simplified with threshold = 0), so I have to recalculate. — sebpiq, Aug 29 '13 at 14:54
Actually the transfer function I get by adding the threshold is x^(1/r) * 10^(-sigma/(20 * r)) — sebpiq, Aug 29 '13 at 16:03
Sorry, my function definitions were a mess because I forgot to add the constant bias. I've corrected that. The last expression, however, is correct in my opinion. It must fulfill $g(\delta)=\delta$. There's no factor of 20 involved here because we're taking the logarithm of the left and right hand side so that any constant factor cancels out. — Deve, Aug 30 '13 at 07:29
These look like distortion, not compression. Distortion is a change in level that occurs on a sample-by-sample basis, while compression is something that happens over many cycles of the waveform (each of which is made of many samples). Non-linear distortion is going to sound awful. — endolith, Aug 30 '13 at 17:36
@endolith You're right, thanks for the hint. I've updated my answer accordingly. — Deve, Sep 01 '13 at 11:46
@endolith : I was thinking of calculating a gain target using Deve's function, and using some kind of smoothing, so this gain would be reached slowly (and not on a sample-by-sample basis), would this make it a compression then? — sebpiq, Sep 02 '13 at 08:50
