For a simple start I would use a non-linear characteristic $g(x)$ that compresses your input signal:
$$
y = g\left(x\right)
$$
where (as pointed out by endolith in the comments) $x$ is the envelope of the input audio signal and $y$ is the output envelope that is applied to the actual audio signal. $g(x)$ can be any function that attenuates large input values more strongly than small ones. The A-law and $\mu$-law functions, for example, were developed to compress speech signals for telephony. I don't know how good they sound for music, though.
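As an illustration of such a characteristic, here is a minimal sketch of the $\mu$-law curve, assuming input values normalised to $[-1, 1]$ and the telephony value $\mu = 255$. The function name `mu_law_compress` is only illustrative.

```python
import numpy as np

def mu_law_compress(x, mu=255.0):
    """mu-law characteristic for values normalised to [-1, 1].

    Assumption: mu = 255, the value commonly used in telephony.
    """
    x = np.asarray(x, dtype=float)
    # sgn(x) * ln(1 + mu*|x|) / ln(1 + mu)
    return np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)

levels = np.array([0.01, 0.1, 0.5, 1.0])
# The ratio output/input shrinks as the level grows, i.e. large values
# are attenuated relative to small ones.
print(mu_law_compress(levels) / levels)
```

Full scale maps to full scale, while quiet levels are raised strongly, which is why this curve is probably too aggressive for music.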
Another, very simple compression function would be to attenuate all amplitudes above a certain threshold $\delta$:
$$
g(x) = \begin{cases}
x & \text{for } x \leq \delta \\
ax + (1-a)\delta & \text{for } x > \delta
\end{cases}
$$
where $0 < a < 1$ is the attenuation factor. But this won't work very well: our sense of hearing is logarithmic, so the attenuation might be much too strong. This is why audio compressors work on a logarithmic scale, which leads to the same function as above, but with all values taken logarithmically and relative to the maximum possible value. For $x > 0$:
$$
\log(g(x)) = \begin{cases}
\log(x) & \text{for } x \leq \delta \\
\frac{1}{r}\log(x) + \left(1-\frac{1}{r}\right)\log(\delta) & \text{for } x > \delta
\end{cases}
$$
For audio compressors, $\delta$ is usually given in dB and $r$ is expressed as a ratio, e.g. 3:1 (i.e. $r=3$). Expressed linearly, this yields a power function (again for $x>0$):
$$
g(x) = \begin{cases}
x & \text{for } x \leq \delta \\
\delta^{1 - 1/r}x^{1/r} & \text{for } x > \delta
\end{cases}
$$
This function has a "hard knee", meaning that $\log g(x)$ is not differentiable at $x = \delta$. For a "soft knee" you would need a smooth transition at that point. The extension of the above functions to negative $x$ is straightforward: apply $g$ to the absolute value of $x$ and multiply by the signum function, i.e. $\mathrm{sgn}(x)\,g(|x|)$.
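The hard-knee curve can be sketched in a few lines of Python. The function and parameter names are illustrative, and levels are assumed to be normalised so that full scale is 1 (0 dB). The soft-knee variant uses a quadratic blend over a knee width `w_db`; that is one common choice and an assumption here, not something the formulas above prescribe.

```python
import numpy as np

def db_to_lin(db):
    return 10.0 ** (db / 20.0)

def lin_to_db(a):
    return 20.0 * np.log10(a)

def hard_knee(x, delta, r):
    """Hard-knee curve sgn(x) * g(|x|) on linear amplitudes.

    delta: linear threshold (> 0); r: ratio, e.g. r = 3 for 3:1.
    """
    x = np.asarray(x, dtype=float)
    mag = np.abs(x)
    above = delta ** (1.0 - 1.0 / r) * mag ** (1.0 / r)
    return np.sign(x) * np.where(mag <= delta, mag, above)

def soft_knee_db(x_db, t_db, r, w_db):
    """Same curve in dB, with a quadratic transition of width w_db (assumed)."""
    x_db = np.asarray(x_db, dtype=float)
    over = x_db - t_db
    above = t_db + over / r
    knee = x_db + (1.0 / r - 1.0) * (over + w_db / 2.0) ** 2 / (2.0 * w_db)
    return np.where(2.0 * over < -w_db, x_db,
                    np.where(2.0 * over > w_db, above, knee))

# Threshold -20 dB, ratio 3:1. An input 12 dB above the threshold should
# come out 4 dB above it, i.e. at roughly -16 dB.
y = hard_knee(db_to_lin(-8.0), db_to_lin(-20.0), r=3.0)
print(lin_to_db(y))
```

At the edges of the knee region the quadratic segment matches both the value and the slope of the two straight lines, so the curve in dB becomes differentiable.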
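Attack and release have an audible impact on different sounds like kicks, snares and vocals. The attack time determines how quickly the compressor starts reducing the gain once the signal exceeds the threshold, and the release time determines how long it keeps working after the signal has fallen below the threshold. Both are usually implemented by smoothing the envelope (or the gain) with different time constants for rising and falling levels. If the compressor should start working before the threshold is reached, you will have to use some sort of look-ahead, i.e. delay the audio relative to the envelope.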
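A minimal sketch of attack and release, assuming a one-pole smoother on the rectified signal rather than look-ahead: the envelope follows rising levels with the attack time constant and falling levels with the release time constant. The name `envelope_follower` and the default times are illustrative only.

```python
import numpy as np

def envelope_follower(x, fs, attack_ms=5.0, release_ms=100.0):
    """Smoothed envelope of x with separate attack and release times.

    fs: sample rate in Hz. The coefficients are those of a one-pole
    low-pass with the given time constants (assumed design).
    """
    a_att = np.exp(-1.0 / (fs * attack_ms * 1e-3))
    a_rel = np.exp(-1.0 / (fs * release_ms * 1e-3))
    env = np.zeros(len(x))
    level = 0.0
    for n, s in enumerate(np.abs(x)):
        # Rising input: use the fast attack coefficient; falling: release.
        coeff = a_att if s > level else a_rel
        level = coeff * level + (1.0 - coeff) * s
        env[n] = level
    return env

# A burst of 100 samples at full scale followed by silence, fs = 1 kHz.
burst = np.concatenate([np.ones(100), np.zeros(100)])
env = envelope_follower(burst, fs=1000.0)
print(env[4], env[99], env[199])
```

The resulting `env` plays the role of $x$ in $y = g(x)$. The gain applied to the audio is then $y / x$ wherever $x > 0$.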
As all amplitudes above $\delta$ are attenuated, the available dynamic range is not fully exploited. This is corrected by the so-called "make-up gain", which is just a multiplication of the compressed signal by a gain factor $G>1$. By first reducing the dynamic range and then amplifying the signal, compressors can make music appear "louder".
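One possible way to choose $G$, assuming the hard-knee curve above and levels normalised so that full scale is 1, is to map a full-scale input back to full scale: $G = 1/g(1) = \delta^{1/r - 1}$. This is only one convention. In practice the make-up gain is often set by ear.

```python
import math

def makeup_gain(delta, r):
    """Gain that restores a full-scale input to full scale: G = 1 / g(1).

    Assumes the hard-knee curve g(x) = delta**(1 - 1/r) * x**(1/r) for
    x > delta, with 0 < delta < 1 and r > 1.
    """
    return delta ** (1.0 / r - 1.0)

# Threshold -20 dB (delta = 0.1) and ratio 4:1: full scale is pulled down
# by 15 dB, so the make-up gain is about 15 dB.
G = makeup_gain(0.1, 4.0)
print(20.0 * math.log10(G))
```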