To clear up a few misconceptions:
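First, there has never been a 56k “baud” modem. Baud counts symbols (signal state changes) per second, not bits per second, and dial-up symbol rates never got anywhere near 56,000: they topped out at a few thousand symbols per second. Higher bit rates came from more sophisticated encoding that packs more bits into each symbol.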
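The difference between baud and bit rate is plain arithmetic: bit rate = symbol rate x bits per symbol. Here is a minimal sketch of it. The function name is made up for illustration, and the two modem figures are commonly cited textbook values (300 bit/s at 300 baud, 2400 bit/s at 600 baud), not numbers from this post.

```python
def bits_per_symbol(bit_rate_bps, baud):
    """Bits carried by each symbol: bit rate divided by symbol rate."""
    if baud <= 0:
        raise ValueError("baud must be positive")
    return bit_rate_bps / baud

# Illustrative figures (assumed, see lead-in):
# one bit per symbol, so bit rate and baud happen to match.
early_modem = bits_per_symbol(300, 300)

# Same kind of line, but each symbol now encodes several bits,
# so the bit rate is a multiple of the baud rate.
denser_encoding = bits_per_symbol(2400, 600)

print(early_modem, denser_encoding)
```

A "56k" rate is a bit rate in exactly this sense: the product of a much lower symbol rate and several bits per symbol.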
Second, human hearing perceives not only fundamental tones but also harmonics well above the fundamental. When that harmonic content is removed, the audio sounds less natural and less pleasing. Audio sampled at more than 8 kHz is both more intelligible and more pleasing to the ear.
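To put rough numbers on the harmonic loss, this sketch lists which harmonics of a tone fit below half the sample rate. The 1 kHz fundamental, the 44.1 kHz comparison rate, and the function name are assumptions chosen for illustration. It says nothing about how audible each harmonic is.

```python
def surviving_harmonics(fundamental_hz, sample_rate_hz, max_order=30):
    """Harmonics (fundamental included) that lie strictly below half the sample rate."""
    limit_hz = sample_rate_hz / 2
    return [
        n * fundamental_hz
        for n in range(1, max_order + 1)
        if n * fundamental_hz < limit_hz
    ]

# Assumed example: a 1 kHz tone at two sample rates.
at_8k = surviving_harmonics(1000, 8000)     # telephone-style sampling
at_44k = surviving_harmonics(1000, 44100)   # CD-style sampling (assumed for comparison)

print(len(at_8k), "harmonics fit at 8 kHz;", len(at_44k), "fit at 44.1 kHz")
```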
Third, Nyquist's 2x figure is a limit, not a working sample rate. The theorem requires a sample rate strictly greater than twice the highest frequency in the signal. At exactly 2x, everything depends on where the samples land: if they fall on the peaks and troughs you capture the full amplitude, but in the real world they can land at any offset in time. For example, if you sample a sine wave at exactly twice its frequency and every sample falls 90 degrees away from a peak, that is, on the zero crossings, your data will suggest a straight line rather than a wave. That is why practical systems sample at more than the bare 2x. For fundamental tones this is critical. For harmonic content, it is more of a nice-to-have, with diminishing returns near the top end of the audible range. Nyquist applied to audio processing is one of the most poorly interpreted theorems out there.
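The offset problem at exactly 2x is easy to reproduce. This is a minimal sketch, assuming a unit-amplitude 1 kHz tone; the function names and the 2.5x comparison rate are picked for illustration. The offset is measured in degrees from a peak of the wave.

```python
import math

def sample_tone(freq_hz, sample_rate_hz, offset_deg, n_samples=8):
    """Sample a unit-amplitude tone; offset_deg is the first sample's distance from a peak."""
    offset = math.radians(offset_deg)
    return [
        math.cos(2 * math.pi * freq_hz * k / sample_rate_hz - offset)
        for k in range(n_samples)
    ]

def peak(samples):
    """Largest absolute sample value, i.e. the amplitude the samples reveal."""
    return max(abs(s) for s in samples)

f = 1000.0  # assumed test tone

on_peaks = sample_tone(f, 2 * f, offset_deg=0)     # exactly 2x, samples hit peaks and troughs
on_zeros = sample_tone(f, 2 * f, offset_deg=90)    # exactly 2x, samples hit zero crossings
above_2x = sample_tone(f, 2.5 * f, offset_deg=90)  # same offset, rate above 2x

print("2x on peaks:  ", peak(on_peaks))
print("2x on zeros:  ", peak(on_zeros))
print("2.5x, same offset:", peak(above_2x))
```

At exactly 2x the zero-crossing case comes out as a flat line (up to floating-point noise). With the rate raised above 2x, the samples drift across the waveform, so the same starting offset no longer hides the tone.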