
I want to make a simple .wav player in C# for learning purposes. I want to get more insight into how audio is stored and played on the computer so I want to play a .wav manually rather than with the simple call of a built in function.

I've looked at the structure of .wav files and found some great resources. What I've found is that the wav file format stores the sound data starting from the 44th byte. The earlier bytes contain metadata about channels and sample rates, but that is not relevant to my question.

I found that this data is a soundwave. As far as I know, the height of a sample of a soundwave represents its frequency. But I don't get where the timbre comes from. If I only played tones at the correct frequencies for the correct durations, I would get beeps. I could play those simply with System.Console.Beep(freq, duration); but you could hardly call that music.

I have tried looking through multiple resources, but they only described the metadata and didn't cover what exactly is in the sound byte stream. I found a similar question and answer on this site, but it doesn't really answer that question; I believe that's why it isn't even marked accepted.

What is the data exactly in the wave byte stream and how can you make that into an actual played sound on the computer?

66Gramms

2 Answers


You are mistaken: the height of a sample does not represent a frequency. As a matter of fact, the wav format doesn't store frequencies at all. wav basically works in the following way:

  • An analog signal is sampled at a specific frequency. A common frequency for wav is 44,100 Hz, so 44,100 samples will be created each second.
  • Each sample stores the height (amplitude) that the analog signal has at the sample time. A common wav format is the 16-bit format, where 16 bits are used to store the height of the signal.
  • This all occurs separately for each channel.

I'm not sure in which order the data is stored, but maybe some of the great resources you found will help you with that.
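To make those offsets concrete, here is a minimal sketch (in Python for brevity; the same byte offsets apply if you read the file with C#'s BinaryReader). It assumes a plain 16-bit PCM file with the canonical 44-byte header and no extra chunks, and decodes everything from byte 44 onward as signed 16-bit little-endian samples:

```python
import struct

# Minimal 16-bit PCM WAV reader: pull the channel count and sample
# rate out of the 44-byte header, then decode the data chunk as
# signed 16-bit little-endian samples.
def read_wav_samples(raw: bytes):
    # Bytes 22-23: channel count; bytes 24-27: sample rate (little-endian)
    channels = struct.unpack_from("<H", raw, 22)[0]
    sample_rate = struct.unpack_from("<I", raw, 24)[0]
    # Byte 44 onward: the sample data itself, 2 bytes per sample
    count = (len(raw) - 44) // 2
    samples = struct.unpack_from("<%dh" % count, raw, 44)
    return channels, sample_rate, samples
```

Each value you get back is one "height" reading of the signal, in the range -32768..32767.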

Xaver
  • And so what does the height of the signal (that 16 bit, one sample) represent exactly? How would you make that 16 bit into a unique sound? – 66Gramms Nov 27 '21 at 20:49

Adding to the above answer: the height of the sample is the volume when played back. It represents how far back or forward the speaker cone is pulled or pushed to re-create the vibration.

The timbre you refer to is determined by the shape of the audio wave, i.e. the mix of frequencies it contains, not by any single frequency.

There is a lot going on in audio. Even a simple drumbeat produces sound at several frequencies at once, including harmonics (vibrations at multiples of the base frequency), but all of this is off topic for a programming site, so you will need to research sound, frequencies and perhaps DSP.
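The harmonics point is easy to demonstrate in code. The two signals below have the same pitch (440 Hz) but different timbres, because only the harmonic weights differ; the specific weights here are arbitrary, just for illustration (Python for brevity, the same math works in C#):

```python
import math

# Build samples for a tone made of a fundamental plus harmonics.
# `harmonics` maps each multiple of the base frequency to its weight.
def synth(freq, harmonics, sample_rate=44100, duration=0.01):
    samples = []
    for i in range(int(sample_rate * duration)):
        t = i / sample_rate
        samples.append(sum(w * math.sin(2 * math.pi * freq * k * t)
                           for k, w in harmonics.items()))
    return samples

pure = synth(440, {1: 1.0})                   # a plain beep
rich = synth(440, {1: 1.0, 2: 0.5, 3: 0.25})  # same pitch, fuller timbre
```

Played back, `pure` sounds like Console.Beep; `rich` sounds noticeably warmer, even though both are "440 Hz".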

What you need to know from a computer's perspective is that sound is stored as samples taken at a set rate; as long as we sample at twice the highest frequency we wish to capture, we will be able to reproduce the original (the Nyquist rate). Each sample records the level (amplitude) of the audio at that moment in time; turning the samples back into audio is the job of the Digital-to-Analogue Converter (DAC) found on your sound card.
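In code terms: before the samples reach the DAC they are stored as integers, so a synthesised amplitude in the range -1.0..1.0 has to be scaled and clamped to the signed 16-bit range. A small sketch, assuming 16-bit PCM (Python shown; C#'s short and BitConverter do the same job):

```python
import struct

# Scale float amplitudes in [-1.0, 1.0] to signed 16-bit PCM bytes,
# clamping anything that overshoots the range.
def to_pcm16(samples):
    ints = [max(-32768, min(32767, round(s * 32767))) for s in samples]
    return struct.pack("<%dh" % len(ints), *ints)
```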

The operating system looks after passing the samples to the hardware via the appropriate driver. On Windows, WASAPI and ASIO are two APIs you can use to pass the audio to the sound card. Look at open-source projects like NAudio to see the code required to call these operating system APIs.
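If you just want to hear your own samples before tackling WASAPI or ASIO directly, one low-effort route is to write them into a .wav container and let the OS's default player handle the DAC side. A sketch using Python's standard-library wave module (the filename and tone are arbitrary; in C# you would write the 44-byte header yourself and append the packed samples):

```python
import math
import struct
import wave

SAMPLE_RATE = 44100

# One second of a 440 Hz sine at half amplitude, packed as 16-bit PCM.
frames = b"".join(
    struct.pack("<h", int(32767 * 0.5 * math.sin(2 * math.pi * 440 * i / SAMPLE_RATE)))
    for i in range(SAMPLE_RATE))

with wave.open("tone.wav", "wb") as w:
    w.setnchannels(1)            # mono
    w.setsampwidth(2)            # 2 bytes = 16-bit samples
    w.setframerate(SAMPLE_RATE)  # 44,100 samples per second
    w.writeframes(frames)
```

Opening tone.wav in any media player will play the beep, which confirms the byte stream really is just a list of amplitudes.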

I hope this helps; I suspect the topic is broader than you first imagined.

Max Healey