What is the equivalent of a pixel in the realm of audio? I know that pixels are the basic components which are used to form a picture so what basic component is used to form a sound?
5 Answers
It is effectively a sample, but the comparison isn't quite exact. A photo holds intensity measurements along two axes, with each pixel carrying an intensity value for red, green and blue. A sound sample is the measure of the intensity of an audio signal at a moment in time, so it is kind of like a cross between a pixel and a frame. There are generally tens of thousands of samples taken per second, and from them we can determine what the actual analog wave looked like, provided the frequencies in the wave are below the Nyquist frequency (half the sampling rate).
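The requirement that frequencies stay below half the sampling rate can be checked numerically. The sketch below is only an illustration: the 44,100 Hz rate, the two test frequencies, and the helper name `sample_cosine` are assumptions chosen for the example, not something from the answer. It shows that a tone above the limit produces exactly the same samples as a lower tone, so the original wave can no longer be told apart from its alias.

```python
import math

SAMPLE_RATE = 44_100        # samples per second (assumed, CD-style rate)
NYQUIST = SAMPLE_RATE / 2   # highest frequency the samples can represent

def sample_cosine(freq_hz, num_samples, sample_rate=SAMPLE_RATE):
    """Measure a unit-amplitude cosine at evenly spaced moments in time."""
    return [math.cos(2 * math.pi * freq_hz * n / sample_rate)
            for n in range(num_samples)]

# 14,100 Hz is below the limit; 30,000 Hz is above it and
# folds back to 44,100 - 30,000 = 14,100 Hz once sampled.
below = sample_cosine(14_100, 64)
above = sample_cosine(30_000, 64)

# The two sample sequences agree up to floating-point rounding.
max_difference = max(abs(a - b) for a, b in zip(below, above))
print(f"Nyquist frequency: {NYQUIST} Hz, max difference: {max_difference:.2e}")
```

This is why the comparison with the original analog wave only holds for content below half the sampling rate.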
Bit depth is an even closer parallel between the two. In a pixel, the bit depth determines the number of colors that can be represented by the pixel, and thus how close it can get to the original color. Similarly, the bit depth of a sample determines how many possible values can be used to record the intensity of the sound wave for that sample. More bits produce a sample closer to the exact shape of the original signal, or allow a greater range of signal intensities to be described.
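As a rough illustration of the bit-depth point, the following sketch rounds the same sine wave to 8-bit and 16-bit levels and compares the worst-case rounding error. The function names, the evenly spaced level layout, and the test signal are assumptions made for the example; real audio formats differ in detail (for instance, signed integer ranges are not perfectly symmetric).

```python
import math

def quantize(value, bit_depth):
    """Round a value in [-1.0, 1.0] to the nearest of 2**bit_depth evenly spaced levels."""
    levels = 2 ** bit_depth
    step = 2.0 / (levels - 1)           # spacing between neighbouring levels
    index = round((value + 1.0) / step)  # nearest level, counted from -1.0
    return index * step - 1.0

# One cycle of a sine wave, 50 samples long (hypothetical test signal).
signal = [math.sin(2 * math.pi * n / 50) for n in range(50)]

def max_error(bit_depth):
    """Largest difference between the signal and its quantized version."""
    return max(abs(s - quantize(s, bit_depth)) for s in signal)

error_8 = max_error(8)    # at most half a step of 2/255
error_16 = max_error(16)  # at most half a step of 2/65535
print(f"8-bit error: {error_8:.6f}, 16-bit error: {error_16:.8f}")
```

The 16-bit version tracks the original far more closely, in the same way that more bits per channel let a pixel land closer to the original color.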
-
Also note that audio is anti-alias filtered before sampling, while images (spatial aliasing) and video (temporal aliasing) are not. – endolith Aug 06 '14 at 21:46
-
Actually, photos are generally anti-aliased, as is at least some video (since some video is shot on similar sensor designs that utilize an anti-aliasing filter) – AJ Henderson Aug 06 '14 at 21:55
-
How is that done? – endolith Aug 06 '14 at 21:59
-
@endolith - with an anti-aliasing filter. They are sometimes removed for certain uses, like astrophotography, but they help prevent moire, so they serve a very analogous function. – AJ Henderson Aug 06 '14 at 22:02
-
Huh. I thought they were only used in special applications, and that Moire is more common – endolith Aug 07 '14 at 01:52
-
Moire does still happen but the aa filter reduces it significantly – AJ Henderson Aug 07 '14 at 04:27
-
@endolith It's only some high-end high-resolution cameras which don't have anti-aliasing filters. Like, you can get the DSC-RX1 from Sony as a DSC-RX1R without antialiasing filter. Modern sensors use microlensing for better light yield; that tends to be done in a manner that partly serves antialiasing purposes as well. – user107063 Oct 06 '23 at 12:12
If we're talking about digital audio, which would make sense given the pixel analogy, then the equivalent is a sample.
"Sample" is the obvious answer, but it falls short in several respects.
Before the following analysis, we must bear in mind that sound should be compared with moving images (a.k.a. video) and not with photographs. Photographs are static in time, which is something sound can never be.
First of all, the pixel is an attribute of the reproducing device. By this I mean that the pixel is found on the screen or, more precisely, in the projection mechanism of the transducer used to convert digital signals to light. There are samples of moving images too. Whatever originates in the analogue domain and has to be converted to the digital one must undergo a sampling procedure. Thus, the sample is not analogous to the pixel, because there are samples in video as well (they go by a different name, but they are still samples). In addition, the sample refers to the signal itself and not to the reproducing device, whereas the pixel refers to the device (especially for vector graphics).
Also, you cannot hear a single sample. You cannot even hear the actual samples, because the movement of the diaphragm in your electroacoustic transducers (a.k.a. loudspeakers) adds many elements that are not present in the samples of the digital form of the sound.
Thus, there is no such thing as the audio equivalent of a pixel.
Finally, a pixel is a fixed point in space with different values for its light. If you could have 800x600 loudspeakers arranged in a speaker wall, then each loudspeaker could be a pixel.
-
You misunderstand the definition of a pixel. That "sample" in the video/picture domain IS a pixel. A pixel is the basic record of a visual sample (ok, technically 3 samples in most cases) in an image or video file. It also happens to be the term that is often used to describe the hardware that displays that point in a display, which can lead to some confusion. – AJ Henderson Aug 15 '14 at 15:31
-
Vector graphics don't use pixels; that's the entire point of vector graphics. They are mathematical representations of the graphic (hence vectors). Separately, on the playback point, it is true that you could display a single pixel, but it isn't meaningful on its own, so I'm not sure that the argument about not being able to play back an audio sample really applies either. (You also can't play back a pixel without other information like color space.) – AJ Henderson Aug 15 '14 at 15:35
-
In addition to Henderson's great replies - you can hear a single sample just fine. It does have to be of low enough frequency, and you will need a speaker that can handle it and turn off the Nyquist, but it is there and it will be square. With two samples, you will get half a sine-wave if you apply the same measures, save for the Nyquist-filter. – Christian van Caine Aug 16 '14 at 10:20
-
@ChristianvanCaine No. You cannot hear a sample. A 16-bit sample is just 16 zeros and ones which represent a level of electrical current. For one second of sound you need (at 44.1 kHz quality) 44100 of them. One sample is just 1/44100 of a second long. If you can hear this, then I rest my case. Also, such a short excitation of the speaker should not cause a vibration and you will not hear anything. – Xxxo Aug 16 '14 at 22:50
-
@AJHenderson, I could agree with you if you could just show me a pixel of a photograph while it is an array in MATLAB. This photograph (of which you cannot show a pixel while it is an array in MATLAB) can be shown at 800x600, 1200x400, and many other resolutions. My point is that the pixel is an attribute of the projection/presentation mechanism and not of the signal itself, whereas a sample in an audio signal is a basic element of the signal. Thus, they cannot be contrasted and compared. – Xxxo Aug 16 '14 at 22:53
-
And even if the array elements of an 800x600 greyscale photo are thought of as pixels, they are just elements of an array and nothing more. They r e p r e s e n t pixels; they are not actually pixels. – Xxxo Aug 16 '14 at 22:56
-
@Kostas - You can play back the same sample value for a second as well. It isn't audible, but a square wave of a set position will be played by the speaker. A single point of color information (ie, a pixel), regardless of how many actual screen pixels you use to display it, is just a block of meaningless color in relationship to the image it is part of. You have to have many, many points of light information (pixels) to have anything meaningful from the data, just like you do with audio samples. Read the definition of a pixel, the elements in the array of a raster image ARE pixels. – AJ Henderson Aug 17 '14 at 02:59
-
@AJHenderson Wikipedia says: "In digital imaging, a pixel, pel,[1] or picture element[2] is a physical point in a raster image". N.B. "P h y s i c a l". Digital values ARE NOT physical. So, if we go by your point about the raster definition, you prove exactly what I am saying. A pixel is a PHYSICAL representation whereas a sample is NOT. Thus, pixel != sample. The Wikipedia page is: http://en.wikipedia.org/wiki/Pixel – Xxxo Aug 17 '14 at 10:22
-
@Kostas - at that point you are really arguing semantics. I'll leave it at this since this is becoming a discussion rather than an attempt to improve an answer at this point (I'd be happy to continue in chat if you want.) A pixel is a minimal data element that represents the amplitude of a signal (light value) at a physical location (both time and space) in an image. An audio sample is a minimal data element of a signal (measure of rate of change of a transducer) at a physical location (in time). A sample is as much a "physical" representation as a pixel and just as context dependent. – AJ Henderson Aug 17 '14 at 17:26
-
You may also find this link helpful as it is actually an academic resource which gives a better description of an image as a collection of point samples. The generation of a digital image or video is a sampling process. – AJ Henderson Aug 17 '14 at 17:50
Yes, I agree the audio sample is a solid answer; however, it does not cover all the bases. Here is another perspective.
A pixel is to an image as a sine wave is to sound. Just as any possible image can be synthesized from a collection of pixels, any possible audio can be synthesized from a set of sine waves (along with their amplitudes and phases).
About 200 years ago, Joseph Fourier gave us the insight that you can combine a set of pure sine waves (each with its own amplitude and phase shift) to synthesize any possible sound (the inverse Fourier transform). Perhaps more subtly, he also laid the mathematical groundwork to do the reverse: any arbitrary sound (or, more generally, any time series) can be uniquely defined by a set of frequencies together with their amplitudes and phases (the Fourier transform).
A typical pixel can be stored as (R, G, B, A), where each of those channels has one byte of resolution and so 256 distinct values per channel (color science is vast, so this is only a tiny glimpse of how a pixel may be represented). Greater fidelity can be achieved by using more bytes per channel.
To boost the fidelity of an arbitrary audio signal, you include a greater number of distinct sine wave frequencies (and their amplitudes and phases).
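The round trip between samples and sine-wave components can be sketched with a discrete Fourier transform. This is a deliberately naive, standard-library-only version for a short block of samples; the function names `dft` and `inverse_dft`, the block length, and the two-tone test signal are assumptions for illustration, and a real application would use an FFT library instead.

```python
import cmath
import math

def dft(samples):
    """Decompose real samples into complex frequency components (naive O(n^2) DFT)."""
    n = len(samples)
    return [sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def inverse_dft(spectrum):
    """Rebuild the real samples by summing the sinusoids described by the spectrum."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]

N = 64  # block length (assumed)
# Hypothetical test signal: two sine waves with different amplitudes and phases.
samples = [math.sin(2 * math.pi * 3 * t / N) + 0.5 * math.sin(2 * math.pi * 7 * t / N + 0.4)
           for t in range(N)]

spectrum = dft(samples)
rebuilt = inverse_dft(spectrum)

# Reconstruction matches the original up to floating-point rounding.
round_trip_error = max(abs(a - b) for a, b in zip(samples, rebuilt))

# For a real signal, the amplitude of the sinusoid in bin k is 2*|X[k]|/N.
amplitude_bin_3 = 2 * abs(spectrum[3]) / N
amplitude_bin_7 = 2 * abs(spectrum[7]) / N
print(round_trip_error, amplitude_bin_3, amplitude_bin_7)
```

For a block of N samples, N frequency components are enough to reproduce it exactly, which is the discrete counterpart of the claim that a sound is fully described by its frequencies, amplitudes, and phases.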
From a practical standpoint, I routinely convert short audio segments into printable images that can then be scanned and converted back into audio. I do this by opening the audio in Audacity and exporting it as a raw file (no header, obviously). This file can be opened and printed like any other pixel-based image. I later scan the print, open it in Photoshop and save it as a raw file. Open this file in Audacity and it will play as the original audio. The major limitations are the print resolution and the need for pixel-by-pixel correspondence between the original image and the scanned image. I don't know how well known this method is, but I got it from a YouTube video.
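The core idea, that headerless audio data and headerless grayscale image data are both just runs of bytes, can be sketched without either program. The code below is not the Audacity/Photoshop workflow itself and does not model printing or scanning. The 8-bit unsigned encoding, the 16-pixel row width, and the function names are all assumptions made for the example.

```python
import math

WIDTH = 16  # pixels per image row (assumed)

def audio_to_rows(samples, width=WIDTH):
    """Map samples in [-1.0, 1.0] to rows of 8-bit grayscale pixel values."""
    pixels = bytes(round((s + 1.0) / 2.0 * 255) for s in samples)
    # Pad the last row with mid-gray, which decodes to near-silence.
    padding = (-len(pixels)) % width
    pixels += bytes([128]) * padding
    return [pixels[i:i + width] for i in range(0, len(pixels), width)]

def rows_to_audio(rows, num_samples):
    """Flatten the pixel rows and map each gray value back to a sample."""
    flat = b"".join(rows)[:num_samples]
    return [p / 255 * 2.0 - 1.0 for p in flat]

# Hypothetical short audio segment: 100 samples of a sine wave.
samples = [math.sin(2 * math.pi * n / 25) for n in range(100)]

rows = audio_to_rows(samples)
recovered = rows_to_audio(rows, len(samples))

# The only loss in this idealized round trip is 8-bit rounding.
worst_error = max(abs(a - b) for a, b in zip(samples, recovered))
print(f"{len(rows)} rows of {WIDTH} pixels, worst error {worst_error:.4f}")
```

In the real workflow, any pixel that shifts position or changes value during printing and scanning becomes a wrong sample, which is why the pixel-by-pixel correspondence matters so much.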