I've seen a few publications that feed an RGB image of a spectrogram to a neural net, and at least one claim that a network performs better with RGB input than with grayscale or the raw spectrogram.
A spectrogram is fundamentally a 2D representation in which each point is a non-negative real value. Converting it to RGB adds no information. Worse, it introduces a dependence on the choice of colormap, which is just noise${}^{1}$. It's worse than converting grayscale images to RGB, as it breaks the spectrogram's spatial dependencies by splitting intensity across channels.
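To make the channel-splitting concrete, here's a minimal sketch (assuming numpy and matplotlib; the spectrogram is a random stand-in) of what a colormap conversion does to the array:

```python
import numpy as np
import matplotlib.pyplot as plt

# Stand-in for |STFT|: one non-negative real value per (freq, time) bin
Sx = np.abs(np.random.randn(128, 256))
Sx_norm = Sx / Sx.max()  # colormaps expect values in [0, 1]

# One scalar per bin becomes three channel values, coupled by the colormap
rgb = plt.get_cmap('turbo')(Sx_norm)[..., :3]
print(Sx.shape, '->', rgb.shape)  # (128, 256) -> (128, 256, 3)

# Same data under a different colormap -> a different network input
rgb2 = plt.get_cmap('viridis')(Sx_norm)[..., :3]
print(np.abs(rgb - rgb2).mean())  # nonzero: the representation changed
```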
Why would a spectrogram saved as RGB ever outperform a raw spectrogram?
Clarification: I didn't originally realize this, but "RGB image" implies "image", meaning there's a conversion step that compresses and reshapes the raw spectrogram. Additionally, it's not just any RGB in the sense of $\text{R} \approx \text{G} \approx \text{B}$, but a color mapping for intensity heatmaps, like turbo (`plt.imshow(np.arange(9)[None], cmap='turbo')`):
[image: the turbo colormap strip produced by the snippet above]
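For concreteness, a hedged sketch of a typical "save spectrogram as RGB image" pipeline (assuming PIL and matplotlib; the filename and the 224x224 target size are made up):

```python
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image

Sx = np.abs(np.random.randn(1071, 1356))           # stand-in spectrogram
Sx_norm = (Sx - Sx.min()) / (Sx.max() - Sx.min())  # per-image min-max rescale

# colormap -> uint8 quantization -> resize: each step loses information
rgb = (plt.get_cmap('turbo')(Sx_norm)[..., :3] * 255).astype(np.uint8)
img = Image.fromarray(rgb).resize((224, 224))      # lossy reshape/compression
img.save('spectrogram.png')

decoded = np.asarray(Image.open('spectrogram.png'))  # what the NN actually sees
print(decoded.shape, decoded.dtype)                  # (224, 224, 3), uint8
```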
It's possible the image isn't reshaped, in which case there's no compression; but even if a colormap isn't specified, that doesn't mean there's no color mapping: what matters is how the array values of the raw spectrogram compare to those decoded from the image and fed to the NN.
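A quick way to run that comparison (a sketch assuming matplotlib and PIL): `plt.imsave` applied to a 2D array silently uses the default colormap (viridis), so "no colormap specified" is not "no color mapping":

```python
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image

Sx = np.abs(np.random.randn(64, 64))
plt.imsave('spec.png', Sx)  # no cmap argument, yet viridis is applied

decoded = np.asarray(Image.open('spec.png'))[..., :3] / 255.0
r, g, b = decoded[..., 0], decoded[..., 1], decoded[..., 2]
print(np.allclose(r, g, atol=0.05))  # False: channels differ, so a colormap acted

# The check that actually matters: raw values vs. what the NN receives
Sx_norm = (Sx - Sx.min()) / (Sx.max() - Sx.min())
print(np.abs(Sx_norm - r).mean())    # raw spectrogram != any single channel
```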
1: that was my impression at the time; it's the case when $R \approx G \approx B$, but otherwise definitely not. Depending on the colormap, though, it can still be noise (or worse).
Example pub with good results, but there's reason to suspect incompetence, per e.g. "[1356x1071] images were lossless scaled to 32x32", which is impossible: $1356 \times 1071 \approx 1.45$ million values cannot be stored in $32 \times 32 = 1024$ values without loss. There's also no comparison with a grayscale approach, so we can't tell whether RGB outperformed.
There are some "trivial" explanations I'll list to avoid answers containing them:
- Transfer learning: using nets pretrained on RGB
- Architectures tailored specifically to maximize RGB utility
While these are valid explanations, they're no evidence that RGB is inherently better (see the sketch below for the first).
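For the transfer-learning case, a minimal sketch (assuming torchvision; the model choice is arbitrary) showing that a pretrained-on-RGB net can be reused by simply replicating the grayscale spectrogram across channels, with no colormap involved:

```python
import torch
import torchvision.models as models

# ImageNet-pretrained net; expects 3-channel input
net = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1).eval()

spec = torch.rand(1, 1, 224, 224)   # (batch, 1, freq, time) grayscale spectrogram
rgb_like = spec.repeat(1, 3, 1, 1)  # R = G = B: reuses pretrained weights, no colormap

with torch.no_grad():
    logits = net(rgb_like)
print(logits.shape)  # torch.Size([1, 1000])
```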

