The DCT, KLT and PCA are often said to be preferred because of their «energy compaction» property, whereby an N-element vector can be represented by K < N coefficients with minimal loss of accuracy.
If your signal compression system consists of run-length encoding with no limit on scalar accuracy, I can see why this is a useful property (clip small coefficients to zero, then code a long zero run efficiently at little cost in distortion).
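For concreteness, here is a minimal sketch of the property I mean, using SciPy's orthonormal DCT-II on a smooth, correlated test signal; the signal itself and the cutoff K = 16 are just illustrative choices, not anything from the quoted sources:

```python
import numpy as np
from scipy.fft import dct, idct

# A smooth, highly correlated signal -- the case where energy compaction is strongest
n = 256
t = np.linspace(0, 1, n)
x = np.sin(2 * np.pi * 3 * t) + 0.3 * np.sin(2 * np.pi * 7 * t)

# Orthonormal DCT-II: total energy is preserved, but concentrated in a few coefficients
c = dct(x, norm='ortho')

# Keep only the K largest-magnitude coefficients, zero the rest
k = 16
c_trunc = np.zeros_like(c)
idx = np.argsort(np.abs(c))[-k:]
c_trunc[idx] = c[idx]

# Reconstruct from the truncated coefficients and measure the relative error
x_hat = idct(c_trunc, norm='ortho')
err = np.linalg.norm(x - x_hat) / np.linalg.norm(x)
print(f"Relative reconstruction error with {k}/{n} coefficients: {err:.2e}")
```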
But for a more generic/complex chain of downstream processing, why is this beneficial? If you are going to quantize the coefficients anyway and you have sophisticated inter-coefficient redundancy techniques available, what is the benefit of a linear transform with energy compaction?
Edit: «It is well known that the Karhunen-Loeve transform (KLT) is the optimal transform in the sense that it provides the best energy compaction property.» https://arxiv.org/pdf/1908.10967.pdf

«You can often reconstruct a sequence very accurately from only a few DCT coefficients. This property is useful for applications requiring data reduction.» https://se.mathworks.com/help/signal/ref/dct.html