I've just begun a speech and language processing course, and I find the explanation of how to obtain the cepstrum of a signal, and of its properties, a little confusing. Below is my current understanding, followed by the point that confuses me:
- Start with the speech signal. We can think of it as the formant signal convolved with the excitation signal, which is approximately a Dirac comb with period T.
- Take the FFT, giving the spectrum of the excitation multiplied by the spectrum of the formants. The FFT of the excitation signal is another Dirac comb, with spacing 1/T.
- Take logs, so the two spectra above are now added rather than multiplied.
- Take the inverse Fourier transform. Because the Fourier transform is linear, the two signals from the first step should still be combined by addition.
So if those 4 steps are right, why does the excitation appear in a particular region of the quefrency domain? Shouldn't it emerge as a Dirac comb added to the formant impulse response?
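For concreteness, here is a minimal NumPy sketch of the four steps as I understand them. Everything specific in it is my own assumption for illustration and not something from the course: the sampling rate, the frame length, the 80-sample excitation period, the single made-up resonance standing in for the formants, the Hamming window, and the names `real_cepstrum`, `formant_ir` and `peak`. It takes the log of the magnitude spectrum (the real cepstrum), with a tiny offset to avoid `log(0)`.

```python
import numpy as np

fs = 8000   # sampling rate in Hz (assumed)
N = 1024    # frame length in samples (assumed)
T = 80      # excitation period in samples, i.e. a 100 Hz pitch (assumed)

# Step 1: excitation (impulse train) convolved with a formant impulse response.
excitation = np.zeros(N)
excitation[::T] = 1.0
n = np.arange(200)
# Made-up single decaying resonance near 700 Hz, standing in for the formants.
formant_ir = 0.95 ** n * np.cos(2 * np.pi * 700 / fs * n)
speech = np.convolve(excitation, formant_ir)[:N]


def real_cepstrum(x):
    """Inverse FFT of the log magnitude spectrum of a Hamming-windowed frame."""
    spectrum = np.fft.fft(x * np.hamming(len(x)))   # step 2: FFT
    log_mag = np.log(np.abs(spectrum) + 1e-12)      # step 3: log (offset avoids log(0))
    return np.fft.ifft(log_mag).real                # step 4: inverse FFT


cepstrum = real_cepstrum(speech)

# Locate the largest value in a quefrency range away from the first few samples.
lo, hi = T // 2, 3 * T // 2
peak = lo + int(np.argmax(cepstrum[lo:hi]))
print("largest cepstral value between", lo, "and", hi, "samples is at", peak)
```

Plotting `cepstrum[:N // 2]` against the sample index gives the quefrency-domain picture I am asking about.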