I'm currently working on voice activity detection (VAD), and I've started to wonder whether computer-generated speech (e.g. the output of a voice generator) can be discriminated from human speech with classic VAD features (e.g. 4 Hz energy modulation, zero-crossing rate, modulation entropy, short-time energy, MFCCs).
Or, from a mathematical point of view, are they essentially the same signals (or signals with very similar characteristics)?
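
To make concrete what I mean by "classic" features, here is a minimal sketch of two of them, short-time energy and zero-crossing rate, computed per frame with NumPy. The helper names (`frame_signal`, `short_time_energy`, `zero_crossing_rate`), the 25 ms / 10 ms framing at 16 kHz, and the synthetic tone and noise inputs are all my own assumptions for illustration; they are stand-ins, not real human or synthesized speech.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames, one frame per row."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def short_time_energy(frames):
    """Mean squared amplitude of each frame."""
    return np.mean(frames ** 2, axis=1)

def zero_crossing_rate(frames):
    """Fraction of adjacent sample pairs in each frame that change sign."""
    signs = np.signbit(frames)
    return np.mean(signs[:, 1:] != signs[:, :-1], axis=1)

# Synthetic stand-ins (assumption): a 200 Hz tone as a crude "voiced" signal
# and white noise as a crude "unvoiced / noise-like" signal.
sr = 16000
t = np.arange(sr) / sr
tone = 0.5 * np.sin(2 * np.pi * 200 * t)
rng = np.random.default_rng(0)
noise = 0.5 * rng.standard_normal(sr)

frame_len, hop = 400, 160  # 25 ms frames, 10 ms hop at 16 kHz (assumed)
tone_frames = frame_signal(tone, frame_len, hop)
noise_frames = frame_signal(noise, frame_len, hop)

tone_ste = short_time_energy(tone_frames).mean()
tone_zcr = zero_crossing_rate(tone_frames).mean()
noise_zcr = zero_crossing_rate(noise_frames).mean()

print(f"tone:  energy={tone_ste:.3f}, zcr={tone_zcr:.3f}")
print(f"noise: zcr={noise_zcr:.3f}")
```

The question is then whether running the same per-frame features on a human recording and on a synthesized recording of the same sentence would give distributions that differ enough to separate the two.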