Detect human speech in realtime audio on mobile phones

Question

I am looking to develop an Android app. As part of its functionality, the app needs to randomly sample 3-5 seconds of audio and classify it as containing human speech or not. I understand that this is called Voice Activity Detection (VAD)?

What would be the best way to implement this on a mobile phone? I developed a basic system using energy-based features and thresholds. I am hoping to find something less susceptible to noise, perhaps using features such as MFCCs or formants. I went through a number of papers, but most of them would require me to collect data and train models. Is there any library or framework I could use that would work in real time?
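For reference, an energy-and-threshold baseline of the kind described above might look roughly like the sketch below. It is only an illustration, not the actual code from the app: the class name `EnergyVad`, the method names, the 20 ms frame size, and the threshold values are all assumptions chosen for the example. It flags any sufficiently loud sound, not speech specifically, which is exactly the noise sensitivity in question.

```java
public final class EnergyVad {
    private EnergyVad() {}

    /** Root-mean-square level of one frame of 16-bit PCM, scaled to the range [0, 1]. */
    public static double rms(short[] pcm, int offset, int length) {
        double sumSquares = 0.0;
        for (int i = offset; i < offset + length; i++) {
            double s = pcm[i] / 32768.0; // normalise a 16-bit sample
            sumSquares += s * s;
        }
        return Math.sqrt(sumSquares / length);
    }

    /**
     * Returns true if at least minActiveFraction of the complete frames in the
     * clip have an RMS level above threshold. Both values are hypothetical
     * tuning parameters and would need calibrating per device and environment.
     */
    public static boolean containsActivity(short[] pcm, int frameSize,
                                           double threshold, double minActiveFraction) {
        int frames = pcm.length / frameSize; // a trailing partial frame is ignored
        if (frames == 0) {
            return false;
        }
        int active = 0;
        for (int f = 0; f < frames; f++) {
            if (rms(pcm, f * frameSize, frameSize) > threshold) {
                active++;
            }
        }
        return (double) active / frames >= minActiveFraction;
    }

    public static void main(String[] args) {
        int sampleRate = 16000;                   // assumed sample rate
        short[] clip = new short[sampleRate * 3]; // 3 s clip, initially silent
        // Put a 1 s, 200 Hz tone in the middle as a crude stand-in for voiced sound.
        for (int i = sampleRate; i < 2 * sampleRate; i++) {
            clip[i] = (short) (8000 * Math.sin(2 * Math.PI * 200 * i / sampleRate));
        }
        // 320 samples = 20 ms frames at 16 kHz.
        System.out.println(containsActivity(clip, 320, 0.02, 0.2));
    }
}
```

On Android the `short[]` buffer would typically be filled from the microphone (for example via `AudioRecord`) rather than synthesised as in `main`.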

score 1 · Answer 1 · answered Nov 29 '17 at 20:41


I believe the open-source Speex codec (http://www.speex.org/) has a VAD built in. Look through its source to see how it works and to get some implementation ideas, while obeying its license.

answered Nov 29 '17 at 20:41

VladP

