9

I am looking to develop an Android app. As part of its functionality, the app needs to sample 3-5 seconds of audio at random times and classify each sample as containing human speech or not. As I understand it, this is called Voice Activity Detection (VAD).

What would be the best way to implement this on a mobile phone? I developed a basic system using energy-based features and thresholds. I am hoping to find something less susceptible to noise, perhaps using features such as MFCCs or formants. I went through a number of papers, but most of them would require me to collect data and train models. Is there a library or framework I could use that works in real time?
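
For reference, an energy-and-threshold detector of the kind mentioned above can be sketched roughly as follows. This is only an illustration, not the actual code from the app: the class name `EnergyVad`, the frame size, the dB threshold and the "fraction of active frames" rule are all assumptions, and the input is assumed to be 16-bit mono PCM (for example, as read from Android's `AudioRecord`).

```java
/**
 * Minimal energy-based activity detector (illustrative sketch only).
 * All names and parameter values here are hypothetical.
 */
final class EnergyVad {

    /** RMS level of one frame in dB relative to full scale (0 dBFS = max amplitude). */
    static double rmsDb(short[] pcm, int offset, int length) {
        double sumSquares = 0.0;
        for (int i = offset; i < offset + length; i++) {
            double s = pcm[i] / 32768.0;   // normalise 16-bit sample to [-1, 1)
            sumSquares += s * s;
        }
        double rms = Math.sqrt(sumSquares / length);
        // Clamp to avoid log10(0) on digital silence.
        return 20.0 * Math.log10(Math.max(rms, 1e-10));
    }

    /**
     * Returns true if at least minActiveFraction of the frames
     * have an RMS level above thresholdDb.
     */
    static boolean isActive(short[] pcm, int frameSize,
                            double thresholdDb, double minActiveFraction) {
        int frames = pcm.length / frameSize;   // trailing partial frame is ignored
        if (frames == 0) {
            return false;
        }
        int active = 0;
        for (int f = 0; f < frames; f++) {
            if (rmsDb(pcm, f * frameSize, frameSize) > thresholdDb) {
                active++;
            }
        }
        return (double) active / frames >= minActiveFraction;
    }
}
```

With 16 kHz audio, a call such as `EnergyVad.isActive(samples, 320, -40.0, 0.3)` would use 20 ms frames. Any sufficiently loud sound passes this test, speech or not, which is exactly the noise problem described above.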

Dony George

1 Answer

1

I believe the open-source Speex codec (http://www.speex.org/) includes a VAD. Have a look at its source code to get some implementation ideas, while obeying its license.

VladP