I want to detect, in real time, a 1-2 word spoken phrase within continuous speech, speaker-independently, in hardware. The fewer gates, the better.
What is the best way to do this? Are the requirements feasible?
There is really no need to do this in dedicated hardware. You can do the same on any CPU, and it will be more flexible, more energy efficient, and updatable. If you want, you can put a simple computing core on your chip and run the detector on it. I don't think you'll gain anything from a pure hardware implementation.
For a good implementation of keyword spotting, check CMUSphinx (http://cmusphinx.sourceforge.net), which recently added efficient keyword spotting. It offers fixed-point arithmetic, noise suppression, and a tunable detection threshold: everything you need in a single package.
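To give a feel for what "fixed-point arithmetic" and a "tunable threshold" mean on a small core, here is a rough sketch of just the decision stage. It is not CMUSphinx code. It assumes some front end (not shown) already produces one keyword score per audio frame in the range [-1, 1); the names `to_q15` and `detect_keyword`, the Q15 format, and the smoothing factor are all made up for illustration.

```python
Q = 15  # Q15 fixed point: real value = integer / 2**15 (assumed format)


def to_q15(x: float) -> int:
    """Convert a float in [-1, 1) to a Q15 integer, saturating at the limits."""
    return max(-(1 << Q), min((1 << Q) - 1, int(round(x * (1 << Q)))))


def detect_keyword(scores_q15, threshold_q15, alpha_q15=to_q15(0.5)):
    """Return the frame indices where the smoothed score rises above the threshold.

    scores_q15    -- per-frame keyword scores as Q15 integers (hypothetical input)
    threshold_q15 -- tunable detection threshold as a Q15 integer
    alpha_q15     -- smoothing factor as a Q15 integer
    """
    smoothed = 0
    above = False
    hits = []
    for i, s in enumerate(scores_q15):
        # Exponential smoothing, smoothed += alpha * (s - smoothed),
        # using only integer multiply, shift, and add.
        smoothed += (alpha_q15 * (s - smoothed)) >> Q
        now_above = smoothed >= threshold_q15
        if now_above and not above:
            hits.append(i)  # report only the rising edge, once per detection
        above = now_above
    return hits


if __name__ == "__main__":
    frames = [to_q15(x) for x in [0.0, 0.1, 0.9, 0.9, 0.9, 0.1, 0.0, 0.0]]
    print(detect_keyword(frames, to_q15(0.6)))
    print(detect_keyword(frames, to_q15(0.95)))
```

Raising the threshold trades missed detections for fewer false alarms, and that is the knob you would tune for your phrase. Everything in the loop is integer multiply, shift, add, and compare, so it maps onto a very small core without a floating-point unit.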