Could anyone recommend good speech pre-processing or filtering methods that would improve the accuracy of my speech recognition? I have mostly worked on image processing, so I am not very familiar with speech pre-processing techniques.
I am currently using Google's Web Speech API for speech-to-text recognition. My current pipeline is:
- Take input from microphone
- Segment the audio using Voice Activity Detection. This is done since Google's API has a limit on the length of the segment it can process.
- Send the segment to Google's Web Speech API (on a separate thread)
- Print the response of the API
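
To make the segmentation step concrete, here is a minimal sketch of the kind of energy-based Voice Activity Detection I mean. It is not my exact code: the function names, the 20 ms frame size, the RMS threshold, and the 10-second cap are all illustrative assumptions, and the cap only stands in for whatever length limit the API actually enforces.

```python
import math


def frame_rms(samples, frame_len):
    """RMS energy of consecutive non-overlapping frames."""
    rms = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return rms


def segment_speech(samples, sample_rate, frame_ms=20,
                   threshold=0.02, max_segment_s=10.0):
    """Return (start, end) sample indices of detected speech segments.

    A frame counts as speech when its RMS exceeds `threshold` (assumed
    value, samples expected in [-1, 1]). Consecutive speech frames are
    merged, and a segment is cut once it reaches `max_segment_s` seconds.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    max_frames = max(1, int(max_segment_s * 1000 / frame_ms))
    energies = frame_rms(samples, frame_len)

    segments = []
    seg_start = None  # index of the first frame of the open segment
    for i, energy in enumerate(energies):
        is_speech = energy > threshold
        if is_speech and seg_start is None:
            seg_start = i
        if seg_start is None:
            continue
        if not is_speech:
            # Silence closes the open segment (end index is exclusive).
            segments.append((seg_start * frame_len, i * frame_len))
            seg_start = None
        elif i - seg_start + 1 >= max_frames:
            # Length cap reached: close the segment including this frame.
            segments.append((seg_start * frame_len, (i + 1) * frame_len))
            seg_start = None

    if seg_start is not None:
        # Audio ended while still inside a speech segment.
        segments.append((seg_start * frame_len, len(energies) * frame_len))
    return segments
```

Each returned `(start, end)` pair is then sliced out of the recording and sent to the API as one request. A fixed energy threshold like this is sensitive to background noise, which may itself be part of my problem.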
With this pipeline I get only about 60-65% transcription accuracy. Is there any way to improve this? Would pre-processing the audio (noise removal, filtering, etc.) before sending it to the API help?
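
For concreteness, this is the sort of pre-processing I have in mind: DC offset removal, a first-order pre-emphasis filter, and peak normalization. It is only a sketch in plain Python; the `preprocess` name, the 0.97 pre-emphasis coefficient, and the 0.9 target peak are assumed values, and I do not know whether any of these steps actually helps a cloud recognizer.

```python
def preprocess(samples, pre_emphasis=0.97, target_peak=0.9):
    """Apply DC removal, pre-emphasis, and peak normalization.

    `samples` is a list of floats, nominally in [-1, 1].
    """
    if not samples:
        return []

    # 1. Remove the DC offset by subtracting the mean.
    mean = sum(samples) / len(samples)
    centered = [x - mean for x in samples]

    # 2. Pre-emphasis: y[n] = x[n] - a * x[n-1], boosting high frequencies.
    emphasized = [centered[0]] + [
        centered[i] - pre_emphasis * centered[i - 1]
        for i in range(1, len(centered))
    ]

    # 3. Scale so the largest absolute sample equals target_peak.
    peak = max(abs(x) for x in emphasized)
    if peak == 0:
        return emphasized  # all-silent input, nothing to scale
    gain = target_peak / peak
    return [x * gain for x in emphasized]
```

I would apply this to each segment just before sending it to the API, and compare the accuracy with and without it.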