Could anyone recommend good speech pre-processing or filtering methods that would improve the performance of my speech recognition? I have primarily been doing image-processing work, so I am not very familiar with speech pre-processing methods.

I am currently using Google's Web Speech API for speech-to-text recognition. This is the current flow:

  • Take input from microphone
  • Segment the audio using Voice Activity Detection. This is done since Google's API has a limit on the length of the segment it can process.
  • Send the segment to Google's Web Speech API (on a thread)
  • Print the response of the API
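
For concreteness, the segmentation step can be sketched roughly as follows. This is a simplified energy-based stand-in rather than the exact Voice Activity Detection in use; the function name `segment_by_energy`, the frame length, the dB threshold, and the maximum segment length are all illustrative assumptions.

```python
import numpy as np


def segment_by_energy(signal, sample_rate, frame_ms=30,
                      threshold_db=-35.0, max_segment_s=10.0):
    """Return (start, end) sample indices of segments whose frame energy
    exceeds threshold_db. Assumes `signal` is float audio scaled to [-1, 1].
    Segments are cut at max_segment_s so each stays under a length limit."""
    frame_len = int(sample_rate * frame_ms / 1000)
    n_frames = len(signal) // frame_len
    max_frames = int(max_segment_s * 1000 / frame_ms)

    segments = []
    start = None  # index of the first frame of the open segment, if any
    for i in range(n_frames):
        frame = np.asarray(signal[i * frame_len:(i + 1) * frame_len],
                           dtype=np.float64)
        rms = np.sqrt(np.mean(frame ** 2))
        level_db = 20.0 * np.log10(rms + 1e-12)  # epsilon avoids log(0)
        voiced = level_db > threshold_db

        if voiced and start is None:
            start = i
        too_long = start is not None and (i - start + 1) >= max_frames
        if start is not None and (not voiced or too_long):
            # Close the segment; include the current frame only if voiced.
            end = i + 1 if voiced else i
            segments.append((start * frame_len, end * frame_len))
            start = None

    if start is not None:  # audio ended while a segment was still open
        segments.append((start * frame_len, n_frames * frame_len))
    return segments
```

Each returned pair can be used to slice the recording (`signal[start:end]`) before sending that slice to the API. A fixed energy threshold is fragile in noisy rooms, which is part of why I am asking about pre-processing.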

With this setup I get only 60-65% conversion accuracy. Is there any way to improve this? Would pre-processing the audio (noise removal, filtering, etc.) help?

patel deven
  • Please be clear about your question. DSP.SE is not a forum but a formal Q&A site. It welcomes all questions where you apply signal processing algorithms. The simplest answer to your question above is: now that you know the Google API gives 60% accuracy, try a better algorithm! Please ask only a specific question pertaining to a specific algorithm. – Dipan Mehta Jun 15 '15 at 14:46
  • I am not aware of 'regular' speech pre-processing methods (if they are used). Also, if someone has already used this API with more success, I would like to know what steps they took to make it work. – patel deven Jun 19 '15 at 09:57
  • OK. I have requested that the question be reopened. – Dipan Mehta Jun 19 '15 at 15:17
