How does Kaldi ASR compare with Mozilla DeepSpeech in terms of speech recognition accuracy (e.g., word error rate)?
2 Answers
5
Our word error rate on LibriSpeech’s test-clean set is 6.5%, which not only achieves our initial goal, but gets us close to human level performance. [...] (5.83% according to the Deep Speech 2 paper). On a MacBook Pro, using the GPU, the model can do inference at a real-time factor of around 0.3x, and around 1.4x on the CPU alone. (A real-time factor of 1x means you can transcribe 1 second of audio in 1 second.)
Bonus: the Facebook AI Research Automatic Speech Recognition Toolkit (Torch + Lua, BSD License) gets 4.8% WER on test-clean and 14.5% WER on test-other of the LibriSpeech corpus.
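For readers unfamiliar with the two metrics quoted above, here is a minimal Python sketch of how word error rate and real-time factor are commonly computed. It is not the scoring code used by Kaldi, DeepSpeech, or the FAIR toolkit: the function names `word_error_rate` and `real_time_factor` are made up for illustration, and the sketch assumes plain whitespace tokenization with no text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words.

    Assumption: words are separated by whitespace and compared exactly
    (no lowercasing or punctuation stripping).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the word-level edit distance between the reference
    # words processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Processing time divided by audio duration; below 1.0 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the") over 6 reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
    # Transcribing 10 s of audio in 3 s corresponds to the ~0.3x GPU figure quoted above.
    print(real_time_factor(3.0, 10.0))
```

Under this definition, a WER of 6.5% means roughly 6.5 word-level errors per 100 reference words, and a real-time factor of 1.4x means 1 second of audio takes about 1.4 seconds to transcribe.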
Franck Dernoncourt
- Since you found the answer, you could mark your own answer as "accepted". Otherwise, the question will keep getting bumped to the front page by SE bots. – prash Dec 30 '17 at 12:42
- @prash Good point, done – Franck Dernoncourt Jan 01 '18 at 08:26
- WER is not the only metric we should use to measure how one ASR library fares against another. A few other criteria: how well they fare in noisy scenarios, how easy it is to add vocabulary, what the real-time factor is, how robustly the trained model responds to changes in accent and intonation, etc. – absin Feb 19 '19 at 04:03
- @AbSin 100% agreed. – Franck Dernoncourt Feb 19 '19 at 04:04
3
Kaldi achieves a WER of 4.28%, whereas DeepSpeech gives 5.83% on LibriSpeech clean data. Check this out: https://github.com/syhw/wer_are_we
purushotam radadia