Translation of spoken words into text, usually with the help of computers.
Questions tagged [speech-recognition]
48 questions
8 votes, 2 answers
How does Kaldi compare with Mozilla DeepSpeech in terms of speech recognition accuracy?
How does Kaldi ASR compare with Mozilla DeepSpeech in terms of the speech recognition accuracy (e.g., in terms of word error rate)?
Franck Dernoncourt
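Word error rate is usually computed as the word-level Levenshtein (edit) distance between the reference transcript and the ASR hypothesis, divided by the number of reference words. Below is a minimal sketch in Python. It assumes both transcripts are already normalized (lowercase, punctuation stripped) and split on whitespace. The function name `word_error_rate` is illustrative and is not taken from Kaldi or DeepSpeech.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference prefix seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion of a reference word
                          curr[j - 1] + 1,     # insertion of a hypothesis word
                          prev[j - 1] + cost)  # substitution (or match if cost == 0)
        prev = curr
    return prev[len(hyp)] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the"): 2 edits / 6 words
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions are counted, WER can exceed 1.0 when a system outputs more words than the reference contains.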
4 votes, 2 answers
Is there any software that labels a WAV file with phonemes?
I have a WAV file that contains a subject's speech. The subject speaks one sentence at a time, and each sentence is followed by a short period of silence. I'm interested in analyzing the phonemes of that speech and the time at which each phoneme occurs. For instance, I am looking for…
cyberic
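Phoneme-level timing generally needs a phonetic model, which is beyond a short sketch. The recording described above does suggest a simpler first step: split it at the silent gaps between sentences. Below is a minimal energy-based sketch. It assumes the WAV file has already been decoded to mono float samples in [-1, 1]. The function name `split_on_silence` and the default frame length, RMS threshold and minimum-silence duration are hypothetical choices that would need tuning for a real recording.

```python
import math


def split_on_silence(samples, sample_rate, frame_ms=20, threshold=0.01, min_silence_ms=200):
    """Return (start_sec, end_sec) spans of non-silent audio.

    A frame counts as silent if its RMS amplitude is below `threshold`.
    A run of silent frames lasting at least `min_silence_ms` closes the current segment.
    """
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    min_silent_frames = max(1, math.ceil(min_silence_ms / frame_ms))
    n_frames = len(samples) // frame_len
    segments = []
    start = None     # frame index where the current voiced segment began
    silent_run = 0   # consecutive silent frames since the last voiced frame
    for f in range(n_frames):
        frame = samples[f * frame_len:(f + 1) * frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        if rms >= threshold:
            if start is None:
                start = f
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_silent_frames:
                end = f - silent_run + 1  # first silent frame after the voiced region
                segments.append((start * frame_len / sample_rate, end * frame_len / sample_rate))
                start, silent_run = None, 0
    if start is not None:  # recording ends while still inside a segment
        end = n_frames - silent_run
        segments.append((start * frame_len / sample_rate, end * frame_len / sample_rate))
    return segments


# Synthetic example: 0.5 s "speech", 0.3 s silence, 0.5 s "speech", 0.2 s silence at 1 kHz
signal = [0.5] * 500 + [0.0] * 300 + [0.5] * 500 + [0.0] * 200
print(split_on_silence(signal, 1000))
```

Each returned span could then be passed to a phoneme-labeling tool one sentence at a time.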
3 votes, 0 answers
What does "parsimonious representation" mean in the context of speech recognition?
I was studying LPC (linear predictive coding), and in this topic the author explained why LPC is widely used in speech recognition systems. One reason he gave was this:
The way in which LPC is applied to the analysis of…
Rameshwar.S.Soni
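In this context, "parsimonious" most likely means that LPC describes a whole frame of speech with only a handful of numbers. Instead of hundreds of raw samples, a frame is summarized by the p predictor coefficients of an all-pole model. Below is a minimal sketch of the autocorrelation method with the Levinson-Durbin recursion in pure Python. The function names and the model order used in the example are illustrative choices and are not taken from the quoted text.

```python
def autocorrelation(frame, max_lag):
    """r[k] = sum_n x[n] * x[n + k] for k = 0..max_lag (autocorrelation method)."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def lpc(frame, order):
    """Predictor coefficients a[1..order] with x[n] ~ sum_k a[k] * x[n - k],
    solved with the Levinson-Durbin recursion."""
    r = autocorrelation(frame, order)
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]               # prediction-error energy of the order-0 model
    for i in range(1, order + 1):
        if err <= 0:         # silent frame or perfect prediction: stop early
            break
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err  # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1 - k * k
    return a[1:]


# 200 samples of a decaying signal x[n] = 0.9 ** n are summarized by 2 numbers
frame = [0.9 ** n for n in range(200)]
print(lpc(frame, 2))  # approximately [0.9, 0.0]
```

The same idea scales to real speech: a frame of a few hundred samples is reduced to a model order of roughly a dozen coefficients, which is what makes the representation compact.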
1 vote, 1 answer
How important is speaker-balance for speech recognition models?
While working on ASR models, I have encountered several datasets with skewed distributions, where a small number of speakers make up a large part of the dataset.
The following image shows the extracted time spoken (log) per speaker from the Voxforge (de)…
Stefan Falk
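One way to quantify the imbalance described above is to measure what fraction of the total speaking time the top k speakers contribute. A crude mitigation is to cap each speaker's total duration. Below is a minimal sketch of both in Python. It assumes per-utterance `(speaker, seconds)` pairs have already been extracted from the corpus metadata. The function names, the cap and the example durations are hypothetical and are not Voxforge figures.

```python
from collections import Counter


def top_speaker_share(durations, k):
    """Fraction of total speaking time contributed by the k most-represented speakers."""
    total = sum(durations.values())
    top = sorted(durations.values(), reverse=True)[:k]
    return sum(top) / total


def cap_per_speaker(utterances, max_seconds):
    """Keep utterances in order, skipping any that would push a speaker past max_seconds."""
    used = Counter()
    kept = []
    for speaker, seconds in utterances:
        if used[speaker] + seconds <= max_seconds:
            used[speaker] += seconds
            kept.append((speaker, seconds))
    return kept


# Hypothetical per-speaker totals in seconds
durations = {"spk1": 3600, "spk2": 1800, "spk3": 300, "spk4": 120, "spk5": 60}
print(top_speaker_share(durations, 2))  # about 0.92: two speakers dominate
```

Capping trades total training data for balance. Whether that trade helps accuracy is exactly the open question the post raises.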
1 vote, 0 answers
How does Microsoft speech recognition API compare with Google Cloud Speech API in terms of speech recognition accuracy?
How does Microsoft speech recognition API compare with Google Cloud Speech API in terms of speech recognition accuracy (e.g., in terms of word error rate, character error rate (CER), or sentence error rate (SER))?
I'm also interested in other online…
Franck Dernoncourt
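Character error rate applies the same edit-distance idea as WER, but over characters instead of words. Sentence error rate is simply the fraction of sentences that are not transcribed exactly. Below is a minimal sketch in Python. It assumes transcripts are already normalized and paired up, and it computes corpus-level CER (total edits over total reference characters, spaces included). Some scoring tools instead average per utterance, so results may differ. All function names are illustrative.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings, word lists, ...)."""
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost)
        prev = curr
    return prev[-1]


def character_error_rate(refs, hyps):
    """Corpus-level CER: total character edits / total reference characters."""
    edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return edits / sum(len(r) for r in refs)


def sentence_error_rate(refs, hyps):
    """Fraction of sentences whose hypothesis is not an exact match."""
    wrong = sum(1 for r, h in zip(refs, hyps) if r != h)
    return wrong / len(refs)


refs = ["seven", "hello world"]
hyps = ["seven", "hello word"]
print(character_error_rate(refs, hyps))  # 1 edit / 16 characters = 0.0625
print(sentence_error_rate(refs, hyps))   # 1 of 2 sentences wrong = 0.5
```

Passing `r.split()` and `h.split()` to `edit_distance` instead of the raw strings gives word-level edits, which is the numerator of WER.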
0 votes, 0 answers
Why does Whisper ASR sometimes produce really bad output?
Whisper ASR is a very accurate speech recognition system on average, but in some rare cases its output is very bad.
E.g.:
ground truth           | whisper output
seven                  | damn it!
her jewelery shimmered | Hey, did you lose your mind?
the englishman…
Franck Dernoncourt