ASR systems are usually evaluated with WER (word error rate), which is derived from the edit distance between the reference and the hypothesis and counts three types of edits: substitutions, deletions and insertions. According to the Wikipedia page, there are two weighting schemes:
A. Uniform: each type is given the same weight: 1, 1, 1 (S, D, I)
B. Hunt's version: 1, 0.5, 0.5
I've noticed that sclite has its own weights:
C. sclite: 4, 3, 3
What are the use cases for each weighting scheme? My goal is to compare different speech recognition APIs.
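To make the question concrete, here is a minimal Python sketch of how I understand the weights entering the calculation. The names `align_counts` and `error_rate` are hypothetical (they are not taken from sclite or any library). It is also an assumption on my part that a tool like sclite uses its 4/3/3 costs only to choose the alignment and still reports the unweighted (S + D + I) / N, whereas Hunt's weights are applied to the reported score itself.

```python
from typing import NamedTuple, Sequence


class Counts(NamedTuple):
    substitutions: int
    deletions: int
    insertions: int


def align_counts(ref: Sequence[str], hyp: Sequence[str],
                 w_sub: float = 1.0, w_del: float = 1.0,
                 w_ins: float = 1.0) -> Counts:
    """Align ref and hyp with a weighted edit distance and count S, D, I."""
    n, m = len(ref), len(hyp)
    # cost[i][j] = minimal weighted cost of aligning ref[:i] with hyp[:j]
    cost = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = cost[i - 1][0] + w_del
    for j in range(1, m + 1):
        cost[0][j] = cost[0][j - 1] + w_ins
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = 0.0 if ref[i - 1] == hyp[j - 1] else w_sub
            cost[i][j] = min(cost[i - 1][j - 1] + step,
                             cost[i - 1][j] + w_del,
                             cost[i][j - 1] + w_ins)

    # Backtrace one optimal path; ties prefer match/substitution,
    # then deletion, then insertion (an arbitrary choice in this sketch).
    subs = dels = inss = 0
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            mismatch = ref[i - 1] != hyp[j - 1]
            step = w_sub if mismatch else 0.0
            if cost[i][j] == cost[i - 1][j - 1] + step:
                subs += int(mismatch)
                i, j = i - 1, j - 1
                continue
        if i > 0 and (j == 0 or cost[i][j] == cost[i - 1][j] + w_del):
            dels += 1
            i -= 1
        else:
            inss += 1
            j -= 1
    return Counts(subs, dels, inss)


def error_rate(counts: Counts, n_ref: int, w_sub: float = 1.0,
               w_del: float = 1.0, w_ins: float = 1.0) -> float:
    """Weighted error rate; the default weights give the usual WER."""
    weighted = (w_sub * counts.substitutions
                + w_del * counts.deletions
                + w_ins * counts.insertions)
    return weighted / n_ref


ref, hyp = "a b".split(), "b c".split()
uniform = align_counts(ref, hyp)                  # scheme A
hunt = align_counts(ref, hyp, 1.0, 0.5, 0.5)      # scheme B
sclite_like = align_counts(ref, hyp, 4, 3, 3)     # scheme C, alignment only
print(uniform, hunt, sclite_like)
print(error_rate(sclite_like, len(ref)))          # unweighted WER
print(error_rate(hunt, len(ref), 1.0, 0.5, 0.5))  # Hunt-weighted score
```

In this toy case the uniform weights align the pair as two substitutions, while the other two schemes prefer one deletion plus one insertion. The unweighted WER is the same either way, but the Hunt-weighted score differs, which is part of why I am unsure which scheme to use when comparing APIs.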