
I'm using the WPF speech recognition library, trying to use it in a desktop app as an alternative to menu commands. (I want to focus on the tablet experience, where you don't have a keyboard.) It works - sort of - except that the accuracy of recognition is so bad it's unusable. So I tried dictating into Word. Word worked reasonably well. I'm using my built-in laptop microphone in both cases, and both programs are capable of hearing the same speech simultaneously (provided Word retains keyboard focus), but Word gets it right and WPF does an abysmal job.

I've tried both a generic DictationGrammar() and a tiny specialised grammar, and I've tried both "en-US" and "en-AU", and in all cases Word performs well and WPF performs poorly. Even comparing the specialised grammar in WPF to the general grammar in Word, WPF gets it wrong 50% of the time, e.g. hearing "size small" as "color small".

    private void InitSpeechRecognition()
    {
        recognizer = new SpeechRecognitionEngine(new System.Globalization.CultureInfo("en-US"));

        // Create and load a grammar.  
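        // Toggle: change 'false' to 'true' below to test the small command grammar
        // instead of free-form dictation.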
        if (false)
        {
            GrammarBuilder grammarBuilder = new GrammarBuilder();
            Choices commandChoices = new Choices("weight", "color", "size");
            grammarBuilder.Append(commandChoices);
            Choices valueChoices = new Choices();
            valueChoices.Add("normal", "bold");
            valueChoices.Add("red", "green", "blue");
            valueChoices.Add("small", "medium", "large");
            grammarBuilder.Append(valueChoices);
            recognizer.LoadGrammar(new Grammar(grammarBuilder));
        }
        else
        {
            recognizer.LoadGrammar(new DictationGrammar());
        }

        // Add a handler for the speech recognized event.  
        recognizer.SpeechRecognized +=
                            new EventHandler<SpeechRecognizedEventArgs>(recognizer_SpeechRecognized);

        // Configure input to the speech recognizer.  
        recognizer.SetInputToDefaultAudioDevice();

        // Start asynchronous, continuous speech recognition.  
        recognizer.RecognizeAsync(RecognizeMode.Multiple);
    }

Sample results from Word:

Hello 
make it darker 
I want a brighter colour 
make it reader 
make it greener 
thank you 
make it bluer 
make it more blue
make it darker 
turn on debugging 
turn off debugging 
zoom in 
zoom out 

The same audio in WPF, dictation grammar:

a lower
make it back
when Ted Brach
making reader
and he
liked the
ethanol and
act out
to be putting
it off the parking
zoom in
and out

I got the assembly using NuGet. I'm using Runtime version=v4.0.30319 and version=4.0.0.0. If I'm supposed to "train" it, the documentation doesn't explain how to do this, and I don't know whether the training is shared with other programs such as Word, or where the training is saved. I've been playing around with it long enough now for it to know the sound of my voice.

Can anyone tell me what I'm doing wrong?


4 Answers


This is expected. Word's dictation uses a cloud-based, AI/ML-assisted speech service: Azure Cognitive Services - Speech To Text. It is constantly being trained and updated for the best accuracy. You can easily test this by going offline and trying the dictation feature in Word - it won't work.

.NET's System.Speech uses the offline SAPI5, which hasn't been updated since Windows 7 as far as I'm aware. The core technology itself (Windows 95 era) is much older than what is available on today's phones or cloud-based services. Microsoft.Speech.Recognition also uses a similar core and won't be much better - although you can give it a try.

If you want to explore other offline options, I would suggest trying Windows.Media.SpeechRecognition. As far as I'm aware, it is the same technology used by Cortana and other modern voice recognition apps on Windows 8 and up, and it does not use SAPI5.

It's pretty easy to find examples for Azure or Windows.Media.SpeechRecognition online; the best way to use the latter would be to update your app to .NET 5 and use C#/WinRT to access the UWP APIs.
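For reference, a minimal sketch of the Azure route, assuming the Microsoft.CognitiveServices.Speech NuGet package; the key and region strings are placeholders you would take from your own Speech resource:

    using System;
    using System.Threading.Tasks;
    using Microsoft.CognitiveServices.Speech;

    class AzureSpeechSketch
    {
        static async Task Main()
        {
            // Placeholders: substitute the key and region of your own Speech resource.
            var config = SpeechConfig.FromSubscription("<your-key>", "<your-region>");
            config.SpeechRecognitionLanguage = "en-AU";

            // Uses the default microphone, much like SetInputToDefaultAudioDevice() in System.Speech.
            using var recognizer = new SpeechRecognizer(config);

            // One-shot recognition of a single utterance.
            SpeechRecognitionResult result = await recognizer.RecognizeOnceAsync();
            if (result.Reason == ResultReason.RecognizedSpeech)
                Console.WriteLine(result.Text);
            else
                Console.WriteLine($"Recognition failed: {result.Reason}");
        }
    }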

– Prajay Basu
  • Thanks. I've tried out Azure CognitiveServices. It was a bit of a hassle to set up but seems to be working well. I'd prefer something that works offline but not if the quality is poor. – Tim Cooper Aug 08 '21 at 04:44
  • Is there a way to make Windows.Media.SpeechRecognition work offline for continuous dictation? The sample from Windows requires an internet connection. – Ali123 Aug 11 '21 at 10:37

Your best bet, I would say, is not to use a DictationGrammar but specific grammars with whole phrases or with key-value assignments:

using System.Globalization;
using System.Speech.Recognition;

private static SpeechRecognitionEngine CreateRecognitionEngine()
{
    var cultureInf = new CultureInfo("en-US");

    var recoEngine = new SpeechRecognitionEngine(cultureInf);
    recoEngine.SetInputToDefaultAudioDevice();
            
    recoEngine.LoadGrammar(CreateKeyValuesGrammar(cultureInf, "weight", new string[] { "normal", "bold", "demibold" }));
    recoEngine.LoadGrammar(CreateKeyValuesGrammar(cultureInf, "color", new string[] { "red", "green", "blue" }));
    recoEngine.LoadGrammar(CreateKeyValuesGrammar(cultureInf, "size", new string[]{ "small", "medium", "large" }));

    recoEngine.LoadGrammar(CreateKeyValuesGrammar(cultureInf, "", new string[] { "Put whole phrase here", "Put whole phrase here again", "another long phrase" }));

    return recoEngine;
}

static Grammar CreateKeyValuesGrammar(CultureInfo cultureInf, string key, string[] values)
{
    var grBldr = string.IsNullOrWhiteSpace(key) ? new GrammarBuilder() { Culture = cultureInf } : new GrammarBuilder(key) { Culture = cultureInf };
    grBldr.Append(new Choices(values));

    return new Grammar(grBldr);
}
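
A usage sketch (assuming the same System.Speech usings as above; the logging is illustrative only) showing how these grammars could be consumed:

    var recoEngine = CreateRecognitionEngine();

    // Report each recognized command together with the engine's confidence score.
    recoEngine.SpeechRecognized += (s, e) =>
        Console.WriteLine($"Heard \"{e.Result.Text}\" (confidence {e.Result.Confidence:F2})");

    // Listen continuously for multiple commands.
    recoEngine.RecognizeAsync(RecognizeMode.Multiple);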

You may also try Microsoft.Speech.Recognition; see What is the difference between System.Speech.Recognition and Microsoft.Speech.Recognition?

– Rekshino

Anyone who needs a speech recognition engine with roughly 90% of the accuracy of Cortana should follow these steps.

Step 1) Download the NuGet package Microsoft.Windows.SDK.Contracts

Step 2) Migrate your project from packages.config to PackageReference so the SDK can be used --> https://devblogs.microsoft.com/nuget/migrate-packages-config-to-package-reference/

The above-mentioned SDK exposes the Windows 10 speech recognition system to Win32 apps. This is needed because otherwise the only way to use this speech recognition engine is to build a Universal Windows Platform application. I don't recommend making an A.I. application in the Universal Windows Platform because of its sandboxing: the sandbox isolates the app in a container, won't let it communicate freely with hardware, makes file access an absolute pain, and rules out direct thread management - only async functions are available.

Step 3) Add this namespace to your using directives. It contains all the types related to online speech recognition.

using Windows.Media.SpeechRecognition;

Step 4) Add the speech recognition implementation.

Task.Run(async () =>
{
    try
    {
        var speech = new SpeechRecognizer();
        await speech.CompileConstraintsAsync();
        SpeechRecognitionResult result = await speech.RecognizeAsync();

        // We're on a thread-pool thread here, so marshal back to the UI thread
        // before touching the TextBox.
        TextBox1.Dispatcher.Invoke(() => TextBox1.Text = result.Text);
    }
    catch (Exception ex)
    {
        // Don't swallow failures silently; at least log them.
        System.Diagnostics.Debug.WriteLine(ex);
    }
});

The majority of the methods of the Windows 10 SpeechRecognizer class must be called asynchronously, which means you must await them from an async context - for example inside a Task.Run(async () => { ... }) lambda, an async method, or an async Task method.
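For example, instead of Task.Run you could put the same calls in an async event handler; the button handler name below is a placeholder:

    private async void RecognizeButton_Click(object sender, RoutedEventArgs e)
    {
        var speech = new SpeechRecognizer();
        await speech.CompileConstraintsAsync();
        SpeechRecognitionResult result = await speech.RecognizeAsync();

        // The await continuations resume on the UI thread, so this assignment is safe.
        TextBox1.Text = result.Text;
    }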

For this to work, go to Settings -> Privacy -> Speech in the OS and check that online speech recognition is allowed.
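
If you need continuous dictation rather than a single phrase, the same SpeechRecognizer class exposes a ContinuousRecognitionSession; a rough sketch (run from an async context, as above) would be:

    var speech = new SpeechRecognizer();
    await speech.CompileConstraintsAsync();

    // Raised every time a phrase is recognized during the session.
    speech.ContinuousRecognitionSession.ResultGenerated += (session, args) =>
        System.Diagnostics.Debug.WriteLine(args.Result.Text);

    // Keeps listening until ContinuousRecognitionSession.StopAsync() is called.
    await speech.ContinuousRecognitionSession.StartAsync();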


As you are actually creating a voice user interface and not only doing speech recognition, you should check out Speechly. With Speechly it's a lot easier to create natural experiences that don't require hard-coded commands but rather support multiple ways of expressing the same thing. Integrating it into your application should be pretty simple, too. There's a small codepen on the front page to get a basic understanding.

– ottomatias