Making your app a better listener... Using multiple grammars to improve speech recognition and to allow for runtime, state-based recognition configuration

Friday, February 03, 2012

Robert Lucero's Testing Blog - Speech Recognition - Using Multiple Grammars to Improve Recognition

"A difficult problem both users and developers face is recognizing words that are similar-sounding, but wrong for the current context. An example of this would be the words “yellow” and “hello”.

Using the simple WPF app from the previous Exploring Grammar Based Recognition post, I will show an example of this confusion and a simple way to improve recognition based on a defined context. Specifically, a button to enable and disable grammars will be added to simulate context switching.

Check 2, 3… Check…

This is a continuation of the previous Exploring Grammar Based Recognition post. Please make sure that you’ve installed the Windows SDK as a prerequisite to both of these tutorials.

Step 1: Identifying Recognition Confusion

Using the Simple Speech Recognizer, add the word “hello” to the list of words to be recognized. Then say “hello” and “yellow” repeatedly with various inflections. Depending on how I said each word, I was able to get the wrong one recognized.

...

What Was Improved?

In this case, pressing the button changes the words that the Speech Recognition engine is listening for. If words are grouped cleverly into grammar rules or grammars, developers can enable and disable whole scenarios when the system moves into a specific state. This gives the engine context and, in some cases, better accuracy for the words the system is listening for.
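
The idea of switching grammars by state can be sketched with a toy model. This is not the System.Speech API and not the WPF app from the post: `Grammar`, `ToyRecognizer`, and the word lists are hypothetical names made up for illustration, and string similarity (via Python's `difflib`) is only a crude stand-in for how an engine compares sounds. The one thing it mirrors is the `enabled` flag, which plays the role of `Grammar.Enabled`.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher
from typing import Optional


@dataclass
class Grammar:
    """A named word list that can be switched on or off (hypothetical model)."""
    name: str
    words: list
    enabled: bool = True


@dataclass
class ToyRecognizer:
    """Stand-in for a speech engine: it matches text, not audio."""
    grammars: list = field(default_factory=list)

    def active_words(self) -> list:
        # Only words from enabled grammars are candidates.
        return [w for g in self.grammars if g.enabled for w in g.words]

    def recognize(self, heard: str, threshold: float = 0.6) -> Optional[str]:
        # Pick the active word most similar to what was "heard".
        best, best_score = None, 0.0
        for word in self.active_words():
            score = SequenceMatcher(None, heard, word).ratio()
            if score > best_score:
                best, best_score = word, score
        return best if best_score >= threshold else None


greetings = Grammar("greetings", ["hello", "goodbye"])
colors = Grammar("colors", ["yellow", "red", "blue"])
engine = ToyRecognizer([greetings, colors])
print(engine.active_words())  # ['hello', 'goodbye', 'yellow', 'red', 'blue']

# Simulate the button press: the app moves into a "pick a color" state.
greetings.enabled = False
print(engine.active_words())  # ['yellow', 'red', 'blue']
```

Flipping one flag per grammar, rather than rebuilding word lists, is what makes it cheap to tie the listening vocabulary to the app's state.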

However, it doesn’t solve the more basic problem of confusion when someone says a word that sounds very similar to a word the engine is listening for. This process primarily helps by narrowing or broadening the set of words available for recognition.
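
That limitation shows up even in a toy model. In the sketch below, `best_match` is a hypothetical helper, the 0.6 threshold is an arbitrary assumption, and spelling similarity from `difflib` only approximates acoustic similarity. It does not use the real engine. With only the color words active, "hello" is no longer a candidate, so the nearest remaining word is accepted in its place.

```python
from difflib import SequenceMatcher
from typing import Optional


def best_match(heard: str, active_words: list, threshold: float = 0.6) -> Optional[str]:
    """Return the active word closest to `heard`, or None if nothing is close enough."""
    scored = [(SequenceMatcher(None, heard, w).ratio(), w) for w in active_words]
    score, word = max(scored, default=(0.0, None))
    return word if score >= threshold else None


greeting_words = ["hello"]
color_words = ["yellow"]

# Both grammars active: the exact word wins.
both = best_match("hello", greeting_words + color_words)

# Only the color grammar active: "hello" is gone from the candidates,
# so the similar word "yellow" is matched instead.
colors_only = best_match("hello", color_words)

print(both, colors_only)  # hello yellow
```

Narrowing the active words removes competitors, but anything the user says is still measured against whatever remains.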

Summary

By dynamically enabling and disabling grammars, apps gain another tool for improving recognition. Providing context to the engine, and acting on it as the app changes state, can make for a better recognition experience.

Here are some posts I found useful:

MSDN Definition for Grammar.Enabled

MSDN Grammar Loading Example

For more ideas or more background on this post, check out my previous post: Exploring Grammar Based Recognition. As always, if you have feedback or questions, feel free to leave a comment or contact me through the MSDN blog dashboard tools!"

This simple example shows how you can tweak speech recognition based on the state of the app, to easily control what your app is listening for... Now if only I could turn off my internal Wife Grammar list (oh... did I really say that!? I mean turn it on! err... um... I mean... um... damn) :P