Can you Kinect me now... Using the Kinect for Windows SDK v1.8's JavaScript API to add voice recognition to your web app...

Wednesday, October 02, 2013

Kinect for Windows Dev - Using Kinect Webserver to Expose Speech Events to Web Clients

In our 1.8 release, we made it easy to create Kinect-enabled HTML5 web applications. This is possible because we added an extensible webserver for Kinect data along with a JavaScript API, which together give developers some great functionality right out of the box:

Interactions: hand pointer movements, press and grip events useful for controlling a cursor, buttons and other UI

User Viewer: visual representation of the users currently visible to the Kinect sensor. Uses different colors to indicate different user states

Background Removal: “Green screen” image stream for a single person at a time

Skeleton: standard skeleton data such as tracking state, joint positions, joint orientations, etc.

Sensor Status: events corresponding to sensor connection/disconnection

This is enough functionality to write a compelling application, but it doesn’t represent the whole range of Kinect sensor capabilities. In this article I will show you step-by-step how to extend the WebserverBasics-WPF sample (see C# code in CodePlex or documentation in MSDN), available from the Kinect Toolkit Browser, so that web applications can respond to speech commands, with the active speech grammar configurable by the web client.

A solution containing the full, final sample code is available on CodePlex. To compile this sample you will also need Microsoft.Samples.Kinect.Webserver (available via CodePlex and Toolkit Browser) and Microsoft.Kinect.Toolkit components (available via Toolkit Browser).

...

So, What Functionality Are We Implementing?

More specifically, on the server side we will:

Create a speech recognition engine

Bind the engine to a Kinect sensor’s audio stream whenever a sensor is connected or disconnected

Allow a web client to specify the speech grammar to be recognized

Forward speech recognition events generated by the engine to the web client

Register a factory for the speech stream handler with the Kinect webserver
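
To make the server-side steps above more concrete, here is a minimal, self-contained sketch of the data flow in plain JavaScript. It is not the Kinect SDK's API and not the article's C# implementation: createSpeechStream, setVocabulary, onRecognized, hear and the event fields are all hypothetical names chosen for illustration, and the recognition engine is simulated by calling hear directly. It only shows the idea that the client chooses the active grammar and that only phrases in that grammar are forwarded as events.

```javascript
// Hypothetical stand-in for a speech stream handler.
// None of these names come from the Kinect SDK; this only models the flow.
function createSpeechStream() {
  let vocabulary = new Set(); // active grammar, chosen by the web client
  const handlers = [];        // client callbacks for forwarded events

  return {
    // Client-facing: replace the active grammar with a new phrase list.
    setVocabulary(phrases) {
      vocabulary = new Set(phrases.map((p) => p.toLowerCase()));
    },

    // Client-facing: subscribe to forwarded recognition events.
    onRecognized(handler) {
      handlers.push(handler);
    },

    // Engine-facing: called when the (simulated) engine hears a phrase.
    // Returns true if an event was forwarded to the client.
    hear(phrase, confidence) {
      const text = phrase.toLowerCase();
      if (!vocabulary.has(text)) {
        return false; // phrase is outside the active grammar: drop it
      }
      const event = { category: "speech", eventType: "recognized", text, confidence };
      handlers.forEach((h) => h(event));
      return true;
    },
  };
}

// Usage: the client configures a grammar, then receives matching events.
const stream = createSpeechStream();
const received = [];
stream.onRecognized((e) => received.push(e.text));
stream.setVocabulary(["red", "green", "blue"]);

stream.hear("Green", 0.9);  // in the grammar: forwarded as "green"
stream.hear("purple", 0.9); // not in the grammar: dropped
console.log(received);
```

In the real sample the recognizer, the sensor binding and the handler factory live in C# on the server side; the sketch collapses all of that into one object so the grammar-filtering behavior is easy to see.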

...