Saturday, January 29, 2005

Automated transcription of WAV files in .NET using Microsoft Speech API (SAPI)


An interesting VB.Net code snippet that runs the speech recognition in the Microsoft Speech SDK (SAPI 5.1) on a wave file.

Interested in "OCR" for wave files, I played with this a little last night. In very short order I was able to take a wave file of myself reading the Declaration of Independence and convert it to text (with position marks).

It's nowhere near perfect (eyeballing the text file, I'd say 70-80% accuracy). But it IS free.

I'm thinking it might be used to triage voice files prior to sending them for professional transcription. Given the cost of professional transcription, the more focused you are in using such a service, the better…

I'm still in the very early phase of investigating this tech and am only doing it with spare cycles... One thing I'm not happy with yet is how well it handles WAV files with a low sampling rate (e.g. a voice mail converted to WAV). Taking my sample and lowering its sampling rate from 22 kHz to 8 kHz destroyed the accuracy... It was still clearly understandable to the ear, but SAPI didn't like it... So more research is needed (because I could be doing something stupid too)...
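Since the sampling rate seems to matter so much, one cheap step is to look at a WAV file's header before bothering to run recognition on it. Here is a rough Python sketch of that idea using only the standard `wave` module. The function names and the 16 kHz cutoff are my own placeholders, not anything from SAPI or something I measured; all I actually saw is that 22 kHz worked reasonably and 8 kHz did not.

```python
import wave

# Hypothetical cutoff (an assumption, not a measured value): files sampled
# below this rate get flagged as likely to transcribe poorly.
MIN_RATE_HZ = 16000


def wav_info(path):
    """Return (sample_rate_hz, channels, bits_per_sample, duration_seconds)
    read from the header of an uncompressed PCM WAV file."""
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
        channels = wf.getnchannels()
        bits = wf.getsampwidth() * 8
        duration = wf.getnframes() / float(rate)
    return rate, channels, bits, duration


def likely_too_low(path, min_rate_hz=MIN_RATE_HZ):
    """True if the file's sampling rate is below the chosen cutoff."""
    rate, _, _, _ = wav_info(path)
    return rate < min_rate_hz
```

Calling `likely_too_low("voicemail.wav")` (a made-up file name) on an 8 kHz voice mail export would return `True`, so a file like that could be set aside instead of being fed to the recognizer. Note that `wave` only reads uncompressed PCM files, so compressed voice mail formats would need converting first.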

Even in the best circumstances, if you've played with the dictation feature in Office 2003/XP you'll know there's only so much you can expect.
