Monday, September 14, 2009

The (short) quest of Speech Recognition on Android

Well yeah, speech recognition is possible, BUT apparently it is handled by default by the Google Voice Search package (I am not 100% sure, but that's all I have learned so far; read this for example), which is available only for the brave :-D. And I am weak.

So I am currently looking into other ways of implementing it. There are two main directions to investigate:

1.) Perform the speech recognition on the device. I'm not sure how practical this is: if it is possible at all, it would certainly eat up a lot of CPU/battery. However, I haven't found any resources on this, at least not yet and not for Android.

1.1) As an intermediate step, maybe it would be easier to implement recognition for a limited vocabulary: store a sound/text pair for each word, record the voice and try to match the recorded sound against the stored entries. This may work; the hard part would be analyzing/matching the two sound samples.

2.) Capture a sound file/stream, upload it to a server, convert it to text and get the text back. I found Sphinx as a possible server-side package to do the speech recognition, but there are some drawbacks to this approach: you need a server, you need bandwidth (3G or Wi-Fi, I'd say) and you need to convert from the capture format on Android (3GPP, MPEG, whatever) to the formats supported by Sphinx, which depends on JavaSound (cringe!), these being AIFF, AU and WAV.
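For the format conversion in 2.), here is a minimal sketch of the easiest case I can think of. It assumes the audio is captured as raw 16-bit PCM (for example with Android's AudioRecord) instead of 3GPP, so the only work left is putting a standard 44-byte WAV header in front of the samples. The class and method names (WavWriter, wrapPcmAsWav) are made up for this example, and decoding 3GPP itself is not covered.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: wraps raw 16-bit little-endian PCM in a WAV container.
class WavWriter {

    static byte[] wrapPcmAsWav(byte[] pcm, int sampleRate, int channels) {
        int bitsPerSample = 16;                       // assumption: 16-bit samples
        int blockAlign = channels * bitsPerSample / 8; // bytes per sample frame
        int byteRate = sampleRate * blockAlign;        // bytes per second

        // WAV is little-endian throughout; the header is 44 bytes.
        ByteBuffer buf = ByteBuffer.allocate(44 + pcm.length)
                                   .order(ByteOrder.LITTLE_ENDIAN);

        buf.put("RIFF".getBytes(StandardCharsets.US_ASCII));
        buf.putInt(36 + pcm.length);                   // size of everything after this field
        buf.put("WAVE".getBytes(StandardCharsets.US_ASCII));

        buf.put("fmt ".getBytes(StandardCharsets.US_ASCII));
        buf.putInt(16);                                // size of the fmt chunk
        buf.putShort((short) 1);                       // audio format 1 = uncompressed PCM
        buf.putShort((short) channels);
        buf.putInt(sampleRate);
        buf.putInt(byteRate);
        buf.putShort((short) blockAlign);
        buf.putShort((short) bitsPerSample);

        buf.put("data".getBytes(StandardCharsets.US_ASCII));
        buf.putInt(pcm.length);                        // size of the sample data
        buf.put(pcm);

        return buf.array();
    }

    public static void main(String[] args) {
        byte[] pcm = new byte[32000];                  // one second of silence, 16 kHz mono
        byte[] wav = wrapPcmAsWav(pcm, 16000, 1);
        System.out.println("WAV size in bytes: " + wav.length);
    }
}
```

The resulting byte array could be written to a file or sent as the upload body; how the server receives it is a separate question.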

Point 2.) does have a slight advantage, though: you can choose the server platform/package, which does not need to be Java (I specifically looked for Java, and Sphinx 4 is Java), and I assume that speech recognition efforts have been made in other languages too.
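Going back to the limited-vocabulary idea in 1.1), one well-known way to compare two recordings of different lengths is dynamic time warping (DTW). I have not tried it on Android, so take this as a rough sketch only. It assumes each recording has already been reduced to one number per frame (say, frame energy), which is a big simplification; a real matcher would most likely compare vectors of spectral features. TemplateMatcher, dtwDistance and bestMatch are names invented for the example.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: match a recorded word against stored templates with DTW.
class TemplateMatcher {

    /** DTW distance between two 1-D feature sequences (0 means a perfect warp). */
    static double dtwDistance(double[] a, double[] b) {
        int n = a.length, m = b.length;
        double[][] cost = new double[n + 1][m + 1];
        for (double[] row : cost) {
            Arrays.fill(row, Double.POSITIVE_INFINITY);
        }
        cost[0][0] = 0.0;

        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                double d = Math.abs(a[i - 1] - b[j - 1]);
                // Cheapest way to reach this cell: diagonal match,
                // or repeating a frame of either sequence.
                double best = Math.min(cost[i - 1][j - 1],
                              Math.min(cost[i - 1][j], cost[i][j - 1]));
                cost[i][j] = d + best;
            }
        }
        return cost[n][m];
    }

    /** Returns the vocabulary word whose template is closest to the sample. */
    static String bestMatch(Map<String, double[]> templates, double[] sample) {
        String bestWord = null;
        double bestDist = Double.POSITIVE_INFINITY;
        for (Map.Entry<String, double[]> e : templates.entrySet()) {
            double d = dtwDistance(e.getValue(), sample);
            if (d < bestDist) {
                bestDist = d;
                bestWord = e.getKey();
            }
        }
        return bestWord;
    }

    public static void main(String[] args) {
        // Toy templates: made-up feature values, not real audio.
        Map<String, double[]> templates = new LinkedHashMap<>();
        templates.put("yes", new double[] {1, 5, 1});
        templates.put("no", new double[] {5, 1, 5});

        // A slower utterance with the same shape as the "yes" template.
        double[] sample = {1, 1, 5, 5, 1};
        System.out.println(bestMatch(templates, sample));
    }
}
```

The point of DTW here is that the same word spoken faster or slower still lines up with its template, which a plain sample-by-sample comparison would not handle. Extracting good features from the recording is the part this sketch skips entirely.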

Has anyone else tried or thought of doing something similar?
