I've actually tried out the vlingo application a couple of times, and the speech recognition is surprisingly good. They trained the system on a vast number of business names and addresses (easily over a million), so the version of vlingo I used was aimed at "point of interest" queries in mobile search. When their CTO said "find me a Starbucks in " and it worked, I naturally wanted to test it on some odder queries. The server-based recognizer had adapted itself to the CTO's voice (identified via his phone's caller ID). Even so, I tried "find me Caribou Coffee in Wheaton Illinois" and it got it word for word. I tried a couple more place queries, including one that was fictitious but plausible, and it handled them fine. Their system is not based on a fixed speech grammar that lists every expected utterance. Instead it uses a much more flexible statistical approach based on phoneme lattices.
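To make "phoneme lattice" a bit more concrete, here is a toy sketch. It is not vlingo's implementation, which I don't know. A lattice is a directed acyclic graph whose arcs carry candidate phonemes with probabilities, and decoding picks the most probable path through it. Because any path can be scored, the recognizer isn't limited to a fixed list of sentences. The arcs, phonemes and probabilities below are made up purely for illustration.

```python
import math

# Hypothetical toy lattice: nodes are integers numbered in topological order
# (every arc goes from a lower to a higher node). Each arc is
# (from_node, to_node, phoneme, probability). All values are invented.
ARCS = [
    (0, 1, "k", 0.6), (0, 1, "g", 0.4),
    (1, 2, "ae", 0.7), (1, 2, "eh", 0.3),
    (2, 3, "r", 0.9), (2, 3, "l", 0.1),
]


def best_path(arcs, start, end):
    """Return (log_prob, phonemes) for the most probable path from start to end.

    Arcs are processed in order of their source node. Because nodes are
    topologically numbered, the best score for a node is final before any
    arc leaving it is relaxed.
    """
    best = {start: (0.0, [])}
    for src, dst, phone, prob in sorted(arcs, key=lambda a: a[0]):
        if src not in best:
            continue  # node not reachable from start
        score = best[src][0] + math.log(prob)  # sum log-probs to avoid underflow
        if dst not in best or score > best[dst][0]:
            best[dst] = (score, best[src][1] + [phone])
    return best[end]


log_p, phones = best_path(ARCS, 0, 3)
print(phones, math.exp(log_p))
```

Real recognizers combine lattice scores with a language model over words, but the best-path dynamic programming shown here is the core idea.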
Voice input looks very appealing for mobile search when you contrast it with keypad entry. This study of a million Google Local Mobile queries showed that it took 56-63 seconds (a full minute!) to enter an average query on a 12-key keypad, and about half that to enter it on a PDA with a stylus and virtual keypad. So a speech recognition interface that does it in 2-3 seconds is a huge win, provided the accuracy is high enough for most users. I feel vlingo is at least tantalizingly close to that level of accuracy.
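To put rough numbers on "huge win", here is a back-of-the-envelope calculation using the figures above. It assumes the stylus time is exactly half the keypad time (the study figure is only "about half"). It also assumes speech takes the 2-3 seconds mentioned, and it ignores time spent correcting recognition errors.

```python
# Entry times in seconds, as (low, high) ranges quoted above.
keypad = (56, 63)
stylus = (keypad[0] / 2, keypad[1] / 2)  # assumption: exactly half of keypad time
speech = (2, 3)                          # assumption: the 2-3 s speech target


def speedup_range(slow, fast):
    """Return the (min, max) speedup of `fast` entry over `slow` entry."""
    return slow[0] / fast[1], slow[1] / fast[0]


print("keypad vs speech: %.1fx to %.1fx" % speedup_range(keypad, speech))
print("stylus vs speech: %.1fx to %.1fx" % speedup_range(stylus, speech))
```

Under those assumptions speech comes out roughly 19-32x faster than the keypad and 9-16x faster than the stylus, which is why accuracy, not speed, is the real question.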
You can get a feel for a similar system by trying Google's free 1.800.GOOG411 service and seeing how it works for you.
Perhaps Leopard was chosen deliberately here. The word connotes great speed, but there is also the saying "Can a leopard change his spots?" Maybe this one can, and look like another animal entirely!
You're only looking at the *second* of three tabs on this web page.