Vista Speech Recognition Goes Awry
An anonymous reader writes "It seems even MSNBC is willing to take a jab at Microsoft on those rare occasions when its products don't work. During a demo of Vista's speech recognition technology, Vista couldn't differentiate between mom and aunt, and all attempts to rectify the problem just made it worse. Wait until you see what it spat out; I think we have a new 'All your base.' Don't you just love Microsoft's live demonstrations?"
Well, I looked into this a while back: Dragon seems to be the leader, and with a month of training it gets the best accuracy.
However, speech recognition engineers are slowly realizing that the problem of recognising words is not just the algorithm's fault. Even people aren't able to understand every word of a taped conversation in a cafeteria.
Dragon is currently the best. Getting any further will probably require more input, like a webcam to read your lips. This is just another Microsoft product where they read the Wikipedia page on it, produced a flashy interface and packaged it with their OS. If you want speech recognition, don't go with Microsoft; they don't have the expertise.
PS: Don't forget that getting a good, or even specialized, microphone can make all the difference.
In fact voice recognition would be a great playground for non-profit open source software projects.
Voice recognition means permanent beta. It has improved only slightly over the last ten years. One reason is that the VR market is a minefield of trivial patents. The rest is just performance.
Sure, we will get proper voice recognition some day. I would outsource it to open source and integrate it back into my products once it is ready.
"Reason 1: You don't have to train this software. That's when you have to read aloud a canned piece of prose that it displays on the screen -- a standard ritual that has begun the speech-recognition adventure for thousands of people.
I can remember, in the early days, having to read 45 minutes' worth of these scripts for the software's benefit. [...] NatSpeak 9 requires no training at all."
FWIW, Nuance claims that the latest version of Dragon NaturallySpeaking (v9) doesn't require training before use. Of course, that's different software. But consider this: aren't "Mom" and "Aunt" phonetically dissimilar enough that you should NOT need to train it?
I'm not one to defend MS, but I speculate that the volume on his microphone was set too high, causing distortion and clipping. Look at the volume meter when he talks -- it goes all the way to the top.
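The clipping theory is something you can actually check in a recording instead of eyeballing the meter. Here's a minimal sketch, assuming 16-bit signed PCM samples already loaded into a list of ints. The function name and the 99%-of-full-scale threshold are my own hypothetical choices, not anything from Vista's speech API.

```python
def clipped_fraction(samples, full_scale=32767, threshold=0.99):
    """Fraction of samples whose magnitude is at or near full scale.

    Assumes 16-bit signed PCM (range -32768..32767). The 0.99 threshold
    is an arbitrary, hypothetical cutoff for "close enough to clipping".
    """
    if not samples:
        return 0.0
    limit = full_scale * threshold
    # Count samples pinned at (or very near) either rail.
    clipped = sum(1 for s in samples if abs(s) >= limit)
    return clipped / len(samples)


# A mostly quiet signal with two samples pinned at the rails.
print(clipped_fraction([0, 1000, 32767, -32768]))  # 0.5
```

A consistently nonzero result while someone is talking would support the "gain set too high" theory. A meter that only touches the top now and then isn't conclusive on its own.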
I hold it, that a little rebellion, now and then, is a good thing. -- Thomas Jefferson
Let's give this guy some credit. He clearly has some degree of competence if he was selected to showboat the app at a major presentation, at least enough to know that you need to train, or at least test, a voice recognition demo.
A far more likely scenario, in my mind, is that he trained and tested it 100 times and got it working nearly flawlessly, but in a different room and with a different setup. In fact he may have overtrained it. Programs like this can behave very badly when they end up overfitting the data.
On the day in question, he may have had a different mic, the acoustics were certainly different, and the program went whacko.
That's so last century. NPR did a bit on the new Dragon Dictate 9. The NPR reporter got 100% accuracy out of the box, no training.
Dictation Software Improves Usability, Accuracy
Footnote: Microsoft was a monopolistic, backwards company that started the PC revolution.
;-)
They don't deserve credit for starting the "PC revolution". The credit properly belongs to the hundreds of little startups and hobbyists, the whole CP/M crowd, and others like Amiga. Microsoft was a subcontractor to a giant monopoly (IBM) that stepped in after the little guys had demonstrated that there was a market, and took over that market. They succeeded mostly because of a marketing budget greater than the budgets of all the little companies combined.
And there's a good argument that, by marketing PC/DOS rather than CP/M, they set back the PC revolution by 5 to 10 years, the time it took for PC/DOS to match the capabilities of CP/M when IBM started their PC marketing campaign.
Sorry; that's the way "the Market" works in the computer field. Small, independent developers make something new and start selling it; the big companies then step in and take over the market through traditional monopoly strategies.
It's likely that we're now going to hear people crediting Microsoft for starting the "voice recognition" revolution by inventing the new idea that computers can understand speech. Marketing can redefine history like that.
(Whereas we computer geeks know that Al Gore invented speech recognition.)
Those who do study history are doomed to stand helplessly by while everyone else repeats it.
I expect that in 300+ years, when Star Trek is set, our AI will beat the piss out of Star Trek AI. Hell, the computer has been around for only a little over 50 years. A little over 100 years ago we had only just discovered electricity and flight.
Well, maybe. But we invented microscopes around 300 years ago, and discovered microorganisms immediately thereafter. The understanding that some bacteria were involved in diseases followed quickly. But it was nearly 300 years before we successfully eradicated a disease (smallpox). Today, we're still battling new diseases, and we don't have anything like a general solution to all diseases. We have a few antibiotics that affect more than one disease, but we haven't made much progress in solving the problem of the development of resistance to our antibiotics. Hell, we can't even convince the general public that it's the evolutionary process at work here, and we've understood that for around 150 years.
I wouldn't predict any general solution to a complex problem like voice recognition in a mere 300 years. Maybe we will. But our history of general solutions to other complex biological problems is not encouraging. Neither is the history of our first 50 years of AI, despite the constant hype and Hollywood movies claiming that AI is just around the corner.
My gf's brother works for the MS subsidiary that does network setup and tech support for MS trade shows. He's personally wired Gates and Ballmer and has admitted that many "live" demos the head honchos have presented were actually canned, like Ashlee Simpson lip-syncing on SNL. They don't trust their own products enough to put their execs in the same embarrassing position that this presenter got himself into.
One pronunciation of 'aunt' (the less common one in the US) uses exactly the same vowel sound as 'mom', and 'm' and 'n' are very similar. If the salesgenius had trained it to his (probably the more common) pronunciation of 'aunt', he wouldn't have had the problem. I suspect MS's program figures that 'aunt' is one of the more common words in a salutation. That's why it got 'dear' correct; otherwise it might've said something like 'tear'.
No, 90% is nowhere near good enough for dictation, but it sure might be good enough for some applications.
Think "computer, lights": if it gets it wrong, you just try again. All those media PCs would be good candidates as well. If I say "change to channel six" and the thing switches to sixty a tenth of the time, well, I could repeat myself that often in that application anyway and still be pretty satisfied.
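To put rough numbers on the retry argument: if each attempt is recognized correctly with probability p, and attempts are assumed independent (a simplification, since a recognizer that mishears you once may well mishear you again), the number of tries until success follows a geometric distribution. Here's a quick sketch. The helper names are hypothetical.

```python
def expected_attempts(p):
    """Mean number of tries until the first correct recognition.

    Geometric distribution: assumes each try succeeds independently
    with probability p.
    """
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    return 1 / p


def prob_more_than(n, p):
    """Probability that the first n tries all fail (so you need try n+1)."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    return (1 - p) ** n


print(expected_attempts(0.9))   # roughly 1.11 tries on average
print(prob_more_than(2, 0.9))   # roughly 0.01, i.e. about a 1% chance of a third try
```

Under that independence assumption, 90% accuracy means a command like "change to channel six" needs about 1.11 tries on average. That is tolerable for TV, but at dictation lengths the same error rate means roughly one wrong word in every ten.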
Repeal the 17th Amendment TODAY! Also Please Read http://www.gnu.org/philosophy/right-to-read.html
Is this "voice recognition" as in "understand English vocabulary and grammar rules and differentiate between speech and commands" or as in "match this sound up with one of 10 prerecorded ones to autodial a number"?
Try this:
Said: "How to recognize speech"
Understood: "How to wreck a nice beach"
No, it's not always easy to tell the difference...
"Good news, everyone!"
I guess you've never used the voice recognition in OS X. Out of the box, worked perfectly for me. I can speak very normally, sometimes even faster than I usually speak to people, and it works fine. I've never trained it (I don't even think you can). Microsoft is simply half-assing it again.
Regards,
Steve