Vista Speech Recognition Goes Awry
An anonymous reader writes "It seems even MSNBC is willing to take a jab on those rare occasions when Microsoft products don't work. During a demo of Vista's speech recognition technology, Vista couldn't differentiate between mom and aunt, and all attempts to rectify the problem just made it worse. Wait until you see what it spat out, I think we have a new 'All your base.' Don't you just love Microsoft's live demonstrations?"
Reminds me of the Roald Dahl short story about the ant-eater who ate someone's aunt because their accent rendered the two words the same.
I can't remember what the story was called.
the layman's guide to computer science
It's just a one-time thing.
I mean, it's not like they have a reputation for releasing half-assed code that's been hyped up through marketing to the point that it will never perform as advertised.
And it's not like this is a company that is having image problems due to its monopolistic nature.
Or headed by an infamous ragaholic with a history of intolerance towards free standards.
Nope, I'm sure that this is just an accident by a company that spends its off hours petting little baby chickens and bunnies.
Reminds me of the time when I worked at a computer store and we played with the voice recognition card in a PowerMac floor model. Somebody programmed it so that if someone said "Computer, bite me", it would respond with "Can't bite what's not there". Over time the accuracy of the recognition fell. One day, as a salesman was talking to a customer about the computer, it misinterpreted something he said and responded, "Can't bite what's not there". Needless to say, that system was wiped and we weren't allowed to play with it anymore.
Yes, once again Microsoft S/W engineers learn that the more public the demo or the more important the audience, the more likely something will go wrong. It's one of Murphy's laws. Been there. Did that. Barely survived.
Experience is the human quality that enables you to recognize a mistake immediately when you make it again.
Dacap
English -- gotta love it! / The engineers refuse to refuse the rocket until the refuse is removed from the launch pad.
This isn't the first presentation that went wrong, is it? ;)
Win98 gone wild: http://www.youtube.com/watch?v=Hrbx9_AY720
Media Center Edition gone wild http://www.youtube.com/watch?v=j7EEbokKLHI
We can add this one to the list too
Final text:
Not quite as embarrassing as the Windows 98 BSOD, but more entertaining than the Ballmer "Developers" video.
http://www.ntk.net/media/developers.mpg
http://blindscribblings.com - Tasty pop-culture in conceptual fashion.
Steve Ballmer accidentally sends an e-mail while diligently testing the software. The e-mail says:
"Sir put down the chair, then we'll talk"
"No Steve wait up, don't do that"
"BOOM CRASH BOOM CRASH BOOM CRAASH WAAAH NOOO STOOOOP"
"DUDE, THE COMP HAS A BSOD! WAAH!"
I have used NaturallySpeaking. It can take a bit of time to train, but if you are the only one using it, you can eventually get it to the point where you can talk at a normal speed (although you have to speak clearly) and it will approach 90% accuracy; sometimes I had it higher. The catch was that it couldn't be used as an alternative to typing for extended periods, because you had to check everything it wrote.
One thing it did that was good, though, is that it tried to understand whole phrases rather than just individual words, which did improve accuracy. Words often follow patterns, and only a few words make sense after a given word, so it was often right with "over there".
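As a rough illustration of why context helps, here is a toy bigram model: it counts which words follow which in some training text and uses those counts to choose between words that sound alike. The corpus and function names are made up for this sketch; NaturallySpeaking's real language model is not described here and is certainly far larger and more sophisticated.

```python
from collections import Counter


def bigram_counts(sentences):
    """Count adjacent word pairs across a list of sentences."""
    counts = Counter()
    for sentence in sentences:
        words = sentence.lower().split()
        counts.update(zip(words, words[1:]))
    return counts


def pick_word(previous, candidates, counts):
    """Choose the acoustically ambiguous candidate that most often follows
    `previous` in the training text (ties go to the first candidate)."""
    return max(candidates, key=lambda w: counts[(previous, w)])


# Tiny hypothetical training corpus; a real recognizer learns from far more text.
SENTENCES = [
    "put it over there",
    "the book is over there",
    "they took their coats",
    "they're over there now",
    "it is over there by the door",
]

COUNTS = bigram_counts(SENTENCES)
HOMOPHONES = ["their", "there", "they're"]

print(pick_word("over", HOMOPHONES, COUNTS))  # there
print(pick_word("took", HOMOPHONES, COUNTS))  # their
```

The acoustics alone cannot separate these three words, so the decision falls entirely on the counts; the same mechanism can also push a recognizer toward the wrong word when the audio is poor and the context is misleading.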
SR tech will eventually be as good as on Star Trek, as long as people keep working on it. I would give it 20 years if it is seen as something that could make a lot of money, 40 if you have to wait for interested people to do it for free on their own time.
"I can't believe it's not a hyperlink."
Microsoft routinely touts its own excellence over everyone else, including OSS. Hear them talk about Office w.r.t. OpenOffice: they talk down about it, mock it, dismiss it, etc.
It's called modesty. If MSFT had any [and some humility], they wouldn't get laughed at so hard for this. I mean, look at Linux. Someone finds a bug in the kernel, fixes it, and posts a notice that it's fixed. You don't see anyone saying "Oh hahaha, Linus is at it again!" That's because you also don't see Linus on CNN mocking the rest of the world.
Microsoft deserves all the negative press and humiliation they get because they are shameless, deceitful, greedy monopolistic bastards.
Tom
Someday, I'll have a real sig.
The computer in Star Trek (at least in the Next Generation) was WAY too smart. For it to do what it supposedly did in the show, it would have to be sitting there, monitoring the conversation all the time, and be totally able to understand the context of what was being said to know what to do. Not only when people directly asked the computer a question, but also when people wanted to converse with someone.
;)
For example, how does the computer know that Picard wants to call Riker and isn't just talking about him? Oh and keep in mind the computer never misinterpreted something. In other examples, people would carry on intelligent conversations with the computer - all those holodeck scenes, Troi ordering chocolate, etc.
Star Trek-style SR, I think, would be the holy grail, and it is probably always going to be out of reach. Barring some amazing breakthrough in AI algorithms, the computing power required just for the situations above would be incredible, and that's computer time that could probably be put to better use elsewhere, even if it turned out to be possible.
I think the computer in the original Star Trek was more realistic - but even there the voice-recognition was far beyond what we're capable of today, as Microsoft has demonstrated so well. Plus all the blinkenlights that seemed to have no useful purpose were cool.
Speech recognition is still just a gimmick anyway. We still have a LONG way to go before it gets to the point Joe Average User imagines it should be; Joe Average User wants his computer to respond like the one in Star Trek. I still want to set up my Asterisk server with speech recognition, though, so that people can either dial or say the extension they want. It'd also be neat to pick up the phone, say "Call Mom" to the dial tone, and have it call my aunt for me.
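For the "say the extension" idea, here is one possible sketch of the lookup step, assuming some recognizer (for example, one invoked from an Asterisk AGI script) has already turned the caller's speech into a text transcript. This is plain Python, not Asterisk dialplan; the directory, the `route_call` name, and the 0.8 similarity cutoff are all hypothetical. The design point is that returning `None` on a weak match, so the caller can be asked to repeat, is safer than dialing the closest guess, which is how you end up calling your aunt.

```python
import difflib

# Hypothetical directory: spoken names -> extensions.
DIRECTORY = {
    "mom": "201",
    "aunt": "202",
    "front desk": "100",
    "sales": "300",
}


def route_call(transcript, directory=DIRECTORY, cutoff=0.8):
    """Map a recognizer transcript to an extension.

    Spoken digits are accepted directly; otherwise the transcript is
    fuzzy-matched against known names. Returns None when nothing is close
    enough, so the caller can be asked to repeat instead of misdialing.
    """
    text = transcript.lower().strip()
    if text.startswith("call "):
        text = text[len("call "):]
    if text.isdigit():
        return text
    matches = difflib.get_close_matches(text, directory.keys(), n=1, cutoff=cutoff)
    return directory[matches[0]] if matches else None


print(route_call("Call Mom"))         # 201
print(route_call("202"))              # 202
print(route_call("call the bakery"))  # None
```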
I'm trying to teach myself to set people on fire with my mind... Is it hot in here?
...or a new Slashdot signature.
Dear aunt, let's set so double the killer delete select all
Why not just use two mics: one to record the ambient noise (positioned away from the voice mic) and the other to record the voice (the headset)? Then, since you have two signals, just subtract the ambient noise signal from the headset signal, and voila: clean headset mic audio.
Works for music too: you could control your music player by voice even when it's playing loud (at a party) by removing the music signal from the mic signal.
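Straight subtraction only works if the noise reaching the headset mic is sample-for-sample identical to what the ambient mic hears. In a real room it arrives delayed and filtered, so the usual refinement is an adaptive filter that learns that path. The sketch below swaps in the standard LMS (least mean squares) adaptive noise canceller to show the difference; the signals, the two-tap "acoustic path", and the step size `mu` are invented for illustration.

```python
import math
import random


def lms_cancel(primary, reference, taps=4, mu=0.01):
    """Adaptive noise cancellation with the LMS algorithm.

    primary:   headset-mic samples = speech + noise
    reference: ambient-mic samples = noise only (approximately)
    Returns (cleaned, weights); cleaned converges toward the speech.
    """
    weights = [0.0] * taps
    history = [0.0] * taps  # recent reference samples, newest first
    cleaned = []
    for p, r in zip(primary, reference):
        history = [r] + history[:-1]
        noise_estimate = sum(w * x for w, x in zip(weights, history))
        error = p - noise_estimate  # what is left over is (ideally) the speech
        # LMS update: nudge each weight to reduce the squared error.
        weights = [w + 2 * mu * error * x for w, x in zip(weights, history)]
        cleaned.append(error)
    return cleaned, weights


def mse(xs, ys):
    return sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


random.seed(0)
N = 20000
speech = [0.5 * math.sin(2 * math.pi * 0.01 * n) for n in range(N)]
ambient = [random.gauss(0.0, 1.0) for _ in range(N)]
# Hypothetical acoustic path: the noise reaches the headset delayed and
# attenuated, so it is NOT identical to the ambient-mic signal.
noise_at_headset = [0.8 * (ambient[n - 1] if n >= 1 else 0.0)
                    + 0.3 * (ambient[n - 2] if n >= 2 else 0.0)
                    for n in range(N)]
headset = [s + v for s, v in zip(speech, noise_at_headset)]

naive = [h - a for h, a in zip(headset, ambient)]  # plain subtraction
cleaned, weights = lms_cancel(headset, ambient)

tail = slice(N // 2, N)  # skip the adaptation period
naive_err = mse(naive[tail], speech[tail])
lms_err = mse(cleaned[tail], speech[tail])
print(f"naive subtraction MSE: {naive_err:.3f}")
print(f"LMS cancellation MSE:  {lms_err:.3f}")
print("learned path weights:", [round(w, 2) for w in weights])
```

For the music-player case, the reference would be the player's own output signal rather than a second mic, which is essentially acoustic echo cancellation; the same update rule applies, though real products need longer filters and more safeguards than this toy.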
-AJS
Hmmm, no. Maybe it's the way they deal with failures. Remember Bill Gates trying hard to demonstrate Media Center? Some time after that, Steve Jobs was giving his regular Macworld keynote when his Mac stopped responding. He moved a monitor switch to continue the presentation on another Mac and said: "Well, that's why we have backup systems here."
all those holodeck scenes, Troi ordering chocolate, etc.
Hmmm - holodeck, Troi, chocolate.....the combination of those three items is something that gives one pause to ponder.
Um, I'll be back in a little bit.
Please stand clear of the doors, por favor mantenganse alejado de las puertas
A friend of mine called me at work (since he knew that accessing MSNBC's videos requires Internet Explorer, Windows Media 9 or better, and Flash, and I have neither IE nor WMP at home) and told me about this.
I went to msnbc.com - and there it was, third on the list of videos on the main page.
I called this to the attention of two of my coworkers, and we viewed the video - total elapsed time, maybe twenty minutes.
Then I went to call it to the attention of a third coworker - and the video was no longer on the front page of MSNBC. OK, so maybe they've moved it off the front page, but it should still be on the Technology subsection, right?
Wrong.
Nor was it under Videos, nor anywhere else I could find it easily.
Perhaps this was just a normal rotation of a video. Perhaps not. But no matter what the real cause, there is the appearance that it was removed from the page because it was too embarrassing. Not good for Microsoft.
However, I will give MSNBC this - they didn't give Microsoft a free ride on this, they ribbed them pretty hard.
I knew, though, that this would turn up on other sites as a video that could be viewed outside of Windows. Actually, I am rather surprised that it took this long.
Now, as to the demonstration itself - it looks to me (a person who does signal processing and analysis for a living) like the presenter had the mike gain too high - every time he spoke he maxed out the bar graph on the display. *IF* he had the gain too high, and the audio was clipping significantly, that could make "mom" have enough of a pop to maybe sound like AUNT - especially if the software is using context to try to reduce the search-space for the words. Of course, that's why I would have a monitoring routine in the system, and if any of the samples are at 100% full scale, or if many of the samples are over 90% full scale, or the signal power is too high, I'd have my software adjust the mike gain down *and* flag an alert to the user. I'd also try to look for the mike element itself being overloaded.
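A minimal sketch of the kind of monitoring routine described above, assuming samples normalized so that full scale is ±1.0. The thresholds (any sample at full scale, more than 1% of samples above 90% of full scale, RMS above 0.5) and the function names are illustrative guesses, and the gain "control" here just returns a number; checking whether the mike element itself is overloaded is not modeled.

```python
import math


def check_levels(samples, clip_level=1.0, hot_level=0.9,
                 max_hot_fraction=0.01, max_rms=0.5):
    """Inspect one block of normalized samples (full scale = 1.0).

    Returns a list of reasons the input looks overdriven; an empty list
    means the level looks OK.
    """
    reasons = []
    peaks = [abs(x) for x in samples]
    if any(p >= clip_level for p in peaks):
        reasons.append("samples at full scale (clipping)")
    hot = sum(1 for p in peaks if p > hot_level) / len(peaks)
    if hot > max_hot_fraction:
        reasons.append(f"{hot:.1%} of samples above {hot_level:.0%} of full scale")
    rms = math.sqrt(sum(x * x for x in samples) / len(samples))
    if rms > max_rms:
        reasons.append(f"signal power too high (RMS {rms:.2f})")
    return reasons


def adjust_gain(gain_db, samples, step_db=3.0):
    """Lower a (hypothetical) mic gain setting and alert the user when overdriven."""
    reasons = check_levels(samples)
    if reasons:
        gain_db -= step_db
        print("ALERT: mic input overloaded:", "; ".join(reasons))
        print(f"Reducing mic gain to {gain_db:.1f} dB")
    return gain_db


RATE = 8000
tone = [math.sin(2 * math.pi * 440 * n / RATE) for n in range(800)]
quiet = [0.3 * x for x in tone]
loud = [max(-1.0, min(1.0, 3.0 * x)) for x in tone]  # simulated clipping

gain = 0.0
gain = adjust_gain(gain, quiet)  # no alert
gain = adjust_gain(gain, loud)   # alert, gain drops by 3 dB
```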
www.eFax.com are spammers
Who knows how the algorithm they implemented works. Chances are the computer scientists behind it are not total asshats and they assumed the sales guy would follow the same procedure they did [e.g. to train it].
Point is, if the sales guy had tried the system out beforehand he would have noticed it not working.
That is, suppose the code is total shit [I know, big stretch for MSFT]. Then isn't it likely it would have failed during the preparation stage? If you are saying "mom" and it always comes back "aunt" you may want to cancel the presentation.
That's why I think he didn't do any prep work for the presentation.
Got ham pizza ship
I use Dragon NaturallySpeaking every day (carpal tunnel syndrome), and version 9 has around 99% accuracy, with around 98% out of the box with no training. That means 10 or so errors in a 1,000-word dictation.
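Dictation "accuracy" is commonly reported as word accuracy, i.e. 1 minus the word error rate (WER), where errors are the minimum number of word substitutions, insertions, and deletions needed to turn the recognizer's output into what was actually said. The reviews don't spell out their exact metric, so treat this as an assumption; under it, 99% on 1,000 words is indeed about 10 errors. A small sketch of the computation (function names are illustrative):

```python
def word_errors(reference, hypothesis):
    """Minimum word substitutions, insertions and deletions needed to turn
    the hypothesis into the reference (Levenshtein distance over words)."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # deletion
                         cur[j - 1] + 1,          # insertion
                         prev[j - 1] + (r != h))  # substitution or match
        prev = cur
    return prev[-1]


def accuracy(reference, hypothesis):
    """Word accuracy = 1 - WER (can go negative if there are many insertions)."""
    return 1 - word_errors(reference, hypothesis) / len(reference.split())


print(word_errors("dear mom", "dear aunt"))  # 1
print(accuracy("dear mom", "dear aunt"))     # 0.5
```

Note how harsh the metric is on short utterances: a single misheard word in a two-word phrase is already 50% accuracy.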
I didn't believe it either, until I actually tried it. Dragon is the first worthwhile speech recognition solution I've seen that's practical for general use (though I'd love it if they'd release a "programmers" version to complement the Medical/Legal versions). I get about 99% accuracy (a decent microphone is *very* important!)
Dragon 9 also doesn't "technically" need training, but accuracy improves further if you do bother to train it a bit. The NYT reviewer was able to get 99.6% accuracy after a short training session.
Here are a few reviews of version 9:
http://www.nytimes.com/2006/07/20/technology/20pogue.html?ex=1154318400&en=6fd795114b3f72ea&ei=5070
http://www.npr.org/templates/story/story.php?storyId=5577523
Since when did Mum sound anything like Aunt?