Vista Speech Recognition Goes Awry
An anonymous reader writes "It seems even MSNBC is willing to take a jab at those rare occasions when Microsoft products don't work. During a demo of Vista's speech recognition technology, Vista couldn't differentiate between mom and aunt, and all attempts to rectify the problem just made it worse. Wait until you see what it spat out; I think we have a new 'All your base.' Don't you just love Microsoft's live demonstrations?"
Reminds me of the Roald Dahl short story about the ant-eater who ate someone's aunt because their accent rendered the two words the same.
I can't remember what the story was called.
the layman's guide to computer science
Yeah, well, it's MS. You get that.
It's just a one-time thing.
I mean, it's not like they have a reputation for releasing half-assed code that's been hyped up through marketing to the point that it will never perform as advertised.
And it's not like this is a company that is having image problems due to its monopolistic nature.
Or headed by an infamous ragaholic with a history of intolerance towards free standards.
Nope, I'm sure that this is just an accident by a company that spends its off hours petting little baby chickens and bunnies.
Reminds me of the time when I worked at a computer store and we played with the voice recognition card in a PowerMac floor model. Somebody programmed it so that if someone said "Computer, bite me", it would respond with "Can't bite what's not there". Over time the accuracy of the recognition fell. One day, as a salesman was talking to a customer about the computer, it misinterpreted something he said and said "Can't bite what's not there". Needless to say, that system was wiped and we weren't allowed to play with it anymore.
It's not a bug. It's a feature. It reads your mind and finds what you are TRYING to say. In this case, he wanted to write a letter to his aunt about setting bombs so as to double the amount of serial killers at large (currently two) and then remove the Select All command from all programs ever made, and so it outputted "Dear Aunt, let's set so double the killer delete select all".
it could lead to surprising porn....
Monstar L
Let's set so double the killer! I guess all that's left to do is point and go "Haa Haa!"
Yes, once again Microsoft S/W Engineers learn that the more public the demo or the more important the audience, the more likely something will go wrong. It's one of Murphy's laws. Been there. Did that. Barely survived.
Experience is the human quality that enables you to recognize a mistake immediately when you make it again.
Dacap
English -- gotta love it! / The engineers refuse to refuse the rocket until the refuse is removed from the launch pad.
There is Dragon NaturallySpeaking 9, which apparently is pretty good, but will SR ever really be the Star Trek kind? You can't really expect to use SR in a noisy environment. Has anyone here ever worked on an SR application?
And totally the quality I'd come to expect from Microsoft.
you say:
Dear Mom,
Vista is gonna suck. Enjoy this powerbook instead.
Sincerely,
Little Girl
It hears:
Dear Aunt,
The explosives are deleted in the moon star night.
Buy more Vista.
Bite me,
Bitten World
Tom
Someday, I'll have a real sig.
Well selling 'i's worked for Apple. (iPod, iSight, iMac, etc.)
It is not as bad as you dog to the car in double tuesdays golf.
This isn't the first presentation to go wrong, is it? ;)
Win98 gone wild: http://www.youtube.com/watch?v=Hrbx9_AY720
Media Center Edition gone wild http://www.youtube.com/watch?v=j7EEbokKLHI
We can add this one to the list too
Final text:
Not quite as embarrassing as the Windows 98 BSOD, but more entertaining than the Ballmer developer's video.
http://www.ntk.net/media/developers.mpg
http://blindscribblings.com - Tasty pop-culture in conceptual fashion.
...
"Don't you just love Microsoft's live demonstrations?"
As if MS is the only one who has problems with demonstrations. This is the problem with anti-MS guys... it's not that MS is perfect, it's just that the zealots are blinded by hate.
Steve Ballmer accidentally sends an e-mail while diligently testing the software. The e-mail says:
"Sir put down the chair, then we'll talk"
"No Steve wait up, don't do that"
"BOOM CRASH BOOM CRASH BOOM CRAASH WAAAH NOOO STOOOOP"
"DUDE, THE COMP HAS A BSOD! WAAH!"
Ok, all you anonymous cowards. Show me your speech recognition engine. And show me your demo.
"No seriously, stop throwing that chair!"
"What the hell is wrong with your comp Steve!?"
"Is that even a BSOD? It's just skewed now!"
"How the fuck do you turn this thing off!"
"What does this button do?"
"It's doing crazy things man, maybe we should just pull the plug!"
"No Steve, throwing a chair at it doesn't work!"
"CRASH BOOM CRASH"
that comment is as sad as the video.
I'll bet someone in Redmond is asking the demo organizer:
"Can you say -- canned like a tuna?"
3 things about computers: they're alive, they're self-aware, and they hate your guts.
Does this mean there's a lot of inbreeding going on at Microsoft?
Having seen some of the research that goes on, voice recognition is still far from good.
Yes it works in some contexts, especially if it's been trained with the person speaking, and the language is limited, such as in a professional environment.
But for home computers, it's not only overkill, it's also inadequate and non-functional.
I say COOL feature, but a hopeless waste of time and money, which in the end will be paid for by you-know-who (not MS).
On another topic, can someone please ask MS to stop the increasing and excessive stream of bundling?
Blah blah sig blah blah blah irony blah blah
Open Source Speech Recognition?
I believe one such system is called Sphinx.
It's hard to wreck a nice beach. :)
Here is a different perspective on why speech recognition STILL sucks:
Natural language interaction is one of BillG's hot buttons. Back in '95 he used to love demonstrating with Polly the parrot. Polly the parrot could recognize speech and react to it, like "play Miles Davis". He demo'd it many times, and yes, it occasionally glitched, but the potential was pretty cool. When he built his house, he put speech recognition technology all through it, thinking that it would be perfected very soon.
WTF happened? Well, along came this distraction called 'the internet' and 'netscape'. And then another distraction called 'open source' and 'linux'. Those distractions set natural language recognition back 10 years. Yep, this is a case where competition has stifled a particular innovation. I'm not saying that it's a bad thing; maybe competition encouraged 5 other innovations, but I am positive it stifled this one in particular.
Just a different thing to think about...
slashdot troll = you make a compelling argument I do not like the implications of.
My guess is that the marketer "showing off" the voice recognition didn't properly train the software before the demonstration. If he did, then he obviously didn't pick and test something that was at least known to work, which is not a bad idea when you are doing product demos. The software obviously has much work left, since it interpreted the two-word command "select all" as the seven words "so double the killer delete select all" (while it did finally get "select all", where the hell did the rest of that come from?). I am surprised that Microsoft had so much confidence that they would go on live TV with it.
This is one reason I believe that, until software is fully tested and released, someone technical who knows the software inside and out (one of the developers) should be demonstrating the products. Leave the non-technical marketers to demonstrate products that they can't mess up, kind of like the classes they had to take in college to get a marketing degree.
Hey, there is only one Return and it's not of the King, it's of the Jedi.
"Okay, Steve, I think it's back to normal again."
"What? Taking money out of the Xmas gift to pay for your chair and office damage? Well, wouldn't be the first time."
"But seriously man, just between you and me, but what's with all this MILF porn man?"
"Bill told you not to keep this shit on your protected network drive."
"Aw come on! Everybody knows your password is 'MacroBig'!"
"Imagine if the company dug deep enough?"
"Just in case this doesn't work out, we'll fire a bunch and write it off as 'underperformed'."
"Hey, what's that weird thing on your screen? OMG IT'S SENDING AN EMAIL, QUICK CUT IT O.F....."
Eek, I'm getting carried away here.
Maybe they're twin sisters... ;)
"Dear mom comma"
Dear aunt,
"Fix aunt"
Dear aunt, let's set
"Delete that"
Dear aunt, let's set
"Delete that"
Dear aunt, let's set
"Delete that"
Dear aunt, let's set so
"I think it's picking up a little bit of echo here...delete - select all"
Dear aunt, let's set so double the killer delete select all
*Manually selects all and deletes*
"Okay, I'm glad you're enjoying this"
*Laughter*
Does Microsoft have to copy EVERYTHING??? I used OS/2 Warp for the second half of the 90s but my experience with _its_ built-in speech recognition was pretty much identical to that demo.
Speech recognition is still just a gimmick anyway. We still have a LONG way to go before it gets to the point that Joe Average User imagines it should be. Joe Average User wants his computer to respond like the one in Star Trek. I still want to set up my Asterisk server with speech recognition, though, so that people can either dial or say the extension they want. It'd also be neat to pick up the phone, say "Call Mom" to the dial tone and have it call my aunt for me.
I'm trying to teach myself to set people on fire with my mind... Is it hot in here?
"Vista couldn't differentiate between mom and aunt"
Seems that the problem originated from spammers messing up the keyword-based research features. After cleaning up the Porn db Vista correctly differentiated all the different kinds of mature porn pictures.
I'll second the previous comment. Star Trek grade SR is just another fiction device, present for the very reasons we read fiction: we DON'T want realism in our fiction! Reality is far too full of lost days of work because you discovered the approach you took crashed hard into some flaw.
I am an enthusiastic fan of speech production, because computers understand source text just fine. Miracles in speech recognition are at least 50 years away, and hoping for them is simply "wishing". (If you insist loudly enough, can you retroactively re-create the past?)
Even with better accuracy in their limited environments, I still find voice prompts irritating, because people around you have to put up with "... yes ... representative ... hardware ... other issue ... connect me." Then the computer misses one of the words, and you get "Sorry, I didn't get that."
My first Journal Entry ever, in 8 years! http://slashdot.org/journal/365947/aphelion-scifi-fantasy-horror-poetry-webzine
Why not just use two mics: one to record the ambient noise (positioned away from the voice mic), the other to record the voice (headset)? Then, since you have two signals, just subtract the ambient noise signal from the headset signal. Voila, clean headset mic audio.
It works for music too: you could control your music player by voice even when it's playing loud (at a party) by removing the music signal from the mic signal.
-AJS
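The two-mic idea above is essentially adaptive noise cancellation. Plain subtraction only works if the noise reaches both mics identically; in a real room it arrives at the headset mic delayed and filtered, so practical systems usually learn that path with an adaptive filter. Below is a minimal sketch of one such filter, an LMS (least mean squares) noise canceller, swapped in for the plain subtraction described in the comment. It assumes the ambient mic hears only noise and that both signals are sampled in sync; the function name, tap count, and step size are illustrative choices, not taken from any real product.

```python
def lms_noise_cancel(primary, reference, taps=4, mu=0.01):
    """LMS adaptive noise canceller (illustrative sketch).

    primary:   headset-mic samples (voice + noise that leaked in)
    reference: ambient-mic samples (assumed to be noise only)
    Returns (cleaned, weights); `cleaned` converges toward the voice.
    """
    weights = [0.0] * taps     # learned model of the noise path to the headset
    recent = [0.0] * taps      # latest reference samples, newest first
    cleaned = []
    for d, x in zip(primary, reference):
        recent = [x] + recent[:-1]
        # Predict the noise component of the headset signal
        noise_estimate = sum(w * r for w, r in zip(weights, recent))
        error = d - noise_estimate          # what is left should be the voice
        # Nudge each weight in the direction that shrinks error**2
        weights = [w + 2 * mu * error * r for w, r in zip(weights, recent)]
        cleaned.append(error)
    return cleaned, weights
```

Because the voice is uncorrelated with the ambient mic, the filter can only cancel the part of the headset signal it can predict from the reference, which is the noise. Feeding the player's own output in as the reference is the basic idea behind acoustic echo cancellation, which covers the loud-music case.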
Yeah, this was just as funny. Right... "Dear aunt, let's set so double the killer delete select all" HAHAH LOLOL!!1!1! I'm peeing my pants it's so funny.
Related in there is the urge to have a computer "perfectly understand" everything, so they can indulge in unclear thinking. Not a day goes by without my supervisor saying, "So, how is that progressing?", referring to something about seven topics back. When I haven't a clue, I continually reply: "I don't understand the question."
Demo Rules: 1) You never demo without a script. 2) You never demo on any system that you did not use to create or at least test the script. 3) You never deviate from the script. 4) If someone wants to see something not in the script, show it to them later, in private.
Undetectable Steganography? Yep, there's an app fo
I would give it 20 years times the number of steps, with waiting at each step for the patents to run out, if Elektroschock is correct in his comment that it is a patent minefield.
"Reason 1: You don't have to train this software. That's when you have to read aloud a canned piece of prose that it displays on the screen -- a standard ritual that has begun the speech-recognition adventure for thousands of people.
I can remember, in the early days, having to read 45 minutes' worth of these scripts for the software's benefit. [...] NatSpeak 9 requires no training at all."
All your aunt are belong to Microsoft...
Should a driver for the keyboard be bundled? To somebody who does not have use of hands, speech recognition is as indispensable as a keyboard driver. This is important when trying to get your product certified as disability-safe for use by agencies of governments.
omg...at least i had a good laugh xD
Maybe his mom is his aunt.
You're welcome.
Argh, I hate living in China... Google Video is not available.
Other things which are banned here are:
bbc.com and most other western news sources (cnn.com is not banned, strangely enough)
google.com and google.cn cache
wikipedia.com
I have to resort to using an anonymiser for most of my surfing (www.anonymouse.com)
I wonder how long until they ban the whole internet and replace it with a huge JPEG of Mao.
(P.S. I'm a Westerner; most Chinese don't notice.)
rare occasions?
---- "XML is like violence. If it doesn't fix the problem, you aren't using enough."
OS/2 Warp had speech recognition in 1994. Better yet, the OS/2 version of Netscape at the time was speech-enabled (browse simply by speaking the link). Even cooler was that the Netscape developers actually listened to the OS/2 community with that version (I remember them implementing something that I had asked for... very cool). Keep in mind that the average system of that time was a Pentium 133 with 100MB of RAM. And here we are in 2006, with GHz processors and GBytes of RAM dirt cheap, and M$ is just now starting to experiment with this? By now this technology should be damned near perfectly integrated across the board! Thanks for abusing your monopoly power to destroy all of the competition and REAL innovation, Microsoft!
double the niece's killers select all delete DIE!
They hired the outsourcing tech support firm that just split with Dell. Vista probably understands Dell tech support lackeys just fine.
That's so last century. NPR did a bit on the new Dragon NaturallySpeaking 9. The NPR reporter got 100% accuracy out of the box, no training.
Dictation Software Improves Usability, Accuracy
On the Money = CNBC
Yes, bugs happen, and yes, Vista is still in beta, but rather than just admit "Vista is still a buggy piece of crap software that can't even be used properly by its own engineers", they tell us to sit and wait because we can trust them to fix it.
To MS's credit, it is a strategy that works.
MMO Quests are like orgasms:
You may solo them, I prefer them in a group.
I have no speakers at the moment; what did the machine end up spitting out that was so funny?
Brought to you by Carl's Junior.
Yes, sales and marketing guys are asshats, but they're also professionals who usually spend a lot of time with the software rehearsing and choreographing their demos, because they're the ones who have to deal with the immediate fallout, after all.
This demo didn't just drop a couple of words, or misinterpret an ambiguous sounding phrase, this was a complete melt down. A more plausible explanation is that the guy's voice was also amplified through the PA system in the room, and the computer's microphone was therefore picking up both the original voice and the delayed, amplified voice coming out of the PA.
ENDUT! HOCH HECH!
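The PA-echo theory above can be checked offline: if the mic hears the voice plus a delayed copy coming back from the PA, the recording's autocorrelation shows a peak at that delay. Below is a minimal pure-Python sketch of that check. It assumes a broadband test signal; real speech has its own strong periodicity (pitch), so in practice you would look for the echo in a cepstrum or against a known reference signal. The function name and the search window are illustrative.

```python
def estimate_echo_delay(mic, max_lag):
    """Find the lag (in samples) where `mic` best matches a delayed copy of itself.

    Returns (lag, strength), where strength is the autocorrelation at that
    lag divided by the signal energy (near 0 means no self-similarity).
    """
    energy = sum(s * s for s in mic)
    if energy == 0:
        return 0, 0.0
    best_lag, best_strength = 0, 0.0
    for lag in range(1, max_lag + 1):
        r = sum(mic[i] * mic[i - lag] for i in range(lag, len(mic)))
        if r / energy > best_strength:
            best_lag, best_strength = lag, r / energy
    return best_lag, best_strength
```

A strong, isolated peak would support the echo explanation; dividing the lag by the sample rate gives the delay in seconds.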
He did say delete select all... *shrugs* As for that killer stuff... Vista knows when it's dead.
They must have licensed the technology from the same people iListen did. It was acting *exactly* the same as my experience with that useless piece of shit Mac software, including having a terribly hard time recognizing special commands it should be super good at, like "delete that."
It was funny, but the end of the video was funnier. Apparently Microsoft sent the heavies round and made it clear they weren't happy about the video being shown, and that the problem was down to background noise. However, CNN obviously found it funny, and the newsreader pointed out that it was a very quiet room until it started going wrong and people started laughing.
"Live television is rough. Welcome to our world." she said. Ooooooh. Nice kick below the belt. Sounds like they're not keen on Microsoft at CNN.
A friend of mine called me at work (since he knew that to access MSNBC's videos requires Internet Explorer, Windows Media 9 or better, and Flash, and I have neither IE nor WMP at home) and told me about this.
I went to msnbc.com - and there it was, third on the list of videos on the main page.
I called this to the attention of two of my coworkers, and we viewed the video - total elapsed time, maybe twenty minutes.
Then I went to call it to the attention of a third coworker - and the video was no longer on the front page of MSNBC. OK, so maybe they've moved it off the front page, but it should still be on the Technology subsection, right?
Wrong.
Nor was it under Videos, nor anywhere else I could find it easily.
Perhaps this was just a normal rotation of a video. Perhaps not. But no matter what the real cause, there is the appearance that it was removed from the page because it was too embarrassing. Not good for Microsoft.
However, I will give MSNBC this - they didn't give Microsoft a free ride on this, they ribbed them pretty hard.
However, I knew that this would be appearing on other sources as a video that could be viewed outside of Windows. Actually, I am rather surprised that it took this long.
Now, as to the demonstration itself - it looks to me (a person who does signal processing and analysis for a living) like the presenter had the mike gain too high - every time he spoke he maxed out the bar graph on the display. *IF* he had the gain too high, and the audio was clipping significantly, that could make "mom" have enough of a pop to maybe sound like AUNT - especially if the software is using context to try to reduce the search-space for the words. Of course, that's why I would have a monitoring routine in the system, and if any of the samples are at 100% full scale, or if many of the samples are over 90% full scale, or the signal power is too high, I'd have my software adjust the mike gain down *and* flag an alert to the user. I'd also try to look for the mike element itself being overloaded.
www.eFax.com are spammers
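The level-monitoring routine described in the signal-processing comment above could look something like this. It is a minimal sketch assuming signed 16-bit PCM blocks; the 90% "hot" level comes from the comment, while the 5% hot-sample ratio, the -6 dBFS RMS limit, and the suggested gain steps are made-up illustrative values, not anything Vista or a real driver uses.

```python
import math

FULL_SCALE = 32767  # assumption: signed 16-bit PCM samples


def monitor_block(samples, hot_level=0.9, max_hot_ratio=0.05, max_rms_dbfs=-6.0):
    """Check one block of mic samples and suggest a gain change.

    Returns (alert, gain_change_db); alert is None when levels look fine.
    Thresholds are illustrative guesses, not values from any real product.
    """
    if not samples:
        return None, 0.0
    clipped = sum(1 for s in samples if abs(s) >= FULL_SCALE)
    hot = sum(1 for s in samples if abs(s) >= hot_level * FULL_SCALE)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    rms_dbfs = 20 * math.log10(max(rms, 1e-9) / FULL_SCALE)

    if clipped:                                   # any sample at 100% full scale
        return "clipping", -6.0
    if hot / len(samples) > max_hot_ratio:        # many samples above 90%
        return "too hot", -3.0
    if rms_dbfs > max_rms_dbfs:                   # overall signal power too high
        return "too loud", -3.0
    return None, 0.0
```

A caller would apply the suggested change through whatever mixer interface the platform offers and show the alert to the user, as the comment proposes.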
Since when did Mum sound anything like Aunt?
Try calling the post office. Their entire menu is voice activated. "If you want to track a package say Track Package". Worked fine for me the other day. Guess they're not using Windows, huh?
If the computer was listening to the same microphone that the sales guy was using to address the crowd, it seems like ambient noise and echo were low. If it was using another microphone, there's no telling what it heard.
The infamous Howard Dean scream was not loud compared to the room of screaming people and he didn't look like a lunatic until later when people heard him recorded through the mic he was holding, without the crowd noise. A person (or computer) in the room would have heard something completely different.
What I heard on the video is only evidence of that particular microphone, not of what the computer was hearing. Broadcast media is almost as arrogant as Microsoft.
It was an audio gain issue
I don't know. I have a laptop with the Tablet PC edition of Windows. It has voice recognition. When you mess up and want to delete something, you have to say "command mode" first. If you just say "delete that" on my laptop, it will type out the words "delete that". But if you say "command mode", it will beep. Then you can say delete, select all, whatever.
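The dictation/command split described above is essentially a two-state machine: in dictation mode every utterance is typed, and only after the "command mode" keyword is the next utterance treated as an editing command. Below is a toy Python model of that behavior. The class and method names are hypothetical, not the Tablet PC or Vista speech API, and the assumption that command mode covers a single utterance is a guess.

```python
class DictationSession:
    """Toy model of dictation vs. command mode (hypothetical, not a real speech API)."""

    def __init__(self):
        self.phrases = []          # dictated text, one entry per utterance
        self.command_mode = False  # True only right after "command mode"
        self.all_selected = False

    def hear(self, utterance):
        spoken = utterance.strip().lower()
        if spoken == "command mode":
            self.command_mode = True
            return "beep"
        if not self.command_mode:
            # Dictation mode: everything is typed, even "delete that"
            self.phrases.append(utterance.strip())
            self.all_selected = False
            return "typed"
        # Assumption: command mode applies to the next utterance only
        self.command_mode = False
        if spoken == "select all":
            self.all_selected = True
        elif spoken in ("delete", "delete that"):
            if self.all_selected:
                self.phrases.clear()     # delete the whole selection
            elif self.phrases:
                self.phrases.pop()       # delete the last utterance
            self.all_selected = False
        else:
            return "unrecognized command"
        return "ok"

    def text(self):
        return " ".join(self.phrases)
```

Under this model, "delete that" or "select all" spoken without the mode switch simply end up typed into the letter.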
That's what I was thinking when I saw the video.
I tested a tablet a couple of years ago that had MS handwriting and voice recognition built in. You have to "train" the voice recognition by reading pre-made texts out loud (strangely, they were mostly about how great Microsoft is). Even after the training, the results were like the computer was channeling James Joyce. It especially had trouble with short words like "our." If the presenter had said, "Dear Mother, please rendezvous with me in Luxembourg" it would have been just fine.
My Cell Phone's voice recognition hasn't given me a problem, EVER. Perhaps Microsoft could learn a thing or two from the mobile industry?
Still waiting on Serviscope_minor to wake up to fucking reality and realize that Jessica Price isn't going to fuck him.
I'm sorry, but it seems like this has happened before. A half-assed operating system that doesn't do everything they planned. Let's just watch all the pieces fall off Vista until it becomes a switch similar to the one from 98 to ME. Hell, I'm still waiting for 95 to do what they said it would.
It's not -1 Flamebait! It's +5 Funny. You just didn't get the joke...
According to Rob Chambers, a developer on the Vista speech recognition team, the failures during the demo were caused by audio gain issues.
From his blog:
Read the entire blog post for a more complete explanation of what happened... one that's just slightly more plausible than most of the explanations proffered by your fellow Slashdotters.
I'm the original poster...This guy put coffee all over my LCD. (bastard)
I use Dragon NaturallySpeaking every day (carpal tunnel syndrome), and version 9 has around 99% accuracy, with around 98% out of the box with no training. At 99%, that means 10 or so errors in a 1,000-word dictation.
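The accuracy arithmetic above is simple: at 99% word accuracy, a 1,000-word dictation has about 1000 × 0.01 = 10 wrong words, and at 98% about 20. Accuracy figures like these are commonly computed as 1 minus the word error rate, where errors are the substitutions, deletions, and insertions found by a word-level edit distance. Below is a minimal sketch of that calculation; it is not necessarily how Dragon or its reviewers measured accuracy, and it does no normalization beyond lowercasing (punctuation stays attached to words).

```python
def word_errors(reference, hypothesis):
    """Word-level Levenshtein distance: substitutions + deletions + insertions."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # prev[j] = errors turning the reference words seen so far into hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # reference word deleted
                         cur[j - 1] + 1,          # extra word inserted
                         prev[j - 1] + (r != h))  # substitution (or match)
        prev = cur
    return prev[-1]


def word_accuracy(reference, hypothesis):
    """1 - WER. Can go negative when the recognizer inserts many extra words."""
    n = len(reference.split())
    if n == 0:
        raise ValueError("reference must contain at least one word")
    return 1 - word_errors(reference, hypothesis) / n
```

For the demo, the command "select all" against the output "so double the killer delete select all" counts as five insertions on a two-word reference, an accuracy of -150% under this definition.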
I didn't believe it either, until I actually tried it. Dragon is the first worthwhile speech recognition solution I've seen that's practical for general use (though I'd love it if they'd release a "programmers" version to complement the Medical/Legal versions). I get about 99% accuracy (a decent microphone is *very* important!)
Dragon 9 also doesn't "technically" need training, but accuracy further improves if you do bother to train it a bit. The NYT reviewer was able to get 99.6% accuracy after a short training session.
Here are a few reviews of version 9:
http://www.nytimes.com/2006/07/20/technology/20pogue.html?ex=1154318400&en=6fd795114b3f72ea&ei=5070
http://www.npr.org/templates/story/story.php?storyId=5577523
http://cmusphinx.sourceforge.net/html/cmusphinx.php
http://sourceforge.net/projects/cmusphinx/
It's been around for a while. I think it's pretty good, though quite resource-demanding. The peeps at The Tech Report usually benchmark it.
Here, for instance:
http://techreport.com/reviews/2006q3/core2/index.x?pg=12 (Core 2 Duo)
http://techreport.com/reviews/2002q1/athlonxp-2100/index.x?pg=6 (early Athlon XP vs. P4)
(middle of the page)
As you can see, we can do real-time Sphinx now.
One pronunciation of 'aunt' (the less common one in the US) uses the exact same vowel sound, and 'm' and 'n' are very similar. If the salesgenius had trained it to his (probably the more common) pronunciation of 'aunt', he wouldn't have had the problem. I suspect MS's program figures that in a salutation, 'aunt' is one of the more common words. That's why it got 'dear' correct; otherwise it might've said something like 'tear'.
..chair recognition
"You're everywhere. You're omnivorous."
C'mon, peeps, the year is 2006. AYBABTU is dead and buried.
Do you see what I did there?
If a product is ready for the world, it absolutely should be ready for a live demo. It shouldn't be just barely ready to work, under perfect conditions, minutes after careful testing by technicians.
In fact, they should have called up a volunteer from the audience... preferably a member of the press so you'd know it wasn't a confederate... to do the talking.
We realize things can go wrong under the best of circumstances. But it is still a completely valid test of the company's confidence in their product. In the real world, conditions will be far worse, there will be no technicians around, the software will be running in an environment that's had three security patches and two other major products installed on top of it, and so forth.
In the days of live TV there was a program, it might have been Ed Sullivan, that regularly ran commercials for Timex watches, which, they said, "takes a licking and keeps on ticking." In the commercials they put a Timex through various torture tests. It didn't always survive: the one I remember was the one where they buckled the watch onto the blade of an outboard motor's propeller and ran the motor in a tank of water, onstage, live. When they finished, at first they couldn't find the watch: the strap had broken. The propeller had flung the watch into a corner of the tank. And it wasn't ticking. But mostly, the Timex demos worked.
On the Steve Allen show, they would regularly demonstrate Polaroid cameras. This was in the days before the cameras were motorized and the processing operation was tricky: you had to pull a long strip of paper-film sandwich firmly against fairly stiff resistance, wait ninety seconds, open the camera back, get a fingernail into a slit, and peel the perforated picture base away from the backing. Occasionally they had problems. As with the Timex demos, you got a completely convincing picture of the product's reliability and usability, and the company's confidence in their products.
In Jack London's "The Sea-Wolf," a character asks "Do you know Dr. Jordan's final test of truth?" and answers: "Can we make it work? Can we trust our lives to it? is the test."
I don't say Microsoft should wait to ship until they are ready to trust their lives to Vista voice recognition, but they should darn well be prepared to demonstrate it, live, in public.
"How to Do Nothing," kids activities, back in print!
What would be awesome is if Steve Jobs used the Mac's voice recognition to have it type out "Dear Aunt double the killer delete select all"
In some states your mom IS your aunt.
Don't fight for your country, if your country does not fight for you.
"It's easy to wreck a nice beach"
Google Video used to have a "save file" feature (even had PSP/iPod/etc. choice). Where is it?
With this new technology, hacking Vista is going to be pathetically easy.
Wish I had mod points...
This isn't a new technology (as Microsoft would like you to believe), speech recognition has been around for well over 5 years (innovation???)
I had a copy of DragonSpeak about 3 years ago; it never made such simple mistakes (actually, I would recommend it!), and the computing world has come on leaps and bounds since then.
To me, at the moment, Microsoft's (specifically Vista) latest dev model seems to resemble a badly organised, feature saturated, open sourced effort. Think of all those bloated applications, with thousands of features, none of which work properly, and the aim of the project (those KEY features) have been so buried and sidelined, that they are buggy as well.
Can we not persuade them to use the best bits of OS and (dare I say it) the best bits of closed source to actually come up with a product that people will be happy to upgrade to, as opposed to a user group who don't like the product but are forced to upgrade anyway?
You feel sleepy. Close your eyes. The opinions stated above are yours. You cannot imagine why you ever felt otherwise.
Has Microsoft taken into account how this small change will affect the transparent aluminum industry?
...there is some REALLY funny stuff that comes out of speech-to-text. For example:
On a mobile phone that was being demonstrated in Italy just after the Pope had been elected, an Italian spoke into it with a strong accent that the machine wasn't accustomed to. The speaker said "Congratulations to our new Pope."
The phone repeated: "Congratulation to our new poke."
Who says computers don't have a sense of humour?
Those using pirated Tinysoft signatures(TM) are a real threat to society and should all be thrown in jail.
When I was last involved in adding speech control to an app, I attended a developer workshop at Apple, and found out much to my surprise that my mic wasn't any good. It sounded fine when I used it for voice recording, but for recognition the gain curve was all wrong. When I tried one of the mics that the speech team from Apple provided, the hit rate went from under 20% to well over 90%.
When Kim Silverman demos Apple's speech recognition, he uses a high-quality noise cancelling mic. It makes all the difference.
-jcr
The only title of honor that a tyrant can grant is "Enemy of the State."
Each command was prefaced with "Computer", and seemed to have a common vocabulary to it.
They never said "Computer - what's up with that?" It was always "Computer - Find all data pertaining to [a specific event]". It seems what they had in fact mastered in the future was not only voice recognition, but also a common language programming and database system.
My mom is in a coma.
My aunt is in a,
Google search for "Let's set so double the killer" I want to see 95 youtube remixes by monday.
My turnips listen for the soft cry of your love
In the 90's, a bit after Apple released the Newton, I saw a demo by the handwriting recognition team at Microsoft. They were very proud of their work, which they promised would be much better than Apple's. As part of the demo, the presenter wrote "Windows Rules" on the tablet, which the software recognized as "Windows Pales". That got a pretty embarrassing round of laughter considering it was an internal demo.
To be fair, I'll throw in a more recent Apple story. I ordered a Mac Book Pro the day they were announced, and when it arrived it had the now famous squeal. I called Apple's 800 number and endured the automated voice menu:
Apple Robot: "Say the name of the product you are calling about."
Me: "Mac Book Pro"
Apple Robot: "Did you say Apple Cinema Display?"
Me: "No"
Apple Robot: "I'm sorry. Please say the name of the product you are calling about"
Me: "Mac Book Pro"
Apple Robot: "Did you say Unix?"
I got one of the very first shipping units, so they obviously hadn't added "Mac Book Pro" to the known list of products yet, but it was hysterical how far away the nearest matches were.
These two demos worked perfectly:
- http://video.google.com/videoplay?docid=-4134446112378047444&q=Motorrider&pl=true
- http://maclive.net/sid/135
Wonder what internal build of Vista they used...
Every time I say the word "Linux" it gets typed out as "Windows". Go figure.
Best. Comment. Ever. Enjoy!
The PR guy was demoted. Microsoft put him in a box with a keyboard and called him "Vista Speech Recognition." See, such technology is in reach!
I have freaks! I did something right...
The video is funny, but I've tried the Vista public beta, and I loved the speech recognition. Outside of minor beta bugs, it worked flawlessly. I only wish it were available now.
Ad Astra Per Aspera
I can hardly wait for the days when "Deoxyribonucleic acid" is synonymous with "dead ox E ride O nuclear lick ass lid aunt."
http://slashdot.org/~tomstdenis Last 16 comments. You must really hate MS. I think you're winning.
I hear there's rumors on the Internets that we're going to have a draft.
Ok -- might as well finish the ad. What do you recommend for microphones?
However, it's done passively (i.e., without electronics) in a system using a single mic with two openings (a pretty Flash demo of the tech is at http://www.theboom.com/technology.html ). It was actually designed for voice recognition on the Wall Street trading floor but is now used in Black Hawk helicopters. Listen to the demo sound files: http://www.theboom.com/theboomO.html
then The Free Open Source Community could have improved it and made it better!!!!
Linux!
To start with, I've never used any voice recognition software. Nevertheless, I think voice recognition is way overhyped as a concept for word processing, even assuming it works flawlessly. Here is why.
First of all, spoken English is very different from written English. Try to record a typical conversation and then write it down exactly as you hear it. You'll notice almost no structure, very short phrases, no full stops, no commas, etc. You can't send such a "document" to anybody.
So, if you want your text to have any quality, you'll spend much more time editing it than actually writing new words. Now try to edit your document efficiently using voice recognition software. Marketing voice recognition as a keyboard replacement for producing documents ("it's so fast and easy, you don't even have to learn how to type") is just bullshit.
Sometimes you don't need quality, for example when writing a short note to a friend. But isn't it then easier just to record yourself and let the recipient listen to your message directly (like voicemail)? What's the point of having a computer do voice recognition if humans can do it much better?
Finally, voice recognition undermines privacy. Do you seriously want to dictate an intimate email/letter to your girlfriend? Why do you think text messaging got so popular?
I see voice recognition as useful in two cases: (1) the final recipient is a computer system, for example, a Star Trek-style "computer, dim the lights in this room"; (2) the author is disabled and can't use a standard keyboard.
Actually, the last MS demo flameout of this magnitude came when the beta of Windows 98 was being demoed. They wanted to show a scanner "just working" with Windows 98 and USB. W98 hit the blue screen of death when the USB scanner was plugged in.
There, I found it. The file is an old QuickTime movie. I'm going to put this up on YouTube. There, that's done. Have at it.
Knowledge is power. Knowledge shared is power multiplied.
Dragon Systems had better speech recognition than that years ago. Plus, they've declared bankruptcy (or something like it) several times since the late 90s. Why the hell wouldn't Microsoft just buy them instead of trying to roll their own (and doing their usual suckage job of it)?
Avoid Missing Ball for High Score
Background noise? I think I can hear chairs flying!
Blonde Speech Recognition Engine
Now where's your demo?
Just in: Microsoft, a known petter of baby chickens and bunnies, has announced the signing of Larry Flynt. Said a Microsoft spokesperson: "As lovers of chickens, we thought Larry might be a good, errr, fit."
-- I ignore anonymous replies to my comments and postings.
It seems that all of you are just a bunch of geeks trying to kill Money$oft. You treat every mistake of Microsoft's as a capital sin... but you forget one thing: our beloved Open Source is way toooo buggy, from everyday applications like Gaim freezing to the problems of Flex. So, if it's not about antitrust, can you please leave Microsoft alone and grow up at least a little?
Regular people who speak at work like NPR reporters and anchors usually end up getting sent off for drug tests.
-- I'm old enough to have lived through six different meanings of the word "hacker."
OSX
Flame away gentlemen...
s/rare/frequent
Help us build a better map!
Yes, because when you saw it first was obviously when it was put up. You are the worst troll on Slashdot, but keep trying!
I think the main problem may not be the program; it's probably the developer who doesn't realize that mom and aunt should be different people. That makes me worry about the implications of Microsoft trusting people that inbred.
I think the concept behind MSNBC was originally to be a sort of tech TV channel, as it emerged right around when the internet was coming into the mainstream. But that whole concept flopped, and eventually they just became a 24-hour news network.
Originally, MSNBC was owned 50% by Microsoft and 50% by NBC, but in late 2005 NBC bought a further 32% of the company from Microsoft. So Microsoft really doesn't have a controlling stake in the company.
Although... NBC has always said that Microsoft doesn't have editorial control.
But I know I heard a radio interview about Dragon NaturallySpeaking version 9, which didn't require any training at all. The only way to trip it up was to expect it to transcribe both people in the interview in real time. That was a mistake.
cheers...ank
Still hoping for Gentle Treatment...
I have rarely used speech-recognition software before, but in all the experience I've had thus far, one needs to speak in a very artificial way in order to use it. I've heard this Dragon thing is very good now, but does it allow you to speak naturally (as you would to your buddies), or do you still need to adapt your speech to the machine and sound like a retard when you dictate?
After 3 days without programming, life becomes meaningless
- The Tao of Programming
Just imagine.. "Houston, we are go for launch" --> "Houston, we are gone for lunch"
Reminds me of a TV show called "Computer Chronicles" with Stewart Cheifet. Don't take my word for it, see for yourself at the Internet Archive:
http://www.archive.org/details/OS2Warp
The voice dictation is at the end. Of course, this was an add-on product. Later, in 1996, with OS/2 Warp version 4, it was already included. Too bad IBM doesn't sell OS/2; you'll have to buy eComStation from Serenity Systems instead.
>>> For instance, the word "patent" is pronounced differently in the UK from North America. In the UK it is "pay-tent" and over here it's "pah-tent". That's just one example.
... "oh you mean the pay-tent office" ... from what I recall most Examiners used pa-tent.
I'm in the UK, from North Nottinghamshire (for my first 18 years). I pronounce the word "patent" as pa-tent (with a short a as in "apple" [a-pul]).
Perhaps I'm wrong. I was a Patent Examiner for several years though; people did correct my pronunciation when I told them where I worked (!)
I have a better solution, and I'm surprised nobody else has come up with it yet. If I weren't so poor and unemployed right now, I'd patent this idea because it's a real winner. Are you ready? Wireless in-ear cranial vibration sensing microphones. The only ambient noise that will be picked up is the sound of you chewing some gum, so don't chew when you're talking to HAL.
The coolest thing about this idea (I think) is that each mic could be coded to a specific user, and then fifty people could all be talking to the same supercomputer at the same time, and the computer could distinguish based on their user ID and not have to distinguish voices at the audio level at all. I've talked about this idea elsewhere in the past, and I'm still pretty surprised that nobody is doing it.
Speech recognition (speech-to-text) is pretty trivial without training, as the newest release of Dragon NaturallySpeaking, version 9, demonstrates. With training, it can become nearly flawless. Speech-to-text is another area where AT&T's research lab has made some pretty good progress. I'm looking forward to the future of voice interaction with my computer with a certain optimism, especially if somebody will just (please) do the cranial-mic thing. (Contact me for licensing! My idea! heh.)
Reinvent the wheel only at either a lower cost, greater effectiveness, or your own personal enrichment and satisfaction.
Truth is, it is almost impossible for a computer program to understand English using phonemes alone: some phrases are pronounced in exactly the same way, the most famous example being "lettuce spray" and "let us pray". Some state-of-the-art research is trying to use accents and intonation to distinguish such phrases.
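Besides accents and intonation, the usual way to break a tie like "lettuce spray" vs. "let us pray" is context: score each acoustically identical candidate with a language model and keep the more probable one. A minimal sketch, assuming a toy add-one-smoothed bigram model and two tiny made-up corpora; the names `train_bigrams`, `score`, and `pick` are hypothetical and not taken from any real speech toolkit.

```python
from collections import Counter

def train_bigrams(corpus):
    """Count unigrams and bigrams in a tiny, hypothetical context corpus."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        words = ["<s>"] + sentence.lower().split() + ["</s>"]
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams

def score(phrase, unigrams, bigrams, vocab_size):
    """Product of add-one-smoothed bigram probabilities P(w_i | w_{i-1})."""
    words = ["<s>"] + phrase.lower().split() + ["</s>"]
    p = 1.0
    for prev, cur in zip(words, words[1:]):
        p *= (bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size)
    return p

def pick(candidates, context_corpus):
    """Among homophonous candidates, return the one the context favors."""
    unigrams, bigrams = train_bigrams(context_corpus)
    vocab_size = len(unigrams)
    return max(candidates, key=lambda c: score(c, unigrams, bigrams, vocab_size))

# Made-up example contexts (assumptions, not real training data).
CHURCH = ["let us pray", "let us sing", "let us pray together"]
GARDEN = ["lettuce spray kills aphids", "buy lettuce spray", "the lettuce spray is green"]

candidates = ["let us pray", "lettuce spray"]
print(pick(candidates, CHURCH))  # let us pray
print(pick(candidates, GARDEN))  # lettuce spray
```

The audio is identical in both calls; only the context corpus changes the answer, which is the point the context-sensitivity argument makes. Real recognizers combine such language-model scores with acoustic scores rather than using either alone.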
The brain works differently than the computer (no news, but it needs to be said once in a while).
First of all, speech is context-dependent. To understand speech reliably, the computer has to use a context-sensitive grammar, which is a major nightmare for computational linguistics. And since certain words sound the same, the only possible solution for reliable speech recognition is context-sensitive parsing.
Secondly, recognizing different accents requires experience, something that algorithms don't have. Speech recognition will not be useful until Chinese, Indian, Arab, Mediterranean, and Russian accents (when speaking English) can be successfully parsed.
My intuition says that in order to have reliable speech recognition, the computer will have to work like the brain, i.e., like a neural network. Of course computers are not that powerful yet, so I think we should not blast Microsoft yet.
I had a similar problem with Dragon Dictate a few years ago. It refused to accept anything I said, even in the training phase, unless I spoke with an American accent (and I can only do a lousy American accent).
remember to loot and pillage before you burn!
When a Microsoft "new" technology fails amusingly or embarrassingly at a demo, they benefit from a) all the free publicity of people like us laughing, and b) the failures create a mindset in non-technical user markets that "gee, it must be new stuff, even the demo went buggy." Apple had working, reliable speech recognition built into the OS way back in the early 90s with System 7.5. It didn't do dictation out of the box, but it provided alternative access methods to the OS, and could easily distinguish between words as different as "aunt" and "mother." People will believe Microsoft is pushing the boundaries, when in fact they're just bloating Doze even more by shoe-horning computer lab curios into an already overloaded OS. Then again, by even commenting, I guess I'm adding to the buzz (I roll my eyes as much at myself).
"I hope you like Guinness, Sir. I find it a refreshing substitute for, er... food." Col. Jack O'Neill, SG-1
Myself!
https://www.eff.org/https-everywhere
But unfortunately I work in the speech industry, and it's hard enough selling stuff at the best of times, without these idiots pitching up and convincing a watching world that this is the state of the art...
There's an explanation of this bug, and yes, it's related to gain levels.
I was at this meeting and wrote up an account of what happened that counters the newscast here.