Vista Speech Recognition Goes Awry

Posted by ryuzaki0 on Saturday July 29, 2006 @01:25AM from the egg-on-face dept.

An anonymous reader writes "It seems even MSNBC is willing to take a jab on those rare occasions when Microsoft products don't work. During a demo of Vista's speech recognition technology, Vista couldn't differentiate between mom and aunt, and all attempts to rectify the problem just made it worse. Wait until you see what it spat out, I think we have a new 'All your base.' Don't you just love Microsoft's live demonstrations?"

13 of 418 comments

Awww...c'mon guys.... by JoeLinux · 2006-07-29 01:31 · Score: 5, Funny

It's just a one-time thing.

I mean, it's not like they have a reputation for releasing half-assed code that's been hyped up through marketing to the point that it will never perform as advertised.

And it's not like this is a company that is having image problems due to its monopolistic nature.

Or headed by an infamous rageaholic with a history of intolerance towards free standards.

Nope, I'm sure that this is just an accident by a company that spends its off hours petting little baby chickens and bunnies.
1. Re:Awww...c'mon guys.... by tomstdenis · 2006-07-29 01:41 · Score: 5, Insightful
  
  Most likely the system was trained by an engineer and handed off to the ass in marketing. He was probably supposed to train it to his voice too but decided to hit the bar instead.
  
  Voice recognition requires some training regardless of who provides it. We're not in Star Trek here... Prep work and rehearsal, people. If Mr. Sales Guy had tried the demo before the presentation he would have noticed it wasn't working and avoided the embarrassment.
  
  This is why sales people are asshats. They're unprofessional non-technical people who sap back the high life while the rest of us have to put up with the mess they create through their daily barrage of verbal diarrhea.
  
  Tom
  
  --
  Someday, I'll have a real sig.
2. Re:Awww...c'mon guys.... by tomstdenis · 2006-07-29 02:14 · Score: 5, Insightful
  
  Generally, from what I've seen, you need to train it a bit on the way you speak. There are thousands of distinct English accents and pronunciation variations.
  
  For instance, the word "patent" is pronounced differently in the UK from North America. In the UK it is "pay-tent" and over here it's "pah-tent". That's just one example.
  
  Point is [to paraphrase ballmer]:
  
  Preperation (clap), preperation (clap), preperation (clap), preperation (clap), preperation (clap), [pitch of voice higher], preperation (clap), preperation (clap), [wheeze out of breath, pitch even higher], preperation (clap), preperation (clap), yeah!!!
  
  Something tells me this sales guy will neither get punished nor lose his x-mas bonus. Some poor schmuck in engineering will take the fall for not making the demo "people ready".
  
  Tom
  
  --
  Someday, I'll have a real sig.
3. Re:Awww...c'mon guys.... by calculadoru · 2006-07-29 02:42 · Score: 5, Funny
  
  Preperation (clap), preperation (clap), preperation (clap), preperation (clap), preperation (clap), [pitch of voice higher], preperation (clap), preperation (clap), [wheeze out of breath, pitch even higher], preperation (clap), preperation (clap), yeah!!!
  
  One would be inclined to think that since you went and typed that word nine times, you would have managed to spell preparation correctly at least once...
  
  --
  The power of accurate observation is commonly called cynicism by those who have not got it. -- G.B. Shaw
4. Re:Awww...c'mon guys.... by BasilBrush · 2006-07-29 03:30 · Score: 5, Funny
  
  I have been using the Microsoft Vista speech recognition feature for a while now and I can assure you all that it worms delete no delete bastard undo select back 8 you fucking aunt no select all enter
The Voice of Experience by dacap · 2006-07-29 01:36 · Score: 5, Insightful

Yes, once again Microsoft S/W Engineers learn that the more public the demo or the more important the audience, the more likely something will go wrong. It's one of Murphy's laws. Been there. Did that. Barely survived.

Experience is the human quality that enables you to recognize a mistake immediately when you make it again.

Dacap

--
English -- gotta love it! / The engineers refuse to refuse the rocket until the refuse is removed from the launch pad.
So? by Klaidas · 2006-07-29 01:39 · Score: 5, Informative

This isn't the first presentation that went wrong, is it?
Win98 gone wild: http://www.youtube.com/watch?v=Hrbx9_AY720
Media Center Edition gone wild http://www.youtube.com/watch?v=j7EEbokKLHI
We can add this one to the list too ;)
Re:Dear aunt by James+Manning · 2006-07-29 02:11 · Score: 5, Informative

For the curious, it was an audio gain issue. Details on Rob Chambers' blog:

http://blogs.msdn.com/robch/archive/2006/07/29/682479.aspx

--

Various ramblings
Re:Well by acariquara · 2006-07-29 02:16 · Score: 5, Funny

...or a new slashdot signature.

--
Dear aunt, let's set so double the killer delete select all
On MSNBC's front page - for about 30 minutes.... by wowbagger · 2006-07-29 03:05 · Score: 5, Informative

A friend of mine called me at work (since he knew that accessing MSNBC's videos requires Internet Explorer, Windows Media 9 or better, and Flash, and I have neither IE nor WMP at home) and told me about this.

I went to msnbc.com - and there it was, third on the list of videos on the main page.

I called this to the attention of two of my coworkers, and we viewed the video - total elapsed time, maybe twenty minutes.

Then I went to call it to the attention of a third coworker - and the video was no longer on the front page of MSNBC. OK, so maybe they've moved it off the front page, but it should still be on the Technology subsection, right?

Wrong.

Nor was it under Videos, nor anywhere else I could find it easily.

Perhaps this was just a normal rotation of a video. Perhaps not. But no matter what the real cause, there is the appearance that it was removed from the page because it was too embarrassing. Not good for Microsoft.

However, I will give MSNBC this - they didn't give Microsoft a free ride on this, they ribbed them pretty hard.

However, I knew that this would be appearing on other sources as a video that could be viewed outside of Windows. Actually, I am rather surprised that it took this long.

Now, as to the demonstration itself - it looks to me (a person who does signal processing and analysis for a living) like the presenter had the mike gain too high - every time he spoke he maxed out the bar graph on the display. *IF* he had the gain too high, and the audio was clipping significantly, that could make "mom" have enough of a pop to maybe sound like AUNT - especially if the software is using context to try to reduce the search-space for the words. Of course, that's why I would have a monitoring routine in the system, and if any of the samples are at 100% full scale, or if many of the samples are over 90% full scale, or the signal power is too high, I'd have my software adjust the mike gain down *and* flag an alert to the user. I'd also try to look for the mike element itself being overloaded.
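
A minimal sketch of the kind of monitoring routine described above, for the curious. Everything specific here is an assumption made for illustration, not anything from Vista or from the demo: the function names `check_gain` and `suggest_gain`, samples normalized so full scale is 1.0, "many" taken to mean more than 10% of samples, the power limit of 0.25, and the 20% gain reduction per adjustment.

```python
def check_gain(samples, full_scale=1.0, hot_level=0.9,
               hot_fraction=0.10, max_power=0.25):
    """Return the reasons the mic gain looks too high (empty list if fine).

    All thresholds are illustrative assumptions, not measured values.
    """
    if not samples:
        return []
    # Normalize so that 1.0 means 100% of full scale.
    magnitudes = [abs(s) / full_scale for s in samples]
    reasons = []
    # Any sample at 100% full scale means the signal is clipping.
    if any(m >= 1.0 for m in magnitudes):
        reasons.append("clipping")
    # "Many" samples over 90% full scale: here, more than hot_fraction of them.
    hot = sum(1 for m in magnitudes if m > hot_level)
    if hot / len(magnitudes) > hot_fraction:
        reasons.append("too many hot samples")
    # Mean signal power above the (assumed) limit.
    power = sum(m * m for m in magnitudes) / len(magnitudes)
    if power > max_power:
        reasons.append("signal power too high")
    return reasons


def suggest_gain(current_gain, reasons, step=0.8):
    """Turn the gain down by one step if anything was flagged."""
    return current_gain * step if reasons else current_gain
```

The caller would lower the mike gain to whatever `suggest_gain` returns and show the returned reasons to the user as the alert. Detecting overload of the mike element itself is not covered by this sketch.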

--
www.eFax.com are spammers
Re:Is SR ever going to be good enough? -- Yes! by oblique303 · 2006-07-29 03:32 · Score: 5, Informative

I use Dragon NaturallySpeaking every day (carpal tunnel syndrome), and version 9 has around 99% accuracy, with around 98% out-of-the-box with no training. This means 10 or so errors out of a 1000 word dictation.

I didn't believe it either, until I actually tried it. Dragon is the first worthwhile speech recognition solution I've seen that's practical for general use (though I'd love it if they'd release a "programmers" version to complement the Medical/Legal versions). I get about 99% accuracy (a decent microphone is *very* important!)

Dragon 9 also doesn't "technically" need training, but accuracy further improves if you do bother to train it a bit. The NYT reviewer was able to get 99.6% accuracy after a short training session.

Here's a few reviews of version 9:

http://www.nytimes.com/2006/07/20/technology/20pogue.html?ex=1154318400&en=6fd795114b3f72ea&ei=5070

http://www.npr.org/templates/story/story.php?storyId=5577523
Re:removing ambient noise by Anonymous Coward · 2006-07-29 03:53 · Score: 5, Informative

For those interested, merely subtracting the two signals doesn't work. The signal at the microphone is not just the music signal (called the far-end signal) plus the voice signal (the near-end signal). The music signal has travelled across the room before it reaches the microphone, giving it some reverberations (echo). If you simply subtract the two signals, you will still hear the music signal quite loudly.

What is done in practice, and works extremely well, is modelling that "echo" as a filter (an FIR transversal filter, which is simply a tapped delay line with a weight on each tap). You estimate the coefficients of the filter, apply this "room filter" to the music signal, and subtract the result from the microphone signal. You then have the voice-only signal left.

This setup is called AEC, or Acoustic Echo Cancellation. It is used in every telephone and mobile phone there is and is crucial to ADSL. If an ADSL modem did not cancel out its own sent signal at its receiver, the attainable speed would be several times lower. AEC is also the reason why talking immediately when you pick up a mobile phone leaves an audible echo of your own voice: the coefficients of the filter are still being estimated at that point.

See http://www.dspalgorithms.com/products/echo.html for a diagram of the AEC or read Haykin's Adaptive Filter Theory if you're looking for a decent book on the subject.
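
To make the scheme above concrete, here is a rough sketch in plain Python. The comment above only says the coefficients are estimated; NLMS (normalized least mean squares) is used here as one common way to do that, which is an assumption on my part. The function names, the 8-tap filter length, the step size `mu`, and the white-noise test signal are all hypothetical, chosen for illustration rather than taken from any product.

```python
import random


def white_noise(n, seed=0):
    """Stand-in for the far-end (music) signal: seeded white noise."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]


def simulate_mic(far_end, room, near_end):
    """Microphone signal = far-end filtered by the room + near-end voice."""
    mic = []
    for n in range(len(far_end)):
        echo = sum(room[k] * far_end[n - k]
                   for k in range(len(room)) if n - k >= 0)
        mic.append(echo + near_end[n])
    return mic


def nlms_echo_cancel(far_end, mic, num_taps=8, mu=0.5, eps=1e-8):
    """Adaptively estimate the room filter and subtract the echo from mic.

    Returns (residual, weights). The residual is what is left of the mic
    signal after removing the estimated echo.
    """
    weights = [0.0] * num_taps
    history = [0.0] * num_taps  # delay line, most recent far-end sample first
    residual = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        # Far-end signal after the current estimate of the "room filter".
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        error = d - echo_estimate
        # NLMS update: step scaled by the energy in the delay line.
        norm = sum(h * h for h in history) + eps
        step = mu * error / norm
        weights = [w + step * h for w, h in zip(weights, history)]
        residual.append(error)
    return residual, weights
```

Feeding it `white_noise(3000)` passed through a made-up room response such as `[0.0, 0.6, 0.3, -0.2]` lets you compare the learned weights against the response you put in, and compare the residual against a naive `mic - far_end` subtraction. Real cancellers also have to cope with both parties talking at once, which this sketch ignores.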
Since when did Mum sound anything like Aunt? by tomhudson · 2006-07-29 04:18 · Score: 5, Funny

Since when did Mum sound anything like Aunt?
... probably in the same jurisdictions where "wife" is spelled "first cousin" or "sister".