Slashdot Mirror


Vista Speech Recognition Goes Awry

An anonymous reader writes "It seems even MSNBC is willing to take a jab on those rare occasions when Microsoft products don't work. During a demo of Vista's speech recognition technology, Vista couldn't differentiate between mom and aunt, and all attempts to rectify the problem just made it worse. Wait until you see what it spat out, I think we have a new 'All your base.' Don't you just love Microsoft's live demonstrations?"

77 of 418 comments

  1. Roald Dahl by Ithika · · Score: 4, Funny

    Reminds me of the Roald Dahl short story about the ant-eater who ate someone's aunt because their accent rendered the two words the same.

    I can't remember what the story was called.

    1. Re:Roald Dahl by EddWo · · Score: 4, Informative

      From the book Dirty Beasts

      --
      "Taligent is still pure vapor. Maybe they'll be the last who jumps up on Openstep... "
    2. Re:Roald Dahl by palad1 · · Score: 2, Funny
    3. Re:Roald Dahl by Reverend528 · · Score: 2, Funny

      Eat up Martha?

  2. Awww...c'mon guys.... by JoeLinux · · Score: 5, Funny

    It's just a one-time thing.

    I mean, it's not like they have a reputation for releasing half-assed code that's been hyped up through marketing to the point that it will never perform as advertised.

    And it's not like this is a company that is having image problems due to its monopolistic nature.

    Or headed by an infamous rageaholic with a history of intolerance towards free standards.

    Nope, I'm sure that this is just an accident by a company that spends its off hours petting little baby chickens and bunnies.

    1. Re:Awww...c'mon guys.... by kripkenstein · · Score: 4, Insightful

      Nothing to worry about, I'm sure they'll get all the kinks out by the time Vista is released - sometime in 2008 or so, it seems, based on this video.

      This was really a dreadful presentation. There was no ambient noise (as the commentators say later, and despite what Microsoft says), and there was no echo as the demonstrator claims during the actual test. It seems to have been done under really good test conditions, but still it failed miserably.

    2. Re:Awww...c'mon guys.... by tomstdenis · · Score: 5, Insightful

      Most likely the system was trained by an engineer and handed off to the ass in marketing. He was probably supposed to train it to his voice too but decided to hit the bar instead.

      Voice recognition requires some training regardless of who provides it. We're not in Star Trek here. Prep work and rehearsal, people. If Mr. Sales Guy had tried the demo before the presentation he would have noticed it wasn't working and avoided the embarrassment.

      This is why sales people are asshats. They're unprofessional non-technical people who sap back the high life while the rest of us have to put up with the mess they create through their daily barrage of verbal diarrhea.

      Tom

      --
      Someday, I'll have a real sig.
    3. Re:Awww...c'mon guys.... by tomstdenis · · Score: 5, Insightful

      Generally, from what I've seen, you need to train it a bit on the way you speak. There are thousands of distinct English accents and pronunciation variations.

      For instance, the word "patent" is pronounced differently in the UK from North America. In the UK it is "pay-tent" and over here it's "pah-tent". That's just one example.

      Point is [to paraphrase ballmer]:

      Preperation (clap), preperation (clap), preperation (clap), preperation (clap), preperation (clap), [pitch of voice higher], preperation (clap), preperation (clap), [wheeze out of breath, pitch even higher], preperation (clap), preperation (clap), yeah!!!

      Something tells me this sales guy will get neither punished nor lose their x-mas bonus. Some poor schmuck in engineering will take the fall for not making the demo "people ready".

      Tom

      --
      Someday, I'll have a real sig.
    4. Re:Awww...c'mon guys.... by calculadoru · · Score: 5, Funny

      Preperation (clap), preperation (clap), preperation (clap), preperation (clap), preperation (clap), [pitch of voice higher], preperation (clap), preperation (clap), [wheeze out of breath, pitch even higher], preperation (clap), preperation (clap), yeah!!!

      One would be inclined to think that since you went and typed that word nine times, you would have managed to spell preparation correctly at least once...

      --
      The power of accurate observation is commonly called cynicism by those who have not got it. -- G.B. Shaw
    5. Re:Awww...c'mon guys.... by jc42 · · Score: 4, Insightful

      There are thousands of distinct English accents and pronunciation variations.

      Aw, c'mon; how many English dialects pronounce "mom" and "aunt" similarly?

      Even to someone who's worked with voice recognition, that mistake simply isn't credible. If the software were anywhere near usable, it wouldn't confuse those words from anyone, especially not in a low-noise, no-echo demo.

      This is a "No excuses" situation. That demo was simply a dismal failure due to some major bug(s).

      Of course, the speech recognition field has a long history of staying in such a state forever. It's hard to find a product that, even with extensive training, doesn't produce howlers like this.

      I did like the "killer" part ...

      --
      Those who do study history are doomed to stand helplessly by while everyone else repeats it.
    6. Re:Awww...c'mon guys.... by teridon · · Score: 2, Interesting

      FWIW, Nuance claims that their latest version of Dragon Naturally Speaking (v9) doesn't require training before use. But of course this is different software. But consider this -- aren't "Mom" and "Aunt" phonetically dissimilar enough that you should NOT need to train it?

      I'm not one to defend MS, but I speculate that the volume on his microphone was set too high, causing distortion and clipping. Look at the volume meter when he talks -- it goes all the way to the top.

      --
      I hold it, that a little rebellion, now and then, is a good thing. -- Thomas Jefferson
    7. Re:Awww...c'mon guys.... by Illserve · · Score: 3, Interesting

      Let's give this guy some credit. He clearly has some degree of competence if he's selected to showboat the app at a major presentation, at least enough to know that you need to train, or at least test, a voice recognition demo.

      A far more likely scenario, in my mind, is that he trained and tested it 100 times and got it working nearly flawlessly, but in a different room and with a different setup. In fact he may have overtrained it. Programs like this can behave very badly when they end up overfitting the data.

      On the day in question he may have had a different mic and the acoustics were certainly different and the program went whacko.

    8. Re:Awww...c'mon guys.... by tomstdenis · · Score: 4, Funny

      I used the Vista Beta Speech Recognition?

      [and I copy/pasted it. Yeah I know, I'm hardly literate. What you wanna fight about it?]

      Tom

      --
      Someday, I'll have a real sig.
    9. Re:Awww...c'mon guys.... by tomstdenis · · Score: 4, Insightful

      I never said training was the only cause of the failure. I said it's likely that he didn't train it. Because most high-powered sales people are just cocaine-snorting asshats that make people's lives miserable.

      Chances are he never even did a walk through of the presentation before the press was there.

      Tom

      --
      Someday, I'll have a real sig.
    10. Re:Awww...c'mon guys.... by tomstdenis · · Score: 3, Insightful

      You clearly don't work for a large corporation. Sales people (who are not all bad) are typically the sort that don't really understand technology and are all juiced up to make sales. Around technology they often leap before actually finding out the facts, which lands them in a world of trouble.

      I seriously doubt this presentation was rehearsed. At the very least, they should have tested it in that room with that mic, etc. But in all honesty, this is going to be used by millions of people in all sorts of rooms with all sorts of mics. That shouldn't matter anyways.

      Anyways, I doubt he prepared at all, that is, other than snorting cocaine off a mirror in the back room before the show.

      Tom

      --
      Someday, I'll have a real sig.
    11. Re:Awww...c'mon guys.... by BasilBrush · · Score: 5, Funny

      I have been using the Microsoft Vista speech recognition feature for a while now and I can assure you all that it worms delete no delete bastard undo select back 8 you fucking aunt no select all enter

    12. Re:Awww...c'mon guys.... by KlaymenDK · · Score: 3, Interesting
      Aw, c'mon; how many English dialects pronounce "mom" and "aunt" similarly?


      Try this:
      Said: "How to recognize speech"
      Understood: "How to wreck a nice beach"

      No, it's not always easy to tell the difference...
    13. Re:Awww...c'mon guys.... by WilliamSChips · · Score: 3, Funny
      Because most high-powered sales people are just cocaine-snorting asshats that make people's lives miserable.
      And I think calling them that would be an insult to the cocaine-snorting asshat that makes people's lives miserable community.
      --
      Please, for the good of Humanity, vote Obama.
    14. Re:Awww...c'mon guys.... by LnxAddct · · Score: 2, Interesting

      I guess you've never used the voice recognition in OS X. Out of the box, worked perfectly for me. I can speak very normally, sometimes even faster than I usually speak to people, and it works fine. I've never trained it (I don't even think you can). Microsoft is simply half-assing it again.
      Regards,
      Steve

    15. Re:Awww...c'mon guys.... by Hikaru79 · · Score: 2, Funny

      the speech recognition field has a long history of staying in such a state forever.

      Wow. I think my head exploded just trying to think about that.

    16. Re:Awww...c'mon guys.... by miskatonic+alumnus · · Score: 2, Funny

      Or the classic "Scuse me while I kiss this guy".

    17. Re:Awww...c'mon guys.... by Atario · · Score: 2, Informative

      There were other problems that simply can't be put down to actual recognition problems; it clearly understood perfectly the pronunciation of "delete select all", yet didn't act on any of that as a command.

      --
      "A great democracy must be progressive or it will soon cease to be a great democracy." --Theodore Roosevelt
  3. Hee hee by kefoo · · Score: 4, Funny

    Reminds me of the time when I worked at a computer store and we played with the voice recognition card in a PowerMac floor model. Somebody programmed it so that if someone said "Computer, bite me", it would respond with "Can't bite what's not there". Over time the accuracy of the recognition fell. One day as a salesman was talking to a customer about the computer it misinterpreted something he said and said "Can't bite what's not there". Needless to say that system was wiped and we weren't allowed to play with it anymore.

    1. Re:Hee hee by Elektroschock · · Score: 2, Interesting

      In fact voice recognition would be a great playground for non-profit open source software projects.

      Voice recognition means permanent beta. Voice recognition has only slightly improved during the last ten years. One reason is that the VR market is a minefield of trivial patents. The rest is just performance.

      Sure, we will get proper voice recognition some day. I would source it out to open source and integrate it back into my products once it will be ready.

    2. Re:Hee hee by marcello_dl · · Score: 2, Insightful

      Yes, we'll get good voice recognition one day. It'll be right after 99% of the world population have mastered mouse and keyboard interfaces.

      --
      ---- MISSING MISCELLANEOUS DATA SEGMENT --- [sigdash] trolololol
  4. Well by antifoidulus · · Score: 3, Funny

    it could lead to surprising porn....

    1. Re:Well by acariquara · · Score: 5, Funny

      ...or a new slashdot signature.

      --
      Dear aunt, let's set so double the killer delete select all
  5. The Voice of Experience by dacap · · Score: 5, Insightful

    Yes, once again Microsoft S/W Engineers learn that the more public the demo or the more important the audience, the more likely something will go wrong. It's one of Murphy's laws. Been there. Did that. Barely survived.

    Experience is the human quality that enables you to recognize a mistake immediately when you make it again.

    Dacap

    --
    English -- gotta love it! / The engineers refuse to refuse the rocket until the refuse is removed from the launch pad.
    1. Re:The Voice of Experience by son+of+remo+williams · · Score: 2, Interesting

      My gf's brother works for the MS subsidiary that does network setup and tech support for MS trade shows. He's personally wired Gates and Ballmer and has admitted that many "live" demos the head honchos have presented were actually canned, like Ashlee Simpson lip-synching on SNL. They don't trust their own products enough to put their execs in the same embarrassing position that this presenter got himself into.

  6. So? by Klaidas · · Score: 5, Informative

    This isn't the first presentation that went wrong, is it?
    Win98 gone wild: http://www.youtube.com/watch?v=Hrbx9_AY720
    Media Center Edition gone wild http://www.youtube.com/watch?v=j7EEbokKLHI
    We can add this one to the list too ;)

  7. Dear aunt by linvir · · Score: 4, Informative
    For the flashless. Here's the format:
    Microsoftie says this
    Speech recogniser hears this


    Dear mom
    comma
    Dear aunt,
    [laughter]
    Fix aunt
    Let's set
    Delete that
    Delete that
    Delete that
    so
    I think it's picking up a little bit of echo here
    Delete... select all
    double the killer delete select all
    [laughter]


    Final text:
    Dear aunt, let's set so double the killer delete select all
    1. Re:Dear aunt by James+Manning · · Score: 5, Informative

      For the curious, it was an audio gain issue. Details on Rob Chambers' blog:

      http://blogs.msdn.com/robch/archive/2006/07/29/682479.aspx

    2. Re:Dear aunt by KlaymenDK · · Score: 2, Funny

      Wait a sec ..... I think [MS] marketing folks can't use noise-cancelling mics. Nothing would be recorded!

      Badabing! Thanks folks, I'll be here all week.

  8. Remember the Win98 BSOD? by Enderandrew · · Score: 4, Funny

    Not quite as embarrassing as the Windows 98 BSOD, but more entertaining than the Ballmer developer's video.

    http://www.ntk.net/media/developers.mpg

    --
    http://blindscribblings.com - Tasty pop-culture in conceptual fashion.
  9. Just from MicroSoft Insider by Seiruu · · Score: 4, Funny

    Steve Ballmer accidentally sent an e-mail while diligently testing the software. The e-mail says:

    "Sir put down the chair, then we'll talk"
    "No Steve wait up, don't do that"
    "BOOM CRASH BOOM CRASH BOOM CRAASH WAAAH NOOO STOOOOP"
    "DUDE, THE COMP HAS A BSOD! WAAH!"

  10. Re:Is SR ever going to be good enough? by joe+155 · · Score: 4, Informative

    I have used Naturally Speaking. It can take a bit of time to train, but if you are the only one using it you can eventually get it to the point where you can talk at a normal speed (although you have to speak clearly) and it will approach 90% accuracy; sometimes I had it higher. It still couldn't be used as an alternative to typing for extended periods, though, because you had to check everything it wrote.

    One good thing it did do, though, was try to understand sections of speech rather than just each word, which improved accuracy. Words often follow patterns, and only a few words make sense after any given word, so it was often right with "over there".
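
    The "words follow patterns" idea can be sketched as a tiny bigram model that re-ranks acoustically similar candidates by how often each one follows the previous word. This is only an illustration: the toy corpus, the add-one smoothing and the function names are assumptions, not a description of how Naturally Speaking actually works.

```python
from collections import Counter

# Toy training text; a real recognizer would learn from a far larger corpus.
CORPUS = (
    "dear mom how are you . dear mom thank you . dear aunt thank you . "
    "put it over there . it is over there"
).split()

BIGRAMS = Counter(zip(CORPUS, CORPUS[1:]))   # counts of (previous word, word)
UNIGRAMS = Counter(CORPUS)                   # counts of single words
VOCAB = len(UNIGRAMS)


def bigram_prob(prev, word):
    """Add-one smoothed estimate of P(word | prev)."""
    return (BIGRAMS[(prev, word)] + 1) / (UNIGRAMS[prev] + VOCAB)


def pick_word(prev, candidates):
    """Pick the candidate with the best acoustic score times context probability.

    `candidates` maps each candidate word to a (hypothetical) acoustic score.
    """
    return max(candidates, key=lambda w: candidates[w] * bigram_prob(prev, w))


# "their" and "there" sound the same, so the acoustic scores are equal;
# the context "over ..." is what breaks the tie.
print(pick_word("over", {"their": 0.5, "there": 0.5}))
```

    With equal acoustic scores the context alone decides, which is why a homophone pair like "their"/"there" comes out right after "over".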

    SR tech will eventually be as good as on star trek as long as people work on it. I would give it 20 years if it is seen as something which could make a lot of money, 40 if you have to wait for interested people to do it for free on their own time

    --
    *''I can't believe it's not a hyperlink.''
  11. Re:Is SR ever going to be good enough? by Ougarou · · Score: 2, Interesting

    Well, I've taken a look at that (a while back): Dragon seems to be the leader; they get (with a month of training) the best accuracy.
    However, sound recognition engineers are slowly realizing that the problem of recognising words is not just the algorithm's fault. Even people aren't able to understand all words from a taped conversation in a cafeteria.

    Dragon is currently the best, getting further will probably require more input, like a webcam to read your lips. This is just another Microsoft product where they read the wikipedia page on it, produced a flashy interface and packaged it with their OS. If you want sound recognition, don't go with Microsoft, they don't have the expertise.

    PS: Don't forget that getting a good or even special microphone can make all the difference.

  12. Re:are u serious? by tomstdenis · · Score: 4, Insightful

    Microsoft routinely puts out their excellence over everyone else including OSS. Hear them talk about Office w.r.t. OpenOffice. They talk down about it, mock it, dismiss it, etc...

    It's called modesty. If MSFT had any [and some humility] they wouldn't get laughed at so hard for this. I mean look at Linux. Find a bug in the kernel, fix it, post notices that it's fixed. You don't see anyone saying "Oh hahaha, Linus is at it again!" That's because you also don't see Linus on CNN mocking the rest of the world.

    Microsoft deserves all the negative press and humiliation they get because they are shameless, deceitful, greedy monopolistic bastards.

    Tom

    --
    Someday, I'll have a real sig.
  13. It's hard by wootest · · Score: 3, Funny

    It's hard to wreck a nice beach. :)

  14. Re:are u serious? by NixLuver · · Score: 2, Insightful

    Nah, that's not it. I don't hate "Microsoft"; that's just a name on a door somewhere. I don't hate 'the corporation'; corporations are not individuals, no matter what the law would have us believe.

    The reason I find this eminently amusing is that Microsoft is a company built on marketing. At no particular point has Microsoft had "The Superior Technical Solution"; they have always had luck and better marketing. Since DOS 3.3 there have frequently been products that were more stable, faster, easier to use - you name it. And Microsoft's captains have beaten them in two ways: Marketing and Money.

    So of course when a company that has built its foundation on marketing flubs it, it's more amusing than when a company that has built its foundation on performance of one kind or another flubs it. It's inescapable that the Big Dog gets more scrutiny than the Contender, anyway. And Microsoft apologists should understand that.

  15. Vista couldn't differentiate between mom and aunt by jcraveiro · · Score: 3, Funny

    Maybe they're twin sisters... ;)

  16. Re:Is SR ever going to be good enough? by CastrTroy · · Score: 3, Insightful

    Who cares if they ever get up to Star Trek level? The technology still sucks. It's much quicker and less annoying to the people around you to just type on your keyboard. Sure, it has some uses, such as for those who don't have full use of their hands. We shouldn't abandon all research on the subject, because it does have its uses, but I don't think it's something worth pushing on the general population, especially before the technology is actually ready. People already don't like their computers; pushing buggy technology like this out will just increase the problem.

    --

    Anthropic principle: We see the universe the way it is because if it were different we would not be here to see it.
  17. Re:Is SR ever going to be good enough? by Skater · · Score: 4, Insightful

    The computer in Star Trek (at least in the Next Generation) was WAY too smart. For it to do what it supposedly did in the show, it would have to be sitting there, monitoring the conversation all the time, and be totally able to understand the context of what was being said to know what to do. Not only when people directly asked the computer a question, but also when people wanted to converse with someone.

    For example, how does the computer know that Picard wants to call Riker and isn't just talking about him? Oh and keep in mind the computer never misinterpreted something. In other examples, people would carry on intelligent conversations with the computer - all those holodeck scenes, Troi ordering chocolate, etc.

    Star Trek-style of SR I think would be the holy grail and is probably always going to be out of reach. Barring some amazing breakthrough in AI algorithms, the computer power required just for the situations above would be incredible - and that's computer time that probably could be put to better use elsewhere, even if it was found to be possible.

    I think the computer in the original Star Trek was more realistic - but even there the voice-recognition was far beyond what we're capable of today, as Microsoft has demonstrated so well. Plus all the blinkenlights that seemed to have no useful purpose were cool. ;)

  18. Sequence of events by daveschroeder · · Score: 2, Funny

    "Dear mom comma"

    Dear aunt,

    "Fix aunt"

    Dear aunt, let's set

    "Delete that"

    Dear aunt, let's set

    "Delete that"

    Dear aunt, let's set

    "Delete that"

    Dear aunt, let's set so

    "I think it's picking up a little bit of echo here...delete - select all"

    Dear aunt, let's set so double the killer delete select all

    *Manually selects all and deletes*

    "Okay, I'm glad you're enjoying this"

    *Laughter*

    1. Re:Sequence of events by lurker412 · · Score: 4, Funny

      Got ham pizza ship

  19. OS/2 Still Kicking Microsoft's Ass by Greyfox · · Score: 4, Funny
    C'mon! IBM put on a great speech reco demo at the '95 Atlanta COMDEX. Their product worked flawlessly! Well... Except the guy fired it up and started talking and the little text editor was picking up the words when someone in the back of the audience yells "FORMAT C!" The crowd went wild and the guy doing the demo cracked up too, which caused the speech engine to freak out a bit. He had to delete a bunch of junk out of his text editor once things settled down.

    Speech recognition is still just a gimmick anyway. We still have a LONG way to go before it gets to the point that Joe Average User imagines it should be. Joe average user wants his computer to respond like the one in Star Trek. I still want to set up my Asterisk server with speech recognition, though, so that people can either dial or say the extension they want. It'd also be neat to pick up the phone, say "Call Mom" to the dial tone and have it call my aunt for me.

    --

    I'm trying to teach myself to set people on fire with my mind... Is it hot in here?

    1. Re:OS/2 Still Kicking Microsoft's Ass by maxwell+demon · · Score: 3, Funny
      It'd also be neat to pick up the phone, say "Call Mom" to the dial tone and have it call my aunt for me.

      Seems Microsoft's speech recognition is just right for you :-)
      --
      The Tao of math: The numbers you can count are not the real numbers.
  20. removing ambient noise by sh0rtie · · Score: 4, Insightful



    Why not just use two mics: one to record the ambient noise (positioned away from the voice mic), the other to record the voice (headset)? Then, as you have two signals, just subtract the ambient noise signal from the headset signal. Voila, clean headset mic audio.

    Works for music too: you could control your music player by voice even when it's playing loud (at a party) by removing the music signal from the mic signal.
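
    A rough sketch of what plain subtraction does, under made-up conditions: the "room" here is just a hypothetical 5-sample delay with a gain of 0.6, and the voice is a sine wave. Subtraction is exact only when the interfering signal reaches the voice mic completely unchanged; with even a small delay it can leave more interference than doing nothing.

```python
import math
import random


def power(signal):
    """Mean squared amplitude of a signal."""
    return sum(x * x for x in signal) / len(signal)


def delayed(signal, delay, gain):
    """Crude stand-in for a room: delay by `delay` samples and scale by `gain`."""
    return [0.0] * delay + [gain * x for x in signal[:len(signal) - delay]]


def subtract(mic, reference):
    """The proposal above: subtract the reference signal sample by sample."""
    return [a - b for a, b in zip(mic, reference)]


random.seed(0)
n = 2000
music = [random.uniform(-1.0, 1.0) for _ in range(n)]
voice = [0.3 * math.sin(2 * math.pi * 0.01 * i) for i in range(n)]

# Ideal case: the music reaches the voice mic completely unchanged.
mic_ideal = [v + m for v, m in zip(voice, music)]
# Hypothetical room: the music arrives 5 samples late and quieter.
room_music = delayed(music, 5, 0.6)
mic_room = [v + m for v, m in zip(voice, room_music)]


def residual_power(mic, reference):
    """Power of whatever is left besides the voice after plain subtraction."""
    cleaned = subtract(mic, reference)
    return power([c - v for c, v in zip(cleaned, voice)])


print("ideal residual:", residual_power(mic_ideal, music))
print("room residual: ", residual_power(mic_room, music))
print("music at mic:  ", power(room_music))
```

    In the delayed case the subtracted copy no longer lines up with what the mic heard, so the subtraction adds a second, misaligned copy of the music instead of removing the first.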

    -AJS

    1. Re:removing ambient noise by Anonymous Coward · · Score: 5, Informative

      For those interested, merely subtracting the two signals doesn't work. The signal at the microphone is not just the music signal (called far-end signal) plus the mic signal (near-end signal). The music signal has travelled across the room before it reaches the microphone, giving it some reverberations (echo). If you simply subtract the two signals, you will still hear the music signal quite loudly.

      What is done in practice, and works extremely well, is modelling that "echo" as a filter (an FIR transversal filter, which is simply a tapped delay line with a weight on each tap). You estimate the coefficients of the filter, apply this "room filter" to the music signal, and subtract the result from the microphone signal. You then have the voice-only signal left.

      This setup is called AEC, or Acoustic Echo Cancellation. It is used in every telephone and mobile phone there is and is crucial to ADSL. If an ADSL modem did not cancel out its own sent signal at its receiver, the attainable speed would be several times lower. AEC is also the reason why talking immediately when you pick up a mobile phone leaves an audible echo of your own voice: estimating the coefficients of the filter is still taking place at that point.

      See http://www.dspalgorithms.com/products/echo.html for a diagram of the AEC or read Haykin's Adaptive Filter Theory if you're looking for a decent book on the subject.
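
      A minimal sketch of the adaptive-filter idea described above, using the normalized LMS (NLMS) update. NLMS is one common way to estimate such filter coefficients, but it is an assumption here; the comment does not say which algorithm any particular phone or modem uses. The room response, tap count, step size and test signals are all invented for illustration.

```python
import math
import random


def power(signal):
    """Mean squared amplitude of a signal."""
    return sum(x * x for x in signal) / len(signal)


def nlms_cancel(mic, reference, taps=8, mu=0.1, eps=1e-6):
    """Estimate the room filter adaptively and subtract the filtered reference.

    Returns the cleaned signal and the final filter coefficients.
    """
    weights = [0.0] * taps
    history = [0.0] * taps              # recent reference samples, newest first
    output = []
    for d, x in zip(mic, reference):
        history = [x] + history[:-1]    # shift the delay line
        estimate = sum(w * h for w, h in zip(weights, history))
        error = d - estimate            # mic minus estimated echo
        norm = sum(h * h for h in history) + eps
        # NLMS update: step along the input, scaled by its energy.
        weights = [w + mu * error * h / norm for w, h in zip(weights, history)]
        output.append(error)
    return output, weights


random.seed(1)
n = 4000
room = [0.0, 0.0, 0.6, 0.0, 0.3]        # hypothetical room impulse response
music = [random.uniform(-1.0, 1.0) for _ in range(n)]
voice = [0.3 * math.sin(2 * math.pi * 0.01 * i) for i in range(n)]
echo = [
    sum(room[k] * music[i - k] for k in range(len(room)) if i - k >= 0)
    for i in range(n)
]
mic = [v + e for v, e in zip(voice, echo)]

cleaned, weights = nlms_cancel(mic, music)

# Judge the result on the second half, after the filter has had time to adapt.
tail = slice(n // 2, n)
leftover = power([c - v for c, v in zip(cleaned[tail], voice[tail])])
print("echo power before:", power(echo[tail]))
print("echo power after: ", leftover)
print("estimated taps:   ", [round(w, 2) for w in weights])
```

      The first few hundred samples of `cleaned` still contain audible music because the coefficients start at zero, which matches the point above about hearing your own echo right after picking up a call.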

  21. Re:are u serious? by Udo+Schmitz · · Score: 4, Insightful
    "As if MS is the only one who has problems with demonstrations."

    Hmmm, no. Maybe it's the way they deal with failures. Remember Bill Gates trying hard to demonstrate the Media Center? Some time after that Steve Jobs gave his regular Macworld keynote when his Mac didn't respond anymore. He moved a monitor switch to continue the presentation on another Mac and said: "Well, that's why we have backup systems here."

  22. Mr. Pogue begs to differ: by Udo+Schmitz · · Score: 2, Interesting
    He writes: "The software I'm using is Dragon NaturallySpeaking 9.0, the latest version of the best-selling speech-recognition software for Windows. This software, which made its debut Tuesday, is remarkable for two reasons.

    Reason 1: You don't have to train this software. That's when you have to read aloud a canned piece of prose that it displays on the screen -- a standard ritual that has begun the speech-recognition adventure for thousands of people.

    I can remember, in the early days, having to read 45 minutes' worth of these scripts for the software's benefit. [...] NatSpeak 9 requires no training at all."

  23. Re:Is SR ever going to be good enough? by wkitchen · · Score: 2, Insightful
    There is Dragon Naturally speaking 9, which apparently is pretty good, but will SR ever really be the Star Trek kind?
    Probably. But it will have to get much better at using context. They're already using grammar as a cue, but it's going to take much more than that. Humans draw on memories of previous conversations, knowledge about the interests and mannerisms of the person speaking, and knowledge of the situation at hand. Even just knowing what's big in the news can help.

    As for ambient noise, there's often useful contextual information there too. Ambient noise can give information about where the speech is occurring and about what is happening at that location. In some rare cases the ambient noise might even be responsive to the speech itself. The audience laughing in the example was a clue that a) an audience is watching, and b) the system made a mistake. A human would have recognized that and used it to advantage. For a speech recognition system to work as well as a human, it will not only have to get better at separating speech from ambient noise, but it will need to be able to recognize the ambient noise.
  24. Microsoft Innovation by SCHecklerX · · Score: 2, Insightful

    OS/2 had speech recognition in 1994 with OS/2 Warp. Better yet, the OS/2 version of Netscape at the time was speech enabled (browse simply by speaking the link). Even cooler was that the Netscape developers actually listened to the OS/2 community with that version (I remember them implementing something that I had asked for...very cool). Keep in mind that the average system of that time was a Pentium 133 with 100MB of RAM. And here we are in 2006, with GHz processors and GBytes of RAM dirt cheap, and M$ is just now starting to experiment with this? By now this technology should be damned near perfectly integrated across the board! Thanks for abusing your monopoly power to destroy all of the competition and REAL innovation, Microsoft!

  25. Voice recognition requires some training regardles by glrotate · · Score: 2, Interesting

    That's so last century. NPR did a bit on the new Dragon Dictate 9. The NPR reporter got 100% accuracy out of the box, no training.

    Dictation Software Improves Usability, Accuracy

  26. Re:Man, that brings back memories!!! by TheRaven64 · · Score: 3, Funny

    I recall a friend of mine demoing the MacOS speech recognition engine a decade or so ago. The dialogue went something like this: User: Open SimpleText. Computer: Are you sure you wish to shut down. User: No. Computer: Shutting down... As you can imagine, he didn't leave it switched on all that often...

    --
    I am TheRaven on Soylent News
  27. This sounds so much like Microsoft by SmallFurryCreature · · Score: 3, Insightful
    How the fuck did this bug go unfixed for so long? To me it sounds too much like the old MS sales strategy of saying "Just wait please, do not buy X now, our product will be much better in the future."

    Yes, bugs happen; yes, Vista is still in beta. But rather than just admit "Vista is still a buggy piece of crap software that can't even be used properly by its own engineers", they tell us to sit and wait because we can trust them to fix it.

    To MS's credit, it is a strategy that works.

    --

    MMO Quests are like orgasms:

    You may solo them, I prefer them in a group.

  28. Re:Is SR ever going to be good enough? by NormalVisual · · Score: 4, Funny

    all those holodeck scenes, Troi ordering chocolate, etc.

    Hmmm - holodeck, Troi, chocolate.....the combination of those three items is something that gives one pause to ponder.

    Um, I'll be back in a little bit.

    --
    Please stand clear of the doors, por favor mantenganse alejado de las puertas
  29. On MSNBC's front page - for about 30 minutes.... by wowbagger · · Score: 5, Informative

    A friend of mine called me at work (since he knew that to access MSNBC's videos requires Internet Explorer, Windows Media 9 or better, and Flash, and I have neither IE nor WMP at home) and told me about this.

    I went to msnbc.com - and there it was, third on the list of videos on the main page.

    I called this to the attention of two of my coworkers, and we viewed the video - total elapsed time, maybe twenty minutes.

    Then I went to call it to the attention of a third coworker - and the video was no longer on the front page of MSNBC. OK, so maybe they've moved it off the front page, but it should still be on the Technology subsection, right?

    Wrong.

    Nor was it under Videos, nor anywhere else I could find it easily.

    Perhaps this was just a normal rotation of a video. Perhaps not. But no matter what the real cause, there is the appearance that it was removed from the page because it was too embarrassing. Not good for Microsoft.

    However, I will give MSNBC this - they didn't give Microsoft a free ride on this, they ribbed them pretty hard.

    However, I knew that this would be appearing on other sources as a video that could be viewed outside of Windows. Actually, I am rather surprised that it took this long.

    Now, as to the demonstration itself - it looks to me (a person who does signal processing and analysis for a living) like the presenter had the mike gain too high - every time he spoke he maxed out the bar graph on the display. *IF* he had the gain too high, and the audio was clipping significantly, that could make "mom" have enough of a pop to maybe sound like AUNT - especially if the software is using context to try to reduce the search-space for the words. Of course, that's why I would have a monitoring routine in the system, and if any of the samples are at 100% full scale, or if many of the samples are over 90% full scale, or the signal power is too high, I'd have my software adjust the mike gain down *and* flag an alert to the user. I'd also try to look for the mike element itself being overloaded.
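
    The monitoring routine described above might look roughly like the sketch below. The three checks follow the comment (any sample at full scale, many samples over 90% of full scale, signal power too high); the specific thresholds, the 0.8 gain step and the function names are assumptions chosen for illustration, and samples are assumed to be floats in the range -1.0 to 1.0.

```python
def check_levels(samples, full_scale=1.0, hot_level=0.9,
                 hot_fraction=0.05, max_power=0.25):
    """Return a list of reasons the input looks overdriven (empty if it is fine)."""
    if not samples:
        return []
    levels = [abs(s) / full_scale for s in samples]
    reasons = []
    # Check 1: any sample sitting at 100% of full scale suggests clipping.
    if any(level >= 1.0 for level in levels):
        reasons.append("sample at full scale")
    # Check 2: too large a share of samples close to full scale.
    if sum(level > hot_level for level in levels) / len(levels) > hot_fraction:
        reasons.append("many samples near full scale")
    # Check 3: average signal power above the chosen ceiling.
    if sum(level * level for level in levels) / len(levels) > max_power:
        reasons.append("signal power too high")
    return reasons


def adjust_gain(gain, samples, step=0.8):
    """Lower the gain and return an alert message when the input is overdriven."""
    reasons = check_levels(samples)
    if reasons:
        return gain * step, "Microphone level too high: " + ", ".join(reasons)
    return gain, None


quiet = [0.1, -0.2, 0.15, -0.05]
clipped = [1.0, -1.0, 0.95, 1.0]
print(adjust_gain(1.0, quiet))
print(adjust_gain(1.0, clipped))
```

    Returning both the new gain and a message keeps the two actions from the comment together: turn the mike down, and tell the user why.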

  30. Re:Is SR ever going to be good enough? by jc42 · · Score: 3, Interesting

    Footnote: Microsoft was a monopolistic, backwards company that started the PC revolution.

    They don't deserve credit for starting the "PC revolution". The credit properly belongs to the hundreds of little startups and hobbyists, the whole CP/M crowd and others like Amiga. Microsoft was a subcontractor to a giant monopoly (IBM) that stepped in after the little guys demoed there was a market, and took over that market. They succeeded mostly because of a marketing budget greater than the budgets of all the little companies combined.

    And there's a good argument that, by marketing PC/DOS rather than CP/M, they set back the PC revolution by 5 to 10 years, the time it took for PC/DOS to match the capabilities of CP/M when IBM started their PC marketing campaign.

    Sorry; that's the way "the Market" works in the computer field. Small, independent developers make something new and start selling it; the big companies then step in and take over the market through traditional monopoly strategies.

    It's likely that we're now going to hear people crediting Microsoft for starting the "voice recognition" revolution by inventing the new idea that computers can understand speech. Marketing can redefine history like that.

    (Whereas we computer geeks know that Al Gore invented speech recognition. ;-)

    --
    Those who do study history are doomed to stand helplessly by while everyone else repeats it.
  31. Re:Oh Please by tomstdenis · · Score: 4, Insightful

    Who knows how the algorithm they implemented works. Chances are the computer scientists behind it are not total asshats and they assumed the sales guy would follow the same procedure they did [e.g. to train it].

    Point is, if the sales guy had tried the system out beforehand he would have noticed it not working.

    That is, suppose the code is total shit [I know, big stretch for MSFT]. Then isn't it likely it would have failed during the preparation stage? If you are saying "mom" and it always comes back "aunt" you may want to cancel the presentation.

    That's why I think he didn't do any prep work for the presentation.

    Tom

    --
    Someday, I'll have a real sig.
  32. Audio Gain Settings Caused the Problem by ThinkFr33ly · · Score: 2, Informative
    As much as many of you would like to believe that the reason this demo failed was because Microsoft code is horribly designed and implemented, and that they are completely incompetent, there just might be a slightly more realistic explanation for the demo's abject failure.

    According to Rob Chambers, a developer on the Vista speech recognition team, the failures during the demo were caused by audio gain issues.

    From his blog:

    If you watch the video clip on MSN Video you can see in the speech user interface that the microphone "volume" is very high. It pushes up into the red frequently while Shanen is speaking to the computer. That's caused by the fact that the audio sub-system wasn't respecting the audio gain settings we've asked it to use.

    This is a known bug in current builds, and has already been fixed by the audio team in their private builds in preparation for RTM.


    Read the entire blog post for a more complete explanation of what happened... one that's just slightly more plausible than most of the explanations proffered by your fellow Slashdotters.
  33. Re:Is SR ever going to be good enough? by jc42 · · Score: 3, Interesting

    I expect in 300+ years when Star Trek is set, our AI will beat the piss out of Star Trek AI. Hell, the computer has been around for only a little over 50 years. A little over 100 years ago we had just first discovered electricity and flight.

    Well, maybe. But we invented microscopes around 300 years ago, and discovered microorganisms immediately thereafter. The understanding that some bacteria were involved in diseases followed quickly. But it was nearly 300 years before we successfully eradicated a disease (smallpox). Today, we're still battling new diseases, and we don't have anything like a general solution to all diseases. We have a few antibiotics that affect more than one disease, but we haven't made much progress in solving the problem of the development of resistance to our antibiotics. Hell, we can't even convince the general public that it's the evolutionary process at work here, and we've understood that for around 150 years.

    I wouldn't predict any general solution to a complex problem like voice recognition in a mere 300 years. Maybe we will. But our history of general solutions to other complex biological problems is not encouraging. Neither is the history of our first 50 years of AI, despite the constant hype and Hollywood movies claiming that AI is just around the corner.

    --
    Those who do study history are doomed to stand helplessly by while everyone else repeats it.
  34. Re:Is SR ever going to be good enough? -- Yes! by oblique303 · · Score: 5, Informative

    I use Dragon NaturallySpeaking every day (carpal tunnel syndrome), and version 9 has around 99% accuracy, with around 98% out-of-the-box with no training. This means 10 or so errors out of a 1000 word dictation.
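
    As a quick check of that arithmetic (a sketch only; the function name is made up for illustration):

```python
def expected_errors(words: int, accuracy_pct: int) -> int:
    """Expected number of misrecognized words at a whole-percent accuracy."""
    return words * (100 - accuracy_pct) // 100


print(expected_errors(1000, 99))  # 10
print(expected_errors(1000, 98))  # 20
```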

    I didn't believe it either, until I actually tried it. Dragon is the first worthwhile speech recognition solution I've seen that's practical for general use (though I'd love it if they'd release a "programmers" version to complement the Medical/Legal versions). I get about 99% accuracy (a decent microphone is *very* important!)

    Dragon 9 also doesn't "technically" need training, but accuracy further improves if you do bother to train it a bit. The NYT reviewer was able to get 99.6% accuracy after a short training session.

    Here's a few reviews of version 9:

    http://www.nytimes.com/2006/07/20/technology/20pogue.html?ex=1154318400&en=6fd795114b3f72ea&ei=5070

    http://www.npr.org/templates/story/story.php?storyId=5577523

  35. Re:Is SR ever going to be good enough? by westlake · · Score: 2, Insightful
    For example, how does the computer know that Picard wants to call Riker and isn't just talking about him? Oh and keep in mind the computer never misinterpreted something. In other examples, people would carry on intelligent conversations with the computer - all those holodeck scenes, Troi ordering chocolate, etc.

    The fleet's computers have "known" Picard since he entered the service. They should be pretty well trained.

    The communicator badges in TNG could be transmitting supplementary biometric data and non-verbal commands, which by now have become almost automatic: "Watson. I need you!"

  36. Re:Is SR ever going to be good enough? by Yvan256 · · Score: 2, Insightful
    The communicator badges in TNG could be transmitting supplementary biometric data and non-verbal commands, which by now have become almost automatic: "Watson. I need you!"


    The badge also indicates the location of the person. So if Picard says "Will" (or "number one", which is simply an alias that Picard made for "Riker, William T.") and the computer sees that Will isn't in the same room as Picard (or isn't within normal hearing distance), it simply connects the two via a communication channel.
  37. Re:Is SR ever going to be good enough? by DarkOx · · Score: 2, Interesting

    No, 90% is nowhere near good enough for dictation, but it sure might be good enough for some applications.

    Think "computer lights": if it gets it wrong, you just try again. All those media PCs would be good candidates as well. If I say "change to channel six" and the thing switches to sixty 1/10th of the time, well, I could repeat myself that often in that application anyway and still be pretty satisfied.

    --
    Repeal the 17th Amendment TODAY! Also Please Read http://www.gnu.org/philosophy/right-to-read.html
  38. Re:That's REALLY sad... by Ant+P. · · Score: 3, Interesting

    Is this "voice recognition" as in "understand English vocabulary and grammar rules and differentiate between speech and commands" or as in "match this sound up with one of 10 prerecorded ones to autodial a number"?

  39. Re:Is SR ever going to be good enough? by SleepyHappyDoc · · Score: 2, Insightful

    90% accuracy is nowhere near enough for voice recognition in a dictation context

    Depends on your own context...I deployed (admittedly, an older version of) Dragon NaturallySpeaking in an office full of mobility-impaired employees. They found it much easier to spend 10% of their writing time fixing errors than 100% of it trying to, for example, type with the onscreen keyboard. If you can't use a keyboard, even crappy voice recognition is a godsend.

    --
    Stasis is death. Embrace change.
  40. Since when did Mum sound anything like Aunt? by tomhudson · · Score: 5, Funny

    Since when did Mum sound anything like Aunt?

    ... probably in the same jurisdictions where "wife" is spelled "first cousin" or "sister".

    1. Re:Since when did Mum sound anything like Aunt? by tomhudson · · Score: 2, Funny

      I think the government has a monopoly on f*cking the dead.

  41. Probably a bad Mic. by jcr · · Score: 2, Informative

    When I was last involved in adding speech control to an app, I attended a developer workshop at Apple, and found out much to my surprise that my mic wasn't any good. It sounded fine when I used it for voice recording, but for recognition the gain curve was all wrong. When I tried one of the mics that the speech team from Apple provided, the hit rate went from under 20% to well over 90%.

    When Kim Silverman demos Apple's speech recognition, he uses a high-quality noise cancelling mic. It makes all the difference.

    -jcr

    --
    The only title of honor that a tyrant can grant is "Enemy of the State."
  42. Bad News by PixelScuba · · Score: 2, Funny

    My mom is in a coma.

    My aunt is in a,

  43. Re:Oh Please by NaugaHunter · · Score: 2, Insightful

    If you are saying "mom" and it always comes back "aunt" you may want to cancel the presentation.

    Or, at the very least, don't say 'mom'! I've had plenty of times where a salesperson tests something the day before a demo (usually after a week of knowing he had a demo, but that's a different rant) and finds something. Our usual response with that short of notice is 'well, don't show that' since we don't have enough time to fix it. At best, we could send a version that had the error suppressed but not truly fixed.

    --
    R: That voice. Where have I heard that voice before? B: In about 365 other episodes. But I don't know who it is either.
  44. I've also noticed this as a beta tester by Lonath · · Score: 2, Funny

    Every time I say the word "Linux" it gets typed out as "Windows". Go figure.

  45. Re:Oh Please by Bing+Tsher+E · · Score: 3, Funny


    Who knows how the algorithm they implemented works.


    Probably nobody at Microsoft. . .

  46. Actually, not really by jpardey · · Score: 2, Funny

    The PR guy was demoted. Microsoft put him in a box with a keyboard and called him "Vista Speech Recognition." See, such technology is in reach!

    --
    I have freaks! I did something right...
  47. Not the first MS demo embarrassment. by MsGeek · · Score: 3, Informative

    Actually, the last MS demo flameout of this magnitude came when the beta of Windows 98 was being demoed. They wanted to show a scanner "just working" with Windows 98 and USB. W98 hit the blue screen of death when the USB scanner was plugged in.

    There, I found it. The file is an old QuickTime movie. I'm going to put this up on YouTube. There, that's done. Have at it.

    --
    Knowledge is power. Knowledge shared is power multiplied.