Slashdot Mirror


Is Speech Recognition Finally 'Good Enough'?

jcatcw writes "Speech recognition software is fast, but it still may not be accurate enough. Clerical jobs usually ask for 40 wpm, but speech recognition software can keep up with someone speaking at 160 wpm. In Lamont Wood's demo, it did very well at too/two/to and which/witch, but will it still render 'I really admire your analysis' as 'I really admire urinalysis'? At 95% accuracy, people aren't jumping on the bandwagon. Wood's typing speed is about 60 wpm with 93% accuracy, so he found that using speech recognition was about twice as fast as typing. Those who type at hunt-and-peck speeds will experience results that are even more dramatic. There's really only one product on the US market: Dragon NaturallySpeaking from Nuance Communications. The free versions from Microsoft aren't up to the task, and IBM sold ViaVoice to Nuance, where it's treated as an entry-level product."

35 of 313 comments

  1. Hmmm.... by DoofusOfDeath · · Score: 5, Funny

    Is Speech Recognition Finally 'Good Enough'?

    Is spinachry ignition rivaly gooery stuff? What the hell are you talking about?

    1. Re:Hmmm.... by value_added · · Score: 3, Funny

      What the hell are you talking about?

      Maybe he meant speech wreck ignition?

    2. Re:Hmmm.... by bearinboots · · Score: 3, Insightful

      Dragon is no more... and hasn't been for a long time.

      NaturallySpeaking has been sold a few times to various companies.

      (I keep track because I worked on V1.0)

    3. Re:Hmmm.... by cnettel · · Score: 4, Informative

      n-gram based language models are nothing new. Statistics is all fine and dandy, but it's no panacea. It might just be enough to throw in an even larger corpus (something like the complete Google index), but it's still hard. (BTW, n-gram Markov chains more or less originated in speech recognition, to get the individual phonemes right, and I'm quite sure they're doing at least something like it at the word level these days. It still sucks, as the quality users demand for proper dictation is extremely high.)
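      The standard word-level version of this idea is the noisy-channel setup. The recognizer picks the word sequence $W$ that best explains the audio $A$, and an n-gram model supplies the prior $P(W)$. This is the textbook formulation; nothing in TFA says exactly how Dragon implements it. For a bigram model ($n = 2$) over a sentence $w_1 \dots w_m$, with corpus counts $C(\cdot)$ and a start-of-sentence marker $w_0$:

```latex
\hat{W} = \arg\max_{W} \; P(A \mid W)\, P(W),
\qquad
P(W) = P(w_1, \dots, w_m) \approx \prod_{i=1}^{m} P(w_i \mid w_{i-1}),
\qquad
P(w_i \mid w_{i-1}) \approx \frac{C(w_{i-1}\, w_i)}{C(w_{i-1})}
```

      The count ratio is the plain maximum-likelihood estimate. It assigns zero probability to any word pair that never appeared in the training text, which is why practical systems add smoothing and need very large corpora.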

    4. Re:Hmmm.... by Helios1182 · · Score: 3, Interesting

      There is a lot of work on word prediction and language modeling in natural language processing and computational linguistics research. 95% accuracy is considered very good, though. There are ways to help, but some of the most effective ways require restricting the language recognized. n-gram based language models provide a good statistical framework, but are very data hungry. You need lots and lots of relevant (this is the hard part) text. The model needs to be based on the language the user uses in order to be effective.
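      To make "data hungry" concrete, here is a small sketch that measures how many word pairs in a test sentence never occur in the training text. A maximum-likelihood bigram model would give each of those pairs zero probability. The tiny corpora are made up for illustration, and whitespace tokenization is an assumption.

```python
from collections import Counter


def bigrams(tokens):
    """Return the adjacent word pairs in a token list."""
    return list(zip(tokens, tokens[1:]))


def unseen_bigram_rate(train_sentences, test_sentences):
    """Fraction of test-text bigrams that never occur in the training text."""
    seen = Counter()
    for sentence in train_sentences:
        seen.update(bigrams(sentence.lower().split()))
    test_pairs = [p for s in test_sentences for p in bigrams(s.lower().split())]
    if not test_pairs:
        return 0.0
    unseen = sum(1 for p in test_pairs if p not in seen)
    return unseen / len(test_pairs)


if __name__ == "__main__":
    # Hypothetical toy corpora.
    train = [
        "thank you for your analysis of the report",
        "we really admire your analysis",
    ]
    test = ["we admire your analysis of the results"]
    print(f"unseen bigram rate: {unseen_bigram_rate(train, test):.2f}")
```

      Two of the six test bigrams ("we admire", "the results") are unseen, even though every test word except "results" appears in training. Scale that up to real vocabularies and you get the sparsity problem described above.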

    5. Re:Hmmm.... by pluther · · Score: 3, Insightful

      but what if I really said "urinalysis"?

      Then your secretary would probably get it wrong too

      No, your secretary would almost certainly get it right. Your secretary would know, from experience with you and the kind of work you do and the overall context of the letter whether the person you are dictating the letter to has recently analyzed something for you, or if you are applying for a job in a medical lab.

      95% sounds good if you're not comparing it to a person. But a 5% error rate is horrendous for business use. A secretary who missed one word out of every 20 would be fired after a few hours. A couple of decades ago, when I temped for office work, I could transcribe about 80 wpm with close to 100% accuracy, and I was nowhere near the fastest.

      If you got a letter from a business containing a typo on almost every line, would you do business with them?
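      Here is a rough way to quantify that last point. The sketch assumes errors are independent. It also assumes a hypothetical page of 250 words with 12 words per line; both numbers are made up, not taken from TFA.

```python
def expected_errors(word_accuracy, words):
    """Expected number of misrecognized words, assuming independent errors."""
    return (1.0 - word_accuracy) * words


def p_line_has_error(word_accuracy, words_per_line):
    """Probability that a line contains at least one error (independence assumed)."""
    return 1.0 - word_accuracy ** words_per_line


if __name__ == "__main__":
    # Hypothetical page layout: 250 words per page, 12 words per line.
    for acc in (0.93, 0.95, 0.99):
        print(f"{acc:.0%} accuracy: {expected_errors(acc, 250):.1f} errors/page, "
              f"{p_line_has_error(acc, 12):.0%} of lines contain an error")
```

      At 95% word accuracy, that works out to about 12.5 errors per page. Roughly 46% of 12-word lines would contain at least one mistake, so about every other line.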

      --
      If the masses can keep you down, you're not the Ubermensch.
  2. Problems by Tribbin · · Score: 5, Insightful

    As a foreigner, it is really hard to get the pronunciation right enough.

    Also, commands spoken by other people in the room are a problem.

    And what about listening to music or TV while the computer tries to interpret it?

    --
    If you mod this up, your slashdot background will turn into a beautiful sunset!
    1. Re:Problems by Sciros · · Score: 4, Informative

      It all depends on what sort of corpus the SR system is trained on. So yeah, foreigners will have problems, because a system trained for, say, British English will not perform well with American English. For the same reason, an SR system trained for "normal" speech will do very poorly with lyrics in music.

      As for stuff like "i really admire your analysis" being interpreted as "i really admire urinalysis," that stuff can easily be ironed out by an n-gram based system that "ranks" English sentences based on probability. What is the chance that "urinalysis" will follow "your" which follows "admire"? Such things can be estimated well enough if you use a large corpus to train your n-gram system (as long as the corpus you're using for this is the same "kind" as whatever speech the SR system is interpreting -- that is, newswire, business meeting, etc.)
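      Here is a minimal sketch of that ranking idea. It rests on several assumptions: a made-up two-sentence training corpus, whitespace tokenization, and add-one (Laplace) smoothing, which is about the simplest choice; real systems use better smoothing and vastly more data. Candidates are scored by average log probability per predicted token, because a raw product of probabilities favors shorter transcriptions. The class and method names are illustrative, not from any SR product.

```python
import math
from collections import Counter


class BigramModel:
    """Toy add-one (Laplace) smoothed bigram language model over whitespace tokens."""

    def __init__(self, sentences):
        self.history_counts = Counter()  # C(w_{i-1}): times each token starts a bigram
        self.bigram_counts = Counter()   # C(w_{i-1} w_i)
        vocab = {"</s>"}
        for s in sentences:
            words = s.lower().split()
            vocab.update(words)
            tokens = ["<s>"] + words + ["</s>"]
            self.history_counts.update(tokens[:-1])
            self.bigram_counts.update(zip(tokens, tokens[1:]))
        self.vocab_size = len(vocab)

    def avg_log_prob(self, sentence):
        """Mean smoothed log P(w_i | w_{i-1}) per predicted token (higher is better)."""
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        pairs = list(zip(tokens, tokens[1:]))
        total = sum(
            math.log((self.bigram_counts[p] + 1) / (self.history_counts[p[0]] + self.vocab_size))
            for p in pairs
        )
        return total / len(pairs)

    def rank(self, candidates):
        """Sort candidate transcriptions from most to least plausible."""
        return sorted(candidates, key=self.avg_log_prob, reverse=True)


if __name__ == "__main__":
    # Hypothetical training text; note it never contains the test sentence itself.
    corpus = [
        "thanks for your analysis of the data",
        "we admire your work",
    ]
    model = BigramModel(corpus)
    print(model.rank(["i really admire urinalysis", "i really admire your analysis"])[0])
```

      With this corpus, "i really admire your analysis" comes out on top because "admire your" and "your analysis" were both seen in training. "admire urinalysis" was not. If you rank by the raw summed log probability instead, the shorter "urinalysis" candidate can win simply by multiplying fewer numbers together.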

      --
      I like basketball!!1!
  3. This comment written by MS speech recognition by TodMinuit · · Score: 4, Funny

    Dear aunt, let's set so double the killer delete select all.

    --
    I wonder if I use bold in my signature, people will notice my posts.
  4. Depends on what you use it for by orclevegam · · Score: 3, Insightful

    Is Speech Recognition Finally 'Good Enough'?

    For typing up an inter-office memo in Word, most likely. But I'm a programmer, and I can barely read some perfectly fine code out loud; I can't imagine trying to enter it all with voice recognition, no matter how good it gets.
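    Even one short line shows the problem. Below is a minimal sketch of the post-processing that dictated code would need. It assumes a hypothetical vocabulary of spoken symbol names; real dictation products define their own command sets, which this does not reproduce.

```python
# Hypothetical spoken-symbol vocabulary; multi-word names are tuples of words.
SYMBOLS = {
    ("open", "paren"): "(",
    ("close", "paren"): ")",
    ("underscore",): "_",
    ("equals",): "=",
    ("dot",): ".",
    ("minus",): "-",
    ("space",): " ",
}
MAX_PHRASE = max(len(name) for name in SYMBOLS)


def spoken_to_code(utterance):
    """Turn a dictated phrase into source text.

    At each position the longest matching symbol name wins; any other
    word is copied through verbatim. Tokens are joined with no spacing,
    so spaces have to be dictated explicitly.
    """
    words = utterance.lower().split()
    out = []
    i = 0
    while i < len(words):
        for n in range(MAX_PHRASE, 0, -1):
            phrase = tuple(words[i:i + n])
            if phrase in SYMBOLS:
                out.append(SYMBOLS[phrase])
                i += len(phrase)
                break
        else:  # no symbol name starts here: treat the word as literal text
            out.append(words[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    print(spoken_to_code("x space equals space y dot count open paren close paren"))
```

    That prints `x = y.count()`. It takes eleven spoken words for one short line, and it still can't tell a variable named `dot` from the `.` operator.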

    --
    Curiosity was framed, Ignorance killed the cat.
  5. No. by Caspian · · Score: 5, Funny

    Speech recognition, handwriting recognition, species recognition... all of these suck, and will CONTINUE to suck, until strong AI is developed.

    And by that time, there will be a lot more important problems to worry about than making a computer understand Bubba Sixpack who can't type-- such as keeping the robots from taking over the planet in a bloody war.

    --
    With spending like this, exactly what are "conservatives" conserving?
  6. Of course it's good enough by ral315 · · Score: 5, Funny

    I use it myself. It's wonder full. delete that. delete that. delete that. double the killer delete select all

  7. I would say yes by UnknowingFool · · Score: 4, Informative
    --
    Well, there's spam egg sausage and spam, that's not got much spam in it.
  8. Good enough for what? by traindirector · · Score: 4, Insightful

    TFA mentions that many people stop using speech recognition software because of poor accuracy. I don't think that's the major reason. I think they start using it because it's a neat idea that seems to have a lot of promise, but quickly realize there are only a few situations where it's actually helpful. The end of the article mentions rough drafts; I'd also say it might be a decent choice:

    • when you need to enter hand-written documents into a computer
    • for transcripts of a single speaker
    • informal free-thought when not surrounded by other people
    • when you have horrible typing skills

    For the majority of office tasks, it just isn't a good fit.

    So if "good enough" means being useful in any way whatsoever, it sounds like we're almost there.

    1. Re:Good enough for what? by L.+VeGas · · Score: 3, Insightful

      These are some good points. I don't know what I would use speech recognition for, and I'm someone who writes a lot.

      Seeing words laid out as text helps me think. I can compose things better, more coherently.

      I'll write an email in an instant, but make me leave a voice mail, and I'll usually hang up first.

    2. Re:Good enough for what? by nine-times · · Score: 3, Insightful

      informal free-thought when not surrounded by other people

      I think you're implying something here that is one of the major reasons people don't use speech recognition software: if anyone is around, you feel like a total moron.

      You might not realize this, but you probably speak differently than you write. Most of us do, because there are some things that look good in text that sound bad spoken, and vice versa. Also, a lot of composition goes on when writing, and so if you're playing with different word choices so you can see them written out, you just end up sputtering dumb little phrases. It's easier to edit on-the-fly when using a keyboard. And let's not forget that you might not want the people around you to know what you're writing.

  9. "New Directions" by parvenu74 · · Score: 5, Funny

    I used to work for a company that has the words "new directions" in their name. When I told people where I worked I would make a rather long pause between the "new" and "directions" so as not to sound like I was saying something else. I wonder how this software would render it...

    1. Re:"New Directions" by sd_diamond · · Score: 3, Funny

      I used to work for a company that has the words "new directions" in their name.

      Please tell me the first two words in the name weren't "Coming From".

    2. Re:"New Directions" by TrippTDF · · Score: 4, Funny

      Reminds me of when the company "Pen Island" or "Mole Station Nursery" set up their domain names...

    3. Re:"New Directions" by houghi · · Score: 4, Funny

      You thought you had problems with "new directions"
      Can you imagine telling the software to go to this site?
      haatch tee tee pee double point slash slash slash dot dot org.
      http:///..org not found

      --
      Don't fight for your country, if your country does not fight for you.
    4. Re:"New Directions" by Anonymous Coward · · Score: 3, Funny

      And let's not forget the Italian energy company Powergen Italia... their name makes for a wonderful .com address!

  10. Speech recognition IS good enough by rinkjustice · · Score: 4, Informative

    I'm using Dragon NaturallySpeaking. Right now, as I write this calm it, comet, post, and it sure as hacking beats typing.

    Actually, I am using Dragon NaturallySpeaking right now, and it works very well. It actually works better if you speak quickly (as you normally would) and it's pretty good at inserting grammar along the way. I have bilateral tendinitis, and the software has been a godsend for me. I was even able to finish writing my book, a task that was becoming just too painful typing manually.

    Oh, and you are probably wondering how long it takes to train the software? About half an hour, and I find the accuracy to be around 95%.

  11. Pretty good by Richard+McBeef · · Score: 5, Funny

    95 percent is pretty good, only one word in twenty. I wouldn't have a problem with a 5% error ate.

  12. Welcome to the new AT&T! by poptones · · Score: 3, Funny

    Press or say one to speak with a representative in English...

    One

    When you hear the option you are calling about you may say it at any time. If you are calling about a billing problem, say billing. If you are calling about a technical issue, say technical. If you are calling about new service, say new customer. If you are...

    Billing

    I'm sorry, that is not an option. When you hear the option you are calling about you may say it at any time. If you are calling about a billing problem, say billing. If you are calling about a technical issue, say technical. If you are calling about new...

    Billing!

    I'm sorry, that is not an option. When you hear the option...

    Billing billing billing!

    I'm sorry, that is not an option. When you...

    Fuck you! Give me a human! Human human human!

    I'm sorry, that is not an option. When you hear the option...

  13. Re:Not Useful for Coders by Tackhead · · Score: 4, Funny
    > "Set v underscore tab equals space parenthesis parenthesis x minus lev schema dot all recs concatenate..."

    Yeah, but if you put a beat to it, you've got something.

    { } . ! /
    & ; ^ # -
    < > @ \
    { } _ SYSTEM HALTED

    "Left titty, right titty, dot bang slash.
    Ampersand semicolon, caret pound dash.
    Less than greater than, at back slash,
    left titty, right titty, under score crash!"

    * # ! ! (
    ~ & | )
    ' " . . DEL
    # ^G ! ! working... done.

    "Star pound bang bang, open-paren.
    Tilde and pipe, close-paren.
    One quote, two quote, dot dot delete,
    pound bell, bang bang, process complete!"

    Google's Usenet archive dates it to a 1990 rec.humor.funny post ("Stuck Shift Key Poetry"), but the poem itself predates that post by several years.

    You haven't lived until you've seen a dozen drunken geeks trying to sing "Waka Waka", or the entirety of "Hatless Atlas", while seeing only one character at a time. Well, maybe you have, but this is Slashdot.

  14. Maybe the question should be... by Mahjub+Sa'aden · · Score: 5, Insightful

    Instead of asking if speech recognition is "good enough", maybe we should be asking whether or not it's actually useful for anything in the first place. I mean, is it good enough... to do what?

    Can you imagine being in a cubicle farm full of people talking to their computers? Or trying to talk to your computer on the bus? You have to imagine that as computers become more ubiquitous, input methods will have to adjust alongside, and I simply can't see (or hear) speech recognition doing that very well.

    --
    What is is all that is. Isn't that obvious?
    1. Re:Maybe the question should be... by babblefrog · · Score: 3, Insightful

      Where I see it coming into its own is as an input method for really portable "wearable computing", where it would be extremely inconvenient to use a keyboard.

  15. Speech Reco Software Consolidation by __aajwxe560 · · Score: 4, Informative

    I am presently a financial customer of an enterprise speech recognition product that Nuance offers. For several years now, the speech recognition software industry has been under consolidation, with Nuance buying a few different competitors and technologies. Most recently, this dance has continued with Nuance being acquired by ScanSoft, a company known for specializing in type recognition.

    Nuance support is marginal at best, and through all the consolidations, even their own staff's understanding of how the product works is quite lacking. We have often found our own developers educating the Nuance support folks on various aspects of how the product works, and then asking whether this is intended behavior or not. These conversations often end in crickets. We normally would have moved to another product under these conditions, but simply put, Nuance acquired what little was left and now has no competition in the market. Competition is what spurs innovation, so with the continued consolidation, it is hard to see significant advances in the technology without free help from academia.

    If you think the Microsoft monopoly is bad, imagine if they absorbed Apple and somehow took over Linux leaving you with a few "choices", but all under the Microsoft moniker. The technology is very neat and the enterprise level products do some basic things quite well, but there is still some glaring room for innovation that I don't expect anytime soon under present industry conditions.

  16. Mod parent up! by Doctor+Memory · · Score: 3, Insightful

    Seriously, the only things speech recognition is good for are bulk text entry and simple navigation. I imagine trying to use voice commands to operate modern software would be similar to letting my four-year-old help make pancakes: yes, it gets done, but it's so much easier and faster to just do it yourself. Imagine trying to edit a document using just voice commands. Is your WP going to be smart enough that you can tell it "find all occurrences of 'scum-sucking bottom feeders' and replace it with 'esteemed colleagues'"? Or are you going to have to say "Find. Scum hyphen sucking bottom feeders. Tab. Esteemed colleagues. Replace all."? Face it, GUIs have rendered speech recognition for command and navigation moot. Most operations you perform don't have a verbal description, or at least not one that is quicker to say than to do.
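    For the first style of command to work at all, the word processor needs some grammar for spoken edits. Here is a minimal sketch of a deliberately narrow one. The phrasing it accepts is a made-up assumption, not any real word processor's command set. Recognizer output has no quote marks, so the target text is simply whatever sits between "find" and "and replace it/them with".

```python
import re

# Hypothetical command grammar for a spoken find-and-replace.
COMMAND = re.compile(
    r"find (?:all occurrences of )?(?P<old>.+?) and replace (?:it|them) with (?P<new>.+)",
    re.IGNORECASE,
)


def apply_voice_command(command, document):
    """Apply one spoken find-and-replace command to a document string.

    Raises ValueError when the utterance doesn't fit the narrow grammar.
    """
    match = COMMAND.fullmatch(command.strip())
    if match is None:
        raise ValueError(f"command not understood: {command!r}")
    return document.replace(match.group("old"), match.group("new"))


if __name__ == "__main__":
    memo = "Thanks to the scum-sucking bottom feeders in accounting."
    spoken = ("find all occurrences of scum-sucking bottom feeders "
              "and replace them with esteemed colleagues")
    print(apply_voice_command(spoken, memo))
```

    `str.replace` is an exact match. If the recognizer hears "scum sucking" without the hyphen, nothing gets replaced, which is the "Scum hyphen sucking" problem all over again.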

    I also can't imagine it'd be that useful for actually writing things. I don't think I'm the only one who revises as they write. I think I actually write better when I write things out by hand, because it's slower so I tend to think my phrasing and sentence structure through more before I commit anything to paper. If I could suddenly type two or three times faster, I think it'd probably make my text even more incomprehensible than it usually is...

    --
    Just junk food for thought...
  17. too anstwer you question. by geekoid · · Score: 3, Funny

    Yeth.

    --
    The Kruger Dunning explains most post on /. http://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect
  18. Re:Wreck a nice beach by Montag2k · · Score: 3, Funny

    Sounds like someone wants to use Vi with their speech recognition engine!

  19. Re:open source speech recognition by zuzulo · · Score: 3, Informative

    The Sphinx project is the current 'gold standard' in open source speech recognition. It is available from the Sphinx Project at CMU.

    I have used a variety of open source libraries in addition to 'rolling my own' and for general purposes Sphinx is certainly the most mature option.

    --
    "They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety."
  20. Zeno's Translator by Carcass666 · · Score: 3, Informative

    Speech recognition has been at a standstill for years now; it's been "almost there" for well over five years. As mentioned in other posts, there has been a lot of consolidation, and that has really hurt growth. Lernout & Hauspie and Dragon were constantly going back and forth a few years ago, trying to get a leg up on each other. When L&H got into all of their accounting problems and shut down, that left Dragon and IBM. IBM's product went to ScanSoft and then to Nuance, where it languishes until somebody pulls the plug (for example, if you call for support on ViaVoice and mention you have XP SP2, they will tell you it is not a supported platform).

    Most of the improvement in Dragon and ViaVoice over the last couple of years has been in reducing the training required to reach high-nineties accuracy (assuming a noise-cancelling mic in a quiet room, and that you don't have a cold or sore throat). The advances in training have not translated into much improvement in recognition accuracy. A "trained" Dragon 7 recognizes speech pretty much as well as Dragon 9 (I haven't played with Dragon 10 yet).

    Most of the real speech recognition advancement these days is focused on discrete word sets for voice mail trees and other interactive systems. When you are on the phone giving your credit card number, two/to/too is all the same thing. While speech recognition in its current incarnation is good for people who can't type (disabilities, carpal-tunnel, etc.) it is not a replacement for typing, and isn't any closer today than it was five years ago.
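    Small-vocabulary systems fare better because the recognizer only has to choose among a handful of allowed answers. Here is a minimal sketch of that final "snap to the grammar" step. It assumes a hypothetical four-option phone menu and uses plain string similarity from Python's `difflib` as a stand-in for the acoustic scoring a real system would do.

```python
import difflib

# Hypothetical phone-menu grammar: the only answers the system will accept.
MENU_OPTIONS = ["billing", "technical", "new customer", "representative"]


def match_menu_option(recognized, options=MENU_OPTIONS, cutoff=0.6):
    """Snap a noisy recognition result to the most similar allowed option.

    Returns None when nothing clears the similarity cutoff, so the caller
    can re-prompt instead of guessing.
    """
    text = recognized.strip().lower()
    matches = difflib.get_close_matches(text, options, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    for heard in ["Billing", "billin", "teknical", "xyz"]:
        print(f"{heard!r} -> {match_menu_option(heard)!r}")
```

    The 0.6 cutoff is just `difflib`'s default, not a tuned value. Returning `None` lets the menu re-prompt, which is the polite version of "I'm sorry, that is not an option."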

  21. Until we get hard ai along with it no. by otomo_1001 · · Score: 3, Interesting

    I mean really, until I can say to my computer things like:

    "Find all MP3s by Trent Reznor and pipe them to /dev/audio on the neighbor's computer."

    Until then, what use will it be?

    I can't program in it can I?

    if (i_can_write_code_I_mean_speak_code_to_the_computer()) {
        i_might_use_it_a_bit();
    } else {
        system("find /music -type f -iname '*trent*reznor*.mp3' -print0 | xargs -0 cat | ssh hackeduser@neighborcomputer 'cat > /dev/audio'");
    }

    But that is just me.

  22. You're joking but... by thepotoo · · Score: 3, Informative
    You hit the nail on the head with that one. My sister uses Dragon NaturallySpeaking exclusively (she's dyslexic and can't type or read worth crap, so she has to use Dragon NaturallySpeaking and Kurzweil, a screen reader).

    Dragon requires MONTHS of training (literally), and even then it will make mistakes exactly like the one you noted. The plus side is that Dragon works pretty decently under WINE, but apart from their Linux "support", it's a complete mess.

    Screen readers aren't much better; they have the accuracy, but are hard to understand.

    For a little geeky fun, I had Kurzweil read a few English papers to Dragon. Even after some training, Dragon still couldn't get above 80% accuracy on a computer-generated, 100% reliable voice. Now that's just sad.
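    Percentages like that 80%, or the 95% quoted throughout this thread, are commonly derived from word error rate (WER). WER is the word-level edit distance between a reference transcript and the recognizer's output, divided by the reference length. The sketch below assumes whitespace tokenization and case-insensitive comparison, and treats "accuracy" as 1 - WER. How any particular product computes its figure isn't stated in TFA.

```python
def word_error_rate(reference, hypothesis):
    """Word error rate: (substitutions + deletions + insertions) / reference length.

    Computed with a word-level Levenshtein (edit distance) table.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # dist[i][j] = edits needed to turn ref[:i] into hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i
    for j in range(len(hyp) + 1):
        dist[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution or match
            )
    return dist[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    wer = word_error_rate("i really admire your analysis", "i really admire urinalysis")
    print(f"WER = {wer:.2f}, accuracy = {1 - wer:.0%}")
```

    For the "urinalysis" example, this gives a WER of 0.4: one substitution plus one deletion against five reference words.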

    --
    Obligatory Soundbite Catchphrase