Slashdot Mirror


The Future of Speech Technologies

prostoalex writes "PC Magazine is running an interview with two of the research leaders in IBM's speech recognition group, Dr. David Nahamoo, manager of Human Language Technologies, and Dr. Roberto Sicconi, manager of Multimodal Conversational Solutions. They mainly discuss the status quo of speech technologies, which prototypes exist in IBM Labs today, and where the industry is headed." From the article: "There has to be a good reason to use speech, maybe your hands are full [like in the case of driving a car]. ... Speech has to be important enough to justify the adoption. I'd like to go back to one of your original questions. You were saying, 'What's wrong with speech recognition today?' One of the things I see missing is feedback. In most cases, conversations are one-way. When you talk to a device, it's like talking to a 1 or 2 year old child. He can't tell you what's wrong, and you just wait for the time when he can tell you what he wants or what he needs."

30 of 101 comments

  1. Solution to "one-way" problem by blair1q · · Score: 3, Funny

    I have a solution to the "one-way" communication problem.

    More popups.

    Audio popups!

    Heads-up display popups!

    Holy blackberries! Get me my patent attorney!

  2. Oh no! by Ardeocalidus · · Score: 5, Funny
    "Car, brake"

    "I'm sorry, Dave. I'm afraid I can't do that"

    1. Re:Oh no! by lukewarmfusion · · Score: 4, Funny

      That's because the car thought you said "break." You should speak more clearly.

  3. the footer offs peach take no allergy by backslashdot · · Score: 2, Informative

    mast and the stand can't aches.

    (the future of speech technology must understand context)

    1. Re:the footer offs peach take no allergy by knipknap · · Score: 3, Informative

      The present of speech technology already does, and did so for years. One problem is that you don't have a huge enough word corpus for training that technology (the knowledge of context is always limited to the domain that you have been training it against).

  4. its been a while by joe+155 · · Score: 3, Insightful

    I've been waiting for years for speech recognition technology to get to an acceptable standard, and over that time I've used a couple. The one I got lately (dragonsoft I think) was ok, but they need to come quite a bit further before I'll be adopting all the way.

    I'm looking forward to when I can say "computer, open openoffice for me mate" and it'll go "sure"... That'll be sweet.

    --
    *''I can't believe it's not a hyperlink.''
    1. Re:its been a while by SoSueMe · · Score: 4, Interesting

      Dragon Naturally Speaking from Nuance is about 75-80% accurate out-of-the-box. It is the other 20-25% that you have to invest the time in to get it to your liking. Even after a few months, you will probably still only reach up to 95% accuracy.
      Using it when you have a cold, sore throat or when you have been indulging in your favorite alcoholic beverage can corrupt your voice profile and set you back considerably.

      Never let someone else use it under your voice profile.

      Will voice rec systems ever be 100% accurate and speaker independent? Maybe, but I don't expect to see it for a long time.
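
One common way to put a number on accuracy figures like the ones quoted in this thread is word error rate (WER): the word-level edit distance between what was said and what was recognized, divided by the number of words actually said. The sketch below is an illustration only. It assumes plain whitespace tokenization and lowercase comparison, the function name `word_error_rate` is made up for this example, and nothing here is claimed to be how Dragon or any other vendor measures its own numbers.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word was dropped
            insertion = curr[j - 1] + 1  # extra word was recognized
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# "Car, brake" heard as "car break": one wrong word out of two.
print(word_error_rate("car brake", "car break"))
```

Under this measure, "95% accuracy" would correspond roughly to a WER of 0.05, or about one wrong word in every twenty dictated.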

    2. Re:its been a while by eam · · Score: 2, Interesting

      We use Dragon in a digital dictation system for the radiology department where I work. We moved to the system about 6 years ago.

      We have all the problems mentioned (except drinking). There are also some others that you might not consider. For example:

      As the day wears on, the radiologist will get tired, and the recognition will become worse.

      Also:

      A radiologist who started at 6:30AM will see the sound characteristics of the room change dramatically as more people begin working and activity in the reading room increases. Even environmental systems cycling on & off can affect the recognition.

      Despite this, when we receive a complaint about the voice recognition and we observe the user in action, they usually achieve 90-95% accuracy. That is really the most the vendor ever claimed was possible.

      It is my understanding that for radiology practices in which the doctors share the profits, the voice recognition systems are a hit. You can see why when you look at the numbers. When we adopted the system, we had been using transcriptionists at a cost of about $600,000/year. After the change the annual cost of the speech recognition system was about $100,000. That doesn't take into account the greatly decreased turn-around time. Now we could have your report emailed to your doctor before you get your pants back on.
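
The arithmetic behind "you can see why when you look at the numbers" can be spelled out using only the two figures given above. This is a rough sketch: the variable names are invented for illustration, and it ignores anything the comment does not mention, such as setup costs or the value of faster turn-around.

```python
# Annual figures quoted in the comment above (US dollars).
transcription_cost = 600_000        # transcriptionists, before the change
speech_recognition_cost = 100_000   # speech recognition system, after

annual_savings = transcription_cost - speech_recognition_cost
savings_fraction = annual_savings / transcription_cost

print(f"Annual savings: ${annual_savings:,}")
print(f"Cost reduction: {savings_fraction:.0%}")
```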

  5. What's wrong with speech? by Nuclear+Elephant · · Score: 5, Funny

    What's wrong with speech recognition today?

    I took a brief poll, and nobody seems to have a problem:

    Bruce: I sure like being inside this fancy computer.
    Vicki: Isn't it nice to have a computer that will talk to you?
    Agnes: Isn't it nice to have a computer that will talk to you?
    Kathy: Isn't it nice to have a computer that will talk to you?

    Except the trinoids, who complained:
    We can not communicate with these carbon units.

    I wasn't sure which Carbon they were talking about.

  6. Language Acquisition... by GnomeChompsky · · Score: 5, Interesting

    I'm a linguist, and it seems to me that Speech Recognition would be incredibly, incredibly useful in the research that's going on right now into Language Acquisition.

    You see, the problem right now is that there's really not much data that's in the public domain for linguists/psychologists/what-have-you to study, because it's incredibly, incredibly laborious to do longitudinal studies of children's utterances, or of input to the child. People spend hours and hours and hours transcribing 20 minutes of tape. They're understandably reticent to just share their data out of the goodness of their hearts. Even when they do, it's never a large sampling of children-and-their-interlocutors from-birth-to-age-X, it's usually just one child and maybe his or her parents from age 8 months to 3 years.

    So we have arguments about whether or not kids hear certain forms of input (Have you used passive voice with your child recently? Where's your child going to learn subjacency?) that go back and forth between psychologists and linguists, and people perform corpus studies on 3 children and feel that that's representative -- never mind the fact that these three kids were all harvested from the MIT daycare centre, and were the children of grad students or faculty members, and thus may not be representative of the population at large.

    Speech recognition would make it much, much easier to amass large corpora of data for larger samples of the population. It'd make it much more likely for people to share their data. And, what's more, it'd likely be possible to have a phonetic and syntactic-word-stub (for lack of a better word) transcription made from the same recording. We'd have a better idea of how the input determines how language is acquired by children, and what sorts of stages children go through.

    1. Re:Language Acquisition... by QRDeNameland · · Score: 2, Interesting

      Very interesting. Since you're a linguist, I wonder if you might address a concern I've had about speech recognition technology in general.

      I've dabbled a bit with Dragon Naturally Speaking in the past (v.7) and frankly found it still too immature to be of much use to me. I find it still far easier to deal with an accurate yet artificial interface (keyboard and mouse) than an inaccurate but more "organic" interface (speech recognition).

      But one of the things that stood out from the experience was the way in which I found myself quickly (if frustratedly) adapting my speech patterns to comply with the machine ability to interpret me.

      Is anyone out there considering the consequences of speech recognition technology on the evolution of human speech? It seems to me that any speech technology is going to be imperfect to some extent, but the better it gets, more people are going to use it and those people will inevitably end up adapting their speech patterns to the machine.

      Could this technology end up homogenizing human speech patterns to fit the computer's speech recognition model? Is this even a valid concern in your opinion, and if so, is anyone in the linguistics field considering these implications?

      --
      Momentarily, the need for the construction of new light will no longer exist.
    2. Re:Language Acquisition... by GnomeChompsky · · Score: 2, Informative

      Yes. I am aware; it's just that there isn't as much data available as there needs to be in order to be able to say with any confidence that, yes, this is what speech to children looks like, and this is what speech spoken by children looks like. Because like it or not, you have to get your grad students transcribing things for hours in order to get anything out of it. You want to research bilingual acquisition? Fine, but you're probably going to have to do years of legwork to get data for even three children learning the same two languages at the same time. Speech recognition would cut down significantly on the amount of time it took to take down utterances on either end. Which would be an enormous plus.

  7. IBM Speech - Needs Superhuman sales to survive? by Anonymous Coward · · Score: 5, Interesting
    On the other hand, IBM is not actually selling much speech technology.

    Scansoft, who earlier all but cornered the market for Optical Character Recognition (OCR) technology, did the same with speech recognition by acquiring the largest players in this space, SpeechWorks and Nuance. Scansoft changed their name to Nuance as a part of that last acquisition.

    IBM, meanwhile, has been struggling to find a market for their "Superhuman" (sneer) speech reco technology. A few years ago, they sold distribution of their retail desktop product, ViaVoice, to (wait for it) Scansoft. Their commercial product was RS/6000-AIX-only until a couple of years ago, when they ported it to more platforms, including Windows and Linux, and integrated it more tightly with their Rational and WebSphere marketing platforms.

    The current enterprise product sounds really sexy, at least for Rational-WebSphere shops. You can develop your WebSphere VXML application in Eclipse and leverage all those groovy WebSphere services you've built. No (or not much) special skill required!

    The problem is that their target market is Telecom Managers, who face a choice between IBM, with a few hundred ports installed, and Nuance (-ScanSoft-SpeechWorks), with tens- or hundreds-of-thousands of installed speech reco ports. Telecom Managers live in a world where their clients expect six-sigma/five-nines reliability. This is a hard sell to make.

    The question is, how long can IBM keep pouring money into speech R&D and product development in the face of dismal sales? Some in the industry expect the answer is, "Not too much longer." And that, of course, makes nervous enterprise buyers even more nervous and less likely to buy.

    1. Re:IBM Speech - Needs Superhuman sales to survive? by Kadin2048 · · Score: 2, Insightful

      I know nothing about the particular details of this deal, but wouldn't it make sense if IBM's sale of the patents also included a reciprocal agreement, that Scansoft would not sue IBM in the future for use of its IP?

      It just seems like IBM, seemingly a company obsessed with creating and preserving intellectual capital, wouldn't so hastily sell off patents that they might ever be able to use / need, unless there was a catch, like they got access to Scansoft's portfolio as part of the bargain?

      Just speculation, based on what I've read about how Big Blue operates.

      --
      "Ladies and gentlemen, my killbot features Lotus Notes and a machine gun. It is the finest available."
  8. integration by caffeinemessiah · · Score: 3, Interesting

    personally, i can't wait till they take speech recognition and couple it with natural language processing as a standard part of the desktop interface. it should be quite feasible now that we're seeing affordable 64-bit computing with fast memory and bus speeds. imagine excel with a speech-recognition interface, so instead of typing and filling formulae you would just tell it to "sum the row labeled timing, but only include values greater than 10". ok, back to work...
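
To make the spreadsheet example concrete, here is a toy sketch of what such a spoken command would have to turn into once the words have been recognized. Everything in it is an assumption for illustration: the `sheet` data, the function name `run_command`, and the single hard-coded sentence pattern. A real natural language interface would need far more than one regular expression, and this is not how Excel or any speech product works.

```python
import re

# Hypothetical spreadsheet: row label -> values in that row.
sheet = {
    "timing": [4, 12, 10, 25, 7],
    "cost": [3, 8, 15, 1, 9],
}

# One fixed phrasing only; a stand-in for real language understanding.
COMMAND = re.compile(
    r"sum the row labeled (\w+), but only include values greater than (\d+(?:\.\d+)?)"
)


def run_command(text: str, table: dict) -> float:
    """Parse the recognized text and compute the filtered row sum."""
    match = COMMAND.fullmatch(text.strip().lower())
    if match is None:
        raise ValueError(f"command not understood: {text!r}")
    label, threshold = match.group(1), float(match.group(2))
    if label not in table:
        raise KeyError(f"no row labeled {label!r}")
    return sum(value for value in table[label] if value > threshold)


print(run_command(
    "sum the row labeled timing, but only include values greater than 10",
    sheet,
))
```

Note that "greater than 10" excludes the 10 itself, so only 12 and 25 are summed. Deciding details like that is exactly the kind of ambiguity a spoken interface would have to resolve or ask about.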

    --
    An old-timer with old-timey ideas.
    1. Re:integration by Bloke+down+the+pub · · Score: 2, Funny
      There's enough wittering going on in the office already, thanks.

      Of course, it seems you'll have the advantage of not having to tell it to switch to uppercase no i meant put the letters in uppercase not the word quote uppercase quote shift shift er fuck hey Joe what is it for uppercase huh was that caps lock YOU SAID OK THANKS NO DELETE DELETE THAT.

      --
      It's true I tell you, feller at work's next door neighbour read it in the paper.
    2. Re:integration by cagle_.25 · · Score: 3, Insightful
      Spot on. Many interfaces today make it difficult to get from user's idea to computer's execution. Because we are much more facile at using spoken language to be precise than we are at using mouse+keyboard to be precise, a "G+AUI" (graphical+audio user interface) should, in principle, be much more powerful than a GUI.

      Dragon Naturally Speaking is a baby step in that direction, but it is pretty much limited to single nouns or verbs.

      --
      Human being (n.): A genetically human, genetically distinct, functioning organism.
  9. Re:your by legalize.ganja.now. · · Score: 2, Funny

    blame their speech recognition software

  10. can it replace court reporters? by RussP · · Score: 3, Interesting

    A few years ago my wife was thinking about studying to become a court reporter. The training is very demanding, and I heard the dropout rate is about 95%, but the pay is good if not great.

    In any case, I warned her about the potential for voice recognition technology to render court reporters obsolete. It probably won't happen, but the mere prospect tipped her in the direction of forgoing the opportunity. Was that a mistake?

    The same concern applies also to medical transcription.

    --
    I watch Brit Hume on Fox News
    1. Re:can it replace court reporters? by Anonymous Coward · · Score: 2, Informative

      Being a court reporter, I'd say no. A computer doesn't say "What?" when it doesn't understand the words, and it doesn't tell people not to talk at the same time so that the record's clear. Some courts try video, some try just audio recorders, but so far the results haven't been so good. You need people to operate the machine, people to catalog the recordings, people to transcribe the recordings if necessary. It's just better to have a court reporter there to do all that (and often cheaper).

      The problem with the field is that with fewer reporters to meet an increasing demand, the lack of capable court reporters is forcing more electronic recording -- good results or not.

      Now, for medical transcription, it's a great product. After about six months of use, the doctor (or anyone that dictates a lot) has gotten the computer trained to his voice and can go at a pretty good clip (150 words per minute or more). But this is one voice and a limited, task-specific vocabulary.

  11. So why is voice input in decline? by Animats · · Score: 3, Interesting
    Several good mainstream voice applications are on the way out. Wildfire is gone. TellMe is laying off people and no longer promoting their public services. These are good systems; you could get quite a bit done on the phone with them, and they had good speaker independent voice recognition. Yet they're gone, or going.

    Try TellMe. Call 1-800-555-TELL. It's a voice portal. Buy movie tickets. Get driving directions. News, weather, stock quotes, and sports. All without looking at the phone. So what's the problem?

    1. Re:So why is voice input in decline? by mikeylebeau · · Score: 3, Informative

      You're mistaken about Tellme laying people off; they are doing quite well and are growing. You're right that the voice portal idea is no longer emphasized, but Tellme's making great money selling voice services to enterprise customers.

  12. Actually... by ijablokov · · Score: 2, Informative

    ...the point of our multimodal work is that you can have a two way dialog with the device, as well as have visual feedback to the interaction. See http://ibm.com/pvc/multimodal for some examples.

  13. Re:MOD PARENT UP by penguin-collective · · Score: 2, Informative

    Yes, and Apple's speech recognition technology is many years behind the state of the art. IBM and others had better speech recognition and speech synthesis a decade ago than Apple has today.

    And where exactly is new speech technology supposed to come from inside Apple anyway? They fired all the people who knew anything about speech in the 90's and shut down the labs.

  14. Screw speech recognition by Anonymous Coward · · Score: 2, Interesting

    One great thing about keyboards and typing is that it's relatively private. Like phone menus. I hate when they ask me to speak my choice or answer a question or recite my account number. Just let me freakin' type.

    Babblin' all over the place is dumb.

    Instead of speech recognition let's work on better speech synthesis. Here we are in 2006 and the average synthesized voice sounds hardly better than my freakin' Phasor card I had for my Apple // in 1988.

  15. Doctors are going to use speech recognition by Aggrajag · · Score: 3, Informative

    Doctors in Finland are starting to use speech recognition to update patient records. I think it is in testing at the moment, check the following link for details.

    http://www.tietoenator.com/default.asp?path=1;93;16080;163;9862

  16. It's not the tech, it's the applications once more by redzebra · · Score: 2, Insightful

    I'm convinced speech technologies have a fantastic future when they are used for improving human communications, like providing for an electronic Babel fish. However, it looks like most are concentrating on using speech as a way to interact with machines.

    Which is so terribly inefficient and cumbersome. You really don't want to spend the time to socially interact with your coffeemachine at 7am.
    Unless it's able to go to the shop, put in exactly the right amount of coffee and is able to turn itself to on once it hears you stumbling out of bed. It's next to useless if the only added value is to switch itself to on after you grunted "on" to it.

  17. Re:Speech is the future! by Grimboy · · Score: 2, Insightful

    I think mouse and keyboard with screen is far faster than audio recognition/feedback will ever be.

  18. Speech recognition is for people who are alone by renfrow · · Score: 2, Insightful

    Something that has not been mentioned, because, evidently, no one has actually worked with it, is that it is seriously annoying to work in the proximity of someone USING speech recognition. I worked with a fellow that had speech recognition on his machine who used it for programming. YOU try working on YOUR own code when someone is droning in the background: "for left paren int i equals zero semi-colon i less than mumble mumble delete word delete word ..." ALL DAY LONG! Even with head phones on it sometimes seemed like he was asking a question and I'd remove the head phones and say "What was that?" "Nothing delete word". ARGGHHH. Leave me the heck away from people with speech recognition.

    Tom.

  19. Open source speech recognition engines by mandreiana · · Score: 3, Informative