Slashdot Mirror


The Future of Speech Technologies

prostoalex writes "PC Magazine is running an interview with two of the research leaders in IBM's speech recognition group, Dr. David Nahamoo, manager of Human Language Technologies, and Dr. Roberto Sicconi, manager of Multimodal Conversational Solutions. They mainly discuss the status quo of speech technologies, which prototypes exist in IBM Labs today, and where the industry is headed." From the article: "There has to be a good reason to use speech, maybe your hands are full [like in the case of driving a car]. ... Speech has to be important enough to justify the adoption. I'd like to go back to one of your original questions. You were saying, 'What's wrong with speech recognition today?' One of the things I see missing is feedback. In most cases, conversations are one-way. When you talk to a device, it's like talking to a 1 or 2 year old child. He can't tell you what's wrong, and you just wait for the time when he can tell you what he wants or what he needs."

11 of 101 comments

  1. the footer offs peach take no allergy by backslashdot · · Score: 2, Informative

    mast and the stand can't aches.

    (the future of speech technology must understand context)

    1. Re:the footer offs peach take no allergy by knipknap · · Score: 3, Informative

      The present of speech technology already does, and did so for years. One problem is that you don't have a huge enough word corpus for training that technology (the knowledge of context is always limited to the domain that you have been training it against).

  2. Actually... by ijablokov · · Score: 2, Informative

    ...the point of our multimodal work is that you can have a two-way dialog with the device, as well as get visual feedback on the interaction. See http://ibm.com/pvc/multimodal for some examples.

  3. Re:can it replace court reporters? by Anonymous Coward · · Score: 2, Informative

    Being a court reporter, I'd say no. A computer doesn't say "What?" when it doesn't understand the words, and it doesn't tell people not to talk at the same time so that the record's clear. Some courts try video, some try just audio recorders, but so far the results haven't been so good. You need people to operate the machine, people to catalog the recordings, people to transcribe the recordings if necessary. It's just better to have a court reporter there to do all that (and often cheaper).

    The problem with the field is that with fewer reporters to meet an increasing demand, the lack of capable court reporters is forcing more electronic recording -- good results or not.

    Now, for medical transcription, it's a great product. After about six months of use, the doctor (or anyone that dictates a lot) has gotten the computer trained to his voice and can go at a pretty good clip (150 words per minute or more). But this is one voice and a limited, task-specific vocabulary.

  4. Re:MOD PARENT UP by penguin-collective · · Score: 2, Informative

    Yes, and Apple's speech recognition technology is many years behind the state of the art. IBM and others had better speech recognition and speech synthesis a decade ago than Apple has today.

    And where exactly is new speech technology supposed to come from inside Apple anyway? They fired all the people who knew anything about speech in the 90's and shut down the labs.

  5. Doctors are going to use speech recognition by Aggrajag · · Score: 3, Informative

    Doctors in Finland are starting to use speech recognition to update patient records. I think it is in testing at the moment, check the following link for details.

    http://www.tietoenator.com/default.asp?path=1;93;16080;163;9862

  6. Re:IBM Speech - Needs Superhuman sales to survive? by Anonymous Coward · · Score: 1, Informative

    I want technology that'll run on a cheap single end-user or SOHO box.

    As I said, Nuance (Scansoft) bought them all up; not just SpeechWorks and Nuance, but Dragon, Lernout & Hauspie, etc. They still sell a bunch of (Windoze) retail SOHO packages for a hundred bucks or two.

    Microsoft has some crappy .NET-based stuff, but I'd give it a pass, if I were you. It's neither SOHO nor enterprise. Not sure what it is...

    It's not really soup yet, but there is also a free solution. See http://www.speech.cs.cmu.edu/. At least one commercial vendor has taken the source, hacked it up and is using it in a commercial product. At least it runs on Linux and (I think) *BSDs.

    - The AC OP

  7. Re:Language Acquisition... by Yellow5 · · Score: 1, Informative

    I work with speech recognition and to me, your comments sound a little misleading. When "people spend hours and hours and hours transcribing 20 minutes of tape" they usually aren't simply transcribing to text. The time is consumed by transcription of all the additional features in the text (i.e., time alignment of words and phonemes, prosody, additional syntactic information such as parsing structure or part-of-speech tags). This is where all the time is spent. There are, of course, automatic processes for each of these annotations, but some work much better than others. My opinion is that over the next 10 to 15 years, each piece of the speech recognition puzzle will come together to create ASR systems that will be comparable to human transcribers (you only have to be 95% correct to transcribe in a courtroom).

  8. Re:So why is voice input in decline? by mikeylebeau · · Score: 3, Informative

    You're mistaken about Tellme laying people off; they are doing quite well and are growing. You're right that the voice portal idea is no longer emphasized, but Tellme's making great money selling voice services to enterprise customers.

  9. Re:Language Acquisition... by GnomeChompsky · · Score: 2, Informative

    Yes. I am aware; it's just that there isn't as much data available as there needs to be in order to be able to say with any confidence that, yes, this is what speech to children looks like, and this is what speech spoken by children looks like. Because like it or not, you have to get your grad students transcribing things for hours in order to get anything out of it. You want to research bilingual acquisition? Fine, but you're probably going to have to do years of legwork to get data for even three children learning the same two languages at the same time. Speech recognition would cut down significantly on the amount of time it took to take down utterances on either end. Which would be an enormous plus.

  10. Open source speech recognition engines by mandreiana · · Score: 3, Informative