Slashdot Mirror


Rest In Peas — the Death of Speech Recognition

An anonymous reader writes "Speech recognition accuracy flatlined years ago. It works great for small vocabularies on your cell phone, but basically, computers still can't understand language. Prospects for AI are dimmed, and we seem to need AI for computers to make progress in this area. Time to rewrite the story of the future. From the article: 'The language universe is large, Google's trillion words a mere scrawl on its surface. One estimate puts the number of possible sentences at 10^570. Through constant talking and writing, more of the possibilities of language enter into our possession. But plenty of unanticipated combinations remain, which force speech recognizers into risky guesses. Even where data are lush, picking what's most likely can be a mistake because meaning often pools in a key word or two. Recognition systems, by going with the "best" bet, are prone to interpret the meaning-rich terms as more common but similar-sounding words, draining sense from the sentence.'"

6 of 342 comments

  1. Windows 7 by Anonymous Coward · · Score: 3, Interesting

    I've been using VR in Win7 for a few weeks now. I can honestly say that after a few trainings, I'm near 100% accuracy. Which is 15% better than my typing!

  2. Re:Mod parent up by x2A · · Score: 4, Interesting

    There's nothing special about computers, though; people have to do that with other people. Let's not kid ourselves into thinking that humans are immune to misunderstandings. No, the more you get to know someone, the way they think and express themselves, the better you can become at communicating with them. Different words have different connotations to different people. It can take a lot of work to get all these down, and it'd be no different with a computer. For effective communication, you'd train and build up a common language with it, one that might seem like nonsense to outsiders... and I, for one, welcome this.

    --
    The revolution will not be televised... but it will have a page on Wikipedia
  3. Totally Not Dead Yet by RingDev · · Score: 4, Interesting

    A few years back I worked for an awesome company that built IVR (interactive voice response) systems.

    We had voice-driven interactive systems that would give the caller a variety of mental health tests (we worked a lot on identifying depression, early-onset dementia, Alzheimer's, and other cognitive issues).

    The voice recognition wasn't perfect, but we had a review system that dealt with a "gold standard". I wrote a tool that would allow a human being to identify individual words and to label them. Then we would run a number of different voice recognition systems against the same audio chunk and compare their output to the human version. It effectively allowed us to unit test our changes to the voice recognition software.
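
    A minimal sketch of that kind of gold-standard comparison, assuming word error rate (word-level edit distance divided by the length of the human transcript) as the metric. The post does not say which metric was actually used, and the function names and engine names here are hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # prev[j] holds the edit distance between the reference words seen
    # so far (one row back) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # recognizer dropped a word
            insertion = curr[j - 1] + 1  # recognizer added a word
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / max(len(ref), 1)


def score_recognizers(gold, outputs):
    """Rank recognizer outputs against the human-labeled transcript, best first."""
    scored = [(name, word_error_rate(gold, text)) for name, text in outputs.items()]
    return sorted(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    gold = "one a two b three c"  # human-labeled words for one audio chunk
    outputs = {
        "engine_a": "one a two b three c",
        "engine_b": "one hey to b three see",
    }
    for name, wer in score_recognizers(gold, outputs):
        print(f"{name}: WER {wer:.2f}")
```

    Rerunning a script like this after every configuration change is what makes the comparison behave like a regression test: if the error rate on the labeled chunks goes up, the change made things worse.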

    Dialing in a voice recognition system is an amazing process. The number of properties, dictionaries, scripts, and sentence-forming engines is mind-blowing.

    Two of the hardest tests for our system were things like: Count from 1 to 20 alternating between numbers and letters as fast as you can, for example 1-A-2-B-3-C. And list every animal you can think of.

    The 1-A-2-B was killer because when people speak quickly, their words merge. You literally start creating the sound of the A while the end of the 1 is still coming. It makes it extremely difficult to identify word breaks and actual words. And if you dial in a system specifically to parse that, you'll wind up with issues parsing slower sentences.

    The all-animals question had a similar issue: people would slur their words together, and the dictionary was huge. It was even more challenging when one of the studies was nationwide. We had to deal with phonetic spellings for northeast coast and southern accents. What was even worse was that there were no sentences. We couldn't count on predictive dictionary work to identify the most likely word out of those that would match the phonetics.
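
    To illustrate why missing sentence context hurts, here is a toy sketch of "predictive dictionary work", assuming a simple bigram model with add-one smoothing. That is a stand-in technique, not a description of the system from the post, and the tiny corpus and function names are made up for the example.

```python
from collections import Counter


def train_bigrams(sentences):
    """Count single words and adjacent word pairs in a toy corpus."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        words = sentence.lower().split()
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams


def pick_word(candidates, previous, unigrams, bigrams):
    """Choose among words that all match the same phonetics."""
    def score(word):
        if previous is None:
            # No sentence context: all we have is raw word frequency.
            return unigrams[word]
        # Add-one smoothed estimate of P(word | previous).
        return (bigrams[(previous, word)] + 1) / (unigrams[previous] + len(unigrams))
    return max(candidates, key=score)


if __name__ == "__main__":
    corpus = ["the brown bear slept", "bare feet", "a bare wall"]
    unigrams, bigrams = train_bigrams(corpus)
    sound_alikes = ["bear", "bare"]
    # With a preceding word, the pair counts favor the animal.
    print(pick_word(sound_alikes, "brown", unigrams, bigrams))
    # In a bare list of animals there is no preceding word to lean on,
    # so the more frequent sound-alike wins.
    print(pick_word(sound_alikes, None, unigrams, bigrams))
```

    In the second call the model has nothing to go on except how common each word is, which is the situation a free-form list of animals puts the recognizer in.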

    That said, getting voice recognition to work on pre-scripted commands and sentences was pretty easy.

    And I can only imagine the process has been improving in the years since. We were looking into SMS-based options, though, not out of any dislike of IVR, but because our usage studies with children showed most of them were skipping the voice system and using the keypad anyway. So why bother with IVR if the study's target demographic was youth?

    -Rick

    --
    "Most people in the U.S. wouldn't know they live in a tyrannical state if it walked up and grabbed their junk." - MyFirs
  4. Re:IBM? by N1ck0 · · Score: 5, Interesting

    IBM closed many of their speech research offices 1-2 years ago and transferred most of the research/data to Nuance's Dragon NaturallySpeaking research.

    Full Disclosure: I work for Nuance

  5. Re:Mod parent up by brian_tanner · · Score: 5, Interesting

    I think you're probably about 10-20 years out of date with your criticism. AI these days is *all about* statistical machine learning which is *all about* data and not about formal or expert systems at all. This is what Google and others are doing. The AI you are describing is from the late 80s and early 90s.

    Neural networks are part of the story, but many of the ideas from ANNs have been improved upon when more structured settings are available. There is actually a resurgence right now in deep neural networks, though.

  6. Re:Well duh. by Chris+Burke · · Score: 4, Interesting

    Or have an ounce of poetry in you... ;)

    Hmm... I guess I don't have that since I don't know what it is. That's okay, I can find out with the help of my AI using the latest in voice recognition software! Computer, what is "poetry"?

    Computer: "Poetry" is a form of literary art, frequently using an organized metric and rhyme scheme, that attempts to evoke an emotional response in the reader through the use of metaphor.

    Huh, okay, that's interesting. But computer, what is a metaphor?

    Computer: A "meta" is for people who lack the capabilities to contribute directly to a field or endeavor, but who still wish to sound educated and useful by discussing the nature of the field or endeavor itself. Example: "Physics has way too much math for me, but meta-physics is right up my alley!"

    Yeah, now I'm just confused.

    --

    The enemies of Democracy are