Rest In Peas — the Death of Speech Recognition

← Back to Stories (view on slashdot.org)

Rest In Peas — the Death of Speech Recognition

Posted by Soulskill on Monday May 3, 2010 @09:07AM from the yale-in-ox-boom-i-crows-off dept.

An anonymous reader writes "Speech recognition accuracy flatlined years ago. It works great for small vocabularies on your cell phone, but basically, computers still can't understand language. Prospects for AI are dimmed, and we seem to need AI for computers to make progress in this area. Time to rewrite the story of the future. From the article: 'The language universe is large, Google's trillion words a mere scrawl on its surface. One estimate puts the number of possible sentences at 10^570. Through constant talking and writing, more of the possibilities of language enter into our possession. But plenty of unanticipated combinations remain, which force speech recognizers into risky guesses. Even where data are lush, picking what's most likely can be a mistake because meaning often pools in a key word or two. Recognition systems, by going with the "best" bet, are prone to interpret the meaning-rich terms as more common but similar-sounding words, draining sense from the sentence.'"

10 of 342 comments (clear)

Min score:

Reason:

Sort:

Buffalo buffalo by Anonymous Coward · 2010-05-03 09:11 · Score: 5, Insightful

Buffalo buffalo Buffalo buffalo buffalo, buffalo Buffalo buffalo.
1. Re:Buffalo buffalo by liquiddark · 2010-05-03 09:30 · Score: 4, Insightful
  
  What human can parse this without an expert to tear apart the context? I don't see the point in trying to serve up a sentence that simply isn't a sentence to most speakers of the language.
2. Re:Buffalo buffalo by JanneM · 2010-05-03 11:58 · Score: 5, Insightful
  
  Most people won't be able to parse the sentence, though. I know I can't. I have no idea how to interpret it as anything but a string of nouns. My guess is, even fewer would be able to parse it if spoken (the capitals and the comma are, I assume, important hints). It'd be unrealistic and unproductive to require speech systems to actually do better than most humans on the task; if many of us can't parse the sentence then why expect a computer to do so?
  Better overall benchmark: require it to have the ability of a competent but not perfect second-language user. We're long used to dealing with that level of proficiency, whether because the conversant is a foreigner, a child, or has a dialect very different from our own.
  
  --
  Trust the Computer. The Computer is your friend.
AI by ShadowRangerRIT · 2010-05-03 09:17 · Score: 5, Insightful

Natural language processing *is* AI. And high accuracy speech recognition requires natural language processing if we expect to have accuracy rates approaching that of a human. Humans hear words partially or incorrectly all the time. We fill in the gaps from context, and we correct if the course of the conversation reveals that the original interpretation is wrong. Expecting computers to do better, when half the time the problem is the speaker, not the listener, means you need it to be able to make the same corrections from limited information on the fly, and after the fact that a human brain makes.

--
$_ = "wftedskaebjgdpjgidbsmnjgcdwatb"; tr/a-z/oh, turtleneck Phrase Jar!/; print
Since I don't have a flying car today, all is lost by liquiddark · 2010-05-03 09:26 · Score: 4, Insightful

Futurists should really learn what the word "plateau" means. The death of any given technical progression, particularly one that deals with information procesing, tends to be announced early and often, right up to the point where progress becomes meaningful again and then all of a sudden everyone saw it coming, and oh by the way where's my flying car?
Blame startrek by onyxruby · 2010-05-03 09:28 · Score: 4, Insightful

Blame Startrek for making it look flawless. Speech recognition is just like fusion technology, 20 years away from properly working - just like it has been for the last 20 years.
-RANT- I cant stand voice recognition systems that don't at least give you an option to press a number. Especially when they are out of tune and pick up back ground noises as voice. Please, please, please - always give the option to press a number instead of having to voice everything!!
Shout-outs to two idiots by Foobar_ · 2010-05-03 09:33 · Score: 5, Insightful

This blog post is retarded. The author is correlating a drop in internet news articles about Dragon NaturallySpeaking with a flatlining of speech recognition accuracy rate.
The Slashdot editor Soulskill is retarded for both not realizing this and for not reading the anonymously-submitted blog post (hmm no way it could have been the author) before approving it for the Slashdot front page. The guy is just out for more traffic to his rather pointless tech news commentary blog.
Decline of Slashdot, internet signal-to-noise ratio, get off my lawn, etc.
Watermelon Box by NReitzel · 2010-05-03 09:56 · Score: 4, Insightful

Long ago - decades, before Bill Gates was invented, a lot of research went into what would be required for actual voice recognition.
A counterexample was given, about an engineering marvel (of the time) that would recognise when someone said the word "watermelon". For a long time, people in the industry assumed that the path to voice recognition consisted of building more and better watermelon boxes.
Several authors, including Alan Turing himself, argued that actual voice recognition could never be accomplished with a large array of watermelon boxes. Current VR software divides input into a series of hyperplanes, and attempts to build a best match from the classification tree.
THis is the 2010 version of the watermelon box.
Real voice recognition won't be practical until the input is parsed, matched against context, and structured much akin to diagramming a sentence in those old English (or other) classes. In short, matching against a vocabulary is trying to solve an exponential problem with a (large) polynomial engine.
It won't be until the computer actually understands what is said that VR is likely to be practical in a global sense.
As a person who has been building computer systems for 35 years, it bothers me to see a huge body of research done into subjects like these ignored, because someone thinks that none of it applies to PC's.

--
Don't take life too seriously; it isn't permanent.
Re:Mod parent up by Antiocheian · 2010-05-03 10:00 · Score: 4, Insightful

Not necessarily. Speech recognition doesn't fail when it can't figure out elaborate grammatical constructs and lexical ambiguities. Speech recognition fails because it can't figure out simple sentences in conditions humans can.
Do other languages fare just as bad... by thewils · 2010-05-03 10:49 · Score: 4, Insightful

English, I would think is a pretty daunting language for speech recognition, what with a substantial array of homophones, but I wonder if other languages fare better. Maybe Spanish or, say, Japanese would be better since (I'm guessing) there is a closer relation to the written script and the actual sound that it makes.

--
Once I was a four stone apology. Now I am two separate gorillas.