Rest In Peas — the Death of Speech Recognition
An anonymous reader writes "Speech recognition accuracy flatlined years ago. It works great for small vocabularies on your cell phone, but basically, computers still can't understand language. Prospects for AI are dimmed, and we seem to need AI for computers to make progress in this area. Time to rewrite the story of the future. From the article: 'The language universe is large, Google's trillion words a mere scrawl on its surface. One estimate puts the number of possible sentences at 10^570. Through constant talking and writing, more of the possibilities of language enter into our possession. But plenty of unanticipated combinations remain, which force speech recognizers into risky guesses. Even where data are lush, picking what's most likely can be a mistake because meaning often pools in a key word or two. Recognition systems, by going with the "best" bet, are prone to interpret the meaning-rich terms as more common but similar-sounding words, draining sense from the sentence.'"
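To make the "best bet" problem from the quoted article concrete, here is a minimal Python sketch. It assumes a recognizer that scores each candidate word by multiplying how well it matches the audio by how common the word is. Every number and both function names (`best_bet`, `acoustic_only`) are invented for illustration and are not taken from the article or from any real recognizer.

```python
# Toy illustration only: every number below is invented, not measured.

# Hypothetical acoustic likelihoods P(audio | word) for one unclear word.
# The speaker actually said "peas", and the audio slightly favors it.
acoustic = {"peas": 0.5, "peace": 0.4, "piece": 0.1}

# Hypothetical prior P(word): how common each word is in some text corpus.
prior = {"peas": 0.05, "peace": 0.6, "piece": 0.35}


def best_bet(acoustic, prior):
    """Return the word that maximizes P(audio | word) * P(word)."""
    return max(acoustic, key=lambda w: acoustic[w] * prior[w])


def acoustic_only(acoustic):
    """Return the word with the highest acoustic likelihood, ignoring the prior."""
    return max(acoustic, key=acoustic.get)


print(best_bet(acoustic, prior))   # the common word wins
print(acoustic_only(acoustic))     # the word the audio favored
```

With these made-up numbers the rarer, meaning-rich word loses to its more common sound-alike even though the audio favored it, which is the failure the article describes.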
Buffalo buffalo Buffalo buffalo buffalo, buffalo Buffalo buffalo.
Natural language processing *is* AI. And high-accuracy speech recognition requires natural language processing if we expect accuracy rates approaching those of a human. Humans hear words partially or incorrectly all the time. We fill in the gaps from context, and we correct ourselves if the course of the conversation reveals that the original interpretation was wrong. Expecting computers to do better, when half the time the problem is the speaker, not the listener, means the computer has to make the same corrections a human brain makes from limited information, both on the fly and after the fact.
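A rough Python sketch of that kind of after-the-fact correction follows. It assumes a two-pass scheme: guess each word on its own, then re-pick each word once the rest of the sentence is known. The candidate lists, the scores, the `related` table, and the function names are all hypothetical, made up for this example rather than drawn from any real system.

```python
# Toy sketch: the candidate lists and scores are made up for illustration.

# For each spoken word, hypothetical candidates with a standalone score.
candidates = [
    {"rest": 0.9},
    {"in": 0.9},
    {"peace": 0.6, "peas": 0.4},
    {"and": 0.9},
    {"carrots": 0.9},
]

# Hypothetical context bonus for pairs of words that tend to appear together.
related = {("peas", "carrots"): 3.0}


def first_pass(candidates):
    """Pick the top-scoring candidate for each position independently."""
    return [max(c, key=c.get) for c in candidates]


def context_score(word, others):
    """Multiply together the bonuses for every related word in the sentence."""
    bonus = 1.0
    for other in others:
        bonus *= related.get((word, other), 1.0)
    return bonus


def revise(candidates):
    """Re-pick each word after the rest of the sentence has been guessed."""
    guess = first_pass(candidates)
    revised = []
    for i, c in enumerate(candidates):
        others = guess[:i] + guess[i + 1:]
        revised.append(max(c, key=lambda w: c[w] * context_score(w, others)))
    return revised


print(first_pass(candidates))  # word-by-word guess
print(revise(candidates))      # guess corrected by later context
```

The first pass hears "peace"; once "carrots" shows up later in the sentence, the revised pass switches to "peas". A real listener does this with far richer context than a lookup table of word pairs, which is the hard part.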
$_ = "wftedskaebjgdpjgidbsmnjgcdwatb"; tr/a-z/oh, turtleneck Phrase Jar!/; print
Futurists should really learn what the word "plateau" means. The death of any given technical progression, particularly one that deals with information processing, tends to be announced early and often, right up to the point where progress becomes meaningful again and then all of a sudden everyone saw it coming, and oh by the way where's my flying car?
Blame Star Trek for making it look flawless. Speech recognition is just like fusion technology, 20 years away from properly working - just like it has been for the last 20 years.
-RANT- I can't stand voice recognition systems that don't at least give you an option to press a number. Especially when they are out of tune and pick up background noises as voice. Please, please, please - always give the option to press a number instead of having to voice everything!!
This blog post is retarded. The author is correlating a drop in internet news articles about Dragon NaturallySpeaking with a flatlining of speech recognition accuracy rate.
The Slashdot editor Soulskill is retarded for both not realizing this and for not reading the anonymously-submitted blog post (hmm no way it could have been the author) before approving it for the Slashdot front page. The guy is just out for more traffic to his rather pointless tech news commentary blog.
Decline of Slashdot, internet signal-to-noise ratio, get off my lawn, etc.
Long ago (decades ago, before Bill Gates was invented), a lot of research went into what would be required for actual voice recognition.
A counterexample was given, about an engineering marvel (of the time) that would recognise when someone said the word "watermelon". For a long time, people in the industry assumed that the path to voice recognition consisted of building more and better watermelon boxes.
Several authors, including Alan Turing himself, argued that actual voice recognition could never be accomplished with a large array of watermelon boxes. Current voice recognition (VR) software divides input into a series of hyperplanes, and attempts to build a best match from the classification tree.
This is the 2010 version of the watermelon box.
Real voice recognition won't be practical until the input is parsed, matched against context, and structured much akin to diagramming a sentence in those old English (or other) classes. In short, matching against a vocabulary is trying to solve an exponential problem with a (large) polynomial engine.
It won't be until the computer actually understands what is said that VR is likely to be practical in a global sense.
As a person who has been building computer systems for 35 years, I am bothered to see a huge body of research into subjects like these ignored because someone thinks that none of it applies to PCs.
Don't take life too seriously; it isn't permanent.
Not necessarily. Speech recognition doesn't fail when it can't figure out elaborate grammatical constructs and lexical ambiguities. Speech recognition fails because it can't figure out simple sentences in conditions where humans can.
English, I would think, is a pretty daunting language for speech recognition, what with a substantial array of homophones, but I wonder if other languages fare better. Maybe Spanish or, say, Japanese would be better since (I'm guessing) there is a closer relation between the written script and the actual sound that it makes.
Once I was a four stone apology. Now I am two separate gorillas.