Rest In Peas — the Death of Speech Recognition
An anonymous reader writes "Speech recognition accuracy flatlined years ago. It works great for small vocabularies on your cell phone, but basically, computers still can't understand language. Prospects for AI are dimmed, and we seem to need AI for computers to make progress in this area. Time to rewrite the story of the future. From the article: 'The language universe is large, Google's trillion words a mere scrawl on its surface. One estimate puts the number of possible sentences at 10^570. Through constant talking and writing, more of the possibilities of language enter into our possession. But plenty of unanticipated combinations remain, which force speech recognizers into risky guesses. Even where data are lush, picking what's most likely can be a mistake because meaning often pools in a key word or two. Recognition systems, by going with the "best" bet, are prone to interpret the meaning-rich terms as more common but similar-sounding words, draining sense from the sentence.'"
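For anyone who wants the "best bet" failure from the quoted article spelled out, here is a toy sketch. It assumes a recognizer that scores each candidate word as acoustic match times word frequency and keeps the top one. The words, the scores, and the function names `best_bet` and `acoustic_only` are all invented for illustration; nothing here comes from a real recognizer.

```python
# Toy illustration only: every number below is made up, not measured.
# Each candidate is (word, acoustic_score, prior), where acoustic_score is how
# well the audio matches the word and prior is how common the word is in text.

def best_bet(candidates):
    """Pick the candidate with the highest acoustic_score * prior."""
    return max(candidates, key=lambda c: c[1] * c[2])[0]


def acoustic_only(candidates):
    """Pick the candidate that matches the audio best, ignoring frequency."""
    return max(candidates, key=lambda c: c[1])[0]


# Hypothetical case: the speaker said the rarer, meaning-rich word "statin".
candidates = [
    ("statin", 0.55, 0.00001),   # slightly better acoustic match, rare word
    ("station", 0.45, 0.001),    # slightly worse acoustic match, common word
]

print(best_bet(candidates))       # the frequency prior outweighs the audio
print(acoustic_only(candidates))  # the audio alone favors the rare word
```

With these made-up numbers the common word wins even though the audio slightly favors the rare one, which is the "draining sense from the sentence" effect the article describes.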
Buffalo buffalo Buffalo buffalo buffalo, buffalo Buffalo buffalo.
Natural language processing *is* AI. And high-accuracy speech recognition requires natural language processing if we expect accuracy rates approaching those of a human. Humans hear words partially or incorrectly all the time. We fill in the gaps from context, and we correct ourselves if the course of the conversation reveals that the original interpretation was wrong. Expecting computers to do better, when half the time the problem is the speaker, not the listener, means they need to be able to make the same corrections a human brain makes from limited information, both on the fly and after the fact.
$_ = "wftedskaebjgdpjgidbsmnjgcdwatb"; tr/a-z/oh, turtleneck Phrase Jar!/; print
One estimate puts the number of possible sentences at 10^570
What a completely useless metric. It makes sense to examine the context and meaning of speech in order to accurately transcribe words, but the number of possible sentences doesn't seem to accurately describe the problem here...
"Before criticizing someone, first walk a mile in his shoes. Then, you'll be a mile away... and you'll have his shoes."
I doubt it is completely dead. I have yet to hear it from the researchers working on AI. I work in affective computing, so I think it is possible that the missing component could be emotion, or some other way to increase the ability of computers to understand and learn. In addition, even if it is not possible to improve speech recognition in this model of computing, in another model of computing this and more would be possible. I won't believe it until I hear it from researchers who have tried most of the possible options for improvement.
Having said that, Dragon works fairly well, provided you modulate your speech.
If you want a laugh with Dragon, turn away from the screen and talk normally, then look at what it has transcribed.
http://slashdot.org/~GuyFawkes/journal
Futurists should really learn what the word "plateau" means. The death of any given technical progression, particularly one that deals with information processing, tends to be announced early and often, right up to the point where progress becomes meaningful again, and then all of a sudden everyone saw it coming, and oh by the way where's my flying car?
I see a lot of claims, but not much evidence. If we're going to use perceptions and anecdotes as evidence, my impression is that speech recognition has always been considered vaguely stalled. In 2000, people didn't think much progress had been made since 1991 besides some commercialization of stuff academia already knew how to do. In 2010, this guy doesn't think much progress has been made since 2001 besides some commercialization of stuff academia already knew how to do. Yet I think some progress has been made over the past 20 years. There just haven't been any breakthroughs, which is maybe what he's expecting, given his vague suggestion that "AI", a pretty vague concept, is our hope.
I'm also skeptical that accuracy has flatlined, though it's possible that's true in some areas. My impression is that multi-speaker recognition, use of large corpora to improve accuracy, and use of language modeling to improve accuracy, have all improved over the past 10 years. Of course, not all improvements go everywhere: the speech recognition running in real-time on a mobile ARM processor is not using every possible state-of-the-art technique. The advance there is that you can run speech recognition in real-time on a mobile ARM processor at all, and get performance that was once only possible on pretty hefty workstations.
10 PRINT CHR$(205.5+RND(1)); : GOTO 10
Blame Star Trek for making it look flawless. Speech recognition is just like fusion technology, 20 years away from properly working - just like it has been for the last 20 years.
-RANT- I can't stand voice recognition systems that don't at least give you an option to press a number. Especially when they are out of tune and pick up background noises as voice. Please, please, please - always give the option to press a number instead of having to voice everything!!
What about the simple fact that conversation itself is a learning process?
You learn the extent of your audience's comprehension among other things. How can a computer be programmed to recognize everything when we lack a sufficient model to base it on?
There is a point in conversation when a sensible human being will recognize they are not getting their ideas through, and simply give up and say "never mind".
"Be prepared, son. That's my motto. Be prepared." --Joe Hallenbeck
This blog post is retarded. The author is correlating a drop in internet news articles about Dragon NaturallySpeaking with a flatlining of speech recognition accuracy rate.
The Slashdot editor Soulskill is retarded for both not realizing this and for not reading the anonymously-submitted blog post (hmm no way it could have been the author) before approving it for the Slashdot front page. The guy is just out for more traffic to his rather pointless tech news commentary blog.
Decline of Slashdot, internet signal-to-noise ratio, get off my lawn, etc.
People want "human quality" speech recognition.
As if we're ever going to get away from training speech recognition programs when we train listeners every day when we speak. It's just that most people don't look at it as being trained, since we're so used to doing it.
I'm sure you have more trouble understanding someone with a thick Cockney or Scottish accent if you're from the Midwest US. You'd ask that person to repeat a few times, wouldn't you?
To expect speech recognition programs to *not* use training is to expect them to exceed human intelligence. Indeed, it's to expect such programs to be psychic.
--
BMO
Long ago - decades, before Bill Gates was invented, a lot of research went into what would be required for actual voice recognition.
A counterexample was given, about an engineering marvel (of the time) that would recognise when someone said the word "watermelon". For a long time, people in the industry assumed that the path to voice recognition consisted of building more and better watermelon boxes.
Several authors, including Alan Turing himself, argued that actual voice recognition could never be accomplished with a large array of watermelon boxes. Current VR software divides input into a series of hyperplanes, and attempts to build a best match from the classification tree.
This is the 2010 version of the watermelon box.
Real voice recognition won't be practical until the input is parsed, matched against context, and structured much akin to diagramming a sentence in those old English (or other) classes. In short, matching against a vocabulary is trying to solve an exponential problem with a (large) polynomial engine.
It won't be until the computer actually understands what is said that VR is likely to be practical in a global sense.
As a person who has been building computer systems for 35 years, it bothers me to see a huge body of research into subjects like these ignored because someone thinks that none of it applies to PCs.
Don't take life too seriously; it isn't permanent.
Not necessarily. Speech recognition doesn't fail when it can't figure out elaborate grammatical constructs and lexical ambiguities. Speech recognition fails because it can't figure out simple sentences in conditions humans can.
Interestingly enough, a computer would likely parse that sentence correctly, while nearly any human speaker (not familiar with the sentence) would think it's a nonsense phrase.
I gave up voice dialing when I sneezed and dialed my father. I coughed and got my mother, but no matter what I did, a loud fart would not call my brother but open the web browser and visit Slashdot.
Okay, the last one might be a lie, but the sneezing to get my father is true. Try it. Make funny sharp noises at your voice dialer and see what it dials.
i thought once I was found, but it was only a dream.
You are exactly right. I've often said no two people actually speak the same language. They just sound very similar sometimes.
The word "data" is a plural countable noun. "Datum" is the singular form thereof. Plural countable nouns take the copula "are". Singular countable nouns take the copula "is". The sentence you quoted was thus grammatically correct: a datum "is", but data "are".
Though I admit, the treatment of "data" as a mass noun (the likes of which take the copula "is" as well) is common enough that it did sound jarring to my own ear, even knowing it was technically correct.
-Forrest Cameranesi, Geek of all Trades
"I am Sam. Sam I am. I do not like trolls, flames, or spam."
English, I would think, is a pretty daunting language for speech recognition, what with a substantial array of homophones, but I wonder if other languages fare better. Maybe Spanish or, say, Japanese would be better since (I'm guessing) there is a closer relation between the written script and the actual sound that it makes.
Once I was a four stone apology. Now I am two separate gorillas.
I have been flamed more than a few times around here for suggesting Computer Science has not got a clue what it is doing when it comes to AI. Philosophy has been at this problem and more for the better part of the last 400+ years (more like 1,000 years) in a serious way. The stock b.s. I get from the science fiction fanboys is that somehow natural language is a problem that can just be brute forced, as if you were trying to figure out the password you forgot to your email account. Good luck with that.
By the way, language "recognition" by a computer is likely the easy part of the problem for AI researchers to crack. It is still not going to yield any real AI, just better cars and toasters.
Living in Chile
When you have lots of data, you don't have to build any "expert" knowledge into a learner.
This isn't really quite so clear cut. Feature engineering, model structure, model training techniques, and so on all bias statistical learners towards different parts of the hypothesis space. Hidden Markov models (the standard in speech recognition) clearly constitute a data-driven approach, but usually they predict diphones (which capture the transitions between speech sounds) rather than phones themselves. That is, "cat" is recognized not by predicting a [k] followed by an [ae] followed by a [t], but (among other things) by a [k-ae] transition followed by an [ae-t] transition. This is a very direct way of encoding the expert linguistic knowledge that speech sounds are pronounced differently in the context of other sounds. Think about where your tongue touches the top of your mouth in "keen" compared to "can."
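To make the "cat" example concrete, here is a minimal sketch of turning a phone sequence into the transitions between adjacent phones. The function name `to_diphones` and the bracket-free labels are my own choices for illustration, and it leaves out the "among other things" (word boundaries, silence, and whatever else a real system models).

```python
def to_diphones(phones):
    """Return the transition labels between each pair of adjacent phones."""
    return [f"{a}-{b}" for a, b in zip(phones, phones[1:])]


# "cat" as the three phones used in the comment above.
cat = ["k", "ae", "t"]
print(to_diphones(cat))  # ['k-ae', 'ae-t']
```

The point is only that the unit being predicted already bakes in a linguistic assumption: each label depends on its neighbor, so the [k] in "keen" and the [k] in "can" end up as different units.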
You give computers way too much credit.
More likely it would think you said "Dear aunt, let's set so double the killer delete select all".
My experience with telephone Voice Rejection Systems is that they get what you say wrong more often than not, especially if you have a deep voice.
No, I want to use a common dataset to train all software automatically, like VoxForge. What I was saying is that people don't need training to talk to each person they meet. A generic background training works fine, and so it should for computers.
Dilbert RSS feed
When I started on my Ph.D., I started out majoring in AI. One of several reasons I changed to computer architecture (CPU design, etc.) is because I just couldn't stand the broken ways that people were doing stuff.
I don't get it. You left a Ph.D. program because the field was immature? Isn't the whole point of a Ph.D. program to produce something new and share it? Yeah, I get that funding might be harder than a safer field like computer engineering, but it seems like you abandoned a huge opportunity. You make it sound like you had a whole slew of new, potentially great ideas, and you just dropped them because it would be "too hard".