Rest In Peas — the Death of Speech Recognition

Buffalo buffalo by Anonymous Coward · 2010-05-03 09:11 · Score: 5, Insightful

Buffalo buffalo Buffalo buffalo buffalo, buffalo Buffalo buffalo.

Re:Buffalo buffalo by Anonymous Coward · 2010-05-03 09:23 · Score: 3, Funny

This rest ponds was and turd you sings peach recon nation soft where
Re:Buffalo buffalo by CecilPL · 2010-05-03 09:24 · Score: 5, Funny

That comma is just out of place and makes the sentence hard to parse.
Re:Buffalo buffalo by liquiddark · 2010-05-03 09:30 · Score: 4, Insightful

What human can parse this without an expert to tear apart the context? I don't see the point in trying to serve up a sentence that simply isn't a sentence to most speakers of the language.
Re:Buffalo buffalo by Anonymous Coward · 2010-05-03 09:33 · Score: 1, Interesting

Has anyone really been far even as decided to use even go want to do look more like?
Re:Buffalo buffalo by hoggoth · 2010-05-03 09:37 · Score: 5, Informative

Buffalo bison whom other Buffalo bison bully, themselves bully Buffalo bison.

--
- For the complete works of Shakespeare: cat /dev/random (may take some time)
Re:Buffalo buffalo by Anonymous Coward · 2010-05-03 09:38 · Score: 5, Informative

For those that don't know:
http://en.wikipedia.org/wiki/Buffalo_buffalo_Buffalo_buffalo_buffalo_buffalo_Buffalo_buffalo
'Buffalo bison whom other Buffalo bison bully, themselves bully Buffalo bison'.
Re:Buffalo buffalo by Hylandr · 2010-05-03 09:41 · Score: 1

Dood, Dood! Doooood. dood! DOOD!! Dewd...

--
~ People that think they are better than anyone else for any reason are the cause of all the strife in the world.
Re:Buffalo buffalo by blair1q · 2010-05-03 09:41 · Score: 1

Your marklar is well marklar.
Re:Buffalo buffalo by Hylandr · 2010-05-03 09:43 · Score: 2

Braincells are jumping to their deaths from my ears...

--
~ People that think they are better than anyone else for any reason are the cause of all the strife in the world.
Re:Buffalo buffalo by u38cg · 2010-05-03 10:00 · Score: 1

The point is not that it is a useful sentence, the point is that it is a sentence. What's even more remarkable is that you can add arbitrary repetitions of buffalo to it and still get a grammatical, meaningful sentence. The meaningful is important. Colourful green ideas sleep furiously. That parses grammatically, but it means absolutely nothing.

--
[FUCK BETA]
Re:Buffalo buffalo by jisatsusha · 2010-05-03 10:15 · Score: 1

James while John had had had had had had had had had had had a better effect on the teacher.
Re:Buffalo buffalo by grcumb · 2010-05-03 10:29 · Score: 1

Buffalo bison whom other Buffalo bison bully, themselves bully Buffalo bison.
Bull!
Bison bully beef before becoming Bully Beef. Bully Beef becomes the bully, Bison.

--
Crumb's Corollary: Never bring a knife to a bun fight.
Re:Buffalo buffalo by sbeckstead · 2010-05-03 10:31 · Score: 1

Station!

--
Why bother
Re:Buffalo buffalo by Anonymous Coward · 2010-05-03 11:10 · Score: 0

Yeah, you just keep yakking away...
Re:Buffalo buffalo by Anonymous Coward · 2010-05-03 11:33 · Score: 0

James, while John had had 'had', had had 'had had'; 'had had' had had a better effect.
Re:Buffalo buffalo by kindbud · 2010-05-03 11:43 · Score: 1

John while James had had had had had had had had had had had a better effect on the teacher.

--
Edith Keeler Must Die
Re:Buffalo buffalo by JanneM · 2010-05-03 11:58 · Score: 5, Insightful

Most people won't be able to parse the sentence, though. I know I can't. I have no idea how to interpret it as anything but a string of nouns. My guess is, even fewer would be able to parse it if spoken (the capitals and the comma are, I assume, important hints). It'd be unrealistic and unproductive to require speech systems to actually do better than most humans on the task; if many of us can't parse the sentence then why expect a computer to do so?
Better overall benchmark: require it to have the ability of a competent but not perfect second-language user. We're long used to dealing with that level of proficiency, whether because the conversant is a foreigner, a child, or has a dialect very different from our own.

--
Trust the Computer. The Computer is your friend.
Re:Buffalo buffalo by ClosedSource · 2010-05-03 13:19 · Score: 2, Interesting

If only speech recognition's problems were limited to these low-probability sentences. I've had a number of SR systems fail to recognize my "yes" and "no" responses.
Re:Buffalo buffalo by Anonymous Coward · 2010-05-03 13:24 · Score: 0

I posit that if an AI can understand speech as well as a second-language user, it'll have reached the point where it can easily surpass a first-language user.

What AI currently lacks is the ability to comprehend context.

If the machines rise up to destroy all humans, it'll be because of our incessant punning.
Re:Buffalo buffalo by jc42 · 2010-05-03 14:39 · Score: 1

What human can parse this without an expert to tear apart the context?
Well, I recall that when I first ran across that "N-buffalo" sentence several years ago, I got it on the second scan.
This time, though, it was a lot harder. This was because of that silly comma. Who ever uses a comma between the subject and verb of a simple declarative statement? But when I decided to try it without the comma, the meaning came right through.
I suppose this could make me an "expert", whatever that may mean. English is my native language, and I have a couple of college degrees, one of which included a minor in linguistics. But I'm not sure that means much. It's probably more meaningful that as part of my CS degree, I worked on machine translation, and thus I saw a lot of perverse examples like this one. (But I first ran across this one after getting that degree. ;-)
I did get a good understanding of why translation isn't possible without understanding. And sometimes it's impossible even with understanding, because you understand why a sentence in language X that looks simple may have two or more incompatible translations into language Y, so you need information not provided by the language X statement to decide which translation to choose.

--
Those who do study history are doomed to stand helplessly by while everyone else repeats it.
Re:Buffalo buffalo by arth1 · 2010-05-03 14:51 · Score: 1

We don't expect a computer to be able to understand constructs like that. But at this point, we don't expect a computer to correctly recognize normal phrases.
"A lesion" (allegiance) and "scuzzy babe" (SCSI bay) just isn't good enough.
Re:Buffalo buffalo by Anonymous Coward · 2010-05-03 15:33 · Score: 0

Badger badger badger Badger Badger badger badger Badger
Re:Buffalo buffalo by BikeHelmet · 2010-05-03 15:34 · Score: 1

Accurate speech recognition is easy. We know how to do it - we just don't have the speed required, yet.
It'd be a lot faster if we had a database of all the variations in how we speak most syllables - but I suspect such a database would easily surpass 4GB, for a single person. Add in the differences between people, and you'd have hundreds of terabytes of slightly different ways to pronounce things - for a single language. Actually, dialects can vary a lot, so it's probably far far higher.
Once we have such a database, perhaps some enterprising company will distill all that data into something easier to process? It happened for video - what was once stored analog, or close to, is now stored digital in formats like H.264, often getting 1:50 or 1:100 compression, with no perceivable quality loss. Once our desktops have 256GB of RAM, and 256-core CPUs, someone or some company will take that data, analyze it, and output something that can do accurate speech recognition with a meagre amount of RAM and CPU time. Perhaps 4-8 cores, and 2-4GB of RAM.
And then we'll all wonder why we didn't have perfect speech recognition in 2010, forgetting all the processing power required to distill that audio data into something meaningful.
Re:Buffalo buffalo by cgenman · 2010-05-03 15:45 · Score: 1

You might want to set the bar lower than a second-language user. According to Dragon Naturally Speaking's iPhone app, you said:
"Most people get the parse the sentence though I know I can't I know I got interpreted as anything but string of bounds my guess is even fewer that would be the person spoken the B., R., I assume important. If it would be unrealistic and unproductive to require speech systems actually do better than most humans on the task is made of cat person send them I expect computer to do so better overall benchmark required have the ability of a competent but not perfect second language user would long used to dealing with that level of proviciency with it because the cumbersome to the former are a child or has a different dialect from our own."
This is run from their server, using their latest code, in non-realtime. I enunciated like a pro, in a reasonably quiet room. This is, essentially, the best circumstance that it could hope for. It's not even within the ballpark of what a second-language user could produce. It's only barely legible in spots because we have some sort of idea of what it should be talking about. At this point, any system that can do anything useful with an arbitrary (rather than limited) language input would be a huge breakthrough.

--
The ______ Agenda
Re:Buffalo buffalo by md65536 · 2010-05-03 16:01 · Score: 1

Buffalo buffalo Buffalo buffalo buffalo, buffalo Buffalo buffalo Buffalo buffalo buffalo?
Do not the bison that are buffaloed, also buffalo the same set of bison?
Re:Buffalo buffalo by rastoboy29 · 2010-05-03 17:12 · Score: 1

Well, when I speak this into my Android phone, it prints it exactly correctly. What is your point, exactly? :-p

--
expandfairuse.org
Re:Buffalo buffalo by JanneM · 2010-05-03 17:49 · Score: 1

I didn't imply that current systems reach that bar; I'm sorry if I gave that impression. We're clearly not close yet. What I meant really was more about setting expectations, and when to declare success. It's a parallel to AI in a way: we don't need human-level AI for it to be a success. 95% of everything we'd want AI systems to do would do nicely with the perceptual, motor and cognitive abilities of a dog or a rat.

--
Trust the Computer. The Computer is your friend.
Re:Buffalo buffalo by zwei2stein · 2010-05-03 18:28 · Score: 1

Or the whole world can start using lojban ( http://www.lojban.org/ ) to communicate with computers.
No parsing problem there: the language was basically developed to be recognized (and understood) easily by computers (and humans), and to be without homonyms, irregularities and other stuff.

--
-- Technology for the sake of technology is as pathetic as eschewing technology because it's technology.
Re:Buffalo buffalo by Vintermann · 2010-05-03 19:14 · Score: 1

Remember, the computer only has the context in the phrase itself (or possibly the document it's in). If you were in a bar, you might have misinterpreted the context and heard "scuzzy babe" yourself.

--
xkcd is not in the sudoers file. This incident will be reported.
Re:Buffalo buffalo by Anonymous Coward · 2010-05-03 19:37 · Score: 0

Spam spam Spam spam spam, spam Spam spam.
Re:Buffalo buffalo by cwry · 2010-05-03 20:44 · Score: 1

I had to help my uncle Jack off a horse.
Re:Buffalo buffalo by treczoks · 2010-05-03 21:45 · Score: 0, Flamebait

Better overall benchmark: require it to have the ability of a competent but not perfect second-language user.
As most non-native speakers have a better grasp of English than the average American, this benchmark will have its own kind of problems...
Re:Buffalo buffalo by asc99c · 2010-05-04 00:45 · Score: 2, Interesting

I'd never heard this one before, guess it's the American version! The one I was taught was a complaint by a pub landlord to their sign writer:
You've left too much space between pig and and and and and whistle
Re:Buffalo buffalo by Lonewolf666 · 2010-05-04 01:42 · Score: 1

Better overall benchmark: require it to have the ability of a competent but not perfect second-language user.
A competent but not perfect second-language user is what I consider myself. I might be able to understand the many-buffalo sentence after some thinking, if I was aware of all the synonyms of "buffalo" that appear in the construct. Let's check:
-"buffalo" as animal species: no problem
-"Buffalo" as name of a town: There is such a town, but it did not cross my mind at the moment. Not so obvious.
-"buffalo" as synonym for "bully": Highly unusual, and most second-language users will fail at this point.
Overall, I think this sentence is too difficult to be a reasonable test for the quality of language recognition.

--
C - the footgun of programming languages
Re:Buffalo buffalo by radtea · 2010-05-04 02:16 · Score: 1

Colourful green ideas sleep furiously. That parses grammatically, but it means absolutely nothing.
False. There's an entire sub-genre of micro-literature and poetry that creates contexts where "colorless green ideas sleep furiously" (to give what I think was Chomsky's original) can be used as a basis of meaning. It has been known for decades that Chomsky's example, however evocative, does not constitute anything but an artistic flourish that creates a false sense of plausibility for whatever Chomsky was arguing for at the time.
In any case, meaning is a verb, not an adjective. People who search for "the meaning of a sentence" are actually engaging in exactly the kind of nonsense activity that Chomsky in the '50s argued "colourless green ideas sleep furiously" implied.
People mean. Sentences just lie there, waiting for a human or other intelligence to be actively engaged with them and to mean something by them.
The entire premise of semantic linguistics is mistaken, the equivalent of the Ptolemaic model of the cosmos: useful up to a point, and full of curious internal structures that hint at something deeper, but ultimately a dead end that needs to be replaced at the foundations by a model of language and meaning built around a knowing subject that means things, not dead strings that happen to be one way of pragmatically and imperfectly conveying what we are meaning to others.

--
Blasphemy is a human right. Blasphemophobia kills.
Re:Buffalo buffalo by Beezlebub33 · 2010-05-04 02:59 · Score: 1

Obligatory XKCD

The point here is that 'normal' people understand speech reasonably well, most of the time. Computers can't. You simply cannot expect significant numbers of people to adapt to the computer that much (some, which is why we have the UIs and input devices we have, but not that much).

Speech is outrageously difficult. Try having a computer tell the difference between "gray day" and "grade A" without context. There are very subtle differences in them, but I don't think a computer is going to be able to determine the differences.

--
The more people I meet, the better I like my dog.
Re:Buffalo buffalo by wrightrocket · 2010-05-04 03:54 · Score: 1

My Droid Incredible had no problem recognizing this sentence. No, it didn't get the capitalization, or the comma, but it got every word and then found the article in Wikipedia about this sentence. I've been amazed at how accurate it is at recognition without any training. So, as I see it, speech recognition is alive and well!
Re:Buffalo buffalo by u38cg · 2010-05-04 05:05 · Score: 1

Oh, don't get me wrong: this sentence is a totally unreasonable test of any language processor. It would surprise me greatly if any machine is ever able to tackle such constructs without just as much puzzling as us humans. The only reason it's interesting is because it does, technically, parse, with a bit of effort, and even just about means something sensible in the real world.

--
[FUCK BETA]
Re:Buffalo buffalo by CaptDeuce · 2010-05-04 18:45 · Score: 1

Buffalo buffalo Buffalo buffalo buffalo, buffalo Buffalo buffalo.
Is now rendered as:

Badger badger badger badger badger badger badger badger. Mushroom mushroom.

--
"Where's my other sock?" - A. Einstein

well by Anonymous Coward · 2010-05-03 09:13 · Score: 0

I the method which comes to make probably, them who see the work of the speech recognition software which is honest is suitable and language translation asserts! where

I hope.. by Anonymous Coward · 2010-05-03 09:14 · Score: 0

I certainly hope that TFA title is intentional...

I refuse to partake in a short sleep cycle process while lying in small round vegetables otherwise!

Goodnight, Sir!

Re:I hope.. by Zancarius · 2010-05-03 09:22 · Score: 1

I certainly hope that TFA title is intentional...
Considering the subject matter, I'd hope readers would be able to detect a play on words when they see one.
Nevertheless, it got your attention, didn't it?

--
He who has no .plan has small finger. ~ Confucius on UNIX

Key words by flaming+error · 2010-05-03 09:15 · Score: 2, Interesting

> meaning often pools in a key word or two
It's true.

My own hearing is not great. I often miss just a word or two in a sentence. But they are often key words, and missing them leaves the sentence meaningless. If I counted the words I understand correctly I'd probably have a 95% success rate. But if I counted the sentences I understand correctly, I'd be around 80%. So I get by, but I tend to annoy people when I ask for repeats over one missed word.
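
A rough way to see how a 95% word rate can shrink to something like 80% for sentences is the sketch below. It assumes, purely for illustration, that each word is missed independently and that a single missed word loses the whole sentence; the function name and the sentence lengths are made up for the example, not measured.

```python
def sentence_accuracy(word_accuracy: float, words_per_sentence: int) -> float:
    """Chance that every word in a sentence is heard correctly,
    assuming each word is missed independently of the others."""
    return word_accuracy ** words_per_sentence


# Hypothetical sentence lengths: how fast the rate drops as sentences grow.
for length in (4, 8, 12):
    print(length, round(sentence_accuracy(0.95, length), 3))
```

Under that assumption the sentence rate falls off quickly with length. In practice only the key words have to survive, so real figures would presumably sit somewhere above this estimate.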

Re:Key words by SomeJoel · 2010-05-03 09:23 · Score: 4, Funny

> It's true.
My own ... is not great. I often miss ... a word or two in a sentence. But they are often ... words, and missing them leaves ... sentence meaningless. If I counted the words I understand ... I'd probably have a 95% success rate. But if I counted the ... I understand correctly, I'd be around ...%. So I get by, but ... tend to annoy people when I ask for ... over one missed word.
I can see how this would be annoying.

--
<Complete your profile by adding a signature!>
Re:Key words by CarpetShark · 2010-05-03 10:20 · Score: 3, Funny

I can see how...would be annoying.
Can see how WHAT would be annoying?
Re:Key words by bar-agent · 2010-05-03 11:59 · Score: 1

I accidentally the whole thing!

--
i'd hit it so hard, if you pulled me out you'd be the king of britain [bash.org]
Re:Key words by Anonymous Coward · 2010-05-03 20:43 · Score: 0

Meaning is particularly vulnerable to very little words in English, sometimes:
Do you have the time to find out who I really am?
Do you have the time to find out who I really am on?

Android Speech Recognition Rules by bit+trollent · 2010-05-03 09:15 · Score: 5, Informative

I hardly type anything into my HTC Incredible. Google's voice recognition, which is enabled on every textbox, works just about perfectly.

Seriously, get an Android phone, try out the speech recognition text entry, and then tell me speech recognition is dead.

Re:Android Speech Recognition Rules by DeadDecoy · 2010-05-03 09:48 · Score: 1

Now the RIAA can use Google's services to determine if you're singing a copyrighted song and hunt you down : D.
Re:Android Speech Recognition Rules by bertok · 2010-05-03 09:54 · Score: 2, Interesting

I hardly type anything into my HTC Incredible. Google's voice recognition, which is enabled on every textbox, works just about perfectly.
Seriously, get an Android phone, try out the speech recognition text entry, and then tell me speech recognition is dead.
I've tried Google voice recognition, but I found that it just detected gibberish unless I spoke with a fake American accent.
Re:Android Speech Recognition Rules by justinlindh · 2010-05-03 09:56 · Score: 1

You're right, with the caveat that most people tend to try to speak differently when they know they're speaking to digital transcription. The Android voice input also requires that you actually say the punctuation, as well (i.e. Hello comma Mom period Yes comma a visit would be nice exclamation point). So, unfortunately, even with Google's web powered voice transcription, you're still not speaking naturally.
I'm assuming that Google Voice uses the same technology for their automated transcription. In this case, the person will definitely be speaking naturally. The transcriber is spotty at best in that setting. I can usually get the gist of what's being said without needing to actually listen to the message and I appreciate how it applies different style types for things it thinks it could have gotten wrong (guesses are in a lighter shade of gray)... but it's far from perfect.
Re:Android Speech Recognition Rules by vanyel · 2010-05-03 09:57 · Score: 1

I haven't tried note taking on my Cliq, but voice dialing is a waste of time
Re:Android Speech Recognition Rules by Trogre · 2010-05-03 10:06 · Score: 4, Funny

... so its voice recognition works about as well as that of the average American then? ;)

--
"Nine times out of ten, starting a fire is not the best way to solve the problem." - my wife
Re:Android Speech Recognition Rules by ascari · 2010-05-03 10:06 · Score: 1

Tried it but was really put off by the device's ability to wreck a nice beach...
Re:Android Speech Recognition Rules by Digero · 2010-05-03 10:09 · Score: 3, Funny

We might get to the point where we can write text messages by speaking, then the person on the other end could have them read aloud by a computer. That would be so awesome. Maybe some day we'll be able to transfer the actual sound of our voices.
Re:Android Speech Recognition Rules by tool462 · 2010-05-03 10:24 · Score: 1

My wife has the same problem. She has a horrible time understanding what someone is saying unless it's in a standard American accent or perhaps Canadian. Southern accents, British accents, Irish accents, Australian, Indian might as well be unintelligible to her. And face it, no one can understand the Scots. We have to crank up the volume on TV when watching Top Gear or Dr. Who, and I still have to translate for her.
If Google voice recognition works as well as my wife's voice recognition, that's still pretty good.
Re:Android Speech Recognition Rules by peragrin · 2010-05-03 10:29 · Score: 2, Insightful

I gave up voice dialing when I sneezed and dialed my father. I coughed and got my mother, but no matter what I did, a loud fart would not call my brother but open the web browser and visit slashdot.
Okay, the last one might be a lie, but the sneezing to get my father is true. Try it: make funny sharp noises at your voice dialer and see what it dials.

--
i thought once I was found, but it was only a dream.
Re:Android Speech Recognition Rules by mathfeel · 2010-05-03 11:05 · Score: 1

I share your observation that Google voice recognition has a close to 90% first try hit rate and 99% on second try. And I have a serious accent.

But it also sometimes fails hilariously. Once I had to look up an address, 395 Grand Ave., and I got "Free WiFi Grand Ave.".

--
The only possible interpretation of any research whatever in the 'social sciences' is: some do, some don't
Re:Android Speech Recognition Rules by orangesquid · 2010-05-03 11:09 · Score: 5, Funny

What Dave said: "Open the pod bay doors, HAL."
What HAL heard: "Open the hot babe pornz, HAL."
HAL's speech recognition and morality programming* combined to give the famous reply, "I'm sorry, Dave. I'm afraid I can't do that." HAL knew certain things would have been too titillating to an all-ages film audience in 1968.
* Only for the film version. In the book version, it would have caused undue frustration to the reader, unable to see what Bowman was viewing. In that case, it was HAL's etiquette programming.

--
--TheOrangeSquid Is it any wonder things seem so awry? We swim in a sea of confusion and don't have to think to survive
Re:Android Speech Recognition Rules by icebraining · 2010-05-03 11:21 · Score: 1

My Nokia hasn't failed once, but I tend to click anyway; since I don't drive, I can usually reach for the keypad.

--
Dilbert RSS feed
Re:Android Speech Recognition Rules by dotancohen · 2010-05-03 11:34 · Score: 0

I hardly type anything into my HTC Incredible. Google's voice recognition, which is enabled on every textbox, works just about perfectly.
Seriously, get an Android phone, try out the speech recognition text entry, and then tell me speech recognition is dead.
Is that speech-recognition software distributed with Android, and hence distributed with the Linux kernel? If so, then it must be under a GPL-compatible licence, with source code available. How much effort would it take to compile it for Debian?

--
It is dangerous to be right when the government is wrong.
Re:Android Speech Recognition Rules by dotancohen · 2010-05-03 11:37 · Score: 1

Fine, you win, I won't call her then!
Thank god for sexting...

--
It is dangerous to be right when the government is wrong.
Re:Android Speech Recognition Rules by XnavxeMiyyep · 2010-05-03 11:57 · Score: 1

My old phone allowed me to record my voice manually for voice dials, which was great, since there are only a few people I call regularly. My current phone instead just tries to use voice recognition plus interpreting what I've typed as the contact's name. It simply doesn't work; I wish it had the option to act as my old phone did.

--
I put the 't' in electrical engineering.
Re:Android Speech Recognition Rules by sootman · 2010-05-03 12:17 · Score: 1

That's one thing I'd love to see systemwide on the iPhone. Dragon is pretty sweet though. Does Android's voice recognition run locally or is it server-based like Dragon?

--
Dear Slashdot: next time you want to mess with the site, add a rich-text editor for comments.
Re:Android Speech Recognition Rules by Daengbo · 2010-05-03 12:57 · Score: 2, Interesting

It's obviously tuned for that, but it wouldn't be fair to ask it to understand Scottish, now, would it? ;) Seriously, though, in my expat group we had at least ten English-speaking countries represented, and I had little trouble in most cases. There was still one New Zealander who I never understood, even after a year, and generally gave up asking him to repeat himself after the third time in a row and just tried to fake it. I'd get maybe 10-20% of any sentence from him.
Speech recognition of random accents, even in one language, is virtually impossible. I think the computer needs to be given a clue about the accent of the speaker.

--
Put identity in the browser.
Re:Android Speech Recognition Rules by Daengbo · 2010-05-03 13:01 · Score: 1

The data is sent to Google and massively parallel processed, then sent back. It's SaaS.

--
Put identity in the browser.
Re:Android Speech Recognition Rules by Anonymous Coward · 2010-05-03 20:37 · Score: 0

The average American speaks with a fake American accent?
Re:Android Speech Recognition Rules by dotancohen · 2010-05-03 22:02 · Score: 1

Wow. Thanks.

--
It is dangerous to be right when the government is wrong.
Re:Android Speech Recognition Rules by Anonymous Coward · 2010-05-04 00:31 · Score: 0

I agree; I used my Android speech recognition for the first time yesterday and was wholly impressed. Used it about ten more times and it successfully interpreted every word (note that I was not TRYING to trip it up, though... an exercise for another day).
Re:Android Speech Recognition Rules by Anonymous Coward · 2010-05-04 00:40 · Score: 0

I'm from Italy and can understand Top Gear perfectly. Texans are the ones who give me the most trouble.

And Indians trying to speak American, but not as much as Texans.
Re:Android Speech Recognition Rules by Voyager529 · 2010-05-04 01:37 · Score: 1

I meant to mod you 'funny', but hit 'overrated' by mistake. My bad.
Re:Android Speech Recognition Rules by ShakaUVM · 2010-05-04 02:39 · Score: 1

>>Seriously, get an Android phone, try out the speech recognition text entry, and then tell me speech recognition is dead.
Uh...
It's really, really, really bad.
I use it to drive one of my friends crazy from time to time. From my texting log on my Droid, when we were both visiting the Bay Area for different reasons:
Him: Meet up for dinner tonight?
Me: S gamma nothing in the bay area but a missionary visiting today.
Him: Is that a yes then? -Another friend- says yes.
Me: Okay good whenever you get out of the car on a disk give me a call and we can meet up for dinner.
Him: WTF?
Me, explaining: Is a google voice awesome lets me do you get text messages in the whole wide world.
Him: ...
Me, explaining more: Google voice translation sucks really awesome anniversary accurate accurate translation.
Him: I can't even figure out what your original sentences are supposed to be.
Me: It is possible you are a libertarian atc recipes distributors keralites make any sense.
Him: Uh... okay, you know what? I'm just going to call you now.
Looking at it above, probably the percentage of words guessed correctly was not too bad, better than half. But like the article says, it fucks up on the most important words in a sentence, the ones that carry the most meaning, and so it's not even as good as a kindergartner trying to transcribe your words badly with lots of spelling mistakes and such. Our brains can correct spelling mistakes, but these mistakes destroy meaning, irrecoverably.
Re:Android Speech Recognition Rules by ShakaUVM · 2010-05-04 02:58 · Score: 1

>>You're right, with the caveat that most people tend to try to speak differently when they know they're speaking to digital transcription.
I said this line to my Droid. Spoke normally, reference accent, pronounced the punctuation.
Transcription: "Here I come up with a caviar the most people tend to try to speak differently from another speaking to digital transcription." I think the "comma" became caviar, somehow.
>>The Android voice input also requires that you actually say the punctuation, as well (i.e. Hello comma Mom period Yes comma a visit would be nice exclamation point).
Transcription: "The android voice input also requires directions to the punctuation, is well open for the season i eat hello, mom. Yes, of the that would be nice! Cause of the seas."
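
For anyone who wants a number rather than an impression, here is a minimal sketch of the usual word error rate calculation (word-level edit distance divided by the length of the reference), run on the first pair above. The function names are made up for this example, and the punctuation stripping is an assumption about how to normalize the text, not anything Google or Dragon is known to do.

```python
import string


def normalize(text: str) -> list:
    """Lowercase, split on whitespace, strip punctuation around each word."""
    words = (w.strip(string.punctuation) for w in text.lower().split())
    return [w for w in words if w]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance (substitutions, insertions, deletions)
    divided by the number of words in the reference."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edits needed to turn the first i-1 reference words
    # into the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, substitution))
        prev = cur
    return prev[-1] / len(ref)


said = ("You're right, with the caveat that most people tend to try to speak "
        "differently when they know they're speaking to digital transcription.")
heard = ("Here I come up with a caviar the most people tend to try to speak "
         "differently from another speaking to digital transcription.")
print(f"word error rate: {word_error_rate(said, heard):.0%}")
```

Note that this counts every word equally, so it can look tolerable even when the words that carry the meaning are exactly the ones that got mangled.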
Re:Android Speech Recognition Rules by Beezlebub33 · 2010-05-04 03:14 · Score: 1

My Differential Equations teacher grew up in Lithuania. And then learned English at Edinburgh.

I defy any speech recognition program to understand whatever the hell he was saying. I know I never knew.

--
The more people I meet, the better I like my dog.
Re:Android Speech Recognition Rules by Anonymous Coward · 2010-05-04 03:26 · Score: 0

...and even better than the average American manager.
Re:Android Speech Recognition Rules by Anonymous Coward · 2010-05-04 04:17 · Score: 0

My results, natural speech: You're right, the caveat that must've been actively women know their speaking to digital transcription.
Careful speech: correct, with the caveat that most people tend to speak differently than another speaking to digital transcription.

I've had much better luck with short search phrases. However, the #1 reason I don't use speech to text is the lack of privacy. I had to close my office door to do these tests.
Re:Android Speech Recognition Rules by SleazyRidr · 2010-05-04 04:18 · Score: 1

Yeah, but people do that too. I've learned that if I ask for a water, all I'll get is a blank stare, but if I ask for a wa-ater then I'll get a glass of water. I'm not sure why, but no-one can understand when I say coke either, so I've switched to pepsi.

--
Is 1563649 a prime number?
Re:Android Speech Recognition Rules by elrous0 · 2010-05-04 06:34 · Score: 1

And face it, no one can understand the Scots.
I thought we were talking about English speakers here.

--
SJW: Someone who has run out of real oppression, and has to fake it.
Re:Android Speech Recognition Rules by Deosyne · 2010-05-05 09:29 · Score: 1

Google Voice speech-to-text is fucking awful, which I'm guessing is due to recognition of speech rendered through a phone. Using the voice functionality on the handset, such as for searching or filling in text fields, on the other hand, is absolutely fantastic. I nearly always get 100% recognition when I speak clearly, even with words that I would never expect it to recognize. It doesn't provide any punctuation or capitalization, but otherwise is ridiculously good, like good enough to make me wonder why in the hell Dragon Dictate can't do nearly as well on my far more powerful computer with a much better microphone.
I just wish that I could find a desktop version of the Android speech-to-text engine so that I can set one of my keyboard's macro keys to work the same way as holding down the search button on my Droid.

Let me guess by Zerth · 2010-05-03 09:15 · Score: 4, Funny

That summary was written with speech recognition software?

Re:Let me guess by MollyB · 2010-05-03 09:37 · Score: 2, Funny

Hesitant grate watts peach wreck ignitions oft where kin dew ferrous?
Re:Let me guess by cgenman · 2010-05-03 15:54 · Score: 1

Speech recognition accuracy. Flatlined years ago. It works great for small vocabulary on your cell phone but basically computer still can't understand language prospects for a are dimmed and we seem to need AI for peters to make progress in this area time to rewrite the story of the future from the article the language universe is large google street words is a mere scrawl on a surface one estimate puts the number of possible sentences of 10 to the 570 though constant talking and writing more the possibilities of language enter into our possession but plenty of unanticipated combinations are made much for speech recognizes and risky guesses even where data are lush picking what's most likely to be a mistake because meeting often pools and keyword or two recognition systems going with the best that are prone to interpret the meaning of terms of the more common but similar sounding words draining sense from the set.
- As transcribed by Dragons' server-based iPhone interpreter.

--
The ______ Agenda
Re:Let me guess by Lemmy+Caution · 2010-05-03 17:35 · Score: 1

So, how are things in Glasgow these days?

What are you talk'in about ? by burni2 · 2010-05-03 09:17 · Score: 1

Years ago I used viavoice on Warp4, and it had a pretty decend recognitation rate ..

it was even better understanding my needs than I can get Windows7 understand mine by mice commands ..

I miss those times .. when grey was a chique color for OSes

Re:What are you talk'in about ? by CohibaVancouver · 2010-05-03 09:27 · Score: 1

Years ago I used viavoice on Warp4, and it had a pretty decend [sic] recognitation [sic] rate ..
Did you have to 'train' it to your voice using a script and a series of corrections, or did it have 'natural' speech recognition from the get-go, the way you do when you chat with a cashier at the supermarket?
Re:What are you talk'in about ? by corbettw · 2010-05-03 09:29 · Score: 4, Funny

Years ago I used viavoice on Warp4, and it had a pretty decend recognitation rate ..
Looks like whatever you're using now ain't quite as good.

--
God invented whiskey so the Irish would not rule the world.
Re:What are you talk'in about ? by bmo · 2010-05-03 09:53 · Score: 3, Insightful

People want "human quality" speech recognition.
As if we're ever going to get away from training speech recognition programs when we train listeners every day when we speak. It's just that most people don't look at it as being trained, since we're so used to doing it.
I'm sure you have more trouble understanding someone with a thick Cockney or Scottish accent if you're from the Midwest US. You'd ask that person to repeat a few times, wouldn't you?
To expect speech recognition programs to *not* use training is to expect them to exceed human intelligence. Indeed, it's to expect such programs to be psychic.
--
BMO
Re:What are you talk'in about ? by icebraining · 2010-05-03 11:27 · Score: 1

That's why we need speech corpus databases like VoxForge, so we can pre-train the software for multiple accents (or one similar to our own) without having to re-do it for every single installation, and with much poorer results thanks to the lower input set.

--
Dilbert RSS feed
Re:What are you talk'in about ? by bmo · 2010-05-03 12:35 · Score: 2, Interesting

Only you talk like you. There is no archive of speech large enough to encompass every speaker of a language except one that has a record of each and every speaker. And it still doesn't solve the teaching problem. The shotgun approach is problematic in many ways, most of all the size of the database, and you'd still wind up teaching the speech platform to find what accent you're using, because if you ask most people, they don't have any accents at all.
Actually, I think the solution would be to make personal datasets portable, to standards, so when you go from one device to another, all you need to do is plug in your own dataset (or access it from the network) et voila, instant voice recognition wherever you go by systems designed to use that dataset standard. Sort of like an ODF for speech datasets. This way it's distributed, you don't have a humongously unwieldy database to manage, and it's personalized.
But that requires standards which don't yet exist, because every speech recognition platform reinvents the wheel every single time.
--
BMO
Re:What are you talk'in about ? by vlueboy · 2010-05-03 12:46 · Score: 1

Years ago I used viavoice on Warp4, and it had a pretty decend recognitation rate ..
Looks like whatever you're using now ain't quite as good.
<Big tangent ahead>
I tend to see the same issue on live TV. Apparently closed captioning in the US is a mixture of a speech recognition engine sometimes supervised by someone who can make changes a couple seconds before you see a word just pronounced. 15 years ago, I used to think the TV set could figure out the words and transcribe them, but it's fairly obvious now when a sentence transcribed live and badly scrolls by, and the "corrector" pushes a few dashes and writes a new sentence on the next line, so that we can see it scroll and note the intended sounds. It is still pretty amazing to watch live sports games and see very obscure non-game references to people's names be spelt properly on the first try. It's also kinda cool to watch something for an hour to realize that the captions aren't being corrected by a human at all, when you thought they were. A sentence like the GP's tends to slip by pretty often. Movies and non-live TV usually seem proofed and even come in fixed block layout, instead of a slightly delayed "speech recognized" stream of text.
Re:What are you talk'in about ? by icebraining · 2010-05-03 13:05 · Score: 1

Only you talk like you. There is no archive of speech large enough to encompass every speaker of a language except one that has a record of each and every speaker.

Works for humans. I don't need to recite some Shakespeare (or in my case, something from Eça de Queiroz) excerpt before people can understand me. It's not perfect, but it works fine.

But that requires standards which don't yet exist, because every speech recognition platform reinvents the wheel every single time.
VoxForge uses the same dataset to compile models for four different engines.
Don't these different formats exist because of different approaches to the recognition, or could they simply be unified?

--
Dilbert RSS feed
Re:What are you talk'in about ? by Anonymous Coward · 2010-05-03 13:39 · Score: 0

Well, we know that they're working on systems that use bio-input to produce results on a computer (helmets to control a ball on the screen, for example). As rough as those systems are, is there any possibility they could be focused on speech recognition, either by assessing muscle movements in the face to aid understanding of spoken language (naturally, this would work against you if you were laughing your ass off while trying to talk to it) or by detecting minimal changes in physiology that tint meaning? I'm sure a person saying 'tear' (crying) vs. 'tear' (ripping) would have some slight difference in facial expression and/or mental activity that might be picked up by a machine, if it was taught to understand the context of both meanings.
Re:What are you talk'in about ? by rpresser · 2010-05-03 13:53 · Score: 1

"Works for humans. I don't need to recite some Shakespeare (or in my case, something from Eça de Queiroz) excerpt before people can understand me. It's not perfect, but it works fine."
Humans spend YEARS being trained to understand speech, by multiple speakers. We call those humans who are untrained at speech recognition "babies".
And you want to skip all that.
Re:What are you talk'in about ? by mattack2 · 2010-05-03 13:59 · Score: 1

I don't know if speech recognition is used in the US, but I think you're being misled by the way an incorrect translation of phonetic output is then corrected.
From:
http://en.wikipedia.org/wiki/Closed_captioning#Television_and_video

For live programs, spoken words comprising the television program's soundtrack are transcribed by a human operator (a Speech-to-text reporter) using stenotype or stenomask type of machines, whose phonetic output is instantly translated into text by a computer and displayed on the screen.
It DOES also say:

Automatic computer speech recognition now works well when trained to recognize a single voice, and so since 2003, the BBC does live subtitling by having someone re-speak what is being broadcast.
But that obviously requires someone to re-speak it, and doesn't add names to the output and so on.. So even though they're using some speech recognition, I am inferring that you're thinking that they simply play the audio into a speech recognizer (not have someone re-speak it).
Plus, I have no idea if they do that in the U.S.
I wish the captions were as good as the subtitles (see the Wikipedia page, at least in the U.S. those terms are used differently) on DVDs. The phonetic errors, as well as simple spelling errors, get pretty bad sometimes. Though I do definitely appreciate having captions there. I say that as a hearing person, but one who often turns on captions if I miss a word.
Re:What are you talk'in about ? by bmo · 2010-05-03 14:24 · Score: 1

Works for humans. I don't need to recite some Shakespeare (or in my case, something from Eça de Queiroz) excerpt before people can understand me. It's not perfect, but it works fine.
If William Shakespeare came up to you, you'd have trouble communicating with him until you learned his lexicon and his pronunciation. Conversely, the reason why you are so easily able to communicate with your peers is because you have a similar accent and shared lexicon.
A computer speech recognition program is not a person living in society constantly immersed in the language. Language changes, especially English. Consider the words that have entered the vocabulary in the last 30 years. The database lag would be horrendous when trying to get every single pronunciation of colloquialisms. A database such as yours also doesn't adapt with the way you speak and pronounce words as you go through life. The way you spoke as a child was not the same way you spoke as a teenager or adult or as an elder. A centralized database is hindered in its ability to learn.
A personal dataset that goes with you solves all that.
If I sell a speech recognition device, in your scenario, I would also have to distribute the gigantic "universal" dataset you propose. This requires much more computing power than one that merely uses your own personal dataset. This is because I simply don't know just who is buying my device. I should not have to care who buys the device. If I don't want to ship a huge database with the device I _could_ require it to be persistently connected to the 'Net, but there are times when that's either not feasible or desired. If I sell a speech recognition device that uses the owner's own dataset, we wind up with a more accurate device while also scaling down the need for so much storage and it's *much* more portable.
Don't these different formats exist because of different approaches to the recognition, or could they simply be unified?
Unification should be one of the goals. Less duplication of effort is always a good thing.
--
BMO
Re:What are you talk'in about ? by Anonymous Coward · 2010-05-03 19:51 · Score: 0

We've got better than human quality speech recognition. Have you any idea of the quality of typical human quality speech recognition? It's fucking shite. We only think it's good because we're experts at fooling ourselves. Plus, common accents, personal history, jargon, context, body language etc provide metric fucktons of redundancy. Play a game of "telephone" or "chinese whispers" and you'll hear what I mean.
Re:What are you talk'in about ? by icebraining · 2010-05-03 20:04 · Score: 2, Insightful

No, I want to use a common dataset to train all software automatically, like VoxForge. What I was saying is that people don't need training to talk to each person they meet. A generic background training works fine, and so it should for computers.

--
Dilbert RSS feed
Re:What are you talk'in about ? by clickety6 · 2010-05-03 20:31 · Score: 1

It's perfect - he's just from Brooklyn

--
----------------------------------- My Other Sig Is Hilarious -----------------------------------
Re:What are you talk'in about ? by Maxo-Texas · 2010-05-04 03:05 · Score: 1

this.

--
She was like chocolate when she drank... semi-sweet at first and then increasingly bitter.
Re:What are you talk'in about ? by rpresser · 2010-05-04 03:34 · Score: 1

That "generic background training" inside your head is derived from years or decades of experience WITH MULTIPLE SPEAKERS. And when you meet someone with an accent you've never heard before, you DO need additional training. Deny it if you dare.
Furthermore, the data from your training is not accessible to be used to train new listeners. Each listener must be trained individually.
I'm not saying it will be impossible to do what you want. I'm just saying what you want is one or more orders of magnitude more complicated and more impressive than training a baby to understand speech.

Google Voice isn't Horrible by bobstreo · 2010-05-03 09:17 · Score: 1

It's close enough to usually understand. But I'm not sure if it's a computer translation or a bunch of pigeons typing to translate.

AI by ShadowRangerRIT · 2010-05-03 09:17 · Score: 5, Insightful

Natural language processing *is* AI. And high accuracy speech recognition requires natural language processing if we expect to have accuracy rates approaching that of a human. Humans hear words partially or incorrectly all the time. We fill in the gaps from context, and we correct if the course of the conversation reveals that the original interpretation is wrong. Expecting computers to do better, when half the time the problem is the speaker, not the listener, means you need it to be able to make the same corrections from limited information, both on the fly and after the fact, that a human brain makes.

--
$_ = "wftedskaebjgdpjgidbsmnjgcdwatb"; tr/a-z/oh, turtleneck Phrase Jar!/; print

Re:AI by ShadowRangerRIT · 2010-05-03 09:24 · Score: 3, Insightful

Just as an example, my father is partially deaf. No hearing in one ear, and less than a quarter of human baseline in the other. But with a hearing aid (which still doesn't get him to full functionality), he gets 95% accuracy or better in regular conversation, and it gets better as the conversation progresses. It's not because the hearing aid is fixing the underlying problem (it can't, since the problem is in the inner ear). But if he knows the general topic, and picks up on 50% of the phonemes, he can fill in the blanks and figure out the gist of the sentence, despite hearing it in bits and pieces. As the conversation progresses, his accuracy improves because he is supplying the prompts; if the responses fall into the set of "expected" responses, filling in the gaps becomes even easier. By contrast, if you change topics abruptly or go off on a tangent, you may need to repeat yourself half a dozen times. Now a computer will have better "hearing", but if it doesn't know the topic before you start, it's going to have the same problem anytime you slur a word, elide a syllable, or clear your throat mid-sentence. People expect to speak to a computer and have it understand, forgetting that people aren't usually expected to interpret a sentence in isolation, with no idea of the topic.

--
$_ = "wftedskaebjgdpjgidbsmnjgcdwatb"; tr/a-z/oh, turtleneck Phrase Jar!/; print
Re:AI by ircmaxell · 2010-05-03 10:01 · Score: 2, Interesting

Exactly... In order to do anything more than just "the word that was just spoken was 'x'", you need contextual and object clues. Hofstadter did a great job talking about this in his book Gödel, Escher, Bach: An Eternal Golden Braid. Right now, computers can do nothing more than simple symbol lookups. Speech recognition tries to find the word that matches the vocal pattern. So when it stumbles, the result is useless (the same goes for OCR). With contextual recognition, it can more accurately guess at what was said (that's all we do. When we hear an address that ends with "United States", we automatically know that it's the same as if we heard "USA"). That's something that I do believe is possible, we just haven't gotten to that point yet. The problem is that right now, we don't have any kind of actual contextual analysis possible. We do have some hard coded context clues, but nothing that represents a system that can "learn". The interesting thing though is that to teach an AI program to "speak" a language you need to give it a vast amount of input. Who has lots of input, and gets constant information regarding the accuracy of said input? Search engines. So if anyone can do it, I'd bet that Google is in a position to do it (along with the other major engines, it just seems like it would be one of Google's projects)...

--
If a man isn't willing to take some risk for his opinions, either his opinions are no good or he's no good
Re:AI by Graff · 2010-05-03 10:38 · Score: 1

Just as an example, my father is partially deaf. No hearing in one ear, and less than a quarter of human baseline in the other. But with a hearing aid (which still doesn't get him to full functionality), he gets 95% accuracy or better in regular conversation, and it gets better as the conversation progresses.
There have been a lot of papers written about the fact that English has a TON of redundancy built-in so that you can miss a lot of the conversation and yet still obtain the general meaning. In fact, English has been deemed to be one of the more fault-tolerant languages in existence. Here's an article by Claude Shannon (the same Shannon famous for the Nyquist-Shannon sampling theorem) on the topic.
The trick is to build algorithms which can properly analyze and utilize this redundancy in order to understand what is being said. Right now it's one of those tasks that the human mind easily handles but which we still haven't discovered how to do via computers.
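To make "redundancy" a bit more concrete, here is a rough sketch in Python. It only looks at single-character frequencies over a 27-symbol alphabet (letters plus space), so it is a crude lower bound on the redundancy being discussed; estimates that use longer context come out considerably higher. The function names and the sample sentence are made up for illustration.

```python
import math
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyz "  # 26 letters plus space (an assumption)

def char_entropy(text):
    """Zero-order entropy in bits per character, ignoring symbols outside ALPHABET."""
    chars = [c for c in text.lower() if c in ALPHABET]
    counts = Counter(chars)
    total = len(chars)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def redundancy(text):
    """Fraction of the maximum possible per-character entropy that the text leaves unused."""
    h_max = math.log2(len(ALPHABET))
    return 1.0 - char_entropy(text) / h_max

sample = "the trick is to build algorithms which can properly analyze and utilize this redundancy"
print(round(char_entropy(sample), 2), round(redundancy(sample), 2))
```

Any redundancy above zero means some characters are more predictable than others, which is exactly the slack a listener (or a recognizer) can exploit to fill in what it missed.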

--
Sapere aude!
Re:AI by wurp · 2010-05-03 12:44 · Score: 2, Informative

Google voice recognition already does exactly that. It matches words against their database of words commonly used together via their search engine.
This message was composed using android voice recognition on my nexus 1 phone. I had to manually correct 2 words out of the whole post.
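For anyone wondering what "matching against words commonly used together" might look like, here is a toy sketch in Python. It is not Google's actual method; the co-occurrence counts and the sound-alike candidate lists are invented, and a real system would derive both from a large corpus and an acoustic model.

```python
import math
from itertools import product

# Hypothetical co-occurrence counts for adjacent word pairs (made up for illustration).
BIGRAM_COUNTS = {
    ("speech", "recognition"): 900,
    ("peach", "recognition"): 1,
    ("speech", "wreck"): 2,
    ("peach", "wreck"): 1,
    ("recognition", "software"): 700,
    ("recognition", "soft"): 3,
    ("wreck", "software"): 1,
    ("wreck", "soft"): 1,
}

def score(words):
    """Sum of log counts over adjacent pairs, with add-one smoothing for unseen pairs."""
    return sum(math.log(BIGRAM_COUNTS.get(pair, 0) + 1) for pair in zip(words, words[1:]))

def best_sequence(candidates):
    """Return the candidate word sequence whose adjacent words co-occur most often."""
    return max(product(*candidates), key=score)

# Each slot lists sound-alike guesses an acoustic model might produce (invented here).
slots = [["peach", "speech"], ["wreck", "recognition"], ["soft", "software"]]
print(best_sequence(slots))
```

The brute-force search over every combination is fine for three slots but grows exponentially; a dynamic-programming search would be the usual replacement for longer utterances.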
Re:AI by nine-times · 2010-05-03 13:05 · Score: 1

Well the article does talk about the relationship between speech recognition and AI, but it basically claims that good speech recognition requires far more sophisticated AI than previously expected. Various people had tried simple statistical analyses based on the context of each word in a sentence, but the result still wasn't good enough. Some people have begun to think that a computer would need to develop a real understanding of what you're saying in order to give human-quality speech recognition, and that's some serious AI.
My own thought on the subject is that getting an AI to "understand what you're talking about" is going to be even more difficult than most people suspect. Human understanding isn't merely guided by passive observance and interpretation. Empathy plays a big role. Human animal behaviors and instincts come into play. We also draw a *lot* from our own personal experiences. I'm not sure if you can get an AI to really understand unless it's out in the world itself, in a body, having "human" experiences of its own.
Re:AI by ShadowRangerRIT · 2010-05-03 13:07 · Score: 1

Well, assuming you count numbers as words, you needed to manually correct 2 out of 47 words. So you still had to correct over 4% of all words spoken (and that's assuming your final sentence was part of the count). My father might not get 96% of the individual words, but he'd probably get the meaning of your sentence as a whole. The question is whether Google could convey the meaning of the sentence despite missing those words. And remember, Google gets to hear you up close. Will it do nearly as well when Jean Luc Picard tells it to go to Maximum Warp, while sitting in a chair fifteen feet from the microphone? Or will it make the jump to Mardi Gras Wart?

--
$_ = "wftedskaebjgdpjgidbsmnjgcdwatb"; tr/a-z/oh, turtleneck Phrase Jar!/; print
Re:AI by ircmaxell · 2010-05-03 14:54 · Score: 1

Actually, it doesn't. It knows nothing about context. All it does know is what words are commonly used together. There's a huge difference between understanding the meaning of a sentence, and knowing the statistics of word relation. In order for TRUE speech recognition, you'd need context recognition. Otherwise the phrase "Some apples are green and others are red" makes just as much sense as "Some apples are bean and other side dead". It's only the context (the knowledge about Apples commonly having one of two colors: green and red) that lets you understand what the phrase really was. That's likely how our brains work. We create objects about stored memories and items. So when you hear the word apple, it pulls an instance of the class apple. Then, in the rest of the sentence, we compare that object to other objects conjured by the other words. Only then do we build the context of the sentence. That's how we intuitively know that the sentence should be "Some apples are green and others are red". Not because "green" and "red" are more commonly used with the word apple than "bean" and "dead". But you could make a sentence that would make sense with those words, "The apple fell onto a bean and ended up dead". Only if you understand something about Apples, Beans, dead, red and green, and the complex relation between them can you possibly try to understand the sentence. That's why rule based processing (how Google does it) will never be as accurate as a human. Not because it's a fundamental limit of recognition, but because it has none of the information about what each word means. Try giving a foreigner (non-English speaker) those two sentences and an English only dictionary. Do you think they'd be able to tell you which one was the proper sentence? No. Not because of a lack of information, but because of a lack of context. It's all about context... 
And until AI can contain complex information about words and their meaning, it has no hope of being accurate at speech to text on the large scale...

--
If a man isn't willing to take some risk for his opinions, either his opinions are no good or he's no good
Re:AI by Anonymous Coward · 2010-05-03 18:45 · Score: 0

Natural language processing *is* AI.
But is natural language processing AI-complete or just AI-hard? That's what we really want to know.
Re:AI by Anonymous Coward · 2010-05-03 22:18 · Score: 0

I can only expect that a robot will smash a peach and destroy a beach before it finally answers yes to the actual question at hand.
Re:AI by wurp · 2010-05-04 01:03 · Score: 1

I agree that understanding meaning is important to getting better speech recognition, but I disagree about "true speech recognition". What you're describing is almost true Turing complete AI.
What Google is doing is in fact paying attention to context. And it does let them identify speech that even humans would have difficulty with. If I say "Jonathan Coulton Code Monkey", Google gets it completely right, including the odd spelling of the last name, because it knows those words go together.
The article's assertion is that speech recognition has stagnated. On the contrary, I think it had mostly stagnated for 10 years, but what we have now is a big leap over what we had 10 years ago. Ten years ago, it was a toy for anyone with the use of both hands. Today, it is often the most convenient way to enter information on my phone, and I'm an excellent typist.
Re:AI by sjames · 2010-05-06 06:50 · Score: 1

Beyond just simple equivalents like "United States" and "U.S.A." we also use the context to determine which of several homophones was used and in many cases fill in a word we didn't actually hear and don't even realize we didn't hear it. We have an entire style of humor based on causing the listener's brain to fill in the wrong word (often a favorite amongst kids who aren't allowed to swear and Benny Hill) often involving slurring the first part of the word and clearly pronouncing the rest. See the law office of Barton, Darton, Larton and Farrrrrrrrgo.
Doing that to the degree that humans do every day would require the computer to have a full AI including an understanding of modern cultural references and perhaps even a sense of humor.

That's Because... by BJ_Covert_Action · 2010-05-03 09:17 · Score: 5, Funny

It only flatlined because nobody tried to write speech recognition software in perl*.

*Disclaimer: Poster is not responsible for attempts resulting in unintended AI development and/or end of the world scenarios brought on by such an irresponsible endeavor.

--
Motorcycles, Robots, Space Gossip and More!

Re:That's Because... by Anonymous Coward · 2010-05-03 09:57 · Score: 0

Pff, it's already done in Python:
import speech.recognition
Re:That's Because... by CarpetShark · 2010-05-03 10:24 · Score: 1

It only flatlined because nobody tried to write speech recognition software in perl*.
True. If someone had tried to write good speech recognition in perl, it'd be easy. Every word would be "FUCK!"
Re:That's Because... by Anonymous Coward · 2010-05-03 11:07 · Score: 0

"It only flatlined because nobody tried to write speech recognition software in Common Lisp."
Fixed that for ya there, Chief! As if Perl had anything on CLisp :P

Well duh. by bmo · 2010-05-03 09:17 · Score: 3, Funny

Even humans mishear speech.

"'Scuse me while I kiss this guy"

That misheard lyric is so common that there's a book about misheard lyrics with that as the title.

--
BMO

Re:Well duh. by Eudial · 2010-05-03 09:19 · Score: 1

Eggcorns constitute another great example of how humans get this wrong.

--
GAAH! MY PRINTER IS ON FIRE!!! PUT IT OUT! PUT IT OUT!
Re:Well duh. by CityZen · 2010-05-03 09:25 · Score: 2, Funny

"Time flies like an arrow; fruit flies like a banana."
Re:Well duh. by Chris+Burke · 2010-05-03 09:28 · Score: 5, Funny

That misheard lyric is so common that there's a book about misheard lyrics with that as the title.
I know! A surprising number of people think Hendrix was talking about kissing the sky, rather than embracing the experimental, counter-culture, and free-love nature of the 60's, simply because they don't like to think of their testosterone-filled hero sucking face with another dude. Like, get over it! "Kiss the sky" doesn't even make any sense unless you're on some kind of mind-altering substance, and there's no way Jimmy would have put something like that in his body!

--

The enemies of Democracy are
Re:Well duh. by bunratty · 2010-05-03 09:30 · Score: 1

You mean Hendrix wasn't gay after all? Next you'll be telling me that CCR never said there's a bathroom on the right!

--
What a fool believes, he sees, no wise man has the power to reason away.
Re:Well duh. by Tynin · 2010-05-03 09:45 · Score: 1

Even humans mishear speech.
"'Scuse me while I kiss this guy"
That misheard lyric is so common that there's a book about misheard lyrics with that as the title.
-- BMO
There was a Tool song my friends and I argued over the lyrics of for quite some time. Think it was the prison sex song. I was sure it said, "...my lamb and martyr, this will be over soon...". My friends were sure the song was, "...my loving mother, this will be over soon...". Considering the topic of the song I suppose it could have had yet another level of depravity to it with the whole mom/incest angle, my perverted friends sure thought so, even though I didn't think it made much sense.
Re:Well duh. by blair1q · 2010-05-03 09:46 · Score: 1

But...but computers are supposed to be perfect!
Re:Well duh. by swilver · 2010-05-03 09:51 · Score: 1

Well, I checked a few pages worth of content on that site, and I must say it looks like most of these "misheard" lyrics are people trying to make a funny (often sex related) joke (or are simply lacking the correct vocabulary knowledge) instead of actually mishearing the lyric. Some of the songs on that list have lyrics that are so clear it's near impossible to hear them wrong. I certainly didn't find any that I heard wrong.
Disclaimer: I'm not native English, I am a musician though.
Re:Well duh. by Anonymous Coward · 2010-05-03 09:57 · Score: 1, Informative

The article acknowledges this... mentions speech recognition topped out with a 20% word error rate, while humans have an error rate of 2%-4%.
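For reference, word error rate is usually computed as the word-level edit distance between what was said and what was transcribed, divided by the number of words actually said. A minimal sketch in Python follows; the function name is my own and it assumes a non-empty reference and plain whitespace tokenization.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance (substitutions, insertions, deletions) over reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j] holds the edit distance between the reference words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)

said = "excuse me while i kiss the sky"
heard = "excuse me while i kiss this guy"
print(word_error_rate(said, heard))
```

By this measure the famous misheard lyric has two wrong words out of seven, which is worse than the 20% figure quoted for the software.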
Re:Well duh. by Graff · 2010-05-03 10:24 · Score: 1

Even humans mishear speech.
"'Scuse me while I kiss this guy"
It's called a mondegreen. Another famous one is "the girl with colitis goes by" instead of "the girl with kaleidoscope eyes" from Lucy In The Sky With Diamonds by the Beatles.
There are a ton more of them out there, such as "there's a bathroom on the right" instead of "there's a bad moon on the rise" from Bad Moon Rising by Creedence Clearwater Revival.
Speech is definitely a tricky thing that even people may not have mastered!

--
Sapere aude!
Re:Well duh. by CarpetShark · 2010-05-03 10:26 · Score: 1

"Kiss the sky" doesn't even make any sense unless you're on some kind of mind-altering substance,
Or have an ounce of poetry in you... ;)
Re:Well duh. by capo_dei_capi · 2010-05-03 10:28 · Score: 1

"Kiss the sky" doesn't even make any sense unless you're on some kind of mind-altering substance
Umh, the song's called purple haze.
Re:Well duh. by Chris+Burke · 2010-05-03 10:42 · Score: 4, Interesting

Or have an ounce of poetry in you... ;)
Hmm... I guess I don't have that since I don't know what it is. That's okay, I can find out with the help of my AI using the latest in voice recognition software! Computer, what is "poetry"?
Computer: "Poetry" is a form of literary art, frequently using an organized metric and rhyme scheme, that attempts to evoke an emotional response in the reader through the use of metaphor.
Huh, okay, that's interesting. But computer, what is a metaphor?
Computer: A "meta" is for people who lack the capabilities to contribute directly to a field or endeavor, but who still wish to sound educated and useful by discussing the nature of the field or endeavor itself. Example: "Physics has way too much math for me, but meta-physics is right up my alley!"
Yeah, now I'm just confused.

--

The enemies of Democracy are
Re:Well duh. by Anonymous Coward · 2010-05-03 11:29 · Score: 0

What's love, but a second hand in motion?
Re:Well duh. by Rick17JJ · 2010-05-03 11:35 · Score: 1

As a child, during our grade school's every morning flag salute and prayer, we would usually sing the "Star Spangled Banner." I always thought one line ended with "atom bombs bursting in air." After a few years of singing it that way every morning, I eventually realized that they were actually singing something roughly more like "and the bombs bursting in air."

During this same time period, the Cuban Missile Crisis occurred, where we had to frequently practice hiding under our desks, with one arm covering our closed eyes and the other arm protecting the back of our heads. So my misinterpretation of the lyrics of the Star Spangled Banner seemed appropriate for the time.

At one point during the Cuban Missile Crisis, a 10 inch wide, slowly spinning, charred black piece of paper slowly drifted down from the sky, and landed ominously next to us, during the morning flag salute. Several years later, we had an earthquake while saying the Lord's Prayer one morning, with the bushes shaking.
Re:Well duh. by Arthur+Grumbine · 2010-05-03 11:47 · Score: 1

"Kiss the sky" doesn't even make any sense unless you're on some kind of mind-altering substance
Umh, the song's called purple haze.
Actually, I'm pretty sure it's called "Sarcasm".

--
Now that I think about it, I'm pretty sure everything I just said is completely wrong.
Re:Well duh. by Anonymous Coward · 2010-05-03 13:58 · Score: 0

Unless you saw him in person at Woodstock as he pursed his lips and smacked them towards the clouds.
Re:Well duh. by SpaceCadets · 2010-05-03 16:42 · Score: 1

"The pastas in their heads, the future's in our hands!" What the hell, Living End?? What type of pasta?!?
Re:Well duh. by Anonymous Coward · 2010-05-03 18:33 · Score: 0

There's a Sonic Youth song, "It is my body", sung by Kim Gordon, which I was shocked to find I had misinterpreted for years. I understood "Have you got the time to find out who I really am?" and a groan, instead of "Have you got the time to find out who I really am on?". I guess I had the time!
Re:Well duh. by radtea · 2010-05-04 03:19 · Score: 1

I always thought one line ended with "atom bombs bursting in air."
There's an old Canadian patriotic song, "The Maple Leaf Forever" that's not much sung anymore but was once a reasonable substitute for the national anthem. It has a line in it about "Wolfe the dauntless hero came/to plant Britannia's flag upon Quebec's fair domain" that makes it strangely unpopular amongst 25% of the Canadian population, post Quiet Revolution.
Back in the '30s it was sung regularly by school-children, and one of them wrote an essay a few years back on his traumatic experience with lyric misinterpretation: he and one of his little friends in about grade three heard "Wolfe the donkless hero came...", "donk" being slang for "penis".
This resulted in him and his friend speculating on how General Wolfe had got his privates blown off, ultimately ending in asking their third-grade teacher about it. She apparently turned bright red, told them the correct reading of the line, and retreated to the teacher's prep room, from whence were heard gales of laughter...

--
Blasphemy is a human right. Blasphemophobia kills.
Re:Well duh. by Anonymous Coward · 2010-05-04 04:40 · Score: 0

"Kiss the sky" doesn't even make any sense unless you're on some kind of mind-altering substance,
Or have an ounce of poetry in you... ;)
Whoa! Don't drop, like, a whole ounce of poetry dude. You'll be fucked for, like, days. If you're new to it, a gram of "pote" will do you the whole night...
Re:Well duh. by Anonymous Coward · 2010-05-06 09:03 · Score: 0

Is there a difference?

Sorry what? by Daas · 2010-05-03 09:19 · Score: 1

When talking to someone else, we can politely stop them and ask: "Sorry, what did you say?". As someone whose first language is not English, I tend to use these words a lot, mostly because of differences in pronunciation. Computers, on the other hand, are supposed to get everything right the first time! Why can't they, like us, ask those simple words instead of making stupid guesses??

Re:Sorry what? by cnoocy · 2010-05-03 09:24 · Score: 1

A lot of vpr systems do just that. Also, dictation systems display what you've typed on the screen, so you can correct by voice if necessary.

--
This sig is not the Zahir. Lucky for you.
Re:Sorry what? by Ethanol-fueled · 2010-05-03 09:30 · Score: 2, Interesting

When talking to someone else, we can politely stop them and ask : "Sorry, what did you say?"
That doesn't always work. When accents and the command of a language are so poor, you only get a few chances to ask, "Sorry, what did you say?" After asking three times, you either look like an asshole and/or give up and spend the next few minutes nodding and smiling before trying to parse what they said, hoping you get it right.

Which is why we need good speech-recognition and translation software. It's easy to infer the meaning of "come to me give the diagram" because there are at least intelligible words to work with. And no, I'm not being racist -- the situation applies to all cultures and languages.
Re:Sorry what? by Anonymous Coward · 2010-05-03 09:31 · Score: 0

Computers, on the other hand, are supposed to get everything right the first time! Why can't they, like us, ask those simple words instead of making stupid guesses??
They do and can. I called up a company and got a computer help line. The computer insisted I state my problem so it could look it up. Of course it never got it even close to right and insisted we keep trying again. By the fourth time my question involved mostly swear words and finally I was transferred to a human. It only got worse from there.
Re:Sorry what? by Jeng · 2010-05-03 09:33 · Score: 1

It would seem that people learn computers better than computers learn people.
Much like talking to someone with a poor grasp of one's language, you try to make things simple and easy to understand.

--
Don't know something? Look it up. Still don't know? Then ask.

Number of sentences? by Logarhythmic · 2010-05-03 09:19 · Score: 2, Insightful

One estimate puts the number of possible sentences at 10^570

What a completely useless metric. It makes sense to examine the context and meaning of speech in order to accurately transcribe words, but the number of possible sentences doesn't seem to accurately describe the problem here...

--
"Before criticizing someone, first walk a mile in his shoes. Then, you'll be a mile away... and you'll have his shoes."

Re:Number of sentences? by Anonymous Coward · 2010-05-03 09:39 · Score: 0

Also, that number is kinda bullshit. In the article, it links to one guy making some back of the envelope calculations about the number of sentences. The author (a phonetician) doesn't take into account things like new word creation, novel developments in syntactic structures, or even basic things like recursive embedding of sentences in other sentences.
Though there may be a practical bound on the number of sentences in a language, there's no theoretical limit.
Re:Number of sentences? by forkazoo · 2010-05-03 10:26 · Score: 1

What a completely useless metric. It makes sense to examine the context and meaning of speech in order to accurately transcribe words, but the number of possible sentences doesn't seem to accurately describe the problem here...
Yeah, my first reaction was to wonder how many valid C language programs are possible. Not that it affects the complexity of making a compiler in any way! If you take it just as a sort of random factoid that adds flavor to your understanding of the problem, I guess it works.
Personally, I've never understood why there hasn't been more work on teaching people to speak in a way that computers understand. I mean, we don't type plain English at the BASH prompt. We learn rules for quoting. We learn special characters. We learn to type commands in an English-inspired language that the computer already knows.
Similarly, I think that if speech recognition ever becomes genuinely useful, it'll be because somebody gave up trying to make a computer understand English (which not even a human can do 100% correctly!), and instead tried to make computer-understandable-english. I'm imagining special characters stated with whistles and clicks. All commands defined in terms of phonemes instead of defining them in terms of written statements and attempting to recognise indirectly. Lots of short, 1-2 syllable non-words used for commands.
Re:Number of sentences? by ENIGMAwastaken · 2010-05-03 12:29 · Score: 1

It's worse than that because it's not even true. There are, quite literally, an infinite number of valid English sentences because you can nest phrases like "The man who had the wife who had the son who had the cousin who had the...." and so on, ad infinitum. At no point does this stop being a valid, grammatical English sentence. Of course the utterance itself is necessarily finite because, e.g., the heat death of the universe will prevent it from being produced, or we'll die of old age, but this is a physical limitation, not a linguistic one.
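The nesting argument can be sketched in a few lines of Python. This is only an illustration: the function name `nested_sentence` and the noun list are made up for the example, and the trailing verb "left" is added just to close the sentence. Each extra level of embedding yields a new, longer sentence, so no fixed count can cover them all.

```python
def nested_sentence(depth):
    """Build 'The man who had the wife who had ... left.' with `depth` relative clauses."""
    nouns = ["wife", "son", "cousin"]  # hypothetical filler nouns, cycled as needed
    phrase = "The man"
    for i in range(depth):
        phrase += " who had the " + nouns[i % len(nouns)]
    return phrase + " left."


for d in range(3):
    print(nested_sentence(d))
# The man left.
# The man who had the wife left.
# The man who had the wife who had the son left.
```

The only limit on `depth` is how long you are willing to wait, which is the physical bound rather than the linguistic one.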
Re:Number of sentences? by vlueboy · 2010-05-03 12:55 · Score: 1

Similarly, I think that if speech recognition ever becomes genuinely useful, it'll be because somebody gave up trying to make a computer understand English (which not even a human can do 100% correctly!), and instead tried to make computer-understandable-english. I'm imagining special characters stated with whistles and clicks. All commands defined in terms of phonemes instead of defining them in terms of written statements and attempting to recognise indirectly. Lots of short, 1-2 syllable non-words used for commands.
Poets are way ahead of you. Repeat after me:
Waka waka bang splat tick tick hash [...]

Windows 7 by Anonymous Coward · 2010-05-03 09:19 · Score: 3, Interesting

I've been using VR in Win7 for a few weeks now. I can honestly say that after a few trainings, I'm near 100% accuracy. Which is 15% better than my typing!

Re:Windows 7 by adonoman · 2010-05-03 09:29 · Score: 3, Informative

People underestimate the value of training - we do it subconsciously when we meet people with different accents or vocal tones. At first people are hard to understand, but given an hour or so talking to someone, you eventually stop noticing their accent. Windows 7 seems to do a really good job at learning from use (it learns even without explicit training when you make corrections). I have a Windows 7 tablet and the voice recognition is impressive. Its handwriting recognition is even better than mine when it comes to my writing (it benefits from knowing the directions and order of strokes) - I just scratch out something vaguely resembling something I want to write and it seems to recognize it almost 100% of the time.
Re:Windows 7 by thePowerOfGrayskull · 2010-05-03 11:01 · Score: 1

I've been using VR in Win7 for a few weeks now. I can honestly say that after a few trainings, I'm near 100% accuracy.
That's great! But how's the computer's accuracy coming along?

Not Dead Yet by Shidash · 2010-05-03 09:20 · Score: 2, Insightful

I doubt it is completely dead. I have yet to hear it from the researchers working on AI. I work in affective computing, so I am thinking that it is possible that the missing component could be emotion or another way to increase the understanding and ability of computers to learn. In addition, even if it is not possible to increase speech recognition capabilities in this model of computing, in another model of computing this and more would be possible. I am not believing it until I hear it from researchers who have tried most possible options for improvement.

World model by Anonymous Coward · 2010-05-03 09:20 · Score: 2, Informative

Speech recognition mechanisms/algorithms are not entirely
the problem. What needs to back them up is called a "world
model," and, as the name implies, this can be large and open
ended. Humans being able to correct spoken/heard errors
on the fly is because of having an underlying world model.

Mod parent up by idiot900 · 2010-05-03 09:21 · Score: 2, Informative

Would that I had mod points today.

The above is a valid English sentence and a poignant example of how difficult it is to parse language without knowledge of semantics.

Re:Mod parent up by x2A · 2010-05-03 09:27 · Score: 4, Interesting

There's nothing special about computers though, people have to do that with other people... let's not kid ourselves into thinking that humans are immune to misunderstandings. No, the more you get to know someone, the way they think and express themselves, the better you can become at communicating with them. Different words to different people have different connotations. It can take a lot of work to get all these down, and it'd be no different with a computer. For effective communication, you'd train and build up a common language with it, that might seem nonsense to outsiders... and I, for one, welcome this.

--
The revolution will not be televised... but it will have a page on Wikipedia
Re:Mod parent up by ground.zero.612 · 2010-05-03 09:28 · Score: 0, Insightful

What about the simple fact that conversation itself is a learning process?
You learn the extent of your audience's comprehension among other things. How can a computer be programmed to recognize everything when we lack a sufficient model to base it on?
There is a point in conversation when a sensible human being will recognize they are not getting their ideas through, and simply give up and say "never mind".

--
"Be prepared, son. That's my motto. Be prepared." --Joe Hallenbeck
Re:Mod parent up by gyrogeerloose · 2010-05-03 09:36 · Score: 1

The above is a valid English sentence and a poignant example of how difficult it is to parse language without knowledge of semantics.
Although it's either lacking in punctuation or using non-standard capitalization.
Then again, maybe he's invoking both the large mammal and the eponymous city in New York?

--
This ain't rocket surgery.
Re:Mod parent up by Anonymous Coward · 2010-05-03 09:40 · Score: 1, Informative

Hence why some of the words are capitalized.
Re:Mod parent up by RockoTDF · 2010-05-03 09:58 · Score: 1, Interesting

You raise a good point about learning. A problem with AI researchers is that they are scared of neural networks for reasons that have been solved since the 1980s, and are stuck with expert systems or other "symbol manipulating" programs. The problem with these programs is that they *suck* at learning. I really think that if the AI community looked at neural nets more often they would get closer to figuring this language thing out. With billions and billions of sentences it is hard to create a good system using the aforementioned techniques.

--
There is more to science than physics!

www.iomalfunction.blogspot.com
Re:Mod parent up by Antiocheian · 2010-05-03 10:00 · Score: 4, Insightful

Not necessarily. Speech recognition doesn't fail when it can't figure out elaborate grammatical constructs and lexical ambiguities. Speech recognition fails because it can't figure out simple sentences in conditions humans can.
Re:Mod parent up by brian_tanner · 2010-05-03 10:16 · Score: 5, Interesting

I think you're probably about 10-20 years out of date with your criticism. AI these days is *all about* statistical machine learning which is *all about* data and not about formal or expert systems at all. This is what Google and others are doing. The AI you are describing is from the late 80s and early 90s.

Neural networks are part of the story, but many of the ideas from ANNs have been improved upon when more structured settings are available. There is actually a resurgence right now in deep neural networks, though.
Re:Mod parent up by zegota · 2010-05-03 10:26 · Score: 2, Insightful

Interestingly enough, a computer would likely parse that sentence correctly, while nearly any human speaker (not familiar with the sentence) would think it's a nonsense phrase.
Re:Mod parent up by __aasqbs9791 · 2010-05-03 10:35 · Score: 1, Insightful

You are exactly right. I've often said no two people actually speak the same language. They just sound very similar sometimes.
Re:Mod parent up by Known+Nutter · 2010-05-03 11:19 · Score: 5, Informative

Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo.

--
Beware of the Leopard.
Re:Mod parent up by Anonymous Coward · 2010-05-03 11:44 · Score: 2, Interesting

The main difference right now between human speech recognition and computer speech recognition is how the results are handled.
If I said "I had a hard time staying a wake", both a person and a computer would misunderstand and think I said "I had a hard time staying awake." However, if it was in the middle of a discussion of funeral ceremonies I had conducted, both a person and a computer could figure out what I really meant.
A person would likely hedge their response though, as either of the two meanings would be possible -- they'd probably respond with laughter and judge my physical reaction to that to identify which sentence I meant -- or, they might make some leading comment that forced me to add context.
Computers however, are expected to "know the answer" with no further cues, and as such, are designed to "best guess" between the two options. They're probably better at this than a person would be in the same situation -- especially if the person didn't know what the verb "to stay" actually meant, or what a "wake" was. People give many context cues based on tone and non-verbal interaction that a computer is just never designed to pick up on. Added to the fact that tonal cues are extremely tribal, and the complexity balloons.
For artificial verbal content recognition to really take off, the computer needs to be trained not only on words, grammars, and other parts of speech and lexical context, but also on tribal uses of tone -- most people have a pretty good grasp of the tones used in their own "tribe", and can identify many of the neighbouring "tribes" to the extent that they know what other cues to look for to complete the context. If a computer was trained in all the major inflections for all systems of language in the world, it would likely be better than most humans in a random sampling of sentences.
A Chinese ESL individual who learned English in Alabama would have an extremely difficult time understanding what someone from Newfoundland or the Hebrides was trying to say to them -- but a computer properly trained should be able to translate between the two with no difficulty.
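A toy sketch of the "hedge instead of best-guess" idea, in Python. Everything here is hypothetical: the cue-word lists are invented for the example, and a real recognizer would use a statistical language model rather than simple word overlap. The point is only that "not sure, ask again" can be a valid output when the surrounding context doesn't settle it.

```python
# Hypothetical cue words for the two readings of the same sounds.
CUE_WORDS = {
    "staying awake": {"tired", "sleep", "coffee", "night"},
    "staying a wake": {"funeral", "ceremony", "mourners", "coffin"},
}


def pick_transcription(context, candidates=CUE_WORDS):
    """Return the reading best supported by the context, or None on a tie."""
    words = set(context.lower().split())
    scores = {reading: len(words & cues) for reading, cues in candidates.items()}
    best = max(scores, key=scores.get)
    # A tie means the context gives no basis for choosing: hedge, don't guess.
    tied = list(scores.values()).count(scores[best]) > 1
    return None if tied else best


print(pick_transcription("we were discussing the funeral ceremony"))  # staying a wake
print(pick_transcription("hello there"))                              # None
```

A `None` result is where a person would laugh, watch the reaction, or make a leading comment; a system built to always "know the answer" has no such branch.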
Re:Mod parent up by gyrogeerloose · 2010-05-03 11:55 · Score: 1

Thanks. Interesting.

--
This ain't rocket surgery.
Re:Mod parent up by repapetilto · 2010-05-03 12:21 · Score: 2

Actually I went to Buffalo one time to try to get a picture of this occurring to put on the Wikipedia page. It's harder than you'd think since the skyline isn't that huge and buffalo do a lot of nothing most of the time. But here's one of Buffalo buffalo about to buffalo Buffalo buffalo that's thinking about buffaloing Buffalo buffalo.
http://tinypic.com/r/xcqa06/5
Re:Mod parent up by sleepy_sanchez · 2010-05-03 12:25 · Score: 1

Mod parent up. When you have lots of data, you don't have to build any "expert" knowledge into a learner.
Re:Mod parent up by Daengbo · 2010-05-03 12:37 · Score: 1

This is the kind of problem that I think can only be solved by massively parallel computing -- the kind we're not going to see on a personal level for a very long time -- paired with a huge amount of data to draw from. In other words, it's a Googly kind of problem. Other, similar problems like photo or face recognition lead me to believe that fighting the always-connected (cloudy) movement is probably the wrong way to go if we want these things.

--
Put identity in the browser.
Re:Mod parent up by Opyros · 2010-05-03 12:38 · Score: 1

Nonsense! For example, a real human could never mishear the phrase "guide dog" as "gay dog" and refuse to let a dog into a restaurant.
Re:Mod parent up by jpate · 2010-05-03 13:45 · Score: 3, Insightful

When you have lots of data, you don't have to build any "expert" knowledge into a learner.
This isn't really quite so clear cut. Feature engineering, model structure, model training techniques, and so on all bias statistical learners towards different parts of the hypothesis space. Hidden Markov models (the standard in speech recognition) clearly constitute a data-driven approach, but usually they predict diphones (which capture the transitions between speech sounds) rather than phones themselves. That is, "cat" is recognized not by predicting a [k] followed by an [ae] followed by a [t], but (among other things) by a [k-ae] transition followed by an [ae-t] transition. This is a very direct way of encoding expert linguistic knowledge that speech sounds are pronounced differently in the context of other sounds. Think about where your tongue touches the top of your mouth in "keen" compared to "can."
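To make the phone vs. diphone distinction concrete, here is a minimal Python sketch. It is not how any particular recognizer is implemented; the function name `to_diphones` and the hyphenated labels are just assumptions for illustration. It only shows the change of modelling unit: adjacent pairs of sounds instead of single sounds.

```python
def to_diphones(phones):
    """Turn a phone sequence into its adjacent-pair transitions (diphones)."""
    return [f"{a}-{b}" for a, b in zip(phones, phones[1:])]


print(to_diphones(["k", "ae", "t"]))  # ['k-ae', 'ae-t']
```

The [k] in "keen" and the [k] in "can" land in different units ([k-iy] vs. [k-ae] in a typical phone notation), which is exactly the context-dependence the expert knowledge is meant to encode.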
Re:Mod parent up by arth1 · 2010-05-03 13:55 · Score: 3, Insightful

If I said "I had a hard time staying a wake", both a person and a computer would misunderstand and think I said "I had a hard time staying awake."
You give computers way too much credit.
More likely it would think you said "Dear aunt, let's set so double the killer delete select all".
My experience with telephone Voice Rejection Systems is that they get what you say wrong more often than not, especially if you have a deep voice.
Re:Mod parent up by Anonymous Coward · 2010-05-03 14:08 · Score: 0

My experience with telephone Voice Rejection Systems is that they get what you say wrong more often than not, especially if you have a deep voice.
telephone Voice Rejection System = one computer
Google Voice Rejection Systems = 1000+ computers
Re:Mod parent up by Jeremi · 2010-05-03 14:44 · Score: 3, Funny

Nonsense! For example, a real human could never mishear the phrase "guide dog" as "gay dog" and refuse to let a dog into a restaurant.
Well to be fair, understanding Australians is an order of magnitude more difficult than understanding English speech.
(ducks)

--

I don't care if it's 90,000 hectares. That lake was not my doing.
Re:Mod parent up by Anonymous Coward · 2010-05-03 15:55 · Score: 1, Funny

(ducks)
quack?
Re:Mod parent up by cerberusss · 2010-05-03 18:42 · Score: 1

let's not kid ourselves into thinking that humans are immune to misunderstandings.
I can attest to this. When I'm in the pub, I often have trouble parsing simple sentences such as "I have a boyfriend" or "I'm married".
Come to think of it, I could actually use a computer in such a case, who'd parse these sentences, understand them, and then in a loud volume scream "NEXT!".

--
8 of 13 people found this answer helpful. Did you?
Re:Mod parent up by 5plicer · 2010-05-03 22:03 · Score: 1

There is actually a resurgence right now in deep neural networks, though.
Here's a Google Tech Talk on the subject of deep belief nets: http://www.youtube.com/watch?v=AyzOUbkUf3M
The same prof (Geoffrey Hinton) also has a video tutorial here: http://videolectures.net/mlss09uk_hinton_dbn/

--
The bits on the bus go on and off... on and off... on and off...
Re:Mod parent up by bkaul01 · 2010-05-04 01:11 · Score: 1

That sentence can be made to fit some rules of grammar, but I challenge you to find any English speaker who considers that to actually be a grammatical sentence. It's an example of rules not being capable of fully defining whether a group of words constitutes a valid English sentence, since it clearly is not, despite what some rules may indicate. That, of course, is the key problem with speech recognition: language isn't governed by a neat set of simple rules that computers can be programmed to understand.
Re:Mod parent up by treeves · 2010-05-04 10:30 · Score: 1

I remember seeing a sign on the side of a highway in Eastern Washington that said "Family Hunting Club". I always enjoyed that one.

--
...the future crusty old bastards are already drinking the Kool-Aid.
Re:Mod parent up by Anonymous Coward · 2010-05-05 13:38 · Score: 0

a computer properly trained should be able to translate between the two with no difficulty.
Yeah, the only difficulty is that "properly trained" bit. Nobody knows how to do that.
It seems there isn't any easy shortcut. Programming a dumb computer to imitate intelligence doesn't seem to be any more feasible than creating artificial intelligence.
I'm not holding my breath for this "singularity" thingy all the self-described atheists seem to have chosen as their Jesus substitute.
Re:Mod parent up by RockDoctor · 2010-05-08 00:56 · Score: 1

fighting the always-connected (cloudy) movement is probably the wrong way to go if we want these things.
You're forgetting a tiny, but non-trivial fact: if your connection breaks, for whatever reason, a system that depends on the connection breaks as well. So you cannot use connection-dependent architectures for important services unless you make an absolutely reliable connection. That doesn't mean a high-availability connection; that means an absolutely reliable connection. Your connection only goes down at pre-planned times, with you being certain (not "likely", "certain") to have enacted your plans for living without that critical system.
You might have a problem envisaging this sort of world, but that is a failure of your imagination, not of reality. As an example, it is still routine for me to spend weeks or months working in remote locations where no internet connection is available at all. No Internet, not "limited to 14.4kbps modem speeds", but no internet. Our software slaves at work tried proposing that we do our software licensing using an online license server, but we've had to slap them around the face over that one, which is costing us around a £100/seat for hardware dongles. We'd love to be able to assume or require an internet connection for our software to run, but our work environment doesn't allow us to assume or require that.
Another real issue that using a "cloudy" architecture will have to be tested against will be latency. If your system requires getting an answer back from "the cloud" in (say) 1 second (which is going to provide a pretty shitty "real time" experience - try it), then you have built a system that can't be used more than approximately 150,000 km from the Earth's surface. (Actually the usable limit would be a lot lower than that - I'm allowing no time at all for "the cloud" to do its immense calculations and lookups.)
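The 150,000 km figure can be checked with a quick calculation. A sketch in Python, assuming signals travel at the vacuum speed of light and, as in the paragraph above, that the cloud needs zero processing time by default; the function name and the `processing_s` parameter are made up for the illustration.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458  # vacuum speed of light, km per second


def max_one_way_distance_km(round_trip_budget_s, processing_s=0.0):
    """Farthest a client can be if request + reply must fit in the budget."""
    travel_s = round_trip_budget_s - processing_s
    if travel_s <= 0:
        return 0.0
    # The signal has to cover the distance twice: out and back.
    return SPEED_OF_LIGHT_KM_S * travel_s / 2


print(round(max_one_way_distance_km(1.0)))       # roughly 149,896 km
print(round(max_one_way_distance_km(1.0, 0.5)))  # half the travel time, half the range
```

Any time the servers spend thinking comes straight out of the travel budget, which is why the usable limit is lower than the headline number.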
In another thread on Slashdot, someone was asking me why the Deepwater Horizon (the oil rig that blew out in the Mexican Gulf last week, killing 11 people; I work in the drilling industry) didn't have a remote control panel for its BOP stack; by remote they meant "off the rig". I've still got to get back to them, but I think the idea is just shockingly ignorant - the questioner really seriously doesn't understand the concept of losing communications. But when you're planning safety-critical systems, you dare not be anything other than screamingly paranoid. I'd be very reluctant to work on a rig where someone can close the shear rams from a passing boat. Equally, I'd be pretty unconfident of the average toolpusher remembering the 14 digit password needed to activate the blowout protectors - it's not the sort of thing that's in their skill set.
When (it is "when", not "if") a terrorist group manages to get a bomb into a major network hub then you will see an awful lot of people having the unpleasant experience of not having communications. Or maybe you'll feel it when there is another power outage (you did know that telephone and presumably internet providers spend a lot of money on their own uninterruptible power supply systems?) that lasts long enough to take out communications.

--
Birds are not dinosaur descendants;birds are dinosaurs, for all useful meanings of "birds", "are" and "dinosaurs"
Re:Mod parent up by Daengbo · 2010-05-08 01:45 · Score: 1

Wow. You took "fighting the always-connected (cloudy) movement is probably the wrong way to go if we want these things" and came back with that response about how I don't understand connectivity (or the lack of it). Nowhere did I say "hey, we all need these services, and we all should be doing them through service providers."
Sure, there are plenty of places where you either don't get service at all or it's sporadic: in many cases, these are the same places that don't have access to electricity and fresh water, and people or companies that want to live or work there provide their own. I, too, have worked in places where I was only allowed one quick-burst text communication per day over radio, and where no water or electricity was available. I get it. We need special equipment for that. I carried a hundred pounds of batteries and solar cells on my back in that situation.
Most of the populated, developed world isn't like that.
I'm not really sure whether your point was that we shouldn't use "always connected" methods at all, that we should be careful, or that I was ignorant for even thinking about it (your post kind of sounded like the last one), but hobbling the whole world while personal computing catches up to what we want it to be for the sake of some outlying cases that wouldn't be able to take advantage of the cluster technology sounds like requiring everyone in developed areas to have their own water treatment plant until everyone on Earth gets access to clean, running water.
But, yeah, we should be careful in how it's implemented.

--
Put identity in the browser.

Focus, Dammit. by Jeremiah+Cornelius · 2010-05-03 09:24 · Score: 1

"What, all of us?"

--
"Flyin' in just a sweet place,
Never been known to fail..."

Re:Focus, Dammit. by Philip+K+Dickhead · 2010-05-03 09:50 · Score: 1

The sixth sheik's sixth sheep's sick.
[so, say said sentence sextuply...]

--
"Speaking the Truth in times of universal deceit is a revolutionary act." -- George Orwell
Re:Focus, Dammit. by linhares · 2010-05-03 13:40 · Score: 4, Funny

"she helped my uncle jack off a horse"
Re:Focus, Dammit. by FatdogHaiku · 2010-05-03 16:09 · Score: 1

"she helped my uncle jack off a horse"
Film at 11... ah hell, watch whenever you want.
http://www.youtube.com/watch?v=uJyrAWmpI8s

--
You have the right to remain sentient. If you give up the right to remain sentient, you will be elected to public office
Re:Focus, Dammit. by Anonymous Coward · 2010-05-04 12:01 · Score: 0

ok so the use of a lower-case "j" was ... intentional?

Conlangs by izomiac · 2010-05-03 09:24 · Score: 1

I've wondered why we can't meet computers half-way. Just design a constructed language that avoids the unsolvable problems. If operating computers by speech is truly better then learning the language would be akin to learning to type.

OTOH, if it's an attempt to simplify computing for those who don't wish to learn, well, that's an impossible task. The problem lies in the fact that such people don't give explicit commands, and even humans take quite a bit of intuition to figure out what they're implying.

Re:Conlangs by icebraining · 2010-05-03 11:40 · Score: 1

But what about dictating emails or other documents? It's not only about controlling the PC.

--
Dilbert RSS feed

Time flies like an arrow fruit flies like a banana by GuyFawkes · 2010-05-03 09:24 · Score: 2, Insightful

Having said that, Dragon works fairly well, provided you modulate your speech.

If you want a laugh with Dragon, turn away from the screen and talk normally, then look at what it has transcribed..

--
http://slashdot.org/~GuyFawkes/journal

Training by dominious · 2010-05-03 09:26 · Score: 1

Speech recognition requires training because it relies on machine learning algorithms. Nobody has time to train their computer. I mean, even us humans need 2-3 years of such "training" in order to start recognizing words.

Speech recognition is higher intelligence by gurps_npc · 2010-05-03 09:26 · Score: 1

Speech recognition is a form of higher intelligence.

Intelligence is basically composed of pattern recognition, with two general categories. One) Specific pattern recognition is logic, math, etc. It requires incredibly exact matches. Yes or no. 1.0, not 1.00001. Computers are very very good at that.

Two) General pattern recognition is creativity, art appreciation, and our capacity to invent. It requires people to ignore a ton of irrelevant data and instead focus on only one aspect of identity, recognizing it despite the large amounts of irrelevant data. That tree kind of looks like a face, that falling object is like all other falling objects. Computers have always been very very BAD at this. Humans do it much much better than animals, but even a monkey is better at general pattern recognition than a computer is.

I am sure that we can make computers slightly better at speech recognition - enough to recognize all of a limited set of command words like print, attach, email, open, run. Individual programs would have to include codes for their names and specific commands. But I think it will take a true Artificial Intelligence to recognize speech as well as a human. In fact, I would make that my Turing Test. I would also add that I don't think an intelligence built using current theory could become a true Artificial Intelligence. We would need to design a computer that is a non-deterministic device, one that does not rely solely on pure mathematical logic, but is itself based on an entirely new design. No I can't describe it - because if I could I would build one and be rich.

--
excitingthingstodo.blogspot.com

Re:Speech recognition is higher intelligence by icebraining · 2010-05-03 11:43 · Score: 1

Don't confuse "speech recognition" with semantic analysis. If you want to dictate a document, the computer doesn't need to understand the meaning of a phrase, only identify the words. It's a much easier job.

--
Dilbert RSS feed
Re:Speech recognition is higher intelligence by bar-agent · 2010-05-03 12:16 · Score: 1

Don't confuse "speech recognition" with semantic analysis. If you want to dictate a document, the computer doesn't need to understand the meaning of a phrase, only identify the words. It's a much easier job.
You'd think, but no. The computer needs to know what word and phrase set to use. You can tell it that, but then, when writing, you commonly shift from, say, a medical vocabulary to a more legal vocabulary, or you start using phrases associated with job performance, or leading up to a favor, or whatever — the point is, the computer needs to realize that you are dictating a different kind of thing now. It needs semantic info to know when you are exceeding the boundaries of one linguistic domain and moving into another.
Even aside from that, to do a really good job of dictation, the computer needs to put paragraphs, sentences, commas, and other punctuation in the right place. It needs semantic information for that, as well as an understanding of the guidelines you'd find in a style guide.

--
i'd hit it so hard, if you pulled me out you'd be the king of britain [bash.org]

Since I don't have a flying car today, all is lost by liquiddark · 2010-05-03 09:26 · Score: 4, Insightful

Futurists should really learn what the word "plateau" means. The death of any given technical progression, particularly one that deals with information processing, tends to be announced early and often, right up to the point where progress becomes meaningful again and then all of a sudden everyone saw it coming, and oh by the way where's my flying car?

Sssssh. by Allnighterking · 2010-05-03 09:27 · Score: 1

Don't tell the people actually doing it. They don't know that the author of this piece says it won't work. So they keep making it work. We don't want to upset them. Ssssh.
Speech recognition and translation is becoming a highly effective and proficient tool for the US military. You see it fit's in your iPod... and ... well translates. info here Kinda puts the knosh on this article. Speech recognition as a part of translation is a new application of the tech that is growing by leaps and bounds. 10 years ago we had to do text to text translation, now it's speech to voice. Then you have companies like Voxify,TuVox and others replacing routine call center calls with realistic voice recognition. Far from being a dead animal. It has moved from the realm of fantasy to the realm of direct application.

--

I'm sorry, I'm to tired to be witty at the moment so this message will have to do.

is there any evidence for this analysis? by Trepidity · 2010-05-03 09:27 · Score: 3, Insightful

I see a lot of claims, but not much evidence. If we're going to use perceptions and anecdotes as evidence, my impression is that speech recognition has always been considered vaguely stalled. In 2000, people didn't think much progress had been made since 1991 besides some commercialization of stuff academia already knew how to do. In 2010, this guy doesn't think much progress has been made since 2001 besides some commercialization of stuff academia already knew how to do. Yet I think some progress has been made over the past 20 years. There just haven't been any breakthroughs, which is maybe what he's expecting, given his vague suggestion that "AI", a pretty vague concept, is our hope.

I'm also skeptical that accuracy has flatlined, though it's possible that's true in some areas. My impression is that multi-speaker recognition, use of large corpora to improve accuracy, and use of language modeling to improve accuracy, have all improved over the past 10 years. Of course, not all improvements go everywhere: the speech recognition running in real-time on a mobile ARM processor is not using every possible state-of-the-art technique. The advance there is that you can run speech recognition in real-time on a mobile ARM processor at all, and get performance that was once only possible on pretty hefty workstations.

--
10 PRINT CHR$(205.5+RND(1)); : GOTO 10

Re:is there any evidence for this analysis? by RobDude · 2010-05-03 10:32 · Score: 1

I think there is a minimum success rate that you need to hit for voice recognition to be usable, but frustrating. As the technology has gotten better, we've seen more and more places trying to use it in more and more ways; and those new ways are more demanding and result in just barely reaching that minimum success rate.
The end result is that most users still feel frustrated.
But back in the day, you were frustrated while trying to get the VR software to understand your account number (limited to only digits). Now, you are frustrated because you called Goog-411 and it is having trouble understanding your City, State and business name.
Could be wrong, but that's just my take on it.
Re:is there any evidence for this analysis? by Anonymous Coward · 2010-05-03 11:46 · Score: 0

You are right, it hasn't flatlined. Look at the implementation Microsoft uses in Win7: it generally doesn't require a full vocabulary. It can use the phonetics of words that are not in the vocabulary to "sound out" and type what it thinks you said.
It can also determine, mid-speech, the difference between commands and dictation.
Speech recognition typically only does one or the other (look up the development history of the original Dragon software). To say that modern systems have flatlined is a fallacy. Most users "just give it a go" and don't train it, which is still a requirement, then they wonder why it performs at less than 99%.
Personally I use it for any long articles that need to be typed. 1 mistake every 100-200 words is very acceptable; I misspell more than that myself while typing.

No it doesn't by Colin+Smith · 2010-05-03 09:27 · Score: 2, Interesting

It works great for small vocabularies on your cell phone

No. It doesn't.

It works great for small vocabularies on your cell phone if you happen to live in the same neighbourhood as the developer where "everyone talks this way". For the rest of the world, attempting to talk with a nasal American twang in order to get the phone to understand you, is shit.

--
Deleted

Re:No it doesn't by Anonymous Coward · 2010-05-03 09:46 · Score: 0

Whut are you tawking abowt boy, I mite just have ta kick yer ass.
Re:No it doesn't by icebraining · 2010-05-03 11:47 · Score: 1

Not really; My Nokia E65 recognizes Portuguese names said in Portuguese accent, just from reading the contact list, and rarely fails.

--
Dilbert RSS feed
Re:No it doesn't by Orestesx · 2010-05-03 11:56 · Score: 1

You don't gotta insult the accent to make a point about the software. There are two things I don't like, people who are intolerant of other cultures, and stuck-up Brits.
Re:No it doesn't by Daengbo · 2010-05-03 13:25 · Score: 1

Your phone was probably made in China. The software was probably outsourced to India. Only the profit stayed in the U.S., and that just went to the top people in the company.

--
Put identity in the browser.
Re:No it doesn't by codecore · 2010-05-03 15:30 · Score: 1

Then I thankfully live down the street from the developer. Do I have a nasal accent? Oh well, at least my phone understands me.

Blame startrek by onyxruby · 2010-05-03 09:28 · Score: 4, Insightful

Blame Startrek for making it look flawless. Speech recognition is just like fusion technology, 20 years away from properly working - just like it has been for the last 20 years.

-RANT- I can't stand voice recognition systems that don't at least give you an option to press a number. Especially when they are out of tune and pick up background noises as voice. Please, please, please - always give the option to press a number instead of having to voice everything!!

medical dictation - no go by Anonymous Coward · 2010-05-03 09:28 · Score: 1, Interesting

The radiology voice dictation transcription system at my former employer was horrible. Having to read the dictated reports was equally appalling considering there was a radiologist signing off on their accuracy, and they were certainly not completely accurate. The irony is that the things the system frequently had trouble with were simple words like "not" and recognizing quantities appropriately, whereas more complicated things such as "gastroschisis" would be dictated correctly.

I never understood it, but since I was not the radiologist, I didn't care either. I mostly was entertained by listening to them repeat the same stupid, simple word over and over trying to get the dictation system to behave, when it would have taken a fraction of the time to manually edit the document with a keyboard.

Re:medical dictation - no go by N1ck0 · 2010-05-03 10:11 · Score: 1

Real-time is okay, which is mainly why most of the big Dragon Medical-based systems run back-end speech rec, which runs at about a 1:2 or 1:3 ratio (dictation time to CPU time). The results are then QAed by medical transcriptionists for medical accuracy. On back-end systems you can also look at completed and corrected historical information, look at the contextual information of the entire document, etc. That's not to say that the real-time system is bad; it's just more limited in what it can look at on the local PC (it cannot reprocess models off hours, it cannot store gigs of historical data, etc.).
Of course you also tend to find a lot of systems that are using engines 4-5 years out of date too (which does tend to impact things).
Re:medical dictation - no go by Anonymous Coward · 2010-05-03 10:27 · Score: 0

Yeah, that is the thing, having a transcriptionist, or a technologist re-read everything defeats the purpose of the dictation system, IMO.

Re:Time flies like an arrow fruit flies like a ban by SomeJoel · 2010-05-03 09:28 · Score: 1

The eighties were like half as groovy as the seventies, but twice as cool as the nineties.

--
<Complete your profile by adding a signature!>

yale-in-ox-boom-i-crows-off by richdun · 2010-05-03 09:28 · Score: 1

Yay Linux! Boo Microsoft!

I win! Give me all your speech recognition monies.

Wait, what do you mean you don't believe I'm an AI? ... er, I mean ... Wait, what do you mean you do not believe I am an Artificial Intelligence?

Re:yale-in-ox-boom-i-crows-off by Anonymous Coward · 2010-05-03 10:17 · Score: 0

Hey, is there a Butts here? Seymour Butts? Hey, everybody, I wanna Seymour Butts!

Badger badgers badger Badger badgers by tepples · 2010-05-03 09:30 · Score: 1

Buffalo buffalo

Likewise, Badger badgers Badger badgers badger, badger Badger badgers. (UW taxideans harassed by UW taxideans harass other UW taxideans.) Oh, and mushroom mushroom.

Re:Badger badgers badger Badger badgers by Anonymous Coward · 2010-05-03 09:47 · Score: 5, Funny

snaaaaaaake!
Re:Badger badgers badger Badger badgers by Frnknstn · 2010-05-03 10:02 · Score: 1

Not likewise: "badger" != "badgers"

--
If it's in you sig, it's in your post.
Re:Badger badgers badger Badger badgers by tepples · 2010-05-03 12:24 · Score: 1

At least in the first couple measures of the song, the hi-hats are played in such a way as to mask whether the voice says "badger" or "badgers".
Re:Badger badgers badger Badger badgers by mogness · 2010-05-03 14:20 · Score: 0

mushroom mushroom

--
that's teh shizzle bizzle
Re:Badger badgers badger Badger badgers by Bu11etmagnet · 2010-05-03 18:22 · Score: 1

Badgers? We don't need no steenking badgers!

--
Life is complex, with real and imaginary parts.

Comment removed by account_deleted · 2010-05-03 09:30 · Score: 5, Funny

Comment removed based on user account deletion

IBM? by Darth+Snowshoe · 2010-05-03 09:32 · Score: 2, Funny

Didn't IBM a few years ago announce a big five-year-program to crack speech recognition? Whatever came of that?

Re:IBM? by PalmKiller · 2010-05-03 09:51 · Score: 1

They used it to make a better chess playing AI instead.
Re:IBM? by N1ck0 · 2010-05-03 09:55 · Score: 5, Interesting

IBM closed many of their speech research offices 1-2 years ago and transferred most of the research/data to Nuance's Dragon Naturally Speaking research.
Full Disclosure: I work for Nuance
Re:IBM? by Anonymous Coward · 2010-05-03 10:06 · Score: 0

They determined that computers are best able to process speech when the background noise consisted mostly of sounds of the Caribbean surf, with occasional chatter from hookers doing blow.
I understand research is ongoing.
Re:IBM? by Anonymous Coward · 2010-05-03 17:59 · Score: 0

Are you here at the NRC? This post is quite the joke around here.
Re:IBM? by giuda · 2010-05-06 20:21 · Score: 1

I'd like to know how this ends :D

Tea, Earl Grey, Hot by tokki · 2010-05-03 09:33 · Score: 5, Funny

How hard is it for a computer to understand the sentence: "Tea, Earl Grey, Hot"? That takes care of 90% of the use case scenarios right there. "Computer, initiate auto-destruct sequence" is the next 8%.

Re:Tea, Earl Grey, Hot by Corporate+Drone · 2010-05-03 10:37 · Score: 1

"Tee, or I'll gray out!". Yep, not at all difficult to parse...

--
mmm... yeah... You see, we're putting the cover sheets on all TPS reports now before they go out...
Re:Tea, Earl Grey, Hot by noidentity · 2010-05-03 11:09 · Score: 2, Funny

Who is Earl Grey, and why do you want him hot? And stop calling me Tea!
Re:Tea, Earl Grey, Hot by maxwells_deamon · 2010-05-03 11:09 · Score: 2, Interesting

By definition the second phrase eliminates any remaining use cases after the countdown finishes.
Re:Tea, Earl Grey, Hot by Anonymous Coward · 2010-05-03 11:38 · Score: 0

"Silent Countdown" accounts for another .5%
Re:Tea, Earl Grey, Hot by Anonymous Coward · 2010-05-03 13:32 · Score: 1, Funny

"This tea is grey!"
"Yes, Captain, just as you ordered."
"I most certainly did not order this!"
"Sir, I had Earl make a tea as you requested. Hot and grey."
"Why would I want that?"
"Why should I know. You said 'Tea, Earl; Grey, Hot.'"
Or, worse:
Earl with tea. Mrs. Grey looking hot.
Re:Tea, Earl Grey, Hot by martin-boundary · 2010-05-03 14:10 · Score: 3, Funny

Here I am, brain the size of a planet, and they ask me to make tea for you. Call that job satisfaction, 'cause I don't.

Shout-outs to two idiots by Foobar_ · 2010-05-03 09:33 · Score: 5, Insightful

This blog post is retarded. The author is correlating a drop in internet news articles about Dragon NaturallySpeaking with a flatlining of speech recognition accuracy rate.

The Slashdot editor Soulskill is retarded for both not realizing this and for not reading the anonymously-submitted blog post (hmm no way it could have been the author) before approving it for the Slashdot front page. The guy is just out for more traffic to his rather pointless tech news commentary blog.

Decline of Slashdot, internet signal-to-noise ratio, get off my lawn, etc.

Try this one... by Aut0mated · 2010-05-03 09:34 · Score: 0, Offtopic

Alpha Kenny 1

Ken Lee!!! by CityZen · 2010-05-03 09:34 · Score: 1

And there's this nice meme from a couple years ago:
http://www.google.com/search?q=ken+lee
http://www.youtube.com/watch?v=_RgL2MKfWTo

no, it doesn't work on cell phones, either by swschrad · 2010-05-03 09:35 · Score: 1

this is the reason that millions of americans are faster with the thumb than Buddy Rich with the drumsticks... you can't see the finger move as they type 30 zeroes in a row to escape the mumblebots.

--
if this is supposed to be a new economy, how come they still want my old fashioned money?

Not free by em0te · 2010-05-03 09:36 · Score: 0, Offtopic

They gave this information to the "public" by handing it over to the LCD? It costs $150 to obtain a non-commercial license from LCD. This is ridiculous, but I guess money is the best way to control information.

Data Input by fermion · 2010-05-03 09:37 · Score: 1

Automated data input is always tricky. Basically the options are: type it on a keyboard, use voice recognition software, or dictate and pay someone to type it into a computer. When people talk about voice recognition they think it is competing against typing it in yourself, but mostly it is competing against paying someone else to type it in.

My understanding, from the people that use Dragon, is that it competes well against paying someone else to type. First, it is a couple of orders of magnitude cheaper. Second, if you pay someone to type, you still have to read and edit, and Dragon is accurate enough. Of course you have to train yourself to use the technology, but that is the same with any technology. It is naive to think that we don't make subtle and not so subtle changes in ourselves so that we can benefit from the technology.

I think speech recognition is going to expand in the future. Beyond the dictation process, there are also simple commands. I don't use the voice controls on the iPhone, but it seems to be something that people like. I have used the voice controls on my Mac. Furthermore, I can certainly imagine a time when my fingers are not so limber that I might depend on something like Dragon.

I don't see the technology so commoditized that MS includes it in the 2015 version of MS Office, but I do believe there is always room for improvement.

--
"She's a scientist and a lesbian. She's not going to let it slide." Orphan Black

Dear Aunt, by IorDMUX · 2010-05-03 09:37 · Score: 1

Any discussion of the history of speech recognition is incomplete without a reference to Microsoft's famous Windows Vista "double the killer delete select all" botch-up: http://www.youtube.com/watch?v=klU2zt1KdUY

--
>> Standing on head makes smile of frown, but rest of face also upside down.

Re:Dear Aunt, by Joce640k · 2010-05-03 23:02 · Score: 1

Can't wait for people to start controlling power stations with one of those...

--
No sig today...

Time flies by tepples · 2010-05-03 09:37 · Score: 1

"Time flies like an arrow; fruit flies like a banana."

Is a time fly an archer or a DDR player?

Crap, crap, crap into the toilet bowl by tepples · 2010-05-03 09:39 · Score: 1

PS1 music game Parappa the Rapper turns "There's a bathroom on the right" into a rap song.

Forget speech recognition.... by puppetman · 2010-05-03 09:39 · Score: 2, Funny

I'd settle for a grammar checker. From the fine summary:

"Even where data are lush"

A good one would have saved this summary from sounding stupid.

Re:Forget speech recognition.... by jaavaaguru · 2010-05-03 10:41 · Score: 2, Informative

There is nothing wrong with that phrase.

--
Follow me
Re:Forget speech recognition.... by Pfhorrest · 2010-05-03 10:42 · Score: 3, Insightful

The word "data" is a plural countable noun. "Datum" is the singular form thereof. Plural countable nouns take the copula "are". Singular countable nouns take the copula "is". The sentence you quoted was thus grammatically correct: a datum "is", but data "are".

Though I admit, the treatment of "data" as a mass noun (the likes of which take the copula "is" as well) is common enough that it did sound jarring to my own ear, even knowing it was technically correct.

--
-Forrest Cameranesi, Geek of all Trades
"I am Sam. Sam I am. I do not like trolls, flames, or spam."
Re:Forget speech recognition.... by Anonymous Coward · 2010-05-03 11:13 · Score: 0

I'd settle for a grammar checker. From the fine summary:
"Even where data are lush"
A good one would have saved this summary from sounding stupid.
What's wrong with that grammar? It seems you want a snobby writing checker, not a grammar checker.
Re:Forget speech recognition.... by kindbud · 2010-05-03 11:33 · Score: 4, Informative

The word "data" pluralizes "datum." "Data are lush" correctly pluralizes the singular form of the sentence.
Now who sounds stupid?

--
Edith Keeler Must Die
Re:Forget speech recognition.... by Anonymous Coward · 2010-05-03 11:49 · Score: 0

Umm.... data is plural. Datum is singular.
Re:Forget speech recognition.... by General+Wesc · 2010-05-03 13:03 · Score: 1

That's grammatically correct.
Re:Forget speech recognition.... by normaldotcom · 2010-05-03 13:09 · Score: 1

data[dey-tuh, dat-uh, dah-tuh] – noun: a pl. of datum.
Re:Forget speech recognition.... by Anonymous Coward · 2010-05-03 16:13 · Score: 0

I prefer being non-jarring to being technically correct.

Did you dictate your post? by Anonymous Coward · 2010-05-03 09:41 · Score: 0

"Ssssh"? "it fit's in your iPod"? "puts the knosh on this article"? "Far from being a dead animal. It has moved"?

Apparently your speech recognition software still needs a bit more R&D. In case you can correct it for the future, it should probably be "Shhhh", "it fits in your iPod", "puts the kibosh on this article", and "Far from being a dead animal, it has moved".

dom

Wrong problem by slasho81 · 2010-05-03 09:45 · Score: 1, Interesting

There won't be any meaningful development in speech recognition (or machine translation) until context is taken seriously. Context is an inseparable part of speech.
Right now the problem being solved is audio->text. This is the wrong problem, and why the results are so lame. The real problem is audio+context->text+new context. This takes some pretty intelligent computing and not the same old probabilistic approaches.

Re:Wrong problem by Anonymous Coward · 2010-05-03 15:55 · Score: 0

Good idea. However, why not start by picking the low hanging fruit? You can get spatial context for free (GPS). You can get the sex and age of the speaker almost for free. Those two variables should help a lot.
So, for example, if the location is an office building and the speaker is a woman aged 25 to 65, the sentence "Ask for *ndy." probably refers to a man named Andy. If the location is a shopping district and the speaker is a child age 6-15, the sentence "Ask for *ndy." probably refers to a wish for someone to ask someone to buy candy.
Re:Wrong problem by Anonymous Coward · 2010-05-03 20:57 · Score: 0

Right now the problem being solved is audio->text. This is the wrong problem, and why the results are so lame. The real problem is audio+context->text+new context. This takes some pretty intelligent computing and not the same old probabilistic approaches.
I thought convolutional codes (in telecommunications) take context into account. It seems like a good starting point for generalization of context-sensitive decoding. First line of decoding would be phoneme extraction, which is fairly accomplished so far, second line: matching long strings of phoneme-substitute symbols against known phrases and combinations, and finally: Bayesian selection of candidate phrases. New context is hard to infer, but I guess some tricks could be used, similar to humans who are only pretending to listen (keywords, tone).
Most importantly, AI should ask for clarifications if ambiguity is too high.
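A minimal sketch of the last two steps described above: Bayesian selection among candidate phrases, then asking for clarification when the ambiguity is too high. The likelihood and prior numbers, the 0.7 threshold, and the name choose_phrase are all made up for illustration; nothing here comes from a real recognizer.

```python
def choose_phrase(candidates, min_confidence=0.7):
    """Pick the most probable phrase, or None if the result is too ambiguous.

    candidates maps phrase -> (acoustic_likelihood, prior), where the
    likelihood stands in for P(audio | phrase) and the prior for P(phrase).
    Returns (phrase_or_None, posterior_of_best_candidate).
    """
    # Bayes' rule: posterior is proportional to likelihood * prior.
    joint = {phrase: lik * prior for phrase, (lik, prior) in candidates.items()}
    evidence = sum(joint.values())
    if evidence == 0:
        return None, 0.0
    posterior = {phrase: value / evidence for phrase, value in joint.items()}
    best = max(posterior, key=posterior.get)
    if posterior[best] < min_confidence:
        # Too ambiguous: the caller should ask the speaker to clarify.
        return None, posterior[best]
    return best, posterior[best]


if __name__ == "__main__":
    # The acoustics slightly favor the wrong phrase, but the prior corrects it.
    clear = {
        "recognize speech": (0.30, 0.020),
        "wreck a nice beach": (0.35, 0.001),
    }
    # Equal acoustics and equal priors: no basis for choosing.
    ambiguous = {
        "ask for andy": (0.5, 0.01),
        "ask for candy": (0.5, 0.01),
    }
    for name, cands in (("clear", clear), ("ambiguous", ambiguous)):
        phrase, confidence = choose_phrase(cands)
        print(name, phrase or "please repeat that", round(confidence, 3))
```

Returning None instead of a best guess is the design choice that matters here: the caller can then prompt for a repeat rather than commit to a coin flip.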
Re:Wrong problem by Anonymous Coward · 2010-05-04 04:37 · Score: 0

But does it have a bunch of candidate subjects that it tries to assign the speech to?
(Such as business transaction, biology lab discussion, resume writing, everyday conversation about what to have for dinner.)
Human thinking is both structured and unstructured/pattern detecting. We might have to model both those modes in order to create AI that is anywhere near human-level.

The sixth sheik's sixth sheep's sick. by Anonymous Coward · 2010-05-03 09:48 · Score: 0

Somehow Slashdot chose an apt fortune: "The sixth sheik's sixth sheep's sick." Let me know how your speech recognition software does on that sentence!

dom

Maybe we just need to speak binary by mwheeler · 2010-05-03 09:50 · Score: 1

Maybe we just need to speak binary.

Totally Not Dead Yet by RingDev · 2010-05-03 09:50 · Score: 4, Interesting

A few years back I worked for an awesome company that did IVR (interactive voice response) systems.

We had voice-driven interactive systems that would provide the caller with a variety of different mental health tests (we worked a lot with identifying depression, early onset dementia, Alzheimer's, and other cognitive issues).

The voice recognition wasn't perfect, but we had a review system that dealt with a "gold standard". I wrote a tool that would allow a human being to identify individual words and to label them. Then we would run a number of different voice recognition systems against the same audio chunk and compare their output to the human version. It effectively allowed us to unit test our changes to the voice recognition software.
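A comparison like that is commonly scored as word error rate (WER): the word-level edit distance between the recognizer's output and the human transcript, divided by the length of the human transcript. The sketch below is only an illustration of that idea, not the tool described above; the engine names and sample transcripts are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # prev[j] = edit distance between the reference words processed so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # reference word dropped
                            curr[j - 1] + 1,      # extra hypothesis word
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def rank_recognizers(gold: str, outputs: dict) -> list:
    """Return (engine_name, WER) pairs, best engine first."""
    scored = [(name, word_error_rate(gold, text)) for name, text in outputs.items()]
    return sorted(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    gold = "one a two b three c"          # human-labeled transcript
    outputs = {                           # hypothetical engine outputs
        "engine_a": "one a two b three c",
        "engine_b": "one eight two b three",
    }
    for name, wer in rank_recognizers(gold, outputs):
        print(f"{name}: WER = {wer:.2f}")
```

Rerunning the same labeled audio after every configuration change gives a regression check in the unit-test sense: a tweak that fixes fast speech but raises the WER on slow sentences shows up immediately.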

Dialing in a voice recognition system is an amazing process. The number of properties, dictionaries, scripting options, and sentence-forming engines is mind-blowing.

Two of the hardest tests for our system were things like: Count from 1 to 20 alternating between numbers and letters as fast as you can, for example 1-A-2-B-3-C. And list every animal you can think of.

The 1-A-2-B was killer because when people speak quickly, their words merge. You literally start creating the sound of the A while the end of the 1 is still coming. It makes it extremely difficult to identify word breaks and actual words. And if you dial in a system specifically to parse that, you'll wind up with issues parsing slower sentences.

The all-animals question had a similar issue: people would slur their words together, and the dictionary was huge. It was even more challenging when one of the studies was nationwide. We had to deal with phonetic spellings from northeast coast and southern states accents. What was even worse was that there were no sentences. We couldn't count on predictive dictionary work to identify the most likely word out of those that would match the phonetics.

That said, getting voice recognition to work on pre-scripted commands and sentences was pretty easy.

And I can only imagine the process has been improving in the years since. Although we were looking into SMS based options, not for a dislike of IVR, but because our usage studies with children were showing most of them were skipping the voice system and using the key pad anyway. So why bother with IVR if the study's target demographic was the youth.

-Rick

--
"Most people in the U.S. wouldn't know they live in a tyrannical state if it walked up and grabbed their junk." - MyFirs

Re:Totally Not Dead Yet by bar-agent · 2010-05-03 12:05 · Score: 1

Count from 1 to 20 alternating between numbers and letters as fast as you can, for example 1-A-2-B-3-C.
I stall out at G. I have to start running through the whole alphabet to get to the next letter.

--
i'd hit it so hard, if you pulled me out you'd be the king of britain [bash.org]
Re:Totally Not Dead Yet by Anonymous Coward · 2010-05-03 19:14 · Score: 0

So THAT's what all these IVR (interactive voice recognition) automated attendants are doing. They are trying to figure out what our mental state is before we get to the technical support person.
Well, I'll tell you what my mental state is by the time I get to the technical support person... I'm DAMN MAD!
There. No more need for IVR automated attendants.

Re:Time flies like an arrow fruit flies like a ban by Anonymous Coward · 2010-05-03 09:54 · Score: 0

Once we get out of the eighties, the nineties are gonna make the sixties look like the fifties.

Best example: Google text captions. by Ancient_Hacker · 2010-05-03 09:55 · Score: 1

When you have a minute, go to YouTube and bring up an old Star Trek episode (not the CBS ones with very loud commercials).

Then turn on Google captions. More fun than a barrel of Rigelian monkeys!

About every third sentence gets a close or exact rendering, but oh, the other two! I should sue them for laugh-muscle strains.

Watermelon Box by NReitzel · 2010-05-03 09:56 · Score: 4, Insightful

Long ago - decades, before Bill Gates was invented, a lot of research went into what would be required for actual voice recognition.

A counterexample was given, about an engineering marvel (of the time) that would recognise when someone said the word "watermelon". For a long time, people in the industry assumed that the path to voice recognition consisted of building more and better watermelon boxes.

Several authors, including Alan Turing himself, argued that actual voice recognition could never be accomplished with a large array of watermelon boxes. Current VR software divides input into a series of hyperplanes, and attempts to build a best match from the classification tree.

This is the 2010 version of the watermelon box.

Real voice recognition won't be practical until the input is parsed, matched against context, and structured much akin to diagramming a sentence in those old English (or other) classes. In short, matching against a vocabulary is trying to solve an exponential problem with a (large) polynomial engine.

It won't be until the computer actually understands what is said that VR is likely to be practical in a global sense.

As a person who has been building computer systems for 35 years, it bothers me to see a huge body of research done into subjects like these ignored, because someone thinks that none of it applies to PC's.

--

Don't take life too seriously; it isn't permanent.

Re:Watermelon Box by Anonymous Coward · 2010-05-03 11:19 · Score: 0

What the fuck is a watermelon box?
Re:Watermelon Box by frank_adrian314159 · 2010-05-03 11:27 · Score: 2, Informative

People were doing symbolic context recognition in the 60's-80's (look up frames). This went out of vogue with the use of neural nets and statistical recognition in the late 80's, and that continues to this day. The problem is that getting better now probably needs new probabilistic models for symbolic context recognition, feeding up from statistical recognition of phonemes and words, feeding forward to later phrases being parsed. This would require either two teams, or one team with expertise in both areas. And, in the past, the symbolists fought the statisticians like dogs fought cats. The bottom line is that (a) we can do better, but (b) it will be more expensive to fund, and (c) it requires academics to admit that their deep specialization in a given area does not provide the entire solution. Plus, funders like NSF, DoD, etc. are not often interested in funding large integrated projects, preferring smaller, focused projects to reduce risk and to spread research funds around more broadly. As such, I predict the level of this technology to be stalled for an indefinite time (or until someone else does it).

--
That is all.
Re:Watermelon Box by MadUndergrad · 2010-05-05 18:30 · Score: 1

This.

its getting better but by luther349 · 2010-05-03 09:57 · Score: 1, Interesting

Speech software has been evolving at a steady pace. But the issue isn't that; it's the fact that 90% of the users out there don't use it. If you live in a loud place with kids or other noise, it will not work well. Windows 7 has built-in speech software, and how many people use it? I played with the latest Dragon speech software and I gotta admit it's very good even without training it. I did emails with it without any issue. But as I said, speech software is more a toy than anything useful. As people said, it will probably have a good use on a cell phone rather than on a PC, since it would be an easier way to chat than using the cell phone's keypad.

Why should anyone care? by Anonymous Coward · 2010-05-03 09:57 · Score: 0

Most people won't benefit from speech recognition software in any manner that is critical, or might automate the mundane to the point that their lives might yield great benefit to mankind overall. If there's anyone out there, aside from the physically handicapped, who thinks they need speech recognition software to perform any task that isn't repetitive and is truly important for the greater good, I assert that it would be better for all if they had proteges who could learn from them and not machines that facilitate isolation.

There is also the problem of meaningful work from those who might serve as assistants, and automation for the sake of automation didn't do the Luddites any good, albeit notwithstanding the motivation to rebel against already cruel and inhumane conditions of employment.

Cod am pizza ship by Trogre · 2010-05-03 10:01 · Score: 2, Funny

Obligatory UF

--
"Nine times out of ten, starting a fire is not the best way to solve the problem." - my wife

Correction... by bartwol · 2010-05-03 10:03 · Score: 1

One estimate puts the number of possible sentences at 10^570.

Errr...I believe the precise number would be 10^571.

Rest in Peas by CODiNE · 2010-05-03 10:05 · Score: 2, Interesting

I know it's just an imaginary example of how bad speech-to-text is... but it is realistic and disappointing.

Even an idiot like me knows what a Markov chain is. Perhaps the standard voice apps are so entrenched they're not recoding their apps to take advantage of huge leaps in memory capacity compared to when they first started selling.

--
Cwm, fjord-bank glyphs vext quiz

Where are the data? by hebcal · 2010-05-03 10:09 · Score: 1

The article doesn't really make the case. There are two interesting charts, and one is BS (measuring Google News hits for Dragon). He is trying to draw a deep result from the fact that the NIST data he cites ends in 2002. What happened in the last eight years? Lots of arm-waving in the article, but no hard data.

Re:Where are the data? by Anonymous Coward · 2010-05-03 10:55 · Score: 0

The speech group at NIST still does speech eval workshops, and coordinates evaluations for programs run by other agencies (DARPA, NSA, probably others).
There's plenty of research going into TTS/ASR systems, across languages and speech genres. The guy who wrote the article doesn't know crap about what he's talking about; I suspect he's just out to get pageviews.

I'd like to say I knew... by interval1066 · 2010-05-03 10:14 · Score: 1

...this would turn out to be the case. I should have published a book on how little this would work. But I did have my doubts, way back in the 90's. It came down to a simple question for me; could a speech recognizer ever "get" irony. I came to the conclusion that it would be very difficult at that time. Guess I was right.

--
Python: 'And then suddenly you have a language which says "we're all stuck with whatever the whiniest coder wants".'

Nanotechnology next? by mangu · 2010-05-03 10:25 · Score: 1

I certainly hope they have perfect voice recognition systems before they have the perfect nanoassembler.

Otherwise we will start seeing a lot of eleven inch pianists everywhere.

Yes, No, Yes. by fahrbot-bot · 2010-05-03 10:26 · Score: 1

I don't know how anyone else feels about this, but I wish the use of voice recognition in company phone systems would die. Seriously, please just let me press 1 for Yes and 2 for No. And stop programming them to be conversational with phrases like, "Let me see if I have this right..."

--
It must have been something you assimilated. . . .

Re:Yes, No, Yes. by Voyager529 · 2010-05-04 02:42 · Score: 1

A few thoughts...
1.) www.gethuman.com - type in a company name, and follow the instruction to get an actual person.
2.) I feel that button-pressing and voice recognition can and should coexist. I remember having an issue with my old iPhone where the screen didn't recognize button presses properly for 4, 5, and 6, so I saved myself many a headache by being able to speak. Just in general, there are a multitude of cases where button pressing isn't practical.
2b.) Voice recognition would work better with a few minor tweaks. It'd have to be more natural, since menu selections with voice really aren't much simpler than repeating the menu selection. If voice systems are to yield an advantage over "press 1" menus, they have to be able to accurately parse whatever the user says. Also, each system seems to be different in when it will allow you to speak - some systems don't mind if you "interrupt" the recording, while others will completely ignore you until they decide they're ready to listen to you. This needs to either be made consistent, or stated at the beginning.
3.) I think the conversational phrases are annoying too, but I think that's an uncanny valley thing. The voice is technically human, but its lack of ability to react to what you say beyond what amounts to an if/then statement makes those phrases seem completely empty and bother us.
Re:Yes, No, Yes. by fahrbot-bot · 2010-05-04 05:11 · Score: 1

1.) www.gethuman.com - type in a company name...
2a.) I feel that button-pressing and voice recognition can and should coexist.
2b.) Voice recognition would work better with a few minor tweaks.
3.) I think the conversational phrases are annoying too, but ...

Good points all, though here are some follow-up / continuation comments, some of which you mentioned.
The GetHuman site is useful, but not all companies are listed and some of those that are not don't have humans waiting. For example, OptOutPrescreen is completely automated. Furthermore, it doesn't seem to recognize button-presses past a certain point. Thank GOD the identical functionality is available via the web - as mentioned by the voice system.
Along that last theme, yes they should co-exist, but many sites that support voice recognition seem to be dropping button interaction. For many things, buttons are faster. Case in point, buttons are always faster for menu selection and, almost universally, support type-ahead. With voice systems, one must usually wait for the question to be completely asked before answering, or if it supports interruption, repeat your answer - or repeat it anyway whenever it doesn't understand you (rinse and repeat).
Lastly, button systems are more private. Granted one should be mindful of one's surroundings before placing a potentially sensitive call, but you can see where I'm going with this... "To access your STD test results, say HERPES."

--
It must have been something you assimilated. . . .

I beg to differ... by roc97007 · 2010-05-03 10:36 · Score: 1

Speech recognition does *not* "work great" for cell phones. Every new phone and/or new firmware upgrade, I try again to teach my phone to understand me, and each time I get embarrassed the first time I try to use it in public. The experience is similar to William H. Macy's in Wild Hogs.

"Call mother-in-law"

"Did you say, 'Hot Mothers in Slaw'?"

"Call mother in law!"

"Did you say 'my brother's my pa'?"

"Call. Mother. In. Law."

"Did you say, 'Call Hooters'?"

"What?"

"Did you say, 'What'?"

I do not know why this is so, but speech recognition does not work reliably enough to be other than a toy in any application I've ever seen. It exists for the amusement of those watching the poor sucker trying to use it. Sometimes I imagine a bunch of programmers in Taiwan laughing their asses off.

--
Oliver's law of assumed responsibility: If you're seen fixing it, you will be blamed for breaking it.

Re:I beg to differ... by mbone · 2010-05-03 23:49 · Score: 1

I assume most corporate speech recognition is intended to get people to hang up in frustration, thereby saving the cost of the call. Same thing with entering in account information. If you keep pushing zero, you will generally eventually get a person, so (if I need to talk to an actual person), that's what I do.

YouTube by MaXimillion · 2010-05-03 10:41 · Score: 2, Interesting

Considering Google is now offering automatic transcription of all YouTube videos, I'd say they certainly haven't given up on speech recognition yet.

Do other languages fare just as bad... by thewils · 2010-05-03 10:49 · Score: 4, Insightful

English, I would think, is a pretty daunting language for speech recognition, what with its substantial array of homophones, but I wonder if other languages fare better. Maybe Spanish or, say, Japanese would be better since (I'm guessing) there is a closer relation between the written script and the actual sound that it makes.

--
Once I was a four stone apology. Now I am two separate gorillas.

Re:Do other languages fare just as bad... by Anonymous Coward · 2010-05-03 14:33 · Score: 0

Actually, Japanese would probably be harder. There are fewer sounds in the speech, so accurate recognition of the sounds is probably easier, but there are many more homophones. These are represented in the written language by different Kanji (Chinese characters), but must be gleaned from context in speech. A non-intelligent speech program would have a very hard time. For example, "watashi no hana wa akai desu." means either "My nose is red." or "My flower is red". Good luck letting your software choose for you.
Re:Do other languages fare just as bad... by Anonymous Coward · 2010-05-03 15:16 · Score: 0

Japanese is an awful language for homophones. It's so bad that you'll see people on the trains drawing characters in midair to clear up their meaning.
Re:Do other languages fare just as bad... by thewils · 2010-05-03 15:20 · Score: 1

Those aren't homophones, they are homonyms. But at least with Japanese the program could pick up the kana correctly. In English the sentence could be "My nose is read" "My knows is red"...etc. In Japanese the user would probably have to pick out the correct kanji like they do with Japanese input now.

--
Once I was a four stone apology. Now I am two separate gorillas.
Re:Do other languages fare just as bad... by cogitolv · 2010-05-03 18:14 · Score: 1

you are homophobic

--
Well, sometimes you eat the bear, sometimes the bear eats you.
Re:Do other languages fare just as bad... by Archimonde · 2010-05-03 23:02 · Score: 1

Maybe they could try the Croatian (or probably, Serbian) language. Written and spoken letters/words match exactly.

--
Trolls are like broken clocks. They show the truth two times a day. The rest of the day they talk nonsense.
Re:Do other languages fare just as bad... by 2obvious4u · 2010-05-04 05:46 · Score: 1

I bet Chinese is the worst. The same sounding word can have four or more different meanings, depending on the inflection of the tones. It is so bad that even Chinese people have a hard time with the words unless they are written down or taken in a larger context.

Give peas a chance? by Anonymous Coward · 2010-05-03 10:53 · Score: 0

http://dl.dropbox.com/u/454062/IMG_0006.JPG

Philosophers, "we told you so". by cenc · 2010-05-03 11:06 · Score: 2, Insightful

I have been flamed more than a few times around here for suggesting Computer Science has not got a clue what they are doing when it comes to AI. Philosophy has been at this problem and more for the better part of the last 400+ years (more like a 1,000 years) in a serious way. The stock b.s., I get from the science fiction fan boys is that somehow natural language is a problem that can just be brute forced as if you were trying to figure out the password you forgot to your email account. Good luck with that.

By the way, language "recognition" by a computer is likely the easy part of the problem for AI researchers to crack. It is still not going to yield any real AI, just better cars and toasters.

--
Living in Chile

Re: Google Voice by colinnwn · 2010-05-03 11:09 · Score: 1

Shoot, read almost any GV translation for a good laugh. Though strangely every once in a while it gets almost every word right.

Re:Time flies like an arrow fruit flies like a ban by mathfeel · 2010-05-03 11:09 · Score: 1

Having said that, Dragon works fairly well, provided you modulate your speech.

If you want a laugh with Dragon, turn away from the screen and talk normally, then look at what it has transcribed..

For real hilarity, try chasing the dragon: http://www.southparkstudios.com/clips/155898

--
The only possible interpretation of any research whatever in the 'social sciences' is: some do, some don't

BULL! by Anonymous Coward · 2010-05-03 11:10 · Score: 0

What a pile of stinky! Speech recognition was 20x better than the best humans 10 years ago. The US Navy got the technology, and it's basically useless crap from Dragon and whoever else is still pissing around. I saw their stuff about 1993 and it was bad then, and it's still really bad now. The Berger-Liaw speech recognition system (the one the Navy got) was wicked though. The big difference between their system (with about 6 neural nodes) and the others (with maybe 10,000) is that theirs kept temporal information, while the others use a stock computer clock (oscillator). The video gives you an idea of the capabilities of this system. The US Navy got it all. I always wondered why it wasn't a common feature on computers already. So you can wring your hands and put up silly slashdot articles, but it's all a lie!

Four candles by Anonymous Coward · 2010-05-03 11:16 · Score: 0

Fork 'andles

Re:Since I don't have a flying car today, all is l by sznupi · 2010-05-03 11:32 · Score: 1

Your dear information processing seems to be the culprit with flying cars...

--
One that hath name thou can not otter

Re:Best example: Google text captions. by mfnickster · 2010-05-03 11:34 · Score: 1

Better yet, turn on captions while watching a Day Job Orchestra Trek dub!!

--
"Slow down, Cowboy! It has been 3 years, 7 months and 26 days since you last successfully posted a comment."

Hardware is the future by Anonymous Coward · 2010-05-03 11:51 · Score: 0

I predict a triumphal return of neural nets, predicated on memristors and recent advances in large scale quantum superpositions.

The time for a new paradigm is upon us.

How many times.. by SlashDev · 2010-05-03 12:00 · Score: 1

... have you misunderstood spoken words? I have, many times. Speech recognition accuracy is directly related to the quality of the microphones used to capture the speech, as well as the voice quality and articulation of the person speaking.

--

TOP DSLR Cameras Reviews of the top DSLRs

Wreck a nice beach? by Anonymous Coward · 2010-05-03 12:01 · Score: 0

It's not as easy as it sounds!

discrete voice recognition is a solved problem by brokeninside · 2010-05-03 12:06 · Score: 1

And it's been a problem that's been solved since the late nineties.

The problem giving everyone fits is continuous speech recognition which is another problem entirely. It was a sad day for most of the disability community when all the speech reg vendors abandoned their discrete speech products in favor of continuous recognition.

Speach recognition tech is broken in many ways by Theovon · 2010-05-03 12:33 · Score: 5, Informative

When I started on my Ph.D., I started out majoring in AI. One of several reasons I changed to computer architecture (CPU design, etc.) is because I just couldn't stand the broken ways that people were doing stuff. Actually computer vision stuff isn't so bad -- at least there's room for advancement. But the speech recognition state of the art is just awful. I couldn't stand the way they did much of anything in pursuit of human language understanding.

With automatic speech recognition (ASR), the first problem is the MFCCs (Mel-frequency cepstral coefficients). What they essentially do is take a Fourier transform of a Fourier transform of the data. This filters out not only amplitude but also frequency, leaving you only with the relative pattern of frequency. Think of this as analogous to taking a second derivative, where all you get is acceleration, leaving out position and velocity. You lose a LOT of information. Then once the MFCCs are computed, they're cut down to the top 13 (or so) dominant MFCCs, plus the first and second step-wise derivatives, giving you a 39D vector. Then the top N most common ones are tallied and code-booked, mapping the rest to the nearest codes, leaving you with a relatively small number of codes (maybe a few hundred).
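
A rough sketch of the last two steps described above (stacking 13 coefficients with their first and second step-wise differences into a 39D vector, then mapping each vector to its nearest code). This is only an illustration: the input frames are made-up numbers rather than real MFCCs, the two-entry codebook is hypothetical, and the function names are invented for this example.

```python
import math

def deltas(frames):
    """Step-wise difference between neighbouring frames (edges are clamped)."""
    out = []
    last = len(frames) - 1
    for t in range(len(frames)):
        prev = frames[max(t - 1, 0)]
        nxt = frames[min(t + 1, last)]
        out.append([(n - p) / 2.0 for n, p in zip(nxt, prev)])
    return out

def stack_features(cepstra):
    """13 static values + first + second differences -> 39 values per frame."""
    d1 = deltas(cepstra)
    d2 = deltas(d1)
    return [c + a + b for c, a, b in zip(cepstra, d1, d2)]

def nearest_code(vector, codebook):
    """Index of the codebook entry closest to `vector` (Euclidean distance)."""
    return min(range(len(codebook)), key=lambda i: math.dist(vector, codebook[i]))

# Toy input: 4 frames of 13 made-up "cepstral" values (not real audio).
cepstra = [[float(t * k) for k in range(13)] for t in range(4)]
features = stack_features(cepstra)

# Hypothetical 2-entry codebook; a real one would have a few hundred entries.
codebook = [features[0], features[3]]
codes = [nearest_code(f, codebook) for f in features]
```

After this step every frame is reduced to a single integer code, which is the point of the complaint: whatever the 39 numbers did not capture is gone before the statistical model ever sees it.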

So to start with, the signal processing is half deaf, throwing away most of the information. I get why they do it, because it's speaker independent, but you completely lose some VERY valuable information, like prosodic stress, which would be very useful to help with word segmentation. Instead, they try to guess it from statistical models.

Next, they apply a hidden Markov model (HMM). Instead of inferring phones from the signal, the way they model it is as a sequence of hidden states (the phones) that cause the observations (the codes). This statistical model seems kinda backwards, although it works quite well when trained properly. To train it, you need a lot of labeled data, where people have taken lots of speech recordings and manually labeled the phonetic segments. What is usually learned is mostly a bigram model, where what you know are the a priori probabilities of each phone label (the hidden states), and the conditional probability of each phone given each possible prior phone. Given a sequence of codes, you find the most likely sequence of phones by computing the Viterbi path through the HMM.
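
For anyone who hasn't seen it, the Viterbi step can be sketched in a few lines. This is a minimal textbook version, not any particular recognizer's code; the two phone labels, the three codes, and every probability below are invented for illustration.

```python
import math

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most likely hidden-state sequence for a sequence of observed codes."""
    # best[t][s] = (log-prob of the best path ending in state s at time t, previous state)
    best = [{s: (math.log(start_p[s]) + math.log(emit_p[s][observations[0]]), None)
             for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            score, prev = max(
                (best[-1][p][0] + math.log(trans_p[p][s]), p) for p in states
            )
            column[s] = (score + math.log(emit_p[s][obs]), prev)
        best.append(column)
    # Trace back from the best final state.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    return path[::-1]

# Made-up model: two phones, three codebook entries.
states = ["AH", "S"]
start_p = {"AH": 0.6, "S": 0.4}                      # a priori phone probabilities
trans_p = {"AH": {"AH": 0.7, "S": 0.3},              # phone given previous phone
           "S": {"AH": 0.4, "S": 0.6}}
emit_p = {"AH": {0: 0.5, 1: 0.4, 2: 0.1},            # code given phone
          "S": {0: 0.1, 1: 0.3, 2: 0.6}}

phones = viterbi([0, 1, 2], states, start_p, trans_p, emit_p)
print(phones)  # ['AH', 'AH', 'S']
```

Log-probabilities are used so long sequences don't underflow; the sketch assumes no probability is exactly zero.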

Honestly, I can't complain too much about the HMM. What I do complain about is the fact that the "cutting edge" is to replace the HMM with a Markov random field (just remove the arrows from the HMM), or a conditional random field (which is a Markov random field with extra inputs).

My response to using MRFs and CRFs is "big whoop", because all you're doing is replacing the statistical model, which doesn't dramatically improve recognition performance, because they haven't fixed the underlying problem with the signal processing.

Then on top of the phone HMM, they layer ANOTHER HMM on top of it to infer words and word boundaries, based on a highly inaccurate phone sequence.

The main problem with all of this is not that the researchers are idiots. They're not. The problem is that the people with the funding are totally unwilling to fund anything really interesting or revolutionary. The existing methods "work", so the funding sources figure that we can just make incremental changes to existing technologies. Which is wrong. Unfortunately, any radically new technology would be highly experimental, with a high risk of failure, and would take a long time to develop. No one wants to fund anything that iffy. As a result, all the scientists working in this area spend their time on nothing but boring tweaks of a broken but "proven" reasonably effective technology.

So I don't blame people for the conundrum, but I see no opportunity to do anything interesting, so I just couldn't stand studying it.

Re:Speach recognition tech is broken in many ways by rastoboy29 · 2010-05-03 17:29 · Score: 1

I've found in my life that you pretty much always have to make your own opportunities, one way or another. Just sayin'. I think a person like you should always do what's most interesting to him/her.

--
expandfairuse.org
Re:Speach recognition tech is broken in many ways by Anonymous Coward · 2010-05-03 23:27 · Score: 0

The main problem with all of this is not that the researchers are idiots. They're not. The problem is that the people with the funding are totally unwilling to fund anything really interesting or revolutionary. The existing methods "work", so the funding sources figure that we can just make incremental changes to existing technologies. Which is wrong. Unfortunately, any radically new technology would be highly experimental, with a high risk of failure, and would take a long time to develop. No one wants to fund anything that iffy. As a result, all the scientists working in this area spend their time on nothing but boring tweaks of a broken but "proven" reasonably effective technology.
This is a problem with many NLP technologies, not just ASR. And, from what I've seen, it's one that most researchers are aware of, if only enough to be defensive about. Part of this is caused by treating the ASR as an engineering problem rather than a scientific problem - that is, actual research into ASR systems needs to be done, as you say, using different but riskier (from a failure perspective) methods. Unfortunately, the funding for this kind of research is all heavily results driven, so tweaking existing methods is the best anyone is willing to do or pay for. Which is kind of sad.
Re:Speach recognition tech is broken in many ways by Anonymous Coward · 2010-05-04 00:23 · Score: 0

If it is so blatant, why not give a try and propose your own theory ?
but, maybe, there is a lack of alternatives to this ?
Re:Speach recognition tech is broken in many ways by jam244 · 2010-05-04 02:25 · Score: 2, Insightful

When I started on my Ph.D., I started out majoring in AI. One of several reasons I changed to computer architecture (CPU design, etc.) is because I just couldn't stand the broken ways that people were doing stuff.
I don't get it. You left a Ph.D. program because the field was immature? Isn't the whole point of a Ph.D. program to produce something new and share it? Yeah, I get that funding might be harder than a safer field like computer engineering, but it seems like you abandoned a huge opportunity. You make it sound like you had a whole slew of new, potentially great ideas, and you just dropped them because it would be "too hard".
Re:Speach recognition tech is broken in many ways by Anonymous Coward · 2010-05-04 04:37 · Score: 0

Tee Ell; Dee Arr
Re:Speach recognition tech is broken in many ways by Theovon · 2010-05-04 05:31 · Score: 1

Since I have also studied Linguistics, I do have some ideas here, but honestly, I don't really think I understand the field well enough to make any impact myself. I also fear that I just don't have the intellect for it. There are some areas like high cardinality infinities in mathematics that just baffle me -- it would take another lifetime for me to understand the existing stuff, let alone invent something new. Maybe NLP isn't THAT hard, but there's only so much I can do.
Re:Speach recognition tech is broken in many ways by Theovon · 2010-05-04 05:38 · Score: 1

I didn't leave the program entirely. I changed specialties. The actual reason I left AI was somewhat different from that. My problem with ASR is that I would have hated working on it because there would have been no room for real innovation. Not to say that I'm some kind of genius in processor architecture, but there are some exciting things people are working on that are really outside of the box.
The main thing that got me out of AI was this: My subfield was knowledge-based reasoning (we used a lot of abductive inference). When I started in the program, there were 8 people in the lab I had joined. But by the time I finished the core courses, there was no one left. 7 had graduated, and one was AWOL. The prof who ran the lab retired. And the other prof who was left was always busy with his business. So there was absolutely no one to work with. As a junior grad student, trying to break into a big field is almost impossible if you have no mentoring. I tried to work on a few papers, but the reviews I got back convinced me that (a) I wasn't up to speed in the areas, and (b) the work required to fix the papers was beyond my capacity alone. If I'd had at least one other student to work with, I might have been able to keep going.
Instead, what I decided to do was cut my losses and go into another area where I already basically knew the material. I'd worked as a chip designer and was well-read in the area of computer architecture. Plus, a new prof was hired in exactly that area. It's been great, because he's enthusiastic to make his mark and get tenure, and he has start-up funding, so we work closely together.

such systems already exist by Anonymous Coward · 2010-05-03 12:36 · Score: 0

If you call your bank or credit card company, or really any large company's support lines, you're likely to encounter an IVR system that basically does what you describe. Hacking down the range of responses to make it easier for the computer is great for some stuff (well, ok, it's passing in most cases, and miserable in some), but is utterly useless for a lot of applications. The range of accents and vocal quirks of the human voice is pretty amazing, so the fact that they work as well as they do is impressive.

Voice recognition for command and control systems is just one very, very small part of the overall use for the technology. There's also automated transcript generation for TV broadcasts (either for CCAP or other purposes), intelligence and defense applications (this is a very active area in NLP), even stuff like dictation systems.

Unless you're proposing that everyone begin speaking in computer comprehensible English, all the time, which is silly and utterly unrealistic.

screw speech recognition by smash · 2010-05-03 13:33 · Score: 2, Funny

It's just a speed bump on the way to thought recognition, which will be far more useful.

--
I run: Windows, OS X, Linux, FreeBSD. Just because you have a hammer, doesn't mean everything is a nail.

The number of possible sentences by Anonymous Coward · 2010-05-03 13:40 · Score: 0

is actually infinite, not 10^570. English is recursive, so any final count of sentences can be increased by prefacing it with the phrase, "Here is a list of all sentences that has now been made longer by the addition of this sentence."

real-time closed captioning by Anonymous Coward · 2010-05-03 14:00 · Score: 0

What are TV stations using for this? Sometimes I turn it on just for the lulz.

This really happened:
TV weather dude: If you're indoors tonight...
Closed captions: IF URINE DOORS TONIGHT...

My experience with Dragon by One_Minute_Too_Late · 2010-05-03 14:02 · Score: 1

I would have to agree with the contents of the original posted article. The hospital at which I presently work is switching slowly from transcriptionists to Dragon as a 'cost-saving' move. Maybe it has saved money, but it's cost a lot in frustration.

It's made it a lot harder to dictate reports. In plain text, it's almost able to cope, although I've seen embarrassing mistakes slip through ('and in' once came out as 'anus'). Even when I try to 'teach' and 'train' it, it consistently records 'core needle biopsy' as 'corneal biopsy' and the letters A, B, E, and C are absolute crapshoots. Template reports, which are in vogue in my field, are a headache to dictate. Often Dragon misunderstands my command to move to another template field as a request to add in the words 'next field.' If I have the patience, and speak to the Dragon as I would to a mentally retarded but well-meaning child, I can hope for only two or three errors per report. When I'm in a tearing hurry, as I am most of the time, I type out my own reports.

The worst part is that all the transcription errors come out as correctly spelled words, so they're even harder to detect than they were before. 80% accuracy seems about right to me. Luddite as it sounds, I'd rather have a human transcribing my speech over a machine.

This is a very difficult problem by Whuffo · 2010-05-03 14:15 · Score: 1

All of the speech recognition systems to date try to fake it - essentially all they do is match speech waveforms to their library and some do some very simple syntax checking. This is useful for situations where the vocabulary is small and the number of human speakers is also small. These systems don't work like we do and to achieve significantly better results a very difficult problem will have to be "solved" first.

Our method of communicating is both more and less than it appears. At a basic level, what we're communicating is not contained in the words we use - we use words as symbols and the listener "looks up" that symbol and applies meaning to the word. So when I say "horse" you access your knowledge of horse and that provides the meaning to the word. If you'd never seen or heard of a horse then this communication would fail.

It's shared knowledge - literally "common sense" that makes verbal communication possible. Our brains devote a lot of "processing" to this task - they have to not only recognize the word symbols, they have to cross-reference them to memory and do it in real time. We continue to make strides in increasing the amount of CPU power we can devote to problems like this one and we'll probably reach "human equivalent" processing power in our lifetimes. Even so, the machines won't be able to converse with us because they won't have the "common sense" needed to understand what the symbols mean. Without that, they can't make any kind of valid judgements about sentence structure or what meaning a particular word is using at the moment. You can buffalo Buffalo all you want but the machine has trouble with to, too, and two.

There have been some cute demonstrations made using huge rule sets that almost work - but they quickly fall apart when you try to converse with them. Even the very best of speech recognition systems suffers from not knowing anything about what's being talked about. When we can equip future machines with the knowledge of a 12 year old human they'll be able to talk with us - and we'll have solved a lot of other related problems at the same time. Until then, computer speech recognition is an AI trick - heavy on the A.

Re:This is a very difficult problem by plan10 · 2010-05-03 21:13 · Score: 1

A 5 yr old child will be enough. Most children pick up ~90% of their language capability by 5. The rest is mostly just lexical.
Re:This is a very difficult problem by Whuffo · 2010-05-04 06:51 · Score: 1

They won't have enough of that elusive "common sense" until they're 12 or so. There's a lot that we commonly talk about that a 5 year old is unaware of.
Re:This is a very difficult problem by plan10 · 2010-05-05 14:31 · Score: 1

Concepts != language
You do not use any language structures (syntactic, phonological, semantic) in your life that an average 5-7 yr old child has not already acquired. This is established in pretty much all literature on language acquisition.
After all, there is a lot that biomedical scientists talk about that I don't understand at all. Still, my language faculty remains intact.

I've been playing with Markov models lately by Short+Circuit · 2010-05-03 15:05 · Score: 1

And I intentionally used a phonetic hash I threw together in the key lookup. The script produced some cool output, but didn't do quite what I wanted to do.

Then I learned about Soundex. And then, even better, Metaphone. Better still, Double Metaphone. DM's benefit is that it returns multiple keys for a processed symbol, under the assumption that the symbol might be pronounced multiple ways. It was *almost* what I wanted, except it was still more or less limited to mostly-English words. I'd like to work with IPA, but whenever I ask about a library that attempts to take text and convert it to IPA symbols, I'm reminded that different dialects will say the same words different ways (engaging the vocal cords or not, for example), and the same word may have a different meaning depending on how it's pronounced, which is also related to its context. A first-order Markov model is likely to grant some self-correcting accuracy; a second-order or third-order model should do a decent job, but would represent a *huge* data set. (When I was working with a 1st-order model, and considering moving to 2nd-order, I almost convinced myself to buy an SSD to dedicate to InnoDB.)
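
For reference, here is a small sketch of the simplest of those phonetic hashes, classic American Soundex. It is written from the commonly published rules, not taken from the script mentioned in this post, and it returns only one key per word (unlike Double Metaphone).

```python
# Letter -> digit table for American Soundex.
CODES = {}
for digit, letters in enumerate(["bfpv", "cgjkqsxz", "dt", "l", "mn", "r"], start=1):
    for ch in letters:
        CODES[ch] = str(digit)

def soundex(word):
    """Four-character Soundex key: first letter plus up to three digits."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    key = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue              # h and w do not separate equal codes
        code = CODES.get(ch, "")  # vowels (and y) map to "" and reset `prev`
        if code and code != prev:
            key += code
        prev = code
    return (key + "000")[:4]      # pad with zeros, truncate to 4 characters

print(soundex("Robert"), soundex("Rupert"))  # R163 R163
```

Words that sound alike collapse to the same key, which is what makes it usable as a lookup key, and also why it throws away exactly the distinctions a recognizer would need.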

It seems obvious to me that you should be able to apply Metaphone's approach (a returned key for each possibility), and then use a Markov model to refine which key has the most likely meaning in context. (Feeding it a language's dictionary with word/part-of-speech/IPA tuples would be most excellent.)
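
A toy sketch of that idea, assuming some phonetic lookup has already produced a list of candidate spellings for each spoken slot. The corpus, the candidate lists, and the function names are all made up for illustration, and the selection is a simple greedy pass with a first-order model rather than a full search.

```python
from collections import Counter, defaultdict

def train_bigrams(sentences):
    """Count how often each word follows another (a first-order Markov model)."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = ["<s>"] + sentence.split()   # "<s>" marks the sentence start
        for prev, word in zip(words, words[1:]):
            counts[prev][word] += 1
    return counts

def pick_words(candidates_per_slot, counts):
    """Greedily pick, per slot, the candidate seen most often after the previous pick."""
    chosen = []
    prev = "<s>"
    for candidates in candidates_per_slot:
        best = max(candidates, key=lambda w: counts[prev][w])
        chosen.append(best)
        prev = best
    return chosen

# Tiny made-up training corpus.
corpus = [
    "i want to go home",
    "we want to sleep",
    "two apples is too many",
]
model = train_bigrams(corpus)

# Each slot lists spellings sharing one phonetic key (hypothetical lookup output).
slots = [["i", "eye"], ["want"], ["to", "too", "two"], ["go"]]
print(pick_words(slots, model))  # ['i', 'want', 'to', 'go']
```

Being greedy, this never reconsiders an earlier choice; with ties or sparse counts it just takes the first candidate, which is where a higher-order model or a proper best-path search would start to matter.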

As for speech recognition, aren't there any libraries or code bases out there that convert sound to IPA? It seems the most obvious solution. Heck, you could probably get away with some on-body sensors for more accurate detection of particular IPA symbols.

Incidentally, if you want the data and code I was playing around with, I put it here. Read the thirty or so lines of disclaiming comments before you complain about it being a 65MB Perl script. (I didn't want to bother packaging multiple files, among other concerns.) LZMA compressed, so install the lzma package or grab 7zip, depending on your OS. Compressed, it's 6.4MB.

--
tasks(723) drafts(105) languages(484) examples(29106)

Re:I've been playing with Markov models lately by plan10 · 2010-05-03 21:10 · Score: 1

As for speech recognition, aren't there any libraries or code bases out there that convert sound to IPA? It seems the most obvious solution. Heck, you could probably get away with some on-body sensors for more accurate detection of particular IPA symbols.
The IPA is used to specifically describe sound, whereas a natural alphabet is used to provide a general representation of a multitude of sounds. Which sounds, and the level of abstraction (phone/allophone/phoneme), vary from alphabet to alphabet.
Ultimately, you must transcribe from your context-specific sample to the generalised representation in your target alphabet. Nothing is gained by using an IPA representation instead of hashed audio samples because they are both context specific.
Re:I've been playing with Markov models lately by Short+Circuit · 2010-05-03 23:21 · Score: 1

That's one thing markov models are useful for; they help determine a symbol's probable meaning in a given context. Rather than randomly selecting a subsequent symbol based on the current symbol, you can estimate the current symbol's fit based on the last symbol seen.
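To make the distinction concrete, here is a small Python sketch (names and toy data are mine, purely illustrative) where the same first-order transition table is used both ways: once to generate a next symbol at random, and once to score how well a candidate symbol fits after the last symbol seen.

```python
import random
from collections import Counter, defaultdict

def build_table(symbols):
    """First-order transition counts: table[prev][cur] = times cur followed prev."""
    table = defaultdict(Counter)
    for prev, cur in zip(symbols, symbols[1:]):
        table[prev][cur] += 1
    return table

def generate_next(table, current, rng=random):
    """Generative use: sample a successor in proportion to its count.
    Assumes `current` was seen with at least one successor."""
    successors = table[current]
    return rng.choices(list(successors), weights=list(successors.values()))[0]

def fit(table, previous, candidate):
    """Scoring use: relative frequency of `candidate` following `previous`
    (0.0 if `previous` was never seen)."""
    row = table.get(previous, Counter())
    total = sum(row.values())
    return row[candidate] / total if total else 0.0

# Toy symbol stream, standing in for a sequence of phonetic keys.
table = build_table("a b a c a b".split())
print(generate_next(table, "b"))          # "b" was only ever followed by "a"
print(fit(table, "a", "b"), fit(table, "a", "c"), fit(table, "b", "c"))
```

When a recogniser offers several candidate symbols for the same sound, the one with the highest fit given the previous symbol is the one you'd prefer.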

--
tasks(723) drafts(105) languages(484) examples(29106)
Re:I've been playing with Markov models lately by plan10 · 2010-05-05 14:44 · Score: 1

I should rephrase. Nothing is gained by using IPA instead of audio signatures because they represent the same context. Think of an IPA character as essentially an audio signature.
Translating to IPA doesn't move you any closer to translating to your target script.
Re:I've been playing with Markov models lately by Short+Circuit · 2010-05-05 15:19 · Score: 1

Ah, I see. My comment on using markov models to refine matches (and get sensible resulting symbol sequences) still applies, I think.
My interest in IPA simply derives from its being an existing standard representation. Also, taking an approach like* double metaphone in converting written language to the same symbol set might go a good way toward getting source material for training the markov models.
* "like", in that multiple potential pronunciations are considered for each character sequence

--
tasks(723) drafts(105) languages(484) examples(29106)

Re:Time flies like an arrow fruit flies like a ban by Anonymous Coward · 2010-05-03 15:32 · Score: 0

In Klingon. Before the Nuance company bought them, when Mark Mandel was still heavily involved at Dragon, they had a Klingon speech recognition project. It was rather fun because it was an entirely artificial language.

Police Police police by phaet0n · 2010-05-03 16:24 · Score: 1

A better (international) example: Police police police, police police.

Understood as: Police (whom) police police, police police.

You can make arbitrarily long (true) sentences: [Police (whom) police]^n police, police [police (whom) police]^(n-1) police.
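For anyone who wants to see the expansions, a quick Python sketch that reads the pattern above literally (the function name and the explicit_whom switch are just for illustration):

```python
def police_sentence(n, explicit_whom=False):
    """Expand [police (whom) police]^n police, police [police (whom) police]^(n-1) police."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # The repeated bracketed unit, with or without the optional pronoun.
    unit = ["police", "whom", "police"] if explicit_whom else ["police", "police"]
    subject = unit * n + ["police"]
    predicate = ["police"] + unit * (n - 1) + ["police"]
    text = " ".join(subject) + ", " + " ".join(predicate) + "."
    return text[0].upper() + text[1:]

print(police_sentence(1))
print(police_sentence(1, explicit_whom=True))
print(police_sentence(2))
```

Without the pronoun the sentence has 4n+1 words, so it gets unreadable fast.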

Of course, you can parse this sentence many ways, in a grammatically correct form, however the sentence is no longer tautological.

Re:Police Police police by radtea · 2010-05-04 03:04 · Score: 1

Understood as: Police (whom) police police, police police.
Thanks--this is much clearer than the "Buffalo" example, which depends on both a place-name and a colloquialism.
The trick is: find a word that can be used as one or more nouns, an adjective and a verb, and as a noun is capable of being both subject and object of the verb (possibly in different senses of the noun, as in the Buffalo example, where in one place it refers to a city, in another an animal), then drop the pronoun because the adjective is treated as limited, and confuse people with nominally structurally correct sentences that are unparsable (and uninteresting).
Xadjective Xnoun who Xverb Xnoun', Xverb Xnoun''.
IA Cops who watch police, watch police.
IA Cops who watch IA Cops who watch police, watch IA Cops who watch police.
One curious thing is why anyone thinks dropping the pronoun is grammatical, as it would only be permitted by a sane grammar when the adjectival limiting of the first noun is clear, which it manifestly is not in this case. So one could argue that sentences of this form are not grammatical, if grammar is more than just syntax (which it is).
Green fish who eat fish, eat green fish.
Green fish eat fish, eat green fish.
Is that second sentence really grammatical?
"Police police police Police police" is arguably grammatical, although like many things that depend only on syntax and literal semantics it is still pretty close to the outer edges of the English language. I've never in my life ever heard of anyone under any circumstances refer to IA as "Police police". And put a comma in it and it doesn't make any more sense than the fish example above.
Even granted that, this is a useful algorithm for writing badly, I guess, but it doesn't demonstrate anything other than "bad writing is hard to understand", which is not exactly news. It has a certain novelty value, but really... so what?
Why does anyone find this interesting? It certainly isn't interesting from the point of natural language understanding, which is far more likely to have to deal with things like this [ganked from some bad-writing examples site]: "This change will allow us to better leverage our talent base in an area where developmental roles are under way and strategically focuses us toward the upcoming Business System transition where Systems literacy and accuracy will be essential to maintain and to further improve service levels to our customer base going forward."
Or: "Y'know... the thing that gets me is why... 'cause really, it pisses me off, y'know? I can't tell her nothin'. Then she says... Or was that after the dog did that? Fuck it, I said. You know what I mean?"

--
Blasphemy is a human right. Blasphemophobia kills.
Re:Police Police police by Anonymous Coward · 2010-05-05 14:39 · Score: 0

You've misunderstood the interpretation. It's not equivalent to "green fish [who] eat fish", but rather to "green fish [whom] fish eat" -- in the former case you cannot, as you say, drop the pronoun, but in the latter it's perfectly natural to do so. Using everyday words, consider "the people I know" vs. "the people whom I know".

Google voice search on a cellphone by Anonymous Coward · 2010-05-03 17:31 · Score: 0

I voice googled "Glenn Close" (the actor) and got results for "Clean Clothes" (the laundry). I lol'ed.

What is a metaphor? by Anonymous Coward · 2010-05-03 17:55 · Score: 0

A: To keep cows in.

Re:turn away from the screen by TaoPhoenix · 2010-05-03 18:32 · Score: 1

Tay Zonday, is that you?

--
My first Journal Entry ever, in 8 years! http://slashdot.org/journal/365947/aphelion-scifi-fantasy-horror-poetry-webzine

Unfortunately, people insist on using it anyway... by Anonymous Coward · 2010-05-03 19:29 · Score: 0

Just today I tried to use an automated recognition system to confirm a change of address -- and it kept just hanging up on me when it couldn't figure it out -- and I doubt it's capable of figuring it out; my street is a weird European name that everybody mis-hears and probably isn't in their database. And their web-based alternative couldn't hack it either for unrelated reasons, directing me to their phone system instead, or I wouldn't have gone there in the first place. It's a big Wall Street company, BNY Mellon, wanting to send me a statement for a fraction of shares worth about 25 cents that would cost me more to sell than they are worth. They wouldn't accept the USPS change-of-address notification without confirmation. Well, fine, but their system is just too broken to do that. Fortunately I just don't care enough about their stupid statements anyway, so I'm just going to blow it off.

Why ignore linguistic models and information? by plan10 · 2010-05-03 20:47 · Score: 1

Speech recognition will continue to hit upon this wall because it disregards internal representation. What you hear may externally be a wave form that has some certain characteristics, but the phonetic structure is also dependent on the phonemic, morphophonemic, syntactic, and semantic interactions. To actually understand a word, your exposure to phonetic information must trigger the aforementioned interactions.

The best example of speech recognition learning comes from babies. Babies are born with the ability to distinguish a pretty much infinite number of phonemes. Continued exposure to their native language then narrows this to the phones that are applicable to their use, i.e. their language. In English, things like aspirated ps get ignored for the purposes of meaning, such that I can hear "stoph" and know it is not distinct from "stop". Built upon this we discover morphemes and morphophonemic rules, so that I can tell that "stop" becomes "stopt" in past tense. Similarly, upon this we build syntactic and semantic relationships. This is context-based understanding. I need a context of "past" to start applying morphemes for past tense, but I also need the correct phonemic context to perform the correct allophonic substitutions. Similarly, if someone with a thick Scottish or Novocastrian accent comes up to me on the street, I need to combine my semantic context with my own abstract internal representations of my language to try and understand them.

This provides a form of natural error correction that allows me to understand something I have never heard before and that might contain deviations or ambiguity (either inherent in the language or introduced by the speaker). My internal representation of English should prevent me from ascribing wrong phone clusters or wrong morphemes (runn-ingk) to the processed sound.

It's stimulus plus rule matching plus context plus error correction that should ultimately help me decide if something can be understood.

All of this ignores the complexity of graphemic translation, which builds in another set of rules.

The article said (somewhat in jest) that throwing out linguists helped improve the accuracy of the system. Sure, methods not representative of human language capability might in the short term give greater results, and there is no definitive model of how language is represented in the mind. You can probably provide a great system ignoring much linguistic information that functions in a limited context (i.e. one language and rigid contexts such as yes/no, numbers, etc.). However, ultimately if the goal is to produce a speech system that functions like a human -- that is, performs the error correction when appropriate, uses various types of linguistic information, and in certain circumstances requires clarification -- then linguistic models are important.

Re:Time flies like an arrow fruit flies like a ban by RMH101 · 2010-05-03 21:22 · Score: 1

Amen. NetBSD claims that speech dictation is dying, yada yada. Meanwhile, in the real world, digital dictation is being used every day by vast numbers of people. Use a decent headset, spend the time going through the training, and Dragon is scarily accurate. It's used by law firms, who can't ask Partners to bill for typing up time, and it works well.

10^570 sentences to copyright! by piotru · 2010-05-03 22:45 · Score: 1

Long way to go.

Never bet on AI by mbone · 2010-05-03 23:43 · Score: 1

If there is one thing we have learned from 60 years of AI research, it's to never bet on AI fulfilling its promises.

Neural Models of Speech Recognition by herwin · 2010-05-04 00:51 · Score: 1

We've been studying the inferior colliculus, and some of the processing there appears unexpectedly complex, suggesting that speech recognition software may not be using the full set of cues that the auditory system has available to it.

The writing on the wall by Rambo+Tribble · 2010-05-04 02:55 · Score: 1

While not as common today, a few decades ago machine dictation was used in much of business. Even with high-fidelity recorders and human transcribers with perfect hearing, there were constant problems with misinterpretation. It is hard to imagine machines readily overcoming hurdles that millions of years of evolution have failed to surmount.

Re:Since I don't have a flying car today, all is l by 2obvious4u · 2010-05-04 05:37 · Score: 1

Here is one. Oh, here is another. You must not be looking very hard.

Re:Since I don't have a flying car today, all is l by liquiddark · 2010-05-04 06:07 · Score: 1

You don't understand. *I* don't have a flying car. Therefore all is lost.

Slashdot Mirror

Rest In Peas — the Death of Speech Recognition

342 comments