Is Speech Recognition Finally 'Good Enough'?

Hmmm.... by DoofusOfDeath · 2007-05-18 09:10 · Score: 5, Funny

Is Speech Recognition Finally 'Good Enough'?

Is spinachry ignition rivaly gooery stuff? What the hell are you talking about?

Re:Hmmm.... by ThunkDifferent.com · 2007-05-18 09:14 · Score: 1, Redundant

i think speech recognition IS good enough for a lot of things. i'm not sure for what yet, but HAL 9000 was way ahead of its time in the movie 2001, so that means that we're only a few.. hmm... years past that. i'm sure it is good enough for deep space explorations by now.

--
W: ThunkDifferent.com
Re:Hmmm.... by value_added · 2007-05-18 09:16 · Score: 3, Funny

What the hell are you talking about?

Maybe he meant speech wreck ignition?
Re:Hmmm.... by inviolet · 2007-05-18 09:22 · Score: 1

Is Speech Recognition Finally 'Good Enough'?

Funny, when I dictated this sentence to my computer today, it came out "Is Slashdot's Shameless Plug Recognition Finally 'Good Enough'?"
Today somebody at Dragon got moved to a corner office.

--
FATMOUSE + YOU = FATMOUSE
Re:Hmmm.... by __aaclcg7560 · 2007-05-18 09:24 · Score: 2, Funny

The funny thing is why haven't Microsoft mastered this technology yet? You would think with the billions of dollars they spend on R&D that they could come up with better speech recognition. And funky AIs shouldn't be too far behind.
Re:Hmmm.... by Mahjub+Sa'aden · 2007-05-18 09:34 · Score: 2, Insightful

I'll be honest with you, Vista is way better at coming up with hilarious new Madlibs than you are.

--
What is is all that is. Isn't that obvious?
Re:Hmmm.... by Rei · 2007-05-18 09:43 · Score: 1

Getting that last 5% -- like the "your analysis/urinalysis" issue -- is doable. There's a translation technology that I read about a while ago which should be applicable to voice recognition. It's a technique to figure out how to properly translate words with multiple meanings. You build up a database of a great amount of writings of all kinds and compile statistical information about word associations from it. So, for our example case, it would find that "admire" comes before "analysis" and "your" a lot more often than it comes before "urinalysis", so it would choose "your analysis". I think that the technique was to check eight words around the word in question (both directions).
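
A rough sketch of the word-association idea described above, in Python. The toy corpus, the function names (`context_counts`, `pick_candidate`) and the scoring rule are assumptions made up for illustration; only the eight-word window comes from the comment, and a real system would need a vastly larger corpus.

```python
from collections import Counter

def context_counts(corpus, target, window=8):
    """Count the words seen within `window` positions of `target`."""
    counts = Counter()
    for sentence in corpus:
        words = sentence.lower().split()
        for i, word in enumerate(words):
            if word == target:
                start = max(0, i - window)
                # Neighbours on both sides, excluding the target itself.
                neighbours = words[start:i] + words[i + 1:i + 1 + window]
                counts.update(neighbours)
    return counts

def pick_candidate(corpus, context_words, candidates, window=8):
    """Return the candidate whose corpus neighbours best match the context."""
    def score(candidate):
        counts = context_counts(corpus, candidate, window)
        return sum(counts[w] for w in context_words)
    return max(candidates, key=score)

# Toy corpus, invented for illustration only.
corpus = [
    "i really admire the analysis you did",
    "we admire her careful analysis of the data",
    "the lab ordered a urinalysis for the patient",
    "the urinalysis results came back from the lab",
]

best = pick_candidate(corpus, ["i", "really", "admire"], ["analysis", "urinalysis"])
print(best)
```

With this toy data "admire" only ever shows up near "analysis", so that candidate is picked. Ties and unseen words are not handled in any clever way here.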

--
"'If one must live then one must die.' - oh, the truth must be funnier than this..." -- MammÃt
Re:Hmmm.... by bearinboots · 2007-05-18 09:48 · Score: 3, Insightful

Dragon is no more... and hasn't been for a long time.

NaturallySpeaking has been sold a few times to various companies.

(I keep track because I worked on V1.0)
Re:Hmmm.... by mooingyak · 2007-05-18 10:10 · Score: 1

but what if I really said "urinalysis"?

--
William of Ockham had no beard. The most likely explanation is that it was chewed off by squirrels every morning.
Re:Hmmm.... by snoyberg · 2007-05-18 10:21 · Score: 1

but what if I really said "urinalysis"?

Then your secretary would probably get it wrong too

--
Thank God for evolution.
Re:Hmmm.... by cnettel · 2007-05-18 10:44 · Score: 4, Informative

n-gram based language models are nothing new. Statistics is all fun and dandy, but it's no panacea. It might just be enough to throw in an even larger corpus (something like the complete Google index), but it's still hard. (BTW, n-gram Markov chains more or less originated in speech recognition, to get the individual phonemes right, and I'm quite sure they're doing at least something like it at the word level these days. It still sucks, as the quality users demand for proper dictation is extremely high.)
Re:Hmmm.... by Ucklak · 2007-05-18 11:16 · Score: 1

If it weren't for the overuse of the phrase "Dear aunt, let's set so double the killer delete select all." already present in this discussion, I'd say this would be an appropriate spot for it.

--
if you steal from one source, that is plagiarism, if you steal from many, well, that's just research.
Re:Hmmm.... by neoform · 2007-05-18 11:18 · Score: 1

Let's recognize speech.

or..

Let's wreck a nice beach.

--
MABASPLOOM!
Re:Hmmm.... by Joebert · 2007-05-18 11:21 · Score: 1

but what if I really said "urinalysis"?

What's a lysis, & how do I get out ?

--
Wanna fight ? Bend over, stick your head up your ass, and fight for air.
Re:Hmmm.... by Anonymous Coward · 2007-05-18 13:05 · Score: 0

I'll agree that speech recognition bites. I was trying it out in Vista the other day and had a photocopier in the same room making that typical photocopier "hummmmmm" in the background. It wasn't loud, but it made Vista go nuts. It was rather amusing. So, to get recognition to work now you need a completely silent room as even someone coughing translates to a "What was that?" message on the computer.

Even with going through the whole voice recognition training program on Vista I still get butt awful results. By the time you're done correcting the errors made during dictation your 160 WPM turns to 2 or 3 WPM. Typing is still a hell of a lot faster.

Now, try speech recognition for a web page, dictating XHTML and some JavaScript or PHP..... hehehehehehe.... now that would be funny. =)
Re:Hmmm.... by Helios1182 · 2007-05-18 13:36 · Score: 3, Interesting

There is a lot of work on word prediction and language modeling in natural language processing and computational linguistics research. 95% accuracy is considered very good though. There are ways to help, but some of the most effective ways require a constriction of the language recognized. n-gram based language models provide a good statistical framework, but are very data hungry. You need lots and lots of relevant (this is the hard part) text. The model needs to be based on the language the user uses in order to be effective.
Re:Hmmm.... by pluther · 2007-05-18 14:58 · Score: 3, Insightful

but what if I really said "urinalysis"?
Then your secretary would probably get it wrong too

No, your secretary would almost certainly get it right. Your secretary would know, from experience with you and the kind of work you do and the overall context of the letter whether the person you are dictating the letter to has recently analyzed something for you, or if you are applying for a job in a medical lab.
95% sounds good if you're not comparing it to a person. But 5% error rate is horrendous for business use. A secretary who missed one word out of every 20 would be fired after a few hours. A couple decades ago, when I temped for office work, I could transcribe about 80 wpm with close to 100% accuracy, and I was nowhere near the fastest.
If you got a letter from a business containing a typo on almost every line, would you do business with them?
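
The arithmetic behind that complaint can be sketched in a few lines of Python. The 95% accuracy and the 80 wpm figure come from the thread; the 12 words per line and the assumption that errors are independent of each other are mine, purely for illustration.

```python
ACCURACY = 0.95         # per-word accuracy quoted in the thread
WORDS_PER_LINE = 12     # assumption: a typical line in a business letter
WORDS_PER_MINUTE = 80   # transcription speed mentioned above

error_rate = 1 - ACCURACY

# Average number of wrong words on a line.
errors_per_line = WORDS_PER_LINE * error_rate

# Chance that a line has at least one wrong word,
# assuming each word fails independently of the others.
p_line_has_error = 1 - ACCURACY ** WORDS_PER_LINE

# Wrong words produced per minute at secretary speed.
errors_per_minute = WORDS_PER_MINUTE * error_rate

print(f"errors per line:    {errors_per_line:.2f}")
print(f"P(line has error):  {p_line_has_error:.2f}")
print(f"errors per minute:  {errors_per_minute:.1f}")
```

Under these assumptions a bit under half of all lines contain at least one wrong word, and dictating at 80 wpm produces about four errors a minute.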

--
If the masses can keep you down, you're not the Ubermensch.
Re:Hmmm.... by camperslo · 2007-05-18 15:55 · Score: 1

I tried using the Vista (Excremento Grande Edition) Humanoid Output Analyzer on G.W.B.

It has had some luck figuring out what he ate from the smell of his secondary gaseous output port but it has not revealed any intelligence from the verbally-modulated primary hot air datastream.
Re:Hmmm.... by dctoastman · 2007-05-18 16:31 · Score: 1

Considering we just recently bought Dragon Naturally Speaking 9.0 Medical Edition, it most certainly is still Dragon.

Hell, even Nuance agrees (http://www.nuance.com/naturallyspeaking/). And they should know.

--
My twitter
Re:Hmmm.... by Anonymous Coward · 2007-05-18 18:58 · Score: 0

http://fulldecent.blogspot.com/2007/05/vista-speech-recognition-demonstration.html
Re:Hmmm.... by Anonymous Coward · 2007-05-18 20:00 · Score: 0

You're in Alice's ..?
Re:Hmmm.... by heinousjay · 2007-05-18 20:02 · Score: 1

The fact that it's in the name of the product is no reason to say the parent is wrong about the company no longer existing.

--
Slashdot - where whining about luck is the new way to make the world you want.
Re:Hmmm.... by vertinox · 2007-05-19 00:19 · Score: 1

If you got a letter from a business containing a typo on almost every line, would you do business with them?

Seems to work well enough for email spammers.

--
"I am the king of the Romans, and am superior to rules of grammar!"
-Sigismund, Holy Roman Emperor (1368-1437)
Re:Hmmm.... by PPH · 2007-05-19 05:25 · Score: 1

Nope. She'd probably take the rest of the day off and then call in sick for a week until she was cleaned up.

--
Have gnu, will travel.
Re:Hmmm.... by commanderfoxtrot · 2007-05-19 05:42 · Score: 1

I actually bought Dragon NaturallySpeaking this week, so this whole discussion is very topical.

Version 9 is a huge step forward from version 8.

When installed, it automatically tries to look at your Sent emails and documents you have written in order to find out about your writing style and improve the chance of recognizing your voice in future.

--
http://blog.grcm.net/
Re:Hmmm.... by bearinboots · 2007-05-19 07:08 · Score: 1

Indeed, I was speaking about the company. Naturally Speaking, though a breakthrough product, was not enough to keep Dragon afloat and it was sold to L&H shortly after I left the company.

The fact that the product itself is still going strong after being passed around from hand to hand is something that I, as a V1.0 NatSpeak developer, take not a small amount of pride in.
Re:Hmmm.... by BeanThere · 2007-05-19 09:16 · Score: 1

In all seriousness, I was also wondering, but I think they're simply waiting for the right moment to buy the technology from another company that actually does it well and bundle it into Windows ("right moment" presumably means "after competitive pressures force them to" e.g. if Apple gets excellent speech recognition or something, as MS tends to avoid progressing faster than the market conditions call for). Then they'll market it in a way that gives the impression they pioneered this stuff. I think that's what Bill Gates really means (and aims to achieve) by continually telling the world that they are leading some amazing innovation and development in this area, e.g.:
2005: "Internet security is Microsoft's greatest challenge while developing mainstream technology to be able to talk to a computer is a frontier about to be crossed, company chairperson Bill Gates said here on Friday"
2001: "Gates said there was plenty of room for innovation in future versions of Office. He said computing holy grails like voice and handwriting recognition were just around the corner. 'The world of computing has frontiers that we're finally tackling,' Gates said."
Repeat this kind of thing enough and by the time they actually buy a decent product from another vendor, most people will likely just assume that yet again Microsoft led the way bringing this cool stuff to PCs.
Re:Hmmm.... by dctoastman · 2007-05-21 01:21 · Score: 1

But isn't that a bit like saying Dodge doesn't exist because they are owned by DaimlerChrysler (or whoever it is now)?
It is clear that Nuance is keeping the Dragon brand. And essentially that is what a company is, a brand.

--
My twitter

Problems by Tribbin · 2007-05-18 09:12 · Score: 5, Insightful

As a foreigner it is really hard to get the pronunciation right enough.

Also command execution by others in the room is a problem.

How about listening to music, or TV, and having the computer interpreting it.

--
If you mod this up, your slashdot background will turn into a beautiful sunset!

Re:Problems by Sciros · 2007-05-18 09:19 · Score: 4, Informative

It all depends what sort of corpus the SR system is trained on. So yeah, foreigners will have problems because a system trained for, say, British English will not perform well with American English. For this same reason an SR system trained for "normal" speech will do very poorly with lyrics in music.

As for stuff like "i really admire your analysis" being interpreted as "i really admire urinalysis," that stuff can easily be ironed out by an n-gram based system that "ranks" English sentences based on probability. What is the chance that "urinalysis" will follow "your" which follows "admire"? Such things can be estimated well enough if you use a large corpus to train your n-gram system (as long as the corpus you're using for this is the same "kind" as whatever speech the SR system is interpreting -- that is, newswire, business meeting, etc.)
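
A minimal sketch of that kind of ranking, assuming a plain bigram model with add-k smoothing (my choice for the example, not necessarily what any shipping SR product uses). The four training sentences and the names `train_bigrams` and `log_prob` are invented for illustration.

```python
import math
from collections import Counter

def train_bigrams(sentences):
    """Collect unigram and bigram counts, with a sentence-start marker."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.lower().split()
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams

def log_prob(sentence, unigrams, bigrams, k=0.01):
    """Log-probability of a sentence under an add-k smoothed bigram model."""
    vocab_size = len(unigrams)
    words = ["<s>"] + sentence.lower().split()
    total = 0.0
    for prev, cur in zip(words, words[1:]):
        numerator = bigrams[(prev, cur)] + k
        denominator = unigrams[prev] + k * vocab_size
        total += math.log(numerator / denominator)
    return total

# Tiny training corpus, invented for illustration only.
training = [
    "i really admire your analysis",
    "i admire your work",
    "your analysis was thorough",
    "the doctor ordered a urinalysis",
]
unigrams, bigrams = train_bigrams(training)

score_a = log_prob("i really admire your analysis", unigrams, bigrams)
score_b = log_prob("i really admire urinalysis", unigrams, bigrams)
print(score_a > score_b)
```

The smoothing constant matters more than you might expect on a corpus this small: heavy smoothing flattens everything and can make the shorter sentence win simply because it has fewer words to multiply through.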

--
I like basketball!!1!
Re:Problems by lawpoop · 2007-05-18 09:23 · Score: 1

Also command execution by others in the room is a problem.

How about listening to music, or TV, and having the computer interpreting it.

I think a noise canceling microphone would take care of those problems.

--
Computers are useless. They can only give you answers.
-- Pablo Picasso
Re:Problems by Drooling+Iguana · 2007-05-18 09:38 · Score: 1

Wouldn't a noise canceling microphone filter out pretty much all current music and TV automatically?

--
... I'm addicted to placebos
Re:Problems by Sciros · 2007-05-18 10:07 · Score: 1

By the way, what I described is referred to as the "Language Model" component of a natural language processing system. I'm sure Nuance uses one, so whatever errors it makes are probably a result of data sparseness during training.

--
I like basketball!!1!
Re:Problems by revlayle · 2007-05-18 10:29 · Score: 1

also helps if you don't pronounce it "yer 'nalysis" (i know, i live in OKRAHOMA)
Re:Problems by Anonymous Coward · 2007-05-18 11:09 · Score: 0

Or real ambiguity which humans can only resolve from "deep" context and expert knowledge...

It would be easy for example to have a system which never allows "urinalysis" as an interpretation, but very occasionally this will lead to a drastic and irresolvable failure. It's easy to create a probabilistic model of language, but a bit subtle to weigh this against the clarity of any given sentence - consider that even assigning certainty to an interpretation is a difficult problem.
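
One common way to express that trade-off is to add the log of an acoustic score (how clearly the audio matched) to a weighted log of the language-model score. The sketch below assumes that setup; every probability in it is invented, and `combined_score` and `best_hypothesis` are hypothetical names, not any real engine's API.

```python
import math

def combined_score(acoustic_prob, lm_prob, lm_weight=1.0):
    """Log-linear mix of acoustic evidence and language-model prior."""
    return math.log(acoustic_prob) + lm_weight * math.log(lm_prob)

def best_hypothesis(hypotheses, lm_weight=1.0):
    """Pick the transcription with the highest combined score."""
    return max(
        hypotheses,
        key=lambda h: combined_score(h["acoustic"], h["lm"], lm_weight),
    )["text"]

# Mumbled audio: both readings fit the sound about equally well,
# so the language model decides. (All numbers are made up.)
mumbled = [
    {"text": "admire your analysis", "acoustic": 0.4, "lm": 0.01},
    {"text": "admire urinalysis", "acoustic": 0.6, "lm": 0.0001},
]

# Carefully articulated audio: the sound overwhelmingly favours
# "urinalysis", which outweighs the language model's reluctance.
articulated = [
    {"text": "admire your analysis", "acoustic": 0.001, "lm": 0.01},
    {"text": "admire urinalysis", "acoustic": 0.999, "lm": 0.0001},
]

print(best_hypothesis(mumbled))
print(best_hypothesis(articulated))
```

The rare word is never ruled out; it just needs stronger acoustic evidence to win. Choosing `lm_weight` and getting honest confidence values out of the acoustic side is where the subtlety mentioned above comes in.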
Re:Problems by dwarfsoft · 2007-05-18 11:39 · Score: 1

I work in a hospital, and they use dictation often for pathology results and all that kind of jazz, so half the time they probably want urinalysis and the other half the time they might want your analysis. I can't imagine the headaches they would have dictating other medical terminology to the system

--
Cheers, Chris
Re:Problems by RealGrouchy · 2007-05-18 14:27 · Score: 1

What is the chance that "urinalysis" will follow "your" which follows "admire"?

Actually, "urinalysis" would follow "admire", instead of "your [analysis]".

- RG>

--
Hey pal, this isn't a pleasantforest, so don't waste my time with pleasantries!
Re:Problems by laejoh · 2007-05-19 04:10 · Score: 1

You know, there are some handy phrase books on the market. If you're e.g. Hungarian and you'd be listening to your own voice your nipples would explode with delight hearing your perfect pronunciation.
Re:Problems by nEoN+nOoDlE · 2007-05-19 05:39 · Score: 1

that stuff can easily be ironed out by an n-gram based system that "ranks" English sentences based on probability.

Isn't this what humans do all the time? If you heard your buddy say "I really admire urinalysis" you'd probably pause and say "Wait, what?" Humans stop other humans all the time for clarification on what they just said, but we expect computers to get it right all the time. We need a speech2text program that will say "Wait, can you repeat that?"

--
Don't trust a bull's horn, a doberman's tooth, a runaway horse or me.

This comment written by MS speech recognition by TodMinuit · 2007-05-18 09:13 · Score: 4, Funny

Dear aunt, let's set so double the killer delete select all.

--
I wonder if I use bold in my signature, people will notice my posts.

Re:This comment written by MS speech recognition by k1980pc · 2007-05-18 09:23 · Score: 1

This bug is reportedly fixed : http://blogs.msdn.com/larryosterman/archive/2006/07/31/684327.aspx

I play with speech recognition on my mac and it is pretty cool...but cannot say productive...possibly because I am not a native english speaker..

Love to impress my mates with the knock-knock jokes feature in mac speech recognition.. :)
Re:This comment written by MS speech recognition by maxume · 2007-05-18 10:39 · Score: 1

Love to impress my mates with the knock-knock jokes feature in mac speech recognition.. :)

I have it on good authority that they want you to let them go outside. Or something like that anyway.

--
Nerd rage is the funniest rage.
Re:This comment written by MS speech recognition by Dadoo · 2007-05-18 10:42 · Score: 1

My personal favorite is "Cod am pizza ship", that appeared in User Friendly, for those of you that read it.

--
Sit, Ubuntu, sit. Good dog.
Re:This comment written by MS speech recognition by R3d+M3rcury · 2007-05-18 11:46 · Score: 2, Interesting

Actually, I remember working with Apple's years ago. We had a project where, ideally, people could send voice commands to a Mac and get it to pull entries out of a database and read it to you. A "What is my outstanding balance?" sort of thing.

It was really entertaining, but I fell into what I call "The Missing Remote" syndrome: If you've ever lost your remote, you will spend 10 minutes looking for it so you can turn off the TV and go to bed, rather than get up and walk over to the TV and turn it off. I think I must have spent 5 minutes saying "Close Window" in various different ways and speeds rather than just click on the damn close box.

Of course, what I really miss in Apple's speech recognition are the avatars...

It works! by Anonymous Coward · 2007-05-18 09:13 · Score: 0

I'm using it now so double delete the killer select all.

Sure by springbox · 2007-05-18 09:13 · Score: 2, Funny

In fact, I'm using it to write this Dear aunt, let's set so double the killer delete select all

Depends on what you use it for by orclevegam · 2007-05-18 09:13 · Score: 3, Insightful

Is Speech Recognition Finally 'Good Enough'?

For typing up an inter-office memo in Word, most likely. But I'm a programmer, and I can barely read out loud some perfectly fine code, I can't imagine trying to enter it all with voice recognition, no matter how good it gets.

--
Curiosity was framed, Ignorance killed the cat.

Re:Depends on what you use it for by GustoGaiden · 2007-05-18 09:36 · Score: 2, Insightful

programming with voice recognition just seems stupid to me. The idea behind voice recognition is to make it easier to write natural speech, such as email, or an essay, or anything else that follows normal speech patterns. Programming is writing so a computer can understand what you want it to do. It involves TONS of punctuation, oddly named keywords and variables (var, int, _InitBlockPosX). Hell, I can barely read my code aloud to someone else without confusing MYSELF, much less confusing the other human. Case in point, if you're trying to use your voice recognition software to write code, you're using the wrong tool for the wrong job.
Re:Depends on what you use it for by parvenu74 · 2007-05-18 09:37 · Score: 1

For typing up an inter-office memo in Word, most likely. But I'm a programmer, and I can barely read out loud some perfectly fine code, I can't imagine trying to enter it all with voice recognition, no matter how good it gets.

Probably because computer languages aren't designed for dictation. It would be interesting, however, if a language were designed for spoken programming rather than typing. What would that look like -- errr, sound like? Code-reviews might get a little wacky though (I'm hearing voices in the computer!).
Re:Depends on what you use it for by Richard+McBeef · 2007-05-18 09:45 · Score: 1

But I'm a programmer, and I can barely read out loud some perfectly fine code, I can't imagine trying to enter it all with voice recognition, no matter how good it gets.

Why would you even think of using it for that? That's completely retarded. Will it ever be faster to say 'if open paren x equals equals y close paren' than to type 'if (x==y)'? The answer is return apostrophe no comma it will not period apostrophe semi colon.
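
For the curious, turning that kind of spoken punctuation back into symbols is the easy part. A minimal sketch in Python, assuming the recognizer already hands over clean lowercase words; the `SPOKEN_SYMBOLS` table and the `spoken_to_tokens` name are made up for this example and cover only a handful of symbols.

```python
# Hypothetical mapping from spoken phrases to code symbols.
SPOKEN_SYMBOLS = {
    ("open", "paren"): "(",
    ("close", "paren"): ")",
    ("equals", "equals"): "==",
    ("semi", "colon"): ";",
    ("equals",): "=",
    ("comma",): ",",
    ("period",): ".",
    ("apostrophe",): "'",
}

def spoken_to_tokens(utterance):
    """Translate a dictated phrase into a list of code tokens."""
    words = utterance.lower().split()
    tokens = []
    i = 0
    while i < len(words):
        pair = tuple(words[i:i + 2])
        single = (words[i],)
        # Prefer two-word phrases so "equals equals" becomes "==", not "= =".
        if len(pair) == 2 and pair in SPOKEN_SYMBOLS:
            tokens.append(SPOKEN_SYMBOLS[pair])
            i += 2
        elif single in SPOKEN_SYMBOLS:
            tokens.append(SPOKEN_SYMBOLS[single])
            i += 1
        else:
            # Anything else is passed through as an identifier or keyword.
            tokens.append(words[i])
            i += 1
    return tokens

tokens = spoken_to_tokens("if open paren x equals equals y close paren")
print(tokens)  # ['if', '(', 'x', '==', 'y', ')']
```

Nine spoken words for six tokens, which rather supports the point about speed. The sketch also has no way to say a variable that happens to be called "comma" or "period".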
Re:Depends on what you use it for by Anonymous Coward · 2007-05-18 09:53 · Score: 0

Oh yeah? How about Lisp?
Re:Depends on what you use it for by Poromenos1 · 2007-05-18 09:57 · Score: 1

Why would you want to? I spend more time thinking about it than typing it anyway. It's not like speech, where you don't think about the words. I'm sure I'd hate being like "def getstr... no, getvaria... erm, gettype".

--
Send email from the afterlife! Write your e-will at Dead Man's Switch.
Re:Depends on what you use it for by QRDeNameland · 2007-05-18 10:06 · Score: 1

Imagine someone with a lisp coding LISP via speech recognition...
"cwothe pawenthethith, cwothe pawenthethith, cwothe pawenthethith, cwothe pawenthethith, cwothe pawenthethith, ...."
(My apologies for any insensitivity to those with speech impediments.)

--
Momentarily, the need for the construction of new light will no longer exist.
Re:Depends on what you use it for by Movi · 2007-05-18 10:08 · Score: 1

> Probably because computer languages aren't designed for dictation. It would be interesting, however, if a language were designed for spoken programming rather than typing. Like Applescript? http://www.apple.com/macosx/features/applescript/
Re:Depends on what you use it for by Anonymous Coward · 2007-05-18 10:20 · Score: 0

I'm an APL programmer, you insensitive clod ... I can't even *type* some perfectly fine code!
Re:Depends on what you use it for by Not_Wiggins · 2007-05-18 10:32 · Score: 1

For typing up an inter-office memo in Word, most likely.

I agree with the rest of your sentiment, but whole-heartedly disagree with your opening line.
The summary implies (incorrectly) a 95% accuracy rate for speech and 93% accuracy rate for typing are comparable when they're not.
If I mistype something:

I must leeve early to visit my aunt.

It is still understandable. But, if voice-recognition gets it wrong:

I muscle eve early to visit my ant (I'm from the Midwest).

It won't be as easily understood by the person reading it.

Machines just aren't very good at getting the context of what we're saying correct. That's because it requires a deeper understanding of intention and not just inflection/tone of voice. Spell-checkers suffer from this still today... and that's when someone has typed (no inflection/accent/anything) in the content (insert your favorite heterographic homophone example here!).

In terms of identifying and correcting mistakes after the fact, voice-recognition errors are more difficult than typed ones to correct. And with the accuracy floating around 95%, it still isn't accurate enough to supplant the keyboard. 8/

--
Diplomacy is the art of saying, "Nice doggie!" until you can find a rock.
Re:Depends on what you use it for by fmobus · 2007-05-18 10:52 · Score: 1

You are obviously forgetting COBOL-like languages.

But don't worry, this is nothing to be ashamed of
Re:Depends on what you use it for by Anonymous Coward · 2007-05-18 11:06 · Score: 0

Oh, well if it seems stupid to you, let's just forget it then. No one'll ever need more than 640kb of RAM either. No reason to pursue this any further.

Me? I'm intrigued by the idea. While punctuation might be a hurdle, I don't understand why you'd think it would be a major obstacle, esp. if the voice recognition software was "tuned" for the language.

"New public function foobar, arg String foo, arg Int bar, implements baz, begin."

"long int z"

"loop for int eye equals one, while eye less than ten, increment eye by one, begin."

"zee equals zee plus paren bar times baz close paren to the power of eye"

"end for loop"

"print line zee"

"end function"
Re:Depends on what you use it for by KermodeBear · 2007-05-18 12:07 · Score: 1

Funny you should mention that... Writing Perl using Vista's voice recognition.

--
Love sees no species.
Re:Depends on what you use it for by It'sYerMam · 2007-05-18 12:43 · Score: 1

If one were writing code with speech recognition software, one would assume the software would have to be well adapted for purpose. Namely, it would have to be hooked into the IDE or whatever, so it can get a better handle on what functions and variables the programmer might be saying. But using speech recognition could well make it feasible to use longer function names without all the abbreviation.

--
im in ur .sig, writin ur memes.
Re:Depends on what you use it for by dysfunct · 2007-05-18 13:43 · Score: 1

I actually like that one better. If you haven't seen it, it's just as awesome as frustrating.

--
:/- spoon(_).

No. by Caspian · 2007-05-18 09:13 · Score: 5, Funny

Speech recognition, handwriting recognition, species recognition... all of these suck, and will CONTINUE to suck, until strong AI is developed.

And by that time, there will be a lot more important problems to worry about than making a computer understand Bubba Sixpack who can't type-- such as keeping the robots from taking over the planet in a bloody war.

--
With spending like this, exactly what are "conservatives" conserving?

Re:No. by maztuhblastah · 2007-05-18 10:09 · Score: 1

Speech recognition, handwriting recognition, species recognition... all of these suck, and will CONTINUE to suck, until strong AI is developed.

I dunno... I mean, the Newton with six months of training had around 98% accuracy... Inkwell's based off the same algorithms, albeit tweaked slightly to accommodate the different input peripherals. I bet with a year of real research/development, Apple could take handwriting recognition off that list.

--
The real litigious bastards...

Of course it's good enough by ral315 · 2007-05-18 09:13 · Score: 5, Funny

I use it myself. It's wonder full. delete that. delete that. delete that. double the killer delete select all

Re:Of course it's good enough by Morky · 2007-05-18 09:25 · Score: 1, Redundant

In Soviet Russia, double the killer select all deletes you!

Voice recognition sucks. by grub · 2007-05-18 09:13 · Score: 1

Try it sometime.

right slash ass turd is mane dot see this will print hello oh whirl ass trick slash print f open parenthesis quote hell oh whirl backslash end close parent he says semi clothed close curly

--
Trolling is a art,

Not Useful for Coders by Hoi+Polloi · 2007-05-18 09:14 · Score: 1

"Set v underscore tab equals space parenthesis parenthesis x minus lev schema dot all recs concatenate..."

--
It is by the juice of the coffee bean that thoughts acquire speed, the teeth acquire stains. The stains become a warning

Re:Not Useful for Coders by Tackhead · 2007-05-18 09:26 · Score: 4, Funny

> "Set v underscore tab equals space parenthesis parenthesis x minus lev schema dot all recs concatenate..."
Yeah, but if you put a beat to it, you've got something.
{ } . ! / & ; ^ # - < > @ \ { } _ SYSTEM HALTED
"Left titty, right titty, dot bang slash.
Ampersand semicolon, caret pound dash.
Less than greater than, at back slash,
left titty, right titty, under score crash!"
* # ! ! ( ~ & | ) ' " . . DEL # ^G ! ! working... done.
"Star pound bang bang, open-paren.
Tilde and pipe, close-paren.
One quote, two quote, dot dot delete,
pound bell, bang bang, process complete!"
Google's USENET archive dates it back to 1990, but it predates the 1990 post ("Stuck Shift Key Poetry") to rec.humor.funny by several years.
You haven't lived until you've seen a dozen drunken geeks trying to sing "Waka Waka", or the entirety of "Hatless Atlas", while seeing only one character at a time. Well, maybe you have, but this is Slashdot.
Re:Not Useful for Coders by dinther · 2007-05-18 10:42 · Score: 1

Ha ha ha ha, best post I have read in a long time. Very funny. Cheers.
I can just hear this as a rap beat
Re:Not Useful for Coders by PW2 · 2007-05-18 15:26 · Score: 1

Thanks; I never knew what to call these things: { }

I would say yes by UnknowingFool · 2007-05-18 09:14 · Score: 4, Informative

Dear aunt, let's set so double the killer delete select all.

--
Well, there's spam egg sausage and spam, that's not got much spam in it.

Re:I would say yes by Anonymous Coward · 2007-05-19 04:14 · Score: 0

The problem with that MS demo was not poor speech recognition, it was a microphone with the gain set too high. When the audio-in starts clipping, you just have to expect random results.

dom
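
A quick way to check for the clipping described above is to count how many samples sit at the limit of the sample format. This sketch assumes 16-bit signed PCM already decoded into a list of integers; `clipping_ratio` is a hypothetical helper, and what fraction counts as "too much" is left to the reader.

```python
def clipping_ratio(samples, full_scale=32767, tolerance=0):
    """Fraction of samples at (or within `tolerance` of) full scale."""
    if not samples:
        return 0.0
    limit = full_scale - tolerance
    clipped = sum(1 for s in samples if abs(s) >= limit)
    return clipped / len(samples)

# Invented sample data: a quiet signal and one with the gain set too high.
quiet = [0, 100, -100, 250, -300]
too_hot = [0, 1000, 32767, -32768, 32767, 200]

print(clipping_ratio(quiet))
print(clipping_ratio(too_hot))
```

If a noticeable share of samples is pinned at full scale, the waveform peaks have been flattened, and turning the microphone gain down is worth trying before blaming the recognizer.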

I'd say so.... by zappepcs · 2007-05-18 09:15 · Score: 1

With some of the stuff that I see on the Internet (websites and blogs etc.) I'd have to say that the urinalysis gaffe isn't really all that bad.

The only place that speech recognition really annoys me is phone answering systems. They are not competent enough to let you concatenate menu item options and make an intelligent choice as to which phone queue to put you in. For example:

"I have trouble with my cable modem dropping packets" is a statement that 'SHOULD' get you put through to the second tier support line... but no, you have to go through 3 or more menu choices and still only get to talk to the scripted low wage 1st tier support.

--
Support NYCountryLawyer RIAA vs People

Re:I'd say so.... by RingDev · 2007-05-18 09:25 · Score: 2, Insightful

To be fair, that's a problem with the IVR coder, not the voice recognition engine.

-Rick

--
"Most people in the U.S. wouldn't know they live in a tyrannical state if it walked up and grabbed their junk." - MyFirs
Re:I'd say so.... by MartinB · 2007-05-19 00:00 · Score: 1

To be fair, that's a problem with the IVR coder,

No, it's a problem with the business analyst. Although if the IVR config is not as per requirements (and survives testing), then someone needs shooting.

(why yes, I *am* a Contact Centre specialist, with a specific interest in speech self-service)

--
The only thing you can accurately describe as "Scotch" is a sticky tape made by 3M. And it's
Re:I'd say so.... by RingDev · 2007-05-19 00:51 · Score: 1

Fair enough, can't blame the coder if the words weren't in the documentation.

-Rick

--
"Most people in the U.S. wouldn't know they live in a tyrannical state if it walked up and grabbed their junk." - MyFirs

speech for programmers by VirexEye · 2007-05-18 09:15 · Score: 1

For those of us with serious RSI and who program/sys admin for a living, are there any serious attempts at voice recognition out there? Specifically, have there been any breakthroughs with speech -> symbol names or obscure shell commands?

Re:speech for programmers by Anonymous Coward · 2007-05-18 09:29 · Score: 0

http://www.inference.phy.cam.ac.uk/dasher
Re:speech for programmers by kerch · 2007-05-18 12:09 · Score: 1

VoiceCode looks pretty interesting. I haven't tried it yet, but it claims to know enough about the language in which you are coding, and your existing code, to translate an utterance like "define method sort and arguments list" to def sort(self, list): (Python) and "compile symbols command set equals context sensitive command set without arguments" to command_set = CSCmdSet().

Demo videos here.
Re:speech for programmers by ibentmywookie · 2007-05-18 19:43 · Score: 1

Further to that, any advice anybody can give to us programmers suffering from Tendonitis/RSI would be most welcome. I've had it for around 7 months now, and I must say it is making my life pretty miserable. Doctors pretty much just say "take some anti-inflammatories and change careers". Like I want to hear that at 26 years of age... :-/

--
-- The doctor said I wouldn't get so many nose bleeds if I just kept my finger out of there!
Re:speech for programmers by Anonymous Coward · 2007-05-18 23:06 · Score: 0

It's been tried:

Breakpoint 07 Speech Coding Funcompo

Good enough for what? by traindirector · 2007-05-18 09:15 · Score: 4, Insightful

TFA mentions that many people stop using speech recognition software because of poor accuracy. I don't think that's the major reason. I think they start using it because it's a neat idea that seems to have a lot of promise, but quickly realize there are only a few situations where it's actually helpful. The end of the article mentions rough drafts; I'd also say it might be a decent choice

when you need to enter hand-written documents into a computer
for transcripts of a single speaker
informal free-thought when not surrounded by other people
when you have horrible typing skills

For the majority of office tasks, it just isn't a good fit.

So if the "good enough" is being useful in any way whatsoever, it sounds like we're almost there.

Re:Good enough for what? by L.+VeGas · 2007-05-18 09:20 · Score: 3, Insightful

These are some good points. I don't know what I would use speech recognition for, and I'm someone that writes a lot.

Seeing words laid out as text helps me think. I can compose things better, more coherently.

I'll write an email in an instant, but make me leave a voice mail, and I'll usually hang up first.

--
Best Windows Freeware
Re:Good enough for what? by RingDev · 2007-05-18 09:27 · Score: 2, Insightful

I would love it for a graphics editor. Being able to swap tools, zoom, bring up palettes, etc... without having to go digging through menus or trying to remember hot keys. I think VR in desktop software has a place, but it is in augmentation, a fringe benefit, not the core functionality.

-Rick

--
"Most people in the U.S. wouldn't know they live in a tyrannical state if it walked up and grabbed their junk." - MyFirs
Re:Good enough for what? by QRDeNameland · 2007-05-18 09:45 · Score: 2, Interesting

Excellent points. One only need consider how much computer usage is done in cubicle farms, and then picture everyone chattering "Scratch that!" at their workstation, and the utility of speech recognition as a primary form of input becomes very limited regardless of its accuracy. I have a copy of Dragon, and its accuracy is really quite impressive, but past the novelty I have almost never used it. Other than the fact that it requires virtual silence (aside from your voice) to operate, unless I already know *exactly* what I want to say, it is easier for me to compose text by keyboard and construct my wording as I go along. The only time I could see it being of much use is for dictating a handwritten or badly printed document where OCR wouldn't work.

--
Momentarily, the need for the construction of new light will no longer exist.
Re:Good enough for what? by profplump · 2007-05-18 10:11 · Score: 1

Limited-vocabulary speech recognition has been working -- without training -- for years now. Suitable engines are built-in to OS X and MS Office, and there are several choices for Linux as well. Not all tools provide good programmatic access, so it may or may not be easy to integrate with your favorite tools, but the actual speech recognition part is there.
Re:Good enough for what? by amchugh · 2007-05-18 10:38 · Score: 2, Insightful

You missed for dumping a recording of a lecture or dictation into your computer.
Re:Good enough for what? by nine-times · 2007-05-18 10:50 · Score: 3, Insightful

informal free-thought when not surrounded by other people
I think you're implying something here that is one of the major reasons people don't use speech recognition software: if anyone is around, you feel like a total moron.
You might not realize this, but you probably speak differently than you write. Most of us do, because there are some things that look good in text that sound bad spoken, and vice versa. Also, a lot of composition goes on when writing, and so if you're playing with different word choices so you can see them written out, you just end up sputtering dumb little phrases. It's easier to edit on-the-fly when using a keyboard. And let's not forget that you might not want the people around you to know what you're writing.
Re:Good enough for what? by Nasarius · 2007-05-18 11:16 · Score: 1

One only need consider how much computer usage is done in cubicle farms, and then picture everyone chattering "Scratch that!" at their workstation

Other than the fact that it requires virtual silence (aside from your voice) to operate
So get a decent headset, or even a lavalier mic. A directional microphone near your mouth isn't going to pick up much besides your own voice. Trying to use a $5 "computer microphone" for pretty much anything is foolish.

--
LOAD "SIG",8,1
Re:Good enough for what? by elgeebar · 2007-05-18 12:47 · Score: 1

Something that should be mentioned in this context is "styles".

Speech styles vary considerably from written styles. The obvious areas where these differing styles "merge" (maybe that should actually read "blur") are emails & instant messaging.

As a result, dictating emails and IMs makes sense. Dictating letters doesn't.
Re:Good enough for what? by gronofer · 2007-05-18 13:54 · Score: 1

I agree with this. It's quite rare that I'd want to enter text into a computer faster than I can type it. Usually the constraint is slow thinking speed, not my typing speed (which is not too bad in any case). It helps too if you can write what you want to say directly without unnecessary verbiage.
Perhaps there are situations where it would be useful, e.g., for somebody cranking out office memos all day, or perhaps a politician crafting legislation, i.e., situations where a large amount of text contains minimal original thought.
Re:Good enough for what? by tsstahl · 2007-05-19 02:23 · Score: 1

Seeing words laid out as text helps me think. I can compose things better, more coherently.

You have trained yourself this way. You can just as 'easily' train yourself to work from outlines, single words, short phrases, and even color. It is a fascinating area of study that I never thought would be useful until this moment on slashdot; good thing too, because I've just about tapped out my depth in this area.

Don't ask me to explain the color thing. I read about it, but do not comprehend it because my brain most decidedly does not work that way.

The point is, if you were born 100 years ago, would you still write a lot? Food for thought.
Re:Good enough for what? by ksheppard · 2007-05-21 01:22 · Score: 1

My physician (a podiatrist) uses a voice recognition system to record notes after a patient visit. He loves this, finding it much faster and more accurate than writing. And, doctors' handwriting being what it is, his staff certainly appreciate the verbal notes too. Of course his "notes" are not particularly lengthy. But I can see plenty of similar applications.

Depends on what for... by Actually,+I+do+RTFA · 2007-05-18 09:16 · Score: 1

I remember using M$'s speech recognition engine (the version that comes with Office 2k3) to prototype a training program. It was designed to teach radio protocol. And actually, it worked very well. It helped that we had a very limited vocabulary, and even more constricted sentence construction.

--
Your ad here. Ask me how!

oblig. by Anonymous Coward · 2007-05-18 09:17 · Score: 0

O'RLY?

Re:oblig. by WilliamSChips · 2007-05-18 11:53 · Score: 1

O'RLY? I hardly know 'er!

--
Please, for the good of Humanity, vote Obama.

Is it really faster, once you factor in checking? by Anonymous Coward · 2007-05-18 09:18 · Score: 0

I type pretty fast: somewhere around 60 WPM. I do tend to mistype, lowering my speed, but at the same time when I mistype I know I mistype: I can "feel" that my fingers are not moving as they should. With speech recognition, you'll have to be looking at the screen to find mistypes, and then you'll have to do something to retype them, but it'll probably take a while. And because of the lag, people will tend to talk slower so that it can "keep up" and they can prevent the words on the screen from getting too out of sync with their train of thought.

Speech Recognition: It's probably good enough for an IM conversation, but a copyeditor's nightmare.

"New Directions" by parvenu74 · 2007-05-18 09:18 · Score: 5, Funny

I used to work for a company that has the words "new directions" in their name. When I told people where I worked I would make a rather long pause between the "new" and "directions" so as not to sound like I was saying something else. I wonder how this software would render it...

Re:"New Directions" by sd_diamond · 2007-05-18 09:38 · Score: 3, Funny

I used to work for a company that has the words "new directions" in their name.

Please tell me the first two words in the name weren't "Coming From".
Re:"New Directions" by TrippTDF · 2007-05-18 09:50 · Score: 4, Funny

Reminds me of when the company "Pen Island" or "Mole Station Nursery" set up their domain names...
Re:"New Directions" by houghi · 2007-05-18 09:56 · Score: 4, Funny

You thought you had problems with "new directions"
Can you imagine telling the software to go to this site?
haatch tee tee pee double point slash slash slash dot dot org.
http:///..org not found

--
Don't fight for your country, if your country does not fight for you.
Re:"New Directions" by Anonymous Coward · 2007-05-18 09:59 · Score: 0

HA! I finally get it.

Lame.
Re:"New Directions" by Anonymous Coward · 2007-05-18 10:00 · Score: 3, Funny

And let's not forget the Italian energy company Powergen Italia... their name makes for a wonderful .com address!
Re:"New Directions" by Champ · 2007-05-18 10:17 · Score: 2, Funny

Call me immature but I still get a mild chuckle out of, er, expert-sex-change-dot-com and part-sex-press-dot-com. Wait, what's the other part?
Re:"New Directions" by HappyEngineer · 2007-05-18 10:46 · Score: 1

Out of curiosity, who calls the colon a "double point"? Did you make that up or is it a foreign language thing?

--
Cow Cube
Re:"New Directions" by foniksonik · 2007-05-18 10:50 · Score: 1

"Cialis - Enabling New Directions for Active Seniors" : huh, makes perfect sense to me. Why would you put a pause in there?

--
A fool throws a stone into a well and a thousand sages can not remove it.
Re:"New Directions" by Gothic_Walrus · 2007-05-18 10:57 · Score: 1

You know, that would only work if people said "ees-land" instead of "eye-land."

--
Goo goo g'joob.
Re:"New Directions" by Njovich · 2007-05-18 11:13 · Score: 1

Well, in Dutch it's called like that ('dubbele punt'). That he tries to apply a Dutch-ism to a foreign language probably means he's Belgian.

I have no idea about other languages giving a similar name.
Re:"New Directions" by dwarfsoft · 2007-05-18 11:31 · Score: 1

I think he pulled it out of his colon

--
Cheers, Chris
Re:"New Directions" by Anonymous Coward · 2007-05-18 11:41 · Score: 0

penisland.com

How people say it doesn't figure into the equation.
Re:"New Directions" by jonbryce · 2007-05-18 11:54 · Score: 0, Redundant

Or Powergen Italia
Re:"New Directions" by Anonymous Coward · 2007-05-18 12:39 · Score: 1, Funny

> Reminds me of when the company "Pen Island" or "Mole Station Nursery" set up their domain names...

Does everyone remember the days when experts-exchange.com didn't seem to feel the need to use a hyphen?
Re:"New Directions" by Anonymous Coward · 2007-05-18 12:40 · Score: 0

Nah, that was just the "new direction" you had in your mouth.
Re:"New Directions" by OoberMick · 2007-05-18 12:59 · Score: 1

How people say it doesn't figure into the equation.

It sure as hell does when we're talking about speech recognition, or have you forgotten that part of the discussion?
Re:"New Directions" by dominious · 2007-05-18 13:11 · Score: 1

excuse me, did u just say "nude erections?"?
Re:"New Directions" by anilg · 2007-05-18 13:16 · Score: 1

Ha.. tell that to the "Psychotherapist" who had his domain name ordered!

--
http://dilemma.gulecha.org - My philosophical short film.
Re:"New Directions" by Ziwcam · 2007-05-18 13:34 · Score: 1

Yes, but when you're reading a domain name, you don't know which you're supposed to say...
www.penisland.com
Re:"New Directions" by Anonymous Coward · 2007-05-18 13:36 · Score: 0

Same in German. "Doppelpunkt" exactly translates to "double point"
Re:"New Directions" by MillionthMonkey · 2007-05-18 15:07 · Score: 1

So what's wrong with that? If you ask me, people should say "ees-land" or "ayes-land" if they're spelling it that way. They were probably trying to make it not sound like "Iceland" back in medieval times when automated speech recognition wasn't as good or existent as it is today.
Re:"New Directions" by Anonymous Coward · 2007-05-18 18:00 · Score: 0

Maybe he's french : "deux-points" --> two points
Re:"New Directions" by Fred+Ferrigno · 2007-05-18 19:00 · Score: 1

Hey, that's my band's name!
Re:"New Directions" by Instine · 2007-05-18 20:47 · Score: 2, Funny

People want machines to be better than people. They still have this 'infallibility' hang-up: that a machine is more deterministic and is thereby either right or wrong. I'm not stupid, but for a bit, when people said "/. was worth looking at" in blogs or wherever, I actually wondered how I'd find it. Then when I finally heard someone say "slash dot" I kept trying URLs with hyphens. Not for long, and clearly I've found it. But for weeks I was intrigued by /. but couldn't figure out where to look (you can't google "/.").

So the question should be, are humans ready?

--
Because you can - or because you should?
Re:"New Directions" by identity0 · 2007-05-18 21:32 · Score: 1

Was it, by any chance, Women's Action for New Directions? I couldn't believe I found this when I googled...

Disclaimer: I am a linguistics major undergrad, but I have not taken any courses specifically about speech recognition.

The more I learn about human speech, the more I'm convinced that instead of analyzing the whole sound input and actually deriving the individual sounds and building them into words (as a simple speech transcriber might), the human brain simply looks for major cue sounds and assembles sentences based on context and prior experience. It would fit with what we know about vision, i.e. that we don't process everything the eye sees; rather, the brain makes assumptions about distance or depth using processing shortcuts.

When I took a course on Phonetics(*not* phonics), we had to learn to transcribe every sort of sound made by human speech, and it was really hard to distinguish sounds that were not in a language you knew. The brain had a hard time fitting them into previous knowledge, I guess.

So what I'm getting at is that context and setting play a really important role in understanding speech. It's not surprising that a software program might not get everything right if it doesn't know what field or specialty you are talking about. I'm careful to say "UNIX computers" to laypeople because "UNIX" sounds too much like "eunuchs", and people have enough odd ideas about geeks as it is :)
Re:"New Directions" by epine · 2007-05-18 23:57 · Score: 1

When I took a course on Phonetics(*not* phonics), we had to learn to transcribe every sort of sound made by human speech, and it was really hard to distinguish sounds that were not in a language you knew. The brain had a hard time fitting them into previous knowledge, I guess.

This was well understood in neuro-linguistics in books I was reading 15 years ago, and according to the theory at the time (which might still hold) it has hardly anything to do with context. When the brain believes it recognizes primary-language speech sounds, it routes the signal to a part of the brain dedicated to mapping those sounds onto the known, primary language phonetic elements. There is also a reverse effect: once your brain decides it hears a primary speech sound, you lose your ability to discriminate fine shades of pitch associated with those sounds. As far as I recall now, the experiments were conducted on the boundary between linguistic processing and musical processing.

Furthermore, noticing that semantic context would serve a human to eliminate an error made by a speech recognition system does not imply that humans resort to semantic context when routinely performing that task. One factor you've completely failed to take into account is constraints on how much glucose the brain can consume to accomplish routine function. Even if it has the context available, the brain won't necessarily activate it if other more glucose-efficient circuits are capable of carrying the load. This hasn't been studied nearly enough. What we know so far is that the brain is very busy lighting up the circuits required for as long as they are needed before the glucose goes somewhere else.

You need to invest more of your own brain glucose in discriminating what's possible in brain function from what's practical to sustain.
Re:"New Directions" by bob.appleyard · 2007-05-19 02:09 · Score: 1

Writing crystallised, while speech did not. This explains most of the incongruities. Knight used to be pronounced ker-ni-gut, for instance.

--
How dare you be so modest!! You conceited bastard!!
Re:"New Directions" by ChameleonDave · 2007-05-19 02:21 · Score: 1

Knight used to be pronounced ker-ni-gut, for instance.
It certainly was not. It was pronounced /knixt/, and now it is pronounced /nait/. You actually tripled the number of syllables there.
Re:"New Directions" by Anonymous Coward · 2007-05-19 02:29 · Score: 0

in german it's Doppelpunkt, which means exactly double point
Re:"New Directions" by Anonymous Coward · 2007-05-19 05:33 · Score: 0

Hey, that's my old car radio's name! No, wait, that's Blaupunkt...
Re:"New Directions" by Anonymous Coward · 2007-05-20 06:36 · Score: 0

Not to ruin the joke, but a less suggestive workaround might have been to pronounce the latter word as "die-rections". It's an easily recognizable alternative without the ambiguity. We have a lot of those in English...

Speech recognition IS good enough by rinkjustice · 2007-05-18 09:18 · Score: 4, Informative

I'm using Dragon NaturallySpeaking. Right now, as I write this calm it, comet, post, and it sure as hacking beats typing.

Actually, I am using Dragon NaturallySpeaking right now, and it works very well. It actually works better if you speak quickly (as you normally would) and it's pretty good at inserting grammar along the way. I have bilateral tendinitis, and the software has been a godsend for me. I was even able to finish writing my book, a task that was becoming just too painful typing manually.

Oh, and you are probably wondering how long it takes to train the software? About a half an hour, and I find the accuracy at around 95%.

--
SEO Copywriter. Just Say ON

Re:Speech recognition IS good enough by Sciros · 2007-05-18 09:22 · Score: 2, Informative

Yeah, Nuance makes good stuff. Well, they've bought up everyone worth anything afaik, so I guess it's only to be expected.

--
I like basketball!!1!
Re:Speech recognition IS good enough by DragonWriter · 2007-05-18 09:29 · Score: 1

Actually, I am using Dragon NaturallySpeaking right now, and it works very well. It actually works better if you speak quickly (as you normally would) and it's pretty good at inserting grammar along the way.

What does "inserting grammar" mean?
Re:Speech recognition IS good enough by rinkjustice · 2007-05-18 09:48 · Score: 1

What does "inserting grammar" mean?

It means adding commas and periods as you speak to make the text read more natural.

--
SEO Copywriter. Just Say ON
Re:Speech recognition IS good enough by ddhuyvet · 2007-05-18 10:00 · Score: 1

Coincidentally, on Monday the trial against Lernout & Hauspie begins. I don't know if they are known outside of Belgium, but in the late nineties they gave Flanders (the Dutch-speaking north of Belgium) the dream that it could have a leading role in peach technology. L&H even formed the centre of a "Flanders Language Valley".

Unfortunately L&H made some wrong investments and became the centre of a major financial scandal after Robert Smithson of the Wall Street Journal discovered fictitious transactions in Korea and shady accounting techniques. As a result L&H went bankrupt in 2001. It's around this scandal that a court case starts this Monday. It's big news here in Belgium, as a lot of people invested money in L&H and are hoping to get some of it back.

I was wondering if L&H were actually on the right track. Jo Lernout today still believes in the technology. I was thinking he was wrong, but this news item might prove him right.

It was actually L&H that bought the then-faltering Dragon Systems in 2000. After its bankruptcy, L&H was bought by ScanSoft (for very little money). ScanSoft bought Nuance Communications and changed its name to Nuance. And now they seem to be getting successful with the NaturallySpeaking software, so it probably was a good acquisition by L&H back then. And ScanSoft (now Nuance) was in turn smart in buying them up.
Re:Speech recognition IS good enough by Anonymous Coward · 2007-05-18 11:01 · Score: 0

What does "inserting grammar" mean?
It means adding commas and periods as you speak to make the text read more natural.

In English we call that punctuation. I guess it could have been a speech recognition error.
Re:Speech recognition IS good enough by Anonymous Coward · 2007-05-18 13:15 · Score: 0

I don't know, but it doesn't sound relevant to slashdot.
Re:Speech recognition IS good enough by Anonymous Coward · 2007-05-18 14:31 · Score: 0

Hey, check out John Sarno's "Healing Back Pain". I had debilitating hand pain (bilateral as well) and it cured me. It might not be your thing but it is worth checking out. There is a free podcast of a John Sarno interview at http://www.podfeed.net/podcast/Good+Audio+Makes+the+Ear+an+Eye/8199

Go down to The Divided Mind by John E. Sarno, M.D. (Interview)

btw: 3 years on disability and I am back programming (started a company writing software for users to control their computers with their feet).

Regards. Keep up the good work.
Re:Speech recognition IS good enough by damonlynch · 2007-05-18 15:04 · Score: 1

There are times when speech recognition is particularly useful compared to typing. Relatively simple tasks, such as transcribing notes and dictating routine e-mails are tasks well suited to speech recognition. But there is one issue that I hardly ever see discussed with respect to speech recognition: the concentration it requires.

Putting aside the issues of dictating text that does not lend itself to natural language dictation, such as programming code, in my experience the main issue determining whether it will be productive or not is invariably how much concentration is required to form the sentences being written. If a lot of concentration is required to simply form the sentences -- such as when writing something difficult to conceptualize -- then speech recognition gets in the way of thinking about what to write. The reason is simple. Typing is practically automatic. Speech recognition is not. First, you need to think about clearly articulating what you are saying, in natural sentences. Second, you need to check every word to make sure it has been dictated correctly. 95% accuracy may sound good, but it's a real pain when you are concentrating on your ideas. Furthermore, the subtle errors that often occur with speech recognition -- such as "is" instead of "as", for instance -- can significantly degrade the quality of writing produced, and yet it is precisely these errors that are very difficult to pick up without excellent concentration on what the speech recognition program is outputting. Attention becomes fragmented, detracting from quality intellectual thought.
Re:Speech recognition IS good enough by Anonymous Coward · 2007-05-18 16:45 · Score: 0

Not to question you or anything, but why would Dragon NaturallySpeaking misspell tendonitis?
Re:Speech recognition IS good enough by Anonymous Coward · 2007-05-18 17:38 · Score: 0

Peach technology? They listen to fruit? Or was that you talking to your L&H software?
Re:Speech recognition IS good enough by Anonymous Coward · 2007-05-18 18:23 · Score: 0

Check your facts, Jack: http://www.answers.com/main/ntquery?gwp=13&s=tendinitis

Its good enough for comercial applications by sentrido · 2007-05-18 09:20 · Score: 1

It's good enough for commercial warehouse applications, e.g. the Vocollects and Voxwares of the world.

IVR vs VoIP by RingDev · 2007-05-18 09:20 · Score: 1

I work on IVR systems for clinical research and medical screening (along with a huge variety of other things we make these systems do). And it's pretty good. We do a lot of work massaging the Grammars to make the system more accurate though, and we have a lot of extra logic built in for situations where we can predict values and assign weights to different words. But the one thing that rather annoys me is that I quite often have issues with Skype's quality just being a bit too low for the system to cope with. I use Skype to dial in so I don't have to take my hands off the keyboard/mouse for testing (or deal with the phone in general). I would guess I have to repeat about 1 in 5 questions, or wait for a reprompt, because of an audible glitch from the VoIP connection.

All in all though, I'm rather impressed with the functionality and accuracy we do have. I'm not sure it will take over in many places though because of the error rate on free-formed text and the volume levels. My old cube-farm was noisy enough with everyone typing; I can't even imagine it with everyone trying to talk to their computers and hoping the noise filters would pick out their voice correctly. I've got a nice closed-off office to work in now, so no one has to hear me yell "Invalid selection my ^%#!" at my computer ;)

-Rick

--
"Most people in the U.S. wouldn't know they live in a tyrannical state if it walked up and grabbed their junk." - MyFirs

English is stupid! by drinkypoo · 2007-05-18 09:21 · Score: 0, Flamebait

will it still render 'I really admire your analysis' as 'I really admire urinalysis'?

English is the only language I speak and I still think it's stupid. But if you pronounce 'your' correctly it doesn't sound like "yur", which is what the beginning of urinalysis sounds like. 99% of the time the problem is improper pronunciation.

And no, accents are no fucking excuse. I'm sorry you grew up around people who can't pronounce words properly... But you should really learn to pronounce the words correctly so that people outside of your inbred birthplace will understand you.

Once I had a Texan share an anecdote with me about an even sillier-sounding Texan who pronounced "oil wells" as "owl whales". I don't think speech recognition software will figure that out, either.

--
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"

Re:English is stupid! by Anonymous Coward · 2007-05-18 09:38 · Score: 0

I defy you to define "proper" pronunciation without invoking a bunch of dead people who wrote down their unfounded ideas of perfection in books.
Re:English is stupid! by compro01 · 2007-05-18 09:50 · Score: 1

it is easier to reprogram computers than it is to reprogram humans.

For that matter, how do you define the "correct pronunciation" of any given word? The King's English? The President's English? The Prime-Minister's English? The MLA's take on it? Your opinion?

opinions are like assholes. everyone has one and a lot of people are assholes about their opinions. you try messing with people's ideas of language and they will tend to hate you.

--
upon the advice of my lawyer, i have no sig at this time

Pretty good by Richard+McBeef · 2007-05-18 09:21 · Score: 5, Funny

95 percent is pretty good, only one word in twenty. I wouldn't have a problem with a 5% error ate.

Re:Pretty good by Anonymous Coward · 2007-05-18 09:39 · Score: 0

mod parent up -- most elegant post I've seen in a long time
Re:Pretty good by Rei · 2007-05-18 09:52 · Score: 2, Insightful

5% could be the difference between "The report confirmed that Iraq has WMDs" and "The report confirmed that Iraq had WMDs." It could be the difference between "Tell Mrs. Smith to take 20mg of neurontin" and "Tell Mrs. Smidt to take 20mg of neurontin." It could be the difference between "The magnet should not be exposed to a field greater than fifteen teslas" and "The magnet should not be exposed to a field greater than fifty teslas." And on, and on.

Small wording changes can make a big difference -- generally much bigger than typos, which I can assure you happen far less often than 5%. Additionally, typos are generally recognizable as the intended word, and often aren't even noticed by the reader.

--
"'If one must live then one must die.' - oh, the truth must be funnier than this..." -- Mammút
Re:Pretty good by Derek+Pomery · 2007-05-18 09:58 · Score: 1

That is precisely the problem with Dragon - the algorithm by its very nature will not create typos - it is matching speech against known words. So it is helpless with new vocab (although you can train it) and it makes for devilish subtle typos that take longer to pick out than it would have to run a spell check after a typist finished with their 93% accuracy.

--
-- perl -e'print pack"H*","6e656d6f406d38792e6f7267"' /. ate my old sig. Bastards.
Re:Pretty good by TheRaven64 · 2007-05-18 10:15 · Score: 2, Insightful

I wonder exactly what 95% means. Does it mean one character out of every 20 is wrong? One word out of every 20 has an error? One sentence? I average about one to two errors per page, and so all of these sound horrendous to me. Even typing with my eyes closed (which I do sometimes when my eyes are feeling tired, but generally don't because I always think I've managed to move my fingers one character across and started typing complete nonsense) I get higher accuracy than that.
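To show how much the choice of unit matters, here is a rough Python sketch that scores the same recognition error per word and per character. It is only an illustration: the helper names (edit_distance, accuracy) and the sample sentences are made up for this example, and nothing here is claimed to be how Dragon or any other vendor arrives at its figure.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance: minimum substitutions, insertions and deletions
    needed to turn the sequence `ref` into the sequence `hyp`."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def accuracy(ref_units, hyp_units):
    """1 minus (edit distance / reference length), for whatever unit is passed in."""
    if not ref_units:
        raise ValueError("reference must not be empty")
    return 1 - edit_distance(ref_units, hyp_units) / len(ref_units)


# Hypothetical dictation: what was said vs. what the recognizer produced.
reference = "I would not have a problem with a five percent error rate"
recognized = "I would not have a problem with a five percent error ate"

word_acc = accuracy(reference.split(), recognized.split())
char_acc = accuracy(list(reference), list(recognized))

print(f"word accuracy:      {word_acc:.3f}")
print(f"character accuracy: {char_acc:.3f}")
```

One wrong word out of twelve is roughly 92% by words, but the same slip is a single dropped character out of 57, so roughly 98% by characters. The headline number means little until you know which unit was counted.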

--
I am TheRaven on Soylent News
Re:Pretty good by tolan-b · 2007-05-18 10:34 · Score: 1

Or it could be the difference between 'error rate' and 'error ate'...
Re:Pretty good by Anonymous Coward · 2007-05-18 10:41 · Score: 0

I'm glad you got modded as funny because for some reason these guys don't seem to get the friggan yoke.
Re:Pretty good by Tablizer · 2007-05-18 11:13 · Score: 1

95 percent is pretty good, only one word in twenty. I wouldn't have a problem with a 5% error ate.

Certainly mour than enuf for /.

--
Table-ized A.I.
Re:Pretty good by tepples · 2007-05-18 13:06 · Score: 1

5% could be the difference between "The report confirmed that Iraq has WMDs" and "The report confirmed that Iraq had WMDs."
Then phrase your statement "The report confirmed that Iraq has/had WMDs by $MONTH" so that the human proofreader can still fix it.

It could be the difference between "Tell Mrs. Smith to take 20mg of neurontin" and "Tell Mrs. Smidt to take 20mg of neurontin."
"Tell Mrs. Sierra mike india tango hotel to take 20 milligrams of neurontin."

"The magnet should not be exposed to a field greater than fifteen teslas" and "The magnet should not be exposed to a field greater than fifty teslas."
"...a field greater than one five teslas" vs. "...a field greater than five zero teslas" better?
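The same trick can be sketched in a few lines of Python: spell risky names with the NATO alphabet and read numbers digit by digit before dictating them. The function names (spell_word, speak_number) are invented for this example, and it assumes plain ASCII letters and non-negative integers.

```python
import string

# NATO / ICAO spelling alphabet, in a-z order.
_NATO_WORDS = (
    "alfa bravo charlie delta echo foxtrot golf hotel india juliett kilo lima "
    "mike november oscar papa quebec romeo sierra tango uniform victor whiskey "
    "x-ray yankee zulu"
).split()
NATO = dict(zip(string.ascii_lowercase, _NATO_WORDS))

DIGITS = dict(zip("0123456789",
                  "zero one two three four five six seven eight nine".split()))


def spell_word(word):
    """Spell a word letter by letter; characters outside a-z are skipped."""
    return " ".join(NATO[c] for c in word.lower() if c in NATO)


def speak_number(n):
    """Read a non-negative integer one digit at a time."""
    return " ".join(DIGITS[d] for d in str(n))


print(spell_word("Smith"))   # sierra mike india tango hotel
print(speak_number(15))      # one five
print(speak_number(50))      # five zero
```

"one five" and "five zero" share no word in the same position, which is the whole point: the pair that sounds alike ("fifteen" / "fifty") never gets spoken.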

We use it. by Organic+Brain+Damage · 2007-05-18 09:21 · Score: 2, Interesting

For command control of a system where we need both hands free. It's pretty good, much better than stopping and typing, clicking or pressing buttons during a repetitive manual process.

We're using an older version of Microsoft's product and it seems the microphone quality is important.

Re:We use it. by cs02rm0 · 2007-05-18 09:34 · Score: 1

Likewise, for our main product we've integrated Dragon for command and control. It's faultless there, even without training. It's 'good' in general use, but that doesn't really cut it for anyone who can touch type.
Re:We use it. by diavolomaestro · 2007-05-18 11:02 · Score: 1

I like having two hands free as well, but Firefox always takes me to hugeteencities.com. It's kind of annoying like that.

Can your computer... by onkelonkel · 2007-05-18 09:22 · Score: 1

wreck a nice beach??

--
None of them can see the clouds; The polished wings don't care.

Yes and no.. by msimm · 2007-05-18 09:24 · Score: 1

For some reason every time this topic comes up the focus seems to shift to word-processor-type use.

What about simpler uses? How many basic tasks in the car require you to take your hands off the steering wheel? I'd like to see the basic functionality of the remote control mirrored in speech recognition. Things like stop/pause/increase/skip.

I'd imagine once this kind of simple recognition became common over-all speech recognition would (more) rapidly evolve.

--
Quack, quack.

Re:Yes and no.. by DragonWriter · 2007-05-18 09:40 · Score: 1

What about simpler uses? How many basic tasks in the car require you to take your hands off the steering wheel?

Zero.

A very few may require taking a hand off the steering wheel, though well designed newer cars tend to solve even that by putting controls on the wheel.

Though to solve the major "hands off the wheel" problem I've seen in other drivers, I'm not sure how voice control would work, anyhow: are you proposing a voice-controlled makeup application system?

No by Threni · 2007-05-18 09:25 · Score: 1

They use it on TV all the time for subtitles, and practically every sentence has a mistake. It's finally "usable" or "worth taking seriously", but "good enough" implies, to me, that no further improvements are required, and I don't agree with that.

Welcome to the new AT&T! by poptones · 2007-05-18 09:26 · Score: 3, Funny

Press or say one to speak with a representative in english...

One

When you hear the option you are calling about you may say it at any time. If you are calling about a billing problem, say billing. If you are calling about a technical issue, say technical. If you are calling about new service, say new customer. If you are...

Billing

I'm sorry, that is not an option. When you hear the option you are calling about you may say it at any time. If you are calling about a billing problem, say billing. If you are calling about a technical issue, say technical. If you are calling about new...

Billing!

I'm sorry, that is not an option. When you hear the option...

Billing billing billing!

I'm sorry, that is not an option. When you...

Fuck you! Give me a human! Human human human!

I'm sorry, that is not an option. When you hear the option...

Re:Welcome to the new AT&T! by Mattintosh · 2007-05-18 09:33 · Score: 1

Unless I call a number where I expect an automated system, the first thing I do is press and hold the 0 button for about 10 seconds.

I'm usually talking to a real person within a minute or so.
Re:Welcome to the new AT&T! by MartinB · 2007-05-19 00:15 · Score: 1

Press or say one to speak with a representative in english...

One

When you hear the option you are calling about you may say it at any time. If you are calling about a billing problem, say billing. If you are calling about a technical issue, say technical. If you are calling about new service, say new customer. If you are...

Billing

Bloody hell that's plain bad scripting. While the reco engine does sound duff (or rather: the setup & tuning for the specific business purpose is duff), there's *no* excuse for this kind of rubbish, and they are orthogonal problems.

What you've got there is a reco system poorly shoehorned onto the same logic as a simple DTMF IVR - "System reads out the options & possible response values; user responds with the chosen key"

If you want a *better* example, go call the iPod tech support line (it's TuVox behind the scenes - there's examples at that URL, and note that that's all synthetic speech, not recorded). Or sign up for sharedealing with T Rowe Price, which is IBM Websphere Voice Server (actually, that example's a couple of years old now, but it's still miles better).

(Disclaimer: I am a member of the IBM Contact Centre Competency, but am not speaking for my employer)

--
The only thing you can accurately describe as "Scotch" is a sticky tape made by 3M. And it's

It would be nice.. by Wicko · 2007-05-18 09:28 · Score: 1

..to see the software discern between two different voices when typing up a document.

The only problem I see here is people becoming too dependent on the software. Terms like urinalysis might become something we will automatically associate with your analysis, and people will get lazier and lazier, as if we aren't lazy enough already.

FailWare by Anonymous Coward · 2007-05-18 09:31 · Score: 0

FailWare. Heh, I just thought of that term. Google only returned 77 hits, so I guess I almost coined it.

Re:FailWare by Anonymous Coward · 2007-05-18 10:00 · Score: 0

Better patent it

Maybe the question should be... by Mahjub+Sa'aden · 2007-05-18 09:31 · Score: 5, Insightful

Instead of asking if speech recognition is "good enough", maybe we should be asking whether or not it's actually useful for anything in the first place. I mean, is it good enough... to do what?

Can you imagine being in a cubicle farm full of people talking to their computers? Or trying to talk to your computer on the bus? You have to imagine that as computers become more ubiquitous, input methods will have to adjust alongside, and I simply can't see (or hear) speech recognition doing that very well.

--
What is is all that is. Isn't that obvious?

Re:Maybe the question should be... by EvanED · 2007-05-18 09:50 · Score: 1

I would like a good speech recognition program. I've been meaning to give Dragon a try at some point, but I'd need a mic too (unless it comes with one which it might...). I do enough writing and stuff that it could come in handy to reduce wrist strain.

A lot is coding, but I could still be speaking this /. post to the computer instead of typing it.
Re:Maybe the question should be... by Mahjub+Sa'aden · 2007-05-18 09:55 · Score: 1

My boss uses Dragon NaturallySpeaking in his office. It's quite a nice product once fully trained; out of the box it's pretty spotty. It's also quite a resource hog, but that's pretty much to be expected with that sort of software.

My point is this, however. While it may be fine for my boss, a touch typer and not much of a speller, in his office, alone, it's not much use in a public or semi-public space. I'm not much of a visionary, but it seems pretty obvious that sooner or later, computers will be everywhere and we'll have to be inputting stuff everywhere as well. I'm not sure how speech recognition will scale in those cases.

Not to mention that at this point, as far as I know (and feel free to correct me on this), speech recognition is not good enough out of the box to recognise all sorts of voices. Not everyone has clear natural diction.

--
What is is all that is. Isn't that obvious?
Re:Maybe the question should be... by 644bd346996 · 2007-05-18 10:24 · Score: 2, Insightful

Speech recognition is obviously not universally usable, but it is useful. I've found that for many mundane tasks, the OS X speech recognition is easier than a keyboard shortcut, and much easier than using the mouse. There are a lot of applications that could be much easier if they included speech recognition for commands. Consider an app that relies heavily on both keyboard and mouse input, such as Blender. A lot of the keyboard shortcuts would be faster and easier to remember as spoken commands, and they could be implemented so as to be quite reliable. Also, most 3d modelers can probably get the privacy to use a verbal interface.

I think the real issue is that speech recognition apps have focused almost exclusively on dictation, which is much harder computationally than picking commands out of a finite, known set. For the latter, speech recognition technology has long been "good enough," and the only challenge is to make effective use of spoken commands in addition to current input methods.
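
To illustrate why a closed command set is the easier problem, here is a minimal Python sketch. It is not tied to any real speech engine: it assumes some recognizer has already produced a noisy text guess, and the command list and the `match_command` helper are made up for the example. With only a handful of valid phrases, snapping the guess to the nearest one is often enough.

```python
import difflib

# Hypothetical command vocabulary for a modeling app
# (illustrative only, not any real application's command names).
COMMANDS = ["extrude", "rotate", "scale", "undo", "select all", "render"]

def match_command(heard, cutoff=0.6):
    """Return the closest known command for a noisy transcription, or None."""
    guess = heard.lower().strip()
    matches = difflib.get_close_matches(guess, COMMANDS, n=1, cutoff=cutoff)
    return matches[0] if matches else None

print(match_command("extrood"))
print(match_command("banana"))
```

Anything that falls below the cutoff is rejected rather than guessed at, which is the kind of safety net free-form dictation doesn't get.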
Re:Maybe the question should be... by babblefrog · 2007-05-18 10:48 · Score: 3, Insightful

Where I see it coming into its own is as an input method for really portable "wearable computing", where it would be extremely inconvenient to use a keyboard.
Re:Maybe the question should be... by AJWM · 2007-05-18 18:51 · Score: 2, Insightful

I mean, is it good enough... to do what?

Oh, how about eavesdrop on a few thousand voice circuits and raise a flag when certain key words or phrases are mentioned?

--
-- Alastair
Re:Maybe the question should be... by koreaman · 2007-05-18 21:40 · Score: 1

What the hell is a "voice circuit"?

--
Le français vous intéresse?
Re:Maybe the question should be... by Anonymous Coward · 2007-05-19 05:20 · Score: 0

Gosh, I have no clue. You don't suppose it could be something like a circuit (you know, as in a telephone landline, or a virtual circuit as in a cellular call) that carries voice (speech, sound, audio), do you? It's not like Google returns more than 80,000 hits for the phrase or anything. Must be some random verbiage the GP made up.

pronouncing words "properly" by Bearpaw · 2007-05-18 09:34 · Score: 1

Everybody has an accent. (Ask a linguist.) Basically, it sounds like you just want everybody to have the same accent that you do. Good luck.

Re:pronouncing words "properly" by drinkypoo · 2007-05-18 09:41 · Score: 1

You can have an accent and still pronounce words in such a way that you can properly distinguish between them. The pronunciations in the dictionary are there for a reason, and until people learn to use them, we will still have problems like this. And on the topic of everyone having my accent, with notable exceptions the people on the West coast of the US speak English closest to the intended pronunciation, and in fact are closer to it than denizens of England, who have actually gone so far as to change some of their common spellings many years ago to differentiate them from the way we spell them here across the pond, and to deny the desire of the individual who named Aluminum (note the absence of additional letters) as to how it should be spelled. So arguably, everyone's accent should be closer to mine :)

--
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
Re:pronouncing words "properly" by DragonWriter · 2007-05-18 09:45 · Score: 1

You can have an accent and still pronounce words in such a way that you can properly distinguish between them.

Which words can be properly distinguished by sound alone (rather than context) varies by accent.

The pronunciations in the dictionary are there for a reason, and until people learn to use them, we will still have problems like this.

Except in languages where there is an official prescriptive authority, they exist to document actual usage, and often document several variations which can be ambiguous with other words or combinations of words.

People learning to use them will not change the fact that spoken language, even "proper" spoken language, by any definition, contains ambiguities that cannot be deterministically resolved with 100% accuracy.
Re:pronouncing words "properly" by wcbarksdale · 2007-05-18 10:20 · Score: 1

closest to the intended pronunciation

I'm curious; who intended this pronunciation? I'd like to meet the inventors responsible...

Language Hat by Anonymous Coward · 2007-05-18 09:35 · Score: 0

I play the language hat card!

Well, if speech recognition gets common... by Kjella · 2007-05-18 09:38 · Score: 1

...so everyone will talk all the time, half the work population will go postal and the other half will get offices. Also, one thing that I notice is that I rarely get everything right the first time; I go back to add a sentence or use copy-paste quite a bit. It's really much easier to do that with your fingers without losing the "verbal" line of thought. And then there are all the applications where it makes much more sense to use the UI than to try to talk your way through commands. Voice commands get a bit like the command line: you have to memorize a lot to use it at a decent pace. That limits its use to a very few select situations for me, hardly enough to be worth it.

--
Live today, because you never know what tomorrow brings

Re:Well, if speech recognition gets common... by mark-t · 2007-05-18 10:53 · Score: 1

Speech recognition would probably have its greatest application for physically handicapped people using computers. It would also be useful for any circumstance where dictation would otherwise be used. Just because it wouldn't be applicable to your situation, does not mean it would not be a boon to others.
The ideal computer UI for many people, however, will be one that reads your brain activity and responds directly to thought.

--
File under 'M' for 'Manic ranting'
Re:Well, if speech recognition gets common... by Anonymous Coward · 2007-05-18 11:04 · Score: 0

As long as the Iraq and Afghanistan wars rage on, we will not exhaust the supply of double-hand amputees.

Speech Reco Software Consolidation by __aajwxe560 · 2007-05-18 09:39 · Score: 4, Informative

I am presently a financial customer of an enterprise speech recognition product that Nuance offers. For several years now, the speech recognition software industry has been under consolidation, with Nuance buying a few different competitors and technologies. Most recently, this dance has continued with Nuance being acquired by ScanSoft, a company known for specializing in type recognition.

Nuance support is marginal at best, and through all the consolidations, understanding even within their own company of how the product works is quite lacking. We have found our own developers often times educating the Nuance support folks in various aspects of how the product is working, and then inquiring as to whether this is intended behavior or not. Crickets can often be heard finishing these types of conversations. We normally would have moved to another product under these conditions, but simply put - Nuance acquired what little was left, and now has no competition in the market. Competition is what spurs innovation, and so with the continued consolidation, it is hard to see significant advances in the technology without free help from academia.

If you think the Microsoft monopoly is bad, imagine if they absorbed Apple and somehow took over Linux leaving you with a few "choices", but all under the Microsoft moniker. The technology is very neat and the enterprise level products do some basic things quite well, but there is still some glaring room for innovation that I don't expect anytime soon under present industry conditions.

Re:Speech Reco Software Consolidation by Michael+Ashley · 2007-05-18 11:17 · Score: 1

there is still some glaring room for innovation that I don't expect anytime soon under present industry conditions

I agree strongly with that. I have been amazed at the glacial progress in speech recognition in the decade I have been using ViaVoice and Dragon NaturallySpeaking. Bugs in Dragon remain uncorrected for years, despite user complaints. As is common with proprietary software, there is no dialog between the users and the anonymous programmers.
The lack of any Linux alternative is also a major problem - Dragon is the only reason I still run Windows. The last time I checked (6 months ago), Dragon didn't work with Wine. With Microsoft owning, AFAIK, a substantial fraction of Nuance, and with Dragon being the only game in town, I see a very bleak future for Linux-based speech recognition.
If only the current generation of elite open-source programmers would realize that unless they get speech recognition working in the next decade, they will be in trouble when they themselves develop repetitive strain injury.
It is easy to think you are invincible when you are young and your fingers can hammer away at lightning speed on the keyboard for 18 hour programming sessions, but, reality hits after 5, 10, perhaps 20 years, when you suddenly find that typing is painful and even clicking a mouse button is difficult.
Re:Speech Reco Software Consolidation by Anonymous Coward · 2007-05-18 15:47 · Score: 0

There are still some people within Nuance that know really well how things work, but those with talent are often swamped with requests. If you're lucky enough, you'll get in touch with one of us with talent, and trust me things can move fast once you find that person.

In short, people like me went through dozens of mergers to this "mutant" we now call Nuance, and now that the dust (and blood) is settling, things are internally changing for the better. R&D is ramping up pretty fast too. I'm actually quite happy to see most of the internal bullshit going away so that we can please customers and make more money, and that upper management actually agrees with that...

So hang in there, things are getting better.

Certain industries already make heavy use of this by artemis67 · 2007-05-18 09:39 · Score: 1

Several years ago, I saw a court reporter using a speech recognition system with his laptop. The microphone actually looked like some sort of breathing apparatus, as it fit snugly over his mouth and nose with the wires in a tube running down to the laptop.

not even close by trwww · 2007-05-18 09:42 · Score: 1

Judging by this and this, I would say it's not even close.

Looks like it makes for good jokes.

Mod parent up! by Doctor+Memory · 2007-05-18 09:43 · Score: 3, Insightful

Seriously, the only things speech recognition is good for are bulk text entry and simple navigation. I imagine trying to use voice commands to operate modern software would be similar to letting my four-year-old help make pancakes: yes, it gets done, but it's so much easier and faster to just do it yourself. Imagine trying to edit a document using just voice commands. Is your WP going to be smart enough that you can tell it "find all occurrences of 'scum-sucking bottom feeders' and replace it with 'esteemed colleagues'"? Or are you going to have to say "Find. Scum hyphen sucking bottom feeders. Tab. Esteemed colleagues. Replace all."? Face it, GUIs have rendered speech recognition for command and navigation moot. Most operations you perform don't have a verbal description, or at least not one that is quicker to say than to do.

I also can't imagine it'd be that useful for actually writing things. I don't think I'm the only one who revises as they write. I think I actually write better when I write things out by hand, because it's slower so I tend to think my phrasing and sentence structure through more before I commit anything to paper. If I could suddenly type two or three times faster, I think it'd probably make my text even more incomprehensible than it usually is...

--
Just junk food for thought...

Re:Mod parent up! by arbitraryaardvark · 2007-05-18 10:57 · Score: 2, Interesting

* when you need to enter hand-written documents into a computer
* for transcripts of a single speaker
* informal free-thought when not surrounded by other people
* when you have horrible typing skills

You had me at "* when you have horrible typing skills".

Parent post mentions their 4 year old making pancakes.
At some point, most likely, you expect the kid is going to grow up and get better at making pancakes. There will be a learning curve. Maybe 4 is too young; I haven't met the kid. But part of the point of teaching a kid to make pancakes is to get the learning curve out of the way, so they can get better at it on their own time, preferably before they are 30.
My crude analogy is that a naturally speaking soft dragon is a bit like a 4 year old pancake maker. It can be worthwhile to get used to an imperfect tool now, so that you'll have the learning curve out of the way as the tool gets better over time.
Or it can be better to wait another year. Your mileage may vary.

Here's another potential application: Get the dragon for your kid. It may be useful as she or he learns to read and write.

I for one welcome our new naturally speaking dragon overlords.

I want the throat mike module, so that it types what I'm subvocalizing.

I'm hearing a business model here:
1. form a corp to offer voice to text software
2. wave hands
3. sell out to Nuance
4. ......
Re:Mod parent up! by tsstahl · 2007-05-19 02:31 · Score: 1

If I could suddenly type two or three times faster, I think it'd probably make my text even more incomprehensible than it usually is...

No, you would have more time to reread and think about what you wrote.

Writing is 'hard' because most people are terrible writers, but are actually pretty good re-writers. The problem, of course, is that we don't take the time to look over and rewrite our stuff. You understand yourself perfectly well when you think. If you take the time to look at your thoughts from the reader's point of view, glaring errors of assumption and clarity jump off the page.

Just like those how to get riches infomercials, none of this information is earth shattering, or particularly insightful. Some people never think along the lines to make the conclusions, others know and are too lazy, and of course one half of the bell curve is below average anyway and just doesn't 'get it'.

open source speech recognition by biscon · 2007-05-18 09:46 · Score: 1

Are there any open source speech recognition libraries worth checking out? (As a coder I would love to have a library to play around with.)

Yeah I could use google, but then you wouldn't have a chance of making the lists of links and get modded +5.

relying on karma whores since '07

Re:open source speech recognition by perky · 2007-05-18 10:54 · Score: 1

HTK

--
"The new wave is not value-added; it's garbage-subtracted" - Esther Dyson, Dec 1994
Re:open source speech recognition by zuzulo · 2007-05-18 10:57 · Score: 3, Informative

The Sphinx project is the current 'gold standard' in open source speech recognition. It can be found at

Sphinx Project at CMU

I have used a variety of open source libraries in addition to 'rolling my own' and for general purposes Sphinx is certainly the most mature option.

--
"They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety."
Re:open source speech recognition by kmaclean · 2007-05-18 12:37 · Score: 1

Julius Speech Recognition Engine

Although Julius is only distributed with Japanese Acoustic Models, the VoxForge project is working on creating English Acoustic Models for use with Julius, and other Free and Open source Speech Recognition Engines (disclaimer: I am the VoxForge admin).

--
Donate Your Speech

The lesson of speech recognition software by macraig · 2007-05-18 09:48 · Score: 1

The message that people *should* be learning from the less-than-perfect transcription of speech recognition software, such as misunderstanding "I really admire your analysis" as "I really admire urinalysis", is that it's finally time for people to learn to SPEAK as well as write proper English, as opposed to speaking in ebonics or text-speak or some other hard-to-transcribe dialect. "Your" pronounced as "ur" is pretty damned difficult to interpret, without resorting to contextual analysis... which of course is the ONLY reason we humans can still understand each other at all. Does the story of the Tower of Babel ring a bell?

did you know by way2trivial · 2007-05-18 09:49 · Score: 1

a lot of dictionaries have NO pronunciation guide.. they just aren't english dictionaries

that is because, in many languages, a certain order of letters are always pronounced the same way.

Russian is one example..

--
every day http://en.wikipedia.org/wiki/Special:Random

There's no comparison by Trailer+Trash · 2007-05-18 09:49 · Score: 1

This is really apples & oranges. The typist with 93% accuracy will produce a document with some typos, and I can tell you from years of reading /. that typos are easily "corrected" by the reader if the typist doesn't catch them. Even at that, spell checkers catch quite a few of them, too.

That's very different from "your analysis" turning into "urinalysis". Here, the spelling is correct but the words are completely wrong, and trying to figure out what is really meant will take a much longer reading of it.

To answer the question, it's not ready.

--
Do you have ESP?

Its hard to wreck a nice beach by Intron · 2007-05-18 09:50 · Score: 1

About 5 years ago some manufacturer announced chips for under $5 that would do speaker-independent, limited vocabulary recognition and I predicted that there would be products appearing all over the place that would get rid of the crappy buttons and use speech as the interface. The only place I see it is in cell phones, and I always turn it off, because I don't want my cell phone surreptitiously calling someone while I am talking ABOUT them. Anyway, why hasn't the toy and gadget market latched onto speech input? It seems like those back massagers ought to be able to understand "Harder, ooh, harder, harder".

--
Intron: the portion of DNA which expresses nothing useful.

too anstwer you question. by geekoid · 2007-05-18 09:50 · Score: 3, Funny

Yeth.

--
The Kruger Dunning explains most post on /. http://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect

It is on my BlackBerry by BigCheese · 2007-05-18 09:55 · Score: 1

I use the speech recognition on my BlackBerry Perl^H^Hrl^H^Harl all the time and it's "good enough".

--
The obscure we see eventually. The completely obvious, it seems, takes longer. - Edward R. Murrow

One example... by NIN1385 · 2007-05-18 09:56 · Score: 0

This is a very good example of successful voice recognition:

Google 411

Very intelligent, but isn't everything Google does?

--

If carrots got you drunk, rabbits would be fucked up. - Comedian Mitch Hedberg R.I.P. 03/30/68-2/24/05

Until by geekoid · 2007-05-18 10:04 · Score: 1

some jackass tells non-tech people to use it to get tier 2 help.

Probably the same jackass that told people about the Internet.

--
The Kruger Dunning explains most post on /. http://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect

Holding mouse to mouth by fishthegeek · 2007-05-18 10:04 · Score: 1

and trying out my best Scottish accent... Computer..... Computer......

Has ANYONE gotten this to work on System 6 for the Mac yet?

--
load "$",8,1

who has actually used voice-recognition? by CaptainNerdCave · 2007-05-18 10:11 · Score: 0

i bought dragon naturally speaking way back when my 100mhz pentium was still in the running. once i went through the "training" (reading several paragraphs out loud for the software to figure out speech patterns), i found it to be very good at figuring out what i said. eventually, i stopped using it because of the difficulty created by the environment and forgetting to "begin/end dictation" properly, as well as several other problems that could be overcome with some practice. for the generic user, this is a great idea, especially if they need to "type" many pages of text in simpler language. the real problems start to show up when trying to use lots of complex words. suffice to say, this is not software that an english, philosophy, poli-sci, chemistry, etc major should hope to use without lots of frustration.

More goodies! by martyb · 2007-05-18 10:13 · Score: 1

Is Speech Recognition Finally 'Good Enough'?
Is spinachry ignition rivaly gooery stuff? What the hell are you talking about?

That's a great one! Here are a few of *MY* favorites:

Nature'll anguish; wreck ignition.
Our feet are stayin'.
A river and a ditch.
Mercy buckets.
Bone chewer.

The translations:

natural language recognition
Auf wiedersehen (German: "good-bye")
arrivederci (Italian: "good-bye")
merci beaucoup (French: "Thank you very much")
bonjour (French: "hello")

These are all I can remember at the moment. I'd love to have more to add to my "funny file", please reply with your favorites!

Re:More goodies! by plover · 2007-05-18 10:45 · Score: 1

Here's the worst-case nightmare task of any speech engine: the Anguish Languish
When I was a kid, my grandmother had clipped Ladle Rat Rotten Hut from the local newspaper, and my dad and I thought it was hilarious. Every so often, one of us would quote something like "water bag icer gut". A friend recently told me about the book (I didn't know it was a book), and google turned up the above page.

--
John

beautiful Celtic name by Simonetta · 2007-05-18 10:14 · Score: 1, Funny

People sometimes get upset when they can't give their new daughter the name that sounds like " She-thay-hahd", but is spelled Shithead.

Re:beautiful Celtic name by Anonymous Coward · 2007-05-18 11:24 · Score: 0

The above anecdote is poorly paraphrased from the book Freakonomics.
Re:beautiful Celtic name by Simonetta · 2007-05-19 03:32 · Score: 1

Thanks for the reference. I read that book but didn't remember the quote. I had heard it from another source. All this stuff floats around.
Re:beautiful Celtic name by Anonymous Coward · 2007-05-19 04:02 · Score: 0

It's way older than Freakonomics. My mother used to tell the same story when I was a kid (I think in her version, it was a classmate's name).

Gaming by Evil+Cretin · 2007-05-18 10:15 · Score: 1

I think I'll stick to keyboard and mouse for gaming...

I can imagine someone at a LAN party shouting:

"w w w w w a w a w a mouse1 d d s s mouse1 mouse1 s s s ctrl w w y f u c k i n g space c a m p e r enter"

--
"A deadlock has been reached. One task must die. We must now choose between murder and suicide."

sierra lima alpha sierra hotel delta oscar tango by tepples · 2007-05-18 10:15 · Score: 2, Interesting

Any speech recognition software worth the $ should be able to detect and translate NATO letter names: "hotel tango tango papá colon slash slash sierra leema alpha sierra hotel delta oscar tango dot org".
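
A rough sketch of that translation step in Python, assuming the recognizer has already turned the audio into a string of words. The `decode_spelling` helper and the small symbol and variant tables are invented for illustration; they only cover what the example above needs, plus the 26 letter names.

```python
# Map each spelling-alphabet word to its first letter.
_WORDS = ("alfa bravo charlie delta echo foxtrot golf hotel india juliett "
          "kilo lima mike november oscar papa quebec romeo sierra tango "
          "uniform victor whiskey xray yankee zulu").split()
NATO = {word: word[0] for word in _WORDS}

# Common respellings, including the as-pronounced forms used above.
VARIANTS = {"alpha": "alfa", "juliet": "juliett", "papá": "papa", "leema": "lima"}

# Just enough spoken symbols for a URL (an assumption, not a complete list).
SYMBOLS = {"colon": ":", "slash": "/", "dot": "."}

def decode_spelling(utterance):
    """Turn spoken letter names and symbols into text; pass other words through."""
    out = []
    for word in utterance.lower().split():
        word = VARIANTS.get(word, word)
        if word in NATO:
            out.append(NATO[word])
        elif word in SYMBOLS:
            out.append(SYMBOLS[word])
        else:
            out.append(word)  # ordinary word such as "org": keep as spoken
    return "".join(out)

print(decode_spelling(
    "hotel tango tango papá colon slash slash sierra leema alpha "
    "sierra hotel delta oscar tango dot org"
))
```

The hard part is still the recognizer hearing "leema" rather than "Lima, Peru"; the lookup itself is trivial.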

Speech Impediments by Troy · 2007-05-18 10:16 · Score: 1

The holy grail for me has been software that deals with speech impediments. I stutter. I'm fluent enough that I function fairly well in real life (I'm a high school teacher), but speech recognition software has universally failed to meet my needs.

All of the words have too many s's.

great prevention for repetitive stress injuries by brettbum · 2007-05-18 10:17 · Score: 2, Interesting

I'm using Dragon NaturallySpeaking 9 right now. I've been using it for several months, and I have written a dozen articles on it. I think it works fantastic, but you definitely have to learn how to write all over again. Out of the box it trains extremely quickly, if you do not want to train it at all you can just start talking and it will eventually catch up with you. (Note it caught catch up and not ketchup) I started using it as a preventative means of avoiding repetitive stress injuries. I cannot use it to code, however I can definitely use it for my writing. Using Dragon NaturallySpeaking, I can easily push out five to 15,000 words a day. (notice it used the word five and then a number) Ultimately it provides you very accurate writing. It's almost impossible to have a spelling error, however word substitution errors are still very common. If you attempt to compare your typing accuracy versus your dictation accuracy, you will often see spelling errors in the typing and word substitution errors in the dictation. That means that when you go back and edit your own work you have to spend a good deal more time editing because you're not used to editing the type of dictation errors that you make because you have years of experience editing the normal types of spelling errors that you made. You also have to learn how to compose sentences by speaking as opposed to composing to your fingertips. This definitely exercises a different area of your brain and I'm sure you will find that you are not as good of a writer when you speak as you all are when you type. However with practice you can get up to speed dictating and you will then definitely benefit from the ability to type at 150 words a minute without breaking a sweat, stressing out your wrists, or even suffering from eyestrain. 
Dragon NaturallySpeaking definitely helps people to avoid eyestrain because you don't have to stay focused on the computer monitor while you're typing you can look around the room, or outside or anywhere. Touch diapers (s/b touch typers!) can do this also however good ergonomics dictates that you sit in positions that align your body correctly to avoid repetitive stress injuries and this includes pointing your face for words (forwards!) towards the computer screen. With Dragon NaturallySpeaking I can face in any direction I like in the program will keep up. Downside it does substitute words and on occasion it skips words entirely. I run at least a gigabyte of RAM in my computer and I was would suggest double that amount. Dragon NaturallySpeaking is a bit of a resource hog, however it's worth it and it's not as bad as Firefox. I should have purchased it years ago and definitely do not regret the purchase nor my new attempts to learn how to write all over again. I had to learn to write with pencil and paper, and then with pen and paper and then with a manual typewriter and then with an electric typewriter and then with my trs 80 and then a laptop and my treo and yada yada yada I can sure learn to do it with my voice.

Re:great prevention for repetitive stress injuries by aynoknman · 2007-05-18 11:17 · Score: 1

I see it doesn't do so well at breaking things into paragraphs

--
We need a "+1 -- nice sig" moderation.
Re:great prevention for repetitive stress injuries by brettbum · 2007-05-18 11:38 · Score: 1

Nah, that's just my general obtuseness in dealing with Slashdot's interface. I resubmitted the article three times. It kept getting stuck. The third time, my clip board lost the html tagged version and I got lazy.

My apologies, but it would be nice if Slashdot came out of the stone age and updated their interface a bit. Why force people to write up those tags if it's not necessary?

In the meantime, I did think of something that makes working with DNS more difficult. . . .

It's tough to eat Cheetos and use DNS at the same time. Plus, when my dogs bark at home and I am composing an email, the dog bark triggers the send command and the email gets sent prematurely.

Re:Ted "Chug-a-lug" Kennedy by BlackSnake112 · 2007-05-18 10:17 · Score: 1, Offtopic

why am I reminded of this:

The grim reaper is standing next to the grave markers of JFK, RFK, and John Jr. A voice from above yells "I said TED! TED Kennedy!"

there goes any karma I ever had any chance of having

Wreck a nice beach by Chris+Burke · 2007-05-18 10:17 · Score: 1

The classic example used in my AI class to describe the problem of getting a computer AI to... recognize speech.

Though to me the problem with dictating text (the obvious use for speech recognition) is the need for some kind of escape for punctuation or program control. I mean, you can't just say "I went to the store period select all cut" because even assuming it recognizes all the words perfectly it wouldn't know if the "period" is supposed to be a word or punctuation, and either way it "assumes" you'd need a way to get the opposite behavior. Saying "escape" out loud constantly would be weird, especially if you need the word escape ("escape escape"), or maybe you also need to invoke the escape key, so you have "escape" and "escape escape" and "escape escape escape".
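
A rough sketch of the escape idea in Python, for illustration only. The escape word "literal" and the tiny punctuation and command vocabularies are made up here and do not come from any real dictation product. An escaped word is always treated as plain text, so saying the escape word twice yields the escape word itself.

```python
# Hypothetical vocabulary, invented for this sketch.
ESCAPE = "literal"
PUNCTUATION = {"period": ".", "comma": ","}
COMMANDS = {"cut", "paste", "undo"}

def interpret(spoken):
    """Turn a list of recognized words into (kind, value) events."""
    events = []
    words = iter(spoken)
    for word in words:
        if word == ESCAPE:
            # The escape word forces the next word to be plain text,
            # so "literal literal" produces the word "literal".
            following = next(words, None)
            if following is not None:
                events.append(("text", following))
        elif word in PUNCTUATION:
            events.append(("punct", PUNCTUATION[word]))
        elif word in COMMANDS:
            events.append(("command", word))
        else:
            events.append(("text", word))
    return events
```

With this scheme "the store period" ends the sentence, while "literal period" types the word. It does not remove the awkwardness of saying the escape word out loud; it only makes the ambiguity explicit.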

--

The enemies of Democracy are

Re:Wreck a nice beach by Montag2k · 2007-05-18 10:21 · Score: 3, Funny

Sounds like someone wants to use Vi with their speech recognition engine!
Re:Wreck a nice beach by sydneyfong · 2007-05-18 13:34 · Score: 1

In fact this might work really well. Instead of switching between insert/command mode, just open your mouth.
Now we won't have to press Esc all the time, or remap our caps lock keys....

--
Don't quote me on this.
Re:Wreck a nice beach by Fred+Ferrigno · 2007-05-18 19:12 · Score: 1

So in normal mode you're entering commands by pantomime or what? How do I do CTRL-V hj R /?
Re:Wreck a nice beach by Anonymous Coward · 2007-05-19 04:59 · Score: 0

You download Emacs. Then you jump out the window.

Speech recognition is okay by rolfwind · 2007-05-18 10:19 · Score: 1

I don't mind the errors, what I do mind is taking my time out to correct them.

While typing, if I make a typo or something - I either ignore the few wrong letters, correct them really fast (takes a second or two), or the spell checker does it for me. All in all, I am still concentrating on what I was doing.

I have tried Dragon Naturally Speaking ver 5, 7, and the latest one, 9sp1. It really has gotten better throughout the generations but when I dictate a document and something comes out bad - it's an entire word or phrase and I HAVE TO CORRECT that type of mistake. I can't ignore it - people can overlook spelling mistakes - they won't overlook silly phrases/words in between.

Then my concentration is knocked off the task as I am sitting there training the program. They could streamline this by seeing how you eventually corrected it, taking what you eventually typed in and comparing it to the program's first guess. Right now, they make you select the phrase, then pronounce the wrong guess and then the correct one. It's too time-consuming.

Speech recognition is good though, to give your wrists a rest. But I find that typing shorter reports that are to the point work just as well.
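
The streamlining suggested above (compare the program's first guess with what the user finally typed) can be sketched with a word-level diff. This is only an illustration using Python's standard difflib; the function name is made up, and nothing here reflects how Dragon actually trains.

```python
import difflib

def learned_corrections(first_guess, final_text):
    """Return (wrong_phrase, corrected_phrase) pairs found by diffing words."""
    guess = first_guess.split()
    final = final_text.split()
    matcher = difflib.SequenceMatcher(a=guess, b=final, autojunk=False)
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Only substitutions are collected; pure insertions and
        # deletions are ignored in this simplified sketch.
        if tag == "replace":
            pairs.append((" ".join(guess[i1:i2]), " ".join(final[j1:j2])))
    return pairs
```

Each pair could then be offered to the recognizer as a training example without making the user stop and re-pronounce anything.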

Re:Is it really faster, once you factor in checkin by dubbreak · 2007-05-18 10:19 · Score: 1

With speech recognition, you'll have to be looking at the screen to find mistypes, and then you'll have to do something to retype them, but it'll probably take a while.

Well if you correct items as you go, yeah you will lose all your speed increase. For a speed increase you would have to dictate your entire document and go back and fix it. This will cut down speed if you are using it for emails, but for technical documents it can be a godsend (you have to go back and proof/edit them anyhow, so just fix issues at that time). It really depends on your use. For bashing out emails or code it would be much too slow.

I'd love to use speech recognition software at work, but I don't think my coworkers really want to hear me dictate software requirements to my computer. Maybe I'll switch to an old IBM keyboard first, then they'll be happy when I shift to speaking.

--
"If you are going through hell, keep going." - Winston Churchill

Glottal stops by tepples · 2007-05-18 10:30 · Score: 1

As for stuff like "i really admire your analysis" being interpreted as "i really admire urinalysis," that stuff can easily be ironed out by an n-gram based system

Please tell me you're not talking about engrams in Dianetics.

that "ranks" English sentences based on probability.

That, or just have speakers adopt the German habit of pronouncing a glottal stop before words that start with a vowel. (A glottal stop is the sound in the middle of "uh-oh".) The test phrase would sound like this, and such a habit would help to disambiguate "your analysis" from "urinalysis" even for a medical transcriptionist, whose language model may have been overtrained with "urinalysis".
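
For anyone wondering what "ranking sentences by probability" with n-grams looks like, here is a toy bigram sketch in Python. The four-sentence corpus is made up, the smoothing is plain add-one, and scores are averaged per bigram so the shorter candidate is not favored just for having fewer words. Real recognizers use far larger models and combine them with acoustic scores.

```python
import math
from collections import Counter

# Tiny made-up training text; a real system would use a large corpus.
CORPUS = (
    "i really admire your analysis . "
    "i admire your work . "
    "the lab ordered a urinalysis . "
    "your analysis was thorough ."
).split()

UNIGRAMS = Counter(CORPUS)
BIGRAMS = Counter(zip(CORPUS, CORPUS[1:]))
VOCAB = len(UNIGRAMS)

def log_score(sentence):
    """Average add-one-smoothed bigram log-probability of a sentence."""
    words = sentence.lower().split()
    pairs = list(zip(words, words[1:]))
    if not pairs:
        return float("-inf")
    total = 0.0
    for prev, word in pairs:
        total += math.log((BIGRAMS[(prev, word)] + 1) / (UNIGRAMS[prev] + VOCAB))
    return total / len(pairs)

def pick(candidates):
    """Choose the candidate transcription the toy model finds most likely."""
    return max(candidates, key=log_score)
```

Because "admire your" and "your analysis" both occur in the toy corpus while "admire urinalysis" never does, `pick` prefers "i really admire your analysis". A model trained mostly on lab reports could just as easily lean the other way, which is the overtraining worry mentioned above.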

Re:Glottal stops by Anonymous Coward · 2007-05-18 11:10 · Score: 1, Informative

Please tell me you're not talking about engrams in Dianetics.

I don't think they are ... it sounds like they're talking about n-grams. http://en.wikipedia.org/wiki/N-gram.

"Cape Horn" by tholomyes · 2007-05-18 10:31 · Score: 1

It's a coffee company, but it sounds like he's peddling homoerotic publications. *shudder*

--
When did the future switch from being a promise to a threat? -C. Palahniuk

Yes... by msimm · 2007-05-18 10:33 · Score: 1

That's exactly what I mean. ...

By hands I meant either hand. Conversational English.

--
Quack, quack.

Re:Yes... by randyest · 2007-05-18 12:27 · Score: 1

What about "conversational English" makes "hands" equal "either hand?"

--
everything in moderation

Try explaining to a person what to do. by Wierdy1024 · 2007-05-18 10:33 · Score: 1

Using speech recognition to perform any task is like explaining to a new employee how to do something - it's much quicker to do it yourself, but you kind of hope the employee/computer will learn and do it without prompting next time - except the computer doesn't learn like that...

For example, consider doing a google search by speech:
start (wait while it processes and does what you ask)
firefox (wait while it processes and does what you ask)
click address bar (wait while it processes and does what you ask)
spell w w w dot g o o g l e dot c o m (wait while it processes and does what you ask)
enter (wait while it processes and does what you ask)
(wait while it loads page)
type hello (wait while it processes and does what you ask)
click google search (wait while it processes and does what you ask)

now compare that to the 4 mouse clicks and 21 keystrokes (inside 5 secs) that'd take to do it yourself.

Ad-free single-page article link by Anonymous Coward · 2007-05-18 10:35 · Score: 0

Here

Charlie Stross has a good essay on this by monopole · 2007-05-18 10:37 · Score: 1

He basically runs through most existing input devices including speech recognition and finds them wanting.
He saves particular venom for speech recognition, particularly due to the difficulty of finding homonym-based errors in the transcribed text:
http://www.antipope.org/charlie/blog-old/2005/08/27/#input-devices-1

clearly it's time... by Anonymous Coward · 2007-05-18 10:43 · Score: 0

2 moov 2 a fonetic langwedge

maybe not.

Actual useful application of speech recognition: by Anonymous Coward · 2007-05-18 10:51 · Score: 0

In the US: 1-800-GOOG-411

Try it out. It's actually okay!

What about command recognition? by irp · 2007-05-18 10:53 · Score: 1

I don't need speech recognition. But I really want command recognition - "open browser" - "close window" - "move window to left monitor" (I'm using a 3 monitor setup at work). Most of my mouse moves are to position the windows at the right monitors.

I found a program - "game commander" - that almost, but not completely can do this. It has very good recognition rate (it only has to recognize 10-20 simple commands). However, the commands can only be a sequence of keypresses, and that is not enough e.g. when moving the position of a window. Another annoyance is you have to "do" the keypress in the program, so I'm unable to enter "Alt+TAB" as a command...

Yes, it's awesome by Psychor · 2007-05-18 11:02 · Score: 0, Troll

Speech recognition is great, the Slashdot editors have been using it to enter stories for a long time, and it's clearly absolutely flawless.

Typing speeds by calctech · 2007-05-18 11:02 · Score: 1

Those who type at hunt-and-peck speeds will experience results that are even more dramatic. So is this faster than the biblical method of typing: seek and ye shall find?

--

Opportunity for Open Source by Bunyip+Redgum · 2007-05-18 11:11 · Score: 1

If there is only one effective supplier, this is an ideal opportunity for FLOSS to innovate.

The lack of good speech recognition applications on Linux and the BSDs is a major barrier to corporate and government organisations switching to FLOSS. Frequently the main users of speech recognition use it for occupational health and safety reasons. Failing to provide the software the OH&S specialist recommended may be enough to scuttle a migration project.

Re:Opportunity for Open Source by fyoder · 2007-05-19 06:52 · Score: 1

Free software speech recognition:

GPL is viral -> GPL is virtue

Microsoft is good -> Microsoft is goo

Richard Stallman is a big, fat wind bag -> Richard Stallman is a visionary who understands that nothing good can come of anything which is not built upon a foundational philosophy whose central tenet is a profound respect for freedom.

--
Loose lips lose spit.

Google 411 by got2liv4him · 2007-05-18 11:12 · Score: 1

I just used Google's 411 service ( http://labs.google.com/goog411/ ) and its speech recognition was great. I usually have a hard time with these systems because of my New Orleans dialect, but it understood every word, as opposed to those that can't get the number you're saying ("I'm sorry, I didn't understand you, could you please repeat the WHOLE thing again!").

--
King of kings and Lord of lords

IBM didn't sell ViaVoice! by bmetz · 2007-05-18 11:13 · Score: 1

IBM's deal with Nuance was that they have the exclusive distribution rights for IBM's ViaVoice product. IBM still owns all the intellectual property and actively is developing its Speech Recognition technology, both as research projects like MASTOR as well as in product form as part of WebSphere Voice Server, which is an MRCP compliant Speech Recognizer that can be plugged into basically any standards-compliant thing that wants to utilize speech.

--
What did you eat today? http://www.atetoday.com/

Pocy & Caste by omgamibig · 2007-05-18 11:14 · Score: 1

Still owned him.

Medical transcription by grogo · 2007-05-18 11:17 · Score: 2, Interesting

Where I work, we use PowerScribe, a Dragon-based medical transcription service. The following was dictated using it:

"I am using PowerScribe, which is a radiology speech dictation system. It is fairly accurate in the doming [domain] of medical transcription, and particularly in the doming of radiology, but it not very useful for free pexed [text] speech.

For example, there [here] is a sample of the typical chest report: Hazy groundglass opacities noted with both lungs, particularly the right middle lobe as well as the left lower lobe, with no evidence of effusion, pneumothorax, or consolidation. [this is pretty much verbatim what I said].

[But here's a free text example:] However, if a Type II right a regular letter to a friend, [if I try to type a regular...] for example setting the following, [for example, saying the following...] Yesterday was a very nice state [day]. The clots [clouds are] gone, and only a little brain [rain] remains. Today it is supposed to be even warmer outside, I think elbow [I'll go] injected [and check] with the right knob. [the weather right now]"

The biggest problem with this system, particularly for medical transcription purposes, is that it only gets about 95-97% right. That means, it's wrong at least 3% of the time. Worse yet, whenever it's not sure, it just inserts random garbage! Whatever the closest match is, which is often wrong, and sometimes fundamentally changes the meaning of what I intended.

Human transcriptionists, on the other hand, will insert a blank if they're not sure, to alert the dictating physician. This fscking system has no clue when it's wrong, which makes it very dangerous in my opinion!
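
What the human transcriptionist does could be approximated like this, assuming the recognizer exposes a per-word confidence score (an assumption; whether PowerScribe does is not stated here). The 0.80 threshold and the blank marker are arbitrary choices for this sketch.

```python
BLANK = "_____"

def render(words_with_confidence, threshold=0.80):
    """Join recognized words, replacing low-confidence ones with a blank."""
    out = []
    for word, confidence in words_with_confidence:
        # Flag uncertainty instead of silently emitting the closest match.
        out.append(word if confidence >= threshold else BLANK)
    return " ".join(out)
```

A report with a visible blank gets the physician's attention; a confident-looking wrong word does not.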

Re:Medical transcription by fishbowl · 2007-05-18 11:52 · Score: 1

>Human transcriptionists, on the other hand, will insert a blank if they're not sure, to alert the dictating physician. This fscking system has no
>clue when it's wrong, which makes it very dangerous in my opinion!

Indeed. I'm also not convinced that "speech to text" is the right thing to do anyway. I think we'd be better off re-thinking the whole interface, and using "speech-to-audio" idioms.

Working with visually impaired people for the first time, I was surprised to see how efficient the screen readers actually were. I was picturing something that read at a rate on the order of what I read. The surprising thing was to realize the users were reading *many* times faster than a visual reader would do. The most proficient people using readers can absorb information that sounds to the untrained ear, like a tape on fast forward. It really is amazing.

Anybody who is picturing the screen readers as being like "Mail.... From.... user... at... ess... dee... ess... see... e...dee.... you... Subject..." might be in for a shock. I know I was.

--
-fb Everything not expressly forbidden is now mandatory.

I've waited years for a reason to post this... by cruiser5 · 2007-05-18 11:18 · Score: 1

Eye yam tie ping too ewe whiff ay news peach wreck ignition soft wear. Thistle bee duh firs sting eye half writ tin witted. A cording two these oft ware pack edge, disallows won two bee mower E fish sent win rioting and isle know lawn gear kneed a key bored. Sum pee pull say its snot vary ewes full, butt eye knead too due sum spearmint ting sew icon seafloor my shelf. Oops knot shelf. Delete lass toward. Know delete whirred self! Know dam mitt! Delete sin tense. Crop! Eye shooed B A bull 2 E raise win eye May comma stake U pizza sheet! Wearer thee inns truck shuns! I yam gun too err race these E male, putt these peace a ship inn a Bach sand sinned too duh come penny foray re funned. Naught sinned, send! ::RECEIVED SEND COMMAND:: MESSAGE SENT

Its much better these days by arse+maker · 2007-05-18 11:20 · Score: 1

Speech recognition is a lot better than it once was. It will require a lot of improvement to make it better... if you look at a human; even we have a lot of trouble understanding another accent. That pretty much means a paltry computer won't get it. However, a computer should be able to be more thorough, so same accent speech should be possible. However... it takes a person 5-10-20 years to master speech, so a computer needs a lot of training; the method of training a computer is probably key. Not only listening to speech, but more advanced concepts like recognizing who is speaking so as to apply the patterns to that person. Also learning who is talking to the computer: telling my computer to turn on the TV might be easily confused with asking if my chocolate wiener is turning my paid gf on, for some time till that is worked out. Personally I think prefixing everything with "COMPUTER, turn on that" is not good enough; it should be possible to work out context within the next 10 years with so many cores available.

% accuracy isn't necessarily meaningful by CTho9305 · 2007-05-18 11:22 · Score: 1

At 95% accuracy, people aren't jumping on the bandwagon. Wood's typing speed is about 60 wpm with 93% accuracy, so he found that using speech recognition was about twice as fast as typing.

That isn't really meaningful. When I make a typographical error, it's usually a single letter oission/dupplication/trasnposition, which 1) is easy to address with a spell-checker (or even auto-correct for teh more common typos) and 2) is unlikely to interfere with understanding of the result. When speech recognition software makes an error, it tends to replace one or more words with one or more different words. If you correct those errors as you go, you're going to be starting and stopping, and if you go back to fix them later, you can't just catch them with a spell checker.

A meaningful comparison would require looking at teh typeso f errors made whiel typing, and the types of let's set so double the killer delete select all.

--
My server

Maybe the question should be...Am I the only one? by Anonymous Coward · 2007-05-18 11:23 · Score: 0

"Instead of asking if speech recognition is "good enough", maybe we should be asking whether or not it's actually useful for anything in the first place. I mean, is it good enough... to do what?"

Uh, huh. So slashdot, what's so insightful about saying you don't even consider disabled people during your discussions?

Re:sierra lima alpha sierra hotel delta oscar tang by Dun+Malg · 2007-05-18 11:34 · Score: 1

NATO letter names: ...papá ..." Perhaps it was just a typo, but the above is in error. The NATO phonetic alphabet always puts the stress on the first ejective consonant. In the radio alphabet, "papa" is definitely not pronounced "pah-PAH". See, it's all English, and designed to be shouted into a radio by a panicked infantryman wishing to call in an artillery or air strike.

--
If a job's not worth doing, it's not worth doing right.

'Uranus' vs 'your anus' by Anonymous Coward · 2007-05-18 11:34 · Score: 0

Can Speech recognition do this one?

"The astronomer looks at Uranus. The doctor looks at your anus."

Re:'Uranus' vs 'your anus' by AMSRay · 2007-05-18 12:25 · Score: 1

True story: At a medical transcription convention a doctor was trying a demo of the IBM speech recognition product. Physician: The patient has acute angina. Speech recognition: The patient has a cute vagina.
Re:'Uranus' vs 'your anus' by glenstar · 2007-05-18 14:18 · Score: 1

This is also a true story... I was sitting here innocently eating Dick's and trust me if I HAD a vagina my vanilla shake would have spurted out of it after reading the above comment. Instead the white, frothy liquid came cascading out of my nose.

Yeah but... by taradfong · 2007-05-18 11:40 · Score: 1

Is talking really less fatiguing than typing? And if everyone had to speak every word they entered into their computer the modern cubicle workplace would be chaos.

Clearly voice recognition is extremely desirable in mobile situations such as with mobile phone commands and calling into phone centers.

--
Does it hurt to hear them lying? Was this the only world you had?

Try it with an accent... by spywhere · 2007-05-18 11:52 · Score: 2, Informative

We were testing an edition of Dragon Naturally Speaking back in 2000, when an Asian-American woman on our team took the microphone. She had a heavy accent, and the software interpreted her words as... nothing.

She stood there, trying to get it to write something, and finally ended up repeating, "It not woking! Why it is not woking?"

We were afraid to laugh, fearing a trip to HR... we all stood there, biting the insides of our cheeks, until she gave up and left the room; then, we collapsed on the floor, literally ROTFL.

Accuracy? by johncadengo · 2007-05-18 12:06 · Score: 1

At 95% accuracy, people aren't jumping on the bandwagon. Wood's typing speed is about 60 wpm with 93% accuracy, so he found that using speech recognition was about twice as fast as typing.

The difference in stated accuracies of typing versus speech recognition is the fact that when you're typing, you can easily and readily catch your mistakes. If I type the wrong letter, more often than not, even before I finish pushing down on the wrong key I will realize it and reach for the backspace. However, in speech recognition, you must pay very close attention to how the software renders your speech on the screen and subsequently find a way to fix it. This can be done in two ways, either: by stopping speaking and manually editing it with your keyboard, or by remembering that it was mistaken and going back at the end to correct it (because no one wants to be stopped or slowed down in their train of thought, which is half the reason I choose typing over penmanship). Either way, you have the problem of a two-pass proofread in comparison (all our CS majors know) to the on-the-fly proofread.

In the end, however, if a document is really important you will proofread it multiple times. So it's all up to preference. Personally, I prefer typing, regardless of accuracy, simply because I can catch my own mistakes on the run and correct them immediately. I am an attention-to-detail kind of person in the first place.

--
My page.

Exactly: Different types of errors by obtuse · 2007-05-18 12:13 · Score: 1

Typos look wron, but speech recognition errors are more supple.

Oh, and 1 in 20 words being wrong is astonishingly bad if you've ever actually tried to use SR.

This is an AI problem of exactly the sort that humans excel at, and we still screw it up regularly. It's hilarious that people seem to have so little respect for the difficulty of the problem.
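
To put a number on "1 in 20", here is a back-of-the-envelope sketch in Python. It assumes each word is recognized independently with the same accuracy, which real errors do not obey (they tend to cluster), so treat the results as rough.

```python
def expected_errors(word_count, accuracy):
    """Expected number of wrong words at a given per-word accuracy."""
    return word_count * (1.0 - accuracy)

def chance_error_free(word_count, accuracy):
    """Probability that every word is right, assuming independent errors."""
    return accuracy ** word_count
```

At 95% accuracy a 500-word article carries about 25 wrong words, and even a 20-word sentence comes out clean only a bit more than a third of the time.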

--
Assembly is the reverse of disassembly.

Re:Exactly: Different types of errors by geminidomino · 2007-05-18 16:42 · Score: 1

Typos look wron, but speech recognition errors are more supple.

This is an AI problem of exactly the sort that humans excel at, and we still screw it up regularly. It's hilarious that people seem to have so little respect for the difficulty of the problem. Not ALL humans, apparently... ;)

No... by MLS100 · 2007-05-18 12:20 · Score: 1, Informative

>>Is speech recognition 'good enough'?
No.
>>I'm sorry I did not understand your selection. Is speech recognition 'good enough'?
NO.
>>I'm sorry I did not understand your selection. Is speech recognition 'good enough'?
NO!
>>I'm sorry I did not understand your selection. Is speech recognition 'good enough'?
HOW ABOUT DIE!
>>Answer "yes" entered, is this correct?
No.
>>I'm sorry I did not understand your selection. Is this correct?
AJKFLSJFKSLFJSDKFDJSKSFDJK
>>Thank you for participating in our survey, goodbye.

Re:No... by KillerBob · 2007-05-18 16:23 · Score: 1

Oh, I dunno... my cell phone's built-in speech recognition for its commands, voice dial, voice memo, and recognizing names in the phone book works pretty well. Especially since I haven't actually trained it to my voice...

And as somebody who's called Bell Canada and dealt with Amy, the voice-recognizing computer they have answering the phone, I can tell you that it's even got a touch of AI built in, that works perfectly. Why... when Amy asks me to say the name of the department I'd like to speak to and I reply "I think you're a piece of shit Amy and want to talk to a goddamned human for fuck's sake." it transfers me right through to a human.

--
If you believe everything you read, you'd better not read. - Japanese proverb

Zeno's Translator by Carcass666 · 2007-05-18 12:28 · Score: 3, Informative

Speech recognition has been at a standstill for years now, it's been "almost there now" for well over five years. As mentioned in other posts, there has been a lot of consolidation and that has really hurt growth. Lernout & Hauspie and Dragon were constantly going back and forth a few years ago trying to get a leg up on each other. When L&H got into all of their accounting problems and shut down, that left Dragon and IBM. IBM's product went to Scansoft and went to Nuance where it languishes until somebody pulls the plug (for example, if you call for support on ViaVoice and mention you have XP SP2, they will tell you it is not a supported platform).

Most of the improvement in Dragon and ViaVoice over the last couple of years has been in the reduction of training required to get to the high-nineties level of accuracy (assuming noise-cancelling mic in a quiet room and you do not have a cold/sore-throat). The advancements in training have not corresponded to much in the way of translation accuracy. A "trained" Dragon 7 recognizes speech pretty much as well as Dragon 9 (I haven't played with Dragon 10 yet).

Most of the real speech recognition advancement these days is focused on discrete word sets for voice mail trees and other interactive systems. When you are on the phone giving your credit card number, two/to/too is all the same thing. While speech recognition in its current incarnation is good for people who can't type (disabilities, carpal-tunnel, etc.) it is not a replacement for typing, and isn't any closer today than it was five years ago.

Re:Zeno's Translator by tbuskey · 2007-05-21 01:23 · Score: 1

Lernout & Hauspie and Dragon were constantly going back and forth a few years ago trying to get a leg up on each other. When L&H got into all of their accounting problems and shut down, that left Dragon and IBM. IBM's product went to Scansoft and went to Nuance where it languishes until somebody pulls the plug
L&H acquired Dragon and Dictaphone a few months before the accounting fraud came up. Dictaphone spun out again. Dragon and the L&H speech stuff were acquired by ScanSoft. The founders of Dragon were left with worthless stock & tried to get Dragon back, which didn't happen. ViaVoice was acquired some time later.

Patents by Anonymous Coward · 2007-05-18 12:30 · Score: 1, Interesting

This is actually a really good case on patents, and how one company can purchase a couple of patents and block out commercial market competition. Dragon/Nuance/.... has such a wonderful monopoly, and has noticeably scaled back their release cycle since they achieved that monopoly (dragon 7 - dragon 8). Somehow I lack the faith to presume they've been working hard and are about to release anything world-changing.

Speech Recognition is more than dictation by Pedrito · 2007-05-18 12:46 · Score: 2, Interesting

Speech recognition generally comes in two flavors: Command and Dictation. Most voice recognition engines can handle either, but the implementations are very different. Command mode is handled by providing a list of "command" words that are valid at any given point and operates much like a state machine. Dictation is a completely different beast and does a variety of things under the hood to increase accuracy.
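
The "list of valid commands at any given point" idea can be sketched as a small state machine. The states and phrases below are invented for illustration and do not come from any particular engine.

```python
# Hypothetical states and commands, for illustration only.
GRAMMAR = {
    "idle":    {"open browser": "browser", "open mail": "mail"},
    "browser": {"go back": "browser", "close window": "idle"},
    "mail":    {"send": "idle", "close window": "idle"},
}

def step(state, heard):
    """Return (new_state, accepted) for one recognized phrase."""
    transitions = GRAMMAR[state]
    if heard in transitions:
        return transitions[heard], True
    # Phrases outside the active list are rejected rather than guessed at.
    return state, False
```

Because only a handful of phrases are live in each state, the recognizer has far fewer candidates to confuse than in open dictation, which fits the observation that command recognition became usable well before dictation did.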

"Good enough" is very vague as applied to voice recognition. For command stuff, "good enough" has been here for about 7+ years. Even MS's free engine does a great job at that.

I used Via Voice years ago and it worked pretty well. But here's the thing: Have you ever tried to dictate something? It's definitely a skill. I'm sure some people have a natural ability for it, but I certainly didn't. I tried dictating stuff and it's tough. You hit a pause mid-sentence trying to figure out how you want to phrase something and suddenly there's a period and you're beginning a new sentence. Try dictating several sentences of original material and keeping it going without pauses and "um"s and so forth and you'll see, it's not quite as easy as it seems. I suspect one of the reasons voice recognition hasn't been a hit is that people don't expect that. They try it for a few days, think, "Hell, it's easier just to type," and give up. That's why I don't use it for writing. I can type faster and more accurately than I can dictate. I'm sure if it's something I wanted to work on, I could develop the skill, but my point is, I think that's probably why a lot of people give up on it.

I honestly think that voice recognition in command mode could be really useful at speeding things up, if software were designed to take advantage of it. But it's not easy to add it as an afterthought and it adds significant work, even if it's done with forethought. It's a chicken and the egg thing. If a lot of software supported it, I think people would see a gain in productivity using whatever software they use daily. I don't mean just using voice recognition, but in combination with a mouse and keyboard. For example: "Execute Browser. google dot com. flying burrito brothers. google search". Saying that would be a pretty fast way of opening your web browser, typing "google.com" and then typing "flying burrito brothers" and then clicking the "Google Search" button. Replace "Google Search" with hitting the enter key and even faster.

But as I said, it's a chicken and the egg thing. Software doesn't support it because there's no demand and there's no demand because people haven't really experienced software that supports it.

Another issue (and I'm sure this has been mentioned by others), is background noise. I like to listen to music or watch TV while I work. Those don't mix well with voice recognition, at least not at the volumes I listen to them. Until voice recognition can get around that and recognize my voice amidst background noise and do it accurately AND software out there generally supports it, it's not going to go mainstream.

Re:Speech Recognition is more than dictation by winwar · 2007-05-19 00:23 · Score: 1

"Another issue (and I'm sure this has been mentioned by others), is background noise."

I currently use voice recognition in a warehouse, and I would love to have only a 5% error rate. Considering the limited commands used, the template set up time, etc, it sucks a great deal. As you noted, once you introduce background noise, it really sucks. But it also fails in quiet environments. It regularly fails to pick up commands, confuses them (two and zero, eight and vehicle horns), and picks up other users.

Did I mention it sucks?

But it is slightly faster, so we are stuck with it. Just waiting for someone to "accidently" run over the voice unit with a lift....)
Re:Speech Recognition is more than dictation by PPH · 2007-05-19 05:38 · Score: 1

But it is slightly faster, so we are stuck with it. Just waiting for someone to "accidently" run over the voice unit with a lift....)

"I kept yelling 'Get out of the way!!!' but it just sat there."

--
Have gnu, will travel.

Maybe... by theshowmecanuck · 2007-05-18 12:48 · Score: 1

It depends on what you mean.

--
-- I ignore anonymous replies to my comments and postings.

Re:Maybe... by RespekMyAthorati · 2007-05-19 09:44 · Score: 1

Ah yes! The one and only good thing I remember about UBC!

Elevators... by HobophobE · 2007-05-18 12:56 · Score: 2, Funny

I'm still waiting for speech recognition to come to our elevators so I don't have to touch the dirty buttons.

Also so I can pretend I'm on the Enterprise.

--

-HobophobE
Nothing laughs forever.

When the software's history involves jail terms... by Futurepower(R) · 2007-05-18 13:02 · Score: 1

Interesting sociological phenomenon: When the first 30 or 40 comments in a Slashdot story are jokes or attempts at jokes, that is an indication that Slashdot readers don't believe something in the story. It's surprising to me that people don't just write:

FRAUD ALERT -- FRAUD ALERT -- FRAUD ALERT

This is what is apparently happening, in my opinion. First, speech recognition has gotten an extremely bad reputation for being worthless garbage, because it is worthless garbage.

A 0.5 percent recognition failure rate is enough to make speech recognition software worse than worthless. The reason is that speech recognition software never makes a spelling mistake. Instead, the mistakes are often extremely difficult to recognize, and sometimes change the meaning in subtle ways. That's partly because, when the software is confused, it tries to select something that is grammatically plausible.

The result is that it has become difficult to sell speech recognition software. A high enough percentage of people in the U.S. culture know that it isn't actually useful. The original owners of Dragon NaturallySpeaking sold the product to a company that sold it to the company that became Nuance, maybe because they felt the product was damaging the credibility of their trademarks.

Here is a quote from the story linked in the Slashdot story:

"In 1993 two executives from Kurzweill Applied Intelligence (which pioneered SR for the medical market) went to prison for faking sales. That firm was sold in 1997 to a Belgium SR firm, Lernout and Hauspie (L&H), which was reporting phenomenal sales growth at the time. Dragon Systems, which originated DNS that year, was reporting only anemic growth, and L&H had no trouble acquiring Dragon Systems in early 2000 in a stock deal. Within a year a series of accounting frauds came to light and L&H collapsed into bankruptcy. Its SR technology was sold in late 2001 to ScanSoft Inc., which kept the DNS line going. (It was then at Version 6.0.) ScanSoft later acquired Nuance and adopted its name.

"Thereafter, "It was with the launch of Version 8.0 (in November 2004) that the market became reinvigorated and took off," said Chris Strammiello, director of product management at Nuance. "We crossed an invisible line with Version 8.0, where the software actually delivered on its promises and offered real utility for the users. Sales have been growing at a rate of 30% yearly since then, except that we expect it to do better than 30% this year."

Read that again: "... the software actually delivered on its promises and offered real utility..." I called Nuance and was told that version 8 did not have a new recognition engine, but only had improvements in the user interface. A friend who owns and tested version 8 told me he could see no difference in accuracy between that and version 7.

So, in my opinion, Nuance has done the common deceitful things that are called "Marketing":

1) Bring out new versions. Previously, whenever there was a "new version" of Dragon NaturallySpeaking, I called Nuance technical support and asked if there was a new recognition engine. I didn't call for version 9, but for the last two versions they said no. So, nothing has changed; the software is still useless, in spite of the fact that they always advertise that the software is now more accurate.

How is it possible that the software is more accurate, if the recognition engine did not change? Maybe it isn't true. Or maybe the company improved the guesses the software makes when the software really has no clue what the user said. Those guesses have become so sophisticated that you can become confused about what you actually said, and you have to spend time re-creating your ideas. If you are saying simple things about a simple subject, this is not as much of a problem as when you are writing about contract negotiations, for example.

In the words of a Slashdot reader: "The opinions expressed h

charlie india tango alpha tango india oscar nov? by tepples · 2007-05-18 13:10 · Score: 1

The NATO phonetic alphabet always puts the stress on the first ejective consonant. Citation needed so that I can update wiki. I don't hear Q pronounced "KEB ick" either.

Until we get hard ai along with it no. by otomo_1001 · 2007-05-18 13:16 · Score: 3, Interesting

I mean really, until I can say to my computer things like:

Find all mp3's that were created by Trent Reznor and pipe them to /dev/audio on the neighbor's computer. What use will it be?

I can't program in it can I?

if(i_can_write_code_I_mean_speak_code_to_the_computer() == true) then
i_might_use_it_a_bit();
else
system("find /music -type f -name \"*trent*reznor*\" | xargs -t cat - | ssh hackeduser@neighborcomputer \"cat - > /dev/audio\"");
endif

But that is just me.

Re:Until we get hard ai along with it no. by Anonymous Coward · 2007-05-18 16:41 · Score: 0

perl coding works fine with it http://youtube.com/watch?v=KyLqUf4cdwc

Amazing: Admission of fraud from a Nuance manager. by Futurepower(R) · 2007-05-18 13:27 · Score: 1

Read that statement again from Chris Strammiello, director of product management at Nuance:

"We crossed an invisible line with Version 8.0, where the software actually delivered on its promises and offered real utility for the users."

Nuance owned the software when it was sold as version 7, and before. So, Mr. Strammiello is apparently saying that his company was knowingly involved in fraud, but, don't worry, now the company is honest.

Wow. Sometimes people lie so much that they have no idea how what they say sounds to others.

Basically, the article linked by Slashdot says that the first and second owners of the software were convicted of crimes, and seems to say that the present owner is also guilty of fraud.

At least, that's how it sounds to me.

Marketing is meant to be methods whereby a company makes healthy connections. However, most marketing people seem to think that marketing means lying. And, when they sink the company, they just get a job somewhere else.

+5 insightful? Are you mods kidding? by doug141 · 2007-05-18 14:00 · Score: 1

Even the most successful technologies have some problems associated with them. Nothing parent mentioned is even remotely close to a show stopper.

Type Faster? by cbeley · 2007-05-18 14:41 · Score: 2, Interesting

Do we really need this? All this is good for is the people who can't type 100wpm with reasonable accuracy. I don't think I would be able to speak (at a normal speed) any faster than I could type. Plus, I only think so fast. So... everyone should learn to type at 100wpm and the problem is solved. Also, who wants to hear a bunch of chattering at the library from people "typing" on the computers, versus the very loud, obnoxious 100wpm typing sounds that make the people typing at 40wpm drop their jaws?

Re:Type Faster? by julesh · 2007-05-18 22:41 · Score: 1

All this is good for is the people who can't type 100wpm with reasonable accuracy. I don't think I would be able to speak (at a normal speed) any faster than I could type. Plus, I only think so fast. So... everyone should learn to type at 100wpm and the problem is solved.

As I understand it, typical speaking rate is about 150WPM. With 95% accuracy, you'll have to make about 8 corrections per minute, bringing it down to about 130WPM. You'll also probably find you can think faster when you're speaking your thoughts rather than typing them, as it's a more natural action to us. Unless you spend a *lot* of time thinking & typing. Most novelists I've heard talking about speech recognition find it more convenient, for instance, and they type a lot more than most people do.
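
The arithmetic above can be sketched in a few lines of Python. The comment does not say how long one correction takes; the 1.2 seconds used here is an assumption, picked only because it roughly reproduces the 130WPM figure, and the function name is made up for illustration.

```python
def effective_wpm(raw_wpm, accuracy, seconds_per_correction):
    """Estimate dictation speed once misrecognized words are fixed.

    seconds_per_correction is an assumed cost, not a measured one.
    Returns (effective words per minute, corrections per minute of speech).
    """
    errors_per_minute = raw_wpm * (1.0 - accuracy)
    # One minute of speaking plus the time spent fixing that minute's errors.
    total_seconds = 60.0 + errors_per_minute * seconds_per_correction
    return raw_wpm * 60.0 / total_seconds, errors_per_minute


rate, errors = effective_wpm(150, 0.95, 1.2)
print(f"{errors:.1f} corrections/min, about {rate:.0f} WPM effective")
```

The estimate is very sensitive to the assumed correction time: at 5 seconds per fix the same formula gives roughly 92WPM instead.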

For what? by Dun+Malg · 2007-05-18 14:52 · Score: 1

Is Speech Recognition Finally 'Good Enough'? Good enough for what? Good enough for the same jackasses who constantly assault us with half of their cell phone conversation to now share their latest memo to the sales department as well?

--
If a job's not worth doing, it's not worth doing right.

Get a human by symbolset · 2007-05-18 15:34 · Score: 1

If you can't call the number from the "Get Human" database at get human, AT&T usually gets you a human if you press 0 repeatedly, ignoring prompts.

Sometimes silence works. Try not saying or pressing anything.

Has anybody tried using a dialup botnet to DOS an 800-helpline yet? That would be an interesting comment on the lousy phone help we get from offshore.

If we wrote how we talked voice recognition might be helpful. We don't. Written communication is different and should be.

--
Help stamp out iliturcy.

Can't compare % accuracies by swordgeek · 2007-05-18 15:43 · Score: 1

Others here have commented on the nature of mistakes in a person's typing vs. errors from speech recognition. I'd just like to point out that 95% is a (current) technical limit and nearly constant, regardless of the speed of the program, whereas personal typing accuracy can be improved by practice and slowing down.

--

"People who do stupid things with hazardous materials often die." -- Jim Davidson on alt.folklore.urban

You're joking but... by thepotoo · 2007-05-18 16:26 · Score: 3, Informative

You hit the nail on the head with that one. My sister uses Dragon NaturallySpeaking exclusively (she's dyslexic and can't type or read worth crap, so she has to use Dragon NaturallySpeaking and Kurzweil, a screen reader).

Dragon requires MONTHS of training (literally), and even then it will make mistakes exactly like the one you noted. The plus side is that Dragon works pretty decently under WINE, but apart from their Linux "support", it's a complete mess.

Screen readers aren't much better; they have the accuracy, but are hard to understand.
For a little geeky fun, I had Kurzweil read a few English papers to Dragon. Even after some training, Dragon still couldn't get above 80% accuracy on a computer generated, 100% reliable, voice. Now that's just sad.

--
Obligatory Soundbite Catchphrase

Speaking and writing are not by KidGeezer · 2007-05-18 16:55 · Score: 1

necessarily the same thing nor do they represent the same cognitive process. I can see where good speech to text software would be a useful tool but hardly a replacement or even mainstay. What's with this techno-yearning for a return to pre-literate oral culture? Hell, just dispense with the to-text part and post/publish/purvey the sound file. I find enough difference between handwriting words and typing them as far as my cognitive processes go. Yes, sometimes I speak something very eloquently, clearly and yet complexly as the situation demands and it would be nice to have that captured instantly. But the famously witty Algonquin Roundtable conversations transcribed to print as banal and stilted. The several posters who have pointed out the difference between typos and transcription errors (the former are much easier to rectify on the fly) are right on the mark. The Henry Higgins posters are really engaged in wishful thinking.

"Japanese" English - Wasei Eigo by cadu · 2007-05-18 17:41 · Score: 1

(i'm a Brazilian living in Japan)

Well, that reminds me of a day i was with my friend, and he was all smiles showing all the bells and whistles of his new car, a sound system with dvd/tv/radio/mp3 with hard drive included, everything.... and then he said : "you that speak english, try the voice recognition stuff, amazing"

I popped in a DVD and after some moments said "next chapter" (paying close attention to the pronunciation), the thing answered me "Boryumu Appu" (Volume UP), really ...

Said it again with a higher tone and it warned me it was going into "chivii moodu" (tv mode)....

At last i went back to dvd mode and said "nekisuto chaputaa" ... can you tell me what it did?

Turned Off.

ps: one could argue that poster's pronunciation is really **cked up, your mileage may vary (but i think it was okay :)

I doubt you know better than Nuance by Anonymous Coward · 2007-05-18 17:58 · Score: 0

I work in Nuance Support. I would really like to know which financial customer you are with and which product you are using. Perhaps, we, Nuance Support, might want to keep in mind what you think of us and what you think of yourself with regards to knowledge of speech recognition, when we deal with support incidents from your company.

We in Support, from time to time, come across folks like you who think they just know better and have a too shallow mind to be aware of how things around them actually work. Here is a good opportunity for me to explain to people like you how it works since I wouldn't want to do that in a normal professional engagement for obvious reasons.

It is entirely conceivable that you know *a particular aspect* of a product better than a given person working in Nuance Support. And that shouldn't come as a surprise. You are probably a dude working on a speech reco project spending your entire working day, five days a week, on a specific area of a speech application and focusing on a particular problem. Whereas a tech support engineer needs to support literally dozens of products (not taking into account the different versions). We can't possibly know every single aspect of every product like the back of our hand. On every single day, we have to juggle a handful of different incidents and satisfy as many individual customers. Often, we can't even afford to spend more than half a day on the troubleshooting of an issue. I would really like to see you in our shoes. We don't have the luxury of having entire days and weeks for testing the ins and outs of a product. And this should not be the purpose of our function either. Our job is to facilitate the resolution of a problem; we try to understand the problem and troubleshoot it; we are not a walking encyclopedia of speech reco or of related technologies (MRCP, VoiceXML, SSML, NLSML and all such standards; though we need to have a working knowledge of those), nor are we the human form of the product documentation. That's why we have engineers, developers and scientists helping us behind the scenes if digging really deep is needed.

Speaking of which, I would be curious to see how your developer's knowledge of speech reco stacks up against Nuance software engineers and R&D scientists. They are the ones whose knowledge of the specifics and the cutting-edge is representative of Nuance's expertise in speech reco. They are the ones you should look at when judging how much innovation Nuance could deliver.

How to get out of AT&T's crappy menu by Fred+Ferrigno · 2007-05-18 19:32 · Score: 1

At any time, at any point in the menu, say "Operator" or hit 0. You will hear:

"Operator"
"I'm sorry, I didn't understand that."
"Operator"
"It sounds like you want to talk to an operator, is that correct?"
"Operator"
"Connecting..."

That I know that exact script should be some indication how much time I've spent on that line over the past few years...
Even if you do go through the menu and enter in your info, it'll eventually send you to an operator, who will ask you to repeat it all anyway.

Try this by Anonymous Coward · 2007-05-18 20:01 · Score: 0

start command
format c colon slash autotest

speech recognition finally good enough? by Anonymous Coward · 2007-05-18 20:39 · Score: 0

know

good enough... for subbing? by Anonymous Coward · 2007-05-18 22:26 · Score: 0

It would be so great if you could run a movie file through speech recognition and get subtitles. Troops of sub scene monkeys are still spending hours transcribing and synchronizing. It's so medieval. And English subtitles are usually the last ones to be released, if ever.

Sometimes I get the impression that "good enough" speech recognition exists, it's just not publicly available. Of course the military are using it (Echelon), but also companies: Some captions of Google keynote videos have very strange errors that don't look human (misheard words that are totally out of context). In other words, it looks like a computer did the captions and no human corrected them, and the overall error rate is impressively small.

I wouldn't trust any computer to translate them, though-- we all know what comes out of that. A real babelfish is still decades away.

Re:Hmmm....a typo on almost every line by Anonymous Coward · 2007-05-18 22:40 · Score: 0

Not so would a secretary missing one word out of every twenty get fired.

If the recipient was Kilvichan and a letter was sent to Kivlichan because it was felt by her boss that this was the correct spelling she would keep her job. But the company would lose a customer.

In fact some Scottish secretary was sacked for correctly spelling Urquhart as Urquhart!

comparing apples and oranges by argStyopa · 2007-05-19 00:40 · Score: 1

"speech recognition software can keep up with someone speaking at 160 wpm."
Well...is that really true?

As I type, I edit. I finish a sentence, and I edit. WITH EDITING TIME, I'm around 30-40 wpm (I probably type 2x-2.25x as fast).

Now, speechrec is 'keeping up with' someone at 160 wpm. Then, that person has to either
a) stop, comma, go back, del del del (or whatever the 'word commands' are for editing), or
b) manually re-read and edit the entire paper.

Personally, when I've used recent speechrec software, I find that the ACTUAL, real-world rate is well below 10 wpm to create a 'final draft'.

Don't get me wrong, I think speechrec is *great* for getting words & thoughts down on paper quickly, without interrupting a stream of consciousness musing. But as a tool for generating "final" quality text? Um...not yet.

--
-Styopa

Jose Tequila who can't speak by mosel-saar-ruwer · 2007-05-19 02:22 · Score: 1

making a computer understand Bubba Sixpack who can't type

Dude, your clichés are like so Eisenhower Administration.

This is Bush 43; get with the times.

Yeh wanna salsa weedat, señor?

i think.. by Anonymous Coward · 2007-05-19 03:33 · Score: 0

english is a damn crap

I have only one thing to say on this topic .... by PPH · 2007-05-19 05:27 · Score: 1

... Format C colon enter.

--
Have gnu, will travel.

Privacy would be my holdback by bjdevil66 · 2007-05-19 05:49 · Score: 1

This tech would be handy and I'd probably use it for many things. However, it won't ever easily replace the keyboard for all of my tasks. If I'm sitting in a cube or some other public place, the anonymity a keyboard provides is invaluable. It's not like typing is totally secure in a strict sense, but if I'm chatting with my wife about what I need to bring home after work, I don't need the world hearing me ask about what kind of toilet paper, feminine pads, gossip rag, or porno to pick up.

Re:Hmmm..../Ahoom-NO WAY, no recognition comprehen by usciiiiii · 2007-05-19 06:05 · Score: 1

Let's look at usciiiiii Universal Natural Speech & Logic solution, responding and operational in any personal voiceprint, any language and by natural logic syntexting of syntax - so that machines can sing and think - and not only fail to read or listen to us. To get rid of the keyboard and mouse, and have an Echo-Logic Machine, find out from the inventor's website @ http://www.islandnet.com/~surfins/TestSpace/usciiiiii_.html ASCII can not deliver Automatic Natural Language Comprehension in any natural language, so all must be upgraded to my proposed Operating FONTS for EchoLogical Machines - if you are waiting for the personal wizened secretary you can automate to run your business and its nomadic manufacturing plants, robots or systems.

--
inventor@prepatent.org

There are TWO products on the market. by notthepainter · 2007-05-19 06:27 · Score: 1

Don't forget about iListen, the Macintosh only product from MacSpeech. http://www.macspeech.com/.

I was one of the founders so I have a biased opinion. iListen uses the Philips speech engine, not the Dragon or IBM engine.

Re:sierra lima alpha sierra hotel delta oscar tang by typicallyterrific · 2007-05-19 08:22 · Score: 1

What happens when you genuinely want to say, "I was in a Hotel in Lima with a beautiful view of the Delta dancing Tango with a guy named Oscar"?

Re:Hmmm..../Ahoom-NO WAY, no recognition comprehen by RespekMyAthorati · 2007-05-19 09:42 · Score: 1

So just what system did you use to dictate this post? Whatever it was, it sucks!

Re:sierra lima alpha sierra hotel delta oscar tang by tepples · 2007-05-19 09:52 · Score: 1

What happens when you genuinely want to say, "I was in a Hotel in Lima with a beautiful view of the Delta dancing Tango with a guy named Oscar"?

The trigram matching code in existing dictation products can already distinguish "see" from "sea" from "C". So you'll probably need two or three consecutive NATO letters to trigger NATO mode.

hotel in Lima -> "hotel in Lima" but hotel lima -> "hl"
delta dancing tango -> "delta dancing tango" but delta tango -> "dt"
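
A minimal sketch of that "two or more consecutive NATO letters" rule in Python. This is a plain run-length check for illustration, not the trigram matching that dictation products use; the function name and the min_run parameter are made up, punctuation is not handled, and the word list uses the common spellings (alpha, juliet, xray) rather than the official ones.

```python
NATO = {
    "alpha": "a", "bravo": "b", "charlie": "c", "delta": "d", "echo": "e",
    "foxtrot": "f", "golf": "g", "hotel": "h", "india": "i", "juliet": "j",
    "kilo": "k", "lima": "l", "mike": "m", "november": "n", "oscar": "o",
    "papa": "p", "quebec": "q", "romeo": "r", "sierra": "s", "tango": "t",
    "uniform": "u", "victor": "v", "whiskey": "w", "xray": "x",
    "yankee": "y", "zulu": "z",
}


def collapse_nato(text, min_run=2):
    """Spell out runs of at least min_run consecutive NATO code words."""
    out, run = [], []

    def flush():
        if len(run) >= min_run:
            # Long enough run: treat it as spelled-out letters.
            out.append("".join(NATO[w.lower()] for w in run))
        else:
            # A lone code word is kept as an ordinary word.
            out.extend(run)
        run.clear()

    for word in text.split():
        if word.lower() in NATO:
            run.append(word)
        else:
            flush()
            out.append(word)
    flush()
    return " ".join(out)


for phrase in ("hotel in Lima", "hotel lima", "delta dancing tango", "delta tango"):
    print(phrase, "->", collapse_nato(phrase))
```

Setting min_run=3 gives the stricter "three consecutive letters" variant, at the cost of not being able to spell two-letter strings.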

I use it on my PDA by Podcaster · 2007-05-19 10:57 · Score: 1, Insightful

I'm using the Nuance voice recorder on my PDA to record dictation, and I've been training Dragon Naturally Speaking 9 to recognise my voice and convert it into text. When I get home I upload the voice files into the desktop computer and it crunches away for an hour running the DNS language recognition engine to turn my speech into text.

I have used older versions of DNS in the past and the current version is a massive improvement. Basically, if you decide that it is worth spending dozens of hours training the software in order to get reasonably accurate transcriptions then I recommend the product. Make sure to always use the same microphone/headset when recording on your PDA and you'll get great results.

If your need for accuracy is high, or you have alternatives to recording dictation, then DNS is still probably not for you. Also note that it is still a very frustrating experience to train DNS for non-American accents. There are at least a few reasonably common words that seem to be simply untrainable using the Australian language model for example.

-P

--
Be my friend.

Re:Hmmm..../Ahoom-NO WAY, no recognition comprehen by usciiiiii · 2007-05-19 13:16 · Score: 1

sorry your wisdom is so poor, you must be a sucker but we do not suck, shame on your stupidity in public also!

--
inventor@prepatent.org

Is that an educated opinion? by Anonymous Coward · 2007-05-19 13:52 · Score: 0

Do some research and you'll probably learn that software-based speech dictation has been used by medical staff and law professionals for years now. If it's not ready, why do you think hundreds of thousands of people bought Dragon Naturally Speaking? You think all of them got the software just for its "geekyness" or "cool" factor?

Slashdotters should know better by dysonlu · 2007-05-19 14:17 · Score: 1

I have not seen any "insightful" comment mentioning speech recognition or dictation for the mobile world. That's where the Big Bang for the technology will come from. Unless you're a mobile device user too hard-wired to thumbtyping, speech is the most efficient way for entering information in most situations.

Learning Droidspeak by Anonymous Coward · 2007-05-20 06:49 · Score: 0

The secret key word is "REPRESENTATIVE".

At least, I have usually found that that is what works. Just keep saying it once, loud and slow and emphatic, in answer to every question the droid throws at you.

With all the AI's I have talked to, they eventually get the picture.

Define good enough by bandmassa · 2007-05-21 10:08 · Score: 1

About 6 years ago, I interviewed a young woman for a video I was making who had a central nervous system disease that caused her enormous pain with fine-motor movements. Basically, typing for her was like having her hands attacked by bees. To complete her degree, she had to use speech recognition software. She did her degree in the late 90s! Speech recognition was obviously "good enough" for her 10 years ago. Of course, if you're talking about using SR as a form of voice compression algorithm, turning voice into text, transmitting it and reconstructing it as "voice" at the other end, no, it's not "good enough" yet. It's all relative. I'm sure, faced with the young lady in question's dilemma, you'd probably choose SR technology, even at gen 1 level, rather than be unable to use /.

--
"I hope you like Guinness, Sir. I find it a refreshing substitute for, er... food." Col. Jack O'Neil, SG-1

313 comments