Is Speech Recognition Finally 'Good Enough'?
jcatcw writes "Speech recognition software is fast, but it still may not be accurate enough. Clerical jobs usually ask for 40 wpm, but speech recognition software can keep up with someone speaking at 160 wpm. In Lamont Wood's demo it did very well at too/two/to and which/witch, but will it still render 'I really admire your analysis' as 'I really admire urinalysis'? At 95% accuracy, people aren't jumping on the bandwagon. Wood's typing speed is about 60 wpm with 93% accuracy, so he found that using speech recognition was about twice as fast as typing. Those who type at hunt-and-peck speeds will experience results that are even more dramatic. There's really only one product on the US market: Dragon NaturallySpeaking from Nuance Communications. The free versions from Microsoft aren't up to the task and IBM sold ViaVoice to Nuance, where it's treated as an entry-level product."
Is spinachry ignition rivaly gooery stuff? What the hell are you talking about?
As a foreigner it is really hard to get the pronunciation right enough.
Also command execution by others in the room is a problem.
How about listening to music, or TV, and having the computer interpreting it.
If you mod this up, your slashdot background will turn into a beautiful sunset!
Dear aunt, let's set so double the killer delete select all.
I wonder if I use bold in my signature, people will notice my posts.
I'm using it now so double delete the killer select all.
In fact, I'm using it to write this Dear aunt, let's set so double the killer delete select all
For typing up an inter-office memo in Word, most likely. But I'm a programmer, and I can barely read out loud some perfectly fine code, I can't imagine trying to enter it all with voice recognition, no matter how good it gets.
Curiosity was framed, Ignorance killed the cat.
Speech recognition, handwriting recognition, species recognition... all of these suck, and will CONTINUE to suck, until strong AI is developed.
And by that time, there will be a lot more important problems to worry about than making a computer understand Bubba Sixpack who can't type-- such as keeping the robots from taking over the planet in a bloody war.
With spending like this, exactly what are "conservatives" conserving?
I use it myself. It's wonder full. delete that. delete that. delete that. double the killer delete select all
Try it sometime.
Trolling is a art,
"Set v underscore tab equals space parenthesis parenthesis x minus lev schema dot all recs concatenate..."
It is by the juice of the coffee bean that thoughts acquire speed, the teeth acquire stains. The stains become a warning
Dear aunt, let's set so double the killer delete select all.
Well, there's spam egg sausage and spam, that's not got much spam in it.
With some of the stuff that I see on the Internet (websites and blogs etc.) I'd have to say that the urinalysis gaffe isn't really all that bad.
The only place that speech recognition really annoys me is phone answering systems. They are not competent enough to let you concatenate menu item options and make an intelligent choice as to which phone queue to put you in. For example:
"I have trouble with my cable modem dropping packets" is a statement that 'SHOULD' get you put through to the second tier support line... but no, you have to go through 3 or more menu choices and still only get to talk to the scripted low wage 1st tier support.
Support NYCountryLawyer RIAA vs People
For those of us with serious RSI and who program/sys admin for a living, are there any serious attempts at voice recognition out there? Specifically, have there been any breakthroughs with speech -> symbol names or obscure shell commands?
TFA mentions that many people stop using speech recognition software because of poor accuracy. I don't think that's the major reason. I think they start using it because it's a neat idea that seems to have a lot of promise, but quickly realize there are only a few situations where it's actually helpful. The end of the article mentions rough drafts; I'd also say it might be a decent choice
For the majority of office tasks, it just isn't a good fit.
So if the "good enough" is being useful in any way whatsoever, it sounds like we're almost there.
I remember using M$'s speech recognition engine (the version that comes with Office 2k3) to prototype a training program. It was designed to teach radio protocol. And actually, it worked very well. It helped that we had a very limited vocabulary, and even more constricted sentence construction.
Your ad here. Ask me how!
O'RLY?
I type pretty fast: somewhere around 60 WPM. I do tend to mistype, lowering my speed, but at the same time when I mistype I know I mistype: I can "feel" that my fingers are not moving as they should. With speech recognition, you'll have to be looking at the screen to find mistypes, and then you'll have to do something to retype them, but it'll probably take a while. And because of the lag, people will tend to talk slower so that it can "keep up" and they can prevent the words on the screen from getting too out of sync with their train of thought.
Speech Recognition: It's probably good enough for an IM conversation, but a copyeditor's nightmare.
I used to work for a company that has the words "new directions" in their name. When I told people where I worked I would make a rather long pause between the "new" and "directions" so as not to sound like I was saying something else. I wonder how this software would render it...
I'm using Dragon NaturallySpeaking. Right now, as I write this calm it, comet, post, and it sure as hacking beats typing.
Actually, I am using Dragon NaturallySpeaking right now, and it works very well. It actually works better if you speak quickly (as you normally would) and it's pretty good at inserting grammar along the way. I have bilateral tendinitis, and the software has been a godsend for me. I was even able to finish writing my book, a task that was becoming just too painful typing manually.
Oh, and you are probably wondering how long it takes to train the software? About a half an hour, and I find the accuracy at around 95%.
SEO Copywriter. Just Say ON
It's good enough for commercial warehouse applications, e.g. the Vocollects and Voxwares of the world.
I work on IVR systems for clinical research and medical screening (along with a huge variety of other things we make these systems do). And it's pretty good. We do a lot of work massaging the Grammars to make the system more accurate though, and we have a lot of extra logic built in for situations where we can predict values and assign weights to different words. But the one thing that rather annoys me is that I quite often have issues with Skype's quality just being a bit too low for the system to pull off. I use Skype to dial in so I don't have to take my hands off the keyboard/mouse for testing (or deal with the phone in general). I would guess about 1 in 5 questions I have to repeat or wait for a reprompt because of an audible glitch from the VoIP connection.
;)
All in all though, I'm rather impressed with the functionality and accuracy we do have. I'm not sure it will take over in many places though because of the error rate on free-formed text and the volume levels. My old cube-farm was noisy enough with everyone typing, I can't even imagine it with everyone trying to talk to their computers and hoping the noise filters would pick out their voice correctly. I've got a nice closed-off office to work in now, so no one has to hear me yell "Invalid selection my ^%#!" at my computer
-Rick
"Most people in the U.S. wouldn't know they live in a tyrannical state if it walked up and grabbed their junk." - MyFirs
English is the only language I speak and I still think it's stupid. But if you pronounce 'your' correctly it doesn't sound like "yur", which is what the beginning of urinalysis sounds like. 99% of the time the problem is improper pronunciation.
And no, accents are no fucking excuse. I'm sorry you grew up around people who can't pronounce words properly... But you should really learn to pronounce the words correctly so that people outside of your inbred birthplace will understand you.
Once I had a Texan share an anecdote with me about an even sillier-sounding Texan who pronounced "oil wells" as "owl whales". I don't think speech recognition software will figure that out, either.
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
95 percent is pretty good, only one word in twenty. I wouldn't have a problem with a 5% error ate.
For command control of a system where we need both hands free. It's pretty good, much better than stopping and typing, clicking or pressing buttons during a repetitive manual process.
We're using an older version of Microsoft's product and it seems the microphone quality is important.
wreck a nice beach??
None of them can see the clouds; The polished wings don't care.
For some reason every time this topic comes up the focus seems to shift to word-processor-type use.
What about simpler uses? How many basic tasks in the car require you to take your hands off the steering wheel? I'd like to see the basic functionality of the remote control mirrored in speech recognition. Things like stop/pause/increase/skip.
I'd imagine once this kind of simple recognition became common over-all speech recognition would (more) rapidly evolve.
Quack, quack.
They use it on TV all the time for subtitles, and practically every sentence has a mistake. It's finally "usable" or "worth taking seriously", but "good enough" implies, to me, that no further improvements are required, and I don't agree with that.
Press or say one to speak with a representative in english...
One
When you hear the option you are calling about you may say it at any time. If you are calling about a billing problem, say billing. If you are calling about a technical issue, say technical. If you are calling about new service, say new customer. If you are...
Billing
I'm sorry, that is not an option. When you hear the option you are calling about you may say it at any time. If you are calling about a billing problem, say billing. If you are calling about a technical issue, say technical. If you are calling about new...
Billing!
I'm sorry, that is not an option. When you hear the option...
Billing billing billing!
I'm sorry, that is not an option. When you...
Fuck you! Give me a human! Human human human!
I'm sorry, that is not an option. When you hear the option...
..to see the software discern between two different voices when typing up a document.
The only problem I see here is people becoming too dependent on the software. Terms like urinalysis might become something we will automatically associate with your analysis, people will get lazier and lazier, as if we aren't enough already.
FailWare. Heh, I just thought of that term. Google only returned 77 hits, so I guess I almost coined it.
Instead of asking if speech recognition is "good enough", maybe we should be asking whether or not it's actually useful for anything in the first place. I mean, is it good enough... to do what?
Can you imagine being in a cubicle farm full of people talking to their computers? Or trying to talk to your computer on the bus? You have to imagine that as computers become more ubiquitous, input methods will have to adjust alongside, and I simply can't see (or hear) speech recognition doing that very well.
What is is all that is. Isn't that obvious?
Everybody has an accent. (Ask a linguist.) Basically, it sounds like you just want everybody to have the same accent that you do. Good luck.
I play the language hat card!
...so everyone will talk all the time, half the work population will go postal and the other half will get offices. Also one thing that I notice is that I rarely get everything right the first time, I go back to add a sentence or use copy-paste quite a bit. It's really much easier to do that with your fingers without losing the "verbal" line of thought. And there are all the applications where it makes much more sense to use the UI than trying to talk your way through commands. Voice commands get a bit like the command line: you have to memorize a lot to use it at a decent pace. That limits its use to a very few select situations for me, not hardly enough to be worth it.
Live today, because you never know what tomorrow brings
I am presently a financial customer of an enterprise speech recognition product that Nuance offers. For several years now, the speech recognition software industry has been under consolidation, with Nuance buying a few different competitors and technologies. Most recently, this dance has continued with Nuance being acquired by ScanSoft, a company known for specializing in type recognition.
Nuance support is marginal at best, and through all the consolidations, understanding even within their own company of how the product works is quite lacking. We have found our own developers often times educating the Nuance support folks in various aspects of how the product is working, and then inquiring as to whether this is intended behavior or not. Crickets can often be heard finishing these types of conversations. We normally would have moved to another product under these conditions, but simply put - Nuance acquired what little was left, and now has no competition in the market. Competition is what spurs innovation, and so with the continued consolidation, it is hard to see significant advances in the technology without free help from academia.
If you think the Microsoft monopoly is bad, imagine if they absorbed Apple and somehow took over Linux leaving you with a few "choices", but all under the Microsoft moniker. The technology is very neat and the enterprise level products do some basic things quite well, but there is still some glaring room for innovation that I don't expect anytime soon under present industry conditions.
Several years ago, I saw a court reporter using a speech recognition system with his laptop. The microphone actually looked like some sort of breathing apparatus, as it fit snugly over his mouth and nose with the wires in a tube running down to the laptop.
Judging by this and this, I would say it's not even close.
Looks like it makes for good jokes.
Seriously, the only things speech recognition is good for are bulk text entry and simple navigation. I imagine trying to use voice commands to operate modern software would be similar to letting my four-year-old help make pancakes — yes, it gets done, but it's so much easier and faster to just do it yourself. Imagine trying to edit a document using just voice commands. Is your WP going to be smart enough you can tell it "find all occurrences of 'scum-sucking bottom feeders' and replace it with 'esteemed colleagues'". Or are you going to have to say "Find. Scum hyphen sucking bottom feeders. Tab. Esteemed colleagues. Replace all." Face it, GUIs have rendered speech recognition for command and navigation moot. Most operations you perform don't have a verbal description, or at least not one that is quicker to say than to do.
I also can't imagine it'd be that useful for actually writing things. I don't think I'm the only one who revises as they write. I think I actually write better when I write things out by hand, because it's slower so I tend to think my phrasing and sentence structure through more before I commit anything to paper. If I could suddenly type two or three times faster, I think it'd probably make my text even more incomprehensible than it usually is...
Just junk food for thought...
Are there any open source speech recognition projects worth checking out? (As a coder I would love to have a library to play around with.)
Yeah I could use google, but then you wouldn't have a chance of making the lists of links and get modded +5.
relying on karma whores since '07
The message that people *should* be learning from the less-than-perfect transcription of speech recognition software, such as misunderstanding "I really admire your analysis" as "I really admire urinalysis", is that it's finally time for people to learn to SPEAK as well as write proper English, as opposed to speaking in ebonics or text-speak or some other hard-to-transcribe dialect. "Your" pronounced as "ur" is pretty damned difficult to interpret, without resorting to contextual analysis... which of course is the ONLY reason we humans can still understand each other at all. Does the story of the Tower of Babel ring a bell?
a lot of dictionaries have NO pronunciation guide.. they just aren't english dictionaries
that is because, in many languages, a certain order of letters are always pronounced the same way.
Russian is one example..
every day http://en.wikipedia.org/wiki/Special:Random
This is really apples & oranges. The typist with 93% accuracy will produce a document with some typos, and I can tell you from years of reading /. that typos are easily "corrected" by the reader if the typist doesn't catch them. Even at that, spell checkers catch quite a few of them, too.
That's very different from "your analysis" turning into "urinalysis". Here, the spelling is correct but the words are completely wrong, and trying to figure out what is really meant will take a much longer reading of it.
To answer the question, it's not ready.
Do you have ESP?
About 5 years ago some manufacturer announced chips for under $5 that would do speaker-independent, limited vocabulary recognition and I predicted that there would be products appearing all over the place that would get rid of the crappy buttons and use speech as the interface. The only place I see it is in cell phones, and I always turn it off, because I don't want my cell phone surreptitiously calling someone while I am talking ABOUT them. Anyway, why hasn't the toy and gadget market latched onto speech input? It seems like those back massagers ought to be able to understand "Harder, ooh, harder, harder".
Intron: the portion of DNA which expresses nothing useful.
Yeth.
The Kruger Dunning explains most post on
I use the speech recognition on my BlackBerry Perl^H^Hrl^H^Harl all the time and it's "good enough".
The obscure we see eventually. The completely obvious, it seems, takes longer. - Edward R. Murrow
Google 411
Very intelligent, but isn't everything Google does?
If carrots got you drunk, rabbits would be fucked up. - Comedian Mitch Hedberg R.I.P. 03/30/68-2/24/05
Some jackass tells non-tech people to use it to get tier 2 help.
Probably the same jackass that told people about the Internet.
and trying out my best Scottish accent... Computer..... Computer......
Has ANYONE gotten this to work on System 6 for the Mac yet?
load "$",8,1
i bought dragon naturally speaking way back when my 100mhz pentium was still in the running. once i went through the "training" (reading several paragraphs out loud for the software to figure out speech patterns), i found it to be very good at figuring out what i said. eventually, i stopped using it because of the difficulty created by the environment and forgetting to "begin/end dictation" properly, as well as several other problems that could be overcome with some practice. for the generic user, this is a great idea, especially if they need to "type" many pages of text in simpler language. the real problems start to show up when trying to use lots of complex words. suffice to say, this is not software that an english, philosophy, poli-sci, chemistry, etc major should hope to use without lots of frustration.
That's a great one! Here are a few of *MY* favorites:
The translations:
These are all I can remember at the moment. I'd love to have more to add to my "funny file", please reply with your favorites!
People sometimes get upset when they can't give their new daughter the name that sounds like " She-thay-hahd", but is spelled Shithead.
I can imagine someone at a LAN party shouting:
"A deadlock has been reached. One task must die. We must now choose between murder and suicide."
Any speech recognition software worth the $ should be able to detect and translate NATO letter names: "hotel tango tango papá colon slash slash sierra leema alpha sierra hotel delta oscar tango dot org".
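For what it's worth, the decoding half of that is the easy part once the words themselves are recognized. A minimal sketch, assuming the recognizer hands back plain lowercase tokens with standard spellings ("papa", "lima"); the `decode` function and the `SYMBOLS` table are made up for illustration and are not any product's API:

```python
# Spelling-alphabet words, mapped to their first letter.
# "alpha"/"juliett"/"x-ray" spellings vary in practice; these are one common set.
WORDS = (
    "alpha bravo charlie delta echo foxtrot golf hotel india juliett kilo "
    "lima mike november oscar papa quebec romeo sierra tango uniform victor "
    "whiskey x-ray yankee zulu"
).split()
NATO = {word: word[0] for word in WORDS}

# Hypothetical spoken names for the punctuation used in the example.
SYMBOLS = {"colon": ":", "slash": "/", "dot": "."}


def decode(spoken):
    """Turn a spoken spelling into text; unknown tokens pass through as-is."""
    out = []
    for token in spoken.lower().split():
        if token in NATO:
            out.append(NATO[token])
        elif token in SYMBOLS:
            out.append(SYMBOLS[token])
        else:
            out.append(token)  # e.g. "org" is kept as a whole word
    return "".join(out)


url = decode(
    "hotel tango tango papa colon slash slash "
    "sierra lima alpha sierra hotel delta oscar tango dot org"
)
```

The hard part is still upstream: the recognizer has to hear "sierra" and not "see error" before a lookup table like this is any use.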
The holy grail for me has been software that deals with speech impediments. I stutter. I'm fluent enough that I function fairly well in real life (I'm a high school teacher), but speech recognition software has universally failed to meet my needs.
All of the words have too many s's.
I'm using Dragon NaturallySpeaking 9 right now. I've been using it for several months, and I have written a dozen articles on it. I think it works fantastic, but you definitely have to learn how to write all over again. Out of the box it trains extremely quickly, if you do not want to train it at all you can just start talking and it will eventually catch up with you. (Note it caught catch up and not ketchup) I started using it as a preventative means of avoiding repetitive stress injuries. I cannot use it to code, however I can definitely use it for my writing. Using Dragon NaturallySpeaking, I can easily push out five to 15,000 words a day. (notice it used the word five and then a number) Ultimately it provides you very accurate writing. It's almost impossible to have a spelling error, however word substitution errors are still very common. If you attempt to compare your typing accuracy versus your dictation accuracy, you will often see spelling errors in the typing and word substitution errors in the dictation. That means that when you go back and edit your own work you have to spend a good deal more time editing because you're not used to editing the type of dictation errors that you make because you have years of experience editing the normal types of spelling errors that you made. You also have to learn how to compose sentences by speaking as opposed to composing to your fingertips. This definitely exercises a different area of your brain and I'm sure you will find that you are not as good of a writer when you speak as you all are when you type. However with practice you can get up to speed dictating and you will then definitely benefit from the ability to type at 150 words a minute without breaking a sweat, stressing out your wrists, or even suffering from eyestrain. 
Dragon NaturallySpeaking definitely helps people to avoid eyestrain because you don't have to stay focused on the computer monitor while you're typing you can look around the room, or outside or anywhere. Touch diapers (s/b touch typers!) can do this also however good ergonomics dictates that you sit in positions that align your body correctly to avoid repetitive stress injuries and this includes pointing your face for words (forwards!) towards the computer screen. With Dragon NaturallySpeaking I can face in any direction I like in the program will keep up. Downside it does substitute words and on occasion it skips words entirely. I run at least a gigabyte of RAM in my computer and I was would suggest double that amount. Dragon NaturallySpeaking is a bit of a resource hog, however it's worth it and it's not as bad as Firefox. I should have purchased it years ago and definitely do not regret the purchase nor my new attempts to learn how to write all over again. I had to learn to write with pencil and paper, and then with pen and paper and then with a manual typewriter and then with an electric typewriter and then with my trs 80 and then a laptop and my treo and yada yada yada I can sure learn to do it with my voice.
why am I reminded of this:
The grim reaper is standing next to the grave markers of JFK, RFK, and John Jr. A voice from above yells "I said TED! TED Kennedy!"
there goes any karma I ever had any chance of having
The classic example used in my AI class to describe the problem of getting a computer AI to... recognize speech.
Though to me the problem with dictating text (the obvious use for speech recognition) is the need for some kind of escape for punctuation or program control. I mean, you can't just say "I went to the store period select all cut" because even assuming it recognizes all the words perfectly it wouldn't know if the "period" is supposed to be a word or punctuation, and whichever way it "assumes", you'd need a way to get the opposite behavior. Saying "escape" out loud constantly would be weird, especially if you need the word escape ("escape escape"), or maybe you also need to invoke the escape key, so you have "escape" and "escape escape" and "escape escape escape".
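To make the escape idea concrete, here is a rough sketch of how such a scheme could work, assuming perfect word recognition. The escape word "literal", the tiny punctuation and command tables, and the `interpret` function are all invented for this example; no real dictation product is being described:

```python
# Hypothetical vocabulary: words that are punctuation or commands by default.
PUNCTUATION = {"period": ".", "comma": ","}
COMMANDS = [("select", "all"), ("cut",)]
ESCAPE = "literal"  # assumed escape word: the next token is inserted as text


def interpret(tokens):
    """Classify recognized tokens as ('text' | 'punct' | 'command', value)."""
    out = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        # Escape word followed by anything: take the next token literally.
        if tok == ESCAPE and i + 1 < len(tokens):
            out.append(("text", tokens[i + 1]))
            i += 2
            continue
        if tok in PUNCTUATION:
            out.append(("punct", PUNCTUATION[tok]))
            i += 1
            continue
        # Multi-word commands are matched against the upcoming tokens.
        for cmd in COMMANDS:
            if tuple(tokens[i:i + len(cmd)]) == cmd:
                out.append(("command", " ".join(cmd)))
                i += len(cmd)
                break
        else:
            out.append(("text", tok))
            i += 1
    return out


plain = interpret("i went to the store period select all cut".split())
escaped = interpret("the literal period ended".split())
```

Here "literal literal" produces the word "literal" itself, which is exactly the "escape escape" awkwardness described above; the sketch doesn't make that go away, it just shows where it comes from.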
The enemies of Democracy are
I don't mind the errors, what I do mind is taking my time out to correct them.
While typing, if I make a typo or something - I either ignore the few wrong letters, correct them really fast (takes a second or two), or the spell checker does it for me. All in all, I am still concentrating on what I was doing.
I have tried Dragon Naturally Speaking ver 5, 7, and the latest one, 9sp1. It really has gotten better throughout the generations but when I dictate a document and something comes out bad - it's an entire word or phrase and I HAVE TO CORRECT that type of mistake. I can't ignore it - people can overlook spelling mistakes - they won't overlook silly phrases/words in between.
Then my concentration is knocked off the task as I am sitting there training the program. They could streamline this by seeing how you eventually corrected it and what you eventually typed in, and comparing that to the program's first guess. Right now, they make you select the phrase, then pronounce the wrong guess and then the correct one. It's too time consuming.
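The "compare the first guess to what you eventually typed" idea is basically a word-level diff. A minimal sketch using Python's standard difflib; this is only an illustration of the comparison step, not how Dragon or any other product actually trains, and `learn_corrections` is a made-up name:

```python
import difflib


def learn_corrections(first_guess, corrected):
    """Return (misrecognized phrase, intended phrase) pairs from a word diff."""
    guess_words = first_guess.split()
    fixed_words = corrected.split()
    matcher = difflib.SequenceMatcher(None, guess_words, fixed_words, autojunk=False)
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Only substitutions pair a wrong phrase with its replacement;
        # pure insertions or deletions are ignored in this sketch.
        if tag == "replace":
            pairs.append((" ".join(guess_words[i1:i2]), " ".join(fixed_words[j1:j2])))
    return pairs


pairs = learn_corrections(
    "I really admire urinalysis",
    "I really admire your analysis",
)
```

The pairs could then be fed back to the recognizer as training examples without making the user re-pronounce anything. Whether an edit was a fix or just a rewrite of the sentence is the part a diff can't tell you.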
Speech recognition is good though, to give your wrists a rest. But I find that typing shorter reports that are to the point work just as well.
With speech recognition, you'll have to be looking at the screen to find mistypes, and then you'll have to do something to retype them, but it'll probably take a while.
Well if you correct items as you go, yeah you will lose all your speed increase. For a speed increase you would have to dictate your entire document and go back and fix it. This will cut down speed if you are using it for emails, but for technical documents it can be a godsend (you have to go back and proof/edit them anyhow, so just fix issues at that time). It really depends on your use. For bashing out emails or code it would be much too slow.
I'd love to use speech recognition software at work, but I don't think my coworkers really want to hear me dictate software requirements to my computer. Maybe I'll switch to an old IBM keyboard first, then they'll be happy when I shift to speaking.
"If you are going through hell, keep going." - Winston Churchill
It's a coffee company, but it sounds like he's peddling homoerotic publications. *shudder*
When did the future switch from being a promise to a threat? -C. Palahniuk
That's exactly what I mean. ...
By hands I meant either hand. Conversational English.
Using speech recognition to perform any task is like explaining to a new employee how to do something - it's much quicker to do it yourself, but you kind of hope the employee/computer will learn and do it without prompting next time - except the computer doesn't learn like that...
For example, consider doing a google search by speech:
start (wait while it processes and does what you ask)
firefox (wait while it processes and does what you ask)
click address bar (wait while it processes and does what you ask)
spell w w w dot g o o g l e dot c o m (wait while it processes and does what you ask)
enter (wait while it processes and does what you ask)
(wait while it loads page)
type hello (wait while it processes and does what you ask)
click google search (wait while it processes and does what you ask)
now compare that to the 4 mouse clicks and 21 keystrokes (inside 5 secs) that'd take to do it yourself.
Here
He basically runs through most existing input devices including speech recognition and finds them wanting.
He saves particular venom for speech recognition, particularly due to the difficulty of finding homonym-based errors in the transcribed text:
http://www.antipope.org/charlie/blog-old/2005/08/27/#input-devices-1
2 moov 2 a fonetic langwedge
maybe not.
In the US: 1-800-GOOG-411
Try it out. It's actually okay!
I don't need speech recognition. But I really want command recognition - "open browser" - "close window" - "move window to left monitor" (I'm using a 3 monitor setup at work). Most of my mouse moves are to position the windows at the right monitors.
I found a program - "game commander" - that almost, but not completely can do this. It has very good recognition rate (it only has to recognize 10-20 simple commands). However, the commands can only be a sequence of keypresses, and that is not enough e.g. when moving the position of a window. Another annoyance is you have to "do" the keypress in the program, so I'm unable to enter "Alt+TAB" as a command...
Speech recognition is great, the Slashdot editors have been using it to enter stories for a long time, and it's clearly absolutely flawless.
If there is only one effective supplier, this is an ideal opportunity for FLOSS to innovate.
The lack of good speech recognition applications on Linux and the BSDs is a major barrier to corporate and government organisations switching to FLOSS. Frequently the main users of speech recognition use it for occupational health and safety reasons. Failing to provide the software the OH&S specialist recommended may be enough to scuttle a migration project.
I just used google's 411 service ( http://labs.google.com/goog411/ ) and its speech recognition was great. I usually have a hard time with them because of my New Orleans dialect, but it understood every word, opposed to those that can't get the number you're saying ("I'm sorry, I didn't understand you, could you please repeat the WHOLE thing again!").
King of kings and Lord of lords
IBM's deal with Nuance was that they have the exclusive distribution rights for IBM's ViaVoice product. IBM still owns all the intellectual property and actively is developing its Speech Recognition technology, both as research projects like MASTOR as well as in product form as part of WebSphere Voice Server, which is an MRCP compliant Speech Recognizer that can be plugged into basically any standards-compliant thing that wants to utilize speech.
What did you eat today? http://www.atetoday.com/
Still owned him.
"I am using PowerScribe, which is a radiology speech dictation system. It is fairly accurate in the doming [domain] of medical transcription, and particularly in the doming of radiology, but it not very useful for free pexed [text] speech.
For example, there [here] is a sample of the typical chest report: Hazy groundglass opacities noted with both lungs, particularly the right middle lobe as well as the left lower lobe, with no evidence of effusion, pneumothorax, or consolidation. [this is pretty much verbatim what I said].
[But here's a free text example:] However, if a Type II right a regular letter to a friend, [if I try to type a regular...] for example setting the following, [for example, saying the following...] Yesterday was a very nice state [day]. The clots [clouds are] gone, and only a little brain [rain] remains. Today it is supposed to be even warmer outside, I think elbow [I'll go] injected [and check] with the right knob. [the weather right now]"
The biggest problem with this system, particularly for medical transcription purposes, is that it only gets about 95-97% right. That means, it's wrong at least 3% of the time. Worse yet, whenever it's not sure, it just inserts random garbage! Whatever the closest match is, which is often wrong, and sometimes fundamentally changes the meaning of what I intended.
Human transcriptionists, on the other hand, will insert a blank if they're not sure, to alert the dictating physician. This fscking system has no clue when it's wrong, which makes it very dangerous in my opinion!
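The "insert a blank when unsure" behavior is simple to sketch if the recognizer exposes a per-word confidence score. That is an assumption here: nothing above says PowerScribe does, and the scores, the 0.8 threshold and the `render` function below are invented for illustration:

```python
def render(words, threshold=0.8, blank="____"):
    """Join (word, confidence) pairs, blanking out low-confidence words."""
    return " ".join(
        word if confidence >= threshold else blank
        for word, confidence in words
    )


# Hypothetical recognizer output for "only a little rain remains".
report = render([
    ("only", 0.95),
    ("a", 0.90),
    ("little", 0.92),
    ("brain", 0.41),  # misheard word with a made-up low score
    ("remains", 0.88),
])
```

A blank at least tells the dictating physician where to look. The catch is that this only helps when the wrong guesses really do come with low scores; a confidently wrong word sails straight through.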
Eye yam tie ping too ewe whiff ay news peach wreck ignition soft wear. Thistle bee duh firs sting eye half writ tin witted. A cording two these oft ware pack edge, disallows won two bee mower E fish sent win rioting and isle know lawn gear kneed a key bored. Sum pee pull say its snot vary ewes full, butt eye knead too due sum spearmint ting sew icon seafloor my shelf. Oops knot shelf. Delete lass toward. Know delete whirred self! Know dam mitt! Delete sin tense. Crop! Eye shooed B A bull 2 E raise win eye May comma stake U pizza sheet! Wearer thee inns truck shuns! I yam gun too err race these E male, putt these peace a ship inn a Bach sand sinned too duh come penny foray re funned. Naught sinned, send! ::RECEIVED SEND COMMAND:: MESSAGE SENT
Speech recognition is a lot better than it once was, but it will require a lot more improvement to be good enough. Even humans have a lot of trouble understanding an unfamiliar accent, which pretty much means a paltry computer won't get it. However, a computer should be able to be more thorough, so same-accent speech should be possible. It takes a person 5, 10, even 20 years to master speech, so a computer needs a lot of training, and the method of training is probably key: not only listening to speech, but more advanced concepts like recognizing who is speaking so as to apply the patterns to that person, and learning who is talking to the computer. Telling my computer to turn on the TV might be easily confused with asking if my chocolate wiener is turning my paid gf on, for some time till that is worked out. Personally I think prefixing everything with "COMPUTER, turn on that" is not good enough; it should be possible to work out context within the next 10 years with so many cores available.
At 95% accuracy, people aren't jumping on the bandwagon. Wood's typing speed is about 60 wpm with 93% accuracy, so he found that using speech recognition was about twice as fast as typing.
That isn't really meaningful. When I make a typographical error, it's usually a single letter oission/dupplication/trasnposition, which 1) is easy to address with a spell-checker (or even auto-correct for teh more common typos) and 2) is unlikely to interfere with understanding of the result. When speech recognition software makes an error, it tends to replace one or more words with one or more different words. If you correct those errors as you go, you're going to be starting and stopping, and if you go back to fix them later, you can't just catch them with a spell checker.
A meaningful comparison would require looking at teh typeso f errors made whiel typing, and the types of let's set so double the killer delete select all.
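The asymmetry between the two kinds of error can be shown with a toy spell checker. This is a minimal sketch with a made-up six-word vocabulary; `flag_non_words` is a hypothetical helper, not the logic of any real spell checker.

```python
import re

def flag_non_words(text, vocabulary):
    """Return the tokens that a dictionary-based spell checker would flag."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in vocabulary]

# Toy vocabulary, just large enough for the example sentences.
VOCAB = {"i", "really", "admire", "your", "analysis", "urinalysis"}

typed = "I realy admire your anlaysis"    # typical typing errors
dictated = "I really admire urinalysis"   # typical recognition error

print(flag_non_words(typed, VOCAB))      # the two typos are flagged
print(flag_non_words(dictated, VOCAB))   # nothing is flagged
```

Every word in the dictated sentence is a real word, so a dictionary lookup has nothing to catch; finding that error takes a human reread.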
My server
"Instead of asking if speech recognition is "good enough", maybe we should be asking whether or not it's actually useful for anything in the first place. I mean, is it good enough... to do what?"
Uh, huh. So slashdot, what's so insightful about saying you don't even consider disabled people during your discussions?
If a job's not worth doing, it's not worth doing right.
Can Speech recognition do this one?
"The astronomer looks at Uranus. The doctor looks at your anus."
Is talking really less fatiguing than typing? And if everyone had to speak every word they entered into their computer, the modern cubicle workplace would be chaos.
Clearly voice recognition is extremely desirable in mobile situations such as with mobile phone commands and calling into phone centers.
Does it hurt to hear them lying? Was this the only world you had?
We were testing an edition of Dragon Naturally Speaking back in 2000, when an Asian-American woman on our team took the microphone. She had a heavy accent, and the software interpreted her words as... nothing.
She stood there, trying to get it to write something, and finally ended up repeating, "It not woking! Why it is not woking?"
We were afraid to laugh, fearing a trip to HR... we all stood there, biting the insides of our cheeks, until she gave up and left the room; then, we collapsed on the floor, literally ROTFL.
At 95% accuracy, people aren't jumping on the bandwagon. Wood's typing speed is about 60 wpm with 93% accuracy, so he found that using speech recognition was about twice as fast as typing.
The difference in stated accuracies of typing versus speech recognition is the fact that when you're typing, you can easily and readily catch your mistakes. If I type the wrong letter, more often than not, even before I finish pushing down on the wrong key I will realize it and reach for the backspace. However, in speech recognition, you must pay very close attention to how the software renders your speech on the screen and subsequently find a way to fix it. This can be done in two ways, either: by stopping speaking and manually editing it with your keyboard, or by remembering that it was mistaken and going back at the end to correct it (because no one wants to be stopped or slowed down in their train of thought, which is half the reason I choose typing over penmanship). Either way, you have the problem of a two-pass proofread in comparison (all our CS majors know) to the on-the-fly proofread.
In the end, however, if a document is really important you will proofread it multiple times. So it's all up to preference. Personally, I prefer typing, regardless of accuracy, simply because I can catch my own mistakes on the run and correct them immediately. I am an attention-to-detail kind of person in the first place.
My page.
Typos look wron, but speech recognition errors are more supple.
Oh, and 1 in 20 words being wrong is astonishingly bad if you've ever actually tried to use SR.
This is an AI problem of exactly the sort that humans excel at, and we still screw it up regularly. It's hilarious that people seem to have so little respect for the difficulty of the problem.
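To put numbers on "1 in 20 words", here is a back-of-the-envelope calculation. It assumes word errors are independent, which real recognizers do not guarantee, and the 500-word document length is an arbitrary choice for illustration.

```python
def expected_errors(word_count, accuracy):
    """Expected number of wrong words in a document of word_count words."""
    return word_count * (1.0 - accuracy)

def chance_error_free(word_count, accuracy):
    """Probability that a run of word_count words has no error at all,
    assuming each word is right or wrong independently."""
    return accuracy ** word_count

for acc in (0.95, 0.97, 0.995):
    errors = expected_errors(500, acc)    # a roughly one-page document
    clean = chance_error_free(20, acc)    # a single 20-word sentence
    print(f"accuracy {acc:.1%}: ~{errors:.1f} errors per 500 words, "
          f"{clean:.0%} chance a 20-word sentence is clean")
```

At 95% per-word accuracy, only about a third of 20-word sentences come out untouched under this model, so nearly every paragraph needs correction.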
Assembly is the reverse of disassembly.
>>Is speech recognition 'good enough'?
No.
>>I'm sorry I did not understand your selection. Is speech recognition 'good enough'?
NO.
>>I'm sorry I did not understand your selection. Is speech recognition 'good enough'?
NO!
>>I'm sorry I did not understand your selection. Is speech recognition 'good enough'?
HOW ABOUT DIE!
>>Answer "yes" entered, is this correct?
No.
>>I'm sorry I did not understand your selection. Is this correct?
AJKFLSJFKSLFJSDKFDJSKSFDJK
>>Thank you for participating in our survey, goodbye.
Speech recognition has been at a standstill for years now, it's been "almost there now" for well over five years. As mentioned in other posts, there has been a lot of consolidation and that has really hurt growth. Lernout & Hauspie and Dragon were constantly going back and forth a few years ago trying to get a leg up on each other. When L&H got into all of their accounting problems and shut down, that left Dragon and IBM. IBM's product went to Scansoft and went to Nuance where it languishes until somebody pulls the plug (for example, if you call for support on ViaVoice and mention you have XP SP2, they will tell you it is not a supported platform).
Most of the improvement in Dragon and ViaVoice over the last couple of years has been in reducing the training required to get to the high-nineties level of accuracy (assuming a noise-cancelling mic in a quiet room, and that you do not have a cold or sore throat). The advancements in training have not corresponded to much in the way of translation accuracy. A "trained" Dragon 7 recognizes speech pretty much as well as Dragon 9 (I haven't played with Dragon 10 yet).
Most of the real speech recognition advancement these days is focused on discrete word sets for voice mail trees and other interactive systems. When you are on the phone giving your credit card number, two/to/too is all the same thing. While speech recognition in its current incarnation is good for people who can't type (disabilities, carpal-tunnel, etc.) it is not a replacement for typing, and isn't any closer today than it was five years ago.
This is actually a really good case study on patents, and how one company can purchase a couple of patents and block out commercial market competition. Dragon/Nuance/.... has such a wonderful monopoly, and has noticeably scaled back their release cycle since they achieved that monopoly (Dragon 7 - Dragon 8). Somehow I lack the faith to presume they've been working hard and are about to release anything world-changing.
Speech recognition generally comes in two flavors: Command and Dictation. Most voice recognition engines can handle either, but the implementations are very different. Command mode is handled by providing a list of "command" words that are valid at any given point and operates much like a state machine. Dictation is a completely different beast and does a variety of things under the hood to increase accuracy.
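A minimal sketch of the command-mode idea, assuming the engine has already turned audio into a phrase: each state lists the only phrases that are valid there, and anything else is rejected. The states and phrases below are invented for illustration and do not reflect any particular engine's grammar format.

```python
# Each state maps the phrases valid in that state to the next state.
GRAMMAR = {
    "idle": {"execute browser": "browser"},
    "browser": {"google dot com": "search_page", "close browser": "idle"},
    "search_page": {"google search": "results", "close browser": "idle"},
    "results": {"close browser": "idle"},
}

def run_commands(utterances, grammar=GRAMMAR, state="idle"):
    """Walk the state machine; return the final state and rejected phrases."""
    rejected = []
    for utterance in utterances:
        phrase = utterance.strip().lower()
        next_state = grammar[state].get(phrase)
        if next_state is None:
            # Not in the active command list: command mode cannot handle it.
            rejected.append(utterance)
        else:
            state = next_state
    return state, rejected

state, rejected = run_commands(
    ["Execute Browser", "google dot com", "flying burrito brothers", "google search"]
)
print(state, rejected)
```

Because only a handful of phrases are valid at any moment, the recognizer has very little to confuse. The free-text query is rejected here, which is exactly the part that would have to be handed to dictation.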
"Good enough" is very vague as applied to voice recognition. For command stuff, "good enough" has been here for about 7+ years. Even MS's free engine does a great job at that.
I used ViaVoice years ago and it worked pretty well. But here's the thing: Have you ever tried to dictate something? It's definitely a skill. I'm sure some people have a natural ability for it, but I certainly didn't. I tried dictating stuff and it's tough. You hit a pause mid-sentence trying to figure out how you want to phrase something and suddenly there's a period and you're beginning a new sentence. Try dictating several sentences of original material and keeping it going without pauses and "um"s and so forth and you'll see, it's not quite as easy as it seems. I suspect one of the reasons voice recognition hasn't been a hit is that people don't expect that. They try it for a few days, think, "Hell, it's easier just to type," and give up. That's why I don't use it for writing. I can type faster and more accurately than I can dictate. I'm sure if it's something I wanted to work on, I could develop the skill, but my point is, I think that's probably why a lot of people give up on it.
I honestly think that voice recognition in command mode could be really useful at speeding things up, if software were designed to take advantage of it. But it's not easy to add it as an afterthought and it adds significant work, even if it's done with forethought. It's a chicken and the egg thing. If a lot of software supported it, I think people would see a gain in productivity using whatever software they use daily. I don't mean just using voice recognition, but in combination with a mouse and keyboard. For example: "Execute Browser. google dot com. flying burrito brothers. google search". Saying that would be a pretty fast way of opening your web browser, typing "google.com" and then typing "flying burrito brothers" and then clicking the "Google Search" button. Replace "Google Search" with hitting the enter key and even faster.
But as I said, it's a chicken and the egg thing. Software doesn't support it because there's no demand and there's no demand because people haven't really experienced software that supports it.
Another issue (and I'm sure this has been mentioned by others), is background noise. I like to listen to music or watch TV while I work. Those don't mix well with voice recognition, at least not at the volumes I listen to them. Until voice recognition can get around that and recognize my voice amidst background noise and do it accurately AND software out there generally supports it, it's not going to go mainstream.
It depends on what you mean.
-- I ignore anonymous replies to my comments and postings.
I'm still waiting for speech recognition to come to our elevators so I don't have to touch the dirty buttons.
Also so I can pretend I'm on the Enterprise.
-HobophobE
Nothing laughs forever.
Interesting sociological phenomenon: When the first 30 or 40 comments in a Slashdot story are jokes or attempts at jokes, that is an indication that Slashdot readers don't believe something in the story. It's surprising to me that people don't just write:
FRAUD ALERT -- FRAUD ALERT -- FRAUD ALERT
This is what is apparently happening, in my opinion. First, speech recognition has gotten an extremely bad reputation for being worthless garbage, because it is worthless garbage.
A 0.5 percent recognition failure rate is enough to make speech recognition software worse than worthless. The reason is that speech recognition software never makes a spelling mistake. Instead, the mistakes are often extremely difficult to recognize, and sometimes change the meaning in subtle ways. That's partly because, when the software is confused, it tries to select something that is grammatically plausible.
The result is that it has become difficult to sell speech recognition software. A high enough percentage of people in the U.S. culture know that it isn't actually useful. The original owners of Dragon NaturallySpeaking sold the product to a company that sold it to the company that became Nuance, maybe because they felt the product was damaging the credibility of their trademarks.
Here is a quote from the story linked in the Slashdot story:
"In 1993 two executives from Kurzweill Applied Intelligence (which pioneered SR for the medical market) went to prison for faking sales. That firm was sold in 1997 to a Belgium SR firm, Lernout and Hauspie (L&H), which was reporting phenomenal sales growth at the time. Dragon Systems, which originated DNS that year, was reporting only anemic growth, and L&H had no trouble acquiring Dragon Systems in early 2000 in a stock deal. Within a year a series of accounting frauds came to light and L&H collapsed into bankruptcy. Its SR technology was sold in late 2001 to ScanSoft Inc., which kept the DNS line going. (It was then at Version 6.0.) ScanSoft later acquired Nuance and adopted its name.
"Thereafter, "It was with the launch of Version 8.0 (in November 2004) that the market became reinvigorated and took off," said Chris Strammiello, director of product management at Nuance. "We crossed an invisible line with Version 8.0, where the software actually delivered on its promises and offered real utility for the users. Sales have been growing at a rate of 30% yearly since then, except that we expect it to do better than 30% this year."
Read that again: "... the software actually delivered on its promises and offered real utility..." I called Nuance and was told that version 8 did not have a new recognition engine, but only had improvements in the user interface. A friend who owns and tested version 8 told me he could see no difference in accuracy between that and version 7.
So, in my opinion, Nuance has done the common deceitful things that are called "Marketing":
1) Bring out new versions. Previously, when there has been a "new version" of Dragon NaturallySpeaking, I call Nuance technical support and ask if there is a new recognition engine. I didn't call for version 9, but for the last two versions they have said no. So, nothing is changed; the software is still useless, in spite of the fact that they always advertise that the software is now more accurate.
How is it possible that the software is more accurate, if the recognition engine did not change? Maybe it isn't true. Or maybe the company improved the guesses the software makes when the software really has no clue what the user said. Those guesses have become so sophisticated that you can become confused about what you actually said, and you have to spend time re-creating your ideas. If you are saying simple things about a simple subject, this is not as much of a problem as when you are writing about contract negotiations, for example.
In the words of a Slashdot reader: "The opinions expressed h
I mean really, until I can say to my computer things like:
Find all mp3's that were created by Trent Reznor and pipe them to /dev/audio on the neighbors computer. What use will it be?
I can't program in it can I?
if(i_can_write_code_I_mean_speak_code_to_the_computer() == true) then
i_might_use_it_a_bit();
else
system("find /music -type f -name \"*trent*reznor*\" | xargs -t cat - | ssh hackeduser@neighborcomputer \"cat - > /dev/audio\"");
endif
But that is just me.
Read that statement again from Chris Strammiello, director of product management at Nuance:
"We crossed an invisible line with Version 8.0, where the software actually delivered on its promises and offered real utility for the users."
Nuance owned the software when it was sold as version 7, and before. So, Mr. Strammiello is apparently saying that his company was knowingly involved in fraud, but, don't worry, now the company is honest.
Wow. Sometimes people lie so much that they have no idea how what they say sounds to others.
Basically, the article linked by Slashdot says that the first and second owners of the software were convicted of crimes, and seems to say that the present owner is also guilty of fraud.
At least, that's how it sounds to me.
Marketing is meant to be methods whereby a company makes healthy connections. However, most marketing people seem to think that marketing means lying. And, when they sink the company, they just get a job somewhere else.
Even the most successful technologies have some problems associated with them. Nothing parent mentioned is even remotely close to show stopper.
Do we really need this? All this is good for is the people who can't type 100 wpm with reasonable accuracy. I don't think I would be able to speak (at a normal speed) any faster than I could type. Plus, I only think so fast. So... everyone should learn to type at 100 wpm and the problem is solved. Also, who wants to hear a bunch of chattering at the library from people "typing" on the computers, versus the very loud, obnoxious 100 wpm typing sounds that make the people typing at 40 wpm drop their jaws?
If a job's not worth doing, it's not worth doing right.
If you can't call the number from the "Get Human" database at get human, AT&T usually gets you a human if you press 0 repeatedly, ignoring prompts.
Sometimes silence works. Try not saying or pressing anything.
Has anybody tried using a dialup botnet to DOS an 800-helpline yet? That would be an interesting comment on the lousy phone help we get from offshore.
If we wrote how we talked voice recognition might be helpful. We don't. Written communication is different and should be.
Help stamp out iliturcy.
Others here have commented on the nature of mistakes in a person's typing vs. errors from speech recognition. I'd just like to point out that 95% is a (current) technical limit and nearly constant, regardless of the speed of the program, whereas personal typing accuracy can be improved by practice and slowing down.
"People who do stupid things with hazardous materials often die." -- Jim Davidson on alt.folklore.urban
Dragon requires MONTHS of training (literally), and even then it will make mistakes exactly like the one you noted. The plus side is that Dragon works pretty decently under WINE, but apart from their Linux "support", it's a complete mess.
Screen readers aren't much better; they have the accuracy, but are hard to understand.
For a little geeky fun, I had Kurzweil read a few English papers to Dragon. Even after some training, Dragon still couldn't get above 80% accuracy on a computer generated, 100% reliable, voice. Now that's just sad.
Obligatory Soundbite Catchphrase
necessarily the same thing nor do they represent the same cognitive process. I can see where good speech-to-text software would be a useful tool but hardly a replacement or even mainstay. What's with this techno-yearning for a return to pre-literate oral culture? Hell, just dispense with the to-text part and post/publish/purvey the sound file. I find enough difference between handwriting words and typing them as far as my cognitive processes go. Yes, sometimes I speak something very eloquently, clearly and yet complexly as the situation demands and it would be nice to have that captured instantly. But the famously witty Algonquin Round Table conversations, transcribed to print, read as banal and stilted. The several posters who have pointed out the difference between typos and transcription errors (the former are much easier to rectify on the fly) are right on the mark. The Henry Higgins posters are really engaged in wishful thinking.
(i'm a brazillian living in japan)
Well, that remembers me of a day i was with my friend, and he was all smiles showing all the bells and whistles of his new car, a sound system with dvd/tv/radio/mp3 with hard drive included, everything.... and then he said : "you that speak english, try the voice recognition stuff, amazing"
I popped in a DVD and after some moments said "next chapter" (paying close attention to the pronunciation), the thing answered me "Boryumu Appu" (Volume UP), really
Said it again with a higher tone and it warned me it was going into "chivii moodu" (tv mode)....
At last i went back to dvd mode and said "nekisuto chaputaa"
...
... can you tell me what it did?
:)
Turned Off.
ps: one could argue that poster's pronunciation is really **cked up, your mileage may vary (but i think it was okay)
I work in Nuance Support. I would really like to know which financial customer you are with and which product you are using. Perhaps, we, Nuance Support, might want to keep in mind what you think of us and what you think of yourself with regards to knowledge of speech recognition, when we deal with support incidents from your company.
We in Support, from time to time, come across folks like you who think they just know better and have a too shallow mind to be aware of how things around them actually work. Here is a good opportunity for me to explain to people like you how it works since I wouldn't want to do that in a normal professional engagement for obvious reasons.
It is entirely conceivable that you know *a particular aspect* of a product better than a given person working in Nuance Support. And that shouldn't come as a surprise. You are probably a dude working on a speech reco project spending your entire working day, five days a week, on a specific area of a speech application and focusing on a particular problem. Whereas a tech support engineer needs to support literally dozens of products (not taking into account the different versions). We can't possibly know every single aspect of every product like the back of our hand. On every single day, we have to juggle a handful of different incidents and satisfy as many individual customers. Often, we can't even afford to spend more than half a day on the troubleshooting of an issue. I would really like to see you in our shoes. We don't have the luxury of having entire days and weeks for testing the ins and outs of a product. And this should not be the purpose of our function either. Our job is to facilitate the resolution of a problem; we try to understand the problem and troubleshoot it; we are not a walking encyclopedia of speech reco or of related technologies (MRCP, VoiceXML, SSML, NLSML and all such standards; though we need to have a working knowledge of those), nor are we the human form of the product documentation. That's why we have engineers, developers and scientists helping us behind the scenes if digging really deep is needed.
Speaking of which, I would be curious to see how your developer's knowledge of speech reco stacks up against Nuance software engineers and R&D scientists. They are the ones whose knowledge of the specifics and the cutting-edge is representative of Nuance's expertise in speech reco. They are the ones you should look at when judging how much innovation Nuance could deliver.
At any time, at any point in the menu, say "Operator" or hit 0. You will hear:
"Operator"
"I'm sorry, I didn't understand that."
"Operator"
"It sounds like you want to talk to an operator, is that correct?"
"Operator"
"Connecting..."
That I know that exact script should be some indication how much time I've spent on that line over the past few years...
Even if you do go through the menu and enter in your info, it'll eventually send you to an operator, who will ask you to repeat it all anyway.
start command
format c colon slash autotest
know
It would be so great if you could run a movie file through speech recognition and get subtitles. Troops of sub scene monkeys are still spending hours transcribing and synchronizing. It's so medieval. And English subtitles are usually the last ones to be released, if ever.
Sometimes I get the impression that "good enough" speech recognition exists, it's just not publicly available. Of course the military are using it (Echelon), but also companies: Some captions of Google keynote videos have very strange errors that don't look human (misheard words that are totally out of context). In other words, it looks like a computer did the captions and no human corrected them, and the overall error rate is impressively small.
I wouldn't trust any computer to translate them, though-- we all know what comes out of that. A real babelfish is still decades away.
Not so would a secretary missing one word out of every twenty get fired.
If the recipient was Kilvichan and a letter was sent to Kivlichan because it was felt by her boss that this was the correct spelling, she would keep her job. But the company would lose a customer.
In fact some Scottish secretary was sacked for correctly spelling Urquhart as Urquhart!
"speech recognition software can keep up with someone speaking at 160 wpm."
Well...is that really true?
As I type, I edit. I finish a sentence, and I edit. WITH EDITING TIME, I'm around 30-40 wpm (I probably type 2x-2.25x as fast).
Now, speechrec is 'keeping up with' someone at 160 wpm. Then, that person has to either
a) stop, comma, go back, del del del (or whatever the 'word commands' are for editing), or
b) manually re-read and edit the entire paper.
Personally, when I've used recent speechrec software, I find that the ACTUAL, real-world rate is well below 10 wpm to create a 'final draft'.
Don't get me wrong, I think speechrec is *great* for getting words & thoughts down on paper quickly, without interrupting a stream of consciousness musing. But as a tool for generating "final" quality text? Um...not yet.
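The gap between raw and real-world speed can be sketched with a simple model: every word costs its dictation (or typing) time, and every wrong word adds a fixed correction cost. The per-fix times below are guesses for illustration, not measurements, and the model ignores thinking time entirely.

```python
def effective_wpm(raw_wpm, accuracy, seconds_per_fix):
    """Words per minute once correction time is included.

    raw_wpm         -- speed before any corrections
    accuracy        -- fraction of words that come out right (0..1)
    seconds_per_fix -- assumed average time to notice and repair one bad word
    """
    minutes_per_word = 1.0 / raw_wpm + (1.0 - accuracy) * seconds_per_fix / 60.0
    return 1.0 / minutes_per_word

# Figures from the story summary, with assumed correction costs:
# a misrecognized word is slow to spot and fix, a typo is quick.
print(round(effective_wpm(160, 0.95, seconds_per_fix=15), 1))  # dictation
print(round(effective_wpm(60, 0.93, seconds_per_fix=2), 1))    # typing
```

With these particular guesses the two end up close together, and a longer correction time per dictation error pushes dictation below typing. The result depends almost entirely on the correction cost, which is the number the "160 wpm" claim leaves out.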
-Styopa
making a computer understand Bubba Sixpack who can't type
Dude, your clichés are like so Eisenhower Administration.
This is Bush 43; get with the times.
Yeh wanna salsa weedat, señor?
english is a damn crap
... Format C colon enter.
Have gnu, will travel.
This tech would be handy and I'd probably use it for many things. However, it won't ever easily replace the keyboard for all of my tasks. If I'm sitting in a cube or some other public place, the anonymity a keyboard provides is invaluable. It's not like typing is totally secure in a strict sense, but if I'm chatting with my wife about what I need to bring home after work, I don't need the world hearing me ask about what kind of toilet paper, feminine pads, gossip rag, or porno to pick up.
Lets look at usciiiiii Universal Natural Speech & Logic solution, responding and operational in any personal voiceprint, any language and by natural logic syntexting of syntax - so that machines can sing and think - and not only fail to read or listen to us. To get rid of the keyboard and mouse, and have an Echo-Logic Machine, find out from the inventor's website @ http://www.islandnet.com/~surfins/TestSpace/usciii iii_.html
ASCII can not deliver Automatic Natural Language Comprehension in any natural language, so all must be upgraded to my proposed Operating FONTS for EchoLogical Machines - if you are waiting for the personal wizened secretary you can automate to run your business and its nomadic manufacturing plants, robots or systems.
inventor@prepatent.org
I was one of the founders so I have a biased opinion. iListen uses the Philips speech engine, not the Dragon or IBM engine.
What happens when you genuinely want to say, "I was in a Hotel in Lima with a beautiful view of the Delta dancing Tango with a guy named Oscar"?
So just what system did you use to dictate this post? Whatever it was, it sucks!
The trigram matching code in existing dictation products can already distinguish "see" from "sea" from "C". So you'll probably need two or three consecutive NATO letters to trigger NATO mode.
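The "two or three consecutive letters" trigger can be sketched as a run-length rule. This is a simple heuristic standing in for the idea, not the trigram language model that dictation products actually use; `decode_nato` and its `min_run` parameter are hypothetical names.

```python
NATO_WORDS = [
    "alpha", "bravo", "charlie", "delta", "echo", "foxtrot", "golf",
    "hotel", "india", "juliet", "kilo", "lima", "mike", "november",
    "oscar", "papa", "quebec", "romeo", "sierra", "tango", "uniform",
    "victor", "whiskey", "xray", "yankee", "zulu",
]
NATO = {word: word[0].upper() for word in NATO_WORDS}

def decode_nato(words, min_run=3):
    """Collapse runs of at least min_run NATO words into spelled letters.

    Shorter runs are left as ordinary words, so an isolated "Lima" or
    "Oscar" in normal prose is not turned into a letter.
    """
    out, run = [], []

    def flush():
        if len(run) >= min_run:
            out.append("".join(NATO[w.lower()] for w in run))
        else:
            out.extend(run)
        run.clear()

    for word in words:
        if word.lower() in NATO:
            run.append(word)
        else:
            flush()
            out.append(word)
    flush()
    return out

sentence = ("I was in a Hotel in Lima with a beautiful view of the Delta "
            "dancing Tango with a guy named Oscar").split()
print(" ".join(decode_nato(sentence)))
print(" ".join(decode_nato("the code is alpha bravo charlie".split())))
```

The Lima sentence passes through unchanged because no two NATO words are adjacent, while the second input has a run of three and gets spelled out.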
I'm using the Nuance voice recorder on my PDA to record dictation, and I've been training Dragon Naturally Speaking 9 to recognise my voice and convert it into text. When I get home I upload the voice files into the desktop computer and it crunches away for an hour running the DNS language recognition engine to turn my speech into text.
I have used older versions of DNS in the past and the current version is a massive improvement. Basically, if you decide that it is worth spending dozens of hours training the software in order to get reasonably accurate transcriptions then I recommend the product. Make sure to always use the same microphone/headset when recording on your PDA and you'll get great results.
If your need for accuracy is high, or you have alternatives to recording dictation, then DNS is still probably not for you. Also note that it is still a very frustrating experience to train DNS for non-American accents. There are at least a few reasonably common words that seem to be simply untrainable using the Australian language model, for example.
-P
Be my friend.
sorry your wisdom is so poor, you must be a sucker but we do not suck, shame on your stupidity in public also!
inventor@prepatent.org
Do some research and you'll probably learn that software-based speech dictation has been used by medical staff and law professionals for years now. If it's not ready, why do you think hundreds of thousands of people bought Dragon NaturallySpeaking? You think all of them got the software just for its "geekiness" or "cool" factor?
I have not seen any "insightful" comment mentioning speech recognition or dictation for the mobile world. That's where the Big Bang for the technology will come from. Unless you're a mobile device user too hard-wired to thumbtyping, speech is the most efficient way for entering information in most situations.
The secret key word is "REPRESENTATIVE".
At least, I have usually found that that is what works. Just keep saying it once, loud and slow and emphatic, in answer to every question the droid throws at you.
With all the AI's I have talked to, they eventually get the picture.
About 6 years ago, I interviewed a young woman for a video I was making who had a central nervous system disease that caused her enormous pain with fine-motor movements. Basically, typing for her was like having her hands attacked by bees. To complete her degree, she had to use speech recognition software. She did her degree in the late 90s! Speech recognition was obviously "good enough" for her 10 years ago. Of course, if you're talking about using SR as a form of voice compression algorithm, turning voice into text, transmitting it and reconstructing it as "voice" at the other end, no, it's not "good enough" yet. It's all relative. I'm sure, faced with the young lady in question's dilemma, you'd probably choose SR technology, even at gen 1 level, rather than be unable to use /.
"I hope you like Guinness, Sir. I find it a refreshing substitute for, er... food." Col. Jack O'Neil, SG-1