Is Speech Recognition Finally 'Good Enough'?
jcatcw writes "Speech recognition software is fast, but it still may not be accurate enough. Clerical jobs usually ask for 40 wpm, but speech recognition software can keep up with someone speaking at 160 wpm. In Lamont Wood's demo it did very well at too/two/to and which/witch, but will it still render 'I really admire your analysis' as 'I really admire urinalysis'? At 95% accuracy, people aren't jumping on the bandwagon. Wood's typing speed is about 60 wpm with 93% accuracy, so he found that using speech recognition was about twice as fast as typing. Those who type at hunt-and-peck speeds will see even more dramatic results. There's really only one product on the US market: Dragon NaturallySpeaking from Nuance Communications. The free versions from Microsoft aren't up to the task, and IBM sold ViaVoice to Nuance, where it's treated as an entry-level product."
As a foreigner, it is really hard to get the pronunciation right enough.
Also, commands spoken by other people in the room are a problem.
How about listening to music or TV and having the computer interpret it?
If you mod this up, your slashdot background will turn into a beautiful sunset!
For typing up an inter-office memo in Word, most likely. But I'm a programmer, and I can barely read some perfectly fine code out loud; I can't imagine trying to enter it all with voice recognition, no matter how good it gets.
Curiosity was framed, Ignorance killed the cat.
TFA mentions that many people stop using speech recognition software because of poor accuracy. I don't think that's the major reason. I think they start using it because it's a neat idea that seems to have a lot of promise, but quickly realize there are only a few situations where it's actually helpful. The end of the article mentions rough drafts; I'd also say it might be a decent choice
For the majority of office tasks, it just isn't a good fit.
So if the "good enough" is being useful in any way whatsoever, it sounds like we're almost there.
To be fair, that's a problem with the IVR coder, not the voice recognition engine.
-Rick
"Most people in the U.S. wouldn't know they live in a tyrannical state if it walked up and grabbed their junk." - MyFirs
Instead of asking if speech recognition is "good enough", maybe we should be asking whether or not it's actually useful for anything in the first place. I mean, is it good enough... to do what?
Can you imagine being in a cubicle farm full of people talking to their computers? Or trying to talk to your computer on the bus? You have to imagine that as computers become more ubiquitous, input methods will have to adjust alongside, and I simply can't see (or hear) speech recognition doing that very well.
What is is all that is. Isn't that obvious?
I'll be honest with you, Vista is way better at coming up with hilarious new Madlibs than you are.
Seriously, the only things speech recognition is good for are bulk text entry and simple navigation. I imagine trying to use voice commands to operate modern software would be like letting my four-year-old help make pancakes: yes, it gets done, but it's so much easier and faster to just do it yourself. Imagine trying to edit a document using only voice commands. Is your word processor going to be smart enough that you can tell it, "Find all occurrences of 'scum-sucking bottom feeders' and replace them with 'esteemed colleagues'"? Or are you going to have to say, "Find. Scum hyphen sucking bottom feeders. Tab. Esteemed colleagues. Replace all."? Face it, GUIs have rendered speech recognition for command and navigation moot. Most operations you perform don't have a verbal description, or at least not one that is quicker to say than to do.
I also can't imagine it'd be that useful for actually writing things. I don't think I'm the only one who revises as they write. I think I actually write better by hand, because it's slower, so I tend to think my phrasing and sentence structure through before I commit anything to paper. If I could suddenly type two or three times faster, it would probably make my text even more incomprehensible than it usually is...
Just junk food for thought...
Dragon is no more... and hasn't been for a long time.
NaturallySpeaking has been sold a few times to various companies.
(I keep track because I worked on V1.0)
5% could be the difference between "The report confirmed that Iraq has WMDs" and "The report confirmed that Iraq had WMDs." It could be the difference between "Tell Mrs. Smith to take 20mg of neurontin" and "Tell Mrs. Smidt to take 20mg of neurontin." It could be the difference between "The magnet should not be exposed to a field greater than fifteen teslas" and "The magnet should not be exposed to a field greater than fifty teslas." And on, and on.
Small wording changes can make a big difference -- generally much bigger than typos, which I can assure you happen far less often than 5%. Additionally, typos are generally recognizable as the intended word, and often aren't even noticed by the reader.
"'If one must live then one must die.' - oh, the truth must be funnier than this..." -- MammÃt
I wonder exactly what 95% means. Does it mean one character out of every 20 is wrong? One word out of every 20? One sentence out of every 20? I average about one to two errors per page, so all of these sound horrendous to me. Even typing with my eyes closed I get higher accuracy than that (I do this sometimes when my eyes are tired, but generally avoid it because I keep worrying that my fingers have drifted one key over and I'm typing complete nonsense).
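One common way speech recognition accuracy is measured is word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn the recognizer's output into the reference text, divided by the number of reference words. Under that reading, "95% accuracy" would roughly mean a WER of 5%, i.e. about one word in 20 wrong. Vendors do not necessarily define their figures this way, so treat this as an assumption. The sketch below computes WER as a word-level Levenshtein distance; the function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference words.

    Computed as the word-level Levenshtein (edit) distance between the two texts.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference text must contain at least one word")

    # dp[i][j] = minimum edits turning the first i reference words
    # into the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution (or match if cost == 0)
            )
    return dp[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    # "your analysis" -> "urinalysis" is one deletion plus one substitution
    print(word_error_rate("I really admire your analysis",
                          "I really admire urinalysis"))  # 0.4
```

Note how a single misheard phrase costs two errors out of five words here, which is why a short sentence can blow well past a 5% average.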
I am TheRaven on Soylent News
95% sounds good if you're not comparing it to a person. But 5% error rate is horrendous for business use. A secretary who missed one word out of every 20 would be fired after a few hours. A couple decades ago, when I temped for office work, I could transcribe about 80 wpm with close to 100% accuracy, and I was nowhere near the fastest.
If you got a letter from a business containing a typo on almost every line, would you do business with them?
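To put rough numbers on the "typo on almost every line" question: if each word is independently wrong with probability 1 - accuracy, the chance that a line contains at least one error is 1 - accuracy^n for n words per line. The sketch below assumes 12 words per line and 250 words per page; both figures, and the independence assumption, are illustrative guesses rather than anything stated in the thread.

```python
def p_line_has_error(word_accuracy: float, words_per_line: int) -> float:
    """Probability that a line has at least one wrong word,
    assuming each word is wrong independently (an illustrative simplification)."""
    return 1 - word_accuracy ** words_per_line


def expected_errors(word_accuracy: float, n_words: int) -> float:
    """Expected number of wrong words in a passage of n_words."""
    return (1 - word_accuracy) * n_words


if __name__ == "__main__":
    # Assumed layout: 12 words per line, 250 words per page (hypothetical figures)
    print(f"lines with an error: {p_line_has_error(0.95, 12):.0%}")
    print(f"errors per page: {expected_errors(0.95, 250):.1f}")
```

Under these assumptions, roughly every other line would carry a mistake, and a page would average about 12 to 13 errors, far above the "one to two errors per page" a careful typist reports.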
If the masses can keep you down, you're not the Ubermensch.
I'm using the Nuance voice recorder on my PDA to record dictation, and I've been training Dragon NaturallySpeaking 9 to recognise my voice and convert it into text. When I get home I upload the voice files to the desktop computer, and it crunches away for an hour running the DNS recognition engine to turn my speech into text.
I have used older versions of DNS in the past and the current version is a massive improvement. Basically, if you decide that it is worth spending dozens of hours training the software in order to get reasonably accurate transcriptions then I recommend the product. Make sure to always use the same microphone/headset when recording on your PDA and you'll get great results.
If your need for accuracy is high, or you have alternatives to recording dictation, then DNS is still probably not for you. Also note that it is still a very frustrating experience to train DNS for non-American accents. With the Australian language model, for example, there are at least a few reasonably common words that seem to be simply untrainable.
-P
Be my friend.