Is Speech Recognition Finally 'Good Enough'?

Posted by ryuzaki0 on Friday May 18, 2007 @09:07AM from the why-do-salty-snacks-keep-coming-up-freedom-fries dept.

jcatcw writes "Speech recognition software is fast, but it still may not be accurate enough. Clerical jobs usually ask for 40 wpm, but speech recognition software can keep up with someone speaking at 160 wpm. In Lamont Wood's demo it did very well at too/two/to and which/witch, but will it still render 'I really admire your analysis' as 'I really admire urinalysis'? At 95% accuracy, people aren't jumping on the bandwagon. Wood's typing speed is about 60 wpm with 93% accuracy, so he found that using speech recognition was about twice as fast as typing. Those who type at hunt-and-peck speeds will experience results that are even more dramatic. There's really only one product on the US market: Dragon NaturallySpeaking from Nuance Communications. The free versions from Microsoft aren't up to the task and IBM sold ViaVoice to Nuance, where it's treated as an entry-level product."
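
One common way to put a number on recognition accuracy is word error rate (WER): the word-level edit distance between what was said and what was transcribed, divided by the number of words spoken. The summary does not say how its "95% accuracy" figure was measured, so the sketch below is only an illustration of the general idea, and the function name `word_error_rate` is made up for this example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word dropped (deletion)
                curr[j - 1] + 1,     # extra hypothesis word (insertion)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


wer = word_error_rate("I really admire your analysis",
                      "I really admire urinalysis")
print(wer)  # 0.4
```

On the summary's own example, one substitution plus one dropped word out of five gives a WER of 0.4 for that sentence, which shows how a single misheard phrase can cost two words at once.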

We use it. by Organic+Brain+Damage · 2007-05-18 09:21 · Score: 2, Interesting

For command control of a system where we need both hands free. It's pretty good, much better than stopping and typing, clicking or pressing buttons during a repetitive manual process.

We're using an older version of Microsoft's product and it seems the microphone quality is important.
Re:Good enough for what? by QRDeNameland · 2007-05-18 09:45 · Score: 2, Interesting

Excellent points. One only need consider how much computer usage is done in cubicle farms, and then picture everyone chattering "Scratch that!" at their workstation, and the utility of speech recognition as a primary form of input becomes very limited regardless of its accuracy. I have a copy of Dragon, and its accuracy is really quite impressive, but past the novelty I have almost never used it. Other than the fact that it requires virtual silence (aside from your voice) to operate, unless I already know *exactly* what I want to say, it is easier for me to compose text by keyboard and construct my wording as I go along. The only time I could see it being of much use is for dictating a handwritten or badly printed document where OCR wouldn't work.

--
Momentarily, the need for the construction of new light will no longer exist.
sierra lima alpha sierra hotel delta oscar tango by tepples · 2007-05-18 10:15 · Score: 2, Interesting

Any speech recognition software worth the $ should be able to detect and translate NATO letter names: "hotel tango tango papá colon slash slash sierra leema alpha sierra hotel delta oscar tango dot org".
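
A rough sketch of what that post-processing step might look like, assuming the recognizer has already produced a plain string of lowercase-able words. The names `NATO`, `ALIASES`, `SYMBOLS` and `decode_spelled` are invented for this example, and the alias table only covers the pronunciation-style spellings used above plus a few common variants.

```python
NATO = {
    "alpha": "a", "bravo": "b", "charlie": "c", "delta": "d", "echo": "e",
    "foxtrot": "f", "golf": "g", "hotel": "h", "india": "i", "juliett": "j",
    "kilo": "k", "lima": "l", "mike": "m", "november": "n", "oscar": "o",
    "papa": "p", "quebec": "q", "romeo": "r", "sierra": "s", "tango": "t",
    "uniform": "u", "victor": "v", "whiskey": "w", "xray": "x",
    "yankee": "y", "zulu": "z",
}

# Alternate or pronunciation-style spellings mapped to the keys above.
ALIASES = {"alfa": "alpha", "juliet": "juliett", "leema": "lima",
           "papá": "papa", "x-ray": "xray"}

# Spoken punctuation needed for a URL.
SYMBOLS = {"colon": ":", "slash": "/", "dot": "."}


def decode_spelled(utterance: str) -> str:
    """Turn spelled-out letter names and symbols into a compact string."""
    out = []
    for token in utterance.lower().split():
        token = ALIASES.get(token, token)
        if token in NATO:
            out.append(NATO[token])
        elif token in SYMBOLS:
            out.append(SYMBOLS[token])
        else:
            out.append(token)  # pass ordinary words through, e.g. "org"
    return "".join(out)


print(decode_spelled(
    "hotel tango tango papá colon slash slash sierra leema alpha "
    "sierra hotel delta oscar tango dot org"
))  # http://slashdot.org
```

The lookup is the easy part; the hard part, which this sketch skips entirely, is getting the recognizer to hear "sierra" rather than "see era" in the first place.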
great prevention for repetitive stress injuries by brettbum · 2007-05-18 10:17 · Score: 2, Interesting

I'm using Dragon NaturallySpeaking 9 right now. I've been using it for several months, and I have written a dozen articles on it. I think it works fantastic, but you definitely have to learn how to write all over again. Out of the box it trains extremely quickly, if you do not want to train it at all you can just start talking and it will eventually catch up with you. (Note it caught catch up and not ketchup) I started using it as a preventative means of avoiding repetitive stress injuries. I cannot use it to code, however I can definitely use it for my writing. Using Dragon NaturallySpeaking, I can easily push out five to 15,000 words a day. (notice it used the word five and then a number) Ultimately it provides you very accurate writing. It's almost impossible to have a spelling error, however word substitution errors are still very common. If you attempt to compare your typing accuracy versus your dictation accuracy, you will often see spelling errors in the typing and word substitution errors in the dictation. That means that when you go back and edit your own work you have to spend a good deal more time editing because you're not used to editing the type of dictation errors that you make because you have years of experience editing the normal types of spelling errors that you made. You also have to learn how to compose sentences by speaking as opposed to composing to your fingertips. This definitely exercises a different area of your brain and I'm sure you will find that you are not as good of a writer when you speak as you all are when you type. However with practice you can get up to speed dictating and you will then definitely benefit from the ability to type at 150 words a minute without breaking a sweat, stressing out your wrists, or even suffering from eyestrain. 
Dragon NaturallySpeaking definitely helps people to avoid eyestrain because you don't have to stay focused on the computer monitor while you're typing you can look around the room, or outside or anywhere. Touch diapers (s/b touch typers!) can do this also however good ergonomics dictates that you sit in positions that align your body correctly to avoid repetitive stress injuries and this includes pointing your face for words (forwards!) towards the computer screen. With Dragon NaturallySpeaking I can face in any direction I like in the program will keep up. Downside it does substitute words and on occasion it skips words entirely. I run at least a gigabyte of RAM in my computer and I was would suggest double that amount. Dragon NaturallySpeaking is a bit of a resource hog, however it's worth it and it's not as bad as Firefox. I should have purchased it years ago and definitely do not regret the purchase nor my new attempts to learn how to write all over again. I had to learn to write with pencil and paper, and then with pen and paper and then with a manual typewriter and then with an electric typewriter and then with my trs 80 and then a laptop and my treo and yada yada yada I can sure learn to do it with my voice.
Re:Mod parent up! by arbitraryaardvark · 2007-05-18 10:57 · Score: 2, Interesting

* when you need to enter hand-written documents into a computer
* for transcripts of a single speaker
* informal free-thought when not surrounded by other people
* when you have horrible typing skills

You had me at "* when you have horrible typing skills".

Parent post mentions their 4 year old making pancakes.
At some point, most likely, you expect the kid is going to grow up and get better at making pancakes. There will be a learning curve. Maybe 4 is too young; I haven't met the kid. But part of the point of teaching a kid to make pancakes is to get the learning curve out of the way, so they can get better at it on their own time, preferably before they are 30.
My crude analogy is that a naturally speaking soft dragon is a bit like a 4 year old pancake maker. It can be worthwhile to get used to an imperfect tool now, so that you'll have the learning curve out of the way as the tool gets better over time.
Or it can be better to wait another year. Your mileage may vary.

Here's another potential application: Get the dragon for your kid. It may be useful as she or he learns to read and write.

I for one welcome our new naturally speaking dragon overlords.

I want the throat mike module, so that it types what I'm subvocalizing.

I'm hearing a business model here:
1. form a corp to offer voice to text software
2. wave hands
3. sell out to Nuance
4. ......
Medical transcription by grogo · 2007-05-18 11:17 · Score: 2, Interesting

Where I work, we use PowerScribe, a Dragon-based medical transcription service. The following was dictated using it:
"I am using PowerScribe, which is a radiology speech dictation system. It is fairly accurate in the doming [domain] of medical transcription, and particularly in the doming of radiology, but it not very useful for free pexed [text] speech.
For example, there [here] is a sample of the typical chest report: Hazy groundglass opacities noted with both lungs, particularly the right middle lobe as well as the left lower lobe, with no evidence of effusion, pneumothorax, or consolidation. [this is pretty much verbatim what I said].
[But here's a free text example:] However, if a Type II right a regular letter to a friend, [if I try to type a regular...] for example setting the following, [for example, saying the following...] Yesterday was a very nice state [day]. The clots [clouds are] gone, and only a little brain [rain] remains. Today it is supposed to be even warmer outside, I think elbow [I'll go] injected [and check] with the right knob. [the weather right now]"
The biggest problem with this system, particularly for medical transcription purposes, is that it only gets about 95-97% right. That means, it's wrong at least 3% of the time. Worse yet, whenever it's not sure, it just inserts random garbage! Whatever the closest match is, which is often wrong, and sometimes fundamentally changes the meaning of what I intended.
Human transcriptionists, on the other hand, will insert a blank if they're not sure, to alert the dictating physician. This fscking system has no clue when it's wrong, which makes it very dangerous in my opinion!
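
To illustrate the difference, here is a minimal sketch of the "insert a blank when unsure" behavior, assuming the engine exposes a per-word confidence score between 0 and 1. Whether PowerScribe or Dragon actually exposes such scores is not something the comment says, and the function name `blank_uncertain`, the threshold, and the confidence values below are all invented for illustration.

```python
from typing import List, Tuple


def blank_uncertain(words: List[Tuple[str, float]],
                    threshold: float = 0.80,
                    blank: str = "____") -> str:
    """Replace any word whose confidence is below the threshold with a blank."""
    return " ".join(word if conf >= threshold else blank
                    for word, conf in words)


# Hypothetical recognizer output: (word, confidence) pairs.
recognized = [("only", 0.97), ("a", 0.95), ("little", 0.93),
              ("brain", 0.41), ("remains", 0.90)]

print(blank_uncertain(recognized))  # only a little ____ remains
```

A visible blank at least tells the dictating physician where to look, which is the property the human transcriptionists provide and the closest-match behavior does not.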
Re:This comment written by MS speech recognition by R3d+M3rcury · 2007-05-18 11:46 · Score: 2, Interesting

Actually, I remember working with Apple's years ago. We had a project where, ideally, people could send voice commands to a Mac and get it to pull entries out of a database and read them to you. A "What is my outstanding balance?" sort of thing.

It was really entertaining, but I fell into what I call "The Missing Remote" syndrome: If you've ever lost your remote, you will spend 10 minutes looking for it so you can turn off the TV and go to bed, rather than get up and walk over to the TV and turn it off. I think I must have spent 5 minutes saying "Close Window" in various different ways and speeds rather than just click on the damn close box.

Of course, what I really miss in Apple's speech recognition are the avatars...
Patents by Anonymous Coward · 2007-05-18 12:30 · Score: 1, Interesting

This is actually a really good case study on patents, and how one company can purchase a couple of patents and block out commercial market competition. Dragon/Nuance/.... has such a wonderful monopoly, and has noticeably scaled back their release cycle since they achieved that monopoly (Dragon 7 to Dragon 8). Somehow I lack the faith to presume they've been working hard and are about to release anything world-changing.
Speech Recognition is more than dictation by Pedrito · 2007-05-18 12:46 · Score: 2, Interesting

Speech recognition generally comes in two flavors: Command and Dictation. Most voice recognition engines can handle either, but the implementations are very different. Command mode is handled by providing a list of "command" words that are valid at any given point and operates much like a state machine. Dictation is a completely different beast and does a variety of things under the hood to increase accuracy.
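
A toy version of the command-mode idea, to make the state-machine description concrete. This is not how any particular engine implements it; the grammar, state names, action names, and the `CommandRecognizer` class are all hypothetical, and the "recognition" here is just an exact string match standing in for the acoustic side.

```python
# state -> {spoken phrase -> (action name, next state)}
GRAMMAR = {
    "idle": {
        "open browser": ("launch_browser", "browser"),
    },
    "browser": {
        "go to address": ("focus_address_bar", "browser"),
        "close window": ("close_window", "idle"),
    },
}


class CommandRecognizer:
    """Accepts only the phrases that are valid in the current state."""

    def __init__(self, grammar, start="idle"):
        self.grammar = grammar
        self.state = start

    def valid_phrases(self):
        # The short list the engine has to choose from right now.
        return sorted(self.grammar[self.state])

    def hear(self, phrase):
        """Return the action for a phrase, or None if it is not valid here."""
        entry = self.grammar[self.state].get(phrase.lower().strip())
        if entry is None:
            return None
        action, self.state = entry
        return action


recognizer = CommandRecognizer(GRAMMAR)
print(recognizer.hear("close window"))   # None: not valid while idle
print(recognizer.hear("Open Browser"))   # launch_browser
print(recognizer.valid_phrases())        # ['close window', 'go to address']
```

Because only a handful of phrases are legal at any moment, the engine has far fewer candidates to confuse with each other than it does in open dictation.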

"Good enough" is very vague as applied to voice recognition. For command stuff, "good enough" has been here for about 7+ years. Even MS's free engine does a great job at that.

I used ViaVoice years ago and it worked pretty well. But here's the thing: Have you ever tried to dictate something? It's definitely a skill. I'm sure some people have a natural ability for it, but I certainly didn't. I tried dictating stuff and it's tough. You hit a pause mid-sentence trying to figure out how you want to phrase something and suddenly there's a period and you're beginning a new sentence. Try dictating several sentences of original material and keeping it going without pauses and "um"s and so forth and you'll see, it's not quite as easy as it seems. I suspect one of the reasons voice recognition hasn't been a hit is that people don't expect that. They try it for a few days, think, "Hell, it's easier just to type," and give up. That's why I don't use it for writing. I can type faster and more accurately than I can dictate. I'm sure if it's something I wanted to work on, I could develop the skill, but my point is, I think that's probably why a lot of people give up on it.

I honestly think that voice recognition in command mode could be really useful at speeding things up, if software were designed to take advantage of it. But it's not easy to add it as an afterthought and it adds significant work, even if it's done with forethought. It's a chicken and the egg thing. If a lot of software supported it, I think people would see a gain in productivity using whatever software they use daily. I don't mean just using voice recognition, but in combination with a mouse and keyboard. For example: "Execute Browser. google dot com. flying burrito brothers. google search". Saying that would be a pretty fast way of opening your web browser, typing "google.com" and then typing "flying burrito brothers" and then clicking the "Google Search" button. Replace saying "Google Search" with hitting the enter key and it's even faster.

But as I said, it's a chicken and the egg thing. Software doesn't support it because there's no demand and there's no demand because people haven't really experienced software that supports it.

Another issue (and I'm sure this has been mentioned by others), is background noise. I like to listen to music or watch TV while I work. Those don't mix well with voice recognition, at least not at the volumes I listen to them. Until voice recognition can get around that and recognize my voice amidst background noise and do it accurately AND software out there generally supports it, it's not going to go mainstream.
Until we get hard ai along with it no. by otomo_1001 · 2007-05-18 13:16 · Score: 3, Interesting

I mean really, until I can say to my computer things like:

Find all MP3s that were created by Trent Reznor and pipe them to /dev/audio on the neighbor's computer. What use will it be?

I can't program in it can I?

if(i_can_write_code_I_mean_speak_code_to_the_computer() == true) then
i_might_use_it_a_bit();
else
system("find /music -type f -name \"*trent*reznor*\" | xargs -t cat - | ssh hackeduser@neighborcomputer \"cat - > /dev/audio\"");
endif

But that is just me.
Re:Hmmm.... by Helios1182 · 2007-05-18 13:36 · Score: 3, Interesting

There is a lot of work on word prediction and language modeling in natural language processing and computational linguistics research. 95% accuracy is considered very good, though. There are ways to help, but some of the most effective require restricting the language recognized. N-gram-based language models provide a good statistical framework, but they are very data hungry: you need lots and lots of relevant (this is the hard part) text. The model needs to be based on the language the user actually uses in order to be effective.
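
A minimal sketch of the n-gram idea with n = 2, assuming add-one smoothing and a tiny made-up "weather" corpus. The class name `BigramModel`, the corpus sentences, and the two candidate transcriptions are all invented for illustration; a real recognizer would combine a score like this with acoustic evidence and would train on vastly more text.

```python
import math
from collections import Counter


class BigramModel:
    """Bigram language model with add-one (Laplace) smoothing."""

    def __init__(self, sentences):
        self.history = Counter()   # how often each word precedes another
        self.bigrams = Counter()   # how often each word pair occurs
        self.vocab = set()
        for sentence in sentences:
            tokens = ["<s>"] + sentence.lower().split()
            self.vocab.update(tokens)
            self.history.update(tokens[:-1])
            self.bigrams.update(zip(tokens, tokens[1:]))

    def prob(self, prev, word):
        # Smoothed estimate of P(word | prev); unseen pairs get a small,
        # nonzero probability instead of zero.
        return ((self.bigrams[(prev, word)] + 1)
                / (self.history[prev] + len(self.vocab)))

    def log_score(self, sentence):
        tokens = ["<s>"] + sentence.lower().split()
        return sum(math.log(self.prob(prev, word))
                   for prev, word in zip(tokens, tokens[1:]))


# Tiny in-domain corpus (hypothetical): text about the weather.
weather_corpus = [
    "a little rain fell today",
    "the rain remains light",
    "only a little rain is expected",
]
model = BigramModel(weather_corpus)

candidates = ["only a little rain remains", "only a little brain remains"]
best = max(candidates, key=model.log_score)
print(best)  # only a little rain remains
```

Trained on weather text, the model prefers "rain"; trained only on radiology reports, the counts would point the other way, which is the "relevant text" problem in miniature. Note that comparing raw scores like this is only fair between candidates with the same number of words.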
Type Faster? by cbeley · 2007-05-18 14:41 · Score: 2, Interesting

Do we really need this? All this is good for is people who can't type 100 wpm with reasonable accuracy. I don't think I could speak (at a normal speed) any faster than I can type. Plus, I only think so fast. So... everyone should learn to type at 100 wpm and the problem is solved. Also, who wants to hear a bunch of chattering at the library from people "typing" on the computers, versus the very loud, obnoxious 100 wpm typing sounds that make the people typing at 40 wpm drop their jaws?