Voice Recognition for a Techie?

Posted by ryuzaki0 on Tuesday April 18, 2006 @11:50AM from the are-we-there-yet dept.

kaybee asks: "I am a long-time developer, sysadmin, and general computer junkie (for fun and for work) who needs to seriously curb the usage of his hands. I'm curious as to the current voice recognition options, preferably usable on Linux and Windows. I prefer the command-line to a GUI, I prefer Vim to anything else, and I still read my email with Pine. I'd like to hear options for sending email via voice, which I hope is easy, and I'd love to hear of any solutions that allow effective coding via voice, which seems much more difficult."

Write it yourself by Kawahee · 2006-04-18 11:59 · Score: 2, Informative

Write it yourself. Grab the Microsoft Speech SDK and WINE or some suitable interoperability layer and you should be good for Windows and Linux. The Microsoft Speech SDK doesn't require oodles of code to make it work, so you should be able to get a working sample under Windows in about half an hour. It comes with some rudimentary samples as well, and since it's not released under any particularly binding license you can just build your code around it.

'Course you could go the other way with some Open Source speech recognition and Cygwin or similar.

--
I'll subscribe to Slashdot when I see a month without a dupe, a typo, or an article the "editors" didn't read.
1. Re:Write it yourself by amliebsch · 2006-04-18 13:55 · Score: 2, Informative
  
  Parent is on the right track, for sure. Microsoft may be evil, but their speech API is truly easy to use. Also, if you are willing to use Windows, Windows XP Tablet Edition comes out of the box with relatively full-featured voice command and dictation capabilities built into the OS. With a little training, it can probably do most of what you require. I have in the past actually used it to dictate an entire paper. In fact, I used it to write this post. Once you get used to getting all the proper punctuation commands, it is possible to dictate at a fairly good rate of speed.
  
  --
  If you don't know where you are going, you will wind up somewhere else.
2. Re:Write it yourself by Anonymous Coward · 2006-04-18 20:49 · Score: 1, Informative
  
  Yes, there were two clauses separated by a comma which is perfectly good punctuation by normal standards of literacy.
wouldn't bother yet by joe+155 · 2006-04-18 12:02 · Score: 2, Informative

I've actually used some voice recognition over the years, and it's got a lot better than it used to be. The last time I used voice recognition software on my computer, though, even after I did loads of the training it just didn't seem to get it; eventually I had to give up... it wasn't cheap either. Whilst I think it has the potential to do a lot in the future, I'm just not sure that it's really at the stage where it can be considered a full-time replacement, especially for technical jobs.

--
*''I can't believe it's not a hyperlink.''
Save your hands -- while you can by Lars512 · 2006-04-18 12:36 · Score: 5, Informative

Seriously, if you're suffering hand or arm pain, you should think about the way you're doing things now. Speech recognition is unlikely to replace your current coding practices, although it might help with writing reports.

Instead, try using the keyboard break feature in GNOME. To start with, have it kick you off your computer every 30 mins for a 3 min break, and don't allow yourself to postpone breaks. Get some equivalent software for Windows too. Use your 3 min breaks to walk around and stretch. Within a week, you won't be a lot less productive, but your arms will feel a lot better. Then you can maybe up it to 40 mins. In the short term, a course of anti-inflams might help (ask your doctor).

Also, don't come home in the evening and play games on your computer, or do more work. Your arms probably can't take it. Equivalently, inform your employer of your condition and subsequent inability to work reckless overtime hours.

These two things should get you started for long-term sustainable maintenance of your arms.
Linux Adaptability by skwirlmaster · 2006-04-18 12:45 · Score: 3, Informative

It's been a while since I've had to look into speech recognition for Linux, but this link should help you get started: Linux Accessibility Resource Site

Read down to the section about speech recognition. I hope that helps.

--
My inner self is ineffable, so don't eff with me.
Shoot! by Bios_Hakr · 2006-04-18 12:47 · Score: 4, Informative

For gaming on WinXP, I use an app called Shoot!. While playing Falcon, I use it for fairly simple (press T, wait 5 seconds, press 1) macros. I was dicking around and decided to set up a profile for some simple stuff in Cygwin. If I say "list", the program returns "ls". "List all" will return "ls -a". "List all long" will return "ls -la".

You can, with some tweaking, even get it to understand complicated stuff. If I say "manual g r u b", I can get "man grub". "Vi save quit" could be mapped to ":wq" without too much trouble.

Anything you can type, it can do.

I don't think it works under Linux, and I don't know of anything like it for Linux. It does, however, work quite well inside PuTTY.
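
The phrase-to-keystroke mapping described here can be sketched in a few lines of Python. This is only an illustration of the idea, not how Shoot! is implemented: the `PHRASES` table and the `expand` function are hypothetical names, and the sketch assumes the recognizer hands over each utterance as plain text.

```python
# Hypothetical phrase table; the phrases and outputs are the ones quoted above.
PHRASES = {
    "list": "ls",
    "list all": "ls -a",
    "list all long": "ls -la",
    "vi save quit": ":wq",
}


def expand(utterance: str) -> str:
    """Translate one recognized utterance into the text to type."""
    words = utterance.lower().split()
    phrase = " ".join(words)
    if phrase in PHRASES:
        return PHRASES[phrase]
    # "manual g r u b" -> "man grub": spelled-out letters are joined together.
    spelled = words[1:]
    if words and words[0] == "manual" and spelled and all(len(w) == 1 for w in spelled):
        return "man " + "".join(spelled)
    raise KeyError(f"no mapping for {utterance!r}")
```

Calling `expand("list all long")` yields the text for the long listing, and anything outside the table is rejected rather than guessed at, which is roughly what makes a macro tool like this predictable.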

--
I'd rather you do it wrong, than for me to have to do it at all.
Re:Try using a GUI for email, etc... by dbIII · 2006-04-18 13:04 · Score: 2, Informative

Voice recognition is still hit-or-miss

It seems to work with Nintendogs on low end hardware (Nintendo DS with 4MB memory). I suspect the secret is having a limited number of things to match - for example a voice menu with limited options in each context that sound very different.
Re:Speaking WPM != Chars Per Minute by TheWanderingHermit · 2006-04-18 13:06 · Score: 2, Informative

We only have 26 single sounds we can make

No. Not true, even in English. For example, "c" does not make a sound distinctly different from all other characters. Some letters, such as "x" make sounds that can easily be made from a combination of other letters. Including pairings and such, linguists say that the English language includes something closer to 45 single sounds.

I used to teach Special Ed and saw software that could recognize entire words and use them in writing in a word processor. I have not used voice rec software in the 10 years since I left the field, but if some of the pioneering programs could do that 10 years ago, I don't see why programs now would recognize only individual letters and not words.

I have also heard about some writers that were using voice rec to do a lot of their writing as long ago as when we were using the software in sp.ed. (again, that was about 10 years ago).
Re:Speaking WPM != Chars Per Minute by Eideewt · 2006-04-18 13:07 · Score: 2, Informative

I think you may be confused. First of all, there are way more than 26 sounds in the English language. It's more like 49 individual consonants, vowels, and diphthongs, and many monosyllabic words can be constructed from those.

As far as I can tell, you're saying that words would need to be spelled out character by character so you'd have to talk really fast to be productive. Custom dictionaries would go a long way towards fixing that. The main issue would be whether a particular speech recognition solution integrates well with the shell and/or dev environment being used. It would be fairly simple, with any software, to get it to recognize shell commands like rm, cp, |, and grep when they were spoken as words. When coding, it would also be pretty simple to recognize common keywords and operators and output the proper text. I don't think there would be much trouble with speed until stuff like variable names began to come up. Even then, the big problem isn't that storing variable names for later spoken use would be hugely difficult to implement; it's just that (afaik) it hasn't been yet.
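
As a rough sketch of what such a custom dictionary could look like, the Python below maps spoken words to shell tokens and lets multi-word spoken names stand in for stored variable names. Everything here is an assumption for illustration: `SHELL_WORDS`, `transcribe`, and the spoken forms ("remove", "copy", "pipe") are made up, and a real recognizer would deliver far noisier input.

```python
# Hypothetical spoken-word dictionary for a shell context.
SHELL_WORDS = {"remove": "rm", "copy": "cp", "pipe": "|"}


def transcribe(utterance, dictionary):
    """Replace spoken phrases with their written form, longest phrase first."""
    words = utterance.lower().split()
    longest = max((len(key.split()) for key in dictionary), default=1)
    out, i = [], 0
    while i < len(words):
        # Try the longest possible phrase starting at position i, then shorter ones.
        for n in range(min(longest, len(words) - i), 0, -1):
            chunk = " ".join(words[i:i + n])
            if chunk in dictionary:
                out.append(dictionary[chunk])
                i += n
                break
        else:
            out.append(words[i])  # unknown words pass through unchanged
            i += 1
    return " ".join(out)


# Storing a variable name for later spoken use is just another dictionary entry.
coding_words = dict(SHELL_WORDS)
coding_words["line count"] = "line_count"
```

With these tables, `transcribe("grep main pipe sort", SHELL_WORDS)` produces a pipeline, and `transcribe("copy line count backup", coding_words)` turns the spoken name into the stored identifier.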

Assuming that most words could be recognized when spoken, you wouldn't need to speak at a higher WPM than you type at. Conversations happen at around 200 wpm (just over 3 words per second), according to Wikipedia, so speed wouldn't be much of an issue.

I think the biggest problem with speech as an input for techies is that the software itself has not yet been written. While there may be recognition software that can comprehend speech at normal speed and extend its dictionary as it runs, there's none that I know of that has been set up to function in a technical environment. It may be as simple as putting the pieces together, but it would probably require a lot of hacking on your own. The second biggest problem would be wearing out your voice, although that's something you can work with.
perlbox using sphinx by Danny+Rathjens · 2006-04-18 13:23 · Score: 4, Informative

The perlbox voice control app is kind of a stalled project, but it is a nifty front end for the open sphinx voice recognition engine.
http://perlbox.sourceforge.net/
http://cmusphinx.sourceforge.net/
Command and control is a lot easier to do with voice recognition since the dictionary the engine has to choose from is so much smaller. Having voice recognition engines understand arbitrary words well is still a bit difficult.
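
A toy illustration of why the smaller dictionary helps, using plain string similarity from Python's standard `difflib` as a stand-in for acoustic similarity (a real engine such as Sphinx scores audio against acoustic and language models, not spellings). The word lists and the `candidates` helper are made up for the example.

```python
import difflib


def candidates(heard, vocabulary, cutoff=0.6):
    """Return every vocabulary word that is plausibly what was 'heard'."""
    return difflib.get_close_matches(heard, vocabulary, n=len(vocabulary), cutoff=cutoff)


# A command-and-control vocabulary: few words, chosen to sound different.
COMMANDS = ["open", "close", "save", "quit"]

# The same commands buried in a larger, dictation-style vocabulary.
DICTATION = COMMANDS + ["opus", "upon", "opine", "spun", "oven", "omen"]

small = candidates("opun", COMMANDS)
large = candidates("opun", DICTATION)
```

With the command list, the garbled input leaves a single candidate; with the larger list, several words compete and the engine has to guess among them.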
VoiceCode by Stranger+Than+Fictio · 2006-04-18 16:04 · Score: 2, Informative

Don't get too discouraged by the large number of commenters who haven't used speech recognition or who don't understand why someone might need to lay off the keyboard for a while. I wrote 100k lines of C++ code hands-free for my astronomy thesis over the course of two years, using speech recognition software that is now about 10 years out of date. There have been significant improvements in both the speech recognition technology and tools for coding by voice since then.

For coding, take a look at the VoiceCode project: http://voicecode.iit.nrc.ca/VoiceCode/public/ywiki.cgi
For other tools/approaches to coding by voice, see also the VoiceCoder group at Yahoo Groups: http://groups.yahoo.com/group/VoiceCoder/

I don't know of any open-source or non-commercial dictation software which matches the accuracy and ease-of-use of Dragon NaturallySpeaking (fair warning - I work for Nuance, which makes Natspeak, though I was a user long before I became an employee). Natspeak is only available for MS Windows, but you can always put a Windows box on your desk and connect to a unix host via an X server (exceed, xwin32). That generally works well for command-line stuff, not so great for GUIs (but you say you prefer command-lines anyway).
1. Re:VoiceCode by lpq · 2006-04-18 18:01 · Score: 3, Informative
  
  I tried Dragon Naturally Speaking (costing ~$500 or $600 at the time) and gave up because of the training problems and low recognition rates. I then tried IBM's ViaVoice Professional, USB-Pro -- with digital signal processing in an included microphone and a digital connection to my computer. With a 1 paragraph training session, it was already over 95% accurate and improving over Draggin'. It was easier to train, and you could train it on the text you were typing -- i.e. it was able to learn from corrections and merge them back into your voice profile.
  
  Unfortunately, IBM released it in 2001-2002, then forgot about it. They've since moved on to their non-training voice recognition solutions for sale to businesses. Those seem to have advanced, but not in any retail product.
  
  Dragon has come out with updates, but from people who have used and trained on *both*, ViaVoice has higher accuracy (~1% difference). The ViaVoice product price has fallen, and Dragon has, of course, gone up....
  
  Whatever product you get, get a fast 2+CPU machine with lots of RAM - 2GB or more. The ViaVoice algorithm adapts to your talking speed -- it will perform more looks and comparisons and have greater accuracy as the processor speed goes up. ViaVoice stops comparing when it runs out of time (your speaking has gotten too far ahead). But it listens to the words, in context, to determine spelling. The more memory it has, the more vocabulary it can pull into memory. Note -- I am saying get a dual-cpu (or dual core) machine, the faster the better.
  
  Viavoice was also released on Linux, but without as much application support.
  
  For coding support in voice products -- there just hasn't been enough demand.
  
  But for "wrist support" -- try a multi-faceted approach. Maybe voice recognition, maybe a tablet for input? Ergo keyboards, trackballs? It's not a comfy field. There isn't a great financial incentive to develop voice input for coding when you can hire foreigners for peanuts, and keep having eager generations of new hackers to come and be sacrificial lambs on the keyboards of progress...;-)
Re:Speaking WPM != Chars Per Minute by identity0 · 2006-04-18 16:06 · Score: 2, Informative

Hooray, I just got out of linguistics class and happen to have my book on me. According to my "Contemporary Linguistics: An Introduction 5th Edition" by O'Grady, et al, there are 49 phonemes in American English. Keep in mind that variants and dialects of English can vary quite a bit, and the book itself says some speakers may be missing a few of the phonemes.
speech recognition for Linux by belmolis · 2006-04-18 20:30 · Score: 2, Informative

For something that runs on Linux directly, you might have a look at the Accessible Speech Recognition Technology software. It's a research project, not a polished system, but you might be able to hack it to do what you need.
Coming from speech recognition by obarel · 2006-04-18 23:11 · Score: 2, Informative

It's possible to recognize speech pretty well (and no, the ridiculous examples of "I'll Ike's peach recognition all hot" don't really happen for any reasonable engine that uses language models, and most of them do these days).

The main problem is that no one actually speaks or writes as eloquently as the people who present speech recognition in demos.

Try this experiment: map backspace, delete and arrow keys to @ and try to write a letter or some code. You'll quickly give up. When you see demos of speech recognition, you never hear someone saying "Yesterday I went to the cinema. umm Monday actually. Ha, look the computer is still writing. Oh boy... delete delete delete delete ... delete ... delete ... replace Yesterday with Monday" (while it's possible to recognize "replace X with Y", you still have to be pretty focused not to say anything else).
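
The experiment can be mimicked with a toy dictation buffer. This is a sketch under generous assumptions: recognition is perfect, each utterance arrives as a separate string, and only two commands exist ("delete" and "replace X with Y"). The `dictate` function is hypothetical and not taken from any real product.

```python
import re


def dictate(utterances):
    """Apply a stream of recognized utterances to a buffer of words."""
    words = []
    for utterance in utterances:
        text = utterance.strip()
        replace = re.fullmatch(r"replace (\S+) with (\S+)", text, flags=re.IGNORECASE)
        if text.lower() == "delete":
            if words:
                words.pop()  # remove the most recently dictated word
        elif replace:
            old, new = replace.groups()
            words = [new if word == old else word for word in words]
        else:
            # Anything that is not a command is treated as dictation,
            # including "umm" and remarks made to someone else in the room.
            words.extend(text.split())
    return " ".join(words)


letter = dictate([
    "Yesterday I went to the cinema.",
    "umm",
    "delete",
    "replace Yesterday with Monday",
])
```

Even in this idealized form, every stray word costs an explicit correction, which is the focus problem described above.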

The missing bit is the intelligent dialogue that is redundant when you type. When you type, you have arrow keys, control keys and backspace. When you talk, these things are part of the communication, and writing an intelligent dialogue system is not trivial. If you want another experiment for the limits of speech recognition, just try whatever you want a computer to do with a real person. Try to dictate code to someone, and you'll soon find that it's not that simple. A person can even ask at the right time "empty brackets?" after you say a function name followed by a semi-colon, yet it's still very difficult to dictate code (or even a letter without any corrections).

There is another problem: Imagine that you type away, and suddenly you see that you've forgotten a semi-colon, so you want to say "up" or "left" to move the cursor back. But as you're writing a game, you have the constants UP, DOWN, LEFT and RIGHT, and the same words now mean two different things. Hmmmmm.... Now you have to change your code (or the code you've downloaded) to suit the interface. Not good. Another option would be "missing semi-colon at the end of the line beginning with strcpy", but you need a very intelligent dialogue system for that.
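
The easy half of that last command can be sketched as follows, assuming the dialogue system has already understood the sentence and extracted the prefix "strcpy". The `add_semicolon` function is a hypothetical helper; the hard part, getting from free speech to this call, is exactly what the sketch leaves out.

```python
def add_semicolon(source: str, prefix: str) -> str:
    """Append a semi-colon to the single line that begins with `prefix`."""
    lines = source.split("\n")
    matches = [i for i, line in enumerate(lines) if line.lstrip().startswith(prefix)]
    if len(matches) != 1:
        # A real dialogue system would have to ask "which one?" at this point.
        raise ValueError(f"{len(matches)} lines begin with {prefix!r}")
    i = matches[0]
    if not lines[i].rstrip().endswith(";"):
        lines[i] = lines[i].rstrip() + ";"
    return "\n".join(lines)


snippet = "char buf[16];\n    strcpy(buf, name)\n    puts(buf);"
fixed = add_semicolon(snippet, "strcpy")
```

The ambiguous case (zero or several matching lines) raises an error here; handling it gracefully is where the intelligent dialogue comes in.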

Note: I've assumed that the recognition is perfect (and the problem is with our brains), but of course it isn't.