Voice Recognition for a Techie?
kaybee asks: "I am a long-time developer, sysadmin, and general computer junkie (for fun and for work) who needs to seriously curb the usage of his hands. I'm curious as to the current voice recognition options, preferably usable on Linux and Windows. I prefer the command-line to a GUI, I prefer Vim to anything else, and I still read my email with Pine. I'd like to hear options for sending email via voice, which I hope is easy, and I'd love to hear of any solutions that allow effective coding via voice, which seems much more difficult."
Oh, I'm sorry. I thought you said voice recognition for a Trekkie....
Tibbon
tibbon.com
No, computer, I said, "awk single quote left curly print dollar one right curly single quote file dot txt pipe sort pipe uniq dash see greater than a dot out"
shudder
[...]who needs to seriously curb the usage of his hands.
Lest they... *ahem* wander.
The main issue I see with coding by voice is that each character needs to be spoken as a word. We only have 26 single sounds we can make (at least us English speakers), and so pretty much everything besides the basic sounds has to be the result of multiple letters strung together. Here's some math:
Let's say you type at about 40 wpm, or about 160 characters per minute (this is a low estimate of 4 chars per word), or about 2.7 characters per second.
To be as productive speaking, you'd probably have to speak about the same number of words per second as you type characters, or about 2.7 words per second (160 words per minute). That's really fast.
Sorry bub, doesn't look like speech is a very good alternative. Hell, Brain Implants on the other hand...
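For anyone who wants to check the arithmetic above, here is a quick sketch in Python. The function names are made up for illustration, and the 40 wpm and 4 chars/word figures are the poster's assumptions, not measurements. With those inputs the rate works out to roughly 2.7 per second.

```python
def chars_per_second(wpm: float, chars_per_word: float) -> float:
    """Typing rate in characters per second for a given words-per-minute speed."""
    return wpm * chars_per_word / 60.0


def spoken_wpm_needed(wpm: float, chars_per_word: float) -> float:
    """Spoken words per minute needed if every typed character costs one spoken word."""
    return wpm * chars_per_word


if __name__ == "__main__":
    rate = chars_per_second(40, 4)
    print(f"typing: {rate:.2f} characters per second")
    print(f"speaking: {spoken_wpm_needed(40, 4):.0f} words per minute to keep up")
```

Raising the chars/word estimate only makes the required speaking rate worse.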
Voice recognition is good for letter-writing but bad for overall computer usage, especially in UNIX shell (incl vi and especially Emacs). Picking programs that don't require jumping all over the keyboard for basic tasks can reduce the strain. Same goes for programming syntax: Python is a lot more RSI-friendly than Perl, for example. (IMHO) Write scripts that automate routine tasks, even if it's just one line with lots of regex.
Seriously, if you're suffering hand or arm pain, you should think about the way you're doing things now. Speech recognition is unlikely to replace your current coding practices, although it might help with writing reports.
Instead, try using the keyboard break feature in GNOME. To start with, have it kick you off your computer every 30 mins for a 3 min break, and don't allow yourself to postpone breaks. Get some equivalent software for Windows too. Use your 3 min breaks to walk around and stretch. Within a week, you won't be a lot less productive, but your arms will feel a lot better. Then you can maybe up it to 40 mins. In the short term, a course of anti-inflammatories might help (ask your doctor).
Also, don't come home in the evening and play games on your computer, or do more work. Your arms probably can't take it. Equivalently, inform your employer of your condition and subsequent inability to work reckless overtime hours.
These two things should get you started for long-term sustainable maintenance of your arms.
It's been a while since I've had to look into speech recognition for Linux, but this link should help you get started: Linux Accessibility Resource Site
Read down to the section about speech recognition. I hope that helps.
My inner self is ineffable, so don't eff with me.
For gaming on WinXP, I use an app called Shoot!. While playing Falcon, I use it for fairly simple (press T, wait 5 seconds, press 1) macros. I was dicking around and decided to set up a profile for some simple stuff in Cygwin. If I say "list", the program returns "ls". "List all" will return "ls -a". "List all long" will return "ls -la".
You can, with some tweaking, even get it to understand complicated stuff. If I say "manual g r u b", I can get "man grub". "Vi save quit" could be mapped to ":wq" without too much trouble.
Anything you can type, it can do.
I don't think it works under Linux, and I don't know of anything like it for Linux. It does, however, work quite well inside PuTTY.
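To make the phrase-to-keystroke idea concrete, here is a minimal Python sketch. It is not Shoot!'s profile format (which isn't shown in the post); the table and the "prefix plus spelled-out letters" rule are assumptions built from the examples given above.

```python
# Hypothetical phrase table modeled on the examples in the post;
# this is not Shoot!'s actual profile format.
PHRASES = {
    "list": "ls",
    "list all": "ls -a",
    "list all long": "ls -la",
    "vi save quit": ":wq",
}

# Assumed rule: a prefix word followed by spelled-out letters,
# e.g. "manual g r u b" -> "man grub".
SPELLED_PREFIXES = {"manual": "man"}


def expand(phrase: str) -> str:
    """Return the keystrokes for a recognized phrase, or the phrase unchanged."""
    words = phrase.lower().split()
    key = " ".join(words)
    if key in PHRASES:
        return PHRASES[key]
    if len(words) > 1 and words[0] in SPELLED_PREFIXES:
        letters = words[1:]
        if all(len(w) == 1 for w in letters):
            return SPELLED_PREFIXES[words[0]] + " " + "".join(letters)
    return phrase


if __name__ == "__main__":
    for spoken in ("list all long", "manual g r u b", "vi save quit"):
        print(f"{spoken!r} -> {expand(spoken)!r}")
```

Whole-phrase lookup keeps the vocabulary small, which is what makes this kind of command-and-control setup workable.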
I'd rather you do it wrong, than for me to have to do it at all.
"I'd like to hear options for sending email via voice, which I hope is easy, and I'd love to hear of any solutions that allow effective coding via voice, which seems much more difficult."
I've wondered about this myself. I tend to use my computer with the headphones on. Often, I'm listening to music or.. well just plain silence, just the standard dings of Windows. I do pay attention, though, to the sounds coming from the computer. (i.e. the traditional hoo-hoo of receiving an email.) I've always wondered about what more could be done with sound to make the user more aware of the goings on with their computer, especially when a number of apps are actively working. I think I was inspired by an episode of Futurama I caught. One of the characters' personalities was in the Pilot's body. The Pilot, whose personality was in yet another body, was trying to describe how to interact with the ship. I remember him saying "Can you hear that faint little tone? That's the status of..".. or something or other.
In any event, it's fun to imagine. I wouldn't mind if a soft low-volume voice were to say "You have received an email from: John Smith." I had a job a few years ago where that would have been a nice little feature since messages would come in that required urgent attention. My solution to the problem at the time was to use a custom filter that would specifically notify me of important messages by bringing a little window up to the surface. That was fairly annoying, though, when the computer was busy and it was slow as molasses to get the window to go away.
"I like to lick butts!" by MobileTatsu-NJG (#32700246) (Score:5, Informative)
I ran into this problem while working (coding) & trying to do grad school (in Comp Sci). The first point I'd make is, take a rest break (no computer use) for a while if you can. ASR isn't really there yet, & it won't help you with other things you might want to be pain-free for... seriously. That said, there is a group called "Open Source Speech Recognition Initiative" whose mailing list I'm on, but they don't have any product yet. Might get a better answer posting there, though. Or not. There's also a group on Yahoo (I think) called VoiceCoder. That's your best bet right now, although it's all about Dragon NaturallySpeaking & various hacks & kludges to be able to do coding, use Dragon for Linux, etc. Dragon has been reported to run under WINE, but of course YMMV depending on your hardware, versions, etc., etc. Finally, whatever approach you try, expect it to take a good long while before you begin to approach your hand-using productivity. The technology isn't there yet, and even though I know how to improve it, I have no Ph.D. so no one would give me the $$ to do the research that could back up my claim.
http://perlbox.sourceforge.net/
http://cmusphinx.sourceforge.net/
Command and control is a lot easier to do with voice recognition since the dictionary the engine has to choose from is so much smaller. Having voice recognition engines understand arbitrary words well is still a bit difficult.
First, find a solution that makes it easy to enter text into a GUI (GNOME accessibility, WINE w/ Dragon NaturallySpeaking, whatever).
Find a subset of words that are short, easy to remember, easy to say, and above all -- accurately translated by the chosen voice recognition software.
Then create a small Perl script that can take this coded input and convert it into a nicely formatted chunk of code.
You can have different translators for different target languages... for example
In shell programming, you might have the following:
hash -> #
bang -> !
pipe -> |
test -> [
end test -> ]
mark -> '
quote -> "
end mark/quote (keeps them balanced for shell scripts)
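A rough sketch of that kind of word-to-symbol translator, written in Python rather than Perl purely for illustration. The table follows the list above; "slash" is not in that list and is added here as an assumption, and the function name and the `sep` parameter are hypothetical.

```python
# Spoken phrase -> shell symbol. Keys are tuples so that two-word
# phrases such as "end test" can be matched before single words.
SYMBOLS = {
    ("hash",): "#",
    ("bang",): "!",
    ("pipe",): "|",
    ("test",): "[",
    ("end", "test"): "]",
    ("mark",): "'",
    ("end", "mark"): "'",
    ("quote",): '"',
    ("end", "quote"): '"',
    ("slash",): "/",  # assumption: not in the original list
}


def translate(spoken: str, sep: str = " ") -> str:
    """Replace known spoken phrases with symbols; pass other words through."""
    words = spoken.split()
    out = []
    i = 0
    while i < len(words):
        pair = tuple(words[i:i + 2])
        if len(pair) == 2 and pair in SYMBOLS:
            # Prefer the longer two-word match ("end test") over "test".
            out.append(SYMBOLS[pair])
            i += 2
        elif (words[i],) in SYMBOLS:
            out.append(SYMBOLS[(words[i],)])
            i += 1
        else:
            out.append(words[i])
            i += 1
    return sep.join(out)


if __name__ == "__main__":
    print(translate("hash bang slash bin slash bash", sep=""))
    print(translate("ls pipe sort pipe uniq"))
```

Deciding where spaces belong is the hard part in practice; the `sep` argument here just dodges it by letting the caller choose per line.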
for identifiers... don't name them. For example, let's say you wanted to do this:
#!/bin/bash
function hello_lcase {
    # no spaces around '=' in a shell assignment
    HELLO=$1
    if [ -z "$HELLO" ] ; then
        echo "Hello world"
    else
        echo -n "Hello from "
        # \L lowercases the match (GNU sed)
        echo "$HELLO" | sed -e 's/.*/\L\0/'
    fi
}
you would say:
hash bang slash bin slash bash
new function 1
set local 1 ref in 1
if test empty ref local 1 end test
then
echo string 1
else
echo option n string 2
echo ref local 1 pipe program s e d option e space
mark s slash dot star slash back upper l back 0 slash end mark
end if
end function 1
you'd run the Perl script and it'd ask you:
what do you want to call function 1: foo
what do you want to call local variable 1 in function 1: HELLO
what do you want to use for string resource 1: Hello World
what do you want to use for string resource 2: Hello from
and it'd output the script (maybe after running through indent)
You could substitute "1" for any easily recalled mnemonic or symbol the speech->text translator is unlikely to mistranslate (in this case "foo" and "hello" would probably be fine as is)
Then you'd get a chance to globally "refactor" your symbols and give them nice-looking names, only having to type them once.
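The "name it later" step could look something like this sketch (again Python for illustration, not the Perl script described above). The `@func1@` / `@local1@` / `@string1@` placeholder syntax and both function names are invented for this example.

```python
import re

# Hypothetical placeholder syntax for not-yet-named symbols in a dictated draft.
PLACEHOLDER = re.compile(r"@(func|local|string)(\d+)@")


def find_placeholders(draft: str) -> list:
    """List each distinct placeholder once, in order of first appearance."""
    seen = []
    for match in PLACEHOLDER.finditer(draft):
        if match.group(0) not in seen:
            seen.append(match.group(0))
    return seen


def resolve(draft: str, names: dict) -> str:
    """Substitute every placeholder with the name chosen for it."""
    return PLACEHOLDER.sub(lambda m: names[m.group(0)], draft)


if __name__ == "__main__":
    draft = 'function @func1@ {\n    @local1@=$1\n    echo "@string1@ $@local1@"\n}'
    # In the workflow described above these answers would be typed at a prompt.
    answers = {"@func1@": "foo", "@local1@": "HELLO", "@string1@": "Hello from"}
    print(find_placeholders(draft))
    print(resolve(draft, answers))
```

Each name is typed exactly once, however many times the placeholder appears in the dictated draft.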
THIS THING CAN TURN ON A DIME, MACROSSZERO STYLE ALSO FUCK BETA, ~NYORON
I tried Dragon NaturallySpeaking (costing ~ $500 or $600 at the time), and gave up because of the training problems and low recognition rates. I then tried IBM's ViaVoice Professional, USB-Pro -- with digital signal processing in an included microphone and a digital connection to my computer. With a one-paragraph training session, it was already over 95% accurate and an improvement over Draggin'. It was easier to train, and you could train it on the text you were typing -- i.e. it was able to learn from corrections and merge them back into your voice profile.
Unfortunately, IBM released it in 2001-2002, then forgot about it. They've since gone onto their non-training voice recognition solutions for sale to businesses. They seem to have advanced, but not in any retail product.
Dragon has come out with updates, but from people who have used and trained on *both*, ViaVoice has higher accuracy (~1% difference). The ViaVoice product price has fallen, and Dragon has, of course, gone up....
Whatever product you get, get a fast 2+CPU machine with lots of RAM - 2GB or more. The ViaVoice algorithm adapts to your talking speed -- it will perform more looks and comparisons and have greater accuracy as the processor speed goes up. ViaVoice stops comparing when it runs out of time (your speaking has gotten too far ahead). But it listens to the words, in context, to determine spelling. The more memory it has, the more vocabulary it can pull into memory. Note -- I am saying get a dual-cpu (or dual core) machine, the faster the better.
ViaVoice was also released on Linux, but without as much application support.
For coding support in voice products -- there just hasn't been enough demand.
But for "wrist support" -- try a multi-faceted approach. Maybe voice recognition, maybe a tablet for input? Ergo keyboards, trackballs? It's not a comfy field. There isn't a great financial incentive to develop voice input for coding when you can hire foreigners for peanuts, and keep having eager generations of new hackers to come and be sacrificial lambs on the keyboards of progress...;-)
xvoice is a gtk1 X application which uses IBM's ViaVoice engine to provide voice control and dictation support to arbitrary X applications. xvoice.sf.net is the url. The mailing list mainly covers issues of getting the ViaVoice libs working on modern distributions. The last release of VV was around the glibc2.0/2.1 era and most new ld.so's will struggle to execute the libraries and java dependencies. It's also fairly hard to buy a copy of VV 2nd hand anywhere and IBM appear to ignore any request to release it.
However once you get past all of these issues (actually even running the old gtk1 xvoice becomes hard on modern dists), it works a charm. As it's X clean, you can X to any X server, be it one run under OSX or Windows, or a Sun SPARC box. You just need the mic connected to the x86 Linux box the client runs on.
This meets your requirement for editing in vim etc. The accuracy, I found, was fantastic.