Good Cross-Platform Speech-Recognition Programs?
CryoStasis writes "I am a graduate student getting my degree in biomedical sciences. Because my work often requires me to maintain a local sterile environment (under a biological hood) I find that I am unable to physically touch my computer, which sits beside me, in order to open my notes, protocols, etc. while I'm working. As a result, I have begun to search for a voice-recognition program that will allow me to tell the computer what files/programs to launch. I know that the general field of voice recognition has come a long way, but I find that the built-in speech recognition systems in both OS X and Vista are clunky and difficult to use. Are there any good, cross-platform speech-recognition programs available that might fit the bill?"
I sure hope so, but I could not find any that were even worth considering. That includes the supposed "best," Dragon NaturallySpeaking. It has a HORRIBLE system, only works with a small number of programs, and is cluttered even for those.
So you're a grad student in the sciences and write "build in" instead of "built-in".
We have PocketSphinx working on Windows, Mac and Linux in FreeSWITCH. http://www.freeswitch.org/ /b
Dragon NaturallySpeaking is as close as it gets. And it's only really good for basically writing down what you say; it's not really that good for controlling your computer. I believe it works in both Vista and OS X.
There used to be ViaVoice that also worked in Linux IIRC - but it basically stopped working on it circa 2001/2002.
Perhaps another input device is called for, because voice recognition right now will only frustrate you more than anything for what you want to use it for.
BTW, I believe OS X has voice recognition built in you may want to check out for controlling your computer - but it's been ages since I used it. It's actually geared toward controlling your computer, and not to replace typing.
I work in a biological lab and have a similar problem. I find that paper is much simpler for most things. I have a notebook containing only printouts of protocols with little tabs denoting where each one is. I remove whatever protocol I'm using and carry it over to wherever I'm working. Anything else I need from my notes, I write on paper and carry. Yes, it's a bit wasteful, but I've found that in the preparation of gathering all the relevant pieces of paper, it really forces you to adequately prepare for an experiment instead of trying to figure it out on the fly.
You could always use Vista's speech recognition.
Here's a Video.
Yes, software exists. But unless the program you want to control only performs simple operations with dialog boxes and can function with only limited keyboard input, you will probably find it inadequate or clunky, even if the speech recognition is perfect (it never is). Instead of asking whether speech recognition software is right for you, the better question would be: is your software a good fit for speech recognition?
#fuckbeta #iamslashdot #dicemustdie
Kaiser MDs use Dragon.
I'm thinking you're only using one computer for most of your work anyway.
How important is cross platform - or is that just what the cool kids say these days?
There are also many natural speech recognition systems on the market:
http://www.allfordmustangs.com/photopost/showphoto.php/photo/42084
The speech recognition is quite accurate and they are very user friendly.
Wireless keyboard much?
You need to sterilize the computer (or keyboard & mouse) so you can bring them inside your hood.
Wireless keyboard & mouse is probably easiest. Autoclave, Ethylene Oxide gas, or gamma radiation will work.
This isn't directly an answer to your question, but why not put a keyboard/mouse in the hood and use that? A wireless keyboard, perhaps, or it shouldn't be too difficult to put an interface through one of the existing ports. They even make some smaller keyboards that take up less space.
Cute summer student.
or..
Let's Wreck a Nice Beach.
Yeah, there's no such thing as "good" speech recognition yet.
MABASPLOOM!
I use a nitrogen box (O2 and H2O less than 0.1 ppm) in my lab to test transistors. I test several hundred transistors at a time, and need to connect probes to electrodes on each one manually, so my hands are always in the glove box. In order to start my analysis program and enter a filename, I wired a USB port to an electrical feedthrough and put a USB hub inside. Originally the hub was just for a keyboard and mouse, but it has since proved useful for other devices (cameras, etc) as well.
Can you do something similar here, assuming a keyboard and mouse can be sufficiently sterilized?
you can control your PC without touching it.
You've probably heard of Johnny Lee before:
http://www.cs.cmu.edu/~johnny/projects/wii/
On the down side, Lee provides a Windows-only framework, and you'd have to write an application on it, or re-write a cross-platform solution.
There are also other "gesture" solutions.
The current state of voice control is, unfortunately, rather clunky. On the plus side, there are slightly nonstandard peripherals that might do the job instead.
For some years now, there have been pointing devices for the disabled that essentially involve an IR webcam and a reflector or LED stuck to whatever part of the body the user can still move. http://www.naturalpoint.com/ makes some such; I suspect that they also have competitors. On the cheap side, there has been a fair bit of buzz lately about using video processing software with ordinary webcams. A bit of googling should turn up stuff for Win, Mac, and Linux.
On the keyboard side, silicone rubber flexible keyboards have proliferated alarmingly of late. The keyfeel is bloody awful; but they are cheap, fully sealed against moisture, and can survive cleaning with various moderately horrible solvents.
With a simple USB hub, you should be able to leave the keyboard and webcam in the hood, never having to touch the webcam, and dousing the keyboard in whatever horrible substances are necessary to keep it sterile, and just plug in the one USB cable to your laptop before you begin work. Not wildly elegant; but it should provide you with a standard keyboard and pointing device that fulfill your requirements.
There is no substitute for teamwork. I don't work in a biologically clean environment, but I do sometimes work in a vacuum clean environment which requires that I avoid touching anything that isn't cleaned to go into a UHV chamber. Having a teammate to work in the "dirty" environment in the rest of the lab makes things much, much easier.
The progress of research is never perfectly predictable, and you're always going to find some surprise which needs immediate attention. Having another person there means you don't have to prepare in advance every possible command you may need a computer to run, plus a person can do things like answer the phone and sign for deliveries. It's also good practice for later in your scientific career when you'll have to train and trust your own students/interns/employees.
Kind of a clunky idea, but here goes.
Get a numeric keypad, and pop off every other button cap. Map the remaining keys to whatever actions you want to control on the computer. Tape the keypad to the window on your hood, perhaps with blue masking tape (removes cleanly). Hit the buttons with your nose.
On Windows, I would get all the files opened, and have a key for Alt-Tab, and then keys for left, right, up and down.
Good Luck!
The best (and cheapest) speech recognition program is "undergrad". It will open anything you want on your computer, and even read it back to you. Sometimes it just stops working, though, so you might have to keep getting newer versions as they become available.
You can get a mouse that you can operate with your feet. Would that work?
For some years now, there have been pointing devices for the disabled that essentially involve an IR webcam and a reflector or LED stuck to whatever part of the body the user can still move.
Sounds like one of Johnny Lee's projects, you could probably accomplish this with a Wii-remote and his free software.
Why not just buy a second wireless keyboard and always keep it in the sterile environment?
The Microsoft Office suite has built-in speech recognition software. You may have noticed the language bar in your taskbar. But maybe that is what you were referring to when you mentioned Vista.
they are awkward but pretty cool. It's a virtual keyboard projected onto a flat surface which could be sterile. There's zero tactile feedback but you can use it for simple stuff.
Example
http://www.virtual-laser-keyboard.com/
http://voxforge.org/
http://lifehacker.com/software/speech-recognition/hack-attack-make-your-macs-speech-recognition-work-for-you-215764.php http://bbs.macscripter.net/viewtopic.php?id=24662
Shame you're sitting unseen. There are foot controls for the simple stuff he's asking for. Now if he wants to do something more complex then the voice option is the viable one.
Shai Schticks:"You don't make peace with friends, you make peace with enemies"
There's a beta version of gesture recognition software here:
http://www.movesinstitute.org/~kolsch/HandVu/HandVu.html
You might get a few bills for lab equipment breakages if you wave too hard, but at least the software is free.
Dragon Systems is by far the best speech to text resource. I use 9.0, but 10.0 is out. And by all accounts it is better. Like all good tools that have power and flexibility Dragon takes some time to master. But it is intelligent and repays hard work by improving. Suggest you get Dragon Preferred or, at a minimum, Pro. With these you can also make audio notes on a stand-alone recorder which may be fed in to the program later for transcription. If the audio is good (use a headset) the results are very good. Of course it needs an editing treatment, but what draft does not? So, you could make notes in addition to controlling the computer.
I suggest you practice at some time when your hands are not busy playing with the Andromeda Strain. And if you get skilled with Dragon you can swap modes; that is, speech to text or control mode.
The hard truth is this: Speech to text is something you have to learn how to do. Even if the program is perfect there is a learning curve for verbally inserting punctuation. And for writing with your voice. Nine has a feature to do punctuation automatically, but it works as poorly as most stenographers. In another life I used to dictate to a secretary who took shorthand. Even with her I interposed punctuation. And I can tell you...It really took me some time to learn her curves. Drum Roll Please
"No fear. No envy. No meanness." Liam Clancy
Inside the hood and sterilized with UV.
"I find that I am unable to physically touch my computer"
This makes computer sad. :(
Rather than trying to make speech recognition totally portable you might consider building it into a portable machine like an eeepc, then use that machine as a terminal for any system you want to interact with.
Try thinking in terms of a voice activated keyboard instead of a voice activated computer without a keyboard.
http://michaelsmith.id.au
I work in healthcare, and know a man paralyzed from the neck down who uses dragonspeak to do everything on his computer.
He has a laptop, and needs someone to turn his computer off and on. But he seems to do pretty well from there, at least for searching the internet. He also buys and trades stocks with it.
He had to hire an expert to customize his laptop. So, while it's currently possible to do, it's probably not something that you can do easily.
Is it cross platform? No idea. He uses Windows XP.
Not business to be taken seriously.
How about a voice recorder? Transcription might be a pain, but a digital voice recorder seems a lot cheaper and more reliable -- if it works for you.
What is the real problem you are trying to solve?
Why is it you think you need access to your computer? Surely there are ways to record your results without recourse to a computer in a sterile environment. I mean, seriously, what is wrong with a notepad and a pencil? In the days of Newton, Galileo, Einstein, Lavoisier, Lord Kelvin, Darwin, Planck, Curie, etc., that was the best technology available and yet, amazingly, they were still capable of good science.
All I want is a secure system where it's easy to do anything I want. Is that too much to ask ~~ Randall Munroe
How about some sort of VMware (or kqemu etc) hack using Dragon, then either write to a named pipe (if that's possible), or make the file network mounted, and auto-save every 10 seconds? Actually, you could set Dragon up to input into a browser text-box, and do some AJAXy stuff to capture stdout....
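The shared-file variant of this idea could look roughly like the sketch below, on the receiving side. It assumes the dictation program appends text to a file that the other machine can see over a network mount; the function name `read_new_text` is made up for illustration, and nothing here is specific to Dragon or to any virtualization product.

```python
import os


def read_new_text(path, offset):
    """Return (new_text, new_offset) for bytes appended since `offset`.

    If the file shrank (for example, it was rewritten from scratch),
    start again from the beginning.
    """
    size = os.path.getsize(path)
    if size < offset:
        offset = 0
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    return data.decode("utf-8", errors="replace"), offset + len(data)
```

A loop that calls this every 10 seconds, matching the auto-save interval, and hands the returned text to whatever launches files would complete the hack.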
Just a coincidence, but the tech writer at WaPo has an article up today about speech recognition software, FWIW. I used to use one that I have forgotten the name of, unfortunately, on Mac classic... good for not much, but it would open applications, and that was fun enough: "Computer! Open Netscape!" And that was about it. I imagine they have to be just a skosh better now. It's a goldmine though, if anyone really nails it: we have an aging population, the ones that have disposable income, who are getting arthritis in their fingers. Personally, I would like such a system for using the computer while doing some jobs, such as working on equipment where you get greasy hands, or say, you are fooling around on your bench and want to yak at the computer to display stuff because you have a hot iron in one hand and tweezers in the other. Very useful, I think, if it is ever perfected.
Software such as web browsers can perform actions via mouse gestures. But what if you don't have a mouse? Use a webcam!
Google "gesture webcam" and you'll get links to demos on youtube and software. I'm not sure how mature this idea is but it sounds cool!
Have no keyboard? Learn sign language! :) For deaf people who can sign faster than they can type, researchers are developing webcam recognition.
Those that don't grok sign-language could potentially use character-based gesture input modeled on Palm's Graffiti.
You don't want voice recognition. You want basic planning and lab book management skills.
You should be asking "Why didn't I get all of my protocols, reagents, samples, and equipment set up before I started my experiments for the day?"
I did quite a bit of biochemical benchwork to get my PhD, involving flu. Touching almost anything was either a bad idea for your health, or a worse idea for your experiment.
Instead, you laid out a plan for what experiments you were going to do for the day. You wrote it up in your notebook before you started. If you were doing a standard experiment, you probably had an easy excel template where you typed in the number of replicate experiments you wanted to run, and it did all of your calculations for you. Print it out, tape it in your notebook, grab all your samples and reagents from the freezer, and then (and *only* then) did you put on your gloves and go into the sterile hood.
My old lab book is *full* of these little protocols, usually with a typed note at the bottom about which samples I wanted to run, and a few hand written notes from after I took my gloves off.
For long, complex protocols, lay out a protocol book with step by step instructions. For really sensitive experiments, don't be afraid to change gloves after you flip the page. Gloves are cheap, compared to the reagents needed to run even a single PCR reaction.
A good craftsman has laid out all of his tools, plans and materials before he starts work. Good chefs have all their ingredients measured and utensils easily accessible before they start cooking. Either one *could* use a computer to track their project. But they don't, because it just makes everything more complicated.
Use a computer for planning, data storage, analysis, etc. Once you put the gloves on, good notebook skills put the computer to shame every time.
-V-
Who can decide a priori? Nobody.
-Sartre
"I'm sorry Dave, I'm afraid I can't do that. "
Why not simply purchase a touch-screen panel, and bring that into the hood with you? If you get a resistive-touch version, any object can be used -- including your sterilized aluminum stylist -- or a chop-stick. And because it's resistive, you could even use it through another substance, even through your hood itself. It's a simple USB cable to the computer, so you could keep it far away from the computer, and use it as you would a tablet.
Your speech-to-text system isn't the only problem.
The system must be able to recognize the specialized vocabulary you're using.
I don't know what you're working on (bacteria, viruses, gene sequences, proteins, clinical trials, medication, ...) but a lot of the words you're using won't even be in a regular dictionary.
I have the same problem (PDF in a bioeng lab). When necessary I use a PDA in a ziploc bag; handwriting recognition works fine through the plastic for me (2 different iPAQs). You can ethanol the bag if necessary.
I'm a grad student in computer science, specializing in AI. Although it is not my forte, I have studied speech recognition a fair amount, and I am friends with professors and grad students who are on the cutting edge of ASR.
Unfortunately, the real answer is that, at least by my standards, there is no good speech recognition anywhere.
One of the most challenging things about human speech is what we call "lack of invariance". The same word can be said by the same person two times in a row, within exactly the same context, and the signals will differ to an amazing degree.
At this point, if you have a hand-segmented acoustic signal, where the phone boundaries (such as there are any) are already marked, we have recognition rates exceeding 90%. But if the signal is not already marked, where the ASR machine has to segment automatically, the rate goes down dramatically. Then you have to recognize words, where the realization of any given word in any given context is not necessarily consistent with how you would typically describe the word phonemically. We see it all the time where what's in the acoustic signal is actually quite different from what the listener hears. It's really quite frustrating.
In my opinion, the accuracy of even cutting edge speech recognition software is pretty miserable.
You: "Wow. This virus interferes with T-Cells, even reanimating dead tissue. That's really wild!"
Computer: "Command accepted. Releasing virus into the wild."
Use your feet. Set up a big touch pad on the ground with some pedals.
The best speech recognition application I've come across for creating my own speech commands to open programs, files, and even websites without touching my PC is Tazti Speech Recognition by Voice Tech Group. It's a free download and works 100% of the time with custom commands I create. It does require some training for the XP version, but less for the Vista version. I've used them both.
I found out about tazti through a Popular Science Online article. It's also mentioned in a Geek.com blog and also a blog post on the Intel Software website that talks about creating custom commands.
It works on XP and Vista and a friend of mine installed it on a Mac but had to use Parallels and Windows on top of Parallels and then installed tazti.
Other Features: I can control the iTunes player, log into and navigate Facebook and Myspace, and perform voice searches of Google, Yahoo, MSN, Amazon, eBay, Wikipedia... all by talking to my PC. There are about 15 search engines or websites with search built in. It has other features too, but you can check it out yourself. There's a demo video on YouTube.
Best of all... this is a free download. I don't know how they can afford to do it????
Sphinx 3.5
Seriously. Sphinx is a great LV ASR system. Command and control is almost trivial nowadays.
Sterilise a keyboard, perhaps with one of the silicone covers that you get for using computer keyboards in sterile environments. Seal it into your sterile box. If you're really fussy, use a wireless keyboard so you don't even need a gland to take the cable through.
It's really sexy!
They learn to script instead.
Pick a suitable wireless mouse pen, sterilize it.
Try SpeechVibe... It works well with dictation, is the best solution for command-and-control (you can navigate through the menus and dialog items with it), and can even paint stuff through tougher operations like drag and drop, etc... It is for Vista. As far as getting it to work on OS X, you will have to investigate to have a virtualization system on OS X.
OK, I have seen those in Japanese markets, and it might be something for you too. It isn't speech recognition, but you might be able to work cleanroom-clean with it: http://www.celluon.com/products/laserkey.htm?sm=2_1 How: these are keyboards without mechanical keys, without even a board. They project a beam of light which displays keys on a table on which you can type. So in essence you can easily clean the table, and it doesn't interfere since it's only light. I guess you might even project from outside into your cleanroom.
I know you're out there. I can feel you now. I know that you're afraid. You're afraid of us. You're afraid of change.
Period, end of report. In the PC world there essentially is no other general purpose voice interface tech that is even worth bothering with.
That being said, there are much better ones for very specific vertical markets, but not for general use.
Note that this means you ARE restricted to Windows. The stuff built into OS X and Vista is not even worth messing around with. It might in theory meet some very casual or narrow specific need of particular users, but it is literally an order of magnitude slower and less reliable than NaturallySpeaking.
If you MUST use a Mac or Linux etc then basically the answer is, you're SOL, there's nothing. Yeah, there are a few OSS bits out there, but frankly they aren't even at the level of being really functional software, let alone meeting speed or accuracy required from this type of software. It would be AWESOME if there was something open, but the fact is this area is just so technically demanding it appears to be beyond the reach of non-commercial effort.
"Malo periculosam, libertatem quam quietam servitutem." -- Jefferson
Simon listens is an open-source speech recognition program....
From your original post, it sounds like you need command and control. This is significantly easier than general speech recognition and is well within reach for current computers and software. If you have a relatively small and consistent set of applications and actions, it's easy to program the voice recognition to map distinct sounds to specific hot key combinations. And with a decent microphone, you can get consistent results. I've used this technique; the only problem we had was ambient noise triggering the commands.
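To make the "small fixed command set" idea concrete, here is a minimal sketch of the mapping step only, assuming some recognizer has already turned audio into text. The phrases, the key names, and the `match_command` helper are all invented for illustration; actually sending the keystrokes is left to whatever tool you use. The strict similarity cutoff is one hedge against the ambient-noise problem: anything that does not closely resemble a known phrase is ignored rather than guessed at.

```python
import difflib

# Hypothetical command table: spoken phrase -> hot-key combination.
# The phrases and key names are illustrative, not from any real product.
COMMANDS = {
    "open protocol": ("ctrl", "alt", "p"),
    "open notes": ("ctrl", "alt", "n"),
    "next window": ("alt", "tab"),
    "page down": ("pagedown",),
}


def match_command(heard, commands=COMMANDS, cutoff=0.75):
    """Return the hot-key tuple for the closest known phrase, or None.

    `heard` is whatever text the recognizer produced. A high cutoff makes
    the matcher reject stray noise instead of guessing a command.
    """
    phrase = " ".join(heard.lower().split())  # normalize case and spacing
    best = difflib.get_close_matches(phrase, list(commands), n=1, cutoff=cutoff)
    return commands[best[0]] if best else None
```

Keeping the phrases acoustically distinct from one another (and from ordinary lab chatter) matters more than the matching code itself.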
Why are you limiting this to voice processing? Consider a camera based system which records your hand gestures to control your computer. You should see the work of Juan Wachs for an introduction on this technology: http://www.movesinstitute.org/~jpwachs/index.html
We have something that might help.
Utter Command is a speech command system that works across all applications. It works with the NaturallySpeaking speech engine, and speeds command-and-control considerably. There's a lot more information here, including videos: www.redstartsystems.com
The best way around this would be to use terminals that have been installed within the clean lab. You can then remote into your desktop machine outside of the clean room.
I have been working in a clean room for years and this seems to be the best way to get data conveniently in and out of the lab.
You will need to train anything that does a decent job, especially in a non-standard environment. I'm not familiar with the hood environment, but if it introduces noise or otherwise changes the signal that's getting to the mike (compared with being outside the hood), then your software will do better if you train it while in the hood. Otherwise it's very likely to be a frustrating experience for you.
instantrimshot.com
You need to tell us more about what you use the computer for.
Speech recognition tends to be GUI-only and not cross platform. This is because the anticipated market is the disabled, who are usually users of only one machine. For most of the disabled, more than one machine is just a burden. You probably won't have more than one in the lab, either.
Controlling the computer (click that button, switch windows) is a different challenge than text input because the speech-to-text vocabulary is much larger.
In tech stuff like research, you are probably using a lot of words that are not even in the program's dictionary. And that matters a great deal for speech-to-text. Good speech2text products know that 'yes sir' makes a lot more sense than 'yes fur', so they keep track of what words go next to what else. Markov chains and all of that. Commercial software won't distinguish 'de-ionized reagent' from 'the lionized Regent'. That is, until after you train it for several days. You can train it.
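The "what words go next to what else" idea can be shown with a toy bigram model. This is only a sketch: the training text is made up, the names `bigram_prob` and `pick` are hypothetical, and real products use far larger models than simple pair counts.

```python
from collections import Counter

# Tiny made-up training text; a real system would learn from far more data.
CORPUS = "yes sir . no sir . yes sir i can . the fur is soft . thank you sir".split()

unigrams = Counter(CORPUS)
bigrams = Counter(zip(CORPUS, CORPUS[1:]))
VOCAB = len(unigrams)


def bigram_prob(prev, word):
    """P(word | prev) with add-one smoothing so unseen pairs are not zero."""
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + VOCAB)


def pick(prev, candidates):
    """Choose the acoustically similar candidate most likely to follow prev."""
    return max(candidates, key=lambda w: bigram_prob(prev, w))
```

With this corpus, `pick("yes", ["fur", "sir"])` prefers "sir" because "yes sir" was seen in training and "yes fur" was not. Training on your own lab text shifts the counts in the same way, which is roughly why "de-ionized reagent" only starts winning after the software has seen it a few times.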
Learning to use speech software is easier than learning to type. But you already know how to type, so learning speech software seems harder. You've had a dozen years of typing, already.
That said, the best speech-to-text software is from Dragon Systems, despite the unfamiliar name. All their competitors smartly gave up in face of the competition.
I18N == Intergalacticization