Speech Recognition, Voice Verification -- Free
ten thirty writes: "TECHNOCRAT.NET recently featured a great article regarding the dawning (well, it's only a few years old anyway) of speech recognition software within the open source community. In particular, the Sphinx project of Carnegie Mellon University is discussed, as well as some other systems such as Festival and a public domain project at the University of Missouri. The notion here is that the GUI, which has come so far over the past two decades, will eventually be supplanted, at least for some applications, by the VUI. The question is, will the open-source community allow the integration of this technology into our society to be spearheaded by closed-source vendors?"
Having a speech recognition toolbox is only one part of the problem. As many people in the domain (I used to work in speech recognition) will tell you, sometimes the key to a good speech recognition engine is not in the code, but in the speech data used to train it. Speech databases are very expensive, and speech recognition companies usually have a lot of "proprietary" databases.
One project which addresses the problem is the Open Mind Initiative, and more specifically the Open Mind Speech Recognition project, for which I am the coordinator. Our goal is to collect data from people on the internet and make that data available to people working on speech recognition under a GPL-like license. I think this is the key to having OSS speech recognition engines perform as well as the proprietary ones. The project is not very advanced yet, but any help would be very welcome.
Opus: the Swiss army knife of audio codec
"are em space dash eff capital arr space slash enter."
No worries; your computer will dutifully add to the command line:
bash$ Our imps pace the chef cap a dull ours pace lashing turn.
which may give the grammar checker fits but which won't erase your hard drive.
Well, I spend some of my time helping out a quadriplegic friend of mine...having an open-source framework for building a reliable open-source voice app would be ideal for him. From the posts I have seen on current projects, none of them right now fits the needs of someone who is quad impaired. Being online is about his ONLY source of interpersonal socialization right now, and probably will be for quite some time.
There are three problems with voice apps right now.
First is the lack of off-the-shelf recognition. Dragon gets better than 90%, IBM ViaVoice MIGHT get about 60%, others score well below that. For someone with no hands and a non-technical nurse for day-to-day assistance, Dragon ends up being the choice for now. Mind you, an ideal system should install with one or two clicks and then be voice-driven through the rest of the process, or it won't work for most of the physically impaired. As things stand, Dragon is all he can consider using, since the other packages he has demo'd have all required AT LEAST 45 min of voice recognition training, done in one sitting, before they would do anything useful. Given how limited the time is that most quads get with someone who knows a delete key from a return key, most of these apps are pretty useless. Dragon is the only one that will let you do the training at your leisure.
Second is impact on resources. Most disabled people don't have them. My friend's box is built out of donated parts. The software, Dragon, costs more than $400 and was donated as well. Now, Dragon gets that 90% and its stability from running on at least 256 MB of RAM and a 500 MHz processor. Did I mention that these closed source software houses completely revamp their software every so often, requiring you to buy a completely new version just about whenever you upgrade your hardware? Additionally, my friend is one of the very lucky few to know anyone in the computer biz. There are three of us who spare time for him whenever we can, but most people are stuck buying that time. Think of what this means when it comes to upgrading every so often. Remember, you can't even hit a return key, much less open up your box. For that matter, neither can your nurse, really.
Third is actual usability. Most of these voice systems are designed for and by sighted people who can use their hands. 'Nuff said.
Ideally, it would take the efforts of several physically impaired people working with some coders to come up with a working Voice Recognition package that was open-sourced and designed with the impaired user in mind. It is nice that some of the framework apps useful for that type of project are now open-sourced.
'Hail Eris, baby, hail Eris...pfffffffttt.' *cough* 'Yeah.'