Speech Recognition, Voice Verification -- Free
ten thirty writes: "TECHNOCRAT.NET recently featured a great article regarding the dawning (well, it's only a few years old anyway) of speech recognition software within the open source community. In particular, the Sphinx project of Carnegie Mellon University is discussed, as well as some other systems such as Festival and a public domain project at the University of Missouri. The notion here is that the GUI, which has come so far over the past two decades, will eventually be supplanted, at least for some applications, by the VUI. The question is, will the open-source community allow the integration of this technology into our society to be spearheaded by closed-source vendors?"
It'd also be nice in a wearable computer system, though I'm sure someone already has a patent on using voice to control a wearable computer.
I'm trying to teach myself to set people on fire with my mind... Is it hot in here?
User: Post this story
Computer: Unable to toast lorry
User: No, Post, P
Computer: Command 'host tea'. Tea is scheduled for 16:00
User: Post the damn story
Computer: Command 'roast ham'. Oven is preheating. Would you like to serve the Ham with tea?
User: Cancel, I do not want ham, I do not want spam, I do not like it in a car, I do not like it at the bar. Just post the story.
...
Having a speech recognition toolbox is only one part of the problem. As many people in the domain (I used to work in speech recognition) will tell you, the key to a good speech recognition engine is sometimes not in the code but in the speech data used to train it. Speech databases are very expensive, and speech recognition companies usually have a lot of "proprietary" databases.
One project which addresses the problem is the Open Mind Initiative, and more specifically the Open Mind Speech Recognition project, for which I am the coordinator. Our goal is to collect data from people on the internet and make that data available to people working on speech recognition under a GPL-like license. I think this is the key to having OSS speech recognition engines perform as well as the proprietary ones. The project is not very advanced yet, but any help would be very welcome.
Opus: the Swiss army knife of audio codec
Mike Monkowski, one of the engineers for via-voice, recently asked why via-voice had so few developers using it.
I replied with the following-
I would suspect that the primary reason [there are so few developers of via-voice] is the desire of (free software) programmers to not make their code dependent on non-free (as in speech) software. For better or worse, many Linux programmers will reject, out of hand, any library or software that is not based upon one of the standard free licenses (GPL, LGPL, BSD, NPL, Artistic, etc.).
Given that IBM is unlikely to change its licensing terms in the near future, and that (free) programmers are unlikely to change their moral stance on using 'non-free' software, development with viavoice will likely be limited to commercial programmers, or to those situations where STT/VTS are a necessity, such as applications for the blind.
Tom M.
TomM@pentstar.com
In a later post he asked our opinion on the IBM Public License. My reply was thus...
"I did a search on the web for discussions on the IBM Public License (IPL).
According to Bruce Perens, (and the general consensus...)- the IPL is OSD
(Open Source Definition) compliant, but not GPL compatible. Being OSD
compliant will certainly encourage more developers, however, how many is the
big question. Of the free software developers out there, my guess would be
that 80% (likely more?) will only develop (in their free time) with software
that is GPL compatible (i.e. GPL, LGPL, BSD, and a few others). However,
for 'work' stuff, the IPL is less problematic, and thus would lead to more
commercial development (not as much as the GPL, BSD, LGPL - but mostly for
'religious' reasons).
Personally, I would recommend going with the GPL, which would result in full
and quick integration with all of the Linux distributions, and allow source
from many useful GPL and LGPL projects to be integrated/merged with it. I'm
guessing that the developer good will from such an action would be
phenomenal. The suggestion of another poster that viavoice should be viewed
as infrastructure is very valid. However, I'm a realist. There is almost
zero chance of IBM doing that unless they come out with their own Linux
distribution, and tout complete voice integration as the big selling point,
or, the dollar value of developer good will is high enough to justify
whatever future lost revenue would be. (I'd bet that it certainly would be;
having a 'truly free' voice software solution would be rather impressive.
The fact that viavoice isn't considered a drowning/dying product (e.g.
Netscape) or (in the case of Apple) one that was previously free would be
all the more impressive.)
So, given the above, I would say that changing to the IPL might well give vv
a strong pull for more developers, certainly enough to justify the change.
Of course, as suggested above, an even stronger case can be made for the
GPL.
Tom M.
TomM@pentstar.com
"
If you would care to contribute to the conversation, you can join by sending email to
join-viavoice@laser.sparklist.com
Thanks,
LetterRip
Tom M.
Just as GUIs weren't practical in 1980. Or pick an earlier year if you would dispute that. The point is that this idea is more than current technology can handle.
GUIs allow users to do more with less knowledge and less work if properly designed. For instance, it is easier to drag select several folders then drop them into the trash, than it is to explicitly name those directories in a CLI.
But the GUI didn't replace the CLI; it augmented it, and relegated it to a secondary function, or one for power users only. The Next Big Thing will do the same.
I am one click away from reading new mail after it comes in, and I don't think it would be a great improvement to have to say out loud, "Read new mail." But for less experienced users, being able to say, "New message to Bob Jones, copy marketing team, blind copy Jon Bones. Dear Bob, I love you like the brother...." That's valuable, and would be quicker than CLI or GUI if it worked.
The challenges are myriad. How do you ensure privacy? How do you achieve accuracy? (Though accuracy never stopped the CLI or GUI).
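For what it's worth, once the recognizer hands back clean text, turning the "New message to Bob Jones..." utterance into mail fields is the easy part. Here is a rough sketch in Python, assuming a made-up grammar of the form "new message to X, copy Y, blind copy Z. body". The function name and the grammar are my own invention for illustration, not something any existing package provides, and the hard problem (getting accurate text in the first place) is not addressed at all.

```python
import re

# Hypothetical command grammar (an assumption, not any product's syntax):
#   "new message to <name>[, copy <name>][, blind copy <name>]. <body>"
HEADER = re.compile(
    r"^new message to (?P<to>[^,.]+)"
    r"(?:, copy (?P<cc>[^,.]+))?"
    r"(?:, blind copy (?P<bcc>[^,.]+))?"
    r"\.\s*(?P<body>.*)$",
    re.IGNORECASE | re.DOTALL,
)


def parse_dictated_mail(utterance):
    """Split a recognized utterance into to/cc/bcc/body fields.

    Returns None when the utterance does not follow the grammar, so the
    caller can ask the user to repeat instead of guessing.
    """
    match = HEADER.match(utterance.strip())
    if match is None:
        return None
    return {
        field: (value.strip() if value else None)
        for field, value in match.groupdict().items()
    }
```

Feeding it the example sentence yields to "Bob Jones", cc "marketing team", bcc "Jon Bones", and everything after the first period as the body. Anything that doesn't fit, like "Read new mail.", comes back as None rather than being half-parsed.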
"are em space dash eff capital arr space slash enter."
No worries; your computer will dutifully add to the command line:
bash$ Our imps pace the chef cap a dull ours pace lashing turn.
which may give the grammar checker fits but which won't erase your hard drive.
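To make the joke concrete: for the spoken letters to become a command at all, something would have to map letter names back to characters. A toy sketch of such a "spelling mode" in Python is below. The word table and function name are hypothetical, only cover the words in the example above, and are not taken from any real dictation package.

```python
# Hypothetical spoken-token table; covers only the example utterance.
SPOKEN = {
    "are": "r",
    "arr": "r",
    "em": "m",
    "eff": "f",
    "space": " ",
    "dash": "-",
    "slash": "/",
    "enter": "\n",
}


def decode_spelled(utterance):
    """Turn spelled-out tokens into the characters they name.

    "capital" upper-cases the next character. Unknown words raise
    ValueError, so misheard input is rejected instead of being typed.
    """
    out = []
    upper_next = False
    for word in utterance.lower().rstrip(".").split():
        if word == "capital":
            upper_next = True
            continue
        ch = SPOKEN.get(word)
        if ch is None:
            raise ValueError("unrecognized spoken token: " + word)
        out.append(ch.upper() if upper_next else ch)
        upper_next = False
    return "".join(out)
```

With this table, the quoted utterance decodes to "rm -fR /" followed by a newline, while the misheard version ("Our imps pace...") raises an error on the first word. Which is the point: a plain dictation engine gives you the harmless sentence, and it takes deliberate extra machinery to get the dangerous one.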
Well, I spend some of my time helping out a quadriplegic friend of mine...having an open-source framework for building a reliable open-source voice app would be ideal for him. Having seen some of the posts on current projects, none of them right now fit the needs of someone who is quad impaired. Being online is about his ONLY source of interpersonal socialization right now, and probably will be for quite some time.
There are three problems with voice apps right now.
First is the lack of off-the-shelf recognition. Dragon gets better than 90%, IBM ViaVoice MIGHT get about 60%, others score well below that. For someone with no hands and a non-technical nurse for day-to-day assistance, Dragon ends up being the choice for now. Mind you, an ideal system should be able to be installed with one or two clicks, and then be on Voice Recognition through the rest of the process, or it won't work for most of the physically impaired. As things stand, Dragon is all he can consider using, being that the other packages he has demo'd have all required AT LEAST 45 min of voice recognition training to be done at a given time prior to getting functionality. Given that the amount of time that most quads get with someone who knows a delete key from a return key is limited, most of these apps are pretty useless. Dragon is the only one that will let you do this at your leisure.
Second is impact on resources. Most disabled people don't have them. My friend's box is built out of donated parts. The software, Dragon, costs more than $400 and was donated as well. Now, Dragon gets that 90% and stability from running on at least 256M of RAM, on a 500 MHz processor. Did I mention that these closed source software houses completely revamp their software every so often, requiring you to buy a completely new version just about whenever you upgrade your hardware? Additionally, my friend is one of the very lucky few to know anyone in the computer biz. There are three of us that spare time for him whenever we can, but most people are stuck buying their time. Think of what this means when it comes to upgrading every so often. Remember, you can't even hit a return key, much less open up your box. For that matter, neither can your nurse, really.
Third is actual usability. Most of these voice systems are designed for and by sighted people who can use their hands. 'Nuff said.
Ideally, it would take the efforts of several physically impaired people working with some coders to come up with a working Voice Recognition package that was open-sourced and designed with the impaired user in mind. It is nice that some of the framework apps useful for that type of project are now open-sourced.
'Hail Eris, baby, hail Eris...pfffffffttt.' *cough* 'Yeah.'