Speech Recognition in Silicon
Ben Sullivan writes "NSF-funded researchers are working to develop a silicon-based approach to speech recognition. "The goal is to create a radically new and efficient silicon chip architecture that only does speech recognition, but does this 100 to 1,000 times more efficiently than a conventional computer." Good use of $1 million?"
If this really is true what they're saying, and knowing how much money is invested in speech recognition research on a yearl y basis, yeah, i would definately say that this is one million dollars of great investment...
- Leon Mergen
http://www.solatis.com
Good use of $1 million?
Let me think for a moment... Hell yeah! If we had low power speech processors, the possibilities would be endless. For one, we'd finally have a Star Trek(TM) interface for our homes!
"Computer, lights!"
"Computer, make coffee!"
"Computer, Earl Grey, hot!"
As silly as it may sound, such an interface would be far more efficient than mashing buttons.
In addition, blind people could be significantly helped by this. Many of them already use speech recognition and synthesis to assist in computer usage. Imagine if their computers could suddenly understand them a thousand times better? They could talk to their computers a bit more naturally, thus saving their vocal chords from undue stress.
Other applications (off the top of my head) are:
- Voice notes on embedded devices (store only text!)
- Helpful Kiosks that can give you directions
- A new use for natural language database queries (i.e. Ask the computer what last quarter's net sales were.)
- Voice controlled robots ("You missed a corner, vacuum cleaner")
- Data search by voice ("Find me a channel that plays Star Trek")
Any other cool ideas out there?
Javascript + Nintendo DSi = DSiCade
Imagine how much money could be saved if you could *perfect* speach recognition.
...
Heck, the hospital I used to work at by itself spent over a million dollars a year on medical transcriptionists
Depends. It's not as good as using it to prevent the deaths of thousands - possibly tens of thousands - of people by ensuring they have clean drinking water and shelter from the elements. But hey - you can't put a price on being able to speak to a computer rather than type when you're ordering a pizza.
During 1994 upto 1998 I did marketign and technical support for IBM's Voicetype Dictation products..
Initially, doing anythign beyond understanding a few words would take special hardware, but after a bit of 'training' highly acurate and fast speech to text was quite a possibility with a specially developed dsp.
Then, the pentium class cpus came about, and a p90 could just do the whole thing without the dsp.
So, now someone is developing a new dedicated piece of silicon for this.. lets see how long it takes for general purpose computers to catch up.
The issue is not that this is not usefull, but that it either has to keep developing, or offer a somewhat longer lasting price/performance ratio or much better features for a logn time to come.
Using specialised DSPs makes more sense to me than burning up generic CPU cycles. There have been many examples over the years of how a specialized DSP is more efficient and effective for a narrow task than a regular CPU. Look at portable MP3 players. They use tiny specialized DSPs to decode the files in a manner that is much more efficient than using a regular CPU.
We'll still need to do traditional development to interpret the data from the DSPs. We'll need to parse the output so that we can use natural commands to control devices.
"Coffee maker, brew 10 cups, strong."
"Bathroom lights, on."
Without some manner of AI to interpret them, these phrases will be useless.
LK
"Hi. This is my friend, Jack Shit, and you don't know him." - Lord Kano
From the blog: ''Homeland security applications are the big reason we were chosen for this award,'' says Rutenbar. ''Imagine if an emergency responder could query a critical online database with voice alone, without returning to a vehicle, in a noisy and dangerous environment. The possibilities are endless.''
Like some slight tweaking in order to deploy massive voiceprint-recognition silicon arrays for amazingly efficient automatic realtime conversation transcription and identity determination, attached to Echelon.
So cool... so potentially evil... head begins to hurt... tinfoil hat burning....
Although $1million significantly can speed things up, this is a pretty ambitious undertaking.
My Master's research was on implementing machine learning in hardware, specifically support vector machines.
Now, they have much more money than I did, and probably this will be a collaboration involving many graduate students, but converting complex algorithms from software to hardware is no easy task.
It is just easier to do things in software, that's why it has evolved. The modular layers of abstraction allow a Computer Scientist working in machine learning or speech recognition to not have to worry about how the underlying hardware works.
Working in hardware, a lot these issues come face to face. Particularly since you want an architecture on a chip, whereas in a conventional desktop/server system there are resources such as lots of RAM, harddrive space, etc are available and their interconnections have been built and refined over decades.
Throw in concerns about small form factor, low power consumption, quite fast a lot of unexpected hurles pop up.
My master's research goal was to produce a data mining/machine learning machine, or at the very least a data mining/machine learning co-processor. In retrospect, that was a very ambitious goal that would require many years of work, probably in collaboration with other graduate students.
What I ended up doing was just Support Vector Machines in digital hardware. Now granted, there is another aspect to my research that I'm not mentioning here, mainly that I didn't use normal floating point mathematical architectures, but a different innovative logarithmic based mathematical architecture. That in itself was a significant undertaking.
In any case, this sounds like a great project, I just wonder how much they can do in their (in an academic sense) very small time frame of 2-3 years. Even though a lot of preliminary work has probably already been done just to apply for the grant.
In any case, it is great to see something like this, something to keep in mind in case I ever go back for a Ph.D.
Once this technology has matured and some more headway can be made in Natural Language Processing, (uncertainty for teh win) we'll be on the cusp of some really excellent improvements in human-computer interfaces. It's becoming more common to see 'intelligent' systems being built to mirror the architecture of the human nervous system. This will be a necessary step to forming a generally proficient AI system. The day a computer can readily recognize you're being sarcastic, it's time to be paranoid.
"Don't waste your time or time will waste you" -MUSE
This sounds like a great idea. Sometimes a Hammer works better than a screwdriver at a certain task. Not all Jobs can be preformed as well by a single tool or method.
After all, the human brain has different areas for processing different types of stimuli.
In fact, some parts of our brain are so radically different they are almost considered brains of their own.
like the cerebellem; it's often referred to as "the small brain". This controls motor coordination - and in humans allows us to do amazing things like flips, kung-fu, and cup-stacking.
And forgive me for forgetting the exact names, but the brain has layers as well. the outmost layer being the cortex (where most of the higher-level mamillian processing takes place - correct me if I'm wrong, the frontal lobe is pretty much purely cortical tissue). as you delve deeper you get into the hippocampus and medulla whatever (sorry IANAN I am not a Neurologist) which is where emotion rules - and if I again remember correctly is sometimes referred to as the "reptilian" brain.
Even the eyes themselves can almost be considered little 'brains' of thier own - considering the amount of pre-processing they do (maybe a co-processor would be more accurate).
make
With the advent hardware speech recognition, hardware speech translation is just the next evolution. Imagine being able to go to any country in the world and have just an iPod size device and a bluetooth hearing aid as a translator.
-Randy
With voice software, you can already speak in real-time, conference style. I think Skype supports 5 people.
With speech-to-text, you could log all conversation to IRC.
Then you could have search engines that search *all conversation within the last 5 minutes, world-wide.*
Well, at least all conversation that was okay with being public.
So you could say, "Show me all conversations that are going right now about Python, and immediately find the people talking about Python, wherever they were.
One step towards the HiveMind.
At least they admit it in the article:
... may revolutionize the way humans communicate and have a significant impact on America's homeland security.
Why doesn't this kind of thing bother more people?
Why are they talking about querying online databases for 911 calls as the national security app? It's obvious the national security app is to translate every single phone call to text and store them (indexed) in a classified database. I've attempted to believe the US wouldn't do this because it's illegal, but I can't manage to suspend disbelief. The only way to avoid this is if phone calls are encrypted and the US doesn't have the keys.