Slashdot Mirror


Speech Recognition in Silicon

Ben Sullivan writes "NSF-funded researchers are working to develop a silicon-based approach to speech recognition. "The goal is to create a radically new and efficient silicon chip architecture that only does speech recognition, but does this 100 to 1,000 times more efficiently than a conventional computer." Good use of $1 million?"

7 of 328 comments

  1. Text of article by Anonymous Coward · · Score: 4, Informative

    From Carnegie Mellon University:

    Carnegie Mellon engineering researchers to create speech recognition in silicon

    Team to develop new silicon chip

    Carnegie Mellon University's Rob A. Rutenbar is leading a national research team to develop a new, efficient silicon chip that may revolutionize the way humans communicate and have a significant impact on America's homeland security.

    Rutenbar, a professor of electrical and computer engineering at Carnegie Mellon, working jointly with researchers at the University of California at Berkeley, received a $1 million grant from the National Science Foundation to move automatic speech recognition from software into hardware.

    ''I can ask my cell phone to 'Call Mom,''' says Rutenbar, ''but I can't dictate a detailed email complaint to my travel agent or navigate a complicated Internet database by voice alone.''

    The problem is power--or rather, the lack of it. It takes a very powerful desktop computer to recognize arbitrary speech. ''But we can't put a Pentium™ in my cell phone, or in a soldier's helmet, or under a rock in a desert,'' explains Rutenbar, ''the batteries wouldn't last 10 minutes.''

    Thus, the goal is to create a radically new and efficient silicon chip architecture that only does speech recognition, but does this 100 to 1,000 times more efficiently than a conventional computer.

    The research team is uniquely poised to deliver on this ambitious project. Carnegie Mellon researchers pioneered much of today's successful speech recognition technology. This includes the influential 'Sphinx' project, the basis for many of today's commercial speech recognizers.

    ''We're still not even close to having a voice interface that will let you throw away your keyboard and mouse, but this current research could help us see speech as the primary modality on cell phones and PDAs,'' said Richard Stern, a professor in electrical and computer engineering and the team's senior speech recognition expert. ''To really throw away the keyboard, we have to go to silicon.''

    But enhanced conversations between people and consumer products are not the main goal. ''Homeland security applications are the big reason we were chosen for this award,'' says Rutenbar. ''Imagine if an emergency responder could query a critical online database with voice alone, without returning to a vehicle, in a noisy and dangerous environment. The possibilities are endless.''

    Researchers plan to unveil speech-recognition chip architecture in two to three years.

  2. Save a few kilobytes... by tcopeland · · Score: 2, Informative

    ...and view the printable version.

  3. Re:1... million... DOLLARS!!! by frank_adrian314159 · · Score: 3, Informative

    There are two steps to an operation like this: speech to text, and understanding the text you get out. Speech recognition gives you the first part, but you still have to be able to pull apart the sentence and figure out what it means.

    In fact, converting the speech to text and then trying to analyze the text without sound-level annotations might give bad results, as tonal or emotional content would be lost. You need both simultaneously to really understand what's being said.

    --
    That is all.

  4. Re:Carnivore on telephones by ChefInnocent · · Score: 2, Informative

    Hello? Have you heard of Echelon?

  5. Re:1... million... DOLLARS!!! No by soltarusprime · · Score: 2, Informative

    You are forgetting the coded phonetic context of a word and distillations for "known dialects". Besides dialects, English is rife with words that sound the same yet mean different things, or that even sound (slightly) different depending on the surrounding contextual words and on whether the sentence is a statement, question or exclamation (different intonations). Feel free to multiply that K figure by up to 1000 times.

  6. Re:1... million... DOLLARS!!! by Sir+dies+alot · · Score: 2, Informative

    Actually they are one and the same: it is possible to determine what something means using today's voice recognition. (I've got a setup that controls my entertainment center and the lights in my apartment through voice recognition.) However, it is wildly inefficient and difficult to set up. The reason is that the English language is just about the most illogical system on the planet, and computers only understand logic.

    Due to the limited scope of my setup, I only had to record about 20-40 words/phrases and reference them differently in a database. Then you speak, and it takes each word and follows a tree-like structure, jumping from each word to the next until it gets to the end. Any word not understood is simply filtered out as useless. When it reaches the "leaf" in the tree, it has a command, which it sends out the preconfigured port. Not a beautiful system, but it works fairly well.

    If they make the ability to recognize speech as text much more efficient, all the processing power that was being used to simultaneously decode and translate speech can be used to understand the speech. This is an immediate boost in power, and then it just takes some good algorithms for these inventions to become a plausible reality. Also, the claim that you need a high-end PC to do this is true only if that's not all the PC is doing. If you use a mid-range PC solely for voice processing, it should work just as well. (Mine is running on spare processing time on my Athlon 64 3400+ with 2GB RAM, but I would assume that you could use a slower system if you weren't doing anything else on it.)
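The word-by-word tree walk described above can be sketched roughly as follows. This is only an illustration, not the poster's actual setup: the vocabulary, the command names, and the `match_command` helper are all hypothetical, and actually sending the command out a port is left out. Words that do not fit at the current position in the tree are skipped, which stands in for "filtered out as useless".

```python
from typing import Optional, Sequence

# Hypothetical vocabulary tree (an assumption for illustration only).
# Each key is a recognized word; a nested dict is the next level of the
# tree, and a string value is the command stored at a leaf.
COMMAND_TREE = {
    "lights": {
        "on": "LIGHTS_ON",
        "off": "LIGHTS_OFF",
    },
    "stereo": {
        "volume": {
            "up": "STEREO_VOL_UP",
            "down": "STEREO_VOL_DOWN",
        },
    },
}


def match_command(words: Sequence[str], tree: dict = COMMAND_TREE) -> Optional[str]:
    """Walk the tree one recognized word at a time and return the leaf command.

    Returns None if the words run out before a leaf is reached.
    """
    node = tree
    for word in words:
        nxt = node.get(word.lower())
        if nxt is None:
            # Word is not valid at this point in the tree: ignore it.
            continue
        if isinstance(nxt, str):
            # Reached a leaf: this is the command to send out.
            return nxt
        # Otherwise descend one level and keep listening.
        node = nxt
    return None


if __name__ == "__main__":
    print(match_command("please turn the lights on now".split()))
    print(match_command("stereo um volume up".split()))
```

With only a few dozen words in the tree, lookup cost is negligible; the expensive part remains turning audio into those words in the first place.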

    --
    The stupidity of your average American is just about the same as the average European, we simply show it off better.

  7. Power efficient speech recognition by Anonymous Coward · · Score: 1, Informative

    While we are on the topic of speech recognition hardware, here is a shameless plug for the Perception Processor that people might find interesting.