Speech Recognition in Silicon

Funny... by leonmergen · 2004-09-14 02:56 · Score: 5, Interesting

Funny, I work on a speech recognition research project, and, well, I have to say: think about all the possibilities... automated speech2text recording of meetings, on-the-fly subtitling of live TV shows. But it can get better: think about searching multimedia files Google-style based on their audio, with the search automatically taking you to the part of the file you want...

If what they're saying is really true, and knowing how much money is invested in speech recognition research every year, yeah, I would definitely say that this is one million dollars of great investment...

... but then again, maybe they're just throwing numbers around to make sure they get their money. :)

--
- Leon Mergen
http://www.solatis.com

Re:Funny... by tubbtubb · 2004-09-14 03:19 · Score: 2, Interesting

My understanding of speech recognition is minimal, but from what I understand the meat of this chip would probably just be a floating point SIMD engine to do FFTs, and some comparison and control logic.

I'm wondering if you could just do this with your average ATI or Nvidia 3D chip and an FPGA wrapper?
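
tubbtubb's guess is that the core of such a chip would be an FFT engine. The thread doesn't describe the chip's actual design, so the following is only a minimal sketch of that one stage: a textbook recursive radix-2 FFT used to compute the Hann-windowed power spectrum of a single audio frame. The 8 kHz sample rate and 256-sample frame are illustrative assumptions; a real recognizer's front end does considerably more after this step.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a nonzero power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        # Twiddle factor e^{-2*pi*i*k/n} rotates the odd-index half.
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def power_spectrum(frame):
    """Hann-windowed power spectrum of one audio frame (bins 0..n/2)."""
    n = len(frame)
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]
    spectrum = fft([s * w for s, w in zip(frame, window)])
    return [abs(c) ** 2 for c in spectrum[: n // 2 + 1]]


if __name__ == "__main__":
    rate, n = 8000, 256  # assumed sample rate (Hz) and frame length
    # A 1 kHz tone sampled at 8 kHz lands in bin 1000 * 256 / 8000 = 32.
    frame = [math.sin(2 * math.pi * 1000 * i / rate) for i in range(n)]
    ps = power_spectrum(frame)
    print(max(range(len(ps)), key=ps.__getitem__))  # 32
```

Each butterfly is a handful of multiply-adds applied in parallel across the frame, which is why a SIMD floating-point engine (or a GPU) is a plausible fit for this part of the pipeline.
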
Re:Funny... by syukton · 2004-09-14 03:20 · Score: 3, Interesting

From what you describe, it isn't so much a speech recognition thing as it is a sound recognition thing; essentially, a way for a computer to logically distinguish between many millions of different sounds.

How far away are we from having a machine that could identify all of the instruments in a piece of music by "listening" to the music? I say "listening" because there need not be a physical playback-and-listen step; the playback could be modeled mathematically by the computer.

--
Reinvent the wheel only at either a lower cost, greater effectiveness, or your own personal enrichment and satisfaction.
Re:Funny... by Anonymous Coward · 2004-09-14 03:27 · Score: 1, Interesting

In the UK there is something similar, called Shazam, which works surprisingly well.
Re:Funny... by richy+freeway · 2004-09-14 03:27 · Score: 3, Interesting

We have something like that in the UK called Shazam.

Just dial a number on your mobile phone, hold it up to the speaker while the tune you want ID'd is playing, and it'll SMS you back shortly with the track name and artist. You can then log onto the Shazam website, enter your mobile number, and get a list of all the tracks you've searched for, along with links to an Amazon search so you can purchase the track.

Pretty good for ID'ing tracks when you're in a club and can't get to the DJ to hassle him. :P
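
To make the Shazam idea concrete, here is a toy version of the landmark-hashing approach that Shazam's Avery Wang described publicly in 2003: pair spectrogram peaks into (f1, f2, dt) hashes, then look for many hash matches that agree on the same time offset. This is a simplified sketch, not Shazam's implementation. It assumes peak picking has already happened (peaks arrive as hypothetical `(time_frame, freq_bin)` pairs), and the `fan_out` and `max_dt` values are arbitrary illustrative choices.

```python
from collections import Counter, defaultdict


def landmark_hashes(peaks, fan_out=3, max_dt=50):
    """Pair each (time, freq) peak with up to fan_out later peaks.

    Each pair becomes a hash (f1, f2, dt) anchored at the first peak's time.
    """
    peaks = sorted(peaks)
    out = []
    for i, (t1, f1) in enumerate(peaks):
        paired = 0
        for t2, f2 in peaks[i + 1:]:
            dt = t2 - t1
            if dt > max_dt:
                break
            if dt == 0:
                continue
            out.append(((f1, f2, dt), t1))
            paired += 1
            if paired == fan_out:
                break
    return out


class FingerprintDB:
    """Hypothetical in-memory catalogue of track fingerprints."""

    def __init__(self):
        self.index = defaultdict(list)  # hash -> [(track_name, t_in_track)]

    def add_track(self, name, peaks):
        for h, t in landmark_hashes(peaks):
            self.index[h].append((name, t))

    def identify(self, query_peaks):
        # A true match shows many hashes agreeing on one track-vs-query offset,
        # which is what makes the scheme robust to noise and missing peaks.
        votes = Counter()
        for h, tq in landmark_hashes(query_peaks):
            for name, tt in self.index.get(h, ()):
                votes[(name, tt - tq)] += 1
        if not votes:
            return None
        (name, _offset), _count = votes.most_common(1)[0]
        return name
```

Because only peak positions are hashed, a recording held up to a club speaker can still match even though most of the audio is noise, as long as enough peaks survive.
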
Re:Funny... by Anonymous Coward · 2004-09-14 06:14 · Score: 1, Interesting

It is speech recognition because they are trying to recognize a smaller subset of spoken syllables. Actually, I think it is half-syllables. There are apparently several hundred of these (complicated a little by dialects/Bush obviously).

1... million... DOLLARS!!! by AKAImBatman · 2004-09-14 02:56 · Score: 5, Interesting

Good use of $1 million?

Let me think for a moment... Hell yeah! If we had low power speech processors, the possibilities would be endless. For one, we'd finally have a Star Trek(TM) interface for our homes!

"Computer, lights!"
"Computer, make coffee!"
"Computer, Earl Grey, hot!"

As silly as it may sound, such an interface would be far more efficient than mashing buttons.

In addition, blind people could be significantly helped by this. Many of them already use speech recognition and synthesis to assist in computer usage. Imagine if their computers could suddenly understand them a thousand times better. They could talk to their computers a bit more naturally, thus saving their vocal cords from undue stress.

Other applications (off the top of my head) are:

- Voice notes on embedded devices (store only text!)
- Helpful Kiosks that can give you directions
- A new use for natural language database queries (e.g., ask the computer what last quarter's net sales were)
- Voice controlled robots ("You missed a corner, vacuum cleaner")
- Data search by voice ("Find me a channel that plays Star Trek")

Any other cool ideas out there?

--
Javascript + Nintendo DSi = DSiCade

Re:1... million... DOLLARS!!! by theparanoidcynic · 2004-09-14 03:05 · Score: 5, Interesting

Any other cool ideas out there?

Universal language translators. Imagine headphones that let you understand any known language.

--
Only in a Slashdot fantasy can a Slackware install turn into several hours of sex . . . . .
Re:1... million... DOLLARS!!! by AKAImBatman · 2004-09-14 03:14 · Score: 2, Interesting

It's not that hard. Have you ever seen those automatic coffee machines? You put a few quarters in, then punch a bunch of "options" buttons. A cup drops down and fills with coffee, cream, sugar, and any other options the machine offers.

The same could be done with tea. Just keep a reservoir of hot water, a stack of tea bags, cubes of sugar, and refrigerated lemons. When you order tea, the machine would inject the bag into the hot water stream, then drop the sugar and lemon into the tea.

Voila, Earl Grey, hot! ;-)

--
Javascript + Nintendo DSi = DSiCade
Re:1... million... DOLLARS!!! by AKAImBatman · 2004-09-14 03:21 · Score: 2, Interesting

Ah hah! Found one!

--
Javascript + Nintendo DSi = DSiCade

A measily $1 million? by Aggrazel · 2004-09-14 03:03 · Score: 2, Interesting

Imagine how much money could be saved if you could *perfect* speech recognition.

Heck, the hospital I used to work at by itself spent over a million dollars a year on medical transcriptionists ...

Good use of $1 million? by Threni · 2004-09-14 03:07 · Score: 3, Interesting

Depends. It's not as good as using it to prevent the deaths of thousands - possibly tens of thousands - of people by ensuring they have clean drinking water and shelter from the elements. But hey - you can't put a price on being able to speak to a computer rather than type when you're ordering a pizza.

History.. by SillyNickName4me · 2004-09-14 03:07 · Score: 4, Interesting

From 1994 to 1998 I did marketing and technical support for IBM's Voicetype Dictation products.

Initially, doing anything beyond understanding a few words took special hardware, but after a bit of 'training', highly accurate and fast speech-to-text was quite possible with a specially developed DSP.

Then the Pentium-class CPUs came along, and a P90 could do the whole thing without the DSP.

So now someone is developing a new dedicated piece of silicon for this... let's see how long it takes for general-purpose computers to catch up.

The issue is not that this isn't useful, but that it either has to keep developing, or offer a longer-lasting price/performance advantage or much better features for a long time to come.

Re:History.. by geordie_loz · 2004-09-14 03:30 · Score: 1, Interesting

I considered this too.. the article does address this however.

Small low-power units are useful for say a soldier's helmet, or in a PDA.

I'd also say that the same thing happened with 3D cards: they keep making them faster and with more features, but you could play Half-Life with software 3D on a 2.x GHz PC and have it look pretty much the same as it did on a Voodoo card back in the day.

The question is rather: will there be many future speed advances in hardware, or, once it's built, will later software recognition do just as well? It's a little like DVD decoder cards. I have an Encore card, but software decoding beats it now, and my DVD decoding doesn't need to be any faster.

I think what they're after is building some cheap (as) chips for embedded systems, like mobile phones and PDAs.

Better approach by Lord+Kano · 2004-09-14 03:08 · Score: 3, Interesting

Using specialised DSPs makes more sense to me than burning up generic CPU cycles. There have been many examples over the years of how a specialized DSP is more efficient and effective for a narrow task than a regular CPU. Look at portable MP3 players. They use tiny specialized DSPs to decode the files in a manner that is much more efficient than using a regular CPU.

We'll still need to do traditional development to interpret the data from the DSPs. We'll need to parse the output so that we can use natural commands to control devices.

"Coffee maker, brew 10 cups, strong."
"Bathroom lights, on."

Without some manner of AI to interpret them, these phrases will be useless.
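
Lord Kano's point is that recognized text still has to be turned into a device command. Real language understanding is far beyond this, but as a minimal sketch, assuming the user sticks to the rigid "device, action [N unit], modifiers" pattern of his two examples, a rule-based parser might look like the following. The `Command` type, the `KNOWN_DEVICES` registry, and the comma-separated grammar are all hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

KNOWN_DEVICES = {"coffee maker", "bathroom lights"}  # hypothetical device registry


@dataclass
class Command:
    device: str
    action: str
    quantity: Optional[int] = None
    unit: Optional[str] = None
    modifiers: List[str] = field(default_factory=list)


def parse_command(text):
    """Parse 'Device, action [N unit], modifier, ...' into a Command, or None."""
    parts = [p.strip().lower() for p in text.strip().rstrip(".!").split(",")]
    if len(parts) < 2 or parts[0] not in KNOWN_DEVICES:
        return None
    device, action_phrase, *modifiers = parts
    # Pull an optional "<number> <unit>" out of the action, e.g. "brew 10 cups".
    m = re.fullmatch(r"(\w+)(?:\s+(\d+)\s+(\w+))?", action_phrase)
    if m is None:
        return None
    verb, qty, unit = m.groups()
    return Command(device, verb, int(qty) if qty else None, unit, modifiers)


if __name__ == "__main__":
    print(parse_command("Coffee maker, brew 10 cups, strong."))
    print(parse_command("Bathroom lights, on."))
```

The brittleness is exactly his point: rephrase the first request as "make ten cups of strong coffee" and this parser returns `None`, which is where something smarter than pattern matching has to take over.
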

LK

--
"Hi. This is my friend, Jack Shit, and you don't know him." - Lord Kano

Yay! Boo! Uh... Oh bugger.... by MooseByte · 2004-09-14 03:08 · Score: 4, Interesting

From the blog: ''Homeland security applications are the big reason we were chosen for this award,'' says Rutenbar. ''Imagine if an emergency responder could query a critical online database with voice alone, without returning to a vehicle, in a noisy and dangerous environment. The possibilities are endless.''

Like some slight tweaking in order to deploy massive voiceprint-recognition silicon arrays for amazingly efficient automatic realtime conversation transcription and identity determination, attached to Echelon.

So cool... so potentially evil... head begins to hurt... tinfoil hat burning....

Pretty Ambitious, Harder than it sounds by Anonymous Coward · 2004-09-14 03:12 · Score: 5, Interesting

Although $1 million can speed things up significantly, this is a pretty ambitious undertaking.

My Master's research was on implementing machine learning in hardware, specifically support vector machines.

Now, they have much more money than I did, and probably this will be a collaboration involving many graduate students, but converting complex algorithms from software to hardware is no easy task.

It is just easier to do things in software, that's why it has evolved. The modular layers of abstraction allow a Computer Scientist working in machine learning or speech recognition to not have to worry about how the underlying hardware works.

Working in hardware, you come face to face with a lot of these issues, particularly since you want an entire architecture on a chip, whereas a conventional desktop/server system has resources such as lots of RAM and hard drive space available, with interconnections that have been built and refined over decades.

Throw in concerns about small form factor and low power consumption, and quite fast a lot of unexpected hurdles pop up.

My master's research goal was to produce a data mining/machine learning machine, or at the very least a data mining/machine learning co-processor. In retrospect, that was a very ambitious goal that would require many years of work, probably in collaboration with other graduate students.

What I ended up doing was just Support Vector Machines in digital hardware. Granted, there is another aspect of my research that I'm not mentioning here, namely that I didn't use a normal floating-point arithmetic architecture, but a different, innovative, logarithm-based one. That in itself was a significant undertaking.
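
The commenter doesn't describe his architecture, so the following only illustrates the general idea behind a logarithmic number system (LNS): store log2 of a value, so multiplication and division become addition and subtraction, while addition needs a correction term that hardware typically approximates with a lookup table. This sketch is an assumption-laden simplification: it handles positive values only and keeps the logs in floating point, whereas a hardware LNS unit would use fixed-point logs and a sign bit. The dot product at the end stands in for the kernel evaluation a linear SVM performs.

```python
import math


def to_lns(x):
    """Encode a positive real as its base-2 logarithm."""
    if x <= 0:
        raise ValueError("this sketch only encodes positive numbers")
    return math.log2(x)


def from_lns(e):
    return 2.0 ** e


def lns_mul(a, b):
    # log2(x * y) = log2(x) + log2(y): the multiplier becomes an adder.
    return a + b


def lns_div(a, b):
    # log2(x / y) = log2(x) - log2(y)
    return a - b


def lns_add(a, b):
    # log2(x + y) = hi + log2(1 + 2^(lo - hi)); hardware usually tabulates
    # this correction term instead of computing it directly.
    hi, lo = max(a, b), min(a, b)
    return hi + math.log2(1.0 + 2.0 ** (lo - hi))


def lns_dot(xs, ws):
    """Dot product of positive vectors computed entirely in the log domain."""
    acc = None
    for x, w in zip(xs, ws):
        term = lns_mul(x, w)
        acc = term if acc is None else lns_add(acc, term)
    return acc


if __name__ == "__main__":
    prod = lns_mul(to_lns(3), to_lns(4))
    print(round(from_lns(prod), 9))  # 12.0
```

The appeal for silicon is that multipliers, which dominate area and power in an SVM datapath, reduce to adders; the cost moves into the addition correction table.
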

In any case, this sounds like a great project; I just wonder how much they can do in their (in an academic sense) very small time frame of 2-3 years, even though a lot of preliminary work has probably already been done just to apply for the grant.

In any case, it is great to see something like this, something to keep in mind in case I ever go back for a Ph.D.

You bet it's worth it by Tairnyn · 2004-09-14 03:19 · Score: 3, Interesting

Once this technology has matured and some more headway can be made in Natural Language Processing, (uncertainty for teh win) we'll be on the cusp of some really excellent improvements in human-computer interfaces. It's becoming more common to see 'intelligent' systems being built to mirror the architecture of the human nervous system. This will be a necessary step to forming a generally proficient AI system. The day a computer can readily recognize you're being sarcastic, it's time to be paranoid.

--
"Don't waste your time or time will waste you" -MUSE

brains are and probably should be modular by deathcloset · 2004-09-14 03:21 · Score: 2, Interesting

This sounds like a great idea. Sometimes a hammer works better than a screwdriver for a given task. Not all jobs can be performed equally well by a single tool or method.

After all, the human brain has different areas for processing different types of stimuli.

In fact, some parts of our brain are so radically different they are almost considered brains of their own.

Take the cerebellum, often referred to as "the small brain". It controls motor coordination, and in humans it allows us to do amazing things like flips, kung-fu, and cup-stacking.

And forgive me for forgetting the exact names, but the brain has layers as well. The outermost layer is the cortex, where most of the higher-level mammalian processing takes place (correct me if I'm wrong, but the frontal lobe is pretty much purely cortical tissue). As you delve deeper, you get into the hippocampus and the medulla-whatever (sorry, IANAN: I Am Not A Neurologist), which is where emotion rules and which, if I remember correctly, is sometimes referred to as the "reptilian" brain.

Even the eyes themselves can almost be considered little 'brains' of their own, considering the amount of pre-processing they do (maybe "co-processor" would be more accurate).


The UN would probably use this heavily by ARRRLovin · 2004-09-14 03:24 · Score: 2, Interesting

With the advent of hardware speech recognition, hardware speech translation is just the next step in the evolution. Imagine being able to go to any country in the world with just an iPod-sized device and a Bluetooth hearing aid as a translator.

--
-Randy

Live Chat & Search by LionKimbro · 2004-09-14 04:08 · Score: 2, Interesting

With voice software, you can already speak in real-time, conference style. I think Skype supports 5 people.

With speech-to-text, you could log all conversation to IRC.

Then you could have search engines that search *all conversation within the last 5 minutes, world-wide.*

Well, at least all conversation that was okay with being public.

So you could say, "Show me all conversations that are going on right now about Python," and immediately find the people talking about Python, wherever they were.

One step towards the HiveMind.
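
As a minimal sketch of the "last 5 minutes" search imagined here, assuming speech has already been transcribed into timestamped text from speakers who opted in to being public, an in-memory inverted index can map each word to the utterances containing it and drop anything older than the window. The `TranscriptIndex` class and its methods are hypothetical; a real world-wide service would also need persistence, sharding, and ranking.

```python
import re
import time
from collections import defaultdict, deque


class TranscriptIndex:
    """Inverted index over timestamped utterances, keeping only a recent window."""

    def __init__(self, window_seconds=300):
        self.window = window_seconds
        # (uid, timestamp), oldest first; assumes utterances arrive in time order.
        self.order = deque()
        self.postings = defaultdict(set)  # word -> set of utterance ids
        self.by_id = {}                   # uid -> (timestamp, speaker, text)
        self.next_id = 0

    @staticmethod
    def _words(text):
        return set(re.findall(r"[a-z0-9']+", text.lower()))

    def add(self, speaker, text, ts=None):
        ts = time.time() if ts is None else ts
        uid = self.next_id
        self.next_id += 1
        self.order.append((uid, ts))
        self.by_id[uid] = (ts, speaker, text)
        for w in self._words(text):
            self.postings[w].add(uid)
        self._expire(ts)

    def _expire(self, now):
        # Drop utterances that fell out of the window, along with their postings.
        while self.order and self.order[0][1] < now - self.window:
            uid, _ = self.order.popleft()
            _, _, text = self.by_id.pop(uid)
            for w in self._words(text):
                self.postings[w].discard(uid)
                if not self.postings[w]:
                    del self.postings[w]

    def search(self, word, now=None):
        """Return (timestamp, speaker, text) hits for word, oldest first."""
        now = time.time() if now is None else now
        self._expire(now)
        return sorted(self.by_id[uid] for uid in self.postings.get(word.lower(), ()))
```

Expiring on every add and search keeps the index bounded to roughly one window's worth of speech, which is what makes "right now" queries cheap.
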

Re:Carnivore on telephones by Anonymous Coward · 2004-09-14 04:17 · Score: 1, Interesting

At least they admit it in the article:

... may revolutionize the way humans communicate and have a significant impact on America's homeland security.

Why doesn't this kind of thing bother more people?

national security? by bob_jenkins · 2004-09-14 06:10 · Score: 2, Interesting

Why are they talking about querying online databases for 911 calls as the national security app? It's obvious the national security app is to translate every single phone call to text and store them (indexed) in a classified database. I've attempted to believe the US wouldn't do this because it's illegal, but I can't manage to suspend disbelief. The only way to avoid this is if phone calls are encrypted and the US doesn't have the keys.

23 of 328 comments shown