Neural Net Outperforms Human in Speech Recognition

HAL-9000 by centron · 1999-10-01 01:24 · Score: 1

We still have a full year in which to make a fully functional HAL-9000. Based on his neurosis, I imagine he must be a Neuron computer. So let's get cracking! All we need are a few million neurons, some clear plastic holographic hard drives, and infrared fish cameras. Then train, train, train! It would be a rare first for technology to achieve the level of sci-fi by the time sci-fi predicts it will exist.

--

XeoMage

Re:HAL-9000 by Yarn · 1999-10-01 01:41 · Score: 1

Nope, HAL has already been built in the '2001' reality, as he spent 5 (?) years learning how to run a space ship, reading how to kill people on the 'net and singing 'Mary has a little lamb'.

So unless he's been built secretly in this reality it's too late.

--
-Yarn - Rio Karma: Excellent

Re:Is everybody a pessimist?? by Anonymous Coward · 1999-10-01 01:24 · Score: 0

Of course everyone's a pessimist, this is /. it's a required trait--either that or a cynic

;)

Cheers

Re:Article misses biggest (and scariest) use ... by ian stevens · 1999-10-01 01:25 · Score: 1

"They could never do this with a tape recorder or a digital sampler because it would never recognize the person's voice! It would just record everything!"

Tape surveillance and a human filter can only go so far. Parabolic antennae are bulky and easily recognized. Humans using digital filters require training and equipment booking, which could be expensive depending on the amount of noise in the recording and the expertise of the user.

These speech recognition devices could put surveillance within reach of anyone, regardless of their expertise, without hiring potentially expensive human filters. You or I, without any surveillance experience or knowledge of digital filtering technology, could trail certain people and tape their conversations even in crowded areas.

--
ian

How does this new technique affect learning time? by astrashe · 1999-10-01 06:14 · Score: 1

How does this new technique affect the amount of time it takes for a network to learn?

Re:The scariest thing... by Millennium · 1999-10-01 06:16 · Score: 2

Actually, it looks as though this thing doesn't understand language better than humans (if at all). All it can do is pick out the sounds and form them into words. It still does not know what the words mean.

In essence, what was created is little more than a super-hearing-aid. Certainly a good thing for the hard-of-hearing (and this one, it would seem, could significantly boost the hearing of anyone, even those with "normal" hearing to start).

Re:Which human? by Anonymous Coward · 1999-10-01 06:17 · Score: 0

Fred Loki Qwertyfaster

Re:How to extend this: by TheDullBlade · 1999-10-01 06:27 · Score: 1

Actually, I suspect this is rather closely related to how the human brain works in some cases, but perhaps more efficient due to some shortcuts we can take. How could consciousness work without some form of broadcast? Somehow, thoughts are being moved around so different parts all over your brain are hearing the same things.

There is some very definite predefined structure to the brain between the gross and cellular anatomy. It isn't just a raw neural net that is trained by physical pleasure and pain. Somehow, consciousness, intent, recognition of success and failure are built in at the highest level and recognition of the same sound in different pitches and visual recognition independent of image location on the retina are built in at a low level.

I suspect if you could read and analyse all of the connections, it wouldn't look like one big mess of seemingly random connections, but a lot of small neural nets arranged and interconnected in hierarchies, sorting pipes, buses, and the like (which aren't especially neural mechanisms, they're probably better done in the familiar ways we've developed that suit silicon).

I agree about using more neural nets on the problem. I never meant to imply that it would be anything more than a single component in a larger system. At some point in the process, you need to recognize sounds, whatever you do with them later.

What I dislike is the way some people treat neural nets as a magic bullet, as if we only need to make a big enough neural net and it will solve any problem. I think only small neural nets really work well; beyond a few dozen neurons, an external structure is needed to get anything to work.

(IMHO, the most important thing for *-recognition programs to start doing is admitting that they didn't understand and asking for clarification; "best guess" is not a good strategy)

--
/.

Re:The scariest thing... by Anonymous Coward · 1999-10-01 06:43 · Score: 0

Unfortunately, no one can be told what the Matrix is, you have to see it for yourself.

(now available on DVD at a store near you!)

The imperative nsa connection by Hobbex · 1999-10-01 07:11 · Score: 2

And how many still believe that Echelon is not capable of recognizing words in conversations automatically?

-
/. is like a steer's horns, a point here, a point there and a lot of bull in between.

Re:The imperative nsa connection by KFury · 1999-10-01 10:48 · Score: 0

The point is that Echelon wouldn't have to have 99% accuracy to be devastatingly effective. Even if it only got 10% of the words, it could still generate a pattern of conversational content, and tag that line for human analysis.

More useful than tracking who says what is simply tracking who calls who and just a few bits of extra information, such as whether the tone was serious or humorous and the duration of the call.

This in and of itself could be formed into a neural net that would give devastatingly accurate insights into how information is flowing, and easily light up potential security risks.

The government does this now, as is noted in an 8/26/99 story in the San Jose Mercury News.

It's not the specific capabilities of science that'll change the world, it's the pervasiveness of it.

--

Kevin Fox

Moderators: score this UP, not down by Anonymous Coward · 1999-10-01 07:18 · Score: 0

This post hits THE point.

Pulling out 1 word out of 4 possibles is NOT hard, when you know those are the only possibilities. So what USC have achieved is NOT impressive.

Dragon was pulling the word out from 60,000 possibles. That is FAR harder, and MUCH more impressive.

Moderators, get a clue.

Bitchin' Beowulfs by Anonymous Coward · 1999-10-01 07:21 · Score: 0

A bitchin' Beowulf cluster of Alphas (with Dvorak keyboards) would romp ass over any friggin' "neural net"...

Who's faster?
* John C. Dvorak
* Qwerty Berst

This Kills Intel -- and AMD by Monir · 1999-10-01 07:26 · Score: 1

This breakthrough is a complete disaster for Intel -- they have been struggling to find a valid reason for people to upgrade to Pentium IIIs. Witness the absurd marketing attempt to convince people that they need Pentium IIIs and Intel's WebOutfitter service to really enjoy the Internet -- when all it really takes is a computer with a decent graphics card and reasonable Internet connection. Video decompression and game playing are tasks which a Pentium III can legitimately improve -- but only a limited subset of the population cares about those tasks. The real "killer app" was speech recognition -- which until today appeared to require tremendous amounts of processing power. Speech recognition is something virtually everyone could use -- and these two guys have just disproven the assumption that vast processing power is required.

Monir

Re:This Kills Intel -- and AMD by Zurk · 1999-10-01 08:20 · Score: 2

nope. they have *NOT* done any such thing. They used an 11 node neural net for recognising 4 words...which is all well and good until you reach reality (10,000 words) and continuous (rather than the discrete stuff they were doing) language processing. In that environment your pentium iii, K7 or even alpha isn't up to the task. Note that the 20 MILLION dollar electronics in the eurofighter are so far the only platform for recognising language independent speech with nearly 100% success rates in real time (what? you didn't know the eurofighter has speech recognition? now you do).

Re:Lies, Damned Lies, and Neural Nets by Mr. Mikey · 1999-10-01 07:41 · Score: 1

That's an awful lot of bitterness for what amounts to a signal processing technique. Besides, last I heard there were quite a few successful applications for neural networks - credit card application processing and financial analysis, to name two.

--
wants to be the first monkey to touch the monolith

Re:Call me crazy... by mizerai · 1999-10-01 01:25 · Score: 2

This may sound like just a bunch of preachy BS, but it's very disturbing...

No, it's just preachy BS.

...and the creator of the first one that says "I'm Sorry Dave" will think he is god.

It's obvious you are just tripping. The creator of the first one that says "I'm Sorry Dave" will think he is Dave, or that he has been mistaken for someone who IS Dave.

...people were glorifying themselfs, when they were really just mimmicking what they already saw with the human brain.

As far as "glorifying themselfs" for mimicking what they already saw with the human brain, WTF are you talking about? Of course they're proud of themselves! They did something with a computer that nobody had ever been able to do before. That right there is fucking cool! I'm proud of them too, and I go to UCLA! (the researchers in question were from USC, in case anybody missed that)

--

--Mizerai

Re:Is everybody a pessimist?? by Angst Badger · 1999-10-01 01:27 · Score: 2

There is more than just a little Big Brother possibility to this. If this technology actually works as advertised, it eliminates the last technical barrier preventing governments from monitoring all voice communications all the time. Heretofore, this was not practically possible because of the manpower required to listen to millions of voice calls; this technology will make it possible to search for key phrases in real time as well as to archive millions of calls efficiently. The fact that it is apparently both cheap and simple only makes things worse.

Almost equally disturbing is the apparent ability of the Berger-Liaw system to distinguish individual voices from background noise, which raises the specter of governments being able to use almost unimaginably faint sounds to avoid more intrusive methods of bugging, and the monitoring of conversations in crowds. Combine that with existing off-the-shelf technology for face recognition...

Let's just say that I will be very surprised if the first customers for this technology aren't in Beijing and even more surprised if they aren't quickly followed by the dolts in Washington.

And hey, if I can reconstruct what you say inside your home from the weak sound waves that drift out into the street, that might not even require a warrant...

--
Proud member of the Weirdo-American community.

Re:I get the impression by Cassandra · 1999-10-01 01:30 · Score: 2

The net has to know what it is listening for inside of the noise before it can actually pick it out.

NO, IT DOES NOT!!

It all comes down to statistics. Speech is a non-white signal. Noise is white. If you have two microphones/ears, you simply search for the linear combination of the two signals that is the most un-correlated temporally, and voilà! You have found the speech signal. This is known as blind separation.

All this and a counting horse by An El Haqq · 1999-10-01 01:30 · Score: 5

It's difficult to evaluate this system given the sparse amount of information available. I, for one, am incredibly skeptical at this point.

a) There is no statement of the train/test procedure for the neural net. It's fairly easy to get good performance if you're training your system on the same dataset that you test. Without this information, you cannot make a reasonable judgement.

b) If you listen to the audio samples in the video at
http://www.usc.edu/ext-relations/news_service/real/real_video.html

You can notice a significant difference in the times of the samples (e.g. "stop" is shorter than "yes"). A fairly unsophisticated NN can pick up on the length of a sound sample and generalize from there. I didn't hear any statement saying that in the official training and testing all sound samples were of the same length.
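
Point (a) above can be illustrated with a minimal sketch. Nothing here comes from the USC work: the data is random, the `Memorizer` "model" is a hypothetical stand-in that simply stores its training set, and the four labels are only borrowed from the demo. It shows why a score measured on the training data says little on its own.

```python
import random

WORDS = ["yes", "no", "fire", "stop"]

def make_samples(n, seed):
    """Hypothetical data: random feature tuples with random labels (no real signal)."""
    rng = random.Random(seed)
    return [(tuple(rng.random() for _ in range(4)), rng.choice(WORDS))
            for _ in range(n)]

class Memorizer:
    """Stand-in 'model' that just stores every training example it sees."""
    def fit(self, samples):
        self.table = dict(samples)

    def predict(self, features):
        # Unseen input: fall back to a fixed guess.
        return self.table.get(features, "yes")

def accuracy(model, samples):
    hits = sum(model.predict(x) == y for x, y in samples)
    return hits / len(samples)

data = make_samples(400, seed=0)
train, test = data[:300], data[300:]   # hold out the last 100 samples

model = Memorizer()
model.fit(train)
train_acc = accuracy(model, train)     # scored on data the model has seen
test_acc = accuracy(model, test)       # scored on held-out data
print(f"train accuracy: {train_acc:.2f}, held-out accuracy: {test_acc:.2f}")
```

The memorizer is perfect on its own training set and no better than a fixed guess on the held-out part, which is why the missing train/test description matters.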

It's really a mess. If someone has a journal article or other piece of reliable information on this research, a pointer would be appreciated. Until then, I'll be feeding Clever Hans.

Re:All this and a counting horse by KFury · 1999-10-01 10:54 · Score: 3

All very good points.

What I noticed (and makes me wish they actually had a technical paper linked to the article to appease my methodological curiosity) is that the 'random background noise' was exactly the same for each word in a given round of testing.

If they were training by those samples, the entire story is bogus because the pure, unmasked original word could be extrapolated by taking one sample, inverting the wave, and adding a second sample.

To put it another way, the net wouldn't be learning how to interpret the word "no" or "fire" in a crowd. It would be learning how to understand that particular sound bite of cocktail party babble and be able to distinguish in what way the original cocktail party sound was modified.

This is completely useless because you'll never have a need (or the opportunity) to have two (or four) different words masked over the exact same soundwave. The background noise will always be different from sample to sample in a real world test.
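
The cancellation argument above can be sketched in a few lines, assuming integer samples (as in PCM audio) and made-up waveforms; `word_a`, `word_b` and the noise are invented for illustration. Strictly, inverting one sample and adding the other leaves the difference of the two words rather than one clean word, but the shared babble drops out completely either way.

```python
import random

rng = random.Random(1)
n = 8

# The same "cocktail party" recording is mixed into both samples.
noise = [rng.randint(-1000, 1000) for _ in range(n)]

# Two made-up word waveforms (hypothetical values, not real speech).
word_a = [0, 1, 2, 1, 0, 0, 0, 0]
word_b = [0, 0, 0, 3, 1, 3, 0, 0]

sample_a = [w + x for w, x in zip(word_a, noise)]
sample_b = [w + x for w, x in zip(word_b, noise)]

# "Invert the wave" of sample_a and add sample_b.
residual = [b - a for a, b in zip(sample_a, sample_b)]

# What is left depends only on the two words, not on the noise.
expected = [b - a for a, b in zip(word_a, word_b)]
print(residual == expected)
```

With different noise in each sample, as in a real room, the subtraction would leave the difference of the two noise recordings in the residual as well.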

--

Kevin Fox
Re:All this and a counting horse by geoGIF · 1999-10-01 04:13 · Score: 2

This reminds me of a story I once read about how careful you must be when training a NN. Some researchers tried to use NNs to recognize tanks in aerial photographs. They had a nice sample data set with pictures of the same terrain; the tanks were there in one set of photographs and not there in the other. They trained the NN by feeding it digitized versions of the photographs, and it returned a binary result: tanks or no tanks. When it got it right, it was reinforced; when it got it wrong, it wasn't. After extensive training, the NN was performing beyond expectations. It was producing the correct answer nearly 100% of the time. So, they moved beyond their test photographs, and tried some real photographs from the field. The NN failed miserably; it got no better results than tossing a coin. The researchers eventually determined that their test data had been photographed on two different days. On one day it was cloudy, on the other it was clear. The NN had been trained to distinguish between photographs taken on cloudy days vs. photographs taken on clear days.

Is it really recognizing speech? by My Third Account · 1999-10-01 01:31 · Score: 1

From watching the realmedia thing, it seems like the sound sample it gave for recognizing the four words in random order with noise played a word, then silence, then word, etc.

Couldn't the system simply be detecting the length of the signal and interpreting it that way? With so few sample words it seems hard to tell how the thing is really working -- something which is not really explained.
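
To make the worry concrete, here is a rough sketch of a "recognizer" that looks at nothing but clip length. The durations are invented for illustration (they are not measurements from the USC samples), and the function names are hypothetical.

```python
def train_length_classifier(examples):
    """examples: list of (duration_ms, word). Returns the mean duration per word."""
    totals = {}
    for duration, word in examples:
        total, count = totals.get(word, (0, 0))
        totals[word] = (total + duration, count + 1)
    return {word: total / count for word, (total, count) in totals.items()}

def classify_by_length(means, duration):
    """Pick the word whose mean duration is closest to the clip's duration."""
    return min(means, key=lambda word: abs(means[word] - duration))

# Hypothetical clip lengths in milliseconds, made up for this example.
training = [
    (220, "no"), (240, "no"),
    (310, "stop"), (330, "stop"),
    (450, "yes"), (470, "yes"),
    (600, "fire"), (620, "fire"),
]

means = train_length_classifier(training)
print(classify_by_length(means, 455))
```

If the four words really do differ this much in length, a classifier like this never has to hear anything, which is why equal-length test clips (or a report of clip lengths) would be needed to rule it out.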

Maybe I'm just way off tho.

Re:Is it really recognizing speech? by Anonymous Coward · 1999-10-02 02:25 · Score: 0

I agree. Let's see the same test with the words very, berry, and cherry.

Re:Long way to go, but cool for AI by Kythe · 1999-10-01 01:32 · Score: 1

On the other hand, this could be a great leap for neural networks in general. Realizing that the timing of synapse signals is a critical factor in neuron firing is going to shake up some things in AI

This is my feeling, too. This might be just the tip of the iceberg.

Can anyone see how something like this could be made using software?

Kythe
(Remove "x"'s from

--

Kythe

This is cool but... by chandoni · 1999-10-01 01:32 · Score: 2

I notice it's tested on "just a few spoken words". Even though researchers claim better performance than current systems on this small test set, I'd like to see how well it scales. Training a net with 11 neurons and 30 connections could be done using a lot of algorithms; some do not scale well at all to larger networks.

I'm more concerned that USC is trying to patent the "system and the architectural concepts on which it is based". As a computational biologist who uses neural nets in my work, I rely on the AI community to develop the underlying algorithms. If they get a patent on the algorithm and not just their hardware, that would severely limit the use of this breakthrough in other scientific areas.

JMC

Re:Long way to go, but cool for AI by Admiral Burrito · 1999-10-01 01:37 · Score: 1

This 11 neuron system is capable of differentiating four words, each of which was trained extensively. That's a very tight niche. Until we have a system where each word doesn't have to be trained explicitly, we won't have gotten too far. (Imagine training your computer with the estimated 1+ million English words...)

Doesn't the English language use only a few dozen sounds ("phonemes" or something)?

Once you can recognize those sounds I'm pretty sure it's easy to convert a list of those sounds to a written sentence. I'd bet it could be done in under 200 lines of Perl. :)

But I'm no speech recognition expert.
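
For what it's worth, the lookup step imagined above might look like the following sketch (in Python rather than Perl). The `LEXICON` is a tiny hand-written table using ARPAbet-style symbols, and `phonemes_to_words` is a hypothetical helper; it assumes the phonemes arrive already recognized and correct, which is the hard part.

```python
# Tiny hand-written pronunciation table (an assumption, not a real dictionary).
LEXICON = {
    ("Y", "EH", "S"): "yes",
    ("N", "OW"): "no",
    ("S", "T", "AA", "P"): "stop",
    ("F", "AY", "ER"): "fire",
}

def phonemes_to_words(phonemes):
    """Greedy longest-match lookup of a phoneme list against LEXICON."""
    words = []
    max_len = max(len(key) for key in LEXICON)
    i = 0
    while i < len(phonemes):
        # Try the longest possible chunk first, then shorter ones.
        for length in range(min(max_len, len(phonemes) - i), 0, -1):
            chunk = tuple(phonemes[i:i + length])
            if chunk in LEXICON:
                words.append(LEXICON[chunk])
                i += length
                break
        else:
            # No entry matches: mark the phoneme as unknown and move on.
            words.append("<?>")
            i += 1
    return words

print(phonemes_to_words(["N", "OW", "S", "T", "AA", "P"]))
```

A greedy match like this has no way to pick between homophones or to recover from a misheard phoneme, so real systems need some model of which word sequences are likely.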

This has been done before... by Anonymous Coward · 1999-10-01 01:41 · Score: 0

but not this well.

From what I remember from my NN class, there have been spike train networks for at least a decade. They're mostly relegated to the backs of the NN books, though.

I'm pretty skeptical about this as being a major breakthrough until I see an algorithm or code. It looks promising though & demonstrates that these networks deserve more research.

What's on top of houses? Dog: Ruff (C'mon...) by Rares Marian · 1999-10-01 08:18 · Score: 1

A 4004 could do that, just takes a week :)

All it did was match her voice recording of her saying his name as key to a database against her saying his name. Sheesh, hype hype hype hype hype hype.

--
The message on the other side of this sig is false.

Patent? by TheKodiak · 1999-10-01 00:17 · Score: 3

It said they'll apply for a patent - I wonder how much the patent will cover. I really hope they don't manage to get a patent covering the use of temporal information in neural networks as a whole - ordinarily, I'd assume they wouldn't, but given some recent patents, I tend to worry.

--
-=Best Viewed Using [INLINE]=-

Re:Patent? by chazR · 1999-10-01 02:08 · Score: 1

I am not a lawyer but...

My understanding of patent law is that a patent isn't just for a device, but for a use of that device. Please correct me if I'm wrong.

Given that this idea (temporal information in neural networks) has so many really cool possible applications, they'd have a very difficult job patenting all the uses.

I can see this being useful in just about any real-time control system, such as autopilots, assembly lines, controlling the temperature of your shower. Anything involving streaming data really.

Actually, the more I think about it, the more *really* crazy ideas for this I come up with. I've got a problem with automatic garbage collection in a system at the moment where this might help... Oh, dear - they probably can't patent it for that now. Whoops.

Incidentally, are there any regular /. readers who are lawyers? Should we get some?
Re:Patent? by Brecker · 1999-10-01 10:41 · Score: 1

If USC wins a patent for the "underlying architectures of this new technology," what are the chances that thought will require a license? Last I checked, my brain uses neural networks with dynamic timing. I pledge to be the first to defy my cease and desist order.

My children will not have their brains surgically removed at differentiation to avoid infringing USC's patent.
Re:Patent? by trelyle · 1999-10-01 12:58 · Score: 1

We better pay real close attention to what happens with this patent idea. As computing becomes more heuristic, we have more and more of this type of technology to deal with. Technology is becoming more powerful by leaps and bounds; is it true that absolute power corrupts absolutely?

--
"A society that will trade a little liberty for a little order will lose both, and deserve neither. " Ben Franklin
Re:Patent? by HiThere · 1999-10-01 03:28 · Score: 1

How about desk calendars, meeting makers, etc. You can do anything with a neural net that you can do with a different kind of computer. It just isn't usually worthwhile. And it doesn't really sound like this is any kind of exception.

That won't invalidate the patent, of course.

--

I think we've pushed this "anyone can grow up to be president" thing too far.
Re:Patent? by Cassandra · 1999-10-01 00:45 · Score: 1

I really hope they don't manage to get a patent covering the use of temporal information in neural networks as a whole - ordinarily, I'd assume they wouldn't, but given some recent patents, I tend to worry.

I'd be surprised, since the idea is not totally new. There was even talk about giving a course for PhD students at my university (Linköping) this spring on the topic (Temporal coding in ANN:s). Sadly, since we were only two people applying for the course, it was never given :-(
Re:Patent? by Anonymous Coward · 1999-10-01 23:47 · Score: 0

Neural networks using pulses are not new. MIT Press has a nice book on the subject:
http://mitpress.mit.edu/book-home.tcl?isbn=0262133504
They may have something new with their training procedure.
BTW, when are people going to start complaining about publicly funded research being patented and used for commercial gain?

Re:Micheal is going to get you!#$^ by Anonymous Coward · 1999-10-01 08:19 · Score: 0

I believe that the point was that this could potentially make wiretapping and so forth cheaper by several orders of magnitude. And simple economics suggests that when something gets that much cheaper, people may use it more. So the real fear is that if it's that easy/cheap, then the government will be able to use it in far more situations.

Missing the point by CR0 · 1999-10-01 08:22 · Score: 1

I think many people are missing the point.
Ok, ya, this thing can do voice recognition. But the advantage of that is not so that I can dictate this speech instead of typing it (although I would); the advantage is "changing the different user interface".

Right now, computers are very specific in their instruction taking. Click a little to the left of a button, and the computer has no idea of what you are doing. Type copy instead of cp into most Unixes and it won't even copy the dumb file.

If this neural net can distinguish my voice from others in a room, I can talk to it. "Computer, check all TV channels for the baseball scores and display on this screen please."

Not only does this make the computer easier to use, and more usable for more people, it makes it more useful. ***It frees us from sitting at a workstation.*** Notice in Star Trek they can say, "Computer, what is the population of Earth?" and it will respond.

Granted, Artificial Intelligence must be improved to allow the computer to understand this instruction, but the voice communication is ESSENTIAL.

I look forward to the days when I can chat with my computer.

Re:Missing the point by Anonymous Coward · 1999-10-01 13:04 · Score: 0

> Not only does this make the computer easier to use, and more usable for more people, it makes it more useful. ***It frees us from sitting at a workstation.*** Notice in Star Trek they can say, "Computer, what is the population of Earth?" and it will respond.

Not quite. There will always be benefits to being able to sit in front of a workstation: typing is a lot faster than speaking, and controlling things is still easier by hitting a key on a keyboard. Using your Star Trek example, they still sit in front of a display, and do (weird) kinds of typing. Not that speech recognition would not be used, just that it will never fully replace sitting in front of a terminal. The only thing I can think of that could replace them all is brain implants: you just think and poof, it's done... :)

Morons still my favorite by Raul Acevedo · 1999-10-01 00:18 · Score: 2

I wonder what a neural net made of bogons, morons and vogons would be like?
----------

--
In a real emergency, we would have all fled in terror, and you would not have been notified.

Re:Morons still my favorite by Sehnsucht · 1999-10-01 01:43 · Score: 1

Whoever moderated this as Offtopic should be slapped! It's funny! Give it the correct description, you ah... something or other!

sprechen mit dem Computer by RoLlEr_CoAsTeR · 1999-10-01 08:38 · Score: 1

I can see this capability being put to great use, because of all the places where there can be noise, and having a computer with speech recognition abilities like this would be very helpful as they could automate more things by speaking to the computer.. each person having their voice recognized by the computer (and, just for insurance, having at least dual processors, if not a few more), and each person yapping their commands to the computer and getting instant results...

Imagine calling your computer up with a phone.. any phone, a pay phone, etc.. and getting it booted up and ready for you.. how convenient!

--

Insert mind here.

Oh great, just what the world needs. by Pont · 1999-10-01 00:18 · Score: 3

I know speech recognition seems cool and it will be very good for the disabled, but it's not a purely good thing.

Now, instead of requiring at least 2 people to invade your privacy and listen to everything you say, one supercomputer and a bunch of listening devices let The Man (tm) listen to thousands of people at once and scan the transcripts for keywords and sentences.

Re:Remember in _Snow Crash_... by Anonymous Coward · 1999-10-01 08:58 · Score: 0

Actually, I used to (still do, I guess) know someone at Bell Atlantic (formerly Nynex) who was involved in developing their voice recognition stuff. However it was all server-side, and the hardware-vendor they used was pretty unreliable, IIRC. General Magic has been working on this type of stuff too, c.f. Portico, myTalk, etc. Again, limited word choice, but ability to distinguish words in noisy environments. Has anybody thought of using this to improve acoustic couplers? The main problem is background noise, right? And only two "words"... --Josh 0schrier_j@spcvxa.spc.edu/schrier@qtp.ufl.edu

I get the impression by konstant · 1999-10-01 00:19 · Score: 5

I get the impression that this net did not perform better "even" under noisy conditions, but "only" under noisy conditions.

Here's the original link
http://www.usc.edu/ext-relations/news_service/releases/stories/36013.html

If I'm right about that, then this development (while still insanely cool - don't get me wrong) might not be so surprising. As I recall from college brain-and-mind psych courses, humans use a variety of factors when singling out a lone voice or conversation in a noisy environment. These include spatial orientation, visual cues, etc. My prof called it the "cocktail party effect". Rob them of these cues, and it isn't surprising that they are hobbled.

Also, computers have the mixed blessing of ignoring information patterns unless they are instructed to do otherwise. A person, listening to white noise, would subconsciously attempt to find meaning in every bleep and scratch. A computer, listening only for certain cues, can disregard the majority of the signal.

I would be interested in learning what rate of word recognition this system achieves. Current technology manages about 90%, which means one in every ten words is heard incorrectly. If they could improve that to 99.9% or even just 99%, we might actually get some speech-processors in Office desktop products.
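
A quick back-of-the-envelope sketch of why those extra nines matter, assuming (unrealistically) that word errors are independent of each other; the function name is made up for this example.

```python
def sentence_ok_probability(word_accuracy, n_words):
    """Chance that every word in an n-word sentence is right,
    assuming each word is recognized independently."""
    return word_accuracy ** n_words

for accuracy in (0.90, 0.99, 0.999):
    p = sentence_ok_probability(accuracy, 10)
    print(f"{accuracy:.1%} per word -> {p:.1%} of 10-word sentences error-free")
```

Under that assumption, 90% per-word accuracy leaves only about a third of ten-word sentences free of errors, while 99% gets roughly nine in ten through clean.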

-konstant

--
-konstant
Yes! We are all individuals! I'm not!

Re:I get the impression by methuseleh · 1999-10-01 04:16 · Score: 3

I just like the way they raised the term "hubbub" to the level of technical terminology. They even quantified it!

I can see it now:

"Joe, I'm reading 14% hubbub coming over this line--can you try to reduce it to 5%?"

Or even make it an actual unit of measure:

"Man, the rating on that party must have been 23.6 Khb." (Kilohubbubs)

Of course, that's assuming it'd be a metric measure. If it gets adopted here in the U.S. of A first, the above example might be 8 11/16 hb.

We need more technological terms like this :)

--
Think Green... Burn only 100% recycled dinosaurs in your car.
Re:I get the impression by wjwlsn · 1999-10-01 17:14 · Score: 1

Please pardon my ignorance here. I just listened to the sample audio track for this thing. I'm interested by the performance of the neural network on the set of four words "yes, no, fire, stop" with varying levels of conversation noise added. Listening to the samples, it occurred to me that even though there was no way in hell I could actually understand which words were being spoken, I noticed that you can almost tell which word is spoken simply by the length of each sample. So... is the neural network actually recognizing the words spoken, or just keying off the sample lengths?

--
Getting tired of Slashdot... moving to Usenet comp.misc for a while.
Re:I get the impression by Sesse · 1999-10-01 04:28 · Score: 1

Yes, the human should really beat the computer at filling in the meaning. (Example: I listen to some German speaker, and even with my bad German, I still get the general idea...)

I've heard somewhere (not confirmed by anybody else, though) that humans only actually hear _30%_ of what's being said, and then guess the rest. Could anybody confirm (or thrash) this?

/* Steinar */

--
(This comment is of course GPLed.)
Re:I get the impression by Capt Dan · 1999-10-01 00:48 · Score: 2

So here's what the article has to say about the nets' performance:

Even the best existing systems fail completely when as little as 10 percent of hubbub masks a speaker's voice. At slightly higher noise levels, the likelihood that a human listener can identify spoken test words is mere chance. By contrast, Berger and Liaw's system functions at 60 percent recognition with a hubbub level 560 times the strength of the target stimulus.

With just a minor adjustment, the system can identify different speakers of the same word with superhuman acuity.

I see where konstant is going with his bit about computer listening for cues, hence the "minor adjustment" mentioned above. I cannot agree one way or the other without seeing uh hearing the actual tests.

But I can theorize without any proof whatsoever =)

But since neural nets are trained, wouldn't it make sense to train the net to listen at low noise levels, and then steadily increase the level of white noise as performance improves? Baby steps. The net has to know what it is listening for inside of the noise before it can actually pick it out.

Anyone know about neural net training, or have more info on this project? Maybe saw it in a lab?

--
Sig:
Barbeque is a noun. Not a verb.
Re:I get the impression by Pedrito · 1999-10-01 00:55 · Score: 1

Actually, with current technology (IBM's ViaVoice, for example), you can achieve rates of higher than 95% accuracy. While still a bit of a problem, it's much better than just two or three years ago. Also, that's speaking at around 60 words per minute, which is faster than I believe I normally speak. I've done quite a bit of work with ViaVoice and have been very impressed with its abilities.

From reading the article, I didn't get the impression that it only works well in a noisy environment, though they didn't give strong numbers either, so it remains to be seen what it can really do. Being able to pick out a voice in a crowd is a huge problem in speech recognition right now. The slightest noise, much of it produced by cheap microphones, or simply background noise (maybe music in the background) severely hampers the current technology from being truly useful day to day.

Microsoft (I know, everyone hates them here, but credit where it's due and all) has been doing a lot of work in voice recognition on their research website. http://research.microsoft.com/srg/ Personally, I believe it's a real up-and-coming technology whose time has come. I've written quite a few voice recognition-enabled apps and it can be a tremendous time saver, IMHO.
Re:I get the impression by Cy+Guy · 1999-10-01 01:09 · Score: 2

> I get the impression that this net did not perform better "even" under noisy conditions, but "only" under noisy conditions.

If you look at the chart provided in the video you'll see the 'Dynamic Synapse' ALWAYS beat the human subject pool. In the zero background noise test, the net was 100% accurate, while the humans were right only 90% of the time. However, to be fair they should create the same number of 'Dynamic Synapse' listeners as humans in the pool and then compare the average results of the 'Dynamic Synapse' pool to the average results of the human pool.

--
Work for Change & GET PAID!
Re:I get the impression by koan · 1999-10-02 09:14 · Score: 1

The ability to pick out one voice in a crowd using a machine, what could that be used for =)
Big Brother just got better ears.

--
"If any question why we died, Tell them because our fathers lied."
Re:I get the impression by Anonymous Coward · 1999-10-01 01:19 · Score: 0

I also wonder whether they were testing recognition of random words.

I'd guess that they were, because to add in the human's ability to fill in the gaps based on context wouldn't really be a fair test of pure speech recognition.

Either way, the differences between a test of random words and a test of real speech would be interesting.

Louie Louie by Anonymous Coward · 1999-10-01 09:08 · Score: 0

I've not yet seen anybody comment on the ultimate use of this technology. Just imagine, we could finally know what the lyrics in Louie Louie are. AC

Bummer! by dustpuppy · 1999-10-01 00:19 · Score: 1

I guess I won't be able to safely mutter insults about my manager under my breath any more ...

Smacks of PR depts. self-justifying... by Anonymous Coward · 1999-10-01 09:12 · Score: 0

> These guys should publish something when they get half that good on any axis.

Agreed. This whole thing smacks of yet another PR department, in yet another large research organization releasing something they dimly comprehend but know how to hype to the max, simply because that's what they get paid to do. (It makes justifying enlarging the staff of the PR department so much easier because just look how busy they are...)

Want to bet that the researchers didn't write (or even edit much of) the press release? Want to bet that their scientific publications on the topic do not make anything resembling such grandiose claims? Want to bet that they are severely embarrassed by at least one aspect of this press release?

Or is it just that they have spun off a new company (majority owned by USC undoubtedly) that will be IPOing in the near future?

(Me? Cynical?)

The Really Scary Thing about all this... by Anonymous Coward · 1999-10-01 09:21 · Score: 0

...is the number of conspiracy theorists that are going to come out of the woodwork shouting about how it will allow the government to spy on us more effectively.

Only 11 neurons? by 1010011010 · 1999-10-01 00:30 · Score: 1

Didn't it take over 300 to reproduce the behavior of a ringworm recently?

I wonder how long before we see silicon for a net like this... or Sony incorporates it in the next Aibo...

--
Napster-to-go says "Fill and refill your compatible MP3 player", which is a lie. It's not MP3. It's WMA with DRM.

Re:Only 11 neurons? by Anonymous Coward · 1999-10-01 12:38 · Score: 0

Yes, this passage about just 11 neurons connected by a mere 30 links makes me wonder what this net actually does. "Speech Recognition" could of course also mean the ability to recognize that an audio signal contains speech :-)

Heh, funny thought. Actually that would be 'Voice recognition' (recognizes if something is a voice and even further recognizes whose voice it is). 'Speech recognition' of course is for recognizing speech, not just vocal noise.
Re:Only 11 neurons? by Anonymous Coward · 1999-10-01 12:44 · Score: 0

From my experience I have to agree, they are more than likely not doing everything with artificial neurons. Usually artificial neurons are used for learning, so more than likely they had to train those neurons to handle the more dynamic part of speech recognition. The neurons in question may even be more complex than normal neurons, to include timing as they say, so more than likely their complexity reduces the number of neurons needed.
Re:Only 11 neurons? by Sesse · 1999-10-01 04:36 · Score: 1

I don't think _everything_ is done in neural nets. Usually, they are only used for small, specific tasks. You wouldn't use a neural net to call printf() or gtk_init(), would you? Such things are generally much easier to solve in `conventional' programming ;-)

I still agree, using only 11 neurons is impressive... Wonder how complex those timing issues are. (Normal neural nets can quite easily be `hard-coded' in almost any programming language, and small nets mean less computing power needed. This makes a small network REALLY interesting. Speech recognition in under 50 lines of C code? Or perhaps in Haskell? ;-) )

/* Steinar */

--
(This comment is of course GPLed.)
Re:Only 11 neurons? by Lemmy+Caution · 1999-10-01 03:34 · Score: 2

Inasmuch as ringworm is actually a fungus without a nervous system, I'm a little perplexed by that claim. Tapeworm, maybe? Planaria?

Ringworm (tinea) is a fungus that covers the skin, causing discomfort, itching, and leaving an unsightly rash. Microsoft has managed to reproduce this behavior in software without using neural net technology at all.
Re:Only 11 neurons? by Cassandra · 1999-10-01 01:08 · Score: 2

Yes, this passage about just 11 neurons connected by a mere 30 links makes me wonder what this net actually does. "Speech Recognition" could of course also mean the ability to recognize that an audio signal contains speech :-)

Of course the task of net could also be to separate the noise signal from the speech, aka blind separation, a problem that has been solved before (for instance by independent component analysis)

If this is merely ICA with a time coded neural net, it is IMHO still pretty cool, and much more impressive than all those commercial systems that rely on dumb correlation and processing power.

Anyway, instead of just having me guessing, could someone please point to their paper :-)
Re:Only 11 neurons? by Anonymous Coward · 1999-10-01 01:10 · Score: 0

For a look at various (and a bit old) neural-net "processors" take a look at http://www.emsl.pnl.gov:2080/proj/neuron//neural/systems/commercial.html

There was at one time a more comprehensive list of neural network hardware at CERN, but I can't find the link anymore...

As a general rule, you don't want hardware for your neural network, unless of course, you're planning on putting it in some nifty embedded device that doesn't have room/cost for a general purpose processor. A software based approach is quite a bit more modular than hardware (go figure :)

Re:Remember in _Snow Crash_... by rueba · 1999-10-01 10:12 · Score: 1

My mom has one like that (at least that's what she told me.) And she lives in Tanzania, hardly a technological mecca.

Time to open your wallet dude!
Check out this link:
http://www.hp.com/jornada/products/430se/overview.html

--
The only reason all cover-ups appear to fail is that you never hear about the ones that succeed.

The scariest thing... by Anonymous Coward · 1999-10-01 00:41 · Score: 0

The scariest thing about all this is that it kind of gives us an entry into artificial intelligence that we have never seen before, not this big, not with such an impact. This actually proves that a machine can do things better than man, and now it proves that it can understand even the language we speak better than we can. This opens doors to all those government conspiracies about bugging devices and such.

Re:The scariest thing... by sfp2322 · 1999-10-01 01:52 · Score: 1

Exactly.. it's like me listening to someone speak a different language. I know they are saying something but I can't associate those words with a meaning.
Re:The scariest thing... by drMental · 1999-10-01 01:53 · Score: 1

This actually proves that a machine can do things better than man

That was proven a long time ago when the first computer was invented (I'll include calculators). You try to multiply 1935219 * 32946214 and compare the time-response with a computer and you'll agree with me....
Re:The scariest thing... by Sehnsucht · 1999-10-01 01:56 · Score: 1

Two words: The Matrix
Re:The scariest thing... by epine · 1999-10-01 11:07 · Score: 1

What is it that makes everyone think that the government agencies don't already have this technology? If you don't think these agencies are capable of keeping important technologies all to themselves you should dig up the history of undersea channel microphones. And what is this business about machines suddenly overtaking humans on a "perceptual" basis? Technology has long held the advantage over humans in most fundamental categories. The historical problems with noise filtering were entirely due to the fact that we didn't know how to solve the problem. Finally these cloistered scientists have got off their asses and figured out what they were doing wrong all along. The only "breakthrough" here is that we learn that the "insurmountable" human advantage accrues to a neuronal system which can be replicated (and even improved) with a model consisting of eleven neurons. Wow. What a huge edge the human being has over the machine. But don't worry. Our "innate" advantage in searching complex problem spaces is probably safe for another ten years.
Re:The scariest thing... by Sesse · 1999-10-01 04:42 · Score: 1

The Matrix is a movie (a good movie, but not the `god' movie many want it to be). The main concept is: The world is a computer-generated illusion...

I don't see what that (The Matrix) had to do with this (speech recognition), though...

/* Steinar */

--
(This comment is of course GPLed.)
Re:The scariest thing... by Stig · 1999-10-01 00:56 · Score: 1

It's a long way from recognising words to recognising sentences and being able to extract meaning from them (for some meaning of 'meaning'). Mind you, this thing might be able to do that as well, in which case I think government conspiracies will be the least of our worries.

Strong AI just moved closer by about 15 yrs. The future will be cool.

S.
Re:The scariest thing... by Anonymous Coward · 1999-10-01 03:32 · Score: 0

What is The Matrix?
Re:The scariest thing... by Tenareth · 1999-10-01 05:09 · Score: 1

Or sitting in an upper-management meeting.

-- Keith Moore

--
This sig is the express property of someone.
Re:The scariest thing... by SRMoore · 1999-10-01 01:04 · Score: 1

I think that you are confusing recognizing with understanding. So this type of net can hear words spoken.. it can figure out that you said something, and what that something was. This setup comes nowhere close to understanding the meaning behind what you said.

Of course it will probably be a matter of time, you just put the results of this net behind one that can figure out what each word means in the particular context that it was said in.. and then you can have a machine explain what you meant. But, even in this example the machine would have no fundamental understanding of what it meant. It could look up words, and rephrase, but it wouldn't know. That is a long long way off.

Although I would like to have seen some more technical specs of how they did it.. I myself am working on stuff that these types of developments could be useful in.
Re:The scariest thing... by mizerai · 1999-10-01 01:13 · Score: 1

It doesn't say anything at all about language understanding. It says the network "recognizes words" in noisy conditions better than humans. That is a far cry from understanding language!
The most interesting thing in this is that it used so few neurons yet it can recognize speech patterns and categorize them! This technique has applications ranging far outside of just speech recognition!

--
--Mizerai

An entirely frivolous application? Maybe not... by alumshubby · 1999-10-01 01:43 · Score: 1

My wife's constantly complaining that I don't listen, whereas I think the problem's increasingly that I don't hear well; I'm over 35 and have been putting off going to audiological screening for a while now. This article makes me wonder: Will we eventually see hearing aids that specialize in recognizing and resynthesizing speech? (In case you care, what triggered my pondering was the mention that this works well even in noisy environments, and in any kind of background noise at all, I'm having real trouble understanding speech lately.)

--
"How many light bulbs does it take to change a person?" --BMcC-->

Re:An entirely frivolous application? Maybe not... by Anonymous Coward · 1999-10-01 13:09 · Score: 0

I actually have the same problem. However, there is nothing *physically* wrong with my hearing. I have a signalling problem. If there is background noise I have hell trying to understand (I have to intensely concentrate and also look at someone's lips to understand). This is the kind of technology that will help those of us whose hearing is fine, but whose software/network is buggy.

Use your cell phone at a loud rock concert by TrevorB · 1999-10-01 01:50 · Score: 1

I can see it now.. If you're in a loud noisy place (like a rock concert), you can make a cell phone call to someone and the new speech recognition software would be able to translate the call.

I can just imagine such a call now...

"Yes! No! Fire! Stop! No! Fire! Yes!"

Of course, you'll need to have only 11 neurons to understand the conversation.

Re:Use your cell phone at a loud rock concert by alumshubby · 1999-10-01 02:31 · Score: 1

Of course, you'll need to have only 11 neurons to understand the conversation.

Why, this'll stand me in good stead, then! I've only got about eleven left after all the booze and drugs I did at all the rock concerts that caused the degenerative hearing loss.

If my employer sniffs this packet I'll probably be taking a piss test Monday morning...

--
"How many light bulbs does it take to change a person?" --BMcC-->

Hmmm.. Interesting picture by Otto · 1999-10-01 01:50 · Score: 1

http://www.usc.edu/ext-relations/news_service/releases/art/berger_liaw360x246.jpg

Looking closer at this pic and zooming in a bit.. I'm noticing something..

11 Neurons and 30 connections, hmm? Well, in the center (the big black circle) there's 11 little circles (or twelve if you'd call the third from the top on the left a circle.. looks like a mistake to me). Count all the lines going between these, and include the lines coming in from the left (the red ones) and the black one going to the big black circle and you have 30 lines.....

Anyone more knowledgeable than I care to figure this one out? :-)

---

--
- Give a man a fire and he's warm for a day, but set him on fire and he's warm for the rest of his life.

AAAAAAAAAARRRRRRRGGGGGHHHHHHH! by xyzzy · 1999-10-01 01:56 · Score: 0

This thing is a joke. There are virtually NO ramifications for speech recognition, because it is virtually IMPOSSIBLE to build a neural net that recognizes 40,000-60,000 words! This stupid demo recognizes two (or in one case, four) words. Yes, No. Not much of an application there! Or at least, the applications are very limited.

File under: fluff. I work for a speech recognition group. We did.

Re:AAAAAAAAAARRRRRRRGGGGGHHHHHHH! by Squeamish+Ossifrage · 1999-10-01 02:54 · Score: 1

Oh come on and open your eyes for a moment. Yes, it's very far from a complete or useful system, but it represents a possible solution to one of the weaknesses of current speech recognition. It's not the whole puzzle, but it's a useful piece.
Re:AAAAAAAAAARRRRRRRGGGGGHHHHHHH! by Anonymous Coward · 1999-10-01 14:44 · Score: 0

Ground control, be advised, I may have a problem with my tires.
No, NO !! I did NOT say Fire !!!
Ground control, uh, sorry about that. Hope you didn't, uh, need that tower...
Ground control ? Ground control ?
Re:AAAAAAAAAARRRRRRRGGGGGHHHHHHH! by xyzzy · 1999-10-01 03:20 · Score: 1

It's not just "far from a complete or useful system", it's a trivial system. Consider that I could write a 1-line piece of Perl code that achieved 50% accuracy disambiguating the words YES and NO. Also consider that if the human recognizers (the "base" case) weren't TOLD that the only two words they were listening to were "yes" and "no", one would expect their accuracy to degrade.
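
To make the 50% baseline concrete, here is a rough sketch in Python rather than Perl. It assumes a balanced test set (equal numbers of "yes" and "no"), which the article does not confirm; the function name and the test data are invented for illustration.

```python
def constant_guess_accuracy(labels, guess="yes"):
    """Fraction of utterances a 'recognizer' gets right if it always answers `guess`."""
    if not labels:
        raise ValueError("need at least one label")
    return sum(1 for label in labels if label == guess) / len(labels)


# Hypothetical balanced test set: half "yes", half "no".
balanced = ["yes", "no"] * 50
print(constant_guess_accuracy(balanced))
```

On a balanced two-word set this never-listening guesser scores one half, so any reported accuracy has to be read against that floor.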

This kind of science is littered with cool ideas that worked for the simple problems, but just didn't work for anything bigger. We already HAVE good systems that have high accuracy (> 85%) for speaker independent recognition in the 60k vocabulary range. These guys should publish something when they get half that good on any axis.
Re:AAAAAAAAAARRRRRRRGGGGGHHHHHHH! by speek · 1999-10-01 03:22 · Score: 1

You just displayed circular reasoning at its best. Why is it impossible?

'Cause xyzzy said so? Care to offer up a real explanation?

--
First, make it work, then make it right, then make it fast, then, make it bloated!
Re:AAAAAAAAAARRRRRRRGGGGGHHHHHHH! by speek · 1999-10-01 03:27 · Score: 1

It seems to me they've demonstrated the ability to do something current "good" systems can't do - recognize words within a white noise environment. That's what they're excited about.

--
First, make it work, then make it right, then make it fast, then, make it bloated!
Re:AAAAAAAAAARRRRRRRGGGGGHHHHHHH! by Anonymous Coward · 1999-10-01 03:35 · Score: 0

There are virtually NO ramifications for speech recognition, because it is virtually IMPOSSIBLE to build a neural net that recognizes 40,000-60,000 words!
Hmmmm, now what did your parents do 9 months before you were born? Did they create a neural network which can recognize that many words???
Re:AAAAAAAAAARRRRRRRGGGGGHHHHHHH! by Cassandra · 1999-10-01 04:52 · Score: 1

To train such a network would take years :-)

Seriously, the way to go in neural networks is most likely NOT to have 40,000-60,000 separate classes on the response side. Such a system cannot generalize at all, and space is wasted, i.e., many links will be the same for similar words. There should be an intermediary representation (not a hidden layer, but a sparse/semi-local representation of the output concepts).

Re:Can we please drop the conspiracy theories? by Anonymous Coward · 1999-10-01 12:02 · Score: 0

I don't know, but if you read the article: "And the system can pluck words from the background clutter of other voices -- the hubbub heard in bus stations, theater lobbies and cocktail parties, for example." Now I've never been in the navy but I don't actually think that they routinely search for the sound of submarines in bus stations and cocktail parties. :-)

Eleven Neurons? by Woodblock · 1999-10-01 12:16 · Score: 1

Consarn it!

This AI machine certainly has more neurons than I have delegated to speech recognition as I am in the process of patenting a process in which you can switch 95% of your brain capacity over to web surfing, useless pop culture quotes, and slashdot posting.
I figure if everyone was less intelligent, it would be way easier to create artificial intelligence.

Neural Networks -- a farce or fact? by SamBeckett · 1999-10-01 12:25 · Score: 3

I've recently done quite a bit of research on Neural Networks, including coding and simulating them by hand... There are some (quite drastic) flaws with neural networks...

I started my research doing a classic 5 pixel by 5 pixel OCR (optical character recognition) on the domain of digits on a single layer perceptron type network (similar to what these guys were using minus the delayed firing rate)
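
For readers who have not met a single-layer perceptron, here is a minimal sketch of one threshold cell trained with the classic perceptron rule. It is not the code from the experiment described here: the logical AND data is a toy stand-in for the 25-pixel digits, and a digit recognizer of the kind described would presumably use one such cell per digit.

```python
def predict(weights, bias, inputs):
    """Threshold cell: fire (1) if the weighted sum is positive, else 0."""
    total = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > 0 else 0


def train_perceptron(samples, epochs=100, lr=1):
    """Classic perceptron rule: adjust weights only when the cell is wrong."""
    n_inputs = len(samples[0][0])
    weights, bias = [0] * n_inputs, 0
    for _ in range(epochs):
        mistakes = 0
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)
            if error:
                mistakes += 1
                weights = [w + lr * error * x for w, x in zip(weights, inputs)]
                bias += lr * error
        if mistakes == 0:  # converged: every training sample is classified correctly
            break
    return weights, bias


# Toy, linearly separable stand-in for the digit bitmaps: logical AND.
AND_SAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(AND_SAMPLES)
print([predict(weights, bias, x) for x, _ in AND_SAMPLES])
```

The early exit on a mistake-free epoch is what "the training algorithm converged" means in this setting; on data that is not linearly separable the loop simply runs out of epochs.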

Not surprisingly, the training algorithm converged to an answer quite quickly and I proceeded to run tests with noisy data, to test the generalization of the network.

100 per cent correct at zero noise
50 per cent correct at twenty-five per cent noise
10 per cent correct at fifty per cent noise
NEARLY zero per cent correct above fifty.

This isn't shocking in itself until you realize that once you go above fifty percent distortion rates you are actually INVERTING the digit!
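
The inversion point can be checked with a few lines of Python. This sketch assumes "noise" means flipping a fixed fraction of the 25 pixels, which may not match how the tests above were actually run; the bitmap is random and purely hypothetical.

```python
import random


def flip_pixels(bits, fraction, rng):
    """Flip exactly round(fraction * len(bits)) randomly chosen pixels."""
    n_flip = round(fraction * len(bits))
    chosen = set(rng.sample(range(len(bits)), n_flip))
    return [1 - b if i in chosen else b for i, b in enumerate(bits)]


def hamming(a, b):
    """Number of positions where two equal-length bit lists differ."""
    return sum(x != y for x, y in zip(a, b))


rng = random.Random(0)
image = [rng.randint(0, 1) for _ in range(25)]  # hypothetical 5x5 bitmap
inverse = [1 - b for b in image]
noisy = flip_pixels(image, 0.6, rng)  # 60% noise flips 15 of 25 pixels
print(hamming(noisy, image), hamming(noisy, inverse))
```

At 60% noise the result differs from the original in 15 pixels but from the inverted image in only 10, so past the halfway mark the input is literally closer to the negative than to the digit.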

I retrained the network with inverted digits as well as the normal digits and re-ran the tests on the same set of data (note: The net WILL NOT converge on normal & inverted 5x5 digits with only ten cells).. The correctness rate was only twenty per cent throughout the whole domain of noise levels.

I then retrained again using TWENTY cells (9 more than this article's) and it converged quite nicely and gave me a quadratic function with an R-Squared value of .9995 or so.

People view Neural networks sometimes as a fix-all solution.. The article on /. earlier about "evolutionary computing" is the same premise as neural networks : try stuff randomly (or using calculus) until we get a decent solution.

I'm sorry kiddos, but that just doesn't cut it. A neural network can't ever outperform a Turing machine so there can't be any chance in hell it will ever outperform us in non-specialized tasks.

Of course, I'd probably be more optimistic if these guys would have released their algorithms, papers, source-code, etc so we could actually figure out HOW the HELL they can get an 11 cell network to recognize speech...

The moral of the story? Understanding speech is a hell of a lot harder than recognizing ten digits!

Re:Long way to go, but cool for AI by Cuthalion · 1999-10-01 02:08 · Score: 2

Yeah the number of phonemes used in most languages is in the 'few dozen' range. And you generally don't have to listen very long to hear them all at least once.

But even when you've got the phonemes, you've still got a fair amount of work cut out for you. A number of phonological processes take place. For instance, in 'in plain sight', 'in' may be pronounced 'im'. These kinds of transformations (and more complicated ones) are happening all over the place, in every spoken language.

Linguists generally describe this kind of thing by writing context-sensitive rules to enumerate the transformations. Similar syntactic translations are context-sensitive.

Computer programming languages' syntax (er, not counting types, and identifier agreement (which are special cased)) are not even typically generic context-free languages, but instead are almost always part of the LL(1) or LR(1) subsets, meaning that they have the special property that you can determine what's going on just by looking ahead one token. Otherwise you end up with N^3 parsing time, and that's for context-free languages. Parsing of context-sensitive languages is way more problematic (think halting problem).

Unless you can parse the syntax, you can't really resolve ambiguities (to/two/too, there/they're, or even things which merge because of phonology (bitter/bidder/bit her)). Note that humans don't always do so great with these issues either, so a partial solution will still be quite amazing.

But the fact still stands that turning samples into phonemes is only the first step in a very complicated process towards even something as simple as taking dictation. In fact, I'd say that syntax->semantics may be a smaller step than phonemes->syntax.

--
Trees can't go dancing
So do them a big favor
Pretend dancing stinks!

They're solving a much easier problem (?) by Anonymous Coward · 1999-10-01 02:12 · Score: 1

It looks like the Neural group at USC don't believe in putting papers on-line, but there are a couple of slides at http://www.usc.edu/dept/engineering/CNE/tech/spch. No explanation as to what they mean, or what the algorithm is doing, though.

I think the really important thing here is that the neural system almost certainly knew there were only four possibilities, and never had to respond 'none of the above'. So this is a comparatively simple two-bit classification problem, which is a far easier thing than what Dragon Dictate (or people) are trying to do, i.e. recognise an arbitrary string of phonemes, giving a combinatorial explosion of possible words. So the performance of this system probably is actually not that impressive.
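
A back-of-envelope sketch of that gap, in Python. The four-word vocabulary comes from the discussion above; the 40-phoneme inventory and the five-phoneme word length are round-number assumptions picked only to show the scale.

```python
import math


def bits_needed(n_classes):
    """Information needed to pick one of n equally likely classes."""
    return math.log2(n_classes)


def phoneme_strings(inventory_size, length):
    """Number of distinct phoneme sequences of a given length."""
    return inventory_size ** length


print(bits_needed(4))            # choosing among four known words
print(phoneme_strings(40, 5))    # candidate five-phoneme strings, assumed inventory of 40
print(bits_needed(phoneme_strings(40, 5)))
```

Choosing among four known words needs 2 bits per utterance, while even short unconstrained phoneme strings number in the hundred millions before any dictionary or grammar prunes them.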

But there is a huge interest building in biological neural networks' sensitivity to the temporal sequence of input spikes (rather than just the average rates of inputs spiking, which is what software neural networks try to model).

There was a talk I went to in London in June by Terry Sejnowski, who's head of the computational neurobiology lab at the Salk Institute in California. Rather than neurons learning that signal A correlates with signal B (Hebbian learning), it's apparently surprisingly easy to wire two neurons up so that they are correlating signal A occurring just before signal B -- becoming more sensitised to this the more times they see it, so that they effectively learn to predict signal B as soon as they see signal A.
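
One common textbook way to write down this kind of timing-sensitive learning is a spike-timing-dependent update with an exponential window. The sketch below is that generic rule, not the model from the talk; the learning rate and the 20 ms time constant are arbitrary assumptions.

```python
import math


def timing_update(weight, dt_ms, lr=0.1, tau_ms=20.0):
    """Adjust a connection from neuron A to neuron B.

    dt_ms = (time B fired) - (time A fired).
    A just before B (dt_ms > 0) strengthens the link; the reverse order weakens it.
    The effect fades exponentially as the two spikes get further apart.
    """
    if dt_ms > 0:
        return weight + lr * math.exp(-dt_ms / tau_ms)
    if dt_ms < 0:
        return weight - lr * math.exp(dt_ms / tau_ms)
    return weight


weight = 0.5
for _ in range(10):  # A repeatedly fires 5 ms before B
    weight = timing_update(weight, 5.0)
print(weight)
```

Each repetition of "A then B" pushes the weight up, which is the "more sensitised the more times they see it" behaviour; a plain Hebbian rule would ignore the sign of the delay.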

This obviously appears to be very important for tracking objects at a low level, and as here in identifying temporal patterns (Sejnowski's suggestion was bats' echolocation); but it may be even more important at a higher level, for recognising causality (if this thing happens, then that good thing/bad thing may happen), and perhaps for learned behaviour (if I do this, under these circumstances, then that happens).

Re:Long way to go, but cool for AI by _Logic_ · 1999-10-01 02:16 · Score: 2

Pulsed Neural Networks. It's really not such a new technology. There's a good book on different topologies and algorithms titled "Pulsed Neural Networks". I know Amazon has a copy (that's where I got mine a few months back).

Re: noise levels by Hard_Code · 1999-10-01 02:20 · Score: 1

And to produce the best over all listener, you would train it against random noise levels within a range (with a weight towards what it will be used for most). And if you want better performance, just throw some more neurons at it. Fidelity goes up.

--

It's 10 PM. Do you know if you're un-American?

Cool! by Anonymous Coward · 1999-10-01 02:23 · Score: 1

Could this be true? Has science finally found a way to decipher Nirvana lyrics?

Can we please drop the conspiracy theories? by Fastolfe · 1999-10-01 02:29 · Score: 1

It seems like every single Slashdot article nowadays has several posts that inevitably link some new piece of technology with ways the evil government can spy on us.

Go e-mail 'michael' about it. I'm sure he'll be happy to write up another Your Rights Online editorial thing where all you folks can go discuss the latest evils between yourselves, but let's keep the conspiracy theories out of "normal" articles, OK?

Re:Can we please drop the conspiracy theories? by Anonymous Coward · 1999-10-01 12:48 · Score: 0

Conspiracy is one of the most overused words in the English language. It's lost its meaning. Kind of like the abuse of "unlimited" by ISPs. Or "free" by marketroids. Last time I saw the word conspiracy used in a way that meant anything was during the Ides of March, when I was killed by my best friend...
Seriously, good writing style means avoiding all the buzzwords that people are so sick of hearing. Can anyone say Information Superhighway *barf*.

And besides, it's not theory, it's a FACT. The govt was caught red handed trying to spy on the whole fucking world. *cough*...ECHELON
I think that merits discussion. Lots of discussion. If you can't take it, don't read it.
Re:Can we please drop the conspiracy theories? by Anonymous Coward · 1999-10-01 03:40 · Score: 0

You can complain about conspiracy all you want, but I know people who work in the field of natural language processing who get "approached" by suits who "work for the Federal government at an undisclosed location" with a strong interest in their work.
Re:Can we please drop the conspiracy theories? by Fastolfe · 1999-10-01 05:09 · Score: 1

Do you honestly think any interest they might have has anything to do with invading your privacy and secretly spying on everyone in the country?

So they'd like to automate the wiretap transcription process. What in the world could possibly be wrong with that?

And please stop exaggerating. I have no doubt in my mind that people from various government entities are interested, but you can drop the whole "at an undisclosed location" bit of silly secrecy. He probably left a business card with a perfectly legitimate address and telephone number.

I, too, have friends that work in various private sectors that are "approached" by government agencies. These have all been very straightforward, very clear in meaning, and as normal as any other business meeting could be. The only dark, sinister, secret undertones in existence are the ones you conspiracy theorists insert while telling your stories.

Lies, Damned Lies, and Neural Nets by Anonymous Coward · 1999-10-01 02:32 · Score: 0

Will these claims be any more justified than the bogus claims previous workers have made for their NNs? I doubt it. NNs are a load of hogwash, built on dubious reasoning and hopes of big grants from fools who read press releases instead of checking 'experimental' data.

TWW

Potential for AI... by Anonymous Coward · 1999-10-01 02:34 · Score: 1

First, anyone who says it's infeasible to create a neural net with 60000+ neurons must realize, computing power is always increasing at an astounding rate... Plus, we can always make special purpose hardware...

Anyway, these results are _quite_ significant in that they really show an advantage to using this new type of NN, and also make it clear to people that if we integrate these sorts of sensors into ourselves, or an AI such as CYC (check it out...), the resulting system will be able to process sensory information much more effectively than humans...

Of course, we've always known that the vision of hawks is like a couple hundred times more acute than that of humans, but some people never made the connection -- If hawks have better vision, and they have NNs to process that data, and we can learn how to make good, well trained NNs, then our AIs can have better vision than us, based on a biological model...

And on a similar note, I think it's amazingly cool that they've been able to show that a neural net trained by humans for a special purpose can -way- outperform biologically evolved neural nets... :)

Re:Things are not so simple. by Cuthalion · 1999-10-01 12:55 · Score: 1

Oops, you're right, though you're not using the proper terms.

/si/ vs. /su/ - I haven't looked at any spectrograms for a while, but I'll take your word for it that the /s/ sounds differ in quality. You could say that they are different phones. However, they are definitely not different phonemes, in a strictly linguistic sense. Even if you know the difference, I'll explain it to everyone who doesn't have a Linguistics (or related) degree.

Any sound made by a human can be called a phone. Many of these crop up in language. These sounds can be classified into groups. These categories of sounds are semantically the same - switching from one s to the other does not alter the meaning of a word. These are called phonemes.

Some phones within a phoneme can be chosen by the speaker; these are said to be in free variation. Others are determined by context (and sound funny otherwise) - these are called allophones. (Your /si/ & /su/ example illustrates this nicely.)

Spanish does not distinguish between b and v, similarly to the way Japanese lumps r and l (two separate phones) together into what in Japanese is the same phoneme.

I omitted to mention this added layer of complexity - the sonic properties of a given phoneme (which is really what you want to extract, in order to build morphemes) can vary a lot, to a degree dependent upon the language, dialect, and accent.

Nice to see some other language geeks here - keep me on my toes.

--
Trees can't go dancing
So do them a big favor
Pretend dancing stinks!

The Really Really Scary Thing about all this... by Anonymous Coward · 1999-10-01 12:57 · Score: 0

...is the naive who think it's impossible for this technology to be used for such reasons.

Re:How does this new technique affect learning tim by Anonymous Coward · 1999-10-01 13:08 · Score: 0

I don't think it would really affect it so much; if it does, chances are it's a bit slower in learning, considering it uses timing. It seems more like it just has another dimension to it.

How the brain does Hidden Markov Models by Anonymous Coward · 1999-10-01 14:34 · Score: 0

It's apparently surprisingly easy to wire two neurons up so that they are correlating signal A occurring just before signal B -- becoming more sensitised to this the more times they see it, so they effectively learn to predict signal B as soon as they see signal A.

So, effectively, each node would be building up an internal estimate of a Markov transition probability, p(A->B given A); with the node's output the probability that the transition A->B had occurred; but with the added feature that different transitions are associated with different time intervals.

Clever !
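
The estimate described above can be sketched in a few lines. This is only a counting illustration, not the wiring from the article: the `TransitionNode` class, its delay window, and the event times are all made up for the example. The node's estimate of p(A->B given A) is the fraction of A events that were followed by a B event inside the node's preferred time interval.

```python
from bisect import bisect_left


class TransitionNode:
    """Hypothetical node estimating p(A->B | A) for one delay window."""

    def __init__(self, min_delay, max_delay):
        self.min_delay = min_delay
        self.max_delay = max_delay
        self.seen_a = 0          # how many times signal A was observed
        self.seen_a_then_b = 0   # how many of those were followed by B in time

    def train(self, a_times, b_times):
        b_sorted = sorted(b_times)
        for t in a_times:
            self.seen_a += 1
            # Index of the first B event at or after t + min_delay.
            i = bisect_left(b_sorted, t + self.min_delay)
            if i < len(b_sorted) and b_sorted[i] <= t + self.max_delay:
                self.seen_a_then_b += 1

    def probability(self):
        """Current estimate of p(A->B | A); 0.0 before any A is seen."""
        return self.seen_a_then_b / self.seen_a if self.seen_a else 0.0


node = TransitionNode(min_delay=1, max_delay=3)
node.train(a_times=[0, 10, 20, 30], b_times=[2, 12, 29])
print(node.probability())
```

A second node with a different delay window would respond to the same A->B pair at a different interval, which is the "added feature" mentioned above.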

*I* Think It's Exciting and Promising by Onymous+Coward · 1999-10-01 14:39 · Score: 1

Whether speech recognition has advanced greatly with this particular claim is yet to be seen. Powerful speech recognition, however, has many great potential benefits.

- reduce carpal tunnel incidents
- make nimble typing fingers of mutes... moot?
- put lots of people out of work

Science marches on!

- video games
- porn slide show control (look, Ma, clean keyboard!)
- tapping communications (digital and otherwise)

When it really gets rolling, encrypted voice communication will be more of a necessity than a paranoid indulgence. Conspiracy theory? Try this test: Would you use this tech to spy on people?

When you say, "Dude, the conversation at the next table triggered my autogrep of the word 'computer'," you could be talking hardware instead of wetware autogrepping.

- understanding Eddie Vedder
- augmenting voice signals

Imagine donning headphones and hearing only a computer-enhanced (probably a little time-delayed) version of the surrounding sounds where selected voices are augmented. The same tech could probably be applied to identifying and reducing known noises. Chatting at a dance club wouldn't have to be a shouting match. (But, then there's less excuse to get close to their necks...)

- clarifying commands on the battlefield

"Yes! No! Stop! FIRE?" I wonder who's sponsoring this, or to whom these researchers are whoring themselves... "Yes! No! Retreat! Use the nerve gas!" War marches on!

- universal translator

It's been said already, but man, I have to echo this. Practical speech recognition + language analysis & translation + voice synthesis will rock. Just imagine being able to hit on a lovely Italian by telling her that you like her hairstyle and that she's a pretty lady: "A lot I appreciate your style of hats. You are one Mrs. much graceful one. Beep." A whole new era of international misunderstanding.

The idea of Ctrl-key-free chorded typing still excites me. I'll pop you in the speech-recognized mouth with my data gloves.

moderation by Indomitus · 1999-10-01 02:38 · Score: 0

The moderation of this post to Offtopic shows that the Offtopic choice should be removed from the moderation list. Offtopic is a totally subjective choice and I can say that during the many many times I've been a moderator I've never used it. This post is not Offtopic, it should be marked Funny as it was clearly meant to be.

Re:moderation by jafac · 1999-10-01 02:50 · Score: 0

Just because we don't see all that many offtopic posts anymore (like this one), doesn't mean that we should do away with what has proven to be an effective solution.

Yes, moderation can be abused, yes, we have a system to moderate the moderators. No it's not foolproof - but it does work dang well for most cases.

"The number of suckers born each minute doubles every 18 months."

--

These are my friends, See how they glisten. See this one shine, how he smiles in the light.

Re:moderation by Sesse · 1999-10-01 04:10 · Score: 1

If any choice was to be moderated, IMHO, it would have to be `Flamebait'. It's really too vague, and far too often moderators don't even see its function. At least "Flame" would be much better. I'm not sure if I like the `Funny' choice either (CT didn't like it in the beginning, either -- don't know what made him change his mind), but it's not that bad, and it works well.

Now, this debate _is_ offtopic... But perhaps there's no better place to debate it. The problem with mass moderation is that too many people suddenly find themselves with power they're not experienced with.

/* Steinar */

--
(This comment is of course GPLed.)

Re:moderation by Indomitus · 1999-10-01 03:22 · Score: 1

I'm not talking about doing away with moderation, far from it. What I was talking about was removing the Offtopic selection from the list.

I guess that's not really a solution though; people without a sense of humor would just moderate another way. So maybe what I'm saying is all moderators should be required to have a sense of humor. (just kidding, if I needed to say it)

Re:Long way to go, but cool for AI by SpinyNorman · 1999-10-01 02:38 · Score: 1

FYI, the phoneme-to-word (or, more generically, feature-vector-to-word) translation is conventionally done with Hidden Markov Models (in a nutshell, creating probability driven state transition models). I'd expect the commercial dictation products probably have a somewhat ad-hoc cleanup stage to post-process the HMM output.
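
For anyone who hasn't seen an HMM decoder, here is a minimal sketch of the Viterbi algorithm, which finds the most likely hidden state sequence for a run of observations. Everything in the toy model is an assumption for illustration: two hidden phonemes ("s" and "i"), two made-up acoustic labels ("hiss" and "tone"), and invented probabilities. Real recognizers use far larger models and continuous feature vectors.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (probability, path) of the most likely hidden state sequence."""
    # best[s] = (probability, path) of the best path that ends in state s.
    best = {s: (start_p[s] * emit_p[s][observations[0]], [s]) for s in states}
    for obs in observations[1:]:
        new_best = {}
        for s in states:
            # Pick the predecessor that maximises the probability of reaching s.
            prob, path = max(
                ((best[prev][0] * trans_p[prev][s], best[prev][1]) for prev in states),
                key=lambda item: item[0],
            )
            new_best[s] = (prob * emit_p[s][obs], path + [s])
        best = new_best
    return max(best.values(), key=lambda item: item[0])


# Toy model (all numbers are illustrative assumptions).
STATES = ["s", "i"]
START_P = {"s": 0.8, "i": 0.2}
TRANS_P = {"s": {"s": 0.5, "i": 0.5}, "i": {"s": 0.1, "i": 0.9}}
EMIT_P = {"s": {"hiss": 0.9, "tone": 0.1}, "i": {"hiss": 0.2, "tone": 0.8}}

prob, path = viterbi(["hiss", "hiss", "tone", "tone"], STATES, START_P, TRANS_P, EMIT_P)
print(path, prob)
```

With these numbers the decoder should settle on two frames of "s" followed by two frames of "i", which is the kind of phoneme string a later cleanup stage would turn into a word.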

Micheal is going to get you!#$^ by Shanoyu · 1999-10-01 02:48 · Score: 2

Yet another comment from the conspiracy to make it look like there is only one conspiracy

Honestly, do we have anything to fear from the technology as it is now? No, of course not. However, you have to expect plenty of fear on the part of people from /., just look at the stories on geek profiling (The Katz stories). The Government IS out to get us, they admit it after all, and what is this caused by? Paranoia on the part of people in power. It's a dramatic irony, of sorts. But the Light in the darkness and the shadow from the sun is mandatory for everything in life.

This mass paranoia against governments isn't bred because someone reads Fahrenheit 451 and says "shock!" (although it probably does happen in SMALL quantities). It's because we see it in our government today. We see corruption, and special interests, and all sorts of scary, scary things, in government TODAY. The fact that this could be used to track all of the recordings a person ever made is scary.

Is it a long way off? Sure. Can you blame them for being overprotective of their rights? No, of course not.

Nothing personal but I don't see how you can mock or make fun of anyone for holding these fears.

-[ World domination - rains.net ]-

Re:Micheal is going to get you!#$^ by sjames · 1999-10-01 21:49 · Score: 2

Voice recognition technology does not suddenly mean government agencies can now affect wiretaps of all the people in the United States on a whim.

When the FBI rammed through a law that telecom providers would have to provision tapping 10% of all communication (knowing very well that it's not possible for all of the judges in the U.S. to even rubber stamp that many court orders), some people said 'don't worry, they don't have the man power to listen to that many conversations'.

Here is the 'man' power to produce the transcripts. So tell me again, why shouldn't I worry? Keep in mind, machine parsing of English text for meaning is available now as well.

Re:Micheal is going to get you!#$^ by Fastolfe · 1999-10-01 05:15 · Score: 1

The fact that this could be used to track all of the recordings a person ever made is scary.

This is my point exactly. The conspiracy theorists always make the same conclusive leap you just made.

The only logical "surveillance" benefit that could arise from this technology would be the automation of wiretap transcriptions, which, as far as I know, are either done by hand today or with relatively crude voice recognition.

Voice recognition technology does not suddenly mean government agencies can now affect wiretaps of all the people in the United States on a whim. Your privacy will be untouched. The only way this would change the government is by saving them money on people doing transcriptions.

If anything, you should be worried about the handful of jobs that might be lost over it. Unless you have a distant relative that sits at a desk and does this, you will be totally unaffected by this technology's use in government agencies.

Re:Micheal is going to get you!#$^ by Fastolfe · 1999-10-07 00:28 · Score: 2

Right-o. Cheaper is better.

If you're worried that "cheaper" also means "easier to convince a judge of the need", then perhaps you need to oust your current judges.

This should in no way affect the requirements to obtain a court order/wiretap order from a judge.

Re:One step closer to Star Trek every day... by DanaL · 1999-10-01 02:48 · Score: 1

Who needs a Palm Pilot when you can walk down the hall hands free as the briefs you on your next meeting, or allows you to read and compose your mail on the way to work. My hands shake.

I wonder if that would be terribly successful. Apparently, the first car phones marketed were speaker phones, which sounded like a good idea because both hands would be free for driving. The idea flopped because people looked kind of odd talking to themselves while driving.

I bet there would be a similar effect (at least for a long time). People walking down the sidewalk talking to themselves usually get some pretty strange looks :)

Dana

Which human? by Anonymous Coward · 1999-10-01 02:53 · Score: 0

I bet that the human John C. Dvorak is faster than a neural net.

Re:White noise? by Cassandra · 1999-10-01 04:08 · Score: 1

White noise is certainly random - but background noise in real world situations is hardly going to be that random. Rather, it's going to be a chaotic blend of non-random signals - each of which may (or may not) be a valid speech signal in its own right.

Actually it does not matter whether the background signal is completely white, or not. As long as the speech signal is the most correlated one, you can find it. The cocktail party problem (to isolate one speech signal in a crowd of speakers) is of course more difficult. The technique can be extended to separate more sources, if one adds more microphones/ears (see independent component analysis), one extra microphone per source you want to isolate, but that would be to cheat, wouldn't it ;)

... by Signal+11 · 1999-10-01 04:09 · Score: 2

You know, if you look behind the 'inventors' of this technology in the picture... you can see what looks startlingly like the random scribblings my two year old sister makes.

Of course, I could be mistaken, and that drawing is really a graphical representation of the most sophisticated neural net ever made. *g*

--

like Cmdr Data by PHroD · 1999-10-01 03:03 · Score: 0

Remember that episode where they were stuck in the time loop and by like the 4th time around or so, Data was listening to some noise from space, and he said something to the effect of "I can discern 1024 distinct voices" and it turned out to be the crew of the Enterprise yada yada yada...but anyway, no HUMAN can discern that many voices that accurately (not even CLOSE!). Cool that computers can now do that :)

"There is no spoon" - Neo, The Matrix

About time they got it right. by AJWM · 1999-10-01 04:10 · Score: 1

This (the bit about timing of signals) is the sort of thing my father-in-law (Dr. Jack Steele) has been complaining about folks missing for years. (And he ought to know -- he invented the term "bionics" back about forty years ago.) (Gee, if he's the "father of bionics", does that make me the brother-in-law of bionics?)

I'm a little surprised at how few neurons and links it took, though - and how general purpose (as in different languages) it is. Different human languages contain somewhat different sets of phonemes - what may be two distinct phonemes in one language are considered the same in another. (E.g., Chinese has a sound between the "p" and "b" of English, considered different from either. Hence the difficulty anglicizing the name of the city Peking/Beijing.)

--
-- Alastair

Not Going to Change the World by mfterman · 1999-10-01 03:13 · Score: 3

Voice recognition wouldn't be of great use to me, at least at the desktop. I hate leaving prolonged voicemail messages because I can't go back and edit a previous sentence. I have to go and compose a speech if I want to sound intelligent and coherent.

Voice recognition only becomes useful to me if natural language parsing and enough cognition power are available for me to command my computer in plain English to a fair degree of abstraction.

In mobile computing, it might be a lot more useful, especially for a device, say the size of the Palm Pilot, where various factors make voice far more convenient and less difficult than other forms of input.

There are a lot of human use factors that complicate voice recognition (making the computer recognize when you want it to parse your speech and when you don't want it listening). Human interface issues often make these things less wonderful than they appear.

Not that I'm saying this isn't a wonderful development and there aren't people out there who could really use this (in specialized environments or people who have mechanical difficulties), but I don't think voice recognition is going to change the world the way some people think it will.

Re:Not Going to Change the World by Atomic+Frog · 1999-10-01 06:12 · Score: 1

You're thinking too narrow...too much COMPUTER!

There are tons of applications where hands-free operation is desirable.
- HP already has a voice-activated scope. It's damn handy!
- How about hands free control of your automobile gadgets? Radio, heat, air, cell phone etc.. You can hardly call cars a "specialized" environment.
- Operating room, when the surgeon needs to get a readout from instruments without looking up or fiddling with knobs...or hoping the nurse isn't asleep...

Wouldn't take me too long to come up with a list of other "non-computing" areas in which voice recognition would be fabulous.

How to extend this: by TheDullBlade · 1999-10-01 03:18 · Score: 2

Use more neural nets.

Some people are saying that you can't make a really big neural net efficiently (at least without specialized hardware), but I don't see why you couldn't have hundreds of separate neural nets each reporting on whether one word was said.

A very tiny, very simple computer could handle the task of managing a few neural nets. You could make it out of a few thousand surface features on a chip, so you could pack thousands of these processors on a chip. For that matter, they probably don't need to be terribly fast, so you could make them like memory chips. Imagine a megabyte chip, but instead of 1024K dumb memory, with 1024 minimal neural processors, each with 512 bytes of RAM.

Broadcasting the incoming data is pretty simple, and I don't think the networking issues of one or two of these processors reporting every few seconds would be too severe.

Training wouldn't be all that hard, either. You need a few man-years of samples, but the training could be done in parallel. It would cost a few million dollars (unless there was a dedicated online effort, which is entirely possible), but not billions. Imagine going down to the mall and asking people if they would read a few hundred words for $20; no problem, just repeat it all over the place so it deals well with accents.

There has never been a task better suited to massive parallel processing.

Oh yeah, I suppose I have to say: hey, we can do it with a Beowulf cluster, |)00|)Z!

--
/.

Re:How to extend this: by kaphka · 1999-10-01 05:27 · Score: 2

"I don't see why you couldn't have hundreds of separate neural nets each reporting on whether one word was said."

You could do this, but you would be diverging considerably from the way the human brain actually works. And considering that the human brain is currently the best speech-processing device we know of (notwithstanding this experiment, which sounds awfully limited to me), that's probably a bad idea.

Think about it: humans have, what, about a 10,000 word vocabulary? (Yes, there are a lot of different ways of measuring vocabulary, but that's a reasonable figure.) I'm willing to accept that somebody could combine 10,000 eleven-node neural nets to approximate the same vocabulary. But the average human would have no trouble recognizing a word like "picklemobile", or "Vulcanophone", or "Rodmania", even though he has never heard these words before. (Hopefully.) Or any of the millions of possible proper names, although I'm not sure that that's a fair example. (And no, the examples that I gave cannot be dismissed as simple compounding or affixation, as far as I remember from linguistics. As a matter of fact, if anybody can explain to me what the hell is going on in "picklemobile", please let me know...)

I don't mean to knock neural nets. I think they're on the right track, but they need to be moving towards more complexity and structure, not less. Maybe have one net for phonology, one for syntax processing, one for vocabulary, etc., and then link them using conventional computation. In other words, more like the way we do it.

--
MSK

No usefulness? Sha. Right. by Anonymous Coward · 1999-10-01 04:12 · Score: 2

To all of you naysayers out there who think this system has no real-world use because it can only understand a handful of words...Do you so easily forget the lesson of the computer? You only need two states to transmit information. If we merely learn to speak in binary (On On On Off Off On) the problem is solved and we have achieved practically perfect speech recognition. Narrow minded fools!!!

Re:No usefulness? Sha. Right. by Anonymous Coward · 1999-10-15 05:46 · Score: 0

...If we merely learn to speak in binary (On On On Off Off On) the problem is solved...
No - that's "Fire! Fire! Fire! Stop! Stop! Fire!"
(When's my slashdot password coming through, anyway?)

Looks like I have a bad posting day today... by Sesse · 1999-10-01 04:13 · Score: 1

That's the second comment today that slipped through the error check with an error. The first sentence should read:

"If any choice was to be removed..." You can't moderate a choice :-)

/* Steinar */

--
(This comment is of course GPLed.)

Re:/.'ed ? by richnut · 1999-10-01 04:14 · Score: 1

Well I read the article as someone posted it, it certainly is intriguing. Don't get me wrong about this being cool, it just seems (to me at least) that using the brain to outperform the brain is an odd assertion.

Of course computer control via voice would generally happen in a controlled environment and would probably not have to involve a huge vocabulary as long as the computer could be trained on basic phonics and cross reference against a good dictionary.

-Rich

The military and tech by Sesse · 1999-10-01 04:21 · Score: 1

Come on, why would the military always be `5 to 20 years ahead of' civilian technology? Remember, there _are_ great people outside the military as well. And there are more of them. Sure, the military can `take' tech from civilian life and not the other way round, but still? (BTW, there are _many_ militaries in the world...)

/* Steinar */

--
(This comment is of course GPLed.)

Re:The military and tech by Dwonis · 1999-10-02 01:37 · Score: 1

That's about as likely as O.J. Simpson's innocence. Isn't it amazing that every military of every country does this?

Think of Japan. They are capable of innovation (not to be confused with Microsoft's inovation), and it would be more advantageous for them to compete with the US in market, than in military.

The US is powerful, but not that powerful.
--------
"I already have all the latest software."

Re:The military and tech by Coda · 1999-10-01 05:43 · Score: 2

In the United States, at least, patents can be snatched up by the military and made Top Secret.

This allows the military to wait until some bright young entrepreneur comes up with a great solution, then they swoop down and tell the poor sap he can't talk about his patent for 10-15 years, and next thing you know the military comes out with some really cool speech recognition device.

So while there are brilliant people outside of The Man's Territory, their ideas can be and are stolen, and no one can talk about it.

I can think of better ways for the world to work...

--
-- I can't think of anything witty to put here. Sorry.

Their, There, They're, Thier .... by Anonymous Coward · 1999-10-01 16:05 · Score: 0

If you can get a computer to use them correctly more than 80% of the time? You can't even get slashdot posters to do that. What a bunch of loosers^H^H^H^H^H^H^Hlosers
--Anonymous Cowlings

Picklemobile by Anonymous Coward · 1999-10-01 16:13 · Score: 0

That's the sunroof for the Oscar Mayer Wienermobile! :P

Recognizes better than who? by Deosyne · 1999-10-01 16:33 · Score: 1

Please! If they really wanted to test the capabilities of this system in comparison to a human, they should use my wife!

Her: Honey, where are you at, it's so noisy? And who is that with you?
Me: Um, nowhere and nobody, it's just a business meeting...
Her: Oh? Does she work with you?
Me: Um, who?
Her: The 26 year old brunette wearing the green dress who just said your name two tables away. I'm not deaf, you know...

Deosyne

ViaVoice & Xvoice will do it for you... by TDoris · 1999-10-01 19:19 · Score: 1

There seem to be a bunch of people saying how great it would be to use voice commands for their Linux HCI, so I thought I'd let them know that you can do it already, just download ViaVoice for Linux (beta) free from IBM, then get Xvoice by Dan Creemer. Xvoice allows you to send your speech (which is converted to text by ViaVoice) to any X application as a stream of synthesised Xkeypresses. If you're interested, I'm trying to develop some grammars for X apps like the terminal, netscape and Xemacs which would permit speaker independent voice recognition for command sequences, and I could use some suggestions from the people who'll be using it in the end so that I'm not developing in a vacuum. Tom Doris. Remove 'nospam.' to email: tdoris@nospam.compapp.dcu.ie

Is your NN project online? by Morgaine · 1999-10-01 21:00 · Score: 2

Are your research materials online?

I like following the progress of projects around the world --- I was in academia myself a decade ago, in a department where colleagues who were working with NNs would discuss their processing requirements and architectures with me. The work you describe sounds interesting.

--
"The question of whether machines can think is no more interesting than [] whether submarines can swim" - Dijkstra

Re:Is your NN project online? by SamBeckett · 1999-10-02 00:57 · Score: 1

Unfortunately no-- since I developed them at work and signed (foolishly) my soul over to the company, they own all of it and I would have to go through a tedious process to extract it legally...
Sorry,

Kudos by Anonymous Coward · 1999-10-01 21:40 · Score: 0

Probably the single most important item on /. this year, if not ever. This has earthshakingly profound implications. Still not genuine AI, but the path is clear, to those with eyes and ears. No patent, though. There is ample prior theory, if not art, but it's lain acknowledged and unrecognized for a long time. Totally apart from the fact that this is fundamentally a discovery, not an invention. Any patent issued would be subject to irrelevance by a more general statement of the issues at hand. Patent is a legalized form of theft, anyway, and I would only say that there are those who are not averse to larceny in turn, in a just cause. Sorry to be cryptic, but any plainer would be asking for trouble. That's not to say that kudos aren't in order.

Re:Kudos by Anonymous Coward · 1999-10-01 21:41 · Score: 0

make that "unacknowledged"

Hopfield's group at Princeton by Anonymous Coward · 1999-10-01 21:43 · Score: 0

Hi Matt,

You might also want to check on John Hopfield's group in Molecular Biology at Princeton, who have also been demonstrating word recognition in noisy environments, based on neurons receptive to the timing differences between action potential spikes.

I can't see any papers online, but I think there were papers at NIPS the last couple of years.

As far as I remember, he had basically got the neurons to do a time-frequency decomposition, which automatically rescaled to allow for different speed and pitch baselines, and could use the outputs to train an adaptive classifier.

Hopfield was pretty pleased with the results, but one guy from the speech community was very unimpressed. His line was that distinguishing a small number of well separated possibilities was not hard, so almost any technique would do well. (That is basically what you're saying above). But that doesn't tell you anything useful /at all/ about how well it would identify a word from the full range of possible speech, because from a limited number of very distinctive words we get no idea how similar those likelihood ratios might be for other words. In fact, once you're looking at the full range of speech, different words can be very similar, and it has been /very/ difficult to push up the likelihood ratios even to present levels of discrimination.

So performance on such a small set of possibilities really tells us nothing about the real efficiency or effectiveness of the USC techniques.

Re:Hopfield's group at Princeton by Anonymous Coward · 1999-10-01 23:10 · Score: 0

A paper which describes Hopfield's model:
John Hopfield, Carlos Brody & Sam Roweis.
Computing with Action Potentials.
Neural Information Processing Systems 10 (NIPS'97) pp.166-172
It's online at http://www.gatsby.ucl.ac.uk/~roweis/papers/hopnips.pdf
The big idea is to get a neuron to generate a spike train whose intervals slowly get longer and longer, so that the phase of the spike train relative to a system clock represents the log of the time since the feature was detected.
Having the log(time elapsed since feature) means you can then thumbprint the word by the ratios of the time-intervals between different combinations of features appearing and disappearing (in this case, the presence of power in particular bands). The system is thus invariant to whether the word is said quickly or slowly.
It would be interesting to know if the USC system is similar.
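
A small numeric sketch of why the log encoding gives speed invariance (this is only the arithmetic, not the spiking model from the paper; the function name and the feature times are invented for the example). If a word is spoken k times slower, every elapsed time e becomes k*e, so log(k*e) = log(k) + log(e). The log(k) offset is the same for every feature and cancels when you take differences, which are just the logs of the time ratios.

```python
import math


def log_time_signature(elapsed_times):
    """Differences of log elapsed times, relative to the first feature.

    Each entry equals log(elapsed_times[i] / elapsed_times[0]), so scaling
    all elapsed times by the same factor leaves the signature unchanged.
    """
    logs = [math.log(t) for t in elapsed_times]
    return [value - logs[0] for value in logs[1:]]


# Hypothetical times (in seconds) since three features were detected.
normal_speed = [0.40, 0.25, 0.10]
half_speed = [2.0 * t for t in normal_speed]   # same word, spoken slowly
other_word = [0.40, 0.10, 0.05]                # different relative timing

print(log_time_signature(normal_speed))
print(log_time_signature(half_speed))
print(log_time_signature(other_word))
```

The first two signatures should agree to within floating-point rounding, while the third differs, which is the "thumbprint" idea described above.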

White noise? by porkchop_d_clown · 1999-10-01 03:18 · Score: 1

White noise is certainly random - but background noise in real world situations is hardly going to be that random. Rather, it's going to be a chaotic blend of non-random signals - each of which may (or may not) be a valid speech signal in its own right.

--
Clear, Dark Skies

Re:Long way to go, but cool for AI by chuck · 1999-10-01 03:21 · Score: 1

Doesn't the English languages use only a few dozen sounds ("phonems" or something)?

You're right. I'm embarrassed I didn't think of that. There are definitely a finite number of phonemes, even if you include several mumblings of combinations of phonemes. As well, the rules of phoneme analysis have been worked out fairly completely by many different text-to-speech and speech-to-text translators, so maybe it won't be too long before this research can make a real difference. (Especially if it's truly speaker independent like they claim!) I'm definitely looking forward to what will be produced by such technology.

And I bet it could be done in 2 lines of perl!

--
My Freakin Blog

Don't worry, be happy... by Greyfox · 1999-10-01 03:26 · Score: 1

That's OK. Once the machines can out think us, they'll probably exterminate the entirety of humanity anyway, so you really don't have to worry about Big Brother at all.

--

I'm trying to teach myself to set people on fire with my mind... Is it hot in here?

Re:Is everybody a pessimist?? by Anonymous Coward · 1999-10-01 04:21 · Score: 0

What I don't get is what are all you people saying that you're afraid the guv'mint is gonna hear? I doubt they're really interested in your shopping lists.

American paranoia makes me wonder - America is an elected democracy last I heard, so why are its citizens so afraid of the people they've elected?

Oh wait...
Bill Clinton - Openly lies.
George Bush - Former CIA head
Ronald Reagan - Alzheimer's sufferer
... never mind. I understand now. :-)

Re:One step closer to Star Trek every day... by miquels · 1999-10-01 04:24 · Score: 1

Handsfree carphones flopped? Well, they are compulsory in the Netherlands now. It's illegal to drive and use a non-handsfree phone in your car.

--
Living is a horizontal fall

Remember in _Snow Crash_... by Anonymous Coward · 1999-10-01 04:28 · Score: 0

Remember in _Snow Crash_ when YT flipped her phone open, said her boyfriend's name into it, and it looked up his number from its memory and dialed it? This technology looks to make that kind of thing possible. Personally, I think that'd be great! You could type someone's number into your cell phone once, then speak the name you wanted to address them as to "train" the phone. From then on you need only speak their name into your phone and it'll automatically dial their number for you. Uber-convenient and dead simple. It sure beats the hell out of trying to type in a person's name via a telephone keypad. (Assuming your current cell phone even allows you to do that - mine doesn't.) I don't care about larger applications, or the AI potential, or any of that big picture stuff. I just want to be able to say "Mom" into my cell phone and have it dial her number. This could be an absolutely killer application, and make tons of money for the first cell phone company to license and perfect this technology. With only 11 neurons, you know it's got to be cheap to fab into silicon and put on a chip. -Ben

Re:Remember in _Snow Crash_... by kamileon · 1999-10-01 05:27 · Score: 1

If you think that would be cool, go check out Hewlett Packard's new Jornada 430se at http://www.hp.com/jornada/products/430se/overview.html. The technology's already out on the market.
Disclaimer: I contract to HP, and have no qualms about pimping their really cool stuff. :)
Geek-grrl in training
"I haven't lost my mind, it's backed up on tape somewhere."

--
To truly understand recursion, you must first truly understand recursion.

Re:Remember in _Snow Crash_... by Awel · 1999-10-01 05:30 · Score: 1

I just want to be able to say "Mom" into my cell phone and have it dial her number.

This is already available, and has been for a couple of years. Several phone companies already make such things. Apparently they work quite well..

Severely interesting! by Anonymous Coward · 1999-10-01 03:30 · Score: 0

Tinker-toy topologies (lines & circles) don't usually give enough info to figure out what they're doing. The real meat is in the weight update (training) algorithm, and the detection algorithm, but in this case we get a few clues.

Though it is interesting to note that it looks amazingly non-dynamic -- the 5th output neuron looks to be the only feedback neuron into the hidden layer. (Voice recognition nets usually have lots of feedback connections to refine the net's guess based on constantly-incoming data AND previous iterations.)

Which would leave the four other output neurons to be the "four words" that it can learn... Which means the fifth neuron (the feedback) is probably an "I don't know" output.

The input signal appears to be sent through 5 bandpass filters & then on to the input layer.

Another interesting feature is that it's not a fully connected trans-layer net... which can save time on large NNs, but it can be like severing connections in your brain--you don't go doing it willy-nilly.

My guess is that they custom-created this net to get the results they wanted and that it won't scale for crap.

Re:Severely interesting! by AJWM · 1999-10-01 04:41 · Score: 1

This could be cool, but there are some obvious unanswered questions, like how well does it distinguish those four test words if they're part of continuous speech rather than discrete samples?

If the thing can parse that stuff out of continuous speech, then the key to recognizing a fairly large vocabulary is not scaling it to 50,000 output neurons, each one recognizing a single word, but only to about 40, each one recognizing a single phoneme. Then some backend logic tries to recognize words out of the phoneme stream.

In other words, use something like this as a front end noise filter that inputs to something along the lines of more conventional speech recognition systems.
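
A rough sketch of that back-end step, assuming the front end emits one phoneme label at a time. The lexicon, the phoneme symbols and the `decode` helper are all made up for illustration; a real recognizer would use probabilistic decoding rather than exact lookup.

```python
# Hypothetical lexicon mapping phoneme sequences to words (illustrative only).
LEXICON = {
    ("s", "iy"): "see",
    ("s", "uw"): "sue",
    ("m", "aa", "m"): "mom",
}

MAX_LEN = max(len(key) for key in LEXICON)

def decode(phonemes):
    """Greedy longest-match decoding of a phoneme stream into words."""
    words, i = [], 0
    while i < len(phonemes):
        # Try the longest candidate first, then shorter ones.
        for n in range(min(MAX_LEN, len(phonemes) - i), 0, -1):
            chunk = tuple(phonemes[i:i + n])
            if chunk in LEXICON:
                words.append(LEXICON[chunk])
                i += n
                break
        else:
            words.append("<unk>")  # no known word starts here; skip one phoneme
            i += 1
    return words

print(decode(["m", "aa", "m", "s", "iy", "s", "uw"]))  # ['mom', 'see', 'sue']
```

The point is only that roughly 40 phoneme outputs plus a lookup table can cover a vocabulary of any size; the hard part (context-dependent phonemes, homophones) is hidden inside the lexicon.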

--
-- Alastair

Radar and sonar performance enhancement by alumshubby · 1999-10-01 03:30 · Score: 1

The article alludes to the US Navy's hope that the technology can be applied to detecting and classifying ships' and submarines' sonar signatures more quickly and reliably. In spite of the bitchin' signal processing the Navy already does, it's still as much of a black art as a science. (Like the old joke about how to get to Carnegie Hall, you have to practice, practice, practice!) I wonder what a massive infusion of neural-net processing will add to the AI end of it. Same thing goes for ELINT/ESM and radar-intercept work -- I wonder how much better we'll get and how quickly.

--
"How many light bulbs does it take to change a person?" --BMcC-->

NNs and the training set by davids35 · 1999-10-01 04:37 · Score: 1

Does anyone know if these results scale up to a large vocabulary? The better-than-human recognition results are really stunning, but a traditional problem with auto speech rec is sparsity in the training set, isn't it? Does anyone know if it will still work with 20,000 words in the vocabulary, and how much training is required to get there?

Things are not so simple. by Masker · 1999-10-01 04:40 · Score: 2

I am not a Ph.D. in this field, but I do have my Master's degree in Speech Science. While I have taken a break from Speech Science for about 2 years to learn C++ well enough to start working in computer speech recognition/perception/production, I'm still fairly up on speech research. That caveat out of the way, let me tell you my thoughts.

While you say there are only a few dozen phonemes in most languages, what you are missing is the fact that each phoneme is context sensitive. So if I say "See" and "Sue", the 's' sound in each morpheme is spectrally quite different. They are both the /s/ phoneme, but the one in /si/ ("See") has a spectrum much higher (in speech terms; I think by about 1 kHz) than the one in /su/ ("Sue"). Phonemes are not discrete things; they are gradients or classes. So you are simplifying things far too much when you suggest that morphemes are just combinations of a few dozen phonemes.

Really, if you think about it, humans do not learn to understand words by rote memorization of the acoustic properties of each word. That would be far, far too inefficient. Think about the fact that you could still understand someone's voice, even if they inhaled helium. That skews the spectral/acoustic properties of the person's voice into a very high frequency range compared to their normal voice. Also, if you tried to listen to non-native speakers who are missing phonemes or substituting phonemes, how could you possibly understand them? What you do is you figure out the missing or corrupted phonemes from the context of the morpheme. Some research supports the addition of other, extraneous acoustic information (such as the spectral shift of /s/ in /si/ vs. /su/) as one thing that can cue a listener in to what phoneme follows it. In that particular set of studies, people were able to identify the morphemes (/si/, /su/, etc.) by only hearing the initial /s/. That is, the vowel was cut off from the morpheme, yet people were able to (with something like 90% accuracy) complete the morpheme.

There is an awful lot that speech research has not yet uncovered. One of the problems that I see in the field of computer speech recognition/perception/production is the lack of solid speech research, and of any effort to implement the trickier research in these projects. Training neurons to recognize individual morphemes doesn't work. It's like brute force calculation of chess; the system is too complex to tackle with such a simple model. It's just too damned inefficient.

Besides, homophones will always be a problem with speech research, until language makes an appearance. How many times do you want to have to correct "their", "there" and "they're" in a document?

--

---------The early bird gets the worm, but the second mouse gets the cheese.

Re:Things are not so simple. by Masker · 1999-10-02 19:15 · Score: 1

Errrr... Phones.... Of course!

I guess that I ought to break out some journals again and start reading up on this stuff. I said I've been out of it for two years cramming my head with *nix, C++, and Perl, but that's not really an excuse. Oh well.

Thanks for keeping me honest!

--
---------The early bird gets the worm, but the second mouse gets the cheese.

Re:One step closer to Star Trek every day... by dclydew · 1999-10-01 04:41 · Score: 1

No problem with that... just that many more people will leave you alone when you walk down the street :)

--
Get a life, not a lifestyle. - Hikem Bey

/.'ed ? by richnut · 1999-10-01 00:43 · Score: 1

I can't seem to get there.

Sounds like some pretty dubious claims that some neurons can out do a human brain. Anyone here who can post a summary?

-Rich

Re:/.'ed ? by Bearpaw · 1999-10-01 03:37 · Score: 1

"Sounds like some pretty dubious claims that some neurons can out do a human brain."
Relax. Some neurons can outdo a human brain for a specific, limited function. Note especially, that according to the article, it was "benchmark testing using just a few spoken words". Presumably it'd take a larger neural net to deal with tens of thousands of words. Though possibly the concept could be extended to that level.

Um, this has already happened. Was: Oh great... by orac2 · 1999-10-01 00:46 · Score: 3

The US, through NATO, already monitors telecoms traffic, where speech recognition machines are programmed to listen for buzzwords like "plutonium" or "assassinate". Suspect conversations are then recorded for later perusal. This is not conspiracy theory; the program is called Echelon, and here's a recent CNN report. And that's not even considering that military technology is usually about five to twenty years ahead of everyone else, depending on the tech. (This is also why I sometimes preface transatlantic calls to friends with a string of probable buzzwords, just to waste some snoop's time.)

--
"Just once, I'd like to meet an alien menace that wasn't immune to bullets." -- The Brigadier, Dr. Who

THIS IS THE ARTICLE by 1010011010 · 1999-10-01 00:46 · Score: 1

Contact: Eric Mankin (213-740-9344)
Email: mankin@usc.edu

Release number: 0999025

Release date: 9/30/99

A demonstration of the Berger-Liaw Neural Network Speaker-Independent
Speech Recognition System can be found on line at

http://www.usc.edu/ext-relations/news_service/real/real_video.html

Jim-Shih Liaw (left) and Theodore W. Berger (right)
Photo by Eric Mankin

Machine Demonstrates Superhuman Speech Recognition
Abilities

University of Southern California biomedical engineers have created the world's
first machine system that can recognize spoken words better than humans can. A
fundamental rethinking of a long-underperforming computer architecture led to
their achievement.

The system might soon facilitate voice control of computers and other machines,
help the deaf, aid air traffic controllers and others who must understand speech in
noisy environments, and instantly produce clean transcripts of conversations,
identifying each of the speakers. The U.S. Navy, which listens for the sounds of
submarines in the hubbub of the open seas, is another possible user.

Potentially, the system's novel underlying principles could have applications in
such medical areas as patient monitoring and the reading of electrocardiograms.

In benchmark testing using just a few spoken words, USC's Berger-Liaw
Neural Network Speaker Independent Speech Recognition System not only
bested all existing computer speech recognition systems but outperformed the
keenest human ears.

Neural nets are computing devices that mimic the way brains process
information. Speaker-independent systems can recognize a word no matter who
or what pronounces it.

No previous speaker-independent computer system has ever outperformed
humans in recognizing spoken language, even in very small test bases, says
system co-designer Theodore W. Berger, Ph.D., a professor of biomedical
engineering in the USC School of Engineering.

The system can distinguish words in vast amounts of random "white" noise -
noise with amplitude 1,000 times the strength of the target auditory signal. Human
listeners can deal with only a fraction as much.

And the system can pluck words from the background clutter of other voices -
the hubbub heard in bus stations, theater lobbies and cocktail parties, for example.

Even the best existing systems fail completely when as little as 10 percent of
hubbub masks a speaker's voice. At slightly higher noise levels, the likelihood that
a human listener can identify spoken test words is mere chance. By contrast,
Berger and Liaw's system functions at 60 percent recognition with a hubbub level
560 times the strength of the target stimulus.

With just a minor adjustment, the system can identify different speakers of the
same word with superhuman acuity.

Berger and system co-designer Jim-Shih Liaw, Ph.D., achieved this improved
performance by paying closer attention to the signal characteristics used by real
flesh-and-blood brains in processing information.

First proposed in the 1940s and the subject of intensive research in the '80s and
early '90s, neural nets are computers configured to imitate the brain's system of
information processing, wherein data are structured not by a central processing
unit but by an interlinked network of simple units called neurons. Rather than
being programmed, neural nets learn to do tasks through a training regimen in
which desired responses to stimuli are reinforced and unwanted ones are not.

"Though mathematical theorists demonstrated that nets should be highly effective
for certain kinds of computation (particularly pattern recognition), it has been
difficult for artificial neural networks even to approach the power of biological
systems," said Liaw, director of the Laboratory for Neural Dynamics and a
research assistant professor of biomedical engineering at the USC School of
Engineering.

"Even large nets with more than 1,000 neurons and 10,000 interconnections have
shown lackluster results compared with theoretical capabilities. Deficiencies
were often laid to the fact that even 1,000-neuron networks are tiny, compared
with the millions or billions of neurons in biological systems."

Remarkably, USC's neural net system uses an architecture consisting of just 11
neurons connected by a mere 30 links.

According to Berger, who has spent years studying biological data-processing
systems, previous computer neural nets went wrong by oversimplifying their
biological models, omitting a crucial dimension.

"Neurons process information structured in time," he explained. "They
communicate with one another in a 'language' whereby the 'meaning' imparted
to the receiving neuron is coded into the signal's timing. A pair of pulses
separated by a certain time interval excites a certain neuron, while a pair of
pulses separated by a shorter or longer interval inhibits it.

"So far," Berger continued, "efforts to create neural networks have had silicon
neurons transmitting only discrete signals of varying intensity, all clocked the way
a computer is clocked, in beats of unvarying duration. But in living cells, the
temporal dimension, both in the exciting signal and in the response, is as important
as the intensity."

Berger and Liaw created computer chip neurons that closely mimic the signaling
behavior of living cells - those of the hippocampus, the brain structure involved in
associative learning.

"You might say, we let our cells hear the music," Berger said.

Berger and Liaw's computer chip neurons were combined into a small neural
network using standard architecture. While all the neurons shared the same
hippocampus-mimicking general characteristics, each was randomly given
slightly different individual characteristics, in much the same way that individual
hippocampus neurons would have slightly different individual characteristics.

The network created was then trained, using a procedure as unique as the
neurons - again taken from the biological model, a learning rule that allows the
temporal properties of the net connections to change.

The USC research was funded by the Office of Naval Research; the Defense
Department's Advanced Research Projects Agency; the National Centers for
Research Resources, and the National Institute of Mental Health. The university
has applied for a patent on the system and the architectural concepts on which it
is based.

A demonstration of the Berger-Liaw Neural Network Speaker-Independent
Speech Recognition System can be found on line at

http://www.usc.edu/ext-relations/news_service/real/real_video.html

EM.BERGER99

University of Southern California News Service
3620 South Vermont Avenue, Los Angeles, CA 90089-2538
Tel: 213 740 2215 Fax: 213 740 7600
Email: news_service@usc.edu
WWW: http://uscnews.usc.edu

--
Napster-to-go says "Fill and refill your compatible MP3 player", which is a lie. It's not MP3. It's WMA with DRM.

One step closer to Star Trek every day... by Ross+C.+Brackett · 1999-10-01 00:46 · Score: 2

I am very excited by the possibilities of this technology. Just imagine it: a really good speech recognition system coupled with a really good natural language analyzer coupled with a good speech generator. What do you get? The comm/computer system from Star Trek:TNG. Hell, there's probably enough "Computer Voice" samples of Majel Barrett to at least give the speech generation software a good starting place.

Who needs a Palm Pilot when you can walk down the hall hands-free as it briefs you on your next meeting, or lets you read and compose your mail on the way to work? My hands shake.

My only concern: the people who design this system would need to include Star Trek-ish terminology and attitude in the list of things the computer could do. Example:

--
"Computer, please replay voice mail message 9 starting at time index 0-mark-9-5."

[computer chimes, message plays]

"Computer, message 9 sounds garbled. Run a level-three diagnostic on message integrity."

[pause]

"Diagnostic complete. Message shows signs of type-1 file corruption."

"Damn it!"

"Error: cannot comply with that directive"
--

If we could just get that far, then I'd be happy. Actually, no that's wrong. If we could just get that far, then invent warp drive, replicators, transporters, inertial dampeners, and holodecks, *then* I'd be happy.

Ross

Re:Still got a ways to go. by Anonymous Coward · 1999-10-01 23:40 · Score: 0

Check www.inplainwords.com - that's a cool language understanding website

Article misses biggest (and scariest) use ... by ian+stevens · 1999-10-01 00:48 · Score: 3

The article misses another interesting, albeit scary, use of this technology. If these could be made small enough and cheap enough, they could be placed in key locations across the country, forever listening in on passers-by.

Avoiding all the issues of privacy, consider the following scenario. The police want to arrest a suspect for some crime (drug trafficking, conspiracy, etc.) but have no proof and can't tap his phone lines since he encrypts all his phone conversations. Through some method, they train this speech-recognition device to the suspect's voice and either have someone with the device planted on them track the suspect or have an array of said devices placed in public areas where the suspect is known to hang out (bus terminals, bars, etc.). Sooner or later, the suspect might slip up and the authorities have enough evidence needed for an arrest.

Regarding privacy concerns, it seemed to me that this device could only track a handful of known voices ... probably requiring vast processing power to track every voice in a room. So it might be a while yet before everybody's conversations in bugged places get transcribed.

Damned cool technology, though.

--
ian

Re:Article misses biggest (and scariest) use ... by SamBeckett · 1999-10-02 01:03 · Score: 1

Very similar to 1984 by George Orwell. Frightening at best.

Re:Article misses biggest (and scariest) use ... by mizerai · 1999-10-01 01:08 · Score: 1

[...]
either have someone with the device planted on them track the suspect or have an array of said devices placed in public areas where the suspect is known to hang out (bus terminals, bars, etc.). Sooner or later, the suspect might slip up and the authorities have enough evidence needed for an arrest.

My god, you're right! They could never do this with a tape recorder or a digital sampler because it would never recognize the person's voice! It would just record everything!
Oh wait. That would work just fine. :P
Someone would just have to listen to the tape. Maybe one of these things could be used to scan the tape though...

--
--Mizerai

Re:Article misses biggest (and scariest) use ... by devphil · 1999-10-01 01:21 · Score: 2

Several hard- and pseudo-hard-SF authors (e.g., Niven in several short stories, and Sterling in his recent _Distraction_ novel) have written stories involving an alternative application to the "listening to everything" scenario: devices which listen for the patterned sounds of a mugging, a rape, or gunfire. Then the police are called, or (assuming visual pattern recognition) the people involved are knocked out.

I for one want a small device to listen for the sounds of coworkers down the hall muttering, "Hmmm, maybe phil knows about this," and report it to me, so I can hide.

--
You cannot apply a technological solution to a sociological problem. (Edwards' Law)

Call me crazy... by BlackSpyder · 1999-10-01 00:54 · Score: 0

Doesn't anyone ever consider the "God Factor"?
I mean, you have DARPA with a new toy, and head shrinkers that have a new tool. And all the time that I was reading that article, I was getting the feeling that people were glorifying themselves, when they were really just mimicking what they already saw in the human brain.

Soon we'll see scientists with more neural nets, and the creator of the first one that says "I'm Sorry Dave" will think he is god.

This may sound like just a bunch of preachy BS, but it's very disturbing...

--
BlackSpyder

--
And the gods of Rome and Greece and Egypt all cried out in vain, for no one could save them from their own destruction.

You are all completely missing my point... by Anonymous Coward · 1999-10-02 00:06 · Score: 0

What is special about this technology is not _what_ it does. As several people have pointed out, speech recognition (at least the single word variety) is old hat. IBM had speech recognition technology working marginally during the days of the 386en.

What's special about this technology is _how_ it does what it does. I appreciate those of you who pointed me towards that HP Palmtop with speech recognition built in, but guys, look at the specs! A 133MHz processor and 32 megs of RAM. Christ, you could practically run DragonDictate on that thing!

This new technology is different because of the amount of processing power and memory (both code and RAM) it takes. You can't run DragonDictate (or any other software based speech recognition stuff) on a cell phone. The CPU power simply isn't there. And if you put it there, it would kill the battery life so severely as to render the cell phone useless. I'm not even going to get into the size and heat issues.

But this new technology does give us the ability to do something we couldn't do before. It allows us to embed low-end but still accurate voice recognition into portable, battery powered devices. And _that's_ what's cool about it.

-Ben

Re:Better Babble Fish? by FunkMonkey#9 · 1999-10-02 00:29 · Score: 1

Well, it is a neural network. It could, theoretically, have a 100% accuracy in translations. All you have to do is teach it another language.

So long as its instruction is good, its French (or German, whatever) will also be good.

--

-- The One and Only NotMike.

one use by Anonymous Coward · 1999-10-02 01:36 · Score: 0

user types: FORMAT C:
computer: ALL DATA ON DRIVE C: WILL BE ERASED. ARE YOU SURE?
user says: nooooooo!!!!
computer response (using Linux; I know that it's really /dev/hda1): FORMAT CANCELED
computer response (using windoze): FORMATTING DRIVE C: ...

This would be cool for when the computer asks for confirmation (yes, no, yes to all, cancel) or something.

Re: noise levels by Cuthalion · 1999-10-01 03:33 · Score: 1

"Throwing more neurons" into a neural network does not necessarily improve its capabilities. Well, it kind of does. It increases its capability to learn special cases, but can often reduce its ability to learn generalities - if it can memorize that f(2) = 4 and f(3) = 9, it may have a harder time realizing that f(x) = x^2.

The more neurons you have the more heterogeneous your training data must be.
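
A toy illustration of that memorize-versus-generalize trade-off, using the f(x) = x^2 example. This is not a neural net: `memorizer` stands in for a model with enough capacity to store every training case, and `fit_square_model` is a hypothetical one-parameter model that is forced to find the rule. Both names are invented for this sketch.

```python
# Training data drawn from f(x) = x^2, as in the example above.
train = {2: 4, 3: 9}

def memorizer(x):
    """Pure lookup: perfect on the training set, no answer anywhere else."""
    return train.get(x)

def fit_square_model(data):
    """Least-squares fit of y = a * x^2, a model with a single parameter."""
    numerator = sum(y * x**2 for x, y in data.items())
    denominator = sum(x**4 for x in data)
    return numerator / denominator

a = fit_square_model(train)

def generalizer(x):
    return a * x**2

print(memorizer(3), generalizer(3))  # both are right on data they have seen
print(memorizer(5), generalizer(5))  # only the small model extrapolates
```

The small model only works because its form happens to match the data, which is the flip side of the same argument: too little capacity and it cannot represent the rule at all.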

--
Trees can't go dancing
So do them a big favor
Pretend dancing stinks!

My puters playing hooky by Myriad · 1999-10-01 03:35 · Score: 2

I have to wonder: one of the major bases for the success of neural networks is that they are trained, rather than programmed in the traditional sense. This works fine while you're researching and developing a singular system. But how do you mass-produce these systems? You can't just apply the same code across millions of them. Will there be classrooms filled with little computers learning how to be computers? What happens if one becomes a bully? What if one can't do math? And will there be trauma counselors on hand should one Blue Screen?

Dear Sir/Madam
I am writing to inform you that your network failed to show up for English Class today. We cannot stress enough how important regular attendance is in achieving a proper education.
Please attend to this matter as this is its fourth missed class.
Thank you,
011100110
Principal - School of Advanced Network Training

--
"They do not preach that their god will rouse them, a little before the Nuts work loose." Kipling, 'The Sons of Martha'

Re:My puters playing hooky by sesquiped · 1999-10-01 04:58 · Score: 2

> You can't just apply the same code across millions of them

Of course you can. The information "learned" by a neural net is contained in a big list (really one or more matrices) of numbers. Training can be performed once to get a set of numbers or parameters that performs a task well, and then a product can be mass produced with that specific configuration. Sometimes, the product may use a known-good configuration as a starting point and allow more learning. But neural network learning certainly can be reproduced.

[I realize the original comment was meant to be more funny than correct, but I think I should point out faulty premises.]

Re:My puters playing hooky by Sesse · 1999-10-01 05:10 · Score: 1

In every neural network (at least in traditional networks -- don't think this one would be THAT different), there are two important things: which neurons are connected, and the _weights_ of those connections. The training only tries to find the optimal weights. After you are satisfied with the results, it's easy just to transfer the weights to a copy of the network (with the same structure). No re-training necessary at all.
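
To make the train-once, copy-everywhere idea concrete, here is a minimal sketch with a single perceptron learning logical AND. The `TinyNet` class and its training rule are stand-ins chosen for brevity, not anything from the USC system; the only point is that the learned state is a short list of numbers that can be copied into an identical structure.

```python
class TinyNet:
    """Fixed topology (2 inputs -> 1 output); only the weights are learned."""

    def __init__(self):
        self.weights = [0, 0]
        self.bias = 0

    def predict(self, x):
        total = sum(w * xi for w, xi in zip(self.weights, x)) + self.bias
        return 1 if total > 0 else 0

    def train(self, samples, epochs=20, lr=1):
        # Classic perceptron update, used here only as a stand-in trainer.
        for _ in range(epochs):
            for x, target in samples:
                err = target - self.predict(x)
                self.weights = [w + lr * err * xi
                                for w, xi in zip(self.weights, x)]
                self.bias += lr * err

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

master = TinyNet()
master.train(AND)                    # train once, "in the lab"

unit = TinyNet()                     # a freshly "manufactured" copy
unit.weights = list(master.weights)  # transfer the learned numbers
unit.bias = master.bias              # this unit is never trained

print([unit.predict(x) for x, _ in AND])  # [0, 0, 0, 1]
```

Whether the same holds for the Berger-Liaw chip depends on how its per-neuron timing characteristics are stored, which the press release does not say.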

/* Steinar */

--
(This comment is of course GPLed.)

Dubious? No, not really. by Bearpaw · 1999-10-01 03:39 · Score: 1

"Sounds like some pretty dubious claims that some neurons can out do a human brain."

Relax. Some neurons can outdo a human brain for a specific, limited function. Note especially that according to the article, it was "benchmark testing using just a few spoken words".

Presumably it'd take a larger neural net to deal with tens of thousands of words. Though it's possible the concept could be extended to that level.

And why not?

Who needs a Palm Pilot... by Sesse · 1999-10-01 04:46 · Score: 1

...OR speech recognition to write mail with your voice. Actually, people are doing it today, via a very cool tool. It's called ASAACP (A Secretary And A Cell Phone) ;-) Yup, some do it... Not saying that it's not weird...

/* Steinar */

--
(This comment is of course GPLed.)

Re:Better Babble Fish? by Bearpaw · 1999-10-01 03:50 · Score: 1

Well, 80% is certainly better than my command of the French language, but it's still bad enough to risk getting slapped.

Actually, some slashdotters are using English at a level less than 80% correct ... so if you translated that to French, it would be about 64% correct.
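
The 64% figure is just the two stages multiplied together (0.8 x 0.8), which assumes the stages fail independently and that every stage has to be right for the result to be right. A quick sketch under that assumption; `pipeline_accuracy` is a made-up helper name:

```python
from math import prod

def pipeline_accuracy(stage_accuracies):
    """Overall accuracy when all stages must succeed and errors are independent."""
    return prod(stage_accuracies)

# 80%-correct English fed into an 80%-correct translator.
print(round(pipeline_accuracy([0.80, 0.80]), 2))  # 0.64
```

Real errors are rarely independent, so treat the product as a back-of-the-envelope estimate rather than a prediction.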

Maybe we should just stick to pointing and grunting.

Re:Better Babble Fish? by Sesse · 1999-10-01 04:53 · Score: 1

Yeah, _first_ run it through a recognition process (at let's say... 95%), _then_ through Babel Fish (which has an accuracy of about 30%... if you're lucky), and then through some speech engine, which probably has an accuracy of 1%.

Sounds like you would get a close to _minus ten percent_ (OK, it's not negative...) translator to me ;-) At least with today's technology. I don't think this recognition stuff is ready for the masses before at least a few years have passed... Sounds like I'll just stick with the fish in my ear for now... ("Come on, it's only a little one...")

/* Steinar */

--
(This comment is of course GPLed.)

Re:Long way to go, but cool for AI by Bearpaw · 1999-10-01 03:56 · Score: 1

Be careful in thinking that this will be the great leap in technology, and we'll all be talking to our computers in a year. This 11 neuron system is capable of differentiating four words, each of which was trained extensively. That's a very tight niche. Until we have a system where each word doesn't have to be trained explicitly, we won't have gotten too far. (Imagine training your computer with the estimated 1+ million English words...)

A good point, but maybe the long involved process they used in the lab can be automated somehow. Lab work is sometimes like that. Some gruntwork is necessary to set up proof-of-concept, but there are often ways to speed up the gruntwork if the p-o-c gets you sufficient funding.

The English language by Sesse · 1999-10-01 04:57 · Score: 1

Sure, English might have a million words, but I doubt you will use more than 1-2% of that in daily life. A system taking the 10,000-20,000 most common words would be more than good enough for most uses. Typing (or guessing, if you're not using it for dictation) that occasional weird word (like `Slashdot' :-) ) wouldn't be too much of a nuisance anyway.

/* Steinar */

--
(This comment is of course GPLed.)

Still got a ways to go. by mineralfan · 1999-10-01 05:17 · Score: 2

Although this article is impressive, realize that the ability to pick out words is entirely different from the ability to understand words, to use words. I would bet that a 2 year old baby still has better comprehension and understanding of ideas expressed by spoken words than this neural net does. Think of the way our language evolves, all the slight variations in tone and in gesture (sarcasm, anyone?), regional dialects (it's like butta) and all the double meanings of words (cleave). Mind you, this stuff is pretty neat, but we have a long way to go before we can have conversations with our computers. Even then, I would rather talk to a two year old; I'm sure they hold the secrets of NP math in their little brains, they just forget it all during their Power Rangers phase.

The real components. by Matt2000 · 1999-10-01 00:58 · Score: 4

If you read down near the bottom of the article, however, you will find this:

"The network was configured with just 11 artificial neurons, and in a sub-stage a live goat brain. The brain was activated through a patented process involving a castle and a lightning storm.

The researchers said one day they hoped that all humanity could benefit from the power of lightning.

Then they laughed kind of ominously."

Hotnutz.com

--

Mod Parent Up by CmdrTaco (Score: 2) 02:41 PM April

Better Babble Fish? by Shin+Dig · 1999-10-01 01:01 · Score: 1

This makes me start to think about the translation AIs used in Kim Stanley Robinson's _Green Mars_. If it can recognize language, then it could pump it through a translator, and out the other side could come 80% correct French. That would be kinda cool. :)

--
There is no silver bullet. Plus, werewolves make better neighbors than zombies or vampires anyway.

Skins by Skip666Kent · 1999-10-01 01:03 · Score: 2

Terminologies, dialects, genders and whatnot would (will) be user-definable, much like WinAmp skins or QuakeWorld skins. You'll have endless variations of the Star Trek Theme (including the charming and original fembot monotone from the original series), the Gangsta Theme, the Sesame Street Big Bird Theme and, of course, my personal favorite, the Wicked British Nanny Theme.

"You have 3 tasks left incompleted on your to-do list, you Naughty little boy! This calls for a vigorous spanking!"

(whipcrack) GrrrrrrOWl!

--
**>>BELCH

Is everybody a pessimist?? by Anonymous Coward · 1999-10-01 01:07 · Score: 2

Come on! This is _the_ coolest piece of technology I have ever seen. Yes, there is the "big brother" possibility, but we shouldn't discourage a technology solely on that merit. Think of what this could do for deaf people! A pair of glasses that gives a text overlay of every (or certain) conversations in the room. Think how cool it would be to have your MP3 library hooked up to a voice recognition system (yes.. a la Trek). From what I understand, this system could still hear your requests even when you had your music blasting. Talk about simplifying computer interfaces. Forget all this GUI crap!

Re:Is everybody a pessimist?? by SRMoore · 1999-10-01 01:16 · Score: 1

But what if one of your songs had a line that was the request to switch songs!!! =)

Remember, it's only a few words... by Dandre · 1999-10-01 01:08 · Score: 4

In addition to the other good comments posted regarding taking this announcement with a grain of salt, I must add that the new system can only recognize a few words -- with only 11 neurons, it couldn't do much else. Without further information, I would guess that training up a net to recognize more words would be quite complicated -- especially given the non-standard training algorithms that were used. It would be great to find a scientific paper written by the researchers on the issue instead of solely press-release material. -dandre

Long way to go, but cool for AI by chuck · 1999-10-01 01:09 · Score: 2

Be careful in thinking that this will be the great leap in technology, and we'll all be talking to our computers in a year. This 11 neuron system is capable of differentiating four words, each of which was trained extensively. That's a very tight niche. Until we have a system where each word doesn't have to be trained explicitly, we won't have gotten too far. (Imagine training your computer with the estimated 1+ million English words...)

On the other hand, this could be a great leap for neural networks in general. Realizing that the timing of synapse signals is a critical factor in neuron firing is going to shake up some things in AI. (At least, I was never familiar with neural networks that used timing cues. If I am wrong, please let me know.) Of course in a large neural network, you're going to have lots of propagation latencies as signals bounce around the net, and it makes sense that even more important than which neurons fire is when neurons fire. It actually seems to justify the complexity of neural nets because the timing data can represent a much larger data/search space than the simple fire/dormant state of each neuron.
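To put a rough number on that last point, here is a minimal counting sketch. It assumes each neuron either stays dormant or fires in exactly one of a fixed number of time slots; the slot count of 10 is an arbitrary illustration, not a figure from the press release, and the function names are made up for this example.

```python
import math

def binary_states(n_neurons: int) -> int:
    """Distinct patterns when each neuron is only 'fired' or 'dormant'."""
    return 2 ** n_neurons

def timed_states(n_neurons: int, n_time_bins: int) -> int:
    """Distinct patterns when each neuron is dormant or fires in one of n_time_bins slots."""
    return (n_time_bins + 1) ** n_neurons

n = 11     # neuron count quoted in the thread
bins = 10  # assumed timing resolution, purely illustrative

print("fire/dormant only:", binary_states(n))
print("with timing slots:", timed_states(n, bins))
print("bits per pattern with timing:", round(math.log2(timed_states(n, bins)), 1))
```

With a single time slot the two counts coincide, so the extra capacity comes entirely from how finely the firing times can be resolved.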

This could be exciting.

--
My Freakin Blog

worthless without peer review by jetson123 · 1999-10-02 04:16 · Score: 3

The claims are worthless without descriptions of the experimental procedures, peer review, and replication. There are already many ways in which pattern recognition systems and neural networks can greatly outperform humans, even in the presence of noise; that says nothing about whether it is a practical advance or not.

While the press release doesn't say much about neural networks or whether the state of the art in speech recognition has improved, it tells us something about a disregard by USC for standards of scientific conduct: scientific publication by press release is improper.

Re:worthless without peer review by frendluv · 1999-10-02 05:19 · Score: 1

This is assuming that the current system of peer review- i.e. publication in scientific journals, 1-3 months for a reply, 1-3 months for a reply to the reply, etc.- is the best way to do it. I do not believe this to be the case. Corporations, for instance, make advances in science and release their findings via press release, and let the market do the "peer reviewing." I am not saying the peer review system is invalid; it has carried us as far as it has. But for the same reasons that universities are not necessarily keeping up with the times, the scientific community's exclusion of "the rest of us" is both arrogant and stupid. The discourse that is going on here on /. is evidence of this; people are posting their opinions on the matter, people who most likely never would have seen this if it had been shoved into some obscure (or even "popular") journal.

Peer review can occur in many forms. My guess is that the USC scientists have already submitted their findings to a journal. I think the press release was cool; it gives us "laypeople" something to discuss. The clergy always gets upset when that happens...

--
everything you know is wrong

This is not entirely true by Mysteron650 · 1999-10-02 06:46 · Score: 1

Unless there have been some recent changes that I'm not aware of, Echelon is not affiliated with NATO (in fact, many of NATO's members are spied on by Echelon). It is a network run primarily by intelligence agencies of the UKUSA alliance - USA, UK, Canada, Australia and New Zealand (as well as a few other minor partners who give some contribution). It is well known that Echelon can search through internet traffic and facsimiles (among other "written" forms of long-range communication) for keywords, but at the current point in time it is unclear whether Echelon possesses the ability to do the same with telephone conversations via some form of speech recognition. An EU report from earlier this year seemed to indicate there was little evidence of this.

A small dose of insight. by mbkennel · 1999-10-01 05:27 · Score: 2

Background: I am a physicist who works in chaotic time series analysis. There are some colleagues in my institute who work on various theoretical aspects of synchronization and information processing with ''realistic'' neurons, i.e. ones which employ the right kinds of time-dynamics.

Firstly, as has been pointed out, the apparent small size of the 'training set' makes the recognition task easier and the apparent results seem better than they are. But...that does not erase the actual accomplishment however.

The tasks: It comes down to different statistical concepts.

If you have two hypotheses e.g. A and B, corresponding to 'two words' which were said, then it is easy to build systems which can recognize signals corresponding to A and those corresponding to B embedded in lots of noise. Basically you measure the likelihood ratio p(B)/p(A) using some sort of estimators that you've trained to light up with either A or B. If you gave me the data, I could do this with a number of different semi-conventional numerical techniques on a digital computer. I've seen similar things presented at conferences a few years ago---recognition of specific chaotic waveforms (specifically dolphin and whale song) embedded in lots of noise.

This is known as a "simple hypothesis test".
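A minimal sketch of such a two-hypothesis test, assuming each "word" is a known waveform and the noise is independent Gaussian with a known standard deviation. The sine-wave templates, the noise level, and all function names are hypothetical stand-ins, not anything taken from the USC system.

```python
import math
import random

def log_likelihood(signal, template, sigma):
    """Log-likelihood of `signal` given `template` plus i.i.d. Gaussian noise.
    The normalising constant is dropped because it cancels in the ratio."""
    return -sum((s - t) ** 2 for s, t in zip(signal, template)) / (2 * sigma ** 2)

def log_likelihood_ratio(signal, template_a, template_b, sigma):
    """log[p(signal | B) / p(signal | A)]; a positive value favours B."""
    return (log_likelihood(signal, template_b, sigma)
            - log_likelihood(signal, template_a, sigma))

def classify(signal, template_a, template_b, sigma):
    """Pick whichever of the two hypotheses is more likely."""
    return "B" if log_likelihood_ratio(signal, template_a, template_b, sigma) > 0 else "A"

# Hypothetical stand-ins for two "words": sine waves of different frequency.
template_a = [math.sin(2 * math.pi * 3 * i / 200) for i in range(200)]
template_b = [math.sin(2 * math.pi * 5 * i / 200) for i in range(200)]

rng = random.Random(0)
sigma = 3.0  # noise standard deviation, several times the signal amplitude
noisy_b = [t + rng.gauss(0, sigma) for t in template_b]
print("decision for a noisy copy of B:", classify(noisy_b, template_a, template_b, sigma))
```

Because there are only two alternatives, the whole decision reduces to the sign of one number, which is why this case is comparatively easy even in heavy noise.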

The more general circumstance, however, is that the alternative is not A vs B, but A vs a huge multitude of other possibilities. This task is much more difficult, and corresponds to the actual large-vocabulary speech recognition task. Now it becomes much more difficult to set a reliable threshold which will come on only when A is actually present, and not when A is absent. There is a tradeoff of false negative and false positive errors depending on your choice of threshold.
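The tradeoff can be seen with a few made-up detector scores. In this sketch the detector "comes on" when its score reaches the threshold; the score lists are invented for illustration and `error_rates` is a hypothetical helper.

```python
def error_rates(scores_present, scores_absent, threshold):
    """Return (false_negative_rate, false_positive_rate) for a detector
    that fires whenever score >= threshold."""
    fn = sum(1 for s in scores_present if s < threshold) / len(scores_present)
    fp = sum(1 for s in scores_absent if s >= threshold) / len(scores_absent)
    return fn, fp

# Made-up detector outputs: trials where A was really said, and trials where it was not.
scores_present = [0.9, 0.8, 0.75, 0.6, 0.4]
scores_absent = [0.7, 0.5, 0.3, 0.2, 0.1, 0.05, 0.35, 0.45]

for threshold in (0.2, 0.5, 0.8):
    fn, fp = error_rates(scores_present, scores_absent, threshold)
    print(f"threshold={threshold:.1f}  false negatives={fn:.2f}  false positives={fp:.2f}")
```

Raising the threshold trades false positives for false negatives; no single setting removes both unless the two score distributions do not overlap.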

There is no possible way that this thing can recognize 50,000 words. There are only 30 connections, there is fundamentally not enough information processing power intrinsically in there.

What you would do is to have all sorts of these subunits lighting up their own 'word finder lights', and the result of *those* (i.e. the p(A) detectors) would then be inputs into higher level semantic networks of perhaps a similar type. These networks or hidden markov models or whatever are the ones that know which sorts of words follow other sorts of words, and thus let you get better recognition than the individual word finders themselves.
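One way to picture that second layer is a tiny Viterbi search over the word finders' outputs. This is only a sketch: the four-word vocabulary, the detector scores, and the word-pair probabilities are all invented for illustration and are not normalised, and nothing here is claimed to be how the USC group combines its subunits.

```python
import math

def best_word_sequence(detector_scores, bigram_logprob, vocab):
    """Viterbi search over word sequences.
    detector_scores[t][w] is the log score the word finder for w gives at position t;
    bigram_logprob[(prev, w)] is the log probability that w follows prev."""
    # best[w] = (total log score, path) for the best sequence ending in w
    best = {w: (detector_scores[0][w], [w]) for w in vocab}
    for t in range(1, len(detector_scores)):
        new_best = {}
        for w in vocab:
            score, path = max(
                ((best[p][0] + bigram_logprob[(p, w)], best[p][1]) for p in vocab),
                key=lambda item: item[0],
            )
            new_best[w] = (score + detector_scores[t][w], path + [w])
        best = new_best
    return max(best.values(), key=lambda item: item[0])[1]

vocab = ["open", "the", "door", "four"]

# Made-up word-finder outputs for three positions; the last one slightly prefers "four".
raw_scores = [
    {"open": 0.8, "the": 0.1, "door": 0.05, "four": 0.05},
    {"open": 0.1, "the": 0.7, "door": 0.1, "four": 0.1},
    {"open": 0.05, "the": 0.05, "door": 0.4, "four": 0.5},
]
detector_scores = [{w: math.log(p) for w, p in frame.items()} for frame in raw_scores]

# Made-up word-order knowledge: a small default, with a few likely pairs boosted.
bigram = {(p, w): 0.05 for p in vocab for w in vocab}
bigram[("open", "the")] = 0.8
bigram[("the", "door")] = 0.6
bigram[("the", "four")] = 0.1
bigram_logprob = {pair: math.log(p) for pair, p in bigram.items()}

detectors_alone = [max(frame, key=frame.get) for frame in detector_scores]
print("word finders alone:  ", detectors_alone)
print("with word-order model:", best_word_sequence(detector_scores, bigram_logprob, vocab))
```

The individual word finder gets the last word wrong on its own, and the knowledge of which words tend to follow which is what corrects it.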

So, what is the accomplishment of this paper??

That they've apparently found an extremely efficient and well-performing low-level subunit using this time-domain information. From our own experimental observations (not on speech but on real live neurons from recently-living animals) this is very important. The fact that it is only 30 connections might mean that it is quite feasible to put 10 or 20 thousand of these subunits on a single chip, running in hardware. Given the factor of a thousand speed increase of electronics over neurons, if you could time-division multiplex different recognizers (blue sky dreaming here!) you could have a thousand times more of them again during the milliseconds to seconds of audio-frequency processing time that we speak at.
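As back-of-the-envelope arithmetic, using only the figures quoted above and assuming ideal time-sharing with no switching overhead or per-recognizer state storage (a big assumption); `effective_recognizers` is a hypothetical helper:

```python
def effective_recognizers(subunits_per_chip: int, speedup: int) -> int:
    """Recognizers available per chip if each hardware subunit is time-shared `speedup` ways."""
    return subunits_per_chip * speedup

speedup = 1000  # "factor of a thousand" speed advantage of electronics over neurons
for subunits in (10_000, 20_000):
    total = effective_recognizers(subunits, speedup)
    print(f"{subunits:,} subunits x {speedup:,} -> {total:,} time-shared recognizers")
```

That lands in the tens of millions of word finders per chip, which is the scale the blue-sky estimate is pointing at.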

If you notice, Professor Berger said that no other speaker-independent system outperformed humans, even in small test bases. Presumably that means in the small Bayesian post-hoc sorts of likelihood test regimes that I described before. And in addition, it appears that this is not a simulation but that they built it on an actual physical computer chip, another very substantial advance.

My colleagues are going to ask the authors for the actual paper. The title and press release may be overblown, but this smells like real science and a significant advance here to me.

Take home message: even small groups of good neurons can do interesting and useful things. With the right architecture, a small group of neurons can outperform conventional "neuroid networks" of hundreds or thousands of nodes linked by linear transformations of sigmoidal basis functions. We may just be beginning to crack real-AI.

We see major body functions of lower animals being regulated by say ten neurons. Real neurons are much smarter than you think. :)

If small groups of neurons can do this, it makes you appreciate what a hundred billion might be able to do.

Your tax dollars at work. by Anonymous Coward · 1999-10-01 05:33 · Score: 0

If you live in the US, please take the time to thank your Federal Reps. for allowing federally funded researchers to patent federally funded inventions.

Re:Brain implants by Anonymous Coward · 1999-10-15 06:10 · Score: 0

It has been done, as a handicap aid for neck-down paralysis. However, it's slow, and it's "just" cursor control with an on-screen keyboard. (Still, with a retinal display, it could be cool, as long as you're up for huge amounts of training.)

Because by Anonymous Coward · 1999-10-01 05:35 · Score: 0

The military silences inventors in the private sector. - Anonymous silencee

i want one by Anonymous Coward · 1999-10-01 05:46 · Score: 0

i have typing disease

Ah, but think of sauce for the gander.... by A+nonymous+Coward · 1999-10-01 05:53 · Score: 1

Certainly the spy agencies and secret police (yes, I include DEA FBI etc :-) will get these first, but it won't be long after they are available to everyone, and cheap too. (As an aside, the recent news about the Canadian cell phone company kowtowing to the FBI wiretap rules just amused me; it won't be long before computers the size of buttons will be able to be phones under Linux control, and there's bugger all the FBI can do about it.)

So now choose your favorite target: reporters planting these all over city hall, civil rights activists planting them around the local police stations, and nationwide -- ah, imagine the juicy info to come out of the national level bureaucracies, Congress, the FBI and DEA themselves...

It's the same with The Diamond Age motes floating around, David Brin's ubiquitous cameras, and so on. The big guns will have their monopoly for a very short time, and then they will have the last surprise of their very snoopy life.

Life will be really interesting. This is definitely a multi edged sword!

--
Infuriate left and right

Hmmm... by zagmar · 1999-10-01 01:13 · Score: 1

Now I just need to upload my neural patterns to a more complex version of this, and voila! Mind Children!

Enough paranoia by xmedar · 1999-10-01 01:21 · Score: 1

Ok, yes the technology can be used to snoop, but just imagine combining it with some of those new glasses that project an image into your eye: you could instantly search the Net for supporting information for the conversation. If you think that's nuts, I now have telephone conversations and use search engines when I cannot recall something. That really freaked out someone the other week when we were talking about old TV programs and I couldn't remember the name of the actor in Max Headroom, so I searched and found it was Matt Frewer while still chatting.

--
Any sufficiently advanced man is indistinguishable from God
