UK Researchers Make Neural Networks Smarter
Small Hairy Troll writes: "EDTN is running this story concerning a researcher in the UK who has come up with a method for getting those pesky Neural Nets to teach themselves to see. Called the 'Product of Experts,' the Neural Net is built using 'Experts.' If "you had one expert that preferred furry animals, whereas another expert preferred domesticated animals and another preferred small animals, their votes ... would light up dogs and cats very nicely." And an Edinburgh professor is quoted in the story as calling it "the first neural-network architecture that is both sensibly implementable and worth implementing."
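The furry/domesticated/small example can be sketched as a toy product of experts. Each expert is a probability distribution over the same set of animals, and the combined model multiplies the experts' probabilities and renormalizes, so an animal scores high only if every expert likes it. The animals, their feature assignments, and the 0.9/0.1 scores are made up for illustration. Only the combination rule is shown, not how Hinton trains the experts. An averaged mixture of the same experts is printed alongside for contrast.

```python
# Hypothetical toy data: which animals have which feature.
ANIMALS = ["dog", "cat", "horse", "lion", "mouse", "goldfish"]
FEATURES = {
    "furry": {"dog", "cat", "horse", "lion", "mouse"},
    "domesticated": {"dog", "cat", "horse", "goldfish"},
    "small": {"dog", "cat", "mouse", "goldfish"},
}


def normalize(scores):
    """Rescale non-negative scores so they sum to 1."""
    total = sum(scores.values())
    return {k: v / total for k, v in scores.items()}


def expert(liked, like=0.9, dislike=0.1):
    """One expert's distribution: high weight for animals with its feature."""
    return normalize({a: like if a in liked else dislike for a in ANIMALS})


def product_of_experts(experts):
    """Multiply the experts' probabilities animal by animal, then renormalize."""
    scores = {a: 1.0 for a in ANIMALS}
    for e in experts:
        for a in ANIMALS:
            scores[a] *= e[a]
    return normalize(scores)


def mixture_of_experts(experts):
    """Average the experts' probabilities instead (shown for contrast)."""
    return {a: sum(e[a] for e in experts) / len(experts) for a in ANIMALS}


experts = [expert(liked) for liked in FEATURES.values()]
poe = product_of_experts(experts)
mix = mixture_of_experts(experts)
for a in ANIMALS:
    print(f"{a:9s} product={poe[a]:.3f} mixture={mix[a]:.3f}")
```

With these numbers, dog and cat each get about 0.43 of the product's probability mass. The mixture never gives any animal more than about 0.23, because one enthusiastic expert is enough to lift an animal in an average.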
Marvin Minsky is a moron. In the late '60s he said that neural networks had no future and, because everyone trusted him, single-handedly set the AI field back 20 years, until people introduced the backpropagation algorithm and started researching neural nets again in spite of him. By the way, here's my implementation of a generic neural net using backpropagation (150 lines of C++).
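For anyone who hasn't seen backpropagation written out, here is a minimal sketch. It is not the poster's 150-line C++ program, which isn't reproduced here. It is a hypothetical network with one hidden layer of sigmoid units, a single sigmoid output, and squared error, plus a finite-difference gradient check, which is the usual way to confirm that backprop code computes the right derivatives. The class and function names, the learning rate, and the XOR training run are illustrative choices.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyMLP:
    """One hidden layer of sigmoid units feeding a single sigmoid output."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        # W1[j][i]: input i -> hidden unit j; b1[j]: hidden bias.
        self.W1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        # w2[j]: hidden unit j -> output; b2: output bias.
        self.w2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        self.b2 = rng.uniform(-1, 1)

    def forward(self, x):
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.W1, self.b1)]
        y = sigmoid(sum(w * hj for w, hj in zip(self.w2, h)) + self.b2)
        return h, y

    def loss(self, x, t):
        """Squared error 0.5 * (y - t)**2 on one example."""
        return 0.5 * (self.forward(x)[1] - t) ** 2

    def backprop(self, x, t):
        """Gradients of loss(x, t) with respect to W1, b1, w2, b2."""
        h, y = self.forward(x)
        d_out = (y - t) * y * (1 - y)             # through the output sigmoid
        dw2 = [d_out * hj for hj in h]
        # Send the error back through w2 and each hidden sigmoid.
        d_hid = [d_out * w * hj * (1 - hj) for w, hj in zip(self.w2, h)]
        dW1 = [[dj * xi for xi in x] for dj in d_hid]
        return dW1, d_hid, dw2, d_out

    def train_step(self, x, t, lr=0.5):
        """One plain gradient-descent update on a single example."""
        dW1, db1, dw2, db2 = self.backprop(x, t)
        for j, row in enumerate(self.W1):
            for i in range(len(row)):
                row[i] -= lr * dW1[j][i]
            self.b1[j] -= lr * db1[j]
            self.w2[j] -= lr * dw2[j]
        self.b2 -= lr * db2


def numeric_grad(net, x, t, params, idx, eps=1e-6):
    """Central-difference estimate of d loss / d params[idx], where params is
    one of the net's weight lists (e.g. net.w2 or net.W1[j])."""
    old = params[idx]
    params[idx] = old + eps
    up = net.loss(x, t)
    params[idx] = old - eps
    down = net.loss(x, t)
    params[idx] = old
    return (up - down) / (2 * eps)


XOR = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]
net = TinyMLP(n_in=2, n_hidden=4, seed=1)
before = sum(net.loss(x, t) for x, t in XOR)
for _ in range(2000):
    for x, t in XOR:
        net.train_step(x, t)
after = sum(net.loss(x, t) for x, t in XOR)
print(f"XOR loss before {before:.4f}, after {after:.4f}")
```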
Actually, if anyone is REALLY interested in AI and isn't just another 15-year-old Matrix fan, they should stay away from those books and get hold of something from Kyoto labs or Clocksin & Mellish or something.
> Question: Seeing the words 'biologically valid' conjures up an image of scientists pursuing pure science rather than concentrating on the applications of it. Is the goal of NN today more theoretical (we want to get something to behave more like a smart being) than practical (we want something that will specifically put names to faces/discriminate balloons from weapons/identify handwriting like an expert).
Both.
Cognitive scientists are using NN technology as a 'biologically valid' model for cognition. (Though only a fool would remain unaware of the enormous gap between our NN toys and the real thing, and of the enormous simplification that goes into our toys.)
Others just look at NN as a technology to be exploited without reference to biology.
> I suspect that this field has narrowed in the last decade (but I may be wrong), and so I fear that it may be getting wayyy esoteric.
Wayyyyy. Like any other branch of science, especially CS, this field is rapidly "narrowing" in the sense of getting deeper, but also "broadening" in the sense of developing more branches and more connections to other fields. (E.g., lots of parallels have been shown between NN and physics, and between NN and statistics.)
> As a practical engineer who needs solutions today, should I devote more energies to this or less? What is happening elsewhere in the field?
It's no longer possible even for NN researchers to stay on top of everything that's going on in the field, so don't even think about investing that much time in it.
Beyond that, what's your field of application in engineering? Do your journals ever cover relevant NN technology? If not, you might be able to start a SIG, so that the effort of keeping an ear to the ground and filtering out the uninteresting material could be spread among the members, rather than going it solo.
--
Sheesh, evil *and* a jerk. -- Jade
Wow... when I worked in AI in college my thesis was "Heuristic Reasoning for Stress-Strain Finite Element Generation" (hey, if I don't toot my own horn...), and when I briefed my findings to one of my proctors he started grumbling, and the other teacher said, "Marvin, what's wrong? He's describing fuzzy logic." To which he replied, "Fuzzy logic I understand; it's fuzzy explanations I don't get!"
... "we have reduced the learning time by coming up with a new conceptual way of classifying data" ... "the new way is more 'biologically valid'"...
I guess it's hard to explain the field of AI to outsiders, because even though I (used to) understand neural networks, I feel that I need some sort of a diagram to 'get it' here. But what I think I heard is that "learning is the big problem in NN, or maybe a better word would be teaching."
Okay, I do have a question for the experts out there. Help would be greatly appreciated 'cuz we may be able some day to apply this to National Missile Defence to help discriminate balloons from nukes, currently a big problem.
Question: Seeing the words 'biologically valid' conjures up an image of scientists pursuing pure science rather than concentrating on the applications of it. Is the goal of NN today more theoretical (we want to get something to behave more like a smart being) than practical (we want something that will specifically put names to faces/discriminate balloons from weapons/identify handwriting like an expert).
I suspect that this field has narrowed in the last decade (but I may be wrong), and so I fear that it may be getting wayyy esoteric. As a practical engineer who needs solutions today, should I devote more energies to this or less? What is happening elsewhere in the field?
SDMI: Finally! Music that won't rip or burn! Brought to you by the fine folks at RIAA.
This doesn't seem like much of a breakthrough. The article mentions backpropagation networks as an effective (but, as I understand it, slow) kind of NN, and says that this technique is an improvement on back-props. I do know that in the last year, Quick-props and Fast-props have both come out, which are 'major' improvements on the back-props algorithm.
My question for the specialist is, how is the PoE model any different/better/worse than these other improvements? It is my understanding that Quick-props is very good for practical image recognition problems.
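Quickprop is usually credited to Scott Fahlman (1988). It replaces backprop's fixed-size step with a jump to the minimum of a parabola fitted through the current and previous slopes of the error, treating each weight independently. The sketch below is a simplified single-weight version on a made-up one-dimensional error function. The learning rate for the first step, the 1.75 maximum growth factor (the value commonly cited for Quickprop), and the stopping tolerance are assumptions. Fahlman's full algorithm has further details not shown here.

```python
def quickprop_1d(grad, w0, lr=0.1, mu=1.75, max_steps=20, tol=1e-12):
    """Simplified single-weight Quickprop.

    grad(w) returns dE/dw. After one plain gradient step, each step fits a
    parabola through the current and previous slopes and jumps toward its
    minimum, capped at mu times the size of the previous step.
    """
    w = w0
    s_prev = grad(w)
    dw = -lr * s_prev                      # first step: plain gradient descent
    w += dw
    history = [w0, w]
    for _ in range(max_steps):
        s = grad(w)
        if abs(s) < tol or s == s_prev:    # at a minimum, or parabola undefined
            break
        step = s / (s_prev - s) * dw       # jump to the fitted parabola's minimum
        limit = mu * abs(dw)
        step = max(-limit, min(limit, step))  # maximum growth factor
        s_prev, dw = s, step
        w += dw
        history.append(w)
    return w, history


# Error E(w) = (w - 3)**2, so dE/dw = 2 * (w - 3).
w, path = quickprop_1d(lambda w: 2 * (w - 3), w0=0.0)
print([round(p, 4) for p in path])
```

Because the error function here really is a parabola, the fitted parabola is exact. After one plain gradient step and one jump shortened by the growth cap, the third step lands on the minimum w = 3, whereas plain gradient descent with the same learning rate is still well short of it after three steps.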
Keeping
You may be right when you have no idea about the relationships in the data, and just want the NN to "work". But a big correlation is not my idea of "smart".
When I fit a curve to some data, it's because I have a certain fundamental understanding about that data and how it should behave. Only an idiot would try to fit a parabola to an Arrhenius curve.
AFAIK, no NN gives this insight. It is therefore "dumb" in both senses of the word.
I work with statisticians and modellers, and they have an extremely low opinion of neural nets.
As described to me, neural nets are _huge_ (every datapoint is in) underconstrained matrices with an infinite number of equally valid solutions. "Training" [programming] them is an exercise in finding a strategy for the "best" solution.
Practically, when NNs are well done they will give you back the data you fed into them! When exceptionally good, they will give reasonable interpolations on the data. But forget about correct extrapolations.
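The parabola-versus-Arrhenius point can be made concrete with a hypothetical sketch. The data are invented: exact samples of y = e^x on [0, 1], standing in for an exponential law such as Arrhenius's. The code fits both a least-squares parabola and a model with the right functional form (a straight line through ln y). Both interpolate well inside the data, but only the model with the right form extrapolates. No neural net is involved; it only illustrates how much extrapolation depends on the model's assumptions.

```python
import math


def solve3(A, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, 3):
            f = M[r][col] / M[col][col]
            for c in range(col, 4):
                M[r][c] -= f * M[col][c]
    x = [0.0] * 3
    for r in (2, 1, 0):  # back-substitution
        x[r] = (M[r][3] - sum(M[r][c] * x[c] for c in range(r + 1, 3))) / M[r][r]
    return x


def fit_parabola(xs, ys):
    """Least-squares fit y ~ a + b*x + c*x^2 via the normal equations."""
    basis = [[1.0, x, x * x] for x in xs]
    A = [[sum(r[i] * r[j] for r in basis) for j in range(3)] for i in range(3)]
    b = [sum(r[i] * y for r, y in zip(basis, ys)) for i in range(3)]
    a0, a1, a2 = solve3(A, b)
    return lambda x: a0 + a1 * x + a2 * x * x


def fit_exponential(xs, ys):
    """Fit y = A * exp(k * x) by a straight-line least-squares fit to ln(y)."""
    n = len(xs)
    ls = [math.log(y) for y in ys]
    mx, ml = sum(xs) / n, sum(ls) / n
    k = sum((x - mx) * (l - ml) for x, l in zip(xs, ls)) / sum((x - mx) ** 2 for x in xs)
    ln_a = ml - k * mx
    return lambda x: math.exp(ln_a + k * x)


xs = [i / 10 for i in range(11)]           # observed range: 0.0 .. 1.0
ys = [math.exp(x) for x in xs]             # invented "measurements"
parabola, expo = fit_parabola(xs, ys), fit_exponential(xs, ys)
for x in (0.55, 3.0):                      # one interpolation, one extrapolation point
    print(f"x={x}: true={math.exp(x):.3f} parabola={parabola(x):.3f} exp-model={expo(x):.3f}")
```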
if(animal is furry && animal is small && animal is domesticated) it's probably a cat or a dog.
Problem is you have to hardcode in or read in what is furry and what is small and what is domesticated, whereas the nn figures out what these features are and adapts to changes in the input automatically. Otherwise you have some human extracting new features and hardcoding them somewhere.
This appears to be similar to the technique used in Hopfield's Mus Silicium neural net speech recognition contest. The solution ended up being that recognition occurs when a large number of neurons connected to the same output neuron 'synchronize' and fire at about the same time. The big difference between these approaches seems to be that Hopfield is using spiking neurons, while these guys are using some form of backpropagation to train smaller networks that have to agree on what some data set represents in order to return a positive result.
Well, this is partially true. At least the MLP neural network is just a non-linear function approximator that can, in theory, learn any mapping from the inputs to the outputs. The network is trained using points from the input space together with the desired output. Statisticians would probably call MLP non-linear regression. I also know some statistics profs who have a rather high regard for neural networks. The name has a lot of hype, but the methodology works. About correct extrapolations: I would like to see anything correctly extrapolate in the general case given only a few random observed points and the desired output. The performance of these methods depends on how well the assumptions of the method correspond to the way the observed data really behaves, and you cannot tell that with certainty from a few random points.
The training is usually done using two separate sets of data: a training set and a test set. The training set is used to train the network and the test set is used to test the performance. When the performance of the network on the test set starts to degrade instead of improving, the network is starting to overfit and lose its ability to generalize.
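The stopping rule described above is commonly called early stopping. In current terminology, the held-out set used to decide when to stop is usually called a validation set, and a separate test set is kept back for the final performance estimate. Below is a minimal sketch, assuming the held-out loss has already been recorded once per epoch. `early_stopping_epoch` and the patience of 3 epochs are illustrative choices, and the loss curve is invented.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return the index of the epoch whose weights should be kept.

    Training stops once the held-out loss has failed to improve for
    `patience` consecutive epochs; the best epoch seen so far is kept.
    """
    best_epoch, best_loss, waited = 0, float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss, waited = epoch, loss, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_epoch


# Invented curve: held-out loss falls, then rises as the net starts to overfit.
curve = [0.90, 0.60, 0.45, 0.40, 0.38, 0.39, 0.41, 0.44, 0.50, 0.35]
print(early_stopping_epoch(curve))
```

Here the function keeps epoch 4 and never sees the late dip to 0.35 at epoch 9. A larger patience (10) would find it, at the cost of more training.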
Basically, you can think of the MLP neural network as a classifier (when your goal is to classify, which is usually the case with neural networks) that draws arbitrary boundaries in n-dimensional space. Somehow I always think of those blobby objects in computer graphics when I think of the classification boundaries; this is of course not strictly correct, but I guess it still helps.
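To make the "arbitrary boundaries" picture concrete, here is a hand-wired two-layer network of hard-threshold units. There is no training; the weights are picked by hand purely for illustration. Each hidden unit draws one straight line, and the output unit fires only when all of them agree, so the network's decision region is a square. More hidden units give more complex regions, and a trained MLP with smooth sigmoid units learns such boundaries from data instead.

```python
def step(z):
    """Hard-threshold unit: 1 if z >= 0 else 0."""
    return 1 if z >= 0 else 0


# Hand-picked (hypothetical) hidden weights (wx, wy, bias): each unit fires
# when wx*x + wy*y + bias >= 0, i.e. on one side of a line.
HIDDEN = [
    (1.0, 0.0, 0.0),    # x >= 0
    (-1.0, 0.0, 1.0),   # x <= 1
    (0.0, 1.0, 0.0),    # y >= 0
    (0.0, -1.0, 1.0),   # y <= 1
]


def classify(x, y):
    h = [step(wx * x + wy * y + b) for wx, wy, b in HIDDEN]
    # The output unit fires only if all four hidden units fire (an AND of
    # half-planes), so the decision region is the unit square.
    return step(sum(h) - 3.5)


print(classify(0.5, 0.5), classify(1.5, 0.5), classify(-0.2, 0.9))
```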
it's a reasonable interpolation and speaking of your own comment...
>You could've come off as someone who was
>interested and wanted to know more. But no. You
>had to make a snide ass remark. Good one, bucko,
>we now all know what a dumbass you really are.
'nough said...
Hi,
;)
I'm an AI researcher, and I'll tell you that you're patently wrong.
One thing first: Hinton (the inventor of PoE) was one of the people re-popularizing NNs way back in the early '80s while at U of T.
Now, others have tried combining experts before. But Hinton's approach beats them empirically. The reason is that in PoE the experts are trained together, rather than being trained separately and only considered jointly for evaluation, as in previous approaches...
Another thing: the NNs here are _very_ different from backprop NNs. The entire topology is different. Backprop neural networks are a special case of Bayesian nets, but PoE is based upon random fields. While there is current research being done on training random fields from Bayes nets, there are in fact things you can represent with a random field that can't be represented with a Bayes net (some of Minsky and Papert's proofs for perceptrons can in fact be extended to prove limitations on backprop nets and Bayes nets). The converse is also true: there are distributions that can be represented in Bayes nets and not in random fields. (But of course, both classes of distributions can be represented as factor graphs...)
I remember seeing a video of Apple working on this about 5 years ago. Part of the original 'vision' behind the Newton was based on this. Needless to say it hasn't worked yet...
.sig
I know it's "cool" to talk smack about famous people, but it's a bit of an exaggeration to call someone a moron so readily.
BTW I know it's also "cool" to say C++ but your code is simply "dirty 'C'".
bau
Choose your weapon:
:)
object
interface
property
methods
wrapper
function
Hmmm. It's a tough one - they all sound bad!
We clearly disagree. I take your vitriol as a sign you feel insecure in your position.
You should understand my application: modelling multicomponent sequential chemical reactions and predicting the yields. NNs do very poorly at this. The differential equations work great because we do have some fundamental understanding of the underlying elementary processes.
With this, we can extrapolate with surprising success, and interpolate too, both with surprisingly few parameters (a few dozen for 150 components). I was glad to see your admission that NNs cannot extrapolate. Extrapolation is very important to us, and we do it well! Dumber number-crunching methods may well be incapable of it, so we should avoid them.
Oooh! =) Could it be the driving force behind the next generation scour.net?? Old sk00l pr0n search engines would be obsolete!
Yeah I know, if you just use arrays it can be much shorter, most of it is test code anyway. I wasn't bragging or anything. There are 3 includes in the program, which one was missing?
This stuff has been done before under the headings "Ensembles" and "Combining Multiple Models", and in various other forms of communities of experts. This is also sometimes referred to as "Bagging" and "Boosting". I don't have my references to hand, but if someone wants them I'll dig 'em out. I can reference papers going back to the early '90s if you'd like :) Check out EWSL-91, I think.
I might be missing some technical details that they've developed, and it would be extremely unfair to denigrate this research without reading the papers, but it does nae sound revolutionary to me. Given the person being quoted (EE), it doesn't sound like they come from the Machine Learning community at all; otherwise they might know this literature!
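For readers unfamiliar with the terms: bagging (Breiman's "bootstrap aggregating") trains each model on a resample of the training data drawn with replacement and combines them by voting. Boosting instead reweights the examples that earlier models got wrong. Below is a minimal sketch of bagging only, assuming one-dimensional data and simple threshold rules ("stumps") as the weak models. The data, the choice of 25 models, and the helper names are all hypothetical.

```python
import random


def fit_stump(sample):
    """Fit the rule 'predict 1 if x >= t' to (x, label) pairs, choosing t
    among the sample's x values to minimize training errors."""
    best_t, best_err = None, None
    for t in sorted({x for x, _ in sample}):
        err = sum((x >= t) != bool(label) for x, label in sample)
        if best_err is None or err < best_err:
            best_t, best_err = t, err
    return best_t


def bagging(data, n_models=25, seed=0):
    """Fit each stump on a bootstrap resample (drawn with replacement)."""
    rng = random.Random(seed)
    return [fit_stump([rng.choice(data) for _ in data]) for _ in range(n_models)]


def predict(stumps, x):
    """Majority vote of the stumps."""
    votes = sum(x >= t for t in stumps)
    return 1 if 2 * votes > len(stumps) else 0


data = [(x, int(x >= 10)) for x in range(20)]   # invented, separable at x = 10
stumps = bagging(data)
print([predict(stumps, x) for x in (0, 5, 15, 19)])
```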
Winton
p.s. It's 3 am, I've been playing Myth 2 for a couple of hours, and had a couple of beers, so I can't deal with the hassle of looking this stuff up with URLs etc...
How's my post not in English? ...?
If you feel the need to answer with an explanation then I suppose that somehow you understood my broken (?) English.
Anyhow, thanks for your opinion, Mr.
bau
I said it's C++ because it won't compile as C. You are right, it's not object-oriented, though. Why is it "cool" to say C++? I hate C++ and I like C better; it's just that I was using Visual Studio for this, and it will compile anything that resembles C or C++, probably even my grandma's cake recipe. Oh, and I'm not saying this code is the pinnacle of AI programming; it's just a quick-and-dirty implementation I used to try something out.
It's cool to say C++ because you don't sound like a bigot (in the eyes of those newbies 8). But from your answer it doesn't seem like you were trying to be cool.
Cool !
My -GOD- man! That's one of the most disturbing things I've yet seen. You slimy bastard.
I don't think that was the point that "junkmaster" was trying to make. But there is a certain advantage to using randomly generated sets.
The mathematics of sd shows that by increasing the strength of the weak models that are "combined", one requires fewer of these weak models to get the same quality of recognition. In fact, there is even an equation that puts limits on this (which I believe is based on Chebyshev's inequality, but I don't remember exactly), and the most commonly used implementation of sd does allow you to set a threshold for the quality of the weak models you want chosen from those that are randomly generated.
However, remember that the concept of uniformity is the linchpin of sd. All the models that you choose to "combine" must be uniform, as defined by the theory, with respect to each other and to the problem space. Randomly chosen models make this easier to accomplish because, being random, they already give closer-to-uniform coverage of the feature space. They just need to be tuned to get true uniformity.
There is nothing theoretically wrong with using stronger weak models to get to your solution faster. But by putting "intelligence" into the process of making the weak models, it would be much, much harder to ensure that they were mathematically uniform. I said it in my original post, but I should stress that the uniformity concept is the reason that sd beats out all the other methods for "combining" weaker models in standardized benchmarks. Its importance cannot be overstated.
From your description I would think the method would be even more valuable if it had a better way than testing randomly generated models to find an optimal set. Hinton has proposed methods to dependently train his experts, which to me seems a very desirable property. Using randomly generated models is actually a strength of SD, the "proof which this margin is too small to contain." <grin>
Does anyone smell a zz.. Or, more like, a Terminator? On a serious note, this will have tremendous impact in the target acquisition sector; I'm sure people at Raytheon and Lockheed are drooling right now (or rather, they're probably finishing up timing closure on new multimillion-gate ASICs that implement just this type of stuff).
Love - The Sole Religion
If anyone is really interested in getting into AI I highly suggest reading anything from singinst.org, or The Society of Mind by Marvin Minsky, and also The Age of Spiritual Machines by Ray Kurzweil :)
------
http://vinnland.2y.net/
"I would say that 99 per cent of what my father has written about his own life is false." - L. Ron Hubbard Jr.
It's not nearly as simple as a logical AND.
For one thing, his PoE model is designed to make each individual net (I think) more capable of responding to a set of features. I don't know if it would intentionally segregate into the sets animal, small, and furry, but it's supposed to be much simpler to train than the standard supervised network.
All it needs to do is get good at sorting images and simplifying the input; a second stage of recognition is then applied to the, theoretically, simpler set of information.
The example you're using is incomplete; his PoE would detect the features small, animal, and furry, whereas the traditional model would detect the features cat-like and dog-like, without the sharing of information or neurons that PoE enables. The second stage of his PoE, the recognition center, would use the sum-product of each of the simpler feature detectors and then decide whether the input was cat-like or dog-like.
Geek dating!
GPL Deconstructed
> The article was rather light on details, but this doesn't look like much of a breakthrough.
All the more so, since the notion of combining NN experts is already quite old. Haykin mentions it in the 1994 edition of his textbook.
Notice that that's 10% of the way back to the invention of electronic computers, and about 43% of the way back to when the backpropagation algorithm rescued neural networks from obscurity.
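The percentages work out if the thread is assumed to date from around 2000 (an assumption; no date appears in the thread itself). Six years back from 2000 is 10% of the way to 1940, around the time of the first electronic computers. It is also roughly 43% of the way to 1986, when Rumelhart, Hinton and Williams published their backpropagation paper.

```latex
\frac{2000 - 1994}{0.10} = 60 \;\Rightarrow\; 2000 - 60 = 1940,
\qquad
\frac{2000 - 1994}{0.43} \approx 14 \;\Rightarrow\; 2000 - 14 = 1986
```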
And what about the is operator? The article implied that it was somehow better fit to this task than the identification used in other systems.
Even Slashdot wants to hide some things
Isn't it time men stopped thinking of women as devices?
--
Fuck the system? Nah, you might catch something.
He has been working on various algorithms relating to neural nets, including the wake-sleep algorithm. Read the book "Unsupervised Learning: Foundations of Neural Computation", which he co-edited, for some insights and relevant research papers on the topic.
-Shieldwolf
just = (My)Opinion.toCents();
Neural network supervised learning experts blah blah... just give me a robot that can vacuum dammit! And not suck up my headphone cable in the process, that's the tricky part...
One notable exception is the research on Stochastic Discrimination (sd). This technique was originally developed through mathematics rather than experimentation, as is the case with NNs. In other words, rather than the "let's see what happens if" development of NNs, sd's approach is "the equations say this should happen". Because of this, it is very rigorously defined and the hows and whys are clearly understood.
Sd also "combines" weak models but in a way that, to the best of my knowledge, no one else has done before. The basics are:
- Incredibly weak models are generated to solve the given problem.
- Hundreds of thousands of these models are combined.
- These weak models must be uniform with respect to each other and to the problem space.
For example, rather than combining a few very specific models as described in this article (one for furry animals, one for domestic, etc.), sd would randomly generate hundreds of thousands of weak models to solve the problem. Each of these models would look at a different set of features, but there would be so many combinations that you wouldn't be able to name them. For example, maybe one model would learn to distinguish based on the length of the tail, the color of the snout, and the diameter of the third toenail. This model obviously can't be named; the set of features it looks at is too odd. But if we note that it has some trivially weak ability to tell the difference between a dog and a cat, then we accept it. For the problem of dogs vs. cats, we may only require that any given model be 50.1% accurate on our training set. When we "combine" all these weak models, a strong solution emerges. Why this happens has its roots in the Central Limit Theorem.
But before we "combine" them, we have to see that this weak model is uniform with respect to the other weak models. This is a term defined in the sd theory. Basically, what it means is that the weak models need to be evenly selected throughout the set of all possible weak models. In other words, there is no oversampling or bias. (Actually, this isn't quite right but goes in the right direction. Read one of the papers if you're interested.) The concept of uniformity is probably the most interesting part of sd, and it is the primary concept that all the other "combination" techniques miss. In this article, for example, how do we know that there isn't a connection between being furry and being small? If there is a statistical dependency, then the vote won't be fair and the results will be weaker.
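The "50.1% accurate" intuition can be checked with the textbook majority-vote calculation (essentially the Condorcet jury theorem). This is not the sd algorithm, which the post doesn't spell out. It simply assumes the weak classifiers' errors are independent, which plays roughly the role that "uniformity" plays in the explanation above. `majority_vote_accuracy` is a hypothetical helper.

```python
from math import comb


def majority_vote_accuracy(p, n):
    """Probability that a majority of n independent classifiers, each correct
    with probability p, is correct (n must be odd, so there are no ties)."""
    if n % 2 == 0:
        raise ValueError("use an odd number of voters to avoid ties")
    # Sum the binomial probabilities of more than n/2 correct votes.
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n // 2 + 1, n + 1))


for n in (1, 11, 101, 1001):
    print(n, round(majority_vote_accuracy(0.6, n), 4))
```

With p = 0.6, accuracy rises from 0.6 for a single voter to about 0.98 for 101 voters. If the voters' errors were correlated (say, because "furry" and "small" tend to go together), the binomial model would no longer apply and the gain would be smaller.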
Anyway, that's a real crash course in sd basics. So how does this algorithm perform? On the standard benchmarks (Irvine, for example) it handily outperforms anything out there. Right out of the box and without tuning. For more information see the web site or send me email.
This is the most ridiculous post I've ever read. As many replies have already explained, a Multi-Layer Perceptron (one kind of NN) is a universal approximator; that is, it can be used to model a mapping from one set to another, learned from data (a set of input-output associations).
There are other ways to obtain universal approximation (which means that basically any regular mapping can be approximately represented). NO method can be used to give correct EXTRApolations. This is not possible. Period.
Now, regarding INTERpolation of the data, Barron demonstrated in 1993 (see "Universal approximation bounds for superpositions of a sigmoidal function," IEEE Transactions on Information Theory, volume 39, number 3, pages 930-945) that, for a broad class of smooth functions, MLPs are more efficient than methods built from a fixed set of basis functions. Their approximation error falls as 1/n in the number n of hidden units, whatever the input dimension, whereas the worst-case error of any fixed set of n basis functions can fall no faster than (1/n)^(2/d) for d inputs. This means that, in high dimensions, they need FEWER parameters than fixed-basis interpolation methods (such as splines, kernel regression, etc.).
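For reference, the main result of that Barron paper is usually stated along the following lines (see the paper for the exact conditions). Here $\hat f$ is the Fourier transform of the target $f$, $B_r$ is the ball of radius $r$ in $\mathbb{R}^d$, $\mu$ is any probability measure on it, and $\phi$ is a sigmoidal function:

```latex
C_f = \int_{\mathbb{R}^d} |\omega|\,|\hat{f}(\omega)|\,d\omega < \infty
\;\Longrightarrow\;
\exists\, f_n(x) = c_0 + \sum_{k=1}^{n} c_k\,\phi(a_k \cdot x + b_k):\quad
\int_{B_r} \bigl(f(x) - f_n(x)\bigr)^2 \,\mu(dx) \le \frac{(2 r C_f)^2}{n}
```

The $1/n$ rate on the right does not depend on $d$, which is what the comparison with fixed-basis methods turns on.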
So, the post I'm answering to is bullshit. I strongly advise posters to read the NN FAQ before posting ridiculous claims.
yeah but what about my hairy baby alligator....., that falls into your category.......
:(
hang on... crap, you said probably
-
As cunning as a fox, which has just been appointed professor of cunning at Oxford University. http://www.kinlan.co
Hmm. Well, maybe I'm missing something, but it seems that the technique merely abstracts the training away from the functional network by one remove. The "Experts" still need training; the system does not appear to "teach" itself anything, but instead relies on the pooled opinion of already-trained Experts.
It still seems to be missing the training bootstrap: how do we train ourselves in a system in which we are untrained?
Don't get me wrong; kudos to the researchers. But no brownie points at all to the journalists, slashdot or others, who appear to mistake an adept implementation of a pattern recognition system for something that it is not.
I remember learning in Psych 207 (Cognition and Memory) that cats have been shown to have such "experts" in their brains. In particular, they have one for detection of horizontal lines, enabling them to get a good understanding of where a ledge is. You can stunt the growth of these experts by removing that type of stimulus at an early age. Placing kittens in a round room with vertical bars on the wall and allowing them to grow up there will effectively remove their ability to jump up onto a ledge.
-no broken link
(1) Someone please moderate the previous reply up to interesting/useful.
(2) I quickly reviewed his initial paper, and then realised, oh, that Hinton :) ! However, in the PoE paper there are absolutely no references to the work done in the Multi-Agent & Machine Learning communities.
(3) That being said, his work is definitely of interest, although I doubt that it is the huge breakthrough in AI that is being suggested. If it is similar to the work done in the ML community, then basically what you get is a nice way to integrate different points of view on the same situation, which helps overcome the tendency of ML algorithms to suffer from local minima and to be sensitive to the actual distributions of the data.
(4) I couldn't find any comparisons to work outside of the NN field.
Winton
The concept of semi-intelligent agents acting together to perform what appear to be intelligent tasks is the main theme running through Minsky's book The Society of Mind, a great read, highly recommended.
To Minsky, our intelligence is the product of millions of agents that autonomously perform various tasks and send messages to one another. What I found interesting about this article was that the ideas of classical AI (Minsky et al.) are morphing with "modern" AI. This is cool because when I studied artificial intelligence in college, I got the impression that there was a holy war between the two AI camps. It's nice to see the convergence.
-mike
Speaking of Fuzzy Logic...
Pax Digitalia
This seems to be an interesting concept. If a neural network could be taught one's preferences, one's personality even, wouldn't it make an excellent agent? A little bot that could go do a lot of menial shit we loathe doing. The idea has been proposed, but would these be quick enough for the job? And if they were, would it be overkill?
if(animal is furry && animal is small && animal is domesticated) it's probably a cat or a dog.
Congratulations, you've reinvented the AND gate, which those of us outside the neural network community have been using for a long time.
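A single threshold unit (a perceptron) computes exactly this AND, and the classic perceptron learning rule finds suitable weights from examples instead of having them written in by hand. In this hypothetical sketch, the three inputs are 0/1 encodings of furry, small and domesticated. The features themselves are still supplied by hand, so the sketch sidesteps the feature-discovery problem raised elsewhere in the thread.

```python
from itertools import product


def predict(w, b, x):
    """Threshold unit: fire if the weighted sum plus bias is positive."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0


def train_perceptron(samples, epochs=1000, lr=1.0):
    """Rosenblatt's rule: nudge the weights toward each misclassified example."""
    w, b = [0.0] * len(samples[0][0]), 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, target in samples:
            err = target - predict(w, b, x)         # 0, +1 or -1
            if err:
                w = [wi + lr * err * xi for wi, xi in zip(w, x)]
                b += lr * err
                mistakes += 1
        if mistakes == 0:                           # every example classified correctly
            break
    return w, b


# Inputs are (furry, small, domesticated) as 0/1; the target is their AND.
samples = [(x, int(all(x))) for x in product((0, 1), repeat=3)]
w, b = train_perceptron(samples)
print(w, b)
```

Because AND is linearly separable, the perceptron convergence theorem guarantees the loop stops making mistakes after finitely many updates. XOR, by contrast, has no single-unit solution, which is one of the limitations Minsky and Papert analysed.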
Today's sig brought to you by http://www.swankypimp.com