A.I. Advances Through Deep Learning
An anonymous reader sends this excerpt from the NY Times:
"Advances in an artificial intelligence technology that can recognize patterns offer the possibility of machines that perform human activities like seeing, listening and thinking. ... But what is new in recent months is the growing speed and accuracy of deep-learning programs, often called artificial neural networks or just 'neural nets' for their resemblance to the neural connections in the brain. 'There has been a number of stunning new results with deep-learning methods,' said Yann LeCun, a computer scientist at New York University who did pioneering research in handwriting recognition at Bell Laboratories. 'The kind of jump we are seeing in the accuracy of these systems is very rare indeed.' Artificial intelligence researchers are acutely aware of the dangers of being overly optimistic. ... But recent achievements have impressed a wide spectrum of computer experts. In October, for example, a team of graduate students studying with the University of Toronto computer scientist Geoffrey E. Hinton won the top prize in a contest sponsored by Merck to design software to help find molecules that might lead to new drugs. From a data set describing the chemical structure of 15 different molecules, they used deep-learning software to determine which molecule was most likely to be an effective drug agent."
A lot of vague marketing-speak in this article. "Deep learning"? The article basically talks about neural networks, just one of the techniques in machine learning.
It's hard to tell from the article, but they probably are trying to refer to Deep Belief Networks, which are a more recent and advanced type of Neural Network, which incorporates many layers:
Deep belief nets are probabilistic generative models that are composed of multiple layers of stochastic, latent variables. The latent variables typically have binary values and are often called hidden units or feature detectors. The top two layers have undirected, symmetric connections between them and form an associative memory. The lower layers receive top-down, directed connections from the layer above. The states of the units in the lowest layer represent a data vector.
The ``new'' (e.g. last decade or so) advances are in training hidden layers of neural networks. Kinda like peeling an onion, each layer getting progressively coarser representation of the problem. e.g. if you have 1000000 inputs, and after a few layers, only have 100 hidden nodes, those 100 nodes are in essence representing all the ``important'' (some benchmark you choose) information of those 1000000 inputs.
"If anything can go wrong, it will." - Murphy
I wonder how much of these improvements in accuracy are due to fundamental advances, vs. the capacity of available hardware to implement larger models and (especially?) the availability of vastly larger and better training sets...
I'm sure all of that helped, but the key ingredient is training mechanisms. Traditionally networks with multiple layers did not train very well, because the standard training mechanism "backpropagates" an error estimate, and it gets very diffuse as at goes backwards. So most of the training happened in the last layer or two.
This changed in 2006 with Hinton's invention of the Restricted Boltzman Machine, and someone else's insight that you can train one layer at a time using auto-associative methods.
"Deep Learning" / "Deep Architectures" has been around since then, so this article doesn't seem like much news. (However, it may be that someone is just now getting the kind of results that they've been expecting for years. Haven't read up on it very much.)
These methods may be giving ANN a third lease on life. Minsky & Papiert almost killed them off with their book on perceptrons in 1969[*], then Support Vector Machines nearly killed them again in the 1990s.
They keep coming back from the grave, presumably because of their phenomenal computational power and function-approximation capabilities.[**]
[*] FWIW, M&P's book shouldn't have done anything, since it was already known that networks of perceptrons don't have the limitations of a single perceptron.
[**] Siegelmann and Sontag put out a couple of papers, in the 1990s I think, showing that (a) you can construct a Turing Machine with an ANN that uses rational numbers for the weights, and (b) using real numbers (real, not floating-point) would give a trans-Turing capability.
Sheesh, evil *and* a jerk. -- Jade
In the past few years, a few things happened almost simultaneously:
1. New algorithms were invented for training of what previously was considered nearly impossible to train (biologically inspired recurrent neural networks, large, multilayer networks with tons of parameters, sigmoid belief networks, very large stacked restricted Boltzmann machines, etc).
2. Unlike before, there's now a resurgence of _probabilistic_ neural nets and unsupervised, energy-based models. This means you can have a very large multilayer net (not unlike e.g. visual cortex) figure out the features it needs to use _all on its own_, and then apply discriminative learning on top of those features. This is how Google recognized cats in Youtube videos.
3. Scientists have learned new ways to apply GPUs and large clusters of conventional computers. By "large" here I mean tens of thousands of cores, and week-long training cycles (during which some of the machines will die, without killing the training procedure).
4. These new methods do not require as much data as the old, and have far greater expressive power. Unsurprisingly, they are also, as a rule, far more complex and computationally intensive, especially during training.
As a result of this, HUGE gains were made in such "difficult" areas as object recognition in images, speech recognition, handwritten text (not just digits!) recognition, and in many more. And so far, there's no slowdown in sight. Some of these advances were made in the last month or two, BTW, so we're speaking about very recent events.
That said, a lot of challenges remain. Even today's large nets don't have the expressive power of even a small fraction of the brain, and moreover, the training at "brain" scale would be prohibitively expensive, and it's not even clear if it would work in the end. That said, neural nets (and DBNs) are again an area of very active research right now, with some brilliant minds trying to find answers to the fundamental questions.
If this momentum is maintained, and challenges are overcome, we could see machines getting A LOT smarter than they are today, surpassing human accuracy on a lot more of the tasks. They already do handwritten digit recognition and facial recognition better than humans.
The number of bits needed to represent an arbitrary real number exactly is infinite, but not uncountable.
The Tao of math: The numbers you can count are not the real numbers.
This is a very misleading metric. First, some not-insignificant number of the neurons in the brain are involved in non-cognitive computations. Muscle control, hormone regulation, kinesthesia, vision (not thinking about what is seen, but simply recognizing it), heart rates and other system regulation and so on.
Examples also exist of low-neuron (and synapse) count individuals who retain cognitive (and all other major) function; these examples cannot be explained away by "counting neurons."
We don't know which yet, but given that high neuron count has been ruled out as the single way to accommodate intelligence, we do know that we need to look to other mechanisms for human cognition. Structure, algorithm, other features known or unknown may be responsible for intelligence; and it may be that something entirely disjoint is responsible for the rise of intelligence; but we know it isn't simply high neuron count.
--fyngyrz (anon due to mod points)