Nvidia GPU-Powered Autonomous Car Teaches Itself To See And Steer (networkworld.com)
An anonymous reader quotes a report from Network World discussing Nvidia's project called DAVE2, where their engineering team built a self-driving car with one camera, one Drive-PX embedded computer and only 72 hours of training data: Neural networks and image recognition applications such as self-driving cars have exploded recently for two reasons. First, Graphics Processing Units (GPUs) used to render graphics in mobile phones became powerful and inexpensive. GPUs densely packed onto board-level supercomputers are very good at solving massively parallel neural network problems and are inexpensive enough for every AI researcher and software developer to buy. Second, large, labeled image datasets have become available to train massively parallel neural networks implemented on GPUs to see and perceive the world of objects captured by cameras. The Nvidia team trained a convolutional neural network (CNN) to map raw pixels from a single front-facing camera directly to steering commands. Nvidia's breakthrough is that the autonomous vehicle taught itself, by watching how a human drove, the internal representations of the processing steps needed to see the road ahead and steer, without being explicitly trained to detect features such as roads and lanes.
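For readers who want to see what "map raw pixels directly to steering commands" means in practice, here is a toy sketch of the idea (learning to steer by imitating recorded human commands). It is not Nvidia's code: a single linear layer stands in for the CNN, the "camera frame" is a made-up 9-pixel row with one bright lane marker, and every name in it (`make_frame`, `human_steering`, `train`) is hypothetical.

```python
import random

WIDTH = 9                      # pixels in the toy 1-D "camera frame" (assumption)
CENTER = (WIDTH - 1) / 2

def make_frame(marker_pos):
    """Toy frame: all dark except one bright pixel at the lane marker."""
    return [1.0 if i == marker_pos else 0.0 for i in range(WIDTH)]

def human_steering(marker_pos):
    """Stand-in for the recorded human command: steer toward the marker, in [-1, 1]."""
    return (marker_pos - CENTER) / CENTER

def predict(weights, bias, frame):
    """Linear model: one weight per pixel plus a bias gives the steering command."""
    return sum(w * x for w, x in zip(weights, frame)) + bias

def train(samples, epochs=200, lr=0.1, seed=0):
    """Stochastic gradient descent on squared error between model and human."""
    rng = random.Random(seed)
    weights = [0.0] * WIDTH
    bias = 0.0
    samples = list(samples)
    for _ in range(epochs):
        rng.shuffle(samples)
        for frame, target in samples:
            error = predict(weights, bias, frame) - target
            for i, x in enumerate(frame):
                weights[i] -= lr * error * x
            bias -= lr * error
    return weights, bias

# "Recorded drive": frames paired with what the human did, no lane labels anywhere.
samples = [(make_frame(p), human_steering(p)) for p in range(WIDTH)]
weights, bias = train(samples)
```

Nothing in the training data says where the lane is; the model only ever sees pixels and the human's steering value, and calling `predict(weights, bias, make_frame(0))` afterwards should come out close to a full-left command of -1. A real system would replace the linear layer with a deep CNN and the toy frames with camera images, which is exactly where the GPUs come in.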
An NN is a low-dimensional approach to solving problems. But most non-trivial problems -- and driving a vehicle is an entirely representative proxy for such problems -- tend not to be uniformly low-dimensional. Many aspects come into play suddenly and unpredictably. Some might not be seen for years, or ever. But then again, they might. In order to deal with such things, more than low-dimensional problem solving is required. NNs can't do it. They're inherently limited.
No, they're not. A poor response to a previously unseen event (known as bad generalization) can come from at least two things: bad sampling of the training set (no pedestrians in the set), and the absence of transfer learning (using another system trained on pedestrians to improve the first one). The first is really easy to solve but time-consuming; a lot of very bright people are working on the second.
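To make the sampling point concrete, here is a deliberately tiny sketch. It uses a 1-nearest-neighbour classifier on made-up 2-D feature vectors rather than a neural network, and all names and numbers are hypothetical; the only thing it is meant to show is that a class missing from the training set can never be predicted until samples of it are added.

```python
def nearest_label(train_set, x):
    """1-nearest-neighbour: return the label of the closest training point."""
    def sq_dist(sample):
        (fx, fy), _label = sample
        return (fx - x[0]) ** 2 + (fy - x[1]) ** 2
    return min(train_set, key=sq_dist)[1]

# Badly sampled training set: no pedestrians at all.
no_pedestrians = [
    ((0.0, 0.0), "road"), ((0.1, 0.2), "road"),
    ((1.0, 0.0), "car"),  ((0.9, 0.1), "car"),
]

# Same set after collecting a few pedestrian examples.
with_pedestrians = no_pedestrians + [
    ((0.0, 1.0), "pedestrian"), ((0.1, 0.9), "pedestrian"),
]

query = (0.05, 0.95)  # features of a pedestrian the system has never seen

before = nearest_label(no_pedestrians, query)    # "road"
after = nearest_label(with_pedestrians, query)   # "pedestrian"
```

The fix in the toy case is just appending two samples; in the real case it means driving around until you have enough footage, which is the "easy but time-consuming" part.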
Honestly, automatic driving by visual cues is no longer a challenging computer vision problem. It is still a challenging engineering problem, but all the scientific tools needed to solve it are already there, and they are being actively used to solve it.
There are a lot of computer vision tasks that can now be considered solved from the research point of view. Very few people seem to acknowledge the enormous progress that has been made in the field in the last 10 years. Even people working in the field. But it's there, and now the engineers have to use it.