Machine Learning Confronts the Elephant in the Room (quantamagazine.org)
A visual prank exposes an Achilles' heel of computer vision systems: Unlike humans, they can't do a double take. From a report: In a new study [PDF], computer scientists found that artificial intelligence systems fail a vision test a child could accomplish with ease. "It's a clever and important study that reminds us that 'deep learning' isn't really that deep," said Gary Marcus, a neuroscientist at New York University who was not affiliated with the work. The result takes place in the field of computer vision, where artificial intelligence systems attempt to detect and categorize objects. They might try to find all the pedestrians in a street scene, or just distinguish a bird from a bicycle (which is a notoriously difficult task). The stakes are high: As computers take over critical tasks like automated surveillance and autonomous driving, we'll want their visual processing to be at least as good as the human eyes they're replacing.
It won't be easy. The new work accentuates the sophistication of human vision -- and the challenge of building systems that mimic it. In the study, the researchers presented a computer vision system with a living room scene. The system processed it well. It correctly identified a chair, a person, books on a shelf. Then the researchers introduced an anomalous object into the scene -- an image of an elephant. The elephant's mere presence caused the system to forget itself: Suddenly it started calling a chair a couch and the elephant a chair, while turning completely blind to other objects it had previously seen.
"There are all sorts of weird things happening that show how brittle current object detection systems are," said Amir Rosenfeld, a researcher at York University in Toronto and co-author of the study along with his York colleague John Tsotsos and Richard Zemel of the University of Toronto. Researchers are still trying to understand exactly why computer vision systems get tripped up so easily, but they have a good guess. It has to do with an ability humans have that AI lacks: the ability to understand when a scene is confusing and thus go back for a second glance.
Deep Learning isn't deep. And "Neural Networks" work nothing like a real neural network (a.k.a. a brain) does. These are all terms that "AI researchers" use to inflate their importance and to obtain funding for their work. The entire AI field is a massive joke, but now we have dropped some major taxpayer money on it, so it isn't going away anytime soon.
A four-year-old wouldn't, though: she would name the objects and then say "why is there an elephant in the living room?"
Indeed, Republicans randomly showing up in my living room makes me freak out too :-)
Seriously, though, AI will have to be broken into more digestible and manageable chunks to be practical: a kind of hybrid between expert systems and neural nets. Letting neural nets do the entirety of processing is probably unrealistic for non-trivial tasks. AI needs dissectible modularity, both to divide AI work into coherent tasks and to be able to "explain" to end users (or juries) why the system made the decision it did.
For example, a preliminary pass may try to identify individual objects in a scene, perhaps ignoring context at first. If, say, 70% look like household objects and 30% look like jungle objects, then the system can try processing the scene further as each type (room versus jungle) to see which one is the most viable*. It's sort of an automated version of Occam's Razor.
In game-playing systems, such as automated chess, there are various backtracking algorithms for exploring the possibilities (AKA "game tree candidates"). One can set various thresholds on how deep (long) to look at one possible game branch before giving up and looking at another. It may do a summary (shallow) pass first, and then explore the best candidates further.
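A minimal sketch of the "summary pass, then explore the best candidates" idea, assuming a plain depth-limited negamax search and a hypothetical toy take-away game (take 1 or 2 stones; whoever takes the last stone wins). Real chess engines use far more elaborate machinery (alpha-beta pruning, iterative deepening, and so on), and the names `shallow_then_deep`, `take_children` and `take_evaluate` are made up for illustration.

```python
from typing import Callable, List, TypeVar

S = TypeVar("S")


def negamax(state: S, depth: int, children: Callable[[S], List[S]],
            evaluate: Callable[[S], float]) -> float:
    """Depth-limited negamax; scores are from the point of view of the side to move."""
    kids = children(state)
    if depth == 0 or not kids:
        return evaluate(state)
    return max(-negamax(k, depth - 1, children, evaluate) for k in kids)


def shallow_then_deep(root: S, children: Callable[[S], List[S]],
                      evaluate: Callable[[S], float], shallow_depth: int = 0,
                      deep_depth: int = 3, keep: int = 2) -> S:
    """Cheap pass over every move, then a deeper search of only the `keep` best."""
    def score(child: S, depth: int) -> float:
        # Negate: the child's value is from the opponent's point of view.
        return -negamax(child, depth, children, evaluate)

    moves = children(root)
    # Summary (shallow) pass: rank every candidate move cheaply.
    candidates = sorted(moves, key=lambda m: score(m, shallow_depth), reverse=True)[:keep]
    # Explore only the surviving candidates further.
    return max(candidates, key=lambda m: score(m, deep_depth))


# Hypothetical toy game: a state is the number of stones left.
def take_children(n: int) -> List[int]:
    return [n - k for k in (1, 2) if n - k >= 0]


def take_evaluate(n: int) -> float:
    # No stones left means the side to move has already lost; otherwise "unknown".
    return -1.0 if n == 0 else 0.0


print(shallow_then_deep(4, take_children, take_evaluate))
```

With 4 stones the deep pass picks the move that leaves 3, which is a lost position for the opponent. The `keep` parameter is the "how many branches are worth a long look" threshold; raising `deep_depth` trades time for accuracy.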
My sig (Table-ized A.I.) gives other similar examples using facial recognition.
* In practice, individual items may have a "certainty grade list" such as: "Object X is a Couch: A-, Tiger: C+, Croissant sandwich: D". One can add up the category scores from all objects in the scene and then explore the top 2 or 3 categories further. If the summary conclusion is that the scene is a room, then the rest of the objects can be interpreted in that context (assuming they have a viable "room" match in their certainty grade list). In the elephant example, it can be labelled either as an anomaly or, perhaps, reinterpreted as a giant stuffed animal, per expert-system rules. (Hey, I want one of those.)
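A rough sketch of the footnote's scheme, under several assumptions: letter grades map to 4/3/2/1/0 points with plus/minus worth 0.3, a hand-written label-to-scene table (`SCENE_OF`, hypothetical) stands in for the "expert-system rules", and for brevity it commits to the single top-scoring scene instead of exploring the top 2 or 3.

```python
from typing import Dict, List, Tuple

# Hypothetical grade scale: A=4 ... F=0, with +/- adding or subtracting 0.3.
GRADE_POINTS = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0, "F": 0.0}


def grade_to_points(grade: str) -> float:
    points = GRADE_POINTS[grade[0]]
    if grade.endswith("+"):
        points += 0.3
    elif grade.endswith("-"):
        points -= 0.3
    return points


# Hypothetical "expert-system" table mapping each label to the scene it belongs in.
SCENE_OF = {
    "couch": "room", "chair": "room", "bookshelf": "room", "stuffed animal": "room",
    "tiger": "jungle", "elephant": "jungle", "vine": "jungle",
    "croissant sandwich": "kitchen",
}


def interpret_scene(objects: List[Dict[str, str]]) -> Tuple[str, List[str], List[str]]:
    """Return (chosen scene, scenes ranked by total score, one label per object)."""
    totals: Dict[str, float] = {}
    for grades in objects:
        # Each object contributes its best grade for every scene it could fit.
        best: Dict[str, float] = {}
        for label, grade in grades.items():
            scene = SCENE_OF[label]
            best[scene] = max(best.get(scene, 0.0), grade_to_points(grade))
        for scene, points in best.items():
            totals[scene] = totals.get(scene, 0.0) + points
    ranked = sorted(totals, key=lambda s: totals[s], reverse=True)
    scene = ranked[0]  # simplification: commit to the single best scene
    labels = []
    for grades in objects:
        # Reinterpret each object in the scene's context, or flag it as an anomaly.
        fits = [lbl for lbl in grades if SCENE_OF[lbl] == scene]
        if fits:
            labels.append(max(fits, key=lambda lbl: grade_to_points(grades[lbl])))
        else:
            labels.append("anomaly")
    return scene, ranked, labels


living_room = [
    {"couch": "A-", "tiger": "C+", "croissant sandwich": "D"},
    {"chair": "B+", "elephant": "D"},
    {"bookshelf": "A"},
    {"elephant": "A", "stuffed animal": "C"},
    {"vine": "B"},
]
print(interpret_scene(living_room))
```

Here the room hypothesis wins on total points, so the confidently graded elephant is reinterpreted via its weaker "stuffed animal" match, while the object with no "room" option at all is flagged as an anomaly.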
Table-ized A.I.
AI vision can do some things that no human can do. Quickly and accurately identifying handwritten postcodes on envelopes was an early win. Matching colours happens at every paint shop.
It is certainly not human-capable yet. But it has improved dramatically over the last decade, and is likely to keep improving. And tricks such as stereo vision, wider colour sense, and possibly Lidar help a lot.
The elephant example seems to involve a shitty AI. There is a modern tendency to leave everything to a simplistic artificial neural network, and then wonder why weird things happen. Ultimately, some symbolic reasoning is also required.
When AI approaches human capability, it will not lose its other abilities. So it will be far better than human vision, eventually.
Ask yourself, when the computers can eventually program themselves, why would they want us around?
Many animals that fail a mirror test have managed to live for generations, catch prey and live well off the land. Don't be so fast.
1. Crashing into other small birds is usually not dangerous.
2. The "other bird" is a competitor. Fighting it (for territory/food/mating purposes) may be important. And that "other bird" seems kind of aggressive too. Got to crash into it, teach it a lesson (or get chased away).
3. Bird crash avoidance protocol may have a simple rule like "when head-on, always turn left". Works when meeting another bird, not so much when meeting a mirror.
To be fair, humans have trouble with this too. When we see things at a distance or in poor lighting, our brains do a lot of assuming to help decide what they are. Something in an unusual context can often be confusing at first, as the brain goes for the most common and likely options first.
One way to help with this is to train the AI to recognize when it is uncertain. A lot of effort goes into getting high accuracy levels, but usually very little into recognizing situations where the answer just isn't clear.
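The simplest version of "know when you're unsure" is to threshold the classifier's top softmax probability and abstain below it. This is only a baseline, not the method from the study: raw softmax scores are often overconfident, so real systems tend to add calibration or dedicated uncertainty estimates. The 0.7 threshold and the example scores are arbitrary illustrative values.

```python
import math
from typing import Dict, Optional, Tuple


def softmax(logits: Dict[str, float]) -> Dict[str, float]:
    """Convert raw class scores into probabilities that sum to 1."""
    top = max(logits.values())  # subtract the max for numerical stability
    exps = {label: math.exp(score - top) for label, score in logits.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


def classify_or_abstain(logits: Dict[str, float],
                        min_confidence: float = 0.7) -> Tuple[Optional[str], float]:
    """Return (label, confidence), or (None, confidence) when the model is unsure."""
    probs = softmax(logits)
    label, confidence = max(probs.items(), key=lambda item: item[1])
    if confidence < min_confidence:
        return None, confidence  # "I don't know": flag the input for a second look
    return label, confidence


print(classify_or_abstain({"chair": 4.0, "couch": 1.0, "elephant": 0.0}))
print(classify_or_abstain({"chair": 1.2, "couch": 1.0, "elephant": 0.9}))
```

The first call returns `'chair'` with a confidence of about 0.94; in the second, the scores are nearly tied, so it abstains instead of guessing. That abstention is the hook for a "double take".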
The other thing that really helps humans is time. It's easier to tell a sheep from a rock when you see it move its head, or even just see its coat moving in the breeze. Static photos don't offer that additional information.