Slashdot Mirror


Machine Learning Confronts the Elephant in the Room (quantamagazine.org)

A visual prank exposes an Achilles' heel of computer vision systems: Unlike humans, they can't do a double take. From a report: In a new study [PDF], computer scientists found that artificial intelligence systems fail a vision test a child could accomplish with ease. "It's a clever and important study that reminds us that 'deep learning' isn't really that deep," said Gary Marcus, a neuroscientist at New York University who was not affiliated with the work. The result takes place in the field of computer vision, where artificial intelligence systems attempt to detect and categorize objects. They might try to find all the pedestrians in a street scene, or just distinguish a bird from a bicycle (which is a notoriously difficult task). The stakes are high: As computers take over critical tasks like automated surveillance and autonomous driving, we'll want their visual processing to be at least as good as the human eyes they're replacing.

It won't be easy. The new work accentuates the sophistication of human vision -- and the challenge of building systems that mimic it. In the study, the researchers presented a computer vision system with a living room scene. The system processed it well. It correctly identified a chair, a person, books on a shelf. Then the researchers introduced an anomalous object into the scene -- an image of an elephant. The elephant's mere presence caused the system to forget itself: Suddenly it started calling a chair a couch and the elephant a chair, while turning completely blind to other objects it had previously seen.

"There are all sorts of weird things happening that show how brittle current object detection systems are," said Amir Rosenfeld, a researcher at York University in Toronto and co-author of the study along with his York colleague John Tsotsos and Richard Zemel of the University of Toronto. Researchers are still trying to understand exactly why computer vision systems get tripped up so easily, but they have a good guess. It has to do with an ability humans have that AI lacks: the ability to understand when a scene is confusing and thus go back for a second glance.

9 of 151 comments

  1. Deep learning isn't deep by 110010001000 · · Score: 2, Insightful

    Deep Learning isn't deep. And "Neural Networks" work nothing like a real neural network (a.k.a. a brain) does. They are all terms that "AI researchers" use to inflate their importance and to obtain funding for their work. The entire AI field is a massive joke, but now we have dropped some major taxpayer money on it so it isn't going away anytime soon.

    1. Re:Deep learning isn't deep by The+Evil+Atheist · · Score: 5, Insightful

      So you're angry because they're trying to get funding for their work? You want them to research for free, and then only once they have something that can catch up to moving goalposts, THEN you'll have no problem funding them?

      --
      Those who do not learn from commit history are doomed to regress it.
  2. Re:To be fair to AI by sphealey · · Score: 4, Insightful

    A four-year-old wouldn't though: she would name the objects then say "why is there an elephant in the living room?".

  3. Re:To be fair to AI by Tablizer · · Score: 4, Insightful

    If an elephant suddenly appeared in my room I'd lose my shit too.

    Indeed, Republicans randomly showing up in my living-room makes me freak out too :-)

    Seriously, though, AI will have to be broken into more digestible and manageable chunks to be practical: a kind of hybrid between expert systems and neural nets. Letting neural nets do the entirety of processing is probably unrealistic for non-trivial tasks. AI needs dissect-able modularity to both split AI workers into coherent tasks, and to be able to "explain" to the end users (or juries) why the system made the decision it did.

    For example, a preliminary pass may try to identify individual objects in a scene, perhaps ignoring context at first. If, say, 70% look like household objects and 30% look like jungle objects, then the system can try processing the scene further as either type (house-room versus jungle) to see which one is the most viable*. It's sort of an automated version of Occam's Razor.

    In game processing systems, such as automated chess, there are various back-tracking algorithms for exploring the possibilities (AKA "game tree candidates"). One can set various thresholds on how deep (long) to look at one possible game branch before giving up to look at another. It may do a summary (shallow) pass, and then explore the best candidates further.

    My sig (Table-ized A.I.) gives other similar examples using facial recognition.

    * In practice, individual items may have a "certainty grade list" such as: "Object X is a Couch: A-, Tiger: C+, Croissant sandwich: D". One can add up the category scores from all objects in the scene and then explore the top 2 or 3 categories further. If the summary conclusion is that the scene is a room, then the rest of the objects can be interpreted in that context (assuming they have a viable "room" match in their certainty grade list). In the elephant example, it can be labelled as either an anomaly, or maybe reinterpreted as a giant stuffed animal, per expert-system rules. (Hey, I want one of those.)
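A rough sketch of the two-pass idea described above, in Python. Everything here is an assumption made up for illustration: the numeric scores standing in for letter grades, the `LABEL_TO_SCENE` table, the `min_score` cutoff, and the function names. It is not how any real detector works; it only shows the "sum the category scores, pick a scene, then reinterpret each object in that context" step.

```python
from collections import defaultdict

# Hypothetical lookup from object label to scene category (illustrative only).
LABEL_TO_SCENE = {
    "couch": "room", "chair": "room", "bookshelf": "room",
    "tiger": "jungle", "tree stump": "jungle", "elephant": "jungle",
}

def rank_scenes(detections):
    """Sum every candidate score per scene category, best category first."""
    totals = defaultdict(float)
    for candidates in detections:
        for label, score in candidates.items():
            totals[LABEL_TO_SCENE.get(label, "unknown")] += score
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

def interpret(detections, scene, min_score=0.3):
    """Re-label each object in the chosen scene context, flagging misfits."""
    results = []
    for candidates in detections:
        in_context = {
            label: score for label, score in candidates.items()
            if LABEL_TO_SCENE.get(label) == scene and score >= min_score
        }
        if in_context:
            results.append(max(in_context, key=in_context.get))
        else:
            # No viable match for this scene: keep the raw best guess as an anomaly.
            best = max(candidates, key=candidates.get)
            results.append("anomaly:" + best)
    return results

# One dict per detected object: candidate label -> made-up certainty score.
detections = [
    {"couch": 0.9, "tiger": 0.6},
    {"chair": 0.8, "tree stump": 0.5},
    {"bookshelf": 0.85},
    {"elephant": 0.95, "chair": 0.1},
]

top_scene = rank_scenes(detections)[0][0]
labels = interpret(detections, top_scene)
```

With these made-up numbers the "room" category wins the vote, the first three objects keep their room labels, and the elephant is flagged as an anomaly rather than being forced into a "chair" label.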

  4. AI is different, and getting better every year by aberglas · · Score: 4, Insightful

    AI vision can do some things that no human can do. Quickly and accurately identifying handwritten postcodes on envelopes was an early win. Matching colours happens at every paint shop.

    It is certainly not human-capable yet. But it has improved dramatically over the last decade, and is likely to keep doing so. And tricks such as stereo vision, wider colour sense, and possibly Lidar help a lot.

    The one elephant example seems to be a shitty AI. There is a modern tendency to leave everything to a simplistic Artificial Neural Network, and then wonder why weird things can happen. Some symbolic reasoning is also required, ultimately.

    When AI approaches human capability, it will not lose its other abilities. So it will be far better than human vision, eventually.

    Ask yourself, when the computers can eventually program themselves, why would they want us around?

    1. Re:AI is different, and getting better every year by Anonymous Coward · · Score: 2, Insightful

      - Humans under the age of 15 can see about 20% of moving objects in traffic
      - In human/bicycle accidents the most common quote from the driver is "I didn't see the bicycle" or "It came from nowhere"
      - There are a lot of optical illusions that fool humans

      It annoys me when humans are always presented as perfect things that can see, but AI should be able to handle every bizarre situation. If we have an AI that will hit an elephant on the road, there will still be zero accidents in Finland as there are no elephants here. For Indian roads, we probably need to train them to see the elephants and then there is again no problem. Whatever is common enough gets trained; whatever is rare enough doesn't matter, because we would still be saving millions of lives. Even a hugely imperfect system would do that, because it is just that good when compared to humans.

  5. Re: To be fair to AI by Anonymous Coward · · Score: 1, Insightful

    Many animals that fail a mirror test have managed to live for generations, catch prey and live well off the land. Don't be so fast.

  6. Re:To be fair to AI by Anonymous Coward · · Score: 2, Insightful

    1. Crashing with other small birds is usually not dangerous.
    2. The "other bird" is a competitor. Fighting it (for territory/food/mating purposes) may be important. And that "other bird" seems kind of aggressive too. Got to crash it, teach it a lesson (or get chased away).
    3. Bird crash avoidance protocol may have a simple rule like "when head-on, always turn left". Works when meeting another bird, not so much when meeting a mirror.

  7. Re:limited concepts by AmiMoJo · · Score: 3, Insightful

    To be fair humans have trouble with this too. When we see things at a distance or in poor lighting our brains do a lot of assuming to help decide what it is. Something in an unusual context can often be confusing at first, as the brain goes for the most common and likely options first.

    One way to help with this is to train the AI to recognize when it is uncertain. A lot of effort goes into getting high accuracy levels, but usually very little into recognizing situations when the answer just isn't clear.
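As a minimal sketch of what "recognizing when the answer isn't clear" could look like at prediction time: the Python below does not train anything, it just refuses to answer when the top softmax probability is low. The function names, the example scores, and the 0.7 cutoff are all assumptions for illustration, and raw softmax confidence is known to be a crude proxy for real uncertainty.

```python
import math

def softmax(logits):
    """Turn raw class scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify_or_abstain(logits, labels, min_confidence=0.7):
    """Return the top label, or "uncertain" if its probability is too low."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < min_confidence:
        return "uncertain"
    return labels[best]

labels = ["sheep", "rock", "bush"]
clear_case = classify_or_abstain([5.0, 0.0, 0.0], labels)   # one class dominates
murky_case = classify_or_abstain([1.0, 0.9, 0.0], labels)   # sheep and rock nearly tied
```

Here the first call returns "sheep" and the second returns "uncertain", which is the cue for a second look (more frames, another sensor, or a human).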

    The other thing that really helps humans is time. It's easier to tell a sheep from a rock when you see it move its head, or even just see its coat moving in the breeze. Static photos don't offer that additional information.

    --
    const int one = 65536; (Silvermoon, Texture.cs)
    SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC