Slashdot Mirror


Machine Learning Confronts the Elephant in the Room (quantamagazine.org)

A visual prank exposes an Achilles' heel of computer vision systems: Unlike humans, they can't do a double take. From a report: In a new study [PDF], computer scientists found that artificial intelligence systems fail a vision test a child could accomplish with ease. "It's a clever and important study that reminds us that 'deep learning' isn't really that deep," said Gary Marcus, a neuroscientist at New York University who was not affiliated with the work. The result takes place in the field of computer vision, where artificial intelligence systems attempt to detect and categorize objects. They might try to find all the pedestrians in a street scene, or just distinguish a bird from a bicycle (which is a notoriously difficult task). The stakes are high: As computers take over critical tasks like automated surveillance and autonomous driving, we'll want their visual processing to be at least as good as the human eyes they're replacing.

It won't be easy. The new work accentuates the sophistication of human vision -- and the challenge of building systems that mimic it. In the study, the researchers presented a computer vision system with a living room scene. The system processed it well. It correctly identified a chair, a person, books on a shelf. Then the researchers introduced an anomalous object into the scene -- an image of an elephant. The elephant's mere presence caused the system to forget itself: Suddenly it started calling a chair a couch and the elephant a chair, while turning completely blind to other objects it had previously seen.

"There are all sorts of weird things happening that show how brittle current object detection systems are," said Amir Rosenfeld, a researcher at York University in Toronto and co-author of the study along with his York colleague John Tsotsos and Richard Zemel of the University of Toronto. Researchers are still trying to understand exactly why computer vision systems get tripped up so easily, but they have a good guess. It has to do with an ability humans have that AI lacks: the ability to understand when a scene is confusing and thus go back for a second glance.

10 of 151 comments

  1. To be fair to AI by FilmedInNoir · · Score: 4, Funny

    If an elephant suddenly appeared in my room I'd lose my shit too.

    --
    Sig. Sig. Sputnik
    1. Re:To be fair to AI by OffTheLip · · Score: 3, Funny

      Tusk, tusk no need to worry...

    2. Re:To be fair to AI by sphealey · · Score: 4, Insightful

      A four-year-old wouldn't though: she would name the objects then say "why is there an elephant in the living room?".

    3. Re:To be fair to AI by Tablizer · · Score: 4, Insightful

      If an elephant suddenly appeared in my room I'd lose my shit too.

      Indeed, Republicans randomly showing up in my living-room makes me freak out too :-)

      Seriously, though, AI will have to be broken into more digestible and manageable chunks to be practical: a kind of hybrid between expert systems and neural nets. Letting neural nets do the entirety of processing is probably unrealistic for non-trivial tasks. AI needs dissect-able modularity to both split AI workers into coherent tasks, and to be able to "explain" to the end users (or juries) why the system made the decision it did.

      For example, a preliminary pass may try to identify individual objects in a scene, perhaps ignoring context at first. If, say, 70% look like household objects and 30% look like jungle objects, then the system can try processing the scene further as either type (house-room versus jungle) to see which one is the most viable*. It's sort of an automated version of Occam's Razor.

      In game processing systems, such as automated chess, there are various back-tracking algorithms for exploring the possibilities (AKA "game tree candidates"). One can set various thresholds on how deep (long) to look at one possible game branch before giving up to look at another. It may do a summary (shallow) pass, and then explore the best candidates further.
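A minimal sketch of the "shallow pass, then explore the best candidates" idea described above. It swaps chess for a toy subtraction game (take 1 to 3 stones; whoever takes the last stone wins) so the whole thing fits in a few lines. The function names `negamax` and `choose_move`, the depth values, and the `keep` cutoff are all assumptions made for illustration, not anything from a real chess engine.

```python
def negamax(pile, depth):
    """Score a position for the player about to move: +1 win, -1 loss, 0 unknown."""
    if pile == 0:
        return -1  # the previous player took the last stone, so the mover has lost
    if depth == 0:
        return 0   # depth threshold reached: give up on this branch
    return max(-negamax(pile - take, depth - 1)
               for take in (1, 2, 3) if take <= pile)


def choose_move(pile, shallow_depth=1, deep_depth=8, keep=2):
    """Rank every move with a shallow search, then search only the top few deeply."""
    moves = [take for take in (1, 2, 3) if take <= pile]

    # Summary (shallow) pass over all candidate moves.
    shallow = {take: -negamax(pile - take, shallow_depth) for take in moves}

    # Keep only the most promising candidates for the expensive pass.
    candidates = sorted(moves, key=shallow.get, reverse=True)[:keep]
    deep = {take: -negamax(pile - take, deep_depth) for take in candidates}

    return max(candidates, key=deep.get), shallow, deep


move, shallow, deep = choose_move(5)
print(move, shallow, deep)
```

The trade-off is visible in the `keep` parameter: a shallow pass that scores several moves equally can discard the move that a deeper search would have preferred, so the thresholds need tuning for the game at hand.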

      My sig (Table-ized A.I.) gives other similar examples using facial recognition.

      * In practice, individual items may have a "certainty grade list" such as: "Object X is a Couch: A-, Tiger: C+, Croissant sandwich: D". One can add up the category scores from all objects in the scene and then explore the top 2 or 3 categories further. If the summary conclusion is that the scene is a room, then the rest of the objects can be interpreted in that context (assuming they have a viable "room" match in their certainty grade list). In the elephant example, it can be labelled as either an anomaly, or maybe reinterpreted as a giant stuffed animal, per expert-system rules. (Hey, I want one of those.)
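The certainty-grade idea above could be sketched roughly as follows. Everything here is hypothetical: the numeric grade scale, the label-to-category table, the `min_score` threshold, and the example objects are invented for illustration, and a real detector would supply its own confidence scores.

```python
from collections import defaultdict

# Assumed numeric scale for letter grades (not from the original comment).
GRADE_POINTS = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0, "F": 0.0}

# Hypothetical mapping from object labels to scene categories.
LABEL_CATEGORY = {
    "couch": "room", "chair": "room", "bookshelf": "room",
    "stuffed animal": "room",
    "tiger": "jungle", "elephant": "jungle", "tree": "jungle",
    "croissant sandwich": "kitchen",
}


def grade_to_score(grade):
    """Convert a grade such as 'A-' or 'C+' into a number."""
    base = GRADE_POINTS[grade[0]]
    if grade.endswith("+"):
        return base + 0.3
    if grade.endswith("-"):
        return base - 0.3
    return base


def rank_scene_categories(objects):
    """Add up category scores over all objects; return categories, best first."""
    totals = defaultdict(float)
    for grades in objects.values():
        for label, grade in grades.items():
            totals[LABEL_CATEGORY[label]] += grade_to_score(grade)
    return sorted(totals, key=totals.get, reverse=True)


def interpret(objects, scene, min_score=2.0):
    """Relabel each object in the chosen scene context, or flag it as an anomaly."""
    result = {}
    for name, grades in objects.items():
        viable = {label: grade_to_score(g) for label, g in grades.items()
                  if LABEL_CATEGORY[label] == scene
                  and grade_to_score(g) >= min_score}
        if viable:
            result[name] = max(viable, key=viable.get)
        else:
            best = max(grades, key=lambda label: grade_to_score(grades[label]))
            result[name] = f"anomaly ({best})"
    return result


objects = {
    "W": {"bookshelf": "A"},
    "X": {"couch": "A-", "tiger": "C+", "croissant sandwich": "D"},
    "Y": {"chair": "B+", "tree": "C"},
    "Z": {"elephant": "A", "stuffed animal": "D"},
}
ranking = rank_scene_categories(objects)
scene_labels = interpret(objects, ranking[0])
print(ranking, scene_labels)
```

With these made-up grades the scene is ranked as a room first, objects W, X and Y keep their room labels, and Z has no viable room match, so it is flagged as an anomaly rather than being forced into a chair or couch.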

    4. Re:To be fair to AI by lgw · · Score: 3, Funny

      They think they're maybe bigger than the other bird, so of course it will change course to avoid them. They're playing chicken.

      --
      Socialism: a lie told by totalitarians and believed by fools.
  2. Expertise by JBMcB · · Score: 3, Informative

    These problems have been well known in AI circles for decades. The crappy tech media are finally catching on that marketing departments selling AI solutions maybe exaggerate the capabilities of their tech a twinge.

    --
    My Other Computer Is A Data General Nova III.
  3. Re:Deep learning isn't deep by The+Evil+Atheist · · Score: 5, Insightful

    So you're angry because they're trying to get funding for their work? You want them to research for free, and then only once they have something that can catch up to moving goalposts, THEN you'll have no problem funding them?

    --
    Those who do not learn from commit history are doomed to regress it.
  4. AI is different, and getting better every year by aberglas · · Score: 4, Insightful

    AI vision can do some things that no human can do. Quickly and accurately identifying handwritten postcodes on envelopes was an early win. Matching colours happens at every paint shop.

    It is certainly not human-capable yet. But it has improved dramatically over the last decade, and is likely to keep doing so. And tricks such as stereo vision, wider colour sense, and possibly Lidar help a lot.

    The one elephant example seems to be a shitty AI. There is a modern tendency to leave everything to a simplistic Artificial Neural Network, and then wonder why weird things can happen. Some symbolic reasoning is also required, ultimately.

    When AI approaches human capability, it will not lose its other abilities. So it will be far better than human vision, eventually.

    Ask yourself, when the computers can eventually program themselves, why would they want us around?

  5. limited concepts by DrYak · · Score: 4, Interesting

    it has probably seen an elephant, but probably not in a living room.

    and the net has probably a limited concept of the context.
    (the big gray blob with a leathery texture in the middle of a living room is usually a sofa)

    cue in the recently published research about machine vision and sheep
    (whenever the system sees white dots spread on a green scenery background, it says "sheep", even if it is white rocks sprinkled around the grass.
    this prompted the researcher to crowd-mine pictures of goats and sheep doing unusual stuff. and whenever the CV net saw a fluffy texture, it assumed the most frequent word in that context, calling "dog" any fluffy texture carried by a human in their arms, and "cat" any fluffy texture on a kitchen table, even in the case of a shepherdess carrying a lamb, or a mischievous goat invading a kitchen)

    the thing is: CV nets are basically only good at what they were trained for. if you give them something completely weird and unusual, they might react weirdly.

    --
    "Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]
    1. Re:limited concepts by AmiMoJo · · Score: 3, Insightful

      To be fair, humans have trouble with this too. When we see things at a distance or in poor lighting, our brains do a lot of assuming to help decide what they are. Something in an unusual context can often be confusing at first, as the brain goes for the most common and likely options first.

      One way to help with this is to train the AI to recognize when it is uncertain. A lot of effort goes into getting high accuracy levels, but usually very little into recognizing situations when the answer just isn't clear.
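As a rough illustration of the "recognize when it is uncertain" point, here is a minimal sketch of one simple stand-in: a classifier that abstains at inference time when its top softmax probability is low. This is confidence thresholding, not a training method, and the function names, the 0.7 threshold, and the example scores are assumptions chosen for the example.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_or_abstain(logits, labels, min_confidence=0.7):
    """Return (label, probability), or ('uncertain', probability) if confidence is low."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < min_confidence:
        return "uncertain", probs[best]
    return labels[best], probs[best]


class_names = ["sheep", "rock", "dog"]
clear = classify_or_abstain([4.0, 0.5, 0.2], class_names)  # one score dominates
murky = classify_or_abstain([1.1, 1.0, 0.9], class_names)  # scores nearly tied
print(clear, murky)
```

A caveat worth keeping in mind: softmax scores from neural nets are often overconfident on inputs unlike the training data, so a threshold like this catches near-ties much more reliably than it catches an elephant in a living room.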

      The other thing that really helps humans is time. It's easier to tell a sheep from a rock when you see it move its head, or even just see its coat moving in the breeze. Static photos don't offer that additional information.

      --
      const int one = 65536; (Silvermoon, Texture.cs)
      SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC