Slashdot Mirror


Research Highlights How AI Sees and How It Knows What It's Looking At

anguyen8 writes Deep neural networks (DNNs) trained with Deep Learning have recently produced mind-blowing results in a variety of pattern-recognition tasks, most notably speech recognition, language translation, and recognizing objects in images, where they now perform at near-human levels. But do they see the same way we do? Nope. Researchers recently found that it is easy to produce images that are completely unrecognizable to humans, but that DNNs classify with near-certainty as everyday objects. For example, DNNs look at TV static and declare with 99.99% confidence it is a school bus. An evolutionary algorithm produced the synthetic images by generating pictures and selecting for those that a DNN believed to be an object (i.e. "survival of the school-bus-iest"). The resulting computer-generated images look like modern, abstract art. The pictures also help reveal what DNNs learn to care about when recognizing objects (e.g. a school bus is alternating yellow and black lines, but does not need to have a windshield or wheels), shedding light on the inner workings of these DNN black boxes.
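The generate-and-select loop described in the summary can be sketched roughly as follows. This is a toy illustration only, not the researchers' code: `bus_confidence` is a made-up stand-in for a DNN's "school bus" score (it rewards alternating light and dark pixels and nothing else), and the "image" is a single row of 16 pixel intensities.

```python
import random

WIDTH = 16  # toy "image": one row of 16 pixel intensities in [0, 1]


def bus_confidence(image):
    """Hypothetical stand-in for a DNN's 'school bus' score.

    It only rewards large differences between neighbouring pixels
    (alternating stripes); it knows nothing about wheels or windshields.
    """
    diffs = [abs(a - b) for a, b in zip(image, image[1:])]
    return sum(diffs) / len(diffs)


def mutate(image, rng, rate=0.1):
    """Return a copy with each pixel re-drawn at random with probability `rate`."""
    return [rng.random() if rng.random() < rate else p for p in image]


def evolve(score, generations=200, population=20, seed=0):
    """Keep the images the scorer likes most and refill with mutated copies."""
    rng = random.Random(seed)
    pool = [[rng.random() for _ in range(WIDTH)] for _ in range(population)]
    for _ in range(generations):
        pool.sort(key=score, reverse=True)
        survivors = pool[: population // 2]  # "survival of the school-bus-iest"
        pool = survivors + [mutate(rng.choice(survivors), rng) for _ in survivors]
    return max(pool, key=score)


# The first image of the initial population, rebuilt for comparison.
rng0 = random.Random(0)
start = [rng0.random() for _ in range(WIDTH)]
best = evolve(bus_confidence)
```

Because the survivors are carried over unchanged each generation, the best score never decreases. The evolved row drifts toward whatever the scorer rewards, whether or not a human would see a bus in it.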

14 of 130 comments

  1. Automatic cars are just around the corner... by HornWumpus · · Score: 4, Funny

    Unfortunately they are wrapped around a tree; just around the corner. Mistook a bee 3 inches from the camera for a school bus.

    --
    John McAfee 'It was like that time I hired that Bangkok prostitute; to do my taxes, while I fucked my accountant'
    1. Re:Automatic cars are just around the corner... by peon_a-z,A-Z,0-9$_+! · · Score: 4, Interesting

      Every time I see this topic appear on Slashdot (Last time) I think:

      You're putting a neural network (NN) through a classification process where it is fed this image as a "fixed input", where the input's constituent elements are constant, and you ask it to classify correctly the same way as a human would. The problem with this comparison is the human eye does not see a "constant" input stream; the eye captures a stream of images, each slightly skewed as your head moves and the image changes slightly. Based on this stream of slightly different images, the human identifies an object.

      However, in this research, time and again a "team" shows a "fault" in a NN by taking a single, nonvarying image input to a NN and calling it a "deep flaw in the image processing network", and I just get a feeling that they're doing it wrong.

      To your topic though: You better hope your car is not just taking one single still image and performing actions based on that. You better hope your car is taking a stream of images and making decisions, which would be a completely different class of problem than this.
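One simple way to act on a stream rather than a single still is to average the classifier's per-class probabilities over several frames. The sketch below is only an illustration of that idea: `classify` is a hypothetical callable returning a label-to-probability dict, and the per-frame numbers are invented. It does not show that streams defeat fooling images, since an adversarial pattern could persist across frames.

```python
def average_over_frames(classify, frames):
    """Average per-class probabilities over a sequence of frames."""
    if not frames:
        raise ValueError("need at least one frame")
    totals = {}
    for frame in frames:
        for label, p in classify(frame).items():
            totals[label] = totals.get(label, 0.0) + p
    return {label: total / len(frames) for label, total in totals.items()}


# Invented per-frame outputs; the middle frame is a single misleading still.
fake_outputs = [
    {"bee": 0.9, "school bus": 0.1},
    {"bee": 0.1, "school bus": 0.9},
    {"bee": 0.8, "school bus": 0.2},
]

# Here each "frame" already is its classifier output, so classify is the identity.
averaged = average_over_frames(lambda frame: frame, fake_outputs)
decision = max(averaged, key=averaged.get)
```

With these made-up numbers the one "school bus" frame is outvoted by the other two.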

    2. Re:Automatic cars are just around the corner... by reve_etrange · · Score: 3, Informative

      You better hope your car is not just taking one single still image and performing actions based on that.

      In fact, most of them don't use computer vision much at all. Google's self-driving car for example uses a rotating IR laser to directly measure its surroundings.

      --
      .: Semper Absurda :.
  2. Reverse OCR by yarbo · · Score: 5, Interesting

    Reminds me of the reverse OCR tumblr. It generates patterns of squiggles a human could never read but the OCR recognizes as a word.

    http://reverseocr.tumblr.com/

  3. seems a lot like human vision to me by shadowrat · · Score: 2

    idk, these results seem more similar to how humans see than they do different. When people don't know exactly what they are looking at, the brain just puts in its best guess. people certainly see faces and other familiar objects in tv static. They see bigfoot in a collection of shadows or a strange angle on a bear. i even feel like i did sort of see a peacock in the one random image labeled peacock. it's sort of like the computer vision version of a rorschach test.

    1. Re:seems a lot like human vision to me by Beck_Neard · · Score: 2

      It might be similar but it's not the same mechanism. When you see an object in static, your brain knows that it's just making a guess so the guess is assigned low confidence. But here they showed that you can actually design a picture that looks random but is assigned very high confidence of being an object.

      This type of phenomenon is very well known. It's not news, people have known about this sort of stuff in artificial neural nets since the 80's. I guess they just sort of assumed that deep belief nets would get around this problem, but as far as I know there's no reason to believe that. There's a related phenomenon which is assigning very low confidence to a picture that is very clearly a certain class of object - and then if you add a small bit of noise the confidence goes way up. For the interested, this is a good page which explains why some of these issues happen: http://colah.github.io/posts/2...

      Just one thing I want to get off my chest: I wish this deep learning fad would die. I first started using deep belief nets around 2006 or so when Hinton published his now-infamous Science paper. I thought it was cool and used it a lot, but I knew it had limitations. Then around 2012 or so this whole thing just started becoming a hugely-hyped meme that everyone wants to get on board, without any knowledge or wisdom - they just want results. This is going to be a recipe for yet another AI "failure", when people realize that they couldn't live up to their own hype.

      --
      A fool and his hard drive are soon parted.
    2. Re:seems a lot like human vision to me by nine-times · · Score: 2

      When people don't know exactly what they are looking at, the brain just puts in its best guess. people certainly see faces and other familiar objects in tv static. They see bigfoot in a collection of shadows or a strange angle on a bear.

      Yes, I think it's very interesting when you look at Figure 4 here. They almost look like they could be an artist's interpretation of the things they're supposed to be, or a similarity that a person might pick up on subconsciously. The ones that look like static may just be the AI "being stupid", but I think the comparison to human optical illusions is an interesting one. We see faces because we have a bias to see them. Faces are very important to participating in social activities, since they give many cues to another person's emotions and intentions. It's a whole form of communication. A lot of other sensory biases and reactions are related to things like finding food, avoiding predators, and understanding potentially dangerous obstacles (e.g. if I step here, am I going to fall down?).

      So if these are optical illusions for computers, what are the computer's biases based on? The computer isn't trying to find food or avoid predators, so what is it "trying to do" when it "sees"?

  4. Re:So, useless then? by TheCarp · · Score: 2

    > It's OK, if AI is this stupid, we need not worry about it taking over any time soon.

    If only that worked for congress.

    --
    "I opened my eyes, and everything went dark again"
  5. Also... by raftpeople · · Score: 2

    If the network was trained to always return a "best match" then it's working correctly. To return "no image", it would need to be trained to be able to return that, just like humans are given feedback when there is no image.
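The "always return a best match" behaviour follows from how a softmax output layer works: the probabilities are spread over the known labels and always sum to 1, so some label always wins. A minimal sketch, assuming a three-label classifier with invented raw scores (the labels and numbers are illustrative, not from the research):

```python
import math

LABELS = ["school bus", "peacock", "bee"]  # hypothetical label set


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def best_match(scores):
    """Return the winning label and its probability; there is no 'none' option."""
    probs = softmax(scores)
    i = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[i], probs[i]


# Invented raw scores for an image of static that happens to excite one unit.
probs = softmax([9.0, 1.0, 0.5])
label, confidence = best_match([9.0, 1.0, 0.5])
```

Nothing in this output layer can say "no image"; that answer has to exist as a label and be trained for.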

  6. Clickbait by preaction · · Score: 2

    a DNN is only interested in the parts of an object that most distinguish it from others.

    So it needs to learn that these exact images are tricks being played on it, so it can safely ignore them. This is exactly what machine learning is. What's the story?

  7. Re:This synopsis by babymac · · Score: 2

    But don't worry. I'm sure the armchair experts of Slashdot will be along any minute to tell us how this is all just a bunch of hype and that the computers are stupid (I'm not disagreeing - for the moment) and AI is at least ten million years away and will likely NEVER come to pass. Seriously though, I think a large portion of this site's users have their heads in the sand. I don't work in the field, but I am very interested in it and I read a lot of material from a lot of reputable sources. It seems to me that there are some very deep pockets out there treating this as a serious project and are determined to succeed. Personally, I think they will succeed and far sooner than almost everyone will expect. To have a huge impact, AI doesn't have to be perfect. It doesn't have to reason at a human level to be of use or have a noticeable effect on the economy. And once simpler forms of AI arrive, it will advance very rapidly. I think the folks here on Slashdot will be denying the possibility of such a thing right up until the day before they find themselves on the unemployment line. I think we (and our political leaders) should be preparing for a new economy today while there's still time. Otherwise, it'll be a catastrophe for the majority of working people and society at large.

    --
    "War makes me sad." - Me
  8. Image processing; LIDAR; ADAS perspective by volvox_voxel · · Score: 2

    I've done some image processing work. It seems to me that you can take the output of this neural network and correlate it with some other image processing routines, like feature detection, feature metrology, a conditional-probability-based decision chain, etc.

    I work at a start-up that makes 3D laser-radar (LIDAR) vision sensors for robotics and for collision avoidance in autonomous vehicles. The other day, I learned that such sensors allow robots to augment their camera vision systems to have a better understanding of their environment. It turns out that it's still an unsolved problem for a computer vision system to unambiguously recognize that it's looking at a bird or a cat; it can only give you probabilities. A LIDAR sensor instantly gives you a depth measurement out to several hundred meters that you can correlate your images to. The computer can combine the color information with the depth information to get a much better idea of what it's looking at. A collision avoidance system has to be certain what it's looking at, and cameras alone aren't good enough. I find it pretty exciting to be working on something that is useful for AI (artificial intelligence) research. One guy I work with got his Ph.D. using Microsoft's Kinect sensor, which gives robots depth perception for close-up environments.

    “In the 60s, Marvin Minsky (a well known AI researcher from MIT, whom Isaac Asimov considered one of the smartest people he ever met) assigned a couple of undergrads to spend the summer programming a computer to use a camera to identify objects in a scene. He figured they'd have the problem solved by the end of the summer. Half a century later, we're still working on it.”

    http://imgs.xkcd.com/comics/ta...
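Combining camera probabilities with a depth measurement can be sketched as a naive Bayes update. This is a simplified illustration, not how any particular LIDAR product works: it assumes the two sensors are conditionally independent given the class, and all names and numbers are made up.

```python
def fuse(camera_probs, depth_likelihoods):
    """Naive Bayes fusion: posterior is proportional to camera prob times depth likelihood.

    camera_probs:      label -> probability from the image classifier
    depth_likelihoods: label -> likelihood of the measured size/range for that label
    """
    unnormalised = {c: camera_probs[c] * depth_likelihoods[c] for c in camera_probs}
    total = sum(unnormalised.values())
    if total == 0:
        raise ValueError("depth measurement rules out every class")
    return {c: v / total for c, v in unnormalised.items()}


camera = {"bird": 0.5, "cat": 0.5}  # the camera alone cannot decide
lidar = {"bird": 0.1, "cat": 0.8}   # invented: measured size fits a cat far better
fused = fuse(camera, lidar)
```

In this made-up case the depth evidence breaks the tie that the camera could not.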

  9. Re:This synopsis by Anguirel · · Score: 3, Interesting

    There's also a tremendous gap between what we consider complex and what we consider simple. For example, the brain is complex. However, individual elements of our brains are incredibly simple. Basic chemical reactions. Neurons firing or not. It's the sheer number of simultaneous simple pieces working together that makes it complex.

    Lots of simple AI algorithms all working together make the complexity. This isn't climbing a tree. It's one person poking at chemicals until they get high-energy combustible fuels, and another playing with paper to make paper airplanes better, and a third refining ceramics and metals to make them lighter and stronger and to handle different characteristics, and then they all get put together and you have a person on the moon.

    The illusion is that you think we need to make a leap to get from here to there. There's never a leap. It's lots of small simple steps that get you there.

    --
    ~Anguirel (lit. Living Star-Iron)
    QA: The art of telling someone that their baby is ugly without getting punched.
  10. Training classifiers requires "rejectable" samples by StephenBenoit · · Score: 2

    The DNN examples were apparently trained to discriminate between members of a labeled set. This only works when you have already cleaned up the input stream (a priori) and guarantee that the image must be an example of one of the classes.

    These classifiers were not trained on samples from outside the target set. This causes a forced choice: given this random dot image, which of the classes has the highest confidence? Iterate until confidence is sufficiently high, and you have a forgery with the same features the classifier is looking for.

    For example, the digit training set (0,1,2...9) would need to be augmented with pictures of 'A', 'D', a smiley face, a doodle of a tree, a silhouette of Alfred Hitchcock and some spider webs. The resulting classifier would be more robust. The target classes (0,1,2,...9) would be counterbalanced with a null class (everything else). Looking inside the receptive fields of a robust image classifier is rather satisfying: you will find eigenimages that project back to image structures that are human recognizable, too.

    The lesson in training your classifier is to either verify your assumption (all incoming samples must be a member of the chosen classes) or train (expose) your classifier to out-of-class samples.
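The effect of adding a null class can be illustrated with a deliberately tiny nearest-centroid classifier. This is a sketch under strong assumptions, not a DNN: the two-number "features" and all sample values are invented, and a real null class is far too diverse to be summarised by a single centroid.

```python
def centroid(points):
    """Component-wise mean of a list of equal-length feature tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def train(samples):
    """samples: label -> list of feature tuples. Returns label -> centroid."""
    return {label: centroid(points) for label, points in samples.items()}


def classify(model, x):
    """Return the label whose centroid is closest to x (a forced choice)."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(model, key=lambda label: dist2(model[label]))


# Invented 2-D features for two digit classes.
digits_only = {
    "0": [(0.0, 0.0), (0.2, 0.1)],
    "1": [(1.0, 0.0), (0.9, 0.1)],
}
# The same set augmented with out-of-class samples (letters, doodles, ...).
with_null = dict(digits_only, null=[(0.5, 3.0), (0.3, 2.8), (0.8, 3.2)])

doodle = (0.4, 2.9)  # hypothetical out-of-class input, e.g. a doodle of a tree
forced = classify(train(digits_only), doodle)   # must pick a digit
rejected = classify(train(with_null), doodle)   # can answer "null"
```

Trained on digits only, the classifier labels the doodle as a digit because it has no other answer available; with the null class it can reject it, while real digits are still assigned to their own class.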