Slashdot Mirror


Tiny, Blurry Pictures Find the Limits of Computer Image Recognition (arstechnica.com)

A new PNAS paper takes a look at just how different computer and human visual systems are. Humans can figure out that a mangled word "meant" something recognizable while a computer can't. Likewise with images: humans can piece together what a blurry image might depict based on small clues in the picture, where a computer would be at a loss. The authors of the PNAS paper used a set of blurry, tricky images to pinpoint the differences between computer vision models and the human brain.

They used pictures called "minimal recognizable configurations" (MIRCs) that were either so small or so low-resolution that any further reduction would prevent a person from being able to recognize them. The computer models did a better job after they were trained specifically on the MIRCs, but their accuracy was still low compared to human performance. The reason for this, the authors suggest, is that computers can't pick out the individual components of the image whereas humans can. This kind of interpretation is "beyond the capacities of current neural network models," the authors write.

50 comments

  1. Not one example? by nuckfuts · · Score: 5, Informative

    This story is rather lacking without a single example of what they're talking about.

    1. Re:Not one example? by SeaFox · · Score: 3, Funny

      Yeah, the author should ENHANCE this story a bit.

    2. Re:Not one example? by Dan+East · · Score: 2

      The story was full of those low resolution samples. They are just 1x1 images. And they're white.

      --
      Better known as 318230.
    3. Re:Not one example? by ShanghaiBill · · Score: 4, Informative

      Here is a page with some examples.

      Here is a PDF of the paper, which has more examples.

      I don't think it means much. Instead of showing that humans see better than computers, it really just shows that this one researcher is bad at programming computer vision systems. If he took his dataset, and made it a Kaggle Competition, I think someone would design a computer vision system that would do much better than his.

    4. Re:Not one example? by Anonymous Coward · · Score: 0

      Thank you. It's ironic that /. often rails against those ad-heavy slideshows, but this would be the perfect content for a slideshow. Where's BuzzFeed when you need them?

    5. Re:Not one example? by Kjella · · Score: 1

      Perhaps... but it looks like a tough act to follow. By decomposing the picture he shows how many different characteristics we use in combination on an ordinary image. The sharp drop-off shows how we latch on to one small defining feature and work our way backwards to the answer. Or maybe it's easier to argue in reverse: this here blob looks like anything. Add a bow here and it's a ship. Extract the neck up, it's a horse. Show a chin here, it's a suit. Several times it's about seeing the edge and thus the entire shape, like here's a wing and here's a leg.

      I must admit I don't know where the state of the art in computer vision is today. But the kind of "decompose and integrate" we see here would be rather impressive, like you don't compare "a horse" to other horses. You actually divide it up and say it has a horse's head, a horse's neck, a horse's legs, a horse's overall shape, they're all like voting for whether it's a horse or not... like say a centaur wouldn't be a bad match but a split vote, is this a human or is it a horse because parts of it score really well on one or the other.

      --
      Live today, because you never know what tomorrow brings
    6. Re:Not one example? by ShanghaiBill · · Score: 1

      But the kind of "decompose and integrate" we see here would be rather impressive, like you don't compare "a horse" to other horses. You actually divide it up and say it has a horse's head, a horse's neck, a horse's legs, a horse's overall shape, they're all like voting for whether it's a horse or not

      That is actually how convolutional neural networks work. They basically decompose the image into sub-images, and then vote. If a CNN was programmed for this task, and trained on plenty of data, I think it could do well, and very likely surpass human abilities. Just because these researchers were lousy at programming/training their NN, that does not mean NNs are fundamentally bad at it.

      I have done a lot of work in computer vision, and when I first used NNs back in the 1980s I was very unimpressed. They were computationally expensive, difficult to train, and produced very bad results. But I started using them again in 2014 and I was astonished at the progress. Today, of course, we have much faster computers, including GPUs that are very effective at running NNs. We also have way more training data, from online image databases. But we also have much better algorithms, like backprop, boosting, dropout, and autoencoders, that work with deep networks. NNs can do pretty well with fuzzy images and partial data. So I am not convinced that they cannot beat humans at the tasks described in this paper.
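
      The "decompose into sub-images, then vote" idea can be sketched in a few lines. This is only a toy, not a real CNN: the part detectors are hand-written 3x3 templates rather than learned filters, there is a single layer, and the names PARTS and CLASS_WEIGHTS are made up for the illustration.

```python
# Toy "decompose and vote" classifier. Hand-written detectors, no learning.
# PARTS, CLASS_WEIGHTS and the 3x3 templates are hypothetical.

def best_response(image, kernel):
    """Slide the kernel over the image and return the strongest match (floored at 0)."""
    kh, kw = len(kernel), len(kernel[0])
    best = 0.0
    for r in range(len(image) - kh + 1):
        for c in range(len(image[0]) - kw + 1):
            score = sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            best = max(best, score)
    return best

# Each "part" is a small template that responds to one local pattern.
PARTS = {
    "vertical_bar": [[-1, 2, -1], [-1, 2, -1], [-1, 2, -1]],
    "horizontal_bar": [[-1, -1, -1], [2, 2, 2], [-1, -1, -1]],
}

# Each class says how much it cares about each part.
CLASS_WEIGHTS = {
    "letter_I": {"vertical_bar": 1.0, "horizontal_bar": 0.0},
    "minus_sign": {"vertical_bar": 0.0, "horizontal_bar": 1.0},
}

def classify(image):
    """Score every part anywhere in the image, then let the parts vote per class."""
    part_scores = {name: best_response(image, k) for name, k in PARTS.items()}
    votes = {
        label: sum(w * part_scores[part] for part, w in weights.items())
        for label, weights in CLASS_WEIGHTS.items()
    }
    return max(votes, key=votes.get), votes

VERTICAL = [[0, 0, 1, 0, 0] for _ in range(5)]
HORIZONTAL = [[0] * 5, [0] * 5, [1] * 5, [0] * 5, [0] * 5]

print(classify(VERTICAL)[0])
print(classify(HORIZONTAL)[0])
```

      Taking the best response anywhere in the image plays the role of pooling: the vote does not depend on where the part was found.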

    7. Re:Not one example? by Anonymous Coward · · Score: 0

      Except it would not be great for a slideshow, as the linked page above clearly shows what is going on at a glance without having to click or wait for additional things to show. It works fine for a quick skim, and for going back for the details with minimal effort. Slideshows, especially ones with a lot of ads, still suck.

    8. Re:Not one example? by Anonymous Coward · · Score: 0

      .il? i ain't clicking that shit.

    9. Re:Not one example? by AK+Marc · · Score: 1

      The study doesn't look at false positives either. Throw in a picture of a brown paper bag, and the human will declare it an eye, while the computer will (correctly) not find it among the stated possibilities. Same with a squirrel called a horse. The human brain is really good at coming up with an answer, even if there isn't enough information for an answer. Computers are more likely to reject all answers, rather than settle on the wrong one because it seems more probable (at least the ones used in these types of games). So focusing on false negatives without allowing for the possibility of false positives seems like a poor experimental setup. Perhaps someone will look at the study and make a more useful and complete study based on it that will give more useful results.
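
      The "reject all answers" behaviour described above could look roughly like this. It is a minimal sketch assuming the classifier outputs one raw score per label; the 0.6 threshold and the labels are invented for the example, and nothing here reflects how the paper's models were actually configured.

```python
import math

def softmax(scores):
    """Turn raw per-label scores into probabilities that sum to 1."""
    top = max(scores.values())
    exps = {label: math.exp(s - top) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}

def classify_with_reject(scores, threshold=0.6):
    """Return the best label, or None if the model is not confident enough."""
    probs = softmax(scores)
    label = max(probs, key=probs.get)
    return label if probs[label] >= threshold else None

# Hypothetical scores: a clear winner versus three near-ties.
print(classify_with_reject({"eye": 5.0, "horse": 1.0, "ship": 0.5}))
print(classify_with_reject({"eye": 2.0, "horse": 1.9, "ship": 1.8}))
```

      Raising the threshold trades false positives for false negatives, which is why a fair comparison has to measure both.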

    10. Re:Not one example? by wanax · · Score: 1

      They did consider false positives. They had a catch category (see page 4, last par) and humans did extremely well at it (see 3rd par, page 5).

    11. Re:Not one example? by wanax · · Score: 2

      I'm a professional neuroscientist who specializes in vision research with a computational bent. They used all the mainstream, state-of-the-art, openly available object recognition algorithms currently in use. Computer vision is a huge market, with many applications, from the DoD to self-driving cars to image-based searches. I doubt some 5-figure prize is going to produce something that outperforms the best algorithms several distinct industries and academia have managed to create while being funded to the tune of over a billion a year for the last 10 years or so.

      These are serious researchers. If you think you can get any type of computer vision that significantly outperforms humans on this type of task, there is a unicorn startup and multiple ultra-high profile publications awaiting you.

      And just FYI based on your further post: they used two types of convolutional neural nets (see Methods: Model Versions and Parameters).

    12. Re:Not one example? by burtosis · · Score: 1

      And this is why self driving cars today can't compete against distracted teenage drivers. For all the claims of how perfect computer vision and sensor fusion is, humans are far superior.

    13. Re:Not one example? by djinn6 · · Score: 1

      That's a completely different field though. Humans are so far above the minimum capability for driving, they put on music or radio to fight the boredom. Some go as far as doing makeup, eating, texting or calling other people. If everyone concentrated completely on driving, the accident rate would be practically zero. We might even be able to raise the speed limit by 50%.

      Software doesn't need to come close to human capability to be a great driver. It just needs to be better than the 0.1% of the time when humans aren't even looking at the road. Being able to see twice as far at all angles, react instantly, never get distracted and never fall asleep gives self-driving cars a huge advantage. Yeah, it might not recognize the thing moving into its path as a kid on a bicycle, but it doesn't need to. All it needs to do is see something blocking its way, then step on the brakes, and it can do that 100 times faster than a human.

    14. Re:Not one example? by Anonymous Coward · · Score: 0

      Self-driving cars use no human input, so I completely disagree that they only need to be better than the 0.1% of the time humans aren't paying attention. They need to be better than average human drivers, and, if you follow the state of the art, the technology is so far in its infancy that we won't see computers best human drivers in all typical aspects of driving for a long, long time. Computer-aided driving is what you describe, and it is viable and in vehicles today.

    15. Re: Not one example? by Anonymous Coward · · Score: 0

      Cars can and should cheat though. Between infrared, sonar, radar and other imaging technology they should be able to cover the gap.

    16. Re:Not one example? by plover · · Score: 1

      Clearly you have never seen my uncle drive. Despite a lifetime of practice, at no point in his life could he have ever bested any of the current self-driving cars, never mind the advances we'll likely see in the next decade.

      Keep in mind there are still people on the roads who hold licenses that were granted before driving tests were required.

      --
      John
    17. Re:Not one example? by The+Raven · · Score: 1

      Not quite. The paper explains that the computer is different quantitatively than the humans. Specifically, computer vision systems degraded slowly with no hard cutoff, while human vision systems had hard cutoffs where a small degradation in the image (crop, blur) led to a big reduction in the number of accurate identifications on Mechanical Turk.

      This cliff was not present in computer vision models. Not just present at a more detailed level (i.e., the computer failed earlier) but not present at all, indicating that computer vision methods do not yet emulate the human visual processing method.
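
      One way to put a number on "cliff versus slow degradation" is to look at the biggest fall in recognition rate between two consecutive degradation steps. The sketch below assumes you already have a rate (in percent) per step; the two lists are made-up numbers chosen to show the two shapes and are NOT taken from the paper.

```python
def largest_drop(rates):
    """Return (step index, size) of the biggest fall between consecutive rates."""
    drops = [rates[i] - rates[i + 1] for i in range(len(rates) - 1)]
    index = max(range(len(drops)), key=drops.__getitem__)
    return index, drops[index]

# Made-up recognition rates in percent, one per degradation step.
# Illustration only; these are NOT results from the paper.
cliff_shaped = [90, 88, 85, 20, 10]      # holds up, then collapses
gradual_shaped = [60, 50, 40, 30, 20]    # loses a little at every step

print(largest_drop(cliff_shaped))
print(largest_drop(gradual_shaped))
```

      A curve with a cliff has one drop that dwarfs the rest; a gradual curve has no step that stands out.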

      --
      "I will trust Google to 'do no evil' until the founders no longer run it." Hello Alphabet.
  2. CSI by Anonymous Coward · · Score: 0

    Can't you just press enhance to make them perfect again? I saw it on CSI...

    1. Re: CSI by Anonymous Coward · · Score: 0

      Don't forget that some details will require multiple enhance presses.

    2. Re: CSI by Anonymous Coward · · Score: 0

      Zoom then enhance works even better, preferably after creating a GUI interface using Visual Basic.

  3. Disagree by Anonymous Coward · · Score: 0

    Simply use google to predict what they meant to say.

  4. Context by Anonymous Coward · · Score: 1

    The explanation is simple: context. We humans have a lot of contextual information in our brains, which is very useful for inferring knowledge from a wide range of noisy inputs (such as blurry pictures). If we train a computer to identify some aspects of blurry images *within specific context*, the computer will do a decent job.

    1. Re:Context by Anonymous Coward · · Score: 0

      There's no reason a neural network couldn't do the same.

      I wonder if these results are related to the similar experiments that have been done in ESP. Typically, if there are two isolated subjects and one is looking at a choice of one out of four postcards, the second individual can guess which card is being looked at (a 1/4 chance to get it right) and will typically be right about 1/3 of the time.

      The NSA prefers I not speculate as to the reasons.

    2. Re: Context by Anonymous Coward · · Score: 0

      Publication bias and/or sloppy statistics. Usually more of the latter. Persi Diaconis (magician and mathematician who literally ran away to join the circus) has done a whole load of work on probability, particularly analysing the design of ESP experiments.

    3. Re: Context by Anonymous Coward · · Score: 0

      That's easy to say in general, and certainly the rallying cry of the pseudo-skeptics, but set up the experiment yourself and I'll bet you see the same result. I'm not talking about ALL ESP studies, just one particular experiment. I agree that there are too many confounding variables in most cases (even in this case!) to make much more than vague hypotheses.

  5. Wonderful by Andurian · · Score: 1

    Wonderful. Now I'll be forced to look at really tiny blurry pics every time I want to do anything on the web.

  6. Link by Cow+Jones · · Score: 3, Informative

    This seems to be the project the article is talking about: http://www.wisdom.weizmann.ac.il/~dannyh/Mircs/mircs.html

    --

    Ah, arrogance and stupidity, all in the same package. How efficient of you. -- Londo Mollari
    1. Re:Link by Anonymous Coward · · Score: 2, Informative

      & paper http://www.pnas.org/content/early/2016/02/09/1513198113.full.pdf

  7. It's the retina, not the brain by Anonymous Coward · · Score: 2, Insightful

    Jerry Lettvin, in the 1960s, did experiments on single optic nerve cells that showed how the retina itself enhances and discovers edges. Human vision is not a "pixel image", it's based on collecting and amplifying *edges* and differentials. Until the computer processing and the cameras themselves, as used for computer vision, get this built in at the most basic levels of the CCD and immediate processing, a great deal of the most critical data is thrown out before any more sophisticated "computer brain" can apply its algorithms.

    1. Re:It's the retina, not the brain by Anonymous Coward · · Score: 0

      I don't understand your point. Spatial discontinuities are easily detected from a pixel based image using various algorithms such as Canny edge detection or a Laplacian filter. Temporal discontinuities are easily detected from the "difference image" from adjacent video frames. Are you advocating these algorithms be integrated into the CCD's supporting silicon? They are not especially computationally intensive to generate as needed, so I'm not sure I understand what benefit you hope to see from this.

      I will say that this form of analysis is fundamental to classifiers such as Histogram of Oriented Gradients and Convolutional Neural Networks, both of which are preferred algorithms for image recognition already.
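
      For anyone who hasn't seen them, the two operations mentioned above fit in a few lines of plain Python. This sketch uses the standard 4-neighbour Laplacian kernel for spatial edges and an absolute frame difference for temporal changes; it is not Canny, which adds smoothing, gradient direction, non-maximum suppression and hysteresis on top. The tiny test images are invented.

```python
# 4-neighbour Laplacian kernel: responds where brightness changes abruptly.
LAPLACIAN = [[0, 1, 0], [1, -4, 1], [0, 1, 0]]

def filter_valid(image, kernel):
    """Apply the kernel at every position where it fits entirely inside the image.

    The kernel is not flipped, which makes no difference for a symmetric
    kernel such as LAPLACIAN.
    """
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(len(image) - kh + 1):
        row = []
        for c in range(len(image[0]) - kw + 1):
            row.append(sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            ))
        out.append(row)
    return out

def frame_difference(previous, current):
    """Per-pixel absolute change between two frames of the same size."""
    return [
        [abs(c - p) for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(previous, current)
    ]

# A vertical step edge: dark on the left, bright on the right.
STEP_EDGE = [[0, 0, 0, 10, 10, 10] for _ in range(4)]

print(filter_valid(STEP_EDGE, LAPLACIAN))
print(frame_difference([[1, 2], [3, 4]], [[1, 5], [0, 4]]))
```

      The filter output is zero in the flat regions and swings positive then negative across the step, so the sign change marks the edge.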

      The advantage of pushing these "features" closer to the imaging sensor to reduce bandwidth usage only matters to the extent that bandwidth and/or latency are bottlenecks. The tradeoffs there do justify some efforts to localize computation at the point of use from a systems engineering perspective, which is why NVIDIA is pushing their Tegra chips for autonomous vehicles; however, in most cases it is more efficient to transmit the image data unmolested to a server and do the computationally intensive work remotely.

      Based on your comment, I imagine you're as excited about the "Stentrode" as I am?

    2. Re:It's the retina, not the brain by Anonymous Coward · · Score: 0

      > I don't understand your point. Spatial discontinuities are easily detected from a pixel based image using various algorithms such as Canny edge detection or a Laplacian filter.

      It's too late by then. The edge enhancement happens locally, among very small numbers of cells in the retina itself. By the time the image has been stored by a pixel based system and handed off for computer processing, edge detection that an eye does trivially and locally has already been lost in the inevitable "noise" in the digital quantization of the signal itself.

      > The advantage of pushing these "features" closer to the imaging sensor to reduce bandwidth usage

      There is an enormous difference between "reducing bandwidth" and "enhancing the relative features". All current digital cameras and standard image capture systems lose edge detection in the A/D conversion, because they're measuring absolute digital measurements, not differences or edge detection. You can't just restore that computationally later, the data is *gone*.

      "Stentrode"? Looking now... About as likely to work for neuro-mechanical controllers as a screwdriver made from gummy worms. The electrodes are too far from individual neurons. Placed in a blood vessel, they're flooded by electrically conductive fluid, and partially blocked from adjacent nerves by the blood vessel wall. The larger the electrode, the more electrical noise from nearby neurons, which are all resting in a salty, fairly conductive fluid. Unfortunately, the smaller the electrode, the larger the equivalent of "thermal" noise obscuring your signals, and the less chance of isolating a genuine signal in any kind of useful time. They've had very similar issues with every kind of bio-electrical system ever made, including the Boston Arm with myo-electrical signals. *All* the neurologically controlled systems have at best a 200 msec delay before any machine motion can be generated due to the necessary filtering, the result of the combination of those two factors.

      Ideally, you want to get the electrodes embedded in cross-sectional areas of motor nerves, such as David Edell explored with chip silicon on transected nerves, to measure far more precise signals, and you can use what is effectively optimized analog pre-enhancement processing to replace things like the But you have to transect the nerve for that, so far, definitely not a good idea in brain tissue like the Stentrode attempts to reach. I can see it for gross motor control: you might be able to get a much better signal, but that will be partly from neurological retraining and restructuring to and feedback to make the neurological changes.

    3. Re:It's the retina, not the brain by Anonymous Coward · · Score: 0

      Modern decent quality sensors are dominated by thermal noise or even just Poisson noise in higher quality ones. You can digitize it below that noise floor, and nothing is lost because you didn't do some analysis technique before the digitization step. You then have available a lot more computation options depending on whatever you want to do with the signal.
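
      The claim is easy to check with a quick simulation. The sketch below assumes Gaussian sensor noise and a uniform quantizer whose step is a quarter of the noise sigma; the sine-wave "scene", the sample count and the seed are arbitrary choices for the demo. Rounding to the nearest step can never be off by more than half a step, so the digitization error stays well under the noise that was already there.

```python
import math
import random

def quantize(value, step):
    """Round a value to the nearest multiple of step (an idealized A/D converter)."""
    return round(value / step) * step

def rms(values):
    """Root-mean-square of a list of numbers."""
    return math.sqrt(sum(v * v for v in values) / len(values))

def simulate(n=10000, noise_sigma=1.0, step=0.25, seed=0):
    """Return (RMS sensor noise, RMS error added by quantization)."""
    rng = random.Random(seed)
    # Arbitrary smooth "scene" brightness; any signal would do.
    true_signal = [10.0 * math.sin(i / 50.0) for i in range(n)]
    noisy = [s + rng.gauss(0.0, noise_sigma) for s in true_signal]
    digitized = [quantize(x, step) for x in noisy]
    sensor_error = rms([a - s for a, s in zip(noisy, true_signal)])
    quantization_error = rms([d - a for d, a in zip(digitized, noisy)])
    return sensor_error, quantization_error

sensor_error, quantization_error = simulate()
print("sensor noise RMS:      ", sensor_error)
print("quantization error RMS:", quantization_error)
```

      Any difference or edge measure computed afterwards inherits that same small extra error, which is the sense in which nothing is lost by digitizing first.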

  8. Seems wrong by Anonymous Coward · · Score: 0

    The problem is with the algorithms used, not the capabilities of computers. If done right, for this specific task at hand, a computer would beat a human every time. For example, a computer looking at a 2x2 pixel square image of a letter could compare it against what it knows every character scaled down to 2x2 looks like under various scaling algorithms, the brightness levels of the four available pixels, and tell you with very high accuracy what it's looking at. A human, on the other hand, would have no clue.
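
    A rough sketch of what that comparison could look like, assuming a fixed, known set of candidate glyphs and a plain box-average downscale. The 4x4 bitmaps for L, T and O are invented for the example. Note that it only works because the candidate list is known in advance.

```python
# Hypothetical 4x4 bitmaps; a real system would render actual fonts.
GLYPHS = {
    "L": [[1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0], [1, 1, 1, 1]],
    "T": [[1, 1, 1, 1], [0, 1, 1, 0], [0, 1, 1, 0], [0, 1, 1, 0]],
    "O": [[1, 1, 1, 1], [1, 0, 0, 1], [1, 0, 0, 1], [1, 1, 1, 1]],
}

def downscale_to_2x2(bitmap):
    """Box-average a 4x4 bitmap into four brightness values (row-major order)."""
    return [
        sum(bitmap[r + i][c + j] for i in range(2) for j in range(2)) / 4
        for r in (0, 2)
        for c in (0, 2)
    ]

# Precompute what every known glyph looks like at 2x2.
TEMPLATES = {name: downscale_to_2x2(bitmap) for name, bitmap in GLYPHS.items()}

def closest_glyph(pixels):
    """Return the known glyph whose 2x2 template is nearest in squared distance."""
    def distance(name):
        return sum((a - b) ** 2 for a, b in zip(pixels, TEMPLATES[name]))
    return min(TEMPLATES, key=distance)

# A slightly noisy 2x2 observation of an "L".
print(closest_glyph([0.45, 0.05, 0.70, 0.55]))
```

    With only four numbers per image, different glyphs can easily end up with near-identical templates as the candidate set grows, so how accurate this is depends heavily on the set being small.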

    1. Re:Seems wrong by plover · · Score: 1

      The problem is with the algorithms used, not the capabilities of computers. If done right, for this specific task at hand, a computer would beat a human every time. For example, a computer looking at a 2x2 pixel square image of a letter could compare it against what it knows every character scaled down to 2x2 looks like under various scaling algorithms, the brightness levels of the four available pixels, and tell you with very high accuracy what it's looking at. A human, on the other hand, would have no clue.

      For a specific task, sure, you can do all kinds of computer optimizations to make the recognition easier. But the experiment you are describing isn't valid in the general case where you have no idea what the context is. Have a look at the paper. These fragments of pictures could be the letter "Y" rendered in Arial Ultra Bold Italic on a white sign at twilight, an eagle in flight across a blue sky, or an X-ray of an artery. With nobody to say "this is from a font directory", or "this photo was taken outside by the river", the list of possibilities is just too large - doesn't matter if it's a person or a computer vision algorithm.

      --
      John
  9. Summarily Funny by Anonymous Coward · · Score: 0

    computers can't pick out the individual components of the image whereas humans can

    Yet in the realms of traditional machine vision the trick was employed long ago. When the deep networks came on the scene they discovered the same mechanism without assistance. Confusing, erroneous summaries be damned!

  10. Programming like stuff by Anonymous Coward · · Score: 0

    What I wonder is whether it is possible to create a computer virus that screws up the machine so much that, besides being used to host cat porn, it also makes people sick, like a cancer machine or something.

    1. Re:Programming like stuff by Anonymous Coward · · Score: 0

      Damn, couldn't decide whether to attempt a systemd or Windows 10 joke here.

  11. I did the same thing by rsilvergun · · Score: 2

    when I was 13 and I liked it!

    --
    Hi! I make Firefox Plug-ins. Check 'em out @ https://addons.mozilla.org/en-US/firefox/addon/youtube-mp3-podcaster/
  12. PNAS by Anonymous Coward · · Score: 1

    Uh-huh-huh-huh it says PNAS.

  13. I don't see what the problem is by Stephen+Gilbert · · Score: 1

    Just zoom and enhance.

  14. !! Called INTELLIGENCE and you can't program that by Anonymous Coward · · Score: 0

    Computers are not intelligent. People are. Most. Many. Okay, some. Okay, me!

  15. human recognition is very different by yes-but-no · · Score: 1

    To recognize a bald eagle, I don't need to be fed a million eagle pictures; show me one flying eagle once, or maybe two or three times, and it's done. The human visual cortex must be using ways of rotating a 3-D object and projecting how the object will appear if viewed in 2-D from different angles; it can also do simple scaling (bigger/smaller) and account for how color changes can affect appearance [grey-scale/color].

    Machine learning takes a million eagle pictures and does something of a curve-fitting to know how far a new point is from current cluster of points. It has no idea of 3-D objects/what effect a rotation could do; etc. In this case it's like a brute-force method versus a more sophisticated algorithm.

    Also, a human will try to match the shown picture to whatever set of objects he knows already. E.g., a 3-year-old who has learned only, say, the first 10 letters of the alphabet, if shown P, may say it's D, because to him/her the closest match is D. Computer vision may not do this, because with a training set size running into millions, potentially every class will claim a hit.

    1. Re:human recognition is very different by Anonymous Coward · · Score: 0

      Yup. As we're learning more about artificial neural networks, we discover that they work nothing like biological ones. There have been stories about this on Slashdot before.

  16. What is too blurry for the computer? by magarity · · Score: 1

    What the heck? I thought all you have to do is zoom and click 'enhance' and a computer can make a reasonably clear picture no matter how blurry the original.

  17. Image board ... by PPH · · Score: 1

    ... meme

    --
    Have gnu, will travel.
  18. So what I saw.. by Smirker · · Score: 1

    This human's neural network detected zoomed in faces with black bars covering their eyes, presumably out of shame, while they struggle with and guzzle a long, pinkish-red, penis-shaped object in their mouth.

    PDF (Page 2): http://www.pnas.org/content/ea...

    Excuse me for a moment.

  19. mIRC by Anonymous Coward · · Score: 0

    They can only use it for 30 days before paying up to Khaled Mardam-Bey.

  20. Re:!! Called INTELLIGENCE and you can't program th by Anonymous Coward · · Score: 0

    Computers are not intelligent. People are. Most. Many. Okay, some. Okay, me!

    Nice try, APK, but you lost that argument five years ago.