Tiny, Blurry Pictures Find the Limits of Computer Image Recognition (arstechnica.com)

Posted by BeauHD on Saturday February 20, 2016 @06:19AM from the prescription-lenses dept.

A new PNAS paper takes a look at just how different computer and human visual systems are. Humans can figure out that a mangled word "meant" something recognizable while a computer can't. Likewise with images: humans can piece together what a blurry image might depict based on small clues in the picture, where a computer would be at a loss. The authors of the PNAS paper used a set of blurry, tricky images to pinpoint the differences between computer vision models and the human brain.

They used pictures called "minimal recognizable configurations" (MIRCs) that were either so small or so low-resolution that any further reduction would prevent a person from being able to recognize them. The computer models did a better job after they were trained specifically on the MIRCs, but their accuracy was still low compared to human performance. The reason for this, the authors suggest, is that computers can't pick out the individual components of the image whereas humans can. This kind of interpretation is "beyond the capacities of current neural network models," the authors write.
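
The definition of a MIRC above (recognizable, but not after any further reduction) can be written down as a small check. This is only a sketch under stated assumptions: `reductions`, `is_minimal`, and `toy_recognizer` are hypothetical names, the recognizer is a stand-in for human judgments, and trimming one edge row or column is an invented reduction step, not the procedure used in the paper.

```python
from typing import Callable, List

Patch = List[List[int]]  # toy grayscale patch stored as rows of pixel values


def reductions(patch: Patch) -> List[Patch]:
    """Return slightly smaller versions of a patch (one edge row or column removed)."""
    out = []
    if len(patch) > 1:
        out.append(patch[1:])                    # drop the top row
        out.append(patch[:-1])                   # drop the bottom row
    if len(patch[0]) > 1:
        out.append([row[1:] for row in patch])   # drop the left column
        out.append([row[:-1] for row in patch])  # drop the right column
    return out


def is_minimal(patch: Patch, recognizable: Callable[[Patch], bool]) -> bool:
    """True if the patch is recognized but none of its reduced versions is."""
    return recognizable(patch) and not any(recognizable(r) for r in reductions(patch))


def toy_recognizer(patch: Patch) -> bool:
    """Stand-in recognizer (an assumption): needs at least four bright pixels."""
    return sum(v > 128 for row in patch for v in row) >= 4


corner_marks = [[255, 0, 255],
                [0,   0,   0],
                [255, 0, 255]]
padded = [[0, 0, 0]] + corner_marks  # same content plus a removable empty row
```

With this toy recognizer, `corner_marks` is minimal because every trim removes two of its four bright pixels, while `padded` is not, since its empty top row can be removed without hurting recognition.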

Not one example? by nuckfuts · 2016-02-20 06:23 · Score: 5, Informative

This story is rather lacking without a single example of what they're talking about.
1. Re:Not one example? by SeaFox · 2016-02-20 06:38 · Score: 3, Funny
  
  Yeah, the author should ENHANCE this story a bit.
2. Re:Not one example? by Dan+East · 2016-02-20 06:40 · Score: 2
  
  The story was full of those low resolution samples. They are just 1x1 images. And they're white.
  
  --
  Better known as 318230.
3. Re:Not one example? by ShanghaiBill · 2016-02-20 07:35 · Score: 4, Informative
  
  Here is a page with some examples.
  Here is a PDF of the paper, which has more examples.
  I don't think it means much. Instead of showing that humans see better than computers, it really just shows that this one researcher is bad at programming computer vision systems. If he took his dataset, and made it a Kaggle Competition, I think someone would design a computer vision system that would do much better than his.
4. Re:Not one example? by Kjella · 2016-02-20 08:35 · Score: 1
  
  Perhaps... but it looks like a tough act to follow. By decomposing the picture he shows how many different characteristics we use in combination on an ordinary image. The sharp drop-off shows how we latch on to one small defining feature and work our way backwards to the answer. Or maybe it's easier to argue in reverse: this here blob looks like anything. Add a bow here and it's a ship. Extract the neck up, it's a horse. Show a chin here, it's a suit. Several times it's about seeing the edge and thus the entire shape, like here's a wing and here's a leg.
  I must admit I don't know where the state of the art in computer vision is today. But the kind of "decompose and integrate" we see here would be rather impressive, like you don't compare "a horse" to other horses. You actually divide it up and say it has a horse's head, a horse's neck, a horse's legs, a horse's overall shape, they're all like voting for whether it's a horse or not... like say a centaur wouldn't be a bad match but a split vote, is this a human or is it a horse because parts of it score really well on one or the other.
  
  --
  Live today, because you never know what tomorrow brings
5. Re:Not one example? by ShanghaiBill · 2016-02-20 09:06 · Score: 1
  
  But the kind of "decompose and integrate" we see here would be rather impressive, like you don't compare "a horse" to other horses. You actually divide it up and say it has a horse's head, a horse's neck, a horse's legs, a horse's overall shape, they're all like voting for whether it's a horse or not
  That is actually how convolutional neural networks work. They basically decompose the image into sub-images, and then vote. If a CNN was programmed for this task, and trained on plenty of data, I think it could do well, and very likely surpass human abilities. Just because these researchers were lousy at programming/training their NN, that does not mean NNs are fundamentally bad at it.
  I have done a lot of work in computer vision, and when I first used NNs back in the 1980s I was very unimpressed. They were computationally expensive, difficult to train, and produced very bad results. But I started using them again in 2014 and I was astonished at the progress. Today, of course, we have much faster computers, including GPUs that are very effective at running NNs. We also have way more training data, from online image databases. But we also have much better algorithms, like backprop, boosting, dropout, and autoencoders, that work with deep networks. NNs can do pretty well with fuzzy images and partial data. So I am not convinced that they cannot beat humans at the tasks described in this paper.
6. Re:Not one example? by AK+Marc · 2016-02-20 16:00 · Score: 1
  
  The study doesn't look at false positives either. Throw in a picture of a brown paper bag, and the human will declare it an eye, while the computer will (correctly) not find it among the stated possibilities. Same with a squirrel called a horse. The human brain is really good at coming up with an answer, even if there isn't enough information for an answer. Computers are more likely to reject all answers, rather than settle on the wrong one because it seems more probable (at least the ones used in these types of games). So focusing on false negatives without allowing for the possibility of false positives seems like a poor experimental setup. Perhaps someone will look at the study and make a more useful and complete study based on it that will give more useful results.
  
  --
  Learn to love Alaska
7. Re:Not one example? by wanax · 2016-02-20 19:06 · Score: 1
  
  They did consider false positives. They had a catch category (see page 4, last par) and humans did extremely well at it (see 3rd par, page 5).
8. Re:Not one example? by wanax · 2016-02-20 19:27 · Score: 2
  
  I'm a professional neuroscientist who specializes in vision research with a computational bent. They used all the mainstream, state-of-the-art, openly available object recognition algorithms currently in use. Computer vision is a huge market, with many applications, from the DoD to self-driving cars to image-based searches. I doubt some 5-figure prize is going to outperform the best algorithms several distinct industries and academia have managed to create while being funded to the tune of over a billion a year for the last 10 years or so.
  These are serious researchers. If you think you can get any type of computer vision that significantly outperforms humans on this type of task, there is a unicorn startup and multiple ultra-high profile publications awaiting you.
  And just FYI based on your further post: they used two types of convolutional neural nets (see Methods: Model Versions and Parameters).
9. Re:Not one example? by burtosis · 2016-02-21 17:00 · Score: 1
  
  And this is why self-driving cars today can't compete against distracted teenage drivers. For all the claims of how perfect computer vision and sensor fusion are, humans are far superior.
10. Re:Not one example? by djinn6 · 2016-02-21 19:47 · Score: 1
  
  That's a completely different field though. Humans are so far above the minimum capability for driving, they put on music or radio to fight the boredom. Some go as far as doing makeup, eating, texting or calling other people. If everyone concentrated completely on driving, the accident rate would be practically zero. We might even be able to raise the speed limit by 50%.
  
  Software doesn't need to come close to human capability to be a great driver. It just needs to be better than the 0.1% of the time when humans aren't even looking at the road. Being able to see twice as far at all angles, react instantly, never get distracted and never fall asleep gives self-driving cars a huge advantage. Yeah, it might not recognize the thing moving into its path as a kid on a bicycle, but it doesn't need to. All it needs to do is see something blocking its way, then step on the brakes, and it can do that 100 times faster than a human.
11. Re:Not one example? by plover · 2016-02-22 09:38 · Score: 1
  
  Clearly you have never seen my uncle drive. Despite a lifetime of practice, at no point in his life could he have ever bested any of the current self-driving cars, never mind the advances we'll likely see in the next decade.
  Keep in mind there are still people on the roads who hold licenses that were granted before driving tests were required.
  
  --
  John
12. Re:Not one example? by The+Raven · 2016-02-23 13:23 · Score: 1
  
  Not quite. The paper explains that the computers differ from the humans qualitatively. Specifically, computer vision systems degraded slowly with no hard cutoff, while human vision had hard cutoffs where a small degradation in the image (crop, blur) led to a big reduction in the number of accurate identifications on the Turk.
  This cliff was not present in computer vision models. It was not merely present at a different level of detail (i.e., the computer failed earlier); it was not present at all, indicating that computer vision methods do not yet emulate the human visual processing method.
  
  --
  "I will trust Google to 'do no evil' until the founders no longer run it." Hello Alphabet.
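
The "cliff" described in the last reply above can be made concrete with a few lines of code. This is a minimal sketch, assuming recognition rates are recorded at successive reduction steps; the function names, the 0.4 threshold, and both rate lists are invented for illustration and are not data from the paper.

```python
from typing import List


def largest_drop(rates: List[float]) -> float:
    """Largest fall in recognition rate between two consecutive reduction steps."""
    return max(a - b for a, b in zip(rates, rates[1:]))


def has_cliff(rates: List[float], threshold: float = 0.4) -> bool:
    """True if a single reduction step loses at least `threshold` of the rate."""
    return largest_drop(rates) >= threshold


# Invented numbers, only to show the two shapes being contrasted.
human_like = [0.92, 0.90, 0.88, 0.15, 0.10]  # holds steady, then collapses
model_like = [0.70, 0.60, 0.50, 0.40, 0.30]  # declines gradually

print(has_cliff(human_like), has_cliff(model_like))
```

The point of the contrast is that the second curve never loses much in any one step, so no choice of "earlier failure point" turns it into the first curve.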
Context by Anonymous Coward · 2016-02-20 06:35 · Score: 1

The explanation is simple: context. We humans have a lot of contextual information in our brains, which is very useful for inferring knowledge from a wide range of noisy inputs (such as blurry pictures). If we train a computer to identify some aspects of blurry images *within a specific context*, the computer will do a decent job.
Wonderful by Andurian · 2016-02-20 06:43 · Score: 1

Wonderful. Now I'll be forced to look at really tiny, blurry pics every time I want to do anything on the web.
Link by Cow+Jones · 2016-02-20 06:44 · Score: 3, Informative

This seems to be the project the article is talking about: http://www.wisdom.weizmann.ac.il/~dannyh/Mircs/mircs.html

--

Ah, arrogance and stupidity, all in the same package. How efficient of you. -- Londo Mollari
1. Re:Link by Anonymous Coward · 2016-02-20 06:50 · Score: 2, Informative
  
  & paper http://www.pnas.org/content/early/2016/02/09/1513198113.full.pdf
It's the retina, not the brain by Anonymous Coward · 2016-02-20 06:47 · Score: 2, Insightful

Jerry Lettvin, in the 1960s, did experiments on single optic nerve cells that showed how the retina itself enhances and discovers edges. Human vision is not a "pixel image"; it's based on collecting and amplifying *edges* and differentials. Until the computer processing and the cameras themselves, used for computer vision, get this built in at the most basic levels of the CCD and immediate processing, a great deal of the most critical data is thrown out before any more sophisticated "computer brain" can apply its algorithms.
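
To illustrate what "edges and differentials" means in the comment above, here is a minimal sketch that replaces a row of pixel values with the differences between neighbors. It is a plain finite difference, not a model of the retina, and `horizontal_differences` and the sample scanline are hypothetical.

```python
from typing import List


def horizontal_differences(row: List[int]) -> List[int]:
    """Difference between each pixel and its left neighbor along one scanline."""
    return [b - a for a, b in zip(row, row[1:])]


# A dark region followed by a bright region: one edge in the middle.
scanline = [10, 10, 10, 200, 200, 200]
print(horizontal_differences(scanline))  # [0, 0, 190, 0, 0]
```

Flat regions collapse to zero and only the boundary survives, which is the sense in which a difference signal emphasizes edges over raw pixel values.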
I did the same thing by rsilvergun · 2016-02-20 07:03 · Score: 2

when I was 13 and I liked it!

--
Hi! I make Firefox Plug-ins. Check 'em out @ https://addons.mozilla.org/en-US/firefox/addon/youtube-mp3-podcaster/
PNAS by Anonymous Coward · 2016-02-20 07:46 · Score: 1

Uh-huh-huh-huh it says PNAS.
I don't see what the problem is by Stephen+Gilbert · 2016-02-20 08:05 · Score: 1

Just zoom and enhance.
human recognition is very different by yes-but-no · 2016-02-20 08:41 · Score: 1

To recognize a bald eagle, I don't need to be fed a million eagle pictures; show me one flying eagle once, or maybe two or three times, and it's done. The human visual cortex must be using ways of rotating a 3-D object and projecting how the object will appear if viewed in 2-D from different angles; it can also do simple scaling (bigger/smaller) and account for how color changes affect appearance [grey-scale/color].

Machine learning takes a million eagle pictures and does something like curve-fitting to know how far a new point is from the current cluster of points. It has no idea of 3-D objects, what effect a rotation could have, etc. In this case it's like a brute-force method versus a more sophisticated algorithm.

Also, a human will try to match the shown picture to whatever set of objects he knows already. E.g. a 3-year-old who has learnt only, say, the first 10 letters of the alphabet, if shown P, may say it's D, because to him/her the closest match is D. Computer vision may not do this, because with a training set size running into millions, potentially every class will claim a hit.
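
The "how far is a new point from the current cluster of points" picture in the comment above corresponds roughly to a nearest-centroid classifier. The sketch below is an illustration of that description only, not of how modern deep networks work; the two-number "features" and the class names are invented.

```python
import math
from typing import Dict, List, Sequence


def centroid(points: Sequence[Sequence[float]]) -> List[float]:
    """Coordinate-wise mean of a cluster of feature vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


def nearest_class(x: Sequence[float], training: Dict[str, List[List[float]]]) -> str:
    """Label whose cluster centroid is closest (Euclidean distance) to x."""
    centroids = {label: centroid(pts) for label, pts in training.items()}
    return min(centroids, key=lambda label: math.dist(x, centroids[label]))


# Invented two-dimensional features standing in for image measurements.
training = {
    "eagle":  [[0.9, 0.8], [1.0, 0.9], [0.8, 1.0]],
    "pigeon": [[0.2, 0.1], [0.1, 0.3], [0.3, 0.2]],
}

print(nearest_class([0.85, 0.9], training))   # eagle
print(nearest_class([50.0, -50.0], training))  # still returns some label
```

Note that a classifier of this kind always answers with one of its known labels, even for an input far from every cluster, much like the child who calls P a D.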
What is too blurry for the computer? by magarity · 2016-02-20 09:09 · Score: 1

What the heck? I thought all you have to do is zoom and click 'enhance' and a computer can make a reasonably clear picture no matter how blurry the original.
Image board ... by PPH · 2016-02-20 09:41 · Score: 1

... meme

--
Have gnu, will travel.
So what I saw.. by Smirker · 2016-02-20 19:18 · Score: 1

This human's neural network detected zoomed in faces with black bars covering their eyes, presumably out of shame, while they struggle with and guzzle a long, pinkish-red, penis-shaped object in their mouth.
PDF (Page 2): http://www.pnas.org/content/ea...
Excuse me for a moment.
Re:Seems wrong by plover · 2016-02-22 09:56 · Score: 1

The problem is with the algorithms used, not the capabilities of computers. If done right, for this specific task at hand, a computer would beat a human every time. For example, a computer looking at a 2x2 pixel square image of a letter could compare it against what it knows every character scaled down to 2x2 looks like under various scaling algorithms, the brightness levels of the four available pixels, and tell you with very high accuracy what it's looking at. A human, on the other hand, would have no clue.
For a specific task, sure, you can do all kinds of computer optimizations to make the recognition easier. But the experiment you are describing isn't valid in the general case where you have no idea what the context is. Have a look at the paper. These fragments of pictures could be the letter "Y" rendered in Arial Ultra Bold Italic on a white sign at twilight, an eagle in flight across a blue sky, or an X-ray of an artery. With nobody to say "this is from a font directory", or "this photo was taken outside by the river", the list of possibilities is just too large - doesn't matter if it's a person or a computer vision algorithm.

--
John
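
The 2x2 letter comparison described in the quoted comment above can be sketched as template matching: shrink each known glyph to 2x2 by averaging its quadrants, then pick the glyph closest to the observed pixels. Everything here is an assumption for illustration, including the 4x4 bitmaps, the function names, and the sum-of-squared-differences score.

```python
from typing import Dict, List

Grid = List[List[float]]


def downscale_to_2x2(glyph: Grid) -> Grid:
    """Average each quadrant of a square bitmap with an even side length."""
    h = len(glyph) // 2

    def mean(r0: int, c0: int) -> float:
        cells = [glyph[r][c] for r in range(r0, r0 + h) for c in range(c0, c0 + h)]
        return sum(cells) / len(cells)

    return [[mean(0, 0), mean(0, h)], [mean(h, 0), mean(h, h)]]


def ssd(a: Grid, b: Grid) -> float:
    """Sum of squared differences between two grids of the same shape."""
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def best_match(observed: Grid, glyphs: Dict[str, Grid]) -> str:
    """Name of the glyph whose 2x2 reduction is closest to the observed pixels."""
    templates = {name: downscale_to_2x2(g) for name, g in glyphs.items()}
    return min(templates, key=lambda name: ssd(observed, templates[name]))


# Hypothetical 4x4 bitmaps (1 = ink, 0 = background).
GLYPHS = {
    "L": [[1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0], [1, 1, 1, 1]],
    "T": [[1, 1, 1, 1], [0, 1, 1, 0], [0, 1, 1, 0], [0, 1, 1, 0]],
}

observed = [[0.45, 0.05], [0.70, 0.55]]  # a noisy 2x2 view of some letter
print(best_match(observed, GLYPHS))  # L
```

As the reply points out, this only works because the candidate set is known in advance; with no context limiting the possibilities, there is no small table of templates to compare against.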