Facebook Thinks Occlusion Is the Next Great Frontier For Image Recognition

Posted by samzenpus on Monday September 7, 2015 @10:25AM from the have-you-seen-this-man? dept.

An anonymous reader writes: Researchers at Facebook AI Research (FAIR) have published a paper contending that image recognition research is now advanced enough to consider the problem of occlusion, wherein the objects AI must identify are either partially cropped or partially hidden. Their solution is the predictably labor-expensive route of human annotation of existing image-set databases, in this case 'finishing off' occluded objects with vector outlines and assigning them a z-order. This article looks at the practical and even philosophical problems of getting IR algorithms to 'guess' objects usefully, and asks whether practical IR research might not be currently limited both by the use of over-specific image datasets and — in the field of neural networks — by problems of theory and limited 'local' processing power in critical real-time situations.
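
To make the annotation scheme in the summary concrete, here is a minimal sketch of what a "completed outline plus z-order" record might look like, and how the visible share of an object could be derived from it. The `AnnotatedObject` class, the field names, and the convention that a larger `z_order` means closer to the camera are all assumptions for illustration; the summary does not describe the paper's actual data format.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class AnnotatedObject:
    label: str
    outline: List[Point]  # full outline, including the part a human "finished off"
    z_order: int          # assumption: larger value = closer to the camera


def contains(outline: List[Point], x: float, y: float) -> bool:
    """Ray-casting point-in-polygon test."""
    inside = False
    n = len(outline)
    for i in range(n):
        x1, y1 = outline[i]
        x2, y2 = outline[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def visible_fraction(target: AnnotatedObject, scene: List[AnnotatedObject],
                     width: int, height: int) -> float:
    """Share of the target's pixels not covered by any object in front of it."""
    total = visible = 0
    for py in range(height):
        for px in range(width):
            x, y = px + 0.5, py + 0.5  # sample at pixel centres
            if not contains(target.outline, x, y):
                continue
            total += 1
            hidden = any(
                other is not target
                and other.z_order > target.z_order
                and contains(other.outline, x, y)
                for other in scene
            )
            if not hidden:
                visible += 1
    return visible / total if total else 0.0


# Hypothetical scene: a person standing in front of a car.
car = AnnotatedObject("car", [(0, 0), (10, 0), (10, 4), (0, 4)], z_order=0)
person = AnnotatedObject("person", [(4, 0), (6, 0), (6, 4), (4, 4)], z_order=1)
car_visible = visible_fraction(car, [car, person], width=10, height=4)
```

The point of storing the full outline rather than only the visible pixels is that the hidden region becomes a training target in its own right.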

32 comments

Facebook Thinks by Hognoxious · 2015-09-07 10:29 · Score: 0, Offtopic

It's a Slashdot record! It's blatantly wrong within the first two words of the title.

--
Confucius say, "Find worm in apple - bad. Find half a worm - worse."
1. Re:Facebook Thinks by SuricouRaven · 2015-09-07 10:34 · Score: 2
  
  Facebook does conduct research into AI. They need such technology to more effectively mine their vast database for advertising information.
  Occlusion handling is the difference between 'Subject identified as Joe Bloggs' and 'Subject identified as Joe Bloggs wearing Adidas trainers and posing in front of a Skoda. Increase targeting of well-known fashion brands, decrease targeting for automotive products.'
2. Re:Facebook Thinks by ls671 · 2015-09-07 10:41 · Score: 1
  
  But...
  Will it be susceptible to optical illusions?
  
  --
  Everything I write is lies, read between the lines.
3. Re:Facebook Thinks by Shadow+of+Eternity · 2015-09-07 10:52 · Score: 2
  
  It will have to be. The ability to figure out what you're looking at with incomplete information is exactly what leads to optical illusions; you can't really have one without the other.
  
  --
  A bullet may have your name on it but splash damage is addressed "To whom it may concern."
4. Re:Facebook Thinks by ShanghaiBill · 2015-09-07 11:00 · Score: 1
  
  Will it be susceptible to optical illusions?
  Vision systems based on artificial neural nets are susceptible to many of the same optical illusions as people, and for mostly the same reasons. The basic vertebrate eye has been around for 530 million years. If optical illusions were easy to avoid, nature would have figured out a way to do it by now.
5. Re:Facebook Thinks by ls671 · 2015-09-07 11:45 · Score: 1
  
  Thank you, I am starting work on countermeasures right away so I can keep my private life. I'll arrange things so it thinks I am some politician, a giraffe, an SUV or something else. There are all kinds of illusionist shows on TV, so it shouldn't be that hard.
  
  --
  Everything I write is lies, read between the lines.
6. Re:Facebook Thinks by Anonymous Coward · 2015-09-07 20:02 · Score: 0
  
  nature would have figured out a way to do it
  I disagree. There was neither selective nor sexual pressure to evolve workarounds for optical illusions, because they virtually never occur in natural environments.
7. Re:Facebook Thinks by Anonymous Coward · 2015-09-07 20:34 · Score: 0
  
  Or you could just not post pictures on facebook.
8. Re:Facebook Thinks by Anonymous Coward · 2015-09-08 03:30 · Score: 0
  
  I disagree. There was neither selective nor sexual pressure to evolve workarounds for optical illusions, because they virtually never occur in natural environments.
  Do you know how many animals use camouflage as either a hunting aid or as a defense mechanism?
i spy with my little eye by turkeydance · 2015-09-07 10:44 · Score: 1

you
They can't even find faces in pictures! by Anonymous Coward · 2015-09-07 10:48 · Score: 0

Especially black faces. Facebook is so racist. So racist. Their Republican rulers are racists. If they can't even find a face, how are they going to identify other objects, especially by looking for only parts of other objects.
1. Re:They can't even find faces in pictures! by Anonymous Coward · 2015-09-07 11:08 · Score: 1
  
  > Especially black faces
  This! I cook a lot and post pictures to Facebook. It can never find my face, but it thinks my stovetop is a face.
2. Re:They can't even find faces in pictures! by PopeRatzo · 2015-09-07 14:49 · Score: 1
  
  I cook a lot and post pictures to Facebook. It can never find my face, but it thinks my stovetop is a face.
  Thanks for the idea. I'm going to go right now and arrange two fried eggs and a strip of bacon in a smiley face and post it as my Facebook profile picture.
  
  --
  You are welcome on my lawn.
3. Re:They can't even find faces in pictures! by lucien86 · 2015-09-08 07:33 · Score: 1
  
  Funny thing is that virtually all AI vision systems have problems with black faces. It isn't human racism or 'machine' racism that is the cause; it's the physics of cameras, optics, and light itself. At least with modern HDR cameras it is a problem we have some hope of beating.
  
  --
  Below the speed of light Special Relativity is one of the most accurate theories in physics - above the speed of light..
Time should be used in occlusion problem by presidenteloco · 2015-09-07 10:59 · Score: 1

If a series of images is available and the observer, the target, or intermediate objects are moving, occlusion will vary from image to image, and the nature of the delta portions should be highly informative for recognition. This requires an object/region re-identification subsystem.
Also, scene context statistics should be used, much as preceding utterances are used in speech recognition. Given that we've already recognized a situation type containing this, that, and the other object type in some (possibly dynamic) relation, estimate the a priori probabilities of other object types occurring in the scene, and assess occluded objects against the highest-probability objects for that situation type. That is a much more constrained recognition problem, in which pieces of objects might suffice to identify them.
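
The scene-context idea can be sketched as a plain Bayes update: the likelihood comes from whatever fragment of the object is visible, and the prior comes from the recognized situation type. The function name, the labels, and every number below are made up for illustration; nothing here is taken from the FAIR paper.

```python
def posterior(likelihood: dict, prior: dict) -> dict:
    """Combine fragment evidence with a scene-type prior (Bayes' rule)."""
    unnormalized = {
        label: likelihood.get(label, 0.0) * prior.get(label, 0.0)
        for label in prior
    }
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("no label is compatible with both evidence and prior")
    return {label: value / total for label, value in unnormalized.items()}


# A tall, narrow fragment that looks equally like either object on its own.
fragment_likelihood = {"tree trunk": 0.5, "lamp post": 0.5}

# Hypothetical a priori object frequencies for two situation types.
jungle_prior = {"tree trunk": 0.9, "lamp post": 0.1}
garage_prior = {"tree trunk": 0.05, "lamp post": 0.95}

in_jungle = posterior(fragment_likelihood, jungle_prior)
in_garage = posterior(fragment_likelihood, garage_prior)
```

With an ambiguous fragment the prior decides the answer, which is exactly why a piece of an object can be enough once the scene type is known.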

--

Where are we going and why are we in a handbasket?
1. Re:Time should be used in occlusion problem by lucien86 · 2015-09-08 07:39 · Score: 1
  
  Doesn't matter how much research they do, this kind of vision will only work as part of a Strong AI. The keyword is 'dynamic processing', and it's pretty difficult even with Strong AI. I know; it's a field I have worked on directly.
  
  --
  Below the speed of light Special Relativity is one of the most accurate theories in physics - above the speed of light..
Why supervised? by mobby_6kl · 2015-09-07 11:06 · Score: 1

So if regular object recognition is such a solved problem, why do they need people to manually prepare the images? I'd just take a normal image, recognize the objects, and then partially cover some of them to train their algorithm.
1. Re:Why supervised? by craighansen · 2015-09-07 14:59 · Score: 1
  
  Yes. One can synthetically create cropped images to train CNNs. Then if you recognize "person standing" in the left side of an image and "front end of commercially relevant automobile" in the right side of the image you can likely expect that this is a person standing in front of the automobile, unless the template for junkyard is also signalling recognition. Then you zero in on which of your friends is standing there, and try to get that friend to recommend to you that you need a new car just like his. Almost a solved problem, no?
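
The synthetic route suggested in this sub-thread could look roughly like the sketch below: take an image whose object is already labelled, blank out part of the object's bounding box, and keep the original label as the training target. This is an assumption-laden toy (images as nested lists, a vertical strip as the occluder, a hypothetical `occlude` helper), not how any particular training pipeline does it.

```python
import random


def occlude(image, box, fraction, fill=0, rng=None):
    """Return a copy of `image` with a vertical strip of the box blanked out.

    image    : list of rows, each a list of pixel values
    box      : (top, left, bottom, right), half-open, bounding the labelled object
    fraction : share of the box width to cover, in (0, 1]
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    rng = rng or random.Random()
    top, left, bottom, right = box
    strip = max(1, round((right - left) * fraction))
    start = rng.randint(left, right - strip)  # random horizontal position
    out = [row[:] for row in image]           # leave the original untouched
    for r in range(top, bottom):
        for c in range(start, start + strip):
            out[r][c] = fill
    return out


# A 4x6 "image" of ones with a labelled object in columns 1..4.
original = [[1] * 6 for _ in range(4)]
sample = occlude(original, box=(0, 1, 4, 5), fraction=0.5, rng=random.Random(0))
training_pair = (sample, "car")  # occluded input, unchanged label
```

One known limitation of this kind of augmentation is that a flat strip does not look like a real occluder, which may be part of why hand annotation of real scenes is still considered worthwhile.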
Seems like lots of work ahead still by SuperKendall · 2015-09-07 11:10 · Score: 1

The use of vector completion and all is a good idea, but it seems systems like that would work better in conjunction with other techniques, like trying to consider the context of the area you are in. What is behind a tall narrow object varies a lot depending on whether you are in a jungle or a parking garage...

--
"There is more worth loving than we have strength to love." - Brian Jay Stanley
makes sense, but don't tell them by raymorris · 2015-09-07 11:14 · Score: 1

That sure makes sense. Don't tell them, though. The inability of image recognition software to handle cropped pictures is one thing which my better replacement for CAPTCHA depends on. CAPTCHA sucks because humans aren't much better than computers at recognizing squiggly letters. We are, however, MUCH better at recognizing certain specific types of images when they are cropped and rotated.
Enhance 34 to 46. by LMariachi · 2015-09-07 11:20 · Score: 3, Funny

Pull back. Wait a minute. Go right. Stop.
Enhance 57 to 19. Track 45 left. Stop.
Enhance 15 to 23.
Gimme a hard copy right there.
1. Re:Enhance 34 to 46. by craighansen · 2015-09-07 15:10 · Score: 1
  
  Not only did Deckard have access to a plainly ridiculous level of zoom, but when panning around, the perspective of the image changes, and objects that were hidden from the original perspective appear. https://www.youtube.com/watch?... We're left having to assume that the "enhance" operation can do wonders on an old snapshot, or that it's something of an old snapshot from a holographic Polaroid. It would sure make image occlusion an easier problem to solve.
2. Re:Enhance 34 to 46. by LMariachi · 2015-09-07 16:59 · Score: 1
  
  I figured the device was "looking around the corner" by extrapolating from visible reflections. A human can easily do that given a properly-placed mirror, even a curved or broken one, but a computer might be able to piece it together from distorted fragments around the room — a shiny doorknob here, a beercan there, a metallic light fixture up above. Sort of reverse raytracing?
3. Re:Enhance 34 to 46. by e5150 · 2015-09-07 20:09 · Score: 1
  
  Well, that's nothing compared to uncrop.
4. Re:Enhance 34 to 46. by Anonymous Coward · 2015-09-18 08:55 · Score: 0
  
  You can do this to a minor degree with a Lytro.
Detecting penises in vaginas and anuses. by Anonymous Coward · 2015-09-07 11:46 · Score: 0

The main application I see of this is in detecting pornographic images containing a penis that's partially embedded in, and obscured by, a vagina or an anus.
While thanks to technological advances it may be trivial to detect a fully erect and unobscured penis, detecting a penis that is only partially visible could very well be an extremely difficult problem.
It's difficult to censor such images when it's difficult to determine if they depict penetration to begin with.
1. Re:Detecting penises in vaginas and anuses. by PopeRatzo · 2015-09-07 14:51 · Score: 1
  
  While thanks to technological advances it may be trivial to detect a fully erect and unobscured penis, detecting a penis that is only partially visible could very well be an extremely difficult problem.
  Example:
  https://i.ytimg.com/vi/7I95IFw...
  
  --
  You are welcome on my lawn.
GA by Tablizer · 2015-09-07 16:37 · Score: 2

What about a kind of genetic algorithm to evolve candidate 3D models, where the model that best matches observations and context "wins"? That is computationally intensive, but it is highly parallelizable.
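
As a rough sketch of that idea, the toy below evolves parameter vectors against an error function, keeping the best quarter of each generation and breeding the rest from them. Everything here is an assumption for illustration: a "3D model" is reduced to a three-number box (width, height, depth), the error function only scores the two observed dimensions, and `evolve` and its parameters are hypothetical names.

```python
import random


def evolve(error, dims, pop_size=40, generations=60, sigma=0.3, rng=None):
    """Minimize `error` with truncation selection, crossover, and mutation.

    Returns the best candidate and the best error seen in each generation.
    """
    rng = rng or random.Random()
    pop = [[rng.uniform(0, 5) for _ in range(dims)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=error)                # lower error is better
        history.append(error(pop[0]))
        parents = pop[: pop_size // 4]     # survivors carry over unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Uniform crossover plus Gaussian mutation on every gene.
            children.append(
                [rng.choice(pair) + rng.gauss(0, sigma) for pair in zip(a, b)]
            )
        pop = parents + children
    return min(pop, key=error), history


# Observed: a box 2.0 wide and 1.5 high; its depth is occluded and unscored.
def silhouette_error(candidate):
    width, height, _depth = candidate
    return (width - 2.0) ** 2 + (height - 1.5) ** 2


best, history = evolve(silhouette_error, dims=3, rng=random.Random(1))
```

Each candidate is scored independently of the others, so the scoring step could be spread across workers, which is the parallelism the comment points to. Note that the occluded depth is left unconstrained here; that is where context would have to come in.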

--
Table-ized A.I.
One word by Anonymous Coward · 2015-09-07 19:47 · Score: 0

Porn!
The ultimate goal by jandersen · 2015-09-07 19:53 · Score: 1

In fact, this won't stop at merely recognising faces that are partially obscured - in the not so distant future, they will be able to recognise faces that are completely absent!
So... by Anonymous Coward · 2015-09-07 21:16 · Score: 0

Things that are (partially) hidden behind others are more difficult to recognise?
News at eleven!
Enemy of the State by Anonymous Coward · 2015-09-07 22:37 · Score: 0

There is one scene in that film where every geek freaks out in annoyance before the explanation of how the system works is given. (Admittedly, some artistic license was also taken in places, because FILMS LOL.)
The scene in question was the ability for the spooks to rotate around a scene virtually from one or more cameras (which we can do), but they took it to an extreme and could visualize areas that were impossible for any camera to see.
The system created a rough estimate of what a scene would probably look like just for the sake of completeness rather than accuracy.
Then, if they are lucky, a camera caught a view where there was a change in the layout of the scene, so it could be updated globally through the whole animation to create a scene of events that are reasonably accurate, given limited information.
So, that guy dropped a Game Boy or something into Will Smith's pocket at some point; one camera might have seen his bag from one angle and another camera from another, and there was a difference in shape.
Even in the film though, they say it could be anything, even nothing. It might just be the bag was dislodged from a tense state, or the lights.
Predictive image recognition will be an interesting field for sure.