Slashdot Mirror


Breakthrough In Face Recognition Software

An anonymous reader writes: Face recognition software underwent a revolution in 2001 with the creation of the Viola-Jones algorithm. Now, the field looks set to dramatically improve once again: computer scientists from Stanford and Yahoo Labs have published a new, simple approach that can find faces turned at an angle and those that are partially blocked by something else. The researchers "capitalize on the advances made in recent years on a type of machine learning known as a deep convolutional neural network. The idea is to train a many-layered neural network using a vast database of annotated examples, in this case pictures of faces from many angles. To that end, Farfade and co created a database of 200,000 images that included faces at various angles and orientations and a further 20 million images without faces. They then trained their neural net in batches of 128 images over 50,000 iterations. ... What's more, their algorithm is significantly better at spotting faces when upside down, something other approaches haven't perfected."

30 of 142 comments

  1. Upside Down? by Anonymous Coward · · Score: 5, Insightful

    "What's more, their algorithm is significantly better at spotting faces when upside down, something other approaches haven't perfected."

    Add this step: Rotate the image and run the algorithm each x degrees. What am I missing?
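
    A minimal sketch of that extra step, assuming an upright-only detector is already available. Here `toy_detector` is a made-up stand-in, not a real face detector, and only 90-degree steps are tried:

```python
def rotate90(image):
    """Rotate a row-major 2D list 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]

def detect_any_orientation(image, detect_upright):
    """Run an upright-only detector at 0, 90, 180 and 270 degrees.

    detect_upright is a hypothetical callable that returns True when it
    finds an upright face. Returns the rotations (in degrees) that hit.
    """
    hits = []
    current = image
    for angle in (0, 90, 180, 270):
        if detect_upright(current):
            hits.append(angle)
        current = rotate90(current)
    return hits

# Toy stand-in detector: a "face" is upright when the top-left pixel is 1.
def toy_detector(img):
    return img[0][0] == 1

upside_down = [[0, 0],
               [0, 1]]
print(detect_any_orientation(upside_down, toy_detector))
```

    Note that the detector runs once per orientation, so both the running time and the number of chances for a false positive grow with the number of rotations tried.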

    1. Re:Upside Down? by kekx · · Score: 5, Insightful

      Performance.

    2. Re:Upside Down? by idontusenumbers · · Score: 2

      False positives

    3. Re:Upside Down? by binarybum · · Score: 2

      We are finally going to catch this guy!! - http://img.izismile.com//img/i...

      (The problem is the background: your brain is very good at understanding what upside-down means, but an algorithm trained on tons of right-side-up images only understands that a silo is rounded on top and straight on the bottom. The question I have is: what are the practical implications of all the extra processing power that might take? Finally figuring out who that gymnast was from that Cirque du Soleil screenshot?)

      --
      ôó
    4. Re:Upside Down? by Anonymous Coward · · Score: 5, Interesting

      As someone who literally works on face detection/tracking software on low power ARMv7/8 CPUs, I can safely say you are dead wrong.

      Assuming width==height (not likely given any current video formats or cameras), and assuming width%8 == 0 - it's a simple transposition of the rows and/or columns to do +/- 90/180 degrees, yes - and assuming you can fit your ENTIRE image in L1 cache you're going to incur minimal stalls (especially with an SoC that has a decent prefetch engine).

      In reality:
        * width != height
        * width is however typically divisible by 8 so you can do pure NEON (not hybrid NEON + ALU/VFP) transpositions
        * an 8bit grayscale VGA (640x480) image doesn't even fit in L1 cache, let alone a 720/1080p format (though most CV applications scale things down significantly, you tend to work at 320x180 - but that still doesn't fit in most L1 caches, although it does fit in 'some')
        * L2 cache hits are dozens of cycles, L2 cache misses are HUNDREDS of cycles
        * A real-world case of rotating a 320x180 image takes ~2ms on a 700MHz Cortex A9. That is not 'practically zero': it's 12% of your processing time at 60Hz, and 36% of your processing time if you're going to rotate 3 times.

      (Note: using a 700MHz Cortex A9 as an example, as that's typical of the automotive hardware systems we deal with, although the last 2 years have brought ~1-1.5GHz A15s into the mix, though most of those cars aren't even on the market yet.)
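
      The numbers above can be checked with a few lines of arithmetic. This is only a sketch: the 32 KiB L1 data cache size is an assumption (it varies by SoC), and the 2ms rotation cost is the figure quoted above, not something measured here.

```python
L1_DATA_CACHE_BYTES = 32 * 1024   # assumed size; varies by SoC

def image_bytes(width, height, bytes_per_pixel=1):
    """Size of an uncompressed image (default: 8-bit grayscale)."""
    return width * height * bytes_per_pixel

def frame_budget_fraction(cost_ms, fps):
    """Fraction of one frame period consumed by an operation."""
    return cost_ms / (1000.0 / fps)

for w, h in ((640, 480), (320, 180)):
    size = image_bytes(w, h)
    print(w, h, size, "fits in L1" if size <= L1_DATA_CACHE_BYTES else "does not fit")

one_rotation = frame_budget_fraction(2.0, 60)   # 2 ms per rotation at 60Hz
print(round(one_rotation * 100), round(3 * one_rotation * 100))
```

      A 640x480 grayscale frame is 307,200 bytes and a 320x180 frame is 57,600 bytes, so neither fits in the assumed 32 KiB cache, while the smaller one would fit in a 64 KiB cache.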

    5. Re:Upside Down? by mcrbids · · Score: 2

      There's lots that you are missing.

      The issue isn't the input data, it's the processing method. The processing method mentioned here as "revolutionary" is just about exactly the method that Raymond Kurzweil posited: a hierarchy of "nodules" that pattern match on a cascading network of pattern matches....

      We're living with a modern-day Turing. Do we give him ample credit?

      --
      I have no problem with your religion until you decide it's reason to deprive others of the truth.
    6. Re:Upside Down? by Anonymous Coward · · Score: 3, Informative

      'Performance' is indeed an ambiguous term, it can refer to accuracy (RMS error of the detection results and false positive/negative rates in most cases) and it can also refer to speed (which I'm biased to thinking of as a programmer).

      I've never seen both meanings used in some combined metric, from an algorithmic perspective you tend to only care about accuracy as the 'performance' metric - and from a production perspective you (typically) care about 'speed' as the performance metric.

      On most ARM systems, you're correct - memory is almost always the bottleneck (ARMv7 processors are actually quite fast, IF you can keep them processing instructions every cycle, which is very hard if not impossible depending on the algorithm). Memory allocations take tens to hundreds of microseconds (scales up to milliseconds sometimes for large allocations, depending on the memory configuration and how the SoC's memory controller is designed), and loads from DDR (not from cache) take hundreds of cycles or more, and if they're an immediate dependency of the code (Eg: you issue a load into a register, which is used a few cycles ahead) - you're stalling your entire core for however long the memory controller takes to bring it through the MMU (into cache) and finally into your register.

      This is compared to a typical desktop/laptop with DDR frequencies typically over double that of LPDDR, and L2 caches that are large enough to run entire applications in (without ever touching DDR) in some cases... and when all else fails, gigantic x86 CISC cores who don't have to stall the entire pipeline when waiting on a DDR load and can opportunistically 'continue' processing code further down the road while it waits for memory.

      Meanwhile, if the core wasn't stalled from a load - hundreds of cycles can typically 'process' (various algorithms fit into this footprint) hundreds of pixels using NEON instructions against 8bit (or dozens of 32bit) pixels - so hopefully that puts it into perspective (a single memory stall can cost you the time it takes to process 10-100 pixels, very roughly) - and most algorithms do 1+ loads (that may stall depending on how things are written/prefetched/etc) per N pixels in NEON code.

      I know we internally make our researchers very much aware of the pros/cons of various algorithmic approaches (they're vaguely aware that gather/scatter memory operations are hard/impossible to optimise - so things like doing too many histograms are avoided if possible, they're aware of alignment considerations, they're aware memory loads/stores/allocations are extremely costly, and most of them even know a bit of SIMD, though most don't care enough to understand cycle counts of instructions / cache considerations / how prefetching works / why branching isn't ideal in some micro-archs / etc) - but from their perspective, they still 'just' care about accuracy - not performance (they'll generally design for performance, and make trade-offs for performance if it's significant - but generally they're aiming for accuracy).

      When their research code hits our (programmer) desks, we tend to care about speed - while holding true to the original algorithm.

      Side note: You may want to forgive your phone OS when it feels slow next time, it's got apps trying to run 'with' an OS (or worse, java vm) constantly clobbering the L1/L2 data caches - we have a hard enough time in the automotive industry running baremetal code or very lean RTOS'.

  2. This is supposed to be a good thing? by Snotnose · · Score: 4, Insightful

    For every "terrorist" they track through the mall, how many ordinary Joes like me who like their privacy are also tracked and stored in huge databases for all time?

    1. Re:This is supposed to be a good thing? by Scorpinox · · Score: 2

      Yeah, I was surprised there was no mention of the huge privacy implications this has. But hey, maybe this'll reduce the number of IDs and RFID cards you have to carry around since it'll be so easy to identify and track you when you're just walking around.

    2. Re:This is supposed to be a good thing? by RoknrolZombie · · Score: 4, Insightful

      All of them.

    3. Re:This is supposed to be a good thing? by Anonymous Coward · · Score: 4, Insightful

      I think it's pretty well understood that there *are* terrorists...

      Yes.

      ... and a lot of them ...

      By almost every measure: No.

      ...and they're walking among us.

      For virtually every useful North American or Western European definition of 'us': No.

    4. Re:This is supposed to be a good thing? by Jack+Griffin · · Score: 5, Insightful

      I think it's pretty well understood that there *are* terrorists and a lot of them and they're walking among us.

      I disagree with this statement. If there were even a handful of real terrorists amongst us, there'd be blood in the streets. Seriously, if you really are hell bent on murdering infidels, it's not hard to drive a bus into a pack of school children, or carry a tin of petrol and a lighter into your nearest train station. That's the nature of terrorism, it is so trivial to execute that the threat is equally trivial to measure. See the history of the IRA for real world examples.

    5. Re:This is supposed to be a good thing? by retroworks · · Score: 3, Informative
      "For every "terrorist" they track through the mall, how many ordinary Joes like me who like their privacy are also tracked and stored in huge databases for all time?"

      Indeed, all of them.

      Have you noticed you can go into Best Buy or Staples, pick up a camera or look at a printer you never searched for online, and you find ads for the device on Facebook? Didn't notice? Give it a try. It's far beyond this 2013 (minority) report http://www.businessinsider.com...

      --
      Gently reply
    6. Re:This is supposed to be a good thing? by viperidaenz · · Score: 2

      It's recognition, not identification.
      As in a yes/no if an image contains a face. No who is in the image.

  3. Spike boots by Tablizer · · Score: 5, Funny

    What's more, their algorithm is significantly better at spotting faces when upside down

    Rats, there goes my ceiling-walking bank-robbery plans.

    1. Re:Spike boots by BarbaraHudson · · Score: 2

      If you wore a mask that made your face not look like a face, it will ignore you.

      --
      "Transparent" is a shit show that trades on every stereotype going. A man in drag is NOT a transsexual.
    2. Re:Spike boots by Jack+Griffin · · Score: 2

      Or a mask with someone else's face on it, or a T-shirt with a few faces on it, or a baseball cap, or a burqa...

    3. Re:Spike boots by hughperkins · · Score: 3, Informative

      Yes, check out 'High Confidence Predictions for Unrecognizable Images', by Nguyen, Yosinski and Clune, http://arxiv.org/abs/1412.1897 . It's a paper that shows an image that the net is 99.99% sure is an electric guitar, but that looks nothing like one :-)

      For the technically minded, the paper's authors propose that the reason is that the network is using a discriminative model, rather than a generative model. That means that the network learns a mathematical boundary that separates the images that it sees, in some kind of high-dimensional transformed space. It doesn't learn how to generate such new images, i.e., you can't ask it 'draw me an electric guitar' :-) Maybe in a few years :-)
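
      To make the 'boundary' idea concrete, here is a toy sketch. It is a 1-D logistic regression trained by gradient descent, not the paper's network, and the data points and learning rate are made up for illustration. The model only learns where to split the two classes, so an input that looks nothing like any training sample still gets a confident label:

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

# Made-up 1-D training data: class 0 near 0.5, class 1 near 3.5.
xs = [0.0, 0.5, 1.0, 3.0, 3.5, 4.0]
ys = [0, 0, 0, 1, 1, 1]

# Fit p(y=1 | x) = sigmoid(w*x + b) by plain gradient descent on log loss.
w, b = 0.0, 0.0
for _ in range(2000):
    grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / len(xs)
    grad_b = sum((sigmoid(w * x + b) - y) for x, y in zip(xs, ys)) / len(xs)
    w -= 0.1 * grad_w
    b -= 0.1 * grad_b

def confidence_class1(x):
    return sigmoid(w * x + b)

print(confidence_class1(0.5), confidence_class1(3.5))
# x = 100 is far outside anything seen in training.
print(confidence_class1(100.0))
```

      The model has no notion of what a typical class-1 sample looks like, only which side of the boundary an input falls on, which is why the far-away point is labeled with near-total confidence.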

      The authors don't compare the network too much with the human brain though, ie, are they saying that the human brain is using a generative model? Is that why the human brain doesn't see a white noise picture, and claim it's a horse?

    4. Re:Spike boots by ceoyoyo · · Score: 2

      There are two popular types of deep ANN at the moment: restricted Boltzmann machines and auto-encoders. RBMs are generative. Autoencoders can also be generative if you train them in a particular way, which works much better so most people train them that way anyway. So you can take an ANN and ask it to draw you a picture of a guitar.

      I disagree with the authors of that paper. It seems more likely to me that they've cherry picked particular examples that fool their particular ANN. That's pretty easy to do for humans too - Google "optical illusion." As you point out, there's also the white noise trick. Show a group of people an image of white noise and they'll find all sorts of things in it. Particularly if you ask "you guys don't see the dragon?"

  4. Re:so breakthrough by ceoyoyo · · Score: 2

    There wasn't a good algorithm for training general deep ANNs until 2006, although convolutional neural networks were an exception to that. It's likely nobody tried it before because computers weren't fast enough and the discovery of layer-wise unsupervised training hadn't made deep networks popular yet.

  5. And you can't opt out by number17 · · Score: 2

    The grocery store or ATM, with cameras all over the place, could do it by simply having a sign at the front of the store that says CCTV. They could record your picture at the register and associate it with a bank card or credit card. After 5 transactions they could guarantee that the person using that card has your face and is likely the owner. They could then flag how many times somebody else uses your card. They could track you throughout the store, like they do now but associated with an individual. Stores would have cameras at the entrances and exits. They would know how many people are currently in the store and who they are. They can't track though, I use cash, right?

    The grocery store subscribes to a third-party face recognition aggregator.

    At the beginning you'll have a shadow profile (#34950892). All it takes is one pump at a gas station, or taking cash out at the ATM, to associate your face. About the only thing you could do is wear a disguise, but a different one each time.

    1. Re:And you can't opt out by Narcocide · · Score: 2

      The disguises and cash wouldn't be worth much in the way of anonymity if you were still carrying your cellphone.

  6. Facial recognition is still very much imperfect by Anonymous Coward · · Score: 4, Interesting

    Very much anecdotal, but here goes anyway - a little while back, I found a recipe for cow tongue that seemed intriguing. If I had eaten it before I couldn't recall, at least I hadn't prepared it myself. So off to the butcher's I was, as this is not found in every shop. The tongues they had on display there seemed very tiny (in retrospect, they must have been veal tongues), so I said "give me the largest tongue you have". As the saying goes, "you should be careful what you wish for" - what I ended up with was a monster, something like over 1.3kg (nearly three pounds). I really didn't need that much, but all I could do was to say thanks and go home with my prey.

    As I laid it on my cutting board, pretty much filling it entirely, it looked at the same time so awesome and gruesome that I had to take a photo of it (not a food blogger, or a blogger of any kind, I just had to document it). And to share the experience, I sent it to a friend via Hangouts. Now, as she uses Hangouts from the GMail web interface, the images are not visible inline but are Google+ links. So she clicks the link.

    ...and G+ helpfully asks her "Is this xxxxx?" (xxxxx == her name) While people are, rightfully, concerned whether companies such as Google know too much about their lives, at least when it comes to Google and facial recognition, they have a long way to go.

  7. Face it by Tablizer · · Score: 2

    When there is a competition to test solutions, do they call it a "face off" or a "face face off"?

  8. Re:Weren't deep convolutional nets debunked? by serviscope_minor · · Score: 4, Informative

    Debunked?

    They're a machine learning algorithm. All such algorithms do is place a fancy decision boundary in a high dimensional space. DNNs do a decent job for certain classes of problem. Far away from the training data, the boundary is not useful, but that's the same with all algorithms pretty much.

    So no. They haven't been debunked.

    --
    SJW n. One who posts facts.
  9. Re:so breakthrough by tmosley · · Score: 4, Informative

    It seems to me, as I have been following the progress of the technology over the last year or so, that it was only recently that scientists either had the idea to layer networks on top of one another, or gained the ability to. This started with the algo that would analyze pictures for content and tag them, ie a picture of a girl playing with a dog was tagged as such. It was approaching primate-level "cognition" in that specific context a few months ago, but now I have read that it has reached or surpassed peak human level, where rather than labeling the dog as a dog, it labeled it as its specific breed, or labeled a flower as its specific type that I had never heard of. Combining that with this new data point, it would seem that visual perception in machines has exploded into post-human territory. Shit is getting real.

  10. Re:so breakthrough by hughperkins · · Score: 4, Interesting

    They're using a standard technique. Convolutional networks started to become big with LeCun's 1998 paper on learning to recognize hand-written digits http://yann.lecun.com/exdb/pub... . His lenet-5 network could identify the digit accurately 99% of the time.

    Convolutional networks are starting to be used to play Go, e.g. 'Move evaluation in Go using Deep Convolutional Neural Networks', by Maddison, Huang, Sutskever and Silver, http://arxiv.org/pdf/1412.6564... Maddison et al. used a 12-layer convolutional network to predict where an expert would move next with 50% accuracy :-)

    Progress on convolutional networks moves forward all the time, in an incremental way. If we had one article per day about one increment it would quickly lose mass appeal though :-) The article is about one increment along the way, but does symbolize the massive progress that is being made.

    Convolutional networks work well partly because they can take advantage of the massive computational capacity made available in GPU hardware.
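
    A rough sketch of why the operation maps well onto GPUs: in a plain 2D convolution (written here in pure Python for clarity, with a made-up image and edge kernel), each output value depends only on its own small input window, so all of them can be computed independently and in parallel.

```python
def conv2d_valid(image, kernel):
    """'Valid' 2D cross-correlation, the core operation of a conv layer."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = [[0.0] * out_w for _ in range(out_h)]
    for y in range(out_h):
        for x in range(out_w):
            # This (y, x) result reads only its own window of the input,
            # so on a GPU each output could be handled by a separate thread.
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    acc += image[y + i][x + j] * kernel[i][j]
            out[y][x] = acc
    return out

image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]
vertical_edge = [[-1, 1],
                 [-1, 1]]
print(conv2d_valid(image, vertical_edge))
```

    The kernel responds only in the column where the image steps from 0 to 1. A trained network learns its kernel values instead of having them written by hand.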

  11. So... by jtownatpunk.net · · Score: 4, Funny

    Can I finally automatically tag the performers in my porn collection? I'm asking for a friend.

  12. Face detection, perhaps? by Torp · · Score: 3, Interesting

    I didn't read the article, of course, but the summary sounds like they're doing face *detection* not recognition.
    Detection: find which portions of an image are faces.
    Recognition: compare to a database of faces and find out whose face it is.
    First is way easier than the other.
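
    A toy sketch of the difference in what each task takes in and returns. Everything here is an assumption for illustration: the 'image' is a grid whose face cells already hold a feature vector, and `gallery` is a hypothetical database of known people.

```python
import math

def detect(image):
    """Detection: where are the faces? Returns (row, col) of each hit.

    Toy rule: a cell holding a feature vector (a tuple) is a face;
    None is background.
    """
    return [(r, c) for r, row in enumerate(image)
            for c, cell in enumerate(row) if cell is not None]

def recognize(features, gallery):
    """Recognition: whose face is it? Nearest neighbour in a named gallery."""
    return min(gallery, key=lambda name: math.dist(features, gallery[name]))

image = [[None, (0.9, 0.1)],
         [None, None]]
gallery = {"alice": (1.0, 0.0), "bob": (0.0, 1.0)}   # hypothetical database

for r, c in detect(image):
    print((r, c), recognize(image[r][c], gallery))
```

    Detection needs no database at all, while recognition cannot answer anything without one.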

    --
    I apologize for the lack of a signature.
  13. Do not want by MrL0G1C · · Score: 2

    Facial recognition mostly gets used for all of the wrong reasons: Facebook tracking, illegal police tracking, etc.

    'photos of innocent people have been retained in contempt of an explicit order from the court to remove them' - 18 million by police

    Facebook's new face recognition policy astonishes German privacy regulator

    And what about people who don't have Facebook accounts? Does Facebook allow 'tagging' of their faces? I'm already annoyed by Facebook's obvious data collection on me, as shown by the fact that I get email from them telling me who my friends are and inviting me to join.

    --
    Waterfox - a Firefox fork with legacy extension support, security updates and better privacy by default.