Slashdot Mirror


Breakthrough In Face Recognition Software

An anonymous reader writes: Face recognition software underwent a revolution in 2001 with the creation of the Viola-Jones algorithm. Now, the field looks set to dramatically improve once again: computer scientists from Stanford and Yahoo Labs have published a new, simple approach that can find faces turned at an angle and those that are partially blocked by something else. The researchers "capitalize on the advances made in recent years on a type of machine learning known as a deep convolutional neural network. The idea is to train a many-layered neural network using a vast database of annotated examples, in this case pictures of faces from many angles. To that end, Farfade and co created a database of 200,000 images that included faces at various angles and orientations and a further 20 million images without faces. They then trained their neural net in batches of 128 images over 50,000 iterations. ... What's more, their algorithm is significantly better at spotting faces when upside down, something other approaches haven't perfected."

17 of 142 comments

  1. Upside Down? by Anonymous Coward · · Score: 5, Insightful

    "What's more, their algorithm is significantly better at spotting faces when upside down, something other approaches haven't perfected."

    Add this step: Rotate the image and run the algorithm every x degrees. What am I missing?
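The brute-force idea above can be sketched roughly as follows. This is only an illustration, not the paper's method: `detect` and `toy_detect` are hypothetical stand-ins for an upright-only detector, images are plain lists of rows, and only 90-degree steps are handled (arbitrary angles would need interpolation).

```python
def rotate90_cw(img):
    """Rotate a row-major image (list of rows) 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]


def detect_any_rotation(img, detect):
    """Run `detect` on the image at 0, 90, 180 and 270 degrees clockwise.

    `detect` is a hypothetical callable that returns True when it finds an
    upright face. Returns the first rotation (in degrees) at which it fires,
    or None. Every extra angle costs a full image copy plus a detector pass.
    """
    current = img
    for angle in (0, 90, 180, 270):
        if angle:
            current = rotate90_cw(current)
        if detect(current):
            return angle
    return None


# Toy stand-in: the "face" counts as upright when the marker 1 is top-left.
def toy_detect(img):
    return img[0][0] == 1


upside_down = [[0, 0, 0],
               [0, 0, 1]]
found_at = detect_any_rotation(upside_down, toy_detect)
```

The loop makes the cost visible: each angle tried is another pass over the whole image, which is the concern the replies raise.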

    1. Re:Upside Down? by kekx · · Score: 5, Insightful

      Performance.

    2. Re:Upside Down? by Anonymous Coward · · Score: 5, Interesting

      As someone who literally works on face detection/tracking software on low power ARMv7/8 CPUs, I can safely say you are dead wrong.

      Assuming width==height (not likely given any current video formats or cameras), and assuming width%8 == 0 - it's a simple transposition of the rows and/or columns to do +/- 90/180 degrees, yes - and assuming you can fit your ENTIRE image in L1 cache you're going to incur minimal stalls (especially with an SoC that has a decent prefetch engine).

      In reality:
        * width != height
        * width is however typically divisible by 8 so you can do pure NEON (not hybrid NEON + ALU/VFP) transpositions
        * an 8bit grayscale VGA (640x480) image doesn't even fit in L1 cache, let alone a 720/1080p format (though most CV applications scale things down significantly, you tend to work at 320x180 - but that still doesn't fit in most L1 caches, although it does fit in 'some')
        * L2 cache hits are dozens of cycles, L2 cache misses are HUNDREDS of cycles
        * In a real-world case, rotating a 320x180 image takes ~2ms on a 700MHz Cortex A9; that is not 'practically zero', that's 12% of your processing time at 60Hz - 36% of your processing time if you're going to rotate 3 times.

      (Note: I'm using a 700MHz Cortex A9 as the example because that's typical of the automotive hardware systems we deal with, although the last 2 years have brought ~1-1.5GHz A15s into the mix - though most of those cars aren't even on the market yet)
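The arithmetic behind those figures can be checked with a short sketch. The 2ms rotation cost and the image sizes come from the post; the 32 KiB and 64 KiB L1 data cache sizes are assumptions chosen for illustration, and the function names are made up here.

```python
def frame_budget_fraction(cost_ms, fps, repeats=1):
    """Fraction of one frame's time budget used by `repeats` operations."""
    frame_ms = 1000.0 / fps  # about 16.67 ms per frame at 60Hz
    return repeats * cost_ms / frame_ms


def image_bytes(width, height, bytes_per_pixel=1):
    """Size of a packed image; 1 byte per pixel for 8bit grayscale."""
    return width * height * bytes_per_pixel


one_rotation = frame_budget_fraction(2.0, 60)               # 2 ms at 60Hz
three_rotations = frame_budget_fraction(2.0, 60, repeats=3)

# Assumed L1 data cache sizes; real parts vary by SoC configuration.
L1_SMALL = 32 * 1024
L1_LARGE = 64 * 1024

vga = image_bytes(640, 480)    # 307,200 bytes
small = image_bytes(320, 180)  # 57,600 bytes

print(f"{one_rotation:.0%} {three_rotations:.0%}")  # prints "12% 36%"
print(vga <= L1_LARGE, small <= L1_SMALL, small <= L1_LARGE)
```

Under these assumptions the VGA frame fits in neither cache, while the 320x180 frame fits only in the larger one, matching the "fits in some" remark.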

    3. Re:Upside Down? by Anonymous Coward · · Score: 3, Informative

      'Performance' is indeed an ambiguous term: it can refer to accuracy (RMS error of the detection results and false positive/negative rates in most cases) and it can also refer to speed (which I'm biased to thinking of as a programmer).

      I've never seen both meanings used in some combined metric. From an algorithmic perspective you tend to only care about accuracy as the 'performance' metric - and from a production perspective you (typically) care about 'speed' as the performance metric.

      On most ARM systems, you're correct - memory is almost always the bottleneck (ARMv7 processors are actually quite fast, IF you can keep them processing instructions every cycle, which is very hard if not impossible depending on the algorithm). Memory allocations take tens to hundreds of microseconds (scaling up to milliseconds sometimes for large allocations, depending on the memory configuration and how the SoC's memory controller is designed), and loads from DDR (not from cache) take hundreds of cycles or more. If a load is an immediate dependency of the code (e.g. you issue a load into a register that is used a few cycles later), you're stalling your entire core for however long the memory controller takes to bring the data through the MMU (into cache) and finally into your register.

      This is compared to a typical desktop/laptop with DDR frequencies typically over double that of LPDDR, and L2 caches that are large enough to run entire applications in (without ever touching DDR) in some cases... and when all else fails, gigantic x86 CISC cores that don't have to stall the entire pipeline when waiting on a DDR load and can opportunistically 'continue' processing code further down the road while they wait for memory.

      Meanwhile, if the core wasn't stalled from a load - hundreds of cycles can typically 'process' (various algorithms fit into this footprint) hundreds of pixels using NEON instructions against 8bit (or dozens of 32bit) pixels - so hopefully that puts it into perspective (a single memory stall can cost you the time it takes to process 10-100 pixels, very roughly) - and most algorithms do 1+ loads (that may stall depending on how things are written/prefetched/etc) per N pixels in NEON code.

      I know we internally make our researchers very much aware of the pros/cons of various algorithmic approaches (they're vaguely aware that gather/scatter memory operations are hard/impossible to optimise - so things like doing too many histograms are avoided if possible, they're aware of alignment considerations, they're aware memory loads/stores/allocations are extremely costly, and most of them even know a bit of SIMD, though most don't care enough to understand cycle counts of instructions / cache considerations / how prefetching works / why branching isn't ideal in some micro-archs / etc) - but from their perspective, they still 'just' care about accuracy - not performance (they'll generally design for performance, and make trade-offs for performance if it's significant - but generally they're aiming for accuracy).

      When their research code hits our (programmer) desks, we tend to care about speed - while holding true to the original algorithm.

      Side note: You may want to forgive your phone OS when it feels slow next time; it's got apps trying to run 'with' an OS (or worse, a Java VM) constantly clobbering the L1/L2 data caches - we have a hard enough time in the automotive industry running bare-metal code or very lean RTOSes.

  2. This is supposed to be a good thing? by Snotnose · · Score: 4, Insightful

    For every "terrorist" they track through the mall, how many ordinary Joes like me who like their privacy are also tracked and stored in huge databases for all time?

    1. Re:This is supposed to be a good thing? by RoknrolZombie · · Score: 4, Insightful

      All of them.

    2. Re:This is supposed to be a good thing? by Anonymous Coward · · Score: 4, Insightful

      I think it's pretty well understood that there *are* terrorists...

      Yes.

      ... and a lot of them ...

      By almost every measure: No.

      ...and they're walking among us.

      For virtually every useful North American or Western European definition of 'us': No.

    3. Re:This is supposed to be a good thing? by Jack+Griffin · · Score: 5, Insightful

      I think it's pretty well understood that there *are* terrorists and a lot of them and they're walking among us.

      I disagree with this statement. If there were even a handful of real terrorists amongst us, there'd be blood in the streets. Seriously, if you really are hell bent on murdering infidels, it's not hard to drive a bus into a pack of school children, or carry a tin of petrol and a lighter into your nearest train station. That's the nature of terrorism, it is so trivial to execute that the threat is equally trivial to measure. See the history of the IRA for real world examples.

    4. Re:This is supposed to be a good thing? by retroworks · · Score: 3, Informative

      "For every "terrorist" they track through the mall, how many ordinary Joes like me who like their privacy are also tracked and stored in huge databases for all time?"

      Indeed, all of them.

      Have you noticed you can go into Best Buy or Staples, pick up a camera or look at a printer you never searched for online, and you find ads for the device on Facebook? Didn't notice? Give it a try. It's far beyond this 2013 (minority) report http://www.businessinsider.com...

      --
      Gently reply
  3. Spike boots by Tablizer · · Score: 5, Funny

    What's more, their algorithm is significantly better at spotting faces when upside down

    Rats, there goes my ceiling-walking bank-robbery plans.

    1. Re:Spike boots by hughperkins · · Score: 3, Informative

      Yes, check this out: 'High Confidence Predictions for Unrecognizable Images', by Nguyen, Yosinski and Clune, http://arxiv.org/abs/1412.1897 . It's a paper that shows an image that the net is 99.99% sure is an electric guitar, but which looks nothing like one :-)

      For the technically minded, the paper's authors propose that the reason is that the network is using a discriminative model, rather than a generative model. That means that the network learns a mathematical boundary that separates the images that it sees, in some kind of high-dimensional transformed space. It doesn't learn how to generate such new images, i.e., you can't ask it 'draw me an electric guitar' :-) Maybe in a few years :-)

      The authors don't compare the network too much with the human brain though, ie, are they saying that the human brain is using a generative model? Is that why the human brain doesn't see a white noise picture, and claim it's a horse?

  4. Facial recognition is still very much imperfect by Anonymous Coward · · Score: 4, Interesting

    Very much anecdotal, but here goes anyway - a little while back, I found a recipe for cow tongue that seemed intriguing. If I had eaten it before I couldn't recall, at least I hadn't prepared it myself. So off to the butcher's I was, as this is not found in every shop. The tongues they had on display there seemed very tiny (in retrospect, they must have been veal tongues), so I said "give me the largest tongue you have". As the saying goes, "you should be careful what you wish for" - what I ended up with was a monster, something like over 1.3kg (nearly three pounds). I really didn't need that much, but all I could do was to say thanks and go home with my prey.

    As I laid it on my cutting board, pretty much filling it entirely, it looked at the same time so awesome and gruesome that I had to take a photo of it (not a food blogger, or a blogger of any kind, I just had to document it). And to share the experience, I sent it to a friend via Hangouts. Now, as she uses Hangouts from the GMail web interface, the images are not visible inline but are Google+ links. So she clicks the link.

    ...and G+ helpfully asks her "Is this xxxxx?" (xxxxx == her name) While people are, rightfully, concerned whether companies such as Google know too much about their lives, at least when it comes to Google and facial recognition, they have a long way to go.

  5. Re:Weren't deep convolutional nets debunked? by serviscope_minor · · Score: 4, Informative

    Debunked?

    They're a machine learning algorithm. All such algorithms do is place a fancy decision boundary in a high dimensional space. DNNs do a decent job for certain classes of problem. Far away from the training data, the boundary is not useful, but that's the same with all algorithms pretty much.
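A toy illustration of that last point, under loud assumptions: this is a one-dimensional logistic classifier, not a DNN, and the weight `w` and bias `b` are hand-picked rather than trained. It only shows how a discriminative boundary can report near-total confidence for an input far from anything it was fitted on.

```python
import math


def confidence(x, w=2.0, b=0.0):
    """Logistic 'class A' confidence for a 1-D input.

    w and b are hypothetical values, imagined as fitted on data with x
    roughly in [-1, 1]; the decision boundary sits at x = 0.
    """
    return 1.0 / (1.0 + math.exp(-(w * x + b)))


near_boundary = confidence(0.1)  # close to the data: a hedged answer
far_away = confidence(50.0)      # nowhere near the data: near-certainty
```

Here `near_boundary` is only slightly above 0.5, while `far_away` is essentially 1.0 even though nothing like x = 50 was ever part of the imagined training range.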

    So no. They haven't been debunked.

    --
    SJW n. One who posts facts.
  6. Re:so breakthrough by tmosley · · Score: 4, Informative

    It seems to me, as I have been following the progress of the technology over the last year or so, that it was only recently that scientists either had the idea to layer networks on top of one another, or gained the ability to. This started with the algo that would analyze pictures for content and tag them, ie a picture of a girl playing with a dog was tagged as such. It was approaching primate-level "cognition" in that specific context a few months ago, but now I have read that it has reached or surpassed peak human level, where rather than labeling the dog as a dog, it labeled it as its specific breed, or labeled a flower as its specific type that I had never heard of. Combining that with this new data point, it would seem that visual perception in machines has exploded into post-human territory. Shit is getting real.

  7. Re:so breakthrough by hughperkins · · Score: 4, Interesting

    They're using a standard technique. Convolutional networks started to become big with LeCun's 1998 paper on learning to recognize hand-written digits http://yann.lecun.com/exdb/pub... . His LeNet-5 network could identify the digit accurately 99% of the time.

    Convolutional networks are starting to be used to play Go, e.g. 'Move evaluation in Go using Deep Convolutional Neural Networks', by Maddison, Huang, Sutskever and Silver, http://arxiv.org/pdf/1412.6564... Maddison et al used a 12-layer convolutional network to predict where an expert would move next with 50% accuracy :-)

    Progress on convolutional networks moves forward all the time, in an incremental way. If we had one article per day about one increment it would quickly lose mass appeal though :-) The article is about one increment along the way, but does symbolize the massive progress that is being made.

    Convolutional networks work well partly because they can take advantage of the massive computational capacity made available in GPU hardware.

  8. So... by jtownatpunk.net · · Score: 4, Funny

    Can I finally automatically tag the performers in my porn collection? I'm asking for a friend.

  9. Face detection, perhaps? by Torp · · Score: 3, Interesting

    I didn't read the article, of course, but the summary sounds like they're doing face *detection* not recognition.
    Detection: find which portions of an image are faces.
    Recognition: compare to a database of faces and find out whose face it is.
    First is way easier than the other.
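The distinction above can be sketched as two different function shapes. Everything here is a hypothetical toy: `detect` just boxes the nonzero pixels, `recognize` does a nearest-neighbour lookup over made-up 2-D "embeddings", and the names and the 0.5 distance threshold are assumptions, not anything from the article.

```python
import math


def detect(img):
    """Detection: WHERE are the faces? Returns (x, y, w, h) boxes.

    Toy version: one box around all nonzero pixels, or no boxes at all.
    """
    coords = [(x, y) for y, row in enumerate(img)
              for x, v in enumerate(row) if v]
    if not coords:
        return []
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return [(min(xs), min(ys), max(xs) - min(xs) + 1, max(ys) - min(ys) + 1)]


def recognize(embedding, gallery, max_distance=0.5):
    """Recognition: WHOSE face is it? Nearest gallery entry, or None."""
    best_name, best_dist = None, float("inf")
    for name, reference in gallery.items():
        dist = math.dist(embedding, reference)  # Euclidean distance
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name if best_dist <= max_distance else None


boxes = detect([[0, 0, 0],
                [0, 1, 1],
                [0, 1, 1]])
gallery = {"alice": (0.0, 1.0), "bob": (1.0, 0.0)}
who = recognize((0.1, 0.9), gallery)
stranger = recognize((5.0, 5.0), gallery)
```

Detection needs no database at all; recognition needs one, plus a rule for when no entry is close enough.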

    --
    I apologize for the lack of a signature.