Slashdot Mirror


Breakthrough In Face Recognition Software

An anonymous reader writes: Face recognition software underwent a revolution in 2001 with the creation of the Viola-Jones algorithm. Now, the field looks set to dramatically improve once again: computer scientists from Stanford and Yahoo Labs have published a new, simple approach that can find faces turned at an angle and those that are partially blocked by something else. The researchers "capitalize on the advances made in recent years on a type of machine learning known as a deep convolutional neural network. The idea is to train a many-layered neural network using a vast database of annotated examples, in this case pictures of faces from many angles. To that end, Farfade and co created a database of 200,000 images that included faces at various angles and orientations and a further 20 million images without faces. They then trained their neural net in batches of 128 images over 50,000 iterations. ... What's more, their algorithm is significantly better at spotting faces when upside down, something other approaches haven't perfected."

5 of 142 comments

  1. Upside Down? by Anonymous Coward · · Score: 5, Insightful

    "What's more, their algorithm is significantly better at spotting faces when upside down, something other approaches haven't perfected."

    Add this step: Rotate the image and run the algorithm each x degrees. What am I missing?
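The suggestion above can be sketched roughly as follows. This is a minimal illustration, not anyone's actual detector: it only tries quarter turns (arbitrary "x degrees" would need interpolation), and `toy_upright_detector` is a made-up stand-in that fires only when the top half of the image is much brighter than the bottom half. The point is that the detector runs once per orientation, so the cost grows with the number of rotations tried.

```python
def rot90_ccw(img):
    """Rotate a list-of-rows image 90 degrees counter-clockwise."""
    return [list(col) for col in zip(*img)][::-1]


def detect_with_rotations(img, detector, turns=(0, 1, 2, 3)):
    """Run `detector` on the image after k quarter turns, for each k in `turns`.

    `detector` is a hypothetical callable: it takes a list-of-rows image and
    returns a list of (row, col, height, width) boxes in the rotated frame.
    Returns a list of (k, box) pairs.
    """
    hits = []
    for k in turns:
        rotated = img
        for _ in range(k):
            rotated = rot90_ccw(rotated)
        hits.extend((k, box) for box in detector(rotated))
    return hits


def toy_upright_detector(img):
    """Stand-in detector: fires only if the top half is much brighter than the bottom."""
    half = len(img) // 2
    width = len(img[0])
    top = sum(map(sum, img[:half])) / (half * width)
    bottom = sum(map(sum, img[half:])) / ((len(img) - half) * width)
    return [(0, 0, len(img), width)] if top - bottom > 50 else []


# An "upside-down" test image: dark rows on top, bright rows below.
upside_down = [[0] * 6, [0] * 6, [255] * 6, [255] * 6]
hits = detect_with_rotations(upside_down, toy_upright_detector)
print(hits)
```

With this toy input the only hit comes after two quarter turns (180 degrees), but the detector still had to run four times to find it.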

    1. Re:Upside Down? by kekx · · Score: 5, Insightful

      Performance.

    2. Re:Upside Down? by Anonymous Coward · · Score: 5, Interesting

      As someone who literally works on face detection/tracking software on low power ARMv7/8 CPUs, I can safely say you are dead wrong.

      Assuming width==height (not likely given any current video formats or cameras), and assuming width%8 == 0 - it's a simple transposition of the rows and/or columns to do +/- 90/180 degrees, yes - and assuming you can fit your ENTIRE image in L1 cache you're going to incur minimal stalls (especially with an SoC that has a decent prefetch engine).
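For reference, the transposition trick mentioned above looks roughly like this in plain Python. It is only a sketch of the index shuffling (the function names are made up for illustration, and a real implementation on ARM would use NEON rather than lists): a quarter turn is a transpose plus a reversal, and a half turn is a reversal of both rows and columns.

```python
def transpose(img):
    """Swap rows and columns of a list-of-rows image."""
    return [list(col) for col in zip(*img)]


def rotate_cw(img):
    """+90 degrees (clockwise): transpose, then reverse each row."""
    return [row[::-1] for row in transpose(img)]


def rotate_ccw(img):
    """-90 degrees (counter-clockwise): transpose, then reverse the row order."""
    return transpose(img)[::-1]


def rotate_180(img):
    """180 degrees: reverse the row order and reverse each row."""
    return [row[::-1] for row in img[::-1]]


# A non-square 2x3 example: quarter turns change the shape to 3x2.
img = [[1, 2, 3],
       [4, 5, 6]]
print(rotate_cw(img))
print(rotate_ccw(img))
print(rotate_180(img))
```

Note that every output pixel is read from a different row of the input for the quarter turns, which is why the memory access pattern (and so the cache) matters once the image is bigger than a toy.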

      In reality:
        * width != height
        * width is however typically divisible by 8 so you can do pure NEON (not hybrid NEON + ALU/VFP) transpositions
        * an 8-bit grayscale VGA (640x480) image doesn't even fit in L1 cache, let alone a 720p/1080p format (though most CV applications scale things down significantly, you tend to work at 320x180 - but that still doesn't fit in most L1 caches, although it does fit in some)
        * L2 cache hits are dozens of cycles, L2 cache misses are HUNDREDS of cycles
        * A real-world case of rotating a 320x180 image takes ~2ms on a 700MHz Cortex-A9; that is not 'practically zero', that's 12% of your processing time at 60Hz - 36% of your processing time if you're going to rotate 3 times.

      (Note: using a 700MHz Cortex-A9 as an example as that's typical of the automotive hardware systems we deal with, although the last 2 years have brought ~1-1.5GHz A15s into the mix - though most of those cars aren't even on the market yet)
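The numbers in the comment above are easy to check. The sketch below is just that arithmetic; the 32 KiB and 64 KiB L1 sizes are assumptions chosen for illustration (the comment does not name a cache size), and the helper names are made up.

```python
def image_bytes(width, height, bytes_per_pixel=1):
    """Size of an uncompressed image; 1 byte per pixel for 8-bit grayscale."""
    return width * height * bytes_per_pixel


def frame_budget_share(cost_ms, fps, repeats=1):
    """Fraction of one frame period spent on an operation run `repeats` times."""
    frame_ms = 1000.0 / fps
    return repeats * cost_ms / frame_ms


# Assumed L1 data cache sizes, for illustration only.
L1_SMALL = 32 * 1024
L1_LARGE = 64 * 1024

vga = image_bytes(640, 480)      # 307200 bytes
small = image_bytes(320, 180)    # 57600 bytes

print(vga, vga <= L1_LARGE)                    # VGA does not fit in either size
print(small, small <= L1_SMALL, small <= L1_LARGE)

# ~2 ms per rotation at 60 Hz (a 16.67 ms frame period)
print(round(frame_budget_share(2.0, 60), 2))             # one rotation
print(round(frame_budget_share(2.0, 60, repeats=3), 2))  # three rotations
```

Under those assumptions a 320x180 grayscale frame fits in a 64 KiB L1 but not a 32 KiB one, and one 2 ms rotation is about 12% of a 60Hz frame, or about 36% for three.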

  2. Spike boots by Tablizer · · Score: 5, Funny

    "What's more, their algorithm is significantly better at spotting faces when upside down"

    Rats, there go my ceiling-walking bank-robbery plans.

  3. Re:This is supposed to be a good thing? by Jack Griffin · · Score: 5, Insightful

    "I think it's pretty well understood that there *are* terrorists and a lot of them and they're walking among us."

    I disagree with this statement. If there were even a handful of real terrorists amongst us, there'd be blood in the streets. Seriously, if you really are hell-bent on murdering infidels, it's not hard to drive a bus into a pack of school children, or carry a tin of petrol and a lighter into your nearest train station. That's the nature of terrorism: it is so trivial to execute that the threat is equally trivial to measure. See the history of the IRA for real-world examples.