Slashdot Mirror


Breakthrough In Face Recognition Software

An anonymous reader writes: Face recognition software underwent a revolution in 2001 with the creation of the Viola-Jones algorithm. Now, the field looks set to dramatically improve once again: computer scientists from Stanford and Yahoo Labs have published a new, simple approach that can find faces turned at an angle and those that are partially blocked by something else. The researchers "capitalize on the advances made in recent years on a type of machine learning known as a deep convolutional neural network. The idea is to train a many-layered neural network using a vast database of annotated examples, in this case pictures of faces from many angles. To that end, Farfade and co created a database of 200,000 images that included faces at various angles and orientations and a further 20 million images without faces. They then trained their neural net in batches of 128 images over 50,000 iterations. ... What's more, their algorithm is significantly better at spotting faces when upside down, something other approaches haven't perfected."
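The summary's training figures allow a quick back-of-envelope check. This is a minimal sketch that assumes every one of the 50,000 iterations used a full batch of 128 images and that the two image sets are simply pooled. The summary says neither, and the names below are illustrative.

```python
# Figures quoted in the summary. Assumed: every iteration uses a full batch,
# and the face and non-face images are pooled into one dataset.
BATCH_SIZE = 128
ITERATIONS = 50_000
FACE_IMAGES = 200_000
NON_FACE_IMAGES = 20_000_000


def training_budget(batch_size: int, iterations: int, dataset_size: int) -> tuple[int, float]:
    """Return (images processed during training, passes over the dataset)."""
    samples_seen = batch_size * iterations
    return samples_seen, samples_seen / dataset_size


seen, epochs = training_budget(BATCH_SIZE, ITERATIONS, FACE_IMAGES + NON_FACE_IMAGES)
print(f"{seen:,} images processed, about {epochs:.2f} passes over the data")
# 6,400,000 images processed, about 0.32 passes over the data
```

Under those assumptions the network sees 6.4 million images in total, roughly a third of the pooled 20.2 million, so at most 6.4 million distinct images could have been used. The summary doesn't say how the authors actually sampled faces versus non-faces.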

5 of 142 comments

  1. Re:This is supposed to be a good thing? by retroworks · · Score: 3, Informative
    "For every "terrorist" they track through the mall, how many ordinary Joes like me who like their privacy are also tracked and stored in huge databases for all time?"

    Indeed, all of them.

    Have you noticed that you can go into Best Buy or Staples, pick up a camera or look at a printer you never searched for online, and then find ads for that device on Facebook? Didn't notice? Give it a try. It's far beyond this 2013 (minority) report http://www.businessinsider.com...

    --
    Gently reply
  2. Re:Weren't deep convolutional nets debunked? by serviscope_minor · · Score: 4, Informative

    Debunked?

    They're a machine learning algorithm. All such algorithms do is place a fancy decision boundary in a high-dimensional space. DNNs do a decent job for certain classes of problem. Far away from the training data the boundary is not useful, but that's true of pretty much all algorithms.

    So no. They haven't been debunked.
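    To make the "decision boundary" point concrete, here is a minimal sketch in pure Python. It swaps in logistic regression as a stand-in for a DNN, works in 2-D rather than a high-dimensional space for readability, and trains by gradient descent on six made-up points. The point (-300, 500) lies far from every training example, yet it falls on the class-1 side of the learned line and gets a confidence close to 1. Nothing in the model can say "I've never seen anything like this".

```python
import math


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(points, labels, lr=0.1, epochs=1000):
    """Plain batch gradient descent on the logistic loss; returns (w1, w2, b)."""
    w1 = w2 = b = 0.0
    n = len(points)
    for _ in range(epochs):
        g1 = g2 = gb = 0.0
        for (x1, x2), y in zip(points, labels):
            err = sigmoid(w1 * x1 + w2 * x2 + b) - y  # gradient of the loss w.r.t. the logit
            g1 += err * x1
            g2 += err * x2
            gb += err
        w1 -= lr * g1 / n
        w2 -= lr * g2 / n
        b -= lr * gb / n
    return w1, w2, b


def confidence(model, x1, x2):
    """Model's probability that (x1, x2) belongs to class 1."""
    w1, w2, b = model
    return sigmoid(w1 * x1 + w2 * x2 + b)


# Made-up, linearly separable training data.
points = [(0, 0), (0, 1), (1, 0), (3, 3), (3, 4), (4, 3)]
labels = [0, 0, 0, 1, 1, 1]
model = train_logistic(points, labels)

print(confidence(model, 3.5, 3.5))   # near the class-1 training points
print(confidence(model, -300, 500))  # nowhere near any training point
```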

    --
    SJW n. One who posts facts.
  3. Re:so breakthrough by tmosley · · Score: 4, Informative

    It seems to me, having followed the progress of the technology over the last year or so, that only recently did scientists either have the idea to layer networks on top of one another or gain the ability to. This started with the algo that would analyze pictures for content and tag them, e.g., a picture of a girl playing with a dog was tagged as such. It was approaching primate-level "cognition" in that specific context a few months ago, but now I have read that it has reached or surpassed peak human level: rather than labeling the dog as a dog, it labeled it with its specific breed, or labeled a flower as a specific type I had never heard of. Combining that with this new data point, it would seem that visual perception in machines has exploded into post-human territory. Shit is getting real.

  4. Re:Spike boots by hughperkins · · Score: 3, Informative

    Yes, check out 'High Confidence Predictions for Unrecognizable Images' by Nguyen, Yosinski and Clune, http://arxiv.org/abs/1412.1897 . The paper shows an image that the net is 99.99% sure is an electric guitar, but that looks nothing like one :-)
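    The effect can be reproduced in miniature. This sketch is an illustration under stated assumptions, not the paper's method: the "guitar detector" is a hypothetical linear scorer over a 4x4 grayscale image with invented weights, and the search is plain random hill climbing, swapped in for the more elaborate search the authors used. Starting from noise, it keeps any one-pixel tweak that raises the score until the detector is at least 99.99% sure.

```python
import math
import random

# Hypothetical detector: fixed linear weights over 16 pixels in [0, 1].
WEIGHTS = [3.0 if i % 2 == 0 else -3.0 for i in range(16)]
BIAS = 0.0


def guitar_confidence(pixels):
    """Logistic score of the hypothetical linear detector."""
    z = sum(w * p for w, p in zip(WEIGHTS, pixels)) + BIAS
    return 1.0 / (1.0 + math.exp(-z))


def fool(target=0.9999, max_steps=10_000, seed=0):
    """Random hill climbing from noise: keep a one-pixel tweak only if confidence rises."""
    rng = random.Random(seed)
    image = [rng.random() for _ in range(16)]  # start from uniform noise
    best = guitar_confidence(image)
    for _ in range(max_steps):
        if best >= target:
            break
        candidate = image[:]
        i = rng.randrange(16)
        # Nudge one pixel and clamp it to the valid [0, 1] range.
        candidate[i] = min(1.0, max(0.0, candidate[i] + rng.uniform(-0.2, 0.2)))
        score = guitar_confidence(candidate)
        if score > best:
            image, best = candidate, score
    return image, best


image, conf = fool()
print(f"confidence {conf:.4%}")
print([round(p, 1) for p in image])  # whatever pattern the search drifted to; nothing guitar-like
```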

    For the technically minded: the paper's authors propose that the reason is that the network uses a discriminative model rather than a generative model. That means the network learns a mathematical boundary that separates the images it sees, in some kind of high-dimensional transformed space. It doesn't learn how to generate new images, i.e., you can't ask it to 'draw me an electric guitar' :-) Maybe in a few years :-)
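    The difference matters because a generative model estimates how likely an input is under each class, so it can notice when an input is unlike everything it was trained on. Here is a minimal sketch assuming a toy per-class Gaussian over 2-D points, far simpler than anything an image model would use. The novelty rule (flag any input less likely than the least likely training point) is an illustrative choice, not something from the paper.

```python
import math
from statistics import fmean, pvariance


def fit_gaussian(samples):
    """Fit an axis-aligned (diagonal) Gaussian to 2-D points."""
    xs, ys = zip(*samples)
    # Small floor on the variance so a degenerate axis can't divide by zero.
    return (fmean(xs), fmean(ys), max(pvariance(xs), 1e-6), max(pvariance(ys), 1e-6))


def log_density(model, x, y):
    """Log of the diagonal Gaussian density at (x, y)."""
    mx, my, vx, vy = model
    return -0.5 * (math.log(2 * math.pi * vx) + (x - mx) ** 2 / vx
                   + math.log(2 * math.pi * vy) + (y - my) ** 2 / vy)


def is_novel(models, x, y, threshold):
    """True if no class model assigns the input a plausible density."""
    return max(log_density(m, x, y) for m in models.values()) < threshold


data = {
    "class_0": [(0, 0), (0, 1), (1, 0)],
    "class_1": [(3, 3), (3, 4), (4, 3)],
}
models = {label: fit_gaussian(pts) for label, pts in data.items()}
# Threshold: log density of the least likely training point under its own class.
threshold = min(log_density(models[label], x, y)
                for label, pts in data.items() for x, y in pts)

print(is_novel(models, 3.5, 3.5, threshold))   # False: close to class_1
print(is_novel(models, -300, 500, threshold))  # True: far from everything
```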

    The authors don't compare the network much with the human brain, though. Are they saying that the human brain uses a generative model? Is that why the human brain doesn't look at a white-noise picture and claim it's a horse?

  5. Re:Upside Down? by Anonymous Coward · · Score: 3, Informative

    'Performance' is indeed an ambiguous term: it can refer to accuracy (in most cases, the RMS error of the detection results and the false-positive/negative rates) or to speed (which, as a programmer, I'm biased toward).

    I've never seen both meanings used in a combined metric. From an algorithmic perspective you tend to care only about accuracy as the 'performance' metric, and from a production perspective you (typically) care about speed.
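    The two senses of "performance" are measured quite differently. Here is a minimal sketch with hypothetical function names and an assumed (x, y, w, h) box format. Accuracy is computed from labelled results and speed from wall-clock timing, and the two are reported separately rather than folded into one metric.

```python
import math
import time


def detection_accuracy(predicted, actual):
    """(false-positive rate, false-negative rate) for binary face/no-face labels.

    Assumes `actual` contains at least one positive and one negative.
    """
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if a and not p)
    negatives = sum(1 for a in actual if not a)
    positives = len(actual) - negatives
    return fp / negatives, fn / positives


def rms_error(pred_boxes, true_boxes):
    """RMS error over all box coordinates, e.g. (x, y, w, h) tuples."""
    sq = [(p - t) ** 2 for pb, tb in zip(pred_boxes, true_boxes) for p, t in zip(pb, tb)]
    return math.sqrt(sum(sq) / len(sq))


def throughput(detector, frames):
    """Frames per second: the 'speed' sense of performance."""
    start = time.perf_counter()
    for frame in frames:
        detector(frame)
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero-length timing
    return len(frames) / elapsed


print(detection_accuracy([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0]))
print(rms_error([(10, 10, 50, 50)], [(13, 10, 50, 46)]))  # 2.5
```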

    On most ARM systems you're correct: memory is almost always the bottleneck. (ARMv7 processors are actually quite fast IF you can keep them processing instructions every cycle, which is very hard or even impossible depending on the algorithm.) Memory allocations take tens to hundreds of microseconds (sometimes up to milliseconds for large allocations, depending on the memory configuration and how the SoC's memory controller is designed), and loads from DDR (not from cache) take hundreds of cycles or more. If a load is an immediate dependency of the code (e.g., you issue a load into a register that is used a few cycles later), you stall your entire core for however long the memory controller takes to bring the data through the MMU (into cache) and finally into your register.

    Compare that with a typical desktop or laptop, with DDR frequencies often more than double those of LPDDR and, in some cases, L2 caches large enough to run entire applications without ever touching DDR... and when all else fails, gigantic x86 CISC cores that don't have to stall the entire pipeline while waiting on a DDR load and can opportunistically 'continue' processing code further down the road while they wait for memory.

    Meanwhile, if the core isn't stalled on a load, hundreds of cycles can typically 'process' hundreds of 8-bit pixels (or dozens of 32-bit pixels) using NEON instructions, for the various algorithms that fit this footprint. To put that into perspective: a single memory stall can cost you the time it takes to process 10-100 pixels, very roughly, and most NEON code does one or more loads per N pixels (which may stall, depending on how things are written, prefetched, etc.).
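    The "very roughly 10-100 pixels" estimate can be reproduced with simple arithmetic. This is a sketch assuming a 100-cycle stall and a hypothetical kernel that costs 20 cycles per 128-bit NEON register of pixels. Both figures are illustrative, not measured, and real costs depend on the algorithm and the core.

```python
NEON_REGISTER_BITS = 128  # a NEON Q register holds 128 bits


def pixels_per_stall(stall_cycles: int, bits_per_pixel: int, cycles_per_vector: int) -> int:
    """Pixels that could have been processed during one memory stall.

    cycles_per_vector is a hypothetical cost for whatever kernel runs on one
    register's worth of pixels.
    """
    lanes = NEON_REGISTER_BITS // bits_per_pixel  # 16 lanes for 8-bit, 4 for 32-bit
    return stall_cycles * lanes // cycles_per_vector


print(pixels_per_stall(100, 8, 20))   # 80 (8-bit pixels)
print(pixels_per_stall(100, 32, 20))  # 20 (32-bit pixels)
```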

    I know we internally make our researchers very aware of the pros and cons of various algorithmic approaches. They're vaguely aware that gather/scatter memory operations are hard or impossible to optimise, so things like doing too many histograms are avoided where possible; they're aware of alignment considerations; they know memory loads, stores, and allocations are extremely costly; and most of them even know a bit of SIMD, though most don't care enough to understand instruction cycle counts, cache considerations, how prefetching works, why branching isn't ideal on some micro-architectures, etc. But from their perspective, they still 'just' care about accuracy, not performance. They'll generally design for performance, and make trade-offs for it if the gain is significant, but mostly they're aiming for accuracy.
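    One common way to handle the alignment considerations mentioned above is loop peeling. You process a few leading pixels with scalar code until the pointer reaches a 16-byte boundary, run aligned SIMD over the middle, and finish the remainder with scalar code. This sketch shows just the bookkeeping and assumes 8-bit pixels (one byte each) and 16-byte alignment. The function name is hypothetical, and this is not the commenter's code.

```python
def split_for_alignment(address: int, length: int, alignment: int = 16) -> tuple[int, int, int]:
    """Split `length` bytes starting at `address` into
    (scalar head, aligned SIMD body, scalar tail) byte counts."""
    head = min((-address) % alignment, length)        # bytes until the next aligned address
    body = (length - head) // alignment * alignment   # whole aligned vectors only
    tail = length - head - body                       # leftover bytes after the last vector
    return head, body, tail


print(split_for_alignment(0x1003, 100))  # (13, 80, 7)
print(split_for_alignment(0x1000, 64))   # (0, 64, 0)
```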

    When their research code hits our (programmer) desks, we tend to care about speed, while holding true to the original algorithm.
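    "Holding true to the original algorithm" is often enforced by keeping the research code as a reference and checking the optimized rewrite against it. This is a sketch assuming a hypothetical 1-D box filter over integer pixels, which is not from the comment. The naive version recomputes every window, the rewrite keeps a running sum, and a randomized test asserts that they agree exactly.

```python
import random


def box_sum_reference(row, radius):
    """Research-style reference: sum each window from scratch."""
    n = len(row)
    return [sum(row[max(0, i - radius):min(n, i + radius + 1)]) for i in range(n)]


def box_sum_fast(row, radius):
    """Production-style rewrite: running sum, O(1) work per pixel."""
    n = len(row)
    out = []
    window = sum(row[0:min(n, radius + 1)])  # window for i = 0
    for i in range(n):
        out.append(window)
        enter = i + radius + 1  # index that enters the window for i + 1
        leave = i - radius      # index that leaves the window for i + 1
        if enter < n:
            window += row[enter]
        if leave >= 0:
            window -= row[leave]
    return out


# Randomized equivalence check against the reference.
rng = random.Random(42)
for _ in range(200):
    row = [rng.randrange(256) for _ in range(rng.randrange(0, 40))]
    r = rng.randrange(0, 6)
    assert box_sum_fast(row, r) == box_sum_reference(row, r)
print("optimized version matches reference")
```

Integer pixels keep the comparison exact. With floating-point data, a running sum can drift slightly, so a tolerance would be needed.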

    Side note: next time your phone feels slow, you may want to forgive its OS. It has apps trying to run 'with' an OS (or worse, a Java VM) that constantly clobbers the L1/L2 data caches, and we have a hard enough time in the automotive industry running bare-metal code or very lean RTOSes.