Face Recognition - Real or Science Fiction?
An anonymous reader writes "Facial recognition software has been touted as one of the technologies that will change our future, particularly in law enforcement. How close are we to being recognized by a computer anywhere we go, as portrayed in movies like Minority Report? According to the industry's recent Public Relations releases, these products are closer than we think.
The reality, though, is that current products work only when utilizing a small comparative sample, and any attempt by an individual to disguise themselves typically throws off the results. To see how far this technology needs to go before becoming mainstream, one site utilized Government-tested face recognition software, available freely through MyHeritage.com, to compare hundreds of famous people, animals, and cartoons to a database of 2,000 celebrities. Some of the results showed promise for the technology, but most were just funny: for example, who would mistake Barbra Streisand for Shrek, or Lance Bass of 'N Sync for a Teletubby?"
After working in computer vision for 5 years I've realized that most problems aren't hard so much as poorly defined. Mathematically, face recognition is not a problem that can be stated precisely.
Many other problems in CV are like this - edge detection, segmentation, etc. But people write hacks that work in restricted conditions and say they've solved the problem.
And look, you could always just put on those Groucho Marx glasses.
This is all well and good, but the minute I get falsely identified as a criminal just for being in the bar district late at night, in the wrong place at the wrong time, I won't be too happy...
disclaimer: I've been known to store numbers in my ass for which to dig out when quantities are required.
I'm wondering about the legality of all this, especially in a criminal justice system. My DNA, for example, can't be used in court as evidence unless certain hoops have been jumped through; the prosecutor needs a reason to obtain a DNA sample and then procedures must be followed.
I wonder if the same systems will apply to a computer-analysed image of my face; will there be criteria for when this image is admissible in court? Will I have rights concerning my image? Or are we just heading towards a 1984-style system? It's an interesting question, because that hasn't been the result of admitting DNA in court, despite the seemingly more robust nature of that evidence.
But don't we almost always get a computer to solve a problem that's not strictly a mathematical one using "hacks that only work in restricted conditions"?
Our spell-checkers in our word processors don't actually know anything about the rules of a language, phonics, etc. They just do lookups from a dictionary. If a word's not listed, it has no idea if it's spelled properly or not -- even if the misspelling is one that's simply not a possible correct sequence of letters for the language. Most don't even realize if a word is misspelled in the context of the sentence, as long as it matches a correct spelling in the word list.
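To make the "lookup hack" concrete, here is a minimal sketch of that kind of spell-checker. The tiny word list and the function name `misspelled` are made up for illustration; a real checker would load a far larger dictionary and do more than this.

```python
# Hypothetical word list; a real checker would load tens of thousands of entries.
WORD_LIST = {"their", "there", "they", "went", "to", "the", "store"}

def misspelled(sentence, word_list=WORD_LIST):
    """Return the words not found in the word list (pure lookup, no grammar)."""
    # Strip surrounding punctuation and lowercase each whitespace-separated token.
    words = [w.strip(".,!?;:\"'").lower() for w in sentence.split()]
    return [w for w in words if w and w not in word_list]

# "there" is the wrong word in context, but it is in the list, so nothing is flagged.
print(misspelled("They went to there store."))
# "xqzvb" is flagged only because it is absent from the list, not because the
# checker knows that letter sequence is impossible in English.
print(misspelled("They went to the xqzvb."))
```

It works fine inside its restricted conditions and knows nothing outside them, which is the whole point.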
Until we figure out how the human brain recognizes faces as individuals, we can't expect anything *but* a clever hack for a computer to do the same. And truthfully, I suspect the human brain takes many things into account to do a "recognition" on a person. How often do you see somebody in the store that you're pretty sure you know from a previous job, school, etc. but you're not quite sure? I've had this happen a few times, and to make a better determination, I had to take other factors into account, like the sound of their voice if I heard them speak, the way they walked, or maybe an expression that came across their face. Humans "key in" on specific things that help them remember a person. And depending on which "features" they chose, they may or may not be effective. (Say you remember a gal really well because of her long, flowing hair? If she cuts it real short, there's a good chance you won't recognize her at all anymore if she walks by you.)
The problem is the inputs. Do you input sets of geometry (eyes are X" apart, at an angle of 0.53 degrees, chin is .5" below lips, blah blah blah), the raw image, or something else? If you use the raw image, you'd need a system in the front end to scale/rotate the images so the faces are in about the same place, otherwise you probably have no chance (unless you want your neural net to do that TOO, which would make training harder and take longer).
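A rough sketch of what that scale/rotate front end might compute, assuming something else has already located the two eyes. The function name `eye_alignment` and the target eye positions are arbitrary choices for illustration. It only handles in-plane rotation, scaling and shifting; it does nothing about a head turned away from the camera.

```python
import cmath
import math

def eye_alignment(left_eye, right_eye,
                  target_left=(30.0, 40.0), target_right=(70.0, 40.0)):
    """Return (scale, angle_degrees, transform) mapping the eyes onto the targets.

    Points are (x, y) pixel pairs. The target positions are assumed values.
    """
    e1, e2 = complex(*left_eye), complex(*right_eye)
    t1, t2 = complex(*target_left), complex(*target_right)
    if e1 == e2:
        raise ValueError("eye positions must be distinct")
    a = (t2 - t1) / (e2 - e1)  # rotation and scale packed into one complex number
    b = t1 - a * e1            # translation that puts the left eye on its target

    def transform(point):
        z = a * complex(*point) + b
        return (z.real, z.imag)

    return abs(a), math.degrees(cmath.phase(a)), transform

# A face photographed sideways and at half the target size.
scale, angle, transform = eye_alignment((10.0, 10.0), (10.0, 30.0))
print(scale, angle)
print(transform((10.0, 30.0)))
```

Every pixel (or every measured landmark) would go through `transform` before being handed to the net, so the net never has to learn rotation and scale on its own.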
Even if you use geometry (we have a vague understanding of what makes people look similar or beautiful) you'll still run into problems, such as perspective: not all pictures are taken straight on.
Garbage in, garbage out. The best solution is to provide tons of information and let the neural net sort out what matters and what doesn't (they are quite good at that) but that will require more training which means more time.
So in the end you may build a good system. But to use it you must provide it with geometry of a face that someone picks out after fixing the perspective on a photo. Or it works much like our brains and accounts for all that, but it will take you 6 years of non-stop training alone.
And what is a success? Two people who look similar? A perfect match? What if your software rates a picture of a celebrity impersonator (looking like the celebrity) over a picture of that celebrity looking different (movie role, disheveled mugshot, etc)? Is that a success?
And how do you rate the people for the training input? Sure, a neural net can find its way to an answer when we know what the answer should be, but what about when we don't quite know it ourselves?
It probably took evolution a VERY long time to get good at recognizing individuals. And even then, we are not that great (mistaken identity, all cocker spaniels look alike until you spend more time with them, etc).
It's a neat problem, but it is seriously tough even with the "voodoo magic" that a neural net would provide over trying to come up with a straight formula.
Comment forecast: Bits of genius surrounded by a sea of mediocrity.
If I were training a network to match V1-V4, I'd have the input come from two "eyes" with inputs similar to what our eyes actually provide to our brain. We know quite a bit about the visual cortex, but there's a lot we don't know. Initially, I'd train it using a batch of photographs for a single person (we'll call her "Momma") and then I'd train with a few others (where a match is a match only if it's the same person). From there, I'd create histograms of parameter settings that seem to do an adequate job on this small set, and then use this reduced parameter space to create populations that are evaluated after training on millions of photographs. (The photographs can be placed in front of the eyes - once for each photograph, mind you, and not for each "individual" being tested - just like we can recognize photos and not just people.)
I could imagine narrowing the parameter space down to 100 or so unknown parameters, and each training session might take several hours. Given enough resources (e.g., the Pittsburgh Supercomputing Center), I'd run population sizes of 500 or so (in parallel), so that you could possibly go through 4-5 generations per day. In a month, you might have some pretty good individuals. Of course, my research area is the hippocampus and not the visual cortex, so it might take significantly more than 100 parameters to even begin to set this up.
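For anyone who hasn't seen this kind of population search, here is a toy sketch of the loop. Everything in it is an assumption for illustration: the selection scheme (keep the top 20%), the crossover and mutation rules, and especially `toy_fitness`, which stands in for "train a network for several hours and score how well it recognizes faces".

```python
import random

def evolve(fitness, n_params=100, pop_size=500, generations=5, seed=0):
    """Evolve parameter vectors; returns (best_individual, best_score_history).

    Assumes pop_size >= 10 so that at least two parents survive each round.
    """
    rng = random.Random(seed)
    pop = [[rng.uniform(-1, 1) for _ in range(n_params)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        history.append(fitness(ranked[0]))       # best score seen this generation
        parents = ranked[: pop_size // 5]        # survivors carry over unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            mom, dad = rng.sample(parents, 2)
            # Take each parameter from one parent, then nudge it slightly.
            child = [rng.choice(pair) + rng.gauss(0, 0.05) for pair in zip(mom, dad)]
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness), history

def toy_fitness(params):
    # Hypothetical stand-in for an expensive training-and-evaluation run.
    return -sum(p * p for p in params)

best, history = evolve(toy_fitness, n_params=5, pop_size=20, generations=10)
print(history[0], toy_fitness(best))
```

In the real setup each `fitness` call is the hours-long part, which is why the 500 individuals would be evaluated in parallel. At 4-5 generations per day, a month works out to roughly 120 to 150 generations.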
Now, someone else pointed out that such computers would not have the biases that we humans have, but that's not necessarily true. If you train the computer using an input set of 950,000 "white" people and 50,000 "black" people, it would tend to make the mistake of thinking that "black" people look a lot like each other. (Studies done with speech recognition have shown that neural networks trained on Japanese have a much harder time telling "l" from "r" than those trained on English.)
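To put numbers on that imbalance: the sketch below computes inverse-frequency weights for the two groups. Reweighting is one common way to compensate for a lopsided training set; it is not something proposed above, and the group labels and function name are placeholders.

```python
def inverse_frequency_weights(counts):
    """Weight each group so that all groups contribute equally in total."""
    total = sum(counts.values())
    n_groups = len(counts)
    return {group: total / (n_groups * n) for group, n in counts.items()}

# The 950,000 / 50,000 split from the example above.
counts = {"group_a": 950_000, "group_b": 50_000}
weights = inverse_frequency_weights(counts)
print(weights)
```

Each example from the smaller group ends up counting 19 times as much as one from the larger group. That evens out the totals, but it can't add variety that the 50,000 photos don't contain.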
Ben Hocking