None of Your Pixelated or Blurred Information Will Stay Safe On The Internet (qz.com)
The University of Texas at Austin and Cornell University are saying blurred or pixelated images are not as safe as they may seem. As machine learning technology improves, the methods used to hide sensitive information become less secure. Quartz reports: Using simple deep learning tools, the three-person team was able to identify obfuscated faces and numbers with alarming accuracy. On an industry standard dataset where humans had a 0.19% chance of identifying a face, the algorithm had 71% accuracy (or 83% if allowed to guess five times). The algorithm doesn't produce a deblurred image -- it simply identifies what it sees in the obscured photo, based on information it already knows. The approach works with blurred and pixelated images, as well as P3, a type of JPEG encryption pitched as a secure way to hide information. The attack uses Torch (an open-source deep learning library), Torch templates for neural networks, and standard open-source data. To build the attacks that identified faces in YouTube videos, researchers took publicly available pictures and blurred the faces with YouTube's video tool. They then fed the algorithm both sets of images, so it could learn how to correlate blur patterns to the unobscured faces. When given different images of the same people, the algorithm could determine their identity with 57% accuracy, or 85% when given five chances. The report mentions Max Planck Institute's work on identifying people in blurred Facebook photos. The difference between the two projects is that UT and Cornell's approach is much simpler, and "shows how weak these privacy methods really are."
For a computer, most picture-comparison algorithms already work on what is effectively a blurred version of both pictures. They take samples (pixels) from each picture and check whether the relationships within the two sets of samples are the same, or within a margin of deviation. There is little value in comparing pixel by pixel for exact matches. It is similar to how human fingerprints are matched.
A blurred picture is similar to taking fewer samples from one picture and setting the margin of deviation wider.
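To make that concrete, here is a rough sketch of the "fewer samples, wider margin" idea. It is not the researchers' neural-network attack; the function names (downsample, pixelate, matches), the tiny 4x4 image and the tolerance values are all made up for illustration.

```python
def downsample(img, block):
    """Average each block x block tile of a grayscale image (list of rows)."""
    h, w = len(img), len(img[0])
    out = []
    for r in range(0, h, block):
        row = []
        for c in range(0, w, block):
            tile = [img[y][x]
                    for y in range(r, min(r + block, h))
                    for x in range(c, min(c + block, w))]
            row.append(sum(tile) / len(tile))
        out.append(row)
    return out


def pixelate(img, block):
    """Replace every pixel with the average of its tile (mosaic-style obfuscation)."""
    small = downsample(img, block)
    return [[small[r // block][c // block] for c in range(len(img[0]))]
            for r in range(len(img))]


def matches(a, b, block, tolerance):
    """Compare coarse samples only: mean absolute difference within a tolerance."""
    sa, sb = downsample(a, block), downsample(b, block)
    diffs = [abs(x - y) for ra, rb in zip(sa, sb) for x, y in zip(ra, rb)]
    return sum(diffs) / len(diffs) <= tolerance


# Hypothetical 4x4 grayscale "face" and an unrelated image (its inverse).
original = [[10, 20, 200, 210],
            [30, 40, 220, 230],
            [100, 110, 50, 60],
            [120, 130, 70, 80]]
other = [[255 - v for v in row] for row in original]
blurred = pixelate(original, 2)

print(matches(original, blurred, 2, 1.0))   # the coarse samples survive pixelation
print(matches(other, blurred, 2, 10.0))     # a different picture still differs
```

The point of the toy: pixelation throws away fine detail, but the block averages a sample-based comparison relies on are exactly what is left behind.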
But for computers, 57% is pretty bad. 85% is also very bad, and that's when you are telling the machine the answer. At those rates, it is hard to do mass comparisons: the false positives would be far too many for any human to weed through. This applies more to targeted searches, where an investigator wants the 5 most probable matches to a blur. Unlike the researchers here, who know the answer beforehand, the investigator still has to guess which of the five it actually is.
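For anyone unsure what the "five chances" figure measures, here is a small sketch of top-k accuracy: a guess counts as correct if the true identity appears anywhere in the model's k highest-ranked candidates. The names and rankings below are invented, not data from the study.

```python
def top_k_accuracy(ranked_guesses, truths, k):
    """Fraction of cases where the true label is among the first k ranked guesses."""
    hits = sum(1 for guesses, truth in zip(ranked_guesses, truths)
               if truth in guesses[:k])
    return hits / len(truths)


# Hypothetical ranked outputs for four blurred photos that are all of "alice".
ranked = [["alice", "bob", "carol"],
          ["bob", "alice", "carol"],
          ["carol", "bob", "alice"],
          ["bob", "carol", "alice"]]
truths = ["alice"] * 4

print(top_k_accuracy(ranked, truths, 1))  # 0.25
print(top_k_accuracy(ranked, truths, 2))  # 0.5
```

The score is only computable when someone already knows the true label, which is the gap between the lab setting and an investigator holding five candidates.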
In a criminal investigation, if we had a database of likely suspects, this would work. But we are all about mass collection of data, data, data. With a large population of pictures, the blur will probably match a lot more than 5.
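A back-of-the-envelope sketch of why population size matters. The per-comparison false-match rate of 0.1% is a number picked purely for illustration, not one from the paper, and the model assumes every wrong comparison is equally likely to match.

```python
def expected_false_matches(population, false_match_rate):
    """Expected number of wrong people who match one blurred photo."""
    return (population - 1) * false_match_rate


RATE = 0.001  # hypothetical: 1 in 1,000 wrong comparisons looks like a match

for population in (500, 100_000, 1_000_000):
    print(population, round(expected_false_matches(population, RATE), 1))
```

Under that assumption, a suspect list of a few hundred yields less than one false match on average, while a million-picture database yields on the order of a thousand.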
The guy was caught in Thailand. The German police "deswirled" his photograph:
https://en.wikipedia.org/wiki/Christopher_Paul_Neil