Slashdot Mirror


None of Your Pixelated or Blurred Information Will Stay Safe On The Internet (qz.com)

The University of Texas at Austin and Cornell University say blurred or pixelated images are not as safe as they may seem. As machine learning technology improves, the methods used to hide sensitive information become less secure. Quartz reports: Using simple deep learning tools, the three-person team was able to identify obfuscated faces and numbers with alarming accuracy. On an industry-standard dataset where humans had a 0.19% chance of identifying a face, the algorithm had 71% accuracy (or 83% if allowed to guess five times). The algorithm doesn't produce a deblurred image; it simply identifies what it sees in the obscured photo, based on information it already knows. The approach works with blurred and pixelated images, as well as P3, a type of JPEG encryption pitched as a secure way to hide information. The attack uses Torch (an open-source deep learning library), Torch templates for neural networks, and standard open-source data. To build the attacks that identified faces in YouTube videos, researchers took publicly available pictures and blurred the faces with YouTube's video tool. They then fed the algorithm both sets of images, so it could learn how to correlate blur patterns with the unobscured faces. When given different images of the same people, the algorithm could determine their identity with 57% accuracy, or 85% when given five chances. The report mentions the Max Planck Institute's work on identifying people in blurred Facebook photos. The difference between the two studies is that UT and Cornell's research is much simpler, and "shows how weak these privacy methods really are."
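The "71% (or 83% if allowed to guess five times)" figures are what is usually called top-1 and top-5 accuracy: a guess counts as correct if the true identity appears among the model's first k ranked candidates. A minimal sketch of that calculation, with a hypothetical function name and toy data (not the researchers' code or results):

```python
from typing import Sequence


def top_k_accuracy(ranked_guesses: Sequence[Sequence[str]],
                   true_labels: Sequence[str], k: int) -> float:
    """Fraction of samples whose true label appears among the first k guesses.

    ranked_guesses[i] is the model's candidate list for sample i, ordered
    from most to least likely.
    """
    if len(ranked_guesses) != len(true_labels):
        raise ValueError("need one ranked list per label")
    hits = sum(1 for guesses, truth in zip(ranked_guesses, true_labels)
               if truth in guesses[:k])
    return hits / len(true_labels)


# Toy example: 4 obscured faces, each with 5 ranked candidate identities.
guesses = [
    ["alice", "bob", "carol", "dave", "erin"],
    ["bob", "alice", "erin", "carol", "dave"],
    ["carol", "dave", "alice", "bob", "erin"],
    ["erin", "dave", "carol", "alice", "bob"],
]
truth = ["alice", "alice", "frank", "dave"]
print(top_k_accuracy(guesses, truth, k=1))  # 0.25
print(top_k_accuracy(guesses, truth, k=5))  # 0.75
```

Top-k accuracy can only stay the same or rise as k grows, which is why each five-guess figure in the summary is higher than the single-guess one.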

14 of 139 comments

  1. I felt a disturbance in the force. by Hognoxious · · Score: 5, Funny

    The algorithm doesn't produce a deblurred image

    I felt a great disturbance in the force, as if a million Japanese porn fans cried out in disappointment.

    --
    Confucius say, "Find worm in apple - bad. Find half a worm - worse."
  2. Not too surprising... by orlanz · · Score: 3, Informative

    For a computer, most algorithms for comparing two pictures already work on what is effectively a blurred version of both. Most of these algorithms take samples (pixels) from the pictures and check whether the relationships within both sets of samples are the same or within a margin of deviation. There is little value in comparing pixel by pixel for exact matches. It's similar to matching human fingerprints.

    A blurred picture is like taking fewer samples from one picture and setting a wider margin of deviation.

    But for computers, 57% is pretty bad. 85% is also very bad, and that's when you are telling the machine the answer. At those rates, it's hard to do mass comparisons: there would be far too many false positives for any human to weed through. This will apply more to targeted searches where an investigator wants the 5 most probable matches to a blur. Unlike the researchers here, who know the answer beforehand, he still has to guess which one it actually is.

    In a criminal investigation, if we had a database of likely suspects, this would work. But we are all about mass collection of data data data. With a large population of pictures, the blur will probably match a lot more than 5.
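    A rough sketch of the comparison orlanz describes: sample a sparse grid of pixels and accept a match if every sampled pair is within a tolerance. The function name, the `step` and `tolerance` parameters and the toy images are hypothetical; real face matchers compare learned features, not raw pixels.

```python
from typing import List

Image = List[List[float]]  # grayscale rows


def roughly_match(a: Image, b: Image, step: int, tolerance: float) -> bool:
    """Compare two same-sized grayscale images on a sparse grid of samples.

    Only every `step`-th pixel in each direction is checked, and each sampled
    pair may differ by up to `tolerance`. A larger step and a wider tolerance
    model the commenter's analogy for comparing against a blurred picture.
    """
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("images must have the same shape")
    for y in range(0, len(a), step):
        for x in range(0, len(a[y]), step):
            if abs(a[y][x] - b[y][x]) > tolerance:
                return False
    return True


original = [[10, 20, 30, 40], [50, 60, 70, 80],
            [90, 100, 110, 120], [130, 140, 150, 160]]
different = [[12, 99, 29, 0], [0, 0, 0, 0],
             [91, 0, 108, 0], [0, 0, 0, 0]]
print(roughly_match(original, different, step=1, tolerance=5))  # False
print(roughly_match(original, different, step=2, tolerance=5))  # True
```

    The second call accepts a very different image because it only inspects four pixels, which is the false-positive problem the comment goes on to raise.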

    1. Re:Not too surprising... by AmiMoJo · · Score: 5, Interesting

      That pixelated images are insecure has been known for years. I seem to recall it was even mentioned on Slashdot. There are many other attacks; for example, if you have text (like a number plate), you can run a dictionary attack: pass candidate images through the same pixelation filter and select the closest match.

      Black bars have always been the preferred method.
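      The dictionary attack AmiMoJo mentions works because pixelation is deterministic: pixelate every candidate the same way and keep whichever matches best. A minimal sketch under loudly stated assumptions: the 3x5 bitmap `FONT`, the four-digit plate format and the 2-pixel mosaic aligned to the glyph grid are all invented for illustration, and real plates, fonts and filters are messier.

```python
from itertools import product
from typing import Iterable, List

# Hypothetical 3x5 bitmap font (1 = ink). A toy, not any real plate font.
FONT = {
    "0": ["111", "101", "101", "101", "111"],
    "1": ["010", "110", "010", "010", "111"],
    "2": ["111", "001", "111", "100", "111"],
    "3": ["111", "001", "111", "001", "111"],
    "4": ["101", "101", "111", "001", "001"],
    "5": ["111", "100", "111", "001", "111"],
    "6": ["111", "100", "111", "101", "111"],
    "7": ["111", "001", "001", "001", "001"],
    "8": ["111", "101", "111", "101", "111"],
    "9": ["111", "101", "111", "001", "111"],
}

Grid = List[List[float]]


def render(text: str) -> Grid:
    """Draw text with FONT, leaving a one-pixel blank column after each glyph."""
    rows = []
    for r in range(5):
        row: List[float] = []
        for ch in text:
            row.extend(float(bit) for bit in FONT[ch][r])
            row.append(0.0)  # gap column
        rows.append(row)
    return rows


def pixelate(img: Grid, block: int) -> Grid:
    """Replace each block x block tile with its mean (edge tiles may be smaller)."""
    height, width = len(img), len(img[0])
    out = []
    for y0 in range(0, height, block):
        out_row = []
        for x0 in range(0, width, block):
            tile = [img[y][x]
                    for y in range(y0, min(y0 + block, height))
                    for x in range(x0, min(x0 + block, width))]
            out_row.append(sum(tile) / len(tile))
        out.append(out_row)
    return out


def distance(a: Grid, b: Grid) -> float:
    """Sum of squared differences between two equally sized grids."""
    return sum((p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def dictionary_attack(target: Grid, candidates: Iterable[str], block: int) -> List[str]:
    """Pixelate every candidate the same way and keep those closest to the target."""
    best: List[str] = []
    best_score = float("inf")
    for text in candidates:
        score = distance(pixelate(render(text), block), target)
        if score < best_score:
            best, best_score = [text], score
        elif score == best_score:
            best.append(text)
    return best


all_plates = ["".join(digits) for digits in product("0123456789", repeat=4)]
found = dictionary_attack(pixelate(render("4721"), block=2), all_plates, block=2)
ambiguous = dictionary_attack(pixelate(render("9021"), block=2), all_plates, block=2)
print(found)      # ['4721']
print(ambiguous)  # ['0021', '0921', '9021', '9921']
```

      In this toy font, 0 and 9 pixelate to identical blocks, so the second search returns four equally good plates. The attack narrows the search rather than always pinning down one answer.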

      --
      const int one = 65536; (Silvermoon, Texture.cs)
      SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC
    2. Re:Not too surprising... by Script+Cat · · Score: 2

      This is just another false evidence generator. Give me a pixelated image and I'll paint any number out of an infinity of images that will average out to that patch of squares.
      The data is missing; if you add data, that's false evidence.
      There are dogs everywhere!
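      Script+Cat's point that many originals average out to the same squares is easy to check by brute force. This sketch assumes black-and-white pixels and a single 2x2 block, and groups all 16 possible blocks by the one grey value that pixelation keeps:

```python
from collections import defaultdict
from itertools import product
from typing import Dict, List, Tuple

Tile = Tuple[int, ...]  # a 2x2 block of 0/1 pixels, read row by row


def preimages_by_mean() -> Dict[float, List[Tile]]:
    """Group all 16 binary 2x2 tiles by the single grey value pixelation keeps."""
    groups: Dict[float, List[Tile]] = defaultdict(list)
    for tile in product((0, 1), repeat=4):
        groups[sum(tile) / 4].append(tile)
    return dict(groups)


groups = preimages_by_mean()
for mean in sorted(groups):
    print(mean, len(groups[mean]))
# 0.0 1
# 0.25 4
# 0.5 6
# 0.75 4
# 1.0 1
```

      One mid-grey block already has six possible originals, and an image made of n such blocks has 6**n, so any reconstruction has to choose among them using outside knowledge.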

  3. Re:Research is a bit blurry by TheRaven64 · · Score: 2

    Exactly. Adding noise... adds noise. If you have a relatively small data set, then the edit distance between the blurred image and one or two of the originals is likely to be smaller than the distance to the others, which is what this kind of system determines. If you have a very large dataset, then you're going to end up with far more false positives.

    To give a simple example, consider a data set of four people: two white and two black, with one of each pair blond and the other dark-haired. You add a lot of noise, but you can still effectively identify them by averaging the colour in the top third and bottom two thirds of the image. You should get 100% accuracy even with a lot of noise in the image. Now consider doing the same thing on a data set of 100 people in those same four categories. At best, you'll narrow it down to about a quarter of the people.
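    The four-people example can be simulated. In this sketch each toy "image" is a single column of 30 brightness values (top third hair, bottom two thirds skin); the brightness levels and the uniform per-pixel noise of ±0.2 are made-up assumptions, and identification means finding everyone whose clean features are nearest to the noisy image's averages:

```python
import random
from typing import Dict, List, Tuple

Features = Tuple[float, float]  # (hair brightness, skin brightness), 0 = black

# Hypothetical brightness values for the comment's four categories.
CATEGORY_LOOKS: List[Features] = [(0.9, 0.8), (0.1, 0.8), (0.9, 0.3), (0.1, 0.3)]
ROWS = 30  # each toy "image" is a single column of 30 pixels


def draw(look: Features, noise: float, rng: random.Random) -> List[float]:
    """Top third shows hair, bottom two thirds skin, plus uniform noise per pixel."""
    hair, skin = look
    clean = [hair] * (ROWS // 3) + [skin] * (ROWS - ROWS // 3)
    return [p + rng.uniform(-noise, noise) for p in clean]


def features(image: List[float]) -> Features:
    """Average brightness of the top third and of the bottom two thirds."""
    top, bottom = image[: ROWS // 3], image[ROWS // 3:]
    return sum(top) / len(top), sum(bottom) / len(bottom)


def candidates(image: List[float], people: Dict[str, Features]) -> List[str]:
    """Everyone whose clean features are closest to the noisy image's features."""
    obs = features(image)
    dist = {name: (f[0] - obs[0]) ** 2 + (f[1] - obs[1]) ** 2
            for name, f in people.items()}
    best = min(dist.values())
    return sorted(name for name, d in dist.items() if d == best)


rng = random.Random(0)
small = {f"person{i}": CATEGORY_LOOKS[i] for i in range(4)}
large = {f"person{i}": CATEGORY_LOOKS[i % 4] for i in range(100)}

noisy = draw(small["person2"], noise=0.2, rng=rng)
print(candidates(noisy, small))       # ['person2']
print(len(candidates(noisy, large)))  # 25
```

    With four people the noisy image is always identified, because this bounded noise can never move an average across a category boundary; with 100 people in the same four categories, the best these features can do is a list of 25 indistinguishable candidates.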

    Neural networks aren't magic. They can approximate any continuous function, and training one is often easier than working out what the function you actually want would look like. If there is enough information in the source data for discrimination, then a neural network can be trained to extract it and perform the classification. If there isn't, then you're out of luck.

    Often, however, these things work because the blurring is not actually a very lossy transform. It's a convolution filter that only discards a very small amount of information, but does so in a way that confuses the human brain (the opposite of something like JPEG, which tries to throw away only the information that the human visual cortex doesn't use to identify the image). A number of such transforms have been shown to be either fully reversible, or partially reversible such that you can identify the original quite clearly.
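    The claim that a blur can be a convolution that loses very little information can be checked on a toy 1D signal. If a circular blur's kernel has no zero in its discrete Fourier transform, dividing the spectra undoes it exactly. This sketch assumes circular boundaries, a known kernel and no rounding or clipping, none of which hold for a real blurred JPEG, so real-world reversal is at best partial:

```python
import cmath
from typing import List, Sequence


def dft(x: Sequence[complex]) -> List[complex]:
    """Naive discrete Fourier transform (fine for tiny toy signals)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum: Sequence[complex]) -> List[complex]:
    """Inverse of dft()."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n)
                for k in range(n)) / n
            for t in range(n)]


def circular_blur(signal: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """y[t] = sum_j kernel[j] * signal[(t - j) mod n]: a circular convolution."""
    n = len(signal)
    return [sum(kernel[j] * signal[(t - j) % n] for j in range(n)) for t in range(n)]


def deblur(blurred: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Invert the blur by dividing spectra; refuse if the kernel's spectrum has a zero."""
    k_spec = dft(kernel)
    if min(abs(v) for v in k_spec) < 1e-9:
        raise ValueError("kernel spectrum has a zero: information really is lost")
    y_spec = dft(blurred)
    return [v.real for v in idft([y / k for y, k in zip(y_spec, k_spec)])]


# Hypothetical 8-sample "image row"; kernels are stored circularly,
# centred on index 0, so [0.6, 0.2, ..., 0.2] is the 3-tap blur (0.2, 0.6, 0.2).
row = [0.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0, 0.0]
gentle = [0.6, 0.2, 0.0, 0.0, 0.0, 0.0, 0.0, 0.2]
lossy = [0.5, 0.25, 0.0, 0.0, 0.0, 0.0, 0.0, 0.25]

restored = deblur(circular_blur(row, gentle), gentle)
print(all(abs(a - b) < 1e-9 for a, b in zip(row, restored)))  # True

try:
    deblur(circular_blur(row, lossy), lossy)
except ValueError as err:
    print(err)
```

    The second kernel, (0.25, 0.5, 0.25), has a zero at the highest frequency, so a pixel-by-pixel alternating pattern is erased entirely and cannot be recovered by this method.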

    --
    I am TheRaven on Soylent News
  4. Re:Why? by Big+Hairy+Ian · · Score: 4, Interesting

    I'm just reminded of the case (about a decade ago) of a pedophile who published photos of himself abusing children with his face obfuscated by the Photoshop swirl tool. The police desperately wanted to ID him but couldn't deobfuscate the photos, so they published them minus the abuse sections, hoping that members of the public might identify the man based on his surroundings. Of course, the public, not being utter fucknuggets, quickly deswirled the photo and published it on the internet. I never did find out how long the guy survived.
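    Deswirling is possible because a swirl is a geometric warp rather than a blur: each point is rotated about the centre by an angle that depends only on its distance from the centre, and rotation preserves that distance. Running the same warp with the opposite angle therefore puts every point back. The sketch below works on coordinates only, and its linear falloff is an illustrative guess, not Photoshop's actual formula:

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def swirl(p: Point, centre: Point, strength: float, radius: float) -> Point:
    """Rotate p about centre by an angle that fades linearly to zero at `radius`.

    The falloff strength * (1 - r / radius) is a hypothetical choice.
    """
    dx, dy = p[0] - centre[0], p[1] - centre[1]
    r = math.hypot(dx, dy)
    if r >= radius:
        return p  # outside the swirl region nothing moves
    angle = strength * (1 - r / radius)
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    return (centre[0] + dx * cos_a - dy * sin_a,
            centre[1] + dx * sin_a + dy * cos_a)


centre = (50.0, 50.0)
original = (62.0, 41.0)
warped = swirl(original, centre, strength=3.0, radius=40.0)
unwarped = swirl(warped, centre, strength=-3.0, radius=40.0)  # same warp, opposite angle
print(warped)
print(math.dist(original, unwarped) < 1e-9)  # True
```

    On a real image the pixels are resampled onto a grid at each step, so the round trip is approximate rather than exact.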

    --

    Build a Man a Fire, and He'll Be Warm for a Day. Set a Man on Fire, and He'll Be Warm for the Rest of His Life.

  5. Sensationalist blabber.... by Anonymous Coward · · Score: 2, Informative

    It is a fundamental law of computer science that you cannot increase the amount of information in a given dataset. In this case, that dataset is the blurred image combined with the learned statistical averages of a human face.

    Once an image has been blurred (information has been deleted), it cannot be recreated. What you can do is apply statistical averages in the hope of getting something that might resemble the original information. It will, however, be just that: cosmetic improvements based on statistical averages.

    If sufficient information has been removed by blurring the image, the deblurring process, whether you call it AI or statistical averaging, cannot recreate a uniquely identifiable image.

    1. Re:Sensationalist blabber.... by Blaskowicz · · Score: 2

      Though if you have many blurred pictures of a face or license plate, you might be onto something. There might be quite a lot of information in a minute-long video that includes a blurred face.
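      A toy illustration of why several frames can carry more than one: if the pixelation grid shifts by one pixel between two frames, the two mosaics together can determine a row that neither determines alone. This sketch assumes a 1D row, 2-pixel blocks, a shift of exactly one pixel and exact (unrounded) averages; real video adds motion, compression and rounding. `Fraction` keeps the arithmetic exact:

```python
from fractions import Fraction
from typing import Dict, List, Sequence, Tuple

Segment = Tuple[int, int]  # half-open pixel range [start, end)


def pixelate_1d(row: Sequence[int], offset: int, block: int = 2) -> Dict[Segment, Fraction]:
    """Average a row in blocks whose grid starts at `offset` (first block may be shorter)."""
    starts = list(range(offset, len(row), block))
    if offset > 0:
        starts.insert(0, 0)  # partial block at the left edge
    ends = starts[1:] + [len(row)]
    return {(s, e): Fraction(sum(row[s:e]), e - s) for s, e in zip(starts, ends)}


def reconstruct(frame_a: Dict[Segment, Fraction],
                frame_b: Dict[Segment, Fraction], n: int) -> List[Fraction]:
    """Recover the row from two 2-pixel mosaics whose grids differ by one pixel."""
    known = {**frame_a, **frame_b}
    x = [known[(0, 1)]]  # the shifted frame's edge block is a single pixel
    for i in range(n - 1):
        pair_mean = known[(i, i + 2)]   # from frame A if i is even, frame B if odd
        x.append(2 * pair_mean - x[i])  # x[i+1] = (x[i] + x[i+1]) - x[i]
    return x


row = [3, 7, 1, 8, 2, 9, 4]
frame_a = pixelate_1d(row, offset=0)
frame_b = pixelate_1d(row, offset=1)
print(reconstruct(frame_a, frame_b, len(row)) == row)  # True
```

      Each frame alone gives only four averages for seven pixels; together they give eight, enough to solve for the row one pixel at a time.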

  6. Re:Why? by wonkey_monkey · · Score: 2

    It was the police who deswirled the photographs.

    --
    systemd is Roko's Basilisk.
  7. Limited usefulness by hackertourist · · Score: 3, Insightful

    They had a photo with an obscured face and the same photo with the face unobscured in their training set. It seems obvious a computer can match those two. The solution would be to use unique photos, not uploaded anywhere, as the source for obscuration and only publish the obscured version.

  8. Re:Why? by Anonymous Coward · · Score: 3, Informative

    The guy was caught in Thailand. The German police "deswirled" his photograph:

    https://en.wikipedia.org/wiki/Christopher_Paul_Neil

  9. Re:Old school censoring.... by AHuxley · · Score: 2

    Depends on who is after the info, what contacts they have, and at what price.
    Law enforcement, former law enforcement and private detectives might all have their contacts.
    The other issue is state police, federal agencies and the military just seeking all pics online for matching faces, passenger faces and plate numbers in case they are ever seen near any sensitive site.
    The private sector will often have its own security walk out, take a picture, use facial recognition, and try to find a plate number.
    A protester, or someone doing a First Amendment audit, might be walking around, careful never to trespass, but their transport might be within walking distance. Law enforcement may not wish to be on camera doing a chat down, so they drive around until they find the plate number of interest.
    A later step is to see whether the plate is in any state, federal or military social media database.
    Private detectives also have access to very large private-sector databases that offer, as a service, lots of images that were once, or are now, on social media.
    A lot of different groups will hire private detectives to run plates and faces on anyone seeking work or anyone new asking questions. Does the resume really hold up? Does the car match the history? Citizen journalists might have the paperwork, haircut, accent, life story and friend on the inside, but then walk back to the car and get photographed.
    So across the state, federal, military and private sectors, there is a lot of interest in social media: any kind of image, including decades of older media and early social media.
    Removing something public from social media quickly is often too late, as federal, state and private organizations already have that data. An image of a license plate is all in the mix, and many interested groups are collecting it.

    --
    Domestic spying is now "Benign Information Gathering"
  10. The correct term is "differently resolved". by Pseudonymous+Powers · · Score: 2

    Why would you show a blurred photo anyway? Show the face in full, or don't show it at all. There is no compromise here.

    That's no image filter, that's just the way my face naturally looks, you insensitive clod!

  11. Re:"Deep learning" by Anonymous Coward · · Score: 2, Informative

    "Deep learning" refers to a particular configuration of neural network: many layers stacked one after another. Historically we couldn't use such deep networks because we didn't know how to train them in any reasonable amount of time. Then we figured out how, and discovered that deep nets work far better than traditional shallow neural networks.

    So you get more specific and descriptive going from: algorithms -> AI -> machine learning -> neural networks -> deep learning.
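    A minimal forward pass shows what "deep" means structurally: layers applied one after another, each feeding its output to the next. The weights below are hypothetical and untrained, and this is plain Python rather than the Torch models used in the research:

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Layer = Tuple[Matrix, List[float]]  # (weights, bias)


def dense(x: Sequence[float], weights: Matrix, bias: Sequence[float]) -> List[float]:
    """One fully connected layer: weights @ x + bias."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(v: Sequence[float]) -> List[float]:
    """Element-wise max(0, a), the nonlinearity between layers."""
    return [max(0.0, a) for a in v]


def forward(x: Sequence[float], layers: Sequence[Layer]) -> List[float]:
    """Apply layers in sequence; ReLU between layers, none after the last one."""
    out = list(x)
    for i, (weights, bias) in enumerate(layers):
        out = dense(out, weights, bias)
        if i < len(layers) - 1:
            out = relu(out)
    return out


# Three stacked layers with made-up weights: 2 inputs -> 3 -> 3 -> 1 output.
layers: List[Layer] = [
    ([[1.0, -1.0], [0.5, 0.5], [-1.0, 2.0]], [0.0, 0.0, 0.0]),
    ([[1.0, 0.0, 1.0], [0.0, 1.0, 0.0], [1.0, 1.0, 1.0]], [0.0, -0.5, 0.0]),
    ([[1.0, -1.0, 0.5]], [0.1]),
]
out = forward([1.0, 2.0], layers)
print(out)
```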