Slashdot Mirror


Why Anonymized Data Isn't

Ars has a review of recent research, and a summary of the history, in the field of reidentification — identifying people from anonymized data. Paul Ohm's recent paper is an elaboration of what Ohm terms a central reality of data collection: "Data can either be useful or perfectly anonymous but never both." "...in 2000, [researcher Latanya Sweeney] showed that 87 percent of all Americans could be uniquely identified using only three bits of information: ZIP code, birthdate, and sex. ... For almost every person on earth, there is at least one fact about them stored in a computer database that an adversary could use to blackmail, discriminate against, harass, or steal the identity of him or her. I mean more than mere embarrassment or inconvenience; I mean legally cognizable harm. ... Reidentification science disrupts the privacy policy landscape by undermining the faith that we have placed in anonymization."
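To make the ZIP code, birthdate, and sex claim concrete, here is a minimal sketch of how one might measure what share of records becomes unique as more fields are combined. The function name `unique_fraction`, the field names, and the five toy records are all made up for illustration; this is not Sweeney's method or data, just the counting idea behind it.

```python
from collections import Counter

def unique_fraction(records, keys):
    """Fraction of records whose combination of `keys` appears exactly once."""
    if not records:
        return 0.0
    # Count how many records share each combination of the chosen fields.
    combos = Counter(tuple(r[k] for k in keys) for r in records)
    unique = sum(1 for r in records if combos[tuple(r[k] for k in keys)] == 1)
    return unique / len(records)

# Hypothetical toy records, invented for this example.
people = [
    {"zip": "02138", "birthdate": "1965-07-31", "sex": "F"},
    {"zip": "02138", "birthdate": "1965-07-31", "sex": "M"},
    {"zip": "02138", "birthdate": "1970-01-01", "sex": "M"},
    {"zip": "20500", "birthdate": "1970-01-01", "sex": "M"},
    {"zip": "20500", "birthdate": "1970-01-01", "sex": "M"},
]

print(unique_fraction(people, ["sex"]))                      # 0.2
print(unique_fraction(people, ["zip", "sex"]))               # 0.2
print(unique_fraction(people, ["zip", "birthdate", "sex"]))  # 0.6
```

In this toy set, no single field does much, but the three together single out most of the records. The only two that stay indistinguishable are the ones sharing identical values in every field.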

4 of 280 comments

  1. Duh. by SatanicPuppy · · Score: 3, Informative

    Am I the only one who always gives their birthday as 01/01/1970 and their zip code as 20500?

    I mean, seriously. They don't need to know. Why would I give 'em the right numbers? They're lucky I even allow them to have rough demographic data.

    --
    ad logicam Claiming a proposition is false because it was presented as the conclusion of a fallacious argument.
  2. Re:Paul Ohm? by natehoy · · Score: 4, Informative

    Nonsense, it could be an extension of the current Law:

    "In electrical circuits, Ohm's law states that the current through a conductor between two points is directly proportional to the potential difference or voltage across the two points, and inversely proportional to the resistance between them. In data anonymity, the law states that the general usefulness of any set of data that originally contained personally-identifiable information is inversely proportional to the degree of anonymity applied to said data."

    See, one simple law to memorize, and now data analysts learn just a teensy bit about electricity and EEs learn just a teensy bit about data anonymization.

    --
    "This post contains words, known to the State of California to cause thought. Wash brain thoroughly after reading."
  3. Re:Three things? Really? by Daniel_Staal · · Score: 5, Informative

    That Paradox ignores the year. Add that in and it starts to become harder.

    --
    'Sensible' is a curse word.
  4. Re:Paul Ohm? by Beardo+the+Bearded · · Score: 4, Informative

    Okay, let's take a road. The speed at which traffic can travel depends on the quality of the surface, gradient, camber, zoning, etc. Let's call this the "road conditions", with a lower number being better roads.

    The number of cars that want to get through that road is a primary unit, which we can refer to as the "volume of traffic".

    The third major criterion is the speed at which the traffic actually flows. This is the "actual flow" of traffic -- in other words, the "influence of other cars" on the traffic congestion.

    In other words:
    volume = influence of traffic * road conditions

    or:
    V = IR

    --

    ---
    ECHELON is a government program to find words like bomb, jihad, plutonium, assassinate, and anarchy.
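
On the point in comment 3 that adding the year makes a match harder: assuming the "Paradox" meant there is the birthday paradox, a rough sketch of the arithmetic follows. It assumes birthdays are independent and uniformly spread, and it uses a hypothetical 80-year span of birth years; the function name `collision_probability` is invented for this example.

```python
def collision_probability(group_size, possible_values):
    """Chance that at least two of `group_size` people share a value,
    assuming values are independent and uniform over `possible_values`."""
    p_all_distinct = 1.0
    for i in range(group_size):
        # The (i+1)-th person must avoid the i values already taken.
        p_all_distinct *= (possible_values - i) / possible_values
    return 1.0 - p_all_distinct

# Day and month only: 365 possible values (leap days ignored).
day_only = collision_probability(23, 365)

# Full birthdate: 365 days times an assumed 80 birth years.
with_year = collision_probability(23, 365 * 80)

print(f"day and month only: {day_only:.3f}")
print(f"with birth year:    {with_year:.4f}")
```

With 23 people, a shared day and month is slightly more likely than not, while a shared full birthdate stays under one percent under these assumptions. That gap is why a full birthdate narrows down a person so much more than a day and month alone.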