Slashdot Mirror


AOL, Netflix and the End of Open Research

An anonymous reader writes "In 2006, heads rolled at AOL after the company released anonymized logs of user searches. With last week's announcement that researchers had been able to learn the identities of users in the scrubbed Netflix dataset, could the days of companies sharing data with academic researchers be numbered? Shortly after the AOL incident, Google's Eric Schmidt called the data release 'a terrible thing,' and assured the public that 'this kind of thing could not happen at Google.' Will any high-tech company ever take this kind of chance again? If not, how will this impact research and the development of future technologies that could have come from the study of real data?"

4 of 85 comments

  1. Correlations by Lachryma · · Score: 5, Insightful
    The identities were learned because the users shared their movie preference information with IMDB.

    I don't see this as a problem, yet.

  2. k-anonymity and l-diversity by omnirealm · · Score: 5, Informative

    There exist effective techniques that can anonymize the data in order to thwart attempts to correlate identities, while still preserving the statistical properties of the data that make it useful to researchers. They include k-anonymity and l-diversity:

    http://privacy.cs.cmu.edu/people/sweeney/kanonymity.html

    http://www.cs.cornell.edu/~dkifer/papers/ldiversity.pdf
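
    As a rough illustration of what those two properties mean, here is a minimal sketch that only checks whether a table already satisfies them; it does not perform the generalization or suppression needed to get there. The field names (zip, age, genre), the toy records, and the helper function names are all hypothetical, and the l-diversity check uses the simplest "distinct values" reading of the definition.

```python
from collections import defaultdict


def group_by_quasi_identifiers(records, quasi_identifiers):
    """Group records that share the same quasi-identifier values."""
    groups = defaultdict(list)
    for record in records:
        key = tuple(record[q] for q in quasi_identifiers)
        groups[key].append(record)
    return groups


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every quasi-identifier combination covers at least k records."""
    groups = group_by_quasi_identifiers(records, quasi_identifiers)
    return all(len(group) >= k for group in groups.values())


def is_l_diverse(records, quasi_identifiers, sensitive, l):
    """True if every group holds at least l distinct sensitive values."""
    groups = group_by_quasi_identifiers(records, quasi_identifiers)
    return all(
        len({record[sensitive] for record in group}) >= l
        for group in groups.values()
    )


# Hypothetical, already-generalized toy data (not from any real dataset).
records = [
    {"zip": "130**", "age": "20-29", "genre": "horror"},
    {"zip": "130**", "age": "20-29", "genre": "horror"},
    {"zip": "148**", "age": "30-39", "genre": "drama"},
    {"zip": "148**", "age": "30-39", "genre": "comedy"},
]
QUASI_IDENTIFIERS = ["zip", "age"]

two_anonymous = is_k_anonymous(records, QUASI_IDENTIFIERS, 2)
three_anonymous = is_k_anonymous(records, QUASI_IDENTIFIERS, 3)
two_diverse = is_l_diverse(records, QUASI_IDENTIFIERS, "genre", 2)
```

    In this toy table every (zip, age) pair covers two records, so it is 2-anonymous, yet the first group contains only "horror", so it is not 2-diverse: anyone known to be in that group has their genre revealed. That gap is the motivation for layering l-diversity on top of k-anonymity.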

    --
    An unjust law is no law at all. - St. Augustine
  3. Inviting drama by Rob T Firefly · · Score: 5, Funny

    Google's Eric Schmidt called the data release 'a terrible thing,' and assured the public that 'this kind of thing could not happen at Google.' Eric, you fool! Have you no concept of the world's tendency toward drama and hilarity? Loudly declaring "this kind of thing could never happen at Google" is like saying "at least it's not raining" or "it's a million-to-one chance" or some other damn fool thing that will prove you wrong nine times out of ten.
  4. Re:Opt-in by kcwhitta · · Score: 5, Insightful

    The problem with opt-in statistical gathering is that it can skew a sample, subtly biasing it. This would invalidate a lot of scientific research.
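
    A small sketch of how that skew can arise, using entirely made-up numbers: assume 10% of users are heavy raters (20 ratings a month) and 90% are light raters (2 a month), and assume heavy raters are far more likely to opt in (80% versus 10%). None of these figures come from Netflix, AOL, or any real study; they are only there to make the arithmetic visible.

```python
def build_population():
    """Build a hypothetical user base; all numbers are invented."""
    # 100 heavy raters, 80 of whom opt in.
    heavy = [
        {"ratings_per_month": 20, "opted_in": i < 80} for i in range(100)
    ]
    # 900 light raters, 90 of whom opt in.
    light = [
        {"ratings_per_month": 2, "opted_in": i < 90} for i in range(900)
    ]
    return heavy + light


def mean_ratings(users):
    """Average ratings per month across the given users."""
    return sum(user["ratings_per_month"] for user in users) / len(users)


population = build_population()
opt_in_sample = [user for user in population if user["opted_in"]]

true_mean = mean_ratings(population)       # (100*20 + 900*2) / 1000
opt_in_mean = mean_ratings(opt_in_sample)  # (80*20 + 90*2) / 170

print(f"whole population: {true_mean:.2f} ratings/month")
print(f"opt-in sample:    {opt_in_mean:.2f} ratings/month")
```

    Under these assumptions the opt-in sample reports an average well over twice the true one, even though no individual record is wrong. The sample is simply dominated by the kind of user who chooses to participate.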