Slashdot Mirror


AOL, Netflix and the End of Open Research

An anonymous reader writes "In 2006, heads rolled at AOL after the company released anonymized logs of user searches. With last week's announcement that researchers had been able to learn the identities of users in the scrubbed Netflix dataset, could the days of companies sharing data with academic researchers be numbered? Shortly after the AOL incident, Google's Eric Schmidt called the data release 'a terrible thing,' and assured the public that 'this kind of thing could not happen at Google.' Will any high tech company ever take this kind of chance again? If not, how will this impact research and the development of future technologies that could have come from the study of real data?"

4 of 85 comments

  1. k-anonymity and l-diversity by omnirealm · · Score: 5, Informative

    There exist effective techniques that can anonymize the data in order to thwart attempts to correlate identities, while still preserving the statistical properties of the data that make it useful to researchers. They include k-anonymity and l-diversity:

    http://privacy.cs.cmu.edu/people/sweeney/kanonymity.html

    http://www.cs.cornell.edu/~dkifer/papers/ldiversity.pdf
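
    As a rough illustration of what these two properties measure, here is a minimal sketch. The table, the column names (`zip`, `age_band`, `query_topic`) and the helper functions are all made up for the example, and the l-diversity check uses only the simplest "count distinct sensitive values" reading, so treat it as a toy rather than what the linked papers specify in full.

```python
from collections import Counter, defaultdict

def k_anonymity(rows, quasi_identifiers):
    """Size of the smallest group of rows sharing the same quasi-identifier values."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values())

def l_diversity(rows, quasi_identifiers, sensitive):
    """Smallest number of distinct sensitive values found in any group."""
    groups = defaultdict(set)
    for row in rows:
        groups[tuple(row[q] for q in quasi_identifiers)].add(row[sensitive])
    return min(len(values) for values in groups.values())

def generalize_zip(row, digits=3):
    """Coarsen a ZIP code by keeping only its leading digits."""
    out = dict(row)
    out["zip"] = row["zip"][:digits] + "*" * (len(row["zip"]) - digits)
    return out

# Hypothetical records; nothing here comes from a real dataset.
rows = [
    {"zip": "15213", "age_band": "20-29", "query_topic": "flu"},
    {"zip": "15217", "age_band": "20-29", "query_topic": "cars"},
    {"zip": "15213", "age_band": "30-39", "query_topic": "flu"},
    {"zip": "15232", "age_band": "30-39", "query_topic": "loans"},
]
qi = ["zip", "age_band"]

before = k_anonymity(rows, qi)                      # every row is unique
generalized = [generalize_zip(r) for r in rows]
after = k_anonymity(generalized, qi)                # rows now hide in pairs
diversity = l_diversity(generalized, qi, "query_topic")
print(before, after, diversity)
```

    Coarsening the ZIP code is what moves each record into a group of at least two; the trade-off is that researchers lose the finer geographic detail.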

    --
    An unjust law is no law at all. - St. Augustine
  2. Re:The Impact by mabhatter654 · · Score: 2, Informative

    But it's the company's data, not yours. Once they strip out your name and such, your privacy claims are limited. Not that people won't piece things back together using an outside database. This is what happened in the Netflix case: they were able to guess user #3956's name from ANOTHER website. They could probably keep the info off the net-at-large by only letting researchers use their equipment under NDA, so not everybody has this info.
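
    The "piece things back together using an outside database" step can be sketched roughly like this. It is a toy, not the method the researchers actually used: the user IDs, titles, dates, the `alice_example` account and the matching rule (same title, same rating, dates within a few days) are all assumptions made up for illustration.

```python
from datetime import date

# Hypothetical "scrubbed" dataset: user ID -> set of (title, rating, date).
anonymized = {
    3956: {
        ("Movie A", 5, date(2005, 3, 1)),
        ("Movie B", 2, date(2005, 3, 9)),
        ("Movie C", 4, date(2005, 4, 2)),
    },
    1200: {
        ("Movie A", 3, date(2005, 6, 1)),
        ("Movie D", 4, date(2005, 6, 5)),
    },
}

# Hypothetical public reviews posted under a real name on another site.
public_reviews = {
    "alice_example": [
        ("Movie A", 5, date(2005, 3, 2)),
        ("Movie B", 2, date(2005, 3, 10)),
    ],
}

def match_score(anon_records, public_records, max_days=3):
    """Count public reviews that line up with a record in the scrubbed data."""
    score = 0
    for title, rating, when in public_records:
        if any(
            t == title and r == rating and abs((d - when).days) <= max_days
            for t, r, d in anon_records
        ):
            score += 1
    return score

def best_match(public_records, dataset):
    """Return (user_id, score) for the scrubbed user that best fits the reviews."""
    scores = {uid: match_score(recs, public_records) for uid, recs in dataset.items()}
    uid = max(scores, key=scores.get)
    return uid, scores[uid]

print(best_match(public_reviews["alice_example"], anonymized))
```

    Even two overlapping reviews single out one row here, which is why stripping the name column alone does so little.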

    As far as "legal searching" goes, they already do this... legally, they just pay money for private access to these databases with your name still attached! Anything tied to banking or SSN the govt already has in spades.

  3. Re:Researchers are to blame. by palegray.net · · Score: 2, Informative

    This is kinda like saying security researchers are to blame for discovering and publishing weaknesses in software. Responsible citizens just pretend everything is fine and wait for someone really bad to discover the same weaknesses and exploit them. Because it's so much easier chasing down criminals than it is to fix problems in the first place by adopting better security practices. I guess we could just arrest all researchers who publicize uncomfortable truths. What's the number to Adobe's legal department? I'm sure they still have a few district attorneys on speed dial...

  4. ISPs already sell all your web browsing logs by Mal+Reynolds · · Score: 2, Informative

    This is just the tip of the iceberg. If you live in the US, it's likely that logs of all your web activity are being sold to clickstream companies. The data logs being sold by the ISPs seem to use the exact same sort of inadequate anonymity practices as were used by AOL.

    The problem is that no matter how well the data is cloaked, a user's browsing habits can easily make the anonymity worthless. As has been seen in the cases of Netflix and AOL, it's easy to figure out who a person is by simply looking at anonymized logs. A single visit to a social networking site is often enough to make a good guess. But when a specific anonymized IP address visits the same page of a social networking site, or edits their profile at a social networking site, or reviews an item at a vendor site, the real identity of that "anonymized" IP address is completely confirmed.
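
    Here is a minimal sketch of that failure mode, under loudly stated assumptions: the log format, the salted-hash "cloaking", the `example` domains and the idea that only an account owner hits a `/profile/<name>/edit` URL are all invented for illustration, not a description of what any ISP actually does.

```python
import hashlib
from collections import Counter

def cloak_ip(ip, salt="hypothetical-salt"):
    """Replace an IP with a stable pseudonym (assumed cloaking scheme)."""
    return hashlib.sha256((salt + ip).encode()).hexdigest()[:12]

# Hypothetical clickstream: (ip, url). Addresses are documentation ranges.
log = [
    ("203.0.113.7", "social.example/profile/jdoe/edit"),
    ("203.0.113.7", "shop.example/search?q=tents"),
    ("203.0.113.7", "social.example/profile/jdoe/edit"),
    ("198.51.100.4", "social.example/profile/jdoe"),
    ("198.51.100.4", "news.example/"),
]

# What the buyer of the logs would see: pseudonyms instead of IPs.
cloaked_log = [(cloak_ip(ip), url) for ip, url in log]

def guess_identities(entries, marker="/edit"):
    """Map each pseudonym to the profile name it edits most often."""
    guesses = {}
    for pseudonym, url in entries:
        parts = url.split("/")
        if url.endswith(marker) and "profile" in parts:
            name = parts[parts.index("profile") + 1]
            guesses.setdefault(pseudonym, Counter())[name] += 1
    return {p: c.most_common(1)[0][0] for p, c in guesses.items()}

print(guess_identities(cloaked_log))
```

    Because the pseudonym is stable across requests, the cloaked log still ties every other page view to whoever edited that profile.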

    Simply cloaking an IP address will never provide anonymity. But the companies that purchase your web surfing logs would have no use for logs that weren't attached to a single user. Unless the ISPs were to keep track of and filter out every single vendor site which revealed a user's real name, there would seem to be no safe way to anonymize user logs. Since there are countless numbers of web forums, vendors, and social networking sites, it would seem technically impossible to truly provide any safe level of anonymity for user logs. Selling these logs is just a bad practice that needs to be stopped.

    I can only wonder why the EFF and other organizations haven't made a bigger deal about this. These ISPs are selling all of their users' web logs. I cannot imagine any effective way the ISPs could ever anonymize this data. More info: http://wanderingstan.com/2007-03-19/is_comcast_selling_your_clickstream_audio_transcript http://arstechnica.com/news.ars/post/20070315-your-isp-may-be-selling-your-web-clicks.html