AOL, Netflix and the End of Open Research
An anonymous reader writes "In 2006, heads rolled at AOL after the company released anonymized logs of user searches. With last week's announcement that researchers had been able to learn the identities of users in the scrubbed Netflix dataset, could the days of companies sharing data with academic researchers be numbered? Shortly after the AOL incident, Google's Eric Schmidt called the data release 'a terrible thing,' and assured the public that 'this kind of thing could not happen at Google.' Will any high-tech company ever take this kind of chance again? If not, how will this impact research and the development of future technologies that could have come from the study of real data?"
I don't see this as a problem, yet.
There are established techniques for anonymizing data so that records can't be correlated back to identities, while still preserving the statistical properties that make the data useful to researchers. They include k-anonymity and l-diversity:
http://privacy.cs.cmu.edu/people/sweeney/kanonymity.html
http://www.cs.cornell.edu/~dkifer/papers/ldiversity.pdf
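To make the two properties concrete, here is a minimal sketch of how one might check whether an already-generalized release satisfies them. It assumes a table of dicts whose quasi-identifier columns (`zip`, `age`) and sensitive column (`topic`) are hypothetical, and it implements only "distinct" l-diversity, the simplest variant; the linked paper also defines stronger ones. It does not perform the anonymization (generalization/suppression) itself.

```python
from collections import defaultdict


def equivalence_classes(rows, quasi_identifiers):
    """Group rows that share the same combination of quasi-identifier values."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[q] for q in quasi_identifiers)
        groups[key].append(row)
    return groups


def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every quasi-identifier combination is shared by at least k rows."""
    groups = equivalence_classes(rows, quasi_identifiers)
    return all(len(g) >= k for g in groups.values())


def is_distinct_l_diverse(rows, quasi_identifiers, sensitive, ell):
    """True if every equivalence class holds at least `ell` distinct sensitive values."""
    groups = equivalence_classes(rows, quasi_identifiers)
    return all(len({r[sensitive] for r in g}) >= ell for g in groups.values())


# Hypothetical release: ZIP codes and ages already generalized into coarse buckets.
rows = [
    {"zip": "130**", "age": "<30", "topic": "health"},
    {"zip": "130**", "age": "<30", "topic": "travel"},
    {"zip": "130**", "age": "<30", "topic": "health"},
    {"zip": "148**", "age": ">=40", "topic": "finance"},
    {"zip": "148**", "age": ">=40", "topic": "finance"},
    {"zip": "148**", "age": ">=40", "topic": "finance"},
]
QI = ["zip", "age"]

print(is_k_anonymous(rows, QI, k=3))                     # True: each group has 3 rows
print(is_distinct_l_diverse(rows, QI, "topic", ell=2))   # False: second group is all "finance"
```

The example shows why l-diversity exists: the table is 3-anonymous, yet anyone who knows a person falls in the `148**`/`>=40` group learns their sensitive value anyway, because every row in that group has the same topic.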
> how will this impact research and the development
> of future technologies that could have come from the
> study of real data?
It's definitely a hindrance, much like not letting cops search houses without permission.
There are people who don't really care whether their searches are added to a collection that gets released. If Google offered an opt-in for data it planned to release to academic researchers, I would opt in, and I imagine other people don't care who looks at their searches either. Companies that want to release search data might also consider letting users see exactly what information about them would be released.
I love this quote from TFA:
"Companies do not make money by giving researchers access to data."
Wrong! Netflix released data to get a better recommendation system. The better they can pick movies for you, the more you will like their service. The $1 million prize is peanuts compared to the increase in revenue a better system can bring.
I wonder if anyone has estimated the value of the man-hours invested in this contest?