AOL, Netflix and the End of Open Research
An anonymous reader writes "In 2006, heads rolled at AOL after the company released anonymized logs of user searches. With last week's announcement that researchers had been able to learn the identities of users in the scrubbed Netflix dataset, could the days of companies sharing data with academic researchers be numbered? Shortly after the AOL incident, Google's Eric Schmidt called the data release 'a terrible thing,' and assured the public that 'this kind of thing could not happen at Google.' Will any high-tech company ever take this kind of chance again? If not, how will this impact research and the development of future technologies that could have come from the study of real data?"
I don't see this as a problem, yet.
> how will this impact research and the development
> of future technologies that could have come from the
> study of real data?
It's definitely a hindrance. Kind of like not letting cops search houses without permission.
But how many would? There are "Chilling Effects" all over the place. For example, I don't want to share my data because it may not be deleted (Gmail and Facebook), I don't want you to share my data because I don't know what you will do with it (RIAA), and no one wants to approach the line because lawyers are too damn expensive. I think we need to reinstitute "Trial by Combat" as a defense. Nothing else has stopped frivolous legal shenanigans...
The final question regarding "what research opportunities will be lost" because of data privacy is pretty horrible. It is analogous to "what crime prevention successes will be sacrificed, because society was not willing to live as a collective prisoner to the state". I.e., duh, yes, you can prevent crime by locking everyone up. But there are *more important values* to be achieved by not presuming everyone guilty and locking them up ahead of time. In the same way, yeah, you could have all kinds of great research if companies abandoned any attempt at restricting the dissemination of information they have about their consumers. But again, there are things of greater value. ... It's just another form of the fact that liberty isn't free. It has a price. Those unwilling to pay that price won't get liberty.
So in other words, shut up about your lost research opportunities. Go take a walk outside and cherish what liberty and privacy you have.
The problem with opt-in statistical gathering is that it can skew a sample, subtly biasing it. This would invalidate a lot of scientific research.
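To put rough numbers on that skew, here is a minimal simulation sketch. Every figure in it is made up for illustration: it assumes 30% of users do some "sensitive" kind of search, and that those users opt in far less often (10%) than everyone else (50%). The function name and parameters are hypothetical.

```python
import random

def simulate_opt_in_bias(n_users=100_000, seed=0):
    """Compare the true rate of a behavior with the rate seen among opt-in users."""
    rng = random.Random(seed)
    true_hits = 0
    opted_in = 0
    opted_in_hits = 0
    for _ in range(n_users):
        # Assumption: 30% of all users make "sensitive" searches.
        sensitive = rng.random() < 0.30
        # Assumption: sensitive users opt in 10% of the time, others 50%.
        p_opt_in = 0.10 if sensitive else 0.50
        joins = rng.random() < p_opt_in
        true_hits += sensitive
        if joins:
            opted_in += 1
            opted_in_hits += sensitive
    return true_hits / n_users, opted_in_hits / opted_in

true_rate, observed_rate = simulate_opt_in_bias()
print(f"true rate: {true_rate:.3f}, rate among opt-ins: {observed_rate:.3f}")
```

Under those assumed numbers the opt-in sample should show the behavior at roughly 0.03 / 0.38, or about 8%, against a true rate of 30%. The researcher sees a clean dataset that is quietly wrong.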
I love this quote from TFA:
"Companies do not make money by giving researchers access to data."
Wrong! Netflix released data to get a better recommendation system. The better they can pick movies for you, the more you will like their service. The $1 million prize is peanuts compared to the increase in revenue a better system can bring.
I wonder if anyone has estimated the value of the man hours invested in this contest?
i.e., it might come as a surprise when researchers discover that NOBODY (who opted in) searches the internet for pornography, music torrents, Paris Hilton...
Hell, out of Google's top 20 searches, you might get maybe 3 listed?
From scanning those articles it looks as if they are just methods for defining levels of anonymity in a dataset, rather than providing any effective means of achieving it (please correct me if I'm wrong).
I can't see how, for example, if I am planning a study of small area (i.e. zip code level) variation in the levels of some disease or other, while adjusting for, say, age, sex, and ethnicity, that I could do so without a dataset that included all of these items. How could you make the records less unique without throwing away the data?
We have to accept that if we want meaningful research to happen, then some amount of data sharing and linking needs to occur. We need to rely, in medicine at least, on ethics committees to represent our best interests when it comes to striking the balance.
It seems to me that the trend for guarding personal data like it's the family silver is a relatively modern thing. If it continues, then reliable, unbiased medical research, especially disease monitoring and control, will become impossible.
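For what it's worth, the usual answer to "less unique" is generalization in the k-anonymity style: coarsen the identifying columns until every combination is shared by at least k records. Whether that is what the linked articles describe is my assumption. A toy sketch, with entirely hypothetical records and helper names:

```python
from collections import Counter

# Hypothetical toy records: (zip_code, age, sex, ethnicity)
records = [
    ("90210", 34, "F", "A"),
    ("90211", 36, "F", "A"),
    ("90213", 31, "F", "A"),
    ("10001", 52, "M", "B"),
    ("10002", 58, "M", "B"),
    ("10003", 55, "M", "B"),
]

def smallest_group(rows):
    """Return k: the size of the smallest group sharing identical quasi-identifiers."""
    return min(Counter(rows).values())

def generalize(row):
    """Coarsen zip to a 3-digit prefix and age to a 10-year band."""
    zip_code, age, sex, ethnicity = row
    band = age // 10 * 10
    return (zip_code[:3] + "**", f"{band}-{band + 9}", sex, ethnicity)

k_raw = smallest_group(records)
k_coarse = smallest_group([generalize(r) for r in records])
print(k_raw, k_coarse)
```

In the raw rows every record is unique (k = 1); after coarsening, each record hides among three. The catch is the one raised above: the zip-code-level variation the study wanted is exactly what got thrown away.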
You're probably just trolling, but in case you aren't, seeing the rampant crime that is institutionalized in modern prisons, I think your argument falls flat on its face.
Liberty doesn't have security as its price. Liberty and security are often positively correlated, not inversely correlated as you assume.
As more people are free to do things that don't infringe on others' security, security often goes up, as the people who would be breaking security systems for their own benefit have plenty of other "acceptable" ways to reap goods, with far fewer risks to boot.