Anonymity of Netflix Prize Dataset Broken
KentuckyFC writes "The anonymity of the Netflix Prize dataset has been broken by a pair of computer scientists from the University of Texas, according to a report from the physics arXiv blog. It turns out that an individual's set of ratings and the dates on which they were made are pretty unique, particularly if the ratings involve films outside the most popular 100 movies. So it's straightforward to find a match by comparing the anonymized data against publicly available ratings on the Internet Movie Database (IMDb) (abstract on the physics arXiv). The researchers used this method to find how individuals on the IMDb privately rated films on Netflix, in the process possibly working out their political affiliation, sexual preferences and a number of other personal details."
Perhaps if we're obscure and pretentious enough, no one will want to spy on us! Brillant!
The world changes. Learn to live with it.
ad logicam Claiming a proposition is false because it was presented as the conclusion of a fallacious argument.
The summary is somewhat misleading: the only accounts that can be identified are those that belong to people who also rate on IMDb and who have thus chosen to make at least some of their ratings public. If person X rates 1000 movies on Netflix and has made 20 or so ratings on IMDb publicly available, then it is possible to infer with some small uncertainty which of the anonymized individuals in the Netflix database they are. Thus you have possibly figured out their ratings of the other 980 movies they rated for Netflix but did not post on IMDb. Interesting, but not earth-shattering or a serious breach of privacy, I would say.
It's psychosomatic. You need a lobotomy. I'll get a saw.
On the other hand, if somebody *already* knows who you are, the lesson is that it can take surprisingly little public information to identify your entire history of ratings at Netflix.
For example, the authors found that for 40% of individuals, accurate ratings on a scale of 1-5 for only *two* random movies, together with knowledge to within 14 days of when they were seen, would be sufficient to identify an individual in the dataset. As they comment, that's the kind of information colleagues give out every day around the water cooler.
Repeating the experiment with knowledge of 8 movies, 6 hits in the database would be sufficient to identify the personal histories of 99% of the people in that data.
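To make the "hits" idea concrete, here is a rough Python sketch of that kind of lookup. It is a naive exact-match filter, not the authors' actual algorithm, and everything in it is made up for illustration: the record ids, the movie titles, the toy data, and the function names `count_hits` and `candidates`. A "hit" here means the same rating for the same movie, dated within 14 days of what the attacker knows.

```python
from datetime import date

# Hypothetical toy "anonymized" dataset: record id -> {movie: (rating, date)}.
anonymized = {
    "user_a": {
        "Movie 1": (4, date(2005, 3, 1)),
        "Movie 2": (2, date(2005, 3, 20)),
        "Movie 3": (5, date(2005, 6, 2)),
    },
    "user_b": {
        "Movie 1": (4, date(2005, 8, 9)),
        "Movie 2": (5, date(2005, 8, 11)),
    },
    "user_c": {
        "Movie 2": (2, date(2004, 1, 5)),
        "Movie 3": (1, date(2004, 2, 7)),
    },
}


def count_hits(record, known, max_days=14):
    """Count known (movie, rating, date) facts that this record agrees with."""
    hits = 0
    for movie, (rating, when) in known.items():
        if movie not in record:
            continue
        rec_rating, rec_date = record[movie]
        # A hit needs the same rating and a date within the tolerance window.
        if rec_rating == rating and abs((rec_date - when).days) <= max_days:
            hits += 1
    return hits


def candidates(dataset, known, min_hits, max_days=14):
    """Return the ids of all records with at least min_hits matching facts."""
    return sorted(
        uid
        for uid, record in dataset.items()
        if count_hits(record, known, max_days) >= min_hits
    )


# What a colleague might have picked up around the water cooler (assumed).
known = {
    "Movie 1": (4, date(2005, 3, 8)),
    "Movie 2": (2, date(2005, 3, 15)),
}

matches = candidates(anonymized, known, min_hits=2)
print(matches)
```

If `matches` comes back with exactly one id, the rest of that record's ratings are exposed. Setting `min_hits` below the number of known movies, as in the "6 hits out of 8" case, is what lets the lookup tolerate a few wrong ratings or dates.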
If I had mod-points, I'd mod you up insightful. I didn't think someone would spot where I copied the review from so fast.
True, but in the real world, it's not as simple as that. There are cases of publicly available databases that you gave no permission to grant access to (for example, AOL's release of their search queries). There are other cases when a database has restricted access, but a person with access to it takes it and uses it in comparison with other databases available. Hackers are always a trouble; since some have gotten into such "secure" areas as the CIA and IRS, what's to keep them from potentially getting into any database?
The problem is one of privacy: in the worst case (or, for those who are cynical, the common case) we have none. Some answers to this have been proposed. If you're interested, I'd start by reading the original paper on k-anonymity, which attempts to create privacy in a world where one can possibly have access to any database, ever. It can be found here: http://privacy.cs.cmu.edu/people/sweeney/kanonymity.html. (There are, of course, a multitude of other methods; k-anonymity is just a good starting point.)
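For anyone who wants the gist before reading the paper: a table is k-anonymous if every combination of quasi-identifier values (the columns someone could link to another database) is shared by at least k rows. Below is a minimal Python sketch that measures this for a toy table. The column names, the rows, and the helper name `k_anonymity` are assumptions made up for the example, and the sketch only measures k; it does not do the generalization or suppression needed to reach a target k.

```python
from collections import Counter


def k_anonymity(rows, quasi_identifiers):
    """Return the largest k for which rows are k-anonymous.

    That is the size of the smallest group of rows sharing the same
    values on all quasi-identifier columns (0 for an empty table).
    """
    groups = Counter(
        tuple(row[column] for column in quasi_identifiers) for row in rows
    )
    return min(groups.values()) if groups else 0


# Hypothetical, already-generalized records (illustrative values only).
rows = [
    {"zip": "021**", "birth_year": 1965, "sex": "F", "diagnosis": "flu"},
    {"zip": "021**", "birth_year": 1965, "sex": "F", "diagnosis": "asthma"},
    {"zip": "021**", "birth_year": 1970, "sex": "M", "diagnosis": "flu"},
    {"zip": "021**", "birth_year": 1970, "sex": "M", "diagnosis": "diabetes"},
]

quasi_identifiers = ["zip", "birth_year", "sex"]
k = k_anonymity(rows, quasi_identifiers)
print(k)

# One extra row with a unique combination drags the whole table down to k = 1.
rows_with_outlier = rows + [
    {"zip": "021**", "birth_year": 1980, "sex": "F", "diagnosis": "flu"},
]
k_outlier = k_anonymity(rows_with_outlier, quasi_identifiers)
print(k_outlier)
```

The catch for something like the Netflix data is deciding which columns count as quasi-identifiers: if any handful of ratings and dates can be linked to an outside source, nearly every column is one.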
not true -- obscure films help a little bit but not too much. we put up a recent draft of our paper in which the dependence on obscure movies is much reduced.
"b) voting similarly or identically on lots of films so that they can get a better idea as to whether it is the same person based on them liking the same films the same amounts."
again not true at all. one of the main claims of our paper is that our method is tolerant to an INCREDIBLE amount of noise. we have the math to back this up.
--Arvind Narayanan
Yeah, I really liked it too. Quite surreal, funny, and the VHS copy I bought was purchased in Kazakhstan, so it has Russian subtitles, just to add to the weirdness. And when you tell people it has Bruce Willis in it, they're surprised.
Andie MacDowell imitates a dolphin in it too.