Hi. I'm one of the authors. Please read our FAQ. It answers that very question. In short, our de-anonymization algorithm applies to far more than public social networks like twitter, including some very sensitive ones.
We have an FAQ about this paper. It answers many of the misconceptions expressed in the comments here.
In particular, our algorithm applies to much more than public social networks like twitter and flickr. A variety of networks including the phone call network are being shared behind your back in anonymous form, and our de-anonymization techniques apply just as much. You'll probably agree that people expect more privacy there.
See my blog for a variety of demonstrations and thought-experiments of de-anonymization.
However, when you're talking about dozens of movies, all you need is a correlation. Our algorithm is powerful enough to tolerate a large amount of noise. If you read the paper, we were able to match up users between imdb and netflix with a very high level of confidence, in the sense that the best match was 15-30 standard deviations away from the second-best match. In statistics terms, that's an insanely close match.
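To make the "standard deviations away from the second-best match" idea concrete, here is a rough sketch. It is not the code from the paper: the function names, the dictionary of per-candidate similarity scores, and the threshold are all made up for illustration, and it assumes some scoring step has already produced one similarity score per candidate record.

```python
import statistics


def eccentricity(scores):
    """Gap between the best and second-best score, in units of the
    standard deviation of all candidate scores (illustrative helper)."""
    if len(scores) < 2:
        raise ValueError("need at least two candidate scores")
    ranked = sorted(scores, reverse=True)
    spread = statistics.pstdev(scores)
    if spread == 0:
        # All candidates scored the same, so nobody stands out.
        return 0.0
    return (ranked[0] - ranked[1]) / spread


def best_match(scores_by_candidate, threshold):
    """Return the top-scoring candidate only if it stands out from the
    runner-up by at least `threshold` standard deviations, else None.
    `threshold` is a hypothetical tuning parameter."""
    values = list(scores_by_candidate.values())
    if eccentricity(values) < threshold:
        return None
    return max(scores_by_candidate, key=scores_by_candidate.get)
```

The point of refusing to answer below the threshold is that a weak "best" candidate is more likely a coincidence than a real match; a gap of many standard deviations is what makes the match believable.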
"Besides, this all relies on people voting for a) really obscure films so they can be easily identified"
Not true -- obscure films help a little, but not much. We put up a recent draft of our paper in which the dependence on obscure movies is much reduced.
"b) voting similarly or identically on lots of films so that they can get a better idea as to whether it is the same person based on them liking the same films the same amounts."
Again, not true at all. One of the main claims of our paper is that our method is tolerant to an INCREDIBLE amount of noise. We have the math to back this up.
The most viewed pages stats present a very different story. Ignoring wikipedia-related pages and recently featured articles, the top few are:
Wii
Sex
World War II
United States
Christmas
Deaths in 2006
Naruto
Sexual intercourse
Pornography
The Holocaust
List of big-bust models and performers
List of sex positions
A few years ago I used to be part of the Chennai LUG (Chennai is the capital of Tamil Nadu). I used linux simply because it was fun to hack, and was never one for the advocacy. But there were a bunch of guys in the LUG who were a lot more into the whole freedom thing and would go to great lengths to educate the public and/or meet with the political honchos and get them to make some changes. I used to think they were crazy (still do :-), but also admired their perseverance. I guess their hard work and a lot of others' is paying off.
Thank you. That one comment is worth more than all the other comments on this story. It is also the reason why the entire discussion on this story is irrelevant; it doesn't matter what /. thinks -- it's not slashdotters that are using social networks.
This is exactly how DNA works.
Programmers have so far not had much success at this approach, one because our computers are puny, and two because our programming practices have been tailored for engineered code. But as hardware gets fast enough that most common tasks can be run at a million-fold slowdown and still run fine, we will get to a point where we can write glue code that runs some other piece of code, throws away 99.9% of its computation, and only uses the rest, simply because of the value of human time vs. computer time. I have written code like this. It was just research projects, not "production systems", but nevertheless it is possible that the future will have less and less coding from scratch.
Clearly, the FTC did something with a sincere intent, but /. can think of no way to present it except cynically. Is it surprising that techies have so little lobbying power and are not taken seriously by the mainstream?
I suspect that most of these people were simply upgrading from older versions of firefox. It would be interesting to see the browser stats for these downloads. That would paint a clearer picture.
Not true. Speaking from first-hand experience in both countries, linux usage in India is much higher than in America both in the home and office. There have been a number of genuine large-scale Windows-to-linux switches, as opposed to just talking about it or migrating a dozen servers in a corner somewhere. The average bank clerk (my mom included :-) is actually using linux terminals on a day-to-day basis.
Grate! wy don't we al start speling lik this? After al it sownds de saym dozen it?
But that's not even the point -- Indian languages use phonetic spelling (mostly true of Roman transcriptions as well). Different spelling always implies different pronunciation.
OK first of all, this has been their policy for a long time, and it was well known. I have no idea why it's making the news only now. I knew about it at least a year and a half ago when I googled for reviews to see if they were any good.
Second of all, 'heavy users' means people who have nothing to do all day every day other than watch movies. The guy who's interviewed in the article says he used to receive 22 movies a month and no longer does. Holy shit, do these people have no life of any kind?
Thirdly, did they really expect 'unlimited' to mean unlimited? Is it beyond the mental ability of even a 6-year-old kid to figure out that that would mean the company would go out of business? How can anyone be so naive? I guess if you live in a welfare state long enough you start to think the state and corporations and everything else exist to make your ass more comfortable while you sit on your couch eating popcorn and watching TV.
Fourth, do you think there's someone else offering a sweeter deal? Good luck trying to find it.
There are a lot of posts on dailyKos that talk about this, and here's a couple of quotes that sum up my own reaction:
I hardly think a staffer posted this on his/her own.
This took guts.
Any staffer would know the mixed feeling and consternation this site has toward the Senator.
and
I fully expect that this comes from the senator, just like any press release does, just maybe not written 100% by him and I also don't see him sitting at his computer typing with one hand while reading html for dummies with the other.
Wikipedia is (mostly) hosted in the US. The German court does not have jurisdiction. End of story.
They can do whatever they want to the wikipedia.de domain, but de.wikipedia.org as well as the actual content is totally unaffected.
Sounds interesting, although I'm more interested in what this type of approach can mean for anti-aging, which is also focused around combating cell degeneration and promoting regeneration. Maybe someone with more medical knowledge can clue me in?
"End of the free encyclopedia that anyone can edit" sounds very ominous, but 4 days is nothing. Any halfway serious contributor should have no problem with that waiting period, especially since it is only applied to a small handful of articles. Plus the policy states that it should be applied reactively and not proactively in anticipation that an article may be vandalized. All said, a minor change that has been blown up because of the connection to the Seigenthaler ruckus.
For the reasons that follow, we conclude that the religious nature of ID [intelligent design] would be readily apparent to an objective observer, adult or child...
The evidence at trial demonstrates that ID is nothing less than the progeny of creationism...
...we have addressed the seminal question of whether ID is science. We have concluded that it is not, and moreover that ID cannot uncouple itself from its creationist, and thus religious, antecedents...
The breathtaking inanity of the Board's decision is evident when considered against the factual backdrop which has now been fully revealed through this trial. The students, parents, and teachers of the Dover Area School District deserved better than to be dragged into this legal maelstrom, with its resulting utter waste of monetary and personal resources. (emphasis mine.)
--Arvind Narayanan
They said you couldn't identify a person's record in the dataset even if you know some (or all!) of their ratings.
We showed that that's not true. Even if there's a LOT of noise. That's all there is to it.
--Arvind Narayanan
Sad.
That's a beautiful story. Thank you.
I'm also surprised a lot of posters seem to be seeing this sort of thing for the first time!
The mad scientist from Back to the Future was Dr. Emmett Brown.
The h's in Indian words are aspiration markers.
Come on! A thousand years from now, Islam will clearly be more than 1.5 thousand years old ;^)
[I'm actually surprised no one beat me to it.]
Of course. EMACS - eight megabytes and constantly swapping.
Will it fit in my ear?