Slashdot Mirror


Anonymity of Netflix Prize Dataset Broken

KentuckyFC writes "The anonymity of the Netflix Prize dataset has been broken by a pair of computer scientists from the University of Texas, according to a report from the physics arXivblog. It turns out that an individual's set of ratings and the dates on which they were made are pretty unique, particularly if the ratings involve films outside the most popular 100 movies. So it's straightforward to find a match by comparing the anonymized data against publicly available ratings on the Internet Movie Database (IMDb) (abstract on the physics arxiv). The researchers used this method to find how individuals on the IMDb privately rated films on Netflix, in the process possibly working out their political affiliation, sexual preferences and a number of other personal details"

164 comments

  1. Sexual preferences? by tygerstripes · · Score: 4, Funny

    Who goes out of their way to rate "Anal Whores 3" online?

    --
    Meta will eat itself
    1. Re:Sexual preferences? by morgan_greywolf · · Score: 2, Funny

      Bill Clinton?

    2. Re:Sexual preferences? by mh1997 · · Score: 5, Funny

      Who goes out of their way to rate "Anal Whores 3" online?
      The good thing about porn flicks, as a general rule, is that they're too bland to have really bad plots. The search for good dialogue strays too far off the beaten path established by the social mores of the target market, be that old men, college students, or perverts out on dates. There are pornos with solid plots, just rarely pornos with complicated plots.

      What they generally aren't is full of capers designed by crackheads in search of sexual relief, or a dominatrix dying to destroy the gold market with a Da Vinci alchemy machine only a cat burglar from Hoboken could steal.

      Yes, the plot of Anal Whores 3 is as convoluted as it is kitschy. Mercedes and Veronica Diamond forcibly enlist the help of happy-go-lucky and half-a-second-out-of-prison pizza delivery man Hawk (Peter North) to steal the pieces to a machine that turns lead vibrators into gold. Hawk isn't halfway to a cup of coffee with his wise cracking cohort, Tommy (Johnny Cockring) when he finds himself back in the burglary game. Casing out a heist he meets nun/professional patron of the arts/double agent/love interest Jessie Jane (vows of bestiality can put the kibosh on even the best of cinematic love interests). When you throw in a CIA agent (Dick Coburn) and a couple of double dildos, you've managed to make the world's most convoluted porno....

    3. Re:Sexual preferences? by styryx · · Score: 4, Interesting

      That's the plot of Hudson Hawk. Good flick.

    4. Re:Sexual preferences? by ioshhdflwuegfh · · Score: 1
      Speaking about sexual preferences, "anal whores" and "a pair of computer scientists from the University of Texas", one of this pair, the first one from the signatures, has written this on his personal web site about the second:

      Advisor: Vitaly Shmatikov <-- seriously awesome
    5. Re:Sexual preferences? by mh1997 · · Score: 3, Informative

      If I had mod-points, I'd mod you up insightful. I didn't think someone would spot where I copied the review from so fast.

    6. Re:Sexual preferences? by Minwee · · Score: 5, Funny

      Yes, they would have to have watched Hudson Hawk to do that. That narrows the field considerably.

    7. Re:Sexual preferences? by ammoQ · · Score: 1

      Most flix at youporn.com have been rated by several users. At least I've heard so.

    8. Re:Sexual preferences? by Jtheletter · · Score: 2, Funny

      The search for good dialogue strays too far off the beaten path established by the social mores of the target market

      I see what you've done there..... ;)

      --
      -- I'm not a pessimist, I'm a realist. It's not my fault that life sucks so much. --
    9. Re:Sexual preferences? by Anonymous Coward · · Score: 0

      Is it me, or is this the fist time the verb "to narrow" has been used in a porno context?

    10. Re:Sexual preferences? by caluml · · Score: 2, Informative

      Yeah, I really liked it too. Quite surreal, funny, and the VHS copy I bought was purchased in Kazakhstan, so it has Russian subtitles, just to add to the weirdness. And when you tell people it has Bruce Willis in it, they're surprised.
      Andie MacDowell imitates a dolphin in it too.

    11. Re:Sexual preferences? by SkyDude · · Score: 1
      While I'm sure your posting is sort of serious, I can't help but chuckle at the thought of being serious about a form of "entertainment" where every five minutes or so, someone's tongue or penis is in one of someone else's bodily orifices.

      It's kind of ............surreal.

      --
      == First cross river, then insult alligator.
    12. Re:Sexual preferences? by Anonymous Coward · · Score: 0

      The good thing about porn flicks, as a general rule, is that they're too bland to have really bad plots. The search for good dialogue strays too far off the beaten path...

      I couldn't agree more.

    13. Re:Sexual preferences? by Minwee · · Score: 2, Insightful

      Is that any more surreal than a form of "entertainment" in which people get shot at or blown up every five minutes or so?

    14. Re:Sexual preferences? by Anonymous Coward · · Score: 0

      Who goes out of their way to rate "Anal Whores 3" online?
      Bill Clinton?
      Nah. That's just the name I used when I posted the review to IMDB.
    15. Re:Sexual preferences? by TClevenger · · Score: 1
      That's a side of Sandra Bernhard I didn't want to see.

      "Looks like Bunny's got today's balls balls."

  2. Probabilities by dj245 · · Score: 4, Insightful

    The researchers used this method to find how individuals on the IMDb privately rated films on Netflix, in the process possibly working out their political affiliation, sexual preferences and a number of other personal details"

    This is a loaded statement. The most you can determine is that if a person likes movie A, B, C and D but hated E and F, there is a higher probability they are a guy. If they liked Z but didn't like X, there is a higher probability they might be a republican than not. You're still anonymous.

    Unless, of course, you're one of the three people that liked "Glitter". Then I think they might have something on you.

    --
    Even those who arrange and design shrubberies are under considerable economic stress at this period in history.
    1. Re:Probabilities by Anonymous Coward · · Score: 0

      How does IMDb have your political affiliation or sexual preference?

    2. Re:Probabilities by Se7enLC · · Score: 3, Insightful

      I think they're on to something here. They cracked the anonymity by using the public movie ratings (and the dates those ratings were made) as a key. If the user has rated enough movies (especially some of the less-often-rated movies) you can uniquely identify which user they are. Once you know which user they are, you have now connected a username to the list of private ratings.

      Now, they go one step too far to say that you can determine anything but movie preferences out of a movie rating list. Just because somebody liked or disliked brokeback mountain doesn't mean they are gay or straight, just like their opinion of michael moore movies doesn't give political affiliation.

      It will tell you what movies they rented, though, and some people might not be happy having their movie-renting history publicly available.
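A minimal sketch of the linking idea described above, not the researchers' actual algorithm: score each anonymized record by how many of a public profile's ratings it reproduces, and let rarely rated titles count for more. The toy data, the function names, and the `1 / log(1 + raters)` weighting are all assumptions made for illustration.

```python
import math

# Hypothetical anonymized records: user id -> {movie: stars}.
anonymized = {
    "u1": {"Top Hit": 5, "Obscure Film": 2, "Cult Oddity": 4},
    "u2": {"Top Hit": 5, "Another Hit": 3},
    "u3": {"Top Hit": 4, "Obscure Film": 5},
}

# Hypothetical ratings posted publicly under a known username.
public_profile = {"Top Hit": 5, "Obscure Film": 2, "Cult Oddity": 4}


def rarity_weights(records):
    """Weight each movie by 1 / log(1 + number of raters), so rare titles count more."""
    counts = {}
    for ratings in records.values():
        for movie in ratings:
            counts[movie] = counts.get(movie, 0) + 1
    return {movie: 1.0 / math.log(1 + n) for movie, n in counts.items()}


def link_score(public, private, weights):
    """Sum the weights of movies that both profiles rated with the same stars."""
    return sum(
        weights.get(movie, 0.0)
        for movie, stars in public.items()
        if private.get(movie) == stars
    )


weights = rarity_weights(anonymized)
scores = {uid: link_score(public_profile, rec, weights) for uid, rec in anonymized.items()}
best = max(scores, key=scores.get)
print(best, scores)
```

Agreeing on a blockbuster that everyone rated barely moves the score, while agreeing on a title with a single rater moves it a lot. That is the intuition behind "especially some of the less-often-rated movies".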

    3. Re:Probabilities by Anonymous Coward · · Score: 0

      Yes but if you rate Brokeback Mountain, and many other films featuring gay sex (IMDb is full of adult movies FYI), one can claim to high certainty that you are gay. If you are rating a film which is not mainstream, and which is not being rated by many people, you are choosing the steeper side of the bell curve, and it is easy to guess you.

    4. Re:Probabilities by Dare+nMc · · Score: 2, Insightful

      one step too far to say that you can determine anything but movie preferences out of a movie rating list.

      Also, you're taking an aggregate of the household. So a household (will call them Chen'ys) had a gay kid, and the devil living in the same house with a Saint... good luck figuring out when the gay kid updates the queue, and when the Wife, or the Devil is at the keyboard.
    5. Re:Probabilities by morgan_greywolf · · Score: 1

      Now, they go one step too far to say that you can determine anything but movie preferences out of a movie rating list. Just because somebody liked or disliked brokeback mountain doesn't mean they are gay or straight, just like their opinion of michael moore movies doesn't give political affiliation.
      No, but if they liked Brokeback Mountain, every Michael Moore movie ever produced, An Inconvenient Truth, and Fritz the Cat, you can probably bet that they're a card-carrying liberal.

    6. Re:Probabilities by ThreeGigs · · Score: 1

      some people might not be happy having their movie-renting history publicly available

      Being able to "see other ratings by this user" yields their movie rental history, algorithms or no. Is this what the big fuss is about?

    7. Re:Probabilities by BorgCopyeditor · · Score: 0, Troll

      No shit, Sherlock.

      --
      Shop as usual. And avoid panic buying.
    8. Re:Probabilities by IBBoard · · Score: 1

      Or maybe they like cowboy films and are open minded, as well as liking expose material and documentaries (not sure about the others).

      Maybe they're not, but there's always the possibility.

    9. Re:Probabilities by Chapter80 · · Score: 5, Insightful
      I think you're missing the point.

      If you rate a handful of movies on ImDB, under the persona "MyNickname12345" and that can be traced to your personal MySpace page, you have made that choice. No problem.

      If you then submit 100 movie ratings to Netflix, assuming that it is PRIVATE information that will not be linked back to you, and then Netflix releases the data to the public, now the 100 movies can be correlated to you, and your name can be revealed. Researchers have shown how PRIVATE DATA released to the public can be linked to already public information. PROBLEM!

    10. Re:Probabilities by ShiningSomething · · Score: 1

      That's not entirely true. You can rate movies you have not rented (maybe a friend has, or you saw them on TV or at the theater).

    11. Re:Probabilities by ioshhdflwuegfh · · Score: 1

      That's not entirely true. You can rate movies you have not rented (maybe a friend has, or you saw them on TV or at the theater).
      That's not entirely true. You can rate movies you have not seen at all (maybe a friend has, or you saw their trailer, read about them, etc...)
    12. Re:Probabilities by Anonymous Coward · · Score: 0

      Just because somebody liked or disliked brokeback mountain doesn't mean they are gay or straight...
      Well, for the most part yes. Pretty much those who saw Brokeback Mountain are into the butt sex.
    13. Re:Probabilities by coolGuyZak · · Score: 2, Interesting

      Some tech-savvy households may enable profiles on Netflix, enabling each person to track their likes & dislikes independently. (I did this for my GF, who has wildly disparate tastes from me). I'm not sure what effect that would have on the data. It'd certainly be neat if the scientists could differentiate between individual and multiple users using a particular profile.

    14. Re:Probabilities by ioshhdflwuegfh · · Score: 1

      If you then submit 100 movie ratings to Netflix, assuming that it is PRIVATE information that will not be linked back to you, and then Netflix releases the data to the public, now the 100 movies can be correlated to you, and your name can be revealed. Researchers have shown how PRIVATE DATA released to the public can be linked to already public information. PROBLEM!
      What is the capital problem you are talking about? What does "PRIVATE DATA released to the public" mean? Are they private? PRIVATE? Public?
    15. Re:Probabilities by Anonymous Coward · · Score: 0

      This is, of course, why one keeps multiple screen names or places passwords on things that he doesn't want the world to see.

    16. Re:Probabilities by Chapter80 · · Score: 1

      This is, of course, why one keeps multiple screen names or places passwords on things that he doesn't want the world to see.
      No, in the scenario I posed, the person was ok about releasing the information that was linked by screen name. The private information was linked by correlation of which movies were reviewed.

      most people underestimate the ease of correlating supposedly anonymized data, as has been shown time and time again.

    17. Re:Probabilities by RealGrouchy · · Score: 1

      Researchers have shown how PRIVATE DATA released to the public can be linked to already public information.
      Somehow this seems like one of those obvious[-in-hindsight] studies.

      Isn't there always a risk when you release private data to the public, especially when it involves something they have written?

      As an example, specific people who are banned from web forums are often recognized fairly quickly by users based on their writing/argument styles alone.

      - RG>
      --
      Hey pal, this isn't a pleasantforest, so don't waste my time with pleasantries!
    18. Re:Probabilities by timeOday · · Score: 1

      I think this research may be unethical. If I argue persuasively that somebody could stalk somebody by following them around with binoculars and laser microphones, that's one thing; but if I prove my point by DOING that, it's different.

    19. Re:Probabilities by leenks · · Score: 1

      A straight guy is unlikely to hire / review a gay porn film, etc. I imagine similar things are possible with politically charged films, possibly more so, by looking at the way people write, or the things they write about. Many people find it hard to keep their thoughts to themselves if suitably fired up about something they disagree with.

    20. Re:Probabilities by LrdDimwit · · Score: 1

      Oh, that's easy -- the Devil is the one watching Gigli.

    21. Re:Probabilities by FLEB · · Score: 1

      I'd say it depends on what you do with the knowledge once you have it. They seem to be going about it in a responsible manner-- releasing only enough relatively tame data to prove the viability of the process.

      --
      Information wants to be free.
      Entertainment wants to be paid.
      You just want to be cheap.
    22. Re:Probabilities by plover · · Score: 1

      Releasing only ... tame data

      They released a lot more than some data: they published the algorithm. Anyone is free to write their own implementation of it. And anyone who is participating in the Netflix prize already has a copy of the database.

      However, I do agree with you that they went about it in a responsible manner. They revealed it. Without their insight, we might have continued living in ignorance that some "unknown adversary" (external to Netflix) is already correlating our movie rental habits, or our book-buying habits on Amazon, or our posts on Slashdot ... hey, wait a minute!

      --
      John
    23. Re:Probabilities by Se7enLC · · Score: 1

      It only reveals the history of their PUBLIC ratings. Unless I read the article wrong, users are also allowed to mark a review as private and have it not show up, apart from in the aggregate for the movie.

    24. Re:Probabilities by Anonymous Coward · · Score: 0

      Yeah. Only those evil QUEER-LOVING liberals would enjoy Brokeback Mountain.

      Real Republicans beat queers and drag em behind their trucks. They don't watch no faggoty cowboy movies, no sir.

  3. The German Police by thegermanpolice · · Score: 1

    The German Police will be pleased.

  4. only a matter of time by downix · · Score: 1

    Privacy is becoming a fleeting thing in this interconnected world. Perhaps we should reanalyze our perspective on it all?

    --
    Karma Whoring for Fun and Profit.
    1. Re:only a matter of time by SatanicPuppy · · Score: 2, Informative

      Perhaps if we're obscure and pretentious enough, no one will want to spy on us! Brillant!

      The world changes. Learn to live with it.

      --
      ad logicam Claiming a proposition is false because it was presented as the conclusion of a fallacious argument.
    2. Re:only a matter of time by blahplusplus · · Score: 1

      But then we'd have to re-analyze capitalism itself, and I don't think society is ready for *that*. Rich people would simply pay for organizations to falsify their data; it would be one-sided.

    3. Re:only a matter of time by phobos13013 · · Score: 4, Interesting

      Actually TFA seems to suggest that the more obscure and pretentious we are, the easier it is to track us. If we become homogeneous drones voting on the top 100 films, we are safe! Even so, I don't plan to become a homogeneous drone...

      --
      ...and it should be known by now
    4. Re:only a matter of time by pegdhcp · · Score: 1
      As somebody also mentioned above, it becomes easier to spot you when your parameters put you close to the edge of the bell-shaped curve. Also it is important to remember that, movies being one of the most common forms of art (how many different sculptures do you see in a year, and how many movies...), they have a greater granularity to match individual tastes and preferences....

      However I guess that, IMDB being public, and Netflix private, there should be a shift in expressed opinions. Assuming that, it is accepted "BAD TASTE" to be interested in a private investigator taking cases on animals, if I liked Ace Ventura I will be more likely to admit it in an anonymous environment...

    5. Re:only a matter of time by achenaar · · Score: 1

      Noooooo, we should reanalyze *everyone else's* perspective on it all... via Netflix/IMDb :)

    6. Re:only a matter of time by mollymoo · · Score: 1

      The world changes. Learn to live with it.

      What a defeatist attitude. Why not try to change it into the world in which you want to live? You can be damn sure someone else is trying to change it into the world they want to live in, which may well be at odds with the world you want to live in, so if you just "learn to live with it" you're setting yourself up to be shat on from a great height.

      If you don't like the idea of personal information being mined in this way, talk to your friends and write to your elected representatives. Try to raise awareness, get the law changed to raise the bar for what is considered "personally identifiable", increase the funds available for enforcement of breaches of personal privacy; there are a number of things you can do to push the world in the direction you want. Alone you might not achieve much, but if you never try you'll never know how un-alone you are in your views. Excepting natural events (which we can increasingly control too), all the things which have changed in human society were changed by people. If you just learn to live with whatever changes, you'll never be one of those people who decide how things change. At the absolute minimum, vote.

      --
      Chernobyl 'not a wildlife haven' - BBC News
    7. Re:only a matter of time by Anonymous Coward · · Score: 0

      BAD TASTE = 10!!!!!!!! Damn, I think I just gave myself away!

  5. Do what now? by faloi · · Score: 4, Insightful

    It doesn't sound like the anonymity of the prize set was broken through any fault of NetFlix. It sounds like some sampling of users made the mistake of rating movies on a site where the info is publicly available, and a site where it's not. All they did was correlate the two.

    So the lesson is, basically, don't post stuff that you don't want to be public to a website that makes it public, right? This sounds roughly like blaming the DMV for figuring out a car owner's likely political leanings by the bumper stickers on their car.

    --
    "It is a miracle that curiosity survives formal education." -Albert Einstein
    1. Re:Do what now? by Dare+nMc · · Score: 1

      was broken through any fault of NetFlix.

      Just because someone chose to go public with liking "The Rise of Theodore Roosevelt" doesn't mean they should expect that the company will take some seemingly private data linking them to really liking "Brokeback Mountain" and the series "The L Word" and publicize it later. And the combination of your public post and that release now violates Netflix's privacy policy (in spirit).
      I.e., they say they will only disclose anything but your reviews "on an anonymous basis". We know from the AOL disclosure that linking data to a number that represents you doesn't qualify as anonymous anymore.
    2. Re:Do what now? by IBBoard · · Score: 4, Insightful

      Exactly - all they did was find that there was a correlation that might mean that the people are the same on IMDB and NetFlix. There's also the possibility that they're different people and that they just voted similarly in different places.

      Besides, this all relies on people voting for a) really obscure films so they can be easily identified and b) voting similarly or identically on lots of films so that they can get a better idea as to whether it is the same person based on them liking the same films the same amounts.

      Just because two people from two different data sets both like (and are the only people in the data sets to like) lemon and custard jam as well as peanut butter with chips doesn't mean they're the same person, it just means they could be the same person and have similar tastes in obscure foods.

    3. Re:Do what now? by Peter+Mork · · Score: 3, Insightful

      Exactly - all they did was find that there was a correlation that might mean that the people are the same on IMDB and NetFlix.

      Caveat: I haven't had a chance to pore over the statistical calculations. However, the paper notes that their similarity measure was 38 standard deviations from the norm. Assuming the math is valid, this seems on par with a DNA test, which also provides a correlation. I wouldn't be so quick to dismiss the results until you can find a serious methodological problem.
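For a rough feel of what "standard deviations from the norm" means here, the sketch below computes a stand-out measure for the best match: the gap between the best and second-best candidate scores, divided by the standard deviation of all scores. This is my reading of the eccentricity figure the paper reports, so treat the exact formula, the function name, and the made-up scores as assumptions.

```python
import statistics


def eccentricity(scores):
    """Gap between the best and second-best score, in units of the scores' std deviation."""
    if len(scores) < 2:
        return 0.0
    spread = statistics.pstdev(scores)
    if spread == 0:
        return 0.0
    ordered = sorted(scores, reverse=True)
    return (ordered[0] - ordered[1]) / spread


# Hypothetical match scores of one public profile against five anonymized records.
clear_winner = [9.0, 1.0, 1.2, 0.8, 1.0]
near_tie = [5.0, 4.9, 1.0, 1.0]

print(eccentricity(clear_winner))  # large: one record stands far apart from the rest
print(eccentricity(near_tie))      # small: two records are almost equally plausible
```

A high value says the best candidate is not just the best but far better than the runner-up, which is what separates a confident identification from two people who happen to have similar tastes.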

    4. Re:Do what now? by JPMH · · Score: 2, Insightful
      Their lesson is that it can take surprisingly little public information to identify you.

      For example, ratings on a scale of 1-5 for 2 movies, and a knowledge of when they were seen to within 14 days, was sufficient to identify the complete data histories of 40% of the Netflix clients. As the authors say, that's the kind of information colleagues give out every day around the water cooler.

      Repeating the experiment with a knowledge of 8 movies, 6 hits in the database would be sufficient to identify the personal histories of 99% of clients included in the Netflix data.
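The "k hits out of n known movies" test can be sketched as below. This is an illustration under assumptions, not the paper's code: the record layout, the function names, and the toy dataset are invented, and the 14-day tolerance is simply the figure quoted above.

```python
from datetime import date


def hits(known, record, max_days=14):
    """Count known (stars, date) facts that a record reproduces within the date tolerance."""
    n = 0
    for movie, (stars, seen) in known.items():
        if movie in record:
            rec_stars, rec_date = record[movie]
            if rec_stars == stars and abs((rec_date - seen).days) <= max_days:
                n += 1
    return n


def candidates(known, dataset, min_hits):
    """Return the ids of all records consistent with at least min_hits known facts."""
    return [uid for uid, rec in dataset.items() if hits(known, rec) >= min_hits]


# Hypothetical anonymized dataset: id -> {movie: (stars, rating date)}.
dataset = {
    "a": {"M1": (4, date(2005, 3, 1)), "M2": (2, date(2005, 3, 20))},
    "b": {"M1": (4, date(2005, 6, 1)), "M2": (2, date(2005, 3, 18))},
    "c": {"M1": (5, date(2005, 3, 2))},
}

# What a colleague might mention at the water cooler (also made up).
known = {"M1": (4, date(2005, 3, 8)), "M2": (2, date(2005, 3, 15))}

unique = candidates(known, dataset, min_hits=2)
loose = candidates(known, dataset, min_hits=1)
print(unique, loose)
```

When the candidate list shrinks to a single id, that person's whole rating history is exposed. Requiring only some of the known facts to match (6 of 8, say) tolerates a few wrong or misremembered ratings.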

    5. Re:Do what now? by Anonymous Coward · · Score: 0

      Any time you can connect a substantial amount of activity to a specific person, you have the potential of getting more information about that person by cross-correlating the activity with information in other databases. This is one of the most basic (and most disturbing) techniques of data mining, and it's what makes data mining so powerful.

      By connecting each transaction with an individual, NetFlix makes it much easier to get through the layer of anonymity. They didn't make it easy to get through, but they didn't make it very difficult either.

    6. Re:Do what now? by roadkill_cr · · Score: 2, Informative

      True, but in the real world, it's not as simple as that. There are cases of publicly available databases that you gave no permission to grant access to (for example, AOL's release of their search queries). There are other cases when a database has restricted access, but a person with access to it takes it and uses it in comparison with other databases available. Hackers are always a trouble; since some have gotten into such "secure" areas as the CIA and IRS, what's to keep them from potentially getting into any database?

      The problem is one of privacy - in the worst case (or, for those who are cynical, common case) we have none. There have been some answers proposed to solve this. If you're interested, I'd start by reading the original paper on k-anonymity, which attempts to create privacy in a world where one can possibly have access to any database, ever. It can be found here: http://privacy.cs.cmu.edu/people/sweeney/kanonymity.html. (There are, of course, a multitude of other methods; k-anonymity is just a good starting point.)
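For readers who have not met k-anonymity: a table is k-anonymous when every combination of quasi-identifier values is shared by at least k rows, so no row can be narrowed down to fewer than k people. The check below is a minimal sketch. The column names and rows are hypothetical, and coarsening the zip code is just one simple way to generalize.

```python
from collections import Counter


def k_anonymity(rows, quasi_identifiers):
    """Return the size of the smallest group of rows sharing the same quasi-identifier values."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values()) if groups else 0


# Hypothetical released table; "zip" and "age" are the quasi-identifiers.
rows = [
    {"zip": "78701", "age": 34, "rating": 5},
    {"zip": "78701", "age": 34, "rating": 2},
    {"zip": "78705", "age": 61, "rating": 4},
]

# Generalize: keep only the zip prefix and drop age entirely.
generalized = [{"zip": r["zip"][:3], "rating": r["rating"]} for r in rows]

k_before = k_anonymity(rows, ["zip", "age"])
k_after = k_anonymity(generalized, ["zip"])
print(k_before, k_after)
```

The catch for something like movie ratings is that the ratings themselves act as the quasi-identifiers, so there is no small set of columns to coarsen.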

    7. Re:Do what now? by IBBoard · · Score: 2, Insightful
      While yes, they did get a very perfect match on that record, the line about it is:

      ...our algorithm identified the records of two users in the Netflix Prize dataset with eccentricities of around 28 and 15, respectively.


      Granted they went for a small number of IMDB users due to their TOS, but that's still a tiny fraction. They mention finding a perfect match in IMDB and 1/8th of the NetFlix database towards the start of the report (although the sentence is a bit clunky and unclear). If that's their general accuracy then even if they can perfectly match some people (a statistical possibility) then they can't match enough to leave most people needing to worry.
    8. Re:Do what now? by arvindn · · Score: 3, Informative
      "Besides, this all relies on people voting for a) really obscure films so they can be easily identified "

      not true -- obscure films help a little bit but not too much. we put up a recent draft of our paper in which the dependence on obscure movies is much reduced.

      "b) voting similarly or identically on lots of films so that they can get a better idea as to whether it is the same person based on them liking the same films the same amounts."

      again not true at all. one of the main claims of our paper is that our method is tolerant to an INCREDIBLE amount of noise. we have the math to back this up.

      --Arvind Narayanan

    9. Re:Do what now? by yali · · Score: 2, Insightful

      So the lesson is, basically, don't post stuff that you don't want to be public to a website that makes it public, right?

      Nope, it's more complicated than that.

      Suppose that you want to keep your political attitudes private -- for whatever reason, you decided it's nobody else's business. On IMDb, linked to your real identity, you only rate movies with non-political content, which you don't mind anybody knowing your opinion about. On Netflix, you believe that your ratings will be kept private, and you want to take advantage of their recommendations. So you rate all the same movies that you rated on IMDb, but you also post your ratings of Fahrenheit 9/11, The Corporation, etc. With the method described in this paper, somebody could potentially link your supposedly anonymized political ratings back to your real identity.

    10. Re:Do what now? by Dread_ed · · Score: 1

      Decoding and privacy aside, this analysis of movie ratings might be a good way to find twins separated at birth or to match people up for dates.

      NetFlix.dating anyone?

      --
      When the only tool you have is a claw hammer every problem starts to look like the back of someone's skull.
    11. Re:Do what now? by drew · · Score: 1

      ...99% of clients who had also rated the same movies on IMDB
      (unless I greatly misunderstand their method)

      If you've only ever given your movie ratings to NetFlix, then they still have no way to correlate that with any other source. (And even then, all they've shown is that User #1234 in the Netflix List is the same person as RandomPseudonym582 on IMDB, which personally I don't find to be terribly interesting.)

      --
      If I don't put anything here, will anyone recognize me anymore?
    12. Re:Do what now? by mosch · · Score: 1

      This is a nearly absurd point.

      I was against the Iraq War from the initial drumbeats following 9/11 through today. But I didn't like Fahrenheit 9/11 at all.

      I still don't get what conclusion we're supposed to draw about a person's weight based on their opinion of super-size me. Are fat people supposed to love it or hate it? Beats the hell out of me. All I know is that Spurlock is a hack, and I hate seeing him on screen.

      The authors of this study can claim that they can find things out with an incredible mathematical certainty, but what do they really know? Not much. They're good at math, not so good at people.

    13. Re:Do what now? by yali · · Score: 1

      Two responses...

      (1) The behavioral interpretations are probabilistic, and that doesn't mean they're invalid. You may be one person who is against the war and also hated Fahrenheit 9/11. But I would bet that on average, people who liked the movie are substantially more likely to be antiwar than people who hated it. So you might not be able to determine somebody's politics with 100% accuracy based on their movie rating (and I doubt the authors think that's true); but an educated guess informed by the ratings would be much better (on average) than a guess that ignored the ratings.

      (2) Above point notwithstanding, there are reasons this is a violation of privacy that go beyond any "real" relationship between your movie ratings and your actual attitudes or personality. What also matters is that other people will draw conclusions and perhaps treat somebody accordingly. If somebody sees your low rating of the movie, they'll probably make the assumption that you're conservative. The whole point was that somebody might consider their politics nobody's business; other people's assumptions, even if incorrect, will violate that. If they get put on pro-war mailing lists; if their coworkers start asking if they're going to the Bush rally this weekend; if their liberal boss passes them over for a promotion -- all of these stem from a violation of privacy, even (maybe especially) if the information was incorrect.

  6. Anonymity broken by stupidity by CastrTroy · · Score: 2, Interesting

    Seems like it was only broken because the identity of the people was posted somewhere else, along with the ratings. My only question is how they connected the rankings on Netflix to the rankings on IMDB. Does Netflix take the liberty of submitting all the users' rankings to IMDB for them, and also include their name with this data? If you just have anonymous dataset A, with anonymous dataset B, you could match up users from both and figure out which person in A is the same person in B, but you still wouldn't know who the person is. However, if you now have dataset B be not anonymous, then it's not too difficult to compare movie ratings and find out who the people are.

    --

    Anthropic principle: We see the universe the way it is because if it were different we would not be here to see it.
    1. Re:Anonymity broken by stupidity by Neon+Spiral+Injector · · Score: 1

      They are just saying it is likely a person rated a movie on Netflix and IMDb at roughly the same time. That is the correlation which is needed to connect the anonymous with the publicly posted information.

      While I do rate a few films on IMDb I usually do them in batches, where on Netflix I rate the movie as soon as I'm finished viewing it. So the time link wouldn't be there between my two accounts.

    2. Re:Anonymity broken by stupidity by stranger_to_himself · · Score: 1

      What NetFlix did that was stupid was include the names of the movies in their dataset. There was no need for this for the prize (unless anybody was using the names for prediction I suppose), anonymous identifiers would have been okay.

    3. Re:Anonymity broken by stupidity by Neon+Spiral+Injector · · Score: 1

      If a person liked season 1 of Stargate: SG1 it would be a good idea to recommend season 2 to them. Goes for sequels too. So yeah, titles are needed a little bit.

    4. Re:Anonymity broken by stupidity by jfengel · · Score: 1

      They were hoping that by giving the name of the movie they could pull in other sources of data (RottenTomatoes, IMDB) to make better guesses. They were seeking more than just better curve-fitting of the data points. If nothing else, the IMDb's huge pile of data points of ratings gives you a lot more fodder for your collaborative filter, but only if you can tie in the names of the movies.

      But yeah, that introduces a data leak.

    5. Re:Anonymity broken by stupidity by stranger_to_himself · · Score: 1

      They were hoping that by giving the name of the movie they could pull in other sources of data (RottenTomatoes, IMDB) to make better guesses.

      Are you sure? Are people using data from outside the training set? Because if what you say is true then they're essentially asking people to use some kind of probabilistic record linkage to include external databases, which would automatically include the personal identifiers. This would be highly dubious behaviour.

    6. Re:Anonymity broken by stupidity by Constantine+XVI · · Score: 1

      And methinks you need more than just titles as well. Ex: Joe rents "Shaun of the Dead", and hates it. It would be a good idea to not suggest "Hot Fuzz", as they share many of the actors, directors, etc.

      --
      "I think an etch-a-sketch with an ethernet port would beat IE7 in web standards compliance."
    7. Re:Anonymity broken by stupidity by zippthorne · · Score: 1

      Not really. They just needed to provide sequel and tie-in information in their dataset. The exact relationship doesn't matter. And you can't get that info from the titles necessarily, anyway. I mean, how is a text-based algorithm going to know that "Serenity" is the movie tie-in to the series, "Firefly?"

      --
      Can you be Even More Awesome?!
    8. Re:Anonymity broken by stupidity by Random_Goblin · · Score: 1

      yeah but what would you think if i said i hadn't seen evil dead yet?

    9. Re:Anonymity broken by stupidity by Dare+nMc · · Score: 1

      methinks you need more than just titles as well.

      exactly why titles aren't needed.
      You let the AI algorithm make the basket. When you have processed a big enough data set a good AI algorithm will have matched the pattern already. IE if 99% of the people who hate tt0365748 also hated tt0425112 then the algorithm will have already picked that out of the sample data. It would even more so avoid suggestions like recommending identical title movies, that have nothing in common.

      You would only need the aggregate data, if you were picking which movies to create, or how to sell the movie to the person. IE a genetic algorithm may pinpoint the reason for the correlation between the two movies. But only after the pattern match already picked out the correlation. Now it might allow you to tell the person who liked "Bridges of Madison County" and "Field of Dreams" that "Twister was also filmed in Iowa" when you recommend it, but that wasn't allowed in this competition.
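
      A minimal sketch of the point above, assuming made-up ratings keyed only by opaque movie ids: the share of users who hated one id and also hated another can be computed without any titles. The sample data, the `cohate_rate` name and the "2 stars or less counts as hated" threshold are illustrative assumptions, not anything taken from the Netflix dataset.

```python
def cohate_rate(ratings, movie_a, movie_b, hate_threshold=2):
    """Among users who hated movie_a and also rated movie_b,
    return the fraction who hated movie_b as well."""
    hated_a_rated_b = 0
    hated_both = 0
    for user_ratings in ratings.values():
        a = user_ratings.get(movie_a)
        b = user_ratings.get(movie_b)
        if a is None or b is None:
            continue  # this user did not rate both movies
        if a <= hate_threshold:
            hated_a_rated_b += 1
            if b <= hate_threshold:
                hated_both += 1
    if hated_a_rated_b == 0:
        return None  # no evidence either way
    return hated_both / hated_a_rated_b


# Hypothetical ratings on a 1-5 scale; only opaque ids, no titles.
ratings = {
    "u1": {"tt0365748": 1, "tt0425112": 2},
    "u2": {"tt0365748": 2, "tt0425112": 1},
    "u3": {"tt0365748": 1, "tt0425112": 5},
    "u4": {"tt0365748": 5, "tt0425112": 4},
    "u5": {"tt0365748": 1},
}
rate = cohate_rate(ratings, "tt0365748", "tt0425112")
```

      With enough users, a high value here is the kind of signal a recommender could pick up from ids alone.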
    10. Re:Anonymity broken by stupidity by Anonymous Coward · · Score: 0

      Why? Just assign each movie a number. In your case you would see a lot of people renting both #1234 and its sequel #5678 and rating both highly, your system could then figure out that people who had rented one of those stood a good chance of liking the other.

    11. Re:Anonymity broken by stupidity by jfengel · · Score: 1

      They don't need to know this particular user. The collaborative filtering system works by comparing your pattern to other patterns. Find somebody (anybody) with a similar pattern and then use their ratings on movies you didn't rate. You can use other Netflix users (which is what Netflix has been doing all along) but you can increase your pool with the IMDB. It makes it more likely to find somebody who has similar ratings to yours overall and has seen some movie that you haven't rated.

      Which almost makes it inevitable that people who rated things on both Netflix and the IMDb would be found out: that other person used as your model would be perfect because it's you.
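
      A rough sketch of the nearest-neighbour idea described above, assuming both sources have already been put on the same 1-5 scale (IMDb actually uses 1-10). The `similarity` and `predict` helpers and all of the data are hypothetical; a real collaborative filter would be far more elaborate.

```python
def similarity(a, b):
    """Negative mean absolute rating difference over shared movies.
    Higher means more similar; 0 is a perfect match."""
    shared = set(a) & set(b)
    if not shared:
        return float("-inf")
    return -sum(abs(a[m] - b[m]) for m in shared) / len(shared)


def predict(target, pool, movie):
    """Borrow the rating for `movie` from the most similar pool member
    who has rated it. Returns (rating, neighbour_name) or None."""
    candidates = [
        (similarity(target, other), name)
        for name, other in pool.items()
        if movie in other
    ]
    if not candidates:
        return None
    _, best_name = max(candidates)
    return pool[best_name][movie], best_name


# The target user has not rated "m9" on Netflix.
target = {"m1": 5, "m2": 1, "m3": 4}
pool = {
    "netflix_42": {"m1": 4, "m2": 2, "m3": 4, "m9": 3},
    # Same person's public profile on the other site: a perfect neighbour.
    "imdb_same_person": {"m1": 5, "m2": 1, "m3": 4, "m9": 5},
}
rating, source = predict(target, pool, "m9")
```

      Here the best neighbour is the user's own public profile, which is exactly the leak being described.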

    12. Re:Anonymity broken by stupidity by darthflo · · Score: 1

      If the algorithm would be going to do more than just correlate past ratings with ids of other movies and sequels/tie-ins of other movies, more data about the movie is required. Crew data would be extremely useful here. If somebody liked Live Free or Die Hard and Perfect Stranger, The Fifth Element may also be of interest because of the common lead actor.
      Including such data in the example sets would in turn allow to determine the correlation between movies and their internal id number pretty quickly. Even if the crew metadata was replaced by another set of internal ids, mapping (most of) those to actors and film ids to titles ought to be trivial for someone skilled in the art.

      Netflix probably should have given their users the possibility to "opt-out" of the anonymized data set, but I feel the damage this particular incident has caused or may cause in the future is pretty minor. A high-probability match can be found for some users, if those users decided to both rate films on the IMDb and Netflix in the same time window, knowing at least the IMDb part of their votes was going to be public. I don't know Netflix' whole catalog but I'd assume they don't rent out really bad stuff and watching some porn (or non-classy films) at home isn't that bad, now is it?

    13. Re:Anonymity broken by stupidity by jelton · · Score: 1

      That's a Cosby sweater!

      --
      I am not a lawyer. This post does not constitute any form of legal advice.
  7. did it work? by Speare · · Score: 2, Interesting

    The researchers used this method to find how individuals on the IMDb privately rated films on Netflix, in the process possibly working out their political affiliation, sexual preferences and a number of other personal details

    {tongueincheek}Yeah, but the question is, will knowing those personal facts generate better movie recommendations?{/tongueincheek}

    When there's a significant prize at stake, researchers can try all sorts of slimy tricks to win. (I'm not saying that's the motive behind this report, but there are many "researchers" going for the prize.) And when there's significant profits at stake, a corporation will damn-fire-certainly use whatever means they can use to maximize those profits, regardless of whether it might be "ethical."

    --
    [ .sig file not found ]
    1. Re:did it work? by Anonymous Coward · · Score: 0

      When there's a significant prize at stake, researchers can try all sorts of slimy tricks to win.

      THERE CAN BE ONLY ONE!
    2. Re:did it work? by johnbr · · Score: 1

      And when there's significant profits at stake, a corporation will damn-fire-certainly use whatever means they can use to maximize those profits, regardless of whether it might be "ethical."
      Here, let me fix that for you:
      When there's significant profits at stake, individual humans will damn-fire-certainly use whatever means they can use to maximize those profits, regardless of whether it might be "ethical".
  8. How does this break anonymity? by Anonymous Coward · · Score: 2, Insightful

    For those who haven't rated movies on IMDB, such as myself - and I imagine a large proportion of subscribers.

  9. Data-mining and the actual problem by Anonymous Coward · · Score: 4, Interesting

    There are two things going on here. One, many people are asking how you could identify any personal information about people based on their movie preferences. The answer is data-mining. Very sophisticated techniques exist to do things exactly like this, i.e. take a data set and find out about the people.

    The second problem is that by deanonymizing the NetFlix data, you can start to cheat on the NetFlix prize. The requirement to win $1 million is that your recommendation engine is 10% better than the one they are currently using. However, if you can learn the exact preferences of some users in the dataset (i.e. by finding the rest of their ratings on IMDB) then you can hardcode that into your recommendation engine and get the recommendations for these users exactly right. This can boost your score even though your actual system is no better than the existing one. This is known as over-fitting to the data.

    Finally, this paper is over a year old. Can we please have some new news?

    1. Re:Data-mining and the actual problem by Anonymous Coward · · Score: 0

      This won't work because submitted algorithms are run against data that wasn't made public.

    2. Re:Data-mining and the actual problem by _14k4 · · Score: 1

      I don't know, about your second problem. They separate the dev/test data from the quiz data - and even that is halved into two sections. With the intent to stop a "hill climb" in the results. What says that the dataset used in developing the code is a subset of the data used in the test to find the winners?

    3. Re:Data-mining and the actual problem by darthflo · · Score: 1

      They don't have to hardcode data from the test set into their algorithm. Possible solution goes like this:
      1) Have IMDb rating dump available (if this is trivially possible, retrieve it at runtime)
      2) Load quiz data, create profiles
      3) Create profiles out of the IMDb data (if not already done)
      4) Correlate profiles from 2 and 3
      5) Use some (e.g. Netflix') statistical method to determine the probability of user x liking film y
      6) Override values from 5 with x's actual IMDb data where possible
      Ta-daah, you're using all the advantages non-anonymity grants you while being nicely flexible and ignoring privacy :]
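
      Steps 5 and 6 could be sketched roughly as follows. This assumes the linking in step 4 has already produced a mapping from anonymous user ids to known public ratings; `predict_with_override`, `baseline` and the sample data are invented for illustration and say nothing about whether the contest rules or test setup would allow this.

```python
def baseline(user, movie):
    """Stand-in for step 5: a trivial predictor that always guesses
    a fixed global average. Any real model would go here."""
    return 3.6


def predict_with_override(model, linked_public_ratings, user, movie):
    """Step 6: prefer a known public rating for (user, movie) when the
    user was linked to a public profile, else fall back to the model."""
    known = linked_public_ratings.get(user, {})
    if movie in known:
        return known[movie]
    return model(user, movie)


# Hypothetical output of step 4: anonymous id -> ratings found publicly.
linked = {"netflix_1337": {"m7": 5.0}}

leaked = predict_with_override(baseline, linked, "netflix_1337", "m7")
guessed = predict_with_override(baseline, linked, "netflix_1337", "m8")
unlinked = predict_with_override(baseline, linked, "netflix_0001", "m7")
```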

    4. Re:Data-mining and the actual problem by Pollardito · · Score: 1

      This won't work because submitted algorithms are run against data that wasn't made public.

      The data that wasn't made public might be further ratings from people who had data in the public test set. Imagine that both the public and private data sets are mirrored in the public rankings on IMDB: anyone that was able to match your public data set against IMDB can "predict" all your other IMDB rankings 100%, and if any of those are in the private data set then they'll get those ones exactly right.

      i'm curious how the prediction is being tested against the private data though, it may not be possible to use this method if the test is being completed internally by netflix themselves by running the program (versus providing the contestants with a list of movies to predict using their program and then returning the predictions).
    5. Re:Data-mining and the actual problem by _14k4 · · Score: 1

      So.. wanna join my netflix team? :P

      I found that this application may be a good way to tinker with some math, statistics, and such while I self learn - with no intention of submitting anything real to netflix itself.

    6. Re:Data-mining and the actual problem by darthflo · · Score: 1

      Sure, where do I join? :)

      Sounds like a nice opportunity to learn something. I don't have any idea on how to implement any of the stuff I mentioned before, though.

    7. Re:Data-mining and the actual problem by _14k4 · · Score: 1

      I simply created my own team over at netflixprize.com to get access to the datasets. I don't plan on submitting anything worthwhile to the project, or rather the contest - however, I do plan on learning a lot more about methods of datamining. What I do here at work isn't mining, but simply reporting. (Albeit on tables with millions of records, still.)

  10. Easy solution by Thanshin · · Score: 4, Funny

    Every time you feel the need to vote 10 in Glitter, also vote 10 to The Godfather.
    Every time you cheer for Brokeback Mountain, also put a 10 in Huge Knockers MXII.
    Every time you want to express your love for Dersu Uzala, vote a 10 in Spice World, with added commentaries.

    That way, everybody will know you're a security conscious computer scientist. Or a schizophrenic moron.

    1. Re:Easy solution by ioshhdflwuegfh · · Score: 1

      Exactly. Or, as one character in a comedy I saw some time ago (I can't remember the title) is tripping all horrified in Amsterdam after eating some brownies, saying to his friend something like "I once watched gay porn,... I didn't know it... girls just never showed up,... they never showed up!!". He then jumps around the bar all in horror, a terrible trip he has, then the bar owner tells him something like: "come down white boy, there is no hash in those brownies".

    2. Re:Easy solution by Anonymous Coward · · Score: 0

      Eurotrip. I saw it the other day and it's surprisingly entertaining for a teen comedy.

    3. Re:Easy solution by Sancho · · Score: 1

      That was Eurotrip. You just admitted to watching Eurotrip.

      Wait. So did I.

    4. Re:Easy solution by Anonymous Coward · · Score: 0

      Dersu Uzala! You must be one of the 23 people ever to have seen that film! I liked it the first time, and 20 years later, watched it again and fell asleep while doing so.

    5. Re:Easy solution by atraintocry · · Score: 1

      I think that was EuroTrip. "We do not sell Hash brownies here, we are simple Dutch bakery. Now put your clothes back on, white boy!"

    6. Re:Easy solution by Belial6 · · Score: 1

      Or, you are married. I know, cue the jokes about Slashdot readers not having wives.

      Really, one of the biggest problems with Netflix ever getting good recommendations is that they are not trying to make recommendations for individuals. They are making recommendations for a group of people whose tastes may not cross over at all. You joke in your post about what movies get a 10 (well, 5 anyways), but would it seem unreasonable that a family of 4 could come up with those very ratings?

    7. Re:Easy solution by littlekosh · · Score: 1

      10/10 for Glitter? If a member of my imaginary family-of-four rated Glitter that highly it would become an imaginary family-of-three.

      --
      655321
    8. Re:Easy solution by Belial6 · · Score: 1

      You would be surprised what the father of an 11 year old girl will put up with her watching... Yes, even Glitter.

    9. Re:Easy solution by ioshhdflwuegfh · · Score: 1

      :D Yes, that's it!

  11. requires another (partial)public revealing to work by call+-151 · · Score: 3, Informative

    The summary is somewhat misleading- the only accounts that can be identified are those that belong to people who also rate on IMDB and who have thus chosen to make at least some of their ratings public. If person X rates 1000 movies on Netflix and has made 20 or so ratings on IMDB publicly available, then it is possible to infer with some small uncertainty which of the anonymized individuals in the NetFlix database they are. Thus you have possibly figured out their ratings of the other 980 movies they rated for Netflix but did not post on IMDB. Interesting, but not earth-shattering or a serious breach of privacy, I would say.

    --
    It's psychosomatic. You need a lobotomy. I'll get a saw.
  12. The world is not on fire by puppetluva · · Score: 3, Insightful

    This is total hyperbole.

    All the researchers are saying is that they can deduce some of your preferences based on your other preferences. Of COURSE you can do that, that was the whole point of the contest Netflix put up.

    What they are _not_ saying is that they now know who you are, where you live, or anything uniquely identifying about you. So basically, you are still anonymous.

    I'm starting to tire of news headlines that claim the world is on fire when someone actually just does something slightly derivative from the norm and thinks they are brilliant. The noise from these non-events masks actual brilliant achievements and makes it seem that everyone is doing banal work.

    1. Re:The world is not on fire by Peter+Mork · · Score: 2, Insightful

      All the researchers are saying is that they can deduce some of your preferences based on your other preferences.

      The researchers are making a stronger claim. They are stating that based on actual public ratings (available from IMDB) they can generate actual private ratings published by Netflix under the guise of anonymity. As the paper notes, someone competing for the Netflix prize could use this data to improve the accuracy of their prediction algorithm. However, the point of this paper is to reveal that public ratings can be used to identify purportedly anonymous private ratings.

      As a comparison, imagine if the public information consisted of the dates that various people went to the doctor for a yearly physical. This is hardly sensitive information. Now imagine that your insurance company provided a list of (id, date, diagnosis) records. Ostensibly, the id field is an arbitrary (anonymous) identifier. The paper shows that based on limited background knowledge (a handful of (date, 'physical exam') records), an attacker could reverse engineer your diagnosis history.

    2. Re:The world is not on fire by JPMH · · Score: 2, Informative

      On the other hand, if somebody *already* knows who you are, the lesson is that it can take surprisingly little public information to identify your entire history of ratings at Netflix.

      For example, the authors found that for 40% of individuals, accurate ratings on a scale of 1-5 for only *two* random movies, together with a knowledge to within 14 days of when they were seen, would be sufficient to identify an individual in the dataset. As they comment, that's the kind of information colleagues give out every day around the water cooler.

      Repeating the experiment with a knowledge of 8 movies, 6 hits in the database would be sufficient to identify the personal histories of 99% of the people in that data.
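
      To make the figures above concrete, here is a toy sketch of that kind of lookup: count how many known (movie, rating, approximate date) facts line up with each anonymized record and keep the records with enough hits. This is not the paper's algorithm, which scores matches more robustly; the exact-rating test, the `matches` and `candidates` helpers, and the data are all assumptions for illustration.

```python
from datetime import date


def matches(record, known, max_days=14):
    """Count how many known (movie, rating, date) facts are consistent
    with one anonymized record of movie -> (rating, date)."""
    hits = 0
    for movie, rating, when in known:
        entry = record.get(movie)
        if entry is None:
            continue
        rec_rating, rec_date = entry
        if rec_rating == rating and abs((rec_date - when).days) <= max_days:
            hits += 1
    return hits


def candidates(dataset, known, min_hits):
    """Return the ids of all records with at least `min_hits` matching facts."""
    return [uid for uid, rec in dataset.items() if matches(rec, known) >= min_hits]


# Hypothetical anonymized dataset.
dataset = {
    "anon_1": {"m1": (4, date(2005, 3, 1)), "m2": (2, date(2005, 3, 20))},
    "anon_2": {"m1": (4, date(2005, 6, 1)), "m2": (2, date(2005, 3, 18))},
    "anon_3": {"m1": (5, date(2005, 3, 2)), "m3": (1, date(2005, 4, 1))},
}
# Two water-cooler facts: what someone rated, and roughly when.
known = [("m1", 4, date(2005, 3, 5)), ("m2", 2, date(2005, 3, 25))]
result = candidates(dataset, known, min_hits=2)
```

      In this toy data only one record survives both facts; the more obscure the movies, the fewer records share them to begin with.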

    3. Re:The world is not on fire by TubeSteak · · Score: 1

      All the researchers are saying is that they can deduce some of your preferences based on your other preferences. Of COURSE you can do that, that was the whole point of the contest Netflix put up.

      What they are _not_ saying is that they now know who you are, where you live, or anything uniquely identifying about you. So basically, you are still anonymous.

      Did you even read the summary?

      They took anonymous ratings & discovered they can link some of them to IMDB usernames. We can argue over whether or not those IMDB usernames are "uniquely identifying" or "anonymous" but they definitely say something about who you are.

      I'm sure a percentage of those IMDB usernames are easily linked to real people through a trivial google search. Does that break this alleged veil of anonymity? Datamining isn't that hard these days.
      --
      [Fuck Beta]
      o0t!
    4. Re:The world is not on fire by puppetluva · · Score: 1

      I did read the summary and felt that the IMDB linkage was a real stretch.

      Linkage of that kind is only useful if the user populations for IMDB commenters and Netflix commenters are the same (at least 50%) and most people make the same comments and ratings on both systems in the same way _most_ of the time. Chances are that the populations are _not_ the same and that the commenters don't mostly duplicate their ratings for every movie in each place. In that case, you probably get more false positives than positive correlations.

      If the ratings on imdb and netflix are nearly exactly the same for certain users, all that you have determined is a potential netflix user-id (which is probably a unique number in the dataset, not necessarily their username) to IMDB username linkages - but not a certain one.

    5. Re:The world is not on fire by ioshhdflwuegfh · · Score: 1

      The researchers are making a stronger claim. They are stating that based on actual public ratings (available from IMDB) they can generate actual private ratings published by Netflix under the guise of anonymity. As the paper notes, someone competing for the Netflix prize could use this data to improve the accuracy of their prediction algorithm. However, the point of this paper is to reveal that public ratings can be used to identify purportedly anonymous private ratings.

      The researchers are making a stronger claim. They state:

      As shown by our experiments with cross-correlating non-anonymous records from the Internet Movie Database with anonymized Netflix records (see below), it is possible to learn sensitive non-public information about a person's political or even sexual preferences.
    6. Re:The world is not on fire by arvindn · · Score: 1
      Netflix claimed that the data is anonymous.

      They said you couldn't identify a person's record in the dataset even if you know some (or all!) of their ratings.

      We showed that that's not true. Even if there's a LOT of noise. That's all there is to it.

      --Arvind Narayanan

    7. Re:The world is not on fire by Jay+L · · Score: 1

      Great example, Peter Mork.

      In addition, I'd point out that this can probably be generalized to (a) any anonymous data set that's combined with (b) some other non-anonymized data set that will map onto it. Here, we have (a) a sample of anonymized Netflix data and (b) a sample of non-anonymized IMDB ratings. So a lot of the reactions are either "Well, duh, if you post publicly to IMDB, you've posted publicly, so you're stupid!" or "it's only movies, who cares?"

      I care. Not just in the hypothetical of "what if the TSA decides that someone who rented Fahrenheit 9/11 shouldn't fly?" (which isn't that extreme these days). No, that's too easy.

      Instead, just combine the fact that (a) nearly every company you buy from is selling anonymized transaction data to data mining firms, and (b) you post to Slashdot. Oops! There goes your anonymity, right there.

      Netflix-to-IMDB was easy; if you are the only person on both Netflix and IMDB whose favorite three movies are Birth of a Nation, The Boss of it All, and Overboard (starring Goldie Hawn), then yeah, of course they know you're you. That's not the bad part. The bad part is that you're also the only person from your zip code who bought gas at the gas station nearest to the Ron Paul campaign event and then posted a pro-Ron Paul post on Daily Kos - with an anonymous screen name, but similar to the one you used on IMDB. The Kos post was from the same IP address that someone used six months later to download a large book about molecular chemistry from the Gutenberg Project, within a few days of a mail bomb that was sent to the local IRS facility. And so on.

      Once you can link the worlds of "sparse but identifying data" with "ubiquitous but anonymous" data, a lot of bad things can happen. And there's really no way to prevent that linking.

  13. From the paper by JPMH · · Score: 1
    From the paper:

    First, we can immediately find his political orientation based on his strong opinions about "Power and Terror: Noam Chomsky in Our Times" and "Fahrenheit 9/11." Strong guesses about his religious views can be made based on his ratings on "Jesus of Nazareth" and "The Gospel of John". He did not like "Super Size Me" at all; perhaps this implies something about his physical size? Both items that we found with predominantly gay themes, "Bent" and "Queer as folk" were rated one star out of five. He is a cultish follower of "Mystery Science Theater 3000". This is far from all we found about this one person, but having made our point, we will spare the reader further lurid details.
  14. What are you rating in IMDB vs Netflix by SmallFurryCreature · · Score: 4, Insightful

    As far as I know in IMDB you are rating the overall quality of the movie, not I agree with it OR I want to see more like this.

    One example, Schindler's List, great movie, do NOT want to see it again. Same with Grave of the Fireflies. Some movies just ain't for multiple viewings. They are my "favorite movies I never want to see again".

    On the other hand I got movies I can watch any day of the week, but that I would NEVER rate as highly. Cannonball Run is one such movie. I watch it far too often, but I wouldn't call it a good movie. You can always find me ready for a Jackie Chan movie or a spaghetti western.

    Is the netflix rating system a "I liked this movie and want to see more like it" system or a "This movie was brilliant and I would highly recommend it to everyone else" type of rating system?

    Granted some people get it confused, probably the same people that use the slashdot moderation system to silence views they don't like, but that only makes basing conclusions on user ratings even more problematic.

    I can rate a movie highly even if I do not agree with it, simply because it is good. And I can rate a movie I really like to watch as crap simply because I know I like watching crap.

    I don't like the godfather movies, I can see they are high quality, I just don't like them. So my rating them would be fairly high as for quality, but low for 'I want to see more like this'.

    I thought that the netflix system was "I want to see more like this" based. Surely nobody is so stupid as to think a quality rating and a "i like this" rating system are the same? Or am I completely in the wrong in seeing a difference between the two? Am I insane in thinking that you can see a movie as being a great artwork and still not like it, or vice versa?

    --

    MMO Quests are like orgasms:

    You may solo them, I prefer them in a group.

    1. Re:What are you rating in IMDB vs Netflix by apt142 · · Score: 1

      I would think that "I like this" and "This is a good movie" are two different measurements on a film.

      I must be a pessimist, but I don't believe the average Joe would agree with that statement. I think most people would see the two statements as synonymous. That is, if they even think about the distinction. Mostly I think they'd just grab their "gut" feeling and go with it.

      I suppose we could test the argument by comparing movies that are ranked high on quality with total movie rentals or some other more precise measurement of watching frequency.

    2. Re:What are you rating in IMDB vs Netflix by xtracto · · Score: 2, Interesting

      One example, Schindler's List, great movie, do NOT want to see it again. Same with Grave of the Fireflies. Some movies just ain't for multiple viewings. They are my "favorite movies I never want to see again".

      Just out of curiosity, why don't you want to see those films again? both of them are really good films and although I would not see them every weekend (as for example Sin City), I enjoy watching them from time to time. The plot is interesting, the photography/drawing is nice and the screen writing is well done.

      I find it difficult to understand your statement, "favorite movies I never want to see again", if you do not want to see them again, then you do not enjoy watching them... unless you dislike enjoyment and only watch films that make you cry or have a bad time (I would suggest United 93 to you... worst film I have seen in a looong long time... or Brokeback Mountain, a 1 hour marlboro country ad).

      I do not know about the netflix scoring algorithm but I have found criticker.com quite reliable for my tastes.

      Am I insane in thinking that you can see a movie as being a great artwork and still not like it, or vice versa?

      It might be akin to the "La Gioconda" painting. Everybody says it is the best piece of art of all time, yet, after having seen it *twice* live in the Louvre I have yet to find something special about it (I prefer, for example, paintings by Giovanni Pannini, who is relatively unknown)

      --
      Ubuntu is an African word meaning 'I can't configure Debian'
    3. Re:What are you rating in IMDB vs Netflix by Danny+Rathjens · · Score: 2, Insightful

      As far as I know in IMDB you are rating the overall quality of the movie, not I agree with it OR I want to see more like this.

      No. You give people way too much credit if you think their ratings on public sites are that nuanced or objective. I think most people just rate things on how well they like it themselves. A significant portion seem to even just give 10s to anything they like, too.

      I also find it amusing how the votes tend to congregate somewhere in the 3rd quartile a bit above average(e.g. 7 on a 1-10 scale) rather than 5.5 where it would be if people ranked things more fairly. (I wonder if this is associated with that effect where people always rank themselves above average despite evidence to the contrary, as well.)

    4. Re:What are you rating in IMDB vs Netflix by ps236 · · Score: 2, Insightful

      > I also find it amusing how the votes tend to congregate somewhere in the 3rd quartile a bit above average(e.g. 7 on a 1-10 scale) rather than 5.5 where it would be if people ranked things more fairly

      I'm not sure about that. People will tend to watch films they think/hope they will like. So, the ones where they think 'that'll be absolute poop' they won't bother watching, so, hopefully, won't bother rating.

      So, people should rate fewer films as 'poop' than as 'great', because they select only the 'hopefully good' films to review.

      If you forced people to go to see and review all films, even the ones where you have to drag them screaming through the door, then the average rating would almost certainly decrease considerably.

    5. Re:What are you rating in IMDB vs Netflix by ZorbaTHut · · Score: 1

      In addition to the previous comment, when I rate movies I'm usually rating movies that I remember. If a movie is entirely unmemorable, I'm not gonna remember that I watched it and thus I'm not going to rate it.

      That means the above-average movies and the total flops get rated, but not the below-average movies.

      --
      Breaking Into the Industry - A development log about starting a game studio.
    6. Re:What are you rating in IMDB vs Netflix by Danny+Rathjens · · Score: 1

      Yeah. Not just that, but also people are probably more motivated to actually vote for a movie they liked. Why bother to go look at the imdb entry at all for a movie you didn't like? (Unless it's something that just *needs* to be voted low to warn others. e.g. Highlander II) :)

    7. Re:What are you rating in IMDB vs Netflix by Anonymous Coward · · Score: 0

      To a certain extent you are wrong in that. People don't get netflix subscriptions to rent the same movie(s) over and over again, that's what the movie bin at S-Mart is for, people get netflix subscriptions so that they can see lots of different movies once or maybe twice each. To use your example, having seen one movie that qualified as a single viewing favorite did that stop you from appreciating the next one?

      Taste in art is like taste in anything else, it's 100% subjective. That which you describe as great artwork might not be what I describe as great artwork. That which you like I might not. The real difference isn't between liking it and appreciating it, those things work hand in hand to come up with an overall rating, the real difference is between "How do I think I would rate this movie all things considered?" and "How do I think everyone else would rate this movie all things considered?".

      The real beauty of the law of large numbers is that it doesn't matter which view you take of how to rate something, or how to mod a post, get enough people doing it and the numbers will converge onto a single value which we can call the rating, because it's all subjective to begin with.

      The real problem of the law of large numbers is that it's often used on very noisy datasets, a single value called rating might not be enough information to predict whether or not someone will like a movie, but that's what the prize is all about.

    8. Re:What are you rating in IMDB vs Netflix by arvindn · · Score: 1
      Yes, such differences in meaning exist.

      However, when you're talking about dozens of movies, all you need is a correlation. Our algorithm is powerful enough to tolerate a large amount of noise. If you read the paper, we were able to match up users between imdb and netflix with a very high level of confidence, in the sense that the best match was 15-30 standard deviations away from the second best match. In statistics terms, that's an insanely close match.

      --Arvind Narayanan
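
      To make "standard deviations away from the second best match" concrete, here is a minimal Python sketch. It assumes each candidate record already has a similarity score (how that score is computed is not shown here), and it measures the gap between the top two scores in units of the standard deviation of all candidate scores. The function name match_eccentricity and the choice of the population standard deviation are assumptions for illustration, not details taken from the comment above.

```python
import statistics


def match_eccentricity(scores):
    """Gap between the best and second-best score, in standard deviations.

    `scores` holds one similarity score per candidate record. The spread is
    the population standard deviation of all scores (an assumption made for
    this sketch).
    """
    if len(scores) < 2:
        raise ValueError("need at least two candidate scores")
    ranked = sorted(scores, reverse=True)
    sigma = statistics.pstdev(scores)
    if sigma == 0:
        # Every candidate scored the same, so nothing stands out.
        return 0.0
    return (ranked[0] - ranked[1]) / sigma


if __name__ == "__main__":
    # One candidate clearly ahead of four identical runners-up.
    print(match_eccentricity([10.0, 1.0, 1.0, 1.0, 1.0]))
```

      A large value means the top candidate stands far apart from every alternative; a value near zero means the "best" match is barely distinguishable from the next one and should probably not be trusted.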

    9. Re:What are you rating in IMDB vs Netflix by coaxial · · Score: 1

      Is the netflix rating system a "I liked this movie and want to see more like it" system or a "This movie was brilliant and I would highly recommend it to everyone else" type of rating system?

      It's both. The system allows users to say how much they preferred a movie. This can then be used to predict what movies a user will prefer in the future. If an unseen movie is preferred by users that have expressed preferences similar to yours, then it will be recommended to you.

      But your question about what the semantics of a rating are is a good one. The answer is, we don't really know, and it doesn't really matter from a practical standpoint. People like things of "high quality" whatever that means. "The Godfather" is a high quality movie, but then again so is "Harold and Kumar Go to White Castle". They're very different, but they're both "high quality" (whatever that means) for their respective genres, but I wouldn't necessarily say "If you loved The Godfather, you'll love Harold and Kumar!"

      People also rate movies differently. There are users that, on a five point scale, only use 5 and 1. Then there are others that rate everything between 4 and 2. And then there are others that rate everything between 5 and 3, and so on. You can't just compare the ratings directly among the users. You have to scale them.
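
      One common way to do that scaling is to convert each user's ratings to z-scores: subtract that user's mean and divide by that user's spread. The Python sketch below is only an illustration under that assumption; the function name normalize_ratings and the dictionary layout are made up for the example, and nothing here claims to be what Netflix or any contestant actually does.

```python
import statistics


def normalize_ratings(ratings):
    """Map one user's raw ratings {title: stars} to z-scores.

    Each rating is centred on the user's own mean and divided by the user's
    own population standard deviation, so a "5 or 1" rater and a "4 or 2"
    rater end up on a comparable scale.
    """
    values = list(ratings.values())
    mean = statistics.mean(values)
    spread = statistics.pstdev(values)
    if spread == 0:
        # The user gave every movie the same score: no preference signal.
        return {title: 0.0 for title in ratings}
    return {title: (stars - mean) / spread for title, stars in ratings.items()}


if __name__ == "__main__":
    extreme = {"Movie A": 1, "Movie B": 5}
    cautious = {"Movie A": 2, "Movie B": 4}
    print(normalize_ratings(extreme))
    print(normalize_ratings(cautious))
```

      After scaling, the two hypothetical users above express the same preference even though their raw stars differ.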

      Take me for example. Right now, I'm rating the music on my ipod. (I don't know why, I just am.) I decided 1 is for absolute crap (e.g. pretty much anything by Diamanda Galas. I highly recommend her cover of the Hank Williams classic, "I'm So Lonesome, I Could Cry." Everyone should hear that at least once.) and 5 is for "mind blowing sublimity" (e.g. Massive Attack's A Prayer for England). If I just like it or dislike it, 4 or 2. Things I'm a bit ambivalent about, 3. Sounded simple. Nice good rules. But now after rating a few hundred songs, I've realized that there are several things that are more like 3.5 or 2.5. I like the song, but not as much as the other songs rated 4, but more than other songs rated 3. What do I do? Push the songs I definitely like up to 5, and rate this song 4? But what about the songs that are truly special? Where do I rate "Smells Like Teen Spirit?" Originally it was mind-blowing, but over the years its mind-blowing ability has passed. It's still a good, and very influential song, but given what exists now, it's not as standout as it was before.

      Because of all of this, I'm starting to like simple three state ratings. Thumbs up, thumbs down, or no rating. It removes the guess work, and the paradox of choice.

    10. Re:What are you rating in IMDB vs Netflix by rocket+rancher · · Score: 1

      Am I insane in thinking that you can see a movie as being a great artwork and still not like it, or vice versa?

      No, you are not insane. It's something more subtle than that. Your malady is one shared by all clever, insightful people who are surrounded by unclever clods. Scientists, engineers, musicians, artists, writers -- anybody who thinks abstractly on a routine basis -- will often appear insane or worse to people who can't or won't think abstractly at all. People who can't or won't think abstractly are not able to disconnect their idea of the way things should be from the way things really are. As Korzybski so rightly pointed out, the map is not the territory. When confronted with somebody who, for example, uses conflicting ideas to describe the same object, as you did when you characterized Grave of the Fireflies as a great movie that you never wanted to see again, the result is confusion for people who think great movies should be watched over and over again, and a knowing nod from those few of us whose map of reality can actually handle that apparent contradiction. It would be interesting to try to see if the same statistical methods used to crack the anonymity of the Netflix prize database would be useful in selecting abstraction as a character trait, based perhaps on how often a person deploys conflicting concepts when reviewing books or movies.

  15. I think a lot of you naysayers... by mdm-adph · · Score: 1

    ...would be a lot more appreciative of this proof of concept if someone trawled Slashdot threads to see how often you feed trolls by responding to comments with a "-1" rating... :P

    --
    It is by my will alone my thoughts acquire motion; it is by the juice of the coffee bean that the thoughts acquire speed
  16. This is a 'research' paper? by RocketJeff · · Score: 1, Insightful

    First, we can immediately find his political orientation based on his strong opinions about "Power and
    Terror: Noam Chomsky in Our Times" and "Fahrenheit 9/11." Strong guesses about his religious views can
    be made based on his ratings on "Jesus of Nazareth" and "The Gospel of John". He did not like "Super
    Size Me" at all; perhaps this implies something about his physical size? Both items that we found with
    predominantly gay themes, "Bent" and "Queer as folk" were rated one star out of five. He is a cultish
    follower of "Mystery Science Theater 3000". This is far from all we found about this one person, but having
    made our point, we will spare the reader further lurid details.


    Finding a paragraph like this in a research paper makes me call into question the motives and intentions of the 'researchers.' They seem sort of like the Jerry Springer of research (since he's just trying to help the families he has on his show...).

    They imply that the person didn't like "Super Size Me" because he's probably fat (or are they trying to imply that he has a problem with gaining weight and is jealous?).

    Also, they imply that because he rated two "predominantly gay theme" items as poor he must not be homosexual. Or are they implying that because he rented/rated these that he must be gay (because who would ever rent them otherwise).

    The fact that they use the "there's more juicy stuff about this guy, but we can't tell because we're serious researchers" line at the end is the pièce de résistance that really shows what motivates these researchers.
    1. Re:This is a 'research' paper? by nagora · · Score: 3, Insightful
      You're missing the point completely. Other people will be using "data mining" of this sort, and making serious decisions about whether you support terrorism, or are just generally not a "good citizen", and they won't be revealing their judgments to the public to let them know what might be going on.

      TWW

      --
      "Encyclopedia" is to "Wikipedia" what "Library" is to "Some people at a bus stop"
    2. Re:This is a 'research' paper? by amccaf1 · · Score: 2, Insightful
      From TFA:

      He did not like "Super Size Me" at all; perhaps this implies something about his physical size?
      Or maybe he's a manager of a McDonalds. Or a part-time Ronald McDonald. Or...
      --
      "Flag on the moon. How did it get there?"
    3. Re:This is a 'research' paper? by PhysicsPhil · · Score: 1

      Finding a paragraph like this in a research paper makes me call into question the motives and intentions of the 'researchers.' They seem sort of like the Jerry Springer of research (since he's just trying to help the families he has on his show...).

      It's clear you didn't read the paper. To be sure, the quoted paragraph did appear in the paper, which of course was selected for the summary because it was the most interesting. The full paper is 24 pages of substantially heavier research and analysis. The paragraph in question was actually towards the end of the paper in a 'case study' section indicating what kind of information might be plausibly derived from an anonymity attack against the NetFlix database.

      Also in the paper are one lemma, five theorems, a discussion thereof, a presentation and discussion of the de-anonymizing algorithm, along with an interesting discussion of sparseness within the original Netflix database (i.e. how similar are records from two different people).

    4. Re:This is a 'research' paper? by ioshhdflwuegfh · · Score: 1

      [...]This is far from all we found about this one person, but having made our point, we will spare the reader further lurid details.

      Also in the paper are one lemma, five theorems, a discussion thereof, a presentation and discussion of the de-anonymizing algorithm, along with an interesting discussion of sparseness within the original Netflix database (i.e. how similar are records from two different people).

      What I want is not some tight point, nor an arrow pointing to this point, but exactly the lurid details (and preferably but not necessarily how they link to all these theorems and the algorithm).
    5. Re:This is a 'research' paper? by urcreepyneighbor · · Score: 1

      He is a cultish follower of "Mystery Science Theater 3000".

      That describes half the tards here. Including myself. ;D
      --
      "The fight for freedom has only just begun." - Geert Wilders
    6. Re:This is a 'research' paper? by RocketJeff · · Score: 1

      It's clear you didn't read the paper. [snipped]

      Also in the paper are one lemma, five theorems, a discussion thereof, a presentation and discussion of the de-anonymizing algorithm, along with an interesting discussion of sparseness within the original Netflix database (i.e. how similar are records from two different people).

      Actually, I did read the entire paper - not in depth, but enough to get the basic picture of how they went about the task.

      They presented what appears to be sound research before they decided they'd done enough "serious" work and then descended into the depths of tabloid journalism by presenting what appear to be spurious and lurid claims about the individual in their "case study." That descent is what makes me wonder about the work they did before that and the applicability of it to a more general case.

      Also, after reading it again, I'm still not convinced that their correlation between NetFlix and IMDB users is that great. Have they tried contacting the IMDB user(s) they identified to see if they're the NetFlix user(s) that they think they are?
    7. Re:This is a 'research' paper? by nagora · · Score: 1
      They presented what appears to be sound research before they decided they'd done enough "serious" work and then descended into the depths of tabloid journalism by presenting what appear to be spurious and lurid claims about the individual in their "case study."

      Gee, you mean like a tabloid might make if such details were "accidentally" leaked to them in, say, the run-up to an election? You still don't think they were making a valid point? I would ask if you needed a map drawn, but you already had one and apparently that wasn't enough.

      TWW

      --
      "Encyclopedia" is to "Wikipedia" what "Library" is to "Some people at a bus stop"
  17. Brokeback's decline by spideyct · · Score: 1

    Wait - you mean if I enjoyed a movie with a gay theme, people are going to assume I'm gay?

    Anyone think the IMDB rating of Brokeback Mountain is going to plummet dramatically? (It is 7.8 today)

    And of course, if it does, we will be able to correlate the timing of the sudden drop with the publishing of this slashdot article, allowing us to link the slashdot readership with imdb users. Now we have your Netflix ratings, IMDB ratings, AND slashdot postings all correlated...

  18. Liked Brokeback Mountain == gay liberal cowboy by Rogerborg · · Score: 1

    Now do you see the POWAH inherent in that knowledge?

    --
    If you were blocking sigs, you wouldn't have to read this.
  19. Re:requires another (partial)public revealing to w by TubeSteak · · Score: 2, Insightful

    Interesting, but not earth-shattering or a serious breach of privacy, I would say.

    And who exactly are you to say so?
    Because it isn't a Credit Card # or SSN it isn't serious?

    A) Some people would rather go to jail or commit suicide than admit to something embarrassing they'd rather keep private. Privacy isn't (just) about hiding (illegal) things from the Government.

    B) Demographic information is something you can never take back and can never change.
    At least I can get a new credit card & SSN.
    --
    [Fuck Beta]
    o0t!
  20. Better Description by dlsmith · · Score: 1

    I think the real problem here is being buried in misguided analysis about the meaning of anonymity and associating movie preferences to political affiliation, etc.

    Here's what's really been demonstrated: private information about users of some IMDB accounts who have rated movies on both IMDB and Netflix has been made public, despite Netflix's implicit assertion that releasing anonymous data is "safe." The user himself has not really been compromised -- nobody knows his address, phone number, names of family members, etc. -- but people now know more about the IMDB account than was intentionally published. When the user publicly posted his opinions about 5 (say) favorite movies, he did not expect his private opinions about 100 others, as expressed in Netflix, to also be publicly associated with that account.

    The practical impact isn't clear. If the private information were conveniently published by IMDB, so that nobody had to work very hard to view it, it might sway how likely readers are to trust a certain reviewer. The impact of that change in trust doesn't seem very meaningful, though, and in any case, the private information *isn't* conveniently published. If, under similar circumstances, there were a correlation between private information and an eBay account, then there could be a real financial impact.

    Another concern is that, if other factors have already made it possible to correlate an IMDB account with a real person, then someone can make the jump to associating all this private data with that individual. For example, I might link to my IMDB profile from my blog so all my coworkers can see my public reviews, not realizing that it's now possible for them to determine what movies I've privately watched.

  21. wait until sometime 2008... by Anonymous Coward · · Score: 0

    then we can have this discussion again over more prevalent movies that are controversial. Whoever did this research paper really should have waited til Golden Compass comes out. I use GC as an example because the movie, when it comes out, is going to have a lot of people not pleased with it. Also Prince Caspian comes out in '08 as well. While totally off base, those two movies are going to be what defines the new IMDB in the given year.

    I can say that whatever offbeat movies come out are going to have substantial ratings as well, but Golden Compass is going to have the most impact, just because it's going to be very disliked because of the whole history surrounding it (and yes, there will more than likely be boycotts opening day) and its producers.

    enough said? I think so.

  22. Re:requires another (partial)public revealing to w by Constantine+XVI · · Score: 1

    Re: B, you can usually change any sort of non-biological (and, using extreme measures, some biological ones too) demographic information about yourself. There's nothing that says you can't suddenly turn from liberal to conservative or vice versa, or get married (or turn gay/lesbian), etc.

    OT: is there a way to escape greaterthan/lessthan signs?

    --
    "I think an etch-a-sketch with an ethernet port would beat IE7 in web standards compliance."
  23. what utter nonsense by e-scetic · · Score: 1

    If you think you can determine political affiliation based on how someone rates movies, especially in America, then you're just plain retarded.

    To take an example, a left-winger might rate Michael Moore flicks poorly because one thing about Moore's stuff is he almost always seems to avoid more effective ways of making his points. They agree with the message, disagree with the methodology or style of film. On the other side of things, a libertarian, Goldwater Republican, "conservative", etc., might rate Moore's Sicko highly, because there is undoubtedly something wrong and shameful with health care in America whether you believe in socialized medicine or not.

    But you know what - it wouldn't surprise me if the day came when your movie ratings came back to haunt you. America and other countries do seem to be headed in that direction.

  24. Re:requires another (partial)public revealing to w by Lijemo · · Score: 1

    OT: is there a way to escape greaterthan/lessthan signs?

    ampersand-lt-semicolon results in <

    ampersand-gt-semicolon results in >

    (no spaces or dashes.)

  25. Simple as you said, I do NOT enjoy watching them by SmallFurryCreature · · Score: 2, Interesting

    The comment "favorite movie I never want to see again" is one I got from a review of Grave of the Fireflies that I just happened to totally agree with. Don't read the reviews, just watch it yourself and if you are not into Anime just set that aside for the duration of the movie, then ask yourself again if you can understand that comment.

    It is a powerful movie, like Schindler's List, but not a happy tale. I am not talking a tear jerker movie here, I am talking a "we will all burn in hell for this" movie. Tear jerkers I can take, Christmas in August is one. Sad tale, nicely told but ultimately human. It makes you sad, not sick of humanity.

    Perhaps I am just too emotional about this kinda stuff, one reason might be that I grew up with half-understood tales of "that was where your great-uncle was picked up". When you realize just why your grandmother had 9 brothers and sisters yet you never met any. I got one aunt, my grand-parents had 3 kids, a starvation story like GotF hits a lot closer with a history like that. (The Dutch hunger winter)

    I enjoy all kinds of movies and would NOT have NOT watched these two, but that doesn't mean I want to see them again. There are some people who list Schindler's List as a feel good movie because it 'ends well'. I suppose you might see it that way, I don't.

    I can recognize your statements that the photography is nice and the screen writing is well done, but the plot is interesting? To you it is a plot, to me it is a sickening part of history that I am far too close to.

    Perhaps it is a bit like how Richard Pryor's monologue about the 200th celebration of the US was not exactly all that cheerful.

    Terry Pratchett's Nanny Ogg describes at one point the difference between merry and mirth (or something like that); she describes how she was joyful when her child was being born but she wasn't exactly chuckling at the time. Enjoying a movie and enjoying it are two different things, at least for me. I can't describe it any clearer.

    --

    MMO Quests are like orgasms:

    You may solo them, I prefer them in a group.

  26. Old news: see "New Scientist" 11 November 2006 by DavidHumus · · Score: 1
    Here: http://technology.newscientist.com/article/mg19225776.800-has-netflix-given-away-the-answers-in-its-software-competition.html - sorry, subscription required for full text of the article, so I'll just reveal one piece of nonsense (not to diminish their accomplishment):

    Competition judge Charles Elkan at the University of California, San Diego, agrees that the method could work if enough Netflix users also use IMDB, but he believes it will be possible to detect and disqualify cheats when they submit their computer code. Narayanan and Shmatikov aren't convinced. "There are techniques to obscure how data is introduced within the thousands of lines of code you'd submit," Shmatikov says.
    Yeah, someone's going to hand over a million dollars after you tell them "here's the algorithm, just ignore that large chunk of hex over there."
  27. Not really broken, is it? by Anonymous Coward · · Score: 0

    From what I'm getting out of this, all they've really shown is they can guess the identities of a few people by correlating movie ratings in content and time between imdb and netflix? That doesn't sound like much of an ID to me.

  28. Re: favorite movies I never want to see again by Anonymous Coward · · Score: 0

    Finally someone else that gets it!! I use almost exactly the same words to describe Schindler's List. It's a really good movie, and I think everyone should see it once, but I never want to see it again. Ever.

    And actually I'm starting to think I should avoid re-watching any movie that has ever caused such intense emotions, because I've discovered that you just can't duplicate the original experience in repeated viewings. When I was a teenager, the end of Last of the Mohicans was incredibly moving, but more recent watchings leave me feeling mostly empty -- partly because I'm now closer to Cora's age, so I'm not as attached to Alice as I was back in 1992, but also because I know what's going to happen. Ditto for Fearless. I left the movie theater a changed person, but when I saw it again on DVD a few years ago I got nothing. Meanwhile, this year's Stardust gave me that same changed person feeling, but I'm afraid to watch it again so I won't ruin the memory of the original experience.

    On the "oh shit" side of the spectrum, the Usual Suspects and Fight Club aren't quite as good the second time around, but they're still very watchable and very good movies. However, the Sixth Sense is only good for two viewings (* the second is only necessary if you weren't paying close attention the first time around), and I have absolutely no desire to ever see it or any other M. Night Shyamalan movie ever again.

  29. Idiots! by Impy+the+Impiuos+Imp · · Score: 1

    > The researchers used this method to find how individuals on the IMDb privately rated films on Netflix,
    > in the process possibly working out their political affiliation, sexual preferences and a number of
    > other personal details"

    How about working out the other ratings the people made so they can include it in a trivial predictive app and submit it for the million dollar prize?

    --
    (-1: Post disagrees with my already-settled worldview) is not a valid mod option.
  30. Re: dude, you don't know what you're missing! by Anonymous Coward · · Score: 0

    Bruce Willis as Hudson Hawk: "Bunny, ball ball." (mimicking Sandra Bernhard's line earlier in the movie)
    [Bunny the dog looks up and squeals excitedly]: "rrrrrrr?"
    [Hawk launches the rocket propelled tennis ball at the dog]
    [Bunny flies out the window of the castle]

    That scene alone is worth the price of admission, dammit!

    (Disclaimer: I own the DVD, and I think It's easily one of the 10 best comedies ever. Personally I'd give the movie an 8/10, but for other people I'd give it about a 7/10.)

  31. Wrong headline by imsabbel · · Score: 1

    It should really read "non-anonymous ratings of movies in IMDB gives away your movie preferences".
    Because really, that's all the meat their "research" has.

    And in fact, if you rate movies in IMDB, and your handle can be tracked, who needs the netflix data?

    This whole thing is a non-issue, and the paper is so content-slim i doubt it will be accepted anywhere (well, maybe "new scientist" will print it...)

    --
    HI O WISE PRINCE. WHT TOOK U SO DAM LONG?
  32. Meaningless Fluff by sexconker · · Score: 1

    Nothing was broken.
    All they did was notice that you can match up review scores and dates pretty well with stuff on IMDB.

    This means nothing - all you get is a possible link to a username.
    A username does not equal a person.

    Most usernames/accounts on IMDB/Netflix correspond to several people.

    Even if you could directly say "username x is a single person and this is their complete ratings history", username x is still anonymous. It's not like anyone hacked into netflix and knows what ratings I, as a person, gave out.

    More "research" that means nothing, more sensationalist crap.

  33. Welcome to the new world. by Anonymous Coward · · Score: 0

    There are statistical associations everywhere. Just because you can't see them doesn't mean a computer can't. Anonymity is impossible -- identity will always leak through in some form.

  34. Re:requires another (partial)public revealing to w by IpalindromeI · · Score: 1

    OT: is there a way to escape greaterthan/lessthan signs?

    You have to use the HTML escape codes, which are &lt; and &gt; .
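
    If you are generating the comment text from a script rather than typing it, the escaping can be automated. The Python sketch below uses the standard library's html.escape; the wrapper name escape_comment is made up for this example, and the sketch says nothing about which tags Slashdot itself allows.

```python
import html


def escape_comment(text):
    """Replace &, < and > with HTML entities so they display literally.

    quote=False leaves quotation marks untouched, since they only need
    escaping inside attribute values.
    """
    return html.escape(text, quote=False)


if __name__ == "__main__":
    print(escape_comment("if (a < b && b > c) return;"))
```

    The ampersand is escaped as well, which is what lets you write the escape codes themselves in a comment without them being swallowed.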

    --
    Promoting critical thinking since 1994.
  35. Endless media hyperbole by SlappyBastard · · Score: 1

    To the issue of your anonymity being shattered, puh-lease. If you post information in a public forum such as IMDB and it can be correlated to information from MySpace, it wasn't a giant leap into your privacy. It was just gathering already public information. What's the big deal?

    You choose to post that stuff where it could be publicly viewed. The fact that it lines up with data from Netflix only proves that NF did in fact provide a quality dataset. Big deal.

    --
    I scream. You scream. I assume that means we're both acquainted with the problem. We proceed.
  36. IMDB Goofs by Anonymous Coward · · Score: 1, Funny

    I'm sorry but the continuity mistakes where Peter North's socks are on before the money shot then clearly off afterwards and when Mercedes says "Fuck me deeper!" before she is fucked ruined the film for me.

  37. Re:requires another (partial)public revealing to w by v0x0j · · Score: 1

    Actually, that's hardly any breach of privacy. Your ratings might be made public only if you made them public yourself on IMDB. Theoretically, your Netflix profile will include more movies, but then it would not be a close match, and identification will not occur. Even if it is the closest match, it's a game of probabilities: it gives the person grounds for plausible deniability. In any case, that blog write-up is complete bullshit: it assumes that people rate the same on IMDB and Netflix. That's not the case, not only because people expect the latter to be private, but because 1) the scales are different, and people interpret different scales differently; 2) if you don't rate at the same time on both sites, your experience changes, and you rate the same movie differently; 3) I use IMDB to remember which movies I saw, and use Netflix to recommend me movies, which means my reasons for rating differ, which makes the ratings differ. Another assumption mentioned on that blog is the date match: same movies rated on the same day. If I represent the average person, I don't usually rate movies on both sites at the same moment; I do them in batches. My take on that "research": it's an inconclusive load of crap.

  38. OT: if you like Hudson Hawk, you'll also like.... by mbourgon · · Score: 1

    The Fifth Element. I think it's because it's a mixed-genre movie. Not action, not comedy, somewhere in between.
    Both star Bruce Willis, interestingly enough.

    "Hey mister, are you gonna die?"
    "Do you know what it's like to be called Chlamydia for a year?"
    "You are a slender reed compared to that guard"

    Both HH and 5E are in my top 10 movies. And the commentary on Hudson Hawk is great - they talk about how they hired the narrator from Rocky & Bullwinkle, so that you'd know the tone they were taking. Fun stuff.

    --
    "Sometimes a woman is a kind of religion, she can save your soul & set you free from all your sins" - Bad Examples
  39. Re:requires another (partial)public revealing to w by darthflo · · Score: 1

    &amp; results in &
    &gt; => >, &lt; => <
    :)

  40. Re:FUCKING RETARD by Anonymous Coward · · Score: 0

    crymore slashdot faggot

  41. RIAA/MPAA by Anonymous Coward · · Score: 0

    Wow, it is surprising that the RIAA/MPAA has not yet offered them a job. With several lawyers and an infallible rhetoric, I am sure they would find a way to explain to a judge that if you have rated a movie on Netflix without owning the DVD or having paper proof that you bought a theater ticket for the movie, then you must have downloaded it illegally... Isn't it obvious?

  42. Re:requires another (partial)public revealing to w by palantir0 · · Score: 1

    Now tell me, how many people would rate on both systems and use the same rating? This implies you remember your rating. Maybe I don't know enough about the system but the system is highly suspect unless the user has rated a large quantity of movies that are in both systems and includes many far outside of the mainstream movies. Honestly, if you are doing things publicly, forget about privacy. More than this can be used to identify you. Cheers

  43. More woe for HMRC then by AmiMoJo · · Score: 2, Interesting

    None of the mainstream media picked up on it, but I remember thinking this sort of thing might be possible with the data lost by HMRC too. I bet Tesco would love to get their hands on it for planning where to put new stores and what to stock etc. Combined with their Clubcard database, of course.

    --
    const int one = 65536; (Silvermoon, Texture.cs)
    SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC
  44. Re:requires another (partial)public revealing to w by CopaceticOpus · · Score: 1

    For most people, it wouldn't be a serious breach of privacy. However, you can imagine a scenario where it would be.

    Imagine a pastor who uses a recognizable username for many sites, including both IMDB and his church's web forums. He uses Netflix as a way to feed his secret love of movies with sexual content which his church would publicly denounce. Now these researchers could link his username to ratings for all these movies, and post the information online.

    All it would take then is for a curious church member to google the pastor's username, and his previously secret habit would now be public knowledge. He could lose his reputation and his career.

  45. Re:requires another (partial)public revealing to w by mosch · · Score: 1

    I think that hypocritical religious leaders deserve to be exposed because of their chosen place in society, so this sounds fan-fucking-tastic to me.

    If you have a hypocritical pastor, I'll buy the domain and host the site.

  46. Re:requires another (partial)public revealing to w by CopaceticOpus · · Score: 1

    Alright, in that case you like the outcome. What about an officer in a "Don't ask, don't tell" military who rents gay movies? What about an employee who rents an informational DVD about how to change careers, and posts a review which they believed was anonymous, in which they reveal their plans to leave their job?

    The issue isn't really about movie reviews. It's about the expectation of privacy at any website. If a site is offering privacy and anonymity, they need to be responsible with whatever information is entrusted to them, no matter how unimportant that information may seem. We're discovering that even anonymizing data before releasing it isn't adequate.