Slashdot Mirror


Build a Better Netflix, Win a Million Dollars?

An anonymous reader writes "In a quest to better movie recommendations, Netflix is opening their database (nytimes, registration and first child required) to users to try to craft a better recommendation technology. The problem is not easy. Says one researcher: 'You're competing with 15 years of really smart people banging away at the problem.'" Recommender systems are really an interesting problem, and that is likely very interesting data to play with.

19 of 197 comments

  1. So, we can then conclude by jimstapleton · · Score: 3, Funny
    Says one researcher: "You're competing with 15 years of really smart people banging away at the problem."


    So, the professionals have been working at it for a long time. Is it safe to assume some teenage or early-college hacker will find success within two weeks?
    --
    34486853790
    Connection too slow for X forwarding? Try "ssh -CX user@host"
  2. Simple by Anonymous Coward · · Score: 5, Funny

    if (user.getGender() == Person.MALE)
        recommendation = MovieGenre.PORN;
    else
        recommendation = MovieGenre.CHICKFLICK;

    And of course, Slashdot must have sensed my post, as my image word is "pervert".

    1. Re:Simple by kelzer · · Score: 4, Funny

      Old Version:

      if (user.getGender() == Person.MALE)
          recommendation = MovieGenre.PORN;
      else
          recommendation = MovieGenre.CHICKFLICK;

      New Version, sure to win the million bucks:

      if (user.getGender() == Person.MALE && user.getOrientation() == Person.STRAIGHT)
          recommendation = MovieGenre.PORN;
      else
          recommendation = MovieGenre.CHICKFLICK;

      --

      ---------------------------------------------
      SERENITY NOW!!!!!!!!!!!!!!!!
  3. I had a thought like this a while back... by AceCaseOR · · Score: 4, Interesting

    ..except, instead of making it open to the community (which is not a bad idea, I must say) I thought of having Google do it. This is, perhaps, IMHO, a much better idea. Now, what we really need is a Movie Genome Project, much like the Music Genome Project that led to Pandora.

    --
    Zagreus sits inside your head, Zagreus lives among the dead, Zagreus sees you in your bed and eats you in your sleep.
  4. go see porn sites by LiquidCoooled · · Score: 3, Interesting

    They have decent tech for building similar/recommended alternative pages.
    Especially the newer blogish type pages where there's a gallery and a small selection underneath.

    Not that I would know of course.

    --
    liqbase :: faster than paper
  5. Suggestion by 99BottlesOfBeerInMyF · · Score: 5, Insightful

    As a Netflix user I have one suggestion for their recommendation system that can make it much better. Make it aware of the connection between series. That is to say, if you rent season 1 of something, suggest season 2, not season 4 (even if season 4 has better review ratings). If I mark season 1 of something as "not interested" instead of giving it a user rating, don't suggest every other season of that same show at the top of my recommendations. I mean how many times do I have to tell you I don't want to see any season of "Friends" ever, even if you pay me?
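
    A minimal sketch of the two rules suggested above, under stated assumptions: the class, the method, and the three lookup tables are hypothetical names made up for illustration, not anything Netflix is known to use. It suggests only the season right after the highest one rented, and a single "not interested" mark suppresses the whole show.

```java
import java.util.Map;
import java.util.OptionalInt;
import java.util.Set;

// Hypothetical helper; every name here is an illustration, not a real Netflix API.
final class SeriesAwareSuggester {
    /**
     * Returns the season to suggest for a show, or empty if nothing should be suggested.
     * highestRented: show -> highest season the user has rented (absent means none).
     * seasonCount:   show -> number of seasons that exist (assumed known).
     * notInterested: shows the user marked "not interested" for any season.
     */
    static OptionalInt seasonToSuggest(String show,
                                       Map<String, Integer> highestRented,
                                       Map<String, Integer> seasonCount,
                                       Set<String> notInterested) {
        if (notInterested.contains(show)) {
            return OptionalInt.empty();          // one mark blocks every season
        }
        int next = highestRented.getOrDefault(show, 0) + 1;   // season 2 after season 1
        int total = seasonCount.getOrDefault(show, 0);
        return next <= total ? OptionalInt.of(next) : OptionalInt.empty();
    }
}
```

    The review rating of a later season never enters the decision, which is the point of the suggestion: order within a series wins over popularity.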

    1. Re:Suggestion by Xentor · · Score: 3, Insightful

      Hmm, I see your point.

      I was about to mention that I mark things as Not Interested when I own them, to avoid being recommended the rest (usually because I prefer to buy series I like, and rent actual movies), but then I realized that fits into what you said perfectly.

      Point conceded.

      --
      "The amount of intelligence on this planet is a constant. The population is growing." -Cole's Axiom
    2. Re:Suggestion by truthsearch · · Score: 4, Funny

      Point conceded.

      For the record, this is a turning point in slashdot history. I'll forever remember where I was when I first saw those words in a slashdot comment. (Which of course is at work, sitting through a boring meeting.)

  6. Privacy issues? by Vultan · · Score: 3, Interesting

    How will they handle privacy issues? Don't the same issues appear here that appeared with the AOL data this summer? With enough ratings you can narrow down to a specific person, and then find out about all the pr0n that this person has been getting as well.

    1. Re:Privacy issues? by Cruise_WD · · Score: 3, Informative

      From http://www.netflixprize.com/ :

      To prevent certain inferences being drawn about the Netflix customer base, some of the rating data for some customers in the training and qualifying sets have been deliberately perturbed in one or more of the following ways: deleting ratings; inserting alternative ratings and dates; and modifying rating dates.

      Plus all the usual replacing of IDs and such you'd expect. Looks like they're trying to avoid a repeat of the AOL debacle at least.

      --
      [ cruise / casual-tempest.net / xenogamous.com / transference.org / quantam sufficit ]
  7. RSSTimes by eldavojohn · · Score: 4, Insightful
    In a quest to better movie recommendations, Netflix is opening their database (nytimes, registration and first child required)...
    Not quite, you can find it here (or the minimalist version for anyone sick of ads).

    Why is it that the Slashdot editors are just too damn lazy to look up the RSS feed links to these pages?

    The problem is not easy. Says one researcher: "You're competing with 15 years of really smart people banging away at the problem."
    While this may be true, I wouldn't let it deter you. Collaborative filtering is a field that is far from dead. The interesting thing about collaborative filtering is that on the surface it seems pretty straightforward, but once you dig into the mechanics of it, there is actually a lot of playing you can do. Ironically, the way you display the data to the end user is often what determines how good a job you did.

    Allow me to take a naïve approach to this topic and say we generate a movie index for each person. I would have A Clockwork Orange and Koyaanisqatsi at 5 while The Ring 2 would be at the very low end. My friend might have similar movies. If he has A Clockwork Orange up there, you might be able to compute a Euclidean distance between us. However, this approach falls apart because no one has seen Koyaanisqatsi, and the 20 movies I've ranked highly are hard to find.
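
    The naïve approach above can be sketched roughly as follows. This is an illustration only: the class and method names are made up, ratings are assumed to be whole stars keyed by title, and the distance is taken over just the movies both people have rated.

```java
import java.util.Map;

// Hypothetical sketch of the naive "movie index" comparison.
final class TasteDistance {
    /**
     * Euclidean distance over the movies both users have rated.
     * Returns NaN when the two users share no rated movie, which is
     * the sparse-overlap case where this approach falls apart.
     */
    static double euclidean(Map<String, Integer> a, Map<String, Integer> b) {
        double sumSquares = 0.0;
        int shared = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            Integer other = b.get(e.getKey());
            if (other == null) {
                continue;                        // only one of us rated this movie
            }
            double diff = e.getValue() - other;
            sumSquares += diff * diff;
            shared++;
        }
        return shared == 0 ? Double.NaN : Math.sqrt(sumSquares);
    }
}
```

    Two users whose rated movies never overlap get no distance at all, so a collection full of hard-to-find titles leaves most pairs incomparable.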

    You don't have to stop there, however. You could also database the movies I marked as "uninterested" or the movies that were presented to me but I didn't vote on. Like if I had seen the offer to mark J-Lo's latest flop but didn't, wouldn't that tell you something about me?

    So these caveats present themselves all along the way and, at the end of the computation, you have many different strategies for this data. For example, while you might not be able to link my friend and me through movies, how far apart are we on a node network? What I mean is, if you plotted every user in their own dimension depending on the movies they ranked and attempted to compute as good a distance as possible between all users, how far would I be away from my friend by hopping on these nodes? There's a lot of information to be gleaned in this sort of friend-of-a-friend collaborative approach.
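
    One way to read the node-hopping idea is as a shortest-path search over a graph of users. The sketch below assumes that graph already exists, with an edge between two users whenever their ratings are "close enough" by some measure; how to draw those edges is left open, and all names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: hop count between two users in a similarity graph.
final class TasteGraph {
    /** Breadth-first search; returns the number of hops, or -1 if unreachable. */
    static int hops(Map<String, Set<String>> neighbors, String from, String to) {
        Map<String, Integer> dist = new HashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        dist.put(from, 0);
        queue.add(from);
        while (!queue.isEmpty()) {
            String user = queue.remove();        // oldest entry first, so nearest first
            if (user.equals(to)) {
                return dist.get(user);
            }
            for (String next : neighbors.getOrDefault(user, Collections.emptySet())) {
                if (!dist.containsKey(next)) {   // visit each user once
                    dist.put(next, dist.get(user) + 1);
                    queue.add(next);
                }
            }
        }
        return -1;
    }
}
```

    Two people with no movie in common can still end up two hops apart if a third user overlaps with both of them.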

    Now you need to present this information to the user. Do you just up and recommend him a movie? Do you take Amazon's approach and say "Other people did this -- so should you."? Or do you give them some sort of three dimensional flash plotting of you versus the people nearest to you? Do you allow the user to contact those closest to them? Those farthest away?

    My point is that while 15 years of research has been done, it doesn't mean there's been 15 years of testing and implementation, which, when it comes to creating products, is where most of the importance lies.
    --
    My work here is dung.
  8. Copy the Music Genome Project by Zaphod-AVA · · Score: 5, Interesting

    The problem with recommendation systems is that they use too little information to categorize their subject.

    What they need to do is copy the methods of the Music Genome Project (www.pandora.com), and list a larger set of attributes for the films. This way it can recommend films by checking many more characteristics, such as director, tone, writer, or subject.
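
    As a rough illustration of attribute-based matching, the sketch below compares two films by the overlap of their attribute tags using Jaccard similarity. That measure is swapped in here for simplicity and is not claimed to be how the Music Genome Project works; the tag format and all names are assumptions.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: compare films by shared attribute tags.
final class AttributeSimilarity {
    /** Jaccard similarity: shared tags divided by all distinct tags, from 0 to 1. */
    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 0.0;                          // nothing known about either film
        }
        Set<String> shared = new HashSet<>(a);
        shared.retainAll(b);                     // keep only tags present in both
        int unionSize = a.size() + b.size() - shared.size();
        return (double) shared.size() / unionSize;
    }
}
```

    Tags such as "director:...", "tone:...", or "subject:..." would have to be compiled by hand or from another data source, which is the expensive part.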

    1. Re:Copy the Music Genome Project by vontrotsky · · Score: 4, Informative

      The problem with recommendation systems is that they use too little information to categorize their subject.

      What they need to do is copy the methods of the Music Genome Project (www.pandora.com), and list a larger set of attributes for the films. This way it can recommend films by checking many more characteristics, such as director, tone, writer, or subject.


      In this contest, you run your own code and submit the results to Netflix to be scored. This means that you can use any other data (e.g. a Movie Genome Project) you can compile to enhance your rankings. Netflix apparently specifically designed the contest to allow this.

  9. Re:database? by thatnerdguy · · Score: 4, Informative
    --
    I saw the Sign, and it opened up my eyes
  10. Fix the problems with what they send me first by Jimmy+King · · Score: 5, Interesting

    I wish they'd fix the problems in the logic determining what they actually send me from my queue before fixing problems with what they recommend to me. If I've got season 1 of a show in my queue prior to season 2, don't start sending me season 2 because some disc of season 1 is unavailable (which has happened to me multiple times with both netflix and blockbuster online); send me something else completely. They've got the tech to keep one season of a tv show in order, so it can't possibly be that difficult to extend that to keeping multiple seasons of a show in order.

    On top of that, don't show me that it's available in my queue but send me something else instead. While I haven't asked netflix about this, I have asked blockbuster online, and I imagine they are both doing the same thing. The disc is "available" just not at the warehouse used to ship to me personally. Instead of basing one piece of information off of total stock and one off of local stock, base them both on the stock at the warehouse shipping to me.
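
    Both requests above fit in a few lines, sketched here under stated assumptions: queue entries are plain strings of the form "Show|season" (or just a title for a movie), availability is checked only against the warehouse that ships to the member, and every name is hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch of queue selection that keeps a show's seasons in order.
final class QueuePicker {
    /**
     * Walks the queue in order and returns the first entry that is in stock
     * locally and is not preceded by a waiting entry of the same show.
     */
    static Optional<String> nextToShip(List<String> queue, Set<String> localStock) {
        Set<String> stalledShows = new HashSet<>();
        for (String entry : queue) {
            String show = entry.split("\\|")[0];     // text before the season marker
            if (localStock.contains(entry) && !stalledShows.contains(show)) {
                return Optional.of(entry);
            }
            stalledShows.add(show);                  // later seasons must now wait
        }
        return Optional.empty();
    }
}
```

    Because the same localStock set drives both the decision and whatever the queue page displays, the "available but not shipped" mismatch would go away.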

  11. Difficulties on the data-gathering end by jfengel · · Score: 4, Interesting

    Any marketer will tell you that what people tell you they want and what people actually want are very different things. Even if people answer honestly, the data you gather is often unreliable: people simply don't have as good a handle on what they want as they think they do.

    Not that marketers have a better handle, but simply that people will swear up and down that they would buy a peanut-butter-filled hot dog, that they loved the one they tried, and then don't actually buy any.

    Don't believe me? Go see Snakes on a Plane. Nobody else did. (Sure, $33 million seems like a lot, but that's chump change for a major studio release these days.)

    The best improvements will come from insights gained between the lines. You may have rated The English Patient eleventeen stars, but if your next seven rentals were all episodes of The Girls Next Door, which you only rated 3 stars, it certainly looks like you want more Hugh Hefner and less Ralph Fiennes.

    The best data is the data that the subject doesn't realize he's giving you. Once you start imposing conscious choice on the ratings, you get only what they say they like, not what they really like.
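
    A toy version of weighting behavior over stated ratings might look like the sketch below. The blending formula and the weight are arbitrary assumptions chosen for illustration, not a known or tested method, and the names are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: blend what users say (stars) with what they do (rentals).
final class ImplicitSignal {
    /**
     * Scores each rated title from 0 to 1. behaviorWeight (0 to 1) is how much
     * the share of recent rentals counts relative to the stated 1-5 star rating.
     */
    static Map<String, Double> scores(Map<String, Integer> stars,
                                      List<String> recentRentals,
                                      double behaviorWeight) {
        Map<String, Integer> counts = new HashMap<>();
        for (String title : recentRentals) {
            counts.merge(title, 1, Integer::sum);    // rentals per title
        }
        Map<String, Double> out = new HashMap<>();
        for (Map.Entry<String, Integer> e : stars.entrySet()) {
            double stated = (e.getValue() - 1) / 4.0;            // map 1..5 to 0..1
            double share = recentRentals.isEmpty() ? 0.0
                : counts.getOrDefault(e.getKey(), 0) / (double) recentRentals.size();
            out.put(e.getKey(), (1 - behaviorWeight) * stated + behaviorWeight * share);
        }
        return out;
    }
}
```

    With a weight of 0.75, a 3-star show that makes up all seven recent rentals outscores a 5-star film that was never rented again, which matches the example above.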

  12. Intractable problem - liking the movie, not genre by OakDragon · · Score: 3, Interesting

    I stopped rating movies after I found that I got recommended a lot of crap. Say I rent a slasher movie that, for its genre, is artfully done. I rate it high. Now I have recommendations for a bunch of worthless, straight-to-video stuff that I really don't want to see.

    This is the real nut to crack, IMO. How do you come up with an algorithm that rates 'quality,' an elusive concept that means different things to different people?

    Not to mention, I'm fickle.

  13. 5 star rating is flawed by BMonger · · Score: 3, Insightful

    I personally weigh movies on a number of different factors. I might give 3 stars to a movie because it has 4 of my favorite actors in it even if I didn't care for the plot. I might give 3 stars to a different movie with horrible acting but interesting camera angles (From Dusk Til Dawn 2). I tend to average out my ratings dependent on many things a movie has to offer.

    The problem is that that is my rating system. It works for me. But it does little good to anybody else because they are rating based purely on something else.

    I think they need to implement the ability to rate more aspects of the movie. I'm sure some people out there rate the movie poorly if their disc is scratched or the transfer quality is poor even. A simple 1 to 5 system doesn't cut it. People rate things that aren't "Was the (romance) plot good?", "Do you like this director?", "Do you like these actors?". People rate things that aren't on the box.

  14. Here's a problem to solve with much larger impact by Yogs · · Score: 3, Interesting

    Disclaimer: I subscribe to the same sort of service, except through blockbuster... maybe Netflix does have this feature. My wife and I share a queue... I imagine many, many of these queues are shared. We have very, very different tastes in movies. Instead of getting recommendations that suit us both (which is next to impossible), the recommendations just get very, very confused. If I could just keep my and her recommendations from tangling, we would both have an easier time.