Close but no Cigar for Netflix Recommender System
Ponca City, We Love You writes "In October 2006, Netflix, the online movie rental service, announced that it would award $1 million to the first team to improve the accuracy of Netflix's movie recommendations by 10% based on personal preferences. Each contestant was given a set of data from which three million predictions were made about how certain users rated certain movies and Netflix compared that list with the actual ratings and generated a score for each team. More than 27,000 contestants from 161 countries submitted their entries and some got close, but not close enough. Today Netflix announced that it is awarding an annual progress prize of $50,000 to a group of researchers at AT&T Labs, who improved the current recommendation system by 8.43 percent but the $1 million grand prize is still up for grabs and a $50,000 progress prize will be awarded every year until the 10 percent goal is met. As part of the rules of the competition, the team was required to disclose their solution publicly. (pdf)"
Is the new margin of improvement for victory then?
It's hard to believe that's how Micronians are made. Why don't we see it right now by having you both kiss one another?
Any chance of not tagging this story with this meme?
If the people who created Netflix's system are still with the company, I'd say they deserve some retroactive recognition (and bonuses). That's pretty damn good optimization if it's that hard to improve upon, and there seem to have been some really sophisticated people trying to beat them.
What I'm listening to now on Pandora...
I guess my "flip a coin, take a chance" method wasn't worth the $million. Back to the drawing board...
Will Netflix incorporate the near-winners' ideas into their current system? If so, won't future teams be aiming at a moving (improving) target? If not, won't current Netflix customers know that their recommendations could be better if Netflix just incorporated a now publicly-disclosed algorithm into their servers?
"We can categorically state we have not released man-eating badgers into the area." - UK military spokesman, July 2007
At the risk of getting marked redundant: I totally agree, and I don't work for Netflix
This is a great contest, considering they have to publicly release the solution.
Although what is AT&T doing working on this problem?
-nick
The prize was clearly a million dollars, not a cigar! I guess the editors don't even read the summary.
Tsunami -- You can't bring a good wave down!
IMDB http://www.imdb.com/ has a recommendation section. Pick a film you like and it gives you a recommendation from their database. From experience this works perfectly for me... Plus you can read extensive reviews of films if you like.
Seems Netflix is spending a lot of money on something that they seem to have working OK and that can be found on other websites anyway.
Accuracy in this contest is defined as the user rating highly the movies that the system would suggest to them. The whole point of it is trust. If you're throwing out lots of suggestions that the user doesn't like just to try to find one they might like, you're destroying their trust in the system. They won't bother even reading the recommendations if they know they're filled with garbage.
The most noteworthy aspect of the winning entry is that it works by combining 107 different prediction strategies.
They state that you can get pretty far by blending the 3-4 best strategies, but of course doing so would not have netted them the progress prize.
It is kind of a sad realization that there actually is no better method. Your best bet is to use brute force and attempt to find some weighting methodology that combines known methods. By the way, this is a well-known issue in protein structure prediction competitions: for many years now, so-called meta-servers (which make predictions merely by combining other predictions) have won all the time. The joke is that we now need meta-meta-servers to combine the results of the combiners.
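To make the "weighting methodology" idea concrete, here is a minimal sketch of blending just two predictors with a single least-squares weight. It is only an illustration: the ratings, the two model outputs, and the helper names `best_blend_weight` and `sse` are made up here, and this is not the method from the winning paper, which reportedly combined 107 predictors.

```python
def best_blend_weight(p1, p2, actual):
    """Least-squares weight w for the blend w*p1 + (1-w)*p2."""
    # Setting the derivative of sum((w*x + (1-w)*y - a)^2) to zero gives:
    num = sum((a - y) * (x - y) for x, y, a in zip(p1, p2, actual))
    den = sum((x - y) ** 2 for x, y in zip(p1, p2))
    if den == 0:
        return 0.5  # predictors are identical; any weight gives the same blend
    return num / den

def sse(pred, actual):
    """Sum of squared errors between predictions and actual ratings."""
    return sum((p - a) ** 2 for p, a in zip(pred, actual))

# Hypothetical ratings and model outputs, for illustration only.
actual = [4, 3, 5, 2, 1]
model_a = [3.5, 3.5, 4.0, 2.5, 2.0]   # tends to pull toward the middle
model_b = [4.5, 2.0, 5.0, 1.0, 1.5]   # tends to exaggerate

w = best_blend_weight(model_a, model_b, actual)
blend = [w * x + (1 - w) * y for x, y in zip(model_a, model_b)]
print(w, sse(model_a, actual), sse(model_b, actual), sse(blend, actual))
```

Because w = 0 and w = 1 are both candidate weights, the fitted blend can never have a larger squared error on the fitting data than either model alone. Whether that gain holds up on unseen ratings is a separate question.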
Also a clarification on the progress prize: to get it you need to have at least 1% improvement over the previous result. Considering that there is only 1.57% to go there is room for only one more progress prize until it hits the Grand Prize (10% improvement over the original results).
You do know netflix already HAS a recommendation engine, right? Supposedly a very good one. The whole point of the contest was to significantly improve that engine.
if ($director eq "Michael Bay") {
    print "Not recommended";
}
That should improve the system by at least 20%
I'm skeptical about these sorts of prizes. The X prize, Top Coder, Clay Institute Millennium Prizes-- if those were the only reasons to do something, few would. Seems pretty risky to do a lot of work for what amounts to a lottery ticket. So, who got 2nd place, and how well did they do? 1 group wins a paltry $50K and a little publicity and recognition, maybe even an endorsement or two, and the other 27000 plus get what? Nothing much. It's cool and fun to work on such problems, but people have bills to pay. Nice to have the sort of job where one gets paid to work on stuff like that. Any contestants reading this? Maybe you could enlighten the rest of us on why you bothered competing?
As for Netflix, I wonder how much such an improvement is worth to them? More than $50K, I imagine. Pardon my cynicism, but seems like contests like this are a way to get a lot of ideas and work for very little money.
Intellectual Property is a monopolistic, selfish, and defective concept. It is "tyranny over the mind of man"
That's not how the contest works. It's based on the RMSE that the original netflix algorithm got at the beginning of the contest. This is fixed and does not change. See the contest site for more details.
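For anyone unfamiliar with the scoring, here is a rough sketch of how an RMSE-based improvement percentage against a fixed baseline could be computed. The sample ratings and baseline values are invented for illustration, and `improvement_pct` is a hypothetical helper name, not something taken from the contest site.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating lists."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))

def improvement_pct(baseline_rmse, new_rmse):
    """Percentage reduction in RMSE relative to a fixed baseline."""
    return 100.0 * (baseline_rmse - new_rmse) / baseline_rmse

# Illustrative numbers only (not contest data).
actual = [4, 3, 5, 2, 1]
predicted = [3.5, 3.0, 4.0, 2.5, 2.0]
score = rmse(predicted, actual)
print(score, improvement_pct(1.0, 0.9))
```

Since the baseline is fixed, a later entry is measured against the same starting number no matter how many progress prizes have been awarded in between.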
I can say I played with it because I found it fun. I'm a coder, it's what my brain is interested in. There have been contests for ages simply because human beings like to compete, even if second place gets nothing.
And FYI, netflix doesn't get any "ideas" from anyone but the winner. You only have to submit your code if you win.
They should give AT&T $843,000.
Dear Slashdot: next time you want to mess with the site, add a rich-text editor for comments.
some
if age < 18 and male then hard porn and south park
if age > 18 and male and lives at home then any sci-fi movie (plus points if it's a sequel)
if age > 18 and female then any movie with Princess in the title
100% match up !
----------------------------------- My Other Sig Is Hilarious -----------------------------------
will be able to factor out movies people think that they *should* rate highly from those that they actually like. You all know what I mean, your movie snob friend tells you to watch the latest film festival gay cowboys eating pudding movie and you feel like you have to give it at least 3 stars because otherwise netflix would know you just want to watch stuff get blown up.
You know who you are netflix fakies!
that's what i get for listening to, uh, slashdot: http://slashdot.org/article.pl?sid=06/10/09/1344235
just because I don't care doesn't mean I don't understand!
From my experience with the Netflix Prize, and ML/stat.learning techniques in general, that last 1.57% is going to be the hardest. There is a diminishing returns effect going on here, i.e. the effort required for each successive 1% increase gets progressively larger.
An old-timer with old-timey ideas.
Netflix's system is already 90.3% accurate!
Here's the problem I had... I was picking comedy movies - any and all. Netflix recommended Horror and Action flicks. That is 0% correct. Let's see, a 10% improvement on 0% accuracy is... 0! Sounds right. Netflix NEVER even came close to trying to get it right. It never suggested anything in the genre I was getting my movies from. AND to make it worse, when I would check the "not interested" box by the Horror/Action flicks, they would still show up as recommendations. Come on Netflix - at least don't show me the ones I told you not to show me.
I, for one, have never really found Netflix's recommendations all that useful. It sometimes recommends movies that I've already Netflixed. But to be fair I think they fixed that. It has recommended movies that I already have in my queue, but most of the time it will be movies that I have no interest in at all. Then there was the time I turned in a 'G' rated movie, Disney I think, and it recommended either Saw I or Saw II.
Not really sure where it got that one from. Nothing I had turned in that week had anything remotely to do with the Saw movies. I've even looked at my past rentals and can't find anything I think would tie me to those movies. I've been out of my blood and gore phase for years.
Maybe it just thought my taste in movies sucked. That has to be it.
Supporting World Peace Through Nuclear Pacification
They're looking in the wrong places, and trying to squeeze blood from a stone.
User ratings are a deeply flawed way of getting this information. They're one-dimensional and prone to serious randomizations based on the user's mood; a 5 today might have been a 3 tomorrow. Since most of the movies that a user rates will be between 3 and 5 (it's just not that hard to spot a movie you're going to hate, so why would you rent it in the first place?) that makes the data... well, not valueless, but containing a lot less truth than you'd like.
Netflix has a huge amount of additional data that they're not using:
* What did the user look at?
* What did the user rent?
* How did they order their queue?
* How long did the user keep the film?
* When did the user add additional films that can be considered "related"?
* What did the user mark "not interested"? (Not included in the data set, IIRC.)
If they want better recommendations, it's time to stop looking for the quarter under the lamppost and broaden their horizons. You probably can't anonymize all that data well enough to let the world compete for it, but if their internal developers with all that data can't beat outsiders with less, they need to hire some new researchers.
That's what you get for lack of reading comprehension, including familiarizing yourself with what the contest actually is. The contest wasn't to beat Netflix's algorithm. The contest was to beat it by 10%. Nothing in the summary of the original article was incorrect.
I think you have to consider that netflix is working off a very large user base with a very large list of titles. In this sense, computation time is going to go way up the more you keep adding all these factors. I'm sure they've had projects internal to netflix to use more data, but found that it just didn't pay off with the increased computation time. It's much better to get good recommendations onto the page instantly than make the user wait 2 seconds for great recommendations. The same is possibly true for doing recommendations ahead of time and having to spend the extra compute time and storage space.
Plus, I think there's always going to be some level of "noise" in the system. People rating things incorrectly (clicked on the wrong number), people changing their minds, etc. And then there's the cases where it makes no logical sense that if I liked movie A, B and C that I should hate movie D. The question is, how good can a recommendation system get when it will always be thrown off by the noise.
So while I agree with you in theory, I think it may not work out to be such a great thing in practice.
When I did use NetFlix, I spent a good amount of time flagging as many movies I did NOT want and would never, ever rent as those I did or would. The result was a pretty consistent selection that reasonably matched my taste.
They just want to assess the "truthiness" of recommendations :-)
Why not give users more control over their recommendations? Heck, even a bunch of checkboxes would be useful.
For example, Netflix frequently recommends rated R movies to my family, but we have never rented a single R-rated movie and have no desire to do so. Moreover, every time we get a recommendation for an R-rated movie, we rate it "Not Interested." I've probably marked dozens of R-rated movies "Not Interested," but they continue to be recommended. (Either Netflix is trying to tell me to just give in and rent one already, or they really don't understand my family's movie preferences.)
A simple checkbox for "Do not recommend R-rated movies" would be all Netflix needs to substantially improve its accuracy for my family. I imagine Netflix could add checkboxes for similar criteria as well. In any case, I think a key point is giving more control over recommendations to the users themselves.
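The checkbox idea could be as simple as a post-filter over whatever the recommender produces. A minimal sketch follows, assuming each recommendation carries an MPAA rating; the `Recommendation` class, its field names, and the placeholder titles are all hypothetical and say nothing about how Netflix actually stores its data.

```python
from dataclasses import dataclass

@dataclass
class Recommendation:
    title: str
    mpaa_rating: str        # e.g. "G", "PG", "PG-13", "R"
    predicted_stars: float  # recommender's predicted rating for this user

def apply_exclusions(recommendations, excluded_ratings):
    """Drop any recommendation whose MPAA rating the user has opted out of."""
    blocked = {r.upper() for r in excluded_ratings}
    return [rec for rec in recommendations
            if rec.mpaa_rating.upper() not in blocked]

# Placeholder titles, for illustration only.
candidates = [
    Recommendation("Movie A", "PG", 4.2),
    Recommendation("Movie B", "R", 4.6),
    Recommendation("Movie C", "G", 3.9),
]
family_friendly = apply_exclusions(candidates, excluded_ratings=["R"])
print([rec.title for rec in family_friendly])
```

A hard filter like this runs after the prediction step, so it would not change the predicted ratings themselves, only which recommendations get shown.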
the JoshMeister on Security
The most interesting part of the research paper was this: "More specifically, if movie i was rated x days later than movie j, we multiply their similarity by exp(-x/600). The denominator 600 (days) was determined by cross validation, and reflects the fact that after two years, similarity decays by approximately a factor of 3." Apparently Joe Average's tastes in movies slowly evolve over time, and something you liked three years ago may not be that attractive today.
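The quoted decay is easy to check numerically. Here is a small sketch: the 600-day denominator comes from the quote, while the function name `decayed_similarity` and the 730-day "two years" figure are my own assumptions for illustration.

```python
import math

DECAY_DAYS = 600.0  # denominator quoted from the paper

def decayed_similarity(similarity, days_apart):
    """Scale a movie-movie similarity by exp(-x/600), x = days between ratings."""
    return similarity * math.exp(-days_apart / DECAY_DAYS)

# Fraction of the original similarity left after two years (taken as 730 days).
two_year_factor = decayed_similarity(1.0, 730)
print(two_year_factor, 1.0 / two_year_factor)
```

After 730 days just under 0.3 of the similarity remains, which is a reduction by a factor of a bit more than 3 and matches the paper's "approximately a factor of 3".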
This raises the question, should someone's age affect the denominator? People in or just out of college generally see their tastes evolve quickly, while people in retirement homes might take decades to get tired of something.
I also wonder if this decay factor applies to other fields. Not just books or music, but toothpaste or politicians. In the US, your representative is presumably re-elected before your opinion has time to change much; the president just as you're getting tired of him. It makes me wonder how Senators get re-elected at all.
Nothing for 6-digit uids?
How do we know whether it is even possible, theoretically, to improve it by 10%?
1. inherent randomness in each individual's ratings
If you give me a list of movies to rate today, then the same list a week later, I would probably give inconsistent scores. The more randomness there is, the less predictable the ratings are. Heck, Netflix could have deliberately introduced some randomness so that nobody could ever get the prize.
2. sample size
Imagine there is an underlying theoretical model that drives us to rate the way we do. That model would have a gazillion parameters. Even if we have a sample size of a million users each rating a thousand movies, it is unclear that it is big enough.
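Point 1 above (inherent randomness) can be illustrated with a toy simulation: if each observed rating is a "true" preference plus random noise, then even a predictor that knows the true preference exactly is left with an RMSE of roughly the noise level. Everything here is an assumption for illustration (Gaussian noise, continuous ratings with no rounding to whole stars, the function name `simulated_rmse_floor`); it says nothing about how noisy the real Netflix data is.

```python
import math
import random

def simulated_rmse_floor(noise_std, n_ratings=200_000, seed=0):
    """RMSE of a 'perfect' predictor when ratings carry Gaussian noise."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    total = 0.0
    for _ in range(n_ratings):
        true_preference = rng.uniform(1.0, 5.0)        # what the user "really" thinks
        observed = true_preference + rng.gauss(0.0, noise_std)  # what they clicked
        # The ideal predictor guesses true_preference, so its error is the noise.
        total += (observed - true_preference) ** 2
    return math.sqrt(total / n_ratings)

floor = simulated_rmse_floor(noise_std=0.5)
print(floor)
```

Under these assumptions the floor comes out close to the noise standard deviation, so whether a 10% improvement is reachable depends on how far the baseline sits above that floor.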
When I think of "Food Porn", I certainly don't think of coffee table cookbooks...
So Netflix has money to spend on improving movie recommendations.
I'm a subscriber--why not give me a little of that cash?
I mean, if my opinion of a movie is this valuable, I expect to be compensated for participating in the system.
And that's why I never rate anything on the internet--they haven't made it worth my time.
Actually, what I'd prefer is if Netflix would give me a list of movies recommended by a group of professional reviewers I tend to agree with.
And at $4.99/month, that list doesn't have to be more than 2 selections!
You know, I have actually had some luck with the system. Some suggestions were flops, but some were good.
---- "Excuse me. Where's the children's gun section?"
I've taken a bunch of Netflix recommendations. Some were brilliant and some were stupid, but I'm not afraid to list a movie as 1 or 2 stars, and I seldom rate a movie at 5 stars unless it was really, really good. I figure this helps the recommendation process.
I'd like to see 3 improvements, though:
1) Half-star ratings. I'm given recommendations in fractional star increments, and there are times where I think half-stars make sense -- there's been movies that haven't totally sucked, and 2.5 stars seems appropriate, and some that have been better than 4 star but not quite 5.
2) A secondary list of characteristics for movies that were 4+ or 2- stars: "I rated this movie the way I did PRIMARILY because of the: (1) acting/cast, (2) director/direction, (3) plot/story, (4) effects/action." I think something like this would catch a lot of anomalous ratings that otherwise break suggestion algorithms as well as provide more dimensions.
3) The ability to rate actors and directors. I'd also say screenwriters, but I don't think enough people have enough knowledge of screenwriters for them to be able to build a reasonable algorithm.