BellKor Wins Netflix $1 Million By 20 Minutes

Posted by kdawson on Monday September 21, 2009 @05:58PM from the seven-guys-and-a-million-bucks dept.

eldavojohn writes "As we discussed at the time, there was a strange development at the end of Netflix's competition in which The Ensemble passed BellKor's Pragmatic Chaos by 0.01% a mere twenty minutes after BellKor had submitted results past the ten percent mark required to win the million dollars. Unfortunately for The Ensemble, BellKor was declared the victor this morning because of that twenty-minute margin. For those of you following the story, The New York Times reports on how teams merged to form BellKor's Pragmatic Chaos and take the lead, which sparked an arms race of teams merging their algorithms to produce better results. Now the Netflix Prize 2 competition has been announced." The Times blog quotes Greg McAlpin, a software consultant and a leader of the Ensemble: "Having these big collaborations may be great for innovation, but it's very, very difficult. Out of thousands, you have only two that succeeded. The big lesson for me was that most of those collaborations don't work."

Bad Summary by Anonymous Coward · 2009-09-21 18:11 · Score: 5, Informative

The Ensemble beat BellKor by 0.01%... by their own reporting. According to Netflix, it was a tie. In the case of a tie, the first posted result wins.
1. Re:Bad Summary by tangent3 · 2009-09-21 20:39 · Score: 5, Informative
  
  The Ensemble beat BellKor by 0.01% on the quiz set. Basically there are 2.8 million records in the qualifying set whose ratings the teams must predict. Half of the records (which half is known only to Netflix) form the quiz set, and the other half form the test set. Teams may submit predictions at most once a day to get a result on the quiz set, but the final decision of who won is made on the result of the test set.
  So even though Ensemble beat BellKor on the quiz set, the test set results came back dead even.
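A rough sketch of the quiz/test mechanism described above, using toy data. The function names, the seed, the 1000-record size and the noise model are all assumptions made for illustration; nothing here is Netflix's actual scoring code.

```python
import math
import random


def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences."""
    if len(predicted) != len(actual):
        raise ValueError("length mismatch")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


def split_quiz_test(n_records, seed):
    """Assign each qualifying record to a hidden quiz half or test half.

    The seed stands in for the organiser's secret; entrants never see it.
    """
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    half = n_records // 2
    return sorted(indices[:half]), sorted(indices[half:])


def score(predictions, truth, subset):
    """RMSE restricted to the records in one hidden half."""
    return rmse([predictions[i] for i in subset], [truth[i] for i in subset])


# Toy qualifying set: true ratings 1..5 and noisy predictions (hypothetical).
rng = random.Random(0)
truth = [rng.randint(1, 5) for _ in range(1000)]
predictions = [min(5.0, max(1.0, t + rng.gauss(0, 0.9))) for t in truth]

quiz, test = split_quiz_test(len(truth), seed=42)
quiz_rmse = score(predictions, truth, quiz)   # what the daily feedback reports
test_rmse = score(predictions, truth, test)   # what decides the winner
print(f"quiz RMSE = {quiz_rmse:.4f}, test RMSE = {test_rmse:.4f}")
```

The two numbers generally differ a little for the same submission, which is how a lead on the quiz set can turn into a tie on the test set.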
nonsense by wizardforce · 2009-09-21 18:34 · Score: 5, Insightful

"The big lesson for me was that most of those collaborations don't work."
Setting an arbitrary goal that only .2% of competitors could meet does not mean that most collaborations don't work. If 90% of the teams met the target, you probably wouldn't be so quick to claim that the vast majority of collaborations do work but rather that the goal wasn't high enough.

--
Sigs are too short to say anything truly profound so read the above post instead.
Funny, I learned a different lesson... by Squiggle · 2009-09-21 18:37 · Score: 5, Insightful

The big lesson for me was that big collaborations were the most successful.
In creating solutions for hard problems most of everything fails and is horribly difficult. No big surprise there. Kinda odd that was the quoted lesson...

--
Complexity Happens
The Rules are the Rules... by Anonymous Coward · 2009-09-21 18:49 · Score: 5, Interesting

I agree that Ensemble "losing" because they posted 20 minutes later is a harsh result. However, those were the rules that Netflix set forth and Ensemble, intentionally or not, was making a risky gamble by waiting until right before the deadline to submit their project. And, perhaps the "tie goes to the earlier poster" rule makes some sense because it encourages making your submission earlier than you would otherwise and not "sniping" unless you're absolutely sure your project is better than the rest. At least as far as I can understand, the rule set forth the proper tradeoff -- Ensemble got to see the score to beat (BellKor's) before it posted; however, in exchange for that, its score needed to have been better in order to win. Had Ensemble wanted the first-mover's advantage and the win in event of a tie, it could have posted earlier than BellKor. The fact that BellKor posted only 20 minutes before the end of the competition suggests that Ensemble could have easily posted earlier without compromising its entry. That is, how much significant tinkering could have possibly been done in the last half hour of this multi-year competition?
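The "tie goes to the earlier poster" rule can be sketched as a sort key. This is only an illustration: the `Submission` class, the `pick_winner` helper, the scores and the rounding to four decimals are assumptions, not figures or rules taken from Netflix. Only the 20-minute gap comes from the story.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Submission:
    team: str
    test_rmse: float          # lower is better
    minutes_after_first: int  # 0 = earliest submission


def pick_winner(submissions, decimals=4):
    """Lowest test RMSE wins; on equal (rounded) RMSE the earlier entry wins.

    Rounding to `decimals` places is an assumption about how a tie is decided.
    """
    return min(
        submissions,
        key=lambda s: (round(s.test_rmse, decimals), s.minutes_after_first),
    )


# Illustrative scores only; both teams are given the same placeholder RMSE.
entries = [
    Submission("The Ensemble", 0.85, minutes_after_first=20),
    Submission("BellKor's Pragmatic Chaos", 0.85, minutes_after_first=0),
]
winner = pick_winner(entries)
print(winner.team)
```

Under this rule the later entrant gets to see the score to beat, but has to be strictly better to win.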
Re:The Objective by martin-boundary · 2009-09-21 20:17 · Score: 4, Informative

Though I'm not convinced my ratings are really all that accurate anyway. I'm pretty sure if I'm in a certain mode before I see some movies I'd rate them quite a bit differently than other times, though without some way to wipe my memories of seeing it the first time, I'm not sure how I'd actually test that.

If you phrase it like that, you're somewhat missing the point. The target was to minimize an average prediction error over a large number of people, not the prediction error for a single person (eg you).
Here's an analogy which might help: Suppose you play the lottery and you try to predict 6 numbers exactly, then you'll have a vanishingly small chance of getting them right. But suppose you submit millions of sets of predictions, all different, then your chance is much larger of getting the actual 6 right.
Now the Netflix contest required predicting a few million ratings, and even if any one rating might be very far off the target, the task only required making sure that a large proportion of the predictions were pretty close to each of their targets and the remaining ones were not too far off.
The winners were able to make several million predictions that were, on average (in the RMSE sense used a lot in engineering), a distance of 0.85 from the real rating, even if in some instances their predictions were off by 4 (i.e. predicting 1 when it is 5).
For example, with 4 million predictions, if 1% of their predictions are off by 4, that's 40,000 instances of being off by 4, but this has to be compensated by several percent of predictions being off by 0 if you want to get 0.85 on average.
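The arithmetic behind that example can be checked with a few lines. This is a sketch using the numbers from the comment (4 million predictions, 1% off by 4, overall RMSE 0.85); the helper names are made up for illustration. Because RMSE averages squared errors, each miss by 4 costs 16, so large misses weigh heavily.

```python
import math


def required_rest_rmse(target_rmse, bad_fraction, bad_error):
    """RMSE the remaining predictions must achieve to hit the overall target."""
    total_mse = target_rmse ** 2
    bad_contribution = bad_fraction * bad_error ** 2
    rest_mse = (total_mse - bad_contribution) / (1.0 - bad_fraction)
    if rest_mse < 0:
        raise ValueError("target unreachable with that many large misses")
    return math.sqrt(rest_mse)


def max_bad_fraction(target_rmse, bad_error):
    """Largest share of predictions that can miss by bad_error
    if every other prediction is exactly right."""
    return target_rmse ** 2 / bad_error ** 2


n_predictions = 4_000_000
n_bad = n_predictions // 100          # 1% of predictions off by 4
rest = required_rest_rmse(0.85, 0.01, 4.0)
ceiling = max_bad_fraction(0.85, 4.0)

print(n_bad)                          # 40000
print(f"other 99% need RMSE of about {rest:.3f}")
print(f"at most {ceiling:.1%} can be off by 4")
```

So with 1% of predictions off by 4, the other 99% must average noticeably better than 0.85, and no more than about 4.5% could be off by 4 even if everything else were perfect.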
Re:The Objective by crunchyeyeball · 2009-09-21 20:59 · Score: 5, Informative

Basically, you were asked to predict how a number of users would rate a number of movies, based on their previous ratings of other movies.
You were supplied with 100 million previous ratings (UserID, MovieID, Rating, DateOfRating), with the rating being a number between 1 and 5 (5=best), and asked to make predictions for a separate ("hidden") set comprising roughly 10% of the original data. You could then post a set of predictions to their website which would be automatically scored, and you'd receive an RMSE (Root Mean Squared Error) by email.
To avoid the possibility of tuning your predictions based on the RMSE, you could only post one submission per day, and the final competition-winning results would be scored against a separate hidden set, independent of the daily scoring set.
It really was a fantastic competition, and anyone with a little coding knowledge (or SQL knowledge) could have a decent go at it. Personally, I scored an RMSE of 0.8969, or a 5.73% improvement over Netflix's benchmark Cinematch algorithm, having learnt a huge amount based on the published papers and forum postings of others in the contest, and my own incoherent theories.
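For anyone wondering how the percentage relates to the RMSE, here is a small sketch. The baseline is back-computed from the two figures in this comment (0.8969 and 5.73%), so treat it as an implied value rather than an official Cinematch number; the function names are hypothetical.

```python
def improvement_pct(baseline_rmse, rmse):
    """Percentage reduction in RMSE relative to a baseline."""
    return 100.0 * (baseline_rmse - rmse) / baseline_rmse


def implied_baseline(rmse, improvement):
    """Baseline RMSE implied by a score and its stated improvement in percent."""
    return rmse / (1.0 - improvement / 100.0)


my_rmse = 0.8969
my_improvement = 5.73

baseline = implied_baseline(my_rmse, my_improvement)
ten_percent_mark = baseline * 0.90  # RMSE needed for a 10% improvement on that baseline

print(f"implied baseline RMSE: {baseline:.4f}")
print(f"RMSE for a 10% improvement on it: {ten_percent_mark:.4f}")
print(f"round trip: {improvement_pct(baseline, my_rmse):.2f}%")
```

The gap between 5.73% and the 10% needed for the prize looks small in RMSE terms, which is part of why the last few hundredths took years.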
In a way, everyone wins. Netflix gets a truly world-class prediction system based on the work of tens of thousands of researchers around the world hammering away for years at a time. Machine learning research moves a big step forward. BellKor et al get a big juicy cheque, and enthusiastic amateurs like myself get access to a huge amount of real-world research and data.