BellKor Wins Netflix $1 Million By 20 Minutes
eldavojohn writes "As we discussed at the time, there was a strange development at the end of Netflix's competition in which The Ensemble passed BellKor's Pragmatic Chaos by 0.01% a mere twenty minutes after BellKor had submitted results past the ten percent mark required to win the million dollars. Unfortunately for The Ensemble, BellKor was declared the victor this morning because of that twenty-minute margin. For those of you following the story, The New York Times reports on how teams merged to form BellKor's Pragmatic Chaos and take the lead, which sparked an arms race of teams conjoining to merge their algorithms to produce better results. Now the Netflix Prize 2 competition has been announced." The Times blog quotes Greg McAlpin, a software consultant and a leader of the Ensemble: "Having these big collaborations may be great for innovation, but it's very, very difficult. Out of thousands, you have only two that succeeded. The big lesson for me was that most of those collaborations don't work."
the topic confuses me
The Ensemble beat BellKor by 0.01%... by their own reporting. According to Netflix, it was a tie. In the case of a tie, the first posted result wins.
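For anyone still confused by the tie-break, here is a minimal sketch of the rule as described above: rank by score, and if the scores tie, the earlier submission wins. The team labels, scores, and timestamps are made up for illustration, and comparing scores after rounding to four decimal places is an assumption, not something stated in the story.

```python
from datetime import datetime

# Hypothetical submissions: (team, RMSE, submission time).
# Illustrative values only, not the real contest results.
submissions = [
    ("late_team", 0.90008, datetime(2000, 1, 1, 12, 20)),
    ("early_team", 0.90012, datetime(2000, 1, 1, 12, 0)),
]

def rank(entries, decimals=4):
    """Order entries best-first: lower rounded RMSE wins, ties go to the earlier post."""
    return sorted(entries, key=lambda e: (round(e[1], decimals), e[2]))

winner = rank(submissions)[0][0]
print(winner)
```

With these numbers both scores round to the same value, so the earlier poster comes out on top even though the later team's raw score is slightly lower. Compare at five decimal places instead and the order flips.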
It was a tie...
In football, I can see how a 20-second difference decides who wins the Super Bowl. In a contest like this, which took thousands of man-hours from some brilliant people, calling The Ensemble "second place" because of a 20-minute difference is just wrong. I don't know if there was a better solution, but something just seems wrong about it all.
Are they actually putting the results into use?
Setting an arbitrary goal that only .2% of competitors could meet does not mean that most collaborations don't work. If 90% of the teams met the target, you probably wouldn't be so quick to claim that the vast majority of collaborations do work but rather that the goal wasn't high enough.
Sigs are too short to say anything truly profound so read the above post instead.
The big lesson for me was that big collaborations were the most successful.
In creating solutions for hard problems most of everything fails and is horribly difficult. No big surprise there. Kinda odd that was the quoted lesson...
Complexity Happens
it's still good for the CV.....
I agree that Ensemble "losing" because they posted 20 minutes later is a harsh result. However, those were the rules that Netflix set forth and Ensemble, intentionally or not, was making a risky gamble by waiting until right before the deadline to submit their project. And, perhaps the "tie goes to the earlier poster" rule makes some sense because it encourages making your submission earlier than you would otherwise and not "sniping" unless you're absolutely sure your project is better than the rest. At least as far as I can understand, the rule set forth the proper tradeoff -- Ensemble got to see the score to beat (BellKor's) before it posted; however, in exchange for that, its score needed to have been better in order to win. Had Ensemble wanted the first-mover's advantage and the win in the event of a tie, it could have posted earlier than BellKor. The fact that BellKor posted only 20 minutes before the end of the competition suggests that Ensemble could have easily posted earlier without compromising its entry. That is, how much significant tinkering could have possibly been done in the last half hour of this multi-year competition?
OK, so somebody won a prize, offered by NetFlix, to do... what exactly?
Anyone? http://en.wikipedia.org/wiki/Elisha_Gray_and_Alexander_Bell_telephone_controversy
More importantly, are the algorithms open and unencumbered by patents? Will any software other than Netflix's use them?
Ratings systems are inaccurate because people tend to cluster their ratings towards the extremes, for a number of reasons. (I would go into what I believe to be those reasons and the conditions under which they are triggered, but it's really late.)
My proposed solution is to require ratings to conform to some probability distributions and fit some criteria:
1. A user's votes should be approximately normal, with some degree of deviation permitted.
2. [Approximately] 90% of everything is crap/crud (the quantized version of Sturgeon's Law) (for some definition of "crap/crud").
And a few more rules based on observations I have made but don't feel like listing (again, because it's late).
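To make the first criterion a bit more concrete, here is a minimal sketch of one standard way to tame users who cluster at the extremes: rescale each user's ratings to z-scores so every user has mean 0 and the same spread. This is a swapped-in technique, not the rule set proposed above and not anything Netflix or the contestants are known to use. It does not force the votes to be normal and does nothing about the second criterion. The function name and the sample ratings are hypothetical.

```python
from statistics import mean, pstdev

def normalize_user_ratings(ratings):
    """Convert one user's raw ratings to z-scores (mean 0, unit spread)."""
    mu = mean(ratings)
    sigma = pstdev(ratings)
    if sigma == 0:
        # A user who gives every title the same score shows no preference.
        return [0.0 for _ in ratings]
    return [(r - mu) / sigma for r in ratings]

# Hypothetical user who only ever votes 1 or 5.
extreme_user = [1, 5, 5, 1, 5, 1]
z_scores = normalize_user_ratings(extreme_user)
print(z_scores)
```

After rescaling, a 5 from someone who hands out 5s all the time counts for less than a 5 from someone who rarely does.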
I think he's pointing to one of the inefficiencies of prize systems as a way to spur innovation. Thousands of people tried, spending tens or hundreds of thousands of work-hours and other resources, and only a fraction got "winning results" (yes, according to the arbitrary way that winning was defined). But the point is that the prize probably resulted in a very inefficient use of resources. We could hypothesize that the same result might have been achieved with only 25% of the resources spent on the prize - for example, by making the cost of entry non-zero, you could have eliminated teams with no chance of winning from participating.
Basically prize systems benefit from people's inability to accurately assess their real chances of winning - or put another way, prize systems free ride off of people's self-delusion.
Of course there are other factors to be considered, e.g., what would those wasted resources have gone to if they were not being used for the competition, perhaps there are incidental rewards to those resources having been used, perhaps people competed for reasons other than simply winning the prize, etc.
"Anyone who [rips a CD] is probably engaging in copyright infringement." - David O. Carson
This doesn't work. If you make the entry cost nonzero, you'll be much less efficient at doing *science*. Remember, the journey is much more important than the result. The benefits to society of disseminating knowledge of data mining technologies and good datasets dwarf the knowledge of the winning entry (think Metcalfe's law).
For those who would do this w/o interest in money because they have such a passion for this sort of thing, this result won't faze them. But for others, the sheer mortality rate of these attempted collaborations, together with the company's apparent unwillingness to offer anything noteworthy to the other team over a minor technicality, is going to discourage people. Imagine how the losing team could turn on each other..."If only you didn't have to take that 25 minute crap we'd be cashing in!" "If only we had slept less!" etc. Really I hope the losing team ends up like American Idol finalists who don't win but still go on to get successful contracts, because they were still good despite a superficial margin that does nothing to differentiate the two teams in competitive skill or knowledge of the contest. The results were fair and legal, but this is a bittersweet victory with a bad forecast for future competitors if they have to be more concerned about these stupid idiosyncrasies. Hopefully Netflix or other companies with an idea like this will have the foresight to set a more realistic margin and split the winnings in a more reasonable fashion that gives both teams the recognition they deserve. The perfect situation here would be both teams getting half the money, with the press release simply stating the time of submission so those who MUST have bragging rights and hate the concept of a tie can bring up the submission time in a casual setting.
> Basically prize systems benefit from people's inability to accurately assess their real chances of winning - or put another way, prize systems free ride off of people's self-delusion.
Pretty much. I had a look at the data early on, verified that by a tiny bit of cleverness I could hit the existing performance mark with far less iron than I'm sure NetFlix throws at the problem, recognized that getting improvements over that was going to take huge efforts in time and computing resources given the structure of the data, looked at what the other teams were doing--which was running hundreds of different algorithms and merging the results, validating my judgement on the difficulty of the problem--and decided the scope of the problem was far bigger than the resources I had available.
Anyone with a reasonable level of algorithmic experience on large numerical datasets would have made the same judgement, leaving only two kinds of people in the competition: the ones with huge corporate or university resources available to them, or the ones who had no real clue how hard the problem actually was. Sometimes the latter were able to collaborate with the former, which was probably useful. Every team needs its deluded optimists.
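For readers wondering what "running hundreds of different algorithms and merging the results" looks like in miniature, here is a rough sketch of blending just two predictors with a least-squares weight. Everything in it is an assumption for illustration: the two "models", the toy ratings, and the function names are made up, and nothing here is taken from any team's actual code.

```python
from math import sqrt

def rmse(pred, actual):
    """Root mean squared error between predictions and true ratings."""
    return sqrt(sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(actual))

def best_blend_weight(pred_a, pred_b, actual):
    """Least-squares weight w for the blend w*pred_a + (1 - w)*pred_b."""
    num = sum((a - b) * (y - b) for a, b, y in zip(pred_a, pred_b, actual))
    den = sum((a - b) ** 2 for a, b in zip(pred_a, pred_b))
    return num / den if den else 0.5  # identical models: any weight works

def blend(pred_a, pred_b, w):
    return [w * a + (1 - w) * b for a, b in zip(pred_a, pred_b)]

# Toy data (made up): true ratings and two imperfect models.
actual = [4, 3, 5, 2, 1]
model_a = [3.5, 3.5, 4.0, 2.5, 2.0]
model_b = [4.5, 2.0, 5.5, 1.0, 1.5]

w = best_blend_weight(model_a, model_b, actual)
blended = blend(model_a, model_b, w)
print(w, rmse(model_a, actual), rmse(model_b, actual), rmse(blended, actual))
```

On the data the weight is fitted to, the blend can never score worse than either model alone, because weights of 1 and 0 reproduce the individual models. In practice the weight would be fitted on held-out ratings, and the gains shrink as you add more models, which fits the point above about the effort required.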
Blasphemy is a human right. Blasphemophobia kills.
From what I read, Netflix is implementing two or three 'parameters' or 'methods' (out of possibly thousands the teams may have used) for now. (Can't find the link atm)
>> I think he's pointing to one of the inefficiencies of prize systems as a way to spur innovation. Thousands of people tried, spending tens or hundreds of thousands of work-hours and other resources, and only a fraction got "winning results" (yes, according to the arbitrary way that winning was defined). But the point is that the prize probably resulted in a very inefficient use of resources. We could hypothesize that the same result might have been achieved with only 25% of the resources spent on the prize - for example, by making the cost of entry non-zero, you could have eliminated teams with no chance of winning from participating.
>> Basically prize systems benefit from people's inability to accurately assess their real chances of winning - or put another way, prize systems free ride off of people's self-delusion.
>> Of course there are other factors to be considered, e.g., what would those wasted resources have gone to if they were not being used for the competition, perhaps there are incidental rewards to those resources having been used, perhaps people competed for reasons other than simply winning the prize, etc.
So what you're saying is that NetFlix recognized a large leverage on their prize dollars, and contestants received non-tangible rewards for their participation.
Or in other words, prize systems are an efficient way to spur innovation. See also [[X Prize]]
-- I was raised on the command line, bitch
Your experience was very different from mine.
I found an obvious solution and wrote it down in the margin of a book. I even discovered a proof of this, but the margin was too narrow to contain it.
> The benefits to society of disseminating knowledge of data mining technologies and good datasets dwarf the knowledge of the winning entry (think Metcalfe's law).
You're only considering the benefits to society that result from this particular competition. The argument about prize systems being inefficient has to do with the fact that while they generate huge interest in a particular topic (and yes, generate more returns than simply the winning entry), they also result in an inefficient allocation of resources to that one particular topic.
I.e., some of the entrants would likely have benefited society more by flipping burgers or sweeping sidewalks than by wasting their time on the Netflix prize.
The problem is somewhat reduced if you have a large number of prizes on various topics, because then people can devote their time to areas where they have more of a chance of winning, or if you make the cost of entry non-zero (it can still be very low - anyone with any real interest and talent will not be turned off by a $1 or $5 registration fee, or by a simple test to assess their capabilities).
> Out of thousands, you have only two that succeeded.
Yes, because the Netflix rules were set up in such a way as to encourage winners to submit their results as soon as possible upon success. They're not going to wait around to give anyone else the chance to reach the same goal first. You might as well say "Only two people crossed the tape during that photo finish! The other thousand runners are failures!"
> The big lesson for me was that most of those collaborations don't work.
By this standard, zero non-collaborations worked.
In fact, comparing the two skillsets is fallacious, since they are not substitutable: An AI programmer can certainly flip burgers with minimal training, but the converse is not true.
I didn't see anything in the article about when Netflix may implement the new algorithms. I've rated a ton of stuff on Netflix and seem to have totally confused the current system, because I rarely get any recommendations and when I do they are totally off. For example, I rated a Japanese horror film highly and Netflix then suggested three European romantic films (one comedy and two dramas).
http://www.popularculturegaming.com -- my blog about the culture of videogame players
The burger flipping example was facetious, of course. The point being that it doesn't matter if people are following their preferences - people do not automatically prefer to do that which they are most efficient at doing.
So, there simply is no shortage of burgerflippers in society as a direct result of the existence of the prize, only an increase in AI skills among a subpopulation.
Assume the entrants all had moderate computer programming skill. There was likely a lot of duplicative effort in the competition (this happens in other types of research as well). Overall benefits may have been greater if 50% of the entrants worked on 100 different open-source software projects (or 10 different prize projects) rather than everyone working on the NetFlix prize.
Maybe I am alone here, but the only real trend in my movie likes is that I only watch GOOD movies. I have seen nothing in any of the articles on this that accounts for that. If I enjoyed 12 Monkeys, don't be suggesting Battlefield Earth to me just because they are both SF movies. To me, a better suggestion for a fan of 12 Monkeys would be Memento.
Humor from a Genetically Molested Mind
The Hutter Prize's incremental prize awards for progress, itself modeled on the M-Prize, is a superior way of awarding prize money. There is continual reward for teams that contribute substantially and no one team takes everything based on a technicality.
Seastead this.
My Netflix queue contains movies chosen by me, my wife and my children, and sometimes chosen for a visiting friend. If they would only allow me to maintain separate queues or tag the content as to who chose it, I would have thought that it would make predicting what we each like much easier. It's the same with iTunes; the "genius" must think I'm schizophrenic.
Nullius in verba
That's a strange way of viewing efficiency. Who gets to decide what is worthwhile to do for others? You're implying here a framework of value judgements independent of individual preferences, whereas typical definitions of efficiency only require individual preferences.
Now from the second part of your comment, I have to infer that these value judgements have something to do with a certain dislike of duplication. That's fine as a personal preference, but does it make sense for assessing social value? On the contrary, I'd argue that duplication is important.
Take a school for example: every child is supposed to learn to read and write. That's an extreme form of duplication, which is inefficient according to the above value judgement. Thus it would be better for society if only one child or only a few learned to read and write, and all the others would simply go "flipping burgers" or do whatever their (lack of) skills made them efficient at: mining, or sewing clothes, etc.
But this is ludicrous, we know the benefits of educating children, even if it requires a lot of duplication. We also know the economic benefits of mass producing goods, which is another form of duplication which often results in lower prices. Why should research be different? It isn't: duplication is the common mechanism that ensures the verification of claims.
All of these examples apply to a competition like the Netflix prize: 1) there's the education dimension, people who compete learn new skills that will show up when the next wave of social websites gets produced, 2) there's the economic dimension, since a lot of participants will know how to do the same type of algorithms, thereby decreasing the cost for those who wish to hire specialists, and 3) there's the verification dimension, because duplicated algorithms are never 100% the same (different parameters, code paths etc), and comparing the dupes is a robust way of assessing the algorithm's capabilities.
I don't think you are talking about efficiency accurately. By your reasoning, every competitive activity, anywhere, should be done away with since the participants could have been of more value to society by doing any productive job. And yet, how much would you have to pay those people in order to do the janitorial work or burger flipping instead of whichever competitive activity they would choose to do voluntarily? That is the measure of the inefficiency of your argument.
Any time participants in any activity have real choices whether or not to participate, you cannot get inefficient outcomes. There are any number of activities where participants do not have real choices about whether or not to participate, and those activities tend to not have efficient outcomes. But people and teams who have been working for years, making submissions to Netflix, and getting visible feedback as to how they are progressing can certainly make efficient decisions. Those people know exactly how much effort they are putting into the submissions. They have a really good idea as to how their algorithm is doing. They have at least a fair idea as to the number of competitors. In fact, if you raise a barrier to entry, you're more likely to get inefficient outcomes. Human beings are largely risk-averse in their decision making. In order for them to compete at all, they must feel that they are getting something out of the competition. Start charging to enter (even small amounts), and competitors that would do well won't enter. They'll go on to regular jobs where the pay isn't anywhere near a million dollars for success, but the risk of receiving nothing is pretty close to zero. The whole point of contests such as these is to let various people who think they have a good idea test their ideas. If Netflix knew what was going to make for a good idea, they wouldn't have the contest. They'd put the good idea in code and let it play out.
Competitors value competition in and of itself, in a way that most employees don't value employment for itself. Employees value getting paid. Being employed is a means to that end.
I don't expect morality, equality, consistency, or justice from the law. I expect only legality.
> That's a strange way of viewing efficiency. Who gets to decide what is worthwhile to do for others? You're implying here a framework of value judgements independent of individual preferences, whereas typical definitions of efficiency [wikipedia.org] only require individual preferences.
Your assumptions are 1) all entrants for the NetFlix prize prefer to spend their time on the NetFlix prize rather than something else (reasonable, and I agree to some extent); and 2) because these entrants prefer to spend their time doing this, it is efficient for them to do so (because you claim that efficiency is defined by preferences).
Your second assumption relies on a definition of efficiency as individual utility maximization, which in turn assumes that individual utility is defined by preferences. Those are valid definitions and assumptions, but they are somewhat arbitrary. Your own link points to several other definitions of efficiency that do not necessarily rely on individual utility maximization.
> Now from the second part of your comment, I have to infer that these value judgements have something to do with a certain dislike of duplication. That's fine as a personal preference, but does it make sense for assessing social value? On the contrary, I'd argue that duplication is important.
Duplication in the sense of replication of scientific work, or in the sense of duplicative effort of a bunch of schoolchildren learning the same thing is of course necessary and desirable (to some extent). Both of those examples have limits - only a certain amount of replication in science is useful. You don't want all scientists spending all their time replicating the same experiment. Likewise, while we do want all children to learn to read and write, we don't want all children to learn how to act, how to repair cars, how to program computers, or how to fly planes. (Note that this is true regardless of preferences - even if all children preferred to learn how to repair cars, it would not necessarily be most efficient for them to do so). Time and resources are limited, so specialization is necessary.
Duplication is only useful to a certain extent. Many entrants likely would have a comparative advantage in working on some other project, or learning some other skill than working with recommendation algorithms. However, because there is no prize giving them an incentive to work on a project where they have a comparative advantage, they instead all flock to the NetFlix prize.
The whole criticism essentially comes down to the point that if you are going to use prizes widely as a means of encouraging innovation, it's better to have many prizes that will draw people to various areas of interest, rather than having one big prize, which will draw too much interest.
> Greg McAlpin, a software consultant and a leader of the Ensemble: "Having these
> big collaborations may be great for innovation, but it's very, very difficult. Out of
> thousands, you have only two that succeeded. The big lesson for me was that most of
> those collaborations don't work."
Tough luck on the loss. Oh, and you're an idiot.
Saying "only two" worked is like saying "only one person actually found the car keys and all the other guys looking are a big Fail".
See, they stop looking after it's found. Two possibilities: they start slowly pouring in results faster and faster, like a Marathon race end, or nobody else does, because your two groups were the only two bright enough to begin with. In which case a mass attack is still useful the same way gym class for every student is useful to find fast runners for the NFL.
(-1: Post disagrees with my already-settled worldview) is not a valid mod option.
> perhaps there are incidental rewards to those resources having been used
Right - everybody who seriously competed greatly enhanced their own personal knowledge of the field. I'd bet that most of that new working knowledge is not left to waste. There is a ripe market for prediction systems, and even the worst of the entrants can probably fulfil somebody's small need.
My God, it's Full of Source!
OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
It's better for competition. If The Ensemble had won, it would be like taking the 2nd, 3rd and 4th placed runners and making their combined effort worthy of a gold medal, and condemning the actual winner to 2nd place and silver.
Go back to trolling sci.math, James :-(
Learn how to use emoticons, Dave. Unless of course, you really were sad about telling him to go back to sci.math?
Anyone who's been exposed to the ramblings of James Harris, even for a short time, will quickly discover a nasty taste in their mouths.
Hence the sad emoticon.