New Leader In Netflix Prize Race With One Day To Go

Posted by Soulskill on Sunday July 26, 2009 @02:15AM from the sniped-like-an-ebayer dept.

brajesh writes "The Netflix Prize, an algorithm competition to improve the Netflix Cinematch recommendation system by more than 10%, has a new leader, The Ensemble, just one day before the competition ends. The 30-day race to the end was kicked off after BellKor's Pragmatic Chaos submitted the first entry to break the 10% barrier, with the results showing a 10.08% improvement. The Ensemble, made up of three teams who chose to join forces ('Grand Prize Team,' 'Opera Solutions' and 'Vandelay United'), has managed to overtake BellKor with a score of 10.09%, an improvement of 0.01 percentage points over the former leaders. From the article on TechCrunch: 'The competition will end [today], so teams still have a little bit of time left to make their last-second submissions, but things are looking good for The Ensemble. This has to be absolutely brutal for team BellKor.'"

87 comments

I think by sys.stdout.write · 2009-07-26 02:18 · Score: 5, Insightful

that other websites should do this as well.

Slashdot, for instance, could have a contest to unbreak their fucking code by 10%.
1. Re:I think by Anonymous Coward · 2009-07-26 02:37 · Score: 3, Funny
  
  Are you joking? Slash is written in Perl, the best maintenance method is too start again.
  
  (Joking, partly).
2. Re:I think by Vectronic · 2009-07-26 02:37 · Score: 3, Insightful
  
  (-1 Offtopic) But I've sort of hoped that a site such as Slashdot would somehow open-source its site code. It's a sort of "community", and considering the context of the site and the number of users, there are probably about 5,000 people capable of contributing decent code/help, and there has to be a rather significant number of those who are willing to.
  Add a section devoted to it, then polls about which contribution should be implemented, etc. Articles/Submissions are sort of (controlled) "open-source", so why not the site itself?
3. Re:I think by BadAnalogyGuy · 2009-07-26 02:40 · Score: 0
  
  *ahem*
  http://slashdot.org/faq/code.shtml
4. Re:I think by Blue+Stone · 2009-07-26 02:40 · Score: 5, Funny
  
  >Slashdot, for instance, could have a contest to unbreak their fucking code by 10%.
  I remember playing Call of Cthulhu many years ago and being told of the hideously deranging results of mere mortals who happened to gaze upon the unspeakable things that lurked in the dark places.
  I beg you not to lead others down your insane and twisting path.
  NO GOOD CAN COME OF IT! NO GOOOD!
  
  --
  Corporation, n. An ingenious device for obtaining individual profit without individual responsibility. - Ambrose Bierce
5. Re:I think by caramelcarrot · 2009-07-26 02:41 · Score: 1
  
  There is slashcode, but that project seems to be stagnant. http://www.slashcode.com/
6. Re:I think by Vectronic · 2009-07-26 02:43 · Score: 1
  
  Yeah, I know about that, but I think the reason it is stagnant is that it's not really a part of "Slashdot"; it's off in its own little URL world. It should be merged into slashdot.org.
7. Re:I think by Exception+Duck · 2009-07-26 02:46 · Score: 1
  
  This has been empty for a while now:
  http://www.slashcode.com/sites.pl
  
  --
  Exception Duck - may or may not contain chicken.
8. Re:I think by houstonbofh · 2009-07-26 03:11 · Score: 1
  
  So you are saying that looking at all of the Slashdot code, and actually understanding it, breaks your mind? Well, that explains this nasty system-choking JavaScript, then.
9. Re:I think by Anonymous Coward · 2009-07-26 04:18 · Score: 0
  
  And your suggestion would be too rewrite it in PHP ? *cough* *script-kiddie* *cough*
  I just hope your syntax is better than your grammar.
10. Re:I think by Anonymous Coward · 2009-07-26 04:37 · Score: 0
  
  You mean like this?
  http://www.slashcode.com/
11. Re:I think by Anonymous Coward · 2009-07-26 05:10 · Score: 0
  
  Syntax is a subset of grammar, you insensitive clod!!
12. Re:I think by brentonboy · 2009-07-26 05:15 · Score: 1
  
  Syntax is a subset of grammar, you insensitive clod!!
  Ha! But you're equivocating, of course. He means code syntax.
Uve Boll by Afforess · 2009-07-26 02:19 · Score: 2, Funny

What did they do, make sure that all of Uve Boll's movies never came up as a "Recommended for you" movie?

--
If our elected representatives no longer represent us, do we still live in a Democracy?
1. Re:Uve Boll by Anubis+IV · 2009-07-26 06:17 · Score: 1
  
  That was the first 9%, but that last 1.08% took a bit more thinking.
2. Re:Uve Boll by strat · 2009-07-26 16:22 · Score: 1
  
  Make it Michael Bay and I'd say we have a winnah!
I used to be very elitist about my reading by BadAnalogyGuy · 2009-07-26 02:20 · Score: 2, Interesting

Back when I first began using Amazon.com, I never bought a book based on the recommended items. I felt the recommendations were trite, ill-advised, and typically only peripherally related to the item I was buying.
Then the recommendations got better. Much better. I started to find myself buying things right out of the recommended section, and the product combination deals also became very tempting.
If Netflix can turn their recommendation engine into something similar, they will be sitting on a goldmine. As they say, people hate being sold to but they love buying.
1. Re:I used to be very elitist about my reading by Anonymous Coward · 2009-07-26 04:03 · Score: 0
  
  Looks like you aren't a Netflix customer. Their rec engine is already way better than amazon's.
  
  If you are going to make assumptions, at least try to be on the safe side, for fuck sake.
2. Re:I used to be very elitist about my reading by BadAnalogyGuy · 2009-07-26 04:14 · Score: 1
  
  The only assumption that I made was that the recommendation engine could be improved.
  With approximately 10 million subscribers (as of 2008), and 1.3B in revenues from these subscribers, even a 1% increase in rentals would be worth 10 times the 1M they are paying to the winner of this contest.
  Amazon has almost 20B in revenues from a much larger group of customers. A 1% increase per customer here would be huge.
  Netflix, in addition to increasing the number of rentals per customer, should also be thinking about increasing the total number of customers.
3. Re:I used to be very elitist about my reading by Anonymous Coward · 2009-07-26 06:03 · Score: 0
  
  With approximately 10 million subscribers (as of 2008), and 1.3B in revenues from these subscribers, even a 1% increase in rentals would be worth 10 times the 1M they are paying to the winner of this contest.
  How do you figure that? It's not as if Netflix charges by the DVD rental; people pay for the ability to rent/keep x DVDs at a time, for as long as they pay for that plan. People will (usually) already have a large number of movies they want to watch stored in their queue, so queue length isn't driving these people to more expensive subscription plans. I fail to see how a better recommendation system (which will simply lead to more movies in the queue) is going to translate into a significant number of customers making the leap to an x+1 subscription plan.
4. Re:I used to be very elitist about my reading by mattack2 · 2009-07-26 07:24 · Score: 1
  
  Wow, you need movies & books to be recommended?
  I have far more movies & books that I'm at least *vaguely* interested in than I can 'consume'. (A large part of the reason I started using the Netflix profile system was because of the 500 item limit in the queue.. and yes, I realize I won't ever watch the VAST majority of them.. but I would add movies/TV shows/documentaries that sounded interesting, and hit the limit. Note obviously a lot of the multiple items are separate discs in a collection, such as 'extras' discs or multiple discs of a TV show. I have since mostly separated into movies & TV profiles, but haven't moved all TV shows on my orig huge list to the TV profile.)
  Don't get me wrong, I'm interested in improving the recommendation system just for curiosity/algorithmic reasons.
Why now? by Anonymous Coward · 2009-07-26 02:22 · Score: 1, Insightful

Why not wait another day before submitting the improvement? All they did now was give the other team one day to respond, and if they succeed, I doubt they will be able to submit yet another improvement. So why not simply wait until an hour or so before the deadline, or am I missing something about the rules, e.g. any submitted improvements prolong the deadline by one day?
1. Re:Why now? by garcia · 2009-07-26 02:32 · Score: 1
  
  Maybe they already have a solution which is higher and they are just being dicks? Maybe they aren't dicks at all and want to see the best team win? Maybe they think that their solution is unbeatable?
  Whatever it is, it is certainly a lot more interesting than I thought it'd ever be. Kudos to the groups that have broken the 10% barrier!
2. Re:Why now? by Anonymous Coward · 2009-07-26 02:44 · Score: 3, Insightful
  
  It does seem like a slight flaw in the rules if there is only one 30-day countdown timer. That is, if a competing team can hold off until the last moment to release their version that bests the current leader, as is the case here. Now that this improvement has been made public, there should be something like a 10-day response time for the other competing teams.
3. Re:Why now? by caffeinemessiah · 2009-07-26 02:47 · Score: 4, Interesting
  
  Why not wait another day before submitting the improvement? All they did now was give the other team one day to respond, and if they succeed, I doubt they will be able to submit yet another improvement. So why not simply wait until an hour or so before the deadline, or am I missing something about the rules, e.g. any submitted improvements prolong the deadline by one day?
  For the grand prize, there was a final 30-day countdown from the time the first entry that achieved greater than 10% was received, which was a month ago. So it seems like this will indeed come down to an ebay-like sniping situation in the last few hours.
  I wouldn't feel too sorry for BellKor/KorBell though -- they've got many, many best paper awards at conferences and a huge degree of publicity out of the whole endeavor. In fact, in KDD 2009, they detailed most of the methods that most likely got them to the top -- i.e. they incorporated the fact that tastes and preferences drift over time. Simple, in retrospect of course. If you have an ACM subscription, you can read the 2009 paper here.
  Plus, since they work for AT&T/Yahoo Research, I remember Yehuda Koren stating that the money wouldn't have gone to them anyway -- possibly a large bonus, but I think they're entitled to that anyway. So I wouldn't feel too sorry for them.
  
  --
  An old-timer with old-timey ideas.
4. Re:Why now? by SpinyNorman · 2009-07-26 02:51 · Score: 1
  
  Try and flush out the competition, maybe? (Unless it really is the best they have, or think they'll have.)
  Or perhaps try to lull the competition into a false sense of security by only edging them by a hair, when they have something better held back?
  Of course, with the amount of effort the teams have put into this, and the money at stake, you'd be nuts not to keep working on it flat-out until the time runs out; but still, if you're tired it could make a difference if you think you've got the competition by a comfortable margin as opposed to knowing you're in a losing position because they've already submitted their true best shot, or something close to it.
5. Re:Why now? by tonycheese · 2009-07-26 03:33 · Score: 1
  
  Well, perhaps they did not know exactly how Netflix would rate their efficiency until after a submission. .01% is a pretty close difference, and they might not have known whether they would overtake first or not without submitting and having their algorithm run by Netflix.
6. Re:Why now? by brian_tanner · 2009-07-26 04:49 · Score: 5, Informative
  
  It's also true that the winner is not the person who gets the highest score on the leaderboard. Most people seem to miss this.
  
  The leaderboard gives score on the QUIZ dataset, which is half of the answers that the team submits. The WINNER of the million dollars is the person who does best on the TEST dataset, the other half of the answers they submit. Nobody knows how good these guys are doing on the TEST set, either team could be overfitting the quiz set.
7. Re:Why now? by Anonymous Coward · 2009-07-26 07:31 · Score: 0
  
  Nobody knows how good these guys are doing on the TEST set, either team could be overfitting the quiz set.
  Yeah, but if they're not hitting the 10% mark on the quiz set, then they're probably not going to hit the target 10% on the test set either, regardless of whether they're overfitting to the public data.
8. Re:Why now? by brian_tanner · 2009-07-26 08:27 · Score: 1
  
  Yeah, but if they're not hitting the 10% mark on the quiz set, then they're probably not going to hit the target 10% on the test set either, regardless of whether they're overfitting to the public data.
  Yeah, there is a flaw in the evaluation mechanism, in my opinion. The good thing is that you don't need to hit 10% on the test set to win the money. Whatever team is qualified (10% on quiz) AND has the best test score wins. Even if they have terribly overfit the quiz set (the quiz set has been around for years now), and have terrible performance on test, one of the two qualified teams will win the money.
  
  The flaw is that other teams that have not hit 10% on quiz might be doing better on test. If that's true, those people cannot win the money, even though they apparently have a stronger (less overfit) solution. Of course, all of these scores are ridiculously close to each other anyway, but it seems contrary to the nature of a competition if the winner is not the team with the best submitted solution.
  
  I sincerely hope that no matter what happens, ALL of the test scores (for all teams) are revealed, so everyone can see what was really happening.
9. Re:Why now? by currivan · 2009-07-26 11:02 · Score: 2, Informative
  
  In fact, according to the second post by Yehuda Koren in this thread, it looks like BellKor does have the best test error rate and will be declared the winner. http://www.netflixprize.com/community/viewtopic.php?id=1498
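The selection rule brian_tanner describes above (qualify on the quiz half, then rank on the test half) can be sketched roughly as follows. This is only an illustration of the rule as stated in this thread, not code taken from the official contest rules: the team names, all RMSE values, the baseline of 1.0, and the helper `pick_winner` are made up for the example.

```python
def improvement_pct(baseline_rmse, team_rmse):
    """Percentage improvement of a team's RMSE over the baseline RMSE."""
    return 100.0 * (baseline_rmse - team_rmse) / baseline_rmse


def pick_winner(teams, baseline_quiz_rmse, threshold_pct=10.0):
    """Hypothetical helper: teams maps name -> (quiz_rmse, test_rmse).

    A team qualifies only if its QUIZ improvement reaches the threshold;
    among qualified teams, the lowest TEST RMSE wins.
    """
    qualified = {
        name: scores
        for name, scores in teams.items()
        if improvement_pct(baseline_quiz_rmse, scores[0]) >= threshold_pct
    }
    if not qualified:
        return None
    return min(qualified, key=lambda name: qualified[name][1])


# Made-up numbers: A leads the public leaderboard, B is better on the
# hidden half, and C is best on the hidden half but never qualified.
teams = {
    "A": (0.8990, 0.9000),
    "B": (0.8995, 0.8990),
    "C": (0.9010, 0.8980),
}
winner = pick_winner(teams, baseline_quiz_rmse=1.0)
print(winner)
```

With these illustrative numbers the leaderboard leader (A) does not win, and the team with the best hidden score (C) cannot win either, which is exactly the flaw discussed in this thread.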
should've "gamed" it by petes_PoV · 2009-07-26 02:29 · Score: 4, Interesting

Rather than declaring their best result early, the BellKor team should have employed a bit of strategy and only declared a lesser result (if any). That would give the other teams something to aim at, without giving away their best results. These would be held back right up until the last minute and then submitted, so that other teams would not have time to make any further improvements (in fact, maybe this IS what they're doing). It's been a successful bidding strategy on eBay for years, so why wouldn't it translate into other competitive areas too?

--
politicians are like babies' nappies: they should both be changed regularly and for the same reasons
1. Re:should've "gamed" it by stuckinarut · 2009-07-26 02:34 · Score: 5, Insightful
  
  Who's to say they haven't? People smart enough to win this competition are probably smart enough to think of this.
2. Re:should've "gamed" it by SpinyNorman · 2009-07-26 02:36 · Score: 1
  
  I'd be very surprised if BellKor doesn't have something better to submit at the last second.
  It'd certainly have been an awful strategy to trigger the endgame with all your cards on the table.
3. Re:should've "gamed" it by Manip · 2009-07-26 02:36 · Score: 4, Insightful
  
  This isn't eBay, they can't just magic high scores.
  If you game it or otherwise, everyone will end up submitting their max score, because, well... Why wouldn't they? Who cares if the other team knows you have 10.08%... Either they can beat it and will submit that score, or they cannot and won't.
4. Re:should've "gamed" it by mrvan · 2009-07-26 02:38 · Score: 1
  
  Maybe they did, and the 10.08 (pretty minimal increase from 10) was their low end result, and they will announce their 25% increase result in the coming day..
  Then again, maybe they didn't :-)
5. Re:should've "gamed" it by MartinSchou · 2009-07-26 02:47 · Score: 1
  
  The 10.08 was a 10.08% improvement over the original system. That's not exactly a minimal increase, and considering that the new leaders posted a 10.09% improvement over the original (0.0098% better than 10.08%) it's rather harsh to write off the 10.08% improvement as "pretty minimal".
6. Re:should've "gamed" it by Jah-Wren+Ryel · 2009-07-26 02:59 · Score: 1
  
  If you game it or otherwise, everyone will end up submitting their max score, because, well... Why wouldn't they? Who cares if the other team knows you have 10.08%... Either they can beat it and will submit that score, or they cannot and won't.
  OR maybe they can do better than 10.08% but because they thought they had it in the bag, they didn't put the extra effort in to really push those improvements through and now, with less than a day left, they don't have the time to get those improvements polished enough for submission.
  
  This isn't eBay, they can't just magic high scores.
  Actually this is precisely like eBay. It appears that the prize got "sniped" out from under BellKor. The problem, just like eBay, is that the process has a fixed end-date. The way to avoid this problem (and produce the best results for the "seller", in this case Netflix) is to have a rolling end-date that is always a fixed period after the most recent highest result submission.
  Don't get me wrong, I am a BIG fan of sniping, but then I'm always a buyer on eBay, not the seller, and sniping is the best bidding policy to keep bidding-wars at bay.
  
  --
  When information is power, privacy is freedom.
7. Re:should've "gamed" it by Stile+65 · 2009-07-26 03:01 · Score: 1
  
  Actually, Netflix used a different way to prevent gaming the system. They split the submitted predictions into two sets - the "quiz" set and the "test" set. The quiz set results are on the leaderboard; the test set is used for final judging.
  
  --
  I claim first use of "Error No. 0B" - or "No. 0B error." It'll be the new ID 10T!
8. Re:should've "gamed" it by MrShaggy · 2009-07-26 03:01 · Score: 1
  
  Does Mighty Mouse come in time to save the day?
  Tune in next week, to see the Action-packed conclusion!
  
  --
  I have mod points and I am not afraid to use them.
9. Re:should've "gamed" it by shentino · 2009-07-26 03:16 · Score: 1
  
  True, and if only their own interest counts, that would be a good choice.
  Thing is, it's not good sportsmanship to "game the rules" that way.
10. Re:should've "gamed" it by flynt · 2009-07-26 03:56 · Score: 1
  
  Basically impossible. The teams cannot compute their improvement. Netflix computes the improvement. The improvement is computed on a "secret" test dataset that only Netflix has access to. The models are developed on a public dataset available to everyone.
11. Re:should've "gamed" it by Anonymous Coward · 2009-07-26 04:37 · Score: 0
  
  Maybe Netflix should have learned something from the online auction sites and made the deadline push back whenever a new winner emerged. That would prevent snipers from waiting until the last minute. Say the deadline gets pushed out 7 days every time someone takes the lead.
  Sure, that could result in the competition getting pushed out indefinitely. But if that much is going on I don't see the harm in it. Once someone gets a result that's superior enough to stand for 7 days then they win.
12. Re:should've "gamed" it by Sancho · 2009-07-26 04:43 · Score: 2, Insightful
  
  I don't think that this contest is about honor.
13. Re:should've "gamed" it by Anonymous Coward · 2009-07-26 10:22 · Score: 0
  
  Sportsmanship is, possibly, appropriate to .. sports. Despite the fact that the submission title has "race" in it, it is not a sport and sportsmanship is unlikely to be a concern for the submitters.
  Indeed, even in sports, there are acts that are considered "gamesmanship" rather than "sportsmanship" because they are perfectly acceptable, but harm the efforts of the opposing team.
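The rolling end-date proposed in this thread (push the deadline out a fixed period each time someone takes the lead) could look something like the sketch below. Netflix did not run the contest this way; the 7-day extension comes from the Anonymous Coward suggestion above, and the function name and all dates are hypothetical.

```python
from datetime import datetime, timedelta


def rolling_deadline(initial_deadline, lead_changes, extension=timedelta(days=7)):
    """Return the final deadline after applying anti-sniping extensions.

    lead_changes is a list of datetimes at which a new leader emerged.
    A lead change that arrives after the deadline current at that moment
    is too late and is ignored, along with everything after it.
    """
    deadline = initial_deadline
    for when in sorted(lead_changes):
        if when > deadline:
            break  # contest was already closed at this point
        deadline = max(deadline, when + extension)
    return deadline


# Made-up timeline: a snipe one day before the original deadline,
# answered a few days later while the extended window is still open.
final = rolling_deadline(
    datetime(2009, 7, 26),
    [datetime(2009, 7, 25, 12), datetime(2009, 7, 30, 9)],
)
print(final)
```

The contest only ends once a leading result has stood unbeaten for the full extension period, which is the property both posters were asking for.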
It's not Uve by thetoadwarrior · 2009-07-26 02:30 · Score: 2, Informative

Uwe Boll. It only sounds like a v because he's German.
1. Re:It's not Uve by Anonymous Coward · 2009-07-27 04:17 · Score: 0
  
  Sorry. I thought the v came from evil.
2. Re:It's not Uve by Anonymous Coward · 2009-07-28 02:27 · Score: 0
  
  No because he's a fucktard.
Ensemble learning by mysterons · 2009-07-26 02:42 · Score: 1

I'm actually surprised that this hasn't been done before. You can prove that using multiple models will on average produce better results than using any single model in isolation. For example, each netflix system will make different errors; using multiple systems will tend to average-out these errors and the consensus decision is most likely to be correct.
1. Re:Ensemble learning by Stile+65 · 2009-07-26 02:58 · Score: 5, Informative
  
  Many teams actually combined multiple methods to get a better score. In fact, "BellKor's Pragmatic Chaos" is a combination of three teams, I'm guessing - BellKor, BigChaos and Pragmatic Theory.
  Also, it helps to remember that what's posted on the leaderboard is the result of the "quiz" set - half of the actual set of recommendations you're asked to make. The other half, the "test set," is used for final judging. With such a small difference between BellKor's Pragmatic Chaos and The Ensemble on the quiz set (.0001 RMSE), the test set rank may actually end up reversed.
  
  --
  I claim first use of "Error No. 0B" - or "No. 0B error." It'll be the new ID 10T!
2. Re:Ensemble learning by Anonymous Coward · 2009-07-26 03:09 · Score: 0
  
  Most "individual" contest entries are, at this point, made up of over 100 different models. The improvement that can be gained by just "throwing another model on the pile" is very small.
3. Re:Ensemble learning by janwedekind · 2009-07-26 03:15 · Score: 1
  
  Actually it is not about averaging out. It's about building a better classifier from many good ones. See Adaboost.
4. Re:Ensemble learning by mysterons · 2009-07-26 03:28 · Score: 1
  
  Well, you really want to think about bias/variance reductions which brings ideas of averaging and using better classifiers together. For example, "bagging" can be thought of as a variance-reduction technique; "boosting" does both if I recall.
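The averaging argument in this thread can be illustrated with a toy example. It is a minimal sketch with hand-picked numbers, not anything resembling the actual contest entries (which, per the posts above, blended 100+ models and did not simply take a plain average). The two "models" are chosen so that their errors disagree on some items, which is where averaging helps.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equal-length lists."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


def blend(*prediction_lists):
    """Plain unweighted average of several models' predictions, item by item."""
    return [sum(values) / len(values) for values in zip(*prediction_lists)]


# Made-up ratings and predictions. Each model is off by 0.5 everywhere,
# but the two models err in opposite directions on the first and last item.
actual = [3.0, 4.0, 2.0, 5.0]
model_a = [3.5, 3.5, 2.5, 4.5]
model_b = [2.5, 3.5, 2.5, 5.5]
blended = blend(model_a, model_b)

print(rmse(model_a, actual), rmse(model_b, actual), rmse(blended, actual))
```

On the two items where both models make the same mistake, the blend is no better than either model, so the gain comes entirely from errors that are not shared. That is one way to read the variance-reduction view of bagging mentioned above.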
Algorithms? by wkurzius · 2009-07-26 02:45 · Score: 1

I thought Vandelay was into manufacturing latex.
1. Re:Algorithms? by frieko · 2009-07-26 03:14 · Score: 1
  
  They're thinking of quitting the exporting, and adding more import statements. And this is causing a problem, because, why not do both?
2. Re:Algorithms? by Anonymous Coward · 2009-07-26 04:41 · Score: 0
  
  Ultimately, they're waiting for Professor Von-Nostrand to weigh in with his opinion before they decide.
Any winner at all? by Fnord666 · 2009-07-26 03:42 · Score: 4, Interesting

My question is whether there will be any winner at all other than netflix? One of the rules for the competition was that you could not form multiple teams. This was to prevent people from gaining multiple submissions per day. Otherwise a five person group could create 30 teams and thus be able to submit 30 attempts per day. I believe both teams that have exceeded the 10% threshold and thus are eligible for the grand prize are composed of members from other teams and could be disqualified.

--
'The tyrant will always find pretext for his tyranny.' - Aesop's Fables
1. Re:Any winner at all? by Anonymous Coward · 2009-07-26 03:49 · Score: 0
  
  I think you've missed a key difference between people forming multiple teams with the same people and people from multiple teams consolidating into a single team.
2. Re:Any winner at all? by ceoyoyo · 2009-07-26 03:57 · Score: 3, Insightful
  
  Why would that disqualify them? They didn't form multiple teams, they did the opposite -- they started with multiple teams and then merged them into one, abandoning or deleting the old, multiple accounts.
  I suppose you could speculate that the teams weren't ever independent, but I think that's fairly obviously not the case.
3. Re:Any winner at all? by Anonymous Coward · 2009-07-26 05:17 · Score: 0
  
  This rule only applies to teams that have the exact same set of members. The rule tries to prevent teams from creating many aliases to get around the one-result-submission-per-day rule. You can imagine that particularly now is the time that teams would like to be able to submit results as frequently as possible, which is why Netflix reminded everybody of the no-aliasing rule recently.
There is more than 1 day left by Anonymous Coward · 2009-07-26 03:49 · Score: 0

Call me crazy, but if you actually *read* the rules it says the contest is going until at least October 2nd, 2001.
Netflix Prize Rules
Terms and Conditions in a Nutshell
* Contest begins October 2, 2006 and continues through at least October 2, 2011.
1. Re:There is more than 1 day left by Qubit · 2009-07-26 04:05 · Score: 2, Funny
  
  Call me crazy, but if you actually *read* the rules it says the contest is going until at least October 2nd, 2001.
  Actually, yes, I think I will call you crazy.
  
  --
  
  coding is life /* the rest is */
2. Re:There is more than 1 day left by tomhudson · 2009-07-26 04:15 · Score: 2, Funny
  
  Call me crazy,
  
  Okay, you're crazy :-)
  
  but if you actually *read* the rules it says the contest is going until at least October 2nd, 2001.
  
  So, there's approximately minus 2855 days left?
  I just want to know if netflix gets to keep John Titor's time machine ... the time frame (2001) is right ...
3. Re:There is more than 1 day left by sleeper0 · 2009-07-26 04:16 · Score: 1
  
  Competition had 30 days to submit after the qualifying submission was presented. From your link: "After three (3) months have elapsed from the start of the Contest, when the RMSE of a submitted prediction set on the quiz subset improves beyond the qualifying RMSE an electronic announcement will inform all registered Participants that they have thirty (30) days to submit additional candidate prediction sets to be considered for judging."
4. Re:There is more than 1 day left by daveime · 2009-07-26 04:20 · Score: 1
  
  (Unless someone reaches the 10% goal *before* the end date).
5. Re:There is more than 1 day left by Anonymous Coward · 2009-07-26 05:20 · Score: 0
  
  I haven't gotten any email notifying me of only 30 days left... the last email I got was (July 9th, 2009 -- and before that was Oct 02, 2008):
  A reminder for participants in the Netflix Prize contest:
  Some participants have failed to comply with the contest rules, for example
  by creating multiple teams with an identical set of members. Such participants
  and any teams to which they belong may be suspended from participation in the
  contest, become ineligible for any Contest Prize, and/or have their
  submissions rejected by the judges.
  Teams which combine the work of multiple participants or multiple teams must
  ensure that all their contributing participants are in compliance with the
  contest rules.
  The contest rules are posted at http://www.netflixprize.com/rules
  It's an exciting time for the Netflix Prize contest.
  We thank you for participating and wish you luck!
Sometimes better design beats better algorithms by davidannis · 2009-07-26 04:28 · Score: 3, Insightful

They could improve the predictive value immensely if they allowed me and my wife to each rank the movies we watch together separately. With the current system, some movies are rated by just me, some by just her, and some have a consensus rating. It leads to a dataset full of garbage.
1. Re:Sometimes better design beats better algorithms by memristance · 2009-07-26 05:52 · Score: 1
  
  This brings up an interesting point. The Netflix algorithm is working from flawed/incomplete data generated from poor design decisions, so no matter how good the algorithm gets it still won't be able to accurately predict what movies will actually interest people based on a very subjective unidimensional rating. For example, the same person might rate a movie differently under differing conditions, and the rating itself may hinge entirely on one thing in the movie (s)he did(n't) like, whereas the movie might have been overall pretty good. It's like asking someone, 'on a scale of 1 to 5, what is your favorite color?'; it has next to no relation to its supposed objective.
  
  On top of all this, people are capricious at best when it comes to movie tastes; they might not even like a movie based on its own merits, but something completely orthogonal to the question such as it being the movie they saw on their first date. As such, no set of ratings from any given user can really be accurately matched with those of another to provide suggestions, since they may have liked/hated those movies for entirely separate reasons. Granted, some of these things can't easily be transcribed into data for formulaic processing, but you'd think Netflix could at least add an optional 'detailed rating' section (e.g., rate by pace, plot, action, acting, dialogue, etc.) to better describe why a user did or didn't like a flick.
2. Re:Sometimes better design beats better algorithms by Hawke666 · 2009-07-26 07:18 · Score: 2, Insightful
  
  That'd be all your fault. You should be creating separate account profiles for yourself and your wife.
3. Re:Sometimes better design beats better algorithms by St.Creed · 2009-07-26 07:54 · Score: 1
  
  Yeah, I should totally jump through hoops to improve their ability to sell to me. Just because it would make the programmers' lives easier :)
  No, if Netflix wants to sell more, they should follow up on that recommendation and make it very very easy to have multiple identities on a given account and a button on the page to switch them.
  The reason is that there is a difference between the information needs of the administration of purchases (tied to an account in a 1:1 relationship) and the information needs of the marketing department (tied to people, who can be tied to an account in a many:1 relationship, or a 1:1, or 1:many relationship as well). If you put a one-size-fits-all discipline in there (as lots of IT-departments are unfortunately wont to do), you lose information.
  
  --
  Therefore, by the (faulty) logic you're using, you're just a cow with a keyboard - osu-neko (2604)
4. Re:Sometimes better design beats better algorithms by coaxial · 2009-07-26 08:06 · Score: 2, Insightful
  
  Data sets like this always have garbage. There's the jackass that rates everything 5 stars. There's the jackass that rates everything 1 star. There's the jackass that rates the worst movies by consensus 5 stars, and vice versa.
  There are 61,441,618 ratings by 478,548 unique users in the publicly available training set.
  It just doesn't matter.
5. Re:Sometimes better design beats better algorithms by Hawke666 · 2009-07-26 08:17 · Score: 4, Informative
  
  Yeah, they do. See "Your account", "Account profiles". And then there's a dropdown on the top of the page. I don't see how they could make it much easier.
6. Re:Sometimes better design beats better algorithms by Anonymous Coward · 2009-07-26 10:58 · Score: 0
  
  1) Movies rated by individual users of a joint account often cluster very nicely. Probably movies rated jointly work the same way. What is the experimental basis for saying that knowing who is doing the rating would "improve the predictive value immensely?"
  2) In real life (not a netflix contest) there is a lot of other information that can improve predictions substantially. Just knowing the zipcode of the subscriber is a huge advantage.
7. Re:Sometimes better design beats better algorithms by Alpha830RulZ · 2009-07-26 16:16 · Score: 1
  
  At least some of the ensemble modeling techniques handle this just fine. They will develop classifiers that detect your ratings, classifiers that detect her ratings and classifiers that detect your joint ratings. See the previous citation for AdaBoost at Wikipedia. They do this by looking at the error from a given classifier and finding additional weak classifiers that address the error. So if your wife likes Schwarzenegger movies, your liking for tear jerkers will show up as errors, and the algorithm will seek an additional classifier to select for tear jerkers. Then eventually you get True Lies in the suggestion list. ;-)
  
  --
  I was taught to respect my elders. The trouble is, it's getting harder and harder to find some.
8. Re:Sometimes better design beats better algorithms by Anonymous Coward · 2009-07-26 16:32 · Score: 0
  
  In fact, there are jackasses who rated every movie 2 stars and jackasses who rated every movie 3 stars and jackasses who rated every movie 4 stars as well. Approximately 3000 people altogether. Predicting those is easy.
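The "look at the error from one classifier, then find another weak one that addresses it" idea from Alpha830RulZ's post can be shown in miniature. To be clear about what this is: it is not AdaBoost, just a simplified two-stage residual fit in the same spirit. The genre labels and ratings are invented for the example (assume, purely for illustration, that each rating comes tagged with a genre), and the function names are hypothetical.

```python
from collections import defaultdict


def fit_mean(ratings):
    """Stage 1, a very weak model: predict the overall mean rating."""
    return sum(rating for _, rating in ratings) / len(ratings)


def fit_genre_correction(ratings, base):
    """Stage 2: average what stage 1 got wrong, separately per genre."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for genre, rating in ratings:
        totals[genre] += rating - base  # residual left over by stage 1
        counts[genre] += 1
    return {genre: totals[genre] / counts[genre] for genre in totals}


# Made-up shared-account history: one viewer loves action movies,
# the other rates the tear jerkers, and the stars land in one pile.
ratings = [("action", 5), ("action", 4), ("tearjerker", 2), ("tearjerker", 1)]

base = fit_mean(ratings)
correction = fit_genre_correction(ratings, base)


def predict(genre):
    """Combined prediction: stage 1 plus the stage 2 correction, if any."""
    return base + correction.get(genre, 0.0)


print(predict("action"), predict("tearjerker"), predict("comedy"))
```

The first stage alone predicts the same middling score for everything; the second stage exists only because the first one's errors had structure, which is the point being made about mixed household ratings.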
Obvious by Anonymous Coward · 2009-07-26 05:22 · Score: 0

Screw the code. All they have to do is stop hiding the new releases. They purposely bury recent releases so they don't have to keep up with demand. Which is total bullshit. 10% improvement of what? Getting the same garbage they offer out there even faster? *sigh*
Amazon follow suit by Anonymous Coward · 2009-07-26 06:41 · Score: 0

Amazon could really benefit from a similar contest for its laughable Gold Box picks, but increasing the accuracy of recommendations by even 5000% would be low-hanging fruit.
Photo Finish by Anonymous Coward · 2009-07-26 06:50 · Score: 0

It's a tie, check the leaderboard again...
10.1% by paxcoder · 2009-07-26 07:09 · Score: 1

According to the linked leaderboard it's 10.10% for Ensemble, and 10.09% for BellKor's Pragmatic Chaos.
Be afraid.... be very afraid... by Baldrson · 2009-07-26 08:17 · Score: 3, Interesting

It's interesting that the fearmongering of the prior /. post about AI got hundreds of responses but this /. post, which is far more relevant to real AI, has gotten fewer than a hundred responses thus far. Anyway, congratulations to Netflix for doing the right thing for their business in response to The Hutter Prize.

--
Seastead this.
Mod parent +5 Informative, BellKor et al have won by Anonymous Coward · 2009-07-26 12:11 · Score: 0

Yehuda claims to have the best test error rate. They will win the million dollars. It was super-exciting: it seemed BellKor et al. would be defeated thanks to a little-known rule of the competition (the 30-day last call rule). But they have won after all, thanks to a lesser-known rule (the quiz dataset vs. test dataset distinction).
EPIC by idontneedanickname · 2009-07-26 12:54 · Score: 1

I think this is bringing us one step closer to EPIC (video).
Stop Women's Suffrage by Anonymous Coward · 2009-07-26 15:36 · Score: 0

Great point, stop the suffering!
Creepy by TranscenDev · 2009-07-27 04:49 · Score: 1

While I would appreciate some good movie recommendations, I can't help but feel a little creeped out that netflix may be able to read my mind one day....maybe I can make up a movie in my imagination and netflix can play it for me! ~Ami
Who's Better? by Anonymous Coward · 2009-07-28 19:38 · Score: 0

Since the test set was a lottery draw wherein any team could have come out on top, isn't it a little awkward for BPC to justify why their results would be better, since the other team (Ensemble) is visibly better than them on the leaderboard?