Slashdot Mirror


New Leader In Netflix Prize Race With One Day To Go

brajesh writes "The Netflix Prize, an algorithm competition to improve the Netflix Cinematch recommendation system by more than 10%, has a new leader, The Ensemble, just one day before the competition ends. The 30-day race to the end was kicked off after BellKor's Pragmatic Chaos submitted the first entry to break the 10% barrier, with the results showing a 10.08% improvement. The Ensemble, made up of three teams who chose to join forces ('Grand Prize Team,' 'Opera Solutions' and 'Vandelay United'), has managed to overtake BellKor with a score of 10.09%, an improvement of .01% over the former leaders. From the article on Techcrunch: 'The competition will end [today], so teams still have a little bit of time left to make their last-second submissions, but things are looking good for The Ensemble. This has to be absolutely brutal for team BellKor.'"

25 of 87 comments

  1. I think by sys.stdout.write · · Score: 5, Insightful

    that other websites should do this as well.

    Slashdot, for instance, could have a contest to unbreak their fucking code by 10%.

    1. Re:I think by Anonymous Coward · · Score: 3, Funny

      Are you joking? Slash is written in Perl; the best maintenance method is to start again.

      (Joking, partly).

    2. Re:I think by Vectronic · · Score: 3, Insightful

      (-1 Offtopic) But I've sort of hoped that a site such as Slashdot would somehow open-source its site code. It's a sort of "community", and considering the context of the site and the number of users, there are probably about 5,000 people capable of contributing decent code/help, and a rather significant number of those must be willing to.

      Add a section devoted to it, then Polls about which contribution should be implemented, etc. Articles/Submissions are sort of (controlled) "open-source", so why not the site itself?

    3. Re:I think by Blue+Stone · · Score: 5, Funny

      >Slashdot, for instance, could have a contest to unbreak their fucking code by 10%.

      I remember playing Call of Cthulhu many years ago and being told of the hideously deranging results of mere mortals who happened to gaze upon the unspeakable things that lurked in the dark places.

      I beg you not to lead others down your insane and twisting path.

      NO GOOD CAN COME OF IT! NO GOOOD!

      --
      Corporation, n. An ingenious device for obtaining individual profit without individual responsibility. - Ambrose Bierce
  2. Uve Boll by Afforess · · Score: 2, Funny

    What did they do, make sure that all of Uve Boll's movies never came up as a "Recommended for you" movie?

    --
    If our elected representatives no longer represent us, do we still live in a Democracy?
  3. I used to be very elitist about my reading by BadAnalogyGuy · · Score: 2, Interesting

    Back when I first began using Amazon.com, I never bought a book based on the recommended items. I felt the recommendations were trite, ill-advised, and typically only peripherally related to the item I was buying.

    Then the recommendations got better. Much better. I started to find myself buying things right out of the recommended section, and the product combination deals also became very tempting.

    If Netflix can turn their recommendation engine into something similar, they will be sitting on a goldmine. As they say, people hate being sold to but they love buying.

  4. should've "gamed" it by petes_PoV · · Score: 4, Interesting

    Rather than declaring their best result early, the BellKor team should have employed a bit of strategy and only declared a lesser result (if any). That would give the other teams something to aim at, without giving away their best results. These would be held back right up until the last minute and then submitted, so that other teams would not have time to make any further improvements (in fact, maybe this IS what they're doing). It's been a successful bidding strategy on eBay for years, so why wouldn't it translate into other competitive areas too?

    --
    politicians are like babies' nappies: they should both be changed regularly and for the same reasons
    1. Re:should've "gamed" it by stuckinarut · · Score: 5, Insightful

      Who's to say they haven't? People smart enough to win this competition are probably smart enough to think of this.

    2. Re:should've "gamed" it by Manip · · Score: 4, Insightful

      This isn't eBay, they can't just magic high scores.

      If you game it or otherwise, everyone will end up submitting their max score, because, well... Why wouldn't they? Who cares if the other team knows you have 10.08%... Either they can beat it and will submit that score, or they cannot and won't.

    3. Re:should've "gamed" it by Sancho · · Score: 2, Insightful

      I don't think that this contest is about honor.

  5. It's not Uve by thetoadwarrior · · Score: 2, Informative

    Uwe Boll. It only sounds like a v because he's German.

  6. Re:Why now? by Anonymous Coward · · Score: 3, Insightful

    It does seem like a slight flaw in the rules if there is only one 30-day countdown timer. That is, if a competing team can hold off until the last moment to release their version that bests the current leader, as is the case here. Now that this improvement has been made public, there should be something like a 10-day response time for the other competing teams.

  7. Re:Why now? by caffeinemessiah · · Score: 4, Interesting

    Why not wait another day before submitting the improvement? All they did now was give the other team one day to respond, and if they succeed, I doubt they will be able to submit yet another improvement. So why not simply wait until an hour or so before the deadline, or am I missing something about the rules, e.g. any submitted improvements prolong the deadline by one day?

    For the grand prize, there was a final 30-day countdown from the time the first entry that achieved greater than 10% was received, which was a month ago. So it seems like this will indeed come down to an eBay-like sniping situation in the last few hours.

    I wouldn't feel too sorry for BellKor/KorBell though -- they've got many, many best paper awards at conferences and a huge degree of publicity out of the whole endeavor. In fact, in KDD 2009, they detailed most of the methods that most likely got them to the top -- i.e. they incorporated the fact that tastes and preferences drift over time. Simple, in retrospect of course. If you have an ACM subscription, you can read the 2009 paper here.
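A minimal sketch of what "preferences drift over time" can look like in a rating baseline: the user's offset from the global mean is allowed to move with the date of the rating. This is only an illustration, not BellKor's actual model; the `UserBias` fields, the function names, and the default `beta` are all assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass
class UserBias:
    """Hypothetical per-user parameters for a drifting rating bias."""
    base: float      # long-term offset from the global mean rating
    slope: float     # how strongly the offset drifts over time
    mean_day: float  # mean date (in days) of this user's ratings


def deviation(day: float, mean_day: float, beta: float = 0.4) -> float:
    """Signed, dampened distance of `day` from the user's mean rating day.

    `beta` < 1 dampens large gaps; the default value is an assumption.
    """
    diff = day - mean_day
    sign = 1.0 if diff > 0 else -1.0 if diff < 0 else 0.0
    return sign * abs(diff) ** beta


def baseline(global_mean: float, item_bias: float, user: UserBias, day: float) -> float:
    """Global mean + item bias + time-dependent user bias."""
    drift = user.slope * deviation(day, user.mean_day)
    return global_mean + item_bias + user.base + drift
```

With `slope` set to zero this collapses to the usual static baseline, so the drift term can be judged purely by whether it lowers the error on held-out ratings.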

    Plus, since they work for AT&T/Yahoo Research, I remember Yehuda Koren stating that the money wouldn't have gone to them anyway -- possibly a large bonus, but I think they're entitled to that anyway. So I wouldn't feel too sorry for them.

    --
    An old-timer with old-timey ideas.
  8. Re:Ensemble learning by Stile+65 · · Score: 5, Informative

    Many teams actually combined multiple methods to get a better score. In fact, "BellKor's Pragmatic Chaos" is a combination of three teams, I'm guessing - BellKor, BigChaos and Pragmatic Theory.
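As a rough illustration of why combining methods helps, the sketch below averages two synthetic predictors whose errors are independent. The data is randomly generated and the noise level is an arbitrary assumption; it says nothing about how any real team blended its models.

```python
import random
from math import sqrt


def rmse(predictions, truth):
    """Root mean squared error between two equal-length sequences."""
    return sqrt(sum((p - t) ** 2 for p, t in zip(predictions, truth)) / len(truth))


def blend(pred_a, pred_b, weight_a=0.5):
    """Weighted average of two prediction lists."""
    return [weight_a * a + (1.0 - weight_a) * b for a, b in zip(pred_a, pred_b)]


rng = random.Random(0)  # fixed seed so the run is repeatable
truth = [rng.uniform(1.0, 5.0) for _ in range(10_000)]

# Two hypothetical models: same accuracy, independent mistakes.
model_a = [t + rng.gauss(0.0, 0.9) for t in truth]
model_b = [t + rng.gauss(0.0, 0.9) for t in truth]
blended = blend(model_a, model_b)

rmse_a = rmse(model_a, truth)
rmse_b = rmse(model_b, truth)
rmse_blend = rmse(blended, truth)
print(f"A: {rmse_a:.4f}  B: {rmse_b:.4f}  blend: {rmse_blend:.4f}")
```

The gain comes from the errors being uncorrelated. Two models that make the same mistakes gain nothing from averaging, which is one reason merged teams with different approaches can beat either parent team.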

    Also, it helps to remember that what's posted on the leaderboard is the result of the "quiz" set - half of the actual set of recommendations you're asked to make. The other half, the "test set," is used for final judging. With such a small difference between BellKor's Pragmatic Chaos and The Ensemble on the quiz set (.0001 RMSE), the test set rank may actually end up reversed.
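To see how a .0001 RMSE gap lines up with the .01% gap in the summary, here is the arithmetic as a small sketch. The baseline RMSE and both team scores are round, made-up numbers chosen only for illustration, not official figures.

```python
def improvement_percent(baseline_rmse, rmse):
    """Percentage reduction in RMSE relative to the baseline system."""
    return 100.0 * (baseline_rmse - rmse) / baseline_rmse


ASSUMED_BASELINE_RMSE = 0.95  # illustrative value, not the official baseline score

team_a_rmse = 0.8550                # hypothetical leaderboard score
team_b_rmse = team_a_rmse - 0.0001  # the same score, better by .0001 RMSE

gap = (improvement_percent(ASSUMED_BASELINE_RMSE, team_b_rmse)
       - improvement_percent(ASSUMED_BASELINE_RMSE, team_a_rmse))
print(f"gap in improvement: {gap:.4f} percentage points")
```

With a baseline anywhere near 0.95, a .0001 RMSE difference is about a hundredth of a percentage point, which is small enough that a different half of the data could plausibly flip the order.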

    --
    I claim first use of "Error No. 0B" - or "No. 0B error." It'll be the new ID 10T!
  9. Any winner at all? by Fnord666 · · Score: 4, Interesting

    My question is whether there will be any winner at all other than Netflix. One of the rules for the competition was that you could not form multiple teams. This was to prevent people from gaining multiple submissions per day. Otherwise a five-person group could create 30 teams and thus be able to submit 30 attempts per day. I believe both teams that have exceeded the 10% threshold, and thus are eligible for the grand prize, are composed of members from other teams and could be disqualified.

    --
    'The tyrant will always find pretext for his tyranny.' - Aesop's Fables
    1. Re:Any winner at all? by ceoyoyo · · Score: 3, Insightful

      Why would that disqualify them? They didn't form multiple teams, they did the opposite -- they started with multiple teams and then merged them into one, abandoning or deleting the old, multiple accounts.

      I suppose you could speculate that the teams weren't ever independent, but I think that's fairly obviously not the case.

  10. Re:There is more than 1 day left by Qubit · · Score: 2, Funny

    Call me crazy, but if you actually *read* the rules it says the contest is going until at least October 2nd, 2001.

    Actually, yes, I think I will call you crazy.

    --

    coding is life /* the rest is */
  11. Re:There is more than 1 day left by tomhudson · · Score: 2, Funny

    Call me crazy,

    Okay, you're crazy :-)

    but if you actually *read* the rules it says the contest is going until at least October 2nd, 2001.

    So, there's approximately minus 2855 days left?

    I just want to know if netflix gets to keep John Titor's time machine ... the time frame (2001) is right ...

  12. Sometimes better design beats better algorithms by davidannis · · Score: 3, Insightful

    They could improve the predictive value immensely if they allowed me and my wife to each rank the movies we watch together separately. With the current system, some movies are rated by just me, some by just her, and some have a consensus rating. It leads to a dataset full of garbage.

    1. Re:Sometimes better design beats better algorithms by Hawke666 · · Score: 2, Insightful

      That'd be all your fault. You should be creating separate account profiles for yourself and your wife.

    2. Re:Sometimes better design beats better algorithms by coaxial · · Score: 2, Insightful

      Data sets like this always have garbage. There's the jackass that rates everything 5 stars. There's the jackass that rates everything 1 star. There's the jackass that rates the worst movies by consensus 5 stars, and vice versa.

      There are 61,441,618 ratings by 478,548 unique users in the publicly available training set.

      It just doesn't matter.

    3. Re:Sometimes better design beats better algorithms by Hawke666 · · Score: 4, Informative

      Yeah, they do. See "Your account", "Account profiles". And then there's a dropdown on the top of the page. I don't see how they could make it much easier.

  13. Re:Why now? by brian_tanner · · Score: 5, Informative

    It's also true that the winner is not the person who gets the highest score on the leaderboard. Most people seem to miss this.

    The leaderboard gives the score on the QUIZ dataset, which is half of the answers that the team submits. The WINNER of the million dollars is the team that does best on the TEST dataset, the other half of the answers they submit. Nobody knows how well these teams are doing on the TEST set; either team could be overfitting the quiz set.
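A toy sketch of the quiz/test split described above: one submission is scored twice, once on each hidden half. The four-rating data set and both teams' predictions are invented so the arithmetic is easy to check by hand; `quiz_and_test_rmse` is a hypothetical helper, not anything from the contest.

```python
from math import sqrt


def rmse(predictions, truth):
    """Root mean squared error between two equal-length sequences."""
    return sqrt(sum((p - t) ** 2 for p, t in zip(predictions, truth)) / len(truth))


def quiz_and_test_rmse(predictions, truth, is_quiz):
    """Score one submission separately on the quiz half and the test half."""
    quiz = [(p, t) for p, t, q in zip(predictions, truth, is_quiz) if q]
    test = [(p, t) for p, t, q in zip(predictions, truth, is_quiz) if not q]
    quiz_rmse = rmse([p for p, _ in quiz], [t for _, t in quiz])
    test_rmse = rmse([p for p, _ in test], [t for _, t in test])
    return quiz_rmse, test_rmse


truth = [3.0, 3.0, 3.0, 3.0]
is_quiz = [True, True, False, False]  # hidden from the teams in the real contest

team_x = [3.0, 3.0, 4.0, 4.0]  # tuned to the quiz half, poor on the test half
team_y = [3.5, 3.5, 3.5, 3.5]  # equally mediocre on both halves

print("team_x (quiz, test):", quiz_and_test_rmse(team_x, truth, is_quiz))
print("team_y (quiz, test):", quiz_and_test_rmse(team_y, truth, is_quiz))
```

Here team_x tops the public (quiz) ranking with an RMSE of 0.0 against 0.5, yet loses on the test half with 1.0 against 0.5. A team that tunes against leaderboard feedback risks exactly this kind of reversal.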

  14. Be afraid.... be very afraid... by Baldrson · · Score: 3, Interesting

    It's interesting that the fearmongering of the prior /. post about AI got hundreds of responses but this /. post, which is far more relevant to real AI, has gotten fewer than a hundred responses thus far. Anyway, congratulations to Netflix for doing the right thing for their business in response to The Hutter Prize.

  15. Re:Why now? by currivan · · Score: 2, Informative

    In fact, according to the second post by Yehuda Koren in this thread, it looks like BellKor does have the best test error rate and will be declared the winner. http://www.netflixprize.com/community/viewtopic.php?id=1498