Slashdot Mirror


Collaborative Filtering and the Rise of Ensembles

igrigorik writes "First the Netflix challenge was won with the help of ensemble techniques, and now the GitHub challenge is over, and more than half of the top entries are also based on ensembles. Good knowledge of statistics, psychology and algorithms is still crucial, but the ensemble technique alone has the potential to make the collaborative filtering space a lot more, well, collaborative! Here's a look at the basic theory behind ensembles, how they shaped the results of the GitHub challenge, and how this pattern can be used in the future."

14 of 58 comments (clear)

  1. Open source governance by Anonymous Coward · · Score: 2, Interesting

    So can ensembles be used to create more sophisticated forms of direct democracy? That is, where everyone has input into decision-making, but where that input is vastly more complex than simple majority rule by the mob?

    FWIW, this is called open source governance.

    1. Re:Open source governance by sakdoctor · · Score: 2, Insightful

      "better" would require a fitness function; and everyone thinks that they vote "best".

      If it were possible to define such a perfect function, we wouldn't really need voting anyway. We could just get a computer to crunch all the parameters and spit out a utopia.

    2. Re:Open source governance by kthejoker · · Score: 3, Informative

      "Fitness" doesn't always have to be related to the output; it can be related to the quality of a guessed input.

      Consider the corollary of a poll test: a model in which "trusted" voters receive extra votes while everyone else still gets on vote. You can determine "trustworthiness" (or "karma", if you will) the same way Slashdot does - through moderation and meta-moderation, or you can use a more objective "de minimis research" criteria (like a poll test but without the punishment for failure.)

      So someone voting on a school board bond election who can correctly answer questions about the stated usage of that bond, or the school district's financial bond rating, or who attends a school board meeting discussing the bond, could get 2 votes for the price of one.

      This would a) allow "passionate" (albeit informed) voters to have more of a say than someone who is indifferent, and b) encourage people to do research and get involved in politics.

      In a way, it's anti-democratic, but if you are going to insert any sort of elitism into the system, it might as well be a meritocracy.

    3. Re:Open source governance by Korin43 · · Score: 2, Interesting

      Instead of trying to come up with a way to decide who gets extra votes, why not just go with transferable votes, where you can either vote on issues or tranfer your vote to another person (reversible of course). That way if I know that someone else is more informed than me, I can transfer my vote to them and not bother with with the details of politics. Each representative then has their vote, plus all of their constituents votes.

      It deals with the problems of underrepresentation (right now if 51% of people voted for one person, and the other 49% voted for another, the 49 group is completely unrepresented), lack of knowledge among voters (most people know nothing about politics, but at least know someone who knows more than they do), and a lot of government incompetence problems (if someone in "your party" does something you don't like, it's not "the Republican" vs "the Democrat", you can just transfer your vote to another person in your party.. although hopefully this would just destroy political parties instead, since they're not helpful).

    4. Re:Open source governance by DragonWriter · · Score: 2, Insightful

      So someone voting on a school board bond election who can correctly answer questions about the stated usage of that bond, or the school district's financial bond rating, or who attends a school board meeting discussing the bond, could get 2 votes for the price of one.

      This would a) allow "passionate" (albeit informed) voters to have more of a say than someone who is indifferent, and b) encourage people to do research and get involved in politics.

      Giving everyone one vote already does that, without the ability to game-in discriminatory effects through poll tests (the ability to game-in discriminatory effects comes when you give people the right to write the questions and determine what the "correct" answers are, which is essential to having such quizzes); the more passionate are more likely to participate in politics (whether by voting or otherwise), and the more informed are more likely to realize their goals through whatever action they take, so both passion and information are already rewarded.

      In a way, it's anti-democratic, but if you are going to insert any sort of elitism into the system, it might as well be a meritocracy.

      "Merit" is subjective; all elitisms view are justified (esp. by the chosen "elite") as some kind of meritocracy.

  2. Group Labor by r6_jason · · Score: 2, Informative

    Of course having a group of people working together is a strength. If you are having a bad day or just feel like slacking off some one else is there to pick up the slack and keep the project moving. See also Division of labour http://en.wikipedia.org/wiki/Division_of_labour

    1. Re:Group Labor by Trepidity · · Score: 5, Interesting

      Although that's true with humans, it's a bit curious why it'd be true with algorithms. After all, the aggregation of 3 algorithms is still just an algorithm. It's not even totally clear which algorithms are ensembles and which aren't--- some non-ensemble methods could be re-analyzed using ensemble terminology, and some ensemble methods could be rewritten as unified iterative loops that don't look very ensemble-y. The jury's still out on the whole subject, as far as I can tell (I'm not an ML person, but I'm an AI person whose research bleeds into ML).

      An exception is when you're aggregating information from truly different statistical problems, in which case you inherently have an ensemble problem, until someone comes up with the theory (plus tractable implementation) to view the problem as one unified statistical problem. I think collaborative filtering is currently in that stage--- there's no canonical way to pose the problem in the terminology of statistical regression/etc. that captures all aspects of it.

  3. a few different things going on by Trepidity · · Score: 4, Interesting

    There's a lot of argument over why ensemble techniques work well in general, when using them on well-posed statistical problems. But in the collaborative filtering case, they work well at least in part because there's not a canonical way of posing the problem statistically that's also tractable--- there are instead multiple ways to view the problem, which expose different information. Aggregating those views is a pretty straightforward way of getting more information.

    For example, you can see the Netflix prize as a few different standard statistical problems. As a per-movie regression, predicting what Person A will rate Movie B, given ratings vector of Person A and the ratings vectors of everyone who's already rated Movie B [the per-person ratings vectors excluding B are the X's, and the ratings on B are the Y's]. Or you slice the movie-ratings matrix the other way, with per-movie ratings vectors as the X's. Add in some other views (those are the two most straightforward), aggregate all the info you get from them, and you do better than any one approach alone.

    1. Re:a few different things going on by mbkennel · · Score: 4, Interesting

      "There's a lot of argument over why ensemble techniques work well in general, when using them on well-posed statistical problems."

      My opinion:

      1) It's a form of regularization & noise averaging. Different classes of estimators have different systematic errors and proper averaging almost always performs better than any single estimator. In more limited contexts, e.g. parametric estimators with variable numbers of free parameters (small compared to # of data points) this is well established in Bayesian contexts. It's like regularization in the sense that the averaging will exclude the howlers---occasional idiosyncratic screw-ups from any single estimator, phenomena that tend to happen with under-regularized estimators.

      2) "But in the collaborative filtering case, they work well at least in part because there's not a canonical way of posing the problem statistically that's also tractable--- there are instead multiple ways to view the problem, which expose different information. Aggregating those views is a pretty straightforward way of getting more information."

      If the thing to be predicted also includes the unknown parameter of "exactly how are the judges going to define the performance metric", then similarly averaging over different possibilities is a good risk-minimization strategy.

      In practice (knowing how people watch movies and what netflix cares about)the statistical setting of Netflix seems to be this:

      There is an unknown distribution of draw tuples of (movies, people, ratings). (in practice, "date-of-rating" and"date-of-movie" turn out to be additional useful data).

      You have observed a large number of these tuples already.

      Then, given a number of draws for (movies, person, ratings) where person is fixed, predict the the rating given (new_movie, same_person).

      The asymmetry is natural because movies and people are not interchangable: movies do not have opinions about people.

      I don't consider Netflix to be very ill-posed statistically.

    2. Re:a few different things going on by NoOneInParticular · · Score: 4, Interesting

      There's a lot of argument over why ensemble techniques work well in general, when using them on well-posed statistical problems.

      Not sure if there's a lot of argument, as the bias-variance decomposition does seem to give some critical insight in why ensemble techniques work. Couldn't find a good link, but loosely every statistical prediction method will make errors due to bias (being systematicaly wrong, always in the same direction), and errors due to variance (being highly dependent on the fitness data, overfitted). Error methods such as sum-of-squares can be readily decomposed into a term for bias and a term for variance.

      Some methods will have errors mainly due to bias (linear methods). Others have error mainly due to variance (neural nets for instance). Whichever method has the lowest sum of the errors - bias + variance - wins. This holds for any prediction method, so also for an ensemble method. If you take for instance the ensemble method of bagging, you will use the same method on differently sampled data (Bootstrap AGGregated). This has as an effect that you average out all the error due to variance, and are left with the error due to bias. If you happen to use a low bias method (such as a neural network or a decision tree), this works fine.

      For the recommender challenges, something even niftier is happening. Here many different methods are aggregated, each with different biases and different variances. So, in theory, when doing this, one should be able to average out both bias and variance error, and converge on the (Bayesian) optimal predictor. Note however that averaging is in itself a prediction method, so ensemble methods have their own bias and variance tradeoff. Fortunately, computing the mean is an unbiased and low variance method in its own right.

      What I always found very interesting about ensemble methods is that they effectively contradict Occam's razor, in that the end result of an ensemble is not a single theory that predicts the data well, but a whole set of mutually contradicting theories that each hold some of the truth. The ensemble result might actually be huge, even when the system is simple.

  4. Weighting of ensembles by reginaldo · · Score: 2, Funny

    One of the difficulties of ensemble development is weighting the logic that is being develeped. For instance, one of the problems we deal with at my job is matching incoming text to it's cleaned value. We have a list of approved words ['happy', 'sad', 'angry', 'sleepy'], and a text input of 'hap'. We need to determine which valid word 'hap' should match. Some rules I can think of for properly matching are:

    1.)Length of input compared to cleaned word.
    2.)Number of nonpositional letter matches.
    3.)Number of positional letter matches.

    Depending on how rules are weighted determines what the answer will be (either sad or happy). I know at my job this weighting process requires very careful politicking. :D

  5. Re:Management Summary ? by Trepidity · · Score: 2, Informative

    Here's a stab at 3 sentences:

    Making a prediction by running multiple statistical prediction algorithms and combining their results often seems to work well. This is called an "ensemble method". Ensemble methods seem to work particularly well on collaborative filtering problems.

  6. Re:I'm sorry by Trepidity · · Score: 4, Informative

    Yeah, the term dates back at least to the 1990s. The classic survey paper (over 1000 citations!) on the subject is "Ensemble Methods in Machine Learning" [pdf] by Tom Dietterich (2000), for those who want to glance through a survey. Though be warned that some of its specific conclusions are now dated--- e.g. there's been a *lot* written in both statistics and machine learning since then on what boosting "really" is and why it works.

    Dietterich presents the more machine-learning view of it, focused on algorithms, combination of predictions, iterative refinement, etc. The best survey from a statistical approach is probably Ch. 16 of this book by three Stanford profs, which you can probably read some of on Google Books.

  7. Re:Management Summary ? by mbkennel · · Score: 2, Insightful

    Algorithms produce better results working in committees, unlike you.