Slashdot Mirror


Augmenting Data Beats Better Algorithms

eldavojohn writes "A teacher is offering empirical evidence that, when you're mining data, augmenting the data works better than using a better algorithm. He explains that he had teams in his class enter the Netflix challenge, and two teams took different approaches. One team used a better algorithm, while the other harvested supplementary data on movies from the Internet Movie Database. The second team, which used a simpler algorithm, did much better, performing nearly as well as the best algorithm on the leaderboard for the $1 million challenge. The teacher relates this back to Google's page-ranking algorithm and presents a pretty convincing argument. What do you think? Will more data usually perform better than a better algorithm?"
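To make the summary's contrast concrete, here is a minimal, hypothetical Python sketch. It is not either student team's actual method: the toy ratings, the genre table (standing in for data harvested from a site such as IMDb), and the function names are all illustrative assumptions. The predictor stays deliberately simple (an average); the only change in the second version is that it uses an external genre tag to decide which of the user's past ratings to average.

```python
from statistics import mean

# Hypothetical toy ratings: (user, movie) -> stars on a 1-5 scale.
RATINGS = {
    ("ann", "Alien"): 5, ("ann", "Aliens"): 5, ("ann", "Amelie"): 2,
    ("bob", "Alien"): 2, ("bob", "Amelie"): 5, ("bob", "Chocolat"): 4,
}

# Hypothetical external metadata, standing in for data harvested from
# an outside source. None of this comes from the original post.
GENRE = {
    "Alien": "sci-fi", "Aliens": "sci-fi", "Predator": "sci-fi",
    "Amelie": "romance", "Chocolat": "romance",
}


def user_mean(ratings, user):
    """Average of every rating this user has given."""
    return mean(r for (u, _), r in ratings.items() if u == user)


def predict_baseline(ratings, user, movie):
    """Simple algorithm, rating data only: ignore the movie, return the user's mean.

    `movie` is accepted only so both predictors share one signature.
    """
    return user_mean(ratings, user)


def predict_augmented(ratings, genre, user, movie):
    """Same simple averaging, but only over the user's ratings of movies that
    share the target movie's external genre tag. Falls back to the user's
    overall mean when the movie is untagged or the user has no ratings in
    that genre."""
    g = genre.get(movie)
    same_genre = [r for (u, m), r in ratings.items()
                  if u == user and genre.get(m) == g]
    if g is not None and same_genre:
        return mean(same_genre)
    return user_mean(ratings, user)
```

For an unseen sci-fi film such as "Predator", the baseline gives both users roughly the same middling guess, while the augmented version separates them (ann's sci-fi ratings are high, bob's are low) without any change to the averaging itself. That is the sense in which extra data, rather than a cleverer model, does the work here.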

7 of 179 comments

  1. Five stars by CopaceticOpus · · Score: 5, Insightful

    If more data is helpful, then Netflix is really hurting itself with its 5-star rating system. I'd only give 5 stars to a really amazing movie, but giving only 3/5 stars to a movie I enjoyed feels too low. Many movies that range from 7/10 to 9/10 get lumped into the 4-star category, and the nuances of the data are lost.

    How to translate the entire experience of watching a movie into a lone number is a separate issue.

  2. Re:attn computer scientists: stop renaming stuff by Anonymous Coward · · Score: 5, Funny

    you guys are nothing more than glorified engineers.

    Computer scientists are not glorified engineers. They're the butt of engineers' jokes too.

  3. Re:Is it just me that is surprised here? by gnick · · Score: 5, Informative

    The Netflix challenge is to arrive at a better algorithm with the supplied data.

    Actually, the rules explicitly allow supplementing the data set, and Netflix points out that they explore external data sets as well.
    --
    He's getting rather old, but he's a good mouse.
  4. Re:Heuristics?? by EvanED · · Score: 5, Informative

    One would hope that the thing that calculates the heuristic is an algorithm. See Wikipedia.

  5. Re:attn computer scientists: stop renaming stuff by Freeside1 · · Score: 5, Funny

    Say what you want about computer scientists, but without them you'd probably be complaining on a chalkboard.

  6. Re:attn computer scientists: stop renaming stuff by JasonKChapman · · Score: 5, Funny

    Mathematics is physics without purpose, Chemistry is physics without thought, Engineering is physics without tenure.

    --
    Sorry, I'm a writer. That makes you raw material.
  7. Re:attn computer scientists: stop renaming stuff by Arthur+B. · · Score: 5, Funny

    "machine learning" is just statistical inference

    Riiiight. And mathematical research is just finding a Hamiltonian cycle in a graph defined by the set of axioms used.
    --
    ☭ = 卐