Slashdot Mirror


Augmenting Data Beats Better Algorithms

eldavojohn writes "A teacher is offering empirical evidence that when you're mining data, augmenting data is better than a better algorithm. He explains that he had teams in his class enter the Netflix challenge, and two teams went two different ways. One team used a better algorithm while the other harvested augmenting data on movies from the Internet Movie Database. And this team, which used a simpler algorithm, did much better — nearly as well as the best algorithm on the boards for the $1 million challenge. The teacher relates this back to Google's page ranking algorithm and presents a pretty convincing argument. What do you think? Will more data usually perform better than a better algorithm?"

1 of 179 comments

  1. duh by kris.montpetit · Score: 0, Flamebait

    From a designer-with-rudimentary-programming-abilities point of view, logic would dictate that having more data would always make sorting easier. I mean, digging through 10,000 results based on one piece of information will always be slower than if we have two more sets of data that bring the search pool down to 10. A clever 10-year-old could tell you that. Can't this be chalked up to common sense?

    The only way the better algorithm would win out is if the extra data is moot and can't shrink the search pool by any meaningful amount, e.g.:

    USEFUL AUGMENTED DATA - George W. Bush:

    • warmonger
    • president
    • started war in Iraq
    • kicked out of Air Force for coke addiction
    • initials G, W, and B
    USELESS AUGMENTED DATA - George W. Bush:
    • wears shoes
    • doesn't often wear hats
    • likes turkey
    But once again, I'm sure everybody knows that.
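
The point about useful versus useless extra data can be sketched in a few lines of Python. This is only an illustration under made-up assumptions: the records, the attribute names, and the `narrow` helper are all hypothetical and come from neither the Netflix challenge nor IMDb. It shows that an attribute which separates candidates shrinks the search pool, while one that is true of everybody leaves it the same size.

```python
# Hypothetical records: every name and attribute here is invented for illustration.
people = [
    {"name": "A", "job": "president", "initials": "GWB", "wears_shoes": True},
    {"name": "B", "job": "president", "initials": "WJC", "wears_shoes": True},
    {"name": "C", "job": "designer", "initials": "GWB", "wears_shoes": True},
    {"name": "D", "job": "teacher", "initials": "KM", "wears_shoes": True},
]


def narrow(pool, **known):
    """Keep only the records that match every known attribute."""
    return [p for p in pool if all(p.get(k) == v for k, v in known.items())]


# One piece of information leaves several candidates.
one_fact = narrow(people, job="president")

# A second, discriminating attribute shrinks the pool further.
two_facts = narrow(people, job="president", initials="GWB")

# An attribute shared by everyone does not shrink the pool at all.
useless = narrow(people, wears_shoes=True)

print(len(people), len(one_fact), len(two_facts), len(useless))
```

With these four made-up records, the pool goes from 4 to 2 to 1 as discriminating attributes are added, and stays at 4 when the only known attribute is one that everybody shares.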