Augmenting Data Beats Better Algorithms

Posted by kdawson on Tuesday April 1, 2008 @07:10AM from the tell-it-to-the-dhs dept.

eldavojohn writes "A teacher is offering empirical evidence that when you're mining data, augmenting data is better than a better algorithm. He explains that he had teams in his class enter the Netflix challenge, and two teams went two different ways. One team used a better algorithm while the other harvested augmenting data on movies from the Internet Movie Database. And this team, which used a simpler algorithm, did much better — nearly as well as the best algorithm on the boards for the $1 million challenge. The teacher relates this back to Google's page ranking algorithm and presents a pretty convincing argument. What do you think? Will more data usually perform better than a better algorithm?"

1 of 179 comments (clear)

Min score:

Reason:

Sort:

Re:Depends on the Problem by blahplusplus · 2008-04-01 08:23 · Score: 3, Interesting

"I worked for a while on the Netflix prize, and if there's one thing I learned it's that a recommender system almost always gets better the more data you put into it, ...."

Ironically enough, you'd think they'd adopt the wikipedia model where their customers can simply vote thumbs up vs thumbs down to a small list of recomendations everytime they visit their site.

All this convenience comes at a cost though, you're basically giving people insight into your personality and who you are and I'm sure many "Recommendation engines" easily double as demographic data for advertisers and other companies.