Augmenting Data Beats Better Algorithms
eldavojohn writes "A teacher is offering empirical evidence that when you're mining data, augmenting data is better than a better algorithm. He explains that he had teams in his class enter the Netflix challenge, and two teams went two different ways. One team used a better algorithm while the other harvested augmenting data on movies from the Internet Movie Database. And this team, which used a simpler algorithm, did much better — nearly as well as the best algorithm on the boards for the $1 million challenge. The teacher relates this back to Google's page ranking algorithm and presents a pretty convincing argument. What do you think? Will more data usually perform better than a better algorithm?"
If more data is helpful, then Netflix is really hurting themselves with their 5-star rating system. I'd only give 5 stars to a really amazing movie, but to only give 3/5 stars to a movie I enjoyed feels too low. Many movies that range from a 7/10 to a 9/10 get lumped into that 4 star category, and the nuances of the data are lost.
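To make the lumping concrete, here is a rough sketch of how a 10-point opinion collapses onto 5 stars. The bin edges are my own assumption (chosen so that only a 10/10 earns 5 stars, as described above), and `to_stars` is a made-up helper, not anything Netflix actually uses.

```python
from bisect import bisect_right

# Hypothetical bin edges: the lowest 10-point score that earns 2, 3, 4 and 5 stars.
# These are an assumption for illustration, not Netflix's real mapping.
STAR_THRESHOLDS = [3, 5, 7, 10]


def to_stars(score_out_of_ten: int) -> int:
    """Map a 1-10 score onto a 1-5 star rating using the bin edges above."""
    if not 1 <= score_out_of_ten <= 10:
        raise ValueError("score must be between 1 and 10")
    # bisect_right counts how many thresholds the score has reached.
    return 1 + bisect_right(STAR_THRESHOLDS, score_out_of_ten)


scores = [7, 8, 9]
stars = [to_stars(s) for s in scores]

# Three distinct opinions go in, one distinct rating comes out.
print(scores, "->", stars)
print(len(set(scores)), "distinct scores become", len(set(stars)), "distinct rating")
```

Under those assumed bins, a 7, an 8 and a 9 are indistinguishable once stored, which is exactly the lost nuance.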
How to translate the entire experience of watching a movie into a lone number is a separate issue.
He's getting rather old, but he's a good mouse.
One would hope that the thing that calculates the heuristic is an algorithm. See Wikipedia.
Say what you want about computer scientists, but without them you'd probably be complaining on a chalkboard.
Mathematics is physics without purpose, Chemistry is physics without thought, Engineering is physics without tenure.
Sorry, I'm a writer. That makes you raw material.
Riiight. And mathematical research is just finding a Hamiltonian cycle in a graph defined by the set of axioms used.
☭ = 卐