Build a Better Netflix, Win a Million Dollars?
An anonymous reader writes "In a quest to better movie recommendations, Netflix is opening their database (nytimes, registration and first child required) to users to try to craft a better recommendation technology. The problem is not easy. Says one researcher: 'You're competing with 15 years of really smart people banging away at the problem.'" Recommender systems are really an interesting problem, and that is likely very interesting data to play with.
http://www.netflixprize.com/
I saw the Sign, and it opened up my eyes
The problem with recommendation systems is that they use too little information to categorize their subject.
What they need to do is copy the methods of the Music Genome Project (www.pandora.com), and list a larger set of attributes for the films. This way it can recommend films by checking many more characteristics, such as director, tone, writer, or subject.
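A rough sketch of what "checking many more characteristics" could look like: each film gets a set of attribute tags, and films are ranked by how many tags they share with one you liked (Jaccard similarity). The film titles, the tags, and the `recommend` helper are all made up for illustration; this is not how Pandora or Netflix actually does it.

```python
# Hypothetical catalog: every title and attribute tag here is invented.
FILMS = {
    "Film A": {"director:x", "tone:dark", "subject:crime"},
    "Film B": {"director:x", "tone:dark", "subject:war"},
    "Film C": {"director:y", "tone:light", "subject:romance"},
    "Film D": {"director:y", "tone:dark", "subject:crime"},
}


def jaccard(a, b):
    """Share of attributes two films have in common (0.0 to 1.0)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(liked, catalog, top_n=2):
    """Rank every other film by attribute overlap with the liked film."""
    liked_attrs = catalog[liked]
    scored = [
        (jaccard(liked_attrs, attrs), title)
        for title, attrs in catalog.items()
        if title != liked
    ]
    # Highest similarity first; ties broken alphabetically by title.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in scored[:top_n]]


print(recommend("Film A", FILMS))
```

The more attributes per film (writer, tone, subject, and so on), the finer the distinctions this kind of overlap score can make.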
In this contest, you run your own code and submit the results to Netflix to be scored. This means that you can use any other data (e.g. a Movie Genome project) you can compile to enhance your rankings. Netflix apparently specifically designed the contest to allow this.
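For anyone wondering what "scored" might mean in practice, here is a minimal sketch of comparing predicted ratings against held-back real ones. It assumes the score is root mean squared error (RMSE), which is a common choice for rating prediction; check the contest rules for the exact metric and thresholds. The function name and sample numbers are made up.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between predicted and true ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    squared_errors = ((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(sum(squared_errors) / len(actual))


# Invented example: three predictions against three true star ratings.
print(rmse([3.5, 4.0, 2.0], [4, 4, 1]))
```

Lower is better, and because only predictions are compared, nobody needs to see your code to score you.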
From http://www.netflixprize.com/ :
To prevent certain inferences being drawn about the Netflix customer base, some of the rating data for some customers in the training and qualifying sets have been deliberately perturbed in one or more of the following ways: deleting ratings; inserting alternative ratings and dates; and modifying rating dates.
Plus all the usual replacing of IDs and such you'd expect. Looks like they're trying to avoid a repeat of the AOL debacle at least.
[ cruise / casual-tempest.net / xenogamous.com / transference.org / quantam sufficit ]
I see that the NYT article linked to just about everything except MovieLens. I've used the site, and folks might like to try it out. It looks simple, but it's fairly nice, having some of those fun dynamic pages that are all the rage these days. One neat thing in comparison to Netflix is that it will give a projected star rating for you, rather than simply saying "Recommended".
Of course, I'm biased since I had John Riedl as a professor in a few easy classes. I think he tried to spin off this research as a new company, but I'm not sure if it ever got off the ground.
One thing I'd really like to see has little to do with the quality of ratings, though. I'd like to be able to keep a common database of my ratings across multiple sites. At the moment, I've rated a number of movies at Netflix, MovieLens, and IMDb, but they aren't entirely consistent. Unfortunately, two of the sites effectively use a ten-point system (IMDb has a ten-point scale; MovieLens goes up to 5 stars, but in half-star increments), while the other uses a five-point one (maybe six if you count "Not Interested").
Well, I'll have to poke around a bit with this stuff. I wouldn't be able to do much, though, since my level of knowledge in this arena is very limited...
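On the idea of keeping one set of ratings across sites with different scales: a minimal sketch would be to map every rating onto a common 0-to-1 scale and snap it back to the target site's increments. The lower bounds below (1 for IMDb and Netflix, half a star for MovieLens) and the site keys are assumptions for illustration, not taken from any site's documentation.

```python
# (lowest rating, highest rating, step) per site. Lower bounds are assumed.
SCALES = {
    "imdb": (1.0, 10.0, 1.0),
    "movielens": (0.5, 5.0, 0.5),
    "netflix": (1.0, 5.0, 1.0),
}


def to_unit(rating, site):
    """Map a site's rating onto a common 0.0 to 1.0 scale."""
    low, high, _ = SCALES[site]
    return (rating - low) / (high - low)


def from_unit(unit, site):
    """Map a 0.0 to 1.0 value back to the nearest rating the site allows."""
    low, high, step = SCALES[site]
    raw = low + unit * (high - low)
    return low + round((raw - low) / step) * step


def convert(rating, source, target):
    """Convert a rating from one site's scale to another's."""
    return from_unit(to_unit(rating, source), target)


print(convert(4, "netflix", "movielens"))
```

Going from a ten-step scale to a five-step one loses information, so a round trip will not always give back the rating you started with.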
Because the problem is not always linearly separable. This is not to say that a linear classifier wouldn't do a decent job (given appropriate slack variables or underlying probability distribution), but to do a really good job - one where you wouldn't be laughing at the recommendations - rule-based techniques (such as association rules, RIPPER, etc) seem to do better. They aren't perfect, of course, and that's why it's an open problem.
Disclaimer: I subscribe to the same sort of service, except through Blockbuster... maybe Netflix does have this feature. My wife and I share a queue... I imagine many, many of these queues are shared. We have very, very different tastes in movies. Instead of getting recommendations that suit us both (which is next to impossible), the recommendations just get very, very confused. If I could just keep my recommendations and hers from tangling, we would both have an easier time.
This problem is already solved.
With Netflix you can have multiple queues (up to one per disc-at-a-time out) and reassign the "number of discs out per queue" from 0-#out as long as the total isn't greater than #out. It also handles reassignment with discs outstanding well.
The result in my family is that we end up with independent queues and independent recommendations. If Blockbuster offers the same feature you could split your queue up and get what you want. On top of that you won't have to keep reorganizing your queue to get the correct movie next if someone takes a while to get around to watching their movie.
There are two kinds of people: 1) those that need closure
No-registration-required article:
http://www.foxnews.com/story/0,2933,217021,00.htm
Posting as AC because:
Linking to foxnews equals automatic Troll/Flamebait/Offtopic.
Correction: No one has stayed awake through Koyaanisqatsi.
(FWIW, Powaqqatsi was a better flick, IMHO)
This sig intentionally left justified.
From the rules, it looks like your submission isn't code, it's a processed dataset. Only the terms for winning require you to explain your method to them (so that they don't get bitten by a horribly obfuscated entry) and to non-exclusively license your submission to Netflix (it looks like you retain copyright and can license it to others if you so choose).
But that seems pretty reasonable...you only have to hand over your code if you win, otherwise you're only submitting the results of your program.
"Don't blame me, I voted for Kodos!"
If you read Netflix's prize site, you'll find that they give clear-cut, well-defined statistical requirements for winning. The level of detail they go into is actually quite impressive; it's clear that they want real engineers on this, and that they're willing to get seriously specific in order to make sure people know what's what.
StoneCypher is Full of BS