Call it "crowdsourcing", or harnessing "group smarts" -- the approach is intriguing, and one of a kind. Being a curious soul myself, I decided to register a team from our company to check this out (who knows what may happen? we can be smart sometimes, with enough luck :-) ).
A few interesting facts:
- The contest is actually slated to run for another 5 years, until 2011, with the bar raised each year to improve on the previous year's winner
- A fine but important distinction: the algorithm needs to predict how someone will rate a movie, NOT what movie someone will rent
- At first glance, the data provided by Netflix seems pretty "skimpy" in terms of richness. Basically you get:
  - List of movies
  - List of ratings assigned to each movie by an extensive list of Netflix members
- My first reaction was that having extra information on the movies themselves might help. There's a bunch of stuff available from IMDB. However, apparently there are license restrictions, and Netflix doesn't really consider extra data to be valuable in improving their algorithm (see the discussion thread)
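To make the "rate, not rent" distinction concrete, here is a minimal sketch of what a ratings-only predictor could look like. This is not Netflix's algorithm or anything from the contest itself: the toy data, the 1-to-5 scale, and the names `fit_baseline` and `predict_rating` are all assumptions made up for illustration. It simply adds a per-movie and a per-member offset to the overall average rating.

```python
from collections import defaultdict

# Hypothetical toy data: (member_id, movie_id, rating), assuming a 1-5 scale.
ratings = [
    ("m1", "A", 5), ("m1", "B", 3),
    ("m2", "A", 4), ("m2", "B", 2), ("m2", "C", 3),
    ("m3", "C", 5),
]

def fit_baseline(rows):
    """Return the global mean plus per-movie and per-member offsets."""
    global_mean = sum(r for _, _, r in rows) / len(rows)

    # How far each movie's ratings sit from the global mean, on average.
    by_movie = defaultdict(list)
    for _, movie, r in rows:
        by_movie[movie].append(r - global_mean)
    movie_bias = {m: sum(v) / len(v) for m, v in by_movie.items()}

    # How far each member deviates once the movie offset is accounted for.
    by_member = defaultdict(list)
    for member, movie, r in rows:
        by_member[member].append(r - global_mean - movie_bias[movie])
    member_bias = {u: sum(v) / len(v) for u, v in by_member.items()}

    return global_mean, movie_bias, member_bias

def predict_rating(model, member, movie):
    """Predict the rating a member would give a movie, clipped to 1-5."""
    global_mean, movie_bias, member_bias = model
    raw = global_mean + movie_bias.get(movie, 0.0) + member_bias.get(member, 0.0)
    return min(5.0, max(1.0, raw))

model = fit_baseline(ratings)
print(predict_rating(model, "m1", "C"))  # m1 never rated movie C
```

Note that the output is a predicted score for a given member/movie pair, not a guess at which movie that member will pick next. Anything beyond this (cast, director, critics) has no place to plug in when all you have is the ratings list.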
At the moment, I agree with Michael's assessment that trying to solve this with ratings data alone might not be the best way to go. There seem to be so many other interesting dimensions that should influence someone's movie rating: movie characteristics like the cast and director, reviews from critics, local media reviews, and geographic or demographic information about the Netflix member, among others. None of these are being considered in the current algorithm. I can understand Netflix's hesitancy to interface with 3rd party resources, but perhaps they should, first, make all the datapoints within Netflix's own movie database available for this contest -- and second, encourage contestants to add their own qualitative datapoints. If the goal is to approach this as a pure data mining problem, then increasing the depth of the data should help.
I'll keep you all posted on how far we get with this. Being a small company, we will do this in the copious amount of spare time left over after the existing client work that pays the bills. Still, it should be a lot of fun.