The past decade has seen an explosion of interest in machine learning and data mining, with significant advances in terms of both theoretical results and highly visible practical applications. One such application is that of automated recommender systems. Customers buy or rate items or products, and the ratings are stored in a database—indeed, there may be millions of customers and items. Past customer behaviors and preferences are then analyzed to automatically predict which items a customer is likely to rate highly or purchase, among items they have not already purchased or rated. Recommender systems of this type are now commonplace on the Web, for example, at sites such as Amazon.com for shopping and last.fm for music.
In October 2006, the Netflix company offered a prize of $1 million for the development of an algorithm that could significantly improve the accuracy of its in-house recommendation system. Netflix customers rate movies they have seen on a scale of 1 to 5. The recommendation problem is to suggest new movies to customers based on a large database of past ratings. One way of measuring the accuracy of a recommendation algorithm is to compute the square root of the average squared difference between the algorithm's predicted ratings and actual customer ratings, on item-customer pairs not previously seen by the algorithm. Netflix stated it would award the $1 million prize to the first person or team who could reduce the error rate of its in-house algorithm by 10%. For the competition, participants could download a large training set of data on which to develop their methods. This data set consisted of a highly sparse matrix of approximately 500,000 customers (rows) by 18,000 movies (columns), where only about 1% of the entries contained known ratings.
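As a concrete illustration of this accuracy measure, the following minimal sketch computes a root-mean-squared-error style score over a set of held-out ratings; the array names and values are hypothetical, not data from the competition itself.

```python
import numpy as np

# Hypothetical held-out ratings and an algorithm's predictions for the
# same (customer, movie) pairs, on Netflix's 1-to-5 scale.
actual    = np.array([4.0, 3.0, 5.0, 2.0, 4.0])
predicted = np.array([3.7, 3.4, 4.6, 2.5, 4.2])

# Square root of the average squared difference between predicted and
# actual ratings; lower is better. The prize required a 10% reduction
# in this error relative to Netflix's in-house system.
rmse = np.sqrt(np.mean((predicted - actual) ** 2))
print(f"RMSE = {rmse:.4f}")
```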
For three years (2006–2009), the competition was the focus of intense activity among computer scientists, mathematicians, engineers, and statisticians around the globe. By the end there were over 40,000 registered competitors from over 150 different countries. Progress was initially quite rapid and by December 2006 a few teams already had algorithms that were capable of a 5% error reduction on unseen test data—halfway to the million dollars!
But, as with many interesting problems, achieving the second 5% of improvement was much more difficult than achieving the first 5%. As the rate of progress slowed in 2007 there was much speculation as to whether the 10% would ever be achievable. By fall 2007, Yehuda Koren, Robert Bell, and Chris Volinsky from AT&T Research had emerged as some of the leading contenders for the prize. This AT&T team won the initial "progress prize" ($50,000) in October 2007 with an 8.4% reduction in error. Koren and colleagues continued to push closer to 10%, claiming the second progress prize (in collaboration with two Austrian university students) 12 months later, with a 9.4% reduction in error.
The following paper by Koren, now a senior research scientist at Yahoo! Research, was presented at the ACM SIGKDD Conference in Paris in June 2009. It is an excellent example of how clever thinking about the fundamental factors that underlie a complex domain can lead to major improvement in prediction accuracy. While the paper focuses on recommender systems, the thinking and strategy behind it provide useful lessons that are applicable to many data mining tasks.
A key aspect of this paper is the use of exploratory data analysis and visualization to uncover otherwise hidden information, in particular, to discover that the ratings data is non-stationary over time (as depicted in Figure 1). Koren proceeds to show that predictions can be systematically improved by adjusting for this non-stationarity. Multiple types of temporal variation in the data are incorporated into the equations for prediction models in a variety of ways, providing important improvement over predictions that do not include a temporal component.
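To make the strategy concrete, the sketch below shows one simple way a temporal component can enter a prediction equation, in the spirit of, but far simpler than, the models in the paper: an item's bias is allowed to drift across coarse time bins, so the baseline prediction μ + b_u + b_i + b_{i,Bin(t)} depends on when the rating was made. The data layout, bin count, and plain averaging scheme here are illustrative assumptions; the paper's models are learned jointly with regularization and include richer user-side temporal effects.

```python
from collections import defaultdict
import numpy as np

def time_bin(day, n_bins=30, n_days=2243):
    """Map a rating's day index to a coarse time bin (illustrative choice)."""
    return min(day * n_bins // n_days, n_bins - 1)

def fit_time_aware_baseline(ratings, n_bins=30, n_days=2243):
    """Fit mu + b_u + b_i + b_{i,Bin(t)} by simple sequential averaging.

    ratings: list of (user, item, day, rating) tuples -- a hypothetical
    format, not the actual Netflix data layout. No regularization; a
    real model would shrink and jointly learn these parameters.
    """
    mu = np.mean([r for *_, r in ratings])

    # Item bias: how far each movie's ratings sit from the global mean.
    item_res = defaultdict(list)
    for u, i, d, r in ratings:
        item_res[i].append(r - mu)
    b_i = {i: np.mean(v) for i, v in item_res.items()}

    # User bias: how far each user's ratings sit from mu + b_i.
    user_res = defaultdict(list)
    for u, i, d, r in ratings:
        user_res[u].append(r - mu - b_i[i])
    b_u = {u: np.mean(v) for u, v in user_res.items()}

    # Temporal item bias: residual drift of each movie within each time bin.
    bin_res = defaultdict(list)
    for u, i, d, r in ratings:
        bin_res[(i, time_bin(d, n_bins, n_days))].append(r - mu - b_i[i] - b_u[u])
    b_i_bin = {k: np.mean(v) for k, v in bin_res.items()}

    def predict(u, i, day):
        return (mu + b_u.get(u, 0.0) + b_i.get(i, 0.0)
                + b_i_bin.get((i, time_bin(day, n_bins, n_days)), 0.0))
    return predict
```

Even this crude version captures the point of the quote below: letting the baseline drift with time addresses a basic property of how the data was generated, independent of how sophisticated the learning algorithm layered on top of it is.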
This paper illustrates clearly how to combine statistical thinking (modeling of variation in a systematic manner) with computational methods (fitting these models to over 100 million ratings), to build sophisticated models on very large data sets. The author notes that "addressing temporal dynamics can have a more significant impact on accuracy than designing more complex learning algorithms." This point is vital in the general context of predictive analytics—careful consideration of the basic factors that govern how data is generated (and collected) can lead to significantly more accurate predictions.
To bring closure to the $1 million prize story (for readers who don’t know how it ends), the grand prize was awarded in September 2009 to a team consisting of Yehuda Koren and six colleagues (Bell and Volinsky from AT&T Labs-Research, two students from Austria, and two engineers from Montreal).