
Just For You

Recommender systems that provide consumers with customized options have redefined e-commerce, and are spreading to other fields.

Sometime soon, a team of programmers is likely to receive a check from Netflix for $1 million. More than 4,000 teams have entered the movie-rental company’s Netflix Prize competition, which was established in 2006 to improve the recommender system Netflix uses to suggest movies to its 10 million-plus customers. As this article went to press, a coalition of previously competing teams, calling itself BellKor’s Pragmatic Chaos, had edged past the 10% improvement over Netflix’s own system required to win the prize (unless another team beats its 10.05% improvement by late July).

The term “recommender system” has largely supplanted the older phrase “collaborative filtering.” These systems create recommendations tailored to individual users rather than universal recommendations for, well, everyone. In addition to movie recommendations like those from Netflix, many consumer-oriented Web sites, such as Amazon and eBay, use recommender systems to boost their sales. Recommender systems also underlie many less overtly commercial sites, such as those providing music or news. But in each case a recommender system tries to discern a user’s likely preferences from a frustratingly small data set about that user.

One lure of the Netflix Prize for researchers is Netflix’s database of more than 100 million movie ratings—each recording a user, a movie, the date of rating, and the rating itself—from some 480,000 users on nearly 18,000 movies. After training their algorithms with this data, teams predict the ratings for a secret batch of 2.8 million triplets (user, movie, date of rating). Netflix then compares their accuracy to that of its original Cinematch algorithm. Sharing a massive, real-life data set has energized research on recommender systems, says Bob Bell, a principal member of the technical staff at AT&T Research, and a member of BellKor’s Pragmatic Chaos. “It’s led to really big breakthroughs in the field,” says Bell.

From the start, the Netflix Prize has appealed to academically oriented researchers. The eventual winners, as well as the annual progress prize winners (who receive $50,000), agree to publicly share their algorithms, and many teams openly discuss their research in the online Netflix Prize Forum. For researchers, this openness adds to the intellectual excitement. “People like us are motivated by the research, not necessarily by the money,” says Chris Volinsky, executive director of statistics research at AT&T Research and a member of BellKor’s Pragmatic Chaos. “Having an academic flavor to the competition has really helped it to sustain energy for two-and-a-half years.”

Although Netflix is unique in publicly enlisting and rewarding outside researchers, many other companies are fine-tuning the choices their recommender systems present to customers. Some of their efforts, like those of Amazon, L.L. Bean, and iTunes, are obvious to users. Other companies work behind the scenes, quietly monitoring and personalizing the experience of each user. Either way, user satisfaction depends not just on new and improved algorithms, but on individual human preferences, with all of their many quirks.

The Netflix Prize has brought a lot of attention to the field, notes John Riedl, a professor of computer science at the University of Minnesota. However, Riedl worries that the Netflix Prize puts “a little too much of the focus on the algorithmic side of the things, whereas I think the real action is going to happen in how you build interfaces … that expose the information in more creative and interesting ways.”


Implicit and Explicit Information

To entice its customers to rate movies, Netflix promises to show them other movies they will enjoy. Netflix also encourages its customers to provide detailed information about their viewing preferences. Unfortunately, this rich, explicit feedback demands a level of user effort that most Web sites can’t hope for.

Instead, many companies rely on implicit information about customer preferences, such as their purchasing history. However the feedback is gathered, researchers must make do with a sparse data set that reveals little about most customers’ tastes for most products. A further, critical challenge for Web-based recommender systems is generating accurate results in less than a second. To maintain a rapid response as databases grow, researchers must continually trade off effectiveness for speed.

One popular and efficient set of methods, called k nearest neighbors, searches for a handful of other customers (k of them) who have chosen the same items as the current customer. The system then recommends other items chosen by these “neighbors.”
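As a rough illustration of the idea (not any company’s production code), the sketch below finds the k users whose rating vectors are most similar to the current user’s, then suggests items those neighbors rated but the current user has not. The toy ratings matrix, the choice of cosine similarity, and the simple averaging of neighbors’ ratings are all assumptions made for the example.

```python
import numpy as np

def recommend_knn(ratings, user, k=2, n_items=3):
    """Suggest items for `user` based on the k most similar users.

    ratings: 2D array, rows = users, columns = items, 0 = unrated.
    Cosine similarity over rating vectors stands in for whatever
    similarity measure a production system would actually use.
    """
    target = ratings[user]
    # Cosine similarity between the target user and every other user.
    sims = ratings @ target / (
        np.linalg.norm(ratings, axis=1) * np.linalg.norm(target) + 1e-9)
    sims[user] = -np.inf                    # exclude the user themselves
    neighbors = np.argsort(sims)[-k:]       # the k nearest neighbors

    # Average the neighbors' ratings and hide items already rated.
    scores = ratings[neighbors].mean(axis=0)
    scores[target > 0] = -np.inf
    ranked = np.argsort(scores)[::-1]
    return [int(i) for i in ranked if np.isfinite(scores[i])][:n_items]

# Toy example: 4 users x 5 movies, ratings 1-5, 0 = unrated.
R = np.array([
    [5, 4, 0, 1, 0],
    [4, 5, 1, 0, 0],
    [1, 0, 5, 4, 5],
    [0, 1, 4, 5, 4],
], dtype=float)
print(recommend_knn(R, user=0))   # movies liked by users similar to user 0
```

Real deployments precompute neighborhoods or item-item similarities offline so that the online lookup fits within the sub-second response budget mentioned above.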

In contrast, latent factor methods search customers’ choices for patterns that can explain them. Some factors have obvious interpretations, such as a user’s preference for horror films, while other statistically important factors have no obvious interpretation. One advantage of latent factor methods is that they can provide recommendations for a new product that has yet to generate much consumer data.
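Latent factor models are often realized as matrix factorization: each user and each item gets a short vector of factors, and the dot product of the two predicts a rating. The sketch below is a minimal stochastic gradient descent version on made-up data; it illustrates the general technique, not the algorithm any Netflix Prize team actually used.

```python
import numpy as np

def factorize(triples, n_users, n_items, n_factors=2,
              lr=0.01, reg=0.05, epochs=500, seed=0):
    """Plain matrix factorization by stochastic gradient descent:
    learn user and item factor vectors whose dot product approximates
    each observed rating (no biases, no temporal effects)."""
    rng = np.random.default_rng(seed)
    P = rng.normal(0, 0.1, (n_users, n_factors))   # user factors
    Q = rng.normal(0, 0.1, (n_items, n_factors))   # item factors
    for _ in range(epochs):
        for u, i, r in triples:
            pu = P[u].copy()
            err = r - pu @ Q[i]                    # prediction error
            P[u] += lr * (err * Q[i] - reg * pu)   # gradient steps on the
            Q[i] += lr * (err * pu - reg * Q[i])   # regularized squared error
    return P, Q

# Toy (user, movie, rating) triples on a 1-5 scale.
triples = [(0, 0, 5), (0, 1, 4), (1, 0, 4), (1, 2, 1),
           (2, 2, 5), (2, 3, 4), (3, 1, 1), (3, 3, 5)]
P, Q = factorize(triples, n_users=4, n_items=4)
print(round(float(P[0] @ Q[2]), 2))   # predicted rating of user 0 for movie 2
```

Each learned dimension plays the role of a latent factor; with real data some dimensions line up with recognizable properties such as genre, while others resist interpretation.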

These algorithms all aim to solve the generic problem of correlating preferences without invoking knowledge of the domain they refer to, whether clothing, movies, or music. In principle, notes Joseph Konstan, a professor of computer science and engineering at the University of Minnesota, as long as individuals’ preferences remain constant, with enough opinions from a sufficient number of people “you don’t need to know anything about the domain.” In practice, Konstan says, limited data and changing tastes can make domain-specific approaches more effective.

One of the most sophisticated domain-specific approaches is used by Internet-radio company Pandora, which employs dozens of trained musicologists to rate songs on hundreds of attributes. “We are of the opinion that to make a good music recommendation system you need to understand both the music and the listeners,” says Pandora Chief Operating Officer Etienne Handman. Still, the most enjoyed Pandora playlists, he says, supplement the musicologists’ sophisticated ratings with statistical information from users.
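Pandora’s attribute set and matching rules are proprietary, so the sketch below only illustrates the general content-based idea: songs are described by hand-assigned attribute scores, and similarity between attribute vectors drives the playlist. The attributes, scores, and song names are invented for the example.

```python
import numpy as np

# Hypothetical attribute vectors (tempo, distortion, vocal style, ...),
# each scored 0-1 by a human analyst; Pandora's real attributes number
# in the hundreds.
songs = {
    "song_a": np.array([0.9, 0.8, 0.1, 0.2]),
    "song_b": np.array([0.8, 0.9, 0.2, 0.1]),
    "song_c": np.array([0.1, 0.2, 0.9, 0.8]),
}

def most_similar(seed, catalog):
    """Rank the other songs by cosine similarity of attribute vectors."""
    s = catalog[seed]
    scores = {
        name: float(v @ s / (np.linalg.norm(v) * np.linalg.norm(s)))
        for name, v in catalog.items() if name != seed
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

print(most_similar("song_a", songs))   # song_b should rank above song_c
```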


Measuring Effectiveness

To win the Netflix Prize, a team must beat Cinematch by 10% on a purely statistical measure, the root mean square error, of the differences between predicted and actual ratings. Like content-based assessments, however, this objective metric falls short of what users want in recommendations.
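RMSE itself is simple to compute: the square root of the average squared difference between predicted and actual ratings. The snippet below shows the metric and the 10% improvement test; the 0.9514 baseline is the widely reported Cinematch score on the competition’s quiz set, used here only for illustration.

```python
import numpy as np

def rmse(predicted, actual):
    """Root mean square error between predicted and actual ratings."""
    predicted, actual = np.asarray(predicted, float), np.asarray(actual, float)
    return np.sqrt(np.mean((predicted - actual) ** 2))

# Toy predictions against true 1-5 star ratings.
pred = [3.7, 4.2, 2.1, 4.9]
true = [4,   4,   2,   5]
score = rmse(pred, true)

# The prize requires a 10% improvement over Cinematch; 0.9514 is the
# widely reported Cinematch baseline, used here only for illustration.
cinematch = 0.9514
print(score, score <= 0.9 * cinematch)
```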

“The predicted enjoyment [by this measure] is just one factor that goes into supporting the movies that we present,” says Jon Sanders, director of recommendation systems at Netflix. The company augments this metric with other features, such as movie genre, which affect the appeal of a movie to a user. In addition, Sanders notes, the Netflix recommendation interface says why a movie was recommended, to build trust that it is selected specifically for that user. “There’s much more to personalization than what the Netflix Prize reveals,” he says.

Giving users realistic expectations can defuse the mistrust caused by occasional bizarre recommendations (which make for good stories but bad business). At the other extreme, however, conservative suggestions can seem trivial. “It’s fairly easy to make recommenders that never do a stupid recommendation,” says Riedl. “You actually want to tune the algorithm where it’s more likely to make errors,” because then “it’s also more likely to make serendipitous relationships.”

In the case of news, presenting some unexpected connections has societal importance, because readers often gravitate to Web sites that reinforce their beliefs. “There’s a big debate in personalization in news in particular, about whether personalization will lead to pigeonholing, like whether people will only read the news that they like,” says Greg Linden, who ran a personalized news site, Findory, from 2004 to 2007. Diversity is also critical in other domains. “The key thing with recommender systems is they’re trying to help with discovery,” Linden notes, unlike search engines that “help you find something you already know you want.”


Widening Impact

Recommender systems haven’t helped solve the business challenge of earning significant revenue from personalizing the news, but they have transformed traditional retailing. Michel Wedel, a professor of consumer science at the University of Maryland, notes that recommender systems have become “more or less the backbone of many of the major firms on the Web,” and the Netflix Prize’s $1 million reward hints at the scale of the business such firms expect these systems to generate. However, Wedel suggests that the best recommender systems are moving away from the explicit ratings used by Netflix, in part because others can intentionally skew the ratings.


The next generation of recommender systems will rely more on implicit information, such as the items that a user clicks on while navigating a site, says Francisco Martin, chief executive officer of Strands, Inc., a recommendation and personalization technologies company in Corvallis, OR. “Based on your navigation patterns, you’re correlating products, and you’re giving very valuable information to the recommender system,” he says. Improved recommender systems also track changing tastes and context-specific preferences. In the not-too-distant future, Martin also envisions bank-based systems that track all of an individual’s spending and use that information to make personal finance recommendations. “All of our life will be digitized,” he says.
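One way to picture “correlating products” from navigation patterns is simple item-to-item co-occurrence counting over click sessions, as in the sketch below. The sessions, product names, and scoring rule are invented for illustration and are not Strands’ actual method.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical browsing sessions: each list is the set of products one
# visitor clicked on (implicit feedback, no ratings involved).
sessions = [
    ["camera", "tripod", "sd_card"],
    ["camera", "sd_card"],
    ["laptop", "mouse", "sd_card"],
    ["camera", "tripod"],
]

# Count how often each pair of products is clicked in the same session.
co_counts = defaultdict(lambda: defaultdict(int))
for session in sessions:
    for a, b in combinations(set(session), 2):
        co_counts[a][b] += 1
        co_counts[b][a] += 1

def related(item, n=2):
    """Products most often co-clicked with `item`."""
    ranked = sorted(co_counts[item].items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, _ in ranked[:n]]

print(related("camera"))   # e.g., ['tripod', 'sd_card']
```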

Minnesota’s Riedl also imagines applications far beyond commerce, helping people navigate everything from work tasks to social relationships. Many observers have noted that humans’ biological evolution has been largely overtaken by cultural evolution, he says. But by combining computer and human strengths, Riedl says, recommender systems let us “create systems that take us to new places, new geniuses.”

Figures

UF1 Figure. Clusters of movies discovered by a computer algorithm created for the Netflix Prize competition, with lines closer to yellow representing stronger similarities and colors closer to red representing weaker similarities.
