BLOG@CACM

What is a Good Recommendation Algorithm?

Someone may win the one million dollar Netflix Prize soon.  Will the winning algorithm produce movie recommendations that people like?

User Comments

 (10)

I believe that the main point of this post is correct: the best RMSE does not equal the best user satisfaction. But I am not sure that TopN is the only relevant metric for a movie recommendation system. For example, TopN says nothing about diversity (if I LOVE French comedies with Pierre Richard, that does not mean I want to watch only them this week; I want suggestions in other genres too), novelty, etc.
I would expect a good movie recommendation system to be a good interactive 'exploration' system, one that could tell me why I may like a movie and how it is similar to or different from the movies I like or dislike (http://www.clerkdogs.com/ is a good example).
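
To make the RMSE-versus-TopN point concrete, here is a minimal sketch in Python. The ratings, the two "models", and the choice of 4 stars as the "liked" threshold are all made-up assumptions for illustration, not Netflix data or any contestant's method. Model A hedges toward the mean and wins on RMSE, yet its top two picks are both movies the user finds mediocre.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between predicted and true ratings."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

def precision_at_n(predicted, actual, n, liked_threshold=4):
    """Fraction of the n highest-predicted items the user actually liked."""
    ranked = sorted(range(len(predicted)), key=lambda i: predicted[i], reverse=True)[:n]
    hits = sum(1 for i in ranked if actual[i] >= liked_threshold)
    return hits / n

# Hypothetical true ratings for six movies: two loved, two hated, two so-so.
actual = [5, 5, 1, 1, 3, 3]

# Model A stays close to the average rating and ranks the so-so movies first.
model_a = [3.4, 3.4, 2.6, 2.6, 3.5, 3.5]
# Model B is badly off on the movies the user dislikes but ranks the loved ones first.
model_b = [5.0, 5.0, 4.0, 4.0, 1.0, 1.0]

for name, model in (("A", model_a), ("B", model_b)):
    print(name, round(rmse(model, actual), 3), precision_at_n(model, actual, n=2))
```

Under these assumptions, A has the lower RMSE while B has the better top-2 list. Neither number says anything about diversity or novelty, which would need their own measures.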

The movie rating example reminds me of utility theory - I really have to brush up on that but there might be some fitting models of utility that could be used to derive an improved quality measure of recommendations in certain domains.
I think the domain or user-need specificity of the quality measure is key here. I've got different requirements on a news filtering system than a movie recommendation system.
The former should keep me informed while consuming a minimum of my time, and I don't really need an explanation of why something was recommended to me, except when it's so far off that I have to figure out what corrective action to take.
The latter should assist me in figuring out in which movie to invest time and money, and I'm willing to invest some time up front to make a good decision. Here, diversity in the set of recommended movies and an explanation of the reasons for recommending a movie are welcome.

What makes a recommendation system great? In my mind the answer is simple. The best recommendation systems are the ones that engage the user and drive customer loyalty.

Things like RMSE over a test data set given a training set are at best crude proxies for this, and at worst completely miss the mark. Even metrics like click through rate, order size and conversion rate that just consider session-level behavior can be misleading. In my experience they tend to drive you towards recommendations that are not globally optimal in the long term.

The delicate balance is to be reactive to short-term trends in the market, but to do so with an eye towards driving long-term value via deep relationships with your customers.

I have this conversation with richrelevance's customers all the time, and I'm pleased that they share my commitment to building long-lasting relationships with their customers.

Beyond how you interpret RMSE (or whatever metric you decide on), you really do have to consider the user's task and the cost of a bad recommendation.

For a Netflix user, the cost of a bad recommendation is not so great. The risk of that bad recommendation (how bad does the recommendation have to be such that you still rent the movie and it still ruins your evening?) is also not so great.

I have long thought this is a perennial barrier for recommender research -- beyond how commercializable it might or might not be, there's only so far you can get trying to recommend movies. Recommenders are in use in lots of other domains, not all in product or media recommendation, but no research is being done there. Well, not a lot.

While I agree that users generally want more 5-star movies and fewer 1-star movies, I disagree that this means recommendation is similar to TopN web search. Web search assumes very little interactivity, and once the user has found the one item/link he is looking for, he is done with the search activity.

With recommendations, on the other hand, people are more exploratory- and recall-oriented. I'll bet people don't just have 3 or 10 items in their Netflix queue. We would have to ask Netflix what that average queue length is, but anecdotal evidence (http://www.geeksugar.com/1865307) places that number in the dozens to hundreds range. That's much more recall-oriented than top3 or top10 web search.

Another example is music recommendation, à la Pandora. You seed Pandora with a few songs or artists that you like, and it then sets up a personalized, recommendation-oriented radio station for you, and streams the music to you at a rate of approximately 20 songs per hour. A couple of hours, over a couple of days, puts the number of recommendations in the hundreds. After a few weeks or months of using Pandora, this number moves to the thousands.

So unlike web search, where people want to find the one answer and be done, Pandora's music recommendation is a longer-term, recall-oriented process. And I'll bet people are even more willing to put up with some bad, and even more lukewarm, songs in the mix -- because they're more interested in getting as many good, different, interesting songs (dozens? hundreds?) as possible. Picking the 10 items that someone will love is not the only thing that matters to them. Recall trumps precision.
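
A rough sketch of that precision/recall trade-off, in Python. The item ids and counts are invented for illustration: assume a listener would enjoy 50 particular songs, and compare a short, carefully picked list against a long stream with plenty of misses.

```python
def precision_recall(recommended, relevant):
    """Return (precision, recall) of a recommendation list against the liked set."""
    hits = len(set(recommended) & set(relevant))
    return hits / len(recommended), hits / len(relevant)

# Hypothetical: ids 0..49 are the songs this listener would enjoy.
relevant = set(range(50))

# A top-10 style list: 9 good songs and 1 miss.
short_list = list(range(9)) + [1000]
# A long radio-style stream: 40 good songs mixed with 160 misses.
long_stream = list(range(40)) + list(range(1000, 1160))

print(precision_recall(short_list, relevant))
print(precision_recall(long_stream, relevant))
```

With these made-up numbers the short list is far more precise, but the long stream surfaces most of the songs the listener would like, which is the behavior I am arguing people want from a station.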

I think that the 5 star recommendation system is fundamentally flawed as a preference rating system. The five star system was meant to be a democratic rating system, and should have been used to measure individual preference. Netflix should have posed the challenge to develop a better rating system, not a better algorithm. Read more here:

http://www.thinksketchdesign.com/2009/03/25/web/media/netflix-on-facebook-the-slow-revolution-of-recommendation-engines

I've posted on this topic at

http://www.stat.columbia.edu/~cook/movabletype/archives/2008/11/netflix_prize_s.html

RMSE doesn't reward a system that's aware of its own uncertainty, and distinguishing between mediocrity and controversy does require a model of uncertainty.
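
A small Python illustration of the mediocrity-versus-controversy distinction, using invented ratings rather than real data. Both movies have the same average, so the same single-number prediction minimizes squared error for each; only the spread tells them apart.

```python
import statistics

# Hypothetical ratings for two movies with the same mean.
mediocre = [3, 3, 3, 3, 3, 3]        # everyone finds it so-so
controversial = [1, 5, 1, 5, 1, 5]   # people love it or hate it

for name, ratings in (("mediocre", mediocre), ("controversial", controversial)):
    mean = statistics.mean(ratings)      # the point prediction that minimizes squared error
    spread = statistics.pstdev(ratings)  # population standard deviation of the ratings
    print(name, mean, spread)
```

A system that only reports the mean treats these two movies identically. One that also carries a spread (or a full predictive distribution) can tell them apart, but RMSE gives it no credit for doing so.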

Another thing that seems to be often overlooked is how you get users to trust recommendations. When I first started playing with recommendation algorithms I was trying to produce novel results -- things that the user didn't know about and would be interesting to them, rather than using some of the more basic counting algorithms that are used e.g. for Amazon's related products. What I realized pretty quickly is that even I didn't trust the recommendations. They seemed disconnected, even if upon clicking on them I'd realize they were, in fact, interesting and related.

What I came to from that was that in a set of recommendations you usually want to scale them such that you slip in a couple of obvious results to establish trust -- things the user almost certainly knows of, and probably won't click on, but they establish, "Ok, yeah, these are my taste." Then you apply a second ranking scheme and jump to things they don't know about. Once you've established trust of the recommendations they're much more likely to follow up on the more novel ones.
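
Here is one way the two-stage idea could be sketched in Python. Everything in it is an assumption made for illustration: the `Candidate` fields, the scoring formulas, and the sample titles are hypothetical, and this is not the code I actually used.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    title: str
    relevance: float    # estimated match to the user's taste, 0..1 (assumed given)
    familiarity: float  # chance the user already knows the item, 0..1 (assumed given)

def blend(candidates, n_total, n_anchor=2):
    """Lead with a few familiar, on-taste items, then rank the rest for novelty."""
    # First ranking: relevant AND familiar, to establish trust.
    anchors = sorted(candidates, key=lambda c: c.relevance * c.familiarity,
                     reverse=True)[:n_anchor]
    # Second ranking: relevant but unfamiliar, for discovery.
    rest = [c for c in candidates if c not in anchors]
    novel = sorted(rest, key=lambda c: c.relevance * (1 - c.familiarity), reverse=True)
    return anchors + novel[: n_total - n_anchor]

candidates = [
    Candidate("Blockbuster A", 0.90, 0.95),
    Candidate("Blockbuster B", 0.80, 0.90),
    Candidate("Hidden gem C", 0.85, 0.10),
    Candidate("Hidden gem D", 0.70, 0.20),
    Candidate("Obscure miss E", 0.20, 0.05),
]

for c in blend(candidates, n_total=4):
    print(c.title)
```

The first two slots go to items the user almost certainly recognizes, and the remaining slots switch to the novelty-weighted ranking. How many anchors to use, and how to estimate familiarity, are open choices.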

This differs somewhat from search where the catch phrase is "authoritative sources" (stemming back to Kleinberg's seminal paper on graph-based search) -- you want to hit the right mixes of novelty and identity, rather than just finding high degrees of correlation.

Perhaps for the best of both worlds, focusing on improving both search and recommendations (precision and recall) to offer people the two options for discovering media is the way to go.

http://www.jinni.com

Copyright © 2009 by the ACM. All rights reserved.