The Communications Web site, cacm.acm.org, features 13 bloggers in the BLOG@CACM community. In each issue of Communications, we'll publish excerpts from some of their posts, plus readers' comments.
Netflix is offering one million dollars for a better recommendation engine. Better recommendations clearly are worth a lot.
But what are better recommendations? What do we mean by "better"?
In the Netflix Prize, the meaning of better is quite specific. It is the root mean squared error (RMSE) between the actual ratings Netflix customers gave the movies and the predictions of the algorithm.
Let's say we build a recommender that wins the contest. We reduce the error between our predictions and what people actually will rate by 10% over what Netflix used to be able to do. Is that good?
Depending on what we want, it might be very good. If what we want to do is show people how much they might like a movie, it would be good to be as accurate as possible on every movie.
However, this might not be what we want. Even in a feature that shows people how much they might like any particular movie, people care a lot more about misses at the extremes. For example, it could be much worse to say that you will be lukewarm (a prediction of 3½ stars) on a movie you love (an actual of 4½ stars) than to say you will be slightly less lukewarm (a prediction of 2½ stars) on a movie you are lukewarm about (an actual of 3½ stars).
Moreover, what we often want is not to make a prediction for any movie, but find the best movies. In TopN recommendations, a recommender is trying to pick the best 10 or so items for someone.
A recommender that does a good job predicting across all movies might not do the best job predicting the TopN movies. RMSE equally penalizes errors on movies you do not care about seeing as it does errors on great movies, but perhaps what we really care about is minimizing the error when predicting great movies.
There are parallels here with Web search. Web search engines primarily care about precision (relevant results in the top 10 or top three). They only care about recall when someone would notice something they need missing from the results they are likely to see. Search engines do not care about errors scoring arbitrary documents, just their ability to find the TopN documents.
Aggravating matters further, in both recommender systems and Web search, people's perception of quality is easily influenced by factors other than the items shown. People hate slow Web sites and perceive slowly appearing results to be worse than fast-appearing results. Differences in the information provided about each item, especially missing data or misspellings, can influence perceived quality. Presentation issues, even the color of links, can change how people focus their attention and which recommendations they see. People trust recommendations more when the engine can explain why it made them. People like recommendations that update immediately when new information is available. Diversity is valued; near duplicates disliked. New items attract attention, but people tend to judge unfamiliar or unrecognized recommendations harshly.
In the end, what we want is happy, satisfied users. Will a recommendation engine that minimizes RMSE make people happy?
Another thing that seems to be often overlooked is how you get users to trust recommendations. When I first started playing with recommendation algorithms I was trying to produce novel resultsthings that the user didn't know about and would be interesting to them, rather than using some of the more basic counting algorithms that are used for Amazon's related products.
What I realized pretty quickly is that even I didn't trust the recommendations. They seemed disconnected, even if upon clicking on them I'd realize they were, in fact, interesting and related.
What I came to from that was that in a set of recommendations you usually want to scale them such that you slip in a couple of obvious results to establish trustthings the user almost certainly knows of, and probably won't click on, but they establish, "OK, yeah, these are my taste." Then you apply a second ranking scheme and jump to things they don't know about. Once you've established trust of the recommendations they're much more likely to follow up on the more novel ones.
What I am saying is that, rather than just viewing privacy as not sharing information with others, or viewing privacy as projecting a desired persona, we should also consider how to make systems so that people can safely share more information and get the associated benefits from doing so....
There are many dimensions here in this design space. We can change what is shared, how it is shared, when something is shared, and who it is shared with. One key challenge is in balancing privacy, utility, and the overhead for end users in setting up these policies. Another key challenge is understanding how to help people change these policies over time to adapt to people's needs. These are issues I'll discuss in future blog postings.
For me, a particularly intriguing way of thinking here is safe staging, an idea that Alma Whitten brought to the attention of security specialists in her seminal paper Why Johnny Can't Encrypt. The basic idea is that people progressively get more powerful tools as they become comfortable with a system, but are kept in a safe state as much as possible as they learn how to use the system. A real-world example would be training wheels on a bicycle. For systems that provide any level of awareness, the defaults might be set, for example, so that at first only close friends and family see anything, while over time people can easily share more information as they understand how the system works and how to control things.
Personally, I believe that there are a collection of planet-threatening problems, such as climate change and ozone depletion, that only scientists are in a position to solve. Hence, the sorry state of DBMS support in particular (and system software support in general) for this class of users is very troubling.
Science users, of course, want a commercial-quality DBMS, i.e., one that is reliable, scalable, and comes with good documentation and support. They also want something that is open source. There is no hope that such a software system can be built in a research lab or university. Such institutions are good at prototypes, but not production software. Hence, the obvious solution is a nonprofit foundation, along the lines of Apache or Mozilla, whose charter would be to build such a DBMS. It could not be financed by venture capital, because of market size issues. As such, support must come from governments and foundations.
It is high time that the United States got behind such an initiative.
While I agree that RDBMS is not an optimal technology for scientific applications and that an open source initiative may lead to some good innovation, I'd be cautious in separating the data model from the query and management language.
There are proprietary tools, such as kx.com, that have done so successfully.
The speed and capacity of such tools is phenomenal (as are the licensing fees one must pay).
In computer science, the way that we investigate computation is with programming. We don't want to teach computing as a pile of "accumulated knowledge." We know that that doesn't lead to learning. We need to teach computation with exploration and investigation, which implies programming.
The best research study that I know of that addresses this question is Chris Hundhausen's study where he used algorithmic visualization in CS1. He had two groups of students. One group was to create a visualization of an algorithm using art supplies. The students were learning theory and describing the process without programming. The second group had to use a visualization system, ALVIS. The students were learning theory and encoding their understanding in order to create a presentation. As Chris says in his paper, "In fact, our findings suggest that ALVIS actually had a key advantage over art supplies: namely, it focused discussions more intently on algorithm details, leading to the collaborative identification and repair of semantic errors." If you have no computer system, it's all too easy to say "And magic happens here." It's too easy to rely on intuitive understanding, on what we think ought to happen. Having to encode a solution in something that a computer can execute forces an exactness such that errors can be identified.
The idea isn't that programming creates barriers or makes it harder. Rather, using the computer makes it easier to learn it right. Without a computer, it's easier to learn it wrong, where you just learn computing as a set of accumulated knowledge (as described in the AAAS report) or with semantic errors (as with art supply algorithm visualization). If you don't use programming in CS1, you avoid tedious detail at the possible (even likely) loss of real learning.
©2009 ACM 0001-0782/09/0500 $5.00
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
The Digital Library is published by the Association for Computing Machinery. Copyright © 2009 ACM, Inc.
No entries found