Personalized Search

The magnitude of the difference between the Outride system and the other engines is compelling, especially given that most search engines differ from one another by less than 10%.

Interestingly, the retrieval process can be infused with usage data at different granularities (individual, group/social, and census), enabling a system to fall back to a coarser level of usage data in the face of uncertainty. The latter forms create a kind of social relevancy, where the notion of importance is defined by the usage of a community of users. Very few usage-based systems have been developed, the most notable exception being Direct Hit’s collaborative-filtering-inspired approach, which monitors which search results people select.
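The fallback across granularities can be illustrated with a minimal sketch. It is not a description of the Outride or Direct Hit implementations: the function name `usage_score`, the `(clicks, impressions)` count tables, and the `min_obs` cutoff are all hypothetical, and "uncertainty" is approximated here simply as having too few observations at a given level.

```python
# Hypothetical cutoff: a level is trusted only with this many impressions.
MIN_OBSERVATIONS = 5


def usage_score(url, individual, group, census, min_obs=MIN_OBSERVATIONS):
    """Return (level, click_rate) from the finest level with enough data.

    Each table maps url -> (clicks, impressions). Levels are tried from
    finest (individual) to coarsest (census).
    """
    levels = (("individual", individual), ("group", group), ("census", census))
    for level, counts in levels:
        clicks, impressions = counts.get(url, (0, 0))
        if impressions >= min_obs:
            return level, clicks / impressions
    # No level has enough evidence for this URL.
    return "none", 0.0


if __name__ == "__main__":
    individual = {"example.org/a": (1, 2)}    # too few observations
    group = {"example.org/a": (30, 60)}       # enough observations
    census = {"example.org/a": (400, 2000)}
    print(usage_score("example.org/a", individual, group, census))
```

In this sketch the individual data is skipped because it falls under the cutoff, so the group-level click rate is used instead.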

What’s curious about these approaches is that relevance is measured as a function of the entire population of users. One can view this as an attempt to optimize the consensus relevancy for any given topic. For any query, relevancy is computed identically for all users without acknowledging that relevance is relative for each user. Further, none are able to differentiate based upon who is searching, their current context, interests, and/or prior knowledge. What’s needed is a way to take into account that different people find different things relevant and that people’s interests and knowledge change over time. What’s needed is a way to compute personal relevancy.

User models are computed from the content in these information spaces in the Outride sidebar. The models are based upon the ontology of the Open Directory Project (ODP), with each user having their own weighting across the top 1,000 categories of the ODP. Upon download of the sidebar component, if the user imports a set of favorite links, the system fetches the pages and classifies them into the ODP, adjusting the weights accordingly. If no links are imported, the user starts out with no content weighting. As the user clicks around the Web, each click is captured by the sidebar and classified, and the user model is updated accordingly. The last 1,000 unique clicks of each user are stored in their surf history.
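The user model described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the Outride code: the class `UserModel` and its methods are hypothetical, the page classifier is left out (the caller passes in the ODP category directly), each click is assumed to add a unit weight, and weights are assumed to persist after a click ages out of the surf history.

```python
from collections import OrderedDict, defaultdict


class UserModel:
    """Per-user category weights plus a bounded history of unique clicks."""

    def __init__(self, history_size=1000):
        self.weights = defaultdict(float)   # category -> accumulated weight
        self.history = OrderedDict()        # url -> category, oldest first
        self.history_size = history_size

    def record_click(self, url, category):
        """Update weights and surf history for one classified page."""
        self.weights[category] += 1.0       # assumption: unit weight per click
        self.history.pop(url, None)         # keep URLs unique, refresh recency
        self.history[url] = category
        while len(self.history) > self.history_size:
            self.history.popitem(last=False)  # drop the oldest click

    def import_favorites(self, classified_links):
        """Seed the model from (url, category) pairs of imported favorites."""
        for url, category in classified_links:
            self.record_click(url, category)

    def profile(self):
        """Return weights normalized to sum to 1 (empty if no content yet)."""
        total = sum(self.weights.values())
        if not total:
            return {}
        return {c: w / total for c, w in self.weights.items()}


if __name__ == "__main__":
    model = UserModel(history_size=2)
    model.import_favorites([("a.example", "Sports"), ("b.example", "Sports")])
    model.record_click("c.example", "Arts")
    print(model.profile(), list(model.history))
```

A user who imports no favorites starts with an empty profile, matching the "no content weighting" case in the text.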

Query augmentation is performed by integrating various clues provided by the instrumentation of the interface and user models. If a user is browsing the ODP, the category name and its contents are compared to the query to see if they are similar. Likewise, the title and contents of the currently viewed Web page are checked. Our initial investigations found that while these cues provide meaningful data, comparing the query to the user’s content profile using vector methods provides better results. Only queries exceeding certain similarity thresholds are augmented automatically by the system.
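A minimal sketch of threshold-gated augmentation follows. The text does not say which vector method or threshold was used, nor what is added to the query, so several things here are assumptions: cosine similarity over raw term counts, a hypothetical threshold of 0.3, per-category term vectors standing in for the user's content profile, and appending the best-matching category label as the augmentation.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def augment_query(query, category_profiles, threshold=0.3):
    """Append the best-matching category label if it clears the threshold.

    category_profiles maps a category label to a Counter of terms drawn
    from the user's content in that category (hypothetical structure).
    """
    query_vec = Counter(query.lower().split())
    best_label, best_sim = None, 0.0
    for label, terms in category_profiles.items():
        sim = cosine(query_vec, terms)
        if sim > best_sim:
            best_label, best_sim = label, sim
    if best_label is not None and best_sim >= threshold:
        return f"{query} {best_label}"
    return query  # below threshold: leave the query untouched


if __name__ == "__main__":
    profiles = {
        "cars": Counter({"jaguar": 3, "engine": 2, "car": 4}),
        "animals": Counter({"jaguar": 1, "cat": 5, "jungle": 2}),
    }
    print(augment_query("jaguar engine", profiles))
    print(augment_query("weather today", profiles))
```

Leaving low-similarity queries unchanged mirrors the point in the text that only queries exceeding the threshold are augmented automatically.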

Result set processing is performed across an expanded set of results, typically 1,000, from the backend search engine. Filtering by "Have Seen, Have Not Seen," and usage-based re-ranking are straightforward to implement. To re-rank search results based upon the user profile, the titles and other metadata from the pages are compared via vector methods against the user profile. This approach is far from optimal, but re-ranking results based upon the content of each result page would require indexing the entire Web, an option we felt was better handled via a partnership.
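The result processing step can be sketched in the same spirit. As before, this is an assumption-laden illustration rather than the system's code: `rerank` is a hypothetical name, the user profile is reduced to a single term-count vector, only titles are compared (no other metadata), cosine similarity stands in for the unspecified vector method, and ties keep the backend engine's original order.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rerank(results, profile_terms, seen_urls=frozenset(), hide_seen=False):
    """Re-rank (url, title) results by title similarity to the user profile.

    With hide_seen=True, URLs already in the surf history are filtered out
    (the "Have Not Seen" view).
    """
    scored = []
    for rank, (url, title) in enumerate(results):
        if hide_seen and url in seen_urls:
            continue
        sim = cosine(Counter(title.lower().split()), profile_terms)
        # Sort by descending similarity, then by original backend rank.
        scored.append((-sim, rank, url))
    scored.sort()
    return [url for _, _, url in scored]


if __name__ == "__main__":
    results = [
        ("u1", "Jaguar cat habitat"),
        ("u2", "Jaguar car engine review"),
        ("u3", "Weather"),
    ]
    profile = Counter({"car": 3, "engine": 2})
    print(rerank(results, profile))
    print(rerank(results, profile, seen_urls={"u2"}, hide_seen=True))
```

Because only titles are scored, a relevant page with an uninformative title gains nothing from the profile, which is the limitation the text acknowledges.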

We found the combination of query augmentation and result processing with a contextually designed interface to be quite effective in making search easier and faster.