
Helping People Find What They Don’t Know

Recommendation systems help users find the correct words for a successful search.

Imagine you are performing a task while interacting with a service hosted on the Internet, or with a mobile phone service that uses automated speech recognition. What if, during your interaction with this service, a machine recommended how you could better perform your current task? An important problem in personalization is understanding how a machine can help an individual user by offering such recommendations.

When people engage in information-seeking behavior, it’s usually because they are hoping to resolve some problem, or achieve some goal, for which their current state of knowledge is inadequate. This suggests they don’t really know what might be useful for them, and therefore may not be able to specify the salient characteristics of potentially useful information objects.

Unfortunately, typical information systems require users to specify what they want the system to retrieve. Furthermore, people using large-scale information systems are typically unfamiliar with the underlying operations of the systems, with the vocabularies the systems use to describe the information objects in their databases, and even with the nature of the databases themselves. This situation suggests it might be appropriate for some part of the information system to recommend courses of action to information seekers, which could help them better understand their problems and use the system’s resources more effectively. This is the general challenge our research group at Rutgers has been addressing over the last several years [2, 4].

One specific difficulty people face when interacting with information systems is choosing the correct words to represent their information problems. In the typical information system, which assumes a model of information seeking called "specified searching," the user is asked to generate a query, which is understood to be a specification of what she or he wants to have retrieved. For the system to search and find appropriate responses, the query must be couched in terms matching the way the information objects are represented in the system. Whether that representation is based on the actual words used in the information objects themselves (so-called "keyword representation") or on a controlled vocabulary representing the domain or the database (so-called "conceptual representation"), the problem for the user is the same: how to guess which words to use in the query so that they both adequately represent the person’s problem and match those the system uses in its representation. In information retrieval research and practice, it is generally understood that accomplishing these two goals is a multistage, interactive process: initial query formulation, which allows users to enter into interaction with the system, followed by iterations of query reformulation based upon the results of that interaction [5, 8]. This is an extremely hard problem: it is difficult for people to specify what they don’t know; many different words can express the same ideas; predicting how someone else will talk about a topic is uncertain at best; and what another person finds important, and worthy of representation, cannot be readily ascertained. For instance, consider the person who wishes to find obituary information about some group of well-known Americans.
In a system relying on the words in the text for representation, using the term "obituary" in the query will not be useful, since that word is never used in the text of an obituary. However, words or phrases such as "died," "yesterday" (or any of the days of the week), "mourned by," and "survived by" are commonly used in obituaries. Few users will understand these characteristics of newspaper obituaries well enough to make use of them in an initial query, or even in query reformulation. Similar arguments hold for the representation of "well-known" and "American." How can a system help its users overcome such problems?
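To make the vocabulary mismatch concrete, here is a minimal sketch of keyword matching over a single invented obituary-style snippet. The snippet and the helper names `tokenize` and `matches` are hypothetical illustrations, not data or code from the Rutgers studies, and the matcher deliberately ignores word order and phrase adjacency.

```python
import re

# Hypothetical obituary-style snippet invented for illustration; the word
# "obituary" itself never appears in it, as is typical of real obituaries.
DOCUMENT = (
    "Jane Doe, a celebrated American novelist, died yesterday at her home. "
    "She is survived by her husband and two daughters and mourned by many readers."
)


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def matches(query, text):
    """Keyword representation: the query matches if every query word occurs in the text."""
    doc_terms = set(tokenize(text))
    return all(term in doc_terms for term in tokenize(query))


for query in ["obituary", "died yesterday", "survived by", "mourned by"]:
    print(f"{query!r}: {matches(query, DOCUMENT)}")
```

Run as written, the query "obituary" fails to match while the other three queries, built from words that obituaries actually use, all match.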

In the mid-1960s, John Rocchio suggested a technique for addressing this problem called "relevance feedback" [7]. For the reasons already mentioned, a user is unlikely to begin an interaction with the ideal query (that is, the query that best specifies what is to be searched for and retrieved). Furthermore, because the user is unlikely to understand the complexities of representation and matching within an information retrieval system, that person is unlikely to engage in effective query reformulation. However, we can assume the user will be able to recognize, and indicate, whether a retrieved information object is relevant to the problem. Rocchio suggested the system could use the characteristics (that is, word frequencies and distributions) of the information objects judged relevant or nonrelevant to modify (reformulate) the original query, until the query eventually became ideal, separating relevant from nonrelevant objects in the best possible way. The user’s role in this interaction is merely to indicate the relevance or nonrelevance of a retrieved object; the query reformulation takes place internally, and the user’s only knowledge of that process comes through the list of objects retrieved by the reformulated query. We can characterize this type of interaction as system-controlled with respect to term recommendation. Still, indicating relevance or nonrelevance gives the user some measure of influence over query reformulation through her or his interaction with the system’s results.
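The following is a minimal sketch of the commonly cited textbook form of Rocchio's update, q' = αq + β·centroid(relevant) − γ·centroid(nonrelevant), using raw term frequencies. The weights α, β, γ, their default values, the clipping of negative weights, and the toy documents are all illustrative assumptions rather than details from the article or from Rocchio's original formulation [7]; real systems would also apply weighting such as idf and stopword removal.

```python
from collections import Counter


def term_vector(text):
    """Raw term-frequency vector for a lowercased, whitespace-tokenized text."""
    return Counter(text.lower().split())


def centroid(vectors):
    """Average of sparse term vectors; an empty dict if there are none."""
    if not vectors:
        return {}
    total = Counter()
    for vector in vectors:
        total.update(vector)
    return {term: weight / len(vectors) for term, weight in total.items()}


def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Reformulate the query as alpha*q + beta*centroid(R) - gamma*centroid(N).

    Terms whose resulting weight is zero or negative are dropped, a common
    convention. The alpha/beta/gamma defaults are illustrative assumptions.
    """
    rel = centroid(relevant)
    non = centroid(nonrelevant)
    new_query = {}
    for term in set(query) | set(rel) | set(non):
        weight = (alpha * query.get(term, 0.0)
                  + beta * rel.get(term, 0.0)
                  - gamma * non.get(term, 0.0))
        if weight > 0:
            new_query[term] = weight
    return new_query


# Hypothetical example: two documents the user judged relevant, one judged nonrelevant.
query = term_vector("famous americans obituary")
relevant = [term_vector("novelist died yesterday mourned family"),
            term_vector("senator died survived son daughter")]
nonrelevant = [term_vector("obituary writing course students")]

for term, weight in sorted(rocchio(query, relevant, nonrelevant).items()):
    print(f"{term}: {weight:.3f}")
```

With these inputs, "died" enters the query with weight 0.75 because both relevant documents contain it, "obituary" is down-weighted to 0.85 because the nonrelevant document contains it, and terms that appear only in the nonrelevant document are dropped. The user never sees these weights, which is what makes this form of feedback system-controlled.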

An alternative approach to system support for query reformulation is for the system to show the user new terms that might be useful for query reformulation, given the terms used in the original query and/or the documents that query retrieved. These terms can be identified through their empirical relationships to the query terms, as determined, for instance, by co-occurrence with the query terms in documents, or by co-occurrence in similar contexts in the database. It is the user’s task in such systems to examine the suggested terms and to reformulate the query manually, given the information provided by the system. Such techniques are typically known as "term suggestion" devices, and can be thought of as user-controlled, at least to the extent that the user controls how the query is reformulated. In this case, the actual terms suggested do not depend upon the user’s response to the system’s results.
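A minimal sketch of a co-occurrence-based term suggestion device follows, assuming the simplest option named above: document-level co-occurrence with raw counts. The function `suggest_terms` and the toy collection are hypothetical; a practical device would at least filter stopwords and use a normalized association measure (for example, the Dice coefficient) rather than raw counts.

```python
from collections import Counter


def suggest_terms(query, documents, k=5):
    """Suggest up to k new terms ranked by document-level co-occurrence.

    A candidate's score is the number of documents that contain it together
    with at least one query term. No stopword filtering or normalized
    association measure is applied here.
    """
    query_terms = set(query.lower().split())
    scores = Counter()
    for doc in documents:
        doc_terms = set(doc.lower().split())
        if doc_terms & query_terms:
            scores.update(doc_terms - query_terms)
    return [term for term, _ in scores.most_common(k)]


# Hypothetical toy collection, already reduced to content words.
collection = [
    "senator died survived son",
    "novelist died survived wife",
    "actor died mourned fans",
    "market rose sharply",
]

print(suggest_terms("died", collection, k=3))
```

Here "survived" ranks first because it co-occurs with "died" in two documents, while terms from the unrelated market document are never suggested. Note that the suggestions depend only on the query and the collection, not on any relevance judgment; deciding which suggestions to add to the query is left entirely to the user.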


At Rutgers, we have been investigating support for query reformulation (that is, recommendation by the system of how a query might be better put), both by comparing relevance feedback with term recommendation and by examining the role of the user’s knowledge of, and control over, such support. One of our early results [6] showed that relevance feedback worked well in an interactive information retrieval environment, and that it worked better still when users had more knowledge of how it worked and more control over its suggestions. That is, a version of relevance feedback in which the user was informed of the basic algorithms used in query reformulation, and in which the terms the system would use to reformulate the query based on the user’s relevance judgments were presented to the user for selection (a term suggestion device), performed consistently better than one in which the user knew only that marking documents relevant would help the system find similar documents. Perhaps more important, the subjects in the experiment preferred the former to the latter by a wide margin, because they felt they had control over, and knowledge of, the query reformulation process. This led us to conclude that explicit term suggestion is a better way to provide system support for query reformulation than automatic, behind-the-scenes query reformulation.

We recently compared our version of relevance feedback as a term suggestion device (in which the user controls the suggested terms by marking documents relevant) with a version of term suggestion in which the user has no control over which terms are suggested [3]. In both systems, users had some knowledge of how the suggested terms were chosen. The primary difference between the two was that users of the relevance feedback-based system had to decide whether documents were relevant before they were offered any suggested terms, whereas in the uncontrolled term suggestion system such terms were displayed at the same time as the query results. Our results indicate that users were willing to give up the control over suggested terms that explicit relevance feedback gave them, in favor of the reduced effort (that is, not having to make both relevance and term selection decisions) required by the uncontrolled term suggestion system.
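The difference between the two interaction flows can be sketched as follows. This is a simplified illustration under stated assumptions: the ranking is a toy term-overlap count, and the uncontrolled flow uses plain pseudo-relevance feedback (treating the top `depth` results as if they were relevant) as a stand-in. The system actually studied used Local Context Analysis [3], which is considerably more elaborate and is not reproduced here. All function names are hypothetical.

```python
from collections import Counter


def rank(query_terms, documents):
    """Toy ranking: order documents by how many query terms they contain (ties keep input order)."""
    return sorted(documents, key=lambda doc: -len(query_terms & set(doc.split())))


def top_terms(documents, exclude, k):
    """Terms occurring in the most documents, excluding the given terms."""
    counts = Counter()
    for doc in documents:
        counts.update(set(doc.split()) - exclude)
    return [term for term, _ in counts.most_common(k)]


def controlled_suggestions(query, judged_relevant, k=3):
    """Relevance-feedback flow: no suggestions until the user has marked documents relevant."""
    if not judged_relevant:
        return []
    return top_terms(judged_relevant, set(query.split()), k)


def uncontrolled_suggestions(query, documents, depth=2, k=3):
    """Suggestions shown with the results, drawn from the top `depth` ranked documents (no judgments)."""
    query_terms = set(query.split())
    results = rank(query_terms, documents)
    return results, top_terms(results[:depth], query_terms, k)


# Hypothetical toy collection, already reduced to content words.
collection = [
    "senator died survived son",
    "novelist died survived wife",
    "market rose sharply",
    "actor died mourned fans",
]

results, suggested = uncontrolled_suggestions("died", collection, k=1)
print(results[0], suggested)
print(controlled_suggestions("died", []))
print(controlled_suggestions("died", [collection[3]]))
```

The controlled flow returns nothing until the user has made relevance judgments, and then suggests terms drawn only from the documents the user chose; the uncontrolled flow returns suggestions alongside the first page of results with no extra decisions required. That extra decision step is the effort the study's participants preferred to avoid.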

What can we make of these results? It seems that user control over system recommendations for query reformulation is important to users with respect to their main task: a good query reformulation. But control (and, therefore, better understanding) of which terms are actually suggested, a subsidiary task, is not very important. Rather, having to engage in the subsidiary task distracts them from what they actually need to do. These conclusions must be understood with several caveats, however. First, users do seem to need some understanding of how the suggested terms are determined in order to be comfortable and effective in using them. Second, the suggested terms need to be perceived as related to the context of the search. Strange or unexpected terms made the subjects uncomfortable, and distracted them from query reformulation and from the search task. These conditions mean that, in order to accept and use the system’s recommendations effectively, users need to have some trust in the system with respect to the suggested terms. They also need to retain control over which of the suggested terms are actually used in the query. Trust with respect to the task not perceived as salient allowed users to accept the recommendations without question. But with respect to the task that is clearly salient, users were not willing to give up their autonomy to the system. These results have clear implications for how recommender systems should operate in general.

The work described here concerns offering support to users of information systems who engage in one particular kind of information-seeking activity—specified searching. Of course, people engage in many other kinds of interactions with information, for instance, browsing, evaluating, using, learning, both within a single information-seeking episode, and across episodes. At Rutgers University, and in collaboration with colleagues elsewhere, we are engaged in a long-term program researching how best to offer support to people in a variety of different information-seeking behaviors [1, 4]. Query formulation and reformulation is just one problem people face in one or more of such activities. Understanding the contents of databases, learning about effective vocabularies, being able to evaluate the relevance of an information object quickly and accurately are other kinds of important problems that people face in their information seeking for which system recommendations could offer useful support. As we have addressed several such challenges, we have seen results similar to those we found in our query reformulation studies: With sufficient reason to trust the system recommendations, users are willing to give up some measure of control, accepting suggestions while maintaining control over how they are applied. We are attempting to apply these results in the design of cooperative, collaborative, dialogue-based information systems where users and the rest of the system each have their own roles and responsibilities, offering and accepting suggestions from one another, as appropriate.

References

    1. Belkin, N.J. Intelligent information retrieval: Whose intelligence? In Herausforderungen an die Informationswissenschaft. Proceedings des 5. Internationalen Symposiums für Informationswissenschaft (ISI '96). J. Krause, M. Herfurth, and J. Marx, Eds. 1996. Universitätsverlag Konstanz, 25–31.

    2. Belkin, N.J. An overview of results from Rutgers' investigations of interactive information retrieval. In Proceedings of the Clinic on Library Applications of Data Processing. P.A. Cochrane and E.H. Johnson, Eds. 1998. Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign, 45–62.

    3. Belkin, N.J., Cool, C., Head, J., Jeng, J., Kelly, D., Lin, S.J., Lobash, L., Park, S.Y., Savage-Knepshield, P., and Sikora, C. Relevance feedback versus Local Context Analysis as term suggestion devices. In Proceedings of the Eighth Text Retrieval Conference (TREC-8). (Washington, D.C., 2000). In press.

    4. Belkin, N.J., Cool, C., Stein, A., and Thiel, U. Cases, scripts and information seeking strategies: On the design of interactive information retrieval systems. Expert Syst. Apps. 9 (1995), 379–395.

    5. Efthimiadis, E. Query expansion. Annual Rev. Info. Sci. Tech. 31 (1996), 121–187.

    6. Koenemann, J. Relevance feedback: usage, usability, utility. Ph.D. Dissertation (1996). Rutgers University, Dept. of Psychology. New Brunswick, NJ.

    7. Rocchio, J. Relevance feedback in information retrieval. In The SMART Retrieval System: Experiments in Automatic Document Processing. G. Salton, Ed. (1971). Prentice-Hall, Englewood Cliffs, NJ, 313–323.

    8. Spink, A. and Losee, R.M. Feedback in information retrieval. Annual Rev. Info. Sci. Tech. 31 (1996), 33–78.


Communications of the ACM (CACM) is now a fully Open Access publication.
