Bias on the Web

Biased search results on product information illustrate a general problem of social importance. The Web is replacing traditional repositories that individuals and organizations turn to for the information needed to solve problems and make decisions.

Indexical bias in a set or list of URLs retrieved in response to a query is a function of emphasis and prominence. It is related to other "quality of information" issues, but it captures an aspect of quality distinct from relevance, accuracy, timeliness, and so on. A collection of items retrieved from a database may exhibit bias whether or not the items are relevant to a user's query. Two extreme cases show why the "whether or not" caveat matters. If a user deems all the retrieved items relevant, there may still be other items, not retrieved, that the same user would also consider relevant. Conversely, a set of items irrelevant to one user might be relevant to another user who issues the very same query.

Given a norm prescribing the expected frequency or prominence of items retrieved in response to a query, a set exhibits bias when some items occur more frequently or prominently than the norm prescribes, while others occur less frequently or prominently [2, 8]. The absence of certain brand names in the refrigerator example signifies bias in the results of a particular engine because other engines do retrieve those brand names. Prominence is reflected in the position a URL occupies in the list of items retrieved for a given search term. The norm used in the research reported here is based on the idea of pooling the results of a basket of search engines. This norm lends itself to a practical measurement scheme.
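The measurement scheme itself is not spelled out here. The Python sketch below shows one plausible way a pooled norm could be turned into a number. It rests on several assumptions that are not taken from the text: each engine's output is a ranked list of URLs for one query; the norm is the sum, over all engines in the basket, of each URL's frequency (or, optionally, a rank-weighted score in which the URL at rank r of n scores n - r + 1); and an engine's bias is 1 minus the cosine similarity between its own vector and the pooled norm. The function names, the rank weighting, and the choice of cosine similarity are illustrative only, and the sketch is not necessarily the scheme used in the reported research.

```python
from collections import Counter
from math import sqrt

# Hypothetical input shape (assumption): engine name -> ranked list of URLs
# returned for one query, best-ranked first.


def score_vector(results, weighted=False):
    """Map each URL in one result list to a score.

    Unweighted: each occurrence counts 1 (frequency only).
    Weighted (assumed scheme): the URL at 1-based rank r in a list of n
    items scores n - r + 1, so higher positions count for more.
    """
    scores = Counter()
    n = len(results)
    for rank, url in enumerate(results, start=1):
        scores[url] += (n - rank + 1) if weighted else 1
    return scores


def pooled_norm(engine_results, weighted=False):
    """Sum the score vectors of every engine in the basket."""
    norm = Counter()
    for results in engine_results.values():
        norm.update(score_vector(results, weighted))  # Counter.update adds scores
    return norm


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as Counters."""
    dot = sum(u[k] * v[k] for k in set(u) | set(v))
    nu = sqrt(sum(x * x for x in u.values()))
    nv = sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def bias(engine, engine_results, weighted=False):
    """Assumed bias measure: 1 - cosine(engine vector, pooled norm).

    Scores are non-negative, so the result lies between 0 and 1
    (up to floating-point rounding).
    """
    norm = pooled_norm(engine_results, weighted)
    own = score_vector(engine_results[engine], weighted)
    return 1.0 - cosine(own, norm)


# Hypothetical example: two engines agree, a third returns different sites.
example = {
    "engine1": ["a.example", "b.example", "c.example"],
    "engine2": ["a.example", "b.example", "c.example"],
    "engine3": ["x.example", "y.example", "z.example"],
}

for name in example:
    print(name,
          round(bias(name, example), 3),
          round(bias(name, example, weighted=True), 3))
```

In this made-up basket, engine3 departs furthest from the pooled results and scores a bias of about 0.553, while engine1 and engine2 score about 0.106. An engine whose list matches the pool exactly would score 0. Passing weighted=True lets position in the list, and not just presence, count toward the measure, which is one way to reflect prominence as well as frequency.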

The only realistic way to counter the ill effects of search engine bias on the ever-expanding Web is to make sure a number of alternative engines are available.