Communications of the ACM

Virtual extension

Ranking billions of web pages using diodes


Introduction

Because of the Web's rapid growth and lack of central organization, Internet search engines play a vital role in helping Web users retrieve relevant information from the tens of billions of documents available. With millions of dollars of potential revenue at stake, commercial Web sites compete fiercely to be placed prominently within the first page returned by a search engine. As a result, search engine optimizers (SEOs) have developed various forms of search engine spamming (or spamdexing) techniques to artificially inflate the rankings of Web pages. Link-based ranking algorithms, such as Google's PageRank, have been largely effective against most conventional spamming techniques.

However, PageRank has three fundamental flaws that, when exploited aggressively, can prove to be its Achilles' heel: first, PageRank gives a minimum guaranteed score to every page on the Web; second, it rewards all incoming links as valid endorsements; and third, it imposes no penalty for linking to low-quality pages. SEOs can push these shortcomings to the extreme by employing an Artificial Web, a collection of an extremely large number of computer-generated Web pages containing many links to only a few target pages. Each page of the Artificial Web collects the minimum PageRank and feeds it back to the target pages. Although the individual endorsements are small, the flaws of PageRank make it possible for an Artificial Web to accumulate sizable PageRank values for the target pages. The SEOs can even download a substantial portion of the real Web and modify only the destinations of the hyperlinks, thus circumventing any detection algorithms based on the quality or the size of pages. As the size of an Artificial Web can be comparable to that of the real Web, SEOs can seriously compromise the objectivity of the results that PageRank provides. Although some statistical measures can be employed to identify specific attributes associated with an Artificial Web and filter its pages out of search results, it is far more desirable to develop a new ranking model that is free of such exploits to begin with.
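
The effect described above can be illustrated with a minimal sketch. It assumes the commonly cited damped formulation of PageRank with a damping factor of 0.85 and plain power iteration; the article's excerpt does not specify these details, and the toy graph, the page names, and the helper `build_web` are hypothetical. Each generated "farm" page has no incoming links, so it holds only the minimum score, yet it passes most of that score on to the target.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank (assumed standard damped formulation).

    links maps each page to the list of pages it links to.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Flaw 1: every page receives a guaranteed minimum score.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outs in links.items():
            # Pages without outgoing links spread their score evenly.
            dests = outs if outs else pages
            share = damping * rank[page] / len(dests)
            for dest in dests:
                # Flaw 2: every incoming link counts as an endorsement.
                new_rank[dest] += share
        rank = new_rank
    return rank


def build_web(farm_size):
    """Hypothetical toy Web: three real pages, one target, plus a link farm."""
    links = {
        "a": ["b", "c"],
        "b": ["a", "c"],
        "c": ["a"],
        "target": ["a"],
    }
    # Artificial Web: generated pages whose only link points at the target.
    for i in range(farm_size):
        links[f"farm{i}"] = ["target"]
    return links


if __name__ == "__main__":
    for size in (0, 50):
        score = pagerank(build_web(size))["target"]
        print(f"farm pages: {size:3d}  target score: {score:.4f}")
```

In this toy setting the target's score rises once the farm is added, even though no real page links to it. The sketch only illustrates the mechanism; it says nothing about the scale of an actual Artificial Web or about how any production search engine computes its rankings.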
