
Communications of the ACM

ACM TechNews

Extracting Meaning From Millions of Pages



University of Washington researchers have developed an automated information extraction engine that mines meaning from more than 500 million Web pages contributed by Google, working by analyzing the fundamental relationships between words. The project extends the TextRunner system both in the number of pages it can process and in the breadth of topics it can examine. "The significance of TextRunner is that it is scalable because it is unsupervised," says Google research director Peter Norvig. "It can discover and learn millions of relations, not just one at a time. With TextRunner, there is no human in the loop: It just finds relations on its own."

University of Washington researcher and project leader Oren Etzioni says the prototype still has a simple interface and is intended to demonstrate automated information extraction rather than to serve as a public search tool. "This work reflects a growing trend toward the design of search tools that actively combine the pieces of information they find on the Web into a larger synthesis," notes Cornell University scientist Jon Kleinberg. The University of Washington researchers are now working on drawing inferences from natural-language queries, using TextRunner as a starting point.

From Technology Review

Abstracts Copyright © 2009 Information Inc., Bethesda, Maryland, USA

