
True or False?

A paper by Google researchers suggests the possibility of incorporating more data on the "correctness" of information into search engine results.

For the third time in less than five years, a search engine algorithm designed by Google has generated a buzz. The researchers call the algorithm Knowledge-Based Trust (KBT); it is intended to determine the "correctness of factual information provided by the source," boosting a website toward the top of search results not just because it displays keywords designed to draw Internet users to a specific site, but because it is "trustworthy."

Right now, this advance is purely theoretical. A Google spokesperson said, "This was research. We don’t have any specific plans to implement it in our products. We publish hundreds of research papers every year." Nevertheless, the study is compelling in its scale: it reports gleaning "over 2.8 billion facts extracted from the Web," and then estimating "the trustworthiness of 119 million webpages."

Google is known for implementing improvements to its search engine over time, such as Knowledge Graph, which was introduced in 2012. Google created Knowledge Graph as a means of better understanding user queries by building information about connections between objects in the real world. Knowledge Graph extracts data from unstructured information on web pages in order to create a structured database of people, places, and things on the Internet, and the relationships between them all. Knowledge Graph shows up as a panel next to search results, augmenting them with broader information on the topic for easy user access. The Knowledge Graph contains information about "more than 500 million entities in the world, and more than 3.5 billion facts and relationships between those entities."

Last year, Google introduced Knowledge Vault (KV), a "web-scale probabilistic knowledge base that combines extractions from Web content (obtained via analysis of text, tabular data, page structure, and human annotations) with prior knowledge derived from existing knowledge repositories." The resulting knowledge base (KB), which fused together "multiple extraction sources with prior knowledge derived from an existing KB," is "about 38 times bigger than existing automatically constructed knowledge bases," according to its developers.

Still, what if the information found is interesting, but not truthful or accurate? Google researchers, realizing "the facts extracted by automatic methods such as KV may be wrong," proposed using Knowledge-Based Trust to estimate "source trustworthiness" by extracting "a plurality of facts from many pages using information extraction techniques," then jointly estimating "the correctness of these facts and the accuracy of the sources using inference in a probabilistic model." 
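The idea of jointly estimating fact correctness and source accuracy can be illustrated with a simple fixed-point computation. The sketch below is purely illustrative: the site names and claims are invented, and the actual KBT model is a far more elaborate probabilistic inference than this toy loop.

```python
# Illustrative sketch of joint source-trust / fact-confidence estimation,
# in the spirit of Knowledge-Based Trust. All names and data are hypothetical.

# Each source asserts a set of (subject, value) claims; claims about the
# same subject with different values conflict with one another.
claims = {
    "site-a": {("capital_of_france", "Paris"), ("pay_taxes_to", "IRS")},
    "site-b": {("capital_of_france", "Paris")},
    "site-c": {("capital_of_france", "Lyon")},
}

def estimate(claims, iterations=20):
    # Start by trusting every source equally.
    trust = {source: 0.5 for source in claims}
    confidence = {}
    for _ in range(iterations):
        # A fact's confidence is the share of trust held by the sources
        # asserting it, among all sources making a claim on that subject.
        support, subject_total = {}, {}
        for source, facts in claims.items():
            for subj, val in facts:
                support[(subj, val)] = support.get((subj, val), 0.0) + trust[source]
                subject_total[subj] = subject_total.get(subj, 0.0) + trust[source]
        confidence = {
            (subj, val): weight / subject_total[subj]
            for (subj, val), weight in support.items()
        }
        # A source's trust is the average confidence of the facts it asserts.
        trust = {
            source: sum(confidence[fact] for fact in facts) / len(facts)
            for source, facts in claims.items()
        }
    return trust, confidence

trust, confidence = estimate(claims)
```

Under this scheme, a source repeating a widely corroborated fact gains trust, while an outlier asserting a conflicting value loses it, and the two estimates reinforce each other across iterations.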

"There’s no mathematical formula to determine ‘fact,’" points out Bryan S. Murley, an associate professor of new and emerging media at Eastern Illinois University. Murley wondered what a KBT search on the firing of a host on TV’s Top Gear would label as the most reliable resource: "I [could] find 100-200 sites carrying the same information. Which is the most credible?"

"One link from IRS.gov is worth more than 20 kindergarten websites talking about ‘daddy pays taxes,’" points out Kevin Lee, CEO of search engine marketing firm Didit, which helps companies show up higher in search engine results, among other services. Lee notes that a search on the term "pay taxes" might still rank children’s uninformative letters on the topic uppermost in search results.

For almost 20 years, the purported link between "vaccines" and "autism" has demonstrated how easy it is to make untrue health information seem trustworthy. It started with a study relating the two, published in 1998 in Britain’s The Lancet. After the study was found to be fraudulent, the journal retracted it and apologized; the full retraction came in 2010. Yet scientific sources must still periodically dismiss the vaccine/autism association, because it continues to be repeated in a variety of media (most recently by activist Robert F. Kennedy Jr. in People magazine).

Lee said the good news about a potential implementation of the KBT algorithm in the Google search engine would be that it raises "the bar for content creators like us. We can’t just run material that’s not supported by facts."

Tim Farley, a software engineer who has been developing tools to combat misinformation on the Internet since 2008, said search engine optimization (SEO) developers would do "whatever it takes to rank their client more highly" in search engine results. That means there is a "constant battle of technology," he said, between search engines demoting manipulative pages and SEO developers working to restore their top rankings.

Is the Google study’s objective of trustworthiness truly innovative? In July 2014, Bing posted an FAQ emphasizing its progress in delivering more reliable search results to its users. The LazyTruth app already helps skeptical searchers by debunking viral rumors that show up in email inboxes.

Overall, Farley offers this reminder: "It’s often a long ways between a piece of research and an actual piece of technology."

Wendy Meyeroff is a health and technology writer based in the Baltimore area of Maryland.
