
Beyond Search

The latest academic research tools do not just find papers; they analyze them.

Earlier this month, the Allen Institute for Artificial Intelligence launched a new kind of academic search tool. The service, Semantic Scholar, employs machine learning techniques to analyze millions of research papers, and its main goal is similar to that of the leading technology in the field, Google Scholar. Both are designed to make scientists more efficient and productive in the age of information overload.

"Science is expanding ever more rapidly and tools like this are really what we need to stay on top of the latest developments," says computer scientist Andrew McCallum of the University of Massachusetts Amherst. "The question is what kinds of tools are the most useful?"

An increasing number of researchers are looking to create systems that generate more than a ranked list of relevant papers. Thanks to advances in machine learning, projects like Semantic Scholar — along with efforts from Microsoft Research, computer scientist Lee Giles of Pennsylvania State University, and others — are scouring the growing volume of scientific research in a new way.

Google Scholar is the most popular tool in part because it is fast, free, and comprehensive. The service covers a wide range of academic disciplines and even searches content behind paywalls. Semantic Scholar analyzes only open-access papers in computer science, but its results are often richer and more detailed. For instance, a search for "deep learning" on Semantic Scholar initially generates a ranked list of more than 120,000 papers. After you constrain the date range, key phrases, and authors, that list might narrow to three or four publications, each of which appears with a detailed overview.

This alone is helpful, according to experts, but clicking on one of these papers triggers another layer of analysis. On the new page, in addition to showing the abstract and author list, Semantic Scholar pulls out figures and tables, generates its own key phrases, and ranks the references within the paper according to how strongly they influenced the content. "If a paper is being cited once, then maybe it’s a courtesy," explains Allen AI CEO Oren Etzioni, "but if it’s in a caption to a figure or it’s being mentioned over and over again, then those are variables indicating a close relationship." Users can then scroll over these references to read the full citation excerpts.
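The signals Etzioni describes, repeated mentions and citations inside figure captions, can be illustrated with a toy scoring function. This is a minimal sketch, not Semantic Scholar's actual method: the `Mention` record, the `rank_references` function, and the `caption_bonus` weight are all hypothetical names and values chosen for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Mention:
    """One in-text citation of a reference (hypothetical record)."""
    ref_id: str       # which reference is being cited
    in_caption: bool  # True if the citation sits in a figure or table caption


def rank_references(mentions, caption_bonus=2.0):
    """Return reference ids ordered from most to least influential.

    Every mention adds 1 point; a mention inside a caption adds an
    extra caption_bonus. The weight is an assumption, not a published value.
    """
    scores = Counter()
    for m in mentions:
        scores[m.ref_id] += 1.0
        if m.in_caption:
            scores[m.ref_id] += caption_bonus
    # Sort by descending score, breaking ties alphabetically.
    return sorted(scores, key=lambda ref: (-scores[ref], ref))


mentions = [
    Mention("A", False),                        # cited once: maybe a courtesy
    Mention("B", False), Mention("B", False),   # cited repeatedly...
    Mention("B", True),                         # ...and in a caption
    Mention("C", True),                         # cited once, but in a caption
]
ranking = rank_references(mentions)
print(ranking)
```

In this toy data, the reference cited once in passing lands at the bottom of the list, while the one mentioned repeatedly and in a caption rises to the top.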

Semantic Scholar developed an alternative approach to key phrases as well. Giles, a pioneer in the field, notes that searching an academic paper is easy on one level, since everything is labeled, including the tables, figures, authors, and keywords. Yet Etzioni says the keywords authors choose for their own papers are often too high-level or too specific to be useful. "They’re not really getting at this question of, ‘hey do I really want to read this paper? What is it really about?’" he notes. Semantic Scholar analyzes the frequency of phrases inside the paper, the sections in which they appear, and even the terms and phrases used to refer to the work in other publications. When a paper cites another article, Semantic Scholar will scour that reference for potential key phrases as well.
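One way to picture combining those three signals (phrase frequency, the section a phrase appears in, and the wording other papers use when citing the work) is a simple weighted count. The sketch below is only an illustration under stated assumptions: the section weights, the `CITING_WEIGHT` value, and the `score_phrases` function are invented here and do not describe Semantic Scholar's implementation.

```python
from collections import defaultdict

# Assumed weights: a phrase in the title counts more than one in the body.
SECTION_WEIGHTS = {"title": 3.0, "abstract": 2.0, "body": 1.0}
# Assumed weight for a phrase used by another paper that cites this one.
CITING_WEIGHT = 1.5


def score_phrases(candidates, sections, citing_contexts):
    """Rank candidate phrases by weighted frequency.

    sections maps a section name to its text; citing_contexts is a list
    of sentences from other papers that refer to this paper.
    """
    scores = defaultdict(float)
    for phrase in candidates:
        needle = phrase.lower()
        for name, text in sections.items():
            weight = SECTION_WEIGHTS.get(name, 1.0)
            scores[phrase] += weight * text.lower().count(needle)
        for context in citing_contexts:
            scores[phrase] += CITING_WEIGHT * context.lower().count(needle)
    # Highest score first; ties broken alphabetically.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


sections = {
    "title": "Deep learning for image search",
    "abstract": "We apply deep learning to image search.",
    "body": "Neural networks power the ranking. Deep learning helps.",
}
citing_contexts = ["Their image search system uses neural networks."]
candidates = ["deep learning", "image search", "neural networks"]

ranked = score_phrases(candidates, sections, citing_contexts)
for phrase, score in ranked:
    print(f"{phrase}: {score}")
```

Here "image search" edges ahead of "deep learning" because a citing paper uses that phrase to describe the work, which mirrors the idea of treating outside descriptions as evidence of what a paper is really about.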

One of the common goals of these efforts is to help researchers quickly find the work they need, and prevent them from wasting time scouring the web. At Microsoft, computer scientist Alex Wade says the company is experimenting with improving academic search and general search by focusing more on user interaction. "One of the things we’re trying to do is put the librarian back into the equation," Wade says. "As a librarian, when someone asks you a question, you don’t say, ‘here’s 1.3 million books.’ You ask clarifying questions. Then you use that information to present the person with a more refined list or even a direct answer."

The long-term vision is to use Microsoft’s Cortana technology to ask questions in response to an initial query, then generate better results based on the user’s answers.

While the focus for now is on making researchers more efficient, there are grander plans in the works for several of these systems. One popular idea is to begin mining the content and data within papers to produce new methodologies and even scientific hypotheses. Etzioni insists this is a realistic goal for Semantic Scholar, and that it could represent an entirely new way of helping scientists become more productive. "Over the past 10 years, the Google Scholar team has truly revolutionized research," says Etzioni. "To me, the idea of making the world’s scientists more effective is very inspiring. So the question is: can we take it to the next level?"

Gregory Mone is a Boston, MA-based writer and the author of the novel Dangerous Waters.

