From the very early days of the World Wide Web, researchers identified a need to be able to understand the semantics of the information on the Web in order to enable intelligent systems to do a better job of processing the booming Web of documents. Early proposals included labeling different kinds of links to differentiate, for example, pages describing people from those describing projects, events, and so on. By the late 1990s, this effort had led to a broad area of computer science research that became known as the Semantic Web [1]. In the past decade and a half, the early promise of enabling software agents on the Web to talk to one another in a meaningful way inspired advances in a multitude of areas: defining languages and standards to describe and query the semantics of resources on the Web; developing tractable and efficient ways to reason with and query these representations; understanding patterns in describing knowledge; and defining ontologies that describe Web data to allow greater interoperability.
In fact, Semantic Web research and practice spanned the spectrum from focusing on expressivity and reasoning on the Web [4] to providing an ecosystem of linked data that allows data resources to link to one another explicitly through shared naming and equivalence statements across repositories [2]. Arguably, the far ends of this spectrum were ignoring the messiness of the real Web in the former case, and were not providing enough perceivable value because of lack of any organization or semantics in the latter. However, in between, there was a broad "sweet spot" where the work coming out of these communities has led to contributions that have gone beyond research and led to undeniable advances in the way that the Web works today:
The list goes on.
As the early research has transitioned into these larger, more applied systems, today's Semantic Web research is changing: It builds on the earlier foundations but it has generated a more diverse set of pursuits. As the knowledge graphs mentioned previously increasingly use semantic representations, they have driven the functionality of a new generation of apps (mobile healthcare, mapping and shopping assistants, and others). As these applications became increasingly crucial to advertising and e-commerce, the representations they used became less formal and precise than many early Semantic Web researchers had envisioned.
As developers strive to provide structure and organization beyond just linking of data, they are not making very much use of the formal semantics that were standardized in the Semantic Web languages. Modern semantic approaches leverage vastly distributed, heterogeneous data collection with needs-based, lightweight data integration. These approaches take advantage of the coexistence of a myriad of different, sometimes contradictory, ontologies of varying levels of detail without assuming all-encompassing or formally correct ontologies. In addition, we are beginning to see the increased use of textual data that is available on the Web, in hundreds of languages, to train artificially intelligent agents that will understand what users are trying to say in a given context and what information is most pertinent to users' goals at a given time. These projects are increasingly leveraging the semantic markup that is available on the Web; for example, the IBM Watson "Jeopardy!"-playing program made use of taxonomies and ontologies (such as DBpedia and YAGO) to increase performance significantly [3].
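To make the idea of needs-based, lightweight integration concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not the method of any system mentioned above: the two sources, their identifiers, and their values are invented, and the equivalence pairs merely stand in for the kind of equivalence statements used in linked data. The sketch merges records that name the same entity and keeps contradictory claims side by side, each tagged with its source, instead of forcing a single correct answer.

```python
from collections import defaultdict

# Hypothetical records from two independent sources; identifiers and values are invented.
SOURCE_A = {"a:Zurich": {"population": 402762, "country": "Switzerland"}}
SOURCE_B = {"b:Zuerich": {"population": 415367, "label": "Zürich"}}

# Hypothetical equivalence statements linking names across the two sources.
SAME_AS = [("a:Zurich", "b:Zuerich")]


def canonical_names(same_as):
    """Map each linked identifier to one representative per equivalence class."""
    parent = {}

    def find(name):
        parent.setdefault(name, name)
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving
            name = parent[name]
        return name

    for left, right in same_as:
        root_left, root_right = find(left), find(right)
        if root_left != root_right:
            # Use the lexicographically smaller name as the representative.
            parent[max(root_left, root_right)] = min(root_left, root_right)
    return {name: find(name) for name in list(parent)}


def integrate(sources, same_as):
    """Merge records, keeping every claimed value together with its source."""
    canon = canonical_names(same_as)
    merged = defaultdict(lambda: defaultdict(set))
    for source_name, records in sources.items():
        for entity, properties in records.items():
            key = canon.get(entity, entity)
            for prop, value in properties.items():
                merged[key][prop].add((value, source_name))
    return merged


def conflicts(merged):
    """List properties on which sources disagree, without resolving them."""
    return {
        (entity, prop): sorted(claims, key=str)
        for entity, props in merged.items()
        for prop, claims in props.items()
        if len({value for value, _ in claims}) > 1
    }


merged = integrate({"A": SOURCE_A, "B": SOURCE_B}, SAME_AS)
print(conflicts(merged))
```

An application can then decide, according to its own needs, whether to pick one claim, show both, or ignore the property altogether; nothing in the merged data assumes that either source is formally correct.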
In addition to the increasing amount of semantically annotated information on the Web, a lot more structured data is becoming available. This data includes information from scientists and governments publishing data on the Web and the ever-increasing amount of information available about each of us, individually and as societies, in the form of our social interactions, location and health data, activities, and interests. Harnessing this data, and understanding its diverse and often contradictory nature, to provide truly meaningful services and to improve the quality of our lives, is something that researchers in both industry and academia are beginning to tackle. Statistical and machine-learning methods are becoming more powerful, and computational resources continue to improve. Thus, some of the semantic knowledge that researchers once had to construct manually can now be learned automatically, tremendously increasing the scale at which semantics can be used to understand and process Web data. While manually constructed formal ontologies may often (but not always) be required to form a backbone of semantics for the Web, much of the content that puts "meat" on those bones is "scruffy" and imprecise, often statistically induced. Indeed, the ontologies themselves might be learned or enhanced automatically. As the semantics, in a sense, becomes more "shallow," it could be more widely applicable [5]. Consequently, our very understanding of the nature of the semantics that intelligent systems produce and leverage is changing, and with it, our vision for the future of the Semantic Web.
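As a rough illustration of what "statistically induced" semantics can look like, the following Python sketch proposes "is a kind of" links between types purely from how often the types co-occur on the same entities. It is a toy under stated assumptions: the entity annotations are invented, the function name and the thresholds `min_confidence` and `min_support` are hypothetical, and real systems use far richer evidence. The point is only that a shallow, noise-tolerant hierarchy can be derived from data rather than authored by hand.

```python
from collections import Counter
from itertools import permutations

# Hypothetical type annotations harvested from Web markup; all data is invented.
ENTITY_TYPES = {
    "e1": {"Person", "Scientist"},
    "e2": {"Person", "Scientist"},
    "e3": {"Person", "Artist"},
    "e4": {"Person"},
    "e5": {"Organization"},
    "e6": {"Scientist"},  # noisy entry: the Person annotation is missing
}


def induce_subclass_links(entity_types, min_confidence=0.6, min_support=2):
    """Propose (child, parent) links when most entities typed child are also typed parent."""
    type_counts = Counter()
    pair_counts = Counter()
    for types in entity_types.values():
        type_counts.update(types)
        # Count every ordered pair of types that appear together on one entity.
        pair_counts.update(permutations(sorted(types), 2))

    links = {}
    for (child, parent), together in pair_counts.items():
        confidence = together / type_counts[child]  # estimate of P(parent | child)
        reverse = together / type_counts[parent]    # estimate of P(child | parent)
        # Require enough evidence, and a clearly stronger forward direction,
        # so that near-equivalent types are not linked both ways.
        if together >= min_support and confidence >= min_confidence and confidence > reverse:
            links[(child, parent)] = round(confidence, 2)
    return links


print(induce_subclass_links(ENTITY_TYPES))
```

With the invented data above, the only link proposed is from `Scientist` to `Person`, even though one scientist lacks the `Person` annotation. A strictly logical reading would treat that entity as a counterexample; the statistical reading tolerates it.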
As we look at the next decade of the Semantic Web, we believe these trends will continue to fuel new demands on Web researchers, and they lead us to formulate a new set of research challenges. We believe the objective of the next decade of Semantic Web research is to make this vast heterogeneous multilingual data provide the fuel for truly intelligent applications.
Achieving this objective will require research that provides more meaningful services and that relies less on logic-based approaches and more on evidence-based ones. We note the rubrics listed here are not all that different from the challenges we faced in the past, but the methods, the scale, and the form of the representation languages change drastically. We present questions under each of the rubrics to guide this research.
In short, bringing a new kind of semantics to the Web is becoming an increasingly important aspect of making Web data smarter and getting it to work for us. We believe our fellow computer scientists can both benefit from the additional semantics and structure of the data available on the Web and contribute to building and using these structures, creating a virtuous circle. The techniques of the early Semantic Web research have defined many of the parameters that we need in order to understand these new approaches and have provided important data resources to the community exploring how to build new Web-based applications. Continued research into Web semantics holds incredible promise, but only if we embrace the challenges of the modern and evolving Web.
5. Meusel, R., Petrovski, P., and Bizer, C. The WebDataCommons Microdata, RDFa and Microformat Dataset Series. In P. Mika et al., Eds. The Semantic Web – ISWC 2014 SE-18 (Vol. 8796, 2014), Springer International Publishing, 277–292; DOI: 10.1007/978-3-319-11964-9_18.
6. Tudorache, T., Nyulas, C., Noy, N., and Musen, M. Using Semantic Web in ICD-11: Three Years Down the Road. In H. Alani et al., Eds. The Semantic Web – ISWC 2013 SE-13 (Vol. 8219, 2013), Springer Berlin Heidelberg, 195–211; DOI: 10.1007/978-3-642-41338-4_13.
The Digital Library is published by the Association for Computing Machinery. Copyright © 2016 ACM, Inc.
The following letter was published in the Letters to the Editor in the December 2016 CACM (http://cacm.acm.org/magazines/2016/12/210384).
I was eager to learn about the latest developments in the Semantic Web through the lens of a "new kind of semantics" as Abraham Bernstein et al. explored in their Viewpoint "A New Look at the Semantic Web" (Sept. 2016), but by the end I had the impression the entire vision of a Semantic Web was somehow at risk.
If I understand it correctly, semantics is a mapping function that leads from manifest expressions to elements in a given arbitrary domain. Based on set theory, logicians have developed a framework to set up such a mapping for formal languages like mathematics, provided one can fix an interpretation function. On the other hand, 20th-century logicians (notably Alfred Tarski) warned of the limits of the framework when applied to human languages. Now, to the extent it embraces a set-theoretic semantics (as in the W3C's Web Ontology Language), the Semantic Web seems to be facing, and suffering from, exactly such limitations.
Most Web content is expressed as natural language, and it is not easy for programmers to bring it into clean logical form; meanwhile, Percy Liang's article "Learning Executable Semantic Parsers for Natural Language Understanding" (also Sept. 2016) gave an idea of the early stage of "semantic parsing," or the task of obtaining a formal representation of the meaning of a given text. It seems the "new semantics" in Bernstein et al., albeit not formally characterized, was an attempt to outline a better approach to tapping the linguistic nature of the Web, which is indeed remarkable.
In taking a language-oriented view, however, Bernstein et al. seemed to neglect a key feature of formal semantics — transparency. They seem comfortable with the relaxation of logic as a conceptual framework for the Semantic Web, which is typical of modern Knowledge Graphs (such as the one Google uses). But one of the consequences of such relaxation is that part of data semantics ends up being embedded in algorithms. Not only practitioners but also common users are aware that algorithms that work on Web data are embedded in only a few monolithic, private platforms that are far from open, transparent, and auditable.
Isn't keeping meanings in a handful of proprietary algorithms exactly the opposite of what the Semantic Web was meant to be?
As we mentioned in the Viewpoint, the Semantic Web is not just about texts but also about myriad data, images, video, and other Web resources. While a formal logic that could be both transparent enough for all such resources and yet usable by Web developers is a noble ambition, current logics are simply not up to the task. The transparency of "some semantics" is the best to hope for and would allow all potential developers to build Web-scale, best-effort applications.
Abraham Bernstein, Zürich, Switzerland
James Hendler, Troy, NY
Natalya Noy, Mountain View, CA