To capture cultural heritage is to capture the experience of people who are directly involved in creating, witnessing, and maintaining cultural heritage objects. Ideally, the people accessing digital representations of cultural heritage objects are able to understand the significance underlying the objects. The question is how to capture (the experience of) cultural objects in digital form. Various modalities exist for representing cultural heritage: unstructured textual data, possibly including images or videos, as well as structured data.
To illustrate our approaches, we pick one cultural heritage representative: rendang, one of the five national dishes of Indonesia, believed to have existed as early as the 15th century. From textual sources, we may learn about rendang and its history. According to Nurmufida et al.,8 rendang is a traditional cuisine originating from West Sumatra, with beef and coconut milk as its main ingredients. From image sources, we may see what rendang looks like. For example, Wikimedia Commons contains images of rendang that show that, despite being similar to curry, rendang is actually drier. Next, we may wonder how to cook rendang. A simple YouTube search provides a wide selection of videos showing how to cook rendang, with the chefs ranging from local Indonesians to international chefs. In fact, the word rendang originates from how it is cooked; that is, slowly (merandang, in the Minangnese language).
Much information about heritage is already available in traditional books. Recipe variations for rendang are published in many cookbooks, and its cultural significance can be traced through the centuries in novels, newspapers, and historic books. Many of these can now be accessed through their digitized versions, for example, via the HathiTrust Digital Library (www.hathitrust.org). A quick search for rendang returns cookbooks as well as introductions to the Indonesian language and to Javanese culture. However, not all search results refer to the dish; for example, Raffles10 refers to "Mangsa rendang" as the season of rain, and Rendang is also a district in Bali. The same search will also miss references to kalio, a wetter version of rendang. Potential resources in other languages (using spelling variations) may also be lost. Thus, both false positives (wrong hits) and false negatives (omitted hits) may occur in the search results.
Semantic text analysis is one approach to make such hidden information accessible and to help avoid irrelevant search results. The Capisco system6 avoids the need for complete semantic text markup by using an automatically generated Concept-in-Context (CiC) network. The network is seeded by semantic concepts and their context of use as identified from Wikipedia texts. When doing a semantic search for rendang in Capisco, the user would specify that they are interested in the dish; consequently, only those digital sources that contain words in the context of cooking and eating are selected. Texts about the Rendang district in Bali, as well as the rainy season, would be excluded from search results (as they do not refer to the dish). In the Capisco CiC network, both rendang and kalio are semantic concepts that are flagged as potentially synonymous in the context of Indonesian cooking. Thus, the search results would contain not only all texts referring to rendang recipes, but also those containing kalio recipes. Capisco's semantic index can be exported to be used as data enrichment in existing digital libraries.5 Capisco has been shown to provide quality semantic search results for English-language texts, with promising early results for other languages. The support for cross-language semantic search and multilingual texts (particularly relevant in bicultural New Zealand) is currently being investigated.
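The idea of selecting documents by the context in which a term appears can be illustrated with a minimal sketch. This is not Capisco's actual algorithm or data model: the sense labels, context word lists, sample documents, and function names below are all hypothetical, chosen only to mirror the rendang example.

```python
import re

# Hypothetical miniature "concept in context" table. Each sense of the
# ambiguous term lists the surface terms that may denote it and some words
# that typically surround it. These lists are illustrative assumptions,
# not data taken from Capisco or Wikipedia.
SENSES = {
    "rendang (dish)": {
        "terms": {"rendang", "kalio"},  # kalio treated as a near-synonym
        "context": {"beef", "coconut", "cook", "recipe", "spices", "dish"},
    },
    "Rendang (district)": {
        "terms": {"rendang"},
        "context": {"bali", "district", "village", "regency"},
    },
    "rendang (season)": {
        "terms": {"rendang"},
        "context": {"mangsa", "season", "rain", "rains"},
    },
}


def tokenize(text):
    """Return the set of lowercase alphabetic words in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def matches_sense(text, sense, min_context=1):
    """True if the text names the concept and shares enough context words."""
    words = tokenize(text)
    entry = SENSES[sense]
    has_term = bool(words & entry["terms"])
    context_hits = len(words & entry["context"])
    return has_term and context_hits >= min_context


def semantic_search(documents, sense):
    """Return the ids of documents that match the requested sense."""
    return [doc_id for doc_id, text in documents.items()
            if matches_sense(text, sense)]


# Made-up sample documents for illustration only.
DOCUMENTS = {
    "cookbook": "Simmer the beef in coconut milk and spices until the rendang is dry.",
    "kalio_recipe": "Kalio is a wetter dish; cook it for a shorter time.",
    "travel": "Rendang is a district in Bali.",
    "almanac": "Mangsa rendang is the season of rain.",
}

print(semantic_search(DOCUMENTS, "rendang (dish)"))
```

In this toy setting, a plain keyword search for "rendang" would return the travel and almanac texts (false positives) and miss the kalio recipe (a false negative), whereas the sense-aware search returns only the two cooking texts.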
In addition to seeding the CiC network via Wikipedia, Capisco also allows scholars to develop and refine their own set of relevant concepts. The system is unique in that the scholar is supported from initial exploration of digitized documents through to the creation of a publicly accessible collection.4 Particularly relevant for the Asia-Pacific region are Capisco's use cases of heritage collections and digital repatriation. While rich collections of historic documents exist for many South Pacific Island nations, the identification of these widely scattered documents and their compilation into coherent collections is challenging for any individual nation. The lack of access to records and documents is severely limiting, and many scholars in Pacific nations must manually build their own collections to support their research. The preservation of heritage information in digital libraries is a recurring theme in the ICADL series of Asia-Pacific Conferences on Digital Libraries.
In addition to human consumption, it is of particular importance that cultural heritage information can also be consumed in a machine-friendly way, realized through structured data. One of the local initiatives to capture structured data about cultural heritage artifacts is by Putra et al.,9 who developed BudayaKB, a knowledge base storing Indonesian cultural heritage metadata. BudayaKB extracts entities of cultural heritage from Indonesian Wikipedia and presents the types and locations of the entities using machine-readable RDF triples.12 Through BudayaKB, applications for cultural heritage can be developed easily, as the data is captured into a structured form ready to be queried. BudayaKB contains around 3,200 cultural heritage entities; a third of those are about food. Data in BudayaKB is also linked to Wikidata, a crowd-sourced knowledge-base hub. Information about rendang can be found in BudayaKB, where it is recorded as a type of traditional food coming from West Sumatra. Figure 1 shows a SPARQL query,11 asking for traditional foods of provinces in Indonesia.
The query results, filtered to focus on Sumatera Barat (West Sumatra), show not only rendang, but also other traditional foods from the same region like terung balado (eggplants with chilies) and ayam pop (steamed chicken). From the links to Wikidata, one can learn that rendang is not only a beef dish, but also has variations with chicken, fish, and lamb. These examples demonstrate how structured data may facilitate knowledge discovery in cultural heritage.
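The kind of lookup described above can be sketched in a few lines of Python over an in-memory list of triples. This is not the SPARQL query from Figure 1 and does not use BudayaKB's actual vocabulary: the predicate names `type` and `province`, the class name `TraditionalFood`, and the helper functions are hypothetical, and the gudeg entry is added here only to give the filter something to exclude.

```python
# Hypothetical (subject, predicate, object) triples. Predicate and class
# names are illustrative assumptions, not BudayaKB's real vocabulary.
TRIPLES = [
    ("rendang", "type", "TraditionalFood"),
    ("rendang", "province", "Sumatera Barat"),
    ("terung balado", "type", "TraditionalFood"),
    ("terung balado", "province", "Sumatera Barat"),
    ("ayam pop", "type", "TraditionalFood"),
    ("ayam pop", "province", "Sumatera Barat"),
    # Not from the article; included so the province filter has an effect.
    ("gudeg", "type", "TraditionalFood"),
    ("gudeg", "province", "Daerah Istimewa Yogyakarta"),
]


def match_pattern(pattern, triple, bindings):
    """Extend bindings so that pattern matches triple, or return None.

    Terms starting with '?' are variables; all other terms must be equal.
    """
    new = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # Bind the variable, or check it against its earlier binding.
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new


def query(patterns, triples):
    """Join a list of triple patterns, similar to a basic graph pattern."""
    solutions = [{}]
    for pattern in patterns:
        extended_solutions = []
        for bindings in solutions:
            for triple in triples:
                extended = match_pattern(pattern, triple, bindings)
                if extended is not None:
                    extended_solutions.append(extended)
        solutions = extended_solutions
    return solutions


# "Traditional foods and the provinces they come from."
rows = query(
    [("?food", "type", "TraditionalFood"),
     ("?food", "province", "?province")],
    TRIPLES,
)

# Filter the results to West Sumatra, as in the text.
west_sumatra = [r["?food"] for r in rows if r["?province"] == "Sumatera Barat"]
print(west_sumatra)
```

A real application would send an equivalent SPARQL query to the knowledge base's endpoint; the sketch only shows why data held as typed, located entities can be queried directly without parsing free text.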
The main challenge for effective and sustainable cultural heritage preservation is the development of tools to support domain experts in curating collections of cultural heritage information, rather than hand-crafting code for creating and maintaining specific digital heritage objects. One such tool is the Greenstone digital library system that allows collection owners to index and present a searchable and browsable version of their documents online. The Niupepa collection of historic Māori newspapers (see Figure 2) uses the Greenstone system for presenting heritage information.2
Many existing computational techniques focus on storage and processing of digital data, but heritage information poses the additional challenge of preserving or returning the cultural context to objects in a heritage collection. Because heritage documents and digital objects are scattered across a variety of resources, they are often stripped of their interpretation and the connection to the experiences of their originating people. Further development of any of these tools must be based on collaboration between indigenous domain experts and software engineers to ensure these connections are renewed and the cultural objects are treated with appropriate respect.
Recent years have seen renewed debates about the return of heritage objects to indigenous communities.3 However, for intangible cultural heritage objects, such as information about the traditions of rendang, simple return is not a viable option and digital repatriation offers a possible alternative. Capisco is a suitable tool to semantically search vast existing collections to identify relevant documents, such as missionary reports, anthropological monographs, early geographic surveys, and Victorian tourist reports. These then can be compiled into portable collections, which may be returned to the indigenous peoples whose images, cultures, and histories were captured in the identified documents. In this challenging work, the rights of indigenous people relating to the collection and curation of data about their culture and heritage must be acknowledged.7 Appropriate processes and data representations need to be developed by, with, and under guidance of the affected communities.1
4. Cunningham, S.J., Hinze, A.M., Bainbridge, D., Taube-Schock, C. and Ryan, T. Building heritage document collections for Pacific Island nations using semantic-enriched search. In Proceedings of the Samoa Conference III. National University of Sāmoa, 2015.
5. Hinze, A., Bainbridge, D., Cunningham, S.J., Taube-Schock, C., Matamua, R., Downie, J.S. and Rasmussen, E. Capisco: Low-cost concept-based access to digital libraries. Intern. J. Digital Libraries 20 (2019), 307–334; https://doi.org/10.1007/s00799-018-0232-3.
6. Hinze, A., Taube-Schock, C., Bainbridge, D., Matamua, R. and Downie, J.S. Improving access to large-scale digital libraries through semantic-enhanced search and disambiguation. In Proceedings of the 15th ACM/IEEE-CS Joint Conf. Digital Libraries. ACM, 2015.
8. Nurmufida, M., Wangrimen, G.H., Reinalta, R. and Leonardi, K. Rendang: The treasure of Minangkabau. J. Ethnic Foods 4, 4 (2017), 232–235; https://doi.org/10.1016/j.jef.2017.10.005.
9. Putra, H.S., Mahendra, R. and Darari, F. BudayaKB: Extraction of cultural heritage entities from heterogeneous formats. In Proceedings of the 9th ACM Intern. Conf. Web Intelligence, Mining and Semantics, 2019, 6:1–6:9.
11. W3C. SPARQL 1.1 Query Language, 2013; https://www.w3.org/TR/sparql11-query/
12. W3C. RDF 1.1 Primer, 2014; https://www.w3.org/TR/rdf11-primer/
©2020 ACM 0001-0782/20/4