
Archival Perspectives on the Emerging Digital Library


Although archives are often housed in libraries, the archival and library communities and institutions have long been distinct entities with related but independent missions, perspectives, and self-images. Libraries mainly acquire, preserve, arrange, describe, and make available published information. Archival repositories perform all of these functions, albeit in very different ways, for unpublished and unique materials of organizational or personal origin that have enduring value. Despite many similarities, libraries and archives function as parallel institutions, occasionally drawing on each other's theory and practice but largely remaining independent, each with its own traditions and literature.

Digital library collections promise to contain both library and archival materials: published and unpublished, commercially available and institutionally unique, bound and collected, all in large quantities. We already see digital libraries such as ibiblio (www.ibiblio.com) at the University of North Carolina at Chapel Hill soliciting personal materials that users and participants digitize and donate. The Valley of the Shadow Project (jefferson.village.virginia.edu/vshadow/vshadow.html) at the University of Virginia collects personal papers and municipal records documenting life during the Civil War. Certainly much of the material digitized for the Library of Congress American Memory Project, considered the National Digital Library of the United States, is archival in nature (www.loc.gov/ammem).

Given the hybrid mixture of archival and library content that the digital library will hold, it is time to apply relevant aspects of both librarianship and archivy, along with a strong technological base, to the digital library's design and management. Archival theory and practice promise to be particularly important in three areas: what to save and what to digitize, how to save it, and how to provide access to it.

What to save; what to digitize. Appraisal theory and practice, along with the records life cycle, can help repositories retain materials of enduring value. Although archivists are known as great savers, they are in reality highly skilled selectors, generally retaining no more than 5% of the original bulk of any collection. Librarians, who primarily deal with published material, have a much easier selection task because their materials have already undergone the rigors of publication and review.

Archivists deal with massive institutional and personal collections in which they must determine what is of enduring value, both to the creator and to posterity, and remove everything else so researchers can find what is useful over time. Such winnowing must be done while maintaining the context of the collection as a whole, through the selection of items and through their arrangement and description. Appraisal is the most intellectually challenging and critical aspect of archival work, because saving everything, especially in an era of documentary abundance, means finding nothing. Deaccessioning unique materials, however, means losing them forever. Appraisal strategies are also important in deciding which print materials warrant the expense of digitization.

When professions, communities, and organizations have not taken an active role in preserving their own history, archivists have developed documentation strategies to ensure that a representative sample of materials from these domains is preserved for the future.1 Sometimes this has involved something as simple as conducting oral histories with community members and soliciting papers; in other cases it has required mapping an entire profession and taking a strategic approach to record acquisition. Digital libraries that wish to solicit quality contributions from participants would do well to look to the work of archivists in donor relations, acquisitions, appraisal, and document authenticity.

How to save it. Archival theory will be essential in developing models of long-term intellectual preservation of authentic and reliable digital objects for the governmental, commercial, and cultural heritage sectors. Significant work within the archival community, exploring the fundamental nature of evidentiary electronic records, promises insight into how the digital library can ensure the authenticity of its data.2 The InterPARES project, an international collaborative effort, is currently the most important example of work in this area.3 Archivists are also exploring physical preservation of digital objects in light of technological obsolescence and data and media degradation. Proposed approaches include data migration and software emulation.
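Digital preservation work often pairs data migration with fixity checking: a checksum recorded at accession lets a repository detect bit-level degradation and document each change of format. The Python sketch below is a hypothetical illustration of that general pattern, not the InterPARES model or any project's actual method; the `DigitalObject` class, its method names, and the choice of SHA-256 are assumptions made for the example. A matching checksum shows only that the bits are unchanged since the checksum was recorded; it does not by itself establish that a record is authentic in the archival sense.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, List, Tuple


def sha256_of(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


@dataclass
class DigitalObject:
    # Hypothetical repository record: current bytes, the checksum those bytes
    # should have, and a log of preservation events.
    identifier: str
    content: bytes
    expected_sha256: str = ""
    events: List[Tuple[str, str, str]] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Record a fixity value when the object enters the repository.
        if not self.expected_sha256:
            self.expected_sha256 = sha256_of(self.content)
        self._log("accessioned", self.expected_sha256)

    def _log(self, action: str, detail: str) -> None:
        stamp = datetime.now(timezone.utc).isoformat()
        self.events.append((stamp, action, detail))

    def verify_fixity(self) -> bool:
        # Detects bit-level change such as media degradation; it cannot show
        # that the record was authentic when it was accessioned.
        return sha256_of(self.content) == self.expected_sha256

    def migrate(self, convert: Callable[[bytes], bytes], note: str) -> None:
        # Refuse to carry an already-corrupted object forward into a new format.
        if not self.verify_fixity():
            raise ValueError(f"{self.identifier}: fixity check failed before migration")
        old = self.expected_sha256
        self.content = convert(self.content)
        self.expected_sha256 = sha256_of(self.content)
        # Keep both checksums so the change of format is documented.
        self._log("migrated", f"{note}: {old} -> {self.expected_sha256}")


# Usage: migrate a (hypothetical) Latin-1 text file to UTF-8, then simulate decay.
doc = DigitalObject("letter-0001", "Estimado señor".encode("latin-1"))
doc.migrate(lambda b: b.decode("latin-1").encode("utf-8"), "Latin-1 to UTF-8")
print(doc.verify_fixity())         # True
doc.content = doc.content[:-1]     # one byte lost to media degradation
print(doc.verify_fixity())         # False
print([event[1] for event in doc.events])  # ['accessioned', 'migrated']
```

Logging both the old and new checksums at each migration is one way to keep a documented chain from the accessioned object to its current form, which matters once the bytes themselves have legitimately changed.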

Archivists are also at the forefront of determining best practices for the digitization process, as many of the digitization projects funded to date have involved archival materials. In the absence of established standards in this field, Kenney and Rieger's landmark work, Moving Theory into Practice: Digital Imaging for Libraries and Archives, discusses the benchmarking process developed at Cornell University Libraries.4

How to provide access to it. Archives have developed effective and efficient techniques for handling the massive quantities of information, and the variety of information containers, that will surely reside in the digital library. It is quite common for record groups in governmental archives to contain millions of documents; many manuscript collections also contain photographs, books, clippings, audio and video files, and more. Collective and hierarchical arrangement and description of materials, based on provenance and reflected in finding aids that preserve the context, structure, and diversity of archival objects, provide access to materials that would otherwise be inaccessible because of their bulk. Where item-level indexing is impossible, and where the sheer extent of full-text electronic files might well prevent effective retrieval, archival arrangement and description provide researchers with entry points into collections. The Encoded Archival Description (EAD) document type definition,5 a standard developed within the archival community, now allows finding aids to be presented on the Web in a form that preserves this hierarchy and links digital documents into this framework.
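To make collective, hierarchical description more concrete, the following Python sketch models a finding aid as a tree of components (collection, series, file, item) and serializes it to XML. The element names (`archdesc`, `dsc`, `c`, `did`, `unittitle`, `dao`) are loosely borrowed from EAD, but the output is a simplified illustration rather than a valid EAD instance; the `Component` class, the `paths_to` search helper, the "Smith Family Papers" collection, and the example.org URL are all hypothetical.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple


@dataclass
class Component:
    # One hypothetical unit of description: a collection, series, file, or item.
    level: str
    title: str
    digital_object: Optional[str] = None  # link to a digitized surrogate, if any
    children: List["Component"] = field(default_factory=list)


def to_element(comp: Component, tag: str = "c") -> ET.Element:
    """Serialize a component and its descendants, keeping the hierarchy intact."""
    elem = ET.Element(tag, level=comp.level)
    did = ET.SubElement(elem, "did")
    ET.SubElement(did, "unittitle").text = comp.title
    if comp.digital_object:
        # Attach the digital document at its place in the description.
        ET.SubElement(did, "dao", href=comp.digital_object)
    # The top-level description wraps its components in <dsc>;
    # lower-level components nest directly inside their parent <c>.
    container = ET.SubElement(elem, "dsc") if tag == "archdesc" and comp.children else elem
    for child in comp.children:
        container.append(to_element(child))
    return elem


def paths_to(comp: Component, keyword: str, trail: Tuple[str, ...] = ()) -> Iterator[str]:
    """Yield each matching title together with the path of its enclosing units."""
    trail = trail + (comp.title,)
    if keyword.lower() in comp.title.lower():
        yield " > ".join(trail)
    for child in comp.children:
        yield from paths_to(child, keyword, trail)


papers = Component("collection", "Smith Family Papers", children=[
    Component("series", "Correspondence", children=[
        Component("file", "Letters, 1861-1862", children=[
            Component("item", "Letter to Mary Smith, 3 May 1862",
                      digital_object="https://example.org/dl/smith/0001.jpg"),
        ]),
    ]),
    Component("series", "Photographs"),
])

finding_aid = to_element(papers, tag="archdesc")
print(ET.tostring(finding_aid, encoding="unicode"))
for path in paths_to(papers, "letter"):
    print(path)
# Smith Family Papers > Correspondence > Letters, 1861-1862
# Smith Family Papers > Correspondence > Letters, 1861-1862 > Letter to Mary Smith, 3 May 1862
```

Because each search hit is reported with its full path, a researcher sees the series and file in which an item sits rather than an isolated document, which is the kind of contextual entry point a finding aid is meant to provide.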

These are just a few ways in which archival perspectives can inform the digital library.6 While much material in archives and manuscript repositories remains in paper format, archivists are grappling, in ways unlike most other information professionals, with the challenges of electronic records that will become tomorrow's heritage and tomorrow's historical and legal evidence. The digital library has much to gain from incorporating archival theory and practice into its vision for preserving and providing the world's most important, enduring information.


    1 See J. Krizack's Documentation Planning for the U.S. Health Care System. Johns Hopkins Press, Baltimore, MD, 1994.

    2 See, for example, www.sis.pitt.edu/~nhprc and www.slais.ubc.ca/users/duranti.

    3 International Project on Permanent Authentic Records in Electronic Systems (InterPARES); www.interpares.org.

    4 Kenney, A. and Rieger, O. Moving Theory into Practice: Digital Imaging for Libraries and Archives. RLG, Mountain View, CA, 2000.

    5 EAD is a document type definition of the Standard Generalized Markup Language (SGML).

    6 For an extended discussion of the value of the archival perspective for the digital library, see A. Gilliland-Swetland's "Enduring Paradigm, New Opportunities: The Value of the Archival Perspective in the Digital Environment"; www.clir.org/pubs/reports/pub89/pub89.pdf.
