Niupepa is a collection of 42 newspaper titles published in New Zealand from 1842 to 1933, comprising a total of 21,000 pages in 1,750 issues. This collection forms a unique historical record of the language of the indigenous Māori people, of the evolution of the written form of this language, and of events and developments during the formative colonial history of our country. Using the Greenstone software from the New Zealand Digital Library (see article in this special section), this collection is now publicly available with full-text search capability.
The Niupepa material had earlier been gathered on microfiche from original material in libraries scattered throughout the country. The delivery of a digital library version required two distinct forms of the original data. To facilitate full-text search, the newspaper content was first converted into electronic text using optical character recognition (OCR). To maintain the form and integrity of the original newspapers, a digital facsimile of the original page was preferred for viewing.
Data capture has involved scanning 21,000 images from 35 mm photographic negatives. These images vary considerably in quality (some are clean, others are badly stained) and in information density. For reliable OCR, scanning densities corresponding to approximately 300 dpi on the original newspaper page were needed. OCR has been performed using FineReader, utilizing a dictionary of Māori words to aid recognition. Nevertheless, it has been essential for fluent Māori speakers to check the text against the original images to correct remaining recognition errors.
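The relationship between the target density on the page and the density needed at the film can be illustrated with a little arithmetic. The following is a minimal sketch, not the project's actual procedure: the function name and the page and frame dimensions are hypothetical values chosen only for illustration. It assumes the scanner samples the negative directly, so the required film density is the page density multiplied by the photographic reduction ratio.

```python
def required_film_dpi(target_page_dpi: float,
                      page_width_in: float,
                      frame_width_in: float) -> float:
    """Return the scan density needed on the film (in dpi) so that the
    original page is effectively sampled at target_page_dpi.

    The reduction ratio is how many times smaller the page appears on
    the negative than it was in print.
    """
    reduction_ratio = page_width_in / frame_width_in
    return target_page_dpi * reduction_ratio


# Hypothetical dimensions, for illustration only.
page_width_in = 12.0   # assumed width of the printed newspaper page
frame_width_in = 1.0   # assumed width the page occupies on the negative

film_dpi = required_film_dpi(300, page_width_in, frame_width_in)
pixels_across = round(300 * page_width_in)

print(film_dpi)        # 3600.0
print(pixels_across)   # 3600
```

Under these assumed dimensions, a larger original page photographed onto the same frame would need a proportionally higher density at the film, which is one way differences in information density between images show up in practice.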
The Niupepa collection incorporates a page-level index, with text for each page held in a separate file. These text files, together with the digital facsimiles of the original pages, files containing commentaries and bibliographic information, and English-language abstracts of individual issues, form the digital library collection. Searching for a particular term or phrase returns a list of those pages in which it appears. From this list, hyperlinks provide direct access to the text itself, where the search term(s) appear highlighted, or to the corresponding image page.
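The page-level search behavior described above can be sketched in a few lines. This is an illustrative sketch only and does not show how Greenstone implements indexing: the function names, the page identifiers, the sample text, and the bracket-style highlighting are all assumptions made for the example. It builds an inverted index from words to pages, returns the pages that contain every query term, and marks the matching terms in a page's text.

```python
import re
from collections import defaultdict

# Words are runs of letters/digits; Python's \w is Unicode-aware,
# so letters with macrons are treated as part of a word.
TOKEN = re.compile(r"\w+")


def build_index(pages):
    """Map each lowercased word to the set of page ids that contain it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in TOKEN.findall(text.lower()):
            index[word].add(page_id)
    return index


def search(index, query):
    """Return a sorted list of page ids containing every query word."""
    words = TOKEN.findall(query.lower())
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)


def highlight(text, query, left="[", right="]"):
    """Wrap each occurrence of a query word in the given markers."""
    words = TOKEN.findall(query.lower())
    if not words:
        return text
    pattern = re.compile(
        r"\b(" + "|".join(map(re.escape, words)) + r")\b", re.IGNORECASE
    )
    return pattern.sub(lambda m: left + m.group(0) + right, text)


# Hypothetical page texts, one entry per page file.
pages = {
    "p1": "News of the harbour and the market",
    "p2": "The market opens on Monday",
    "p3": "Letters to the editor",
}

index = build_index(pages)
print(search(index, "market"))               # ['p1', 'p2']
print(highlight(pages["p2"], "market"))      # The [market] opens on Monday
```

Keeping one index entry per page, rather than per issue, is what lets a result list link straight to the page text or to the matching facsimile image.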
Both Māori and English language versions of the interface are provided, and in addition to the full-text search capability, the collection can be browsed by series, issue, or date.
Capturing this invaluable resource on microfiche secured its preservation. Creating a digital library collection opens up access for cultural, sociological, linguistic, historic, and even geographical research to laypeople, schoolchildren, and scholars at home and abroad.
©2001 ACM 0002-0782/01/0500 $5.00