Niupepa: a Historical Newspaper Collection

Niupepa is a collection of 42 newspaper titles published in New Zealand between 1842 and 1933, comprising a total of 21,000 pages in 1,750 issues. This collection forms a unique historical record of the language of the indigenous Māori people, of the evolution of the written form of this language, and of events and developments during the formative colonial history of our country. Using the Greenstone software from the New Zealand Digital Library (see article in this special section), this collection is now publicly available with full-text search capability.

The Niupepa material had earlier been gathered on microfiche [1] from original material in libraries scattered throughout the country. The delivery of a digital library version required two distinct forms of the original data. To facilitate full-text search, the newspaper content was first converted into electronic text using optical character recognition (OCR). To maintain the form and integrity of the original newspapers, a digital facsimile of the original page was preferred for viewing.

Data capture involved scanning 21,000 images from 35 mm photographic negatives. These images vary considerably in quality (some are clean, others are badly stained) and in information density. For reliable OCR, scanning densities corresponding to approximately 300 dpi on the original newspaper page were needed. OCR was performed using FineReader™, with a dictionary of Māori words to aid recognition. Nevertheless, fluent Māori speakers still had to check the text against the original images to correct the remaining recognition errors.
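As a rough illustration of how a word list can help narrow the manual checking step, the sketch below flags OCR tokens that are absent from a list of known words so that a reader can compare them against the page image. This is not how FineReader uses its dictionary; the function name `flag_for_review`, the sample word list, and the sample text are all hypothetical.

```python
import re

def flag_for_review(ocr_text, known_words):
    """Return OCR tokens missing from the word list, in order of first appearance."""
    known = {w.lower() for w in known_words}
    flagged = []
    seen = set()
    # Match runs of letters only (Unicode-aware, so macronised vowels are kept).
    for token in re.findall(r"[^\W\d_]+", ocr_text):
        key = token.lower()
        if key not in known and key not in seen:
            seen.add(key)
            flagged.append(token)
    return flagged

# Hypothetical word list and OCR output with one misrecognised word.
word_list = ["te", "reo", "niupepa"]
suspects = flag_for_review("Te niupcpa reo", word_list)
print(suspects)
```

A token being absent from the list does not mean it is wrong (names and rare words will be flagged too), which is one reason a human check remains necessary.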

The Niupepa collection incorporates a page-level index, with text for each page held in a separate file. These text files, together with the digital facsimiles of the original pages, files containing commentaries and bibliographic information, and English-language abstracts of individual issues, form the digital library collection. Searching for a particular term or phrase returns a list of those pages in which it appears. From this list, hyperlinks provide direct access to the text itself, where the search term(s) appear highlighted, or to the corresponding image page.
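The page-level search described above can be sketched with a small inverted index that maps each term to the pages containing it. This is a minimal sketch under stated assumptions, not the Greenstone implementation: the function names, the page identifiers, and the sample texts are hypothetical, and a multi-word query is treated as "all terms on the same page" rather than as an exact phrase.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())

def build_page_index(pages):
    """Map each term to the set of page ids whose text contains it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for term in tokenize(text):
            index[term].add(page_id)
    return index

def search(index, query):
    """Return the sorted ids of pages that contain every term in the query."""
    terms = tokenize(query)
    if not terms:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in terms))
    return sorted(hits)

def highlight(text, query, mark="**"):
    """Wrap each occurrence of a query term in the page text with markers."""
    terms = sorted(set(tokenize(query)), key=len, reverse=True)
    if not terms:
        return text
    pattern = re.compile(
        r"\b(" + "|".join(map(re.escape, terms)) + r")\b", re.IGNORECASE
    )
    return pattern.sub(lambda m: f"{mark}{m.group(0)}{mark}", text)

# Hypothetical page texts, one entry per page file.
pages = {
    "p1": "Notice of a meeting at Wellington",
    "p2": "A letter about the meeting",
    "p3": "Shipping news from Wellington",
}
index = build_page_index(pages)
for page_id in search(index, "Wellington meeting"):
    print(page_id, highlight(pages[page_id], "Wellington meeting"))
```

Keeping one text file per page makes the page the natural unit of retrieval, so each hit can link both to the highlighted text and to the matching page image.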

Both Māori and English language versions of the interface are provided, and in addition to the full-text search capability, the collection can be browsed by series, issue, or date.

Capturing this invaluable resource on microfiche secured its preservation. Creating a digital library collection opens up access for cultural, sociological, linguistic, historic, and even geographical research to laypeople, schoolchildren, and scholars at home and abroad.

    1. Niupepa 1842–1933. Alexander Turnbull Library. Microfiche set. Wellington, NZ (1996).

    FineReader is a registered trademark of ABBYY Software.
