JISC PoWR

Preservation of Web Resources: a JISC-funded project [Archived Blog]

Why you can sometimes leave it to the University

Posted by Ed Pinsent on September 8th, 2009

“Does anyone have any positive experiences to share?”, asks Brian in a recent post. Well, I have – except it’s not in the UK. Harvard University Library in the USA have recently put Harvard WAX (the Web Archive Collection Service) live, after a pilot project which began in July 2006.

Harvard WAX includes themed collections on Women’s Voices and Constitutional Revision in Japan, but of particular interest to us in PoWR is their A-Sites collection: the semi-annual captures of selected Harvard websites. “The Harvard University Archives is charged with collecting and preserving the historical records of the University,” state the curators, recognising their formal archival function in this regard. “Much of the information collected for centuries in paper form now resides on University web sites.”

Helen Hockx-Yu of the British Library met with the WAX team in May 2009. “I was impressed with many of the features of the system,” she said, “not just the user and web curator interfaces but also some of the architectural decisions. WAX is a service offered by the Library to all Harvard departments and colleges. In exchange for a fee, the Departments use the system to build their collections. The academics may not be involved with the actual crawling of websites, but spend time QAing and curating the websites, and can to some extent decide how the archive targets appear in the Access Tool. The QAed sites are submitted directly into Harvard’s institutional repository.”

It is very encouraging to read of this participatory dimension to the project, indicating how success depends on the active involvement of the creators of the resources. Already 48 Harvard websites have been put into the collection, representing Departments, Committees, Schools, Libraries, Museums, and educational programmes.

The delivery of the resources has many good features also; there’s an unobtrusive header element which lets the user know they’re looking at an archived instance (instead of the live website). There’s a link explaining why the site was added to the collection, and contextual information about the wider collection. Another useful link allows researchers, scholars and other users to cite the resource; it’s good to see this automated feature integrated directly within the site. The Terms of Use page addresses a lot of current concerns about republishing web resources, and strikes just the right balance between protecting the interests of Harvard and providing a service to its users. Like a good OAIS-compliant repository, they are perfectly clear about who their designated user community are.

Best of all, they provide a working full-text search engine for the entire collection, something that many other web archive collections have been struggling to achieve.

The collection is tightly scoped, and takes account of ongoing developments for born-digital materials: “Collection managers, working in the online environment, must continue to acquire the content that they have always collected physically. With blogs supplanting diaries, e-mail supplanting traditional correspondence, and HTML materials supplanting many forms of print collateral, collection managers have grown increasingly concerned about potential gaps in the documentation of our cultural heritage.” The project has clear ownership (it is supported by the University Library’s central infrastructure), and it built its way up from a pilot project in less than three years. Their success was partially due to having a clear brief from the outset, and through collaboration with three University partners. What Harvard have done chimes in with many of the recommendations and suggestions made in the PoWR Handbook, particularly Chapters 5 (Selection), 16 (Responsibility for preservation of web resources) and 19 (How can you effect change?)

There are many aspects of this project which UK Institutions could observe, and perhaps learn something from. It shows that it is both possible and practical to embed website collection and preservation within an Institution.

One Response to “Why you can sometimes leave it to the University”

  1. gary price Says:

    You might also be interested in the new California Digital Library’s Web Archiving Service (WAS).

    http://www.resourceshelf.com/2009/08/08/web-archiving-service-preserves-data-for-the-future/