JISC PoWR

Preservation of Web Resources: a JISC-funded project [Archived Blog]

LIWA – Living Web Archives

Posted by Kevin Ashley on March 6th, 2009

The PoWR project identified a number of technical challenges which made certain types of content – particularly that with a Web 2.0 flavour – difficult to manage and preserve effectively. My attention has recently been drawn to an EU-funded project which hopes to overcome a number of these technical problems, as well as others that apply to large-scale archiving, such as the problem of spam content.

LIWA – Living Web Archives – began in early 2008, but as with many EU projects, its startup phase involved a lot of internal activity without much of a public face. As a result we didn’t pick up on its work in the JISC-PoWR handbook, but I’m sure we’ll rectify this omission in any future revisions.

To pick one example of LIWA’s areas of interest, it intends to develop tools which make it easier to take a temporal view of web archives and to maintain temporal consistency. Temporal consistency – or rather its absence – will be familiar to anyone who has spent time exploring sites in the Internet Archive, where different pages, or even portions of the same page (such as images) will have been archived on different days. This can lead to occasional surprises when navigating through archived content, with links taking one to pages that don’t have the expected content.
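To make the idea concrete, here is a minimal sketch of how one might measure temporal inconsistency for a single archived page. It is not LIWA's method, which this post does not describe. It assumes capture times are available as 14-digit `YYYYMMDDhhmmss` strings of the kind the Internet Archive uses in its Wayback Machine URLs. The function names and the one-hour tolerance are hypothetical choices made for illustration.

```python
from datetime import datetime, timedelta

# Assumption: capture times are 14-digit strings, e.g. "20090306120000".
TIMESTAMP_FORMAT = "%Y%m%d%H%M%S"


def parse_capture_time(stamp: str) -> datetime:
    """Convert a 14-digit capture timestamp into a datetime."""
    return datetime.strptime(stamp, TIMESTAMP_FORMAT)


def capture_skew(page_stamp: str, resource_stamps: list[str]) -> timedelta:
    """Return the largest gap between the page capture and any resource capture.

    A page with no embedded resources has zero skew.
    """
    page_time = parse_capture_time(page_stamp)
    return max(
        (abs(parse_capture_time(s) - page_time) for s in resource_stamps),
        default=timedelta(0),
    )


def is_temporally_consistent(
    page_stamp: str,
    resource_stamps: list[str],
    tolerance: timedelta = timedelta(hours=1),  # hypothetical threshold
) -> bool:
    """Treat the page as consistent if every resource was captured within the tolerance."""
    return capture_skew(page_stamp, resource_stamps) <= tolerance


if __name__ == "__main__":
    # Illustrative values only: a page captured at noon, one image captured
    # five minutes later and another captured almost three days later.
    page = "20090306120000"
    resources = ["20090306120500", "20090309080000"]
    print(capture_skew(page, resources))
    print(is_temporally_consistent(page, resources))
```

A large skew does not prove that the archived page is wrong, only that its parts were captured far enough apart that the content could have changed in between.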

LIWA’s partners include Hanzo, a UK-based web archive services company that we covered briefly in the handbook; I hope we can explore their potential value to UK HE in the future.

One Response to “LIWA – Living Web Archives”

  1. Mark Middleton Says:

    Thanks for highlighting our work in LiWA.

    Your readers may be interested to know that Hanzo is also working on a couple of open source web archiving projects.

    WARC Tools are a collection of command line and web tools for the creation and manipulation of ISO 28500 web archive (WARC) files, funded by IIPC.

    Search Tools adds full-text and metadata search capabilities to WARC Tools, funded by JISC. The HE angle on this is that Search Tools is a deliverable of the World Wide Web of Humanities project, a collaboration with the Oxford Internet Institute in the UK and the Internet Archive in the US.

    More details and source code for these projects can be found here:
    * WARC Tools: http://code.google.com/p/warc-tools/
    * Search Tools: http://code.google.com/p/search-tools/