JISC PoWR

Preservation of Web Resources: a JISC-funded project [Archived Blog]

Web archiving in the wider world

Posted by Kevin Ashley on January 12th, 2010

When a topic is being discussed in the correspondence pages of national newspapers, it’s a sign that it’s no longer the concern of a few specialists. That’s certainly been true of web archiving for some time, as a recent example shows. Malcolm Birdling wrote a letter published in the Guardian on January 1, 2010 bemoaning the fact that some government agencies – in particular the UK Border Agency – actively prevent sites such as the Internet Archive from capturing their contents. This has important consequences for citizens, particularly when such sites are used to publish regulations and guidance that change frequently. (I have anecdotal evidence that the UK Inland Revenue lost an appeal brought by a taxpayer over a very similar issue.)

[Image: WAGN website - capture from Internet Archive (detail)]

Mr Birdling’s letter brought a rapid response from David Thomas of the UK National Archives, who was keen to reassure readers that central government websites were being archived, even without the legislation which prompted Mr Birdling’s original letter. (That story refers to the changes to Legal Deposit regulations which would permit the British Library and other UK copyright libraries to capture UK content without the permission of rights owners.)

But earlier examples of non-specialist concern with preserving web content exist. One of my favourite examples comes from the Usenet group uk.railway, whose contributors include a fair number of rail enthusiasts (“trainspotters” if you’re feeling unkind). Privatisation of the UK railway network means that we have a plethora of train operating companies, or TOCs, each of which operates its own web site, much as the great companies of old such as LNER might have done if the web had existed then. The difference is that now these companies come and go every few years when the government puts operating contracts out for re-tender. Railway ephemera such as promotional leaflets and timetables are a key part of the print collections at places such as the National Railway Museum. “What happens to TOC web sites when franchises change?” wondered one poster to uk.railway back in 2007. The Internet Archive has certainly captured some material, but it isn’t the same as a collection controlled by an institution such as the NRM. I wasn’t able to give a very positive answer to their question. I don’t believe the National Railway Museum are yet able to capture websites as part of their collection, and it’s not clear that any of the members of UKWAC see TOC sites as falling within their collecting policy.

And herein lies a lesson. Rail enthusiasts are incredibly effective at preserving railway heritage, both through their own efforts and through influencing others. They include many people with an enviable range of technical abilities. They saw to it that special legislation was passed to ensure the preservation of railway heritage after privatisation. Not content with simply preserving heritage, some of them set about recreating it by building an entirely new steam locomotive. But their combined efforts have not yet (so far as I know) ensured that past railway web sites have been preserved. If they can’t manage it without institutional help, what hope is there for the rest of us?

One Response to “Web archiving in the wider world”

  1. Avi Rappoport Says:

    Hello JISC-POWr people,

    I am a librarian and writing an article for InfoToday on the UK Web Archive that the British Library has just announced:

    You all seem to be the perfect people to give me perspective and context. Does the webarchive.org.uk site have any particularly good or bad aspects? Is there something new and cool, or something important that is missing?

    One of the interesting aspects is that IBM will be using open-source Hadoop software for the data store, and many proprietary and open-source technologies to try and get a handle on the content. Their plan is to use a spreadsheet interface and dynamically generate various rows and columns.

    If this is a good forum to discuss it, please comment here, otherwise direct me to the right place or use the contact form on my web site.

    Yours,

    Avi