Category Archives: Future

“Why study the web?” – Monday 8th March, Royal Society

My attention has just been drawn to this event by a blog post by Aleks Krotoski. The panel session, which will be streamed live and available for later download, will discuss ways in which the web can be studied at postgraduate level. Many of the examples focus on contemporary issues – the web as it is now – but this looks to be an ideal opportunity to highlight the research potential of web archives, and the services that those archives need to provide to enable research to be carried out. (JISC are commissioning work in this area.) More details are available at ECS Southampton. Worth a visit if you are nearby; I wish we had been able to give more warning!

Web archiving in the wider world

When a topic is being discussed in the correspondence pages of national newspapers, it’s a sign that it’s no longer the concern of a few specialists. That’s certainly been true of web archiving for some time, as a recent example shows. Malcolm Birdling wrote a letter, published in the Guardian on January 1, 2010, bemoaning the fact that some government agencies – in particular the UK Border Agency – actively prevent sites such as the Internet Archive from capturing their contents. This has important consequences for citizens, particularly when such sites are used to publish regulations and guidance that change frequently. (I have anecdotal evidence that the UK Inland Revenue lost an appeal brought by a taxpayer over a very similar issue.)

WAGN website: capture from Internet Archive (detail)

Mr Birdling’s letter brought a rapid response from David Thomas of the UK National Archives, who was keen to reassure readers that central government websites were being archived, even without the legislation that prompted Mr Birdling’s original letter. (The legislation in question is the change to Legal Deposit regulations that would permit the British Library and other UK copyright libraries to capture UK content without the permission of rights owners.)

But there are earlier examples of non-specialist concern about preserving web content. One of my favourites comes from the Usenet group uk.railway, whose contributors include a fair number of rail enthusiasts (“trainspotters”, if you’re feeling unkind). Privatisation of the UK railway network means that we have a plethora of train operating companies, or TOCs, each of which operates its own website, much as the great companies of old such as LNER might have done if the web had existed then. The difference is that these companies now come and go every few years, whenever the government puts operating contracts out for re-tender. Railway ephemera such as promotional leaflets and timetables are a key part of the print collections at places such as the National Railway Museum. “What happens to TOC web sites when franchises change?” wondered one poster to uk.railway back in 2007. The Internet Archive has certainly captured some material, but that isn’t the same as a collection controlled by an institution such as the NRM. I wasn’t able to give a very positive answer to the question. I don’t believe the National Railway Museum is yet able to capture websites as part of its collection, and it’s not clear that any of the members of the UK Web Archiving Consortium (UKWAC) see TOC sites as falling within their collecting policy.

And herein lies a lesson. Rail enthusiasts are incredibly effective at preserving railway heritage, both through their own efforts and through influencing others. They include many people with an enviable range of technical abilities. They secured special legislation to protect railway heritage after privatisation. Not content with simply preserving heritage, some of them set about recreating it by building an entirely new steam locomotive. But their combined efforts have not yet (so far as I know) ensured that past railway websites have been preserved. If they can’t manage it without institutional help, what hope is there for the rest of us?

LIWA – Living Web Archives

The PoWR project identified a number of technical challenges that made certain types of content – especially content with a Web 2.0 flavour – particularly difficult to manage and preserve effectively. My attention has recently been drawn to an EU-funded project that hopes to overcome a number of these technical problems, as well as others that arise in large-scale archiving, such as spam content.

LIWA – Living Web Archives – began in early 2008, but as with many EU projects, its startup phase involved a lot of internal activity without much of a public face. As a result, we didn’t pick up on its work in the JISC-PoWR handbook, but I’m sure we’ll rectify this omission in any future revision.

To pick one of LIWA’s areas of interest, it intends to develop tools that make it easier to take a temporal view of web archives and to maintain temporal consistency. Temporal consistency – or rather its absence – will be familiar to anyone who has spent time exploring sites in the Internet Archive, where different pages, or even portions of the same page (such as images), may have been archived on different days. This can lead to occasional surprises when navigating through archived content, with links taking one to pages that don’t have the expected content.
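The inconsistency described above can be made concrete with a small check. The Python sketch below is not LIWA’s method; it simply assumes we already know the 14-digit capture timestamp (YYYYMMDDhhmmss, the form used in Internet Archive Wayback URLs) of an archived page and of each resource it pulls in, and flags any resource captured more than a chosen tolerance away from the page. The function name `inconsistent_resources`, the one-day default tolerance and the example file names and dates are all hypothetical.

```python
from datetime import datetime, timedelta

# Wayback-style capture timestamps use 14 digits: YYYYMMDDhhmmss.
TIMESTAMP_FORMAT = "%Y%m%d%H%M%S"


def parse_capture(ts: str) -> datetime:
    """Convert a 14-digit capture timestamp into a datetime."""
    return datetime.strptime(ts, TIMESTAMP_FORMAT)


def inconsistent_resources(
    page_capture: str,
    resource_captures: dict[str, str],
    tolerance: timedelta = timedelta(days=1),  # hypothetical default
) -> dict[str, timedelta]:
    """Return resources captured further from the page than `tolerance`.

    `resource_captures` maps each embedded or linked resource (image,
    stylesheet, linked page) to the timestamp of the copy the archive
    would serve alongside the page. The result maps each offending
    resource to its absolute time gap from the page capture.
    """
    page_time = parse_capture(page_capture)
    drift = {}
    for url, ts in resource_captures.items():
        gap = abs(parse_capture(ts) - page_time)
        if gap > tolerance:
            drift[url] = gap
    return drift


if __name__ == "__main__":
    # Hypothetical captures of one page and three of its resources.
    page = "20080315120000"
    resources = {
        "logo.png": "20080315120500",        # five minutes later: consistent
        "timetable.html": "20080402093000",  # over two weeks later
        "banner.gif": "20071101000000",      # months earlier
    }
    for url, gap in sorted(inconsistent_resources(page, resources).items()):
        print(f"{url}: {gap.days} days from page capture")
```

Run as a script, it reports `banner.gif` as 135 days and `timetable.html` as 17 days away from the page capture, while `logo.png` passes. Measuring drift relative to the page, rather than between every pair of resources, keeps the sketch simple; a fuller approach might compare all the captures a reader would see together.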

LIWA’s partners include Hanzo, a UK-based web archive services company that we covered briefly in the handbook; I hope we can explore its potential value to UK HE in the future.

Managing the Crowd: Rethinking records management for the Web 2.0 world

My review of Steve Bailey’s book Managing the Crowd: Rethinking records management for the Web 2.0 world has now been published in the latest issue of Ariadne.

This book has been mentioned at PoWR workshops, on the PoWR blog and on the JISC Information Environment Team blog. I can honestly say that it has had quite an impact on my thinking with regard to preservation and Web 2.0 resources; other members of the PoWR team may agree.

As I say in the conclusion:

This book offers up much food for thought. Bailey wants to wake up and shake his community. He wants to make them see that all is not well in the records management world and that if they don’t start moving with the times then they will be pushed out of the way. He contends there is a very real possibility that records management as we know it will cease to exist; it will be outsourced.

Go on, have a read.

JISC PoWR presentation at ILI 2008

A presentation on JISC PoWR entitled Preservation for the Next Generation was given yesterday at the Internet Librarian International Conference 2008 held at the Novotel London West.

The slides of the talk are now available from Slideshare.

The presentation was well received and sparked a lot of interest, particularly among delegates from US libraries. Donald Grose, Dean of Libraries at the University of North Texas, told me that his library has been preserving many US government Web sites for a number of years. See this related press release. Looking into Web resource preservation activity in other sectors, and possibly other countries, is definitely an area of interest for the future. Hopefully the JISC PoWR project will be able to talk further with Donald about his work.