JISC-PoWR

Preservation of Web Resources: a JISC-sponsored project

Archive for June, 2008

Workshop 1 - Resources available

Posted by Marieke Guy on 30th June 2008

The first JISC-PoWR workshop took place last Friday (27th June 2008) at Senate House Library and was attended by over 30 people from a wide range of professional groupings, including the Web management and Records Management communities. The day instigated much discussion and started people thinking about how they could make a start on Web resource preservation at their institution.

The main presentations are now available for download.

  • Presentation 1: JISC-PoWR Workshop 1, Marieke Guy, UKOLN. Presentation: [Slideshare] - [pres1.ppt PowerPoint file]
  • Presentation 2: Preservation of Web Resources Part I, Kevin Ashley, ULCC. Presentation: [Slideshare] - [pres2.ppt PowerPoint file] (audio - pres2.mp3)
  • Presentation 3: Challenges for Web Resource Preservation, Marieke Guy, UKOLN. Presentation: [Slideshare] - [pres3.ppt PowerPoint file] (audio - pres3.mp3)
  • Presentation 4: Bath University Case Study, Alison Wildish and Lizzie Richmond, University of Bath. Presentation: [Slideshare] - [pres4.ppt PowerPoint file] (audio - pres4.mp3)
  • Presentation 5: Legal issues, Jordan Hatcher, opencontentlawyer. Presentation: [Slideshare] - [pres5.ppt PowerPoint file] (audio - pres5.mp3)
  • Presentation 6: Preservation of Web Resources Part II, Ed Pinsent, ULCC. Presentation: [Slideshare] - [pres6.ppt PowerPoint file] (audio - pres6.mp3)
  • Presentation 7: ReStore: A sustainable web resources repository, Arshad Khan, National Centre for Research Methods. Presentation: [Slideshare] - [pres7.ppt PowerPoint file] (audio - pres7.mp3)

The presentations are also available from Slideshare. Audio files are available from the Internet Archive.

We are also using a Wetpaint Wiki to collate the feedback from the workshop breakout sessions. If you were there, please have a look and help us ensure that your suggestions are represented.

An ‘at the event’ report on the workshop, written by Stephen Emmott, has been published in the Ariadne Web Magazine.

Posted in Workshops, Project news, Events | 2 Comments »

When Web Sites Outlast Their Welcome

Posted by Brian Kelly on 26th June 2008

The JISC-PoWR project is concerned with ensuring that Web sites and their content don’t disappear. Right? Actually, this would be to misunderstand what Web site preservation is about. Sometimes there may be a need for Web sites to be deleted. Indeed there may be dangers (both in terms of brand management and legal issues) if the content of a Web site outlasts its welcome.

Take, for example, the Web site for the National Open Centre, which is illustrated.

National Open Centre Home Page
If you visit it you will find a nicely designed and easy-to-use Web site for the National Open Centre (NOC), which is a:

“national policy institute, a think tank to understand and articulate strategies to make effective use of Open Source Software and Open Standards (OS&S) for the benefit of all. It will focus on nationally relevant issues leading to proactive strategies to ensure that the UK effectively exploits the opportunities that arise with OS&S. The NOC will be independent, strategic and proactive and seeks the participation of interested and informed people.”

A very worthy organisation, it would seem (and I should add that I was a member of the NOC’s Advisory Group and attended the first meeting). Sadly, despite having a launch event at the Houses of Parliament, the NOC was unsuccessful in its attempts to gain funding. To paraphrase the Monty Python sketch: “the NOC is not resting. The NOC is no more! It is bereft of life. It is an ex-NOC!”

But this isn’t what you’d think if you explored the Web site. The home page urges visitors to “Get Involved!” and describes how it has “established the first set of subject panels. The topics being researched/discussed are: Public procurement, Open Standards and Open Source/Open Standard for SMEs”. The Get Involved page then encourages visitors to participate with the NOC in a number of ways, including joining the Advisory Board, Subject Panels or the NOC Community. The only subtle indications that the NOC is no longer operational are the dates on various pages (2006 or 2007) and the broken link to the NOC’s wiki from the Get Involved page.

The failure to provide any indication that the NOC failed to receive funding may be embarrassing to the partners of the service, which are listed on the home page. But as well as such possible embarrassment, what would happen if visitors arrived at the Events page and read details of the one-day event on Document Standards planned for 4 July, which is illustrated below?

National Open Centre Home Page

There is no indication that this refers to an event which was planned for 4 July 2007. And there are no details about registration, although a location for the event is given (NCC offices, London). What might happen if someone travels to London to attend the workshop (which covers interesting aspects related to open document formats, with apparent participation from companies such as Microsoft)? If this happened, I’m sure the potential participants would be pretty upset to discover that the NOC folded last year.

This is, I would agree, unlikely to happen. But what if the information about the event had been held on the Web site of one of the NOC’s partner organisations, such as Birmingham City Council?

This example is taken from the wider public sector. But within the higher and further education sector, with short-term project funding provided for much development work, institutions may find themselves in a similar situation, with the intentions of a project team failing to be realised due to a failure to win funding, and perhaps a loss of project staff.

How should this possible scenario be addressed? This is something to be addressed in future posts, but for now your comments and suggestions would be welcomed.

Posted in Web 1.0 | 3 Comments »

What can PoWR do for you?

Posted by Kevin Ashley on 25th June 2008

Web preservation is a big topic and we’re not even pretending to deal with all of it. The aspect that we care about - that JISC believes the community is looking for help with - is fairly well-defined. We want to help institutions make effective decisions about preserving web resources, and help them implement those decisions in a way that is cost-effective and non-disruptive.

Making effective decisions
At its simplest level, this means deciding what to keep and what not to keep. There may be many drivers for these decisions - institutional policy, legal requirements and research interests are just a few. The decisions need to relate not just to what is to be kept, but why and who for. That’s because those requirements may have a bearing on how you choose to go about the job, or whose responsibility it is to carry it out. Not everything needs to be kept, and even when it does, it may not be your institution’s responsibility to keep it.

Implementing those decisions
Carrying out your decisions - keeping things, throwing things away, or ensuring that other people keep things - can be the trickiest part of the process. You may know you want to preserve the prospectus for past years, but can you be sure that your CMS, or the Internet Archive, or some local use of web-harvesting tools is going to do this job effectively for you? You may be told that some part of your web infrastructure would be easier to preserve if you avoided the use of certain features, or used a different authoring system. Is that true, and if it is, what are the negative consequences of such decisions?

The handbook, which will be one of the project’s outputs, will attempt to answer these questions in a way that makes sense to everyone who might be involved in the process. We want to make it easier to take decisions about preservation and to know what tools, systems or working methods can be employed to help you implement them.

The workshops are the primary mechanism we’re using to test whether the handbook makes sense to the people it’s aimed at, and whether it tackles the problems that people are actually facing.

Posted in Challenges, Workshops, Resources | No Comments »

Seeing Eye to Eye: Web Managers and Records Managers

Posted by Marieke Guy on 25th June 2008

The technological and cultural changes brought about by the advancement of the Web have, on numerous occasions, required co-ordinated interdisciplinary work. One of the aims of the JISC-PoWR project is to help bring together the differing perspectives of information professionals such as records managers and Web managers in the context of the preservation of Web resources. There are probably at least four sets of expertise involved: Web content creation (as perceived by Web authors), Web content management from a technical perspective (as perceived by those who choose or configure the underlying software), records and/or information management, and digital preservation. So there’s the bringing together of intellectual perspectives (What content needs to be preserved? How long for? Who is responsible?) and there are the technical perspectives, assuming that the above questions identify anything that needs preserving (How do we do it? Are site-level tools more appropriate than national services? Does CMS X make preservation easier or harder than CMS Y? Is a more accessible site also a more preservable one? Are there configuration choices that affect preservation without (significantly) affecting other aspects of management?).

Within the JISC-PoWR team there have been a number of interesting discussions that have highlighted how differently the different players see Web preservation. To quote Ed Pinsent:

“The fundamental thing here is bringing together two sets of information professionals from differing backgrounds who, in many cases, don’t tend to speak to each other. Many records managers and archivists are, quite simply, afraid of IT and are content to let it remain a mystery. Conversely, it is quite possible to work in an IT career path in any organisation (not just HE/FE) and never be troubled by retention or preservation issues of any sort.”

The cliched view might regard Web managers as concerning themselves primarily with the day-to-day running of an organisation’s Web site, with preservation as an afterthought, and records managers as focussing mainly on the preservation of resources and failing to understand some of the technical challenges presented. And although this may be a superficial description of the complexities of the ways in which institutions go about the management of their digital resources, perhaps, like many cliches, there could be an element of truth in such views.

Read the rest of this entry »

Posted in Web 1.0, Challenges, Records management, Preservation | 2 Comments »

The History Of the University of Bath Home Page

Posted by Brian Kelly on 20th June 2008

How has your institutional home page changed over time? And have you kept records of the changes and the decisions which were made?

In order to illustrate how an institution’s home page may change over a period of 11 years, the Internet Archive’s Wayback Machine was used to view the first occurrence of the University of Bath home page in every year from 1997 until 2007. (Note that in browsers which support Flash you can interact with the display, and more interactive access can be obtained if you install the PicLens plugin, although there are also links to the static images and an automated rolling display of the pages in the Internet Archive.)

In addition to this display a 4 minute video with accompanying commentary has also been created, which discusses some of the changes to the home page over the 11 years:

Is this example of interest to other institutions? Would it be helpful if tools could be provided to assist the creation of a similar visualisation of the history of your institutional home page?
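As a starting point for such a tool, the following minimal Python sketch builds one Wayback Machine address per year for a given home page. It relies on the Wayback Machine's timestamped address form (a 14-digit timestamp followed by the original address), for which the archive serves the capture closest to the requested time. That is only an approximation of "the first occurrence in every year" described above. The function names and the use of http://www.bath.ac.uk/ as the home page address are assumptions made for illustration, not details taken from the original display.

```python
WAYBACK_PREFIX = "https://web.archive.org/web"


def wayback_url(page_url: str, year: int) -> str:
    """Build a Wayback Machine address for the capture nearest 1 January of `year`.

    The archive serves whichever capture is closest to the timestamp, so the
    page returned may not be the very first capture made in that year.
    """
    timestamp = f"{year:04d}0101000000"  # YYYYMMDDhhmmss
    return f"{WAYBACK_PREFIX}/{timestamp}/{page_url}"


def yearly_snapshots(page_url: str, first_year: int, last_year: int) -> dict:
    """Map each year in the inclusive range to a Wayback Machine address."""
    return {
        year: wayback_url(page_url, year)
        for year in range(first_year, last_year + 1)
    }


if __name__ == "__main__":
    # Assumed home page address, used here only as an example.
    for year, url in yearly_snapshots("http://www.bath.ac.uk/", 1997, 2007).items():
        print(year, url)
```

The resulting list of addresses could be visited by hand, or passed to a screenshot tool, to assemble a year-by-year view of an institutional home page.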

[Note image of video replaced by embedded YouTube video on 20 July 2009]

Posted in Web 1.0, Case studies | 1 Comment »

Don’t Web Managers Care About Preservation?

Posted by Brian Kelly on 17th June 2008

In response to a post on ULCC’s DA Blog, Chris Rusbridge, director of the DCC (and contributor to the Digital Curation Blog), commented:

The enthusiastic way in which web-site owners “re-brand” or “re-launch” their web-sites suggests that they are not particularly interested, long-term, in the details of the experience; continuous improvement means continuous discarding. One hopes that they are more interested in the information content, in some more abstract sense. Maybe we could measure this by tracking older pages across re-launches?

Perhaps a measure of commitment to the “look and feel” might be the lifetime since last reorganised?

Is this right? Don’t Web site owners care about preservation, preferring instead to continually add new features to their services?

I have to say that I disagree. Rather than continual changes to Web sites being due to the Web site owners’ enthusiasms, I would argue that such changes usually occur in response to user needs and expectations, the growing importance of Web services (which means that institutions have greater expectations of the services which will be provided) and an increasing understanding of the limitations of approaches taken to Web site development in the past.

One example of this has been the obligation (for legal and moral reasons) to enhance the accessibility of Web resources. Initially HTML authoring tools and Content Management Systems (CMSs) provided little support to enhance accessibility - indeed many CMSs generated low quality HTML which could not be processed by assistive technologies. Read the rest of this entry »

Posted in Web 1.0, Web 2.0 | 4 Comments »

Introduction: Kevin Ashley

Posted by Kevin Ashley on 13th June 2008

Hello. I’m Kevin Ashley, manager of the Digital Archives Department (DAD) at ULCC since its establishment in 1997 (the department, not ULCC). During that time, DAD has set up and run the NDAD service for The National Archives, preserved digital material for the British Library (before handing it back to them to put in their shiny new Digital Object Management system), collaborated with Cornell and the DPC to produce the Digital Preservation Training Programme in the UK, and carried out many other activities.

I’m currently chair of JISC’s Repositories and Preservation Advisory Group, and ULCC’s representative on the DPC board. My proudest achievement is the creation (with my quondam colleague Martin Powell) of a founder member in the Useless Web Pages Hall of Fame: the ULCC web telephone dialler - often imitated but never, IMHO, bettered. Unfortunately, both the dialler itself and the Hall of Fame are no longer with us on the web, and those links both depend on the Internet Archive’s Wayback Machine. For that reason, and many others, I’m particularly interested in the success of PoWR.

Posted in Project news | 2 Comments »

Case Study for the Exploit Interactive and Cultivate Interactive E-Journals

Posted by Brian Kelly on 12th June 2008

Exploit Interactive was an e-journal which was funded by the EU’s Telematics for Libraries programme. Nine issues of the journal were published between May 1999 and October 2000.

After the project funding had ceased, additional funding from the EU was obtained to publish a new e-journal known as Cultivate Interactive, which was launched in July 2000. However, there was a need to define a policy and accompanying procedures for the preservation of the Exploit Interactive Web site and content.

As described in a case study document on Providing Access to an EU-funded Project Web Site after Completion of Funding, the following policy decisions were taken:

  • The Web site’s domain name will be kept for at least 3 years after the end of funding.
  • We will seek to ensure the Web site continues for at least 10 years after the end of funding.
  • We will seek to ensure that the Web site continues to function, although we cannot give an absolute commitment to this.
  • We will not commit to fixing broken links to external resources.
  • We will not commit to fixing non-compliant HTML resources.

The case study went on to measure the disk storage used by the Web site and to quantify the costs. The storage requirements of less than 500 MB of disk space were not significant, so it was agreed to continue to pay for the www.exploit-lib.org domain name until at least October 2008.

Periodic automated link checks were carried out on the Web site to ensure that its internal links continued to work.
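The post does not say which link-checking tool was used. As a rough illustration of what such a check involves, the sketch below uses only the Python standard library: it collects the links on a page, keeps those pointing at the same host (the internal links), and offers a helper that reports whether an address fails to respond. The names `LinkCollector`, `internal_links` and `is_broken` are hypothetical, and a real checker would also need to crawl beyond a single page and be scheduled to run periodically.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import Request, urlopen


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def internal_links(html: str, page_url: str) -> list:
    """Return the sorted, de-duplicated links that stay on page_url's host."""
    collector = LinkCollector()
    collector.feed(html)
    site = urlparse(page_url).netloc
    found = set()
    for href in collector.hrefs:
        # Resolve relative links and drop any #fragment part.
        absolute = urldefrag(urljoin(page_url, href)).url
        if urlparse(absolute).netloc == site:
            found.add(absolute)
    return sorted(found)


def is_broken(url: str, timeout: float = 10.0) -> bool:
    """Return True if a HEAD request for url fails or reports an HTTP error."""
    request = Request(url, method="HEAD")
    try:
        with urlopen(request, timeout=timeout) as response:
            return response.status >= 400
    except OSError:
        # Covers HTTP error responses, unreachable hosts and timeouts.
        return True
```

Given the HTML of a page and its address, `internal_links` returns the addresses to test, and each can then be passed to `is_broken`. External links are deliberately ignored, in line with the policy decision above not to commit to fixing broken links to external resources.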

A similar process was established for the Cultivate Interactive e-journal, when its funding ceased in February 2003. Read the rest of this entry »

Posted in Web 1.0 | No Comments »

Web Continuity Project at The National Archives

Posted by Richard M. Davis on 11th June 2008

Ed and I were pleased to come across an interesting document, recently received from The National Archives, describing their Web Continuity Project. This is the latest of the many digital preservation initiatives undertaken by TNA/PRO, which began with EROS and NDAD in the mid 1990s and led to the UK Government Web Archive and other recent digital preservation initiatives (many in conjunction with BL and the JISC).

The Web Continuity Project arises from a request by Jack Straw, as leader of the House of Commons in 2007, that government departments ensure continued access to online documents. Further research revealed that:

  • Government departments are increasingly citing URLs in answer to Parliamentary Questions
  • 60% of links in Hansard to UK government websites for the period 1997 to 2006 are now broken
  • Departments vary considerably: for one, every link works; for another every link is broken. (TNA’s own website is not immune!)

Read the rest of this entry »

Posted in Digital preservation, Policies, Preservation | No Comments »

Digital preservation in a nutshell, part II

Posted by Ed Pinsent on 10th June 2008

As Richard noted in Part I, digital preservation is a “series of managed activities necessary to ensure continued access to digital materials for as long as necessary.” But what sort of digital materials might be in scope for the PoWR project?

We think it extremely likely that institutional web resources are going to include digital materials such as “records created during the day-to-day business of an organisation” and “born-digital materials created for a specific purpose”.

What we want is to “maintain access to these digital materials beyond the limits of media failure or technological change”. This leads us to consider the longevity of certain file formats, the changes undergone by proprietary software, technological obsolescence, and the migration or emulation strategies we’ll use to overcome these problems.

Read the rest of this entry »

Posted in Digital preservation, Records management | No Comments »