Category Archives: Resources

Why you can sometimes leave it to the University

“Does anyone have any positive experiences to share?” asks Brian in a recent post. Well, I have – except it’s not in the UK. Harvard University Library in the USA have recently put Harvard WAX (the Web Archive Collection Service) live, after a pilot project which began in July 2006.

Harvard WAX includes themed collections on Women’s Voices and Constitutional Revision in Japan, but of particular interest to us in PoWR is their A-Sites collection: the semi-annual captures of selected Harvard websites. “The Harvard University Archives is charged with collecting and preserving the historical records of the University,” state the curators, recognising their formal archival function in this regard. “Much of the information collected for centuries in paper form now resides on University web sites.”

Helen Hockx-Yu of the British Library met with the WAX team in May 2009. “I was impressed with many of the features of the system,” she said, “not just the user and web curator interfaces but also some of the architectural decisions. WAX is a service offered by the Library to all Harvard departments and colleges. In exchange for a fee, the Departments use the system to build their collections. The academics may not be involved with the actual crawling of websites, but spend time QAing and curating the websites, and can to some extent decide how the archive targets appear in the Access Tool. The QAed sites are submitted directly into Harvard’s institutional repository.”

It is very encouraging to read of this participatory dimension to the project, indicating how success depends on the active involvement of the creators of the resources. Already 48 Harvard websites have been put into the collection, representing Departments, Committees, Schools, Libraries, Museums, and educational programmes.

The delivery of the resources also has many good features. There’s an unobtrusive header element which lets the user know they’re looking at an archived instance rather than the live website. There’s a link explaining why the site was added to the collection, and contextual information about the wider collection. Another useful link allows researchers, scholars and other users to cite the resource; it’s good to see this automated feature integrated directly within the site. The Terms of Use page addresses a lot of current concerns about republishing web resources, and strikes just the right balance between protecting the interests of Harvard and providing a service to its users. Like a good OAIS-compliant repository, they are perfectly clear about who their designated user community are.

Best of all, they provide a working full-text search engine for the entire collection, something that many other web archive collections have been struggling to achieve.

The collection is tightly scoped, and takes account of ongoing developments for born-digital materials: “Collection managers, working in the online environment, must continue to acquire the content that they have always collected physically. With blogs supplanting diaries, e-mail supplanting traditional correspondence, and HTML materials supplanting many forms of print collateral, collection managers have grown increasingly concerned about potential gaps in the documentation of our cultural heritage.” The project has clear ownership (it is supported by the University Library’s central infrastructure), and it built its way up from a pilot project in less than three years. Their success was partly due to having a clear brief from the outset, and partly to collaboration with three University partners. What Harvard have done chimes in with many of the recommendations and suggestions made in the PoWR Handbook, particularly Chapters 5 (Selection), 16 (Responsibility for preservation of web resources) and 19 (How can you effect change?).

There are many aspects of this project which UK Institutions could observe, and perhaps learn something from. It shows that it is both possible and practical to embed website collection and preservation within an Institution.

Wiki Management

This contribution to a thread about the management of wikis, posted by the Records Management section at the University of Edinburgh, was submitted to the Archive listserv recently:

Below is an outline of the ‘wiki’ situation at the University of Edinburgh:

At Edinburgh University our main effort to date has been making sure that wikis are retention scheduled, and considering what the ideal retention period for a wiki should be. As part of setting up any new wiki space the University records details such as space owner and proposed use, but due to the wide variety of uses it is difficult to specify a generic retention period. There is the option for the space owner to delete a wiki space; however the most likely scenario is that a space atrophies over time, the owner stops engaging, and it is therefore then up to the University to be proactive in identifying and pruning out dead spaces.

At present the service policy talks about a default retention period of 1 year, which is primarily to make space owners aware that if not used their space may be deleted. If we have anything that requires long term migration we would look into outward migration; either to a new system or to an archive.

I found it very encouraging to see this pro-active and practical-minded approach to the management of wikis. In many ways Edinburgh’s RM approach vindicates much of the RM advice we give in the PoWR Handbook; as we say early on, we must manage resources in order to preserve them. It is also encouraging that, in Edinburgh’s case at least, the wiki problem is considered primarily in terms of information and staff management, and not exclusively in terms of the technological solutions that might be applied.

In particular:

1) Edinburgh: “Make sure wikis are retention scheduled”.

  • PoWR: “Deciding which aspects of your web resources to capture can be informed to a large extent by your Institutional drivers, and the agreed policies for retention and preservation.” (p 22)

2) Edinburgh: “Consider the ideal retention period for a wiki”.

  • PoWR: “The attraction of bringing a website in line with an established retention and disposal programme is that it will work to defined business rules and retention schedules to enable the efficient destruction of materials, and also enable the protection and maintenance of records that need to be kept for business reasons.” (p 93)

3) Edinburgh: “Make space owners aware that if not used their space may be deleted”.

  • PoWR: “Quite often in an academic context these applications rely on the individual to create and manage their own resources. A likely scenario is that the academic, staff member or student creates and manages his or her own external accounts in Flickr, Slideshare or […], but they are not Institutional accounts. It is thus possible with Web 2.0 applications for academics to conduct a significant amount of Institutional business outside of any known Institution network. The Institution either doesn’t know this activity is taking place, or ownership of the resources is not recognised officially. In such a scenario, it is likely the resources are at risk.” (p 42)

4) Edinburgh: “The service policy talks about a default retention period.” This approach seems to incorporate rules as part of setting up any new wiki space, starting to manage the resource at the very beginning of the record’s lifecycle.

  • PoWR: “If we can apply a lifecycle model to web resources, they will be created, managed, stored and disposed of in a more efficient and consistent way; it can assist with the process of identifying what should and should not be retained, and why; and that in turn will help with making preservation decisions.” (p 34)

5) Edinburgh: “If we have anything that requires long term migration we would look into outward migration; either to a new system or to an archive.”

  • PoWR: “Migration of resources is a form of preservation. Migration means moving resources from one operating system to another, or from one storage/management system to another. This may raise questions about emulation and performance. Can the resource be successfully extracted from its old system, and behave in an acceptable way in the new system?” (p 33)
  • “The usual aim of archival appraisal has been to identify and select records for permanent preservation. Quite often appraisal has taken place at the very end of the lifecycle process (although records managers intervene where possible at the beginning of the process, enabling records of importance to be identified early).” (p 36)
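Edinburgh’s default rule (owner and proposed use recorded when a space is set up, and spaces left unused for a year liable to deletion) lends itself to a simple periodic check. The sketch below is purely illustrative and is not Edinburgh’s actual system: the `WikiSpace` record, the `find_dormant_spaces` and `owner_notices` helpers, and the use of “last edited” as the measure of use are all hypothetical assumptions.

```python
from dataclasses import dataclass
from datetime import date, timedelta


# Hypothetical inventory record: the details captured when a space is set up,
# plus the date it was last edited (assumed here to be the measure of "use").
@dataclass
class WikiSpace:
    key: str
    owner: str
    proposed_use: str
    last_modified: date


# Assumed default retention period of one year, per the service policy above.
DEFAULT_RETENTION = timedelta(days=365)


def find_dormant_spaces(spaces, today, retention=DEFAULT_RETENTION):
    """Return spaces with no edits within the retention period, oldest first."""
    cutoff = today - retention
    dormant = [s for s in spaces if s.last_modified < cutoff]
    return sorted(dormant, key=lambda s: s.last_modified)


def owner_notices(dormant):
    """Group dormant spaces by owner and draft one reminder per owner."""
    by_owner = {}
    for space in dormant:
        by_owner.setdefault(space.owner, []).append(space.key)
    return {
        owner: (f"Spaces {', '.join(keys)} have had no edits within the "
                "retention period and may be deleted.")
        for owner, keys in by_owner.items()
    }


if __name__ == "__main__":
    spaces = [
        WikiSpace("HIST101", "a.smith", "course notes", date(2008, 3, 1)),
        WikiSpace("RMTEAM", "records", "team wiki", date(2009, 6, 20)),
    ]
    print([s.key for s in find_dormant_spaces(spaces, today=date(2009, 7, 1))])
    # prints ['HIST101']
```

The point of the design is that the check does not delete anything: it produces a list for a person to review and a reminder for each owner, which keeps the decision with the space owner and the records manager rather than with the software.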

History of the First UK Institutional Web Service

It was 15 years ago, in the first week back at work after the Christmas break (I think), that I was part of the team which set up the Web service at the University of Leeds. This was, I believe, the UK’s first institutional Web service, with contributions made shortly afterwards by several academic departments, including not only the usual suspects (the Computing Service, Computer Science, Chemistry and Physics) but also the School of Music.

Various people at the University of Leeds were active in Web development activities back then. My role was in promoting its use (and I’ve discovered a copy of a special issue of the University Computing Service newsletter on the theme of online information services – in particular the Web – which is available on the Internet Archive). But in addition the Chemistry Department were, in conjunction with Imperial College, developing services which provided access to molecules on the Web; a colleague in the Computing Service provided access to the University Library catalogue; and Nikos Drakos, a researcher in the Computer Based Learning Unit, wrote the LaTeX2HTML conversion software (which was first announced in May 1993).

Fifteen years later my memories of our early involvement with the Web are beginning to fade. But as I knew this would happen, I wrote a history of the various activities of colleagues at the University, which was published on the University’s Web site. Sadly, but perhaps inevitably, this resource was later deleted, no doubt following a reorganisation of the Web site.

But this does not necessarily mean that the information is no longer available. As well as being an early adopter of the Web, the Computing Service had also had a long-standing involvement in digital preservation. And so the file should still be available on the University’s archive service. But although the bits and bytes may still be available, what processes are needed for this resource to be retrieved? Is this a service which the University offers? And is it a service which can be provided to a former member of staff who left the University over 13 years ago?

As JISC PoWR project team members have commented previously, digital preservation isn’t just about the technical aspects of preserving bits in a format suitable for processing in the future – it’s also about the policies and the procedures. And I think it’s time I sent an email to my former colleagues to see if this resource can be retrieved. I’ll provide details of my experiences in a future post.

What can PoWR do for you?

Web preservation is a big topic and we’re not even pretending to deal with all of it. The aspect that we care about – that JISC believes the community is looking for help with – is fairly well-defined. We want to help institutions make effective decisions about preserving web resources, and help them implement those decisions in a way that is cost-effective and non-disruptive.

Making effective decisions
At its simplest level, this means deciding what to keep and what not to keep. There may be many drivers for these decisions – institutional policy, legal requirements and research interests are just a few. The decisions need to relate not just to what is to be kept, but why and who for. That’s because those requirements may have a bearing on how you choose to go about the job, or whose responsibility it is to carry it out. Not everything needs to be kept, and even when it does, it may not be your institution’s responsibility to keep it.

Implementing those decisions
Carrying out your decisions – keeping things, throwing things away, or ensuring that other people keep things – can be the trickiest part of the process. You may know you want to preserve the prospectus for past years, but can you be sure that your CMS, or the Internet Archive, or some local use of web-harvesting tools is going to do this job effectively for you? You may be being told that some part of your web infrastructure would be easier to preserve if you avoided the use of certain features, or used a different authoring system. Is that true, and if it is, what are the negative consequences of such decisions?

The handbook, which will be one of the project’s outputs, will attempt to answer these questions in a way that makes sense to everyone who might be involved in the process. We want to make it easier to take decisions about preservation and to know what tools, systems or working methods can be employed to help you implement them.

The workshops are the primary mechanism we’re using to test whether the handbook makes sense to the people it’s aimed at, and whether it tackles the problems that people are actually facing.

The History Of the University of Bath Home Page

How has your institutional home page changed over time? And have you kept records of the changes and the decisions which were made?

In order to illustrate how an institution’s home page may change over an 11-year period, the Internet Archive’s Wayback Machine was used to view the first occurrence of the University of Bath home page in every year from 1997 until 2007. (Note that in browsers which support Flash you can interact with the display, and more interactive access can be obtained if you install the PicLens plugin, although there are also links to the static images and an automated rolling display of the pages in the Internet Archive.)

In addition to this display, a 4-minute video with accompanying commentary has also been created, which discusses some of the changes to the home page over the 11 years. A screenshot of the video is given below:


Is this example of interest to other institutions? Would it be helpful if tools could be provided to assist the creation of a similar visualisation of the history of your institutional home page?

[Note image of video replaced by embedded YouTube video on 20 July 2009.]
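The post does not say how the yearly snapshots were located, but for anyone wanting to try something similar, one possible starting point is the Internet Archive’s CDX server, which lists the captures it holds for a URL. The sketch below assumes that service’s JSON output (a header row followed by one row per capture) and uses `bath.ac.uk` only as an example; the function names are hypothetical. It picks the earliest capture in each year client-side, so it does not depend on exactly how the server thins its results.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Internet Archive CDX server endpoint (assumed reachable; network access needed).
CDX_ENDPOINT = "http://web.archive.org/cdx/search/cdx"


def build_cdx_query(site, first_year, last_year):
    """Build a CDX query listing successful captures of `site` in a year range."""
    params = {
        "url": site,
        "from": str(first_year),
        "to": str(last_year),
        "output": "json",
        "filter": "statuscode:200",
        # Collapse on the first four timestamp digits (the year) to thin results.
        "collapse": "timestamp:4",
    }
    return CDX_ENDPOINT + "?" + urlencode(params)


def first_capture_per_year(rows):
    """Map each year to a Wayback playback URL for its earliest capture.

    `rows` is the decoded JSON: a header row followed by one row per capture.
    """
    if not rows:
        return {}
    header, captures = rows[0], rows[1:]
    ts_col, orig_col = header.index("timestamp"), header.index("original")
    earliest = {}
    for row in captures:
        ts, original = row[ts_col], row[orig_col]
        year = int(ts[:4])
        if year not in earliest or ts < earliest[year][0]:
            earliest[year] = (ts, original)
    return {
        year: f"https://web.archive.org/web/{ts}/{original}"
        for year, (ts, original) in sorted(earliest.items())
    }


def fetch_yearly_snapshots(site, first_year, last_year):
    """Query the live CDX server and return one snapshot URL per year."""
    with urlopen(build_cdx_query(site, first_year, last_year)) as response:
        return first_capture_per_year(json.load(response))
```

Calling `fetch_yearly_snapshots("bath.ac.uk", 1997, 2007)` would return a year-by-year list of archived home pages, which could then be screenshotted or fed into a rolling display like the one described above.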

Web Resource Preservation Case Studies

The JISC-PoWR project would like to publish a number of case studies highlighting best practice regarding Web resource preservation.

Has your institution recently deployed a Web resource preservation strategy or embarked on Web resource preservation work? Would you be willing to share your experiences and discuss solutions to problem areas by submitting a brief case study? If so, please contact Marieke Guy.

Further details are available on the suggested format for case studies.