Goodbye from the JISC PoWR blog

From today we don’t intend to provide any more significant posts on the JISC PoWR blog and will be closing comments. The blog will remain here as a resource for you to use but it is now officially frozen.

An Archived blog page is now available giving further information on the archiving of the blog. It includes blog statistics for future reference.

The JISC PoWR team would like to say thank you to all our readers. Most of the team members are involved in new digital/Web preservation work so this won’t be the last you hear from us!

Cessation of posts to the JISC PoWR blog

Following the successful completion of the JISC PoWR project we continued to publish occasional posts on this blog related to the preservation of Web sites. We have also recently published a new handbook on the preservation of Web resources which we have announced on this blog.

It is now therefore timely to officially announce that we do not intend to publish any new posts on the blog after a couple of posts which provide a summary of how this blog was used. A week or so after the final posts have been published we will switch off comments on the blog – so that we will no longer have to spend time checking for spam comments.

The blog itself, and all posts and comments, will remain available for the indefinite future – by which we mean that we will seek to provide access for a period of at least 3 years from now.

The summary posts we intend to provide will contain details about the blog such as:

  • Number of posts and comments
  • Details of contributors
  • Details of blog theme and plugins used
  • Details of type and version of software used

If you have any suggestions for any other information it would be useful to provide and record please do let us know.
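The details listed above could be captured in a small machine-readable record kept alongside the frozen blog. The following is a minimal sketch only: the `BlogRecord` class, its field names and every value shown are illustrative placeholders, not the actual figures or the format used for this blog.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class BlogRecord:
    """Snapshot of a blog's usage and technical details when it is frozen."""
    title: str
    software: str            # type of blog software
    software_version: str    # version of that software
    theme: str
    plugins: list = field(default_factory=list)
    contributors: list = field(default_factory=list)
    post_count: int = 0
    comment_count: int = 0


def to_json(record: BlogRecord) -> str:
    """Serialise the record as stable, human-readable JSON."""
    return json.dumps(asdict(record), indent=2, sort_keys=True)


# All values below are hypothetical placeholders.
record = BlogRecord(
    title="Example project blog",
    software="WordPress",
    software_version="2.9.2",
    theme="example-theme",
    plugins=["example-spam-filter"],
    contributors=["A. Author", "B. Author"],
    post_count=0,
    comment_count=0,
)
manifest = to_json(record)
print(manifest)
```

A plain JSON file of this kind has the advantage that it can still be read long after the blog software itself has been upgraded or retired.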

We intend to use the closing of the blog as a case study which will be documented as part of the JISC Beginner’s Guide to Digital Preservation. The Beginner’s Guide will eventually be available online but the process of creating the guide is being documented in the JISC Beginner’s Guide to Digital Preservation blog.

A Guide to Web Preservation

The JISC PoWR team is pleased to announce the launch of A Guide to Web Preservation.

This Guide uses similar content to PoWR: The Preservation of Web Resources Handbook but in a way which provides a practical guide to web preservation, particularly for web and records managers. The chapters are set out in a logical sequence and answer the questions which might be raised when web preservation is being seriously considered by an institution. These are:

  • What is preservation?
  • What are web resources?
  • Why do I have to preserve them?
  • What is a web preservation programme?
  • How do I decide what to preserve?
  • How do I capture them?
  • Who should be involved?
  • What approaches should I take?
  • What policies need to be developed?

Each chapter concludes with a set of actions, and one chapter lists the tasks which must be carried out, and the timings of these tasks, if an institution is to develop and maintain a web preservation programme. In addition, points made in the Guide are illustrated with a number of case studies.

The guide was edited by Susan Farrell who has used her knowledge and expertise in the management of large-scale institutional Web services in writing the document.

The Guide can be downloaded (in PDF format) from the JISC PoWR Web site. The Guide is also hosted on the JISCPress service, which provides a commenting and annotation capability. It has been published on the Lulu.com print-on-demand service where it can be bought for £2.82 plus postage and packing.

If you want to discuss the Guide on Twitter you should use the #jiscpowr tag.

Making any Upgrades to your Blog Sir?

This blog is hosted by JISC Involve who provide blogs for the JISC community.

Until recently JISC Involve was running on an old version of WordPress (1.2.5). Earlier this month the JISC Digital Communications Team upgraded their server to the latest version of WordPress (2.9.2) and then migrated all the JISC Involve blogs over to the new installation.

Although all blog posts, comments, attachments, user accounts, permissions and customisations were supposed to move over easily, JISC Involve users were encouraged to back up the content of drafts etc. ‘just in case’.

Unfortunately there were some technical problems migrating the content and as a consequence the original theme was lost and URLs now redirect.

Luckily the JISC PoWR team were able to locate the original theme and reinstall it.

However the process has made them aware of the need to record details of the technical components and architecture of the blog. This information can be critical in a migration process and when ‘closing down’ a blog.

The JISC PoWR team will ensure that such information is routinely recorded.

Is there any other information that is important for preservation or migration purposes?

JISC Beginner’s Guide to Digital Preservation

Members of UKOLN who were involved in the JISC PoWR project have recently begun work on a new project creating a straightforward and pragmatic guide to digital preservation for those working on JISC projects. The project will create the JISC Beginner’s Guide to Digital Preservation.

It will look at reasons why JISC projects might want to preserve their deliverables, will introduce mainstream terminology and processes and offer clear-cut solutions. The guide will also offer lists of references and resources, a checklist of issues users will need to think about and a number of case studies against which they will be able to benchmark themselves.

A number of the discussions initiated on the JISC PoWR blog (such as preservation of Web 2.0 services including blogs and wikis) will be taken forward on the new project.

A project blog has recently been launched at http://blogs.ukoln.ac.uk/jisc-bgdp/

The Library of Congress Twitter Archive

Two weeks ago the Library of Congress announced that they will be archiving all public tweets since Twitter began. The tweets have been given to the library as a ‘gift’ from Twitter.

The announcement was fittingly made on Twitter.

Yesterday the Library of Congress blog published a list of FAQs about the approach they will be taking.

The FAQ explains:

  • Why is it important to preserve the Twitter archive?
    It sees Twitter as part of the historical record of communication, news reporting, and social trends – all of which complement the Library’s existing cultural heritage collections.
  • What is in the Archive?
    Public information. Not private account information or deleted tweets.
  • What does the Library plan to do with the archive?
    Its aims are preserving access to the archive for the long term and making data available to researchers.

Blue Ribbon Task Force Publishes Sustainable Economics for a Digital Planet

Universities grappling with complex decisions on which of their burgeoning digital resources they should preserve – and the inherent financial, technical and legal issues that surround such work – may welcome a report that offers a “supply-and-demand” perspective on how individuals and institutions might manage their digital collections.

The Blue Ribbon Task Force on Sustainable Digital Preservation and Access (BRTF-SDPA), a new international initiative funded by JISC and other organisations, has recently released its report entitled Sustainable Economics for a Digital Planet: Ensuring Long-Term Access to Digital Information. Its report examines the complicated and diverse issues from an economic standpoint. It identifies the problems intrinsic to all preserved digital materials, and proposes domain-specific actions that address the challenges to sustainability. The report focuses its inquiry on materials of long-term public interest in content domains with diverse preservation profiles, namely scholarly discourse, commercially owned cultural content and collectively produced Web content.

JISC is organising a free one-day symposium in London on 6 May 2010 where the Blue Ribbon task force will present its final report and invite responses from the BBC, the Natural History Museum, the British Library, European Bioinformatics Institute and the European Commission. Further information is available.

Storing Information in the Cloud

In a guest blog post Nicole Schulz, Teaching Fellow in the Department of Information Studies at Aberystwyth University reports on a recent survey on Storing Information in the Cloud.


Storing Information in the Cloud

The Department of Information Studies at Aberystwyth University is currently running a small research project funded by the Society of Archivists examining operational, legal and security issues relating to the storage of information in the cloud for access. This is a topic which is likely to be of interest to those involved in website preservation work, as there has been increased interest in cloud services to support institutional activities. Although the project does not address preservation issues directly, the outputs of the project should be of interest to information professionals involved in digital preservation. We have already had interest from various institutions in a follow-up project on preservation issues and are hoping to continue research in that area in the near future – so watch this space!

What is the Aim of the Project?

Our aim is to generate debate and to highlight some of the issues surrounding the storage of information in a virtual environment. We have already seen many organisations outsourcing email and data storage to cloud providers such as Google and Amazon for cost and efficiency reasons. Cloud computing can have financial and operational advantages such as reduced deployment cost, increased storage capabilities and scalability. However, cloud computing raises quite a few security and compliance issues that need to be addressed when outsourcing information storage to third parties.

We, therefore, aim to develop a toolkit designed to enable information professionals whose organisations are about to deploy information into the cloud to ask the right questions and identify the right strategies for ensuring that information is kept securely, accessible and in line with relevant legislation. Even though preservation will not feature prominently in the toolkit, it is understood that preservation questions are an integral part of assessing how to manage the information life-cycle in the cloud and need to be addressed right at the start of setting up information management services and procedures.

What did the Survey tell us?

As a first step, we conducted an online survey aimed rather narrowly at information management professionals as the main stakeholders in information security and governance via Listservs and professional bodies’ members lists. Given the limited chosen audience we had a good response rate and gathered interesting insights into what professionals think and do about storing information in the cloud:

  • The overwhelming majority of people who completed the survey worked in the public sector.
  • Roughly 30% of participants said that their organisations are already using cloud computing and another 40% claim that their organisations are interested in cloud computing but have no active plans as yet.
  • Most organisations use or intend to use software-as-a-service and deployment into a private cloud as cloud computing models.
  • Data storage, email and standard office applications were named as the main IT services deployed into the cloud.
  • There appeared to be no single outstanding driver for cloud computing – reducing cost, scalability and flexibility were the most popular by a small margin.
  • Similarly, concerns about storing information in the cloud appeared to be evenly spread with concerns about the retrieval/destruction of data when terminating the cloud service, loss of control over data and data protection at the top of the issues list.
  • Preservation and retention management were singled out as areas for further research, and the majority of participants identified a demand for further guidance on operational, security and legal aspects in the form of best practice guidance.

What is Next?

We will run a workshop in Manchester on 21 May 2010 in order to work through some cloud storage scenarios and investigate further issues and approaches to ensuring the secure storage of information in the cloud. Bookings will open soon and you can find more information about the workshops at http://www.dis.aber.ac.uk/en/news/cloudworkshop.asp.

Following the analysis of the results of both the survey and the workshop, a toolkit and report will be made available by the Society of Archivists in the autumn of this year.

If you have an interesting case study or would like to find out more, please contact me (email: nis@aber.ac.uk).


Nicole Schulz
Teaching Fellow
Department of Information Studies
Aberystwyth University
Llanbadarn Fawr
Aberystwyth SY23 3AS

“A Fifth Of BBC Sites Are Already Dead”

The Paid:Content:UK blog has recently published an article which informs us that “A Fifth Of BBC Sites Are Already Dead“. The article begins by announcing that “Nearly half of the websites most likely to be closed as part of its big Strategic Review have already long been shut, some for as much as eight years“.

A list of a number of the sites which have been ‘mothballed’ is given in the article. Some of the sites are for programmes that have ceased broadcasting (e.g. On The Record) and others are for events which are now over (e.g. Politics ‘97).

I was particularly interested to read about the BBC policies regarding the decommissioning of such Web sites. The article provides a link to the BBC’s policy, which explains that inactive pages are left online for reference as “We don’t want to delete pages which users may have bookmarked or linked to in other ways. In general, our policy is only to remove pages where the information provided has become so outdated that it may lead to actual harm or damage.”

With the promises of large cuts for public sector organisations in the offing after the general election I suspect that we will find Web sites in many higher education organisations being decommissioned. But will content be simply deleted, will the content be left ‘as is’ or will a more managed approach to such decommissioning take place?

I feel there will be a renewed interest in the decommissioning of Web sites. I hope the JISC PoWR’s Handbook on the Preservation of Web Resources will be of interest to organisations which find themselves in this position.

“Why study the web?” – Monday 8th March, Royal Society

My attention has just been drawn to this event by a blog post by Aleks Krotoski. The panel session, which will be streamed live and available for later download, will discuss ways in which the web can be studied at postgraduate level. Many of the examples focus on contemporary issues – the web as it is now – but this looks to be an ideal opportunity to highlight the research potential of web archives, and the services that those archives need to provide to enable research to be carried out. (JISC are commissioning work in this area.) More details are available at ECS Southampton. Worth a visit if you are nearby; I wish we had been able to give more warning!

Kevin Ashley new DCC Director

Earlier this week the Digital Curation Centre announced the appointment of their new Director who will succeed Chris Rusbridge upon his retirement in April 2010. The role has been taken on by JISC PoWR’s very own Kevin Ashley.

Kevin has been Head of Digital Archives at the University of London Computer Centre (ULCC) since 1997, during which time his multi-disciplinary group has provided services related to the preservation and reusability of digital resources on behalf of other organisations, as well as conducting research, development and training.

The group has operated the National Digital Archive of Datasets for The National Archives of the UK for over twelve years, delivering customised digital repository services to a range of organisations.

As a member of the JISC’s Infrastructure and Resources Committee, the Advisory Council for ERPANET, plus several advisory boards for data and archives projects and services, Kevin has contributed widely to the research information community.

Kevin has been an active member of the JISC PoWR project and has written many blog posts sharing his expertise.

The DCC, which has just begun its third phase of work, makes the following comment on its Web site (A new phase, a new perspective, a new Director):

As a firm and trusted proponent of the DCC we look forward to his energetic leadership in this new phase of our evolution.

At JISC PoWR we offer Kevin our congratulations and wish him all the best in his new role.

Official Launch of the UK Web Archive

The British Library has officially launched the UK Web Archive, offering access in perpetuity to thousands of UK websites for generations of researchers.

The site was unveiled earlier this week by the Minister for Culture and Tourism, the Rt Hon Margaret Hodge MBE MP, and the Chief Executive of the British Library, Dame Lynne Brindley. The project demonstrates the importance and value of the nation’s digital memory.

Websites in the UK Web Archive include:

  • The Credit Crunch – initiated in July 2008, this collection contains records of high-street victims of the recession – including Woolworths and Zavvi.
  • Antony Gormley’s ‘One & Other’ Trafalgar Square Fourth Plinth Project – involving 2,400 participants and streamed live by Sky Arts over the web to an audience of millions, this site will no longer exist online from March 2010.
  • 2010 General Election – work has started to preserve the websites of MPs such as Derek Wyatt, who will be retiring at the next election, creating a permanent record of his time as a Member of Parliament.

This important research resource has been developed in partnership with the National Library of Wales, JISC and the Wellcome Library, as well as technology partners such as IBM.

British Library Chief Executive, Dame Lynne Brindley said:

“Since 2004 the British Library has led the UK Web Archive in its mission to archive a record of the major cultural and social issues being discussed online. Throughout the project the Library has worked directly with copyright holders to capture and preserve over 6,000 carefully selected websites, helping to avoid the creation of a ‘digital black hole’ in the nation’s memory.

“Limited by the existing legal position, at the current rate it will be feasible to collect just 1% of all free UK websites by 2011. We hope the current DCMS consultation will enact the 2003 Legal Deposit Libraries Act and extend the provision of legal deposit through regulation to cover freely available UK websites, providing regular snapshots of the free UK web domain for the benefit of future research.”

Further details are available from the British Library.

Findings available from the KRDS2 Survey

The findings from the Keeping Research Data Safe 2 (KRDS2) survey of digital preservation cost information are now available on the KRDS2 project Web page.

KRDS2

The Keeping Research Data Safe 2 project commenced on 31 March 2009 and will complete in December 2009. The project will identify and analyse sources of long-lived data and develop longitudinal data on associated preservation costs and benefits. It is believed that these outcomes will be critical to developing preservation costing tools and cost benefit analyses for justifying and sustaining major investments in repositories and data curation.

The Survey

The survey was carried out between September and November 2009 to identify key research data collections with information on preservation costs and related issues. 13 survey responses were received: 11 of these were from UK-based collections, and 2 were from mainland Europe. The responses covered a broad area of research including the arts and humanities, social sciences, and physical and biological sciences and research data archives or cultural heritage collections.

The survey questionnaire sought to identify cost information available for the main KRDS2 activities in the Pre-Archive and Archive phases. Availability of information for some activities is very high (archival storage cost information is available in 100% of the responses). Other more infrequent activities such as disposal (and perhaps also preservation planning) are less well represented. Knowledge of acquisition costs is also relatively low (46%).

Further information is available from the KRDS2 project Web page.

Web archiving in the wider world

When a topic is being discussed in the correspondence pages of national newspapers, it’s a sign that it’s no longer the concern of a few specialists. That’s certainly been true of web archiving for some time, as a recent example shows. Malcolm Birdling wrote a letter published in the Guardian on January 1, 2010 bemoaning the fact that some government agencies – in particular the UK Border Agency – actively prevent sites such as the Internet Archive from capturing their contents. This has important consequences for citizens, particularly when such sites are used to publish regulations and guidance which is frequently changing. (I have anecdotal evidence that the UK Inland Revenue lost an appeal brought by a taxpayer over a very similar issue.)
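One common way for a site to keep an archive's crawler out is a robots.txt exclusion rule. The sketch below, using Python's standard `urllib.robotparser`, shows how such a rule reads to a crawler. The rules, the example.gov.uk address and the `examplebot` name are hypothetical, and treating `ia_archiver` as the user-agent name honoured by the Internet Archive is an assumption; none of this is taken from any real agency's site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one named archive crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Hypothetical page of frequently changing guidance.
page = "https://www.example.gov.uk/guidance/rules.html"

archive_allowed = parser.can_fetch("ia_archiver", page)
other_allowed = parser.can_fetch("examplebot", page)

print("archive crawler may fetch:", archive_allowed)
print("other crawlers may fetch:", other_allowed)
```

With rules like these the page stays visible to ordinary search crawlers while a crawler that honours the exclusion never captures it, so earlier versions of the guidance leave no public archived copy.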

Mr Birdling’s letter brought a rapid response from David Thomas of the UK National Archives who was keen to reassure readers that central government websites were being archived, even without the legislation which prompted Mr Birdling’s original letter. (That story refers to the changes to Legal Deposit regulations which would permit the British Library and other UK copyright libraries to capture UK content without the permission of rights owners.)

But earlier examples of non-specialist concern with preserving web content exist. One of my favourite examples comes from the Usenet group uk.railway whose contributors include a fair number of rail enthusiasts (“trainspotters” if you’re feeling unkind). Privatisation of the UK railway network means that we have a plethora of train operating companies, or TOCs, each of which operates its own web site, much as the great companies of old such as LNER might have done if the web had existed then. The difference is that now these companies come and go every few years when the government puts operating contracts out for re-tender. Railway ephemera such as promotional leaflets and timetables are a key part of the print collections at places such as the National Railway Museum. “What happens to TOC web sites when franchises change?” wondered one poster to uk.railway back in 2007. The Internet Archive has certainly captured some material, but it isn’t the same as a collection controlled by an institution such as the NRM. I wasn’t able to give a very positive answer to their question. I don’t believe the National Railway Museum is yet able to capture websites as part of its collection, and it’s not clear that any of the members of UKWAC see TOC sites as falling within their collecting policy.

And herein lies a lesson. Rail enthusiasts are incredibly effective at preserving railway heritage, both through their own efforts and through influencing others. They include many people with an enviable range of technical abilities. They ensured that special legislation was passed to ensure the preservation of railway heritage after privatisation. Not content with simply preserving heritage, some of them set about recreating it through building an entirely new steam locomotive. But their combined efforts have not yet (so far as I know) ensured that past railway web sites have been preserved. If they can’t manage it without institutional help, what hope is there for the rest of us?

Bookings open for 5th International Digital Curation Conference

5th International Digital Curation Conference

“Moving to Multi-Scale Science: Managing Complexity and Diversity” | 2-4 December 2009

The IDCC is an established annual event reaching out to individuals, organisations and institutions across all disciplines and domains involved in curating data for e-science and e-research.

The DCC will be hosting a workshop programme on 2 December followed by a Pre-Conference Drinks Reception at the Natural History Museum. The main conference will open on 3 December with a keynote speech from Professor Douglas Kell, Chief Executive of the Biotechnology & Biological Sciences Research Council (BBSRC). Other key speakers will include: Professor Ed Seidel, National Science Foundation; Cliff Lynch, Coalition for Networked Information; Timo Hannay, Nature Publishing Group. The first day of the conference will incorporate an interactive afternoon for posters and demos, followed by a Symposium entitled “Citizen Science: Data Challenges” led by Richard Cable, BBC Lab UK.

The second day will be made up of peer-reviewed papers in themed sessions covering Disciplinary and Institutional Challenges, Practitioner Experience, Metadata, Software Preservation & Managing Risk.

Places are limited so please register now.

Registration to close on 20 November 2009

The Demise of Geocities – But a Renewed Interest in Web Site Archeology

An article published today on the Guardian Technology Web site entitled “Geocities: dead but not lost” describes how Geocities, which was founded in 1994 and was at one stage the third most-browsed site on the web, is now dead.

We discussed Yahoo’s announcement that the Geocities service was to be shut down some time ago in a post entitled ““Seething With Anger” at the Demise of Geocities“. What I find interesting in the article is the information that “… there’s the real effort, by the Archive Team, who have been trying to archive as many Geocities pages and sites as they could“.

I’d not come across the Archive Team wiki before. They describe themselves as a “project composed of volunteers, currently coordinated by Jason Scott” which invites:

  • Writers, who can create clear essays and instructions for archivists and concerned parties.
  • People with Lots of Hosted Disk Space who have a proper hosted webserver and fat pipe, who are willing (when asked) to consider hosting mirrored dead sites or archives.
  • People who love setting up torrents who can do the same as the mirror folks, but do so hosting torrents.
  • OCD-rich individuals who want to download things who will respond to our alerts and call outs and download entire sites or diagnose ways to get at obfuscated data.

The wiki home page informs us that “This website is intended to be an offloading point and information depot for a number of archiving projects, all related to saving websites or data that is in danger of being lost. Besides serving as a hub for team-based pulling down and mirroring of data, this site will provide advice on managing your own data and rescuing it from the brink of destruction.”

Hmm. I wonder how effective a volunteer organisation is likely to be? My initial thoughts were fairly sceptical, but other volunteer-led initiatives, such as Wikipedia, do seem to be successful. What are your thoughts?

The digital media collection +100 years

As part of the JISC ITT Workshops & Seminars: Achievements & Challenges in Digitisation & e-Content strand, JISC Digital Media have hosted two free seminars focussing on key topics for individuals involved with digital media. Today I attended the second of these, entitled The digital media collection +100 years.

Obsolescence, deterioration of physical storage media or withdrawal of institutional support: just what will prove to be the greatest threat to the materials we digitise today? This seminar projects one hundred years into the future and attempts to predict the future ‘preservability’ of what we digitise today. This seminar will examine changing user demands and inevitable developments in technology.

Panel Session

After a brief opening from Dave Kilbey of JISC Digital Media the scene setting introduction was given by Dr William Kilbride, Executive Director of the Digital Preservation Coalition.

The Preservation Landscape

As well as the more conventional look at the key issues (the volumes of data available, the complexities and complicated requirements of this data teamed with rising public expectations) William gave a really interesting talk on the path of literacy. He demonstrated through the Stroop interference test how, once we can read and write, we tend to process written information more quickly than image information. The result is that literate cultures tend to be hegemonic through discursive power. His point was that the consequences of our work are not inevitable or neutral: digitisation is a social practice that can be used for good and for ill. After this slight aside William ran us through some of the main challenges, which include obsolescence of technologies, correct configuration of hardware, software and operators, and the need for a constantly managed service. He ended with a few ‘answers’ from a survey of recent JISC digitisation projects. When asked how long their resources were to be available, answers varied from “perpetuity” to “forever or three years”. He concluded that digital preservation is possible but our legacy will be what we make of it and cannot be taken for granted.

The Camera Raw format and preservation

Nigel Goldsmith, a photographer working for JISC Digital Media, gave a quick run through of the possibilities of using Raw camera format. Raw offers the photographer greater control over the processing of their images; however, this flexibility comes at a price. Raw is a proprietary format which requires specialist applications to view. Nigel’s suggestion was to archive Raw but to keep it alongside another format, possibly TIFF or JPEG 2000.

Preservation Metadata Initiatives and Standards

After coffee Getaneh Alemu from the Humanities Computing Department, the University of Portsmouth gave us a whirlwind tour of state-of-the-art metadata standards and how metadata can help ensure the integrity, identity and authenticity of digital documents. His overview included a look at OAIS, NLA PANDORA, CEDARS, NEDLIB, LMER, PREMIS, and METS metadata initiatives and standards. He concluded that at the moment preservation metadata formats tend to have element naming issues that descriptive metadata initiatives don’t tend to have.
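To make the integrity point concrete, here is a minimal sketch of a fixity check: a checksum is recorded as metadata when a document is archived and recomputed later to detect change. The field names only loosely echo the fixity terms used in PREMIS, the identifier and sample content are hypothetical, and the result is not a schema-valid record under any of the standards mentioned.

```python
import hashlib
from datetime import datetime, timezone


def fixity_record(data: bytes, identifier: str) -> dict:
    """Build a small metadata record holding a SHA-256 checksum of the data."""
    return {
        "objectIdentifier": identifier,            # hypothetical identifier scheme
        "messageDigestAlgorithm": "SHA-256",
        "messageDigest": hashlib.sha256(data).hexdigest(),
        "size": len(data),
        "recorded": datetime.now(timezone.utc).isoformat(),
    }


def verify(data: bytes, record: dict) -> bool:
    """Recompute the checksum and compare it with the stored value."""
    return hashlib.sha256(data).hexdigest() == record["messageDigest"]


original = b"Minutes of the web steering group"   # stand-in for a stored document
record = fixity_record(original, "example:doc-001")

print(verify(original, record))                   # True: content unchanged
print(verify(original + b" (edited)", record))    # False: content has been altered
```

A checksum only demonstrates that the bytes are unchanged; identity and authenticity depend on the other metadata (who created the object, when, and what has been done to it since) being recorded and trusted as well.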

The challenges of archiving computer games and other multipart digital interactives

After lunch Tom Woolley from the National Media Museum talked about some of the digital media preservation issues they are tackling on-site at the museum. The museum is involved in a number of initiatives that aim to let visitors ‘have a go’ at old games and old internet environments. The tricky dilemma is giving users a taster of old games in a cost-effective way: actually using original kit (like ZX Spectrums) would have a heavy cost attached. The key is often emulation. The museum also tries to capture the context of games by capturing fan information, discussion forums, FAQs etc. Tom was followed by James Newman from Bath Spa University who works with Tom on the National Video Game Archive.

James talked about one of the biggest challenges of video game archiving: supersession. Within the gaming world there is a tendency to be always looking for the ‘next big game’ which has resulted in an environment where games creators don’t value old games. Although there is a niche market for retro games, gaming is an area where the experience is almost completely associated with the technology, making archiving very difficult.

The importance of collaboration

Simon Tanner, director of King’s Digital Consultancy Services, focused on institutional preservation and the importance of collaboration in sustainability. He started off by saying that one of the biggest challenges is that we may run out of the minerals to make microchips. He later played on the climate issue again by saying that he currently saw digital preservation as sitting in the same space as climate change: people viewed it as potentially a terrible thing (the loss of digital objects) but currently it does not impact on individuals, so it remains low on the priority list. Simon pointed out that sustainability of resources was becoming a mandate but remains an unfunded mandate. The way to deal with this was through the ecology of collaboration – within your institution and outside.

A Poisoned Chalice? Accepting Responsibility for Sustainable Access

The day concluded with a talk from Neil Grindley, JISC Programme Manager for Digital Preservation. Neil pointed out that ensuring that an organisation’s digital assets are safe, secure and accessible for the long term should (in theory) be an interesting, responsible and useful role for anyone in an organisation to accept. The critical importance of digital assets, the ubiquity of digital methods and the need for people in all walks of life to have effective means to refer to persistent sources of data reinforce this notion. How is it then that long-term asset management, information lifecycle management, data curation, digital preservation (call it what you will) is often regarded as a peripheral specialist activity that is difficult to resource, complex to carry out, and delivers benefits that are, at best, simply an insurance policy rather than an activity that adds value to an organisation? Neil’s presentation examined the importance of defining clear roles for those involved with digital preservation and considered the importance of associating this professional activity with strategic and tactical frameworks. He advocated the need for allocation of responsibility and internal preservation policies. JISC has spent 6 million in the digital preservation arena between 2005 and 2009, yet there is still work to be done. He concluded by pointing out the need for human judgement when deciding what to keep and predicted that in the future digital preservation will be integrated with administration departments, have better tools and will take more terms from the cultural heritage area.

After Neil’s talk there was a panel session and time for questions; unfortunately I had to leave to make the difficult drive home through rush hour traffic!

The day was an interesting one. Although the talks were a real mixed bag, they all offered constructive steps forward to make today’s digital media collection something that we may be able to access and use 100 years on.

Why you can sometimes leave it to the University

“Does anyone have any positive experiences to share?”, asks Brian in a recent post. Well, I have – except it’s not in the UK. Harvard University Library in the USA have recently put Harvard WAX (the Web Archive Collection Service) live, after a pilot project which began in July 2006.

Harvard WAX includes themed collections on Women’s Voices and Constitutional Revision in Japan, but of particular interest to us in PoWR is their A-Sites collection: the semi-annual captures of selected Harvard websites. “The Harvard University Archives is charged with collecting and preserving the historical records of the University,” state the curators, recognising their formal archival function in this regard. “Much of the information collected for centuries in paper form now resides on University web sites.”

Helen Hockx-Yu of the British Library met with the WAX team in May 2009. “I was impressed with many of the features of the system,” she said, “not just the user and web curator interfaces but also some of the architectural decisions. WAX is a service offered by the Library to all Harvard departments and colleges. In exchange for a fee, the Departments use the system to build their collections. The academics may not be involved with the actual crawling of websites, but spend time QAing and curating the websites, and can to some extent decide how the archive targets appear in the Access Tool. The QAed sites are submitted directly into Harvard’s institutional repository.”

It is very encouraging to read of this participatory dimension to the project, indicating how success depends on the active involvement of the creators of the resources. Already 48 Harvard websites have been put into the collection, representing Departments, Committees, Schools, Libraries, Museums, and educational programmes.

The delivery of the resources has many good features also; there’s an unobtrusive header element which lets the user know they’re looking at an archived instance (instead of the live website). There’s a link explaining why the site was added to the collection, and contextual information about the wider collection. Another useful link allows researchers, scholars and other users to cite the resource; it’s good to see this automated feature integrated directly within the site. The Terms of Use page addresses a lot of current concerns about republishing web resources, and strikes just the right balance between protecting the interests of Harvard and providing a service to its users. Like a good OAIS-compliant repository, they are perfectly clear about who their designated user community are.

Best of all, they provide a working full-text search engine for the entire collection, something that many other web archive collections have been struggling to achieve.

The collection is tightly scoped, and takes account of ongoing developments for born-digital materials: “Collection managers, working in the online environment, must continue to acquire the content that they have always collected physically. With blogs supplanting diaries, e-mail supplanting traditional correspondence, and HTML materials supplanting many forms of print collateral, collection managers have grown increasingly concerned about potential gaps in the documentation of our cultural heritage.” The project has clear ownership (it is supported by the University Library’s central infrastructure), and it built its way up from a pilot project in less than three years. Their success was partly due to having a clear brief from the outset, and partly due to collaboration with three University partners. What Harvard have done chimes in with many of the recommendations and suggestions made in the PoWR Handbook, particularly Chapters 5 (Selection), 16 (Responsibility for preservation of web resources) and 19 (How can you effect change?).

There are many aspects of this project which UK Institutions could observe, and perhaps learn something from. It shows that it is both possible and practical to embed website collection and preservation within an Institution.

Survey: How successful has Records Management been?

As part of his dissertation at Aberystwyth University, Andrew Brown is undertaking a research project which aims to determine how successful Records Management has been in the UK by asking Records Managers for their perceptions of Records Management in their organisation and the profession as a whole. He is attempting to quantify this ‘success’ and would be very grateful if records managers could take the time to complete the survey, which will take approximately 10-15 minutes.

It is hoped that this study will generate some stimulating debate on this matter and lead to a greater understanding of the current and future state of the Records Management profession in the UK where digital and Web preservation may be key.

Please access the survey at the following link.

The survey closes at midnight on 5th September.

iPres 2009 Programme

The programme for the sixth International Conference on Preservation of Digital Objects (iPres 2009) has recently been released and registration is now open.

This year’s event will be hosted by California Digital Library (CDL) at Mission Bay Conference Center in San Francisco on October 5th and 6th, 2009.

UK presentations include Maureen Pennock on ArchivePress, David Giaretta on significant properties in OAIS and Adam Farquhar on (Planets) metadata.