A presentation on JISC PoWR entitled Preservation for the Next Generation was given yesterday at the Internet Librarian International Conference 2008 held at the Novotel London West.
The slides of the talk are now available from Slideshare and embedded below.
The presentation was well received and sparked a lot of interest, particularly from delegates from US libraries. Donald Grose, Dean of Libraries from the University of North Texas, informed me that they have been preserving many of the US government Web sites for a number of years. See this related press release. Looking into Web resource preservation activity in other sectors and possibly other countries is definitely an area of interest for the future. Hopefully the JISC PoWR project will be able to talk more to Donald about his work in the future.
The web-archiving strand at the iPRES Conference was an opportunity for a number of National Libraries to describe their initiatives, their approaches to web capture and preservation, and the things that were (for them) the biggest hurdles. It’s interesting to me that librarians - as opposed to archivists - seem to have been first off the mark with web-archiving, although in the UK The National Archives have at least two parallel initiatives underway (UKWAC and European Archive) to capture government websites deemed to be records.
From the PoWR point of view, one thing I found useful was the summary statement from Iceland which suggests that the community is now starting to agree on three main approaches to selection - bulk/domain harvesting, thematic, and event-based. Sometimes I wonder if these approaches are too library-influenced (looking at websites primarily as publications, rather than as records, whereas in PoWR we are suggesting that HFE websites contain and function as a mixture of both), but it helped me sharpen up my thinking and I fed this idea back into the PoWR Handbook’s chapter on Selection.
Since they were all representing libraries, most of the presenters tended to focus on legal deposit (and attendant permission problems) as the biggest hurdle to gathering websites - and the one that eats up the most in terms of resources. Denmark have not resolved this completely, and although they have gathered lots of material from the web, they don’t make it widely available - only allowing controlled access for research or statistical purposes. France, on the other hand, have had legal deposit laws in place since 2006. Australia have found this such a problem (despite the strenuous efforts they have made to get the law changed) that it pretty much forced the decision to go the curatorial route. All permissions are negotiated, but the collections are shaped to a certain extent by community input.
With my archivist hat on, I sat up when we were told quite categorically that registration and cataloguing of web resources would not work, and that no-one should ever even attempt it. The BL admitted they were ‘not really collecting metadata’. If this is true, I immediately started to wonder, why do both PANDAS and Web Curator Tool (which I have used) have conventional Dublin Core metadata elements built into their workflow? Don’t we anticipate cataloguing archived web resources in some way? I almost asked a question about this, until I reflected on the use of NutchWax and full-text indexing (which is probably acceptable until we can come up with some form of semantic tagging or automated metadata extraction for web resources).
Then I found myself reaching for the mic to ask a question about what I call ‘endangered resources’. We have already blogged about this on dablog, when my colleague Joanne Anthony raised a question about a web resource owned by a smallish institution which suddenly found itself with its funding removed. Was there anything such an institution could do, I asked the panel, to preserve its website? And what were the pro-active steps being taken by these National Libraries to rescue or identify resources at risk? I know that UKWAC, for example, offers a public submission service on its website, although it is not very prominent or visible, nor is it quite clear what happens to requests for archiving once the form has been filled in. I received some interesting replies, including the amusing anecdote from France which suggests that their archival collections have been accessed by red-faced politicians who have accidentally deleted their own blogs. However, I still wasn’t quite sure what national initiatives exist to address what I perceive as a significant gap in the preservation of unrecognised (and therefore uncaptured) resources.
Brian and Marieke have already written about iPres2008 and PoWR, and I have written and will write more about it from a general perspective on DABlog. But we thought it would be worth saying a bit more about what this conference, which is looking at the complete picture of digital preservation, had to say which is of relevance to PoWR’s work of web preservation in UK Universities.
There was an entire session devoted to various web archiving initiatives on the second day, which at first sight one might think is of particular relevance (almost as much as Brian’s presentation, one might think). I wasn’t at this session - it was one of those running in parallel tracks, and I was speaking in the other track - but Ed Pinsent was and will be writing at more length about it soon. But even without attending, I’m aware that many of the projects, operating as they do within their national domains in Australia or elsewhere, won’t have much role in helping save UK University web content (unless we move our domains to .edu.au - there’s a thought). Even when the BL realises its long-term aim of harvesting across the entire UK web domain, it still will be selective in some ways about what it captures - about depth and frequency of harvests, and about the type of content. You won’t be able to depend on those institutions to capture what you want to be captured. So if these initiatives aren’t going to meet all our needs, do we need to do it ourselves? The PoWR project thinks not, but that is one of the options institutions will need to examine. The work the IIPC is doing to develop harvesting and access tools will be of interest to those few institutions that feel able to operate these tools themselves - not something to be undertaken lightly.
Yet there was much of relevance at iPres2008. One recurring theme, picked up at the outset by Lynne Brindley and in Steve Knight’s closing remarks, was that ‘digital preservation’ is not the term to be using in discussions with our institutions and the world, echoing remarks on the DCC blog which Brian later picked up on here. Steve prefers the phrase ‘permanent access’, which is indeed outcome-focussed. However, we’ve also said in PoWR that preservation isn’t always forever, so I would prefer something a little more all-embracing - ‘long-lived access’ might fit.
The sessions covering things like significant properties also touched on issues that PoWR is concerned with. When we decide to preserve something, what is it that we’re really trying to keep? Most forms of preservation change the original object in some way, just as long-life milk isn’t the same as pasteurised, and neither is quite as tasty as fresh milk (or so I’ve been told). This is clearly still a very difficult problem, and one that (to my mind) demonstrates that the digital preservation community hasn’t even developed a clear problem statement, much less a fully worked-out solution. So, in the meantime, we need to be pragmatic and do what seems best at the time. Always a good plan.
As my colleague Marieke Guy commented recently, I presented a paper on “Preservation of Web Resources: The JISC PoWR Project” at the iPRES 2008 conference on Monday 29 September 2008, which described the work of the JISC PoWR project. The iPRES 2008 conference, incidentally, was featured in an article “In praise of … preserving digital memories” published on The Guardian’s editorial page yesterday (1 October 2008). The article stated that “If all goes well, we will have the capacity to preserve as many of our memories, personal and national, as we want”. So it was very pleasing to present the work of the JISC PoWR project, which explored ways in which memories held on Web sites can be selected and preserved.
The slides of the talk (in which I focus primarily on preservation within a Web 2.0 environment) are now available and are embedded below.
There is also a video recording of the talk available (although I haven’t yet been able to upload the video to Google Video to allow it to be embedded in other Web pages, I’m afraid).
I should also add that Chris Rusbridge provided a comprehensive report on the conference. I was pleased to read Chris’s comments on my talk which he described as “a very entertaining talk, and well worth looking up”. He went on to describe me as “not a preservationist, but is a full-blown technogeek discussing the roles of the latest Web 2.0 technologies on his blog, in his role as UK Web Focus”. And this technogeek was particularly pleased to read that the JISC PoWR “project achieved a strong level of interaction through its several workshops”.
Brian Kelly will be presenting a paper on “Preservation of Web Resources: The JISC PoWR Project” authored by the JISC PoWR team at the fifth International Conference on Preservation of Digital Objects (iPres 2008) this coming Monday (29th September 2008). The conference will be held at the British Library from 29 - 30th September 2008 and brings together researchers and practitioners from around the world to explore the latest trends, innovations, thinking, and practice in digital preservation.
The slides and accompanying paper are available from the UKOLN Web site.
There are a number of upcoming events that may be of relevance to those interested in Web resource preservation:
European Archive, 16-17th of October 2008, Paris
Two-day training session on Web Archiving. The training will cover all aspects of Web Archiving for librarians, archivists as well as technicians in charge of Web archiving. Special attention will be given to providing the necessary background on Internet technologies in general and Web publishing in particular to understand the media and requirements for its preservation.
For further details have a look at the European Archive Web site
Records Management in the digital age, 24th September 2008, London
The theme of UNICOM’s networking and discussion evening is the challenge of information management and records management in the digital age and in the light of collaborative technologies, social networking and Web 2.0. It features three short presentations followed by interactive discussion and a drinks and networking session. The evening will be held at the Regency Hotel, London from 5pm - 7:30pm.
For further details have a look at the UNICOM Web site.
International Digital Curation Conference, 1st-3rd December 2008, Edinburgh
The Digital Curation Centre is holding its 4th International Digital Curation Conference, which will comprise a mix of peer-reviewed papers, invited presentations and international keynote speakers.
For further details see the DCC Web site.
The third and final JISC-PoWR workshop (Embedding Web Preservation Strategies Within Your Institution) took place last Friday (12th September 2008) at the Flexible Learning Space, Centre for Excellence in Enquiry-Based Learning (CEEBL), University of Manchester. Twenty delegates were able to comment on an early draft of the JISC-PoWR handbook and provide feedback on the approaches suggested.
The main presentations are now available for download:
During the last JISCPoWR workshop yesterday in Manchester (of which more anon) I made brief mention of a tool from Adobe which allows web pages, or entire sites, to be captured to a PDF file. I mentioned this primarily to illustrate one of the three points at which web capture can take place (behind the server; from the HTTP transaction; or browser-side) but it generated considerable interest, and I promised to blog about the product since I could not remember what it was called.
It turns out that it’s not a separate product, nor a plug-in, but a built-in part of Adobe Acrobat. It was first available as a free add-on for Acrobat 4 in 1998 or 1999, and I think it was then that I first saw this demonstrated at the PRO (as it then was) - hence my misunderstanding. Tools like this have their place, but (like all web preservation technologies) they also have their drawbacks. PDF’s print-oriented format isn’t a good match for some sites, much as some sites don’t look good when you try to print them. (In fact, I believe that Acrobat Web Capture effectively uses the browser’s print engine combined with the PDF writer pseudo-printer to do its work, so there will be a close correlation.) But we’ll be covering this tool, along with others, in the handbook.
An ‘at the event’ report on the first JISC PoWR workshop held at Senate House Library, London on Friday 27th June 2008 has been published in the recent Ariadne Web Magazine (issue 56, July 2008). The piece, written by Stephen Emmott, concluded:
The challenges are significant, especially in terms of how to preserve Web resources. No doubt the institutional repository will play a role. Arguably, the absence of a solution to the preservation of Web resources leads to either retention or deletion, both of which carry risks. The workshop’s core message to practitioners was therefore to start building an internal network amongst relevant practitioners as advice and guidance emerge.
My thinking about this matter was certainly stimulated and I look forward to the next two workshops, and the handbook that will result. Web preservation is an issue which was always important but now grows increasingly urgent.
It is hoped that a trip report on the third workshop (for which bookings are currently still open) will be published in a future Ariadne.
The draft programme for the third JISC-PoWR workshop (Friday 12th September 2008, University of Manchester) is now available:
Presentation. 1. Introduction to JISC-PoWR (Kevin Ashley, ULCC)
Presentation. 2. Records Management vs. Web Management (Marieke Guy, UKOLN)
Breakout Session: Web Preservation in your organisation
Presentation. 3. Web Preservation and Web 2.0 (Brian Kelly, UKOLN)
Presentation. 4. Legal issues (Jordan Hatcher, Opencontentlawyer)
LUNCH
Presentation. 5. The JISC-PoWR Workshops - Inputs and Outcomes (Marieke Guy, UKOLN)
Presentation. 6. The JISC-PoWR Handbook - Explaining Web Preservation (Kevin Ashley, ULCC)
Presentation. 7. The JISC-PoWR Handbook - Identifying Web Issues (Richard Davis)
COFFEE
Breakout Session: The next steps for Web Preservation in your organisation
Presentation. 8. The JISC-PoWR Handbook - Recommended Approaches (Ed Pinsent, ULCC)
Presentation. 9. Future possibilities
Final Thoughts
More information is available on the Workshop 3 page.
Places are still available. You can register using the Online Registration Form (note that this link takes you out of the JISC-PoWR blog to a Google Doc form).