I am a research officer in the Community and Outreach Team at UKOLN. Much of my work involves exploring Web 2.0 technologies and their relevance to the communities we work with.
DigitalPreservationEurope (DPE) fosters collaboration and synergies between many existing national digital preservation initiatives across the European Research Area. As part of their work they publish concise overviews of key digital preservation and curation issues. Earlier this month they published a briefing paper on Considerations for the Preservation of Blogs (PDF).
The preamble sets the context for the paper:
Blogs, it seems, are everywhere these days, but what about the next day (and the next and the next …). Opinions vary on whether or not blogs merit preservation beyond the actions of a blog’s respective authors. This briefing paper does not contribute to that dialogue. Rather, it provides an overview of issues to be considered by organizations planning blog preservation programs. Blogs are the product of a network of players, including blog authors, service providers, and readers. Discussed here are some key attributes of blogs, and the characteristics and behaviors of these players, which may impact preservation activities.
During the JISC PoWR project we recognised that, although blogs were initially characterised as ephemeral (as the DPE paper notes), their importance in both the research context and our cultural history is becoming apparent. Like other Web resources, their preservation is a matter that needs to be addressed, somehow.
The PoWR blog has a number of interesting posts on the preservation of blogs including:
We’ve written about Twitter on the JISC PoWR site before, mainly when considering preservation of Web 2.0 material. Now Twitter could become a useful tool in helping you communicate about Web resource preservation.
She points out that “Some of our conversations are cross-media; they may start on Twitter, but they move to Facebook and then the blog. Capturing only one of those accounts means that only part of our conversation is captured. Okay, so you’re probably not interested in capturing our interactions in your archives. But you probably are interested in capturing interactions from important people (back to Stephen Fry and Obama again) and you will thus face the same issues.”
She then says “We all know the problems we’ve got in capturing and archiving emails. What of Twitter? How do you get Tweets out of the system and integrate them into a collection? What of Facebook data? And YouTube?”
It seems the Twitter challenge is becoming more real as it becomes increasingly mainstream.
We are lucky enough to start 2009 with a guest blog post from Dr James Currall, Director of Information Strategy, IT Services & HATII Senior Research Fellow, University of Glasgow.
James has been involved with the highly successful Glasgow MPhil (now MSc) course in Information Management and Preservation since its inception, in which he teaches about the transition from storage of information on physical to digital media, information security, the role of numbers as information and a variety of other topics including risk and information management as an investment. In this latter context he was the Project Director of the espida project which developed a sustainable business-focussed model for digital preservation. He gave a plenary talk on Web preservation at last year’s Institutional Web Management Workshop (IWMW 2008) entitled The Tangled Web is but a Fleeting Dream … but then again … which was very well received and is available to watch on Google Video.
And I’ll pass you over to James…
A few weeks back, I was involved in a discussion about the skills required by people involved in Digital Curation and much of that discussion was based around the DigCCurr Project which has a long list of skills, some of which are specific to Digital Curation, but many of which are of a more general nature. And this set me on a dangerous course – thinking … What exactly is this ‘profession’ of digital curator that DigCCurr amongst others are trying to define?
Let us rewind to say the second half of the 16th century and let us suppose that you were charged by Mr Shakespeare’s publishers with curating ‘The Scottish Play’. What would you have done? What exactly is this ‘information object’? Is it the fonts, the layout, the pagination, the language, the story, the stage directions or what? In spite of the absence of the profession of ‘paper curator’ we have inherited a rich heritage. Along the way, many items will have been lost – it was always thus and, in spite of the optimistic techno-determinism of some, it always will be EVEN IN THE DIGITAL AGE. I would argue that this is all good and necessary and whilst I would mourn the passing of Algol, Reverse Polish Notation, amplifiers based on thermionic valves or chunky discrete solid state components, vinyl records, reel to reel tape and other really splendid ideas that were IMHO much better than the ‘mass market equivalents’ that replaced them, we have to discard much of our baggage as we move on.
So what is this preservation activity all about? Is it not about the preservation and curation of information not of digits? During a session with my MSc students, we visited the Wayback Machine and had a look at the University of Glasgow Web site (you wondered when I would get on to the web didn’t you?). The page that we selected at random was from 18th October 2000. As a web page it is rather uninteresting, when I looked at it today there was no style sheet, the graphics were all missing and it was generally rather uninspiring, but …. what is interesting is the headline news story ‘Funeral of the First Minister, Donald Dewar’. For those of you furth of Scotland, Donald was a leading light in the establishment of devolution for Scotland and the first First Minister of the devolved administration in Scotland. He was a graduate of the University of Glasgow and his premature passing at the age of 63 was tragic. The news story is about ‘administrative’ details of his funeral and the passage of his cortege past the University – details of importance in relation to the history of the University and perhaps of Scotland. It is the information contained in the web pages that is of interest and importance, whilst the layout of the pages and such ‘technical’ details are of passing interest as the ‘container’ for that information.
So with 2008 now ended let us bury the idea that the digital needs its own ghetto, that we need to prepend everything with ‘digital’, be it: curation, preservation, art, culture, revolution, etc. Digital artifacts are the currently ‘fashionable’ containers for information and whilst the term continues, the technologies underneath are radically different at every turn and often require as much conversion one to another as a paper to magnetic disc conversion. It is not the containers that are important but what they contain. The Eastern concept of ‘Pointing at the Moon’ has something to say here.
If we come to regard preservation/curation as a finger pointing to the moon; we might come to mistake the finger for the moon and never see beyond it to the moon itself.
This short clip of Bruce Lee in ‘Enter the Dragon‘ (1974) captures something of this in a different context.
I am also reminded of the auditors in Terry Pratchett’s ‘Thief of Time’ who take a great painting and break it down into flakes of paint which they put in little piles of each colour and then spend time looking to see where the art has gone! These auditors are described in the Wikipedia article for Discworld thus:
“The Auditors, cosmic bureaucrats who prefer a universe where electrons spin, rocks float in space and imagination is dead, represent the perils of handing yourself over to a completely materialist and deterministic vision of reality, devoid of the myths and stories that make us human.“
In 2009 we need to see digital preservation and curation as ‘last year’s model’, of course we need to understand the importance of custody, metadata and identifiers, but above all we need to understand the centrality of the information in the artifacts that we are seeking to curate and preserve. This piece is recognisably ‘Currall’ not because of a digital signature, not because it is on his web site and not because the owners of the JISC PoWR say it is – it is ‘Currall’ because of its recognisably iconoclast position, poor grammar and tortured logic – that is what needs to be preserved!
Information is the thing (even if that is hard and technology is relatively easy) – lose sight of that and the game is a bogey.
PS if you are interested in a rather more rigorous treatment of this topic you might like to access “Authenticity: a red herring?“ (doi:10.1016/j.jal.2008.09.004)
The BBC News Web site has published an interesting article entitled Future in Bits, asking how the ever-changing Web can be archived, bearing in mind the dilemma posed by the malleable nature of digital information.
The article draws attention to the fact that no UK-based commercial online newspapers are currently being archived.
David Stuart, a research fellow in Web 2.0 Technologies at the University of Wolverhampton, is quoted as saying:
“The lack of an exhaustive archive of the UK web space not only risks the loss of information on web pages that are changed or taken down,” he said. “It also undermines the value of pages that link to them; the value of the web comes as much from the hyperlinks between pages as the contents of the web pages. This is especially true in the blogosphere, where so much of the content created by the public is built upon the foundations of traditional news stories“
Jessie Owen, digital continuity project manager at the National Archives explains that the key to archiving is preparation.
This text has been mentioned at PoWR workshops, on the PoWR blog and on the JISC Information Environment Team blog. I can honestly say that it has had quite an impact on my thinking with regard to preservation and Web 2.0 resources; other members of the PoWR team may agree.
As I say in the conclusion:
This book offers up much food for thought. Bailey wants to wake up and shake his community. He wants to make them see that all is not well in the records management world and that if they don’t start moving with the times then they will be pushed out of the way. He contends there is a very real possibility that records management as we know it will cease to exist; it will be outsourced.
JISC have announced the publication of a study on Digital Preservation Policies which can be downloaded in PDF format from the JISC Web site.
This study aims to provide an outline model for digital preservation policies and to analyse the role that digital preservation can play in supporting and delivering key strategies for Higher and Further Education Institutions. Although focussing on the UK Higher and Further Education sectors, the study draws widely on policy and implementations from other sectors and countries and will be of interest to those wishing to develop policy and justify investment in digital preservation within a wide range of institutions.
The study concludes “that for institutions digital preservation must be seen as “a means to an end” rather than an end in itself: any digital preservation policy must be framed in terms of the key business drivers and strategies of the institution.”
Two tools have been created in the study:
1) a model/framework for digital preservation policy and implementation clauses based on examination of existing digital preservation policies;
2) a series of mappings of digital preservation to other key institutional strategies in UK universities and colleges including Research, Teaching and Learning, Information, Libraries, and Records Management.
These tools are definitely worth taking a look at if you are embarking on a Web preservation strategy.
There was an interesting editorial by Siobhan Butterworth in Monday’s Guardian about ‘unpublishing’ – removal of content once placed on the Internet.
Siobhan explains:
Judging from the numbers of emails I get from people asking for material to be removed from the Guardian’s electronic archive, it seems that some people still don’t fully understand the implications of speaking to or even writing for a news organisation in the web age.
She goes on to argue that:
The web makes a lie of the old cliché that today’s newspaper pages are tomorrow’s fish and chip wrapping. Nowadays, as I’ve said before, the things you say about yourself in a newspaper are more like tattoos – they can be extremely difficult to get rid of.
It seems a good rule to set yourself when publishing content (or allowing content to be published about you) on the Web (and the same rule could apply to all emails sent) is: Are you happy for the whole world to see this?
The concepts that what you publish can be seen by all and that nothing truly disappears from the Web have slowly begun to embed themselves in our consciousness. This has been fuelled by a number of horror stories about employers accessing the Facebook (and Flickr and other social networking sites…) accounts of prospective employees. A New York Magazine article from February this year quoted a teenager as saying “If I don’t delete it, I’m still gonna be there. My generation is going to have all this history; we can document anything so easily.” Many people do realise that the off-hand comments and inappropriate photos we blog or publish can come back to haunt us.
While in some ways this might seem to be the flip side of what JISC PoWR is about, deletion is very much part of a preservation strategy.
So it pays to remember that:
stuff can disappear, and quite often it is the really good stuff we wish we’d held on to.
but
stuff that we wish would come out in the wash can stain for good.
So maybe we need to give some thought to how (and whether) things should be ‘unpublished’. What do people think?
A presentation on JISC PoWR entitled Preservation for the Next Generation was given yesterday at the Internet Librarian International Conference 2008 held at the Novotel London West.
The slides of the talk are now available from Slideshare and embedded below.
The presentation was well received and sparked a lot of interest particularly from delegates from US libraries. Donald Grose, Dean of Libraries from the University of North Texas, informed me that they have been preserving many of the US government Web sites for a number of years. See this related press release. Looking into Web resource preservation activity in other sectors and possibly other countries is definitely an area of interest for the future. Hopefully the JISC PoWR project will be able to talk more to Donald about his work in the future.
Brian Kelly will be presenting a paper on “Preservation of Web Resources: The JISC PoWR Project” authored by the JISC PoWR team at the fifth International Conference on Preservation of Digital Objects (iPres 2008) this coming Monday (29th September 2008). The conference will be held at the British Library from 29 – 30th September 2008 and brings together researchers and practitioners from around the world to explore the latest trends, innovations, thinking, and practice in digital preservation.
The slides and accompanying paper are available from the UKOLN Web site.
There are a number of upcoming events that may be of relevance to those interested in Web resource preservation:
European Archive, 16-17th of October 2008, Paris
Two-day training session on Web Archiving. The training will cover all aspects of Web Archiving for librarians, archivists as well as technicians in charge of Web archiving. Special attention will be given to providing the necessary background on Internet technologies in general and Web publishing in particular to understand the media and requirements for its preservation.
For further details have a look at the European Archive Web site
Records Management in the digital age, 24th September 2008, London
The theme of UNICOM’s networking and discussion evening is the challenge of information management and records management in the digital age and in the light of collaborative technologies, social networking and Web 2.0. It features three short presentations followed by interactive discussion and a drinks and networking session. The evening will be held at the Regency Hotel, London from 5pm – 7:30pm.
For further details have a look at the UNICOM Web site.
International Digital Curation Conference, 1st – 3rd December 2008, Edinburgh
The Digital Curation Centre is holding its 4th International Digital Curation Conference, which will comprise a mix of peer-reviewed papers, invited presentations and international keynote speakers.
For further details see the DCC Web site.
The third and final JISC-PoWR workshop (Embedding Web Preservation Strategies Within Your Institution) took place on Friday 12th September 2008 at the Flexible Learning Space, Centre for Excellence in Enquiry-Based Learning (CEEBL), University of Manchester. Twenty delegates were able to comment on an early draft of the JISC-PoWR handbook and provide feedback on the approaches suggested.
The main presentations are now available for download:
An ‘at the event’ report on the first JISC PoWR workshop held at Senate House Library, London on Friday 27th June 2008 has been published in the recent Ariadne Web Magazine (issue 56, July 2008). The piece, written by Stephen Emmott, concluded:
The challenges are significant, especially in terms of how to preserve Web resources. No doubt the institutional repository will play a role. Arguably, the absence of a solution to the preservation of Web resources leads to either retention or deletion, both of which carry risks. The workshop’s core message to practitioners was therefore to start building an internal network amongst relevant practitioners as advice and guidance emerge.
My thinking about this matter was certainly stimulated and I look forward to the next two workshops, and the handbook that will result. Web preservation is an issue which was always important but now grows increasingly urgent.
It is hoped that a trip report on the third workshop (for which bookings are currently still open) will be published in a future Ariadne.
The draft programme for the third JISC-PoWR workshop (Friday 12th September 2008, University of Manchester) is now available:
Presentation 1: Introduction to JISC-PoWR (Kevin Ashley, ULCC)
Presentation 2: Records Management vs. Web Management (Marieke Guy, UKOLN)
Breakout Session: Web Preservation in your organisation
Presentation 3: Web Preservation and Web 2.0 (Brian Kelly, UKOLN)
Presentation 4: Legal issues (Jordan Hatcher, Opencontentlawyer)
LUNCH
Presentation 5: The JISC-PoWR Workshops – Inputs and Outcomes (Marieke Guy, UKOLN)
Presentation 6: The JISC-PoWR Handbook – Explaining Web Preservation (Kevin Ashley, ULCC)
Presentation 7: The JISC-PoWR Handbook – Identifying Web Issues (Richard Davis)
COFFEE
Breakout Session: The next steps for Web Preservation in your organisation
Presentation 8: The JISC-PoWR Handbook – Recommended Approaches (Ed Pinsent, ULCC)
Presentation 9: Future possibilities
Final Thoughts
More information is available on the Workshop 3 page.
Places are still available. You can register using the Online Registration Form (note that this link takes you out of the JISC-PoWR blog to a Google Doc form).
For those who still need to convince their senior management here are five reasons why you should embed Web preservation strategies within your institution:
1. You need to protect your institution
University Web sites contain evidence of institutional activity which is not recorded elsewhere and may be lost if the Web site is not archived or regular snapshots are not taken. If you do not record certain information you are in danger of failing to comply with legislation such as FOI and DPA; you may be breaching contractual and auditing obligations and putting your institution at risk. This risk management approach has been taken to countless other digital resources (for example email – Curation of emails); it is only a matter of time before it is a standard approach to Web sites.
2. Starting a Web preservation programme will make you look like a ‘forward thinking’ university
You could be one of the first to start an official ‘Web preservation’ programme which will be great marketing fodder. (Remember the first UK Universities to offer blogs to students (Warwick), launch a YouTube channel and offer downloadable lectures using iTunes (University College London)? How about the first to get sued by a student for changing the course specification and having no record of the previous entry? Universities have already been sued over Web site accessibility, copyright of material on their site and allowing plagiarism to take place.) Embedding Web preservation strategies will also help you think about the continuity of resources, dead links etc.
3. It could save you money
Web resources cost money to create and failing to repurpose and reuse them will waste money. Although Web preservation may have an initial cost, once the process has begun the savings can be great. Having a good strategy in place (which also should include selection and deletion where appropriate) will save both money and energy in the long run. Brian Kelly’s recent UK Web Focus Blog post on the environmental issues involved in digital preservation touches on this. As Owen Steven suggests in his comment it may make sense to link digital preservation to commercialism.
4. You have a responsibility to the people who use your resources
Students and staff may make serious choices based on Web site information and you have a responsibility to make sure a record is kept of this information.
5. You have a responsibility to the people who may need to use your resources in the future
Many of the resources your institution publishes are unique and deleting them may mean that invaluable scholarly, cultural and scientific resources (heritage records) will be unavailable to future generations.
These reasons should give your senior management food for thought. These drivers and others will be expanded on in the JISC-PoWR handbook.
At both of the JISC-PoWR workshops delegates have been keen for the project team to spell out the reasons why institutions might want to preserve Web resources. These ‘drivers’ then give fuel to their case for the funds needed to archive the institutional Web site.
The idea of ‘heritage records’ is one that is often mentioned. Using Web sites as a ‘cultural snap shot’ has the potential to be a highly useful activity.
In his interesting and functional text Managing the Crowd: Rethinking Records Management for the Web 2.0 World Steve Bailey puts forward the point that deciding what will be important in the future is a tricky business. As he explains in the section on appraisal, retention and destruction: “The passage of time inevitably changes the filter through which we view our world and assess its priorities.”
Steve gives the example of the current plethora of Web sites that offer what we might call ‘quack’ remedies for medical problems. These sites may not seem to be of great interest right now but they may be invaluable to future historians who wish to demonstrate the distrust of the medical profession exhibited in 21st century western culture.
James Currall, in his plenary talk at the recent Institutional Web Management Workshop, used the example of blog posts made by soldiers out in Iraq and Afghanistan to demonstrate the irony of modern technology; these highly informative records could easily be lost while the diaries of World War II soldiers remain accessible.
Preservation mistakes have been made aplenty in the past. The destruction of many of the BBC’s flagship programmes in the 1970s has been well documented, and in 2001 the BBC launched a treasure hunt campaign to locate recordings of pre-1980 television or radio programmes. Ironically the Web site is no longer being updated, though it is still hosted on the BBC server.
So who can know what the future will bring? Which Web resources will we wish we had kept? Which student blog writer will go on to be a future prime minister or an infamous criminal? What bit of the terabytes is the most important?
As Steve Bailey points out there is no crystal ball. It always has been, and always will be, very difficult to predict what resources may prove to be valuable to future generations.
Although this offers little recompense for those making these choices, it does at least argue the case that we do need to preserve and we need to do so soon.
Bookings are now open for the third JISC-PoWR workshop to be held at the Flexible Learning Space, University of Manchester on Friday 12th September 2008. The workshop entitled Embedding Web Preservation Strategies Within Your Institution is free to attend and open to Web, information and records managers working in HE/FE Institutions and related HE and FE agencies.
For information on how to reserve a place see the workshop 3 page.
Records Management has a concept of record declaration. This is the point when we “draw a metaphorical line in the sand and fix the content of a record” (see the JISC InfoKit on Records Management, which also uses the term ‘fixity’ in this context).
Most electronic records management systems (ERMS) provide users with the ability to perform this declaration automatically. When they do so, the digital content they have created (e-mail, document or whatever) becomes ‘fixed’. UK Government have called this creating ‘locked down and secure’ records, a necessary step for ensuring their authenticity and reliability.
But ERM systems seem to work best with static documents; authors of reports, for example, understand that a good time to declare their report as a record is when the final approved version has been accepted. Yet one of the distinctive features of Web 2.0 content is that the information is very fluid, and often there is no obvious point at which to draw this line and fix content.
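To make the idea of ‘fixing’ content a little more concrete, here is a minimal sketch in Python. It is purely illustrative and is not how any particular ERMS works: the function names `declare_record` and `is_unchanged` are my own, and I am assuming that a SHA-256 checksum taken at the moment of declaration is an acceptable stand-in for ‘fixity’.

```python
import hashlib
from datetime import datetime, timezone


def declare_record(content: str) -> dict:
    """Hypothetical helper: 'fix' a record by storing a checksum at declaration time."""
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
    return {
        "declared_at": datetime.now(timezone.utc).isoformat(),
        "sha256": digest,
    }


def is_unchanged(content: str, declaration: dict) -> bool:
    """Return True if the content still matches the checksum stored at declaration."""
    current = hashlib.sha256(content.encode("utf-8")).hexdigest()
    return current == declaration["sha256"]


post = "Original blog post text."
declaration = declare_record(post)

print(is_unchanged(post, declaration))               # True
print(is_unchanged(post + " Edited.", declaration))  # False
```

The second check shows the Web 2.0 problem in miniature: any later edit, however small, means the content no longer matches what was declared.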
One example might be blog posts. These can receive comments from the moment they are posted and well into the future. Not only this but many bloggers go back and edit previous posts and delete comments. This matter was recently discussed on Brian Kelly’s UK Web Focus blog. Phil Wilson asked:
“Brian, is there any reason you never modify or update your posts when you’ve made an error, and instead make users plough through the comments to see if anything you’ve said is wrong?” (UK Web Focus Blog)
Brian’s response was that he sometimes fixes typos and layout issues but is:
“reluctant to change the meaning of a published post, even (or perhaps especially) if I make mistakes. In part I don’t want to undermine the authority of any comments or the integrity of any threaded discussions.”
Brian is open about this in his blog policy stating that only in exceptional circumstances will postings and comments be deleted.
Assuming that blog posts are to be included within a records management programme or a preservation programme, the issues described above might cause problems for those attempting to preserve authentic and reliable Web resources.
One approach is to be explicit in your Web Resource Preservation strategy about when you freeze Web resources for preservation, and the implications of doing so.
Another approach might involve an agreed institutional policy such as Brian has, but with an additional form of wording that is explicit about the status of blog posts as records, including when and how they should be declared as records, and whose responsibility it is to do so. Should selected blog posts be declared as records by their owners into the ERMS? Or will they all be harvested by an automated capture programme, and if so, how frequently?
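As a rough illustration of what being explicit might look like, the sketch below writes such a policy down as data and answers the ‘how frequently?’ question for an automated harvest. Everything here is an assumption made for the example: the `BlogCapturePolicy` fields, the `is_capture_due` function and the seven-day interval are hypothetical and do not come from any existing policy or tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class BlogCapturePolicy:
    """Hypothetical policy statement for one blog."""
    blog_name: str
    responsible: str     # who declares or harvests the posts
    automated: bool      # True: harvested by a capture programme; False: declared by the owner
    interval_days: int   # how often an automated capture should run


def is_capture_due(policy: BlogCapturePolicy, last_captured: datetime, now: datetime) -> bool:
    """Return True when an automated capture is due under the policy."""
    if not policy.automated:
        # Manual declaration is the owner's decision, so nothing is ever 'due'.
        return False
    return now - last_captured >= timedelta(days=policy.interval_days)


policy = BlogCapturePolicy(
    blog_name="Example departmental blog",
    responsible="Web team",
    automated=True,
    interval_days=7,
)
last_captured = datetime(2008, 6, 1)

print(is_capture_due(policy, last_captured, datetime(2008, 6, 5)))  # False
print(is_capture_due(policy, last_captured, datetime(2008, 6, 9)))  # True
```

The point is less the code than the fact that writing the policy down forces answers to the questions above: who is responsible, whether capture is manual or automated, and how often it happens.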
The first JISC-PoWR workshop took place last Friday (27th June 2008) at Senate House Library and was attended by over 30 people from a wide range of professional groupings, including the Web management and Records Management communities. The day instigated much discussion and started people thinking about how they could make a start on Web resource preservation at their institution.
The main presentations are now available for download.
Presentation 2: Preservation of Web Resources Part I, Kevin Ashley, ULCC. Presentation: [Slideshare] – [pres2.ppt PowerPoint file] (audio – pres2.mp3)
Presentation 3: Challenges for Web Resource Preservation, Marieke Guy, UKOLN Presentation: [Slideshare] – [pres3.ppt] (audio – pres3.mp3)
Presentation 4: Bath University Case Study, Alison Wildish and Lizzie Richmond, University of Bath. Presentation: [Slideshare] – [pres4.ppt PowerPoint file] .
Presentation 6: Preservation of Web Resources Part II, Ed Pinsent, ULCC. Presentation: [Slideshare] – [ pres6.ppt PowerPoint file] (audio – pres6.mp3)
Presentation 7: ReStore: A sustainable web resources repository, Arshad Khan, National Centre for Research Methods. Presentation: [Slideshare] – [pres7.ppt PowerPoint file]. Audio: [pres7.mp3]
The presentations are also available from Slideshare. Audio files are available from the Internet Archive.
We are also using a Wetpaint Wiki to collate the feedback from the workshop breakout sessions. If you were there, please have a look and help us ensure that your suggestions are represented.
An ‘at the event’ report written on the workshop by Stephen Emmott has been published in the Ariadne Web Magazine.
The technological and cultural changes brought about by the advancement of the Web have, on numerous occasions, required co-ordinated interdisciplinary work. One of the intended aims of the JISC-PoWR project is to help to bring together the differing perspectives of information professionals such as records managers and Web managers in the context of the preservation of Web resources – and there are probably at least four sets of expertise involved: Web content creation (as perceived by Web authors), Web content management from a technical perspective (as perceived by those who choose or configure the underlying software), records and/or information management, and digital preservation. So there’s the bringing together of intellectual perspectives (What content needs to be preserved? How long for? Who is responsible?) and there are the technical perspectives, assuming that the above questions come up with anything that needs preserving (How do we do it? Are site-level tools more appropriate than national services? Does CMS X make preservation easier or harder than CMS Y? Is a more accessible site also a more preservable one? Are there configuration choices that affect preservation without (significantly) affecting other aspects of management?)
Within the JISC-PoWR team there have been a number of interesting discussions that have highlighted how differently the different players see Web preservation. To quote Ed Pinsent:
“The fundamental thing here is bringing together two sets of information professionals from differing backgrounds who, in many cases, don’t tend to speak to each other. Many records managers and archivists are, quite simply, afraid of IT and are content to let it remain a mystery. Conversely, it is quite possible to work in an IT career path in any organisation (not just HE/FE) and never be troubled by retention or preservation issues of any sort. “
The cliched view might regard Web managers as concerning themselves primarily with the day to day running of an organisation’s Web site, with preservation as an afterthought, and records managers as focussing mainly on the preservation of resources and failing to understand some of the technical challenges presented. And although this may be a superficial description of the complexities of the ways in which institutions go about the management of digital resources, perhaps like many cliches, there could be an element of truth in such views.
The JISC-PoWR project would like to publish a number of case studies highlighting best practice regarding Web resource preservation.
Has your institution recently deployed a Web resource preservation strategy or embarked on Web resource preservation work? Would you be willing to share your experiences and discuss solutions to problem areas by submitting a brief case study? If so then please contact Marieke Guy.