JISC-PoWR

Preservation of Web Resources: a JISC-sponsored project

Archive for August, 2009

iPres 2009 Programme

Posted by Marieke Guy on 24th August 2009

The programme for the sixth International Conference on Preservation of Digital Objects (iPres 2009) has recently been released and registration is now open.

This year’s event will be hosted by California Digital Library (CDL) at Mission Bay Conference Center in San Francisco on October 5th and 6th, 2009.

UK presentations include Maureen Pennock on ArchivePress, David Giaretta on significant properties in OAIS and Adam Farquhar on (Planets) metadata.

Posted in Events | 1 Comment »

“Why you never should leave it to the University”

Posted by Brian Kelly on 19th August 2009

A blog post from Richard Gatarski begins with the blunt announcement:

A year ago my academic web site disappeared. And those who made it go away probably ignored that such a thing could happen.

The article goes on to describe how last year Richard “found out that the School of Business had redesigned their web site. And in the process they just ignored my research. About ten years worth of virtually daily updates were gone. That included most of the manuscripts for my published work. The same thing happened to lecture notes, powerpoint slides, course documentations, useful links, etc. It had all disappeared from the Web!”.

Richard did have some good news to report: “Courtesy of the Internet Archive you can still find most of my academic stuff on the Web through their Wayback machine.” He did wonder, though, why he had to rely on the Internet Archive (“a 501(c)(3) non-profit that was founded to build an Internet library”). After all, wouldn’t you expect your institutional library to provide this service?

Richard’s losses of his digital resources have continued: a blog he set up at Stockholm University was deleted after he left the institution although, again, a copy is archived on the Internet Archive.

Richard’s experiences have left him disillusioned with the attitudes towards the digital preservation of scholarly resources. He concludes by recommending that academics take responsibility themselves for preserving their resources:

Meanwhile, for those of you who publish stuff on the Web while working with an organisation, including universities. Try to put your content where you control it. Most likely you will move between work places, temporary assignments, and so forth. If you want your stuff to be preserved, it is your responsibility to make sure it is.

But how easy will this be for the typical academic? Richard doubts whether “the issues I bring forward today are heavily discussed among university chancellors, political leaders, educational policy makers, and scientific philosophers.” But surely we need to ensure that this debate takes place. And, in today’s economic climate, that debate needs to include discussions of the costs of digital preservation (disk storage may be cheap but management of content is not).

Richard’s tale is based on his experiences as an academic in Sweden. Is the situation different in the UK, I wonder? Judging by Stuart Smith’s lament that “Mummy I lost my MP3!”, which I summarised in a post on “Disappearing Resources On Institutional Web Sites” in December 2008, it would seem that we have similar experiences in the UK higher education sector. Does anyone have any positive experiences to share?

Posted in Digital preservation | 2 Comments »

What’s the average lifespan of a Web page?

Posted by Marieke Guy on 12th August 2009

…or is it easier to ask how long is a piece of string?

The statistic much bandied about (for Web pages not pieces of string!) is 44 days, believed to originate in an article by Brewster Kahle (of Internet Archive fame) published in 1997 and titled Preserving the Internet. Brewster’s original quote is specifically about URLs: “…estimates put the average lifetime for a URL at 44 days.”

Whether this figure still stands today is a matter currently being discussed on the CURATORS@LIST.NETPRESERVE.ORG list after a query from Abigail Grotke of the Library of Congress.

Abbie offered up the 44-day statistic and pointed out that on the Digital Preservation Web site they have a graphic that discusses Web volatility, stating “44% of the sites available on the internet in 1998 had vanished one year later”.

The other figure often cited is 75 days, from Michael Day’s report Collecting and preserving the world wide web.

The dynamic nature of the Web means that pages and whole sites are continually evolving, meaning that pages are frequently changed or deleted. Alexa Internet once estimated that Web pages disappear after an average time of 75 days. (Lawrence, et al., 2001, p. 30).

Another figure sometimes suggested is 100 days. This seems to come from Rick Weiss’s article for The Washington Post (Washington, DC, 24 November 2003), On the Web, Research Work Proves Ephemeral, which is no longer available.

So what is the average lifespan of a Web page today? Is it getting shorter or longer? The Internet Archive now gives 44 to 75 days as its ballpark figure. I’d have to hazard a guess that with the rise in use of Web 2.0 technologies the Web is actually getting more transient by the day.

Is this OK?

Maybe it is, if it’s just a tweet you sent your friend; however, if it’s something more substantial that’s disappearing then it’s a real worry.

Posted in Web 1.0, Digital preservation, Web 2.0 | 4 Comments »

An Archive Of IWMW 2009 Tweets

Posted by Brian Kelly on 7th August 2009

In a recent blog post entitled Tools For Preserving Twitter Posts I described some of the Twitter preservation tools we were planning to use to keep a record of the tweets related to UKOLN’s recent IWMW 2009 event.

Twitter proved very popular during this annual event for institutional Web managers, with over 1,500 Twitter posts (tweets) being published during the last week of July. Further statistical information is provided in a post on Evidence on Use of Twitter for Live Blogging.

We suggested that a two-character code (P1 to P8) could be used to identify each plenary session, and that using this as a hashtag in conjunction with the event’s hashtag (#iwmw2009) would enable the tweets about a particular talk to be easily identified and, in theory, migrated to a managed environment.

As an example you can search for the tweets related to:

We have recently used The Archivist desktop application to create a local copy of the tweets for the plenary talks at the conference, and these have been made available on the IWMW 2009 Web site from the individual pages for the plenary talks (e.g. see the page for Derek Law’s opening plenary talk). The pages also contain a summary of the number of Twitter posts which were found using the tool.

One reason for wishing to do this is to provide an answer to the speaker who may ask “I Wonder What They Thought About My Session?”.
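For anyone wanting to try the session-code idea on their own archive of tweets, the following is a minimal sketch of how the pairing of the event hashtag (#iwmw2009) with a plenary code (#P1 to #P8) might be used to sort and count posts. It is not how The Archivist works; the function names and the sample tweets are invented for illustration, and it assumes the archived tweets are available as plain text strings.

```python
import re
from collections import Counter

# Event hashtag and plenary codes as described in the post above.
EVENT_TAG = re.compile(r"#iwmw2009\b", re.IGNORECASE)
SESSION_TAG = re.compile(r"#(p[1-8])\b", re.IGNORECASE)


def session_codes(tweet):
    """Return the set of plenary codes (e.g. {'P1'}) found in a tweet.

    A tweet only counts if it also carries the event hashtag.
    (Hypothetical helper, not part of any Twitter tool.)
    """
    if not EVENT_TAG.search(tweet):
        return set()
    return {code.upper() for code in SESSION_TAG.findall(tweet)}


def count_by_session(tweets):
    """Count how many tweets mention each plenary code."""
    counts = Counter()
    for tweet in tweets:
        counts.update(session_codes(tweet))
    return counts


# Invented sample data, for illustration only.
sample_tweets = [
    "Great opening talk #iwmw2009 #p1",
    "Slides are up #IWMW2009 #P1",
    "Lunch queue is long #iwmw2009",
    "#p2 without the event tag",
    "Comparing two talks #iwmw2009 #P1 #p3",
]

counts = count_by_session(sample_tweets)
print(dict(sorted(counts.items())))  # {'P1': 3, 'P3': 1}
```

Matching is case-insensitive because people rarely type hashtags consistently, and a tweet that names two sessions is counted once for each.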

Posted in Web 2.0 | 2 Comments »

Preservation and Google Wave

Posted by Brian Kelly on 3rd August 2009

A number of scientists have written enthusiastic blog posts about the potential of Google Wave, including Peter Murray-Rust, Cameron Neylon and several others. A post entitled Google Wave: possibilities for librarians on the Rambling Librarian blog provides a useful summary of Google Wave and how it aims to provide a response to the question “What might email be like if it was invented today?”

The Rambling Librarian post also picks up on the important “implication … that digital preservation will be even more critical. Imagine all the collaborative efforts gone when the server crashes. Or power fails.”

Absolutely! And let’s ensure that the digital preservation aspects are being considered right at the start of any development activities rather than being ignored by those focussing on the new possibilities which this technology can provide.

Hmm, I wonder if there are any funding possibilities available for exploring the preservation aspects of Google Wave?

Posted in Web 2.0 | No Comments »