JISC-PoWR

Preservation of Web Resources: a JISC-sponsored project

Archive for the 'Web 1.0' Category

Official Launch of the UK Web Archive

Posted by Marieke Guy on 26th February 2010

The British Library has officially launched the UK Web Archive, offering access in perpetuity to thousands of UK websites for generations of researchers.

The site was unveiled earlier this week by the Minister for Culture and Tourism, the Rt Hon Margaret Hodge MBE MP, and the Chief Executive of the British Library, Dame Lynne Brindley. The project demonstrates the importance and value of the nation’s digital memory.

Websites in the UK Web Archive include:

  • The Credit Crunch - initiated in July 2008, this collection contains records of high-street victims of the recession - including Woolworths and Zavvi.
  • Antony Gormley’s ‘One & Other’ Trafalgar Square Fourth Plinth Project - involving 2,400 participants and streamed live by Sky Arts over the web to an audience of millions, this site will no longer exist online from March 2010.
  • 2010 General Election - work has started to preserve the websites of MPs such as Derek Wyatt, who will be retiring at the next election, creating a permanent record of his time as a Member of Parliament.

This important research resource has been developed in partnership with the National Library of Wales, JISC and the Wellcome Library, as well as technology partners such as IBM.

British Library Chief Executive, Dame Lynne Brindley said:

“Since 2004 the British Library has led the UK Web Archive in its mission to archive a record of the major cultural and social issues being discussed online. Throughout the project the Library has worked directly with copyright holders to capture and preserve over 6,000 carefully selected websites, helping to avoid the creation of a ‘digital black hole’ in the nation’s memory.

“Limited by the existing legal position, at the current rate it will be feasible to collect just 1% of all free UK websites by 2011. We hope the current DCMS consultation will enact the 2003 Legal Deposit Libraries Act and extend the provision of legal deposit through regulation to cover freely available UK websites, providing regular snapshots of the free UK web domain for the benefit of future research.”

Further details are available from the British Library.

Posted in Web 1.0, Digital preservation, Preservation | 1 Comment »

The Demise of Geocities - But a Renewed Interest in Web Site Archeology

Posted by Brian Kelly on 26th October 2009

An article published today on the Guardian Technology Web site entitled “Geocities: dead but not lost” describes how Geocities, which was founded in 1994 and was at one stage the third most-browsed site on the web, is now dead.

We discussed Yahoo’s announcement that the Geocities service was to be shut down some time ago in a post entitled “‘Seething With Anger’ at the Demise of Geocities”. What I find interesting in the article is the information that “… there’s the real effort, by the Archive Team, who have been trying to archive as many Geocities pages and sites as they could”.

I’d not come across the Archive Team wiki before. They describe themselves as a “project composed of volunteers, currently coordinated by Jason Scott” which invites:

  • Writers, who can create clear essays and instructions for archivists and concerned parties.
  • People with Lots of Hosted Disk Space who have a proper hosted webserver and fat pipe, who are willing (when asked) to consider hosting mirrored dead sites or archives.
  • People who love setting up torrents who can do the same as the mirror folks, but do so hosting torrents.
  • OCD-rich individuals who want to download things who will respond to our alerts and call outs and download entire sites or diagnose ways to get at obfuscated data.

The wiki home page informs us that “This website is intended to be an offloading point and information depot for a number of archiving projects, all related to saving websites or data that is in danger of being lost. Besides serving as a hub for team-based pulling down and mirroring of data, this site will provide advice on managing your own data and rescuing it from the brink of destruction.”

Hmm. I wonder how effective a volunteer organisation is likely to be? My initial thoughts were fairly sceptical, but other volunteer-led initiatives, such as Wikipedia, do seem to be successful. What are your thoughts?

Posted in Web 1.0 | 2 Comments »

What’s the average lifespan of a Web page?

Posted by Marieke Guy on 12th August 2009

…or is it easier to ask how long is a piece of string?

The statistic much bandied about (for Web pages not pieces of string!) is 44 days, believed to originate in an article by Brewster Kahle (of Internet Archive fame) published in 1997 and titled Preserving the Internet. Brewster’s original quote is specifically about URLs: “…estimates put the average lifetime for a URL at 44 days.”

Whether this figure still stands today is a matter currently being discussed on the CURATORS@LIST.NETPRESERVE.ORG list after a query from Abigail Grotke of the Library of Congress.

Abbie offered up the 44 day statistic and pointed out that on the Digital Preservation Web site they have a graphic that discusses Web volatility stating “44% of the sites available on the internet in 1998 had vanished one year later”.

The other figure often cited is 75 days, from Michael Day’s report Collecting and preserving the world wide web:

The dynamic nature of the Web means that pages and whole sites are continually evolving, meaning that pages are frequently changed or deleted. Alexa Internet once estimated that Web pages disappear after an average time of 75 days. (Lawrence, et al., 2001, p. 30).

Another figure sometimes suggested is 100 days; this seems to come from Rick Weiss’s article for The Washington Post (Washington, DC, 24 November 2003), On the Web, Research Work Proves Ephemeral, which is no longer available.

So what is the average lifespan of a Web page today? Is it getting shorter or longer? The Internet Archive now gives 44 to 75 days as its ballpark figure. I’d have to hazard a guess that with the rise in use of Web 2.0 technologies the Web is actually getting more transient by the day.

Is this OK?

Maybe if it’s just a tweet you sent your friend. However, if it’s something more substantial that’s disappearing then it’s a real worry.

Posted in Web 1.0, Digital preservation, Web 2.0 | 4 Comments »

New Study - Web Archives: Now and in the Future

Posted by Brian Kelly on 28th May 2009

A news item on The National Archives Web site has recently announced a new study on “Web Archives: Now and in the Future”. This study, which is funded by the JISC and will take place in collaboration with the UK Web Archiving Consortium, will look into how archived Web sites are collected and made available to users.

The study aims to:

  • Investigate how UK Web archives are delivered to users now, and how they might be delivered in the future
  • Define the long-term historical and research value of online content in the UK
  • Look at different organisations that collect Web archives, and their interests

The study will run until late July 2009, and the results will be published on The National Archives and UK Web Archiving Consortium Web sites in August 2009.

We’ll publish details on the availability of the study once it is published.

Posted in Web 1.0 | No Comments »

“Seething With Anger” at the Demise of Geocities

Posted by Brian Kelly on 5th May 2009

A blog post entitled “The Death and Life of Geocities” has been published recently on the Adactio blog by Jeremy Keith, a Web developer living and working in Brighton, England. In the post Jeremy describes how he is “seething with anger” but then goes on to add that “I hope I can tap into that anger to do something productive”. The reason for the anger is his concern that “Yahoo are planning to destroy their Geocities property. All those URLs, all that content, all those memories will be lost …like tears in the rain”.

Although in an update to his post Jeremy does admit that “no data has been destroyed yet; no links have rotted” and that his “toys-from-pram-throwage may yet prove to be completely unfounded”, Jeremy is right to raise concerns regarding the recent announcement that “Yahoo [is] to shut down GeoCities”.

Some people, as illustrated by JR Raphael’s article in PC World entitled “So Long, GeoCities: We Forgot You Still Existed”, are not losing any sleep over GeoCities’ demise, whilst others, such as the Online Lunchpail blog, feel that “the demise of GeoCities … proves my point that the U.S. government never should have approved the takeover of GeoCities by Yahoo!”.

From my perspective I feel that the concerns raised by Jeremy Keith (who, it should be pointed out, is a professional Web developer) will become more widely appreciated as ordinary Web users, who might have used the first generation of public-facing Web-hosting services such as GeoCities for their initial simple Web development activities, realise that there may be sentimental attachments to one’s early work, just as I regret having lost my scrap book from primary school (I remember writing “When I grow up I want to be a Beatle, sing ‘She loves you, yer, yer, yer’ and earn £100 a week”). And what of the social historians: have we lost our cultural memories of the initial take-up of the Web outside of the universities and business sector?

In a blog post by Jason Scott on the ASCII “weblog of computer history, punditry and trivia” Jason describes the efforts being made to preserve content published on GeoCities. But Jason admits that

“I can’t do this alone. I’m going to be pulling data from these twitching, blood-in-mouth websites for weeks, in the background. I could use help, even if we end up being redundant. More is better. We’re in #archiveteam on EFnet. Stop by. Bring bandwidth and disks. Help me save Geocities. Not because we love it. We hate it. But if you only save the things you love, your archive is a very poor reflection indeed.”

What is to be done? Should the preservation of the general public’s digital heritage (as opposed to an institutional digital heritage) be left to volunteers? Or will future generations regard us as having failed in our responsibilities, as previous generations failed to preserve the built environment and left us with the soulless shopping centres and high-rise buildings which were developed during the 1960s?

Posted in Web 1.0 | 4 Comments »

“Your List Will Be Closed In One Week’s Time”

Posted by Brian Kelly on 7th April 2009

The dangers of reliance on externally-hosted Web 2.0 services have been mentioned previously. And there have been recent incidents in which companies have given a short period of notice of impending closure of services, with users having little time to migrate their data to alternative providers. A recent article in The Guardian (Thursday 2 April 2009) entitled “Can I assume that my online data is safe for ever?” addressed such concerns, discussing the closure of the Filefront.com service, which gave its users just 5 days to migrate their data.

Coincidentally I recently received the following email from a service I subscribe to:

Our previous request to you to provide a new owner for the list has not produced a response. Therefore, we assume the list is no longer useful and aim to close it in one week’s time.
We would be happy to provide a zipped copy of the archives and any files on deletion of the list, should they be required.

In this case it appears that the service has been little used for over a year. And yet what if useful information is still available on the service? Is a week’s notice enough for users of the service to consider the implications of this decision, identify appropriate solutions and then implement them? And let’s not forget that this email was sent outside of term time when researchers could be away.

The email did not make it clear if data was to be deleted, the service was to continue to be made available in a read-only mode or the interface to the data hidden - all possible solutions if it is felt necessary for a little-used service to be withdrawn.

There’s still a need to establish the best practices when Web-based interfaces to services are to be removed, I feel. And such issues do not just affect the third party services outside of our community.

Posted in Web 1.0 | No Comments »

Who Should Preserve The Web?

Posted by Brian Kelly on 16th March 2009

Members of the JISC PoWR Team will be participating at next week’s JISC conference, which takes place in Edinburgh on 24th March 2009.

In the session, entitled “Who should preserve the web?” a panel will

“Outline the key issues with archiving and preserving the web and will describe practical ways of approaching these issues. Looking at the international picture and the role of major consortia working in this area, the session will also offer practical advice from the JISC Preservation of Web Resources (PoWR) project on the institutional benefits of preserving web resources, what tools and processes are needed, and how a records management approach may be appropriate.”

If you are attending the conference we hope you will attend the session and participate in the discussions. If you are attending one of the other parallel sessions you can meet the UKOLN members of the JISC PoWR team at the UKOLN stand. And if you haven’t booked a place at the conference (which is now fully subscribed) feel free to participate in the discussions on the online forum.

Posted in Web 1.0, Preservation, Web 2.0, Events | 1 Comment »

TASI Is No More! Welcome To JISC Digital Media

Posted by Brian Kelly on 12th February 2009

The JISC-funded TASI (Technical Advisory Service for Images) is no more. This service, which is based at ILRT, University of Bristol, has been reborn as JISC Digital Media, with an expanded remit for supporting digital media in general and not just images, which was the focus of the TASI service. Further information is available on the JISC Web site.

This change has been accompanied by a new domain name - http://www.jiscdigitalmedia.ac.uk/ rather than http://www.tasi.ac.uk/.

Now the TASI service provided many useful resources on best practices for digitisation. But what has happened to links to these resources? Will we get a 404 error message? Or, even worse, will we get a message saying the domain no longer exists?

The QA Focus briefing document on “Improving The Quality Of Digitised Images” contains a reference to a Digital Imaging Basics resource which was available at the URL <http://www.tasi.ac.uk/advice/using/basics.html>. Following the link takes you to the resource, which is now available at <http://www.jiscdigitalmedia.ac.uk/advice/using/basics.html>.

There seems to have been a simple mapping of resources from the TASI domain to the new JISC Digital Media domain. And as the original resources had ‘cool URIs’ (i.e. they had no dependencies on a specific technology such as a CMS, Java server pages, etc.) it was technically not a difficult task to migrate the links to the new domain.
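To illustrate why technology-neutral URIs make this kind of migration straightforward, here is a minimal Python sketch of a domain-for-domain mapping. It is only an illustration under stated assumptions: the two host names come from this post, but the function name map_to_new_domain is hypothetical, and I don’t know how TASI / JISC Digital Media actually implemented their redirects.

```python
from urllib.parse import urlsplit, urlunsplit

# Host names taken from the post; everything else here is illustrative.
OLD_HOST = "www.tasi.ac.uk"
NEW_HOST = "www.jiscdigitalmedia.ac.uk"


def map_to_new_domain(url: str) -> str:
    """Swap the old host for the new one, keeping path, query and fragment.

    Hypothetical helper: it assumes every resource kept the same path
    after the move, which is what the single example in the post suggests.
    """
    parts = urlsplit(url)
    if parts.netloc.lower() != OLD_HOST:
        # Not a TASI URL, so leave it untouched.
        return url
    return urlunsplit(
        (parts.scheme, NEW_HOST, parts.path, parts.query, parts.fragment)
    )


if __name__ == "__main__":
    print(map_to_new_domain("http://www.tasi.ac.uk/advice/using/basics.html"))
```

Because the path carries no technology-specific details, nothing other than the host name needs rewriting. A site whose URLs exposed CMS identifiers or script extensions would probably need a lookup table for each resource instead.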

Well done TASI / JISC Digital Media. The challenge now is to see how long such redirects will continue to function.

Posted in Web 1.0 | No Comments »

JISC Advisory Services to be Closed - But Don’t Panic!

Posted by Brian Kelly on 15th January 2009

A message sent to the JISC infoNet JISCMail (and other) lists back in November described significant changes to the structure of the JISC Advisory Services:

JISC and the Advisory Services have been looking at ways to be more agile and flexible to respond to the changing needs and demands of the further and higher education communities. The outcome of this review is to create a new company called JISC Services.

JISC infoNet, JISC Legal, JISC TechDis, Netskills, Procureweb and TASI are coming together to create JISC Services which will formally come into existence on 1 August 2009.

The aim of the new company is to create a more flexible and comprehensive source of advice, with increased opportunities for addressing new and changing needs across the community. This change is designed to ensure that our services continue to offer the internationally acclaimed advice for which they are renowned. Putting the further and higher education communities at the centre of what we do will be strengthened by working together as one company to deliver expertise and advice.

You will still be able to access all of the services you currently value via the usual channels and over the next few months the services will increasingly join together at events, on projects and in producing resources.

Find out more about the JISC Services at: http://www.jisc-services.ac.uk

I recently wrote about the closure of organisations and best practices for preserving the resources hosted on the organisational Web sites. This case is rather different: rather than closing down organisations JISC is building on the strengths of the advisory services and seeking to provide benefits to the user community by providing a more seamless interface (and remember, if the advisory services were regarded as failing to deliver a valuable service we might have expected the organisational changes to have provided an opportunity to close any lame ducks).

The challenge, from the perspective of Web site preservation, is to try to ensure that valuable resources are not lost in the merger process. I feel that this change could provide valuable lessons for the wider community - the JISC Advisory Services, after all, won’t be the last organisations to be reorganised! And let’s hope that the lessons are based on a successful migration of the Web resources, and not lessons on what can go wrong!

Posted in Web 1.0, Preservation | 3 Comments »

Heritage Records and the Changing Filter through which we View our World

Posted by Marieke Guy on 11th August 2008

At both of the JISC-PoWR workshops delegates have been keen for the project team to spell out the reasons why institutions might want to preserve Web resources. These ‘drivers’ then give fuel to their case for the funds needed to archive the institutional Web site.

The idea of ‘heritage records’ is one that is often mentioned. Using Web sites as a ‘cultural snap shot’ has the potential to be a highly useful activity.

In his interesting and functional text Managing the Crowd: Rethinking Records Management for the Web 2.0 World Steve Bailey puts forward the point that deciding what will be important in the future is a tricky business. As he explains in the section on appraisal, retention and destruction: “The passage of time inevitably changes the filter through which we view our world and assess its priorities.”

Steve gives the example of the current plethora of Web sites that offer what we might call ‘quack’ remedies for medical problems. These sites may not seem to be of great interest right now but they may be invaluable to future historians who wish to demonstrate the distrust of the medical profession exhibited in 21st century western culture.

James Currall, in his plenary talk at the recent Institutional Web Management Workshop, used the example of blog posts made by soldiers out in Iraq and Afghanistan to demonstrate the irony of modern technology; these highly informative records could easily be lost while the diaries of World War II soldiers remain accessible.

Preservation mistakes have been made aplenty in the past. The destruction of many of the BBC’s flagship programmes in the 1970s has been well documented, and in 2001 the BBC launched a treasure hunt campaign to locate recordings of pre-1980 television or radio programmes. Ironically the Web site is no longer being updated, though it is still hosted on the BBC server.

So who can know what the future will bring? Which Web resources will we wish we had kept? Which student blog writer will go on to be a future prime minister or an infamous criminal? What bit of the terabytes is the most important?

As Steve Bailey points out there is no crystal ball. It always has been, and always will be, very difficult to predict what resources may prove to be valuable to future generations.

Although this offers little consolation for those making these choices, it does at least argue the case that we do need to preserve and we need to do so soon.

Posted in Web 1.0, Challenges, Records management, Preservation | 2 Comments »