Posted by Marieke Guy on May 19th, 2008
If it was, we’d all be at it!
Any records manager or archivist will probably be able to give you half a dozen reasons why digital preservation is important. Some might well give you half a dozen more for why the preservation of Web resources in particular, which now play such a huge part in our daily lives, is even more important.
Unfortunately, this critical activity isn’t easy. In fact, the very nature of the Web makes the preservation and archiving of Web resources a very complex task. A few of the major issues include:
- The transient and dynamic nature of the Web – The Web is growing at a rapid rate. The average Web resource’s lifespan is short and pages are often removed. On the Web, publishing is an easy process, and content may be changed often and not necessarily in an orderly way. Metadata is very much an afterthought. Web 2.0 content (comprising data mash-ups, blog entries, comments etc.) is even more dynamic.
- Selection issues – Of the billions of resources out there, which ones, and which instantiation of each, should we preserve?
- The technologies involved – The Web is dependent on technology: it uses various file formats and follows many protocols, most of which evolve quickly. The look and feel of a Web page may be determined by a number of different elements, such as the code, the HTTP protocol, the user, the browser and the server. Which of these need to be preserved? Web resources are usually held on just one server, so are at greater risk of removal, yet for some resources countless copies are made. Again, which do we preserve? Web sites are held together by hypertext links, so when a site is crawled by archiving software parts of it could be omitted (for example, if they are excluded by a robots.txt file or are not actually linked to). Whole areas of the Web are held in problematic content management systems (CMS) or behind authentication systems, and Web 2.0 applications use layered APIs, which use data in many different ways.
- Organisational issues – How is your institution using its Web site? Is it a publication or is it a record? Is the content being managed? Who is responsible and who has ownership?
- The legal issues – There are many intellectual property rights (IPR) and data protection issues with Web content. Who owns the photos on Flickr, the comments on a blog or the details on a social networking site?
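To make the crawling point under "The technologies involved" a little more concrete, here is a minimal sketch of how a link-following archiver can miss parts of a site. It is not how any particular archiving tool works: the site map, the robots.txt rules, the `crawl` function and the `example-archiver` user agent are all made up for illustration. The only real library call is Python's standard `urllib.robotparser`.

```python
from collections import deque
from urllib.robotparser import RobotFileParser

# Hypothetical site: each page maps to the pages it links to.
SITE = {
    "/": ["/news", "/about"],
    "/news": ["/news/2008", "/private/drafts"],
    "/news/2008": [],
    "/about": [],
    "/private/drafts": [],
    "/old-prospectus": [],  # exists on the server, but nothing links to it
}

# Hypothetical robots.txt asking all crawlers to stay out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def crawl(site, robots_txt, start="/", agent="example-archiver"):
    """Follow links breadth-first from `start`, honouring robots.txt.

    Returns three lists: pages captured, pages found but excluded by
    robots.txt, and pages never found because nothing links to them.
    """
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())

    captured, excluded = [], []
    queue, seen = deque([start]), {start}
    while queue:
        page = queue.popleft()
        if not rules.can_fetch(agent, page):
            excluded.append(page)  # found via a link, but off limits
            continue
        captured.append(page)
        for link in site.get(page, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)

    unlinked = sorted(set(site) - seen)  # the crawler never saw these
    return captured, excluded, unlinked


captured, excluded, unlinked = crawl(SITE, ROBOTS_TXT)
print("Captured:", captured)
print("Excluded by robots.txt:", excluded)
print("Never discovered:", unlinked)
```

In this toy example the drafts page is skipped because of robots.txt, and the old prospectus is never seen at all, even though both are sitting on the server. A real archive would have to decide whether, and how, to capture them by other means.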
There is no easy answer! However, despite the difficulties of Web preservation, some institutions may already be addressing some of these issues. We are keen to hear examples of any approaches being taken.