Archiving a wiki

On dablog recently I have put up a post with a few observations about archiving a MediaWiki site. The example is the UKOLN Repositories Research Team wiki DigiRep, selected for the JISC to add to their UKWAC collection (or to put it more accurately, pro-actively offered for archiving by DigiRep’s manager). The post illustrates a few points which we have touched on in the PoWR Handbook, which I’d like to illuminate and amplify here.

Firstly, we don’t want to gather absolutely everything that’s presented as a web page in the wiki, since the wiki contains not only the user-input content but also a large number of automatically generated pages (versioning, indexing, admin and login forms, etc.). This stems from the underlying assumption about doing digital preservation, namely that it costs money to capture and store digital content, and it goes on costing money to keep on storing it. (Managing this could be seen as good housekeeping. The British Library Life and Life2 projects have devised ingenious and elaborate formulae for costing digital preservation, taking all the factors into account to enable you to figure out if you can really afford to do it.) In my case, there are two pressing concerns: (a) I don’t want to waste time and resource in the shared gather queue while Web Curator Tool gathers hundreds of pages from DigiRep, and (b) I don’t want to commit the JISC to paying for expensive server space, storing a bloated gather which they don’t really want.

Secondly, the above assumptions have led to me making a form of selection decision, i.e. to exclude from capture those parts of the wiki I don’t want to preserve. The parts I don’t want are the edit history and the discussion pages. The reason I don’t want them is because UKWAC users, the target audience for the archived copy – or the designated user community, as OAIS calls it – probably don’t want to see them either. All they will want is to look at the finished content, the abiding record of what it was that DigiRep actually did.

This selection aspect led to Maureen Pennock’s reply, which is a very valid point – there are some instances where people would want to look at the edit history. Who wrote what, when…and why did it change? If that change-history is retrievable from the wiki, should we not archive it? My thinking is that yes, it is valuable, but only to a certain audience. I would think the change history is massively important to the current owner-operators of DigiRep, and that as its administrators they would certainly want to access that data. But then I put on my Institutional records management hat, and start to ask them how long they really want to have access to that change history, and whether they really need to commit the Institution to its long-term (or even permanent) preservation. Indeed, could their access requirement be satisfied merely by allowing the wiki (presuming it is reasonably secure, backed-up etc.) to go on operating the way it is, as a self-documenting collaborative editing tool?

All of the above raises some interesting questions which you may want to consider if undertaking to archive a wiki in your own Institution. Who needs it, how long for, do we need to keep every bit of it, and if not then which bits can we exclude? Note that they are principally questions of policy and decision-making, and don’t involve a technology-driven solution; the technology comes in later, when you want to implement the decisions.
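Once the policy decision has been made, implementing it can be quite simple. As a rough illustration only, here is a sketch of a URL filter that keeps finished content and skips edit history and discussion pages. It assumes the common MediaWiki conventions of an `action` query parameter, `oldid`/`diff` parameters for old revisions, and `Talk:` or `Special:` namespace prefixes; the function name and the example host are hypothetical, and this is not how Web Curator Tool itself is configured.

```python
from urllib.parse import urlparse, parse_qs

# Assumed MediaWiki conventions: history and edit views are requested with an
# "action" query parameter; discussion and generated pages sit in namespaces.
EXCLUDED_ACTIONS = {"history", "edit"}
EXCLUDED_NAMESPACES = ("Talk:", "User_talk:", "Special:")


def should_capture(url: str) -> bool:
    """Return True if the URL looks like finished wiki content."""
    parsed = urlparse(url)
    query = parse_qs(parsed.query)
    # Old revisions and diffs are requested with oldid/diff parameters.
    if "oldid" in query or "diff" in query:
        return False
    if any(action in EXCLUDED_ACTIONS for action in query.get("action", [])):
        return False
    # The page title is either the "title" parameter or the last path segment.
    title = query.get("title", [parsed.path.rsplit("/", 1)[-1]])[0]
    return not title.startswith(EXCLUDED_NAMESPACES)
```

A gather tool would apply a test like this to each discovered link before queuing it, so the exclusions reflect the selection decision rather than whatever the crawler happens to find.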

Set a blog to catch a blog…

Much discussion of blog preservation focuses on how to preserve the blogness of blogs: how can we make a web archive store, manage and deliver preserved blogs in a way that is faithful to the original?

Since it is blogging applications that provide this structure and behaviour (usually from simple database tables of Posts, Comments, Users, etc.), perhaps we should consider making blogging software behave more like an archive. How difficult would that be? Do we need to hire a developer?

One interesting thing about WordPress is the number of uses its simple blog model has been put to. Under the hood it is based on a remarkably simple database schema of about 10 tables and a suite of PHP scripts, functions and libraries that provide the interface to that data. Its huge user-base has contributed a wide variety of themes and additional functions. It can be turned into a Twitter-like microblog (P2 and Prologue) or a fully-fledged social network (WordPress MU, Buddypress).

Another possibility exploited by a 3rd-party plugin is that of using WordPress as an aggregating blog, collecting posts automatically via RSS from other blogs: this seems like a promising basis for starting to develop an archive of blogs, in a blog.

The plugin in question is called FeedWordPress. It uses the Links feature of WordPress as the basis of a list of feeds which it checks regularly, importing new content when it finds it, as Posts within WordPress.

I installed FeedWordPress a while ago on ULCC’s DA Blog, and set it up to import all of the ULCC-contributed posts to JISC-PoWR, i.e. those by Ed Pinsent, Kevin Ashley and myself. I did this because I felt that these contributions warrant being part of ULCC’s institutional record of its activities, and that DA Blog was the best place to address this, as things stand.

JISC-PoWR also runs on WordPress, so I knew that, thanks to WordPress’s REST-like interface and Cool URIs, it is easy not only to select an individual author’s posts (/author/kevinashley) but also the RSS feed thereof (/author/kevinashley/feed). This, for each of the three author accounts, was all I needed to start setting up FeedWordPress in DA Blog to take an automatic copy each time any of us contributed to JISC-PoWR. The “author” on the original post has been mapped to an author in DA Blog, so posts are automatically (and correctly) attributed. The import also preserves, in custom fields, a considerable amount of contextual information about the posts in their original location.
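For anyone wanting to prepare a list of feeds in bulk, the URL pattern is regular enough to generate. The following is only a sketch of that pattern: the function name and the base URL are hypothetical placeholders, and the only author slug taken from this post is `kevinashley`.

```python
from urllib.parse import urljoin


def author_feed_url(blog_base: str, author_slug: str) -> str:
    """Build the RSS feed URL for one author's posts on a WordPress blog."""
    # Ensure the base ends in a slash so urljoin appends rather than replaces.
    base = blog_base if blog_base.endswith("/") else blog_base + "/"
    return urljoin(base, f"author/{author_slug}/feed")


# The base URL below is a placeholder, not the real address of the blog.
print(author_feed_url("https://blog.example.org/jisc-powr", "kevinashley"))
```

Each resulting URL can then be added as a link in the aggregating blog, which is all FeedWordPress needs in order to start checking it.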

In many cases, I’ve kept the imported post private in DA Blog. “Introductory” posts for the JISC-PoWR project blog, for example: as editor of DA Blog, I didn’t feel we needed to trouble our readers there with them; nevertheless they are stored in the blog database, as part of “the record” of our activities.

This is, admittedly, a very small-scale test of this approach, but the kind of system I’ve described is unquestionably a rudimentary blog archive, that can be set up relatively easily using WordPress and FeedWordPress – no coding necessary. Content is then searchable, sortable, exportable (SQL, RSS, etc). (Note, by the way, what happens when you use the Search box on the JISC-PoWR blog copy in UKWAC: this won’t happen with this approach!)

For organisations with many staff blogging on diverse public platforms this would be one approach to ensuring that these activities are recorded and preserved. UKOLN, for example, manages its own blog farm, while Brian and Marieke have blogs at WordPress.com (as well as contributing to this one), and Paul Walk appears to manage his own blog and web space. This kind of arrangement is not uncommon, nor is the problem of how an institution gets a grasp on material in all these different locations (it’s been at the heart of many JISC-PoWR workshop discussions). A single, central, self-hosted, aggregating blog, automatically harvesting the news feeds of all these blogs, might be a low-cost, quick-start approach to securing data in The Cloud, and safeguarding the corporate memory.

There are more issues to address. What of comments or embedded images? Can it handle Twitter tweets as well as blog posts? Does it scale? What of look-and-feel, individual themes, etc.? Now we start needing some more robust tests and decisions, maybe even a developer or two to build a dedicated suite of “ArchivePress” plugins. But thanks to the power and Open-ness of WordPress, and the endless creativity of its many users, we have a promising and viable short-term solution, and a compelling place to start further exploration.

Who Should Preserve The Web?

Members of the JISC PoWR Team will be participating at next week’s JISC conference, which takes place in Edinburgh on 24th March 2009.

In the session, entitled “Who should preserve the web?” a panel will

“Outline the key issues with archiving and preserving the web and will describe practical ways of approaching these issues. Looking at the international picture and the role of major consortia working in this area, the session will also offer practical advice from the JISC Preservation of Web Resources (PoWR) project on the institutional benefits of preserving web resources, what tools and processes are needed, and how a records management approach may be appropriate.”

If you are attending the conference we hope you will attend the session and participate in the discussions. If you are attending one of the other parallel sessions you can meet the UKOLN members of the JISC PoWR team at the UKOLN stand. And if you haven’t booked a place at the conference (which is now fully subscribed) feel free to participate in the discussions on the online forum.

Meet Members of the JISC PoWR Team at the JISC 2009 Conference

Members of the JISC PoWR team from UKOLN and ULCC will be attending the JISC 2009 conference in Edinburgh on 24th March 2009. UKOLN will have a stand in the accompanying exhibition and we intend to produce a poster about the work of the JISC PoWR project which will be on display.

In order to help you spot the poster at what is likely to be a very busy event we’ve included an image of the poster in this post (which is also available on Slideshare, if you’d like to see more details of the content of the poster).

If you have an interest in the preservation of Web resources, feel free to come along to the UKOLN stand and chat to myself or Marieke Guy, UKOLN’s team members for the JISC PoWR project.

LIWA – Living Web Archives

The PoWR project identified a number of technical challenges which made certain types of content – especially that with a Web 2.0 flavour – particularly difficult to manage and preserve in an effective way. My attention has recently been drawn to an EU-funded project which hopes to overcome a number of these technical problems, as well as others that are applicable to large-scale archiving such as the problem of spam content.

LIWA – Living Web Archives – began in early 2008, but as with many EU projects, its startup phase involved a lot of internal activity without much of a public face. As a result we didn’t pick up on its work in the JISC-PoWR handbook, but I’m sure we’ll rectify this omission in any future revisions.

To pick one example of LIWA’s areas of interest, it intends to develop tools which make it easier to take a temporal view of web archives and to maintain temporal consistency. Temporal consistency – or rather its absence – will be familiar to anyone who has spent time exploring sites in the Internet Archive, where different pages, or even portions of the same page (such as images) will have been archived on different days. This can lead to occasional surprises when navigating through archived content, with links taking one to pages that don’t have the expected content.
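To make the idea of temporal consistency a little more concrete, here is a minimal sketch (my own, not anything LIWA has published) that measures how far apart the captures of a page and its embedded resources are. It assumes capture times are recorded as 14-digit strings in the form YYYYMMDDhhmmss; the function names and the tolerance are hypothetical.

```python
from datetime import datetime, timedelta

TIMESTAMP_FORMAT = "%Y%m%d%H%M%S"  # assumed 14-digit capture timestamp


def capture_spread(timestamps: list[str]) -> timedelta:
    """Return the gap between the earliest and latest capture in a set."""
    times = [datetime.strptime(ts, TIMESTAMP_FORMAT) for ts in timestamps]
    return max(times) - min(times)


def is_temporally_consistent(timestamps: list[str], tolerance: timedelta) -> bool:
    """True if every capture falls within the given tolerance of the others."""
    return capture_spread(timestamps) <= tolerance
```

A page whose HTML, images and stylesheet were all gathered within the chosen tolerance would pass; one whose images were gathered days later would not, which is exactly the situation that produces surprises when browsing.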

LIWA’s partners include Hanzo, a UK-based web archive services company that we covered briefly in the handbook; I hope we can explore their potential value to UK HE in the future.

Considerations for the Preservation of Blogs

DigitalPreservationEurope (DPE) fosters collaboration and synergies between many existing national digital preservation initiatives across the European Research Area. As part of their work they publish concise overviews of key digital preservation and curation issues. Earlier this month they published a briefing paper on Considerations for the Preservation of Blogs (PDF).

The preamble sets the context for the paper:

Blogs, it seems, are everywhere these days, but what about the next day (and the next and the next …). Opinions vary on whether or not blogs merit preservation beyond the actions of a blog’s respective authors. This briefing paper does not contribute to that dialogue. Rather, it provides an overview of issues to be considered by organizations planning blog preservation programs. Blogs are the product of a network of players, including blog authors, service providers, and readers. Discussed here are some key attributes of blogs, and the characteristics and behaviors of these players, which may impact preservation activities.

During the JISC PoWR project we recognised that despite blogs initially being commonly characterised as ephemeral (as commented on in the DPE paper) their increasing importance and role in both the research context and in our cultural history is becoming apparent, and like other Web resources their preservation is a matter that needs to be addressed, somehow.

The PoWR blog has a number of interesting posts on the preservation of blogs including:

There is also a section on preservation of blogs in the JISC PoWR handbook.

Twitter Groups and Twitter Problems

We’ve written about Twitter on the JISC PoWR site before, mainly when considering preservation of Web 2.0 material. Now Twitter could become a useful tool in helping you communicate about Web resource preservation.

The Archivists and Records Managers Twitter Group is up and running. You can register at http://twittgroups.com/group/archives.

I’m sure there will be lots of interesting posts.

The preservation of Twitter posts (tweets) has again been discussed in the blogosphere. Maureen Pennock commented in her post entitled ‘Making retrospective sense of cross media communications: a new archival challenge’ that the increasing number of communication mechanisms presents a big problem for archivists.

She points out that “Some of our conversations are cross-media; they may start on Twitter, but they move to Facebook and then the blog. Capturing only one of those accounts means that only part of our conversation is captured. Okay, so you’re probably not interested in capturing our interactions in your archives. But you probably are interested in capturing interactions from important people (back to Stephen Fry and Obama again) and you will thus face the same issues.”

She then says “We all know the problems we’ve got in capturing and archiving emails. What of Twitter? How do you get Tweets out of the system and integrate them into a collection? What of Facebook data? And YouTube?”

It seems the Twitter challenge is becoming more real as it becomes increasingly mainstream.


TASI Is No More! Welcome To JISC Digital Media

The JISC-funded TASI (Technical Advisory Service for Images) is no more. This service, which is based at ILRT, University of Bristol, has been reborn as JISC Digital Media, with an expanded remit for supporting digital media in general and not just images, which was the focus of the TASI service. Further information is available on the JISC Web site.

This change has been accompanied by a new domain name – http://www.jiscdigitalmedia.ac.uk/ rather than http://www.tasi.ac.uk/.

Now the TASI service provided many useful resources on best practices for digitisation.  But what has happened to links to these resources? Will we get a 404 error message? Or, even worse, will we get a message saying the domain no longer exists?

The QA Focus briefing document on “Improving The Quality Of Digitised Images” contains a reference to a Digital Imaging Basics resource which was available at the URL <http://www.tasi.ac.uk/advice/using/basics.html>. Following the link takes you to the resource, which is now available at <http://www.jiscdigitalmedia.ac.uk/advice/using/basics.html>.

There seems to have been a simple mapping of resources from the TASI domain to the new JISC Digital Media domain. And as the original resources had ‘cool URIs’ (i.e. they had no dependencies on a specific technology such as a CMS, Java server pages, etc.), it was technically not a difficult task to migrate the links to the new domain.
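The mapping appears to amount to swapping the host name and leaving the path alone. I don’t know how the redirects were actually implemented, so the following is only a sketch of that rule, with a hypothetical function name, using the two domains mentioned above.

```python
from urllib.parse import urlsplit, urlunsplit

OLD_HOST = "www.tasi.ac.uk"
NEW_HOST = "www.jiscdigitalmedia.ac.uk"


def migrate_url(url: str) -> str:
    """Swap the old host for the new one, keeping path and query unchanged."""
    parts = urlsplit(url)
    if parts.netloc.lower() != OLD_HOST:
        return url  # not a TASI link, so leave it alone
    return urlunsplit(parts._replace(netloc=NEW_HOST))
```

Applied to the Digital Imaging Basics link, this yields the same address that the redirect currently delivers. A rule this simple is only possible because the paths carried no trace of the technology behind them.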

Well done TASI / JISC Digital Media. The challenge now is to see how long such redirects will continue to function.

Digital ‘Movage’

Kevin Kelly has coined the term ‘movage’ in a blog post published on 11 December 2008. Kevin argues that:

The only way to archive digital information is to keep it moving. I call this movage instead of storage. Proper movage means transferring the material to current platforms on a regular basis — that is, before the old platform completely dies, and it becomes hard to do.

The reasons for this are the continual changes in the formats and degradation of the storage media. I think this relates to the ideas discussed previously on this blog about an emphasis on ongoing access to Web resources rather than the preservation of such resources. In the case of Web resources the need tends to arise from changes in the technologies used to deliver the Web services rather than the formats themselves.

But whether a new term needs to be created is questionable – after all, Kevin Kelly is simply describing the well-established concept of migration of formats. As described in a glossary entry on the DCC Web site:

Migration: A means of overcoming technical obsolescence by transferring digital resources from one hardware/software generation to the next. The purpose of migration is to preserve the intellectual content of digital objects and to retain the ability for clients to retrieve, display, and otherwise use them in the face of constantly changing technology.

Despite this reservation I still think it’s good to see a slightly different variant on the ideas which have been discussed on this blog reaching a new community.

Wiki Management

This contribution to a thread about management of wikis, posted by the Records management section at the University of Edinburgh, was submitted to the Archive listserv recently:

Below is an outline of the ‘wiki’ situation at the University of Edinburgh:

At Edinburgh University our main effort to date has been making sure that wikis are retention scheduled, and considering what the ideal retention period for a wiki should be. As part of setting up any new wiki space the University records details such as space owner and proposed use, but due to the wide variety of uses it is difficult to specify a generic retention period. There is the option for the space owner to delete a wiki space; however the most likely scenario is that a space atrophies over time, the owner stops engaging, and it is therefore then up to the University to be proactive in identifying and pruning out dead spaces.

At present the service policy talks about a default retention period of 1 year, which is primarily to make space owners aware that if not used their space may be deleted. If we have anything that requires long term migration we would look into outward migration; either to a new system or to an archive.

I found it very encouraging to see this pro-active and practical-minded approach to the management of wikis. In many ways Edinburgh’s RM approach vindicates a lot of the RM advice which we have recommended in the PoWR Handbook; as we say early on, we must manage resources in order to preserve them. It is also encouraging that in Edinburgh’s case at least the wiki problem is considered primarily in terms of information and staff management, and not exclusively in terms of the technological solutions that might be applied.

In particular:

1) Edinburgh: “Make sure wikis are retention scheduled”.

  • PoWR: “Deciding which aspects of your web resources to capture can be informed to a large extent by your Institutional drivers, and the agreed policies for retention and preservation.”  (p 22)

2) Edinburgh: “Consider the ideal retention period for a wiki”.

  • PoWR: “The attraction of bringing a website in line with an established retention and disposal programme is that it will work to defined business rules and retention schedules to enable the efficient destruction of materials, and also enable the protection and maintenance of records that need to be kept for business reasons.”  (p 93)

3) Edinburgh: “Make space owners aware that if not used their space may be deleted”.

  • PoWR: “Quite often in an academic context these applications rely on the individual to create and manage their own resources. A likely scenario is that the academic, staff member or student creates and manages his or her own external accounts in Flickr, Slideshare or WordPress.com; but they are not Institutional accounts. It is thus possible with Web 2.0 application for academics to conduct a significant amount of Institutional business outside of any known Institution network. The Institution either doesn’t know this activity is taking place, or ownership of the resources is not recognised officially. In such a scenario, it is likely the resources are at risk.”  (p 42)

4) Edinburgh: “The service policy talks about a default retention period.” This approach seems to incorporate rules as part of setting up any new wiki space, starting to manage the resource at the very beginning of the record’s lifecycle.

  • PoWR: “If  we can apply a lifecycle model to web resources, they will be created, managed, stored and disposed of in a more efficient and consistent way; it can assist with the process of identifying what should and should not be retained, and why; and that in turn will help with making preservation decisions.” (p 34)

5) Edinburgh: “If we have anything that requires long term migration we would look into outward migration; either to a new system or to an archive.”

  • PoWR: “Migration of resources is a form of preservation. Migration means moving resources from one operating system to another, or from one storage/management system to another. This may raise questions about emulation and performance. Can the resource be successfully extracted from its old system, and behave in an acceptable way in the new system?”  (p 33)
  • “The usual aim of archival appraisal has been to identify and select records for permanent preservation. Quite often appraisal has taken place at the very end of the lifecycle process (although records managers intervene where possible at the beginning of the process, enabling records of importance to be identified early).”  (p 36)
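A retention rule of the kind Edinburgh describes is easy to express once the details recorded at set-up (space owner, last activity) are available. This is only an illustrative sketch: the `WikiSpace` record and the function name are hypothetical, and the one-year default is taken from the service policy quoted above.

```python
from dataclasses import dataclass
from datetime import date, timedelta

DEFAULT_RETENTION = timedelta(days=365)  # the one-year default from the policy


@dataclass
class WikiSpace:
    name: str
    owner: str
    last_edited: date


def spaces_due_for_review(spaces, today, retention=DEFAULT_RETENTION):
    """Return spaces that have had no edits within the retention period."""
    return [s for s in spaces if today - s.last_edited > retention]
```

The output is a list for a person to review, not a list to delete automatically; deciding whether a dormant space should be pruned, migrated or archived remains a records management decision.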

JISC Advisory Services to be Closed – But Don’t Panic!

A message sent to the JISC infoNet  JISCMail (and other) lists back in November described significant changes to the structure of the JISC Advisory Services:

JISC and the Advisory Services have been looking at ways to be more agile and flexible to respond to the changing needs and demands of the further and higher education communities. The outcome of this review is to create a new company called JISC Services.

JISC infoNet, JISC Legal, JISC TechDis, Netskills, Procureweb and TASI are coming together to create JISC Services which will formally come into existence on 1 August 2009.

The aim of the new company is to create a more flexible and comprehensive source of advice, with increased opportunities for addressing new and changing needs across the community. This change is designed to ensure that our services continue to offer the internationally acclaimed advice for which they are renowned. Putting the further and higher education communities at the centre of what we do will be strengthened by working together as one company to deliver expertise and advice.

You will still be able to access all of the services you currently value via the usual channels and over the next few months the services will increasingly join together at events, on projects and in producing resources.

Find out more about the JISC Services at: http://www.jisc-services.ac.uk

I recently wrote about the closure of organisations and best practices for preserving the resources hosted on the organisational Web sites. This case is rather different – rather than closing down organisations JISC is building on the strengths of the advisory services and seeking to provide benefits to the user community by providing a more seamless interface (and remember, if the advisory services were regarded as failing to deliver a valuable service we might have expected the organisational changes to have provided an opportunity to close any lame ducks).

The challenge, from the perspective of Web site preservation, is to try to ensure that valuable resources are not lost in the merger process. I feel that this change could provide valuable lessons for the wider community – the JISC Advisory Services, after all, won’t be the last organisations to be reorganised! And let’s hope that the lessons are based on a successful migration of the Web resources, and not lessons on what can go wrong!

The Fetish of the Digital

Happy New Year to all our readers.

We are lucky enough to start 2009 with a guest blog post from Dr James Currall, Director of Information Strategy, IT Services & HATII Senior Research Fellow, University of Glasgow.

James has been involved with the highly successful Glasgow MPhil (now MSc) course in Information Management and Preservation since its inception, in which he teaches about the transition from storage of information on physical to digital media, information security, the role of numbers as information and a variety of other topics including risk and information management as an investment. In this latter context he was the Project Director of the espida project which developed a sustainable business-focussed model for digital preservation. He gave a plenary talk on Web preservation at last year’s Institutional Web Management Workshop (IWMW 2008) entitled The Tangled Web is but a Fleeting Dream … but then again … which was very well received and is available to watch on Google Video.

And I’ll pass you over to James…


A few weeks back, I was involved in a discussion about the skills required by people involved in Digital Curation and much of that discussion was based around the DigCCurr Project which has a long list of skills, some of which are specific to Digital Curation, but many of which are of a rather more general nature. And this set me on a dangerous course – thinking … What exactly is this ‘profession’ of digital curator that DigCCurr amongst others are trying to define?

Let us rewind to say the second half of the 16th century and let us suppose that you were charged by Mr Shakespeare’s publishers with curating ‘The Scottish Play’. What would you have done? What exactly is this ‘information object’? Is it the fonts, the layout, the pagination, the language, the story, the stage directions or what? In spite of the absence of the profession of ‘paper curator’ we have inherited a rich heritage. Along the way, many items will have been lost – it was always thus and, in spite of the optimistic techno-determinism of some, it always will be EVEN IN THE DIGITAL AGE. I would argue that this is all good and necessary and whilst I would mourn the passing of Algol, Reverse Polish Notation, amplifiers based on thermionic valves or chunky discrete solid state components, vinyl records, reel to reel tape and other really splendid ideas that were IMHO much better than the ‘mass market equivalents’ that replaced them, we have to discard much of our baggage as we move on.

So what is this preservation activity all about? Is it not about the preservation and curation of information, not of digits? During a session with my MSc students, we visited the Wayback Machine and had a look at the University of Glasgow Web site (you wondered when I would get on to the web didn’t you?). The page that we selected at random was from 18th October 2000. As a web page it is rather uninteresting: when I looked at it today there was no style sheet, the graphics were all missing and it was generally rather uninspiring, but … what is interesting is the headline news story ‘Funeral of the First Minister, Donald Dewar’. For those of you furth of Scotland, Donald was a leading light in the establishment of devolution for Scotland and the first First Minister of the devolved administration in Scotland. He was a graduate of the University of Glasgow and his premature passing at the age of 63 was tragic. The news story is about ‘administrative’ details of his funeral and the passage of his cortege past the University – details of importance in relation to the history of the University and perhaps of Scotland. It is the information contained in the web pages that is of interest and importance, whilst the layout of the pages and such ‘technical’ details are of passing interest as the ‘container’ for that information.

So with 2008 now ended let us bury the idea that the digital needs its own ghetto, and that we need to prepend everything with ‘digital’, be it curation, preservation, art, culture, revolution, etc. Digital artifacts are the currently ‘fashionable’ containers for information and, whilst the term continues, the technologies underneath are radically different at every turn and often require as much conversion one to another as a paper to magnetic disc conversion. It is not the containers that are important but what they contain. The Eastern concept of ‘Pointing at the Moon’ has something to say here.

If we come to regard preservation/curation as a finger pointing to the moon; we might come to mistake the finger for the moon and never see beyond it to the moon itself.

This short clip of Bruce Lee in ‘Enter the Dragon’ (1973) captures something of this in a different context.

I am also reminded of the auditors in Terry Pratchett’s ‘Thief of Time’ who take a great painting and break it down into flakes of paint which they put in little piles of each colour and then spend time looking to see where the art has gone! These auditors are described in the Wikipedia article for Discworld thus:

The Auditors, cosmic bureaucrats who prefer a universe where electrons spin, rocks float in space and imagination is dead, represent the perils of handing yourself over to a completely materialist and deterministic vision of reality, devoid of the myths and stories that make us human.

From http://en.wikipedia.org/wiki/Discworld#Elves_and_Auditors

In 2009 we need to see digital preservation and curation as ‘last year’s model’, of course we need to understand the importance of custody, metadata and identifiers, but above all we need to understand the centrality of the information in the artifacts that we are seeking to curate and preserve.  This piece is recognisably ‘Currall’ not because of a digital signature, not because it is on his web site and not because the owners of the JISC PoWR say it is – it is ‘Currall’ because of its recognisably iconoclast position, poor grammar and tortured logic – that is what needs to be preserved!

Information is the thing (even if that is hard and technology is relatively easy) – lose sight of that and the game is a bogey.

PS if you are interested in a rather more rigorous treatment of this topic you might like to access “Authenticity: a red herring?” (doi:10.1016/j.jal.2008.09.004).

History of the First UK Institutional Web Service

It was 15 years ago, the first week back at work after the Christmas break (I think) when I was part of the team which set up the Web service at the University of Leeds. This was, I believe, the UK’s first institutional Web service, with contributions made shortly afterwards from several academic departments, including not only the usual suspects (the Computing Service, Computer Science, Chemistry and Physics) but also the School of Music.

Various people at the University of Leeds were active in Web development activities back then. My role was in promoting its use (and I’ve discovered a copy of a special issue of the University Computing Service newsletter on the theme of online information services – in particular the Web – which is available on the Internet Archive). But in addition the Chemistry Department were, in conjunction with Imperial College, developing services which provided access to molecules on the Web; a colleague in the Computing Service provided access to the University Library catalogue; and Nikos Drakos, a researcher in the Computer Based Learning Unit, wrote the LaTeX2HTML conversion software (which was first announced in May 1993).

Fifteen years later my memories of our early involvement with the Web are beginning to fade. But as I knew this would happen I wrote a history of the various activities of colleagues at the University, which was published on the University’s Web site. Sadly, but perhaps inevitably, over time this resource was deleted, no doubt following a reorganisation of the Web site.

But this does not necessarily mean that the information is no longer available. As well as being an early adopter of the Web, the Computing Service had also had a long-standing involvement in digital preservation. And so the file should still be available on the University’s archive service. But although the bits and bytes may still be available, what are the processes needed for this resource to be retrieved? Is this a service which the University offers? And is it a service which can be provided to a former member of staff, who left the University over 13 years ago?

As JISC PoWR project team members have commented previously, digital preservation isn’t just about the technical aspects of preserving bits in a format suitable for processing in the future – it’s also about the policies and the procedures. And I think it’s time I sent an email to my former colleagues to see if this resource can be retrieved. I’ll provide details of my experiences in a future post.

CASPAR Training Days

Too late to be of much use, I suspect, but just before Christmas I received an email containing details of two CASPAR (Cultural, Artistic and Scientific knowledge for Preservation, Access and Retrieval) Training Days. The CASPAR Training Day for the Cultural Domain will be held on 12 January 2009 and the CASPAR Training Day for the Scientific Domain on the following day (13 January 2009).

The seminars will take place in Rome, and are free to attend. If you require further information please email:
<info@casparpreserves.eu>

Looking Back … and Looking Forward

In a news item entitled Preserving web resources – new advisory handbook, published on 9th December 2008 on the JISC Web site, Neil Grindley, manager of JISC’s Digital Preservation programme, described how “the JISC PoWR handbook helps institutions to identify where material of interest might exist, which elements may require long-term access and how these decisions can link into wider institutional policies”.

Neil went on to add that “The PoWR handbook recognises that preservation is not an end in itself, but that it can complement an institution’s mission, whether that be improving the quality of research, conforming with national policy or avoiding the threat of legal action. It will evolve following the practical experience of its use to ensure it remains at the forefront of best practice advice for web preservation issues”.

The JISC PoWR project has been formally completed – but the interests of the project team (UKOLN and ULCC) in the area of preservation continue. We have agreed that we will continue to publish posts on this blog which are relevant to the preservation of Web resources for a period of time. We will seek to publish at least 3 posts per month. Around Easter time we will review the status of this blog. As well as posts from members of the JISC PoWR project team we would also welcome guest blog posts from the community. So if you would like to write something about your interests in the area of Web site preservation please contact Marieke Guy (email M.Guy@ukoln.ac.uk).

But for now on behalf of the JISC PoWR team I’d like to wish everyone a happy and enjoyable Christmas.

Legal scholarship recognises long-term value of blogs

A recent post on the digital-preservation list indicates that at least one scholarly community has recognised the long-term scholarly value of online resources such as blogs, and the potential damage to future scholarship that might result from their loss. It draws attention to a symposium taking place at Georgetown University next year. The email says that the symposium:

…will build upon the fundamental assumption that blogs are an integral part of today’s legal scholarship.

and goes on to say:

This symposium will bring together academic bloggers, librarians, and experts in digital preservation …. Symposium participants will collectively develop innovative practices to ensure that valuable scholarship is not easily lost.

Join the conversation now by tagging items you think are relevant to this symposium with the del.icio.us tag FTLS2009.

It’s interesting to observe that this is an example of a community acting to preserve information of interest that is likely to be scattered over many institutions and none. (I suspect a fair amount of blogging in this area is done by practitioners who aren’t at an academic institution.) One of the concerns we identified in PoWR was that much material of this type was unlikely to be preserved as a result of institutional interests, unless one institution tried to bring materials like this into the remit of its special collections (and some have done this).

The conference web site goes on to say:

This unique symposium will seek answers to the questions:

1. How can quality academic scholarship reliably be discovered?
2. How can future researchers be assured of perpetual access to the information currently available in blogs?
3. How can any researcher be confident that documents posted to blogs are genuine?

The symposium will include a working group break-out session to create a uniform standard for preservation of blogs, a document to be shared by bloggers and librarians alike.

That last goal of a uniform standard for blog preservation looks like a tall order and it will be interesting to see what emerges from this group, and what its wider relevance might be. But it’s a clear demonstration of the value of web material to some research communities, and their willingness to do something about it if their institutions can’t, or won’t, help them.

When Funding Bodies Shut Down

An email sent to the MLANORTHEAST-NEWS JISCMail list provides details of the implications of the closure of the MLA North East regional Agency on the Web services it has set up or commissioned.

The message states:

MLA North East Websites after 12th December, 2008

MLA North East over recent years has set up several websites which we have managed on behalf of the sector. This brief note is to inform you of the arrangements made for each of the sites.

www.mlanortheast.org.uk:  a holding page will refer visitors to MLA council site at www.mla.gov.uk All other content will be taken down at 4.00pm on Friday 12th December, 2008.

www.thenortheast.com:  currently a portal to our sector’s on-line stores selling local studies material and other ephemera. The content will be taken down at 4.00pm on Friday 12th December, 2008. The domain name is now owned  by One NorthEast.

www.archivesnortheast.com:  a portal to North East archives services, providing links to catalogues and paid-for  professional support in researching archives. This will continue under the auspices of the North East Regional Archives Council [NERAC] Contact Liz Rees liz.rees@twas.org.uk

www.wellinever.info: a portal to learning resources to teachers, pupils, parents and carers providing venue guides, information regarding learning visits and links to some of the sector’s regional on-line learning resources. This will continue under the auspices of Tyne & Wear Museums. Contact ian.thilthorpe@twmuseums.org.uk

www.primarysources.org.uk:  basic skills resources developed by primary teachers, working alongside learning professionals from six archives in the North East region and  designed for use schools.  These resources offer a fresh and engaging approach to teaching basic skills. This will continue under the auspices of Durham University. Contact andrew.preater@durham.ac.uk

www.discs-uk.info: DiSCS provides an online directory of information technology (IT) and digital services suppliers to work with the cultural and heritage sector. This site has transferred and is managed by The Collections Trust www.collectionstrust.org.uk

www.tomorrows-history.com: The regional local studies site for archives and record offices, libraries, museums, archaeology services, the region’s universities and commercial organisations.

Additionally, community groups have created one hundred local history projects. This will continue. The domain name is now owned by Newcastle City Council. The site is managed by Newcastle City Library Services. Contact Kath Cassidy kath.cassidy@newcastle.gov.uk

www.oralhistorynortheast.com: The site for oral history in the North East of England. Support for individuals and organisations undertaking oral history projects, to provide focus and support and a forum for the sharing of ideas and experience. This site is closed.

I think this demonstrates good practice in what organisations which have set up or commissioned Web sites should do if they are forced to close, for example due to changes in Government funding and policies (as is the case with the MLA Regional Agencies).

We can see that for each Web site the message provides the address and a brief summary of its purpose, details of when the site ceases operation, contact details and, in a couple of cases, details of how the service is being continued by other organisations.

I know the implications of the demise of our organisation for the Web services we are providing aren’t something that we like to think about. But in a personal capacity, once we reach a certain age and become aware of our responsibilities to others, we do tend to make plans for what happens after we die, perhaps by making a will. So shouldn’t our organisations be making similar plans in case the organisation ceases to exist? And at a time of the credit crunch this is even more important than it used to be.

Library Partnership Preserves End-of-Term Government Web Sites

The news that a Library Partnership Preserves End-of-Term Government Web Sites was announced in August 2008 (and it’s about the end of George W Bush’s term of office). However, I think it’s worth drawing attention to the article for those with an interest in the preservation of Web sites. One thing that caught my eye was the comment that:

the Internet Archive will undertake a comprehensive crawl of the .gov domain.

The article concluded with a summary of the role of the Internet Archive:

The Internet Archive is a high-tech nonprofit, founded in 1996 by Brewster Kahle as an “Internet library” to provide universal and permanent access to digital information for educators, researchers, historians, and the general public. The Internet Archive captures, stores and provides access to born-digital and digitized content, and leads the development of Heritrix, the open-source archival web crawler, used to facilitate the collection of web data for this project.

What role might the Internet Archive have in the UK, I wonder?

Future in Bits

The BBC News Web site has published an interesting article entitled Future in Bits, asking how the ever-changing Web can be archived, bearing in mind the dilemma posed by the malleable nature of digital information.

The article draws attention to the fact that no UK-based commercial online newspapers are currently being archived.

David Stuart, a research fellow in Web 2.0 Technologies at the University of Wolverhampton, is quoted as saying:

“The lack of an exhaustive archive of the UK web space not only risks the loss of information on web pages that are changed or taken down,” he said. “It also undermines the value of pages that link to them; the value of the web comes as much from the hyperlinks between pages as the contents of the web pages. This is especially true in the blogosphere, where so much of the content created by the public is built upon the foundations of traditional news stories.”

Jessie Owen, digital continuity project manager at the National Archives, explains that the key to archiving is preparation.

This is something the JISC PoWR handbook can offer help with.

Managing the Crowd: Rethinking records management for the Web 2.0 world

My review of the Steve Bailey text Managing the Crowd: Rethinking records management for the Web 2.0 world has now been published in the latest Ariadne magazine.

This text has been mentioned at PoWR workshops, on the PoWR blog and on the JISC Information Environment Team blog. I can honestly say that it has had quite an impact on my thinking with regard to preservation and Web 2.0 resources; other members of the PoWR team may agree.

As I say in the conclusion:

This book offers up much food for thought. Bailey wants to wake up and shake his community. He wants to make them see that all is not well in the records management world and that if they don’t start moving with the times then they will be pushed out of the way. He contends there is a very real possibility that records management as we know it will cease to exist; it will be outsourced.

Go on, have a read.