JISC-PoWR

Preservation of Web Resources: a JISC-sponsored project

Archive for July, 2008

JISC PoWR Workshop 2: Preservation and Web 2.0

Posted by Brian Kelly on 31st July 2008

The second JISC PoWR workshop was held on 23rd July 2008 as part of UKOLN’s annual institutional Web management workshop, IWMW 2008.

This workshop provided an opportunity to review the outcomes of the first workshop, in which members of the JISC PoWR team and the 30+ participants identified some of the challenges to be faced in preserving content held on institutional Web services and explored some of the ways in which these challenges can be addressed. The slides for this review are available on Slideshare and are embedded below.


The main focus of the second workshop, however, was to look at the additional challenges which need to be addressed in a Web 2.0 context, when the content may be more dynamic, hosted by third party services and created by a wide range of users.

A PowerPoint presentation was used to initiate discussions based on a number of scenarios including use of blogs, wikis, Twitter, communications tools, social networks, ‘amplified events’ and use of third party repository services such as Slideshare - which is appropriate as this presentation is itself available on Slideshare and is embedded below.


This presentation doesn’t have any answers to these challenges - it was intended to initiate the debate at the workshop. Some of the approaches which may be relevant to the various scenarios have already been discussed on this blog, including use of wikis, student blogs, Slideshare, instant messaging and Twitter. The wider set of discussions which took place at the workshop will feed into the final JISC PoWR handbook.

It is worth noting that this presentation was spotlighted on the Slideshare home page. This has helped to increase the visibility of the work of the JISC PoWR project: a week after the presentation had been given there had been 713 views of the slides. It should also be noted that other Slideshare users had assigned various tags to the presentation (including data-portability, digital-preservation, sioc and preservation). As can be seen if you follow these links, we are beginning to see use of social Web technologies which can help users to discover related resources of interest to the digital preservation community. This, to me, is a good example of the potential benefits which Web 2.0 can provide to those with an interest in the preservation of Web resources.

Posted in Workshops, Web 2.0 | 1 Comment »

The History of Your Institution’s Web Site

Posted by Brian Kelly on 21st July 2008

A recent blog post by Lorcan Dempsey on “The institutional web presence again” provided a link to a page on “the history of U.Va. on the web” which provides details of 14 years’ history for the University of Virginia’s Web site from 1994-2008.

The page provides details of the Web usage statistics in the early years, with screen images shown of major changes to the home page from 1997 (unfortunately no screen images are available for the first three years of the service).

Information is provided on the people and groups responsible for the design, the changes which were made as new technologies became available, significant additional content that was added and details of awards which the site won.

This is an approach which I feel all institutions should consider taking.  And let’s start recording the history of those early years quickly, before the first generation of institutional Web managers start to retire, leave or forget the details of the institution’s Web history.

University of Virginia Web Site History

Posted in Web 1.0 | No Comments »

Preservation And Instant Messaging

Posted by Brian Kelly on 21st July 2008

Background

The Web 2.0 environment has a strong emphasis on communications between individuals and not just one-way publishing. This pattern of usage poses additional challenges for institutions wishing to ensure that records are kept of the dialogue which takes place. And these challenges may well need to be addressed within the context of policies on the preservation of Web resources, as digital communications technologies will increasingly have Web interfaces.

We will be publishing a series of posts looking at different aspects of Web 2.0. In this initial post we will provide a brief case study on use of instant messaging to support communications between two institutions. The case study will attempt to draw out some of the general policy issues which should be applicable more widely.

Use of IM for the QA Focus Project

This example describes the approaches taken to use of instant messaging to support communications between the project partners for the JISC-funded QA Focus project which was launched in January 2002. The project partners were UKOLN (based at the University of Bath) and, initially, ILRT, University of Bristol. However after the end of the first year of the project ILRT withdrew from the project and were replaced by AHDS, who were based in London.

In order to minimise the amount of travel and to help to provide closely integrated working across the project partners it was agreed to make use of instant messaging technologies. As well as enabling the team members to have speedy contact with each other it was also recognised that official project meetings could be held using the technology. It was appreciated that in this context there was a need to have a slightly formal protocol for managing the meetings, to compensate for the limitations of online meetings. And in addition to the best practices for managing the online meetings it was also agreed that a record of the transcript would be kept, and that this record would be copied across to the Intranet along with other formal documents.

After AHDS replaced ILRT as project partners we decided to change our IM client from Yahoo Messenger to MSN Messenger. It was either during this change of IM tools or whilst making use of another IM client (I can’t recollect the exact details) that we noticed that different IM applications work in slightly different ways. The differences include whether a transcript of dialogue is kept automatically and whether new participants to a group chat will see only new discussions or also discussions which have taken place previously (which has the potential to cause embarrassment at the least).

The experiences we gained in use of IM led the project partners to develop a policy on use of IM (which covered issues such as the possible dangers of interruptions, as well as keeping records of formal meetings held on IM). The policy also clarified use of IM in an informal context, with there being no guarantee that records would be kept.

The policy stated that:

  • IM software may be used for formal scheduled meetings. In such cases standard conventions for running meetings should be used. For example an agenda should be produced, actions clearly defined, changes of topics flagged and a record of the meeting kept.
  • IM software may be used for direct communications between individual team members. For example it may be used for working on particular tasks, to clarify issues when working on collaborative tasks and to support team working. IM may be particularly suited for short term tasks for which no archive is needed and other team members need not be involved - for example, arranging a meeting place.
  • Highly confidential information will not be sent using IM, due to the lack of strong encryption.

General Issues

The general issues arising from this case study include:

  • The need to ensure that the users of the IM technologies and those involved in developing policies related to its use have a good understanding of how the technologies work together with an understanding of the differences between different IM systems.
  • The need for simple documented policy statements

Posted in Web 2.0 | 1 Comment »

Preservation and Slideshare

Posted by Brian Kelly on 18th July 2008

Slideshare is a popular externally hosted Web 2.0 service for providing access to presentations. And as I’ve described on the UK Web Focus blog, there is evidence to demonstrate its impact in maximising awareness of presentations - and this might include both awareness of research activities, as described in my post, but also marketing activities.

But what about the risks associated with making use of a third party service in this way? What will happen if, for example, Slideshare’s business model is flawed and the company goes bankrupt? Rather than making use of a Web 2.0 service shouldn’t we be providing Slideshare’s functionality in-house?

I feel this is the wrong response: it would be similar to saying that we should not allow third party organisations to manage our savings - but we all have bank accounts. And, although we know from recent experiences in the UK that there can be risks when using banks, we don’t shut down our accounts when we become aware of incidents such as Northern Rock’s financial difficulties. Rather we assess the risks and then manage the risks (in the case of savings, this might be to limit one’s saving to a maximum of £35,000 with any single bank, as this amount is guaranteed by the Government).

In the case of Slideshare, replicating its functionality in-house would not only be costly, but would also be unlikely to provide the impact and popularity which Slideshare has.

The challenge then is to assess possible risks and to explore mechanisms for managing such risks. The approach I take is to look at the popularity of the service and its user community (an approach, incidentally, which has also been recommended when selecting open source software). The Techcrunch service can be useful in providing information on the financial background to many Web 2.0 companies and its information on Slideshare seems reassuring, with a post in May 2008 describing how SlideShare had Secured $3M for Embeddable Presentations.

The risk management approach I have taken is to store a managed master copy of the slides on the UKOLN Web site and ensure that links to this resource are provided on Slideshare. As can be seen from the image, the URL is included on the title slide and in the accompanying metadata. In addition the URL is also included in the footer of the hard copy printouts. I also provide a Creative Commons licence for the resource, which seeks to avoid any legal barriers to future curation of the resource, and allow the resource to be downloaded from the Slideshare site.

Metadata provided on the Slideshare service

This approach aims to ensure that the master resource is kept at a stable managed location, allows users to make a copy of the resource (if, for example, the Slideshare service suffers from performance or reliability problems) and allows users to bookmark or cite the managed master version of the file.

Posted in Web 2.0 | 4 Comments »

Student Blogs

Posted by Brian Kelly on 17th July 2008

How should an institution go about providing a blogging service for its students? The traditional approach which has been taken to the provision of an IT service for members of an institution has been to evaluate the range of products and select a solution which satisfies the user requirements, taking into account the resource and support implications.

In a Web 2.0 environment, however, other options become available. Rather than installing software locally, services which are available on the network can be used - and blogging services such as Wordpress and Blogger are very popular blog hosting services.

What are the preservation aspects associated with the provision of a student blogging service? One might feel that the locally-installed application must be preferable, since management of the software and data is under the control of the institution. But what happens when students leave the institution? The normal policy in many institutions has been to delete student accounts and their data shortly after they leave. But is this desirable from the student’s perspective?  And what if they wish their data - their blog posts - to still be available after they leave the institution?

This is starting to happen at the University of Warwick, which provided the first large-scale student blogging service a number of years ago. And as I wrote about a year ago, we are starting to see the first generation of student blog enthusiasts asking these questions. My post linked to a blog post hosted at the University of Warwick from a student (Jo Casey) who asked:

“In the middle of August I will be leaving Warwick (to be the new Corporate Communications Manager at the Open University). … But, given that I will have to migrate my blog, where is the best place to go?”

Unfortunately Jo’s blog was deleted after she left the University - I was fortunate to have captured her question on my blog.

In light of this particular example from an institution which pioneered use of student blogs, my question would be “Wouldn’t institutions be advised to recommend the use of mature hosted blogging services for members of the institution - such as students - who will normally only be at the institution for a short period?”

Would this be a desirable approach? What are the disadvantages? And could such problems be addressed?

Posted in Web 2.0 | 3 Comments »

When do we Fixity?

Posted by Marieke Guy on 14th July 2008

Records Management has a concept of record declaration. This is the point when we “draw a metaphorical line in the sand and fix the content of a record” (see the JISCInfo Kit on Records Management which also uses the term ‘fixity’ in this context.)

Most electronic records management systems (ERMS) provide users with the ability to perform this declaration automatically. When they do so, the digital content they have created (e-mail, document or whatever) becomes ‘fixed’. UK Government have called this creating ‘locked down and secure’ records, a necessary step for ensuring their authenticity and reliability.

But ERM systems seem to work best with static documents; authors of reports, for example, understand that a good time to declare their report as a record is when the final approved version has been accepted. Yet one of the distinctive features of Web 2.0 content is that the information is very fluid, and often there is no obvious point at which to draw this line and fix content.

One example might be blog posts. These can receive comments from the moment they are posted and well into the future. Not only this, but many bloggers go back and edit previous posts and delete comments. This matter was recently discussed on Brian Kelly’s UK Web Focus blog. Phil Wilson asked:

“Brian, is there any reason you never modify or update your posts when you’ve made an error, and instead make users plough through the comments to see if anything you’ve said is wrong?” (UK Web Focus Blog)

Brian’s response was that he sometimes fixes typos and layout issues but is:

“reluctant to change the meaning of a published post, even (or perhaps especially) if I make mistakes. In part I don’t want to undermine the authority of any comments or the integrity of any threaded discussions.”

Brian is open about this in his blog policy stating that only in exceptional circumstances will postings and comments be deleted.

Concerns about censorship and bloggers deleting posts/comments were also recently made in responses to What is fair play in the blogo/commentosphere? on Nature’s Blog.

Assuming that blog posts are to be included within a records management programme or a preservation programme, the issues described above might cause problems for those attempting to preserve authentic and reliable Web resources.

One approach is to be explicit in your Web Resource Preservation strategy about when you freeze Web resources for preservation, and the implications of doing so.
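As a rough illustration of what "freezing" a Web resource at a stated moment might look like in practice, here is a minimal Python sketch. It assumes the content of a blog post has already been captured as bytes. The function names, the field names and the example URL are all hypothetical; this is not how any particular ERMS works. The idea is simply to record who declared the record and when, together with a SHA-256 digest, so that later edits to the post can be detected.

```python
import hashlib
import json
from datetime import datetime, timezone


def declare_record(url: str, content: bytes, declared_by: str) -> dict:
    """Fix a captured resource: note who declared it, when, and its digest.

    The field names here are illustrative assumptions, not a standard schema.
    """
    return {
        "url": url,
        "declared_by": declared_by,
        "declared_at": datetime.now(timezone.utc).isoformat(),
        "sha256": hashlib.sha256(content).hexdigest(),
        "size_bytes": len(content),
    }


def is_unchanged(record: dict, content: bytes) -> bool:
    """Return True if the content still matches the digest taken at declaration."""
    return hashlib.sha256(content).hexdigest() == record["sha256"]


if __name__ == "__main__":
    # Hypothetical captured post; in practice this would come from a harvest.
    post = b"<h1>My post</h1><p>Original text.</p>"
    record = declare_record("https://blog.example.org/2008/07/post", post, "blog owner")
    print(json.dumps(record, indent=2))
    print(is_unchanged(record, post))
    print(is_unchanged(record, post + b"<p>Later edit</p>"))
```

A digest only tells you that something changed after declaration, not what changed or why, so it complements a written policy and does not replace one.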

Another approach might involve an agreed institutional policy such as Brian has, but with an additional form of wording that is explicit about the status of blog posts as records, including when and how they should be declared as records, and whose responsibility it is to do so. Should selected blog posts be declared as records by their owners into the ERMS? Or will they all be harvested by an automated capture programme, and if so, how frequently?

Any thoughts?

Posted in Challenges, Records management, Web 2.0 | 1 Comment »

How sticky is your wiki?

Posted by Richard M. Davis on 13th July 2008

Wetpaint wiki is just one of the many enticing, powerful, quick-fix web apps that have sprung up around Web 2.0 and Social Networking. You’ll have your own favourites no doubt: I won’t start listing them here. Wikis have grown up a lot since the first WikiWikiWeb, and now are at the online heart of many educational projects at all levels, from classroom, to research and publishing.

We’ve been using Wetpaint’s wiki feature as a collaborative space for our workshop feedback, and this suits us fine: once we have collated all the input for our project outputs, in a few weeks it’ll probably be no loss to us to delete the wiki, or just set it adrift among all the other jettisoned flotsam in cyberspace.

But what’s often given less serious consideration, in the excitement of using a third-party provider of wikis, blogs, Ning, etc., to get your collaborative hypertext project off the ground so quickly and easily - and without having to go cap or cheque in-hand to whoever guards your web space - is this key preservation issue: what happens when you want to get your painstakingly intricate web of hyperlinked pages out?

There are many good reasons why you might want to do this: you might want to migrate to another wiki system or CMS, as the shape and nature of your content evolves; or put it on a permanent, persistent footing by moving it into your own domain; you might simply want to back it up or take a snapshot; or you might want to pull out information for publication in a different form. When you had one or two pages, it might have seemed trivial; but what if you now have hundreds?

Old Style Wiki

Unfortunately, just as exporting the information is often a secondary consideration for wiki content creators, so it also is for the wiki farm systems. The Wetpaint Wiki discussion boards indicate that an export feature was a long time in coming (and its absence quite a blocker to adoption by a number of serious would-be users). And what was eventually provided leaves a lot to be desired.

Wetpaint’s backup option “lets” you download your wiki content as a set of HTML files. Well, not really HTML files: text files with some embedded HTML-like markup. (Which version? Not declared.) Don’t expect to open these files locally in your browser and carry on surfing your wiki hypertext (even links between wiki pages need fixing). The export doesn’t include comment threads or old versions. Restoring it to your online wiki is not possible. But, for what it’s worth, you have at least salvaged some sort of raw content, that might be transformed into something like the wiki it came from, if hit with a bit of Perl script or similar.

I checked out Wikidot - another impressively-specced, free “wiki farm”. Wikidot’s backup option will deliver you a zip file containing each wiki page as a separate text file, containing your wiki markup as entered, as well as all uploaded file attachments. However, according to Wikidot support:

you can not restore from it automatically, it does not include all page revisions, only current (latest), it does not include forum discussion or page comments.

To reconstruct your wiki locally, you’ll, again, need some scripting, including using the Wikidot code libraries to reconvert its non-standard wiki-markup into standard HTML.

A third approach can be seen with a self-hosted copy of Mediawiki. Here you can select one or more pages by name, and have them exported as an XML file, which also contains revisions and assorted other metadata. Within the XML framework, the page text is stored as original wiki markup, raising the same conversion issues as with Wikidot. However, the XML file can be imported fairly easily into a different or blank instance of Mediawiki, recreating both hypertext and functionality more or less instantly.
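To get a feel for what such an export holds, the following Python sketch lists each page title and the timestamps of its revisions. The sample XML is a heavily simplified, hand-written stand-in for a real Mediawiki export. Real files carry more metadata, and the namespace version in the root element varies between releases, which is why the code matches on local tag names only.

```python
import xml.etree.ElementTree as ET

# Simplified, hand-written sample; a real export contains much more metadata.
SAMPLE_EXPORT = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Project Plan</title>
    <revision>
      <timestamp>2008-07-01T10:00:00Z</timestamp>
      <text>First draft, see [[Workshop Notes]].</text>
    </revision>
    <revision>
      <timestamp>2008-07-10T09:30:00Z</timestamp>
      <text>Second draft, see [[Workshop Notes]].</text>
    </revision>
  </page>
</mediawiki>"""


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def summarise_export(xml_text: str) -> list:
    """Return one dict per page with its title and revision timestamps."""
    root = ET.fromstring(xml_text)
    pages = []
    for page in root:
        if local_name(page.tag) != "page":
            continue
        title = None
        revisions = []
        for child in page:
            name = local_name(child.tag)
            if name == "title":
                title = child.text
            elif name == "revision":
                stamp = next(
                    (el.text for el in child if local_name(el.tag) == "timestamp"),
                    None,
                )
                revisions.append(stamp)
        pages.append({"title": title, "revisions": revisions})
    return pages


if __name__ == "__main__":
    for page in summarise_export(SAMPLE_EXPORT):
        print(page["title"], len(page["revisions"]), "revision(s)")
```

Note that the page text in the sample is still raw wiki markup (the `[[Workshop Notes]]` link), so a summary like this helps you audit what you have, but does not render it.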

In contrast to all these approaches, if you set a spidering engine like HTTrack or Wget to work “remotely harvesting” the site, you would get a working local copy of your wiki looking pretty much as it does on the web. This might be an attractive option if you simply want to preserve a record of what you created, a snapshot of how it looked on a certain date; or just in case a day should come when Wetpaint.com Inc., and the rest, no longer exist.
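For anyone wanting to try this, here is a hedged Python sketch that assembles a GNU Wget command line for mirroring a site. The wiki URL and output directory are placeholders, and the choice of options (a polite one-second wait, staying below the start URL) is an assumption about what suits a small project wiki. The `harvest` helper only runs if Wget is actually installed.

```python
import shutil
import subprocess


def build_wget_command(start_url: str, output_dir: str) -> list:
    """Assemble a GNU Wget command line for taking a browsable local snapshot."""
    return [
        "wget",
        "--mirror",            # recursive download with timestamping
        "--convert-links",     # rewrite links so the local copy can be browsed
        "--page-requisites",   # also fetch the images and CSS that pages need
        "--adjust-extension",  # save HTML pages with an .html suffix
        "--no-parent",         # do not climb above the starting directory
        "--wait=1",            # pause between requests to be polite to the host
        f"--directory-prefix={output_dir}",
        start_url,
    ]


def harvest(start_url: str, output_dir: str) -> int:
    """Run the harvest and return Wget's exit code."""
    if shutil.which("wget") is None:
        raise RuntimeError("wget is not installed or not on PATH")
    command = build_wget_command(start_url, output_dir)
    return subprocess.run(command, check=False).returncode


if __name__ == "__main__":
    # Placeholder URL; print the command rather than running it.
    print(" ".join(build_wget_command("https://mywiki.example.org/", "wiki-snapshot")))
```

Check the terms of use of your wiki host before pointing a crawler at it, and expect dynamic features such as search or edit forms not to work in the copy.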

However, this will only result in something like a preservation copy - not a backup that can be easily restored to the wiki, and further edited - in the event, say, the wiki is hacked/cracked, or otherwise disfigured. For that kind of security, it may be enough to depend on regular backups of the underlying database, files and scripts: but you still ought to reassure yourself exactly what backup regime your host is operating, and whether they can restore them in a timely fashion. (Notwithstanding the versioning features of most wikis, using them to roll back a raft of abusive changes across a whole site is not usually a quick, easy or particularly enjoyable task.)

All this suggests some basic questions that one needs to ask when setting up a wiki for a project:

  • How long do we need it for?
  • Will it need preserving at intervals, or at a completion date?
  • Is it more important to preserve its text content, or its complete look?
  • Should we back it up? If so, what should we back up?
  • Does the wiki provide backup features? If so, what does it back up (e.g. attachments, discussions, revisions)?
  • Once “backed up”, how easily can it be restored?
  • Will the links still work in our preservation or backup copy?
  • If the backup includes raw wiki markup, do you have the capabilities to re-render this as HTML?
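The question about whether links still work is one that can be tested mechanically. The following Python sketch walks a folder of saved HTML pages and reports internal links whose target file is missing. It assumes the copy is a plain directory of `.html` files, and it treats a link beginning with "/" as relative to the top of that directory; both are simplifying assumptions, and the `demo` function builds a tiny throwaway example, not a real wiki.

```python
import tempfile
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import unquote, urlparse


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def broken_local_links(root: Path) -> list:
    """Return (page, href) pairs for internal links whose target file is missing."""
    broken = []
    for page in sorted(root.rglob("*.html")):
        collector = LinkCollector()
        collector.feed(page.read_text(encoding="utf-8", errors="replace"))
        for href in collector.links:
            parsed = urlparse(href)
            if parsed.scheme or parsed.netloc or not parsed.path:
                continue  # skip external links, mailto: and same-page anchors
            path = unquote(parsed.path)
            if path.startswith("/"):
                target = root / path.lstrip("/")  # site-absolute link
            else:
                target = page.parent / path       # link relative to this page
            if not target.exists():
                broken.append((str(page.relative_to(root)), href))
    return broken


def demo() -> list:
    """Build a two-page example copy with one dangling link and check it."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "index.html").write_text(
            '<a href="about.html">About</a> <a href="missing.html">Gone</a>',
            encoding="utf-8",
        )
        (root / "about.html").write_text(
            '<a href="index.html#top">Home</a> <a href="http://example.org/">Out</a>',
            encoding="utf-8",
        )
        return broken_local_links(root)


if __name__ == "__main__":
    for page, href in demo():
        print(f"{page}: broken link to {href}")
```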

And questions like these are no less relevant when considering your uses of blogs and other social software: I hope we’ll be able to look at them more closely in another post.

Posted in Technologies, Web 2.0 | No Comments »

Preservation Of Your Tweets

Posted by Brian Kelly on 11th July 2008

How should you go about preserving your Twitter posts, which are sometimes referred to as tweets? You may feel this is a strange question, or perhaps even an incomprehensible one. For those who may not be familiar with Twitter, this is a microblogging application which can be used to create a brief (up to 140 characters) blog post. Although the service was initially used by individuals to summarise how they are feeling or what they are thinking, the ways in which it is being used have evolved: in some cases it is used as a general chat facility, and so has some parallels with an instant messaging environment (with the added advantage that tweets can be delivered free-of-charge to mobile phones). Of particular relevance to this blog is the way in which institutions are beginning to explore Twitter’s potential from an institutional context.

In a recent post on the UK Web Focus blog I described how the Open University has set up an institutional Twitter account. And a number of responses to the post described similar institutional Twitter accounts for Edge Hill University (illustrated), Birmingham City University, Coventry University and Aston University. We can also expect departments to follow the example of the School of Law at the University of Sheffield, which is using Twitter to syndicate its Law School News blog.

Edge Hill University Twitter Account

Many fans of Twitter may feel that issues of preservation shouldn’t intrude in what is normally used as an individual productivity and social tool. However it is often the case that new technologies which may have initially been provided for individual use and for social purposes are quickly taken up by early adopters in teaching and learning and research contexts. And soon afterwards institutions which are willing to explore the potential of such emerging technologies to support the needs of the institution will set up Twitter accounts, areas on YouTube, iTunes, etc. as, for example, the Open University has done.

Hence the need, I would argue, for institutions to ensure that they have considered the preservation and management implications of their tweets, even if the institution feels that it would be inappropriate to have heavyweight policies on personal use of micro-blogging technologies. But perhaps before we establish the institutional policies we need to think about the different ways in which such micro-blogging applications may be used and also what the potential risks may be.

Any thoughts? 

Posted in Web 2.0 | 10 Comments »

Are There Three Key Aspects To Web Site Preservation?

Posted by Brian Kelly on 10th July 2008

In response to my post on “Don’t Web Managers Care About Preservation?” Kevin Ashley described how he “see[s] a distinction being made between preserving an experience and preserving the information which the experience makes available. Both are valid preservation approaches and both achieve different ends.”

Kevin is correct - these distinctions are very real. And different sectors may well have differing views as to the importance of preserving the underlying data or the user experience - this has surfaced at recent repository events, with some groups arguing that PDF provides a satisfactory means of preserving the user experience whilst others feel that it is more important to preserve the data which was used to create the PDFs.

But rather than revisiting such arguments in this blog I would like to reflect on a comment made by Chris Rusbridge in response to the same post mentioned above. Chris described how:

this grump came about partly because a number of organisations which are supposed to have a commitment to long-term access to information managed to destroy access through re-launches. Richard, I do like continuity, and also long-term accessibility (gets both angles!) rather than preservation…

Persistent URIs are not about technical solutions, they are about commitment. We must make sure we never break URIs!

We should note that Chris isn’t engaging with the argument of whether it’s the experience or the information which he wants to be preserved - rather it’s the means of access he wants to remain in place.

And this, I feel, is one of the most challenging aspects of Web site preservation - preserving the access mechanisms for the end user. This, then, is very different from preserving that valuable historical parchment which might be moved from public view, sent off to a company for renovation and then sent on tour as part of a travelling exhibition. In this case the resource may be being curated, but access for the end user is not available - or even expected.

In the case of Web resources a failure by an organisation to manage digital assets may result in the organisation losing valuable information. But what if the Web resources are simply migrated to an alternative location? Or the resources are embedded in other aspects of the organisation’s work? In such cases the organisation will argue that it hasn’t lost anything.  Rather it is the end user who may feel aggrieved - as Chris has clearly described.
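Where resources are migrated in this way, one modest technical aid to the commitment Chris asks for is a maintained table of old and new addresses. The Python sketch below is purely illustrative: the paths and the `resolve` function are hypothetical, and a real site would normally express the same mapping in its Web server or CMS configuration. It follows chains of moves (a page relocated more than once) and refuses to loop.

```python
# Hypothetical record of where relaunched pages have moved to.
REDIRECTS = {
    "/projects/qa-focus/": "/archive/projects/qa-focus/",
    "/archive/projects/qa-focus/": "/archive/qa-focus/",
    "/news/2008/workshop.html": "/events/2008/powr-workshop-2/",
}

# Hypothetical set of paths that currently exist on the site.
LIVE_PATHS = {"/archive/qa-focus/", "/events/2008/powr-workshop-2/", "/"}


def resolve(path: str, redirects: dict, live_paths: set) -> tuple:
    """Return (status, location) for a requested path.

    200 if the path is still live, 301 with the final location if it has
    moved (following chains of moves), and 404 if nothing is known about it.
    """
    if path in live_paths:
        return 200, path
    seen = set()
    current = path
    while current in redirects and current not in seen:
        seen.add(current)  # guards against accidental redirect loops
        current = redirects[current]
    if current != path and current in live_paths:
        return 301, current
    return 404, None


if __name__ == "__main__":
    for requested in ["/", "/projects/qa-focus/", "/old/unknown.html"]:
        print(requested, "->", resolve(requested, REDIRECTS, LIVE_PATHS))
```

The table is the easy part; the hard part, as the quotation above suggests, is the organisational commitment to keep it up to date through every relaunch.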

So perhaps we have three key aspects to Web site preservation - preservation of the experience, the information and the access. Or, if you feel that access for end users is part of the experience, we might argue the need to preserve the experience and/or information to support the needs of the organisation and the needs of the user community.

Posted in Policies | 1 Comment »

Getting Institutional Buy-in For Web Site Preservation

Posted by Brian Kelly on 8th July 2008

One of the risks we identified when we wrote the bid for the JISC PoWR project was that those involved in providing institutional Web services would not be interested in issues related to preservation. Surely not, you may feel if you’re a records manager. And if you are involved in providing institutional Web services you may be reluctant to confess to being less than fully committed to an area which does seem worthy. But, to be honest, Web managers may not have a particularly strong interest in this topic. And if this is the case, it will be difficult to persuade them of the need to invest resources in this area and to gain the necessary commitment from senior managers and policy makers. Without these issues being addressed it seems to me that we’re unlikely to make any significant changes to institutional approaches to Web site preservation.

So I was very pleased to read Alison Wildish’s blog post entitled “Web Preservation: should we make the time?”. In this post Alison (head of Web Services at the University of Bath) described the case study which she and Lizzie Richard (Archivist, Records Manager and FOI Coordinator at the University of Bath) presented at the first JISC PoWR workshop. Alison described how:

Neither of us felt web preservation was something we had expertise in nor the time (and for me the inclination) to fully explore this. Web preservation was something we could see as being useful (in the future) but I think we both felt it wasn’t a priority.

The good news is that the discussions Alison and Lizzie had after I introduced them to each other and invited them to participate in the JISC PoWR project have helped them to further their understanding of Web site preservation:

Simply discussing preservation (from both sides of the fence) taught us a lot. We discovered the risks involved in simply side-lining it; the potential gap in University history and the benefits of embedding preservation into our digital strategy.

And now that Alison and Lizzie are better aware of the need to have a policy of Web site preservation they are  in a position to start working on one:

So is it something we should make time for? Yes I believe it is.

The JISC PoWR project is starting to deliver its goals of engaging the key stakeholders, making them better aware of the challenges in preserving Web sites but also willing to address those challenges :-)

And I’m pleased to say that Alison has made the slides used at the workshop available on Slideshare - well worth viewing, especially if you are a records manager who is “a paper person [and] have enough trouble trying to preserve hard copy records without having to worry about the web … [who] can see the value in theory, but in practice it’s too huge [and] guess it might be a good idea, but no one much cares what I think I am interested though… ” or a Web person who has the view that “In all honesty it isn’t interesting to me… We struggle to keep the site current – never mind thinking about preserving the old stuff I am future watching… need to know what to bring in not how to keep hold of the past Why is it something I should think about now? I’m not really that interested”.

 

Posted in Challenges | 1 Comment »