JISC PoWR

Preservation of Web Resources: a JISC-funded project [Archived Blog]

Archive for the 'Web 2.0' Category

Student Blogs

Posted by Brian Kelly on 17th July 2008

How should an institution go about providing a blogging service for its students? The traditional approach which has been taken to the provision of an IT service for members of an institution has been to evaluate the range of products and select a solution which satisfies the user requirements, taking into account the resource and support implications.

In a Web 2.0 environment, however, other options become available. Rather than installing software locally, services which are available on the network can be used – and blogging services such as WordPress and Blogger are very popular blog hosting services.

What are the preservation aspects associated with the provision of a student blogging service? One might feel that the locally-installed application must be preferable, since management of the software and data is under the control of the institution. But what happens when students leave the institution? The normal policy in many institutions has been to delete student accounts and their data shortly after they leave. But is this desirable from the student’s perspective?  And what if they wish their data – their blog posts – to still be available after they leave the institution?

This is starting to happen at the University of Warwick, which provided the first large-scale student blogging service a number of years ago. And as I wrote about a year ago, we are starting to see the first generation of student blog enthusiasts asking these questions. My post linked to a blog post hosted at the University of Warwick from a student (Jo Casey) who asked:

“In the middle of August I will be leaving Warwick (to be the new Corporate Communications Manager at the Open University). … But, given that I will have to migrate my blog, where is the best place to go?”

Unfortunately Jo’s blog was deleted after she left the University – I was fortunate to have captured her question on my blog.

In light of this particular example from an institution which pioneered the use of student blogs, my question would be: “Wouldn’t institutions be advised to recommend the use of mature hosted blogging services for members of the institution – such as students – who will normally only be at the institution for a short period?”

Would this be a desirable approach? What are the disadvantages? And could such problems be addressed?

Posted in Web 2.0 | 4 Comments »

When do we Fixity?

Posted by Marieke Guy on 14th July 2008

Records Management has a concept of record declaration. This is the point when we “draw a metaphorical line in the sand and fix the content of a record” (see the JISC InfoKit on Records Management, which also uses the term ‘fixity’ in this context).

Most electronic records management systems (ERMS) provide users with the ability to perform this declaration automatically. When they do so, the digital content they have created (an e-mail, a document or whatever) becomes ‘fixed’. The UK Government has called this creating ‘locked down and secure’ records, a necessary step for ensuring their authenticity and reliability.

But ERM systems seem to work best with static documents; authors of reports, for example, understand that a good time to declare their report as a record is when the final approved version has been accepted. Yet one of the distinctive features of Web 2.0 content is that the information is very fluid, and often there is no obvious point at which to draw this line and fix content.

One example might be blog posts. These can receive comments from the moment they are posted and well into the future. Not only this, but many bloggers go back and edit previous posts and delete comments. This matter was recently discussed on Brian Kelly’s UK Web Focus blog. Phil Wilson asked:

“Brian, is there any reason you never modify or update your posts when you’ve made an error, and instead make users plough through the comments to see if anything you’ve said is wrong?” (UK Web Focus Blog)

Brian’s response was that he sometimes fixes typos and layout issues but is:

“reluctant to change the meaning of a published post, even (or perhaps especially) if I make mistakes. In part I don’t want to undermine the authority of any comments or the integrity of any threaded discussions.”

Brian is open about this in his blog policy, which states that postings and comments will be deleted only in exceptional circumstances.

Concerns about censorship and bloggers deleting posts/comments were also raised recently in responses to What is fair play in the blogo/commentosphere? on Nature’s Blog.

Assuming that blog posts are to be included within a records management programme or a preservation programme, the issues described above might cause problems for those attempting to preserve authentic and reliable Web resources.

One approach is to be explicit in your Web Resource Preservation strategy about when you freeze Web resources for preservation, and the implications of doing so.

Another approach might involve an agreed institutional policy such as Brian has, but with an additional form of wording that is explicit about the status of blog posts as records, including when and how they should be declared as records, and whose responsibility it is to do so. Should selected blog posts be declared as records by their owners into the ERMS? Or will they all be harvested by an automated capture programme, and if so, how frequently?
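
By way of illustration only, here is a minimal Python sketch of what an automated capture step might look like: fetch a post, store the bytes, and record a checksum and timestamp so the ‘fixed’ version can later be verified. The URL, storage layout and choice of SHA-256 are assumptions made for the sake of the example, not anything prescribed by an ERMS or by Brian’s policy.

```python
import hashlib
import json
import urllib.request
from datetime import datetime, timezone
from pathlib import Path

def declare_as_record(post_url, archive_dir="blog-records"):
    """Capture a blog post and 'fix' it: store the bytes plus a checksum and timestamp."""
    with urllib.request.urlopen(post_url) as response:
        content = response.read()

    checksum = hashlib.sha256(content).hexdigest()
    captured_at = datetime.now(timezone.utc).isoformat()

    target = Path(archive_dir)
    target.mkdir(parents=True, exist_ok=True)
    (target / (checksum + ".html")).write_bytes(content)

    # Append an entry to a simple manifest so the 'fixed' copy can be verified later.
    record = {"url": post_url, "sha256": checksum, "captured_at": captured_at}
    with open(target / "manifest.jsonl", "a", encoding="utf-8") as manifest:
        manifest.write(json.dumps(record) + "\n")
    return record

# Hypothetical usage: declare a single post as a record at the point of approval.
# declare_as_record("https://blogs.example.ac.uk/2008/07/when-do-we-fixity/")
```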

Any thoughts?

Posted in Challenges, Records management, Web 2.0 | 1 Comment »

How sticky is your wiki?

Posted by Richard M. Davis on 13th July 2008

Wetpaint wiki is just one of the many enticing, powerful, quick-fix web apps that have sprung up around Web 2.0 and Social Networking. You’ll have your own favourites no doubt: I won’t start listing them here. Wikis have grown up a lot since the first WikiWikiWeb, and are now at the online heart of many educational projects at all levels, from the classroom to research and publishing.

We’ve been using Wetpaint’s wiki feature as a collaborative space for our workshop feedback, and this suits us fine: once we have collated all the input for our project outputs in a few weeks, it’ll probably be no loss to us to delete the wiki, or just set it adrift among all the other jettisoned flotsam in cyberspace.

But what’s often given less serious consideration, in the excitement of using a third-party provider of wikis, blogs, Ning, etc., to get your collaborative hypertext project off the ground so quickly and easily – and without having to go cap or cheque in-hand to whoever guards your web space – is this key preservation issue: what happens when you want to get your painstakingly intricate web of hyperlinked pages out?

There are many good reasons why you might want to do this: you might want to migrate to another wiki system or CMS, as the shape and nature of your content evolves; or put it on a permanent, persistent footing by moving it into your own domain; you might simply want to back it up or take a snapshot; or you might want to pull out information for publication in a different form. When you had one or two pages, it might have seemed trivial; but what if you now have hundreds?

Old Style Wiki

Unfortunately, just as exporting the information is often a secondary consideration for wiki content creators, so it also is for the wiki farm systems. The Wetpaint Wiki discussion boards indicate that an export feature was a long time in coming (and its absence quite a blocker to adoption by a number of serious would-be users). And what was eventually provided leaves a lot to be desired.

Wetpaint’s backup option “lets” you download your wiki content as a set of HTML files. Well, not really HTML files: text files with some embedded HTML-like markup. (Which version? Not declared.) Don’t expect to open these files locally in your browser and carry on surfing your wiki hypertext (even links between wiki pages need fixing). The export doesn’t include comment threads or old versions, and restoring it to your online wiki is not possible. But, for what it’s worth, you have at least salvaged some sort of raw content that might be transformed into something like the wiki it came from, if hit with a bit of Perl script or similar.
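
To give a flavour of the sort of “bit of Perl script” involved (sketched here in Python instead), the pass below rewrites absolute links between exported pages into local relative links. The base URL, link pattern and file extension are guesses at what a given export might contain, so inspect your own files before relying on it.

```python
import re
from pathlib import Path

# Hypothetical base URL and file extension: inspect your own export to see how
# inter-page links are actually written before relying on this pattern.
WIKI_BASE = "http://example.wetpaint.com/page/"

def localise_links(export_dir):
    """Rewrite absolute links between exported pages into local relative links."""
    pattern = re.compile('href="' + re.escape(WIKI_BASE) + '([^"/]+)"')
    for page in Path(export_dir).glob("*.html"):
        text = page.read_text(encoding="utf-8", errors="replace")
        fixed = pattern.sub(r'href="\1.html"', text)
        page.write_text(fixed, encoding="utf-8")

# localise_links("wetpaint-export")
```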

I checked out Wikidot – another impressively-specced, free “wiki farm”. Wikidot’s backup option will deliver you a zip file containing each wiki page as a separate text file with your wiki markup as entered, as well as all uploaded file attachments. However, according to Wikidot support:

“you can not restore from it automatically, it does not include all page revisions, only current (latest), it does not include forum discussion or page comments.”

To reconstruct your wiki locally you’ll again need some scripting, including using the Wikidot code libraries to reconvert its non-standard wiki markup into standard HTML.

A third approach can be seen with a self-hosted copy of MediaWiki. Here you can select one or more pages by name and have them exported as an XML file, which also contains revisions and assorted other metadata. Within the XML framework, the page text is stored as original wiki markup, raising the same conversion issues as with Wikidot. However, the XML file can be imported fairly easily into a different or blank instance of MediaWiki, recreating both hypertext and functionality more or less instantly.
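
For a flavour of how that export can be scripted, here is a minimal sketch. The wiki address and page names are hypothetical, and the exact Special:Export parameters should be checked against the MediaWiki version you are running.

```python
import urllib.parse
import urllib.request
from pathlib import Path

# Hypothetical wiki address; Special:Export parameter handling varies a little
# between MediaWiki versions, so verify against your own installation.
WIKI_BASE = "https://wiki.example.ac.uk/index.php"

def export_pages(pages, out_file="wiki-export.xml"):
    """Fetch the named pages, with their revision history, as MediaWiki export XML."""
    params = urllib.parse.urlencode({
        "title": "Special:Export",
        "pages": "\n".join(pages),
        "history": "1",  # ask for old revisions, not just the latest
    })
    with urllib.request.urlopen(WIKI_BASE + "?" + params) as response:
        Path(out_file).write_bytes(response.read())

# export_pages(["Main_Page", "Project_plan"])
# The resulting XML can then be loaded into another MediaWiki instance via
# Special:Import or the importDump.php maintenance script.
```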

In contrast to all these approaches, if you set a spidering engine like HTTrack or Wget to work “remotely harvesting” the site, you would get a working local copy of your wiki looking pretty much as it does on the web. This might be an attractive option if you simply want to preserve a record of what you created, a snapshot of how it looked on a certain date; or just in case a day should come when Wetpaint.com Inc., and the rest, no longer exist.
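
A minimal sketch of such a harvest using Wget (the target URL is a placeholder; check the site’s terms and robots.txt before crawling):

```python
import subprocess

def harvest(site_url, out_dir="site-snapshot"):
    """Mirror a site with Wget so the copy can be browsed locally."""
    subprocess.run(
        [
            "wget",
            "--mirror",             # recurse and keep timestamps
            "--convert-links",      # rewrite links to work in the local copy
            "--page-requisites",    # fetch CSS, images and other embedded resources
            "--adjust-extension",   # save pages with .html extensions
            "--no-parent",          # stay within the starting path
            "--wait=1",             # be polite to the host
            "--directory-prefix", out_dir,
            site_url,
        ],
        check=True,
    )

# harvest("http://example.wetpaint.com/")  # placeholder URL
```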

However, this will only result in something like a preservation copy – not a backup that can easily be restored to the wiki and further edited in the event, say, that the wiki is hacked/cracked or otherwise disfigured. For that kind of security, it may be enough to depend on regular backups of the underlying database, files and scripts: but you still ought to establish exactly what backup regime your host operates, and whether they can restore from it in a timely fashion. (Notwithstanding the versioning features of most wikis, using them to roll back a raft of abusive changes across a whole site is not usually a quick, easy or particularly enjoyable task.)

All this suggests some basic questions that one needs to ask when setting up a wiki for a project:

  • How long do we need it for?
  • Will it need preserving at intervals, or at a completion date?
  • Is it more important to preserve its text content, or its complete look?
  • Should we back it up? If so, what should we back up?
  • Does the wiki provide backup features? If so, what does it back up (e.g. attachments, discussions, revisions)?
  • Once “backed up”, how easily can it be restored?
  • Will the links still work in our preservation or backup copy?
  • If the backup includes raw wiki markup, do you have the capabilities to re-render this as HTML?

And questions like these are no less relevant when considering your uses of blogs and other social software: I hope we’ll be able to look at them more closely in another post.

Posted in Technologies, Web 2.0 | 1 Comment »

Preservation Of Your Tweets

Posted by Brian Kelly on 11th July 2008

How should you go about preserving your Twitter posts, which are sometimes referred to as tweets? You may feel this is a strange question, or perhaps even an incomprehensible one. For those who may not be familiar with Twitter, it is a microblogging application which can be used to create brief (up to 140 characters) blog posts. Although initially used by individuals to summarise how they are feeling or what they are thinking, the ways in which the service is being used have evolved: in some cases it is used as a general chat facility, and so has some parallels with an instant messaging environment (with the added advantage that tweets can be delivered free of charge to mobile phones). Of particular relevance to this blog is the way in which institutions are beginning to explore Twitter’s potential in an institutional context.

In a recent post on the UK Web Focus blog I described how the Open University has set up an institutional Twitter account. And a number of responses to the post described similar institutional Twitter accounts for Edge Hill University (illustrated), Birmingham City University, Coventry University and Aston University. We can also expect departments to follow the example of the School of Law at the University of Sheffield, which is using Twitter to syndicate its Law School News blog.

Edge Hill University Twitter Account

Many fans of Twitter may feel that issues of preservation shouldn’t intrude on what is normally used as an individual productivity and social tool. However it is often the case that new technologies which may initially have been provided for individual use and for social purposes are quickly taken up by early adopters in teaching and learning and research contexts. And soon afterwards institutions which are willing to explore the potential of such emerging technologies to support the needs of the institution will set up Twitter accounts, areas on YouTube, iTunes, etc., as, for example, the Open University has done.

Hence the need, I would argue, for institutions to ensure that they have considered the preservation and management implications of their tweets, even if the institution feels that it would be inappropriate to have heavyweight policies on personal use of micro-blogging technologies. But perhaps before we establish institutional policies we need to think about the different ways in which such micro-blogging applications may be used, and also what the potential risks may be.
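
Purely as a sketch of what a lightweight capture routine might look like, something along the following lines, run periodically, would at least keep a local copy of an institutional account’s posts. The endpoint, JSON field names and storage format are assumptions for illustration, not Twitter’s actual API, which anyone implementing this would need to check.

```python
import json
import urllib.request
from pathlib import Path

# Hypothetical endpoint and field names: substitute whatever feed or API your
# microblogging service actually exposes, plus any authentication it requires.
TIMELINE_URL = "https://microblog.example.ac.uk/api/timeline/our_institution.json"
ARCHIVE = Path("tweet-archive.jsonl")

def archive_new_posts():
    """Append any posts not already archived to a local line-per-post JSON file."""
    seen = set()
    if ARCHIVE.exists():
        seen = {json.loads(line)["id"] for line in ARCHIVE.read_text().splitlines() if line}

    with urllib.request.urlopen(TIMELINE_URL) as response:
        posts = json.loads(response.read())

    added = 0
    with ARCHIVE.open("a", encoding="utf-8") as out:
        for post in posts:
            if post["id"] not in seen:
                out.write(json.dumps(post) + "\n")
                added += 1
    return added

# Run periodically (for example from cron) so the archive grows as the account is used.
```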

Any thoughts? 

Posted in Web 2.0 | 10 Comments »

Collective Memory For Our Web Sites

Posted by Brian Kelly on 3rd July 2008

I recently posted an article about the history of the University of Bath home page which included a link to a display of versions of the home page, based on data taken from the Internet Archive from 1997-2007.

Andy Powell, a former colleague of mine who used to work at the University of Bath, posted a Twitter message in response to my post in which he said:

@briankelly all pages prior to http://tinyurl.com/47pydq were mine – that’s what web design was like back then! – but all records now lost

03:56 PM June 18, 2008 from twhirl in reply to briankelly

But although formal records of the decisions made related to the home page (its design, the content, the links and the technologies used) may have been lost (or perhaps not even kept), I do wonder whether it may be possible to document such history based on anecdotal evidence from those who were either directly involved with the decision-making process or who observed the results of the decisions.

From the museums sector and the experiences of The National Archives (with its public wiki service) we know that the general public does seem willing to provide anecdotal information on resources such as old photographs.

This approach seems to reflect some of the discussions held at the first JISC PoWR workshop. As described in Ed Pinsent’s summary of the event, “there was a lot of ‘folk memory’ and anecdotal evidence, also sometimes called ‘tacit knowledge’”.

Would it be possible, I wonder, to provide access to images of an institution’s old Web pages and, through use of social networking technologies, encourage members of the institution (and perhaps the wider community) to document their recollections of the Web site?
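
As a small illustration of how such a display might be seeded, the sketch below asks the Internet Archive’s Wayback Machine availability service for the capture of a page closest to a given date. The example URL and date echo the University of Bath case above; the service’s exact behaviour should be verified rather than assumed.

```python
import json
import urllib.parse
import urllib.request

def closest_snapshot(page_url, timestamp):
    """Return the Wayback Machine URL of the capture closest to YYYYMMDD, or None."""
    query = urllib.parse.urlencode({"url": page_url, "timestamp": timestamp})
    with urllib.request.urlopen("https://archive.org/wayback/available?" + query) as response:
        data = json.loads(response.read())
    closest = data.get("archived_snapshots", {}).get("closest")
    return closest["url"] if closest else None

# Example: the University of Bath home page as it looked around 1997.
# print(closest_snapshot("http://www.bath.ac.uk/", "19970101"))
```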

Posted in Web 1.0, Web 2.0 | 1 Comment »

Preservation and Innovation

Posted by Brian Kelly on 1st July 2008

In a recent comment on this blog Kevin Ashley makes the point that having an interest in the preservation of Web resources doesn’t mean that one is anti-innovation. As Kevin points out, “I see a distinction being made between preserving an experience and preserving the information which the experience makes available. Both are valid preservation approaches and both achieve different ends.”

There’s a real difficulty, though, in applying either of these preservation approaches in an environment of rapid technological development. And within higher education we are likely to see examples of such innovation, whether this is scientific researchers exploring new ways of visualising scientific data or teaching staff who wish to ensure students gain experience in the use of Social Web technologies.

How are such tensions to be addressed? Should, for example, use of immersive environments such as Second Life be banned until preservation techniques have been developed which will ensure that such complex environments can be preserved? Such a draconian approach is alien to the educational sector’s IT development culture (although such approaches are taken in other areas such as biological and medical research). And as I’ve described in a post on “Is Second Life Accessible?”, innovative technologies such as Second Life can bring substantial benefits to the user community – in this case a user with cerebral palsy who feels that Second Life provides a really useful tool for people who are unable to get around, who have problems of mobility in real life, “because you can have friends without having to go out and physically find them”.

The tensions between preservation and innovation perhaps reflect similar tensions between accessibility and innovation, with differing opinions being held by the various interested parties. In the case of Second Life (where we are seeing virtual worlds being continually assembled, developed and then redeveloped) there does seem to be an awareness of the need to preserve such virtual worlds, with the Maryland Institute for Technology in the Humanities having received funding from the Library of Congress’s National Digital Information Infrastructure and Preservation Program (NDIIPP) for a two-year project on Preserving Virtual Worlds. And yet the $590,000 funding for this project, which will not, of course, guarantee that a solution to the problem will be available at the end of the funding, indicates that the preservation of immersive worlds will not be an easy undertaking.

Returning to Kevin’s comment that there is a “distinction [to be] made between preserving an experience and preserving the information which the experience makes available. Both are valid preservation approaches and both achieve different ends”, perhaps it is important to focus on these distinctions when we are seeking to preserve our innovative services. Might a video clip of the Second Life experience be the appropriate solution for the pioneers of this technology until the research programmes have devised ways of preserving the much richer and reusable environment? And might not this be an approach which can also be taken for our innovative Web services?

Posted in Policies, Web 2.0 | 1 Comment »

Don’t Web Managers Care About Preservation?

Posted by Brian Kelly on 17th June 2008

In response to a post on ULCC’s DA Blog, Chris Rusbridge, director of the DCC (and contributor to the Digital Curation Blog), commented:

The enthusiastic way in which web-site owners “re-brand” or “re-launch” their web-sites suggests that they are not particularly interested, long-term, in the details of the experience; continuous improvement means continuous discarding. One hopes that they are more interested in the information content, in some more abstract sense. Maybe we could measure this by tracking older pages across re-launches?

Perhaps a measure of commitment to the “look and feel” might be the lifetime since last reorganised?

Is this right? Don’t Web site owners care about preservation, preferring instead to continually add new features to their services?

I have to say that I disagree. Rather than continual changes to Web sites due to the Web site owners’ enthusiasms, I would argue that such changes usually occur in response to user needs and expectations, the growing importance of Web services (which means that institutions have greater expectations of the services which will be provided) and an increasing understanding of the limitations of approaches taken to Web site development in the past.

One example of this has been the obligation (for legal and moral reasons) to enhance the accessibility of Web resources. Initially HTML authoring tools and Content Management Systems (CMSs) provided little support for enhancing accessibility – indeed many CMSs generated low-quality HTML which could not be processed by assistive technologies.

Posted in Web 1.0, Web 2.0 | 4 Comments »

Web Resource Preservation: No One Ever Said It Would Be Easy….

Posted by Marieke Guy on 19th May 2008

If it was we’d all be at it!!

Any records manager or archivist will probably be able to give you half a dozen reasons why digital preservation is very important. Some might well give you half a dozen more why the preservation of Web resources in particular, which now play such a huge part in our daily lives, is very very important.

Unfortunately this critical activity isn’t easy. In fact the very nature of the Web means that the preservation and archiving of Web resources is actually a very complex task. A few of the major issues include:

  • The transient and dynamic nature of the Web – The Web is growing at a rapid rate. The average Web resource’s lifespan is short and pages are often removed. On the Web, publishing is an easy process and content may be changed often, and not necessarily in an orderly way. Metadata is very much an afterthought. Web 2.0 content (comprising data mashups, blog entries, comments, etc.) is even more dynamic.
  • Selection issues – Of the billions of resources out there, which should we preserve, and which instantiation of them?
  • The technologies involved – The Web is dependent on technology: it uses various file formats and follows many protocols, most of which evolve quickly. The look and feel of a Web page may be determined by a number of different elements such as the code, the HTTP protocol, the user, the browser and the server. Which of these need to be preserved? Web resources are usually held on just one server, so are at greater risk of removal, yet for some resources countless copies are made. Again, which do we preserve? Web sites are held together by hypertext links, meaning that parts of a site could be omitted when crawled by archiving software (if, for example, a robots.txt file excludes them or pages are not actually linked to; a quick check of the robots.txt side of this is sketched after this list). Whole areas of the Web are held in problematic CMSs or behind authentication systems, and Web 2.0 applications use layered APIs, which use data in many different ways.
  • Organisational issues – How is your institution using its Web site? Is it a publication or is it a record? Is the content being managed? Who is responsible and who has ownership?
  • The legal issues – There are many IPR and data protection issues with Web content. Who owns the photos on Flickr, the comments on a blog or the details on a social networking site?
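
The robots.txt point above can be checked quite cheaply. Here is a minimal sketch using Python’s standard library (the site and user-agent names are placeholders); a ‘False’ answer marks a page that an archiving crawler would silently skip.

```python
from urllib import robotparser

def may_crawl(page_url, robots_url, user_agent="archive-bot"):
    """Report whether robots.txt would allow the given user agent to fetch the page."""
    parser = robotparser.RobotFileParser()
    parser.set_url(robots_url)
    parser.read()
    return parser.can_fetch(user_agent, page_url)

# Placeholder URLs: a 'False' here marks a page an archiving crawler would skip.
# print(may_crawl("https://www.example.ac.uk/intranet/minutes.html",
#                 "https://www.example.ac.uk/robots.txt"))
```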

There is no easy answer! However, despite the difficulties of Web preservation, some institutions may already be addressing some of these issues. We are keen to hear examples of any approaches being taken.

Posted in Project news, Selection, Web 2.0 | 5 Comments »