JISC PoWR

Preservation of Web Resources: a JISC-funded project [Archived Blog]

How sticky is your wiki?

Posted by Richard M. Davis on July 13th, 2008

Wetpaint wiki is just one of the many enticing, powerful, quick-fix web apps that have sprung up around Web 2.0 and Social Networking. You’ll have your own favourites no doubt: I won’t start listing them here. Wikis have grown up a lot since the first WikiWikiWeb, and now are at the online heart of many educational projects at all levels, from classroom, to research and publishing.

We’ve been using Wetpaint’s wiki feature as a collaborative space for our workshop feedback, and this suits us fine: once we have collated all the input for our project outputs, in a few weeks it’ll probably be no loss to us to delete the wiki, or just set it adrift among all the other jettisoned flotsam in cyberspace.

But what’s often given less serious consideration, in the excitement of using a third-party provider of wikis, blogs, Ning, etc., to get your collaborative hypertext project off the ground so quickly and easily – and without having to go cap or cheque in-hand to whoever guards your web space – is this key preservation issue: what happens when you want to get your painstakingly intricate web of hyperlinked pages out?

There are many good reasons why you might want to do this: you might want to migrate to another wiki system or CMS, as the shape and nature of your content evolves; or put it on a permanent, persistent footing by moving it into your own domain; you might simply want to back it up or take a snapshot; or you might want to pull out information for publication in a different form. When you had one or two pages, it might have seemed trivial; but what if you now have hundreds?

Old Style Wiki

Unfortunately, just as exporting the information is often a secondary consideration for wiki content creators, so it also is for the wiki farm systems. The Wetpaint Wiki discussion boards indicate that an export feature was a long time in coming (and its absence quite a blocker to adoption by a number of serious would-be users). And what was eventually provided leaves a lot to be desired.

Wetpaint’s backup option “lets” you download your wiki content as a set of HTML files. Well, not really HTML files: text files with some embedded HTML-like markup. (Which version? Not declared.) Don’t expect to open these files locally in your browser and carry on surfing your wiki hypertext (even links between wiki pages need fixing). The export doesn’t include comment threads or old versions. Restoring it to your online wiki is not possible. But, for what it’s worth, you have at least salvaged some sort of raw content, that might be transformed into something like the wiki it came from, if hit with a bit of Perl script or similar.

I checked out Wikidot – another impressively-specced, free “wiki farm”. Wikidot’s backup option will deliver you a zip file containing each wiki page as a separate text file, containing your wiki markup as entered, as well as all uploaded file attachments. However, according to Wikidot support:

you can not restore from it automatically, it does not include all page revisions, only current (latest), it does not include forum discussion or page comments.

To reconstruct your wiki locally, you’ll, again, need some scripting, including using the Wikidot code libraries to reconvert its non-standard wiki-markup into standard HTML.

A third approach can be seen with a self-hosted copy of Mediawiki. Here you can select one or more pages by name, and have them exported as an XML file, which also contains revisions and assorted other metadata. Within the XML framework, the page text is stored as original wiki markup, raising the same conversion issues as with Wikidot. However, the XML file can be imported fairly easily into a different or blank instance of Mediawiki, recreating both hypertext and functionality more or less instantly.

In contrast to all these approaches, if you set a spidering engine like HTTrack or Wget to work “remotely harvesting” the site, you would get a working local copy of your wiki looking pretty much as it does on the web. This might be an attractive option if you simply want to preserve a record of what you created, a snapshot of how it looked on a certain date; or just in case a day should come when Wetpaint.com Inc., and the rest, no longer exist.

However, this will only result in something like a preservation copy – not a backup that can be easily restored to the wiki, and further edited – in the event, say, the wiki is hacked/cracked, or otherwise disfigured. For that kind of security, it may be enough to depend on regular backups of the underlying database, files and scripts: but you still ought to reassure yourself exactly what backup regime your host is operating, and whether they can restore them in a timely fashion. (Notwithstanding the versioning features of most wikis, using them to roll back a raft of abusive changes across a whole site is not usually a quick, easy or particularly enjoyable task.)

All this suggests some basic questions that one needs to ask when setting up a wiki for a project:

  • How long do we need it for?
  • Will it need preserving at intervals, or at a completion date?
  • Is it more important to preserve its text content, or its complete look?
  • Should we back it up? If so, what should we back up?
  • Does the wiki provide backup features? If so, what does it back up (e.g. attachments, discussions, revisions)?
  • Once “backed up”, how easily can it be restored?
  • Will the links still work in our preservation or backup copy?
  • If the backup includes raw wiki markup, do you have the capabilities to re-render this as HTML?

And questions like these are no less relevant when considering your uses of blogs and other social software: I hope we’ll be able to look at them more closely in another post.

One Response to “How sticky is your wiki?”

  1. JISC-PoWR » Blog Archive » JISC PoWR Workshop 2: Preservation and Web 2.0 Says:

    [...] may be relevant to the various scenarios have already been discussed on this blog including use of wikis, student blogs, use of Slideshare, instant messaging and Twitter and the wider set of discussions [...]