Much discussion of blog preservation focuses on how to preserve the blogness of blogs: how can we make a web archive store, manage and deliver preserved blogs in a way that is faithful to the original?
Since it is blogging applications that provide this stucture and behaviour (usually from simple database tables of Posts, Comments, Users, etc), perhaps we should consider making blogging software behave more like an archive. How difficult would that be? Do we need to hire a developer?
One interesting thing about WordPress is the number of uses its simple blog model has been put to. Under-the-hood it is based on a remarkably simple data base schema of about 10 tables and a suite of PHP scripts, functions and libraries that provide the interface to that data. Its huge user-base has contributed a wide variety of themes and additional functions. It can be turned into a Twitter-like microblog (P2 and Prologue) or a fully-fledged social network (WordPress MU, Buddypress).
Another possibility exploited by a 3rd-party plugin is that of using WordPress as an aggregating blog, collecting posts automatically via RSS from other blogs: this seems like a promising basis for starting to develop an archive of blogs, in a blog.
The plugin in question is called FeedWordPress. It uses the Links feature of WordPress as the basis of a list of feeds which it checks regularly, importing new content when it finds it, as Posts within WordPress.
I installed FeedWordPress a while ago on ULCC’s DA Blog, and set it up to import all of the ULCC-contributed posts to JISC-PoWR, i.e. those by Ed Pinsent, Kevin Ashley and myself. I did this because I felt that these contributions warrant being part of ULCC’s insitutional record of its activities, and that DA Blog was the best to place to address this, as things stand.
JISC-PoWR also runs on WordPress, therefore I knew that, thanks to WordPress’s REST-like interface and Cool URIs, it is easy not only to select an individual author’s posts (
/author/kevinashley) but also the RSS feed thereof (
/author/kevinashley/feed). This, for each of the three author accounts, was all I needed to start setting up FeedWordPress in DA Blog to take an automatic copy each time any of us contributed to JISC-PoWR.The “author” on the original post has been mapped to an author in DA Blog, so posts are automatically (and correctly) attributed. The import also preserves, in custom fields, a considerable amount of contextual information about the posts in their original location.
In many cases, I’ve kept the imported post private in DA Blog. “Introductory” posts for the JISC-PoWR project blog, for example: as editor of DA Blog, I didn’t feel we needed to trouble our readers there with them; nevertheless they are stored in the blog database, as part of “the record” of our activities.
This is, admittedly, a very small-scale test of this approach, but the kind of system I’ve described is unquestionably a rudimentary blog archive, that can be set up relatively easily using WordPress and FeedWordPress – no coding necessary. Content is then searchable, sortable, exportable (SQL, RSS, etc). (Note, by the way, what happens when you use the Search box on the JISC-PoWR blog copy in UKWAC: this won’t happen with this approach!)
For organisations with many staff blogging on diverse public platforms this would be one approach to ensuring that these activities are recorded and preserved. UKOLN, for example, manages its own blog farm, while Brian and Marieke have blogs at WordPress.com (as well as contributing to this one), and Paul Walk appears to manage his own blog and web space. This kind of arrangement is not uncommon, nor the problem of how an institution get a grasp on material in all these different locations (it’s been at the heart of many JISC-PoWR workshop discussions). A single, central, self-hosted, aggregating blog, automatically harvesting the news feeds of all these blogs, might be a low-cost, quick-start approach to securing data in The Cloud, and safeguarding the corporate memory.
There are more issues to address. What of comments or embedded images? Can it handle Twitter tweets as well as blog posts? Does it scale? What of look-and-feel, individual themes, etc? Now we start needing some more robust tests and decisions, maybe even a developer or two to build a dedicated suite of “ArchivePress” plugins. But thanks to the power and Open-ness of WordPress, and the endless creativity of its many users, we have a promising and viable short-term solution, and a compelling place to start further exploration.