…or is it easier to ask how long is a piece of string?
The statistic much bandied about (for Web pages, not pieces of string!) is 44 days, believed to originate in an article by Brewster Kahle (of Internet Archive fame) published in 1997 and titled Preserving the Internet. Brewster’s original quote is specifically about URLs: “…estimates put the average lifetime for a URL at 44 days.”
Whether this figure still stands today is a matter currently being discussed on the CURATORS@LIST.NETPRESERVE.ORG list after a query from Abigail Grotke of the Library of Congress.
Abbie offered up the 44-day statistic and pointed out that the Digital Preservation Web site has a graphic on Web volatility stating “44% of the sites available on the internet in 1998 had vanished one year later”.
The other figure often cited is 75 days, from Michael Day’s report Collecting and preserving the world wide web:
“The dynamic nature of the Web means that pages and whole sites are continually evolving, meaning that pages are frequently changed or deleted. Alexa Internet once estimated that Web pages disappear after an average time of 75 days. (Lawrence, et al., 2001, p. 30).”
Another figure sometimes suggested is 100 days. This seems to come from Rick Weiss’s article for The Washington Post, On the Web, Research Work Proves Ephemeral (Washington, DC, 24 November 2003), which is no longer available.
So what is the average lifespan of a Web page today? Is it getting shorter or longer? The Internet Archive now gives 44-75 days as its ballpark figure. I’d have to hazard a guess that with the rise in use of Web 2.0 technologies the Web is actually getting more transient by the day.
Is this OK?
Maybe, if it’s just a tweet you sent your friend. However, if it’s something more substantial that’s disappearing, then it’s a real worry.
As the quotation from my Collecting and preserving the World Wide Web report implies, Alexa’s 75 day estimate was also referred to in a 2001 scientific paper written by Lawrence et al. [1]. Ironically, given that paper’s subject matter, the only reference provided there was to Alexa’s main Web page. From that, I couldn’t trace the original percentage (or the methodology used to generate it), so I just used the figure as a general indication of volatility.
That said, as you imply, these types of figures can hide a great deal of complexity, in that certain types of Web content are likely to be more stable than others. Moreover, in practice, it probably only makes sense to think about link decay within particular contexts. For example, the article by Lawrence et al. cited above was specifically concerned with Web references in scientific publications, with a case study based on computer science journal and conference papers and technical reports. Their study found that the percentage of invalid links varied with the publication year of the citing paper, e.g. from 23% for 1999 to 53% for 1994. Even then, much of the content (around 80%) was actually still available, and could be found using search engines or other methods.
Over the past few years, similar studies have been made of link decay in medical and general science journals [2], followed by similar analyses of various other domains. The evidence here also tends to suggest that link volatility is higher than that of the content itself [3]. Looking at these kinds of detailed studies might actually be more helpful than providing general link decay figures that appear to mean very little. I’m NOT saying that Web link volatility isn’t a problem, just that it is difficult to understand what link decay percentages mean without an understanding of exactly what they were trying to measure.
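To make the distinction between link decay and content loss concrete, here is a minimal sketch of how the two percentages might be tallied per publication year. The records, the field layout and the function name `decay_by_year` are all hypothetical and made up for illustration; they are not data or methods from any of the studies cited here.

```python
from collections import defaultdict

# Hypothetical sample records, invented for illustration only:
# (publication_year, link_resolves, content_found_elsewhere)
checked_links = [
    (1994, False, True),
    (1994, False, False),
    (1994, True, True),
    (1999, True, True),
    (1999, True, True),
    (1999, False, True),
    (1999, True, True),
]


def decay_by_year(records):
    """Return {year: (pct_links_invalid, pct_content_lost)}.

    A link counts as invalid if it no longer resolves. Content counts as
    lost only if the link is invalid AND the content could not be found
    by other means (e.g. a search engine).
    """
    totals = defaultdict(lambda: [0, 0, 0])  # [count, invalid links, content lost]
    for year, link_resolves, content_found in records:
        bucket = totals[year]
        bucket[0] += 1
        if not link_resolves:
            bucket[1] += 1
            if not content_found:
                bucket[2] += 1
    return {
        year: (100.0 * invalid / count, 100.0 * lost / count)
        for year, (count, invalid, lost) in sorted(totals.items())
    }


for year, (invalid_pct, lost_pct) in decay_by_year(checked_links).items():
    print(f"{year}: {invalid_pct:.0f}% of links invalid, {lost_pct:.0f}% of content lost")
```

With the same set of checked references, the two columns can differ a lot, which is why a single "link decay" percentage says little unless you know which of the two was being measured.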
References:
[1] Steve Lawrence, David M. Pennock, Gary William Flake, Robert Krovetz, Frans M. Coetzee, Eric Glover, Finn Arup Nielsen, Andries Kruger, and C. Lee Giles, “Persistence of Web References in Scientific Research,” Computer, 34(2), 26-31, February 2001, DOI: 10.1109/2.901164
[2] Robert P. Dellavalle, Eric J. Hester, Lauren F. Heilig, Amanda L. Drake, Jeff W. Kuntzman, Marla Graber, and Lisa M. Schilling, “Going, Going, Gone: Lost Internet References,” Science, 302(5646), 787-788, 31 October 2003, DOI: 10.1126/science.1088234
[3] Jonathan D. Wren, “URL Decay in MEDLINE – a 4-year Follow-up Study,” Bioinformatics, 24(11), 1381-85, 1 June 2008, DOI: 10.1093/bioinformatics/btn127
I found the source:
http://www.sciam.com/0397issue/0379kahle.html
Kahle, Brewster, Preserving the Internet, Scientific American, March 1997