JISC PoWR

Preservation of Web Resources: a JISC-funded project [Archived Blog]

What’s the average lifespan of a Web page?

Posted by Marieke Guy on August 12th, 2009

…or is it easier to ask how long is a piece of string?

The statistic much bandied about (for Web pages, not pieces of string!) is 44 days. It is believed to originate in an article by Brewster Kahle (of Internet Archive fame) titled Preserving the Internet, published in Scientific American in 1997. Brewster’s original quote is specifically about URLs: “…estimates put the average lifetime for a URL at 44 days.”

Whether this figure still stands today is a matter currently being discussed on the CURATORS@LIST.NETPRESERVE.ORG list after a query from Abigail Grotke of the Library of Congress.

Abbie offered up the 44-day statistic. She pointed out that the Digital Preservation Web site has a graphic on Web volatility stating that “44% of the sites available on the internet in 1998 had vanished one year later”.

The other figure often cited is 75 days, from Michael Day’s report Collecting and preserving the World Wide Web:

The dynamic nature of the Web means that pages and whole sites are continually evolving, meaning that pages are frequently changed or deleted. Alexa Internet once estimated that Web pages disappear after an average time of 75 days. (Lawrence et al., 2001, p. 30).

Another figure sometimes suggested is 100 days. This seems to come from Rick Weiss’s article for The Washington Post (Washington, DC, 24 November 2003), On the Web, Research Work Proves Ephemeral, which is no longer available.

So what is the average lifespan of a Web page today? Is it getting shorter or longer? The Internet Archive now gives 44 to 75 days as its ballpark figure. I’d hazard a guess that, with the rise in use of Web 2.0 technologies, the Web is actually getting more transient by the day.

Is this OK?

Maybe, if it’s just a tweet you sent a friend. However, if something more substantial is disappearing, then it’s a real worry.

6 Responses to “What’s the average lifespan of a Web page?”

  1. Michael Day Says:

    As the quotation from my Collecting and preserving the World Wide Web report implies, Alexa’s 75 day estimate was also referred to in a 2001 scientific paper written by Lawrence et al. [1]. Ironically, given that paper’s subject matter, the only reference provided there was to Alexa’s main Web page. From that, I couldn’t trace the original percentage (or the methodology used to generate it), so I just used the figure as a general indication of volatility.

    That said, as you imply, these types of figures can hide a great deal of complexity, in that certain types of Web content are likely to be more stable than others. Moreover, in practice, it probably only makes sense to think about link decay within particular contexts. For example, the article by Lawrence et al. cited above was specifically concerned with Web references in scientific publications, with a case study based on computer science journal and conference papers and technical reports. Their study found that the percentage of invalid links varied with the age of the publication, from 23% for papers published in 1999 to 53% for those published in 1994. Even then, much of the content (around 80%) was actually still available, and could be found using search engines or other methods.

    Over the past few years, similar studies have been made of link decay in medical and general science journals [2], followed by similar analyses of various other domains. The evidence here also tends to suggest that link volatility is higher than that of the content itself [3]. Looking at these kinds of detailed studies might actually be more helpful than providing general link decay figures that appear to mean very little. I’m NOT saying that Web link volatility isn’t a problem, just that it is difficult to understand what link decay percentages mean without knowing exactly what they were trying to measure.

    References:

    [1] Steve Lawrence, David M. Pennock, Gary William Flake, Robert Krovetz, Frans M. Coetzee, Eric Glover, Finn Arup Nielsen, Andries Kruger, and C. Lee Giles, “Persistence of Web References in Scientific Research,” Computer, 34(2), 26-31, February 2001, DOI: 10.1109/2.901164

    [2] Robert P. Dellavalle, Eric J. Hester, Lauren F. Heilig, Amanda L. Drake, Jeff W. Kuntzman, Marla Graber, and Lisa M. Schilling, “Going, Going, Gone: Lost Internet References,” Science, 302(5646), 787-788, 31 October 2003, DOI: 10.1126/science.1088234

    [3] Jonathan D. Wren, “URL Decay in MEDLINE – a 4-year Follow-up Study,” Bioinformatics, 24(11), 1381-85, 1 June 2008, DOI: 10.1093/bioinformatics/btn127

  2. ResourceShelf » Blog Archive » What’s the average lifespan of a Web page? Says:

    [...] Marieke Guy does an impressive job pulling together several estimates and the underlying papers where they come from. It’s one challenging question and getting even more so each day in this time of Twitter and similar services. [...]

  3. Web Page Life Span « The ADL Librarian Says:

    [...] this context, I came across a posting, “What’s the average lifespan of a Web page?”. It was nice to read a sort of scientific discussion of this nuisance. The posting pulls [...]

  4. Avi Rappoport Says:

    I found the source:

    http://www.sciam.com/0397issue/0379kahle.html

    Kahle, Brewster, Preserving the Internet, Scientific American, March 1997

  5. The Average Lifespan of a Webpage « ARCHIVE CULTURES NEWS COLLECTION by amateur_archivist Says:

    [...] non-resolving link doesn’t necessarily imply that the content once hosted there no longer exists (1); it may have been archived or simply exist at a new location (albeit, one mediated by a paywall) [...]

  6. What is the Average Lifespan of a Web Site Says:

    [...] The first one: What’s the average lifespan of a Web page? on August 12th, 2009 [...]