Slashdot Mirror


Using Google to Calculate Web Decay

scottennis writes: "Google has yet another application: measuring the rate of decay of information on the web. By plotting the number of results at 3, 6, and 12 months for a series of phrases, this study claims to have uncovered a corresponding 60-70-80 percent decay rate. Essentially, 60% of the web changes every 3 months." You may be amused by some of the phrases he notes as exceptional, too.

2 of 208 comments

  1. Obligatory Full Text by rosewood · · Score: 5, Informative

    I'm only doing this because I know an Angelfire page will get /.'d and hit its bandwidth limit fast! However, there is a pretty Excel chart on there, so bookmark it and come back much later.

    Web Decay
    by Scott Ennis
    4/26/2002
    Knowing how anxious most companies are to keep their web content "fresh," I was curious how "fresh" the web itself was.

    In order to come up with a freshness rating for the web you need to sample a very large number of pages. Not wanting to do this, I opted to use the Google search engine as a method for reviewing the web as a whole.

    My hypothesis is this: By searching Google using some common English phrases and returning results at various time points, a baseline can be reached for the common rate of freshness of overall web content.

    I took the total number of pages found for each given phrase at 3, 6, and 12 months. I calculated a percentage for each of these points based on the total number of results found with no date specified.

    For example:

    Phrase              3 mos.   6 mos.   12 mos.   Total
    buy low sell high   4700     5470     6200      7830
                        60%      70%      79%       100%
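
    The percentages in the example can be reproduced with a few lines of Python. This is only a sketch of the arithmetic described above; the function name freshness_percentages and the dictionary layout are illustrative assumptions, not part of the original study.

```python
def freshness_percentages(counts_by_window, total):
    """Return each windowed result count as a whole-number percentage of the total."""
    if total <= 0:
        raise ValueError("total must be positive")
    return {
        window: round(100 * count / total)
        for window, count in counts_by_window.items()
    }


# Result counts for "buy low sell high", taken from the example in the study.
counts = {"3 mos.": 4700, "6 mos.": 5470, "12 mos.": 6200}
percentages = freshness_percentages(counts, total=7830)
print(percentages)
```

    Dividing each count by the undated total (7830) and rounding to the nearest whole percent gives the 60%, 70%, and 79% shown for this phrase.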

    Note:
    This method excludes any pages which are not text and, more specifically, not English text.
    This method relies on a random sampling of phrases.

    Using this methodology, I determined that the average rate of decay of the web follows a 60-70-80 percent decline at 3, 6, and 12 months.

    Therefore, if a company wants to maintain a freshness rate on par with the web as a whole, its site content should be updated at the inverse rate. In other words:
    60% of the site should change every 3 months
    70% of the site should change every 6 months
    80% of the site should change every 12 months
    The only way to do this effectively is to either have a very small site, or have a site with dynamically generated information.
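
    One way a site owner might check a site against these targets is sketched below. Everything in it is an assumption made for illustration: it treats a page's last-modified date as a stand-in for "changed," approximates a month as 30 days, and uses made-up dates for a five-page site. The names TARGETS, changed_fraction, and freshness_report are hypothetical and do not come from the study.

```python
from datetime import date, timedelta

# Targets from the study: months -> share of the site that should have changed.
TARGETS = {3: 0.60, 6: 0.70, 12: 0.80}


def changed_fraction(last_modified, today, months):
    """Share of pages modified within the last `months` months (30-day months assumed)."""
    cutoff = today - timedelta(days=30 * months)
    return sum(1 for d in last_modified if d >= cutoff) / len(last_modified)


def freshness_report(last_modified, today):
    """Map each window to (actual fraction, target fraction, target met?)."""
    if not last_modified:
        raise ValueError("need at least one page")
    report = {}
    for months, target in TARGETS.items():
        actual = changed_fraction(last_modified, today, months)
        report[months] = (actual, target, actual >= target)
    return report


# Hypothetical last-modified dates for a five-page site (not data from the study).
pages = [date(2002, 4, 1), date(2002, 3, 15), date(2002, 2, 10),
         date(2001, 12, 1), date(2000, 6, 1)]
report = freshness_report(pages, today=date(2002, 4, 26))
for months, (actual, target, ok) in report.items():
    print(f"{months:>2} months: {actual:.0%} changed, target {target:.0%}, met: {ok}")
```

    With real data, the dates would come from wherever the site tracks page edits, such as file timestamps or a content database.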

    The following graph shows the decay rate for a few phrases. I selected these phrases to display because of their unique characteristics.
    bill gates sucks--This phrase had the lowest decay rate of any phrases I searched.
    life's short play hard--This phrase had the greatest decay rate of any I searched (note: this search was also very small).
    blessed are the cheesemakers--This phrase was relatively small, but demonstrates that quantity of pages may not be important in determining decay rate.
    late at night--This phrase returned the highest number of results of any I searched and yet it also adheres closely to the 60-70-80 rule.

    Conclusion:

    Web content decays at a uniform, determinable rate. Sites wanting to optimize their content freshness need to maintain a rate of freshness that corresponds to the rate of web decay.

  2. Google Study in Another Place by scottennis · · Score: 5, Informative

    The study I posted on Angelfire appears to have reached a bandwidth threshold. I've made the same study available here:

    http://helen.lifeseller.com/webdecay.html

    I've also included a link to the raw data I used.