Using Google to Calculate Web Decay
scottennis writes: "Google has yet another application: measuring the rate of decay of information on the web.
By plotting the number of results at 3, 6, and 12 months for a series of phrases, this study claims to have uncovered a corresponding 60-70-80 percent decay rate.
Essentially, 60% of the web changes every 3 months." You may be amused by some of the phrases he notes as exceptional, too.
I'm only posting this because I know an Angelfire page will get /.'d and hit its bandwidth limit fast! However, there is a pretty Excel chart on there, so bookmark it and come back much later.
Web Decay
by Scott Ennis
4/26/2002
Knowing how anxious most companies are to keep their web content "fresh," I was curious how "fresh" the web itself was.
In order to come up with a freshness rating for the web you need to sample a very large number of pages. Not wanting to do this, I opted to use the Google search engine as a method for reviewing the web as a whole.
My hypothesis is this: by searching Google for some common English phrases and comparing the results at various time points, a baseline can be reached for the common rate of freshness of overall web content.
I took the number of pages found for each phrase when results were limited to the past 3, 6, and 12 months. I then calculated each count as a percentage of the total number of results found with no date specified.
For example:

Phrase              3 mos.   6 mos.   12 mos.   Total
buy low sell high   4700     5470     6200      7830
                    60%      70%      79%       100%
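The arithmetic behind the example row can be sketched in a few lines of Python. This is only an illustration of the percentage step, not the author's actual spreadsheet; the function name and dictionary keys are made up for the example, and the counts are the ones from the table.

```python
def freshness_percentages(counts_by_window, total):
    """Return each window's result count as a rounded percentage of the total."""
    if total <= 0:
        raise ValueError("total must be positive")
    return {
        window: round(100 * count / total)
        for window, count in counts_by_window.items()
    }


# Result counts for "buy low sell high", taken from the example row.
counts = {"3 mos.": 4700, "6 mos.": 5470, "12 mos.": 6200}
percentages = freshness_percentages(counts, total=7830)
print(percentages)  # {'3 mos.': 60, '6 mos.': 70, '12 mos.': 79}
```

Repeating this for every phrase and averaging each column would presumably give the overall figures, assuming the phrases are weighted equally.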
Note:
This method excludes any pages which are not text and, more specifically, not English text.
This method relies on a random sampling of phrases.
Using this methodology I determined that the average rate of decay of the web follows a 60-70-80 percent decline at 3, 6, and 12 months.
Therefore, if a company wants to maintain a freshness rate on par with the web as a whole, its site content should be updated at the inverse rate. In other words:
60% of the site should change every 3 months
70% of the site should change every 6 months
80% of the site should change every 12 months
The only way to do this effectively is to either have a very small site, or have a site with dynamically generated information.
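One way a site owner might check a site against the 60-70-80 targets is sketched below. It assumes you can get the number of days since each page was last modified. The page ages are hypothetical, the day counts for 3, 6, and 12 months are approximations, and the function name is invented for the example.

```python
# Targets from the article: share of pages changed within each window.
# Keys are approximate day counts for 3, 6, and 12 months (an assumption).
TARGETS = {91: 0.60, 182: 0.70, 365: 0.80}


def freshness_report(page_ages_days):
    """Compare the share of pages modified within each window to its target."""
    if not page_ages_days:
        raise ValueError("need at least one page")
    report = {}
    for window_days, target in TARGETS.items():
        fresh = sum(1 for age in page_ages_days if age <= window_days)
        share = fresh / len(page_ages_days)
        report[window_days] = (share, share >= target)
    return report


# Hypothetical ages (days since last modification) for a ten-page site.
ages = [5, 20, 40, 60, 80, 90, 150, 300, 400, 700]
report = freshness_report(ages)
for window_days, (share, on_target) in report.items():
    print(f"{window_days} days: {share:.0%} fresh, on target: {on_target}")
```

Here the made-up site just meets all three targets. A real check would pull the ages from file timestamps or a CMS instead of a hard-coded list.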
The following graph shows the decay rate for a few phrases. I selected these phrases to display because of their unique characteristics.
bill gates sucks--This phrase had the lowest decay rate of any phrases I searched.
life's short play hard--This phrase had the greatest decay rate of any I searched (note: this search was also very small).
blessed are the cheesemakers--This phrase was relatively small, but demonstrates that quantity of pages may not be important in determining decay rate.
late at night--This phrase returned the highest number of results of any I searched and yet it also adheres closely to the 60-70-80 rule.
Conclusion:
Web content decays at a uniform, determinable rate. Sites wanting to optimize their content freshness need to maintain a rate of freshness that corresponds to the rate of web decay.
The study I posted on Angelfire appears to have reached a bandwidth threshold. I've made the same study available here:
http://helen.lifeseller.com/webdecay.html
I've also included a link to the raw data I used.