Slashdot Mirror


Using Google to Calculate Web Decay

scottennis writes: "Google has yet another application: measuring the rate of decay of information on the web. By plotting the number of results at 3, 6, and 12 months for a series of phrases, this study claims to have uncovered a corresponding 60-70-80 percent decay rate. Essentially, 60% of the web changes every 3 months." You may be amused by some of the phrases he notes as exceptional, too.

208 comments

  1. At last! by ringbarer · · Score: 1, Interesting

    This kind of thing can be a good application of Google's SOAP interface!

    --
    "Why did they cancel my favorite Sci-Fi show? I downloaded ALL the episodes!"
  2. Google's collection of the data by Fucky+the+troll · · Score: 2, Interesting

    Is Google claiming that it can check through the entire internet inside a timescale of 3 months, ready to check through again at the start of the next quarter?

    Surely this can't be true. Check Google's cached pages - see the dates on there?

    Google is turning into another history book.

    --
    Roadkill is yummy.
  3. Not exactly decay... by QuantumFTL · · Score: 4, Interesting

    It seems to me that in a way, the web is like an organism, whose smaller constituents are constantly (or not so constantly, depending on the webmaster) renewing themselves. It's a truly adaptive medium, and thus drastic change over short periods like this, as interest shifts, should be quite expected.

    That said, this is one of the many ways in which Google is an invaluable tool for research. Not just finding information, but generating it. Thanks Google!

    1. Re:Not exactly decay... by Anonymous Coward · · Score: 1, Insightful

      I think that the larger organisms are renewing themselves on a regular basis as well. If you look at large sites - any of the Microsoft bundle, BBC News, Financial Times - they are all changing from hour to hour, or maybe day to day for the non-news pages.

      It's the medium size businesses that don't seem to be grasping the web and the fact that you need to have a site that is dynamic in so far as it keeps people interested and possibly entertained.

      I'm lucky in that the company I work for is a small firm and a publisher, so we have daily news content as well as on-line versions of our weekly and monthly publications (HTML and PDF downloads!) being uploaded all the time - so our web traffic is growing constantly - slowly, but it hasn't seen a decline in the past two years.

      M@t :o)

    2. Re:Not exactly decay... by global_diffusion · · Score: 1

      Yeah. I also think it's funny that they think that this is a sign of decay:

      Essentially, 60% of the web changes every 3 months.

      Why should a site have to be updated all the time for it to be considered good? I have found tons of things like math tutorials and programming howtos that haven't been updated in years but are still valuable resources. I would only consider these 'decayed' if they randomly started losing images and text. I guess that this study is proof that you can prove any hypothesis given enough time.

  4. bill gates sucks... by jnana · · Score: 2, Funny

    For once, that is on topic. I'm glad to see that the phrase 'bill gates sucks' had the lowest decay rate of the phrases that the guy tested for.

    1. Re:bill gates sucks... by ksheff · · Score: 1

      I say it should be admitted as evidence in the recent court hearings as proof of what the public thinks of Mr. Bill (and his company which is an extension of his being). =)

      --
      the good ground has been paved over by suicidal maniacs
    2. Re:bill gates sucks... by prizzznecious · · Score: 4, Insightful

      All this means, actually, is that the sites that would include the information "Bill Gates Sucks" are not being updated very often, or have little else to say.

      It's an indicator of the dubious kind of context in which one finds such rash statements.

      --

      visit the hwky website for a lyrical genius infusion.
    3. Re:bill gates sucks... by Kierthos · · Score: 5, Interesting

      Actually (and unfortunately for any haters of the Evil that lies in the lands of Redmond) Headline News had this lovely little chart on recently, which showed public approval of several companies. Enron and Arthur Andersen had 9 and 11% approval ratings, respectively, while the big "winner" was Microsoft, with something like a 79% approval rating.

      Let's face facts here. We might hate Microsoft, but the vast majority of people do not. Good? Bad? Indifferent?

      Kierthos

      --
      Mr. Hu is not a ninja.
    4. Re:bill gates sucks... by Kierthos · · Score: 1

      How much more do you need to say?

      Kierthos

      --
      Mr. Hu is not a ninja.
    5. Re:bill gates sucks... by Anonymous Coward · · Score: 0

      Perhaps it's more of an indication that some truths are eternal. Or has Bill Gates started sucking less recently?

    6. Re:bill gates sucks... by Anonymous Coward · · Score: 0
      That number is rather high for any company--suggesting strongly that whatever data the chart was based on was biased. Perhaps it was even floated by Microsoft's own PR firm, based on a random survey at their Redmond campus--a lot of PR makes it into the news as reporting. That kind of data also depends on the exact question that was asked, and without knowing what the question was, it is also pretty useless.

      I can't even find a reference to it on-line. Do you have a pointer?

    7. Re:bill gates sucks... by Kierthos · · Score: 1

      Unfortunately, no. It's something I saw on TV, not on the Headline News site. And, true, it could easily be biased, as I don't recall them offering by what means the statistics were derived (i.e. how many people they asked, or exactly what the questions were).

      Kierthos

      --
      Mr. Hu is not a ninja.
    8. Re:bill gates sucks... by Bob(TM) · · Score: 1

      It's an indicator of the dubious kind of context in which one finds such rash statements

      A page containing a table of trigonometric identities won't change much either. Therefore, one might also conclude that it could be an indicator of the absolute, inalterable truthfulness of the statement ...

      --

      The little guy just ain't getting it, is he?
    9. Re:bill gates sucks... by Anonymous Coward · · Score: 0

      This may also be skewed by the fact that most of the people who can build a server and/or set up a web site probably use apache on linux/unix (according to netcraft). This is possibly a case where the technically savvy have a disproportionate impact on the message because of the complexity of the medium.

    10. Re:bill gates sucks... by Silentbob54 · · Score: 1

      No!!! Bill Gates will never stop sucking. He will never be less sucky. The man sucks more than all the members of *NSYNC. He gives all of us hard working geeks a bad name.

      --
      Nootch, SilentBob
    11. Re:bill gates sucks... by wings · · Score: 1

      You could also interpret it to mean that the maintainer of the page hasn't seen a need to revise it because Bill hasn't changed...

    12. Re:bill gates sucks... by Anonymous Coward · · Score: 0

      Maybe it's because of the mounds of name calling and MS PR: "we care, and provide quality software, and do our best"

      Not to mention they own a major network, advertise themselves as independent, etc...

      Similar tactics used by Hitler's ministry of the interior

  5. Wow by Anonymous Coward · · Score: 0

    Boy, that looks like some detailed analysis he's done there.

    1. Re:Wow by Anonymous Coward · · Score: 0

      Yeah no shit

  6. An ever-evolving creature? by kerneljacabo · · Score: 1

    I actually always wondered about this. Really interesting, although I guessed that there would be a rapid rate of decay due to the nature of "information." Things get old and pass with time. An interesting application of this would be to keep records over a number of decades and figure out the average life/revival span of certain trends.

  7. blessed by thanjee · · Score: 4, Funny

    How long until all the cheesemakers have fully decayed and are no longer blessed?

    I don't look forward to that day.

    Long live cheese and cheese makers!

    --
    Saying your OS is the best because more people use it is like saying McDonald's makes the best food
  8. Web Death by svwolfpack · · Score: 4, Interesting

    It would also be interesting to see how much of the web no longer exists... like at what rate the web is dying. God knows there's enough dead links out there...

    1. Re:Web Death by nicklott · · Score: 1
      I seem to recall reading a New Scientist article (in print) that said someone had worked out that the half-life of the web was 18 months, so a given link has a 50% chance of being dead after 18 months.

      Can't find any links unfortunately (the results of search for anything involving the words "half-life" tend to be somewhat skewed...)
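
      A minimal sketch of what an 18-month half-life would imply, assuming link death follows simple exponential decay (the comment only recalls the half-life figure; the exponential model and the function name here are illustrative assumptions):

```python
def fraction_dead(months: float, half_life_months: float) -> float:
    """Expected fraction of links that are dead after `months`,
    assuming exponential decay with the given half-life."""
    return 1.0 - 0.5 ** (months / half_life_months)


if __name__ == "__main__":
    # 18 months is the half-life figure recalled in the comment above.
    for t in (6, 12, 18, 36):
        print(f"{t:>2} months: {fraction_dead(t, 18.0):.0%} of links dead")
```

      Under that assumption, half the links are gone at 18 months and three quarters at 36 months; a longer half-life simply stretches the same curve.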

    2. Re:Web Death by DarklordJonnyDigital · · Score: 2, Funny

      Oh, most of the web is still around... it just looks like pages are decaying because every link you click has already been Slashdotted ;)

    3. Re:Web Death by Anonymous Coward · · Score: 0

      interesting, i would have modded you up if you had given a link

    4. Re:Web Death by Enocasiones · · Score: 4, Interesting
      Educators and "link rot".

      "In a paper to be published in the June issue of the Journal of Science Education and Technology, Brooks and Markwell likened the rate of link rot to the type of "extinction equation" commonly used to describe natural processes such as radioactive decay. They wrote that the hyperlinks in their study had an expected "half-life" of 55 months."

      Also this, which is just a link from the previous article.

      Easy! :)

      (web's half-life -game -unreal -counter -gamers)

      --
      Enoc
    5. Re:Web Death by digitalsushi · · Score: 2

      we're an isp.. i remember the first time someone contacted me about this horrible thing.. they wanted us to redirect all our 404 traffic to a page that would spawn popup spam. seems like that's what half of my web browsing is these days. find a page with links. click a link. a window pops up, and one under. close both, and the main page says the page doesn't exist. *sigh* the next one will work, though.. although it too will spawn a few windows. it's disenchanting to work on these systems when most people are spoiling the experience with their spammy goo. (and no we never sold our 404 traffic). it's kinda sad.. when i get to a plain old apache default error message these days, i get all teary eyed and remember the good old days.

      now it's all about finding open relays to megaphone your get rich quick idea that you copied from some other guy to 30 million people, praying that you get at least 40 back. course if you decide to bite just to mess with them, you find that they don't even check the box. what's the point? arggvhhh it's just frustrating. it's completely trashed the fun of having email. and the web.

      and the ghost of the old web, the one with low noise, is not viewed as dead, merely its soul is an HTTP redirect to someone's digital billboard, completely unrelated and unwanted.

      --
      slashdot: where everyone yells sarcastic metaphors to themselves to understand the issue
    6. Re:Web Death by 4of12 · · Score: 2

      I suppose the rate at which new links are created is roughly a positive coefficient that outweighs the negative coefficient associated with death of a link.

      Reminds me of calculations for population growth with k_growth and k_death.
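
      The population analogy can be sketched as follows, assuming links are created and die at constant per-link rates so that the total follows N(t) = N0 * exp((k_growth - k_death) * t). The rate values below are made up purely for illustration; nothing in the thread measures them.

```python
import math


def link_count(n0: float, k_growth: float, k_death: float, t: float) -> float:
    """Number of live links after time t under constant per-link
    creation (k_growth) and death (k_death) rates."""
    return n0 * math.exp((k_growth - k_death) * t)


if __name__ == "__main__":
    # Hypothetical rates per month, chosen only to show the behaviour.
    n0, k_growth, k_death = 1_000_000, 0.08, 0.04
    for months in (0, 6, 12, 24):
        print(months, round(link_count(n0, k_growth, k_death, months)))
```

      The total grows whenever k_growth exceeds k_death, even though individual links keep dying, which is why a raw result count alone cannot separate growth from decay.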

      So, two questions:

      1. What about deliberately short-lived links like the kind of md5-flavored arguments I get from my favorite news sites hoping to track my usage? How do those affect link life statistics?
      2. What's the oldest link on the web?
      --
      "Provided by the management for your protection."
    7. Re:Web Death by Fjord · · Score: 1

      I feel your pain. My suggestion is to switch to mozilla and turn off "Open unrequested windows" in the scripting settings. I haven't had an unrequested popup/popunder since.

      --
      -no broken link
  9. baseline by swankypimp · · Score: 1

    For a few moments, I thought that the phrase "base" (for baseline) on his graph was a reference to "all your base are belong to us." It would have been neat to see how quickly that phrase appeared, then decayed!

    --

    --All your stolen base are belong to Rickey Henderson
    1. Re:baseline by foniksonik · · Score: 1

      "It would have been neat to see how quickly that phrase appeared, then decayed!"

      Certainly not as quickly as it would have without your reference, God willing.

      --
      A fool throws a stone into a well and a thousand sages can not remove it.
    2. Re:baseline by foniksonik · · Score: 0, Offtopic

      Personally I'd be interested in seeing what the numbers on interpretations of that phrase (never to be uttered again) might be. Was it in reference to a database hack (frontbase) or something more carnal?

      --
      A fool throws a stone into a well and a thousand sages can not remove it.
    3. Re:baseline by colmore · · Score: 1, Offtopic

      it was in reference to a very badly translated video game.

      --
      In Capitalist America, bank robs you!
  10. A few things... by QuantumFTL · · Score: 1

    After reading the article, I found a few things to be disturbing...

    First of all, he showed very little of his actual data. This makes it difficult to tell if his interpretation is correct.

    Thirdly, what the heck was this guy smoking when he came up with search phrases? Most of these phrases seem to be tangential to the main purpose of most web sites on the internet.

    Finally, Timothy, why didn't you put the foot icon by the story? :)

    1. Re:A few things... by QuantumFTL · · Score: 1

      Okay that post didn't make much sense. I think it's because it's almost 5 oclock in the morning here and I haven't slept yet. Pardon my errors.

      Justin

  11. One of the flaws by CmdrTaco+(editor) · · Score: 2, Troll
    I think one of the flaws in any analysis of the decay on the web is the fact that most news sites keep an infinite archive of almost everything they have ever published online. The specific phrases probably don't represent a large enough sample size to properly reflect all sites. Sure, he says he used many phrases, but all he gives us is "bill gates sucks", "life's short play hard", "blessed are the cheesemakers", and "late at night". To properly do the study, he should've used a random letter or word generator and tested the decay of the phrases it produced.
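
    One way such a generator might look, as a rough sketch: draw phrases at random from a word list instead of picking them by hand. The word list and function name are hypothetical, and randomly combined words will often match no pages at all, so a real study would still need to filter for phrases that actually occur.

```python
import random


def random_phrases(words, n_phrases, words_per_phrase=3, seed=None):
    """Build n_phrases phrases by sampling distinct words from `words`.

    A seed makes the sample reproducible so others can rerun the searches.
    """
    rng = random.Random(seed)
    return [
        " ".join(rng.sample(words, words_per_phrase))
        for _ in range(n_phrases)
    ]


if __name__ == "__main__":
    # Hypothetical word list; a real study would use a large dictionary.
    vocabulary = ["late", "night", "buy", "low", "sell", "high", "play", "hard"]
    for phrase in random_phrases(vocabulary, n_phrases=5, seed=42):
        print(f'"{phrase}"')
```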

    But, it is interesting to see his results. I can only imagine that if Archive.org did a study like this, they would be able to make a more legitimate conclusion. Perhaps some collaboration is in order?

    1. Re:One of the flaws by Andorion · · Score: 1

      This looks like something tossed together in a few minutes, just to get posted on slashdot =) Very thin on details and data.... Google itself should do this type of analysis - publish something like the zeitgeist

      -Berj

    2. Re:One of the flaws by Anonymous Coward · · Score: 0

      Well, obviously he's had the idea for at least 12 months, since the data is from that long ago.

  12. Obligatory Full Text by rosewood · · Score: 5, Informative

    I only do this since I know an angelfire page will get /. and reach bandwidth limits fast! However, there is a pretty excel chart on there so bookmark and come back much later.

    Web Decay
    by Scott Ennis
    4/26/2002
    Knowing how anxious most companies are to keep their web content "fresh," I was curious how "fresh" the web itself was.

    In order to come up with a freshness rating for the web you need to sample a very large number of pages. Not wanting to do this, I opted to use the Google search engine as a method for reviewing the web as a whole.

    My hypothesis is this: By searching Google using some common english phrases and returning results at various time points, a baseline can be reached for the common rate of freshness of overall web content.

    I took the total number of pages found for each given phrase at 3, 6, and 12 months. I calculated a percentage for each of these points based on the total number of results found with no date specified.

    For example:

    Phrase               3 mos.   6 mos.   12 mos.   Total
    buy low sell high    4700     5470     6200      7830
                         60%      70%      79%       100%
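
    The percentages in the example row can be reproduced directly from the counts. This is only a sketch of the arithmetic the article describes (each date-limited count divided by the undated total); the function name is an assumption.

```python
def freshness_percentages(counts, total):
    """Percentage of all results that fall within each time window."""
    return {label: round(100 * n / total) for label, n in counts.items()}


# Counts for "buy low sell high" as given in the article's example row.
counts = {"3 mos.": 4700, "6 mos.": 5470, "12 mos.": 6200}
total = 7830

if __name__ == "__main__":
    for label, pct in freshness_percentages(counts, total).items():
        print(f"{label:>8}: {pct}%")
```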

    Note:
    This method excludes any pages which are not text and more specifically, not English text.
    This method relies on a random sampling of phrases.
    Using this methodology I determined that the average rate of decay of the web follows a 60-70-80 percent decline at 3, 6, and 12 months.

    Therefore, if a company wants to maintain a freshness rate on par with the web as a whole, their site content should be updated at the inverse rate. In other words:
    60% of the site should change every 3 months
    70% of the site should change every 6 months
    80% of the site should change every 12 months
    The only way to do this effectively is to either have a very small site, or have a site with dynamically generated information.

    The following graph shows the decay rate for a few phrases. I selected these phrases to display because of their unique characteristics.
    bill gates sucks--This phrase had the lowest decay rate of any phrases I searched.
    life's short play hard--This phrase had the greatest decay rate of any I searched (note: this search was also very small).
    blessed are the cheesemakers--This phrase was relatively small, but demonstrates that quantity of pages may not be important in determining decay rate.
    late at night--This phrase returned the highest number of results of any I searched and yet it also adheres closely to the 60-70-80 rule.

    Conclusion:

    Web content decays at a uniform, determinable rate. Sites wanting to optimize their content freshness need to maintain a rate of freshness that corresponds to the rate of web decay.

    1. Re:Obligatory Full Text by Anonymous Coward · · Score: 0

      Does anyone else think it's funny that a page that talks a lot about the decay of information online is perpetuating more decay by being cut off by Angelfire's bandwidth limitations?

    2. Re:Obligatory Full Text by Anonymous Coward · · Score: 0

      No, the entry could be summed up with "CmdrTaco does not want to violate copyright law and piss off the wrong person".

    3. Re:Obligatory Full Text by Anonymous Coward · · Score: 0

      Pretty offtopic but,
      Caching is an exception in most copyright law.
      When it's done for performance reasons etc. it's perfectly allowed.

  13. Study: World Wide Web sites and page persistence by Seth+Finkelstein · · Score: 5, Interesting
    For a more extensive (although older) study, take a look at

    Digital libraries and World Wide Web sites and page persistence

    That said, the Web and its component parts are dynamic. Web documents undergo two kinds of change. The first type, the type addressed in this paper, is "persistence" or the existence or disappearance of Web pages and sites, or in a word the lifecycle of Web documents. "Intermittence" is a variant of persistence, and is defined as the disappearance but reappearance of Web documents. At any given time, about five percent of Web pages are intermittent, which is to say they are gone but will return. Over time a Web collection erodes. Based on a 120-week longitudinal study of a sample of Web documents, it appears that the half-life of a Web page is somewhat less than two years and the half-life of a Web site is somewhat more than two years. That is to say, an unweeded Web document collection created two years ago would contain the same number of URLs, but only half of those URLs point to content. The second type of change Web documents experience is change in Web page or Web site content. Again based on the Web document samples, very nearly all Web pages and sites undergo some form of content change within the period of a year. Some change content very rapidly while others do so infrequently (Koehler, 1999a). This paper examines how Web documents can be efficiently and effectively incorporated into library collections. This paper focuses on Web document lifecycles: persistence, attrition, and intermittence.

    Sig: What Happened To The Censorware Project (censorware.org)

  14. Applying statistics meaningfully by Anonymous Coward · · Score: 0, Troll

    It makes a ton of sense to conclude that information on the web 'decays' at a specified average rate based on the observations of 5 phrases.

    Good job, goober. Here's your PhD

    1. Re:Applying statistics meaningfully by python_rocks · · Score: 1

      The web is a closed system? No new sites being created to affect the results? If all the web pages that were ever going to be created have already been created and only existing pages were changed . . . then there might be a slight hint of validity to these results. Otherwise . . . this shows why Statistics is a science. (would that be "are a science"?)

  15. Well, by popeyethesailor · · Score: 0, Offtopic

    To arrest that decay rate, here's my contribution.

    Bill Gates SUCKS
    Bill Gates SUCKS
    Bill Gates SUCKS !!
    BASE BASE BASE
    BASE BASE BASE
    BASE BASE BASE
    Late at Night
    Late at Night
    Late at Night
    life's short play hard
    life's short play hard
    life's short play hard
    blessed are the cheese makers
    blessed are the cheese makers
    blessed are the cheese makers

    I request all members of the forum to link this post in all the websites you could access, and post this message too :)

  16. Well, they didn't do what momma said by Anonymous Coward · · Score: 0

    Anything you put in the freezer won't decay fast. They didn't listen, and look how fast the web is decaying.

  17. Credibility? by Gossy · · Score: 2, Interesting
    Is it me, or does this 'research' simply look like something a bored guy has just thrown together from a few minutes work, then submitted to Slashdot to see if it gets posted?

    From the evidence, he searched for very few phrases. The sample size is way too low to be representative of the web - which some estimates put at several billion more pages than there are people on the planet! There are no signs of more than about 5 different phrases being searched for here..

    Can a few simple searches on Google really generate a large enough sample to draw such large conclusions?

    The report is one page long, hosted on Angelfire. There is no substantial data to back up his claims. Is this report reliable in any way?

    I'm amazed this got posted on the front page of Slashdot..

    1. Re:Credibility? by foniksonik · · Score: 1

      What about the U.S. census? Many surveys of 'scientific' reputation use small sample sets to pose hypotheses. What matters is that a 3rd party either confirms or disaffirms the data. Any takers?

      --
      A fool throws a stone into a well and a thousand sages can not remove it.
    2. Re:Credibility? by Kierthos · · Score: 1

      Yeah, but statistical samples are usually based on more than four sets or cases. If this study had checked, say a few dozen search phrases, and was coming back with similar results, I would be a touch more impressed with it. And if he had actually spent more than the apparent 3 minutes every 3 months on this and actually used a couple hundred search phrases AND was still getting the same decay rates, then it might just be indicative of something.

      Kierthos

      --
      Mr. Hu is not a ninja.
    3. Re:Credibility? by Lizard_King · · Score: 2

      Agreed. I'm a bit in the dark on *how* this guy came up with his numbers.

      I calculated a percentage for each of these points based on the total number of results found with no date specified.

      IMHO, This is a bit vague to be called anything but conjecture.

      --
      "My mother never saw the irony in calling me a son-of-a-bitch." - Jack Nicholson
    4. Re:Credibility? by DutchSter · · Score: 1

      Well, statistically, if you can develop a good enough test criteria, you could determine the rate with a very, very small sample. This is how some of the more reputable firms can survey 250 voting American adults and usually be within 3% of what the American public will do during the upcoming election. But hey, probability also says that he has a 1 in 100 shot of getting the number right just by guessing ;)

      The whole "report" was down due to excessive bandwidth (whooo hooo, Slashdot effect!), but the fact that it was posted on a free provider (Angelfire) does make me wonder. If this guy was smart enough to accurately do a study like this, you'd think he'd KNOW what happens to bandwidth when you post a story on Slashdot. Instead he keeps it on his freebie page. Was anybody able to snag a copy and post it in? Until given good evidence to the contrary, I think this study is very suspect.

      What's really needed to conduct a study of this is a temporal database. Problem is, while Google may be HUGE, temporal DBs are many times larger, and not very efficient for routine searches. In essence, they store the world "as it was", "as it is", and "as it will be" and can use deduction and inference to pull out statistics exactly like this one. There are currently no commercial TDBMSs, only a few layered applications that can be placed on top of an existing DB to provide temporal-like features.

    5. Re:Credibility? by schlaff · · Score: 1

      Actually, it doesn't even make sense. If 60% decays in 3 months, shouldn't 60% of the remaining 40% decay in the next 3 months (for a total of 84% at 6 months)? And then in another 6 months 84% of what remains would change, which would give us about 97%, or "1 - (2/5)^4", of the web changed after 12 months if 60% changed in 3 months. Unless there's something very funky going on, the rate of decay should stay constant!
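
      The compounding argument can be checked with a few lines, assuming a constant 60% change per 3-month period applied to whatever has not yet changed (the function name is illustrative):

```python
def fraction_changed(months: float, rate_per_quarter: float = 0.60) -> float:
    """Cumulative fraction changed after `months` if a fixed share of the
    still-unchanged pages changes every 3 months."""
    quarters = months / 3
    return 1.0 - (1.0 - rate_per_quarter) ** quarters


if __name__ == "__main__":
    for m in (3, 6, 12):
        print(f"{m:>2} months: {fraction_changed(m):.1%} changed")
```

      Under a constant rate the 12-month figure is 1 - (2/5)^4, roughly 0.974, which is well above the 80% reported in the study.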

    6. Re:Credibility? by Stephen · · Score: 2
      Well, statistically, if you can develop a good enough test criteria, you could determine the rate with a very, very small sample. This is how some of the more reputable firms can survey 250 voting American adults and usually be within 3% of what the American public will do during the upcoming election.
      No, this is because of the (surprising) fact that the accuracy of the survey is dependent only on the sample size, not on the population size. 250 is not a very small sample. The fact that it's 250 out of 100 million or whatever is irrelevant.
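
      A sketch of why population size barely matters, using the textbook margin-of-error formula for a proportion from a simple random sample, with an optional finite population correction. The 95% z-value of 1.96 and the worst case p = 0.5 are conventional assumptions, not figures from the thread.

```python
import math


def margin_of_error(n, population=None, p=0.5, z=1.96):
    """Approximate margin of error for a sample proportion.

    If `population` is given, the finite population correction is applied.
    """
    se = math.sqrt(p * (1 - p) / n)
    if population is not None:
        se *= math.sqrt((population - n) / (population - 1))
    return z * se


if __name__ == "__main__":
    for pop in (10_000, 1_000_000, 100_000_000, None):
        print(pop, f"{margin_of_error(250, pop):.4f}")
```

      With these assumptions a sample of 250 gives a margin of roughly six percentage points whether the population is a million or a hundred million; getting down to about three points takes on the order of a thousand respondents.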
      --
      11.00100100001111110110101010001000100001011010001 1000010001101001100010011
    7. Re:Credibility? by Dephex+Twin · · Score: 1

      Not to mention the fact that although it talks about the Web as a whole decaying, he ignored the non-English-speaking Web.

      mark

      --

      If you want to make an apple pie from scratch, you must first create the universe. -- Carl Sagan
    8. Re:Credibility? by scottennis · · Score: 1

      I agree that the sample is too small. I have posted the entire sample I used for those who may be interested.
      The result of my effort here is a hypothesis which I hope to test by applying it to individual sites.

  18. archive.org by mmThe1 · · Score: 3, Interesting

    This makes the job of Archive.org-like sites damn tough.

    P.S. Are we losing information at a comparable rate to generation....?

  19. What ? - ?? by bushboy · · Score: 0, Troll

    This is news ?

    Or a joke ?

    Must be a joke - anyone basing 'research' or a 'survey' on 'bill gates sucks' and 'blessed are the cheesemakers' is either really bored, or trying to see how deep in the barrel /. will dig for 'stories'

    --
    A slashdotting - you get the stick first and then the carrot !
  20. interesting but... by lowLark · · Score: 3, Interesting

    He creates a problem for himself by not providing us with his raw data, making any subsequent verification of the trend difficult. In fact, the one data set he gives us:
    Phrase               3 mos.   6 mos.   12 mos.   Total
    buy low sell high    4700     5470     6200      7830
                         60%      70%      79%       100%
    seems to demonstrate the opposite of the trend that he describes. Indeed, a current search on google shows about 1,270,000 results (makes you wonder when he did his searches that the current number of results is so many orders of magnitude in difference). The methodology also fails to take in to account any growth in the size of the web, which could mask the effects of decay.

    1. Re:interesting but... by ankit · · Score: 1

      It is a phrase. You are supposed to enclose it in double quotes. This is what you will get. A count of 7,580.

      --
      Don't Panic
    2. Re:interesting but... by lowLark · · Score: 1

      Good point, but the original argument still stands: that for the one set of data available, the trend seems to be towards growth rather than decay.

    3. Re:interesting but... by Anonymous Coward · · Score: 0

      Actually, the correct search returns 7500 results.

    4. Re:interesting but... by foniksonik · · Score: 1

      7830 | 7580

      The original seems higher than the most recent number... 'trend seems to be towards growth..'?

      hmmm....

      --
      A fool throws a stone into a well and a thousand sages can not remove it.
    5. Re:interesting but... by Anonymous Coward · · Score: 0

      30 Apr 2002 7:50AM Central

      Searched the web for "buy low sell high". Results 1 - 10 of about 7,370. Search took 0.15 seconds.

  21. Re:Google cached my ballz by Anonymous Coward · · Score: 0

    Why was this modded as Troll?

  22. Why so critical? by Anonymous Coward · · Score: 0

    Check out his credentials.

    1. Re:Why so critical? by Anonymous Coward · · Score: 0

      It sounds like he's a step away from supersizing my meal at Wendy's.

  23. Eternity by tanveer1979 · · Score: 0, Offtopic

    bill gates sucks

    He used to suck, sucks and will suck forever. This phrase is eternity and will be there till forever. Googles will come and go, the net will decay and do radioactivity but the eternal truth will remain forever
    --
    My Aurora : http://www.youtube.com/watch?v=o91ZsGwJYyg
    FB : https://www.facebook.com/TanveersPhotography
  24. The Web is decaying by Anonymous Coward · · Score: 5, Funny
    It is now official - Netcraft has confirmed: The web is decaying

    Yet another crippling bombshell hit the beleaguered web community when recently IDC confirmed that the web accounts for less than a fraction of 1 percent of all server usage. Coming on the heels of the latest Netcraft survey which plainly states that the web has lost more market share, this news serves to reinforce what we've known all along. The web is collapsing in complete disarray, as further exemplified by failing dead last in the recent Sys Admin comprehensive networking usage test.

    You don't need to be a Kreskin to predict the web's future. The handwriting is on the wall: the web faces a bleak future. In fact there won't be any future at all for the web because the web is decaying. Things are looking very bad for the web. As many of us are already aware, the web continues to lose market share. Red ink flows like a river of blood. Dot-coms are the most endangered of them all, having lost 93% of their core developers.

    Let's keep to the facts and look at the numbers.

    The web leader Theo states that there are 7000 users of the web. How many users of other protocols are there? Let's see. The number of the web versus other protocols posts on Usenet is roughly in ratio of 5 to 1. Therefore there are about 7000/5 = 1400 other protocols users. Web posts on Usenet are about half of the volume of other protocols posts. Therefore there are about 700 users of the web. A recent article put the web at about 80 percent of the HTTP market. Therefore there are (7000+1400+700)*4 = 36400 web users. This is consistent with the number of Usenet posts about the web.

    Due to the troubles of Walnut Creek, abysmal sales and so on, the web went out of business and was taken over by Slashdot who sell another troubled web service. Now Slashdot is also dead, its corpse turned over to yet another charnel house.

    All major surveys show that the web has steadily declined in market share. The web is very sick and its long term survival prospects are very dim. If the web is to survive at all it will be among hobbyist dabblers. The web continues to decay. Nothing short of a miracle could save it at this point in time. For all practical purposes, the web is dead.

    Fact: the web is dead.

    1. Re:The Web is decaying by Ralp · · Score: 1

      +5, Troll :)

  25. The guy who posted this may have made a mistake. by eugene+ts+wong · · Score: 2, Informative
    Essentially, 60% of the web changes every 3 months.
    I think that is incorrect, according to the "researcher". He should have said, "Essentially, 60% of the web is getting older every 3 months."
  26. Better article needed by Raedwald · · Score: 5, Interesting

    I'm not impressed. The article does not define what he means by decay, or how he measured it, except in the vaguest of terms. The analysis of the data is poor; anyone interested in decay would suspect some kind of exponential decay. They would therefore plot the data logarithmically, and perhaps calculate a half-life. Piss poor.
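
    A minimal sketch of the log-linear fit suggested above, in Python. It assumes the study's 60-70-80 figures mean the fraction of results gone after 3, 6, and 12 months, so that 40, 30, and 20 percent survive; that reading is a guess, and `fit_half_life` is a name made up for illustration.

```python
import math

def fit_half_life(months, surviving):
    """Fit ln(surviving) = intercept + slope * months by least squares.

    Returns (half_life_in_months, slope, intercept).
    """
    logs = [math.log(s) for s in surviving]
    n = len(months)
    mean_t = sum(months) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in months)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(months, logs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    # For S(t) = S0 * exp(slope * t), the half-life is ln(2) / -slope.
    return math.log(2) / -slope, slope, intercept

# Assumed reading of the study: 60%, 70%, 80% gone at 3, 6, 12 months.
half_life, slope, intercept = fit_half_life([3, 6, 12], [0.40, 0.30, 0.20])
print(f"half-life: {half_life:.1f} months")
print(f"implied surviving fraction at t=0: {math.exp(intercept):.2f}")
```

    If the implied fraction at t=0 comes out well below 1, a single exponential does not describe the numbers, which would itself be worth reporting.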

    --
    Ne mæg werig mod wyrde wiðstondan, ne se hreo hyge helpe gefremman.
    1. Re:Better article needed by RoC+MasterMind · · Score: 1

      Reports posted on Angelfire don't get much respect. The web is very large and this guy only used 5 phrases; the study size is not large enough in relation to the web's size to accurately determine anything. Also, Google has probably changed their algorithm in the past year, so the results would change even for the same query. And what does he define as decay? How long does it take for information to get "old"? I think it depends on the content of the webpage.

    2. Re:Better article needed by scottennis · · Score: 1

      I just wanted to see if I could spot a broad trend. I believe that the 60-70-80 rule is accurate. I intend to validate it (or invalidate it) by spidering several sites and seeing whether they follow the trend.

    3. Re:Better article needed by BoBaBrain · · Score: 1

      I believe that the 60-70-80 rule is accurate.

      Why?

      --
      I am a Karma Library.
    4. Re:Better article needed by scottennis · · Score: 1

      The 60-70-80 rule appears to be accurate based on the sampling I did, and nobody has shown me anything to make me believe it is not accurate.

  27. BUT.... by droyad · · Score: 1

    The "Study" does not take into account new web pages that have replaced the old.

    But then again it is an interesting piece of trivia

  28. we've lost the ability to rely on hyperlinks by thegoldenear · · Score: 5, Insightful

    Tim Berners-Lee wrote: "There are no reasons at all in theory for people to change URIs (or stop maintaining documents), but millions of reasons in practice." (http://www.w3.org/Provider/Style/URI) and advocated creating a web where documents could last, say, 20 years and more.

    1. Re:we've lost the ability to rely on hyperlinks by Anonymous Coward · · Score: 1, Funny

      Nobody's ever going to keep content on the web that's 20 years old.

    2. Re:we've lost the ability to rely on hyperlinks by Kallahar · · Score: 2

      URLs: people change hosts, usually due to money concerns

      filenames: people change languages (php, perl, asp, etc), site layouts, functionality.

      filenames: intentionally changed to prevent deep linking (heh heh)

      When I change a URI it is usually because I'm changing the logical structure of that program. However I also usually check the referrer logs, and if there has been an outside referral then I will put in a redirect for the old file, and contact the site that had the link to ask them to change it.
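
      A minimal sketch of that referrer check in Python: given (requested path, referrer) pairs pulled from a log, list the moved paths that outside sites still link to. The host name, the paths, and the `moved` mapping are all invented for illustration.

```python
from urllib.parse import urlsplit

def redirects_needed(hits, moved, own_host):
    """Return {old_path: new_path} for moved paths that got outside referrals.

    hits     -- iterable of (requested_path, referrer_url) pairs
    moved    -- dict mapping old paths to their new locations
    own_host -- this site's host name; referrals from it are ignored
    """
    needed = {}
    for path, referrer in hits:
        if path not in moved:
            continue
        host = urlsplit(referrer).hostname  # None for "-" (no referrer)
        if host and host != own_host:
            needed[path] = moved[path]
    return needed

# Hypothetical data: one internal referral, one external, one direct hit.
hits = [
    ("/old/faq.html", "http://www.example.org/index.html"),
    ("/old/faq.html", "http://links.example.net/cool-sites.html"),
    ("/old/about.html", "-"),
]
moved = {"/old/faq.html": "/faq/", "/old/about.html": "/about/"}
for old, new in redirects_needed(hits, moved, "www.example.org").items():
    print(f"301 {old} -> {new}")
```

      The referring hosts collected along the way are also the list of sites to contact about updating their links.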

      There is no excuse for having broken links on your own site, though it does happen to the best of us :)

      Travis

    3. Re:we've lost the ability to rely on hyperlinks by Reziac · · Score: 2

      My main site is going on 4 years old and still has the same core pages and link structure as when it was first uploaded. The main page, and certain subpages, are linked from around 100 other sites (the list has grown every time I check it); I rely on these referrals for traffic, but bedamned if I'm gonna chase 'em all down and get everyone to fix their links. Easier to retain the existing structure, or have a duplicate page (old name, same content) if needed to maintain link integrity.

      And this costs me nothing but "oh yeah, must remember to upload index.html as well as index.htm".

      I expect that my sites will be just as valid 20 years from now, assuming Earthlink is still in business and still hosting it. (Yeah, I should get my own domain names, but..)

      If you have dynamic content, your needs might differ. But for informational sites, change for the sake of change is usually a Bad Thing.

      --
      ~REZ~ #43301. Who'd fake being me anyway?
  29. Re: Google is good but have you tried... by Anonymous Coward · · Score: 0

    The Open Directory Project [www.dmoz.org]

  30. Information vs WWW by castlan · · Score: 1, Interesting

    The nature of information is decidedly ephemeral compared to the static nature of much of the web. Perhaps the surge in Weblogging has altered this dynamic even more than the hypercommercialization, but I'll dispute the 60% figure if it is based only on those four phrases. Much of the early Web was fairly static research and information hosted on .edu domains from what I gather. Since the tide shifted away to .commercialization and tripe, the nature of "information" has little to do with the state of the web, and more to do with tidiness. How much of the Web is long abandoned fan sites and dusty old means abandoned from the "information superhighway"?

    In fact, Information Superhighway would be a great data point for this subject. Another consideration, which would be difficult to accommodate, is the reality of mirrors and shuffling pages to different URLs.

    Most importantly, I strongly hope that your "interesting application" never gets implemented, because I can see no application of the resulting data that doesn't make my blood run cold. Psychological Warfare and hostile advertising are the bane of the Post-WWII US, and (likely) the world. Propaganda is a pernicious technology, and I fear further development in this area.

    Okay, I'll admit that was a touch trollish. Because the Psych. Warfare genie was already released from its Nazi bottle and invited into the US (along with other valuable sciences), it's a little late to advocate repression of this technology. Yet I still reel from my country's increasingly malevolent commercialism aspects, which have spun off from Capitalism without any of Capitalism's redeeming social aspects. I almost want to become a socialist, until I consider that this state of affairs sprung from the National Socialist state.

    In any case, while the WWW may be evolving, it certainly isn't in the Darwinian sense that was likely intended. Vestigial Geocities homepages long abandoned are plentiful, and are less temporary, giving search engines a better shot at crawling than dynamic, or "living" news portals. This sickly "creature" is more of a construction than the product of evolution (unless you consider pre-Charles Darwin senses of the word). If you want to research the nature of information and survivability/mutability, the Freenet Project would provide a much more fruitful environment, if it ever reached widespread usage. I would have less strenuous objections to classifying the Freenet an "ever-evolving creature".

  31. Stop Web Decay Today by weave · · Score: 4, Funny
    Do your part to stop web decay. Include this in a cron job. For best results, be sure to brush, I mean touch, three times a day...

    find /var/www/html -name '*' -exec /bin/touch {} \;
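
    For anyone without find, a rough Python equivalent of the one-liner above. It is a sketch only: `touch_tree` is an invented name, and like the original it updates directories as well as files.

```python
import os

def touch_tree(root):
    """Set atime and mtime to now on root and everything beneath it.

    Returns the number of entries touched.
    """
    os.utime(root, None)
    touched = 1
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            try:
                os.utime(os.path.join(dirpath, name), None)
                touched += 1
            except FileNotFoundError:
                # Dangling symlink or a file removed mid-walk; skip it.
                pass
    return touched

# Example call (path taken from the one-liner above):
# touch_tree("/var/www/html")
```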
    1. Re:Stop Web Decay Today by davesag · · Score: 1
      this highlights a valid point. page changes do not equate to information loss. page changes on a blog, or web-board, news site or olm are almost all additions to an overall mass of indexed content that does not change much, apart from perhaps the ads, and index or contents pages that change regularly. pages such as these /. comments are always changing, but seldom lose information. the information is continually being sorted for relevance by ant-like readers.

      this 'study' suggests to me that there is room for real scientific investigation into the nature of massively webbed information. and google very likely provides a useful tool in the information-scientist's investigative arsenal.

      --
      I used to have a better sig than this, but I got tired of it
    2. Re:Stop Web Decay Today by Anonymous Coward · · Score: 0

      You know, you don't need "-name '*'". If you leave it off, it does the same thing.

  32. Another flaw by stevie-boy · · Score: 1

    Doesn't Google keep improving its search algorithm so that only relevant sites are provided in the hits? Did this "researcher" hit the link that includes the filtered out near duplicates?

  33. Self-fulfilling prophecy? by flipflapflopflup · · Score: 1, Funny
    The link now says:

    Temporarily Unavailable

    The Angelfire site you are trying to reach has been temporarily suspended due to excessive bandwidth consumption.

    The site will be available again in approximately 2 hours!

  34. Study? by Anonymous Coward · · Score: 4, Insightful

    Wow! What a wonderful, in-depth, study! Is there any link to a scientific paper on that page that I am missing or is that everything? I mean, how can someone claim something just by showing us a few numbers and an Excel graph?

    I appreciate the topic very much, but some more material on it is needed. This study wouldn't be complete enough even for high-school homework...

    And look at his homepage (just remove the last part of the url). Most of the pages are more than two years old... that's decay! :)

    Seriously speaking, just look for a few more sources before you accept a story.

  35. Study claims ?? by Anonymous Coward · · Score: 1, Insightful

    this study claims to have uncovered a corresponding 60-70-80 percent decay rate. Essentially, 60% of the web changes every 3 months."

    The guy that submitted this story is the guy that did the study.

    1. Re:Study claims ?? by echucker · · Score: 0, Offtopic

      Mod parent up. It's probably one of the most telling replies so far. That being said, I'd bet that pure hits for a phrase mean absolutely nothing. "Life is short, play hard" is a corporate motto. This should be static just by its very nature. Unless you check the FULL CONTENT of each and every page, you won't be able to tell how much a site has really changed.

  36. Google/CowboyNeal Study by BoBaBrain · · Score: 5, Funny

    On a similar note, I was curious to see what the CowboyNeal content of the web is. As luck would have it, a precise answer can be found easily.

    Google gives us the following interesting results:

    3,840,000 sites contain the word Cheese.

    1,640 sites contain the words CowboyNeal and Cheese.

    Therefore, 4.27083333333333333333333333333e-2% of cheese related sites contain a reference to CowboyNeal.

    As cheese is a randomly chosen word with no special connection to CowboyNeal it is reasonable to assume that 4.27083333333333333333333333333e-2% of all sites contain a reference to The Cowboy (Assuming the number of sites dedicated to CowboyNeal equals the number dedicated to ignoring him).

    So there we have it. The web is 99.957291666666666666666666666667% CowboyNeal free. :)


    I said the results were "precise", not "accurate". :P
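
    The arithmetic above, as a small Python sketch. The hit counts are the ones quoted in this comment; `share_percent` is just an illustrative name.

```python
def share_percent(hits_both, hits_total):
    """Percentage of hits_total that also match the second term."""
    return 100.0 * hits_both / hits_total

cheese = 3_840_000          # pages Google reported for "Cheese"
cowboy_and_cheese = 1_640   # pages reported for "CowboyNeal" and "Cheese"

share = share_percent(cowboy_and_cheese, cheese)
print(f"{share:.4f}% of cheese pages mention CowboyNeal")
print(f"{100 - share:.4f}% CowboyNeal free")
```

    The extrapolation from cheese pages to the whole web is the comment's own assumption, not something the code checks.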

    --
    I am a Karma Library.
    1. Re:Google/CowboyNeal Study by luccid · · Score: 1

      Now just add a graph and submit !

    2. Re:Google/CowboyNeal Study by maccallr · · Score: 0, Offtopic

      Here's the graph, kind-of... CowboyNeal's balanced diet. But I guess it's really just another blatant plug... :-)

    3. Re:Google/CowboyNeal Study by TheTomcat · · Score: 4, Funny

      Do you sell insurance?

      S

    4. Re:Google/CowboyNeal Study by Anonymous Coward · · Score: 0

      Here's a better graph

    5. Re:Google/CowboyNeal Study by Anonymous Coward · · Score: 0

      What are you talking about? What does insurance have to do with this?

    6. Re:Google/CowboyNeal Study by jo42 · · Score: 1
      CowboyNeal and Sodomy comes up with 6 hits.
      CmdrTaco and Sodomy comes up with 2 hits.
      Hemos and Sodomy comes up with 55 hits.

      Hmmm...

    7. Re:Google/CowboyNeal Study by Anonymous Coward · · Score: 0
      In a recent peer-reviewed publication, BoBaBrain investigated the relationship of Cowboy Neal (CbN) to cheese on the web. The work seemed to cry out for a follow-up, so I tried extending his technique for CbN and a few random words. Here are the most recent results:

      Searched the web for "cowboy neal" cheese. Results 1 - 10 of about 405. Search took 0.22 seconds.

      Searched the web for "cowboy neal" steak. Results 1 - 10 of about 71. Search took 0.18 seconds.

      Your search - "cowboy neal" filbert - did not match any documents.
      The spelling correction - "cowboyneal" filbert - also did not match any documents.

      So much for food

      Searched the web for "cowboyneal" computer. Results 1 - 10 of about 16,100. Search took 0.15 seconds.

      Searched the web for "cowboyneal" wristwatch. Results 1 - 2 of 2. Search took 0.12 seconds.

      Searched the web for "cowboy neal" wristwatch. Results 1 - 4 of about 5. Search took 0.24 seconds.

      Searched the web for "cowboy neal" slime. Results 1 - 3 of 3. Search took 0.21 seconds.

      Searched the web for "cowboyneal" tiffany's. Results 1 - 5 of about 7. Search took 0.45 seconds.

      Comparing these to BoBaBrain's original work:

      Google gives us the following interesting results:

      3,840,000 [google.com] sites contain the word Cheese.

      1,640 [google.ch] sites contain the words CowboyNeal and Cheese.

      Therefore, 4.27083333333333333333333333333e-2% of cheese related sites contain a reference to CowboyNeal.

      leads us to the following conclusions:

      1) His methodology is at least as sound as the original article's, which seems to be the seminal article in this rapidly emerging field of crap.

      2) Google returns very different results from one search to another. Consider that I found about 405 hits for CbN and cheese, while BoBaBrain found about 1600. There seems to be justification for Google's using "about" to qualify their results.

      3) As a consequence of item 2), we should view results based upon Google searches with some scepticism.

      4) We all have far too much time on our hands.

    8. Re:Google/CowboyNeal Study by Rupert · · Score: 2

      What you got here is one of them reductio ad absurdum thingamajigs.

      Your assumptions seem reasonable. I'm a reasonable man, and I don't know nothing about no CowboyNeal and his cheese fetish. But when you get to an absurd result like .04% of the web are belong to CowboyNeal, then clearly your assumptions are flawed.

      Ergo, CowboyNeal has a "special connection" to cheese. Quod Erat Doodah.

      --
      E_NOSIG
    9. Re:Google/CowboyNeal Study by Anonymous Coward · · Score: 0

      You know, CowboyNeal vs http yields 5.66e-2%, which is comparable.
      25% error or so.

    10. Re:Google/CowboyNeal Study by BoBaBrain · · Score: 1

      "Ergo, CowboyNeal has a "special connection" to cheese."

      Unfortunately similar results are found for many other words too.
      I believe the false assumption in this case is that 1 site == 1 topic. A look at the results shows that the pearlmonks' board covers numerous diverse, bizarre topics. :)

      --
      I am a Karma Library.
  37. major source of web decay: by deltavivis · · Score: 1

    bored geeks mercilessly devouring the download limit of free sites...I can't help but find it amusing that this guy's decay information has just decayed.

  38. Re:A new Slashcrap highpoint by Anonymous Coward · · Score: 0

    Yeah no doubt.

  39. Not that accurate by Jormundgard · · Score: 2

    I can't even find my page on google anymore. I don't know if it's just because my site's unpopular, or because it has the same name as an online retailer. In any case, it's not searchable anymore, and my guess is that it was removed as "dead".

  40. Jakob Nielsen: Web Pages Must Live Forever by jukal · · Score: 3, Interesting

    Once you have put a page on the Web, you need to keep it there indefinitely. Read more. Slow news day, eh?

    1. Re:Jakob Nielsen: Web Pages Must Live Forever by elb · · Score: 1

      Wow. I would have expected a usability guy to be less obtuse about human motivations than a pure engineer (Tim Berners-Lee -- "There are no reasons at all in theory for people to change URIs (or stop maintaining documents), but millions of reasons in practice." -- http://www.w3.org/Provider/Style/URI ).

      How about these reasons: the content sucks! It was stupid to begin with. It doesn't deserve immortality -- it's too embarrassing to live. Most of the stuff I've taken off the web was because it became dated, useless, outmoded, pointless, or just plain embarrassing. Not all information is worth preserving. And reasons to stop maintaining documents: boredom; information isn't popular enough to justify the expense of maintaining it; relevance.

      Add in the practical considerations -- people move servers, graduate from school, change DSL providers, software schemas (e.g. flat html to ASP to PHP or whatever). Obviously URIs were NOT well-designed to handle such contingencies.

      These things should be apparent a mile away. Why is link rot so horrible anyway, aside from the annoyance of clicking on something that turns out not to be there?

  41. even Bill thinks he sucks more than ever by maccallr · · Score: 3, Interesting

    I don't claim this is the authoritative answer, or an in-depth study, but the raw data comes from Bill's very own MSN search: bill gates sucks, check it out...

    Google SOAP thing for compare-stuff is in the pipeline...

  42. Re:The guy who posted this may have made a mistake by keller · · Score: 1

    Essentially everything gets older by the minute!!!

    Actually this post is getting old!!

    --

    Enig? Det alt for hot det smor!

  43. One More Reason Why The Web Sucks by rudy_wayne · · Score: 1

    "Temporarily Unavailable

    The Angelfire site you are trying to reach has been temporarily suspended due to excessive bandwidth consumption."

    Imagine that you were renting a building and running a business - a retail store. One day, the owner of the building comes in and padlocks the doors and says "Sorry, you can't re-open till the first of the month - too many people have come into your store".

    What stupidity.

    1. Re:One More Reason Why The Web Sucks by aderuwe · · Score: 1

      If you're running a business, you don't go to Angelfire. In fact, I doubt if they allow anything commercial at all.
      If you're using someone else's building (for no cost, mind), this person certainly has the right to kick you out if he feels "too many people have come by".

  44. History book? Not as far as I can tell . . . by phobonetik · · Score: 3, Interesting

    Our weblogs show that Google visits our site (www.up.org.nz) at least monthly, and it is by no means a huge traffic drawing site in the global sense. Its last visit was on 13th April, drawing 1888 hits...

    1. Re:History book? Not as far as I can tell . . . by wolf- · · Score: 1

      Our logs show google spiders visiting at least 2x a week. They crawl for about 10-15 minutes each visit. Takes about 2-3 weeks for the updates to hit the indexes though, which still isn't bad.

      --
      ----- LoboSoft specializes in Digital Language Lab
  45. shameless self promotion by Anonymous Coward · · Score: 0

    In his story submission, scottennis spoke very impersonally of the study he authored himself:

    "By plotting the number of results at 3, 6, and 12 months for a series of phrases, this study claims to have uncovered a corresponding 60-70-80 percent decay rate."

    Was that just scientific detachment or was it someone pretending that he and a few clueless Slashdot editors aren't the only ones who would take any serious interest in this numerology?

    "bit-rot-quantified" department eh...how bout the "but-not-qualified" department.

    1. Re:shameless self promotion by scottennis · · Score: 1

      Actually, it was an attempt at journalistic detachment.

  46. Needs a broader range by TJ6581 · · Score: 1

    late at night--This phrase returned the highest number of results of any I searched and yet it also adheres closely to the 60-70-80 rule.
    If he really wanted a large search he should have tried "porn".....

    --
    "Freedom of speech has always been the abstract red-headed stepchild of the Constitution"
    -Suck
  47. Heh.. Talk about web decay. by Bowie+J.+Poag · · Score: 5, Funny



    Looks like 100% of the link mentioned in this article decayed in a little under 5 minutes! ;)
    Cheers,

    --
    Bowie J. Poag

    1. Re:Heh.. Talk about web decay. by rmohr02 · · Score: 1

      It'd be nice if Google ran through news articles at /. and cached the external links. But anyway, the /. editors should know better than to link to Angelfire or Geocities.

    2. Re:Heh.. Talk about web decay. by Anonymous Coward · · Score: 0

      Uh, obviously, those numb nuts don't. Stupid mooks. Angelfire? Fuck

  48. heh by inKubus · · Score: 2

    Yeah, in bytes. I wonder how many digits that would be?

    --
    Cool! Amazing Toys.
  49. Googlebot Visits Monthly by BigBlockMopar · · Score: 2

    Are google claiming that they can check through the entire internet inside a timescale of 3 months, ready to check through again at the start of the next quarter?

    I don't know if that's all that far-fetched. I know Googlebot last hit my site on April 7th, crawled every page in my domain over the course of 12 hours, and current searches of their cache show content I'd updated at that time. They seem to visit every month or so.

    Perhaps it's based on the traffic they detect to a given site through their CGI redirects... but I'm not a large site, my primary webserver is a Pentium 90. :)

    crawl4.googlebot.com - - [07/Apr/2002:13:36:32 -0400] "GET /broken_microsoft_products/ HTTP/1.0" 200 128854 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
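
    To check the "every month or so" guess against a log, a sketch along these lines could pull Googlebot visits out of combined-format access log lines. The regex covers only the fields visible in the line above, and `parse_line` is an invented helper, not part of any library.

```python
import re
from datetime import datetime

LOG_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Return a dict of fields for one combined-format log line, or None."""
    m = LOG_RE.match(line)
    if m is None:
        return None
    fields = m.groupdict()
    # %b expects English month abbreviations, as in the default C locale.
    fields["time"] = datetime.strptime(fields["time"], "%d/%b/%Y:%H:%M:%S %z")
    return fields

line = ('crawl4.googlebot.com - - [07/Apr/2002:13:36:32 -0400] '
        '"GET /broken_microsoft_products/ HTTP/1.0" 200 128854 "-" '
        '"Googlebot/2.1 (+http://www.googlebot.com/bot.html)"')
hit = parse_line(line)
if hit and hit["agent"].startswith("Googlebot"):
    print(hit["time"].date(), hit["path"], hit["status"])
```

    Collecting the dates of such hits and taking the gaps between consecutive crawls would give the visit interval directly.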

    --
    Fire and Meat. Yummy.
    1. Re:Googlebot Visits Monthly by jpm165 · · Score: 1
      "Perhaps it's based on the traffic they detect to a given site through their CGI redirects... but I'm not a large site, my primary webserver is a Pentium 90. :)


      crawl4.googlebot.com - - [07/Apr/2002:13:36:32 -0400] "GET /broken_microsoft_products/ HTTP/1.0" 200 128854 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"


      wow, not only are you running your domain off a pentium 90, but you also have reverse DNS lookup turned on in the logs... that's gotta be giving you a decent performance hit, no?

    2. Re:Googlebot Visits Monthly by BigBlockMopar · · Score: 2

      wow, not only are you running your domain off a pentium 90, but you also have reverse DNS lookup turned on in the logs... that's gotta be giving you a decent performance hit, no?

      Well, it doesn't actually handle DNS; that's felix, an old 486DX-33 running FreeBSD, port-forwarded behind my gateway (I've only got the one IP address). But yeah, I'm sure each logger thread gets held up waiting for resolution.

      More impressively, dynamic content. (Most of the pages are generated dynamically as shtml through the x-bit hack; nothing sophisticated, mostly just inserting templates and stuff for color scheme because I'm too lazy to type long BODY tags) And anywhere from 2,000 to 5,000 hits per day. And only 48 megs of RAM. And it's a popular Linux distro's default kernel, not recompiled for that machine. Even so, it hardly ever breaks a sweat.

      As you can tell, it's like, zero performance tuning. But it still cranks out a SETI@Home unit every day or two.

      As for reverse DNS itself, yeah, I like it. :) It's a nice luxury.

      --
      Fire and Meat. Yummy.
    3. Re:Googlebot Visits Monthly by KlomDark · · Score: 1

      Heh, that ain't shit... I've got Apache with reverse lookups on running on a 386/25 at http://klomdark.servebeer.com:8081/ :)

    4. Re:Googlebot Visits Monthly by Jonny+290 · · Score: 1

      Not any more. :)

      tsk, tsk. Silly people posting things like that on SLASHDOT, of all places. :D

      --
      Hey Taco! Looks like you're using the "infinite monkeys and typewriters" scheme to generate Ask Slashdots again...
  50. Google "pages found" data by Per+Abrahamsen · · Score: 3, Interesting
    I have maintained a number of google celebrity lists, where celebrities in various categories are ranked based on the number of page hits by google.

    While the numbers clearly aren't totally random, they are very fragile indeed. Some people have had a change of two orders of magnitude, within a week. And in these cases, there have usually been no real world events that could explain such a change. I guess the google page hits numbers depend as much on the internal google structure, as on the number of actual pages on the web.

    So I doubt google page hits statistics is a useful research tool. Nonetheless, it can be fun. Here are some google hall of fame lists:

    PS: Mail me to suggest new entries to the lists.
    1. Re:Google "pages found" data by Anonymous Coward · · Score: 0

      Perhaps more interesting if the links had come up right - I see five slashdot ones. I guess you meant this instead.

  51. average and real life persistance of documents by kipple · · Score: 2

    .. I noticed that a paper I wrote a LOT of years ago can still be found online somewhere.. so I suppose that although -in the average- web pages do disappear, if those pages contain documents, they will survive the death of their original webpage.

    not that it was an interesting document - just a little paper about nothing important. But still, it's out there.

    My thoughts? I think that as long as a website can be "saved" in some form, its content will be available in other forms for a long amount of time.

    this should make people think, especially those who put copyrights on their webpages, or don't want some information to spread around.

    could we say that information wants to be free as long as it's downloadable?

    hmm..

    --
    -- There are two kind of sysadmins: Paranoids and Losers. (adapted from D. Bach)
  52. I conclude there is no decay ! by bushboy · · Score: 1

    By thoroughly researching the following phrases on www.yahoo.com :-

    Sex
    Warez
    mp3

    I have discovered that amazingly, my results differ substantially !

    In conclusion, then, it seems that content is ultimately always fresh and there is no indication of decay !

    --
    A slashdotting - you get the stick first and then the carrot !
  53. Blessed are the cheese makers by kzinti · · Score: 2

    What's so special about the cheese makers?

    It's not meant to be taken literally. It refers to any manufacturers of dairy products.

    -- Monty Python, Life of Brian

    1. Re:Blessed are the cheese makers by kubrick · · Score: 1

      Oh, shut up, Big Nose!

      --
      deus does not exist but if he does
  54. Wide jump from findings to conclusion by gpmart · · Score: 5, Interesting
    In fact, I would argue that good content need not change. Aside from the obvious issues with the small sampling of phrases, the web is, thankfully, not just a series of catch-phrases. In fact, it was designed to carry complex information such that it could not be reduced.

    What scares me here is the conclusion that web sites need to change their content 60% every 3 months. This is not freshness, this is reorganizing to re-organize. If you are considering doing this, you had better seriously re-consider your future. It's an interesting study but a good meme doesn't die simply because the catch-phrases are tired.

    At faculty meetings at our school I sit with a bingo card. On it are a series of catch-phrases. We listen for the catch-phrases and shout out when we have finished our cards. B***SH*T is the game and to reduce your content to a series of reorganized catch-phrases is like having a marketing guy develop foreign policy.

    Anyone willing to write the Perl module that searches for the latest catch-phrases and inserts them randomly into your web content? Yeesh!

  55. Google measurements--varied and fun! by lute3 · · Score: 2, Funny
  56. Earth to Slashdot by nahdude812 · · Score: 0, Redundant
    When linking to an Angelfire site, you might as well be linking here.


    Anyone have any mirrors? (by the time I'm done posting this, there'll probably have been a dozen "First Mirror Posts" but oh well.)

  57. Intermittence by markmoss · · Score: 4, Funny
    At any given time, about five percent of Web pages are intermittent, which is to say they are gone but will return.

    For example, most web pages linked to in slashdot articles.
  58. Decay by This+is+RobV · · Score: 1

    Ironically, this site on decay adds to the decay.

  59. The Slashdotted Decay Rate by aries78 · · Score: 1

    They failed to include one statistic: The decay rate when the Slashdot Effect is applied to a website: 99.998%

    :)

  60. Haiku by Darth+Paul · · Score: 0, Offtopic

    Indeed life is short
    Gone are the cheesemakers, but
    Bill Gates always sucks

    1. Re:Haiku by ehackathorn · · Score: 1

      Not that it matters, but I think you're funny... Darn moderators without a sense of humor!

  61. Google Study in Another Place by scottennis · · Score: 5, Informative

    The study I posted on Angelfire appears to have reached a bandwidth threshold. I've made the same study available here:

    http://helen.lifeseller.com/webdecay.html

    I've also included a link to the raw data I used.

    1. Re:Google Study in Another Place by camelrider · · Score: 0, Offtopic

      Thank you!

    2. Re:Google Study in Another Place by Anonymous Coward · · Score: 0

      This study doesn't make any sense to me - are the measurements relative to your arbitrary starting time (the graph suggests not), or relative to today?

      What do the numbers measure - the number of pages containing the phrase, or the number of pages which are now missing the phrase?

      To me it looks more like 20% of 12 month old pages have changed vs. 40% at 3 months...weird!

  62. Local storage of information that I want to keep by coldmist · · Score: 1

    I have never liked the smell of bit-rot, so I like to keep them close by my desk where I can keep them well-watered and pruned. ;)

    For years, whenever I've found an article that I've liked, or data that I thought would be useful later on, I've always either saved the .html file or text off to my hard drive, or (lately) used Adobe Acrobat to get the whole page (preserving graphics and layout in one binary file, rather than 100 extra .gif/.jpg images in a directory somewhere).

    Ryan

    --
    Don't steal. The government hates competition.
  63. web decay by Anonymous Coward · · Score: 0

    This gal may see an inordinate amount of traffic soon (http://www.web-decay.net).

  64. website offline by badbytes · · Score: 1

    The website pointed to by the article seems to have been taken down due to excess bandwidth usage ... text-book example of being slashdotted :-).

  65. Thought and mod_rewrite are the key by Fweeky · · Score: 5, Insightful

    The key to making links that don't rot is to design a URI schema that's both independent of any redesigns of your site and independent of any particular way of doing things.

    Let's look at a few examples.

    The URI to this page is http://slashdot.org/comments.pl?sid=31884&op=Reply&threshold=3&commentsort=3&tid=95&mode=nested&pid=3434535 - what is it telling you that it doesn't need to?

    Well, for a start, that .pl is a bad idea. What happens in 4 years' time when SlashDot is running on PHP, or Java, or Perl 7, or a Perl Server Page, or ASP? Then there's the difficult-to-decode query string that tells you nothing about the link other than "this is the information the server needs to locate your page at the moment", and doesn't give you much faith in it living forever.

    Now let's look at an equivalent Kuro5hin URI.

    http://www.kuro5hin.org/comments/2002/4/29/22137/6511/51/post#here is a URI to reply to a random comment on k5.

    For a start, you can't tell what application or script is serving you the page, and you can't see what type of file it's linking to; both these things can and will change over time.

    Second, there's a date embedded in there; you can see the developers, if they ever decide to change the meaning of '/comments', using that date as a reference; if the URI is before the change, they can map it onto the new schema or pass it onto legacy code.

    Having the date in the URI is good because it allows you to determine when the link was issued, and map it onto any changes or pass it off to a legacy system as required.

    Now let's take an apparently good link on my now horribly out of date site, aagh.net.

    http://www.aagh.net/php/style/ links to an article on PHP coding style.

    Certainly, hiding the fact that I'm using PHP to serve this document is good, and shortening the URI to remove the useless querystring is good (you can't see one? Good, that's the point), however, this URI may well stop working in a few weeks; I'm planning a redesign and the old schema may well not fit in well with it.

    A short yyyymm in there could have made all the difference; a simple if check on the URI's issue date would keep it working.

    The moral of the story: Think about your URI's when you're designing a site. Try to remove as much data as you can without painting yourself into a corner.
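
    As a rough illustration of that "simple if check", here is a minimal Python sketch. It assumes a hypothetical schema in which every path starts with a yyyymm segment (for example /200205/php/style/) and a made-up redesign date; route, REDESIGN_DATE and the handler names are invented for the example and are not how aagh.net, k5 or SlashDot actually work.

```python
import re
from datetime import date

# Hypothetical cutover: URIs issued before this month use the old schema.
REDESIGN_DATE = date(2002, 6, 1)

# Matches paths such as /200205/php/style/ (yyyymm prefix, then the rest).
DATED_PATH = re.compile(r"^/(\d{4})(\d{2})/(.*)$")


def route(path):
    """Return (handler_name, remainder) for a dated URI path."""
    match = DATED_PATH.match(path)
    if match is None:
        return ("not_found", path)
    year, month, rest = match.groups()
    try:
        issued = date(int(year), int(month), 1)
    except ValueError:
        # Six digits that do not form a real year and month.
        return ("not_found", path)
    if issued < REDESIGN_DATE:
        # Issued under the old schema: hand it to the legacy mapping.
        return ("legacy", rest)
    return ("current", rest)
```

    With this in place, /200205/php/style/ goes to the legacy handler and anything issued from 200206 onward goes to the new code, so old links keep working after the redesign.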

    1. Re:Thought and mod_rewrite are the key by Jahf · · Score: 2

      Someone with mod points and a clue about how to organize a web site, please mod the parent up as insightful (or informative).

      If I had mod points today, I would do it ... the web needs more thought put into its architecture and less put into its look and feel.

      --
      It is more productive to voice thoughtful opinions (reply) than to judge (moderate) others.
    2. Re:Thought and mod_rewrite are the key by Skiboo · · Score: 1

      Despite sometimes containing redundant information, I like the way slashdot does things in this regard. It tells you a lot more than your k5 example. Also, you can easily modify them. The URI for your post is http://slashdot.org/comments.pl?sid=31884&cid=3435254.

      By simply adding &threshold=-1 to the end of that, I can see all the replies at -1 easily and painlessly.

      Do you know how to make k5's comments nested instead of threaded purely using the URI? I'm not sure (I'm really not, this may or may not be easy at k5), but adding &mode=nested to slashdot's seems pretty easy (these can be in any order).
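
      To make the "just add a parameter" point concrete, here is a small Python sketch using the standard urllib.parse module. The helper name with_params is made up for the example; the parameter names (threshold, mode) are the ones mentioned above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_params(uri, **overrides):
    """Return uri with the given query parameters added or replaced."""
    parts = urlsplit(uri)
    # Keep the existing parameters in their original order.
    params = dict(parse_qsl(parts.query, keep_blank_values=True))
    params.update({key: str(value) for key, value in overrides.items()})
    return urlunsplit(parts._replace(query=urlencode(params)))


print(with_params("http://slashdot.org/comments.pl?sid=31884&cid=3435254",
                  threshold=-1, mode="nested"))
```

      Because the parameters are parsed into a mapping first, the order in which they appear in the URI doesn't matter, which matches the "these can be in any order" behaviour.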

      The point is, whether or not it takes the optimum number of bytes isn't always the priority. As with anything, there are always other factors; in the case of /., it's designed to be easy to use for the (savvy) user, not easy on the server.

    3. Re:Thought and mod_rewrite are the key by Fweeky · · Score: 3, Informative

      > By simply adding &threshold=-1 to the end of that, I can see all the replies [slashdot.org] at -1 easily and painlessly.

      The argument wasn't "query strings are bad, m'kay"; look at the URI and see what information's in there. Does .pl serve any purpose? Does sid=31884 gain anything, aside from making the page very difficult to serve statically? Do tid=95 and pid=3436807 tell you anything?

      The URI's would work just as well using something like http://slashdot.org/stories/31884/comments/3436807 / 5/?mode=nested&threshold=-1; even if /stories/31884 were a static file, the URI would still work and point roughly to the right place. And it's not exposing the internals of how the comments system works, and it's keeping the more readily tunable query strings clear, without making the exact resource you're pointing to difficult to find.

      > Do you know how to make k5's comments nested instead of threaded purely using the URI?

      No. Actually, I wasn't really pointing out k5 as being the perfect example; Scoop actually tends to really suck in this respect (like setting the URI to '/' when you change comment modes). However, I might be tempted to ask you which URI is likely to live the longest, certainly back when SlashDot used to archive articles after a couple of weeks.

      > The point is, whether or not it takes the optimum number of bytes isn't always the priority

      I never once said the size of the URI was important. I said it contained a lot of extraneous information that changed a lot while meaning little (i.e. the URI's changed from the dynamic query string to an .shtml file when a story was archived).

      > in the case of /., it's designed to be easy to use for the (savvy) user, not easy on the server

      What's easier for the "savvy" user? A URI that will work for the rest of SlashDot's life, or one that'll last until the story is archived, or the underlying architecture changes, and which contains a lot of randomly ordered and mostly meaningless information?

      A well designed URI scheme will actually give the savvy user a lot more control; say you include the date of an article, à la http://slashdot.org/stories/2002/05/30/; you can imagine going to such a URI and getting all of the stories on that day, month or year, and instantly being able to identify how old a linked-to article is. You can also imagine an archived URI and a live, dynamic URI both using the same schema.

      You can also imagine giving a URI of an interesting article to a friend without first having to decode the query string; just strip off anything after /comments and they get the story.
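
      A minimal Python sketch of that "strip off anything after /comments" step, assuming the hypothetical /stories/<id>/comments/... layout proposed above (SlashDot does not actually serve such URIs, and story_uri is a made-up helper name):

```python
from urllib.parse import urlsplit, urlunsplit


def story_uri(comment_uri):
    """Drop everything from '/comments' onward, leaving the story URI."""
    parts = urlsplit(comment_uri)
    head, separator, _ = parts.path.partition("/comments")
    if not separator:
        # No comments segment: assume this already points at the story.
        return comment_uri
    # Discard the query string and fragment along with the comment path.
    return urlunsplit((parts.scheme, parts.netloc, head + "/", "", ""))


print(story_uri("http://slashdot.org/stories/31884/comments/3436807/?mode=nested&threshold=-1"))
```

      The same trimming can be done by hand in the address bar, which is rather the point: the path reads left to right from general to specific.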

      Note: This applies to any site, those particular SlashDot and k5 URI's were just examples.

  66. guidelines for content change? by call+-151 · · Score: 2
    From the article:


    Therefore, if a company wants to maintain a freshness rate on par with the web as a whole, their site content should be updated at the inverse rate. In other words:
    60% of the site should change every 3 months
    70% of the site should change every 6 months
    80% of the site should change every 12 months
    The only way to do this effectively is to either have a very small site, or have a site with dynamically generated information.


    This seems so totally "if everyone else is jumping off the Brooklyn Bridge, then we should too" by itself that it discredits what sliver of credibility the article had. Using a web-wide average as a guideline for what a particular web site "should do" is meaningless. Web sites should present timely, appropriate information that is useful to those who visit. Some sites deal with material that changes frequently (stock quotes and sports sites should presumably be updated regularly) and some sites deal with material that does not change frequently (no need to redo your tech support documents for long-out-of-production products every week). This notion of `freshness' is ill-defined, poorly measured and of dubious value.

    --
    It's psychosomatic. You need a lobotomy. I'll get a saw.
  67. Phrases that won't suffer any decay, ever: by hdparm · · Score: 1

    1. Here is tonight's top 10 list

    2. Critical Updates Package (138 MB)

    3. Hey Ho Let's Go

    4. Nobody's perfect

    and, of course

    5. News for nerds, stuff that matters

    1. Re:Phrases that won't suffer any decay, ever: by cant_get_a_good_nick · · Score: 1
      3. Hey Ho Let's Go


      In an odd karma thing, maybe 3 seconds after I started reading this comment, the program I'm pseudo watching played Blitzkrieg Bop. WEIRD.

    2. Re:Phrases that won't suffer any decay, ever: by hdparm · · Score: 1
      See? Some things live forever.

      GABBA - GABBA - HEY!!!

  68. Correct links by Per+Abrahamsen · · Score: 3, Interesting
    All the links were wrong. Hopefully, these are better:
    1. Re:Correct links by Anonymous Coward · · Score: 0

      Errmmm, you forgot Taylor Dane :-)

    2. Re:Correct links by Anonymous Coward · · Score: 0

      Only in a survey of people who like to root through google for fun would Jakob Nielsen outrank Hans Christian Andersen and Niels Bohr on a list of famous Danes.

  69. How do you account for domain changes? by yerricde · · Score: 2

    The key to making links that don't rot is to design a URI schema that's both independent of any redesigns of your site and independent of any particular way of doing things.

    You can't mod_rewrite a domain name that you have lost control over. If you have a popular site hosted on a university's server, and then you graduate, what do you do? If you put up a site, some Yakkestonian trademark holder takes it from you in WIPO court, and you're forced to go to Gandi.net to get a new domain, what do you do?

    --
    Will I retire or break 10K?
    1. Re:How do you account for domain changes? by Anonymous Coward · · Score: 0

      Well then, why bother with anything?

    2. Re:How do you account for domain changes? by Fweeky · · Score: 2

      > You can't mod_rewrite a domain name that you have lost control over.

      Nope, that's too bad. You can mod_rewrite a domain name you do have control over, though. You can also see if you can get the new owners to redirect to your new domain.

      If, say, all your URI's start with a date, you ask the new owners to redirect any URI containing '/yyyy/mm/dd' less than the date you lost the domain to your new site. You may not get it all or even most of the time, but the option is still there.
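
      The rule you'd be asking the new owners to install could be as small as the following Python sketch. Everything specific here is an assumption for illustration: the handover date LOST_ON, the new host NEW_HOME, and the helper name redirect_target.

```python
import re
from datetime import date

LOST_ON = date(2002, 5, 1)             # hypothetical date the domain changed hands
NEW_HOME = "http://new-home.example"   # hypothetical new domain

# Paths that begin with /yyyy/mm/dd, optionally followed by more segments.
DATE_PREFIX = re.compile(r"^/(\d{4})/(\d{2})/(\d{2})(/|$)")


def redirect_target(path):
    """Return the new URI for paths issued before the handover, else None."""
    match = DATE_PREFIX.match(path)
    if match is None:
        return None
    try:
        issued = date(*(int(part) for part in match.group(1, 2, 3)))
    except ValueError:
        # Digits in the right shape but not a real calendar date.
        return None
    if issued < LOST_ON:
        return NEW_HOME + path
    return None
```

      Anything the function returns None for stays with the new owners, so they give up nothing they have published themselves.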

      Alas, this is one of those cases URN's would come in handy.

  70. Is this study a joke? by Anonymous Coward · · Score: 0

    I hope that this study is a joke; if not, the horse shit that people churn out is getting worse and worse. I can't believe this made the front of slashdot.

  71. You seem to have some good ideas by drew_kime · · Score: 2

    The analysis of the data is poor; anyone interested in decay would suspect some kind of exponential decay. They would therefore plot the data logarithmically, and perhaps calculate a half life. Piss poor.

    So when can we expect to see your rigorous analysis? Or were you just bitching?
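
    For anyone who wants to try the analysis the quoted comment asks for, here is a minimal Python sketch of a half-life estimate. It assumes you have, for each age t (in months), the fraction of pages still unchanged, and that the decay really is exponential; the numbers in the example are synthetic, generated from a known 6-month half-life, and are not taken from the study.

```python
import math


def half_life(times, fractions):
    """Fit ln(fraction) = -lam * t through the origin; return ln(2) / lam."""
    if len(times) != len(fractions) or not times:
        raise ValueError("need one fraction per time point")
    if any(f <= 0 or f > 1 for f in fractions):
        raise ValueError("fractions must be in (0, 1]")
    # Least-squares slope of ln(fraction) against t, forced through (0, 0).
    numerator = sum(t * math.log(f) for t, f in zip(times, fractions))
    denominator = sum(t * t for t in times)
    lam = -numerator / denominator
    if lam <= 0:
        raise ValueError("data shows no decay")
    return math.log(2) / lam


# Synthetic check: data generated with a 6-month half-life.
months = [3, 6, 12]
remaining = [0.5 ** (t / 6) for t in months]
print(half_life(months, remaining))
```

    Taking the logarithm is what turns an exponential curve into a straight line, which is why the quoted comment suggests plotting the data logarithmically first.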

    --
    Nope, no sig
  72. Re: Google is good but have you tried... by Silentbob54 · · Score: 1

    The Open Directory is the biggest load of crap I have ever seen in my life. I could not find a fucking thing I was looking for on that load of shit.

    --
    Nootch, SilentBob
  73. For you, yes, it is a joke. by scottennis · · Score: 1

    Did you find yourself laughing hysterically?

  74. See this letter I wrote to TimBL by yerricde · · Score: 2

    Once you have put a page on the Web, you need to keep it there indefinitely.

    How is this possible if you happen to lose control of the domain? I wrote a letter to Tim Berners-Lee about this issue.

    In "Cool URIs don't change" you wrote that URIs SHOULD never change. However, you left some questions unanswered:

    In theory, the domain name space owner owns the domain name space and therefore all URIs in it.

    However, what happens when ownership of the domain name is suddenly removed from under a user's feet?

    Except insolvency, nothing prevents the domain name owner from keeping the name.

    Wrong. A trademark owner in Yakkestonia can drag a domain name owner into WIPO court and have the domain forcibly transferred. Under ICANN's dispute resolution policy, the plaintiff gets to pick the court, and WIPO has shown itself to find for the plaintiff in an overwhelming majority of cases. This is sometimes called "reverse domain name hijacking."

    And in theory the URI space under your domain name is totally under your control, so you can make it as stable as you like.

    Not if a hosting provider provides only subdomains (or worse yet, subdirectories) and does not offer an affordable hosting package that lets a client use his or her own domain name. Would it be reasonable to construe the "Cool URIs don't change" article as a warning against using such providers?

    John doesn't maintain that file any more, Jane does. Whatever was that URI doing with John's name in it? It was in his directory? I see.

    What is the alternative to this situation for documents that began their lives hosted on an ISP's or university's server space?

    Pretty much the only good reason for a document to disappear from the Web is that the company which owned the domain name went out of business or can no longer afford to keep the server running.

    Or that the hosting provider pulled the document under a strained interpretation of its Terms of Service because the company didn't like the document's content.

    Is there an official W3C answer to these questions?

    --
    Will I retire or break 10K?
  75. timothy, king of bullshit, strikes again by Anonymous Coward · · Score: 0

    yep.

  76. Exactly who should be proving your thesis by bihoy · · Score: 2

    It may be a valid thesis that you are putting forth. It does occur to me, however, that you seem more interested in having someone else prove it for you. Your rather cursory investigation and lack of basis neither lend credence to your theory nor compel one to take it seriously. If you desire the respect of the scientific community then I suggest you put a little more work and effort into it.

    1. Re:Exactly who should be proving your thesis by scottennis · · Score: 1

      I didn't realize slashdot was considered a scientific community.

    2. Re:Exactly who should be proving your thesis by Anonymous Coward · · Score: 0
      Are you saying that you don't want the respect of the scientific community, but do want the respect of Slashdot?

      If you don't want the science respect then why post this study on angelfire? I think that by writing this thesis and approaching an effort to prove it you are demonstrating a desire to be respected by the scientific community. Otherwise, why not do as the politician and just claim fact? The criticisms given are valid and you would do well to follow them. But you decide to react defensively . . . good advice is hard to come by. Take it when you get it.

  77. Free hosting is a bad bargain by Frank+T.+Lofaro+Jr. · · Score: 4, Insightful

    Why do so many people use crap like Angelfire, Tripod, Homestead with all their bandwidth limits, restrictions, ads and blocking of remote image loads?

    Not to mention that well over 50% of the time any search engine result that points to Angelfire in particular points to a 404 Not Found. This is much more than what I experience with other sites. Do their users get kicked off often, or just go away, or what? I don't even bother clicking on those results unless it looks like the content is truly compelling. And thank God for Google's cache.

    I can understand if some truly can't afford hosting, but even for these people, even Geocities is much better!

    Somehow I doubt the majority of those people using Angelfire, Tripod, etc can't afford hosting.

    Well, after the dot-com world gets a little more squeezed, those sites may no longer exist. Too bad that many people won't bother rehosting their content and will just drop off the web.

    olm.net offers Linux based hosting for under $9/month. No I don't work for them, but I am a (satisfied) customer.

    $9 a month - and you won't piss off your users.

    (Yes I know their other packages are more - but the $9 a month package is better than any of the free services)

    Don't EVEN get me started on organizations and commercial BUSINESSES (ack!) that use free hosting - that is so unprofessional. I don't think I'd want to do business with a company (even a local store) that wouldn't/couldn't pay $9 a month to have a less annoying and more reliable website.

    Of course, some of the content out on the Web isn't even worth $9/month, heck some of it has NEGATIVE worth. ;) Of course, then it isn't worth looking at, so who cares if it is even hosted.

    --
    Just because it CAN be done, doesn't mean it should!
    1. Re:Free hosting is a bad bargain by g0rath · · Score: 1

      Well tripod now has it so that you can pay $9 and ads go away. And they can release the bandwidth limitation. So you have both the free loaders with all the hassles, or you can have more bandwidth for slightly less than an ISP for a network larger than the typical ISP.

    2. Re:Free hosting is a bad bargain by skivvie · · Score: 1

      logjamming.com

      $5/month
      100mb
      Red Hat

      word.

    3. Re:Free hosting is a bad bargain by 56ker · · Score: 2

      "Somehow I doubt the majority of those people using Angelfire, Tripod, etc can't afford hosting." - but for most of the sites - like blogs, pictures of my family and pets - people don't think its worth paying! Also once you change address you lose your search engine rankings.

  78. +5, Troll indeed... by robson · · Score: 1

    It's a ringer for a typical adequacy.org story :)

    (Link omitted deliberately.)

  79. haha! by sulli · · Score: 1

    i love it. who would have thought that the "dying" troll would live so long?

    --

    sulli
    RTFJ.
  80. Re:Obligatory Karma Whore by c_jonescc · · Score: 1

    See above.

    --
    Getting diabetes AND salmonella would be a bad weekend.
  81. Decay - is this a joke :-) by MuskOX · · Score: 1

    Okay - this article is very entertaining but "Decay" is the wrong word for this - the author must have been studying capacitors and the decay rate of charge on a capacitor. I would suggest substituting "decay" for "charge".

  82. Completely Incorrect! by Anonymous Coward · · Score: 0

    I can't believe this made Slashdot!

    This seems to be a complete misinterpretation of the data. Is he saying that 1 year ago there were more hits than 6 months ago, and 6 months ago more hits than 3 months ago?

    The Google search is returning all valid pages within the past 3, 6, and 12 months. So, all of the current pages are listed in the 12 month search also.

    From the data he has provided, it is possible to interpret either that the number of pages is in constant decrease, or that the number is in constant increase (with old pages being removed or relocated).

    Using his data for "home run king", 7070, 7520, 7920, 8900:
    You could say that more than a year ago, there were only 980 hits (anytime - 12 months)
    Then, 6 months to a year ago, there were 400 added (12 months - 6 months)
    Then 3 months to 6 months ago, 450 were added (6 months - 3 months)
    And, in the last 3 months, 7070 pages were added (3 month value)

    This shows a constant increase! Sure, this is highly unlikely, but it is a possible way these hits were gathered. His data collection gives no way to tell how many pages have been removed between periods, or how many were relocated.
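
    The subtraction above can be written out as a short Python sketch. The four counts are the "home run king" figures quoted in this comment (3 months, 6 months, 12 months, anytime); reading each difference as "pages added in that period" is this comment's interpretation, not something the search results establish. The function name is made up.

```python
def bucket_counts(cumulative):
    """Turn cumulative hit counts (narrowest window first) into per-period counts."""
    buckets = [cumulative[0]]
    for previous, current in zip(cumulative, cumulative[1:]):
        buckets.append(current - previous)
    return buckets


# 3 months, 6 months, 12 months, anytime
print(bucket_counts([7070, 7520, 7920, 8900]))  # [7070, 450, 400, 980]
```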

    Why did this get posted?

    vk

    1. Re:Completely Incorrect! by microchp · · Score: 1

      I am still curious how this takes into account the data which is automatically regenerated on purpose, and also the spider intervals of the search engines. Most of them are overloaded with not to mention many are tainted due to "pay for higher rankings" content. --mcp am

      --
      --mcp
  83. Decay and archives by ferreth · · Score: 1

    "Decay" would be more along the lines of X% of links become dead after 3 months. You'd have to collect a bunch of live links from various search terms and check ALL of them 3,6,12 months down the road and see if they're still there. 60% is more a measure of changed/new content in the last three/whatever months. At least the web isn't stagnating.

    What about archives? They should not care about being 'fresh' beyond adding stuff to the archive. I want to be able to bookmark something in an archive for future reference and be able to come back to it in three years and still find it there, just like a library.

    The argument that web sites should change 60% of their content in order to keep up with the average is like saying we should all be wearing puke-green colored clothes because that's the average color of the universe - the reason has nothing to do with reality. Web content should be as 'fresh' as the information being provided demands of it. Weather forecasts should change daily, stockmarkets - hourly, slow pitch standings - monthly, and so on.
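
    The link check described at the top of this comment (collect live links, then re-test all of them at 3, 6 and 12 months) could be sketched in Python roughly like this. It is only an outline under stated assumptions: is_alive treats any HTTP or network error on a HEAD request as "dead", which will miscount servers that reject HEAD, and both function names are made up for the example.

```python
import urllib.request


def is_alive(url, timeout=10):
    """True if a HEAD request succeeds; HTTP and network errors count as dead."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout):
            return True
    except (OSError, ValueError):
        # URLError and HTTPError are OSError subclasses; ValueError covers bad URLs.
        return False


def dead_fraction(urls, check=is_alive):
    """Fraction of the given links that fail the check."""
    urls = list(urls)
    if not urls:
        raise ValueError("no links to check")
    dead = sum(1 for url in urls if not check(url))
    return dead / len(urls)
```

    Run dead_fraction over the same saved list of links at each checkpoint and you get an actual dead-link percentage per interval, which is closer to "decay" than a count of recently changed pages.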

    --

    W9x:Thanks for the make-work project Bill.

  84. Someone mod it that way by Anonymous Coward · · Score: 0

    Mod it -1 Troll, then post in the discussion, for the coveted and richly deserved "5, Troll" score.

  85. Irony! by gnovos · · Score: 2

    I love how this very page seems to have died... The web is a massive irony generator.

    --
    "Your superior intellect is no match for our puny weapons!"
  86. Just wanted to remind you that... by Anonymous Coward · · Score: 0

    ...Bill Gates is the Devil! the Devil! Thank you for your attention.

  87. Using Google to Calculate Public Opinion by lostchicken · · Score: 1

    Has anyone written a script to try to figure out if a text message is a positive one or a flame? Shouldn't be too hard (you'd toss out A LOT of 'unknowns').

    If someone has, you could graph the ratio of positive/negative posts to USENET for a set of keywords over time.

    One could also graph the total volume. That would be much easier.
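
    A very naive version of such a script, as a Python sketch: count words from two small hand-picked lists and report "unknown" when neither side wins. The word lists and the function name are invented for the example, and a real classifier would need far more than this.

```python
import re

# Tiny, hand-picked word lists; purely illustrative.
POSITIVE = {"great", "love", "thanks", "excellent", "useful"}
NEGATIVE = {"sucks", "crap", "hate", "stupid", "worthless"}


def classify(message):
    """Label a message 'positive', 'flame', or 'unknown' by keyword counts."""
    words = re.findall(r"[a-z']+", message.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "flame"
    return "unknown"
```

    As predicted, most messages would land in "unknown" with lists this short, but the ratio of the other two labels over time is what you'd graph.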

    --
    -twb
  88. The first last post of the new revolution by The+Last+Post · · Score: 1

    Down with Michelle's fascist regime of censorship! Long live the Widener!

  89. What's the oldest link on the web? by Enocasiones · · Score: 1
    Not the oldest, but the first one for something OT. Then there's the oldest link, couldn't be any of these, but could be in here.

    This is a difficult question to answer, but the answer is full of totally unrelated semi-googlewhacks and curious links.

    --
    Enoc
  90. Using Google to Solve Age Old Disputes by geoff123 · · Score: 1

    Try this out! It's a PHP script using the Google API. Now you can discover if the world likes dogs better than cats, and sex better than love (duh).