Slashdot Mirror


Using Google to Calculate Web Decay

scottennis writes: "Google has yet another application: measuring the rate of decay of information on the web. By plotting the number of results at 3, 6, and 12 months for a series of phrases, this study claims to have uncovered a corresponding 60-70-80 percent decay rate. Essentially, 60% of the web changes every 3 months." You may be amused by some of the phrases he notes as exceptional, too.

7 of 208 comments

  1. Re:Not exactly decay... by Anonymous Coward · · Score: 1, Insightful

    I think that the larger organisms are renewing themselves on a regular basis as well. If you look at large sites - any of the Microsoft bundle, BBC News, Financial Times - they are all changing from hour to hour, or maybe day to day for the non-news pages.

    It's the medium size businesses that don't seem to be grasping the web and the fact that you need to have a site that is dynamic in so far as it keeps people interested and possibly entertained.

    I'm lucky in that the company I work for is a small firm and a publisher, so we have daily news content as well as on-line versions of our weekly and monthly publications (HTML and PDF downloads!) being uploaded all the time - so our web traffic is growing constantly - slowly, but it hasn't seen a decline in the past two years.

    M@t :o)

  2. we've lost the ability to rely on hyperlinks by thegoldenear · · Score: 5, Insightful

    Tim Berners-Lee wrote: "There are no reasons at all in theory for people to change URIs (or stop maintaining documents), but millions of reasons in practice." (http://www.w3.org/Provider/Style/URI) He advocated creating a web where documents could last, say, 20 years and more.

  3. Re:bill gates sucks... by prizzznecious · · Score: 4, Insightful

    All this means, actually, is that the sites that would include the information "Bill Gates Sucks" are not being updated very often, or have little else to say.

    It's an indicator of the dubious kind of context in which one finds such rash statements.

    --

    visit the hwky website for a lyrical genius infusion.
  4. Study? by Anonymous Coward · · Score: 4, Insightful

    Wow! What a wonderful, in-depth study! Is there any link to a scientific paper on that page that I am missing, or is that everything? I mean, how can someone claim something just by showing us a few numbers and an Excel graph?

    I appreciate the topic very much, but some more material on it is needed. This study wouldn't be complete enough even for high-school homework...

    And look at his homepage (just remove the last part of the URL). Most of the pages are more than two years old... that's decay! :)

    Seriously speaking, just look for a few more sources before you accept a story.

  5. Study claims ?? by Anonymous Coward · · Score: 1, Insightful

    "this study claims to have uncovered a corresponding 60-70-80 percent decay rate. Essentially, 60% of the web changes every 3 months."

    The guy that submitted this story is the guy that did the study.

  6. Thought and mod_rewrite are the key by Fweeky · · Score: 5, Insightful

    The key to making links that don't rot is to design a URI schema that's both independent of any redesigns of your site and independent of any particular way of doing things.

    Let's look at a few examples.

    The URI to this page is http://slashdot.org/comments.pl?sid=31884&op=Reply&threshold=3&commentsort=3&tid=95&mode=nested&pid=3434535 - what is it telling you that it doesn't need to?

    Well, for a start, that .pl is a bad idea. What happens in 4 years' time when Slashdot is running on PHP, or Java, or Perl 7, or a Perl Server Page, or ASP? Then there's the difficult-to-decode query string that tells you nothing about the link other than "this is the information the server needs to locate your page at the moment", and doesn't give you much faith in it living forever.
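
    As a rough sketch of what a rewrite layer could do with such a link, the Python below maps the script-and-query-string form onto an extensionless path. It is only an illustration: the choice of sid and pid as the identifying parameters, the function name to_stable_path, and the resulting /comments/... layout are assumptions, not anything Slashdot actually uses.

```python
from urllib.parse import urlsplit, parse_qs

# Assumption: only these parameters identify the resource; the rest
# (threshold, commentsort, mode, ...) are treated as display preferences.
IDENTIFYING_PARAMS = ("sid", "pid")


def to_stable_path(url: str) -> str:
    """Map a script-and-query-string URL onto a hypothetical clean path."""
    parts = urlsplit(url)
    # Drop the implementation detail: "/comments.pl" -> "comments"
    name = parts.path.strip("/").rsplit(".", 1)[0]
    query = parse_qs(parts.query)
    segments = [name]
    for key in IDENTIFYING_PARAMS:
        if key in query:
            segments.append(query[key][0])
    return "/" + "/".join(segments)


legacy = ("http://slashdot.org/comments.pl?sid=31884&op=Reply"
          "&threshold=3&commentsort=3&tid=95&mode=nested&pid=3434535")
print(to_stable_path(legacy))  # /comments/31884/3434535
```

    The public path no longer says which language serves the page, so the script behind it can change without breaking the link.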

    Now let's look at an equivalent Kuro5hin URI.

    http://www.kuro5hin.org/comments/2002/4/29/22137/6511/51/post#here is a URI to reply to a random comment on k5.

    For a start, you can't tell what application or script is serving you the page, and you can't see what type of file it's linking to; both these things can and will change over time.

    Second, there's a date embedded in there; you can see the developers, if they ever decide to change the meaning of '/comments', using that date as a reference; if the URI is before the change, they can map it onto the new schema or pass it onto legacy code.

    Having the date in the URI is good because it allows you to determine when the link was issued, and map it onto any changes or pass it off to a legacy system as required.
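
    A minimal sketch of that mapping, assuming a path shaped like /comments/YYYY/M/D/... as in the k5 link. The change dates, the schema names, and the function names here are hypothetical and only serve to show the lookup.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical history: the dates on which the meaning of '/comments' changed.
SCHEMA_CHANGES = [date(2001, 6, 1), date(2003, 1, 15)]
SCHEMA_NAMES = ["v1", "v2", "v3"]  # one more name than there are changes


def issue_date(path: str) -> date:
    """Read the date out of a path shaped like /comments/YYYY/M/D/..."""
    _section, year, month, day, *_rest = path.strip("/").split("/")
    return date(int(year), int(month), int(day))


def schema_for(path: str) -> str:
    """Pick the schema that was in force when the link was issued."""
    return SCHEMA_NAMES[bisect_right(SCHEMA_CHANGES, issue_date(path))]


print(schema_for("/comments/2002/4/29/22137/6511/51/post"))  # v2
```

    Whatever schema_for returns would then decide whether the request is remapped onto the new layout or handed to legacy code.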

    Now let's take an apparently good link on my now horribly out of date site, aagh.net.

    http://www.aagh.net/php/style/ links to an article on PHP coding style.

    Certainly, hiding the fact that I'm using PHP to serve this document is good, and shortening the URI to remove the useless query string is good (you can't see one? Good, that's the point). However, this URI may well stop working in a few weeks; I'm planning a redesign and the old schema may well not fit in with it.

    A short yyyymm in there could have made all the difference; a simple if check on the URI's issue date would keep it working.
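
    Such a check could look roughly like the sketch below. The /YYYYMM/ prefix, the REDESIGN cutover month, and the /legacy and /current targets are all made up for illustration; they are not how aagh.net is laid out.

```python
import re

# Hypothetical: the month in which the redesigned site goes live.
REDESIGN = 200206


def route(path: str) -> str:
    """Return the internal location for a path shaped like /YYYYMM/rest."""
    match = re.fullmatch(r"/(\d{6})(/.*)", path)
    if match is None:
        raise ValueError(f"no issue date in {path!r}")
    issued, rest = int(match.group(1)), match.group(2)
    if issued < REDESIGN:
        # Link predates the redesign: serve it from the archived layout.
        return "/legacy" + rest
    return "/current" + rest


print(route("/200204/php/style/"))  # /legacy/php/style/
```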

    The moral of the story: Think about your URIs when you're designing a site. Try to remove as much data as you can without painting yourself into a corner.

  7. Free hosting is a bad bargain by Frank T. Lofaro Jr. · · Score: 4, Insightful

    Why do so many people use crap like Angelfire, Tripod, Homestead with all their bandwidth limits, restrictions, ads and blocking of remote image loads?

    Not to mention that well over 50% of the time any search engine result that points to Angelfire in particular points to a 404 Not Found. This is much more than what I experience with other sites. Do their users get kicked off often, or just go away, or what? I don't even bother clicking on those results unless it looks like the content is truly compelling. And thank God for Google's cache.

    I can understand if some truly can't afford hosting, but even for these people, even Geocities is much better!

    Somehow I doubt the majority of those people using Angelfire, Tripod, etc can't afford hosting.

    Well, after the dot-com world gets a little more squeezed, those sites may no longer exist. Too bad that many people won't bother rehosting their content and will just drop off the web.

    olm.net offers Linux based hosting for under $9/month. No I don't work for them, but I am a (satisfied) customer.

    $9 a month - and you won't piss off your users.

    (Yes I know their other packages are more - but the $9 a month package is better than any of the free services)

    Don't EVEN get me started on organizations and commercial BUSINESSES (ack!) that use free hosting - that is so unprofessional. I don't think I'd want to do business with a company (even a local store) that wouldn't/couldn't pay $9 a month to have a less annoying and more reliable website.

    Of course, some of the content out on the Web isn't even worth $9/month, heck some of it has NEGATIVE worth. ;) Of course, then it isn't worth looking at, so who cares if it is even hosted.

    --
    Just because it CAN be done, doesn't mean it should!