Slashdot Mirror


Web Pages Are Weak Links in the Chain of Knowledge

PizzaFace writes "Contributions to science, law, and other scholarly fields rely for their authority on citations to earlier publications. The ease of publishing on the web has made it an explosively popular medium, and web pages are increasingly cited as authorities in other publications. But easy come, easy go: web pages often get moved or removed, and publications that cite them lose their authorities. The Washington Post reports on the loss of knowledge in ephemeral web pages, which a medical researcher compares to the burning of ancient Alexandria's library. As the board chairman of the Internet Archive says, "The average lifespan of a Web page today is 100 days. This is no way to run a culture.""

361 comments

  1. Worst Record Keeping by nberardi · · Score: 1

    I really think we are living in a world right now of some of the worst record keeping of knowledge.

    1. Re:Worst Record Keeping by klokwise · · Score: 2, Funny

      i really hope you have some evidence to back that up.

    2. Re:Worst Record Keeping by Urkki · · Score: 2, Interesting

      Nah. There was a time when only very very few could even read, let alone write, let alone keep any kind of records...

      But I get your point. Too bad there are some restrictions on copying the web pages you are referencing...

      There should be some service, a bit like google's cache, you could use to store the referenced pages. I submit the page to the service, then provide two links in my own document, one to the original page (which will likely expire eventually) and one to the cached version. I wonder if they could get around copyright issues the same way google cache gets around them, even though this is a bit more permanent storage than google cache... Most web page authors certainly would not have any problem with having their pages archived there, quite the opposite, most would be happy to have their work referenced by others...
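
      The two-link idea above can be sketched roughly as follows. This is only an illustration: the `ArchivedCitation` class, the `cite` helper, and the `<archive_base>/<YYYYMMDD>/<original URL>` layout are all hypothetical, not the interface of any real archive service.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ArchivedCitation:
    """A reference that carries both the live link and a frozen copy."""
    original_url: str       # may expire or change over time
    archived_url: str       # snapshot held by the (hypothetical) archive
    retrieved_at: datetime  # when the snapshot was taken

    def render(self) -> str:
        stamp = self.retrieved_at.strftime("%Y-%m-%d")
        return (f"{self.original_url} "
                f"(archived copy retrieved {stamp}: {self.archived_url})")


def cite(original_url: str, archive_base: str,
         retrieved_at: datetime) -> ArchivedCitation:
    # Assumed URL layout: <archive_base>/<YYYYMMDD>/<original URL>
    stamp = retrieved_at.strftime("%Y%m%d")
    archived = f"{archive_base.rstrip('/')}/{stamp}/{original_url}"
    return ArchivedCitation(original_url, archived, retrieved_at)


if __name__ == "__main__":
    when = datetime(2003, 11, 24, tzinfo=timezone.utc)
    ref = cite("http://example.org/paper.html", "http://archive.example/", when)
    print(ref.render())
```

      Keeping the retrieval date in the record means a reader can tell which version of the page the citing document actually relied on, even if the live page has since been revised.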

    3. Re:Worst Record Keeping by robslimo · · Score: 5, Interesting

      Ummm, maybe only as applies to this topic, which is to say that web pages are a poor place to keep records.

      I'd contend that researchers & scientists in general would be quite silly to cite an electronic-only resource in their publications, because the persistence of that resource relies on too many factors (the whim of the webmaster, backups or lack thereof, fiber-seeking and grid-seeking backhoes, etc).

      I think that will all sort itself out and real scientists will continue or return to citing more traditional resources.

      What I think is much more disturbing and disruptive is the pseudo-science and misinformation that is overly abundant on the web. Too many web sites, personal and commercial, spout 'facts' in such great detail that they have the appearance of authority. Too often, novice/amateur scientists can be seriously misled by some of the crap that can be found on the web masquerading as 'science'.

    4. Re:Worst Record Keeping by richy+freeway · · Score: 5, Funny

      I had some evidence to back it up but all the links are long dead ;P

    5. Re:Worst Record Keeping by Anonymous Coward · · Score: 0

      It's a self-reinforcing Conspiracy!

      cool.

    6. Re:Worst Record Keeping by gilrain · · Score: 1

      Unfortunately, this method would throw out the good with the bad. If the website you submitted to the archive did not expire immediately, it would probably change for the better; and your referenced copy would not reflect the changes. Essentially, you would be referencing two different versions of the same work.

    7. Re:Worst Record Keeping by NickFitz · · Score: 1

      But unless you were revising your own work to reflect those changes, surely you should continue to reference the old version?

      This would be akin to (off the top of my head) citing a reference in the first edition of A Vision by the poet W B Yeats. As the second edition was a complete rewrite bearing virtually no similarity in either argument or conclusions to the first, updating one's references to the second edition would not only be undesirable, it would probably be impossible.

      --
      Using HTML in email is like putting sound effects on your phone calls. Just say <strong>no</strong>.
    8. Re:Worst Record Keeping by gilrain · · Score: 1

      My point is that the beauty of referencing internet material is its flexibility. A reference which improves itself and corrects its own mistakes is a wonderful thing. There must be a way to solve the problem which doesn't throw out the benefits of electronic documents altogether. If there's no benefit over paper media, why come up with a solution at all?

    9. Re:Worst Record Keeping by boneglorious · · Score: 2, Insightful

      You make a good point about the abundance of misinformation on the web, and that's another problem that needs to be looked at, but I disagree with "this will all sort itself out and real scientists will continue or return to citing more traditional resources." We have an incredible resource here (the internet) for disseminating information, and to ignore it would be something that's really not going to happen. We need to solve problems like this so we can take advantage of the benefits offered by the internet.

      --
      Can I mod something +1 Scary if it's true but I wish it weren't?
    10. Re:Worst Record Keeping by BrokenHalo · · Score: 1
      As far as the (mostly conservative) scientific community is concerned, the traditional resources will continue to thrive. These resources are increasingly becoming available on an online basis (almost 100% in my own area), partly because of the storage space required for print media. Since most of the more important or frequently-cited journals are available on a subscription-only basis, the proprietors have an interest in making sure that information is available at any time to those who are willing to pay for it.

      While the idea is attractive, we can't expect much in the way of GPLing of research.

      Essentially, there is little difference now from the days when if you wanted a journal article, the quickest way to get it was to write (snail-mail) to the author asking for a copy. It's just got a bit quicker, that's all.

    11. Re:Worst Record Keeping by LiquidCoooled · · Score: 2, Interesting

      The solution suggested seems perfectly reasonable to me.

      Having an archived copy allows the references to be valid and in context, whilst giving the original link allows for the updated and refreshed page to be expanded upon.

      All it takes is a header on the archive stating that this snapshot was taken at a certain time, and from a certain URL.

      I'm not sure if archive.org already does similar, but the action of merely *searching* the archive for a page should send the scan bots out onto that page. This way it becomes a simple operation.

      I would push for the archive to be compulsory and above copyright - ALL the content continues to be the property of the original owner. Nobody should be able to remove data from the archive for any reason - if you posted it publicly, then you expect it to be cached.

      --
      liqbase :: faster than paper
    12. Re:Worst Record Keeping by drooling-dog · · Score: 2, Interesting
      I'd contend that researchers & scientists in general would be quite silly to cite an electronic-only resource in their publications

      I don't necessarily see a problem here, as long as serious academic research is maintained online by trusted, stable parties. That's not demanding any more than we have up to now with a print-based distribution system, since that depends on the continuity of a large network of brick-and-mortar libraries (and associated infrastructure) to function effectively. Imagine how difficult things would look if we were going in the opposite direction technologically!

      As for the volume of dreck available on the web... Well, that's been equally true of print media, something I'm reminded of whenever I stand in a grocery checkout line. Credibility will always be judged by the trustworthiness of the source.

    13. Re:Worst Record Keeping by NickFitz · · Score: 1

      Agreed that improvement is a good thing, but not if it pulls the rug out from under you. I incline more to the idea that revised versions of information should be available alongside older versions. This has several benefits:

      • Others can see the steps on the journey to the "definitive" statement, which allows them to avoid any blind alleys in their own approach;
      • Conversely, somebody looking at what was passed over in the early stages of development of an idea might find fruitful alternative lines of enquiry;
      • It can be fun seeing the complete rubbish a supposed genius came up with in their younger days.

      Somebody else on this thread has posted a link to Tim Berners-Lee's article Cool URIs Don't Change, which I think makes a good case for using date stamping as the best way of maintaining content even when new and better versions are available.

      --
      Using HTML in email is like putting sound effects on your phone calls. Just say <strong>no</strong>.
    14. Re:Worst Record Keeping by lastninja · · Score: 1

      That's just what Pierre de Fermat said in 1637. Apparently the bandwidth at the time was terrible, and links were often lost to something called "le effect de slashdot". Not that that was news or that it mattered.

      --
      John Carmack fan, browsing at +5 since 1999.
    15. Re:Worst Record Keeping by robslimo · · Score: 1

      Assuming we're talking about professional science publications with cites to prior works, I think the problem was solved (or a solution provided) a long time ago. Publish in a publicly recognised peer journal or forum, citing your references from same or similar publications. It is the service (and responsibility?) of the journal or forum to maintain such articles for future reference.

      Scientists citing sources from the web but outside the accepted peer-review quarters are being irresponsible unless it is made clear that the resource is anecdotal only.

      Let's say you were researching the possible health effects of electromagnetic radiation from CRTs and CPUs. Plenty of research has been done on this or similar in the past. You google and find studies referenced at both www.nature.com and at tomshardware.com. Which do you use and how? I'd use Nature first and tomshardware only if it contained strong or interesting anecdotal evidence and only if I presented it as such. I have little sympathy for the folks in the article whose links to 'medical research' on the web grew stale. They weren't doing very professional research, nor did the people they were citing (assuming that data couldn't be found elsewhere).

    16. Re:Worst Record Keeping by caluml · · Score: 1

      When I write something I often simply refer the reader to Google. e.g:

      Jabber is good as an IM technology as it supports SSL, and GPG, it is decentralised, and open.

      This way the reader can peruse all the **current** available info about Jabber, as well as seeing any criticisms that people have written too.

    17. Re:Worst Record Keeping by sandstress · · Score: 2, Interesting

      Slightly off topic but related are the efforts of researchers to create Public Knowledge Projects (PKP), such as John Willinsky's, where the effort is to make research that affects the public accessible and understandable to the public. Stability of links to documents and opening up citations is key to trying to develop these sites. So this is a challenge. You would almost need a completely self-contained site, meaning you somehow provide duplicates of necessary links.

    18. Re:Worst Record Keeping by cpghost · · Score: 1

      I'd contend that researchers & scientists in general would be quite silly to cite an electronic-only resource in their publications,

      It actually happens all the time. Most recent papers cite other papers by URL in their bibliography. As long as they cite correctly, and there is an alternative way to obtain those papers (such as printed journals), that's fine. Unfortunately, sometimes only URLs are published, some of them not active anymore.

      The worst happens when you try to track down the author of a cited paper, and are not able to locate him/her. Universities used to maintain usable registries and good archives, but if you don't know where to start, you're SOL.

      --
      cpghost at Cordula's Web.
    19. Re:Worst Record Keeping by maiden_taiwan · · Score: 1
      "I'd contend that researchers & scientists in general would be quite silly to cite an electronic-only resource in their publications..."

      Given that the peer-review process for a journal article can take several years, there might not be a non-electronic reference for what you want to cite.

    20. Re:Worst Record Keeping by Anonymous Coward · · Score: 0

      And with good reason: We are living in a world right now which produces more knowledge every day than ever before. Human beings can't handle that much knowledge and deletion/loss is a necessary process which separates the useful information from information which may be useful some time in the future but is easier to regenerate than drag along until that time. Sure, with better technology we can handle bigger archives, but I've always felt that the web isn't an archive but an asynchronous communications medium.

    21. Re:Worst Record Keeping by Epsilons · · Score: 1

      I would think researchers & scientists cite more traditional resources because of their authority, like some kind of prevailing judgement in that field. Still, an e-print service like ArXiv has become some kind of standard for researchers; it is maintained by serious people, releases the latest developments, keeps track of interesting competitions, and occasionally carries some funny articles. It is quite helpful if it is run seriously.

    22. Re:Worst Record Keeping by Vitus+Wagner · · Score: 1

      wouldn't http://wew.webarchive.com do?

    23. Re:Worst Record Keeping by Urkki · · Score: 1

      What kind of guarantee is there that it'll be around any longer than the page you'd want to archive there? I'd check what they say on their page, but the site does not respond to me.

      And I did s/wew/www/ in the url, and even tried accessing it from 2 completely different domains, university and work... So everything else aside, it does not appear to be a very reliable site in the first place.

  2. Thats why.. by panxerox · · Score: 1

    I've started to keep archived copies of webpages instead of links; the next time you want a page, it's gone. Unfortunately you can't share them like links.

    --
    "It's so convenient to have a system where everyone is a criminal" - A. Hitler
    1. Re:Thats why.. by Anonymous Coward · · Score: 0

      Yeah, this is another problem that doesn't really exist. If you are doing something that you seriously consider research it is not a problem to copy an entire web page or even an entire site to a local folder.
      What was the point again?

    2. Re:Thats why.. by Urkki · · Score: 1

      I think you're in violation of copyright law! Please stand still and wait for a strike team from the local lawyer station to arrive and arrest you, while their research team finds out whose copyright you're infringing upon, i.e. who should get 10% of the profit of suing you.

  3. Well, by jeffkjo1 · · Score: 5, Interesting

    Really, is there a reason to archive everything in the world? Sure, your 4 year old has some pretty drawings, but should they be put in a library someplace?

    100 years from now, should anyone be forced to accidentally stumble over goatse? (which is very disturbingly archived on archive.org)

    1. Re:Well, by fredrikj · · Score: 4, Insightful

      100 years from now, should anyone be forced to accidentally stumble over goatse? (which is very disturbingly archived on archive.org)

      Do you really think goatse will be "disturbing" 100 years from now? Only 40 years ago, people thought the Beatles were disturbing :P

    2. Re:Well, by Xzzy · · Score: 1

      > Sure, your 4 year old has some pretty drawings, but should they be put in a library someplace?

      SSSSHHH.

      Don't you see where this is going? The next obvious step is government installed "web vaults" where people can submit their oh-so-valuable chicken scratchings and they will be stored, under the same URL, for eternity.

      No more geoshitties man, we're talking lifetime free webspace for every citizen in the US!

    3. Re:Well, by operagost · · Score: 5, Insightful

      Do you really think goatse will be "disturbing" 100 years from now?

      The day goatse.cx is no longer disturbing, is sure to be the first day of Armageddon ...
      --

      Gamingmuseum.com: Give your 3D accelerator a rest.
    4. Re:Well, by GeorgeH · · Score: 5, Insightful
      100 years from now, should anyone be forced to accidentally stumble over goatse?
      The fact that you and I can refer to goatse and people know what we're talking about means that it's an important part of our shared culture. I think that anything that archives the good and bad of a culture is worth keeping around.
      --
      Why can't I moderate something "Wrong" or at least "Grossly Misinformed"?
    5. Re:Well, by mlush · · Score: 5, Interesting
      Sure, your 4 year old has some pretty drawings, but should they be put in a library someplace?

      I would be fascinated to see my Great Grandad's first drawings, his school web page, his postings to USENET. I only knew him as an old man ....

      To a historian often the most interesting stuff is the ephemera; the diary of an ordinary person gives a view of everyday life you will never get looking at 'formal' archives (i.e. newspapers, film libraries etc etc), which only cover 'important' stuff.

    6. Re:Well, by plague3106 · · Score: 1

      Maybe. I know archaeologists like to find all assets of a society, including children's toys. So yes, maybe the 4 year old's drawing would be worth something to someone wanting to know more about our early childhood.

    7. Re:Well, by 4of12 · · Score: 4, Interesting

      Really, is there a reason to archive everything in the world?

      No, only the good stuff needs to be saved. So what's good and who should save it?

      IMHO, anything that gets officially referenced by another work should be saved.

      That burden should not fall upon the original creator of the referenced work; it should fall upon the creator of the referring work.

      Despite all the hue and cry about lost revenue opportunities from controlled distribution of copyrighted information, knowledge preservation and the overall benefit to society would improve if works were able to save a local cache of referenced works.

      This would also help with the problem of morphing or revisionist works. Some works can be improved by editing (something around here comes to mind), but it would be inappropriate to change old web pages that show an earlier mistake in thinking, to show that somehow someone was particularly prescient, or to erase knowledge for a political agenda (a la Stalin).

      Just a couple of days ago I was able to retrieve an old recipe from the Google cache that had been summarily removed from a web site due to some time retention policy. An attempt to encourage repeat visits to the website because stuff disappears was circumvented. I would have been particularly annoyed with that website were it not for the delayed action of the Google cache. Google may have enabled circumvention of their policy, but they would have garnered a lot more ill will from me if their policy were effective.

      Guess what? References in scientific papers I write are not just available in libraries capable of paying $1K/year subscription rates, but as photocopies in my file cabinet. That is, I have a local cache of referenced works already.

      If a colleague's library did not have the specified volume and journal article, I would let him have a copy for the asking. It's a copyright violation, I know, but I'm not convinced that strict adherence to copyright laws in this case provides the best overall benefit to society.

      --
      "Provided by the management for your protection."
    8. Re:Well, by sql*kitten · · Score: 4, Insightful

      To a historian often the most interesting stuff is the ephemera; the diary of an ordinary person gives a view of everyday life you will never get looking at 'formal' archives (i.e. newspapers, film libraries etc etc), which only cover 'important' stuff.

      If you like that, you might like the books by the historian Fernand Braudel. Rather than the "kings and battles" of most histories, he focusses on how very simple things like the foods people ate, the weather, etc, and the relationships between long-term trends and the emergent properties of those interactions (i.e. over decades or centuries) are responsible for shaping the course of history.

    9. Re:Well, by Anonymous Coward · · Score: 0
      I would be fascinated to see my Great Grandad's first drawings, his school web page, his postings to USENET. I only knew him as an old man ....

      ...and don't forget, the day Great Grandad took a photo of himself and a goat, registered the GOATSE.CX domain, uploaded the picture, and then proceeded to tell everyone he knew about it. Yeah, those were the days...

    10. Re:Well, by mlush · · Score: 1
      ...and don't forget, the day Great Grandad took a photo of himself and a goat, registered the GOATSE.CX domain, uploaded the picture, and then proceeded to tell everyone he knew about it. Yeah, those were the days...

      This is a great one to hold in reserve for when the old bugger starts one of his endless 'the youth of today have no morals' rants

      archive.org keeps people honest. Anyone with a desire for politics should have a care what they post online; it may come back to haunt them.

    11. Re:Well, by drooling-dog · · Score: 2, Insightful
      Do you really think goatse will be "disturbing" 100 years from now? Only 40 years ago, people thought the Beatles were disturbing

      I wouldn't rule it out. There are people who are working very hard now to drag us all back into a new era of ignorance and superstition. Can they succeed? Maybe not, but things were pretty wide-open in the 20s, and then look what happened!

    12. Re:Well, by dubiousmike · · Score: 1

      I remember the first time I actually followed a goatse link from Slashdot, having no idea what it would be.

      Of course, I was at work. bleh

    13. Re:Well, by gavri · · Score: 1

      The day goatse.cx is no longer disturbing, is sure to be the first day of Armageddon ...

      "Nobody will ever need more than 640k RAM!" -- Bill Gates, 1981

    14. Re:Well, by IM6100 · · Score: 1

      It's 'ignorance and superstition' to find it repellant that someone has abused their rectum to the point where it's a big pouty thing that bulges out like a set of lips?

      Most of the people who I see 'promoting ignorance and superstition' do it under the cover of being 'open minded.' They insist that we view them as being enlightened for embracing arcane flavors of ignorance and superstition. Example: neopaganism.

      --
      A Good Intro to NetBS
    15. Re:Well, by Walrus99 · · Score: 2, Funny

      Isn't goatse where the Beatles got the inspiration for their song "The End"?

    16. Re:Well, by NanoGator · · Score: 1

      "Nobody will ever need more than 640k RAM!" -- Bill Gates, 1981"

      Bill Gates never said that, yadda yadda yadda.

      --
      "Derp de derp."
    17. Re:Well, by IM6100 · · Score: 1

      Well, the concept of 'horse shit' can be defined, it can be memorialized, without the need for there to be a tin of 'horse shit' preserved under ideal conditions so it stays ever moist and fragrant. The Smithsonian doesn't need to commission a hermetically sealed display case to show future people what 'horse shit' is (I hope).

      --
      A Good Intro to NetBS
    18. Re:Well, by NanoGator · · Score: 1

      "Really, is there a reason to archive everything in the world? Sure, your 4 year old has some pretty drawings, but should they be put in a library someplace?"

      Questions like these are only interesting if space is limited. Between JPEG and the ridiculous capacity of hard drives these days, it's really not a BFD if your 4 year old's pretty drawings were permanently archived. If the space is available, why not?

      --
      "Derp de derp."
    19. Re:Well, by NanoGator · · Score: 4, Funny

      "The fact that you and I can refer to goatse and people know what we're talking about means that it's an important part of our shared culture."

      Amazing that the most remembered asshole of the dawn of the 21st century isn't Michael Eisner or Jack Valenti.

      --
      "Derp de derp."
    20. Re:Well, by gavri · · Score: 1

      Bill Gates never said that, yadda yadda yadda.
      How would you know? This was the summer of '81 when he came to visit me. He also said a lot of other things ("I've made a deal with the beast" or something of that kind). I wasn't really paying attention.

    21. Re:Well, by Blue+Stone · · Score: 3, Funny

      Bill Gates DID say that. Here's a link that proves it.

      --
      Corporation, n. An ingenious device for obtaining individual profit without individual responsibility. - Ambrose Bierce
    22. Re:Well, by Anonymous Coward · · Score: 0

      the day that we take baths more than once or twice a year is sure to be the first day of Armageddon.

    23. Re:Well, by Anonymous Coward · · Score: 0

      "The day goatse.cx is no longer disturbing, is sure to be the first day of Armageddon ..."

      If you are just continuing a run on joke then ok. But if you are even a little serious; know that I did not find goatse.cx disturbing - I was surprised THAT was what so much fuss was about.

      As a child I read Trablinka (sp?) (about a Nazi extermination camp in WWII). As an adult I read about Hitler, Stalin, Mao, Genghis, and so on. I just ran across Billie Holiday's "Strange Fruit", read up on its history, influence and so forth after downloading the mp3 song itself.

      And after all what man has done to man I'm to be disturbed at goatse? Please...

    24. Re:Well, by Anonymous Coward · · Score: 0
      The day goatse.cx is no longer disturbing, is sure to be the first day of Armageddon ...

      That makes three +5 comments specifically about goatse... it already is Armageddon.

    25. Re:Well, by fifedrum · · Score: 1

      my particular hobby is one that values quite a bit exactly what you describe. For example, The fifer or drummer from the French and Indian War scribbling notes to a friend describes his day starting out playing three camps and moving on to fatigue call, and (most importantly) writing it out. Or the drummer complaining that practices in garrison at fort such-and-such during the civil war sucked, all we did was play x y and z over and over.

      Those details, and comments in newspapers and advertisements of the day, are almost all we have. Everything else may as well be word of mouth.

      I imagine what the civil war drummer would have typed in his blog, had he one, and hope that he mentions the stupid things like the tunes he played that day, and exactly what he did during the battle.

      It is my own practice to mention the tunes played whenever I post on the subject on the various mailing lists, as well as impressions of the event, just in case sometime in the future someone digs up an archive wondering what we played "back in '03".

    26. Re:Well, by drooling-dog · · Score: 1
      It's 'ignorance and superstition' to find it repellant that someone has abused their rectum to the point where it's a big pouty thing that bulges out like a set of lips?

      Well, I wouldn't have chosen exactly that definition... I was really thinking of the sexual repression and denial of the 1950s, which really was all about "virtue" through ignorance and shame. We can all thank the 1960s for overthrowing that regime. But I'm not a fan of the "big pouty thing" either...

    27. Re:Well, by venicebeach · · Score: 2, Insightful

      The fact that you and I can refer to goatse and people know what we're talking about means that it's an important part of our shared culture. I think that anything that archives the good and bad of a culture is worth keeping around.

      I have to disagree. An object which produces such trauma should not be preserved simply because the traumatic experience is shared. I think I have some form of post-traumatic stress disorder lingering from the day I saw the goatse thing - complete with horrifying flashbacks. That thing needs to go.

      Why should any aspect of "culture" be preserved simply because it constitutes "culture"? If we preserve everything that we have in common, we will be compulsive hoarders and the people of the earth will soon be living under a heap of obsolete car tires, betamax tapes and floppy disks. When we are done with something, we should let it go.

    28. Re:Well, by Anonymous Coward · · Score: 0

      Our library no longer buys many of the leading science journals we use. Instead they pay an annual subscription. If we stop paying, we lose all access. I don't see this as a good advance.

    29. Re:Well, by NanoGator · · Score: 1

      "How would you know? This was the summer of '81 when he came to visit me. He also said a lot of other things ("I've made a deal with the beast" or something of that kind). I wasn't really paying attention."

      Are you sure he wasn't referring to you being a bit disproportionate?

      --
      "Derp de derp."
    30. Re:Well, by gavri · · Score: 0

      Are you sure he wasn't referring to you being a bit disproportionate?
      "I'm not Pro-Microsoft, I'm Anti-Bullshit."

      Probably not. You're the one who seems to like bending over for him. With a signature like that, i've always wondered if you weren't Bill Gates yourself. Or are you his Little Bitch (TM)?

    31. Re:Well, by NanoGator · · Score: 0

      "Probably not. You're the one who seems to like bending over for him."

      You're the one who claimed to spend the summer with him. "Uh, no wait that was you!" doesn't do anything to defend yourself here. :)

      " With a signature like that, i've always wondered if you weren't Bill Gates yourself. Or are you his Little Bitch (TM)?"

      What you should be wondering is if "Do I understand his sig?"

      --
      "Derp de derp."
    32. Re:Well, by rsidd · · Score: 1
      Do you really think goatse will be "disturbing" 100 years from now? Only 40 years ago, people thought the Beatles were disturbing :P

      Maybe this is the album cover they were thinking of?

    33. Re:Well, by GigsVT · · Score: 1

      It wouldn't be too hard or outlandish. My personal effects easily fit on one 600MB CD, and that's after over 10 years of actively being a computer nerd.

      Unless one includes audio and video, the entire US could have a GB of permanent webspace on 300TB or so. With hard disks approaching 1TB a disk, this isn't too far off.
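
      As a rough check on the storage estimate above, the arithmetic can be worked through directly. The population figure of 300 million and the use of decimal units (1 GB = 10^9 bytes) are assumptions made for this sketch, not figures from the post. Under them, a GB per person comes to about 300 PB, and 300 TB corresponds to about 1 MB per person.

```python
# Assumptions for this sketch (not from the post):
# about 300 million US residents, decimal storage units.
US_POPULATION = 300_000_000
MB = 10**6
GB = 10**9
TB = 10**12
PB = 10**15


def total_storage(people: int, bytes_per_person: int) -> int:
    """Total bytes needed to give every person the same quota."""
    return people * bytes_per_person


def quota_per_person(total_bytes: int, people: int) -> int:
    """Bytes each person gets if total_bytes is split evenly."""
    return total_bytes // people


if __name__ == "__main__":
    needed = total_storage(US_POPULATION, 1 * GB)
    print(f"1 GB each needs {needed // PB} PB")

    share = quota_per_person(300 * TB, US_POPULATION)
    print(f"300 TB split evenly gives {share // MB} MB each")
```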

      --
      I've had enough abrasive sigs. Kittens are cute and fuzzy.
    34. Re:Well, by drinkypoo · · Score: 1

      When I write some web content that depends on some other content, I generally mirror the relevant portions of the site and I don't link them. If the site goes down, then I have the information, and I can either use it as a reference work in the process of creating a new work to supersede it, or I can damn the torpedoes and just stick it on my webpage. The latter is not really recommended but is often harmless :)

      --
      "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
    35. Re:Well, by cynicalmoose · · Score: 1

      And that link is dead. Hooray for good record-keeping.

      --
      Exercise your right not to vote. thinkoutside.org
  4. "This is no way to run a culture." by Cokelee · · Score: 1, Flamebait
    This is no way to run a culture.

    Tell the RIAA that.

    Music is a part of our culture.

    1. Re:"This is no way to run a culture." by mirko · · Score: 1

      Music is a part of our culture.

      Yep, but not only Britney's.
      That's the reason why I created GNUArt.net, in order to give most artists I know the opportunity to share music or Art they once created instead of dumping their old tapes or photos...

      Of course, this may die if no one helps, but I'll have at least made these last a little longer, and eventually given these the opportunity to be reworked by others...

      --
      Trolling using another account since 2005.
  5. Books have an ISBN... by Advocadus+Diaboli · · Score: 5, Interesting

    ...which means that with that ISBN I can refer to the book and find it at libraries or bookstores. Why don't we set up a sort of unique web page number for articles of interest or knowledge that are published there? Then it would be easy to track an article if it's moved to another site or whatever, just by looking up these numbers in a sort of catalog.

    1. Re:Books have an ISBN... by Madmanz123 · · Score: 1

      I'm pretty sure there has been some discussion of that. Dave Winer (scripting.com) has talked about a universal ID for blog posts, but things are very preliminary.

    2. Re:Books have an ISBN... by IamGarageGuy+2 · · Score: 1

      The dewey decimal system of the internet. We could use large numbers, maybe 4 sets of 3 digits that are unique or something like that or ....Hold on...

      --
      Stay tuned for new sig...
    3. Re:Books have an ISBN... by Anonymous Coward · · Score: 1, Informative

      > Why don't we setup a sort of unique web page number ...

      Read the article. They mention a system called DOI.

    4. Re:Books have an ISBN... by kalidasa · · Score: 5, Informative

      There already is such an identifier. It's called a Universal Resource Identifier, or URI. See Berners-Lee's essay Cool URIs Don't Change.

    5. Re:Books have an ISBN... by daddywonka · · Score: 4, Interesting

      Why don't we setup a sort of unique web page number if articles of interest or knowledge are published there.

      The article mentions this: "One such system, known as DOI (for digital object identifier), assigns a virtual but permanent bar code of sorts to participating Web pages. Even if the page moves to a new URL address, it can always be found via its unique DOI."

      But it seems that these current systems must use "registration agencies" to act as the gatekeeper of the unique ID.

    6. Re:Books have an ISBN... by mshiltonj · · Score: 0, Redundant

      which means that with that ISBN I can refer to the book and find it at libraries or bookstores. Why don't we setup a sort of unique web page number if articles of interest or knowledge are published there.

      You are absolutely right!

      We need some sort of Uniform Resource Identifier for the Internet. Maybe we should create an organization, a Consortium if you will, of companies on the World Wide Web to agree on a standard.

      Good idea! I wonder why no one has thought of it before?

    7. Re:Books have an ISBN... by Waffle+Iron · · Score: 1
      But it seems that these current systems must use "registration agencies" to act as the gatekeeper of the unique ID.

      Why not just embed an off-the-shelf GUID in the header of the document? That doesn't require any central authority.

      The <A> tag could be enhanced with a "guid" attribute. If a browser gets a "page not found" error on a link, it could automatically submit the GUID in the link to Google or some other search service to look for the current location.
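
      A minimal sketch of that fallback, assuming a hypothetical search index keyed by GUID. The function names, the example URLs, and the index itself are illustrative; no browser or search service offers this today as far as this sketch knows.

```python
import uuid
from typing import Callable, Dict, Optional


def mint_guid() -> str:
    """Create a random GUID; no central authority is involved."""
    return str(uuid.uuid4())


def resolve(href: str, guid: str,
            is_alive: Callable[[str], bool],
            search_index: Dict[str, str]) -> Optional[str]:
    """Return href if it still works, else whatever the index lists for guid."""
    if is_alive(href):
        return href
    return search_index.get(guid)


# Hypothetical data: the page moved, and a search service re-indexed it by GUID.
guid = mint_guid()
index = {guid: "http://new.example.org/paper.html"}
live_urls = {"http://new.example.org/paper.html"}

found = resolve("http://old.example.org/paper.html", guid,
                lambda url: url in live_urls, index)
print(found)
```

      The catch is the same one raised elsewhere in the thread: somebody still has to run the index that maps GUIDs to current locations.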

    8. Re:Books have an ISBN... by NickFitz · · Score: 2, Funny

      We could call it an Intellectual Property Address, or IP Address for short.

      --
      Using HTML in email is like putting sound effects on your phone calls. Just say <strong>no</strong>.
    9. Re:Books have an ISBN... by Anonymous Coward · · Score: 0

      Very funny. From the article you link to:

      "File name extension. This is a very common one. "cgi", even ".html" is something which will change. You may not be using HTML for that page in 20 years time, but you might want today's links to it to still be valid. The canonical way of making links to the W3C site doesn't use the extension.(how?)"

      Guess how is your link to /this/ resource ?

    10. Re:Books have an ISBN... by gnu-generation-one · · Score: 1

      "Why don't we setup a sort of unique web page number if articles of interest or knowledge are published there."

      Perhaps the content-hash-number of the Freenet system would work? It's just a hash of the text.

      Of course, that doesn't work for modified versions, even so small as formatting changes or different titles.

      Perhaps it would be possible to have a hash function which generates the same output regardless of such minor changes, and can cope with text regardless of the titles, navbars, etc. on the web-page.
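
      One way such a tolerant hash might look, as a rough sketch: strip the markup, skip titles and navigation, collapse whitespace, and hash what is left. This is not Freenet's key scheme. The set of skipped tags and the normalization rules are assumptions chosen for illustration, and real pages would need far more careful boilerplate detection.

```python
import hashlib
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <title>, <script>, <style> and <nav>."""

    SKIP = {"title", "script", "style", "nav"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def content_fingerprint(html: str) -> str:
    """SHA-256 of the page text after stripping markup, case and extra spaces."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    text = re.sub(r"\s+", " ", " ".join(parser.parts)).strip().lower()
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


a = "<html><title>Old title</title><body><p>Cool URIs  don't change.</p></body></html>"
b = ("<html><title>New title</title><nav>Home</nav><body>"
     "<div><b>Cool</b> URIs don't change.</div></body></html>")
print(content_fingerprint(a) == content_fingerprint(b))
```

      Any edit to the wording itself still changes the hash, and a tag that splits a word in two would as well, so this only covers the formatting, title, and navbar cases mentioned above.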

    11. Re:Books have an ISBN... by kirkjobsluder · · Score: 1

      There already is such an identifier. It's called a Universal Resource Identifier, or URI. See Berners-Lee essay Cool URIs Don't Change.

      However, URIs are a completely different beast from ISBNs. To start with, a URI identifies a specific instance of a resource on a specific server. As such it is more akin to a library of congress designation coupled with the address of the library.

      In contrast, an ISBN uniquely identifies a book no matter where it is located. I can ask any librarian or bookseller in the U.S. or Europe "I need ISBN#" and get an answer regardless of whether they use LOC or Dewey Decimal. A similar method works for Journal citations. At most libraries knowing the title/volume/number is sufficient to find a periodical resource no matter what filing system is used.

      One of the things missed in this discussion is that libraries are massive peer-to-peer networks that use high redundancy and dense networking to deliver resources. If a library does not have a work available, they can get it through inter-library loan.

    12. Re:Books have an ISBN... by RedHat+Rocky · · Score: 1

      Let's phrase this correctly:

      One such system, DOI, allows one to RENT a barcode of sorts for participating Web pages.

      I've run into several different digital library schemes/solutions (dspace to name one) that depend on the Handle system. What always alarms me is that these Handles seem to operate just like domain names; they're good as long as you pay.

      The Handles just add one more layer of complexity without really addressing the real problem: archiving digital documents. The system appears more to be a way to get into the "rent nothing for something" racket ala Verisign and domain names.

      Note to all you domain name owners out there: You OWN nothing, you are leasing. Big difference!

      --
      Anything is possible given time and money.
    13. Re:Books have an ISBN... by Anonymous Coward · · Score: 0

      The W3C plan is that the "urn" URI scheme will be used for more stable Uniform Resource Names (URNs) that can be used to find resources through some referral service.

    14. Re:Books have an ISBN... by DerekLyons · · Score: 1
      But it seems that these current systems must use "registration agencies" to act as the gatekeeper of the unique ID.
      Several folks have pointed out that internet 'data registration' systems all require a gatekeeper, as if that was a deficiency as compared to the ISBN system. However, the ISBN system *itself* has a gatekeeper.
  6. then don't look for culture in web pages... by TechnoVooDooDaddy · · Score: 4, Interesting

    honestly, the transient nature of webpages makes it an unsuitable medium for the long term establishment of "culture". our categorization happy, buzz-word ridden nature, so commonly prevalent, will have to find a new term for what the web is. boo-freaking-hoo.. meanwhile i'll keep doing my thing, posting pics for my family to see, putting calendar events up on the web so my homebrew-club will know when we're meeting, and not worrying about any "culture" i might be potentially creating then destroying when i take stuff back down.

    man i need coffee, insomnia is a bitch...

    1. Re:then don't look for culture in web pages... by Urkki · · Score: 1

      The problem isn't some random personal web page that perhaps 2 people ever looked at disappearing. The problem is that web pages actually referenced by others are disappearing, thus breaking the big web of knowledge that has been forming for as long as we've had the printing press.

      There really should be a permanent way of storing web pages, and storing them at the state they were at one given moment of time. So the archiving would naturally be the responsibility of the referer.

      We just need a web service for that. It could even be a profitable business: charge for every URI permanently stored there, perhaps by the byte, which would also largely solve the issue of abuse. The only hindrance is copyright law, I think, so it should get the status of a public library...
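
      The core of such a service could be sketched roughly like this: keep every stored copy with the time it was taken, and answer "what did this URI say as of date X?". The class and method names are made up for illustration, the store is in memory only, and billing, fetching, and the copyright question are all left out.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Optional


@dataclass
class Snapshot:
    taken_at: datetime
    body: bytes
    sha256: str  # digest of body, so later tampering can be detected


@dataclass
class Archive:
    _by_url: Dict[str, List[Snapshot]] = field(default_factory=dict)

    def store(self, url: str, body: bytes, taken_at: datetime) -> Snapshot:
        """Record a copy of url as it looked at taken_at."""
        snap = Snapshot(taken_at, body, hashlib.sha256(body).hexdigest())
        snaps = self._by_url.setdefault(url, [])
        snaps.append(snap)
        snaps.sort(key=lambda s: s.taken_at)
        return snap

    def as_of(self, url: str, moment: datetime) -> Optional[Snapshot]:
        """Latest snapshot taken at or before moment, or None."""
        older = [s for s in self._by_url.get(url, []) if s.taken_at <= moment]
        return older[-1] if older else None


archive = Archive()
page = "http://www.example.org/report.html"  # hypothetical URL
archive.store(page, b"first version", datetime(2003, 11, 1, tzinfo=timezone.utc))
archive.store(page, b"revised version", datetime(2003, 12, 1, tzinfo=timezone.utc))

cited = archive.as_of(page, datetime(2003, 11, 15, tzinfo=timezone.utc))
print(cited.body, cited.sha256[:12])
```

      A referrer would then cite both the original URI and the snapshot date, so the cached copy can be found even after the original moves or changes.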

    2. Re:then don't look for culture in web pages... by YU+Nicks+NE+Way · · Score: 2, Informative

      Even if your statement accurately reflected the concerns in the article, it would still be misguided.

      Historians are concerned about all the ephemera of a civilization, not just the "official" ones. The random archives of everyday junk can, and often do, tell a very different story about the civilization than the story that the society would like to hear about itself, so historians treasure those postings of pics for your family to see.

      For example, if you read the official press, you'd see a lot of articles about how bad the economy is for IT folk. That's entirely true, as far as it goes, but it only goes so far. The official press talks about the disappearance of jobs, and about the outsourcing of jobs, and about the unemployment rate, but doesn't talk about the fates of individual people displaced by the upheaval. Are the people who've been thrown out of work starving, or are they managing to live and to feed and clothe their families? The official story doesn't cover that -- but those silly little picture pages do, just by showing the children of these unemployed workers well-fed and dressed in new-ish clothes. Web pages are very cheap, so that indicates that the unemployed techies aren't starving.

      It's kind of like the character in the play who found out one morning that he'd spent his whole life speaking in prose. You've spent your whole life participating in the culture, and a record of that life is important to a historian interested in your culture.

    3. Re:then don't look for culture in web pages... by Araneas · · Score: 2, Interesting
      "There really should be a permanent way of storing web pages, and storing them at the state they were at one given moment of time."

      Teach browsers to speak CVS.

    4. Re:then don't look for culture in web pages... by Urkki · · Score: 1

      Teach most web developers to use CVS first... Or for starters teach 'em to know what CVS is ;)

  7. Don't do that. by Valar · · Score: 4, Insightful

    You probably shouldn't be quoting any kind of "Bob's World of Great Scientific Insight" type pages anyway. I mean, the majority of sites that go under in less than 100 days are the one person operations that one should identify as bad sources anyway. So it might seem obvious that quoting someone's blog in a research paper is just a plain stupid idea, but it happens way more often than you might think.

    1. Re:Don't do that. by Anonymous Coward · · Score: 0

      Why stupid? Considering there are people that study philosophy, media sciences etc, bloggings CAN be an important source.

    2. Re:Don't do that. by anthony_dipierro · · Score: 1

      I mean, the majority of sites that go under in less than 100 days are the one person operations that one should identify as bad sources anyway.

      That go down, perhaps. But maybe they didn't go down. Maybe they just moved. A professor switches colleges. Bob's World of Great Scientific Insight is finally recognized as the masterpiece it really is, is given a 100 million dollar grant, and moves off of geocities...

  8. Throwing out the baby with the bathwater by Liselle · · Score: 4, Insightful

    People are worried about losing the information on the web: but all that is really happening is that the URLs are no good after a while, you lose the snapshot. The information is not necessarily going anywhere. If there is a need or a want, someone will throw it up, or another will host it. That's the beauty of the web, you get the good with the bad, but time has a way of getting rid of the chaff.

    What would be interesting would be a website that archives those snapshots for posterity. Well, what do you know, there are several such sites already! Looks like we're in good shape. The sky is not falling. ;)

    --
    Auto-reply to ACs: "Truly, you have a dizzying intellect."
    1. Re:Throwing out the baby with the bathwater by DrEasy · · Score: 1

      The thing with the web though is that not everybody has the means to mirror a document that they find useful. It's due to the client-server nature of the beast. On a peer-to-peer network on the other hand, in principle anybody can share any document. As you just said, the popular ones will always be around, simply because they are of use to somebody.

      --
      "In our tactical decisions, we are operating contrary to our strategic interest."
  9. Reliability by lukewarmfusion · · Score: 5, Interesting

    It's not just the short lifespan of a webpage... it's also the fact that the source isn't always reliable. Web publications are rarely given the same strict editorial process as most journal articles. The content might be just as good - or better - but they're also not given the same credibility.

    I'm a recent grad of a University... my freshman year, profs wanted us to start using the Internet more so we were asked to submit at least x number of references from Internet sources. By my senior year, they were trying to get us to stop using the Internet. Using a URL as a reference was sometimes forbidden by the professor.

    1. Re:Reliability by bubblewrapgrl · · Score: 2, Interesting

      For one science course I took in college, we were told that we could find a source online, but then find it in hardcopy (ie, look up an article on the web, but then also make sure to look it up in a journal). Apparently, there were some issues with students who found information on the web that looked reliable (it was cited from a journal), but the information had been changed by whomever posted the article on a personal site. The professor wasn't interested in trusting scientific articles that students found online after that happened unless you could prove that you verified the same article in text.

    2. Re:Reliability by RedHat+Rocky · · Score: 1
      Web publications are rarely given the same strict editorial process as most journal articles.

      Which is both bad and good. Bad, in the sense that the source may not be up to snuff, as you point out. However, how many instances are there in history of important discoveries being delayed or buried due to "respectable" journals refusing to publish articles for one reason or another?

      One always should keep ones critical reading filters on, whether reading a research journal or a handbill obtained on the local street corner.

      --
      Anything is possible given time and money.
    3. Re:Reliability by DerekLyons · · Score: 1
      It's not just the short lifespan of a webpage... it's also the fact that the source isn't always reliable. Web publications are rarely given the same strict editorial process as most journal articles. The content might be just as good - or better - but they're also not given the same credibility.
      Indeed. I recently engaged in a lengthy discussion with the author of a web page that dealt with (among other things) my area of specialization (US SLBM/SSBN technology and history). His pages contained many hideous errors, and it turned out that what he had done was summarize multiple 'coffee-table' books to create his text. Worse yet, because his page *looks* professional, he's been asked to write pages on the same topics for more 'authoritative' websites, thus giving his errors legitimacy in the eyes of many. Because of this, on the mailing list where the discussion occurred, I was viewed as the 'bad guy' because A____ was a 'real published writer'.
  10. The final irony? by the+real+darkskye · · Score: 2, Interesting

    That matters in part because some documents exist only as Web pages -- for example, the British government's dossier on Iraqi weapons.
    "It only appeared on the Web," Worlock said. "There is no definitive reference where future historians might find it."
    Much like the WMDs themselves then ...

    --
    Music is everybody's possession.
    It's only publishers who think that people own it.
    Fuck Beta
    ~John Lenno
    1. Re:The final irony? by the+real+darkskye · · Score: 1

      Who keeps swapping the "submit" and "preview" buttons?

      --
      Music is everybody's possession.
      It's only publishers who think that people own it.
      Fuck Beta
      ~John Lenno
    2. Re:The final irony? by Anonymous Coward · · Score: 0

      "It only appeared on the Web," Worlock said. "There is no definitive reference where future historians might find it.

      Total bullshit anyway. The Iraq Dossier has a hard copy in the Commons library at least. It sure as hell will be archived from there. Where the fuck did he get this "factoid"? Some shitty web-page?

    3. Re:The final irony? by Fjord · · Score: 1

      Actually, the media reported that SCUDs were launched, but later backed off from the story because it was incorrect. Associated Press reported on March 22 that "Maj. Gen. Stanley McChrystal, the vice director of operations for the Joint Chiefs of Staff, told a Pentagon news conference that the Iraqis have not fired any Scuds and that U.S. forces searching airfields in the far western desert of Iraq have uncovered no missiles or launchers." The exact quote from Stanley McChrystal on March 22 (two days after the SCUD story broke) was "We're doing a different job of it this time, but so far, there have been no Scuds launched, which is very positive to date."

      --
      -no broken link
    4. Re:The final irony? by Anonymous Coward · · Score: 0

      Ignoring the fact that the launches by Iraq during the U.S. invasion, according to the Army's Patriot Unit report, were Al Samouds and Ababil-100s, not SCUDS, the issue isn't whether Iraq had *any* Scuds, or *any* residue of WMD development, the issue was whether Iraq had enough, or was on its way to acquiring, Scuds with enough WMD to pose an imminent threat to neighboring countries, or enough to overwhelm the restrictions placed on Saddam by no-fly zones, weapons inspections, etc.

      As in "weapons ready to launch in 45 minutes" which would have been compelling evidence if it had been actually true, instead of something more like a worst-case intelligence analysis, dressed up like Dick Cheney's wet dream.

  11. I got your solution right here, people... by ubiquitin · · Score: 0

    It's called a header redirect, folks. In one line of php, do:
    header ("Location: http://www.newsite.com/over_here.html");
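
    For comparison, a rough Python equivalent using only the standard library, as a sketch rather than a drop-in replacement. The MOVED table and the handler name are hypothetical. It answers with 301 (moved permanently), which tells clients and crawlers to update their links.

```python
from http.server import BaseHTTPRequestHandler
from typing import Dict, Optional, Tuple

# Hypothetical mapping from retired paths to their new homes.
MOVED: Dict[str, str] = {
    "/over_here.html": "http://www.newsite.com/over_here.html",
}


def redirect_for(path: str,
                 moved: Dict[str, str] = MOVED) -> Optional[Tuple[int, str]]:
    """Return (status, location) for a moved path, or None if unknown."""
    target = moved.get(path)
    return (301, target) if target else None


class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        hit = redirect_for(self.path)
        if hit is None:
            self.send_error(404, "Not Found")
            return
        status, location = hit
        self.send_response(status)
        self.send_header("Location", location)
        self.end_headers()


print(redirect_for("/over_here.html"))
```

    Passing RedirectHandler to http.server.HTTPServer would serve these redirects, but only for as long as someone keeps the old host running.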

    --
    http://tinyurl.com/4ny52
    1. Re:I got your solution right here, people... by mausmalone · · Score: 1

      helps, but not for those pages which are wholly removed. For example, a few faculty members here have some research posted on their personal sites, but they died. Now their sites will be taken down, and anyone referencing that research is gonna have a hard time getting a copy of it.

      --
      -=-=-=-=-=
      I'd rather be flamed than ignored.
    2. Re:I got your solution right here, people... by Anonymous Coward · · Score: 0

      You are too hung up on karma. Your .sig, as far as I can determine, is false, as is your insipid journal.

    3. Re:I got your solution right here, people... by azzy · · Score: 1

      The personal sites died, or the faculty members died?

  12. Rigidity stifles creativity by apsmith · · Score: 4, Insightful

    Any extra effort required to make web pages and their URLs preserved for eternity makes it more difficult for people to create them in the first place, which will mean less knowledge available, not more. Something unobtrusive that goes around preserving pages for posterity, like the Internet Archive, is the best solution.

    --

    Energy: time to change the picture.

  13. Hardcopy by Overzeetop · · Score: 4, Insightful

    This is why every time I use a web reference I make a hardcopy of it and include it in my research folder. It did not take long for me to figure out that web pages are no more useful than manufacturer catalogs - once the year is up, you might never get that tidbit of information back. If it's too large to want to print, I'll hardcopy the couple of pages I need, and PDF the whole thing for digital storage.

    Having a hardcopy (1) documents the information and its (purported) source, and (2) allows offline access for comparison and validation.

    --
    Is it just my observation, or are there way too many stupid people in the world?
    1. Re:Hardcopy by lukewarmfusion · · Score: 2, Insightful

      One problem with using a hard copy is that you're the only one holding that copy. If the site disappears from the Internet, then your readers must rely on your printout (or cache) as a reliable source. You may not have a way to prove that your printout wasn't modified between download and printout. With more traditional methods, there are so many printed copies that such a claim could be disputed easily. I think your solution is the best one under the current situation, though.

    2. Re:Hardcopy by ffub · · Score: 0

      This does, however, break UK copyright law, and I presume that of most other western countries. Research and papers are generally frowned upon if they break the law. You can make a hard copy for your own use, but this isn't entirely your own use if that hard copy is referenced, and stored on your server, or in any other way disseminated to others. This is effectively publishing it, and if you do that with somebody else's work without their permission, you are breaking the law.

    3. Re:Hardcopy by Carmelia · · Score: 1

      every time I use a web reference I make a hardcopy of it and include it in my research folder

      When you give references, it's a way for your readers to verify that what you are saying is true (or at least demonstrated by experts in your field). If the webpage disappeared and you only provide a piece of paper you printed yourself, it could as well have been written (completely invented) by yourself just to make your findings fit.

    4. Re:Hardcopy by Overzeetop · · Score: 1

      Actually, since I work in a "production" environment (structural engineering firm), I make copies for legal purposes. I need to have a record for liability reasons, and I doubt the court would frown on keeping records of (potentially) transient works if the need to discover them for a court trial arose.

      --
      Is it just my observation, or are there way too many stupid people in the world?
    5. Re:Hardcopy by digitalsushi · · Score: 1

      I've always wanted a feature that would automatically save every single thing I pull off the web to disk for me, with the file hierarchy in place. I would just mount my web directory and save it there. I make tons of archives of things that were difficult for me to discover and store them all in a public directory with indexing enabled. I see google hits every few minutes with keywords locked exactly on what it was I searched for long ago. Every time someone pulls some funky HOWTO or FAQ out of there, I grin, knowing I just saved some random person some random amount of time.
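
      The "file hierarchy in place" part could be sketched as a mapping from URL to a local path. This is only the naming step, with an assumed layout of root/host/path and an assumed index.html for directory-style URLs; the fetching and saving are left out.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def mirror_path(url: str, root: str = "mirror") -> PurePosixPath:
    """Map a URL to root/host/path, keeping the site's directory layout."""
    parts = urlsplit(url)
    path = parts.path
    if path == "" or path.endswith("/"):
        path += "index.html"  # assumption: name for directory-style URLs
    # Drop empty, "." and ".." segments so the result stays under root.
    segments = [s for s in path.split("/") if s not in ("", ".", "..")]
    return PurePosixPath(root, parts.netloc, *segments)


print(mirror_path("http://example.org/docs/howto/"))
print(mirror_path("http://example.org/faq/linux.txt"))
```

      Query strings are ignored here, so two URLs that differ only after the "?" would collide; a real mirror would need a rule for that.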

      --
      slashdot: where everyone yells sarcastic metaphors to themselves to understand the issue
    6. Re:Hardcopy by Anonymous Coward · · Score: 0

      That's why (in the US at least) copyright law includes several exceptions under the moniker of "Fair Use". While verbatim reproduction of an article may not constitute fair use, many other things will-- especially various forms of excerpting. Further I should think that keeping a personal copy during research is important. However, if the resource is so flaky that readers won't be able to find your cites in a year, probably you need more dependable sources. However, when the copyright expires (as they are supposed to do), your personal copy of that cited document could be very useful.

    7. Re:Hardcopy by Reziac · · Score: 1

      I've PDF'd my main website, which has some hard-to-find historical data in my field, and made the PDF available for all the world to download. Theory being this makes it more readily archivable for those who care.

      --
      ~REZ~ #43301. Who'd fake being me anyway?
    8. Re:Hardcopy by WNight · · Score: 1

      Only by some interpretations.

      Copyright law does allow quite a bit of leeway in moving data from one media to another. If the owner of the document makes and sends a copy to you (their webserver) and you choose to print it instead of displaying it on the monitor, it seems like it's your choice.

    9. Re:Hardcopy by Vengeful+weenie · · Score: 1
      This seems to me to be a non-problem. The obvious solution is that the existing journals will begin providing a service that when you publish through them (and are checked editorially as well), a permanent URL entry in their database will be maintained.

      The idea that publishing standards would be maintained well in a situation where every boob that can start a PC (and some that can't) can publish is obviously not tenable. Once everyone can publish, it doesn't mean anything. The status comes from a certain quality being required.

  14. Obvious solution... by Zocalo · · Score: 0
    Provide an alternative link to the source material on the Wayback Machine or archive.org.

    What was the problem again?
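
    Building that alternative link is mechanical, since Wayback Machine addresses follow the pattern prefix/timestamp/original URL. A small sketch, with a made-up example page; it only builds the link and does not check whether a capture actually exists.

```python
from datetime import datetime

WAYBACK_PREFIX = "https://web.archive.org/web"


def wayback_link(url: str, when: datetime) -> str:
    """Build a Wayback Machine link for url near the moment when."""
    stamp = when.strftime("%Y%m%d%H%M%S")  # 14-digit timestamp
    return f"{WAYBACK_PREFIX}/{stamp}/{url}"


print(wayback_link("http://www.example.com/paper.html", datetime(2003, 11, 24)))
```

    Citing the date the page was consulted alongside the original URL gives readers something to try when the live link fails.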

    --
    UNIX? They're not even circumcised! Savages!
  15. Don't forget the damage done by censorship! by Jerry · · Score: 1
    I was recently looking for pages about the peer review work of the global warming paper underlying the KYOTO Doctrine. Pages less than a month old were removed. Articles on ABC, Time, CNN and newspaper sites by the hundreds have 'old' pages missing.


    There is no substitute for the printed page... yet.

    --

    Running with Linux for over 20 years!

    1. Re:Don't forget the damage done by censorship! by LostCluster · · Score: 1

      That's not censorship. That's the news sites protecting the business of Lexis-Nexis which they all contribute to. If you want to search for old news, you have to pay.

  16. Let me get this straight... by ericspinder · · Score: 2, Informative
    You mean to tell me that those researchers found a dead link on the Internet? The horror. Where can I get one of those jobs!
    Another study, published in January, found that 40 percent to 50 percent of the URLs referenced in articles in two computing journals were inaccessible within four years
    That's because they were ads for companies that went out of business.

    Besides, if you want to see old pages, just go to the Wayback Machine. Between that and backup tapes, everything you ever wrote still lives (in many cases I wish it didn't!).

    --
    The grass is only greener, if you don't take care of your own lawn.
  17. Yes, big issue! by Erwos · · Score: 4, Interesting

    I've personally been working (internally so far) on a website of modern-day Orthodox-Jewish responsa to various issues of Jewish law, so this is an issue I've given some thought to.

    To say this is some kind of problem specific to the web is misleading. There are old, well-quoted sources of Jewish thought whose texts are simply lost to us in this current day and age. Example: a famous and extremely popular commentary on the Talmud and Torah, Rashi, is missing for at least a few chapters of Talmud. That would be the equivalent of IEEE misplacing some standards papers and then NO ONE having copies, just lost to the sands of time. Yet it did happen, proving this at least _was_ a serious issue.

    However, these days, with such things as the Way-Back Machine and Google caching, actually LOSING entire web pages doesn't happen very often, and, I'd bet, it happens far less frequently than the loss of books.

    -Erwos

    --
    Plausible conjecture should not be misrepresented as proof positive.
    1. Re:Yes, big issue! by Anonymous Coward · · Score: 0

      IEEE has lost a lot of stuff... They have paper-only copies of many standards that were submitted and handled electronically. The original tapes were never copied, then they were thrown out. This has also happened to many research places (universities, especially). Kills prior art searches. Sucks in general. Now that storage is "free", expect far more to be lost. The storage has finite lifetime and archiving it costs, but no one counts those into the "prices"...

    2. Re:Yes, big issue! by Reziac · · Score: 1

      A related problem I've seen more than once, is that some particular site becomes an Authority on a given topic, so everyone refers to it, rather than archiving their own copies of the data. In one infamous case, the archive owner had collected all sorts of 3rd party documents, which due to the site's encyclopaedic approach, everyone just sorta expected to stay there forever, so hardly anyone made local copies. Well, one day the site owner went off the deep end, and took the entire site with him -- and worse, began claiming *he* owned all the 3rd party articles TOO, and threatened legal action against anyone who made them available (and most were long gone from their original sources). The upshot is that if you don't already have your own archive (which at this point, one dares not publicly share), this information has been pretty much lost.

      Point being, the web's very nature makes it too easy for petty personal blowups to result in a huge loss of information. It'd be like if there were only ONE copy of the Torah, and the person in charge of its welfare one day decided to burn it.

      Perhaps a more general point should be examined: that any data archive that keeps all its eggs in one basket is just begging to be disappeared, whether thru misadventure or malice.

      --
      ~REZ~ #43301. Who'd fake being me anyway?
  18. web pages as knowledge by Horny+Smurf · · Score: 0, Interesting

    While I use the web as a source of information (information which is unavailable in any other format), I would not cite any information unless I can personally verify it. Would you trust "Anonymous Coward" when he tell you to "click this link"? So why would you trust some random website?

    1. Re:web pages as knowledge by theMerovingian · · Score: 4, Funny

      I definitely wouldn't trust someone named "Horny Smurf" enough to click the link.

      --
      "If you think you have things under control, you're not going fast enough." --Mario Andretti
  19. Interesting... by Rinikusu · · Score: 4, Funny

    I found that out years ago.. :P

    From a researcher's perspective, I used the web primarily as a quick "google" to get some ideas on where I might do further research. For instance, while a particular paper relevant to my search may have been taken offline, many times the search will proffer an author's name. Take that name to the library's database (or google it, too), and you might get a list of more publications that the author has penned. Even better: sometimes you can get a valid email address from other links and write to ask the original researcher himself about various publications; many times they have copies on hand and can send them to you. My research involves the web, but does not end with the web, which is where many people find themselves hung.

    Hey, guys. See that big building with those obsolete books? Lots of chicks hang out there. :)

    --
    If you were me, you'd be good lookin'. - six string samurai
  20. And? by woodhouse · · Score: 1

    I don't see how this is news. Most people who write science papers are well aware of the problems with citing web pages, and we'll try to cite books and published papers wherever possible. Generally, people with something important to say will publish it properly, so this is not usually a problem.

    The only people who exclusively cite web pages are likely to be the same people who write bad papers anyway, so I can't see the issue here.

  21. A problem recognized already some time ago.... by tsvk · · Score: 4, Interesting

    Usability expert Jakob Nielsen addressed the issue of linkrot in a column as early as 1998: Fighting Linkrot.

  22. Re:Books have an ISBN..(but web pages are googled) by WillAdams · · Score: 5, Insightful

    That was why Tim Berners-Lee wanted URL to stand for ``Universal'' (not Uniform) Resource Locator.

    The problem is, few people have formal training as librarians, or understand how to file away a document under such schemes (whether or no pages like this are worth preserving is another issue entirely).

    Then there's the technical issue---where's the central repository? Who ensures things are correctly filed? Who pays for it all?

    With all that said, I'll admit that I use Google's cache for this sort of thing---it lacks the formal hierarchy, but the search capabilities ameliorate this lack somewhat. It does fail when one wants a binary though (say the copy of Fractal Design Painter 5.5 posted by an Italian PC magazine a couple of years ago).

    Moreover, this is the overt, long-term intent behind Google, to be the basis for a Star Trek style universal knowledge database---AI is going to have to get a lot better before the typical person's expectations are met, but in the short term, I'll take what I can get. ;)

    William

    --
    Sphinx of black quartz, judge my vow.
  23. culture by theMerovingian · · Score: 0, Redundant

    This is no way to run a culture.

    Do we run the culture, or does the culture run us?

    --
    "If you think you have things under control, you're not going fast enough." --Mario Andretti
    1. Re:culture by Anonymous Coward · · Score: 0

      Do we run the culture, or does the culture run us?

      This isn't Soviet Russia.

    2. Re:culture by Anonymous Coward · · Score: 0

      Such an original nickname you've got there. Did your mommy suggest it?

  24. What's the problem here ? by JackJudge · · Score: 5, Insightful

    Why would we want to archive 99.9% of today's web content ?
    Does anyone archive CB radio traffic ??

    It's not a permanent storage medium, never could be, too many points of failure between your screen and the server holding the data.

    1. Re:What's the problem here ? by southpolesammy · · Score: 4, Interesting

      Yes, good point. The Internet is much more akin to CB radio since it is uncontrolled, unverified, entirely volunteer-based, entirely virtual, and highly volatile. By contrast, books, TV, and other media are highly controlled, subject to external verification, have a high cost of entry, are either themselves physical media, or require a physical presence in order to communicate, and are largely static in content.

      The problem with the Washington Post's article is that their premise is flawed. They assume that the Internet is a mostly static source of information, when it is definitely a mostly dynamic information source. Webpages are meant to be updated, and with updates come change. It's inevitable. To assume that we keep every update to the webpages in separate locations is a false assumption. It's cool to see sites like the Wayback machine do this, but it's not required.

      --
      Rule #1 -- Politics always trumps technology.
    2. Re:What's the problem here ? by sporty · · Score: 1

      There's a difference. CB traffic is usually casual conversation. People create websites to give out "important" information to a large spectrum.

      One has an active listener with active feedback. One is completely passive.

      Only in the case of an interview would CB traffic be completely informational. I don't remember the last time I put up a web page to say something to someone. People do put up "web applications" such as forums and chatrooms of sorts.

      While they are of the same fruit, they are still apples and oranges.

      --

      -
      ping -f 255.255.255.255 # if only

    3. Re:What's the problem here ? by kurosawdust · · Score: 1

      10-4, good buddy.

  25. Backup Your Important Data by Slider451 · · Score: 4, Insightful

    Anything worth publishing digitally should be recorded in a more permanent medium.

    I constantly back up all my digital photos because they are important to me. I also print the best ones for placing in photo albums, distributing to friends, etc.

    The website they are published to is just a delivery medium, and not even the primary one. It can disappear and I wouldn't care. People who know me can always get access to them. Scientists should view their work the same way.

    --
    Nostalgia isn't what it used to be.
    1. Re:Backup Your Important Data by Minna+Kirai · · Score: 1

      People who know me can always get access to them.

      Even after you're struck by a meteor?

      The question is about how strangers who don't know you can get an assurance that info they found on the web will continue to be accessible. Your exhortation to "keep local backups" does nothing to help this problem.

    2. Re:Backup Your Important Data by Slider451 · · Score: 1

      Good point. But my analogy still works. If the data is valuable to people they can find it. In my case the people who value my photos can contact my family.

      Just because the web medium is unreliable doesn't mean the data should be. A scientist worthy of his reputation should provide other means of contact/access along with his web-published data. Otherwise his conclusions are suspect to begin with.

      Anonymity has a price and we're seeing it as sites with valid contributions, but lacking tangible links to real people, disappear forever.

      --
      Nostalgia isn't what it used to be.
  26. revisionism by bobrankle · · Score: 1, Insightful

    I would be much more worried about whether the site still said the same thing. What about revisionism? I would wonder if the reference cited even said the same thing as what it was cited for. It's easy enough to change the pages so that they can be twisted to make the referencer look stupid (if the author doesn't like their use of the reference) or to just out and out lie after they get referenced. Unless the pages are locked down, and we all know that is not really possible, someone somewhere will find their way in.

  27. long-term storage needs... by mwilliamson · · Score: 2, Insightful
    This is not just a problem with Web pages, it is a problem with all popular media formats today. How can we make sure future generations will be able to make use of any of our media? (makes me think of a buddy's magneto-optical drive...who the hell else has one) One solution is to actively copy from format to format as technologies change, but this requires constant upkeep throughout the ages. Relying on future generations to maintain our most precious information is not a responsible behavior for a culture.

    Printed media, while having a low data/pound ratio, has managed to survive and span generations for centuries. I think the need for paper libraries cannot be forgotten. The challenge is distilling out what is worth keeping, and this challenge is better met now rather than later because we have more or less a good idea of what is significant information, and what is crap.

  28. archive.org and copyright? by McDutchie · · Score: 5, Interesting
    I've started to keep archived copies of webpages instead of links; the next time you want a page, it's gone. Unfortunately you can't share them like links.
    If you can't share them, then how come archive.org can? How come archive.org seems to be above copyright law?
    1. Re:archive.org and copyright? by Hooded+One · · Score: 1

      Archive.org's terms

      They get around copyright in two ways. First of all, the copyright owner can request that their material be removed from the archive. Beyond that, they basically describe an honor system; if you're not supposed to view something, don't.

    2. Re:archive.org and copyright? by Jerf · · Score: 5, Interesting

      How come archive.org seems to be above copyright law?

      Archive.org invokes the DMCA safe harbor provisions (see bottom of that page for the DMCA boilerplate), which are described in Title II of the DMCA.

      However, you'll find a careful reading of the DMCA reveals that none of the exclusions really quite applies to them; a good lawyer might be able to get them protected but I would bet against them.

      Mostly they get by because they will remove content if requested, and nobody who cares cares quite enough to sue them on behalf of "the world" when they are satisfied to have their own content removed. In other words, they are basically OK because nobody cares to sue them. Strictly speaking, archive.org probably is the world's largest copyright violation.

      This goes to show that sometimes if you break the law in a big enough way, you can get away with it. ;-)

      (Not responsible for the results of any actions based on taking that sentence to heart. For entertainment purposes only. etc.)

    3. Re:archive.org and copyright? by poofyhairguy82 · · Score: 1
      THE BIGGEST COPYRIGHT INFRINGEMENT?!!! no way!


      Typing Britney Spears into Kazaa search.


      1,345,678 sources found


      Archiving is cool but nothing beats preteen girls and their puters.

  29. Free Haven by Anonymous Coward · · Score: 0
  30. Permalinking and archiving by seldolivaw · · Score: 5, Insightful

    The ephemeral nature of the web is a very real problem, but it's important not to overstate it. The reason so much more information is lost these days is partly a reflection of the fact that we produce so much more of it. The Library of Alexandria was the distilled knowledge of an entire civilisation; it was unique, irreplaceable and massively important information. The web is full of information that is of low quality, often massively redundant (thousands of pages explain the same thing in different ways) and certainly replaceable (the web is not the final repository of the information: it's a temporary place where that information is published). In the same way, for centuries, newspapers have produced thousands of redundant issues with a lifetime of just a few days. The reason no one decries the loss of our newspapers is because the publishers themselves still archive the information, even if this is somewhat hard to get to. The same is true of web pages, only the number of publishers is vastly larger.

    Individual newspapers had their own ways of making their archives public (in many cases for a fee) because storing that information is a cumulative, ever-increasing cost. On the web that cost is much lower, but still present. In addition, there's the question of relevancy: www.mysite.com/index.html may contain valuable information, relevant enough to be on the front page today, but in a week's time you don't want it to still be there. So what we need is archiving, for the web.

    But manual archiving is inefficient and a pain to maintain, since it involves constantly moving around old files, updating index pages, etc.. Plus linkers don't bother to work out where the archive copy is eventually going to be: they link to the current position of the item, as they should.

    So what the web needs is automatic archiving. One way to do this (a solution to which was the partial subject of my final year project at uni) is to include an additional piece of metadata (by whatever mechanism you prefer) when publishing pages; data that describes the location of the *information* you're looking for, not the page itself. So mysite.com/index.html would contain meta-information describing itself as "mysite news 2003.11.23 subject='something happened today'". User-agents (browsers) when bookmarking this information could make a note of that meta-data, and provide the option to bookmark the information, rather than the location (sometimes you want to bookmark the front page, not just the current story). Those user agents, on returning to a location to discover the content has changed, could then send the server a request for the information, to which the server would reply with the current location, even if that's on another server.

    Of course, this requires changes at the client side and the server side, which makes it impractical. A simpler but less effective solution is for the "archive" metadata to simply contain another URL, to where the information will be archived or a pointer to that information will be stored. This has the advantage of requiring only changes to the client-side.
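
    For what it's worth, the simpler client-side variant might look roughly like the sketch below. It is only an illustration under assumptions: the meta name "archive", the example URLs, and the helper names archive_url_for and bookmark are all made up here, and no browser or standard actually defines them. The idea is just that the user agent reads the advertised archive URL when bookmarking and keeps it next to the live URL.

```python
from html.parser import HTMLParser


class ArchiveLinkParser(HTMLParser):
    """Collects the content of a <meta name="archive" content="..."> tag."""

    def __init__(self):
        super().__init__()
        self.archive_url = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        # "archive" is a made-up meta name used only for this sketch.
        if attr.get("name") == "archive" and self.archive_url is None:
            self.archive_url = attr.get("content")


def archive_url_for(html_text):
    """Return the archive URL a page advertises, or None if it has none."""
    parser = ArchiveLinkParser()
    parser.feed(html_text)
    return parser.archive_url


def bookmark(url, html_text):
    """Build a bookmark that remembers both the live URL and the archive URL."""
    return {"url": url, "archive": archive_url_for(html_text)}
```

    A user agent that later finds the front page has changed could then fall back to the stored "archive" entry instead of the live URL.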

    Suggestions of better solutions are always welcome :-)

    1. Re:Permalinking and archiving by tmark · · Score: 1

      So what the web needs is automatic archiving.

      So, if "the web" has automatic archiving, on whose shoulders falls the responsibility of providing said archiving ? The government ? The content-provider ? The ISP ? What if I change a couple of html tags. Does that get automatically archived too ? Who's going to provide the archiving mechanism the parent describes ? For archiving to be useful, and for the archive to be useful in the way a library is useful, EVERYTHING needs to be archived.

      Relying on meta-information like the parent suggests almost certainly wouldn't work, because you're relying on the content-provider to provide valid meta-information. That is a mistake if you're trying to provide the kind of verification and authenticity on which people would rest (for instance) the citations their careers depend on.

      What if I provide an online encyclopedia, which some people are using as a reference. Now, maybe I realize there is something egregiously wrong with an entry, so I change that entry, providing the same meta-information as before (intentionally or unintentionally). What happens to people who cite my original paper ? People would pull up the page pointed to by the meta information and find that what I cited is not what is in the citation. Who would feel safe citing then ?

      The only way for an archive to be useful is, as I said, if everything is archived. But this means everything as of every moment in time, which it is plain to see would be a Herculean task, probably impossible. And that archive would have to be maintained and signed by an organization that everyone trusted. Who's going to do that ?

    2. Re:Permalinking and archiving by seldolivaw · · Score: 1

      Absolutely right. Full archiving is impractical. So we go for either of my solutions, which are better than nothing and practical to implement. That's why they're "solutions" rather than "theory". But I agree they leave something to be desired; so suggest something better, don't just crap on them -- I know they suck! :-)

    3. Re:Permalinking and archiving by DerekLyons · · Score: 1
      The reason no one decries the loss of our newspapers is because the publishers themselves still archive the information, even if this is somewhat hard to get to.
      Not true at all. Newspapers do go out of business, and after that access depends on libraries keeping copies.

      Even this is not foolproof. A friend of mine was frustrated when he found that all of the libraries that had once held archives of a long dead newspaper he needed for research had all replaced their hardbound copies with microfiche. The problem was that they were all the same copy and thus were all missing the issue he needed. Instead of being able to retrieve the data from a different library that was more complete, or at least had a different set of 'bad sectors', the data now appears lost forever.
  31. Make sure you have a paper reference. by Kjella · · Score: 1

    Personally, I find web links *can* be much more efficient than having to dig out an issue of some science journal (which the local library will *not* have, and your request will be forwarded by carrier snails), if they're there.

    But, always the paper reference. If it doesn't have one, it'd sure better be a reference to a known professor somewhere, so whoever is interested can dig up a homepage somewhere. If it doesn't even have that, don't use it.

    Personally, I haven't found it that difficult to cite articles and such. Sources of information are a much bigger problem. Like e.g. statistics, or overviews or similar reference material. They are often moved/updated/reorganized/removed and you have no idea about it.

    Kjella

    --
    Live today, because you never know what tomorrow brings
  32. Reason for this? by bobthemuse · · Score: 4, Insightful

    The article states that the average life for a website is 100 days, but wouldn't journals and formal publications (the most often cited documents in research) last longer than the average? Also, is the average skewed because websites are more likely to contain 'current information'? "Average lifetime" is misleading, does this mean the average time the page stays the same, or the average time before the information in the page is unavailable?

    1. Re:Reason for this? by DenOfEarth · · Score: 1

      Yeh, that's what I was thinking about. I study electrical engineering, and when I'm using a university computer, we get electronic access to PDFs of papers from IEEE Journals. Now, I know that these papers are out there in a paper backup, but I'd be willing to bet that, due to their importance, these papers will still be available on the web in 100 years. It's kind of strange that people think having a URL should be good enough to get a paper for all time, when really, if you know what kind of information you are looking for, you will be able to find it. And, as an added bonus, as the internet gets larger, there's more redundancy etc...

      I'm not really worried about the internet causing us to lose culture, in fact, I think it greatly increases the amount of culture that we can check out at our fingertips, without even leaving our office / couch / kitchen table...etc.

  33. But compared to what? by Anonymous Coward · · Score: 0

    Given that 99% of the web pages out there would never have been written in the first place, 100 days seems better than 0 days, doesn't it?

    The advantage of an easy-to-use, disposable medium is in the low cost of publishing. But that low cost opens the doors for things less worthy of writing down in the first place.

  34. If you want to do serious research.... by RobertAG · · Score: 3, Insightful

    Then DOWNLOAD the pages from your web citations.

    For example, a short time ago, I did a white paper on power scavenging sources. About 1/2 the articles I read were HTML or PDF sources. Rather than just citing the URL, I downloaded/saved every online article I referenced. If someone wants the source and cannot find it, I'll just provide it to them. If your paper is going to be read by a number of people, it makes good sense to have those sources on-hand; it never hurts to cover your arse.

    Hard drive/Network/Optical space is virtually unlimited, so storage isn't a problem. Paper journals are archived by most libraries, anyway, so until they start archiving technical sources, I'm going to have to do my OWN archiving.

    1. Re:If you want to do serious research.... by Minna+Kirai · · Score: 1

      If your paper is going to be read by a number of people, it makes good sense to have those sources on-hand; it never hurts to cover your arse.

      And if one of those people telephones you asking "Hey, the website in your bibliography is down; do you have a copy?" what do you say?

      "Yes, I DO have a copy. I can't give it to you of course; that would violate federal intellectual property law. But trust me, I do have it. The source really existed"

      (Someone might respond that passing out such copies is protected fair use. It might be sometimes, but not always. In particular, if the article was first published on a student's website, then removed after acceptance by a journal, you really mustn't copy it.)

  35. Look at Microsoft by willpost · · Score: 1, Offtopic

    One month a thoughtful Microsoft programmer will post the bug on a page with a workaround, source code, and a patch using Visual Studio.

    The next month the bug officially doesn't exist, the workaround page is gone, the source code is who knows where, and it's .Net

    If you go to Linux.org though, the FAQ and bug postings are preserved for all to see.

    You're right though, in that Microsoft should be identified as one of those bad sources anyway.

    1. Re:Look at Microsoft by NickFitz · · Score: 1

      Very true. Another irritation is the way stuff vanishes from the MSDN Library distribution.

      At one place I worked, they kept all the old versions of MSDN Library on the development server; disc space is cheap, but finding that article you read last year, which can save your bacon today, isn't possible if Microsoft have junked it from the new release.

      --
      Using HTML in email is like putting sound effects on your phone calls. Just say <strong>no</strong>.
  36. The web can hold insight, in the right field by mactari · · Score: 3, Interesting

    That's a fairly reductionist view if taken too far. Not all researchers are tech whizzes (no pun intended), and I've seen a number of, in my case, professors of English Literature who run the same sort of, "Throw up ten pages with Under Construction signs, test publish a few papers, and let the site sit for years, one day to mysteriously disappear," web site lifespan that "Bob's World" might as well.

    Perhaps even more interestingly, it doesn't always really matter if you've done great, repeatable research in the "soft science" fields or outright humanities. You don't have to be a literature expert to have a good insight on "Bartleby the Scrivener". A grad student's blog, as an example, might contain excellent contributions to the conversation.

    Now that said, in the context of the article -- dealing with "a dermatologist with the Veterans Affairs Medical Center in Denver" -- I would tend to agree with you heartily. Hard science needs to pull, in my layman's view, from research that the article's author researched well enough to see that it wasn't a few 0's and 1's that might be pulled later, in general.

    And heck, what's the harm in saving the pages on your drive and contacting the original author if they disappear? Hard drive space is cheap. If you take yourself seriously, you might want to grab a snap, even if it is technically illegal (not that I know that it is; Google seems to do it right often).

    --

    It's all 0s and 1s. Or it's not.
    1. Re:The web can hold insight, in the right field by Dun+Malg · · Score: 1
      Perhaps even more interestingly, it doesn't always really matter if you've done great, repeatable research in the "soft science" fields or outright humanities. You don't have to be a literature expert to have a good insight on "Bartleby the Scrivener". A grad student's blog, as an example, might contain excellent contributions to the conversation.

      True, but if faced with the task of drawing rigorously attributed quotations from a blog, well, I'd prefer not to (sorry; couldn't resist). I suspect that most instances of "spontaneous deep insight" from amateurs with ephemeral web pages are uncomplicated enough to be restated without direct reference. If an amateur has constructed a large, complicated analysis that's worthy of reference, however, I'd say mirroring it is practically doing him a favor.

      --
      If a job's not worth doing, it's not worth doing right.
    2. Re:The web can hold insight, in the right field by Anonymous Coward · · Score: 0

      if faced with the task of drawing rigorously attributed quotations from a blog, well, I'd prefer not to

      Ha. Very good. ;^)

      Fair enough on the blog comment, but you might be surprised with the quality of what some people are, ah, heck, I'll say it, "e-publishing" these days.

    3. Re:The web can hold insight, in the right field by RedHat+Rocky · · Score: 1

      If you take yourself seriously, you might want to grab a snap, even if it is technically illegal.....

      No, it's not technically illegal, it's called fair use, keeping a copy for your personal use is certainly fine and dandy (at least in the US under current laws :) ). Granted, it doesn't really do you any good if you want to publish the page, but at least you'd have the original article to refer to when trying to find a replacement source. How many times have you looked at a URL and thought, "Gee, I wonder what the heck that is about?".

      --
      Anything is possible given time and money.
  37. Cool URIs don't change by KjetilK · · Score: 4, Interesting

    May I remind everyone to read and understand TimBL's Cool URI's don't change. It's not that hard to design systems where you do not have to change the URI every 100 days, folks.

    --
    Employee of Inrupt, Project Release Manager and Community Manager for Solid
    1. Re:Cool URIs don't change by Reziac · · Score: 2

      He touches on one of my pet peeves: just because you rearrange the site doesn't mean the OLD content simply MUST go away. Web pages don't eat much, and it's not the end of the world if someone finds "outdated" information, so long as the site structure makes it fairly evident where to find current information (such as consistent links to a sitemap or default root page -- after all, how often do you change the name of http://www.mysite.com/ ??) So unless there's some pressing reason not to (like ordering pages that now point at a dead shopping cart service), it won't hurt to leave old pages in place when you build new ones.

      In light of the usual habit of massive site structure changes, the stupidest possible use for a weblink is in local application HELP. WinXP does this, and being XP came along just before M$ decided yet again to rearrange their site (and kill all the "unsupported product" files), a lot of its Help links were already dead as of a year ago.

      --
      ~REZ~ #43301. Who'd fake being me anyway?
    2. Re:Cool URIs don't change by kirkjobsluder · · Score: 1

      Then the system uses a rewrite rule to HTTP Redirect each page in the old URL-scheme to a page in the new URL-scheme. What's so hard about that? Cool URIs don't change.

      In theory, no problem. In practice, quite a bit of problems. You are assuming that the developers had the foresight to start with "Cool URIs" to begin with. In practice I've not seen many cases where this has actually happened.
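
      To be fair to the quoted suggestion, the mechanical part really is small. Here is a minimal sketch, assuming a site that moved from one made-up scheme (/news/story.php?id=N) to another (/stories/N); both schemes and the name redirect_for are invented for illustration and don't describe any real site. The hard part in practice is that the old URLs rarely map onto the new ones this cleanly.

```python
import re

# Hypothetical old URL scheme, invented for this sketch.
OLD_STORY = re.compile(r"^/news/story\.php\?id=(\d+)$")


def redirect_for(path):
    """Return (status, new_location) for an old-style path, or None."""
    match = OLD_STORY.match(path)
    if match is None:
        return None
    # 301 tells clients and crawlers that the move is permanent.
    return 301, f"/stories/{match.group(1)}"
```

      Anything the pattern doesn't recognise returns None, so the server can carry on with its normal handling.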

  38. URL + date by More+Trouble · · Score: 2, Insightful

    Proper URL citations include the date. I'm not worried so much about the page being taken down (since it is presumably archived), as much as changing. If you don't record which version you were referring to, the content can change dramatically.
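
    As a rough illustration of keeping the date attached to the URL, something like the following would do. The function name cite and the output layout are assumptions made for this sketch, not any particular style guide's format.

```python
from datetime import date


def cite(title: str, url: str, accessed: date) -> str:
    """Format a web citation that records the day the page was read."""
    return f"{title}. <{url}> (accessed {accessed.isoformat()})"
```

    The point is only that the access date travels with the link, so a reader knows which version of the page was meant.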

    :w

    1. Re:URL + date by StormyMonday · · Score: 2, Interesting

      Bingo!

      I watch a number of political sites; it's amazing how, when Congressman Sludgepump says something stupid, it tends to disappear from his Website with no indication that it has ever changed. Occasionally, it even changes to show that he said the opposite of what was originally there.

      Checksums/digital signatures are potentially a solution, but the problem of doing it right can be quite difficult when you include real-world constraints. PDFs are a pain in the arse, but at least you can do a decent checksum on them.
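
      The easy half of the checksum idea can be sketched like this, assuming you saved the exact bytes you cited (say, a PDF) and recorded the digest next to the citation. The names fingerprint and still_matches are made up for the example, and SHA-256 is just one reasonable choice of hash.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Return the SHA-256 hex digest of the exact bytes that were cited."""
    return hashlib.sha256(content).hexdigest()


def still_matches(content: bytes, recorded_digest: str) -> bool:
    """True if the bytes fetched today hash to the digest recorded earlier."""
    return fingerprint(content) == recorded_digest
```

      This is also where the real-world constraints bite: any change at all, even a rotated ad or a reformatted footer on an HTML page, produces a different digest, so a mismatch tells you the bytes changed but not whether the meaning did.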

      --
      Welcome to the Turing Tarpit, where everything is possible but nothing interesting is easy.
  39. DSPACE by Anonymous Coward · · Score: 1, Informative
    Look at DSpace, the mission of which is "To create and establish an electronic system that captures, preserves and communicates the intellectual output of MIT's faculty and researchers."

    Each data set (collection) has a handle, supposedly longer lasting than URNs. We're talking about long term data storage here.

    There's an implementation of it at Cambridge University, and my organisation will be evaluating it as soon as the SuSE Linux Enterprise Server software lands on my desk and I've installed my server.

    Tom.

    1. Re:DSPACE by tomknight · · Score: 3, Interesting
      Bugger, forgot to log in.

      Look at DSpace, the mission of which is "To create and establish an electronic system that captures, preserves and communicates the intellectual output of MIT's faculty and researchers."

      Each data set (collection) has a handle, supposedly longer lasting than URNs. We're talking about long term data storage here.

      There's an implementation of it at Cambridge University, and my organisation will be evaluating it as soon as the SuSE Linux Enterprise Server software lands on my desk and I've installed my server.

      Tom.

      --
      Oh arse
  40. cant erase my usenet postings by peter303 · · Score: 5, Interesting

    I started posting to usenet in the late 1980s. These g*dd*mn things are still on the net. I was less guarded at that time. Everyone *knew* then that disk space was so scarce that usenet postings would disappear in 7-14 days.

    1. Re:cant erase my usenet postings by bubblewrapgrl · · Score: 1

      Ahhh...the dual nature of the internet: can't find the things you want, yet can't get rid of the things you don't want. I love it.

    2. Re:cant erase my usenet postings by digitalsushi · · Score: 1

      Has anyone ever been fired or denied employment due to the discovery of an ancient usenet post?

      --
      slashdot: where everyone yells sarcastic metaphors to themselves to understand the issue
  41. Pretty revealing quote, isn't it? by Anonymous Coward · · Score: 0
    "This is no way to run a culture."

    I was unaware that a culture needs to be run.

  42. Signal:Noise by goldspider · · Score: 1
    With such a low signal:noise ratio on the Web, would you really want to capture everything?

    Good record-keeping doesn't necessarily mean keeping everything, just stuff worth keeping.

    --
    "Ask not what your country can do for you." --John F. Kennedy
  43. But... by Anonymous Coward · · Score: 0

    ...Google will stay.

    If the authors are too stupid to include phrases for good search results instead of dead links, I don't need their book.

    Information comes and goes. Important things stay.

  44. Clicked on link by Anonymous Coward · · Score: 1, Funny

    Hey, I clicked that link and all I got was some disgusting picture. I am outraged, now I trust nothing on the web. How dare you take advantage of me like that, I have never heard of such a thing. I had thought that all things on the internet are not only important but true, and now I am not too sure. I hope you're happy, Jerk!

  45. Blogging Fragments, Like the Ancients by handy_vandal · · Score: 1

    I collect miscellaneous links on my web site. Over time, I've started adding excerpts along with links. The excerpts help remind me what the link was about, but they also serve another purpose: when the link goes bad, I can use keywords in the excerpt to search for related pages on the web.
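
    A rough sketch of the excerpt trick, assuming a plain-text excerpt was saved next to each link. The stopword list, the length cutoff, and the function names are illustrative choices, not anything from the post: the idea is just to pull a few distinctive words out of the excerpt and hand them to whatever search engine you like once the link dies.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; a real one would be longer.
STOPWORDS = {"the", "and", "for", "that", "with", "this", "from",
             "about", "was", "are", "but", "not"}


def search_terms(excerpt, limit=5):
    """Pick up to `limit` distinctive words from a saved excerpt."""
    words = re.findall(r"[a-z]+", excerpt.lower())
    candidates = [w for w in words if len(w) > 3 and w not in STOPWORDS]
    counts = Counter(candidates)
    # Most frequent first; ties go to the longer word, then alphabetical order.
    ranked = sorted(counts, key=lambda w: (-counts[w], -len(w), w))
    return ranked[:limit]


def search_query(excerpt, limit=5):
    """Join the chosen terms into a query string for any search engine."""
    return " ".join(search_terms(excerpt, limit))
```

    Ranking by frequency and then length is a crude stand-in for "distinctive"; it tends to favour proper names and jargon, which is usually what finds a moved page.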

    Our knowledge of ancient history has proceeded in a similar manner. Much of what we know about, say, pre-socratic philosophers, we know because of references in Aristotle and other later scholars. The original sources may be totally lost, but at least we have some names and quotations.

    -kgj

    --
    -kgj
  46. the problem is bigger by professorhojo · · Score: 5, Insightful

    it's not simply webpages that are the problem. it's digital storage in toto.

    because we as a generation are quickly moving away from our previous long-lived forms of storage, and toward digital management of archives, it's trivial for someone to decide to unilaterally delete (not backup?) a whole decade of data in some area of our history.

    i remember the photographer who found the photograph of bill clinton meeting monica lewinsky 10 years ago. he was in a gaggle of press photographers, but nobody else had this picture because they were all using digital cameras and he was still on film. most of their pictures from that day had been deleted years ago since they weren't worth the cost of storing. but this guy had it on film.

    yes. websites are disappearing. but there's a greater problem lurking in the background. the cost of preserving this stuff digitally, indefinitely. who's going to pony up the cash for that? unfortunately, no one. and we'll all ultimately pay dearly for that... (hell -- we already have trouble learning from the past.)

    1. Re:the problem is bigger by tjansen · · Score: 1

      What happens when the library containing all the valuable information on paper burns down (which happened more than once in history)?

      Digital data is somewhat easier to destroy (not much, a match can be enough to destroy a paper library), but it is much easier to copy and back up. It's trivial to make a backup of your library each day and ship it to some other place. Try that with a house full of paper.

      The main problem is that too many people are still trying to delete stuff on their disks, even though that's not necessary in most cases. Few people create more data in their lifetime than fits on today's hard disks (exception: videos and people who take *many* photos). Part of the problem is operating systems that do not keep revisions of old files, and do not actively support backups (ever seen an OS doing automatic backups to other systems in your home or telling you that it's time for a backup CD?).

  47. Give and take - it's cultural change, dummy. by 3Suns · · Score: 4, Insightful

    Easy come, easy go... here's another cliche: Give and Take. What's great about the web is that it has effectively demolished the barriers to entry in publishing. Everybody and their grandmother has a blog now - you can't compare webpages to magazine articles or newspapers. There's just so much more information being published now that its average lifespan is bound to go down. So what?

    Publications that cite [web pages] lose their authorities? Who the hell told you to cite a webpage? Might as well cite a poster you saw downtown. If the webpage is a reputable source in the first place, it'll keep it around permanently. Still better than scientific journals that are squirrelled away in the basements of university libraries - anyone can get to a webpage.

    This is no way to run a culture. Last time I checked, nobody ran our culture... It kinda runs itself. The proliferation of accessible, ephemeral webpages over permanent, privileged paper publications (wah, too many p's!) is a sign that our information culture has moved on into a new era. Liked the old one? Tough! Now information has to maintain its own relevance in order to be permanent... and I for one welcome that change.

    --

    -3Suns

    ~~~~
    The Revolution will be Slashdotted
    1. Re:Give and take - it's cultural change, dummy. by ChristTrekker · · Score: 1
      Last time I checked, nobody ran our culture... It kinda runs itself.

      Tell that to the guys in Washington (on both sides of the aisle) who constantly try to define societal norms with legislation. Time to start voting for smaller, less-intrusive government.

    2. Re:Give and take - it's cultural change, dummy. by kirkjobsluder · · Score: 3, Insightful

      Who the hell told you to cite a webpage? Might as well cite a poster you saw downtown. If the webpage is a reputable source in the first place, it'll keep it around permanently.

      Not always true. The U.S. Government was a good source for research information until the political purge of research articles that disagreed with the administration on key policy issues. The basic response? The NIH, Department of Education, FDA and EPA's responsibility is to promote policy, not provide information to the public. (Although this problem is not limited to the Internet, libraries that were public archives for government documents were ordered to pull "sensitive" material after 9-11.) In addition there is the problem of upgrading infrastructure. The URL may work today, but what happens when the site moves to a more scalable system?

      Still better than scientific journals that are squirrelled away in the basements of university libraries - anyone can get to a webpage.

      I don't know about the journals you read, but 90% of the ones I read are already on the web or archived through a distribution service. (Although another loss to research for politics may be ERIC, which in education has been a source for many interesting "minor" papers and conference proceedings.)

      The real value of journals has never been print publication, but in the peer-review process. The reason why citations in professional journals carry more weight is because the reader knows that the article had to have run the gauntlet of critical reviews from expert peers.

      Now, granted, web page citations should probably be treated on the level of personal correspondence rather than as authoritative sources. But to say that web-based resources move or vanish because they lose their relevance is missing a major flaw in how the web works. One professional organization I'm a member of tottered on the edge of bankruptcy for about a year. If it had gone under, web access to some of the key works in the field would have vanished overnight, and the works themselves would have been dumped into a copyright limbo.

    3. Re:Give and take - it's cultural change, dummy. by argent · · Score: 1

      "Who the hell told you to cite a webpage? Might as well cite a poster you saw downtown. If the webpage is a reputable source in the first place, it'll keep it around permanently."

      OK, find me the web page referenced in this article I wrote a few years back:

      http://scarydevil.com/~peter/io/stupidsoftware.html

      Either you're using a funky definition of "reputable source" that means "any web page kept around permanently" (which makes me one of the more reputable sources on the net, scary as that is), or you're going to have to find a copy of the IA Interface Hall of Shame somewhere.

    4. Re:Give and take - it's cultural change, dummy. by Speare · · Score: 1

      Now information has to maintain its own relevance in order to be permanent... and I for one welcome our new transiently relevant overlords.

      --
      [ .sig file not found ]
    5. Re:Give and take - it's cultural change, dummy. by P-Nuts · · Score: 1
      Still better than scientific journals that are squirrelled away in the basements of university libraries - anyone can get to a webpage.
      I don't know about the journals you read, but 90% of the ones I read are already on the web or archived through a distribution service.

      Hmm. Most of the journals I read are on the web, but I can only access them because my university has a subscription. They're generally not that cheap to access if your institution isn't paying.

  48. Uhhhhh... by Anonymous Coward · · Score: 0

    ...ever heard about google? Damn I even don't bother to put the link to google here because it's so obvious. ..again.. slashdot "news"

  49. And the Fizz in My Mntn Dew Is Gone Even Sooner by RobotRunAmok · · Score: 1

    But wait, I think I still have a Mosaic presskit from the '91 Comdex. Does that count?

    It's not Web Pages, it's the Web itself that will be the cultural artifact. With the bar for publishing on the Internet placed so low, it falls to Father Time to become the Web's ultimate Editor-in-Chief.

    On a related note, I'm moving, and came across reams of stuff I wrote while a college student, and boy does it suck! Tonight I light a candle to Neil Gaiman's I-Net God in thanks that my potentially career-wrecking pukage is preserved only in patchouli-smeared folders in my basement and not on a global network of servers. I feel like I've dodged more bullets than Neo; in 20 years when you guys do a vanity-google, I hope y'all feel the same way, but I'm guessing you will have wished the Web was even more forgetful than it is.

    1. Re:And the Fizz in My Mntn Dew Is Gone Even Sooner by IM6100 · · Score: 1

      It's important to sweep out all the awful stuff we said when we were younger. *.advocacy group stuff from USENET in particular needs to be carefully edited. Google.groups makes that possible, as long as you have a traceable 'identity' to your original posts. Lord help the zealot who ranted about Linux under a College email account he no longer has.

      --
      A Good Intro to NetBS
  50. Question by Texodore · · Score: 1

    Is this an authority actually questioning the validity of the Internet and its use in research? Or is the authority simply using this as a ruse to say, "read our publication, as the Internet is making it outdated?" I tend to vote for the latter. I've read too much good research on the Internet - valid research - overlooked by the mainstream medical and science community just because it didn't mean more money for someone or didn't fit the status quo.

  51. maybe a new TLD for this? by line.at.infinity · · Score: 1

    www.PublishersName.arc/path/articleID/VerNum/

    The idea being that files uploaded here are expected to be permanent. Then professors can say urls with *.arc are o.k. for references, and *.RespectedName.arc/* are even better. In the math community, articles from arxiv.org and a few others are generally respected sources. This might not be the case for cultural studies, for example, where there are no central repositories. If there were more and better permanent archiving services, this would be less of a problem. Maybe the government could run such a service?

  52. PHP? Try mod_alias (built-in) by Anonymous Coward · · Score: 0

    /oldschema/.htaccess:
    Redirect permanent /oldschema/foobar.html /newschema/quux.html

  53. Being wrong is surely worse; commentary techniques by arsinmsn · · Score: 1

    Once recognized a site may deserve preservation.

    Web pages that are flat-out wrong and un-moderated are all the better for being ephemeral. I've often wished for a meta-critical facility like slashdot's ranking system for general web pages. This too is problematic, though; sadly, those with the most commitment to cruising around the web & inserting commentary are rarely the most qualified.

    Wikipedia is an interesting example; try looking up a topic you know something about there. Even if you were to spend the considerable time necessary to iron out all of the misconceptions in many of the articles, there is no guarantee that someone won't come along the next day with an ax to grind and undo all your work.

    Sorry if this is OT, but highlighting reliable info seems a more pressing issue.

  54. No way to run a culture? by theolein · · Score: 3, Insightful

    As the board chairman of the Internet Archive says, "The average lifespan of a Web page today is 100 days. This is no way to run a culture."

    To the contrary, I think this is highly typical of the culture we have today, where everything is a transient fad in the media, technology and politics.

    And it is also self feeding, I think, since market forces need to clear out the old to make room for the new in order to meet sales forecasts and shareholder expectations. And this is very true for pop, news and technology, which explains the lack of staying power of pop icons these days and becomes interesting when you want to ask yourself if you really need that new 3GHz machine just to surf the web.

    And it is highly convenient in politics where a politician doesn't have to be accountable for what he said 100 days ago.

    And so, the lack of long time life on the web is simply symbolic of all the rest here really, even if it is highly questionable.

  55. Oh come on. It's not as if... by csoto · · Score: 1

    secretaries didn't print them out for their PHBs to read and stuff.

    --
    There exists no way of exchanging information without making judgments. --Bene Gesserit Axiom
  56. quality references by Dr.+GeneMachine · · Score: 1
    The Washington Post reports on the loss of knowledge in ephemeral web pages, which a medical researcher compares to the burning of ancient Alexandria's library.

    Err, which serious medical researcher would cite a web page? Everything remotely reliable, that is, in science at least, peer reviewed, is published in journals. While these may have a web appearance, they are also published in print - and that's what you cite.

    --
    This comment does not exist.
    1. Re:quality references by salesgeek · · Score: 1

      Everything remotely reliable, that is, in science at least, peer reviewed, is published in journals.

      Ahhhh... the ivory tower comes down to earth only to leave for the clouds again. The problem is that only a small minority of writing is done in a scholarly setting, and the vast majority is done somewhere less lofty than the ivory tower.

      And what of many journals that have or will become electronic only?

      --
      -- $G
    2. Re:quality references by Dr.+GeneMachine · · Score: 1
      Well, I concede that there is a loss of knowledge due to the vanishing of web pages. But the comparison to the burning of the Alexandrian library is ridiculous. This library was a comprehensive collection of contemporary science - and this is not remotely comparable to the mentioned ephemeral web pages. The loss of the library of Alexandria set back sciences hundreds of years. This is not even remotely the case when some obscure web links now point to 404.

      There is a point regarding journals switching to electronic only publication. But these articles are referenced by additional unique identifiers (e.g. the DOI) and are archived in several places. The main problem there will not be the loss of links, but the incompatibility of future data storage formats. But this is an entirely different cup of tea.

      --
      This comment does not exist.
  57. What about... by Anonymous Coward · · Score: 0

    ..Google and archive.org? Is the knowledge about these two sites gone, too?

    Poor humans. Watch MTV and stay informed about everything you're supposed to know.

  58. genguid and google by hey · · Score: 2, Insightful

    Use genguid (or other tool) to make a globally unique number
    and place that number at the bottom of your
    page, along with a link that uses Google's "I'm Feeling Lucky"
    search for the GUID.
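
    A minimal sketch of that scheme, using Python's standard uuid module in place of genguid. Two things here are assumptions rather than anything stated above: that the search engine still honours a `btnI` query parameter for "I'm Feeling Lucky", and the function names, which are made up for illustration.

```python
import uuid
from urllib.parse import urlencode


def make_page_id():
    """Generate a random (version 4) UUID to print at the bottom of a page."""
    return str(uuid.uuid4())


def lucky_link(page_id):
    """Build a search link that looks for the exact ID string.

    Assumption: `btnI=1` asks the engine to jump straight to the first hit.
    """
    query = urlencode({"q": '"%s"' % page_id, "btnI": "1"})
    return "https://www.google.com/search?" + query
```

    The link only helps once the page has been indexed at its new home, so this complements a redirect rather than replacing one.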

  59. Ephemeral Data is no data by grolaw · · Score: 1

    Loss of reference links is worse than having no data.

    In law a citation may be relied upon for a judicial ruling. If the citation is valid at the time of the original ruling, but no longer in existence when the case is reviewed on appeal (typically 2-5 years later) then the question of the validity of the precedent cited becomes the issue rather than the authority of the citation. The whole legal construct is built upon stare decisis and if what goes before vanishes into cyber-haze then the usefulness of web citations is nil.

    Of course, Westlaw (tm) and Lexis/Nexis (tm) will have redirectors for their pages - but the cost of those services is very, very high. Infrastructure is costly even when the content is copyright free.

  60. Scary thought. by Channard · · Score: 3, Funny
    Do you really think goatse will be "disturbing" 100 years from now? Only 40 years ago, people thought the Beatles were disturbing :P

    Well, I guess we know what Paul McCartney will be doing on the cover of his next album..

  61. How long does the average conversation take? by freality · · Score: 4, Interesting

    Webpages aren't replacements for books. Or rather, you shouldn't use them that way.

    If they're lasting on average 100 days, that puts them somewhere between transient culture, like spoken conversation, and printed culture, like newspapers. Big deal.

    We want to preserve culture for future generations, no doubt. But we don't want to preserve all culture for future generations. Anything that is lasting for 100 days and isn't being persisted... well, relatively that's not worth much to future culture.

    I don't remember the exact saying, but there is a Native American saying to the effect of "We don't write things down. If we don't remember it, it's not worth remembering." Now, they're not the last word (no pun intended) in wisdom traditions, but there is a certain amount of enforced vitality necessitated by forgetting the details.

    We'd better get used to the idea. We're only going to be forgetting more and more of the details as we generate more and more useless information.

  62. Re:server gone indefinitely? by ubiquitin · · Score: 0

    Then you do a domain name redirect and reconfigure httpd.conf on the new server location. It seems to me that for lack of a little DNS (named.conf) knowledge, the world suffers a great deal. Perhaps that was the point of the original piece.

    --
    http://tinyurl.com/4ny52
  63. yeah... when's everything2 coming back? by Anonymous Coward · · Score: 0

    guys.. http://www.everything2.org/
    how long does it take to relocate?

  64. An example of broken down copyright laws by Jerf · · Score: 1

    This is Yet Another Example of how copyright laws are breaking down. If you're going to cite something academically, should you perhaps have the right of mirroring the content you are citing for the sole purpose of providing a backup if the original goes down, or even just changes?

    Copyright law says no, that's copyright infringement.

    But copyright law is based on the assumption that a published thing, like a book, is concrete and can't be changed, and can be referred to, forever and ever amen, by the same name, page number, etc. This is obviously no longer true. Should copyright law be changed as a result, now that the old idea of "expression" is breaking?

    For an extended discussion of this, please see my communication ethics essay, particularly the section on the death of 'expression' (why copyright is totally broken).

    1. Re:An example of broken down copyright laws by WNight · · Score: 2, Interesting

      I agree. The point of copyright is mainly to encourage the production of commercial works, to enrich the public domain. It was never intended to force a work to remain out of print.

      We need to change copyright law so that it doesn't prevent saving of lost works, and so that it can't be used to force a work to moulder away because it's in someone's best interest that it not be for sale. (For instance, old movies that studios don't want cutting into new movie revenue.)

      I'd like to see a short total-rights-reserved copyright, ten years or so maybe, and a longer commercial-rights copyright. I really see little reason why Warner Brothers, for instance, should be able to use Mickey Mouse in their cartoons, but fanfic, kids pictures, and other such uses should be allowed. It's part of our culture and to deny us the right to participate is rude, and short-sighted.

      Few of today's creators grew up isolated and started creating original works immediately. Instead, they built on the culture they saw around them as they grew up. Children today won't have this ability. We're raising the bar, requiring them to create something that's safe from even an over-zealous lawyer and look-and-feel cases, as their first works.

      Tolkien would never have gotten started in our current legal climate. He intentionally built on previous stories and myths, something that wouldn't be legal to do now. Hell, for a while, TSR was trying to sue people who used their monster names in fantasy works, even where their names were derived from Tolkien.

  65. Berners-Lee considered harmful by 0x0d0a · · Score: 4, Insightful

    URIs don't provide content-based addressing (like a hash of the document). They rely upon trustworthy name registrars, which is an assumption that might have been valid when Berners-Lee was doing his early work, but is not now. They rely on someone willing to continue hosting the original document -- not necessarily the case.

    You can link to an article which is then changed by the original publisher (or someone else). With scientific papers, you can't do that -- and such behavior is probably not desirable.

    On the up side, if you're currently using cited references, you should be able to build such a system without too much problem -- follow links to PDFs or automatically crawl HTML documents (and check images) and serve all papers that you refer to with your paper. It'd be big, but it provides better reliability than do current paper schemes.

    Another feature that might be useful is signing of the content (assuming RSA doesn't get broken in the future).

    Basically, if you put up a SHA-1 (Gnutella), MD4 (eDonkey), or similar reference, you can host the original referred-to documents as well as the original host.
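
    To make the content-addressing idea concrete, here is a small sketch using Python's hashlib. The `sha1:` prefix and the function names are invented for illustration, and SHA-1 is used only because it is the hash named above; a stronger hash such as SHA-256 would drop in the same way.

```python
import hashlib


def content_address(data: bytes) -> str:
    """Derive a location-independent reference from the document bytes."""
    return "sha1:" + hashlib.sha1(data).hexdigest()


def verify(data: bytes, address: str) -> bool:
    """Check that a copy fetched from any mirror matches the cited reference."""
    return content_address(data) == address
```

    A citation would carry the address instead of (or alongside) a URL, so anyone holding a byte-identical copy can serve it and any reader can check that it has not been altered.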

    If Freenet didn't have as a specific drawback the inability of someone to guarantee that a document remains hosted as long as they are willing to host it, Freenet would be a good choice for this.

    One possibility is that, with a bit of manual work, one can frequently find an academic work by Googling for its title. At least for now, as long as you host the original papers as well, Google should pick up on this fact. Of course, it does nothing to prevent modification of that paper by another party...

    A good system for handling this would be to have a known system that is willing to archive, in perpetuity (probably hosted by the US government or other reasonably stable, trustworthy source [yes, yes, cracks at the US government aside]). This system would act like a Tier 1 NTP server -- it would only grant access to a number of other trusted servers (universities, etc) that mirror it -- perhaps university systems -- which would keep load sane. These servers (or perhaps Tier 3 servers) then provide public access. Questions of whether there would be a hard policy of never removing content or what would be allowed (especially WRT politically controversial content) would have to be answered.

    There could be multiple Tier 1 servers that would sync up with each other, and could act as checks in case one server is broken into. I'm partial to the idea of including a signature on each file, but I suppose it isn't really necessary.

    Specific formats could be required to ensure that these papers are readable for all time. Project Gutenberg went with straight ASCII. This would probably have to be slightly more elaborate. Microsoft Word and PDF might not be good choices, and international support would be necessary.

    1. Re:Berners-Lee considered harmful by 4of12 · · Score: 1

      if you're currently using cited references, you should be able to build such a system

      Seems like an open source local cache, a squid-like system might be useful for preserving the integrity of referenced works.

      The big objection has always been that the referred-to work doesn't necessarily get all the hits it deserves during its lifetime.

      So, if I publish

      http://me.net/mine.html
      and it contains references to
      http://you.org/yours.html
      then my web references ought to be layered through an interpreter so they can be intelligently referred (just like how freshmeat rewrites Download URL's to go through freshmeat.net first).

      This kind of mechanism just has to exist already..

      http://me.net/refer_or_grab_from_cache.cgi?source=mine.html&reference=http://you.org/yours.html
      so that you.org gets the hits its advertisers want, but my work gets the possibility to provide a local cache if your site goes down for whatever reason.
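
      The decision logic behind such a script could look roughly like this. It is a sketch under stated assumptions: the liveness check is passed in as a function (a real one would make an HTTP request), the cache is a plain dict keyed by URL, and the return labels are invented for illustration.

```python
from typing import Callable, Dict, Tuple


def refer_or_grab_from_cache(
    reference: str,
    is_alive: Callable[[str], bool],
    cache: Dict[str, str],
) -> Tuple[str, str]:
    """Decide how to serve a cited reference.

    ("redirect", url)  while the original answers, so the cited site gets the hit
    ("cached", body)   if the original is gone but a local copy was kept
    ("missing", url)   if neither is available
    """
    if is_alive(reference):
        return ("redirect", reference)
    if reference in cache:
        return ("cached", cache[reference])
    return ("missing", reference)
```

      Checking the live site first is what keeps the arrangement fair to you.org; the cache is only ever a fallback.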
      --
      "Provided by the management for your protection."
    2. Re:Berners-Lee considered harmful by Anonymous Coward · · Score: 0

      just like how freshmeat rewrites Download URL's to go through freshmeat.net first

      Ugh, is that why I always have to go through all that nonsense in order to use wget?

    3. Re:Berners-Lee considered harmful by DrEasy · · Score: 1
      If Freenet didn't have as a specific drawback the inability of someone to guarantee that a document remains hosted as long as they are willing to host it, Freenet would be a good choice for this.
      There is quite a bit of academic work going on that tries indeed to preserve the persistence of documents regardless of their location. More or less like Freenet, most of the systems use the idea of distributed hashtables and some distributed routing algorithm to locate the document based on its unique (hashed) id. If you're interested, check out CAN, CHORD, PASTRY, OceanStore, Publius... and the Proceedings of the IPTPS workshops.
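
      As a toy illustration of the placement half of that idea, here is consistent hashing on a ring, the scheme Chord-style systems build on. It deliberately leaves out the distributed routing: this version keeps the whole node list in one process, and the class and function names are made up for the sketch.

```python
import bisect
import hashlib


def ring_position(name: str) -> int:
    """Map a node name or document id to a point on the hash ring."""
    return int.from_bytes(hashlib.sha1(name.encode("utf-8")).digest(), "big")


class Ring:
    def __init__(self, nodes):
        # Sorted (position, node) pairs form the ring.
        self._points = sorted((ring_position(n), n) for n in nodes)

    def add(self, node):
        """Insert a new node at its hashed position."""
        bisect.insort(self._points, (ring_position(node), node))

    def locate(self, doc_id):
        """Return the first node at or after the document's position, wrapping around."""
        keys = [p for p, _ in self._points]
        i = bisect.bisect_left(keys, ring_position(doc_id))
        return self._points[i % len(self._points)][1]
```

      The useful property is that adding a node only moves the documents that now fall to the newcomer; everything else stays where it was, which is what makes the lookup independent of any one host.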
      --
      "In our tactical decisions, we are operating contrary to our strategic interest."
  66. Blame bad writing by TyrranzzX · · Score: 1

    All the mass media is owned by 6 major corporations as we already know. In our stimulation-happy culture, where sitting on top of a mountain taking in the view isn't appreciated, neither is long, complicated writing. Thanks to this media, people are raised to be consumers, and we're stimulated to the point that things like books are so boring that we fall asleep reading them. Why read a book when you can watch a movie? A person who plays FPS games for months on end is so thoroughly stimulated that sitting at a spot and just relaxing may be a bit much for them to do.

    And now we've got a bunch of professors complaining about how the culture that has come into being is completely fucking their profession. This is understandable, and if studies were more available in web format they'd be more popular.

    The very idea of mass media is very very very wrong. It's a very bad thing for our society. How it used to be is that you'd know what's going on around you by seeing it. Now you've got to trust some news anchor on the television or internet, and they're reporting on some bitch or hoe on TV instead of doing their job of educating people on stuff that's important. Do I care that some star ran amok and molested his children? No. I do care however when congress decides it wants to take one more step towards turning me and every human on the planet into slaves, and if it weren't for that, I wouldn't give a damn. I could just focus on my life and my ideas and be involved with society at my own level. My life would be so much more peaceful if I wasn't afraid of the US turning into a big pile of shit.

    It's no wonder in this stimulation-happy society of 1-sentence soundbites and news stories that don't even have paragraphs that people won't pay attention or give credibility to research organizations. If a news agency that was once taught as a credible source of information such as the New York Times is now shit, should I consider other publications such as Reuters just as credible? I give more credibility to my friend who heard from their friend who heard from a friend of a friend that xyz was happening than I do to NYT.

    A major problem that even hits the intelligent people is that they want to know what's going on too. But to know what's going on, it's a 2 hour ordeal every day. You've got to get up, get online, go through your daily checks folder and get your news. Wade through the utter crap to get to the good stuff that you want. I've got to look through 10 headlines of bullshit to find the 1 headline that's good, and read 10 stories on a subject to get an idea of what's going on.

    So yes, webpages are going to degrade the content available. A million bumbling idiots will drown out the one that makes sense. At least they can establish some level of research distribution via webpages and, more so, credibility if they want to stop this from happening, but more to the point, so people like me don't have to wade through utter shit to find their stuff.

  67. (whoops, not anonymous-me) by Ayanami+Rei · · Score: 1

    So anyway, yeah, not everyone uses PHP... in fact it's a whole bunch easier to cover up URL mapping issues when you are using CGI than when you have a bunch of static documents.

    And if you remove a document and want to keep it that way, there's always:
    Redirect gone /blah/blah/expired.html

    I think part of the problem is that lots of people use FTP to maintain sites still, with a unified view of how people will navigate their content and have little appreciation for .htaccess, etc. unless they are trying to implement a password check.

    And some CMS systems don't handle that kind of thing well either. Anyone have any experience with Zope?

    --
    THIS THING CAN TURN ON A DIME, MACROSSZERO STYLE ALSO FUCK BETA, ~NYORON
  68. Reviewed Content by neglige · · Score: 2, Interesting

    Contributions to science, law, and other scholarly fields rely for their authority on citations to earlier publications. The ease of publishing on the web has made it an explosively popular medium, and web pages are increasingly cited as authorities in other publications.

    For true scientific work, this should never happen. Because you should only cite reviewed sources. Such as books, articles or conference papers. This is no guarantee for quality, but at least the review process sorts out the most obvious nonsense. And, if the reviewer is good, it may even increase the quality of the work. Plus, those sources are permanent.

    As always, there are sources that are more respected (IEEE, ACM etc.) than others. And using respectable sources is a good thing, because normally you want to prove a point and you base your argument on those publication. So if your basis for your argument is faulty... well ;)

    Furthermore, there is hardly any information that can be found on the web but not in a reviewed form. Note that there are (accepted) scientific reviewed journals using the web for publishing. Without a printed edition. And you can quote them. And, as many before me have said, the articles and links do not vanish (the URL is usually not quoted anyway - these articles are listed just like printed articles).

    This is just my personal opinion on scientific work. Let's see if my head is still on my shoulders tomorrow :)

    --
    My cats ate my karma. They also wrote this comment.
  69. Others will cite this & the post as proof... by adzoox · · Score: 1
    ... that nothing can be trusted if it is reported on the web.

    That's sort of ridiculous, seeing as the source is sometimes biased itself (Washington Post)

    To me, knowledge is only truth if both sides to an extreme are presented. Meaning; one cannot understand Abortion rights unless one hears both sides; ProLife and ProAbortion.

    The sentiment that web sources, just because they aren't written without journalistic/legal lingo and because that news isn't from "the usual outlets" (CNN,ABC,NBC,CBS, Time, etc) - are not credible or don't have a point even if errant in fact or fiction, doesn't make them ANY less credible.

    Research integrity is what proves or disproves a point.

    I have a website up about a con artist. He uses this same argument around the web in his "defense" saying, "Anonymous sources are the only ones bashing me.." AND "It seems like anything posted here in these forums is taken as gospel"

    --
    Yell & scream & rant & rave... it's no use... you need a shaaaave ~ Bugs Bunny
  70. Longevity by unfortunateson · · Score: 3, Interesting

    Maintaining a links page for my wife's business' site has always been a low priority, and finally, I put up a MySQL/PHP page to do the majority of the work.

    So I've been going through all the old links, and every link request we've gotten in the business' 7-year history. Of the 120 messages in the timeframe of 1997-1999, only about 15 sites still existed. Of those, two-thirds had forwarded URLs -- often from AOL or Homestead to their own brand. A couple still existed, but had totally different content.

    Many just plain didn't exist at all. A fair chunk found the server, but no such page. A few had blank pages or nearly no content. The true annoyance though, is the number of domains that are owned by spamdexers/linkfarms that have no content of their own and beg you to set your homepage to them.
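
    The buckets described above could be assigned by a small helper once each link has been fetched. This is only a sketch: the function name, the category labels and the 200-byte "blank" threshold are assumptions made for illustration, not the poster's MySQL/PHP code.

```python
from typing import Optional

# Hypothetical threshold: pages with a shorter body count as "blank".
BLANK_BODY_BYTES = 200


def classify_link(original_url: str, status: Optional[int],
                  final_url: Optional[str], body_length: int) -> str:
    """Sort one link-check result into a rough bucket."""
    if status is None:
        return "gone"            # no server answered at all
    if status in (404, 410):
        return "page missing"    # server found, but no such page
    if status != 200:
        return "other"
    if body_length < BLANK_BODY_BYTES:
        return "blank"
    if final_url is not None and final_url != original_url:
        return "forwarded"       # the request ended up at a different URL
    return "alive"


print(classify_link("http://example.com/a", 200, "http://example.org/a", 5000))
print(classify_link("http://example.com/b", 404, None, 0))
```

    Pages that still load but now carry totally different content, or that belong to a linkfarm, cannot be detected from status codes alone and would still need a human look.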

    I've still got to cover the rest of 2000-2003 link requests, but I expect that anything pre-2001 will be very sparse.

    --
    Design for Use, not Construction!
  71. Archiving web sites by UncleRoger · · Score: 1

    This problem is the impetus behind ComputerHistory.net, a sort of internet archive for computer history web sites.

    --
    Stupid people will be persecuted to the fullest extent allowed by law.
  72. Can't they print them? by khasim · · Score: 2, Insightful

    I'm amazed that anyone doing a professional article would even think of citing a web page as a web page.

    Why not just print it out?

    Not only are web pages transient, but the facts they have are subject to change. This gets back to your "pseudo-science and mis-information" comment.

    If you're going to use it in your work, print a copy or save an image of it or something.

    Which brings us to "fair use" and copyrights and all kinds of other crap.

    1. Re:Can't they print them? by Xolotl · · Score: 2, Informative
      Printing it out merely saves the information in another location, it doesn't change the citation. Citations are not for yourself, but for other people to see where you got the information from - a journal, private communication, or, in this case, a webpage. Once the webpage is gone they can, of course, come to you for the information, but can't check it for themselves - which is the point of citations in scientific articles.

      As for the point of citing webpages, often they contain information such as HOWTOs or work-in-progress which may be very useful but is not yet published - and, perhaps, never will be.

  73. Site Linking Schemes by Oculus+Habent · · Score: 2, Interesting

    An easy system would be for a server to provide each document it houses with a unique meta-data identifier. Then, when a document, story or paper moves from the "main page" into an archive section, you can still refer to the FileID. This ID should be searchable, so that an article could be linked via something like:

    http://www.cnn.com/?2001EXCJA2

    The IDs could be system generated and handled by a file system that supports meta-data or they could be designed to mean something and handled by a content management system.

    Implementation is the difficult part. Getting everyone - or at least news sites, magazines, and colleges/universities - to set up FileID searching and then document the linking process on their site is no small task.
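
    A minimal sketch of the lookup side of such a scheme, assuming the ID arrives as the bare query string (as in the CNN-style example above). The class name and the file paths are hypothetical; a real site would back this with its content management system or file-system metadata.

```python
from typing import Dict, Optional


class FileIdRegistry:
    """Maps a permanent document ID to wherever the document lives now."""

    def __init__(self) -> None:
        self._paths: Dict[str, str] = {}

    def register(self, file_id: str, path: str) -> None:
        # Refuse to register the same ID twice; use move() to relocate.
        if file_id in self._paths:
            raise ValueError(f"FileID already registered: {file_id}")
        self._paths[file_id] = path

    def move(self, file_id: str, new_path: str) -> None:
        # The document moved (e.g. main page -> archive); the ID stays the same.
        if file_id not in self._paths:
            raise KeyError(file_id)
        self._paths[file_id] = new_path

    def resolve(self, query_string: str) -> Optional[str]:
        # Returns the current path for the ID, or None if the ID is unknown.
        return self._paths.get(query_string)


registry = FileIdRegistry()
registry.register("2001EXCJA2", "/frontpage/story.html")   # hypothetical path
registry.move("2001EXCJA2", "/archive/2001/story.html")    # hypothetical path
print(registry.resolve("2001EXCJA2"))
```

    The point of the design is that links cite the ID, so moving the file only requires updating one table entry rather than every page that refers to it.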

    --
    That what was all this school was for... to teach us how to solve our own problems. -- janeowit
    1. Re:Site Linking Schemes by agentk · · Score: 1

      Good idea. I see two pieces:

      (1) A tool that generates a content-based unique identifier for a document. An SHA or MD5 sum would be nice. Or a magic string stuck somewhere in the first KB or so of the file (in a comment, for instance). The "lookup" URL with this identifier should be stored in an index, and be visible somewhere.

      (2) A server (e.g. Apache) module which searches the filesystem for a document given a unique ID. This should be a real search of everything (or a subsection of the filesystem), possibly using efficient OS-supplied search mechanisms, or just by walking the whole damn thing. Some caching would be helpful. (If I know that I'm requesting an ancient document that no one else seems to care about, I don't mind waiting for a little while.)
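
      Both pieces can be sketched in a few lines, assuming SHA-256 as the checksum and a plain directory walk as the search. The function names and the in-memory cache are illustrative assumptions, not an existing Apache module.

```python
import hashlib
import os
import tempfile
from typing import Dict, Optional


def content_id(path: str) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_by_id(root: str, wanted: str, cache: Dict[str, str]) -> Optional[str]:
    """Locate the file under root whose content hash equals wanted."""
    cached = cache.get(wanted)
    # Trust the cache only if the file is still there with the same content.
    if cached is not None and os.path.isfile(cached) and content_id(cached) == wanted:
        return cached
    # Otherwise walk the whole tree, hashing every file until one matches.
    for folder, _subdirs, names in os.walk(root):
        for name in names:
            candidate = os.path.join(folder, name)
            if content_id(candidate) == wanted:
                cache[wanted] = candidate
                return candidate
    return None


# Usage: the document is moved into an archive folder and is still found.
with tempfile.TemporaryDirectory() as root:
    original = os.path.join(root, "story.html")
    with open(original, "w") as handle:
        handle.write("<p>hello</p>")
    doc_id = content_id(original)

    os.mkdir(os.path.join(root, "archive"))
    moved = os.path.join(root, "archive", "old-story.html")
    os.rename(original, moved)

    cache: Dict[str, str] = {}
    found = find_by_id(root, doc_id, cache)
    print(found == moved)
```

      One limitation of a pure checksum: editing the document changes its ID. The "magic string" variant in (1) avoids that, at the cost of having to embed the string in every file.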

      --

      VOS/Interreality project: www.interreality.org

  74. Not everything, but... by FunkyRat · · Score: 4, Interesting

    This is a real problem. When Vannevar Bush conceived the Memex system, his goal was to facilitate the exchange of scientific research. Later, Doug Engelbart built on Bush's ideas as did Ted Nelson (the guy who coined the term "hypertext") and Tim Berners-Lee. While the web today has become a vast sinkhole of pop-up ads, crappy web stores and inane blogs, it is important to not forget that its inception was in aiding scientific research.

    Yet, that is not possible without some kind of permanence. Probably what is needed is some way to integrate the web into university library collections. If there was some way of indexing web pages the way libraries currently use the Library of Congress scheme to index their physical collections, then web pages could be uniquely numbered with this number incorporated into the URL. If universities and the Library of Congress itself were then to mirror these pages permanently, then whenever the original URL became unavailable, one could try just about any major university or the LOC and retrieve the page. Of course, with the current political climate here in the US I don't foresee this ever happening.

    1. Re:Not everything, but... by FunkyRat · · Score: 2, Funny

      I, of course, don't mean to suggest that every web page be indexed and stored in this manner, but rather just pages deemed important by their author.

    2. Re:Not everything, but... by Ungrounded+Lightning · · Score: 2, Insightful

      This is a real problem. When Vannevar Bush conceived the Memex system, his goal was to facilitate the exchange of scientific research. Later, Doug Engelbart built on Bush's ideas as did Ted Nelson (the guy who coined the term "hypertext") and Tim Berners-Lee.

      And one of the design goals of the Xanadu server project was to provide exactly this sort of permanent storage and location-redundant backup. (We even referred to it as the "Library of Alexandria Problem" and named one of the machines after the Alexandrian librarian. B-) )

      Unfortunately the project didn't succeed and the web filled the niche.

      So now we have a distributed Library of Alexandria, holding the single copy of every "book", constant brushfires taking out important works, and a few "scribes" frantically trying to make copies of the whole thing (which copies, IF they exist, have to be accessed a different way than the original).

      (Also coarse-grained one-way (text snippet->page or image) rather than fine-grained (text snippet, image region, or database entry->text snippet, image region, or database entry), one-way links rather than backfollowable links, and I could go on...)

      --
      Bantam Dominique roosters crow a four-note song. Once you've heard it as "Happy BIRTHday" you can't NOT hear it that way
  75. Horse And Buggy Thinking by paranerd · · Score: 1

    At the risk of sounding melodramatic, the internet is something so much greater than the article's author comprehends. The internet is not a horse and buggy that goes real-real fast, and flies over water, and has a really big wagon on the back. The internet is an explosion of the human consciousness. When it becomes static (and the RIAA and our neo-totalitarian governments are trying very hard to make it so) when it becomes static, it will be dead. Chaos is good.

    1. Re:Horse And Buggy Thinking by argent · · Score: 1

      Maintaining some kind of ability to create permanent references to historical documents isn't "becoming static".

    2. Re:Horse And Buggy Thinking by kirkjobsluder · · Score: 1

      I don't think so. The author has a legitimate complaint. In writing a scientific article you select the best resources that you feel your audience needs to know about. What is the point in having those resources available if the audience can't get to them?

      There has to be a balance between so chaotic that no one can find anything, and so static that nothing ever changes.

      In fact, although a lot of people here are dissing the traditional library system, libraries have been doing p2p information distribution for years. It is a system with a unique identifier for every resource, high redundancy, and a fairly efficient method for leveraging that redundancy to ensure access to resources. If the Northwestern University library burns down, most resources would still be available through interlibrary loan within a few days.

      There have been efforts to provide for the same unique access and redundancy with URLs by adding more metadata. However, in the absence of widespread use, we are stuck with fulltext searches.

    3. Re:Horse And Buggy Thinking by paranerd · · Score: 1
      There has to be a balance between so chaotic that no one can find anything, and so static that nothing ever changes.
      I agree.
      The author has a legitimate complaint. In writing a scientific article you select the best resources that you feel your audience needs to know about
      That's where I disagree....kind of. In the olden days an author had to quote the "best resource" that the audience needed to know about, yes. Without the specific source being quoted the author and the audience were adrift in a sea of ideas. Today that source is the internet. If the author tells me that a particular team, at a particular lab has published a paper on a topic, then I don't need him quoting a web page. I will use Google and any other web based resource to find the web page myself. And if that resource is missing I will tell the internet community that the resource is missing and the author, or the community, will set me straight. This is dynamic information interchange. Self monitoring, self documenting, self governing, and self correcting. It doesn't need to be static. Not here. Not for this purpose.
    4. Re:Horse And Buggy Thinking by kirkjobsluder · · Score: 1

      That's where I disagree....kind of. In the olden days an author had to quote the "best resource" that the audience needed to know about, yes. Without the specific source being quoted the author and the audience were adrift in a sea of ideas. Today that source is the internet. If the author tells me that a particular team, at a particular lab has published a paper on a topic, then I don't need him quoting a web page.

      Of course not, if a particular team has published an article, then you should be quoting the source for that article.

      I will use Google and any other web based resource to find the web page myself. And if that resource is missing I will tell the internet community that the resource is missing and the author, or the community, will set me straight. This is dynamic information interchange. Self monitoring, self documenting, self governing, and self correcting. It doesn't need to be static. Not here. Not for this purpose.

      This is assuming that the resource still exists. Chances are quite good with this method that the resource will not exist in 5 years or 10 years.

      But what you are missing here is that journals and libraries, far from being horse and buggy thinking, are perhaps the best P2P network ever created. I can walk into any research library and ask for Volume 24, Issue 4 of Educational Psychologist and can find it in under 10 minutes (usually I don't need to even go into the library). That reference is never going to change, will never need correcting or monitoring. It will be there 5 years from now when the team that published it has migrated to the four winds, taking their web sites with them, editing out the old stuff. It will be there in 10 years when the primary authors are dead.

      There are other advantages to the static reference. If I see Krackhardt (1987) or Nonaka (1994), I know what the author is talking about. I can pull up the reference from my files (indexed by author and name.) Again, this reference will not change even if the journal goes off the web, or the authors change jobs.

      And perhaps we write for different reasons. I write to communicate, not to send my readers on a wild goose chase through google.

    5. Re:Horse And Buggy Thinking by paranerd · · Score: 1
      And perhaps we write for different reasons. I write to communicate, not to send my readers on a wild goose chase through google.
      I can not write. Sadly, it is not one of the skills I possess.

      But I still believe that a constantly changing, volatile, and adapting internet is an order of magnitude improvement over a static never self regulating research library.

      You write better than I do. You may even be right about this issue. But if you ever chased a wild goose through google, I think your skills are at fault. I get a bit passionate on this but I think Google, and more importantly groups.google, is a greater cultural achievement than the genome project.

    6. Re:Horse And Buggy Thinking by kirkjobsluder · · Score: 1

      But I still believe that a constantly changing, volatile, and adapting internet is an order of magnitude improvement over a static never self regulating research library.

      Well, that is the mistake right there in assuming that the modern research library is both static and not self-regulating. To start with, the publication of a paper is the endpoint of a complex regulatory practice in which the article in question runs a gauntlet of expert reviews. And of course it is not as if the state of the art stays static. Studies are reviewed, re-reviewed, presented at conferences, critiqued, modified, and argued with before, during and after publication.

      You write better than I do. You may even be right about this issue. But if you ever chased a wild goose through google, I think your skills are at fault. I get a bit passionate on this but I think Google, and more importantly groups.google, is a greater cultural achievement than the genome project.

      Ok, let's assume that after 10 years of experience using internet search engines and 20 years of experience navigating through information of various types, my skills are at fault. That raises an interesting question: what good is an internet utility where an early adopter who has been using google and similar search utilities for 10 years becomes frustrated? (And this is ignoring one of the central problems of the web, which is that content can disappear.)

      I'm not dismissing google as an important utility. What I am pointing out is that google is very well designed for searching some kinds of information, and very poorly designed for searching different kinds of information. There is utility embedded within the peer-review publishing process (not the least of which is filtering out crap.) There is utility in producing a static report that can be available immediately or within 48 hours at any research library. Google may add onto this, but it is not going to replace this in the near future.

  76. Good Knowledge by AntiPasto · · Score: 1

    I don't want to sound too much like a sooth-sayer here, but I find that it's becoming increasingly hard to find *good* knowledge... and what qualifies it as *good*? Is CNN good? Is Al-Jazeera? Is my neighbor's gossip? How about my ex-girlfriends'? What about my wife? What about... heheh you get what I mean.

  77. History Palimpsest by Aneurysm9 · · Score: 2, Insightful

    If we are to say that not everything is worthy of archiving, who, then, is to decide what is? The 'net shouldn't be just another memory hole when there is the potential to create a repository of information that far exceeds the scope of anything possible before. That said, people who wish to cite to information published in an electronic form should be careful to cite only to sources that are reputable not only for veracity but also for longevity.

    --
    There was Cowboy Neal at the wheel of a bus to never-ever land.
  78. Average lifespan of a web page is 100 days? by lamplighter · · Score: 1

    What's the URL for this statistic? And when that URL goes down, will that invalidate this Slashdot article?

  79. What are you talking about? by boneglorious · · Score: 2, Informative

    Do you think because you print it out it suddenly becomes a more stable reference? Sometimes people doing professional articles have to cite web pages because that's where the information they are talking about is.

    --
    Can I mod something +1 Scary if it's true but I wish it weren't?
  80. Slashdot Archiving System ? by Dave21212 · · Score: 1


    Doesn't Slashdot have an archiving system that helps pages stay alive forever ?

    They just post a dupe of the story every couple weeks, thus maintaining the data forever...

    j/k - it's Monday here and I haven't had my caffeine yet...

    --
    "Whoever would overthrow the liberty of a nation must begin by subduing the freeness of speech."--Benjamin Franklin
    1. Re:Slashdot Archiving System ? by makapuf · · Score: 1

      that's an interesting point however.

      Is slashdot content open/free ?
      Can we download Slashdot and archive it ourselves ? Hell, some +5 posts might be interesting to keep.

      Final question : how big is /. ?

  81. Agreed by RMH101 · · Score: 1

    What do you think the chances are of your family photos being found in the attic by your descendants in 30 years, and of them being able to read them, now that we're all shooting digital?

  82. Citing URLs is not quite appropriate (yet) by c13v3rm0nk3y · · Score: 4, Informative

    Hmmm. I'm not sure most scholarly works are allowed to just cite arbitrary URLs for inline references or footnotes.

    The idea is that you generally have to cite peer-reviewed, published and presented articles; criteria which the majority of web published material simply does not satisfy. Web reading would fall under the "course reading", and would have to be backed up by a "real" reference.

    According to my GF (currently working on a Masters in Anthropology) there is a lot of confusion on how to use the web for scholarly references. Many people cite URLs in citations that are really just online archives of previously-published work. In this case, noting the URL is like saying which library you checked the article out of, and what shelf it was on. If you are an undergrad and cite a URL, it is almost a sure thing that the prof or the TA's will take marks off for improper citations.

    There are a few peer-reviewed journals that are (partly or completely) published online, in which case the URL might be a valid citation. This is likely to change, and it seems the original article was suggesting that we need to handle this case now, before we lose more good work.

    In a much smaller way, this is the kind of thing that those involved in the whole blog phenomenon are trying to resolve; making sure that their blog-rolls, trackbacks and search-engine cached pages stay historically maintainable.

    --
    -- clvrmnky
    1. Re:Citing URLs is not quite appropriate (yet) by argent · · Score: 1

      Whether it is or not, the problem of lost references for scholarly works is a subset of the general problem of the ongoing loss of information that's published only in a temporarily accessible pattern of bits. The problem of drifting blogs, that's another subset of the real problem.

      We really need an online equivalent of the library of congress. The internet archive Wayback Machine might be a place to start.

    2. Re:Citing URLs is not quite appropriate (yet) by dagnabit · · Score: 1

      The APA way to format papers includes several ways to cite electronic references. There's a free guide for URLs here.

      Now the validity of the work being cited is another story altogether, as you say. But the same could be true for 'traditional' media as well - would you take someone seriously who quoted the National Enquirer's latest fad diet tellall in a "professional" paper about nutrition?

    3. Re:Citing URLs is not quite appropriate (yet) by c13v3rm0nk3y · · Score: 1
      The APA way to format papers includes several ways to cite electronic references.

      Cool. I didn't know someone was developing guidelines. I'll forward the link.

      As for "valid" papers, my GF showed me a paper that was both peer-reviewed, presented and published that was an experiment in writing "scholarly" papers that were poorly thought-out, full of factual errors, with no real citations. I wish I had the link (pun sort of intended...).

      The journal that published it is no longer publishing...

      --
      -- clvrmnky
    4. Re:Citing URLs is not quite appropriate (yet) by c13v3rm0nk3y · · Score: 1

      Just for completeness, though I know I've gone off on a tangent here, here are my GF's thoughts on the subject of citing online references:

      For the most part, the web is an inappropriate place for getting information for scholarly papers, particularly for undergrads. There are a few exceptions:
      1. There are online archives of published, peer reviewed journals. These typically require a subscription and are cited in the same way as if you read them on paper.
      2. If you want to make a claim about information that is available online or in popular news sources it is appropriate to use examples, and, of course, to cite them. In this case, the APA and MLA both have guidelines for online citations. For instance, in a paper I am currently working on, I cite a Google web search to illustrate both the number of charitable organizations available to which one can donate money, and the ease of obtaining information about them. However, in this case I am only illustrating a point, my argument certainly does not rest on online sources.
      3. There are a few online, peer reviewed journals. The ones I've seen are generally under the auspices of a university or foundation that exists in the physical world. This may indicate that if they run out of money or stop publishing they will be able to take care of archiving, but I don't really know. These are perfectly reasonable articles to use to support an argument, but I don't know how archiving is dealt with.

      Another reply to this thread specifically talks about how some fields are trying to handle the archival persistence of online references, especially as these become more common in that field. For now, it seems, a lot of research is only informally done online. The majority of results are still published in dead-tree editions.

      --
      -- clvrmnky
    5. Re:Citing URLs is not quite appropriate (yet) by deepestblue · · Score: 1

      In Computer Science, the situation is quite different. I am working on a PhD, and many publications freely cite URIs (involving both root-pages and deep-links). Most of them are either links into home pages of co-researchers (which change rarely) or links to online journals at the websites of publishing houses, which change even less frequently. Of course, as you'd imagine, most journals in Computer Science are available online (though not freely).

    6. Re:Citing URLs is not quite appropriate (yet) by vacuum_tuber · · Score: 1

      argent wrote:

      We really need an online equivalent of the library of congress.

      We have it. It's called NSA. The only problem is that it's a write-only archive except for a small circle of select people who can access it.

      --
      Look at the bright side: there's always seppuku.
  83. archive.org and copyright?-Library by Anonymous Coward · · Score: 0

    "Mostly they get by because they will remove content if requested, and nobody who cares cares quite enough to sue them on behalf of "the world" when they are satisfied to have their own content removed. In other words, they are basically OK because nobody cares to sue them. Strictly speaking, archive.org probably is the world's largest copyright violation."

    Maybe. Or maybe the copyright holders either don't know about the archive, or they see the value of the archive as an electronic library.

  84. The main difference being... by artemis67 · · Score: 2, Funny

    The Washington Post reports on the loss of knowledge in ephemeral web pages, which a medical researcher compares to the burning of ancient Alexandria's library.

    The main difference being that most of what was in ancient Alexandria's library was considered to be of importance to at least a sizeable group of people, if not the majority, whereas most of the web pages that disappear every day are simply dross.

    1. Re:The main difference being... by geeklawyer · · Score: 2, Insightful

      That's not an entirely unreasonable view, however archeologists frequently gain important insights into an ancient culture by looking at dross. Near the burial sites of pharaohs were found carved complaints by workmen about poor conditions. In Greece (I think) notes were found in a ceremonial spot with curses aimed at neighbours and slutty wives. Gossip and tittle-tattle for sure, but quite informative and used to get a feel for the society.

      In 5000 years archeologists will learn so much about us from blogs & archives of /. They will learn Natalie Portman was a fertility goddess worshipped by the mysterious use of a dish called 'hot grits'

      --
      -he who laughs last, is a bit slow.
  85. Usenet more lasting record by Ridgelift · · Score: 1

    "It's a huge problem," said Brewster Kahle, digital librarian at the Internet Archive in San Francisco. "The average lifespan of a Web page today is 100 days. This is no way to run a culture."

    If you want something to last, post it on Usenet. If there is a need to cite it in a document, post it with a unique ID#, so that a simple Google URL search along with the author's email address will find it among the billions of other postings.

  86. Alexandria had this problem licked... by Anonymous Coward · · Score: 1, Insightful

    There's already a method of long-term storage for established knowledge, and the library at Alexandria was pretty good at it: PRINTED BOOKS. Web pages were never intended to be static monoliths of information but were from the beginning meant to represent a "living document" where the exchange of information was the important thing.

  87. I do think that. by khasim · · Score: 2, Interesting

    "Do you think because you print it out it suddenly becomes a more stable reference?"

    Yes. Because now you have a copy of the source that you're citing.

    "Sometimes people doing professional articles have to cite web pages because that's where the information they are talking about is."

    And the article was about how the web pages don't stay live so you can't reference them later so the information is not available later.

    So, if you're going to use web pages as a citation, you need to have a means of referencing them after they go off-line.

    What better way is there than to have a copy of them yourself?

    1. Re:I do think that. by slimak · · Score: 3, Insightful

      Yes. Because now you have a copy of the source that you're citing.

      The real item of importance is that others have access to what you are citing. They may need/desire this for several reasons, such as verifying your claims and gaining more background information. By citing an online resource that is not backed by hard-publication (e.g. IEEE offers full-text online articles in addition to print, slashdot has no periodical that i know of) you may cite something that is gone tomorrow, possibly making your work look suspect. Furthermore, anyone can post pretty much anything they want to the web -- think the onion.

  88. 100 days by feed_those_kitties · · Score: 3, Funny

    Unless it gets /.ed - then its lifespan might be measured in minutes!

  89. problem can easily be improve with some thought by agentk · · Score: 4, Insightful

    This has been a real problem for a long time. But the web is distributed. The only real solution is for people to realize that moving stuff around all the time breaks links, and avoid it. One thing that would help is a translation layer in the web server, that separates the URL from the server's filesystem. This is basic software engineering common sense.

    Non-transparent CGI, PHP and ASP scripts are even worse, they tend to change all the time. Instead they should be using the "path info", or be in the server (mod_perl, etc.)

    Example: "http://science.slashdot.org/article/03/11/24/127250" is a much better permanent URL for this story than exposing the details of some perl script called "article.pl" that takes a parameter named "sid", and it will be easier to adapt to all future versions of Slash or other software, or to simply archive as a static file someday. Using the PATH_INFO CGI variable you can make a CGI like "article.pl" use URLs like that above.

    The idea that the basic job of a webserver is to pull files off your disk is incomplete: its job ought to be to take your URL through *any* kind of query lookup, which might map to the filesystem and might not. The HTTP RFCs imply this as well.
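
    As a rough illustration of the PATH_INFO approach, a script could turn the trailing path into a story ID and look the story up however it likes. This is a sketch under assumptions: the /YY/MM/DD/NNNNNN layout and the function name are hypothetical, not how Slash actually parses its URLs.

```python
import re
from typing import Optional

# Hypothetical URL layout after the script name: /YY/MM/DD/NNNNNN
_STORY_PATH = re.compile(r"^/(\d{2})/(\d{2})/(\d{2})/(\d+)/?$")


def story_id_from_path_info(path_info: str) -> Optional[str]:
    """Translate a PATH_INFO value into a story ID, or None if malformed."""
    match = _STORY_PATH.match(path_info)
    if match is None:
        return None  # reject anything that is not a well-formed story path
    return "/".join(match.groups())


print(story_id_from_path_info("/03/11/24/127250"))  # prints 03/11/24/127250
print(story_id_from_path_info("/../etc/passwd"))    # prints None
```

    In a CGI script the input would come from os.environ.get("PATH_INFO", ""). Because the public URL never names the script or its parameters, the lookup behind it can later be swapped for a database query or a static file without breaking existing links.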

    reed

    --

    VOS/Interreality project: www.interreality.org

    1. Re:problem can easily be improve with some thought by Anonymous Coward · · Score: 0

      Wow, you can get +4 Informative by simply paraphrasing Tim Berners-Lee without even providing a link.

    2. Re:problem can easily be improve with some thought by agentk · · Score: 1

      I guess so. Thing is, I wrote that completely independently of Tim B-L: I only read his article *after* making that post.

      Tim's article goes into depth and gives some good ideas on managing your documents and determining what links should look like, etc. It's short, I recommend reading it.

      reed

      --

      VOS/Interreality project: www.interreality.org

  90. In the USA, it could be fair use by yerricde · · Score: 1

    This does however break UK, and I presume most other western copyright law.

    In the United States, "the fair use of a copyrighted work[...], for purposes such as [...] scholarship, or research, is not an infringement of copyright." (17 USC 107; emphasis added by yerricde).

    --
    Will I retire or break 10K?
    1. Re:In the USA, it could be fair use by IM6100 · · Score: 1

      It's refreshing to see someone asserting 'fair use' rights without it just being so they can have pop music running in the background without paying for it.

      --
      A Good Intro to NetBS
  91. Todays Society by nurb432 · · Score: 1

    A 100-day information life span... that's typical of today's 'throwaway culture'.

    --
    ---- Booth was a patriot ----
  92. The web can hold insight,in the right field-Backup by Anonymous Coward · · Score: 1, Insightful

    "And heck, what's the harm in saving the pages on your drive and contacting the original author if they disappear? Hard drive space is cheap. If you take yourself seriously, you might want to grab a snap, even if it is technically illegal (not that I know that it is; Google seems to do it right often)."

    You might want to make certain you have it RAID'ed. I had TWO IBM Deskstars die in the same time period. What a pain to recover what I could. And I believe that Google could fall under the same provisions as a Library.

  93. Re:Others will cite this & the post as proof.. by Minna+Kirai · · Score: 1

    one cannot understand Abortion rights unless one here's both sides; ProLife and ProAbortion.

    Funny, you swapped the propaganda name "pro-choice" for the technically accurate ProAbortion, but you left ProLife in its typical advertising form instead of the more correct "anti-abortion". That could be bias right there.

    The sentiment that web sources, just because they aren't written without journalistic/legal lingo and because that news isn't from "the usual outlets" (CNN,ABC,NBC,CBS, Time, etc)

    Umm, those media you listed are web sources. If someone is going to cite CNN, ABC, NBC, or CBS, then going to the web page will be much more reliable and practical than referencing:

    [3] Saw it on Nightline Aug 12, 2003, right after the 2nd commercial.

    Researchers can't really cite TV. (They can cite transcripts, which are documents existing independently from the TV show, and which are often on the web)

  94. Re:Others will cite this & the post as proof.. by Anonymous Coward · · Score: 0

    ... that nothing can be trusted if it is reported on the web.

    You're missing the point. Anyone can publish on the web. The "usual outlets" (as you put it) aren't credible because they use "journalistic/legal lingo," but rather because they have an editing process and make some attempt at adhering to journalistic standards and ethics (e.g. requiring two sources to cite an opinion).

    Allow me to demonstrate:

    That's sort of ridiculous, seeing since the source is sometimes bias itself (Washington Post)
    That's sort of ridiculous, seeing that the source is, in my opinion, sometimes biased itself (e.g. The Washington Post).

    To me, knowledge is only truth if both sides to an extreme are presented. Meaning; one cannot understand Abortion rights unless one here's both sides; ProLife and ProAbortion.
    I believe published "knowledge" is only truthful if the extreme sides of an issue are presented. In other words: one cannot truly understand abortion rights unless one hears both sides: pro-life and pro-choice.

  95. freenet? by Anonymous Coward · · Score: 0

    A freenet-like model might be worthwhile for this kind of stuff. Data is referenced by keys that can't change, hence your link will point to the right piece of data for ever. Of course, unpopular content disappears over time, but basing this on popularity rather than the author's efforts seems like a vast improvement.
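    A minimal sketch of the "keys that can't change" idea, assuming the key is simply a SHA-256 hash of the content. This is an illustration only, not Freenet's actual key scheme; the ContentStore class and its in-memory dictionary are hypothetical.

```python
import hashlib


class ContentStore:
    """Toy in-memory store where each key is derived from the content itself."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        # The key is a hash of the bytes, so the same content always
        # gets the same key, and changed content gets a different key.
        key = hashlib.sha256(data).hexdigest()
        self._blobs[key] = data
        return key

    def get(self, key: str) -> bytes:
        data = self._blobs[key]
        # A reader can verify that what came back is what was cited.
        if hashlib.sha256(data).hexdigest() != key:
            raise ValueError("stored content does not match its key")
        return data


store = ContentStore()
cited_key = store.put(b"original text of the cited page")
edited_key = store.put(b"silently edited text of the cited page")
```

    A citation that records cited_key keeps pointing at the original bytes; an edited page ends up under a different key instead of replacing what was cited.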

  96. The problem with hosting... by EvilTwinSkippy · · Score: 1
    The answer is simple; host your own website. I've been hosting etoyoc.com from my living room for (lets see 2003-1998=5) 5 years now. It's been through 3 service providers, has been hosted off of DSL, a Wifi link, and a T1 at work. The server itself is a patchwork of leftovers from the "real" machine.

    I even had the presence of mind to reprogram the personal web-space from my school account to redirect to my present one before I left. That page has been up for so long asking for "Sean Woods" gives you me as the #1 link on google. Impressive considering "Sean Woods" was a character in a Tom Clancy novel, and seems to dominate all of the rest of the links.

    Back on point, running your own server doesn't cost much. Spare parts, a broadband connection with a static IP, and a little knowhow. Around Philly a place called MartNet used to have a ghetto colo where you provided the box, and they provided power and T1 access for $100/month. They don't anymore (ratzen fratzen) but if I had an old warehouse that's the business I would be in.

    Hmmm. Data warehousing. Data center in a warehouse...

    --
    "Learning is not compulsory... neither is survival."
    --Dr.W.Edwards Deming
  97. Misleading statistics by Alomex · · Score: 4, Interesting

    The article claims that "the average life span of a web page is 100 days". This is a very misleading statistic. What it really means is that the average web page is updated every 100 days, not that the page dies and goes away after 100 days.

    Moreover, as you can imagine, authoritative sources (the type that people are likely to quote) are updated much less frequently.

    1. Re:Misleading statistics by Artifakt · · Score: 1

      Authoritative sources aren't the only ones that tend to stay unchanged. Many of the real fanatic sites have copious dis-information, but the people who run them are eager to preserve their "TRUTH" even at enormous costs. Holocaust deniers, Kennedy assassination conspiracy buffs, and the whole tin-foil hat crowd run some of the most long lived web sites around. The ephemeral nature of most sites gives these a spurious authority - the original site may well outlast its critics and rebuttals.

      --
      Who is John Cabal?
  98. Solution: print webpages you cite by BenLev · · Score: 1
    While some people don't really care if a webpage they reference later disappears, anyone running a scholarly publication needs to protect the integrity of his sources.

    At the journal I work for, we print out copies of many online sources cited by our authors.

    This works well for things like reports available online (e.g., something in PDF form) and much less well for a reference to a webpage in general (e.g., if someone writes "The Columbia Law School advertises its many course offerings in international law"--adding a link to the law school webpage as a reference--in an article about the proliferation of international law courses at major law schools). Unless one wants to print out the whole site, one is pretty much out of luck.

    That aside, it's a good policy to archive print copies of web documents cited in a serious piece of research. That way, future researchers investigating the article can check the sources.

    - Ben

  99. RSS Feed by Dave21212 · · Score: 1


    Good point... maybe you could pull and archive the RSS feed ?

    --
    "Whoever would overthrow the liberty of a nation must begin by subduing the freeness of speech."--Benjamin Franklin
    1. Re:RSS Feed by makapuf · · Score: 1

      does that include every discussions & threads ?

  100. websites as reliable sources in science? by XenonChloride · · Score: 1
    Dellavalle, a dermatologist [...] had co-written a research report featuring [...]footnotes -- many of which referred [...] to Web sites that he and his colleagues had used to substantiate their findings.
    Pardon me, but I call this unprofessional! In order to support your own findings/experiments you may cite peer-reviewed articles from printed or online journals having an ISSN or refer to personal communications. Usually, the latter only makes sense if the source has some sort of reliability and reputation in the scientific community. Didn't Dellavalle and colleagues contact the authors of the respective web sites in time - just to make sure that the data published were correct and obtained under defined and reproducible conditions? Apparently not! If they had done so, they probably would have had enough background information to recontact the originator of the data.
    1. Joe Sixpack, www.blaaa.net/index.html.
    2. A drunken bloke in the subway, personal communication, 2003.
    are not relevant references.
  101. 1984 by Anonymous Coward · · Score: 1, Insightful

    Should the web become the *sole* source of information, the Ministry of Truth will come into being. No piece of information will be trustworthy, because all information will be mutable.

    This is already happening. Read a cnn news story (something controversial or important) and save the text. Come back a couple of hours later-- you will often find changes in the text.

    What is truth when there is no proof?

    It's whatever they want to tell you.

  102. Ah, the missed possibilities... by djh101010 · · Score: 1

    Now, it'd be funny if that page had gone 404 on us...

    1. Re:Ah, the missed possibilities... by ruzel · · Score: 1

      That page didn't go 404 on us, but it's still ironic that the "hotel homepages" link (cited as being a good measure against "linkrot") DOES lead to a 404.
      _______________________________________

  103. Legal citations and authority of internet sources by mtpruitt · · Score: 5, Informative

    Law journals have tried to cope with the proper weight of authority to grant web pages by trying to follow the Blue Book, a citation manual.

    The general rule has been that whenever you can find something in print, cite to that, but add an internet cite when either it is available and would make it easier to find, or if it is only available online.

    Things that are only available online are surprisingly common in citation. The leading court reporter services (WestLaw and Lexis Nexis) both have cases that aren't "officially" printed, but are available online.

    Also, many journal articles will cite to web pages such as a company's official description or press releases.

    In general, these citations are treated for their functional purpose and not their form of media -- online cases are grouped (last) with other cases, and information from most web sites is considered a pamphlet or other unofficial publication.

    This system seems to deal with the fact that they are ephemera pretty well. The citations really are only used to make a point that is merely illustrative or is easily accessible to legal practitioners.

  104. 2003 SOSP paper on this... by Slowping · · Score: 1

    here is a recent SOSP paper that discusses using a P2P system to preserve the integrity of publications.

    --
    (\(\
    (^.^)
    (")")
    *beware the cute-bunny virus
  105. Digital Dark ages by mattpalmer1086 · · Score: 2, Insightful
    This will actually be a major problem for society and deserves serious consideration. Research in particular builds on prior research, not just recent research, but research which may have taken place 100 years ago or more.

    Its implications go way beyond web pages, which are just one of the first manifestations of our electronic culture creating records that never touch paper, or other more established and permanent mediums.

    Businesses typically only have to archive material for around 7 years legally, although some industries like pharmaceuticals have to preserve data considerably longer. This is fine when records are primarily paper based, with some nice computers to speed our current business along. When records are totally electronic from start to end ("born digital"), we start to have problems, legally and culturally. Some researchers are talking about a digital dark ages, where many of our records today will simply vanish from history, totally inaccessible and unpreserved.

    This is about storage, migration and emulation. It's about persistent identifiers. It's about technology obsolescence leading to cultural obsolescence.

    Matt Palmer Digital Preservation Department UK National Archives.

  106. Throw stuff away when you're done with it. by lawpoop · · Score: 0, Flamebait
    I think 100 days is just about right for the life cycle of information. Why have stuff around if nobody wants it?

    Face it, the internet is about computers. Computers change so fast there isn't much worth hanging on to.

    --
    Computers are useless. They can only give you answers.
    -- Pablo Picasso
  107. I agree. by khasim · · Score: 1

    Citing web pages is a really dumb thing to do.

    But if that's the only reference to the data, and you must have that reference, then you'll need to save a copy of it.

    That way, if the reference does vanish, you'll still have a copy that you can "fair use" as a citation.

    But "fair use" depends upon our copyright laws which are subject to change.

    Even making the original copy of the web page might violate the future copyright laws.

    Which brings me back to ....
    "Citing web pages is a really dumb thing to do."

    1. Re:I agree. by whorfin · · Score: 1

      If the web is the only reference to the data I need for a paper of any kind that I expect others to depend on in the future, and you cannot contact the owner of the page for their sources, then why not just ask a random person on the street for whatever you want/need to support the position of your paper? The two have equivalent reliability.

      Just say this phrase whenever you cite information that is only netborne, and ask yourself if you still want to have your name attached to it:
      "It must be true, I read it on the Internet!"

      --
      Laugh while you can, monkey-boy!
  108. Even "hard copy" today isn't the same by jtheory · · Score: 4, Insightful

    I read an interesting article a few years ago about how even our hard copy (books, magazines, musical scores, etc.) won't be nearly as useful to future historians.

    Why?

    Current historians learn a lot about each writer's creative process, and how writers evolved their ideas, from drafts and corrections. Music scholars pore over every scratched-out note, every furious scribbled comment, in Beethoven's draft scores. Writing music was laborious and hugely frustrating for Beethoven, unlike Mozart, who hardly stopped to think and made few if any corrections.

    Future scholars won't know any of this stuff, looking back at our work. We use software to edit our work... so when we fix our errors they are gone forever. We change our minds and the original idea disappears in a puff of electrons. An electronic score of a Beethoven symphony only differs from a Mozart concerto in the musical style -- all of the other data is gone.

    It's a sobering thought. Where else are we going to get this data? Not letters, because we write emails now, and regularly delete them (intentionally or not). Diaries? Some people still keep them on paper... but many store them on computer, or publish them in blogs (which as discussed will mostly be gone).

    Sobering thought, isn't it? It's not necessarily hubris to say we ought to be saving more of this stuff; people a few hundred years from now should be able to learn from our failures, as well as our successes.

    --
    There are only 10 types of people: those who understand decimal, those who don't, and, uh, 8 other types I forget.
    1. Re:Even "hard copy" today isn't the same by a+whoabot · · Score: 1

      The fact that everything is increasingly fluid is merely part of the culture, it doesn't have to be otherwise. If everything had to be traditional to maintain culture and art, I would go crazy, because nothing is. The key is to look for new ways to create culture and art in a rapidly shifting time. And the composers of this time know this: they don't write the way Beethoven or Mozart did. As long as there's some amount of good in the world there will always be artists of pure genius who can view deeply into the workings of the world which we reside in and give it back to us in the most interesting ways.

    2. Re:Even "hard copy" today isn't the same by The+Limp+Devil · · Score: 2, Insightful

      On the other hand, we're drowning in sources from the eighteenth century and forward. In many countries, medievalists can reasonably expect to read all, or at least most, of the preserved texts from the period they study. Anyone working on later periods will just have to make some sort of selection and hope it's representative.

      The degree to which drafts of manuscripts and musical scores from earlier periods have survived is already arbitrary, and it will be so in the future: Backups will contain the drafts in the future. Some of them will surely survive.

    3. Re:Even "hard copy" today isn't the same by tuxedo-steve · · Score: 1

      True, but Beethoven lacked access to decent version management software.

      Problem solved.

      --
      - SMJ - (It's not just a name: it's a bad aftertaste.)
    4. Re:Even "hard copy" today isn't the same by isopossu · · Score: 1
      But of the millions of hard disks, some might survive. Actually it's not so easy to erase a disk completely and reliably, and few people ever even try.

      It will probably be very easy for future computers to go through today's data storage. An autopsy of a present-day artist's or writer's used computer might give more information than we can even dream of: when the texts were written, how fast, how they were edited, etc.

      As I recall, today's word processors and drawing programs embed a lot more information in their files (.doc etc.) than people realize.

  109. Sounds like purl.org by smcv · · Score: 1

    purl.org provides "persistent URLs", basically a permanent HTTP redirect (for instance, the Dublin Core people use http://purl.org/dc/ which redirects to http://dublincore.org/ - more importantly, they use purl.org for namespace URLs which need to stay valid indefinitely, like http://purl.org/dc/terms/1.0 which redirects to the Dublin Core Terms specification in RDF).

    These don't address the fact that documents get altered or deleted, for which you're still dependent on the original author maintaining the old content at the same URL.

    Sometimes documents not being modified isn't necessarily what you want, anyway - for instance, sites with navigation/stylistic stuff "wrapping" the content (like my site) might want to redo that without altering the content.

    Also, sometimes a URL isn't meant to always refer to one version, but is meant to refer to the latest version (like w3.org specifications, which have a long date-stamped URL for "this version", and a short URL for "latest version of this specification").

    Basically, it's up to the content author to follow good practices.

    On pseudorandom.co.uk I use a year in the URL for all permanent content, except a couple of "navigation" pages (the games section, the "contact me" page) which pre-date my current URL scheme - there are so many references to those elsewhere in my site, I don't want to have to move the pages, fix the links and install redirectors.

    For content that existed before I switched to year-based URLs, I moved the content page to a new URL based on the year of first publication, and installed redirectors from the old name (mod_rewrite is very useful for this sort of thing).
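    A rough sketch of that kind of redirector, written as plain Python rather than as real mod_rewrite rules. The page names, the FIRST_PUBLISHED table and the URL layout are made up for illustration and are not taken from the actual site.

```python
import re

# Hypothetical table: old flat page name -> year of first publication.
FIRST_PUBLISHED = {
    "essay-on-linkrot": 2001,
    "build-notes": 2002,
}

# Old-style URLs looked like /name.html (an assumption for this sketch).
OLD_URL = re.compile(r"^/(?P<name>[a-z0-9-]+)\.html$")


def redirect_for(path):
    """Return (status, location) for an old-style path, or None if no redirect applies."""
    match = OLD_URL.match(path)
    if match is None:
        return None
    name = match.group("name")
    year = FIRST_PUBLISHED.get(name)
    if year is None:
        return None
    # 301 tells clients and search engines that the move is permanent.
    return 301, f"/{year}/{name}.html"
```

    Here redirect_for("/essay-on-linkrot.html") gives a permanent redirect to the year-based address, while paths already in the new scheme fall through untouched.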

  110. Yeah, dammit! by lumpenprole · · Score: 1

    I want future generations to have access to kirk/spock gay fan stories! How could we just let that disappear?

    Really. Methinks the author of this didn't really take a look at the majority of webpages on the net. I mean we're not talking about the library of Alexandria here, we're talking about pron sites and endlessly cross-linked blogs. Fun? Yeah. Important to the preservation of culture? God I hope not.

    --
    Disclaimer: MINAA (Mummy! I'm Not An Animal!)
  111. I feel better now by bensagenius · · Score: 1

    When I first read this, I'll admit, I was a little alarmed. Then it occurred to me -- no one is ever going to check my citations, anyway!

    --
    I am not left-handed, either!
  112. It is a throwaway culture by EmbeddedJanitor · · Score: 1

    People still use the FAT file system to store their data, though there are robust alternatives. This makes me believe that everything is throw-away - why not culture too?

    --
    Engineering is the art of compromise.
  113. Re:usenet posting by iggymanz · · Score: 1

    actually, how would "they" ever know it really was you who was posting, not someone with same name? Only a problem if you posted from your company's computer system, and you still work at the same place after 10+ years (how many slashdotters would fit category of having same job for more than decade?)

  114. Evolution Applied to Knowledge by cattail.nu · · Score: 1
    There is so much information on the web right now that it is impossible to read it all. Some knowledge that is on the Internet is effectively not there because the sites are lost behind other references in the search engines. The same would happen to "Internet archives of everything".

    It's the Theory of Evolution. The strongest web pages will survive! The archeologists of the future can still dig through mountains (of data) in the future to learn about us.

  115. RTFA... it's about references in scientific papers by dpbsmith · · Score: 5, Insightful

    The article is not about archiving "everything in the world." It's specifically about references in scholarly papers, which, for the past three or four centuries, have been part of the essential fabric of scientific research. In a research paper, everything you say is either supposed to be the result of your own direct observation, or backed by a traceable, verifiable, and critiquable authority.

    You don't just say "Frotz and Rumble observed that the freeble-tropic factor was 6.32," you say "Frotz and Rumble (1991) observed that the freeble-tropic factor was 6.32." Then, at the end, traditionally, you would put "Frotz, Q. X and Rumble, M (1991): Dilatory freeble-tropism in the edible polka-dotted starfish, Asterias gigantiferus (L) (Echinodermata, Asteroidea), when treated with radioactive magnesium pemoline. J. f. Krankschaft und Gierschift, 221(6):340-347."

    Then if someone else wondered about that statement, they'd go to the library and pull down volume 221 of the journal, and see that Frotz and Rumble had only measured that factor on six specimens, using the questionable Rumkohrf assay. If they had more questions, they'd write to Frotz at the address given in the article, asking them whether they remembered to control for the presence of foithbernder residue.

    This sort of thing is absolutely essential to the scientific process and makes science self-correcting.

    The article says that these days, the papers are published online, the references are URLs, and that an awful lot of them are stale. If so, this cuts to the very heart of the process of scientific scholarship.

  116. But even "reputable" web pages get (re)moved... by aquarian · · Score: 2, Insightful

    It's not just the short lifespan of a webpage... it's also the fact that the source isn't always reliable. Web publications are rarely given the same strict editorial process as most journal articles. The content might be just as good - or better - but they're also not given the same credibility.

    The problem *is* the short lifespan of web pages. Even "reputable" publications move their pages around, or remove them entirely, breaking all links. I'm talking about major newspapers, scientific journals, etc. It's these people, the supposedly reputable ones, who need to do a better job. The way they're doing things now is indeed, "no way to run a culture."

  117. Hyperbole by Flunitrazepam · · Score: 1

    Please... the burning of the Great Library set back our entire race centuries.

    The fact that I can no longer read the entry in my ex's blog from the day she realized I was trying to get it on with her little sister is hardly comparable.

    --
    1) Your analysis is based on bad assumptions so your result is way off. 2) You're a sick bastard for fucking a horse.
  118. there has been research into this problem... by e40 · · Score: 1

    This article discusses the research into the subject problem. Ironically, the paper to which they link, on the researchers' own site, is no longer at that address!! Seems like the authors were not eating their own dog food, as they say. Furthermore, news.com did a good job preserving the URL for the last 3 years. Kudos to them.

  119. Simplest Solution. by caesar79 · · Score: 1

    If you make any references to a website, be sure you maintain a local copy on your website along with the link to the original and ALWAYS reference your website.

    For e.g.

    reference 10. XX Available at http://www.mypage.com/reference_list/xx

    page xx contains

    Article by so and so. Link here. Local Cache over here.

  120. "intellectual property" by Grimwiz · · Score: 1

    The obvious reason for this is that it costs money to keep a presence and data available on the internet, whereas when information is archived by the old library system it incurred no overheads on the author.

    The real shame is that now information is an asset (to be bartered or controlled) a well-meaning foundation cannot host the data without incurring legal penalties.

    --
    -- Don't believe everything you read, hear or think
  121. Re:RTFA... it's about references in scientific pap by squidfood · · Score: 2, Insightful
    This sort of thing is absolutely essential to the scientific process and makes science self-correcting.

    Recently a colleague of mine published a paper in an online peer-reviewed journal which contained a trivial error (transposition typo) that would nonetheless change, in fact reverse, the interpretation of the results. They were permitted to fix this, months after the article had first been posted. Does this aid Progress, or is it Revisionist?

  122. Copyright is the Problem by LuYu · · Score: 1

    The average lifespan of a Web page today is 100 days.
    Copyright was claimed to be created for the creation and dissemination of knowledge. At that time, creation of books was easier than their wholesale destruction. A printed book would last for years unless actively destroyed.

    Those days are over.

    Today, a book that is thrown away -- unlinked -- is effectively destroyed. When a server is turned off, that information is rendered inaccessible to the world at large. A DRM'd work is similarly useless. If only one person has the right to store and disseminate a given piece or set of information, that information is vulnerable to complete destruction, and everybody stands to lose. Still copyright protects people's "right" to hide and destroy information.

    Why should information (which we cannot judge to be valuable or not until we have encountered it) be allowed to be destroyed? Can We, as a society, afford this?

    We are misusing the Internet, which was designed to replicate information in a fault tolerant way. Whether or not any given information is valuable to business should not be our question. Our question should be: How can We allow laws originally conceived to increase the volume and newness of information to prevent Our access to information? Instead of making Us more educated, copyright is making Us more ignorant, and putting some people in court and stealing their hard earned cash (this is actual theft, be it legal or not, because the person is actually deprived of real money, as opposed to the "theft" of P2P which deprives copyright holders of fictitious money -- they call it "potential profit").

    Ignorance has profited many regimes from the book burnings of China's First Emperor to the prohibitions on education in the Indonesia of the Dutch. However, today, it is supposed to be We The People who rule. If that is so, why do we allow a few scattered monopolists to steal from Us the information that empowers Us to be the rulers of Our elected representatives? If government censorship is wrong, why is corporate censorship right (especially when corporations have so much influence over governments)?

    Access and copy restrictions should be illegal for all intellectual material, since, if it is "intellectual property" at all, it is the property of Us The People for Whom the Constitution of United States and all similar documents were written. It is every individual's responsibility to replicate as much information as possible to ensure the information is available to everyone even after the publisher's fickle breezes have changed course.

    I thank the Internet Archive for its humble attempt at fulfilling this responsibility. Where are the rest of Us The People who are willing and able to defend their right to learn, nay, their right to think?

    --
    All data is speech. All speech is Free.
  123. Citing URLs has been appropriate since 1991 by cquark · · Score: 2, Informative
    The idea is that you generally have to cite peer-reviewed, published and presented articles; criteria which the majority of web published material simply does not satisfy.

    While it's obvious that not every URL is appropriate for a research paper, papers in high energy physics have used URL-references to preprints at arxiv since 1991. It's not surprising to see some less technical fields like anthropology further behind in understanding and using the technology, and high energy physics has a particular advantage in that the web was originally created for disseminating information in that field.

    People interested in the evolution of an electronic knowledge architecture that's gradually replacing the print one in some scientific fields will likely find the articles Creating a global knowledge network and Can Peer Review be better Focused? interesting. Both are by Paul Ginsparg, who started the preprint archive 12 years ago at LANL.

    It's also worth noting that free, public access to preprints has democratized physics research, as all researchers have access to timely information instead of only a few who had the right connections to get early copies of preprints before 1991. It also provides affordable access to physics articles to researchers at institutions whose libraries can't afford the 5-figure subscription fees of many modern scientific journals.

    1. Re:Citing URLs has been appropriate since 1991 by c13v3rm0nk3y · · Score: 1
      It's not surprising to see some less technical fields like anthropology further behind in understanding and using the technology, and high energy physics has a particular advantage in that the web was originally created for disseminating information in that field.

      I agree that this field has a particular and long history with web technology.

      Statements like "further behind in understanding technology" are a bit back-handed (though I'm sure it was not intentional). To this, my GF states "the social sciences are more robust".

      Heh. Given that the original article was warning about increased reliance on scholarly works being archived online without a decent infrastructure to maintain linkage and history, one could say that it's not surprising that more technical fields like high-energy physics are a little too ahead in this for their own good!

      I'm tweaking both your fields a bit; it's all in good fun.

      --
      -- clvrmnky
  124. These "researchers" aren't too bright by smagruder · · Score: 1

    Here's the solution for them:


    1. Create simple redirecting URLs that you and/or your publisher can control for an indefinite period of time.
    2. Archive all the sites you reference.
    3. If a site you reference goes away, repost the archived site yourself and change the redirection.

    Poof! Easy.
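
    The three steps above could be sketched roughly as follows. Everything here (the CitationRedirector class, the slug names, the example.org addresses) is hypothetical, and the actual archiving and liveness checking are left out.

```python
from dataclasses import dataclass


@dataclass
class Reference:
    original_url: str
    archived_url: str
    original_alive: bool = True


class CitationRedirector:
    """Maps a stable citation slug to the original page or its archived copy."""

    def __init__(self):
        self._refs = {}

    def register(self, slug, original_url, archived_url):
        # Steps 1 and 2: the slug is the URL you control; the archive
        # copy is assumed to have been saved already.
        self._refs[slug] = Reference(original_url, archived_url)

    def mark_dead(self, slug):
        # Step 3: the original went away, so switch the redirection.
        self._refs[slug].original_alive = False

    def resolve(self, slug):
        ref = self._refs[slug]
        return ref.original_url if ref.original_alive else ref.archived_url


redirector = CitationRedirector()
redirector.register(
    "ref-10",
    "http://example.org/some/report.html",
    "http://example.org/archive/ref-10.html",
)
```

    The paper cites only the slug; resolve("ref-10") returns the original address until mark_dead("ref-10") is called, after which it returns the archived copy.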

    --
    Steve Magruder, Metro Foodist
  125. people have a choice: Arxiv by penguin7of9 · · Score: 1

    If you want your scientific papers to be archived, publish them on Arxiv (search on Google). Arxiv is replicated to multiple sites, backed up, archival, and does the right things with versions and changes.

  126. Books aren't really that reliable vs. web by Ralph+Spoilsport · · Score: 1
    A number of people in academia (my profession, as it were) suffer from significant bouts of cranio-rectal inversion, and seem to think that webpages have less "validity" than books.

    This is patently rubbish.

    Example: Ann Coulter's book Treason. It's utter crap. Full of lies and innuendo. But, it's a book with an ISBN number. Why is it privileged over a webpage?

    Another example: anything written by Bill O'Reilly. I know there are "liberal" examples, but the right wing nutjobs are so lacking in subtlety, that they make for simpler examples. Anyway: his book is also utter balderdash and plays fast and loose with the facts. But: it's a book, and so it gets privilege over some intelligent well reasoned webpage (regardless of political persuasion.)

    He or Coulter get to go on book tours, travel the USA slogging their lies and stupidity on radio programs, all because books are privileged. Meanwhile, perfectly reasonable bloggers write incisive intelligent analysis and are completely ignored.

    Now, when it comes to scientific papers, there is a thing about peer review in papers, and so for that reason, these publications do have privilege, but they needn't be printed on trees. Hence: there is a complexity involved: if there is peer review, the final resting place of the info is of less consequence: a peer reviewed paper on proteins that is published at a website and can be found years later on the same website, has a lot of privilege - nearly as much as the same document printed on paper and mouldering away in a library basement.

    But without the conditionals, the peer review, etc., there is a vast gap in privilege, and I see that as essentially problematic for archiving the culture of our time. Perhaps it should all be printed in pH balanced paper and stored in a warehouse...

    Another interesting problem is this: we're not going to have the same massive overpopulation in the future, that obtains in the present. There will be an order of magnitude fewer people looking at the stuff, and equally less interest in it. Given the volume and scale of the data set, we're looking at a catastrophic loss of information. Since it is all digital, there won't be any recourse: it's not like clay tablets that can be glued back together. I humbly submit that people 1000 years from now will know more about the 18th century than the late 20th or 21st centuries, and that's just very sad.

    RS

    --
    Shoes for Industry. Shoes for the Dead.
  127. Relates to "Research Work" by delete · · Score: 1

    As others pointed out, this article refers to citations made in research papers to online sources which become obsolete over time. Unless your 4 year old is regularly publishing work referenced in academic journals, this probably isn't an issue.

    Incidentally, try Citeseer for an example of a stable online repository of research papers.

  128. CrossRef initiative and DOIs by jtoras · · Score: 2, Interesting
    Most scholarly publishers (science, tech and medical) participate in CrossRef initiative (crossref.org). This initiative makes it especially easy to cite electronic articles. The publisher registers a unique persistent DOI for each article with CrossRef and thus this DOI is used to cite the article be it printed or electronic.

    Since publishers register DOIs as soon as the electronic version of the article is available online, the article is citable using DOI way before the print journal goes to press. And since the DOIs are persistent, links will work even if the journal changes ownership/publisher.

    In addition to providing free DOI resolution, CrossRef also provides a free metadata lookup for libraries (or it will provide it for free soon I think). Libraries will be able to lookup DOIs using article metadata as needed.

    Many publishers also participate in variety of archive initiatives, where a copy of every electronic article is made available in large or national libraries for safekeeping. In case the publisher goes out of business, the library or institution has the authority to make the stored archive available to public. With persistent DOIs this will be very easy since the existing links will not break even if the servers are different.
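
    As a rough illustration of why a DOI citation survives a change of publisher: the citation carries only the identifier, and the link is built through a central resolver. A minimal sketch, assuming the public resolver at https://doi.org/ and a made-up DOI (10.1234/example.5678 is not a real article). The validity check is deliberately loose and is not CrossRef's own rule.

```python
from urllib.parse import quote

RESOLVER = "https://doi.org/"  # central resolver; publishers update where each DOI points


def doi_to_url(doi):
    """Turn a DOI string (with or without a leading 'doi:') into a resolver link."""
    doi = doi.strip()
    if doi.lower().startswith("doi:"):
        doi = doi[4:].strip()
    prefix, sep, suffix = doi.partition("/")
    # Loose sanity check only: a "10." registrant prefix, a slash, and a suffix.
    if not (prefix.startswith("10.") and sep and suffix):
        raise ValueError(f"does not look like a DOI: {doi!r}")
    return RESOLVER + quote(doi, safe="/")


# Made-up DOI, used purely for illustration.
link = doi_to_url("doi:10.1234/example.5678")
```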

  129. We have this now. It's an archive.org reference. by Animats · · Score: 1
    Here's an old Slashdot page in the Internet archive. Decoding the URL http://web.archive.org/web/20000301205131/http://www.slashdot.org/ is straightforward. It's just the archiving site ("web.archive.org"), the medium being archived ("web"), the date and time, and the original URL being archived.

    There's another copy of the archive at "archive.bibalex.org", in Egypt. Brewster Kahle wants to have four copies worldwide; then, he thinks, the information will be safe.
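
    The decoding can be sketched in a few lines. This assumes the 14-digit YYYYMMDDhhmmss timestamp seen in the URL above; the function name is made up, and other URL forms the archive may accept are not handled.

```python
from datetime import datetime

ARCHIVE_PREFIX = "http://web.archive.org/web/"


def parse_archive_url(url):
    """Split a snapshot URL into (capture time, original URL)."""
    if not url.startswith(ARCHIVE_PREFIX):
        raise ValueError("not a web.archive.org snapshot URL")
    rest = url[len(ARCHIVE_PREFIX):]
    # Everything up to the first slash is the timestamp; the rest is the original URL.
    stamp, _, original = rest.partition("/")
    captured = datetime.strptime(stamp, "%Y%m%d%H%M%S")
    return captured, original


captured, original = parse_archive_url(
    "http://web.archive.org/web/20000301205131/http://www.slashdot.org/"
)
```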

    One problem with the Internet Archive is that the server farm is unreliable. Sections of the archive drop offline for days at a time. It's built out of thousands of commodity PCs sitting on shelves in a building in San Francisco.

    Another problem is that web sites that are too complex don't get archived properly. If there are links embedded in JavaScript, Java, or Flash, they won't be properly adjusted to the appropriate archive references. This becomes more of a problem as more pages are created with overly complex authoring tools.

  130. Cool URIs don't change by yerricde · · Score: 1

    The URL may work today, but what happens when the site moves to a more scalable system?

    Then the system uses a rewrite rule to HTTP Redirect each page in the old URL-scheme to a page in the new URL-scheme. What's so hard about that? Cool URIs don't change.
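
    A minimal sketch of such a rule table, written as plain Python rather than any particular server's configuration. Both the old and the new URL schemes below are invented for the example.

```python
import re

# Hypothetical old -> new URL schemes, invented for this example.
REWRITE_RULES = [
    (re.compile(r"^/cgi-bin/article\.pl\?id=(\d+)$"), r"/articles/\1"),
    (re.compile(r"^/~(\w+)/papers/(.+)\.html$"), r"/people/\1/papers/\2"),
]


def redirect_for(path):
    """Return (301, new_path) if an old-scheme path matches a rule, else None."""
    for pattern, replacement in REWRITE_RULES:
        if pattern.match(path):
            # 301 = Moved Permanently, so clients can update their links.
            return 301, pattern.sub(replacement, path)
    return None
```

    Each rule covers a whole family of old pages, so nobody has to maintain a redirect per page.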

    --
    Will I retire or break 10K?
  131. Re:RTFA... it's about references in scientific pap by AnotherBlackHat · · Score: 1

    Recently a colleague of mine published a paper in an online peer-reviewed journal which contained a trivial error (transposition typo) that however would change, in fact reverse, the interpretation results. They were permitted to fix this, months after the article had first been posted.
    Does this aid Progress, or is it Revisionist?


    It's better to fix it than to leave it broken, but even better IMO,
    is to fix it and add a footnote that explains when the change was made, and why.

    -- this is not a .sig
  132. well thought-out solution: www.arXiv.org by call+-151 · · Score: 1
    As some other posters have mentioned, many scientific disciplines (math, CS, and physics, e.g.) have already addressed the ephemerality of research web pages with central preprint servers with mirrors and some nice front ends for searching and contributing to www.arXiv.org. Responsible people who are interested in disseminating their research widely and in a way which is recorded submit their current work to the arXiv, usually just before sending it off to a research journal, electronic or otherwise. The archival issues of file formats and such have been well thought out by a number of people for whom this is very important and the main preferred format is TeX, as described in this FAQ.

    This is a big improvement over the previous system, where you would send off printed copies of your work to bigshots and people you thought might be interested, and which prevented wider distribution of preprints and results until your article was accepted and published by a journal, which with refereeing and printing backlogs, averages more than a year for most research journals in mathematics.

    From the arXiv front FAQ, addressing the concerns in the article:

    2.2 Why can't I just give a URL?
    If derivative formats of an article are less useful than the TeX source, a URL is the least useful of all. A list of URLs is like a phone book: easy to compile, temporarily convenient, and soon unreliable. The purpose of the arXiv is to record and distribute the research literature, not merely to announce its location. (On the other hand, you are free to include extra URLs along with the genuine article.)
    --
    It's psychosomatic. You need a lobotomy. I'll get a saw.
  133. I have been examining this phenomenon... by Lodragandraoidh · · Score: 1

    From a practical perspective, I have noticed the same phenomenon over the years, and have created the following axioms for my own sanity:

    1. You can not depend on any outside entity to archive information that is important to you.

    If there is some critical piece of information that you need to do your job, or as a reference to related work - by all means download and keep an archival copy for your own use. While the Internet Archive is an excellent resource - there is no way they will be able to keep track of everything on the net for all time. The drawback of this is that if you do not periodically look at the original web page you will not be using the latest information (I will address this issue in a moment).

    2. Look for means of extending the ability to locate information beyond the URL.

    While URLs are a great boon to keeping unique locations on the web, they do not encapsulate enough information (meta information) to make searching and locating information easy. The problem is not just related to the internet - it also encompasses other storage mediums (i.e. files outside of the exposed WWW partition). There are some recent tools that are at a test bed level now that can be used to solve this problem if brought into mainstream use, as we will see below.

    I see several technologies need to be developed/perfected to help ameliorate these issues:

    a) Software needs to be developed for end users to manage their own information resources - similar to how the Internet Archive keeps track of changes to web pages. The software should allow the user to archive pages to the local drive as desired, and provide a version control system for easily retrieving previous versions as needed; the system should also provide:

    b) An easy means of keeping meta information and annotations regarding a particular web document needs to be made a standard part of all web browsers. A good starting point is the W3C Annotea standard for keeping meta data - as implemented in the Amaya editor/browser.
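
    To make (a) and (b) concrete, here is a rough in-memory sketch of what such an end-user tool might keep: a version history per URL that only grows when the page actually changes, plus free-form annotations. The class and method names are hypothetical, and the notes are plain strings. This does not implement the Annotea format.

```python
import hashlib


class LocalArchive:
    """Hypothetical personal archive: versioned page copies plus annotations."""

    def __init__(self):
        self._versions = {}  # url -> list of (timestamp, digest, body)
        self._notes = {}     # url -> list of note strings

    def snapshot(self, url, body, when):
        """Store a new version only if the content differs from the last one."""
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
        history = self._versions.setdefault(url, [])
        if history and history[-1][1] == digest:
            return False  # unchanged since the last snapshot
        history.append((when, digest, body))
        return True

    def annotate(self, url, note):
        """Attach a free-form note (meta information) to a URL."""
        self._notes.setdefault(url, []).append(note)

    def history(self, url):
        """All stored versions of a URL, oldest first."""
        return [(when, body) for when, _, body in self._versions.get(url, [])]

    def notes(self, url):
        return list(self._notes.get(url, []))
```

    Re-running snapshot() on a schedule would also cover the drawback mentioned under axiom 1, since a changed original shows up as a new version.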

    I think a good set of the pieces are already in place to accomplish what I suggest - the real issue now is integrating them into current end user tools.

    The next, and perhaps biggest, question that needs to be resolved is how does DRM fit into this picture (if at all), and how much will DRM serve to further erode the cultural continuity archivists desire?

    --

    Lodragan Draoidh
    The more you explain it, the more I don't understand it. - Mark Twain
    1. Re:I have been examining this phenomenon... by Un+pobre+guey · · Score: 1
      sig: Life will always be about struggling for what is right against greed and stupidity.

      Sisyphus meets Don Quixote, with Sancho Panza standing to one side, laughing heartily.

    2. Re:I have been examining this phenomenon... by Lodragandraoidh · · Score: 1

      Life is about that. However, our performance is rarely as consistent as our best intentions.

      Conversely, the same thing can be expressed as:

      Life will always be about furthering our greedy desires, despite our stupidity, at the expense of truth.

      It all depends on where you mostly fall in the desire/righteousness continuum.

      --

      Lodragan Draoidh
      The more you explain it, the more I don't understand it. - Mark Twain
    3. Re:I have been examining this phenomenon... by Un+pobre+guey · · Score: 1

      Don't get me wrong, I share the ideals expressed in your sig. It is disheartening, though, to see those very ideals crushed every day on such a vast scale.

    4. Re:I have been examining this phenomenon... by Lodragandraoidh · · Score: 1

      The key to not becoming disheartened is to pick your battles carefully. That way you aren't always getting squashed. Know when it is most useful to expose your hand, and when it is better to work quietly behind the scenes.

      Overall, it is much better to gather 10,000 allies quietly over time, than to run out into sunlight alone and get squashed right away - unless you are into being a martyr. Patience and sacrifice = success. Sacrifice alone = death.

      I do not advocate blind sacrifice. I do advocate struggling for what is right - but smartly, with your eyes open.

      --

      Lodragan Draoidh
      The more you explain it, the more I don't understand it. - Mark Twain
    5. Re:I have been examining this phenomenon... by Morosoph · · Score: 1

      I am slightly troubled by one aspect of your .sig though, and that is the prevalence both of harmless forms of greed, and of mechanisms by which greed can be harnessed (vis-a-vis trade). There is a risk that in seeking to neuter greed, life is harmed: that is, excessive righteousness can be destructive.

      But then, maybe I just need to read the rest of your .sig; stupidity is indeed a great force for evil, especially when it combines with greed (notably for power). I just think that the way it is written is an attack on the right, whereas such idiocy is near-universal!

    6. Re:I have been examining this phenomenon... by Lodragandraoidh · · Score: 1

      Trade, and I think that is what you mean by 'harmless greed', is not Greed. Balanced trade is trade that benefits not only the seller and the buyer, but also the workers, the environment, and society at large.

      Greed, on the other hand, leads to destruction and destitution; there is no give and take - only take. Decisions based solely on greed are not wise decisions - only mercenary decisions bereft of any considerations of the moral, social or environmental wisdom.

      When people with billions of dollars make greedy decisions at the expense of everything decent and equitable, then real suffering occurs on a large scale - both now, and in the future. When people have this kind of power, they need to treat it with a commensurate level of care. Unfortunately, people are not perfect, and it seems like the people who end up on boards of directors are the lowest sort when it comes to character (perhaps this is because most of them always got what they wanted - or never tasted the bad effects of their poor decisions) and wisdom in the exercise of this power.

      That is my take on it. Your mileage may vary.

      --

      Lodragan Draoidh
      The more you explain it, the more I don't understand it. - Mark Twain
    7. Re:I have been examining this phenomenon... by Morosoph · · Score: 1

      It's not just trade that is "harmless greed"; seeking gain can lead to beneficial trades. There is no essential link between seeking gain and harming others, particularly if one wishes to sustain such gain over the long term (read Axelrod's "The Evolution of Cooperation" or Matt Ridley's "The Origins of Virtue"), but it is true that if one doesn't consider others or (say) the environment that one is more likely to do harm.

      Seeking gain (Greed) in itself does not necessarily lead to destruction, as in fact the greediest acts tend to be cooperative in nature in order to be able to secure future gains. This doesn't mean that one cannot do better, but it does mean that one has to be very careful when attempting to regulate greed. Crime certainly should be regulated, and harm to others (as opposed to desire for gain in itself) can be legislated against or taxed, but whereas harm to others tends to involve greed as a major component, the reverse implication does not hold.

      Even a purely greedy decision does not necessarily do harm. It is, of course much more likely to, but there are also likely to be corresponding gains. It is therefore much more important to focus upon harm done than the motive that is supposedly behind such harm, and to legislate from a perspective of maximising well-being, rather than dealing with people having the wrong motive. For analogous reasons, I prefer to focus upon eradicating poverty rather than equality per se.

      Stupidity, I do agree with, though, whether it comes in the form of blindness as to the long term, or else reactive legislation to deal with "evil" rather than actual harm.

      If you feel like checking my journal, you will see that I am not coming from a rightist perspective. I do think that there are great evils in this world, but I am sceptical of simple solutions, although I do not deny that they might exist.

  134. That's why I hate the term "karma whoring" by angle_slam · · Score: 1
    People on Slashdot sometimes seem reluctant to cut and paste an entire article. Problem is, if you are looking through a Slashdot thread that started last year, the article being referenced may no longer be online.

    I think every article being referred to by Slashdot should be cut and pasted here just for ease of use of historical posts. Otherwise, what is the purpose of keeping historical posts?

  135. This is why we need to support Archive.org by ihummel · · Score: 2, Informative

    archive.org provides an essential service to counteract the short lifespan of the typical webpage. It also allows for permanent links to webpages that might be gone soon. I personally think that academia should either pour money into archive.org or create their own specialized archive for academic websites.

    In the latter case, the service would archive sites of scholarly interest on its own and it would have a feature that would allow someone writing an academic paper to request that a particular page be archived. The page that he references in his work would be a http://academicarchive.org page, not the original.

  136. Unique ID? by lowe0 · · Score: 1

    How about some sort of long, unique global identifier for each page? That way, as a page is updated, moved, etc. a search engine could follow it.

    Wouldn't be too hard to slip into the meta tags, and it would allow pages to be followed from host to host, with the latest changes intact.

    A very long hash of the initial contents oughta do it, though then you run the risk of people updating the hash with each version, thereby creating "new" documents.
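
    A small sketch of the idea, assuming SHA-256 as the "very long hash" and a made-up meta tag name ("page-id" is not a standard). The identifier is minted once from the first version and then carried forward unchanged, which is exactly the discipline that rehashing on every edit would break.

```python
import hashlib


def mint_page_id(initial_content):
    """Mint the identifier once, from the page's first published content."""
    return hashlib.sha256(initial_content.encode("utf-8")).hexdigest()


def meta_tag(page_id):
    """Render the identifier as a meta tag; the name 'page-id' is hypothetical."""
    return f'<meta name="page-id" content="{page_id}">'


first_draft = "<html><body>Version 1</body></html>"
page_id = mint_page_id(first_draft)

# Later versions keep the original identifier instead of rehashing.
later_draft = "<html><body>Version 2, moved to a new host</body></html>"
tag_for_later_draft = meta_tag(page_id)
```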

  137. Maybe it's a good thing. by AnotherBlackHat · · Score: 1

    There is only so much cruft that can be dealt with.
    The ephemeral nature of the web improves its signal to noise ratio immensely.
    (Not that it's good mind you, just better than it would be otherwise.)

    Research papers that quote web pages may not be very good papers,
    but that doesn't mean that the right answer is a more permanent form of web page.

    It would be bad to write it on tissue paper,
    but that doesn't mean we should get rid of tissues.

    If a paper needs to be less transient than the web page it's citing,
    then the paper's author should contact the web page's author and arrange for a copy.
    If anyone wanted to cite something I wrote,
    I wouldn't mind if they included a copy, and not just a link.
    I doubt I'm the only one who would be willing to do that.

    -- this is not a .sig

  138. Here's what Tim B-L has to say: by Ed+Avis · · Score: 2, Insightful

    Cool URIs don't change

    A bit over-idealistic, but worth aiming towards even if you don't achieve 100% non-URI-breakage in practice.

    I feel that search engines should slightly penalize sites that have a history of breaking links or making them redirect to a completely irrelevant page: partly because there is just less chance that the link you follow from the search engine will have the content you want, and partly because even if you do get to a correct page, its usefulness as a bookmark or a link from your own documents is reduced.

    --
    -- Ed Avis ed@membled.com
  139. Re:usenet posting by call+-151 · · Score: 1

    Actually, a great deal of the early Usenet postings were from academic institutions, and in those days, people used their own names for the most part since it seemed reasonable and more dignified (and it didn't occur to most people to have anything else.) So a post from 1988 from "saracoombs@physics.cornell.edu" is easy to match up with a currently existing person who was doing graduate work at Cornell then named Sara Coombs, to make up an example. Hopefully she didn't get in a flame war about cold fusion or somesuch in those days, perhaps now jeopardizing her chances for a good job or heaven forbid, elected office!

    --
    It's psychosomatic. You need a lobotomy. I'll get a saw.
  140. you can erase usenet postings by Anonymous Coward · · Score: 1, Interesting
    Has anyone ever been fired or denied employment due to the discovery of an ancient usenet post?

    Yes. I personally know of one very senior researcher confronted by a review board with his posts about good places for gay cruising!

    Unless I remove them, I will soon get to deal with the much more fun aspect of, "Dad, what's an acid trip and where did you go when you took them?" from my daughter.

    Yes, you can remove usenet posts from Google Groups.

  141. Weak links? by Hoi+Polloi · · Score: 1

    I always thought the authors of most web pages were the weak links. Ba-dum-dum!

    --
    It is by the juice of the coffee bean that thoughts acquire speed, the teeth acquire stains. The stains become a warning
  142. Re:usenet posting by peter303 · · Score: 1

    Agreed. Most institutional accounts required your real name and the "rn" package put it into the usenet.

    To be fair, Google, the biggest usenet archiver, has a procedure for hiding ancient posts. I think it is tedious, especially if you have hundreds or thousands.

  143. Re:Worst Record Keeping: "pseudo-science" by Anonymous Coward · · Score: 0

    while i agree that there is a lot of "crap", out there, any idiot who doesn't check/verify their sources deserves the derision which they will receive when (eventually) someone else tries to check/verify their publication[s].

    what ever happened to good, old-fashioned, primary research (i.e. check the facts for yourself)?

    (btw - I run into a lot of "pseudo-science... crap" in "respectable" publications (ex: 27 empirically derived coef.'s to fit a 4th order relation, &c.) -- the 'net doesn't have a corner on this stuff)

  144. The library at Alexandria? by djeaux · · Score: 1
    Give me a break!

    The ephemerality of web pages creates a situation that is more akin to a house fire that burns up Great-Aunt Gertrude's bundles of love letters from her sweetie in World War I.

    A historian might use that bundle of letters to shed light on some historical question. But you can rest assured that same historian will exhaust every possible traditionally published resource in the process.

    If the "knowledge" exists only at a website & the website disappears, thereby destroying a researcher's work, I'd offer that the researcher & not the archival medium is to blame.

    --
    "Obviously, I'm not an IBM computer any more than I'm an ashtray" (Bob Dylan)
  145. Very annoying, actually by NecroBones · · Score: 1

    I was just thinking about this the other day. As I've gone over my own website looking at old links, I've come to the realization that most of the sites on the 'net that I'd link to often are gone after a year. If I'm writing articles that cite references or locations for additional information, or just including courtesy-links for further reading, the links frequently go dead long before I ever notice.

    The obvious solution is to write web content to be completely self-contained, just to save yourself all of the maintenance woes down the line. This, of course, is highly unfortunate, since the "hypertext" nature of the web is one of its greatest strengths. The lack of any longevity in its content is perhaps its greatest weakness.

    --
    I have not lost my mind... it's backed up on disk somewhere!
  146. Apache :: Mod_CVS anyone? by PetoskeyGuy · · Score: 1

    I understand that this article is dealing with scientific papers, but this is just one of many complaints people have with the web, especially those who think of it as the whole internet.

    It started out as a quick and simple stateless file server protocol, and it's good at that. The limitations have caused things like SSL, CGI, cookies to be created to fill in the gaps. Now people want to add versioning too. Sure, why not. Add some new headers in there and have a CVS like service that can backup old data and present the web page at any given point in time.

    We would probably still have link rot because of people deleting their archives to save space, but it would be a step in the right direction.

    The more recent database driven websites and blogs make the problem even more complicated. The server would have to detect when the output for the exact same URL changed.

    Imagine the possibilities though. Add the date to a URL and 10 years from now people could see what the internet used to be like. Thankfully not back to when you could create a single page web site with a gray background and it could be a "Cool Site of the Day". :o)
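
    A toy sketch of the lookup side, assuming the server keeps dated copies of each page. The class is hypothetical, no such Apache module or HTTP header is implied, and how the date would be encoded in the URL is left open.

```python
import bisect
from datetime import date


class VersionedPage:
    """Hypothetical store of dated copies of one URL."""

    def __init__(self):
        self._dates = []   # kept sorted
        self._bodies = []  # parallel to _dates

    def save(self, when, body):
        """Record the page body as it was published on a given date."""
        i = bisect.bisect_right(self._dates, when)
        self._dates.insert(i, when)
        self._bodies.insert(i, body)

    def as_of(self, when):
        """Return the most recent body published on or before the date, or None."""
        i = bisect.bisect_right(self._dates, when)
        if i == 0:
            return None  # the page did not exist yet
        return self._bodies[i - 1]


page = VersionedPage()
page.save(date(1997, 1, 1), "gray background, Cool Site of the Day")
page.save(date(2003, 11, 24), "database driven blog")
old_look = page.as_of(date(2000, 6, 1))
```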

  147. Re:RTFA... it's about references in scientific pap by Rimbo · · Score: 1
    "Frotz, Q. X and Rumble, M (1991): Dilatory freeble-tropism in the edible polka-dotted starfish, Asterias gigantiferus (L) (Echinodermata, Asteroidea), when treated with radioactive magnesium pemoline. J. f. Krankschaft und Gierschift, 221(6):340-347."


    Not only is the above comment informative and insightful, but this bit is brilliantly funny. I would have run out of clever ideas at about the word "starfish."
  148. Re:Others will cite this & the post as proof.. by Anonymous Coward · · Score: 0

    That's because anti abortion is synonymous with pro life but the term pro choice is not synonymous with pro abortion.

  149. Economics = Web Is Cheaper Than Print by Mybrid · · Score: 1

    Hi!
    Happy Monday! I just wanted to point out that printing is vastly more expensive than publishing on the web. This means that research that was previously unavailable because of the printing expense will now be available. This is a good thing I think. However, there will exist the problem of peer review or authentication; both will be onerous as the sheer volume of publication will defy peer review and authentication capabilities. It is definitely worth taking this problem seriously and finding solutions in my opinion.

  150. A need for anchors in the past by 0-9a-f · · Score: 1

    Arguably, the problem runs much deeper, in that our entire basis for civilisation is geared toward looking forward, rather than back.

    In our culture, we tend to look to the past only for affirmation that we are doing the right thing. So long as we're comforted on that front, our eyes remain firmly in the future. Censorship concerns aside, the Web is the perfect medium for this culture, as the content can vary to suit the times.

    Peer review directs the attention of researchers to the present, and forces them to verify their assumptions by looking to the past. These must be rock-solid anchors to the past, otherwise their research is nothing but a house of cards.

    This goes against the basic premise that propels commercial society forward - namely, that there is nothing to be learned from the past. The Web is ideal for commerce and pop culture, but simply the wrong medium for placing anchors in the past.

    Increasingly, we look like Coyote from Road Runner cartoons - he runs off a cliff, then stops and looks down to oblivion shortly before gravity kicks in.

    --
    With each breath in, a flower somewhere opens; with each breath out, a flower withers away. In between lies beauty.
  151. SYMMANTEC WEB: rdf rdf rdf rdf by Anonymous Coward · · Score: 0

    I love the conspicuous absence of the terms "Symmanted Web" and RDF from this entire discussion.

  152. a solution by Anonymous Coward · · Score: 1, Interesting
  153. Re:RTFA... it's about references in scientific pap by drinkypoo · · Score: 1

    Krankschaft und Gierschift is certainly worth citing from any time. Frankly, who gives a shit if someone makes a reference to something on the internet, and it goes away? I'm thinking that the first thing someone does while reading an article, when they don't recognize a name, is jump to the references right away to see where they were published, and if there's a bunch of "internet" links, they're going to continue skeptically. Meanwhile, there are still journals, and people are still publishing in them. Stuff that wouldn't be written up anywhere is all over the internet, and when it becomes actually worth releasing, it can be published in a journal somewhere, and then you can do an intercollegiate information request, and feel smug.

    --
    "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
  154. Re:About the DRM by anubi · · Score: 1
    The last point you bring up about the DRM is the one that concerns me the most.

    What concerns me most is concepts like protecting copyrights/patents of protocols or formats so that any work created using these protocols is rendered inaccessible if the holder of the copyright of the file format used deems so. The part I fume over a lot is not necessarily that one has the rights to enforce protection of something like a protocol, but that businessmen seem to completely lack the foresight into knowing not to stick their heads into nooses that others control.

    For as long as I have been in this business, I have insisted on creating any content I produce in open formats, so that it can be imported into any subsequent editors for viewing/manipulation. It bugs the shit out of me when some manager type wants the whole shebang in "word format", as now I know, not only will this whole thing most likely be readable on only one kind of system, it's now very likely to have compatibility issues over versioning of the OS, file manipulator, and any "enforced obsolescence" the vendor may use his authority to require. I can still read files today that I originally coded for my IMSAI and Commodore-64. It was all ASCII. If I want to go back to that old program I coded for my Commodore-64 to see the equations for how I modeled my varactors for my phase-locked-loops, I am quite free to do so ( they had quite complicated equations to do it right... including calculus derivatives. ).

    If I had locked myself into a proprietary format, I would have been in the same boat as one of my previous employers who had us for years putting drawings into a proprietary-type filebase, only later to have that old filebase drift into unsupported oblivion. I learned a lot from that.

    The whole affair, in my mind, was millions of dollars worth of wasted effort - when one considers not only the salary of all those people we had at the drafting tables entering the data into the system, but on top of that, the loss of the benefit we were supposed to have by doing all this work. It sure taught me that having a file format open and supported by many vendors is critical to long-term usability of anything. I find a sue-happy company out there launching lawsuits against anyone "infringing" on their little proprietary protocols, and I will show you a company I won't touch with a ten foot pole. I haven't the foggiest idea why some businessmen think they can involve themselves with such a company and not get burned.

    --
    "Prove all things; hold fast that which is good." [KJV: I Thessalonians 5:21]

  155. Why archive "everything"... by mantera · · Score: 1



    All that is required is to archive webpages that are cited in academic or scholarly research. Maybe all that is required is the creation of an academic resource designed in a way to overcome copyright issues and allow an academic or scholar who cites a webpage to submit an online request for its archival at the time of submitting his paper for publication. And then for readers who want to consult a webpage that is cited in an academic or scholarly publication to head to that site where they can either search for the webpage, or the author and/or title of the paper that cited it. Call it citedpages.com, the domain is even available.

    Basically, citation is the requirement for archival. A logo can be awarded to pages that are cited to reward their creators for allowing their pages to be copied and archived, which they may proudly display on their page as a portrayal of credibility of their content, and evidence of scholarly or academic citation in a publication can be used to guard against fraudulent submission.

    Come to think of it, this sounds like a potential business opportunity. Revenue may come from advertising to a vertical market audience. All is well until some university or governmental body creates a competing site.

  156. Re:About the DRM by Lodragandraoidh · · Score: 1

    I think you hit the nail on the head.

    The light at the end of the tunnel for this is XML. It is basically a plain-text (ASCII or UTF-8, take your pick) standard, which means it shouldn't have the compatibility issues you described.

    However (and this is a big however), Microsoft is basing its next-gen file standard on XML but, of course, with proprietary extensions. My fervent hope is that the XML standard, which is designed to be extensible, is bulletproof enough to withstand Microsoft's 'embrace and extend' IT control paradigm.

    --

    Lodragan Draoidh
    The more you explain it, the more I don't understand it. - Mark Twain
  157. Ethical Justification for Article Duplication? by crashnbur · · Score: 1
    There is no reasonable justification for plagiarism, but perhaps this problem could be fixed by inserting some legal mechanism that allows linked web pages to be copied and duplicated on the web site that cites them. This would solve the problem of the disappearing sources, and as long as the link to the original source remains intact (and as long as that source link remains active), there should be no problems between the two parties.

    Of course, the duplication of information for the sake of human knowledge is too practical. Forces like the RIAA and MPAA would rather fight it to the detriment of all.

  158. no way to run a culture by Un+pobre+guey · · Score: 1
    The average lifespan of a Web page today is 100 days. This is no way to run a culture.

    [Two distinguished elderly gentlemen, sitting at a table in a large, lavish library. Several books are scattered on the table, and each man is poring over a different densely-written tome.]
    Wilkins: I say, old chap, whatever does the word "fotzenmoldarischkeit" mean?

    Billingsley: "Fotzenmoldarischkeit?" Good heavens, what a strange word! I must say, I haven't the foggiest notion. Have you looked in the google.com section, near the catalog?

    Wilkins: Yes, I did, and it suggested several articles. Unfortunately, all of the shelves were gone! Ripped quite out of the floor it seems. Not a trace left.

    Billingsley: Oh my, how unfortunate. Are you quite sure you looked at the right shelf?

    Wilkins: [somewhat irritably] I can assure you, Old Boy, that I am quite capable of searching for a book. Here, I believe I still have the reference...

    [Wilkins hands Billingsley a scrap of paper with the call number scrawled on it.]
    Billingsley: [squinting through his reading spectacles] Hmm, yes... I see. "h-t-t-p colon slash slash w-w-w dot blackwell dash synergy dot com slash links slash d-o-i slash ten dot eleven eleven slash fourteen sixty-seven slash ninety-two slash..." What's this? Is this "'a' eight s", "'a' eighty two," or "a-b-s"? Here, here, I do believe you may have looked on the wrong shelf. You do know they are not in linear order, don't you?

    [Wilkins is making a dour scowl]
    Billingsley: The librarians must eliminate shelf fragmentation, Old Boy! "'a' eighty-two" may well be on an entirely different floor from "a-b-s"!

    [Wilkins snatches the scrap of paper from Billingsley's hand]
    Wilkins: Confound it! Why don't they put things together by Topic! Why can't one simply browse through the stacks and find what one needs!

    [Wilkins continues to grumble to himself as he ambles off to search for the article]
  159. Ah! yes, the Legendary Library of Alexandria by rssrss · · Score: 1

    The burning of the library of Alexandria is one of the master meme plagues of western civilization. IIRC, Carl Sagan waxed most eloquent about that supposed disaster. Edward Gibbon, to my mind the greatest historian and prose stylist the Anglosphere has yet produced, recounts the story in Chapter LI: Conquests By The Arabs -- Part VII of his History of the Decline and Fall of the Roman Empire:
    I should deceive the expectation of the reader, if I passed in silence the fate of the Alexandrian library, as it is described by the learned Abulpharagius.

    The spirit of Amrou was more curious and liberal than that of his brethren, and in his leisure hours, the Arabian chief was pleased with the conversation of John, the last disciple of Ammonius, and who derived the surname of Philoponus from his laborious studies of grammar and philosophy. Emboldened by this familiar intercourse, Philoponus presumed to solicit a gift, inestimable in his opinion, contemptible in that of the Barbarians -- the royal library, which alone, among the spoils of Alexandria, had not been appropriated by the visit and the seal of the conqueror. Amrou was inclined to gratify the wish of the grammarian, but his rigid integrity refused to alienate the minutest object without the consent of the caliph; and the well-known answer of Omar was inspired by the ignorance of a fanatic. "If these writings of the Greeks agree with the book of God, they are useless, and need not be preserved: if they disagree, they are pernicious, and ought to be destroyed." The sentence was executed with blind obedience: the volumes of paper or parchment were distributed to the four thousand baths of the city; and such was their incredible multitude, that six months were barely sufficient for the consumption of this precious fuel.

    Since the "Dynasties" of Abulpharagius have been given to the world in a Latin version, the tale has been repeatedly transcribed; and every scholar, with pious indignation, has deplored the irreparable shipwreck of the learning, the arts, and the genius, of antiquity.

    For my own part, I am strongly tempted to deny both the fact and the consequences. The fact is indeed marvelous. "Read and wonder!" says the historian himself: and the solitary report of a stranger who wrote at the end of six hundred years on the confines of Media, is overbalanced by the silence of two annalists of a more early date, both Christians, both natives of Egypt, and the most ancient of whom, the patriarch Eutychius, has amply described the conquest of Alexandria. The rigid sentence of Omar is repugnant to the sound and orthodox precept of the Mohammedan casuists: they expressly declare, that the religious books of the Jews and Christians, which are acquired by the right of war, should never be committed to the flames; and that the works of profane science, historians or poets, physicians or philosophers, may be lawfully applied to the use of the faithful. A more destructive zeal may perhaps be attributed to the first successors of Mohammed; yet in this instance, the conflagration would have speedily expired in the deficiency of materials.

    I should not recapitulate the disasters of the Alexandrian library, the involuntary flame that was kindled by Caesar in his own defense, or the mischievous bigotry of the Christians, who studied to destroy the monuments of idolatry. But if we gradually descend from the age of the Antonines to that of Theodosius, we shall learn from a chain of contemporary witnesses, that the royal palace and the temple of Serapis no longer contained the four, or the seven, hundred thousand volumes, which had been assembled by the curiosity and magnificence of the Ptolemies. Perhaps the church and seat of the patriarchs might be enriched with a repository of books; but if the ponderous mass of Arian and Monophysite controversy were indeed consumed in the public baths, a philosopher may allow, with a smile, that it was ultimately devoted to the benefit of mankind.

    Now go back and read that quote again. That's right. Edward Gibbon says it did not happen.

    --
    In the land of the blind, the one-eyed man is king.
  160. Re:RTFA... it's about references in scientific pap by bcrowell · · Score: 1
    the papers are published online, the references are URLs, and that an awful lot of them are stale.
    Most people first post the paper to a permanent archive (example), then publish the paper in a traditional-style paper journal (which may also put it online in a proprietary system). If the paper wasn't important enough to post to a permanent archive, and wasn't important enough to pass peer review at a journal, then it probably, er, wasn't very important, and doesn't need to be preserved.

    IMHO the problem is the reverse: too much stuff gets enshrined in journals. The vast majority of articles published in journals never get referenced in any later paper. It's not that it's all wrong, it's that it's utterly insignificant. In the time I was doing physics research, I really only published two papers that I thought were fairly important (most of the rest were basically things where I made some token contribution, and got my name on it). Of those two, I checked recently, and only one seems to have been referenced by a later paper.

  161. The survival of backups by jtheory · · Score: 3, Insightful

    Backups will contain the drafts in the future. Some of them will surely survive.

    There's a weird kind of paradox involved in what will survive, though.

    Digital media has that wonderful property that it can be reproduced *perfectly* -- such that the copy is indistinguishable from the original -- but it must be copied or it will die.
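
    For what it's worth, "reproduced perfectly" is something you can actually check each time you recopy. A minimal sketch, assuming ordinary files on disk; the function names are my own, and SHA-256 is just one reasonable choice of checksum:

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def recopy(source: Path, destination: Path) -> str:
    """Copy source to destination and confirm the copy is bit-for-bit identical."""
    expected = sha256_of(source)
    shutil.copyfile(source, destination)
    if sha256_of(destination) != expected:
        raise IOError(f"copy of {source} does not match the original")
    # Keep the returned digest with the archive so the next recopy can be checked too.
    return expected
```

    The checksum only proves the new copy matches the old one; it does nothing for a disc nobody remembered to recopy.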

    You can burn your vacation videos to CD so your grandkids will be able to see them -- but that CD won't be readable anymore in a decade, never mind a century. If you faithfully make sure they're recopied every once in a while, though (and possibly converted to whatever new video formats are invented), your descendants 500 years hence will be able to see you waving from behind that sandcastle in California, as if it were filmed yesterday. No more flipping through yellowed photographs or crumbling newspaper clippings.... Imagine it! A scientist may use your video to prove his point about how the sunsets on the west coast have improved since California sank into the ocean.

    He has to use family videos, though, because two decades of scientifically-recorded data on weather patterns was all wiped out when a massive electromagnetic bomb was set off by terrorists in 2012.

    Yeah, far-fetched example. I don't want to force the point, and definitely lots of stuff will survive... but our progeny won't be making the same kinds of attic discoveries that we can today.

    "Hey, viddy all these ancient discs that Old Grampy Limp Devil had cached away up here! Can you run them? Nothing, huh? Oh, well."

    --
    There are only 10 types of people: those who understand decimal, those who don't, and, uh, 8 other types I forget.
  162. Re:Books have an ISBN..(but web pages are googled) by Gumshoe · · Score: 1
    That was why Tim Berners-Lee wanted URL to stand for ``Universal'' (not Uniform) Resource Locator.


    Tim Berners-Lee originally wanted it to be called the `Uniform Document Identifier'[1]. However the IETF thought it was `arrogant' to refer to it as `universal'; the web after all was insignificant at that time, June 1992.

    `Document' was also changed to `Resource' at the same meeting and `Identifier' was changed to `Locator'. The latter, the IETF felt, emphasised the fact that resources can be moved about. Berners-Lee on the other hand, realised that `identifier' emphasised that URIs should be persistent. IMO, it is this third part of the appellation rather than `Universal' that suggests that a URI could be like an ISBN.
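
    The difference between the two words can be shown with a toy resolver: the identifier is what gets cited and never changes, while the locator behind it is free to move. This is only an illustration of the distinction; the class and the example identifier are made up, not anything from the book or the IETF.

```python
from typing import Dict


class Resolver:
    """Hypothetical registry mapping a persistent identifier to its current location."""

    def __init__(self) -> None:
        self._locations: Dict[str, str] = {}

    def register(self, identifier: str, url: str) -> None:
        self._locations[identifier] = url

    def move(self, identifier: str, new_url: str) -> None:
        # The resource moves; the identifier that people cited stays the same.
        if identifier not in self._locations:
            raise KeyError(identifier)
        self._locations[identifier] = new_url

    def resolve(self, identifier: str) -> str:
        return self._locations[identifier]
```

    A citation that stores the identifier survives a move; a citation that stores the URL does not.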

    [1] Berners-Lee, "Weaving The Web", p66-67, Orion Business, 1999.
  163. Re:Books have an ISBN..(but web pages are googled) by Gumshoe · · Score: 1

    Of Course

    "Tim Berners-Lee originally wanted it to be called the `Uniform Document Identifier'"

    should be

    "Tim Berners-Lee originally wanted it to be called the `Universal Document Identifier'"

  164. Digital doesn't mean you can't print them by Hittite+Creosote · · Score: 1
    What do you think the chances are of your family photos being found in the attic by your descendants in 30 years, and of them being able to read them, now we're all shooting digital?

    On the one hand, pretty good, considering that I use digital to shoot large numbers of images and then choose the best ones to print. On the other, pretty lousy, as I don't intend on being dead in 30 years, and I don't keep my photo albums in the attic.

    1. Re:Digital doesn't mean you can't print them by RMH101 · · Score: 1

      i know it's doable to archive them, i'm just pointing out that a lot of snaps that people didn't pay much attention to at the time but their kids might like to see probably won't be viewable. besides, how long does your inkjet print out last?

    2. Re:Digital doesn't mean you can't print them by Hittite+Creosote · · Score: 1

      The prints I get from digital are on the same quality paper as you get from having traditional film negatives developed. Indeed, in the UK at least, you can take your data in to have them printed out by the same people you'd have develop your film negatives, such as photo shops like Jessops or chemists such as Boots, and given a decent enough digital camera, I'd challenge the average person to be able to tell whether the resulting print came from a digital or a film camera. So I expect them to last pretty much the same length of time. As for the ones I don't get developed - they are ones that I wouldn't have taken with a film camera. With digital, I may take two or three shots of the same thing from slightly different angles, or try to take pictures I wouldn't waste film trying to take otherwise.

  165. Re:Others will cite this & the post as proof.. by Anonymous Coward · · Score: 0

    And your response is synonymous with bullshit.

  166. Re:Books have an ISBN..(but web pages are googled) by WillAdams · · Score: 1

    If I hadn't posted, if I still had karma points, and if there were a mod for ``pedantic'', you'd've gotten my vote ;)

    Sorry for the error, and thanks for the (corrected) correction---it's been a long while since I read the book (need to remember to add a link to it from my web site).

    William

    --
    Sphinx of black quartz, judge my vow.
  167. Re:server gone indefinitely? by mausmalone · · Score: 1

    the pages didn't die... the faculty members did. The content reverts to being property of the estate and is removed from the web entirely. There is no new server location... the content simply ceases to be on the web.

    --
    -=-=-=-=-=
    I'd rather be flamed than ignored.
  168. Re:Books have an ISBN..(but web pages are googled) by cox075 · · Score: 1

    I agree with this comment except for the implication that a "formal hierarchy" would be preferable. Seems obvious, but a little more thought reveals the flaws. Who chooses the hierarchy? You may file something in one place while I might look for it in another. That's why a web is better than a tree. How many times have you lost something in your "hierarchical file system" and had to track it down with grep or find or "Search"? And how many times have you (wished you could spare the time to have) reorganised your stuff and put it in a new hierarchy as your interests and projects evolve? I believe that newer "file systems" are abandoning the hierarchical/tree structure altogether in favour of a "search" paradigm. Alta-Vista used to sell something like this.
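
    As a rough illustration of the "search" paradigm, here is a tiny tag index in which one item can be filed under as many labels as you like, so nobody has to agree on a single hierarchy. The class, file names and tags are invented for the example; this is not how any particular file system or the Alta-Vista product works.

```python
from collections import defaultdict
from typing import Dict, Set


class TagIndex:
    """Hypothetical flat index: items are found by tags, not by their place in a tree."""

    def __init__(self) -> None:
        self._by_tag: Dict[str, Set[str]] = defaultdict(set)

    def add(self, name: str, *tags: str) -> None:
        # One item may carry many tags, so you and I can each file it "our" way.
        for tag in tags:
            self._by_tag[tag.lower()].add(name)

    def search(self, *tags: str) -> Set[str]:
        # Return the items that carry every requested tag.
        if not tags:
            return set()
        matches = [self._by_tag.get(tag.lower(), set()) for tag in tags]
        return set.intersection(*matches)
```

    Reorganising then means adding or changing tags rather than moving everything into a new tree.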