Slashdot Mirror


On Preservation of Digital Information

Cacl, a PhD student at the University of Michigan in their School of Information Division, has written a feature addressing the concerns and problems of preserving digital information. This is an area of study of his, and it is interesting to read about.

Preservation of Digital Information

Recently there was an Ask Slashdot about the problem of preserving digital material. The basic idea was that we are creating a massive wealth of digital information, but have no clear plan for preserving it. What happens to all of those poems I write when I try to access them for my grandkids? What about the pictures of my kids I took with that digital camera? Can I still get to them in time to embarrass them in the future?

Obsolescence of digital media can happen in three different ways:

  • Media Decay: Even when magnetic media are kept in dry conditions, away from sunlight and pollution, and hardly ever accessed, they will still decay. Electrons will wander over the substrate of the media, causing digital information to become lost. CD-ROMs luckily do not have this same problem with electron loss. They still are sensitive to sunlight and pollution though. Many people mentioned last week that distributors of blank CD media often make claims of a hundred years or more for the duration of their products. Research seems to indicate the truth is closer to 25 years, which seems like a long time, until you consider the factors below. Besides, information professionals often think in terms of centuries rather than decades.
  • Hardware obsolescence: Far more dangerous than the degradation of the actual information container is the loss of machines that can read it. For instance, the Inter-University Consortium of Political and Social Research received a bunch of data on old punch cards. The problem was they had no punch card reader. It took a decent chunk of time, and a good deal of money to eventually be able to read the data off of these cards, even requiring some old technicians to come out of retirement to help tweak the system. Hardware extinction is hardly a foreign topic to Slashdotters. It happens, and as technology increases its pace of change, it will happen more quickly.
  • Software obsolescence: The real stone in the shoe of digital preservation is obsolescence of the software needed to open the digital document. This can include drivers, operating systems, or plain old application software. We all have piles of old software that were written for older systems, or have come across an old file at the bottom of a drawer and can't even remember what application it used.

There are several strategies for preserving digital information. People mentioned some last week:

  • Transmogrification: printing the digital document into an analog form and preserving the analog copy. An example would be printing out a Web page and archiving the print of that Web page. This, obviously, takes out the main strength of a Web document, hyperlinking, and may also ignore important color and graphical content. An alternative form of this is the creation of hardcopy binary that could later be re-entered into the computers of the future. The media suggested have ranged from acid free paper to stainless steel disks etched with the binary code. The two major problems with this idea are that any misrepresentation of the binary could have disastrous results for the renewal of the document, and transformation to hard copy limits the functionality of many types of digital documents to the point of uselessness.
  • Hardware museums: preserving the necessary technology needed to run the outdated software. There are several weaknesses to this plan. Even hardware that is carefully maintained breaks and becomes un-usable. In addition, there is no clear established agency that will be responsible for maintaining these machines. Spare parts eventually become impossible to find and legacy skills are required for maintenance. There must be technicians with the requisite skills to service these preserved machines. Finally, it does not create efficient use if all possible future users must bottleneck to just a handful of viewing sites to have access to the information.
  • Standards: reliance on industry-wide standardization of formats to prevent obsolescence. Marketplace pressures on software producers create an incentive for a company to differentiate its product from its competitors'. While universal standardization is unrealistic in a capitalistic marketplace, standards such as SGML have proven successful for large scale digital document repositories, like the Making of America archive hosted by the University of Michigan. However, many of these large repositories also receive information from donors that is not in a standardized format, and do not feel comfortable turning away those documents.
  • Refreshing: moving a digital object from one medium to another. For instance, transferring information on a floppy disk to a CD-ROM. This definitely seemed to be the preferred method of most Slashdotters. While this takes care of degradation and obsolescence of the media, it does not solve the problem of software obsolescence. A perfectly readable copy of a digital document is useless if there is no software program available to translate it into human-readable form.
  • Migration: moving the digital document into newer formats. An example might be taking a Word 95 document and saving it as a Word 97 document. Single generation leaps are usually not a problem, so large volumes of information could be saved. Unfortunately, migrations over several generations are often impossible, as is migrating from a document type that was abandoned, and did not evolve. Also, information loss is common in migration, and may cause the document to become unreadable. While this may be the best single method available, it is very labor intensive, and some knowledge of the nature of documents would be essential to determining which information containers to migrate. For instance, often you lose aspects of a document (good and bad) when you migrate it, but which of those aspects are important?
  • Emulation: creating a program that will fake the original behavior of the environment in which the digital object resided. This is another very intriguing method that could be used. It's actually already pretty common. For instance, most processor chips include emulators for lower level processors. There also already exists on the Internet a very active group of people who are interested in emulating old computer platforms. Still, we need to do a lot of research yet on the cost of this method, and what sorts of metadata are necessary to bundle with the digital object to facilitate its eventual emulation. Another problem is the intellectual property hassle caused by emulation. Reverse engineering is a big no no, and there is no point in making the lawyers rich. This area is actually where Open Source can be of biggest help to preserving the longevity of different kinds of applications.
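As an illustration of the refreshing strategy in the list above, here is a minimal Python sketch that copies a file to a new location (standing in for a new medium) and checks that the copy is bit-identical. The function names `sha256_of` and `refresh` are hypothetical, and the choice of SHA-256 as the integrity check is an assumption, not something the strategies above prescribe. Note that it only guards against media decay; it does nothing about software obsolescence.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def refresh(source: Path, destination: Path) -> str:
    """Copy source to destination and confirm the copy is bit-identical.

    Raises IOError if the digests differ. Returns the digest so it can
    be recorded and re-checked at the next refresh cycle.
    """
    before = sha256_of(source)
    destination.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(source, destination)
    after = sha256_of(destination)
    if before != after:
        raise IOError(f"copy of {source} does not match the original")
    return after
```

Keeping the returned digest alongside the copy would let a later refresh detect decay that happened in between, before the damage is propagated to the next medium.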

Many people in the discussion last week seemed to believe that simple refreshment or migration of the data would be a sufficient answer to the problem. At a personal level that may be true, but for anyone responsible for large amounts of digital information, neither is a completely convincing method. Here are a couple of reasons why:

  • Not all documents are the same- In the digital preservation literature, most people talk as if all digital information is in ASCII format. Au contraire. As computing becomes increasingly robust, so do the documents we create. Multimedia games, three dimensional engineering models, recorded speeches, linked spreadsheets, virtual museum exhibits and a host of other documents spurred by the development of the Web have cropped up. How are they going to be affected by migration to a new environment?
  • It's so darned expensive- It's a little gauche to talk about, but the Y2K bug caused what ended up being a huge migration of digital information. How much did the US alone spend on that fiasco? $8 billion? For smaller organizations that do not prepare for the preservation of their digital information, the cost of emergency migrations could cause all sorts of budget trouble.

There is some belief that there is no reason to preserve information at all. Most of what is created is just tripe anyway, and we should be more focused on creating content than preserving it. There are two reasons why some sort of preservation is important. First of all, it is inefficient to recreate information that already exists. Human energy is better spent on building upon existing knowledge to create new wisdom. How much do we already spin our wheels as several people collect the same data? What more could we be doing if we spent the energy instead on new pursuits? Secondly, there is some data that is irreplaceable.

Which is not to say that we should keep everything. In a traditional archive, only 1% of documents received are kept. Ninety nine out of one hundred documents are destroyed for various reasons. A similar ratio is not unreasonable for digital documents. Consider that 16 billion email messages are sent each day. It seems ridiculous to keep all of them, but how do we weed out the ones we do want to keep? Appraisal of digital documents for archival purposes is going to become a major issue in the not distant future. There are already examples of data that have been lost, or nearly lost. NASA lost a ton of data off of decayed tapes. The U.S. Census nearly lost the majority of the data from the 1960 census. These huge datasets are important for establishing a scientific record that reveals longitudinal effects.

Increasingly, the record of the human experience is kept in a digital format. The act of preserving that information is the act of creating the future's past, the literal reshaping of our world in the eyes of the future. Nobody knows the best answer yet. There is probably not a single answer that will fit absolutely all situations. Information professionals are just beginning to do research in the form of user testing, cost-benefit analysis and modeling to answer some of the thornier issues raised by the preservation of digital information. There are things out there worth saving; we just need to figure out the best way to do it.

Some links of interest in case you would like to read more:

  • a really good bibliography of related sources by Michael Day
  • an article by Jeffrey Rothenberg outlining some of the issues
  • a site at Leeds University with many related links

199 comments

  1. Ok ... by Bad+Mojo · · Score: 1

    Most content could be published in book format. I know books are so ... old-wave, but they work pretty well.


    Bad Mojo

    --
    Bad Mojo
    "If you can't win by reason, go for volume." -- Calvin
    1. Re:Ok ... by gfxguy · · Score: 1
      But you missed part of the argument:
      • It defeats the point of having a digital archive, which takes much less space than printed form, and is easier to use (searching, for example).

      • It doesn't work with other forms of media:
        • Music
        • Video
        • Any binary format: images (which could be printed, but that's even worse than printed text), DEM data, cat scans and MRI, etc.
      In fact, in my research into archiving (I work in the 3D department of a video production company), it turns out that the cheap CD's only last about 10 years. But we decided (which coincides with a lot of what this was saying) that any 10 year old data would be obsolete for our purposes, and CD's would be good enough.

      The project files in Alias|Wavefront, Maya, and Softimage formats, as well as other miscellaneous formats, are just not suited to printing. Even if they were, even if you printed out ASCII versions of Maya files, for example, imagine what you'd have to do to get it back in the computer to reuse the project!


      ----------

      --
      Stupid sexy Flanders.
    2. Re:Ok ... by grouchomarxist · · Score: 1

      Some of the problems with books/paper besides those mentioned by gfxguy: 1. Paper burns. A major part of the WW II personnel archives was lost due to a fire. 2. Other natural disasters. Floods, storms, etc. all affect books and paper. 3. Since books and papers take up space people are more likely to put them in storage. Often they are stored in sub-optimal conditions which can cause the data to be lost. Material stored in basements can grow mildewy, rot, be eaten by bugs or get water damaged. Of course, computers can also get damaged, but once something is in digital format it can be easier to back up, duplicate and distribute.

  2. 99%?? by Sakhmet · · Score: 2
    I should think that a 99% destruction rate is awful! Kind of defeats the purpose of an "archive" doesn't it?

    With digital documents, there's no real reason not to save all of it, even if much of it is "tripe".

    Information is information, whether or not we find it useful. Some day, someone else might find our tripe is a goldmine of information, if only for anthropological study.

    Sakhmet.
    (The REAL McCoy)


    "The surest way to corrupt a youth is to instruct him to hold in higher esteem those who think alike than those who think differently."

    --
    Ban the Nukes! Save the Whales! Screw it. Nuke the Whales!
    1. Re:99%?? by Zurk · · Score: 1

      unfortunately it's impossible and worthless to preserve everything. data grows exponentially...i used to backup to paper and audio tape (300bps), moving slowly to floppy disk (360K) to higher density floppies (1.44MB) to hard disks (40MB seagates) to tape again (250MB HP colorado digital) to CDRs (650MB sony). im probably going to go DVD next. the point is that the number of audio tapes i used to have is now the same as the number of CDROMs i have ..even though the data storage capacity is exponentially higher, the data GROWS exponentially to fill available capacity. and i throw out most of the data i create...and it still GROWS.

    2. Re:99%?? by Matty+Boy · · Score: 1

      Some day, someone else might find our tripe is a goldmine of information, if only for anthropological study.

      But just think, in hundreds of years' time someone might come across Microsoft's marketing literature and think it's actually true!!

      DOWN WITH ARCHIVING!!!

    3. Re:99%?? by Anonymous Coward · · Score: 0
      It does kind of wrench my gut to hear that 99% figure, too. But "Information is information" just ain't true. Sometimes it's just data. (Like when the senate doesn't like the house's budget, and the government reprints a thousand page document with every word struck through to show what they didn't like.)

      Lord, grant me the serenity to reject the data I do not need
      The courage to use the information I can find
      And the wisdom to know the difference.

  3. those AOL CD's by BrentRJones · · Score: 4

    I'm keeping all those old AOL CD-ROMs. Some software archaeologist will need them to see what Internet pioneers struggled with.

    --
    Help end the use of Sigs. Tomorrow
    1. Re:those AOL CD's by MupwI · · Score: 2

      My worry is that at some distant point in the future, all our paper will have rotted away and people think that the CDs were our primary means of communication, like Babylonian pottery shards..

      "February 24, 6423:Archeologists have discovered evidence that ancient humans worshipped a God called 'McDonald,' whose temples were signified by golden arches..."

      --
      -- Bah weep grah nah weep nini bong
    2. Re:those AOL CD's by Shadowlion · · Score: 1

      I doubt it (although I did get a chuckle out of the McDonalds bit).

      Assuming the future archaeologists uncover/engineer a way of reading our digital formats (and that assumes, of course, our digital formats - like CDs - exist in any number in several thousand years), they'll easily uncover evidence of how we communicated. Think about it - how many references are there to "printing," paper, books, television, movies, etc., in common use today? In my email archives, I probably have hundreds of references to printing things out or watching TV.

      Further, there will be documents inevitably left around. Look at such things as the Dead Sea scrolls, which survived many thousands of years. If anything, they'll simply have a mistaken idea of what we committed to paper (since "important" historical documents like the Constitution were written, and everyday crap like the specs on my desk will no doubt be destroyed, they may simply consider that paper was reserved only for important things).

      Just an observation.

    3. Re:those AOL CD's by um...+Lucas · · Score: 3

      My worry is that at some distant point in the future, all our paper will have rotted away and people think that the CDs were our primary means of communication, like Babylonian pottery shards..

      Actually no. Paper has proven to be one of our most durable ways of storing data. Egyptian papyrus from 3000 years ago is still more or less intact. CD's on the other hand, will last for a 100 years in a "BEST CASE" scenario. Most will last much less time. CDRs might last 25 years. There are other variables besides media itself. I've seen several CD's from the early 80's that refuse to play these days. They're not at all scratched, but the theory goes that the original ink they used to print on them actually was a bit corrosive over a great span of time.

      In order to remain readable, digital data must remain more or less intact. A few missing bits in the application needed to open a file can pretty much reduce your odds of opening that file by 100%. Analog data, on the other hand, degrades much more gracefully... It may start to fade, but there's no intermediary between having the data and being able to read it (you don't need an extra "application" to read a newspaper).

      I've heard that this is actually going to be one of the least documented periods in human history, because all of our data is stored digitally and periodically purged. Even if it's not, places like NASA are generating data faster than they're able to back it up and move their old archives onto newer media.

    4. Re:those AOL CD's by karb · · Score: 3
      Egyptian papyrus from 3000 years ago is still more or less intact.

      I read an article once in that hotbed of liberal thinking, Reader's Digest, about book deterioration. Older books were printed with a different method, and will last a couple hundred years. Newer books will only last maybe 50 years.

      This begs the question : how long will computer printouts last?

      --

      Jack Valenti and the MPAA are to technology as the Boston strangler is to the woman home alone

    5. Re:those AOL CD's by um...+Lucas · · Score: 2

      If you want it to last, you can make it last. Acid free paper is available, and it does a great job at preserving documents. But today, so much of our information becomes out-dated, there seems to be little point in preserving what we know no longer applies. Old science books, etc...

      Laser prints, i'd guess, will be fairly durable... Inkjet prints, probably not... that's just a pure guess, though

    6. Re:those AOL CD's by MupwI · · Score: 1

      CD's on the other hand, will last for a 100 years in a "BEST CASE" scenario. Most will last much less time. CDRs might last 25 years.

      That was pretty much my point, but I probably should have been clearer...the data won't be readable by anyone, the archaeologist will just keep digging up millions of these shiny discs, like they do with pottery shards, and have to come up with a theory to explain them :)

      --
      -- Bah weep grah nah weep nini bong
    7. Re:those AOL CD's by RomulusNR · · Score: 1

      Older books were printed with a different method, and will last a couple hundred years. Newer books will only last maybe 50 years.

      Regardless, there are still numerous books that are in danger of being lost forever, so rare that museums that own them keep them in climate-controlled rooms where no human may tread, for fear of destroying the books before some sure-fire method is devised to recover them.

      These books are practically melting into dust at the edges.

      Not all civs had nice, strong papyrus, especially after Rome fell, and especially when we found we could cut overhead by using elm pulp.

      that hotbed of liberal thinking, Reader's Digest

      Just checking -- that was sarcasm, right?

      --
      Terrorists can attack freedom, but only Congress can destroy it.
    8. Re:those AOL CD's by karb · · Score: 1
      that hotbed of liberal thinking, Reader's Digest

      Just checking -- that was sarcasm, right?

      As a man far wiser than me once said, "You bet your asteroids, kid."

      --

      Jack Valenti and the MPAA are to technology as the Boston strangler is to the woman home alone

  4. Limited problem, if... by MPolo · · Score: 1

    I don't think there's too much trouble with losing games and other applications as the hardware that runs them obsolesces... New ones will be created, and the best of the old will be ported.

    As to the data already archived on various media, there could indeed be a problem if people fail to move the data to newer media... Think of your pile of 5 1/4" disks that's just rotting in the corner because your new computer only has a 3 1/2" drive -- and that's not even a huge leap in technology.

    There's also the question of formats, especially for users of M$. After two revisions of the software, it can't read any of the old data! Try reading a Word 6 document in Word 97 for laughs, especially if you use any special characters ü á € in your documents...

    1. Re:Limited problem, if... by ShawnMcCool · · Score: 1

      Tears come to my eyes when i think that my children(if i chose to allow them life in the first place) will not be able to play Doom.... *Sigh*

    2. Re:Limited problem, if... by bonzoesc · · Score: 1
      I don't think there's too much trouble with losing games and other applications as the hardware that runs them obsolesces... New ones will be created, and the best of the old will be ported.
      The best of the old DO NOT always get ported. If you ever played an old DOS game like mazemakr (generated radial mazes based on parameters including radials and concentrics), jumpjoe (help janitor joe defeat the evil robots), and Commander Keen (like the Super Mario Bros., and made by the same people behind Wolf3D - if you've never played this...) and have not seen them ported to linux or windows, you probably find fault with this argument. Now, Window$ 9x and Linux can run DOS programs, but NT can't, and linux can't very well. Also, new hardware, even ones that emulate older versions, like SB Live, doesn't always support or have the nuances that the programmers of the original supported.
      Fortunately, to combat the problem of old games becoming unusable, most people have short memories of games whenever new and better ones are released. Some people, who were the biggest fans of jumpjoe, now are completely enthralled with C&C:TS (which ranks somewhere between TA and StarCraft) and can't even remember Jumpjoe. Sucks to them, mostly for liking TS. However, this is the reason that nobody really cares that all of these games came on 5.25 inch disks that are now unreadable.
      You can compare this problem to that of old texts describing primitive society and their gradual decline in usability. You can recover them to a newer format, but even then, the majority of society couldn't possibly give a rat's ass.

      "Assume the worst about people, and you'll generally be correct"

  5. more info by meighan · · Score: 3
    If interested, there is a report from '96 which offers some more information on the subject. From the Task Force on Archiving of Digital Information, here

    --
    It is no measure of health to be well adjusted to a profoundly sick society.

    1. Re:more info by Gerald · · Score: 2
      There is also a page at the University of Missouri that talks about media lifespans:

      "Computer reel tapes, VCR tapes, and audio tapes last about as long as a Chevy or a poodle."

    2. Re:more info by Tuscahoma · · Score: 2

      There is an article in the Sept. 98 issue of Wired that addresses the same issues on a personal level; the writer looks for a medium to preserve a dying friend's voice and poetry.

  6. BBC article by Nafta · · Score: 3

    BBC currently has an article on the same subject. This is a great advantage of Open Source (preaching to the converted, I know) because that is the only open standard (and therefore durable) format. All other proprietary formats will come and go with the companies that make them.

    1. Re:BBC article by coaxial · · Score: 2
      This is a great advantage of Open Source (preaching to the converted, I know) because that is the only open standard (and therefore durable) format. All other proprietary formats will come and go with the companies that make them.

      What? I'm sorry but open source IS NOT a magic bullet. There are two problems with your statement; and ironically they work in opposite directions.

      1. Just because a new data format comes around, doesn't mean people will use it. Look at PNG usage versus GIF usage.
      2. Say your format gets adopted. Now say a new format comes along that can handle something ultra-cool that the previous data format simply can't do. Why would people want to stay with the obsolete standard?
    2. Re:BBC article by Nafta · · Score: 1
      open source IS NOT a magic bullet.

      True, but it is a good shot.

      The alternative is something stored in a proprietary format. That format would die with the company. I.e., the data becomes impossible to access (as opposed to just difficult).
  7. magnetic storage by Signal+11 · · Score: 4
    Unfortunately the solutions we've employed up until recently are fatally flawed - they all use magnetic storage. The problem is that the higher the density, the sooner "bit rot" occurs - those magnetized iron oxide particles work against each other to depolarize. After several years (or several dozen, depending on the media) the data's unsalvageable. That's problem #1.

    The solution would be to use an optical storage media, but as others have pointed out, CDR storage has a life expectancy of 75-100 years depending on the brand. Which wouldn't be too bad except you have to realize that in 100 years you need to start putting resources into copying all that data off and re-writing it again. After a while you'll have a snowball effect where you spend more time writing the old data than the new!

    What we really need is a piece of technology that doesn't age - an entirely self-contained computer (nuclear powered, maybe?) that has the media, the reading/writing mechanisms and has several failsafe mechanisms to alert you well before any data is lost. Think of it as a computer time capsule - you bury it and in 500 years come back and it has all the human interface necessary to reproduce the data in a usable format. Of course, you'll still need someone who reads English then..

    agh, the problems, the problems....

    1. Re:magnetic storage by raygundan · · Score: 2

      Assuming media continues to get bigger, the snowball effect is mitigated significantly. If, in 10 years, I take the mp3's (on CDROM) from my entire music collection and move it to some new super-high-density media, it will probably fit on a single disk. Thus, next time around, copying all the older stuff requires me to copy only one disk. Every 10-20 years, when I have to re-archive everything, I will have only a TINY fraction of data from the previous cycle, because it will be so small compared to the new data.
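The arithmetic behind this argument can be sketched in a few lines of Python. The model is an assumption made purely for illustration: every archive cycle you fill the same number of disks, and each new generation of media holds `growth` times as much as the last. The function name `legacy_disks` and the sample numbers are hypothetical.

```python
def legacy_disks(disks_per_cycle: float, growth: float, cycles: int) -> list[float]:
    """For each re-archive cycle, return how many current-generation disks
    are needed to hold everything written in all earlier cycles."""
    results = []
    for cycle in range(1, cycles + 1):
        # Data written j cycles ago takes up growth**-j of a current disk
        # for every disk it originally filled.
        needed = sum(disks_per_cycle * growth ** -j for j in range(1, cycle + 1))
        results.append(needed)
    return results


if __name__ == "__main__":
    # Assumed figures: 100 disks per cycle, media 10x larger each cycle.
    for cycle, needed in enumerate(legacy_disks(100, 10, 5), start=1):
        print(f"cycle {cycle}: old data fits on about {needed:.2f} new disks")
```

Under these assumptions the backlog is a geometric series, so it never grows past `disks_per_cycle / (growth - 1)` current disks (about 11 in the sample run) no matter how many cycles pass. The snowball only appears if media capacity stops growing faster than the archive does.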

    2. Re:magnetic storage by Anonymous Coward · · Score: 1

      It's very hard to cheat the laws of thermodynamics. Things tend towards entropy. The closest the universe has come to escaping it is life (on a long term scale). Even that information mutates and is corrupted over time. I'm not sure we can save a perfect bit copy of everything, but we can carry on the legacy in some form. - Darwin

    3. Re:magnetic storage by alleria · · Score: 1

      Hmm ... I've come to notice that paper seems to be remarkably durable over the long run! ;) BTW, I'm just wondering: what's the life expectancy of microfilm and microfiche? (sp)

    4. Re:magnetic storage by Anonymous Coward · · Score: 0

      I think the new Ultra-Wide clay tablets are coming out this summer...

    5. Re:magnetic storage by SEWilco · · Score: 1

      That reminds me, does anyone have an undistorted copy of tape 1 of the Feynman Physics lectures, part of "Six Easy Pieces"? The published tape apologizes for the distortion on the original tape from 1961. It's not only high-density digital media that is having problems...

    6. Re:magnetic storage by hotseat · · Score: 1
      Assuming media continues to get bigger, the snowball effect is mitigated significantly.

      That's the principle that the Leeds University archiver works on. As the tapes are continuously getting bigger, if the system is set up semi-automatically, data can be continually transferred to current media without significant time expenditure. The ISS reckon this can continue pretty much indefinitely.

      --
      Tom Harris
      http://www.harris.ukgateway.net

  8. Is this likely? by luckykaa · · Score: 1

    A perfectly readable copy of a digital document is useless if there is not software program available to translate it into human-readable form.

    Is there an example of a computer system that doesn't exist anymore, and can't be emulated at a much greater speed than the original using existing software? Even most arcade machines can be emulated these days.

    1. Re:Is this likely? by thunderbee · · Score: 1

      I had an apple//. There are emulators. But you need an apple// to read the floppies. As a matter of fact, they are still readable (I tried recently). But when my old apple finally dies, nothing will read the disks. If they have not been made into disk images suitable for the emulator, content will be lost.

      --
      In my opinion, Scientology is a cult you should avoid.
    2. Re:Is this likely? by dierdorf · · Score: 4
      Gee, I just happen to have a bunch of steel-ribbon tapes from a Univac I. Maybe that information is vital to civilization as we know it. Do you really think those tapes can be read today without tremendous expenditure of time and effort?

      BTW, I think the original author missed one future problem - encrypted information. I foresee hardware-based encryption becoming almost ubiquitous so that most data is encrypted. If encryption becomes universal, then much info will be encrypted that really wasn't burn-before-reading secret. What happens to all that information - of potential interest to historians looking back on the 21st century - under those conditions?

      --
      -- John Dierdorf, Austin TX
    3. Re:Is this likely? by luckykaa · · Score: 1

      Do you really think those tapes can be read today without tremendous expenditure of time and effort?

      Well, I guess I should have quoted the whole line, but I was referring to the comment about transferring it from one format to a less obsolete format. And the problem was that you would still need software compatibility. If the steel ribbon tapes WERE transferred to various forms of disk through the ages, then reading the bits wouldn't be hard. However if the data was compressed in a weird way, then no they couldn't be read easily without some form of software emulation.

    4. Re:Is this likely? by Psycho+S.+Illusion · · Score: 1

      Ok, so this is a dumb question I'm sure but...
      If the medium is readable, then wouldn't it be possible to put the disks in any drive capable of translating the medium(*) into 1s and 0s, and then read off the raw 1s and 0s to be reconstructed by another program later?

      I mean, if the 1/0s are still there, then SOMETHING can read them, it requires "only the will to do so." If the data was valuable enough, then time/$ could be spent to make it happen.

      (*) The obvious problem - a device to read the medium. Still, as long as information about how these devices worked survives, a new reader could be built (perhaps 50-100 yrs in the future) that conforms to the old specs...

      Maybe I'm being a little too optimistic here...why doesn't that idea work?
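
      For what it's worth, the "read off the raw 1s and 0s" step might look something like this once some drive can present the medium as a plain byte stream. This is only a minimal sketch: the function name dump_raw_image, the 512-byte block size, and the in-memory "disk" are all assumptions made up for illustration, and it does nothing about the hard part, which is building the reader in the first place.

```python
import hashlib
import io


def dump_raw_image(source, dest, block_size=512):
    """Copy every byte from `source` to `dest` without interpreting it.

    Returns (bytes_copied, sha256_hex) so later copies can be verified
    against the first one.
    """
    digest = hashlib.sha256()
    copied = 0
    while True:
        block = source.read(block_size)
        if not block:  # end of medium
            break
        dest.write(block)
        digest.update(block)
        copied += len(block)
    return copied, digest.hexdigest()


# Stand-in for a real drive: three fake 512-byte sectors held in memory.
fake_disk = io.BytesIO(bytes(range(256)) * 6)
image = io.BytesIO()
size, checksum = dump_raw_image(fake_disk, image)
```

      The image is deliberately dumb: no filesystem, no file format, just bits plus a checksum, so whatever program reconstructs the contents later can be written separately.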

    5. Re:Is this likely? by morzel · · Score: 1
      BTW, I think the original author missed one future problem - encrypted information.

      Quantum computing, DNA computing,... which method would you prefer?

      Cracking encryption could become infeasible with quantum computing as well, provided we jack up the key length a bit more...

      Or would distributed.net have those quantum CPU idle times working on the case by then? Oh my ;-)


      --
      Okay... I'll do the stupid things first, then you shy people follow.
      [Zappa]
    6. Re:Is this likely? by Anonymous Coward · · Score: 0

      It sure is likely. The disintegration or disappearance of media formats happens all the time. For a wide-ranging compendium of information on this topic, see the dead media page:

      The Dead Media Project

      "An ad hoc database of the deceased, the slowly-rotting, the undead, and the never-lived media."

    7. Re:Is this likely? by Anonymous Coward · · Score: 0

      The point he is making is, Apple used a "soft" sectoring to increase performance (and allow copy protection), unlike the hard sectoring of the time. The only machine that would be able to read it would be a bit-by-bit machine that wasn't thrown off by Apple's sectoring. Please note, this doesn't address the various "copy-protection" schemes which used 1/4 stepping of the drive heads to prevent "copying" of the programs. I would assume that any non-Apple hardware wouldn't be able to accomplish similar feats.

    8. Re:Is this likely? by Pxtl · · Score: 1

      Hehehe, your nick just reminded me of Brin's Uplift stuff, and how pertinent that is to this argument. The Library Institute is a good example of how important it is to preserve our data, as you never know when it will be important. Today, tomorrow, or a thousand years from now, it'll come in handy. Those of you who haven't the slightest idea what I'm talking about, read the books. They're keen.

  9. My philosophy is ... by Cyclope · · Score: 1

    If something is of value and needs to be preserved, it will be preserved somehow (book, updating to a new software or whatever).

    If a piece of information has not been preserved and is now unaccessible, it probably means that it was of minimal value anyway.

    That's probably not the greatest way to look at this but I'm thinking that half of all the info that's presently out there is useless anyway and is just taking up space for nothing. Maybe it's a good thing that these will be lost with time. It's kind of like a good spring cleaning.


    *******************************
    This is where I should write something
    intelligent or funny but since I'm

    1. Re:My philosophy is ... by Ioldanach · · Score: 1

      If a piece of information has not been preserved and is now unaccessible, it probably means that it was of minimal value anyway

      You didn't read the whole article, did you? Or perhaps the 1960 census resulted in information of "minimal value" that we didn't need lying around anyways? This is data that cannot be recreated, and is irrevocably lost.

    2. Re:My philosophy is ... by plague3106 · · Score: 1

      Or perhaps the 1960 census resulted in information of "minimal value" that we didn't need lying around anyways? This is data that cannot be recreated, and is irrevocably lost.

      Well census info from Ancient Greece was lost and cannot be recreated. This is nothing new, we've been losing information for centuries. What about all those really old books that have rotted and will never be read again. Things that have survived have also been mangled to an extent. Do you really think the bible of today is unmodified from when it was first compiled? Kings have rewritten it, copying errors were made, and who's to say that some pages didn't go missing one century. We have and will continue to lose information b/c of technology; there simply is no way to preserve it all. Perhaps this is why we are doomed to repeat history...

    3. Re:My philosophy is ... by Anonymous Coward · · Score: 0

      >Do you really think the bible of today is unmodified from when it was first compiled? Kings have rewritten it, copying errors were made, and who's to say that some pages didn't go missing one century.

      Actually, it's a little known fact that circa 1189, the disclaimer at the beginning of the Bible was lost because several pages were stuck together while the copy was being transcribed.

      believe it or not...

    4. Re:My philosophy is ... by Anonymous Coward · · Score: 0

      the Koran, however, has survived intact, in its original form, to the present day. The words of Allah are quite clear, and provide the correct way for His people to live, and make equally clear the punishments to be meted out to the enemies of Allah.

    5. Re:My philosophy is ... by plague3106 · · Score: 1

      People are people, you cannot change that. They make mistakes in translation, slip in stuff they feel is right, remove bits they disagree with. You'd have a hard time proving your point.

  10. VA / Slash-dot Giveaway by VA+Linux+Systems · · Score: 0
    As promised, VA Linux Systems will for a limited time be offering special deals on hot VA Linux computers to Slash-dot readers.

    To kick off the promotional offers, we're having a contest drawing on March 1st. The winner will receive a VA Linux Systems StartX SP Workstation with a blazing 400MHz Intel(TM) Celeron© processor (approx. $908.00 value)!

    Five second place winners will receive a Linux / Slash-dot gift pack, including a "Debian GNU/Linux Box Set" and "Slash-dot" t-shirt (as seen on Copyleft.net), an estimated $40 value.

    Remember, this contest is only open to registered Slash-dot users. Look below for instructions on how to enter.

    In other news:

    • Slash-dot will most likely be "revamped" with a new look and feel before the end of the year. A series of polls will allow registered Slash-dot users to vote for the best-loved features.
    • Rob Malda, also known as Commander Taco, will be writing for a new column on the VA Linux web site where prominent figures in the Open - Source / Linux Community will bring you the latest news and insights on this hot new technology. Our premier issue will feature an interview with Ian Murdock, creator of the popular Debian Linux distribution.

    I must apologize for referring to Mr. Malda as "Captain Taco" in previous statements. I received over a dozen letters from Slash-dotters like yourselves informing me of my mistake, which brings me to this point: I encourage you to let me know your opinions (and correct me if I misspeak). Within a week a special e-mail address will be set up for this purpose. Only together can we make VA / Andover.net successful. Each and every one of you is part of the team.

    Please look for my new weekly newsletter, starting on February 29th!



    Sincerely,

    Larry M. Augustin
    President, Chief Executive Officer and Director
    VA Linux Systems

    ***"VA / Slash-dot Giveaway" Contest Instructions and Rules

    How to enter: The "VA / Slash-dot Giveaway" contest (hereafter referred to as the Contest) is open to all registered Slash-dot users. To enter, send one e-mail to "service@valinux.com" with this text exactly in the subject (without the quotes): "SLASHDOT GIVEAWAY". The first line of the message body must be your registered Slash-dot username. Notification of winnings will be sent to the e-mail address on file in your Slash-dot user profile. You will not receive a confirmation e-mail when you enter. Please do not send multiple entries, as they will be discarded, and e-mail abuse ("spamming") may be grounds for Contest disqualification and/or removal of your ID from Slash-dot.

    Prize drawing: Winners will be drawn from all e-mails received up until the cutoff date of 1 March 2000 at 00:00UTC. Winners are randomly chosen using HotPicker(TM) software. Winners will be notified of their status by 5 March 2000 by e-mail containing a confirmation claim number. Prizes must be claimed by 31 March 2000.

    Prizes: There is one (1) "First place" prize consisting of one (1) "VA Linux Systems StartX SP Linux Workstation" with 400MHZ Intel Celeron processor, 64MB RAM, 6.4GB hard drive, and the VA Linux OS v.6.0 Software Kit. A 17" monitor, keyboard, and mouse are included. Five (5) "Second place" winners will receive a "Linux / Slash-dot gift pack" containing: one (1) Debian GNU / Linux software box set and one (1) Copyleft "Slash-dot" t-shirt. Estimated value of "First place" prize is $908.00**. Estimated value of "Second place" prize is $40.00**.

    Disclaimer: VA Linux Systems assumes no liability for e-mail contest entries not received. The Contest is not open to employees of VA Linux Systems and Andover.net, or their immediate relatives. VA Linux Systems reserves the right to reward alternate prizes of equal or greater value, defined by the value estimate stated above.

    ** All values are in US dollars and do not include state tax and shipping charges.

    1. Re:VA / Slash-dot Giveaway by Anonymous Coward · · Score: 0

      This is a joke, right? Please tell me it's a joke.

  11. A great challenge by AjR · · Score: 3

    balancing the endless churning of the web against the need for a stable archive.

    Unless we take steps to archive, transcribe and preserve all this information (yes, grits, petrification et al) then we are in effect building a new Library of Alexandria.

    It would be the greatest loss ever for archaeologists of the future to be unable to access archives of the WWW. Every day is a unique snapshot of the world as the endless churning of webpage updates/dead link removals changes the WorldWideWeb.

    This information Ocean is something unique. Archiving such a huge store of information generates a challenge in itself.

    I don't often wax lyrical about the internet but it is in effect becoming a snapshot of our civilisation.

    What a loss for future generations if they cannot see the views of ordinary human beings (through the endless websites) preserved.

    --
    ...Upgrade now to Schrodingers Dog...
    1. Re:A great challenge by pakratt · · Score: 2

      If you want to preserve the contents of the web for future generations (research, entertainment, whatever) then a huge, high power antenna should just broadcast non-stop internet.

      This would serve two purposes:
      1) Extra-terrestrial beings (assuming they have the technology and could decode it) could have a window into life on earth.
      2) Whenever mankind figures out how to make wormholes or travel faster than light they could simply warp out to whenever they want info from and recover that day's web broadcast.

      Altogether, not a bad idea, huh?

    2. Re:A great challenge by Ig0r · · Score: 1

      The main problem with your plan to "just broadcast non-stop internet" is that there just isn't enough electromagnetic bandwidth to spread around.
      How would you choose which section of which page of which site gets transmitted this second, and the next?
      I think you're trying to bottle the ocean, and it just isn't feasible.

      --
      Soma: because a gramme is better than a damn.
  12. Old issue (at least in sci-fi) by dillon_rinker · · Score: 3

    IIRC, Orson Scott Card addressed this issue in a story set in Isaac Asimov's universe. The library on Trantor had indices going back thousands of years, but the contents of the library had never been refreshed. The librarians knew exactly what they had lost.

  13. Re:VA / Slash-dot [TROLL] by Anonymous Coward · · Score: 0

    please moderate this idiot down.

  14. Data Decay, Readability, and ASCII text. by inquis · · Score: 3

    When you look back at history, and you look back at documents that are a "mere" thousand years old, the wealth of information in these documents makes you wonder what could be found if all the documents from that time had survived. Just because the format is digital, rather than analog or (eek!) paper, does not mean that this media is impervious to decay.

    However, I think that decay is much, much more serious in digital media. The root of the problem is that if you are looking at a physical document with water damage, even though the original "packets" of information (letters and words) are damaged, the human brain can sometimes extract meaning from smearing ink and crumbling paper. When an electron wanders on magnetic media or when a CD begins to decompose, that bit is lost forever. Digital media is much more susceptible to lapsing into unintelligibility than physical media like paper.

    Preservation in a medium that will not become obsolete is the key. As mundane as it may sound, plain ASCII text will probably never become obsolete because there is no real reason to come up with a new standard. Some people may scream at me: "*ML! *ML!", but at the rate that these things obsolesce, plain text will still be around when XSGHTML has been long dead.

    Just a thought. If you have something to add, feel free to respond.

    Brandon Nuttall, the inquisitor of Reinke

    1. Re:Data Decay, Readability, and ASCII text. by Steve+Burnap · · Score: 2
      As mundane as it may sound, plain ASCII text will probably never become obsolete because there is no real reason to come up with a new standard.

      Someone who speaks a language that doesn't use the basic roman character set may beg to differ. There are very real reasons to consider moving to something like Unicode.

    2. Re:Data Decay, Readability, and ASCII text. by clintp · · Score: 2
      Agreed. I think the original author missed some points, which a lot of people do, by not looking at storage in a historical perspective. If you take the long view, analog storage on paper is a very reliable and proven technology for preserving data for the long haul.

      We've recovered data from thousands of years ago on crumbled bits of paper that are still quite legible despite the decay, and that paper was a new technology for some civilizations then.

      [Of course, a better argument can be made for simply using clay tablets and inscriptions in stone. We've recovered carvings MUCH older than anything that's been found on paper. But you have to draw the line for convenience somewhere. ]

      Any modern technology you're relying on is bound to be inadequate. Think about this: every technology for information storage invented in the last 200 years has failed for long-term use. A thousand years from now they'll look back on 1880-2000 as a series of dark ages. The only things that will remain are the paper records. Even paper that's badly treated remains legible for long periods. There are archeological surveys going on now in garbage dumps for large cities (NY, for example) that are finding well-preserved newspapers from half a century ago. Newspaper is not a good paper, and newspaper ink is a poor ink. This says a lot for the staying power of a good technology.

      Film deteriorates, magnetic media loses bits and the substrates crack and crumble, records lose crispness, wax and foil canisters wear out. Take magnetic media for example: it was thought that with careful storage and infrequent use this stuff would last a long time. As it turns out, magnetic tape barely lasts 15 years under the best of conditions. We simply don't have enough experience with these technologies to know if they'll work.

      Just because your 10 year old CD's can still be played today, doesn't mean they'll work in 2025. As a technological culture, we don't have enough experience with the materials to know. Old transparent plastics grow cloudy eventually--optical storage will probably not be your saviour in the 21st century either.

      Not until we've had a century (or two, or three) to observe technologies like CD-ROM will we know how they'll work for long-term storage. Until then, don't bet the farm.

      --
      Get off my lawn.
    3. Re:Data Decay, Readability, and ASCII text. by qbzzt · · Score: 1

      Not until we've had a century (or two, or three) to observe technologies like CD-ROM will we know how they'll work for long-term storage. Until then, don't bet the farm.

      The problem is that certain things, like audio or video content, can't be printed on paper in human-readable form - there are just too many bits. For personal use, I'll use CD-Rs and back them up every few years. For institutions that can afford the price of the hardware, etched metal disks (readable with a laser) would probably work best. In any case, any content worth preserving is worth preserving in a publicly accessible format (no Word files for me!).

      --
      -- Support a free market in the field of government
    4. Re:Data Decay, Readability, and ASCII text. by s-gunn · · Score: 1

      Regarding *ML vs. plaintext, that's exactly the point. XML/SGML/HTML is plaintext, but with added metadata about the document. A document in Microsoft Word format may or may not be readable in 50 years. Since it's a proprietary format, information stored within the document may be untranslatable. On the other hand, the markup languages store the information about the document in a readable form. As long as the Document Type Definition (DTD) exists somewhere, the file's structure will stay readable for as long as the text does.

    5. Re:Data Decay, Readability, and ASCII text. by sjames · · Score: 2

      magnetic media or when a CD begins to decompose, that bit is lost forever.

      ECC would be a beginning (CD-ROM already uses it; that's why a 74-minute CD-ROM only holds 650M of data). Microencoding of some sort in a durable substance is perfectly acceptable as long as the instructions for building the reader are in a more readily accessible format.
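
      To show what ECC buys you when "that bit is lost", here is a toy Hamming(7,4) code: 4 data bits are stored as 7, and any single flipped bit in the 7 can be located and repaired. To be clear, this is NOT what CD-ROM uses (the real thing is based on much stronger Reed-Solomon codes); I've swapped in the simplest code I know purely for illustration, and the function names are made up.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits as a 7-bit codeword (positions 1..7).

    Parity bits sit at positions 1, 2 and 4; data at 3, 5, 6 and 7.
    """
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(codeword):
    """Return (data_bits, error_position); position 0 means no error seen."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    error_position = s1 + 2 * s2 + 4 * s3  # 1-based index of the bad bit
    if error_position:
        c[error_position - 1] ^= 1  # flip it back
    return [c[2], c[4], c[5], c[6]], error_position


stored = hamming74_encode([1, 0, 1, 1])
stored[4] ^= 1  # simulate one bit rotting on the medium
recovered, where = hamming74_decode(stored)
```

      The cost is the same trade the CD makes: 3 of every 7 stored bits carry no new data, which is why capacity drops when you add protection. Two flipped bits in one codeword would still fool this toy code.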

    6. Re:Data Decay, Readability, and ASCII text. by sjames · · Score: 2

      Just because your 10 year old CD's can still be played today, doesn't mean they'll work in 2025.

      Some of my 10 year old CDs WON'T play on a new CD player, but WILL on an ancient old player. Only ten years, and there's enough drift in standards to make that happen. Of course, neither player is top of the line, perhaps a really good one would play the old CDs.

  15. An excellent summary of the problem by dsplat · · Score: 4

    This is something that is going to be more of a concern for those of us who conduct a significant portion of our lives online already. Ask yourselves, have you ever had a moment of unusual brilliance in which you posted something to Slashdot or Usenet which was truly worth saving? Can you find it now?

    Personally, I encountered the issue of software obsolescence well over a decade ago. I migrated my resume to TeX because it had already been through four other formats and I no longer had access to the tools to read them. I picked TeX because I firmly believed that a tool that I had the source for was likely to continue to be useful to me for a longer period. And the source for the document is ASCII text, which I was able to convert to HTML a couple of years ago with little trouble. I will not rely on the future availability of any tool that I have no control over.

    This is one of the reasons that The Unix Philosophy, a fine book, recommends text formats for data. You can manipulate it with a wide variety of tools including text editors. It is unlikely that we will abandon those completely in our lifetimes. It also suggests, if memory serves, keeping notes online in text form. They are more portable and more accessible that way.

    One worthwhile source of literature preserved as plain text files is Project Gutenberg. It is probably also the oldest such project around. It is to text in some senses what Free Software is to code. Although they aren't doing collaborative authoring projects, they are collaborating on getting old books whose copyrights have expired into electronic form. If you haven't ever visited their site, take a look.

    --
    The net will not be what we demand, but what we make it. Build it well.
    1. Re:An excellent summary of the problem by Anonymous Coward · · Score: 0

      What you are calling "text format" some of us choose to call 7-bit ASCII. And 7-bit ASCII wastes approximately 1/8 of the storage channel with the redundant eighth bit that's always zero.

      I'm sorry, but "The Unix Philosophy" all boils down to trying to force all information metaphors to ultimately equate to an old crufty teletype.

      Force all information to flow through a 'tty' and you've already filtered out most of the digital content people use in the present time.

    2. Re:An excellent summary of the problem by dsplat · · Score: 2
      What you are calling "text format" some of us choose to call 7-bit ASCII. And 7-bit ASCII wastes approximately 1/8 of the storage channel with the redundant eighth bit that's always zero.


      Okay, you are right about that. I used ASCII as my example for three reasons. First, Slashdot is in English. Second, many if not most of the common character sets today are supersets of ASCII for compatibility. Finally, the primary but not sole input character set for TeX, which I mentioned, is ASCII.

      As for wasted space, the amount of redundant information in every written language that I am aware of is very high. The actual information content of a single character is only a bit or two in context. That can be demonstrated with any good compression program. So, I would suggest that for saving space, either we all need to abandon our human languages for one with no redundancy (not a likely proposition) or compress everything we want to save and document the compression algorithm in uncompressed files, preferably with source code.
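
      A quick way to see the redundancy for yourself, sketched with Python's standard zlib module. The sample text and variable names are just my own choices for illustration. A general-purpose compressor on a sample this short will land well above the "bit or two" figure, so treat the result as a loose upper bound rather than a measurement.

```python
import zlib

sample = (
    "The amount of redundant information in every written language is "
    "very high. The actual information content of a single character is "
    "only a bit or two in context. That can be demonstrated with any good "
    "compression program, and the effect grows as the text gets longer and "
    "the compressor sees more of the language."
)

raw = sample.encode("ascii")      # one byte (8 bits) stored per character
packed = zlib.compress(raw, 9)    # level 9 = best compression zlib offers

# Stored bits per original character after compression.
bits_per_char = 8 * len(packed) / len(raw)

# Decompression must give back exactly what went in.
restored = zlib.decompress(packed).decode("ascii")
```

      Printing len(raw), len(packed) and bits_per_char shows the savings; the round trip through zlib.decompress is the part that matters for archiving, which is why the algorithm itself has to be documented somewhere uncompressed.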

      I'm sorry, but "The Unix Philosophy" all boils down to trying to force all information metaphors to ultimately equate to an old crufty teletype.

      Force all information to flow through a 'tty' and you've already filtered out most of the digital content people use in the present time.


      I disagree with your premise, although your conclusion would follow from it. The idea is to have human-readable streams of data that can be treated as if they are simply being sent to a tty. HTML is an excellent example. With a browser, it is enormously powerful and useful. Yet at the core, it is a sequence of characters that I can type and read. I can edit it without special tools. Admittedly, those tools can make achieving just the look I want easier. They can speed my writing and make the results more reliable, but they aren't necessary.
      --
      The net will not be what we demand, but what we make it. Build it well.
    3. Re:An excellent summary of the problem by 0xdeadbeef · · Score: 2

      Yes, but compare it to Unicode, that wastes *nine* bits when used by all right-thinking people of the world. And my god, think of all those binary file formats that pad space for fields reserved for future use. And what about the people who don't compress their media, or those whiners who think they are too good for lossy compression algorithms? Don't they realize that all *meaningful* information can be expressed in an mpeg bitstream?

      Don't even get me started on those luddites who still insist on using dried wood pulp as their storage medium. It's as if they think all information metaphors equate to a 16th century printing press.

    4. Re:An excellent summary of the problem by Oxryly · · Score: 2


      This doesn't address the most difficult part of this problem: multimedia. Images and sounds don't have the equivalent of ASCII. There is no universal standard that all tools access the same way. GIF used to be like that, but look what happened to it. JPEG is nice, but it's lossy, so there goes your perfect archive.

      Then there is the further problem of giving Joe Computer User out there the capability of building a "digital" history. With companies like Kodak and Apple goading people into using proprietary data formats like FlashPix or Quicktime, it's an uphill battle. And once again, there's no ASCII equivalent to fall back upon.

      Ugh... this really brings back the corporatism fueled pessimism I was feeling earlier with the DVD/DeCSS debacle.

      Oxryly

    5. Re:An excellent summary of the problem by dsplat · · Score: 2
      Yes, but compare it to Unicode, that wastes *nine* bits when used by all right-thinking people of the world.


      <sarcasm>
      Oh yeah, all right thinking people speak languages that can be represented by the letters available in ASCII. Yes, Unicode was invented for all the wrong thinking people who insist on using those funny looking letters with lines or dots around them or arcane characters that no right thinking person can understand anyway.
      </sarcasm>

      If you use the UTF-8 encoding and you restrict your text to the characters available in ASCII, the resulting text is ASCII. Besides, do you have any idea how hard it is to write the credits for a big free software project these days in anything other than Unicode without mangling somebody's name?
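
      The first claim is easy to check. A small sketch in Python, where the helper is_plain_ascii and the sample name are made up for the example: ASCII-only text produces byte-for-byte the same file under either encoding, and only the characters outside ASCII cost extra bytes.

```python
def is_plain_ascii(text):
    """True if every character in `text` fits in 7-bit ASCII."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


plain = "Build it well."
accented = "Bjørn"  # made-up example name with one non-ASCII letter

# ASCII-only text: the UTF-8 bytes and the ASCII bytes are identical.
same_bytes = plain.encode("utf-8") == plain.encode("ascii")

# Non-ASCII text: still representable, the extra letter takes two bytes.
utf8_length = len(accented.encode("utf-8"))
round_trip = accented.encode("utf-8").decode("utf-8")
```

      So an archive of old ASCII files is already a valid UTF-8 archive, with nothing to convert.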
      --
      The net will not be what we demand, but what we make it. Build it well.
    6. Re:An excellent summary of the problem by kimihia · · Score: 1
      One worthwhile source of literature preserved as plain text files is Project Gutenberg.

      Even plain text formats change over time.

      Go to rfceditor.org and have a look at the format of the early RFCs and compare that to the current RFCs.

      There is a lot of difference. (Ugly!)

  16. Not a technological problem... by captaineo · · Score: 2

    This is an excellent summary of the technical challenges to digital media preservation.

    But the technical issues are insignificant compared to the legal concerns - copyrights, patents, etc.

    Sure, most of these forms of copy limitation do expire, but until a large amount of "digital literature" becomes public domain, nobody's even going to *try* developing a preservation system, for fear of lawsuit by irate copyright-holders.

    My university's library collection totals nearly seven million books. Yet extracting information from this huge paper collection has been an incredible hassle... I would be willing to pay a significant annual fee if I could access every page in the library via a Web interface. I leave the juicy technical details to the reader's imagination. (I bet a few people with hand-held scanners and rudimentary OCR could digitize the entire library in a reasonable amount of time).

    But guess what - this is never going to happen in my lifetime.

    These seven million volumes of knowledge are never going to be preserved, because no library director in his/her right mind would risk slipping up and getting sued for violating a long-lasting copyright.

    1. Re:Not a technological problem... by Anonymous Coward · · Score: 0

      The seven million volumes you talk about will last a long long time if they are taken care of. It's generally known that books printed on paper stand up to the test of time pretty well.

      They don't, however, when the ill informed propose that it can all be OCR'd by a few people running around with hand scanners. The key to keeping a paper information archive alive is to take care of it. That involves spending real money maintaining the resources to house the collection, and a staff to keep it in order and maintain and update it. It almost always runs counter to the latest fad in many libraries, which is spending the lion's share of the budget on a bunch of Internet terminals that will be obsolete in two years.

      Oh, and information does NOT "want to be free." Information wants to be taken care of and maintained for the future. That's an expensive proposition, and will have to be paid for. The fact that it costs to do by definition means it won't be free (in any sense of the word). Deal with it.

    2. Re:Not a technological problem... by Anonymous Coward · · Score: 0

      has anyone ever told you a secret? Information most definitely wants to be free, and could care less about what happens after that.

  17. The Media Problem by Anonymous Coward · · Score: 0

    TROLL
    I want a grit cluster out of naked and petrified Beowulfs pouring hot Natalie Portmans down each other's pants!
    /TROLL

    Addressing the media hardware problem:

    I think one solution could be to store all data worth keeping for a long time on standardized media.
    In the Old Times (IIRC) nearly every computer manufacturer clung to its own proprietary set of "standardized media" - just remember the thousands of different formats for the good ol' 5.25" floppy drives. This problem is far less threatening today, because nearly all media (hardware AND logical formats) are standardized. You can read a CD containing i386 Linux on a Macintosh etc.
    So one solution to the media problem would be to just keep the official standard specs (like the Books of Many Colors) in a durable format (etching them in titanium plates should be sufficient), so if, in a few thousand years, the need should arise to read that old Quake CD, the archaeologists just have to dig out the plates, build a new CD drive and lo! all that old data which survived World War XXXVII would be accessible again (if kept in a climatized room to slow the media decay).

    Unfortunately, Playstation CDs would be out of the game for not being made according to The Standard...

  18. some other problems by sloth+jr · · Score: 1
    For me, the issue is knowing what it is I have, since knowing what to keep is dependent on this. While asset tracking is "biz as usual" for archivists, I'm not one of them. How do I keep track of whatever I have, over the last dozen years, and a ton of different machines, ranging from a MicroVax II to multi-processor SGI graphics boxen? And how do I track this info in a way that doesn't consume huge gobs of time and thought?

    This assumes that information SHOULD be thrown away. I'm not interested in becoming a pack rat; I already have enough "stuff" to keep track of, thanks. I suppose I'm just not all that interested in making my information, no matter how trivial, available to future archaeologists.

  19. It's not just digital magnetic information... by fingal · · Score: 3
    The problem of transient information storage is not just connected with reading old digital data stores, but also with analogue information as well, typically audio recordings.

    In this case, the main problem is not bit-rot (although this will occur sooner or later) but rather problems with not recalling the information for an extended period of time. For example:-

    • Reels of tape start to imprint signals onto adjacent layers of tape, causing loud passages to have ghost versions either before or after them.
    • Tape actually becoming stuck to itself because of bad binder materials, leading to baking of the tape as a desperate restorative measure.
    These and other issues are discussed on www.audio-restoration.com. Does anybody know if there are similar problems associated with digital media (the cross-talk problem will be virtually negligible due to noise-floor issues being irrelevant)? If so then it makes archiving a much more difficult thing if you have to physically do something to the archives every couple of years (especially with the exponential growth rate of information generation).
    --

    The only Good System is a Sound System

  20. Inversely proportional? by Yaruar · · Score: 2
    It just struck me that data storage times seem to be inversely proportional to the level of technology around in the age.

    Books can have a life of hundreds, if not thousands, of years if treated right. Even with abuse they will survive for years.

    There is a problem of obsolescence of language, although usually there is a Rosetta Stone equivalent.

    With modern media, technology is progressing so fast, in an almost throwaway way. At my previous company we had good backups but no way of accessing them: for anything made before we went to DAT and then DLT, we no longer possessed the devices needed to read the tapes and, before that, the disks.

    It could be argued that with the internet archiving is going to be more dynamic and fluid, but where does this leave information, and especially information for future generations? It is all well and good moving from the printed page to the digital page, but in 2000 years' time will they be able to revive the contents of a hard disc? Will the information on the internet have evolved dynamically, leaving no snapshot? Or will they look through the books of our time?

    What will be our dead sea scrolls?

    --
    Working for the (other) man
    1. Re:Inversely proportional? by fingal · · Score: 3
      One point to remember when looking at problems with digital information storage media is that they are not really intended to be archived. What they are (mostly) designed to do is read and write the information very fast at a high frequency with a high degree of accuracy. Most of them are quite good at this and the issue of bit-rot tends to go away if you are continually reading and writing your hard-disk.

      CDs and related optical media do have problems with sunlight, but you have to remember that they were created (AFAIK) by the audio industry, which is one of the most notoriously fickle industries in the world: they want you to buy a new CD from a new group every week, not have a single CD that is perfect and that lasts forever. I think that the concept of people being able to listen to their CDs for 10 years is already far too long for them.

      The problem is that there doesn't really appear to be anyone making storage media that is optimised for long-term persistent storage. But do you think that such a format would be the way forward? Each year, we generate an exponentially larger amount of information. All the hard disks on the planet now would not be enough to store the new information that will be generated in the next 5 (wild guess) years. Therefore we are going to need progressively larger and more efficient forms of data storage as the information bloat gets larger. As new formats come out, the important thing is to look at the movement of legacy data onto the new formats. If data is not treated as a static thing to be boxed up and forgotten, but rather as part of the on-going current set of information and transferred onto new technologies as they are developed, then you will not have the situation where people are looking at a hard disk in 50 years' time and going "what's an IDE interface?".

      Of course, then you have the 'minor' issue of application file formats...

      --

      The only Good System is a Sound System

    2. Re:Inversely proportional? by Yaruar · · Score: 1
      I agree, information sizes are growing at an inverse rate, although this isn't entirely because of content, I would say mainly through format.

      Archiving should be all about finding efficient ways of storing information in retrievable ways for as long as possible.

      However, archival seems to have become all about storing all the information (take the British Library or Library of Congress, which can't really keep up with publishing in terms of space and resources...).

      Maybe the answer is something like Gutenberg, where plain text is used and a fluid medium is used, albeit one which seems to be stable (i.e. multiple mirrored servers with backup devices).

      I think there needs to be a shift in focus from the sheer need to store to the methods of storing and the reasons for storing.

      I think the internet is an interesting snapshot of our time, but I think its transience and fluidity are the things that make it what it is and the things that make archival a difficult process...

      Hmm, more thought needed... I would plug my employer now, as information management and storage is our thing, but that would be crass... (unless anyone in the field of information storage wants a scholarship, in which case mail me for details).

      --
      Working for the (other) man
  21. Very informative.... by Anonymous Coward · · Score: 0

    sign me up.

  22. Thanks to "proprietary formats" info will be lost. by Anonymous Coward · · Score: 1

    And since copyright on data formats lasts for the author's (or company's) life plus 100 years (gee, thanks Sonny Bono for extending this, I won't miss you), we can never hope to see any legal 3rd party readers for these files. If the IP owner decides to sit on an old format and not support it, we are officially hosed.

  23. Exponential data and storage by sterno · · Score: 2
    Sure, the data grows exponentially, but as you just pointed out, so does the storage media. At one point, holding on to all of the e-mail I ever received would have been a ludicrous concept. But now using CD-R or even just a big mirrored hard drive, I can keep a limitless archive. I think the bigger concern is not the limitations of the physical media's capacity to store everything, it is the ability to view that stuff a few years down the line. E-mail is all ASCII text so it isn't too difficult to deal with, but as the data becomes more robust and complex, the issues of obsolescence become more pronounced.

    --
    This sig has been temporarily disconnected or is no longer in service
    1. Re:Exponential data and storage by Anonymous Coward · · Score: 0
      so does the storage media.

      "So does the storage medium", you mean. Or "so do the storage media." Medium is singular, media is plural.

      When people say "the media", they generally mean radio, TV, magazines, newspapers and movies. That's several different formats, thus, "media". In contrast, "The medium is the message". Think about it.

  24. Just carve it by Kaa · · Score: 2

    I would argue for the historically tested method of storing data: take a chisel and carve it into rock.

    The software obsolescence is not a big problem -- humans (we hope) are going to be around for some time and the brain wiring changes awfully slowly. Languages do get forgotten, but smart people are very good at understanding dead languages and will probably get only better. Readers are also not likely to be a problem: just like brain wiring, eyeball construction is quite stable and not going to be superseded by a better design any time soon.

    The medium -- provided you pick a good hard rock like granite (avoid limestone and its derivatives like marble, they don't like acid rain) -- does not suffer from bit rot, completely ignores magnetic fields, is stable with regard to solar radiation, and is fairly resistant to pollutants.

    You are not limited to ASCII, and even have limited graphical capability. In fact, rock has a huge advantage over current digital media -- it's perfectly possible to create, view, and store 3D objects in rock. Just try that with your 21" monitor!

    Just in case you think I am being funny, there is a company which in exchange for a sum of money will take your text, etch it on metal plates (nickel, I believe), and store it in some cave. They are estimating >5,000 years MTBF. I still think a good slab of granite is better, though.

    Kaa

    --

    Kaa
    Kaa's Law: In any sufficiently large group of people most are idiots.
    1. Re:Just carve it by Anonymous Coward · · Score: 0

      I know that I am not the only one who gets a warm feeling, knowing that Natalie Portman has been preserved for all posterity in rock form.

      We can thank our local trolls for this, too. Lousy George Lucas was satisfied to capture our Natalie on unstable film and leave it at that. That philistine! May he bit-rot in hell! May the EPROMs in the controller for the volumetric infusion pump supplying the crack to his brain, the crack which inspired Jar-Jar, suffer from the infamous 'EPROM alzheimer' failure (already afflicting the computers of CP/M enthusiasts worldwide) and pump a nice big happy air bubble to his brain! moohaha. etc.

    2. Re:Just carve it by Cid+Highwind · · Score: 1


      Granite probably isn't the best choice either. Over time the feldspar in the granite breaks down, and the rock falls apart. Pollution and water accelerate this process. Basalt would probably be a better choice, or pure quartz, or some corrosion-resistant metal like gold or platinum.

      As for me, I'm backing up my data by encoding ASCII text as a pattern of platinum-plated titanium pins hammered into a slab of good dense shale. After that, I'll drop the slabs into the Mississippi delta, and in a few million years, my wit and wisdom will become part of the rock strata. The MTBF should be about 100 million years, barring a major tectonic event.
      </offtopic>

      --
      0 1 - just my two bits
  25. Information evolution by Steve+Burnap · · Score: 1
    It seems to me that trying to generalize a way to archive information just isn't worth the effort. Information that people consider worthwhile will get copied because people want it and will thereby be saved. Information that isn't worth much won't get copied and will thus be forgotten.

    The common response to this is that we may not know what is worthwhile, or that future ages may not take appropriate care. Lost Greek plays that would be worth millions now were overwritten by some monk's laundry list in a less enlightened age. We feel we must save our information from that fate. But that is an impossible task. Etch the information on steel disks and some future, more barbaric age may melt those disks down for swords.

    So forget about trying to save everything. Just work to save what you think is important. Yes, stuff will get lost, but that will happen anyway. You will never get perfection. More likely is that future generations will curse you for the stuff you thought too trivial for your archive project, while finding the information archived worthless.

    1. Re:Information evolution by jwhyche · · Score: 1

      Since I think I was the one who came up with the idea of using stainless steel disks for archiving, I'm going to comment. I'm not going to repeat my entire idea here. If you're that interested in it you can search /. for it.

      In the original idea I called for placing important information about our society and culture on stainless steel disks and archiving them on the moon. Placing them on the moon served two functions.

      1. The moon's environment is a perfect place to store shit for long terms. Things placed there will last billions of years.
      2. Only a moderately technologically advanced society would be able to retrieve them. And that is just the moon archive. The other archives would take a really advanced society. Such a society wouldn't be interested in melting them down for something like swords.
      Just my take on this.
      --
      I read at +2. If your post doesn't reach that level I will not see or respond to it.
    2. Re:Information evolution by Steve+Burnap · · Score: 1

      I probably used a bad example, because I don't think it is so much a matter of barbarism as it is of what a culture finds valuable. Some Greek plays are lost to us because one of the cultures between us and Greek culture found the paper they were written on more valuable as kindling, or to be reused to make a hymnbook, etc, etc. It wasn't so much that those cultures lacked the ability to make use of those Greek plays as it was that they simply didn't see them as things of value. There is no guarantee that some future society won't see the steel in those steel disks as more valuable than the symbols written on them. This is what I was trying to get at. The information will only really survive as long as people find it valuable.

    3. Re:Information evolution by Anonymous Coward · · Score: 0
      Sounds like The Mote in God's Eye.

      A book, if you don't know what I'm talking about. A novel about an alien culture's attempts to preserve its culture and information despite repeated collapses.
    4. Re:Information evolution by Anonymous Coward · · Score: 0

      What? No barbarians on the moon?

  26. Data *and* code by AstronomyDomine · · Score: 2
    This is actually a topic close to home for me. I've had to work both with archival data and legacy code for several years now. Recovering and transferring data from real-live systems isn't always trivial. A few years ago, I recovered some data from my advisor's old IBM workstation. It had a hard disk and several floppies full of EBCDIC data. It took me a few weeks of phone calls to networking support before I discovered the wonderful "dd" command on my own (no flames about taking two weeks, I'm an astronomer, not a CS major). Another example is when existing machines migrate to new operating systems. A big recent headache for me was getting Cray CTSS data migrated to UNICOS ASCII data. I think the key in instances like that is to make sure that old data standards are well known and easily translatable.
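For readers facing a similar pile of EBCDIC data, here is a minimal sketch of the same kind of conversion in Python, roughly what `dd conv=ascii` does for the character set. The code page (`cp037`, a common US/Canada EBCDIC variant) is an assumption, as are the function names; the original disks may well have used a different variant or fixed-length records that need extra handling.

```python
def ebcdic_to_text(raw: bytes, codepage: str = "cp037") -> str:
    """Decode EBCDIC bytes into a Python string.

    cp037 is an assumed code page; other EBCDIC variants
    (for example cp500) map some punctuation differently.
    """
    return raw.decode(codepage)


def convert_file(src_path: str, dst_path: str, codepage: str = "cp037") -> None:
    """Read an EBCDIC file and write it back out as ASCII text.

    Characters with no ASCII equivalent are replaced with '?'.
    """
    with open(src_path, "rb") as src:
        text = ebcdic_to_text(src.read(), codepage)
    with open(dst_path, "w", encoding="ascii", errors="replace") as dst:
        dst.write(text)


# The five bytes below spell a word in EBCDIC (cp037).
sample = bytes([0xC8, 0xC5, 0xD3, 0xD3, 0xD6])
print(ebcdic_to_text(sample))
```

The point of keeping the code page as a parameter is the one made above: the conversion is only easy if the old data standard is known and written down.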

    Now, I'm dealing with legacy code, too. One solution of course is to write vanilla code in a common language, but who knows what language is going to be used in 25 years? C+++? Fortran 2020? And vanilla code isn't always optimal, when hardware vendors build cutesy hotrodding tricks into their architecture and compilers.

    Somebody just needs to build a giant computer version of babelfish for all languages ever. Starting with cave paintings. :)

    --
    I'd rather trust a man who doesn't shout what he's found. -- Genesis
    1. Re:Data *and* code by Anonymous Coward · · Score: 0

      [W]hat language is going to be used in 25 years? C+++?

      C+=2 of course, silly.

  27. Ermmmm.... Moderators! Bart's Swearing! by luckykaa · · Score: 1

    Or if you prefer to avoid the Simpsons reference look at the link. It goes to http://www.hardcoresex.com

    (Then there's the fact that the BBC website can probably handle more traffic than Slashdot so a mirror is pointless)

  28. Help? ---> A question... by ATKeiper · · Score: 1
    Thanks for a great summary of the problems with keeping data viable. Maybe some /. reader will create a start-up to help schools and businesses deal with the problem, perhaps by creating the "museum" you alluded to. (I notice, for instance, that the domain www.datadecay.com is still available.)

    I have a question, however, about the other end of the data life-cycle: its birth. Certainly data disappears, but what is the best way to describe or define "data," broadly speaking? What is the best definition anybody here has ever heard for "information"? I'm having trouble finding a straight answer. Is data (information) a representation of something in the real world? Is it like a shadow of something else? We have seen how it can be created, we have seen how it can evolve, and we have seen how it can fade away and die, but what is the best definition of what it is?

    This is one of those philosophical questions that just nags at the mind. If anybody can suggest definitions (or resources), I'd be grateful.

    A. Keiper
    The Center for the Study of Technology and Society

    1. Re:Help? ---> A question... by Anonymous Coward · · Score: 0
      That's a good question, frankly. Information is whatever we say it is, and it can be music or census data or movies or pictures or anything.

      So I would suggest you define it as anything that can be converted into ones and zeros, but that would probably be a circular definition. (Data is ones and zeros which is data.)

      It sounds to me like you're looking not so much for a DEFINITION (which won't clarify what you're looking for, since it's just words) as a METAPHOR, which can shed lots of light on the matter quickly.

      I hope that helps.

      ~dafezer9~

      this, too, shall pass

  29. Hardware museums for obsolete media by Anonymous Coward · · Score: 0
    Instead of a hardware museum, why not create a spec for reading the media and store it in a form that will still be readable and decipherable in the future? I.e., print the spec indelibly on no-rot plastic, in a language you think will be translatable in the target era (do you want these documents to last 1000 years? a million years? forever?).

    hint: don't store the instructions for reading media, on those media :-)

    just my .0001 cents.

  30. I'm off to save the world... by Chops-Frozen-Water · · Score: 1

    ...as long as I don't run out of disk space. (Paraphrasing a comment I heard at a DC thinktank.)
    It was noted that storage requirements for geographic data (geologic, topographic, etc.) would require petabytes. Multiple petabytes. And a petabyte is 1000 terabytes (right?). And we're thinking 36GB hard drives and DVD-RAM drives have a lot of space...
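The arithmetic behind that comparison can be sketched quickly. This assumes decimal (SI) units, where a petabyte is indeed 1000 terabytes, and treats the 36GB drive as exactly 36 * 10^9 bytes; the function name is made up for illustration.

```python
GB = 10**9   # decimal gigabyte
TB = 10**12  # decimal terabyte
PB = 10**15  # decimal petabyte


def drives_needed(total_bytes: int, drive_bytes: int) -> int:
    """Smallest whole number of drives that holds total_bytes (ceiling division)."""
    return -(-total_bytes // drive_bytes)


print(PB // TB)                     # terabytes per petabyte: 1000
print(drives_needed(PB, 36 * GB))   # 36GB drives per petabyte: 27778
```

So a single petabyte is on the order of 28,000 of those drives, before any allowance for redundancy or filesystem overhead.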
    --
    The Future: Some assembly required; batteries not included.
  31. Caching as a possible approach to preservation by adam · · Score: 2

    There is a project that has started recently here at Stanford to investigate the possibility of using distributed web caches as a means of preserving information on the Web. The project is called "LOCKSS" (Lots Of Copies Keep Stuff Safe), and more information can be found at lockss.stanford.edu.

    This project definitely does not address all the issues with digital-document preservation; it definitely does _not_ solve the document-format problem. Its goal is to make digital publishing "immutable" so that publishers cannot modify or withdraw their work after it is published.

    Disclaimer: I work for one of the groups which is participating with the LOCKSS project, but I'm not working directly with it.

    --
    I am Jack's complete lack of surprise.
  32. So just what *is* the life of a CDR? My results. by Anonymous Coward · · Score: 4
    What is the lifespan of data stored on a CDR? An old 550(63min) CDR? a 650MB(74min) CDR? Green CDRs? Gold CDRs? Blue CDRs? A CDRW? etc.? Under non-ideal conditions?

    I put some CDRs out in the direct sun here in the Las Vegas desert over the last summer. Blue, gold, green, pale green, and an RW. Both sides of the CDs had their chance to roast in the 100F+ (40C+) degree sun for several months each. And here are the results of attempting to read the data back on each type:

    Old TDK green CDR: dead, nothing readable. Faded to a mostly clear plastic disc!
    Ricoh gold/gold CDR: dead, nothing readable. The golds faded visibly first of them all. Area where data was stored faded to clear!
    Verbatim (blue): I was stunned. I read back a full and complete iso image of Red Hat 4.2. No fading at all.
    Memorex silver/green CDR: mostly dead, some files readable. Faded in a few isolated patchy blotches.
    The CDRW... just started this test. No results yet. Looks OK, though.

    Overall, I'd say the blue CDRs are the best choice for long term data storage.

  33. Dead Media Problems by Anonymous Coward · · Score: 0

    About fifteen years ago the Library of Congress did a study to determine how they should be protecting important records. At the time they estimated the life of an optical disk (not a CD-ROM, but similar technology) to be ten years and the life span of a book printed on acid-free paper to be in excess of three hundred. (Books printed on cheap paper using an acid bleaching process last mere decades. Go look at any SF paperback from the fifties or sixties to see what your paperbacks from the seventies will end up looking like.)

    The Library of Congress has so many WW II audio recordings that it would take a scholar several lifetimes to listen to them all. (And it would take the same manpower to convert them to a more modern storage medium.) These recordings were made on glass disks, and pose a number of problems. They have only a few players and the disks are very fragile. (Fortunately, when they break it is often possible to recover the data.) The other problem is that the disks are not well indexed, and certainly are not searchable. Most of the recordings are speeches of little value, even to historians, but finding the valuable information requires that the material be converted to a more useful format. (Some day we will have voice recognition to automatically convert a lifetime's worth of audio into text that can be searched, but that still presupposes that someone has digitized the tens of thousands of disks. Where will that manpower come from?)

    Media lifespan is a serious issue when you are required to archive materials. Many governments are legally obligated to maintain materials for long periods of time, and replacing paper copies with electronic ones may not satisfy legal requirements. (Think about what happens when your data is on 230 MB optical disks. Remember those? Very popular eight years ago, useful only as coasters today.) None of this is a new problem, however.

    Some of you will, no doubt, remember the issue of whether or not Heisenberg was building an atomic bomb for the Nazis, and if so, was he actively interfering with the project because he disagreed with the Nazis' goals. It turned out that after the war, Heisenberg and some other scientists were being held in Britain. The British secretly recorded all of their conversations. The medium? Spools of wire. (Think of a spool of wire being used just like a magnetic tape.)

    A few years back some scholars wanted to listen to these recordings and had a terrible time finding a player. Eventually they found a collector who had one in working order. Wire recorders have not been made since the fifties. But they eventually found a player and carefully transcribed them. (And it seems that Heisenberg was actively trying to build a bomb, but lacked the resources to do so.)

    There are mag tapes from the seventies that cannot be read. I have tapes from the late eighties that would be difficult to read, since I no longer know anyone with a 9 track tape drive. This is a little over ten years, unlike the wire spool recordings.

    While most software will read files created by ancient versions of its competitors' software, I wonder how much longer this will last. Open Source doesn't fix the problem posed by data in proprietary formats which cannot be easily migrated.

    The issue of emulation is important, but it presupposes sufficient information to write an emulator and sufficient resources to fund the project. Many times special hardware would have to be built to read the data. NASA has this problem, as the tape drives used to store telemetry data have not been made for decades and it is very difficult to find working ones.

    I wonder if the period for technological obsolescence is compressing to the point that it will only take ten or so years for older formats to be unreadable.

    I was recently looking at the first major programs I ever wrote. I only have printouts, as 20 years ago there was no easy way, at least as a student, to save files, and even if there was, it would not matter because I could not read the media today. While I will scan and OCR these someday (for sentimental purposes, as they have no value to anyone else), I count myself lucky that I saved the printouts. I have a floppy formatted as a Unix filesystem for a Lisa running a crippled System III port done by Unisoft (remember them?). It has a few papers on it and some software I wrote. Nothing terribly valuable, although the papers would likely make some plagiaristically inclined college students very happy. Can I read it? Maybe if I ever find someone with such a machine and the floppy has not gone bad. How old is it? A mere fifteen years.

    Oh, and the Y2K fiasco cost a lot more than 8 billion. I read somewhere that the New York Stock Exchange spent 600 million, and I know that the big three auto manufacturers spent at least that much apiece. I've seen estimates that the cost was $100 billion.

    1. Re:Dead Media Problems by jms · · Score: 2


      Some of you will, no doubt, remember the issue of whether or not Heisenberg was building an atomic bomb for the Nazis, and if so, was he actively interfering with the project because he disagreed with the Nazis' goals. It turned out that after the war, Heisenberg and some other scientists were being held in Britain. The British secretly recorded all of their conversations. The medium? Spools of wire. (Think of a spool of wire being used just like a magnetic tape.)

      A few years back some scholars wanted to listen to these recordings and had a terrible time finding a player. Eventually they found a collector who had one in working order. Wire recorders have not been made since the fifties. But they eventually found a player and carefully transcribed them. (And it seems that Heisenberg was actively trying to build a bomb, but lacked the resources to do so.)

      How interesting that they had problems finding a working wire recorder. At any time, there are between a half dozen and a dozen wire recorders for sale on eBay. The circuitry of a wire recorder is so simple that any good old-school tube radio repairman could get one working in an afternoon.

      Wire recordings are an example of an early technology that turned out, unintentionally, to be a fantastic archival medium. Sure, the recording is monophonic, and the frequency response is limited, but for voice recording, those are acceptable compromises, considering that a spool of stainless steel wire can last for centuries. Short of physically destroying the spool, or deliberately erasing it, it will not decay. There's no plastic backing to decay. There's no oxide particles to flake off. Just corrosion-proof steel wire. Fantastic!

      I have dozens and dozens of original wire recordings from the late 1940s and early 1950s, and they all sound as good today as a freshly recorded wire.

  34. Moore's Law by stevelinton · · Score: 2

    At the moment, Moore's law is the only thing that stops this problem becoming really acute. Although I keep all my email, and the total size of the archive grows almost exponentially, so does the size of my hard disk, and the speed at which I can run grep over it.

    Handling terabyte databases now needs leading-edge hardware and state-of-the-art software specially optimised for the data format. In 20 years, however, we will just be able to haul the terabyte database into emacs, and hack up some macros to reformat it and search it.

    If Moore's law ever tops out, then we are in trouble!

  35. related article by ATKeiper · · Score: 1
    From "Old Computers Lose History Record" (BBC, 23 Feb 00)

    Irony of ironies: Data records on floppy disks relating to an archaeological dig decayed by 5 percent in under a decade - after everything had survived the journey from the Bronze Age intact.

    A. Keiper
    The Center for the Study of Technology and Society

    1. Re:related article by Anonymous Coward · · Score: 0

      The real irony will doubtless be when all the artifacts, which have now been unearthed and put into modern steel-and-glass buildings, to be pawed over and damaged, are destroyed and/or dispersed. Archaeologists are just modern-day plunderers. Our descendants will be thankful for any traces of evidence that the current-time plunderers are unable to locate and dig up.

  36. Analog vs. Digital by doranb · · Score: 1

    I agree that the problem of preservation isn't exclusive to digital media, but one of the big differences is that analog media tends to degrade MUCH better than digital media. True, old records get scratchy or warp, and tapes can have their oxide coating flake off, but at least there's some data available (i.e. you can still listen to the recording through the clicks or dropouts). With digital media, it's often an all-or-nothing affair. Either it's in perfect shape or it's gone. Of course this isn't always the case (sometimes you can extract some digital data from a damaged source) but it's much more difficult than with analog media.

    1. Re:Analog vs. Digital by fingal · · Score: 1
      Hmmm. Yes and No. You can infer stuff from damaged audio recordings, but think about what is required to cause errors in a digital recording: An error will be when a 1 or a 0 is read for its opposite value. In order for this to happen (using conventional readers) you are going to have to have a background noise floor of somewhere around 45% of your headroom (assuming that 1 is written as 100% 'on'). Even at this level you will get a reasonable chance of reading the data correctly. However, if you listen to an audio recording with 45% background noise added then it is going to be virtually impossible to clearly distinguish the clear sound.

      Yes, when digital breaks, it definitely breaks (although checksums and duplication of data can reduce the chances of losing data), but the level that you can push the degradation of a digital device to before it breaks is really quite high.
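As an illustration of how duplication can reduce the chance of losing data, here is a minimal sketch of byte-wise majority voting across several copies of the same data. It is a toy under stated assumptions (all copies have the same length, and at any given position most copies are still intact), not a description of how any particular storage device works; the function name is invented for the example.

```python
from collections import Counter


def majority_vote(copies: list[bytes]) -> bytes:
    """Rebuild data by taking the most common byte at each position.

    Assumes every copy has the same length and that, at each position,
    the correct byte survives in more copies than any wrong byte.
    """
    if not copies or len({len(c) for c in copies}) != 1:
        raise ValueError("need at least one copy, all of equal length")
    recovered = bytearray()
    for column in zip(*copies):  # one tuple of byte values per position
        value, _count = Counter(column).most_common(1)[0]
        recovered.append(value)
    return bytes(recovered)


original = b"archive me"
damaged_a = bytearray(original)
damaged_a[0] ^= 0xFF            # corrupt the first byte of copy A
damaged_b = bytearray(original)
damaged_b[5] ^= 0xFF            # corrupt a different byte of copy B
print(majority_vote([bytes(damaged_a), bytes(damaged_b), original]))
```

With three copies damaged in different places, every position still has two copies in agreement, so the original comes back intact.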

      --

      The only Good System is a Sound System

    2. Re:Analog vs. Digital by Detritus · · Score: 2
      Yes, when digital breaks, it definitely breaks (although checksums and duplication of data can reduce the chances of losing data), but the level that you can push the degradation of a digital device to before it breaks is really quite high.

      One major problem with digital formats is the absence of error recovery in common hardware and software. 99.9% of the data may be intact but one bad block at the beginning of a magnetic tape can make all of the data unrecoverable.
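One way to keep a single bad block from condemning everything after it is to record a checksum per block rather than one for the whole archive, so damage can at least be located and the intact blocks trusted. The sketch below uses CRC-32 from Python's standard `zlib` module; the block size and function names are assumptions for illustration, and this is not how any specific tape format is laid out.

```python
import zlib


def block_checksums(data: bytes, block_size: int = 512) -> list[int]:
    """Return one CRC-32 per fixed-size block of data."""
    return [
        zlib.crc32(data[i:i + block_size])
        for i in range(0, len(data), block_size)
    ]


def damaged_blocks(data: bytes, expected: list[int], block_size: int = 512) -> list[int]:
    """Return the indexes of blocks whose CRC-32 no longer matches."""
    current = block_checksums(data, block_size)
    return [i for i, (now, then) in enumerate(zip(current, expected)) if now != then]


archive = bytes(range(256)) * 8          # 2048 bytes, i.e. four 512-byte blocks
stored = block_checksums(archive)        # saved alongside the archive

corrupted = bytearray(archive)
corrupted[5] ^= 0xFF                     # damage one byte near the start
print(damaged_blocks(bytes(corrupted), stored))  # only the first block is flagged
```

Checksums alone only detect the damage; actually repairing the bad block needs redundancy as well, such as a second copy or an error-correcting code.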

      --
      Mea navis aericumbens anguillis abundat
  37. Vanishing Web Content by raygundan · · Score: 1

    I'm not nearly as worried by media decay as I am about content just disappearing altogether. The internet saves us from media decay-- if I keep my files on a network-capable machine, then transfer to the next generation machine is easy. Every time I get a new PC, I plug it into the hub, and let the file copying begin! On the other hand, "disappearing info" on the web may result in all sorts of archival losses! Magazines and Newspapers are archived and kept in libraries for years. What about news web sites? I'm sure most large sites keep their own archives, but will anyone ever have access to this data again? Once it is replaced by newer info on a site, is it gone forever? I'm afraid that the popularity of the web may result in the loss of good data archives in libraries for the future.

  38. How long can it last? by blackdefiance · · Score: 1

    Regardless of how it's stored, eventually the data itself becomes meaningless. I read an article that made this point last year. Ever try to read Chaucer in the original English? Same language, more or less, but over several hundred years it has become unintelligible to all but a handful of people. The way language is changing today, it could take even less time for all these articles on slashdot to become gibberish. So with a perfect medium, who would we be preserving things for? A handful of scholars, ignored by everyone? No one at all?

    1. Re:How long can it last? by Kulibali · · Score: 2

      Actually, this is not quite accurate. Languages tend to change most in isolation. That's why in places like central Africa and Papua New Guinea you have hundreds of very different languages within very small areas. With the rise of electronic communication, and especially the web, it seems likely that the rate of change in English will be a lot slower, since it's used so widely.

    2. Re:How long can it last? by blackdefiance · · Score: 1
      Interesting point, but it's hard to believe that the rise of electronic communication will have any stabilizing effect on the English language. In just a few years the net has introduced new words, idioms, phrasings, forms of punctuation, etc., and allowed them to spread instantly all over the planet. There doesn't seem to be any evidence, historical precedent and academic theories aside, that changes in the English language have slowed or will continue to slow.

      I think the important point is that the problem of data obsolescence goes way beyond the drives, formats, or what have you, into problems of materials and the relevance of the information itself.

    3. Re:How long can it last? by Anonymous Coward · · Score: 0

      "The way language is changing today, it could take even less time for all these articles on slashdot to become gibberish." Oh, come on. Most of this stuff starts out as gibberish.

  39. Re:Thanks to "proprietary formats" info will be lo by Detritus · · Score: 2

    As far as I know, there is no copyright protection for file formats. You can copyright a document that describes a file format, but not the file format itself.

    --
    Mea navis aericumbens anguillis abundat
  40. Data Havens, Archive and standards oh my! by jfrisby · · Score: 2

    Not all information needs to be archived. Most of the e-mail I receive can go in the bit bucket for all I care. The rest, I archive. As for the information that can/should be archived, the author's statements to the contrary notwithstanding, industry standards can be used to archive what should be archived.

    Given a format that a) is adequately documented, b) accurately represents the data it encompasses, and c) has sufficiently widespread adoption, we can simply archive to that format as we need to.

    Let's consider various and sundry data types, the prominent format for handling them, and the potential longevity of those formats.

    Text: For raw text of course you have ASCII. While not a permanent fixture, nobody can argue its longevity. We'll call this the baseline. Moving up from ASCII you need some way of defining formatting and such. There are really only a couple realistic solutions. Either some SGML based system, HTML, or PDF. I'll get into the latter two cases a little further down. Let's say that for plain text, SGML has the best longevity because of widespread adoption, and simplicity.

    Rich Text (beyond simple formatting): As above, we need something better than ASCII. I'll vote for PDF here. It's a proprietary format, but it seems to be pretty well understood, and it does an accurate job of representing the original document. Mac OS X groks it very well, and Adobe has ensured that there's a viewer for every platform. If conversion tools can be made, then this is a good format.

    Images (bitmap): PNG, JPG, GIF, and TIFF. TIFF seems to be less relevant these days although most scanner software still produces it. JPG/GIF are where the majority of data presently exists, and PNG is where everything should be archived, IMHO... PNG being lossless, and supporting about every feature known to man, this seems to be the best solution. One could crawl the web, grabbing every single GIF or JPG, archive it to PNG format with no further loss of data and quickly build a significant archive.

    Image (vector): Sorry, don't know much about the formats used here...

    Audio: The obvious solution for archival is uncompressed, raw audio in a well understood format like WAV. This is an area that doesn't seem to be changing much...

    Video: Again, I can't really comment on the formats here...

    Things become more complicated when you have interactive media, or other very specialized forms of data... But I'd rather save that for the experts...

    The author brings up the "loss of fidelity" issue when updating documents to a new format. I think this really only is an issue when making a lateral move. Converting from JPG to PNG wouldn't be a problem, nor GIF to PNG. Converting from WordPerfect to Word on the other hand, is problematic at best...

    Thus the need for archival formats with some longevity. Perhaps a commission should be formed on data archival formats? A group of OSS developers who do nothing but strictly define what format(s) are to be used for "data archival" purposes, and ensure that tools to read/write these formats are readily available on every platform -- including new ones as they come out.

    The trick is to avoid lateral conversions at all costs.
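
    As a rough illustration of what "lossless" means for PNG, here is a sketch of a bare-bones PNG writer and matching reader built only on Python's standard zlib and struct modules. It is an assumption-laden toy: it handles only 8-bit RGB, no filtering, no interlacing, and the reader understands only files this writer produced. Note also that converting a JPG this way preserves the decoded pixels exactly but cannot restore whatever the JPG compression already threw away.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_chunk(kind, payload):
    """Build one PNG chunk: length, type, payload, CRC over type + payload."""
    body = kind + payload
    return struct.pack(">I", len(payload)) + body + struct.pack(">I", zlib.crc32(body))


def encode_png(width, height, rgb):
    """Encode raw 8-bit RGB pixels (row-major, 3 bytes per pixel) as a PNG."""
    stride = width * 3
    if len(rgb) != stride * height:
        raise ValueError("pixel buffer does not match the stated dimensions")
    # Every scanline is prefixed with filter type 0 ("None").
    raw = b"".join(b"\x00" + rgb[y * stride:(y + 1) * stride] for y in range(height))
    # IHDR: width, height, bit depth 8, colour type 2 (RGB), then three zero flags.
    header = struct.pack(">IIBBBBB", width, height, 8, 2, 0, 0, 0)
    return (PNG_SIGNATURE + make_chunk(b"IHDR", header)
            + make_chunk(b"IDAT", zlib.compress(raw, 9))
            + make_chunk(b"IEND", b""))


def decode_png(png):
    """Read back a PNG written by encode_png (not a general PNG decoder)."""
    if not png.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos, width, height, compressed = len(PNG_SIGNATURE), 0, 0, b""
    while pos < len(png):
        (length,) = struct.unpack(">I", png[pos:pos + 4])
        kind = png[pos + 4:pos + 8]
        payload = png[pos + 8:pos + 8 + length]
        if kind == b"IHDR":
            width, height = struct.unpack(">II", payload[:8])
        elif kind == b"IDAT":
            compressed += payload
        pos += 12 + length  # 4 length + 4 type + payload + 4 CRC
    raw = zlib.decompress(compressed)
    stride = width * 3 + 1  # +1 for the filter byte on each scanline
    pixels = b"".join(raw[y * stride + 1:(y + 1) * stride] for y in range(height))
    return width, height, pixels


pixels = bytes(range(12))  # a made-up 2x2 test image
round_trip = decode_png(encode_png(2, 2, pixels))
```

    The pixels that come back are byte-for-byte the ones that went in, which is the property that makes a format usable as an archival target.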

    --
    MrJoy.com -- Because coding is FUN!
  41. My solution to access obselete documents.. by technos · · Score: 2

    I keep what I loosely term a knowledge base: every bit of useful tech data I run across, or have reason to believe I will need again, gets stuffed into a designated folder on my HD and later archived. I have stuff going back to Phrack 4, WordStar copies of C128 documentation, programs I wrote fifteen years ago for a hardware platform that no longer exists, System 3000 performance data, etc. While at the time I put each of them in I had access to the machinery and software to read and run them, much of it is dead now. Now I take the extra step to make sure anything new will be readable in the future. If it requires a viewer, an emulator, etc, they are saved with it. When the day comes that the ia32 platform everything runs on and the CDs the data is held on are deprecated and forgotten, they will be replaced by DVD-ROM and an ia32 emulator before obsolescence becomes an insurmountable hurdle.

    We must actively, and over the course of time, make sure what we do is available for posterity. Next time you burn MP3s to a CD-R, burn a copy of the mpg123 source too. Thirty years down the road, the information will be usable to anyone with the ability to read C and a DVD-ROM, even if MP3 is a forgotten format. When CD-ROM becomes hard to find, copy it to new media. I started on an Atari, and have managed to propagate that data through audio tape, floppy disc, magnetic tape and CD-R with little effort. Preservation shouldn't be an afterthought. Just do it!
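
    For anyone who wants to check that a copy to new media actually arrived intact, here is a small sketch: record a SHA-256 digest of every file before the move, then compare after. The directory and file names are placeholders invented for the demo, and it runs in a temporary directory rather than on real media.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path):
    """Hash a file in 1 MiB blocks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(1 << 20), b""):
            digest.update(block)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {p.relative_to(root).as_posix(): sha256_of(p)
            for p in sorted(root.rglob("*")) if p.is_file()}


def damaged_files(old_manifest, new_root):
    """List files that are missing or differ on the new medium."""
    new_manifest = build_manifest(new_root)
    return sorted(name for name, digest in old_manifest.items()
                  if new_manifest.get(name) != digest)


with tempfile.TemporaryDirectory() as scratch:
    old_media = Path(scratch) / "cdr"   # stands in for the old disc
    new_media = Path(scratch) / "dvd"   # stands in for the new one
    (old_media / "src").mkdir(parents=True)
    (old_media / "song.mp3").write_bytes(b"pretend this is audio")
    (old_media / "src" / "player.c").write_bytes(b"int main(void) { return 0; }\n")

    manifest = build_manifest(old_media)
    shutil.copytree(old_media, new_media)
    clean_copy = damaged_files(manifest, new_media)

    # Simulate one corrupted byte on the new medium.
    (new_media / "song.mp3").write_bytes(b"pretend this is audi0")
    after_bit_rot = damaged_files(manifest, new_media)
```

    Keeping the manifest with the archive means the same check can be repeated at every later migration.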

    --
    .sig: Now legally binding!
  42. Data preservation by LostOne · · Score: 2

    I can't help but wonder what future humans will think of our efforts to preserve information. Will they even have the records that show that we tried to preserve anything? Will they believe the records we leave behind are factual? How much of the fiction that is floating around will be mistaken for fact? How much of the information we currently have will survive only in fragments yanked out of context?

    This leads me to wonder how much context information do we need to bequeath to our descendants in order for them to be able to understand the information we leave behind? Consider how much information we have from ancient times which we do not truly understand because we do not have enough contextual information to really understand what was meant by this information. Look at how many conflicting translations there are of many of the documents that do still exist.

    Even if we manage to prevent the degradation of the media on which the information is stored and the devices and software necessary to read the information are preserved, what of language shifts and culture gaps across time? We will still have the problem of information being lost as meanings of words change with time or as information is translated from one language to another. This is, in fact, exactly the same problem we face with the various software revisions for products like MS Word.

    This is not to say, however, that we shouldn't make a significant effort to preserve information. I would also think that having a significant amount of contextual information (which should come along for the ride while preserving information) should help our descendants comprehend the information we leave behind. However, if our current track record for preserving contextual information is maintained, the outlook is not good for our descendants understanding our information in two or three centuries (assuming the information survives).

    Well, that's my 93.2 cents worth on the subject.


    --

    If it works in theory, try something else in practice.
  43. Why TeX is better than PostScript by tilly · · Score: 2
    I have several times had discussions with people who wondered why TeX continues to exist when PostScript is more universally available and gives a sufficiently good resolution for practically any purpose. These people generally fail to appreciate the following:
    1. PostScript is not guaranteed to give the same output from printer to printer, let alone over a period of decades.
    2. PostScript cannot easily be altered for different output formats. (eg The author and a publisher may wish to use different sizes of paper, or a different bibliography style.)
    3. Extracting content from PostScript is a very non-trivial process. TeX is simple ASCII.
    4. PostScript is insecure. PostScript is a full programming language, and the equivalent of the root password is rarely changed and the default value is generally known. (IIRC it is "000000". I may have the number of 0's wrong.)
    5. Try editing a PostScript document (say to insert a correction). I dare you. :-)


    So if you need to store formatted documents for archival purposes in a system where you may later need to output the documents in a different form, you should look at TeX...

    Cheers,
    Ben
    --
    My usual seat in the cluetrain is at A HREF="http://pub4.ezboard.com/biwethey.ht
  44. quick answer by Anonymous Coward · · Score: 0

    Um, not to be glib or anything, but there are lots of answers all over: http://www.britannica.com/bcom/eb/article/6/0,5716 ,109286+3,00.html

  45. Interesting timing on this article by Mindwarp · · Score: 2

    A couple of days ago the BBC reported in an article entitled Old computers lose history record how archaeological records are being lost due to exactly the issues raised in this story. The story reports that "[ironically, the] archaeological information held in magnetic format is decaying faster than it ever did in the ground".

    So, it looks like we're going to have to start transferring all those old ZX81 game tapes (Timex Sinclair 1000 for our U.S. cousins) to CD-ROM then. That should be good for another 25 years of '3D Monster Maze' :-)

    --

    --
    The gift of death metal does not smile on the good looking.
  46. Re:So just what *is* the life of a CDR? My results by Anonymous Coward · · Score: 0

    Ya ever put one in the nukelator for a few seconds? Coolest damn coasters I ever made...

  47. Re:Old issue (at least in real life) by djfiander · · Score: 1

    Data preservation is not a new problem, it's one that traditional librarians and archivists have been dealing with for the entire 100 years of modern librarianship, and certainly for much longer than that in less academic ways. Can you say acidic paper? How about the restorations of the Mona Lisa and the ceiling of the Sistine Chapel?

    It's not at all surprising, to me at least, that this paper was written by somebody at what was once the UMich school of library science, until they discovered that they could pump up their prestige and funding by going dot-edu.

    - David

  48. Somewhat of a paradox by jabber · · Score: 3

    There is some belief that there is no reason to preserve information at all. Most of what is created is just tripe anyway, and we should be more focused on creating content than preserving it. There are two reasons why some sort of preservation is important. First of all, it is inefficient to recreate information that already exists. [Point 1] Human energy is better spent on building upon existing knowledge to create new wisdom. How much do we already spin our wheels as several people collect the same data? What more could we be doing if we spent the energy instead on new pursuits? [Point 2] Secondly, there is some data that is irreplaceable.

    Point 1:
    With the amount of data that we produce, archiving it will take an increasing amount of time. How much new content is created daily? At best, we will plateau in a state where as much effort is required to archive content as is needed to create new content.

    With the emphasis placed squarely on non-duplication of effort, archiving becomes a secondary issue. Indexing, searching, sorting and categorizing of the archive becomes a first priority, since creative efforts should now check if they are redundant.

    If the bold statement is to be a guideline, then the idea of an archive is moot, since all new work depends on old work, and so tracks well with where the author feels human effort should go. Much like with biological evolution, new data is the fittest of the old data that was applicable to the new context. I suppose that the call for archives is little more than a suggestion that we need an organized and deliberate fossil record of how we got to where we will be at some point in the future.

    What is needed is an archive, yes, but an archive of what? Not of content, but of the essence of the content. The lessons learned, the conclusions drawn and the optimizations realized in the process of creating the content. The content is fleeting - though arguably of inherent value... Which brings us to...

    Point 2:
    Yes, some things are irreplaceable. Who decides? Who defines what is art, what is fact, and what deserves eternal life?

    Some things are of immediate and significant value, but for an unknown duration. The value of other things can not be realized for a very long time, and so the alternative is to store everything. Furthermore, the value of certain data is totally subjective, and this begs the question of "who's in charge" of defining that 1% that is to be kept.

    On the small scale, this will lead to vanity. Any 'artist' will consider their work a masterpiece, and save it. (I have code I wrote in CS101, don't you?) Companies will store and archive all email, all financials, anything that can potentially be used to mine data or identify trends or fertilize litigation. People will pigeon-hole videos of their baby's first steps, though nobody outside themselves really cares - unless the child grows up to be the next Einstein, or Hitler.

    "Hitler" raises an interesting question on the larger scale. Who has the responsibility of deciding what 'big' facts to store? And isn't that the path to propaganda, history-making, and such things?

    And then, when the leadership changes, and the 'book burning' starts...

    To bring the concept down from the paranoid-sphere, let's recall the /. article about Nikola Tesla. His work is not well known to most, because it was not made prominent, and subsequently, not well archived. We know of him, and we can dig for more about him, but the credit goes where it may not necessarily belong.

    Same issue with Newton and Leibniz. Leibniz was the German mathematician who published his work on calculus before Newton did. Newton, a member of the Royal Society, politicised HIS influence, and so was credited with all of the work - where his contribution was not complete.

    Some things are not outright lies, but oral histories get lost while written records persist.

    Who gets to choose what to write down?

    --

    -- What you do today will cost you a day of your life.
  49. Resilient media by grappler · · Score: 2

    Perhaps a "resilient disk" standard ought to be created, for stuff you would really like to last. Perhaps a WORM (write once read many) optical disk, like a CDR, but made to be very resilient, perhaps lasting up to a thousand years.

    Perhaps they could even be made to work with existing CDROM drives and perhaps even existing CD writers. Then you just start selling a new kind of disk. Anyone that wants something to last, they put on those. If they want lots of space per penny, they can buy something else.

    --
    grappler

    --
    Vidi, Vici, Veni
  50. Memetic perspective by DezMo · · Score: 2

    I find it interesting to think about this from the perspective of the notion of memes. What has evolved from human consciousness is a rich ecosystem that generates and values an enormous diversity of information. Thinking about what will be preserved, and how, gives rise to an image of our several billion minds, aided by technology as simple (!) as spoken language or as complex as electro/magneto/optical storage, operating as a kind of primordial informatic soup.

    Out of this fecund brew maybe, just maybe, a carrier as successful as DNA will emerge, with the capability to preserve the "best" of the information. Maybe it already has, in plain old text, which will be decipherable for as long as the bits can be gotten at, and which then has the benefit of the redundancy of human languages for further decoding and understanding. Then we drop down to the question of how exactly the bits manage to survive, and it seems the only ultimate answer is some human has to care enough to refresh them. Or be clever enough to teach them to take care of themselves.

    It also seems clearly impossible that everything can be preserved, and also impossible that what is preserved will always be something to be proud of. Some extinctions, however tragic, are inevitable, and some, however richly deserved, never occur. It's part of the beauty (and maybe mercy) of conscious life that there are moments that will never appear again, can never be adequately captured for later replay. Being aware of that fact is what encourages us once in a while to put down the camcorder, shut off the microphones, maybe even try to still the stream of words in our heads, and just drink it in.

  51. A paranoid addition... by mhkohne · · Score: 4

    Disclaimer: I know I'm being a bit paranoid, but I think this should be brought up, at least for purposes of discussion. There is probably less to worry about here than in other places, but it still should, I think, be in the back of the mind of anyone trying to solve this problem.

    One thing I believe was missed in the original article is intentional change to the historical record. In addition to having to store old information, and worry about how we're going to get to it later, I think we need to pay at least half a thought to intentional modification of the historical record.

    With paper and ink, it's rather time consuming and expensive to alter historical documents, even assuming you can get near them. With digital media, the situation may be different - it may become very simple to alter historical documents, especially if you're the guy who's in charge of copying them to the newest form of media.

    Aside from the obvious political reasons someone might want to do this (can you think of a fundamentalist movement of any sort that wouldn't modify old documents to read the way they would like, given the chance?), I can also see where money might come into play.

    For instance, suppose MassiveDrugCo, Inc. is introducing a new drug which prevents newly detected disease Y. Now, in order to sell a lot of this drug, you have to show that Y harms enough people to worry about. Unfortunately, the historical record being used for retrospective studies doesn't show that. So, instead of going back to the drawing board and finding something else to cure, MassiveDrugCo instead feeds a modified copy of the historical data to unsuspecting independent researchers. These honest and unbribable researchers draw the conclusion desired by MassiveDrugCo - in spite of the reality of the situation.
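
    One possible defence, sketched below under the assumption that digests of the records were published somewhere the custodian cannot rewrite (many independent copies, say): chain the hashes so that each digest covers the record plus the previous digest. Altering any old record then changes every digest from that point on. The records here are invented sample data, and this is only an illustration of the idea, not a complete scheme.

```python
import hashlib


def chain_digests(records):
    """Return one hex digest per record, each covering all earlier records too."""
    digests = []
    previous = b""
    for record in records:
        previous = hashlib.sha256(previous + record).digest()
        digests.append(previous.hex())
    return digests


def first_altered(records, published):
    """Index of the first record that disagrees with the published chain, or None."""
    for index, (actual, expected) in enumerate(zip(chain_digests(records), published)):
        if actual != expected:
            return index
    return None


# Hypothetical historical data and the digests that were published at the time.
original = [b"1890: 12 cases", b"1891: 9 cases", b"1892: 11 cases"]
published = chain_digests(original)

# The copy handed to researchers, with one year quietly inflated.
doctored = [b"1890: 12 cases", b"1891: 900 cases", b"1892: 11 cases"]
tampered_at = first_altered(doctored, published)
```

    This catches the edit only if the published digests themselves are trustworthy, which pushes the problem back to keeping many independent copies.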

    --
    A thousand pounds of wood moving at 300 feet per minute. Don't get in the way.
    1. Re:A paranoid addition... by ludovicus · · Score: 3

      This isn't anything new. I'll probably misspell all of the following names, but I think you'll get the gist of it.

      I believe it was King Tutankamun's father, Akenahten, who threw his world into a tizzy by rejecting the established religion and invented a new one that worshipped the sun. He went off and built a new city to go along with it too.

      Well, the bureaucracy of the day didn't like this at all because it messed with their job security. And as soon as he was dead, they went around hacking his face off anywhere it appeared (of course we're talking about monuments, etc., made from stone) and I believe they went after any mention of him in text (hieroglyphs) too.

      And they almost got away with it and just about completely expunged his existence from their records. But they missed a few things and we've been able to piece together a little bit about him.

      So anyhow, there's my Discovery channel understanding of that little story. What it means in relation to this subject I'm not quite sure. I thought it was a good idea to point out that this is certainly not a new issue.

    2. Re:A paranoid addition... by RomulusNR · · Score: 1

      With paper and ink, it's rather time consuming and expensive to alter historical documents, even assuming you can get near them. With digital media, the situation may be different - it may become very simple to alter historical documents,

      But it's also much easier for others to make copies of the digital formats and also generally easier to hide, if needed. And I'm not just talking about encryption, but also squirreling away on a disk with a copy of some obsolete tax program, or stored on a nondescript floppy in the old box o' disks.

      Look at DeCSS; companies, 'nonprofits' and governments are all trying to expunge it, and they're not doing a very good job.

      On the other hand, governments et al. have been rewriting history just fine for ages now. TV has been doing a pretty good job of that in modern years. Books are already obsolete to most people in terms of a way to receive information. And old books have it worse, because old books can't possibly be as reliable as new books. Or as the expert on TV just last night.

      even assuming you can get near them.

      If the people who want to rewrite the historical documents can't get near them, there's a high probability that those who would be affected by them (or by changing them) can't get near them (to read them), either. No one (well, few) really tries to rewrite history by crossing out offensive paragraphs and pasting labels over them -- it's quite easy to write new books dismissing the old ones.

      --
      Terrorists can attack freedom, but only Congress can destroy it.
  52. Wired article on this topic by vallee · · Score: 2
    Hi all,

    There is also a very well-written, very accessible article on this topic, titled "Saved", available at Wired magazine's archive. It was written by Steven Gulie, in 1998 and I distinctly remember reading it, thinking it had a profound impact on my thinking about this topic.

    Take a look. -Paul

    --
    The real Paul Vallee is slashdot userid 2192, and, what do you mean it's not cool to point out your low userid?
  53. Civilization Bootstrapping by hey! · · Score: 4

    I think you have to ask, what are you preserving information for?

    Are you trying to preserve episodes of the Simpsons so our relatively near term, technologically advanced descendants can watch them? Well, they're technologically more advanced and thus more clever than we; we just need to have sufficiently stable media (micromachined gold plates would work nicely) and either a simple minded encoding scheme or an easily readable description of the algorithm prepended. In the 22nd century, some bright Norwegian 16 year old armed with a yottaflop computer will figure out how to read it if he cares enough.

    A bigger concern (in my opinion) is what happens when our civilization collapses. Historically, it is almost certain to happen sooner or later. Rome lasted well over a thousand years; if you told a 1st century CE Roman that there would ever be an end to the empire he'd think you were crazy. Yet our civilization is in many ways much more fragile because the information it is based on is in much more ephemeral form (both media and format).

    What we need is to devise a bootstrap procedure.

    (1) Reading primers in various languages.

    (2) Primers on basic technology: mathematics, simple mechanics, mining and elementary metallurgy.

    These should be in highly durable form, but the problem is that you don't want people making off with them for building materials. The problem with using gold plates is that you don't want people to have access to them until the information on them is more valuable than the substrate. Perhaps these first items could be carved onto stone pillars inconveniently large to move.

    Next, you need repositories on more advanced science and technology: chemical engineering, electronics and so forth. Perhaps you could rig a way to prevent savages from accessing these repositories; a mechanical puzzle perhaps, that requires a certain mathematical sophistication to solve. The most critical records could be kept in forms that could readily be read without mechanical assistance or with only simple mechanical assistance such as optical magnification (my local librarian likes microfilm, because she knows it will be readable for decades). Less critical things like old Simpsons episodes could be on very cryptic media that would require considerable technical finesse to read, but would be cheap to transfer to.

    Pretty much, as you go from the most basic and critical information to the least critical information, you go from the easiest to read and most expensive to produce per bit, to the hardest to read and most convenient to produce.

    --
    Post may contain irony: discontinue use if experiencing mood swings, nausea or elevated blood pressure.
    1. Re:Civilization Bootstrapping by Once&FutureRocketman · · Score: 1

      Wow. I spent lunch thinking about this, and it is a really tough and really interesting problem. I would say there are really three questions that have to be addressed here:
      1) What to preserve.
      2) How to preserve it.
      3) How to control its release so that, by the time the re-emergent cultures read about things like genetic engineering and nuclear weapons, they have at least some chance of not killing themselves.

      Problem #1 becomes less of an issue as you address problem #3. Binary-encoded information on micromachined gold disks in caches scattered around the globe is a good way to preserve information on nuclear physics, biotech, cosmology, etc, as well as various data of interest to an archeologist (i.e. stuff that wouldn't be interesting to a young re-emergent civ but would be very interesting as they start to mature). Hide the caches, preferably underground. And make many, to allow for attrition and so that one nation (in the new world order) doesn't get all the goodies. Leaving clues as to their general location encoded as mathematical problems (possibly requiring values of certain fundamental constants, like /h) is a good way to at least make sure the retrieving civilization is technologically sophisticated enough to deal with the knowledge. Whether they are wise enough to do so is a whole 'nother problem, but we do what we can, eh? Don't reveal their exact location: make them look using sonar, etc. Again, a filter for technological sophistication.

      The problem of what and how to preserve are much harder when you are looking at the first tier of information -- the stuff you want them to find right away. Large stone blocks are a good idea (a la Greg Bear's Hegira). Carving the information into nickel or stainless steel (or better yet, Monel or Inconel) plates is also good, although then you have to make lots of copies, widely distributed, to get around the problem that some people will melt the things down for their metal. Still, I would suspect this would be less of a problem than you might expect, if the engravings are obvious and readable by eye: The reemergents will know that there was a technological civilization before them, and anything that might be from that civilization will likely be highly valued, especially if it contains information. Besides, if you use Monel/Inconel, by the time the new civilization figures out the technology to make forges hot enough to work them, they will know enough to want the data instead.

      As far as what to preserve, at the first tier, I would argue for mathematics and materials science/chemistry. Math is the limiting factor in scientific progress, and materials technology is the limiting factor on much technological progress. Whenever I read or think about rebuilding technology, where you need the tools to make the tools to make the tools, math and materials are always right at the top of their respective lists. And there is another advantage: written language may change, but both math and chemistry can be well represented by other symbol sets that can be tied directly to concrete reality. Math (esp. geometry) is universal. So is chemistry to a large extent -- once you get the idea of atoms and molecules across, a molecular model makes sense no matter what language you speak.

      --

      "Research is what I am doing when I don't know what I am doing." -- Wernher von Braun

    2. Re:Civilization Bootstrapping by jwhyche · · Score: 1

      I made a comment on something like this a few weeks ago. Here is that original ask slashdot. Just do a search for Deep Archiving

      --
      I read at +2. If your post doesn't reach that level I will not see or respond to it.
    3. Re:Civilization Bootstrapping by Animats · · Score: 2
      ... Primers on basic technology: mathematics, simple mechanics, mining and elementary metallurgy.

      Back in the 1950s and 1960s, the U.S. Office of Civil Defense actually did that. A library of information on how to make and do key practical and industrial operations was created, microfilmed, and thousands of copies placed in fallout shelters. This was beyond the usual survival-handbook stuff; more like "how to build or fix an oil refinery/power plant/water system/auto factory" information.

      If anyone knows where a copy of those microfilms still exists, please let me know. Thanks.

    4. Re:Civilization Bootstrapping by Anonymous Coward · · Score: 0

      As far as what to preserve, at the first tier, I would argue for mathematics and materials science/chemistry.

      I would put agricultural science almost first on the list, since agricultural productivity underlies all advanced societies. Also I'd include enough history so they at least can see our mistakes. Maybe this should be the highest priority.


      1) What to preserve.
      2) How to preserve it.
      3) How to control its release


      As to what to preserve, I would make an attempt at preserving everything I could get my hands on. The real question is not so much what to preserve as what gets the highest priority. I wouldn't neglect literature, art, music and social sciences.

      To your list of questions I would also add -- (4) how to make it affordable for private interests to do? No government will ever undertake a project like this -- it has to be done by private individuals. This would be a tremendous project for some George Soros or Bill Gates type mega tycoon.

      Aside from potential spin off enterprises, I would think such a project would be appealing to the ego. If some future society ever needed the archives, the creator would become a towering, immortal figure, like Alexander or Isaac Newton.

      And there is another advantage: written language may change, but both math and chemistry can be well represented by other symbol sets that can be tied directly to concrete reality.

      Well, remember how the Rosetta stone opened up hieroglyphs. The Greek on the stone was antique, yet despite a thousand years of barbarism people were able to read it and use it as a key to deciphering hieroglyphs (although it was far from easy).

    5. Re:Civilization Bootstrapping by hey! · · Score: 2

      Your ideas are interesting and I think sound; however I think the moon is a little too remote. After all, by the time the future civilisation reaches the moon, it will almost certainly already have weapons of mass destruction; in all probability by the time they found it they would be more advanced than we, and not very amenable to learning from our mistakes.

      --
      Post may contain irony: discontinue use if experiencing mood swings, nausea or elevated blood pressure.
    6. Re:Civilization Bootstrapping by kristau · · Score: 1

      Returning to the ideal of Object Oriented Programming and data encapsulation wouldn't be a bad idea in this case. Encapsulate the data with the algorithm that reads it.

      If the algorithm could be converted to a fairly generic, assembly-like format, a preprocessor could be developed that would read the algorithm and convert the instructions into whatever hardware/software platform is being used at the time.

      Preserve at least an algorithm capable of "viewing" the data with each copy of the data.
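
      A minimal sketch of that idea in Python, assuming a made-up container layout: a marker, two lengths, a plain-text description of how to read the data, then the data itself. The names `MAGIC`, `pack` and `unpack` and the layout are purely illustrative, not any existing format.

```python
import struct

MAGIC = b"SELFDESC"  # hypothetical marker, not a real standard


def pack(payload: bytes, reader_text: str) -> bytes:
    """Bundle the payload with a plain-text description of how to read it."""
    reader = reader_text.encode("ascii")
    # Header: marker, then two big-endian 32-bit lengths (reader, payload).
    header = MAGIC + struct.pack(">II", len(reader), len(payload))
    return header + reader + payload


def unpack(blob: bytes) -> tuple[str, bytes]:
    """Split a container back into (reader description, payload)."""
    if blob[:len(MAGIC)] != MAGIC:
        raise ValueError("not a self-describing container")
    off = len(MAGIC)
    reader_len, payload_len = struct.unpack(">II", blob[off:off + 8])
    off += 8
    reader = blob[off:off + reader_len].decode("ascii")
    payload = blob[off + reader_len:off + reader_len + payload_len]
    return reader, payload
```

      The description travels as plain ASCII so that it stays legible even if nothing can run the original program; a generic assembly-like algorithm could sit in the same slot.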

      later,
      kristau

    7. Re:Civilization Bootstrapping by ballestra · · Score: 1
      I know this is off-topic, but I've toyed with this idea ever since watching the time-travelling show "Voyagers" on TV as a kid. What would you do if you were suddenly transplanted to an earlier civilization (or I guess you could consider the destruction of our own civ) and everything you know about science and technology will be lost if you don't A) write it down and/or B) teach it to someone? Do you know enough about electricity or metallurgy or math to pass it on, or would it be lost for hundreds of years, like the Dark Ages?

      Just food for thought. It always made me want to reach a deeper understanding of how things work.

  54. Value degradation by Nyarly · · Score: 2
    This may be ill-considered, but it seems to me that data's value diminishes with time far faster than its quality.

    Sure, poems and photos for the grandkids. That's a hundred years, tops, and migration, translation and CDR covers it, fairly easily. As far as showing pictures to people who will have only vaguely heard of me? Or preserving the IRS tax code for four thousand years? Somewhere I'm sure is codified the idea that data is useless without context. If not, there it is, Nyarly's First Thought on information theory. I'm sure it is though...

    But my noodlings with fiction, my code, my photos and graphics won't be any more useful without the cultural context they were created for than an arbitrary collection of 16 bits without a description. Is that a Float or a Fixed? Is that English or Spanish?
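
    To make the 16-bit point concrete, here is a small Python sketch. The two bytes and the four readings of them are arbitrary choices for illustration; nothing in the bits says which reading was intended.

```python
import struct

raw = bytes([0x3C, 0x00])  # sixteen bits with no description attached

as_uint = struct.unpack(">H", raw)[0]   # big-endian unsigned integer
as_half = struct.unpack(">e", raw)[0]   # IEEE 754 half-precision float
as_fixed = as_uint / 256                # 8.8 fixed point (assumed scaling)
as_text = raw.decode("latin-1")         # two Latin-1 characters

print(as_uint, as_half, as_fixed, repr(as_text))
```

    Same bits, four different answers; only an external description tells you which one the author meant.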

    And if a modern creator does produce something of Eternal Meaning, there's precedent for its propagation by those it has meaning for. Think of the Bible, or the Collected Works of Shakespeare. These continue to exist not because they were recorded perfectly on a perfect medium, but because people found them worthwhile enough to continue them.

    What good would a perfect storage method be, anyway? If people forget it, or if they cease to care, a record could be painted in Liquid Unobtainium on God's backside, and it would be just as lost as if someone had scratched it in sand. Or on the base of a bronze statue. "Look on my works, ye mighty..."

    Paper rots, stone erodes, metal corrodes. The only eternal medium is word of mouth. Anything else is just a memory aid.

    --
    IP is just rude.
    Is there any torture so subl
  55. But books can't survive the revisionist burners! by Anonymous Coward · · Score: 0

    Noooooo one expects the Spanish Inquisition! Oh wait, the church tried to burn that event out of the history books. Stone tablets last the longest of all. They don't burn, don't rot quickly (even when buried in the wet underground).

  56. Monkey by Louziffer · · Score: 1
    This post was selected for a monkey moderation.

    Due to this post, a monkey was strapped to the back of a motorcycle, which was then sent at great speed toward a freight train. We had meant for the motorcycle to jump over the train; however, our technicians forgot to set up the ramp.

    There was a tremendous impact as the motorcycle slammed into the train. Unfortunately, the monkey did not survive this encounter.

    LouZiffer

    --

    LouZiffer

  57. "Proprietary Formats" are still a problem by Teancum · · Score: 3

    I've been working with the Linux Video group where we've been trying to make an open source player for DVD discs. The ONLY problem that we're fighting right now is not the know-how to get it done, but rather trying to obtain the file format documents for DVD-Video and being able to use them legally. Indeed, the recent deCSS program is another really good example of how file format specifications can be illegal to implement, even if you have obtained the specifications legally.

    The DVD Forum (formerly known as the DVD Consortium, which oversees the DVDCCA... this is the group of companies that cross-license each other's patents and share information regarding DVD development) currently requires you to sign a non-disclosure agreement (NDA) to obtain the specifications, and that NDA also prohibits you from even discussing the specifications with anybody unless they have also signed the same NDA. Since this is covered under the trade secret laws, this particular bit of intellectual property is theirs theoretically forever. At least until you can hire a bunch of lawyers to demonstrate that a DVD is no longer a trade secret.

    I've also set up a separate mailing list from the main Linux Video group that is in the process of developing an Open Video Disc specification which is trying to allow people to develop products without having to pay royalties or deal with patent infringements. Fees for most of the current video formats range from over $10,000 (for the DVD specs.... license fees are on top of that) to the terms of the MPEG Licensing Authority, which is being quite reasonable for most closed-source projects but whose licensing requirements, if you read the details, are contrary to the nature of most open-source projects. It is still possible to write a GPL'ed MPEG player, but it would only be free as in speech and not free as in beer. In fact, you would probably have to charge somebody to download the software. Shareware MPEG players are probably skating on some very thin ice legally, and certainly part of the registration costs would have to go to the MPEGLA.

    One of the things that is so nice about HTML is the fact that this standard is open, patent and royalty free. If CERN had tried to put a patent on HTML I doubt that the web would have developed nearly so quickly. Or rather imagine if Apple's HyperCard system had been developed with the GPL and file formats were made open for anybody on any platform to use.

    One of the things that I believe is killing the Unicode character encoding is that all kinds of intellectual property restrictions are placed on it, and you need to pay royalties to develop much software that uses it. Again, think what would have happened with ASCII had it been kept closed up, and why EBCDIC isn't being used for character encoding.

    More importantly, open and free specifications are critical to data preservation, a point that really hasn't been brought up by Cacl (the author of the original post on /.)

  58. NASA problem too by peter303 · · Score: 1

    Many of the earliest mission datasets from the 60s and 70s are unrecoverable due to media degradation and format incompatibility.

    1. Re:NASA problem too by Mr.+Slippery · · Score: 4
      Many of the earliest mission datasets from the 60s and 70s are unrecoverable due to media degradation and format incompatibility.
      ...including, IIRC, a bunch of old Landsat data. "So what?" I hear you ask. "If the data were important, it would have been accessed more often and ended up being transcribed and preserved."

      Problem is, it's entirely possible for us to not understand the importance of a data collection for years. That old Landsat data would be a great baseline for information about global climate change.

      --
      Tom Swiss | the infamous tms | my blog
      You cannot wash away blood with blood
  59. It's the first time we've tried to store data by scotpurl · · Score: 1

    We're experiencing this problem because it's the first time we've really tried to store information for long periods of time, /and cared that we got a verbatim copy./

    10,000 years past? Word of mouth. Stories handed down from one generation to the other. Want a copy? Listen and remember. Copy quality? As good as your memory. Portability? As far as word travels.

    5,000 years past? Stone tablets, paintings, and the like. Want a copy? Make it yourself. Copy quality? As good as your talent. Portability? Can it be carried?

    1,000 years past? Paper, but acid-free by accident, and not design. Want a copy? Hire a scribe, or use a printing press. Copy quality? As good as your proofreader. Portability? As far as the traders can sell.

    Now? Binary format on varying media. Want a copy? Needs some special hardware. Copy quality? Perfect. Portability? Speed o' light, anywhere, anytime.

  60. Black Hole Applications Software by Detritus · · Score: 2
    One problem that I've noticed with certain programs is that they let you import data from old or competing file formats but they do not let you export the data to other file formats. What happens when the program and/or computer becomes obsolete?

    One of the email programs that I use stores everything in a database file. Short of saving messages to files, one at a time, there is no way to extract the messages from the database.

    --
    Mea navis aericumbens anguillis abundat
    1. Re:Black Hole Applications Software by Mr.+Slippery · · Score: 2
      One of the email programs that I use stores everything in a database file. Short of saving messages to files, one at a time, there is no way to extract the messages from the database.
      Proprietary data formats are evil. Not only do they introduce the threat of software obsolescence, they prevent you from working with the information with any tools other than the creating software.

      I use MH to handle my mail. I can use all the standard MH tools as well as nice front ends like exmh, and they give a reasonable amount of power; but better yet is that, since MH stores one message per file in a plain format, I can use find, grep, perl, emacs, and all our other friends to manage my messages. I write my documents and correspondence in HTML and/or LaTeX (often via LyX) for the same reason. If it's supposed to have written-language content and I can't grep it, it sucks.
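
      As a rough illustration of why one-message-per-file matters, any general-purpose tool can search such a folder. A Python sketch, where the folder path is whatever directory holds the messages and `grep_messages` is a made-up helper, not part of MH:

```python
from pathlib import Path


def grep_messages(folder: str, needle: str) -> list[str]:
    """Return the names of message files whose text contains needle."""
    hits = []
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue
        # Plain text, so no special reader is needed; undecodable bytes
        # are replaced rather than aborting the search.
        text = path.read_text(encoding="utf-8", errors="replace")
        if needle.lower() in text.lower():
            hits.append(path.name)
    return hits
```

      Nothing here knows anything about mail; that is the point. The same loop would be useless against a single opaque database file.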

      --
      Tom Swiss | the infamous tms | my blog
      You cannot wash away blood with blood
  61. Bit density by Tau+Zero · · Score: 1
    I would argue for the historically tested method of storing data: take a chisel and carve it into rock.
    So when people talk about having mountains of archived data, you interpret this literally?
    --
    Time is Nature's way of keeping everything from happening at once... the bitch.
  62. COPY, COPY, COPY by peter303 · · Score: 1

    We lack the originals of most historical documents,
    but the important ones have been preserved by
    constant copying.
    On 'Internet Time' a document may last for months
    and the speed of copying is seconds.
    This compares to centuries in historical time.

  63. Reducing data for archiving purposes by tjwhaynes · · Score: 2

    One of the big problems with storing data is the sheer size of it. In astronomy, almost all data collected by telescopes, be they radio, optical or otherwise, goes through a stage known as 'Reduction'. I've put this in quotes, mainly because it doesn't necessarily reduce the size of it. In essence, Reduction is about obtaining the most important or most complete information out of the data and discarding or minimizing the redundant, the useless and the misleading out of the data so that future analysis can be carried out on the important stuff without having to wade through all the noise. For instance, 70 or 80 images of one optical observation in various wavelength bands will be collapsed into three to five optimal images, one for each band. In Radio Astronomy, collating 60 - 80 12hr observations into one file removes all the 'bad' data and is optimal for future reuse.

    To effectively make a useful archive requires some filtering of what goes into the archive. Nowadays I work for IBM on DB2 UDB, and the roadmaps suggest that the size of databases is growing exponentially - fortunately this is balanced by a proportional growth in both processor power and storage space and access speed. So while we have terabyte databases today, we could easily be looking at petabyte databases in a few years. These databases will probably hold a vast amount of digitized analogue information - memos, diagrams, papers - which currently is stored in more conventional storage. The advantages of moving to a fully digital archive are great - searching and retrieval are faster, and the space saved by putting scans of 20 boxes of papers onto a hard drive or other storage is also great. However, there is a danger with archives growing out of control - if you initiate a search which will visit every part of a petabyte database, you are going to have to wait for it to finish, even with the best search algorithms and vastly faster hardware. Making sure that information is not multiply duplicated in the database, or that redundant data is not added without regard to the database retrieval performance is extremely important. If we set up a project to 'mirror the web' for archival purposes, we'll be hamstringing ourselves right at the start - most data is not needed for future reference. By applying methods to distill the important information, archives can be updated, maintained and searched without exhausting the available resources.
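
    One simple way to keep exact duplicates out of an archive is to key each document by a hash of its content. The sketch below is only an illustration of that general technique in Python, with an in-memory dict standing in for the database; it is not how DB2 or any particular product does it, and `add_to_archive` is a made-up name.

```python
import hashlib


def add_to_archive(archive: dict[str, bytes], document: bytes) -> bool:
    """Store document under its SHA-256 digest.

    Returns True if the document was new, False if an identical copy
    was already in the archive.
    """
    key = hashlib.sha256(document).hexdigest()
    if key in archive:
        return False
    archive[key] = document
    return True
```

    This only catches byte-for-byte duplicates. Deciding that two different scans of the same memo are "the same information" is the much harder filtering problem described above.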

    Cheers,

    Toby Haynes

    --
    Anything I post is strictly my own thoughts and doesn't necessarily have anything to do with the opinions of IBM.
  64. The Internet Archive by Pseudonymus+Bosch · · Score: 2

    The Internet Archive is devoted to preserving the information contained in the Internet.
    And I have just found an article from Steve Baldwin, the guy from Ghost Sites!
    --
    __
    Men with no respect for life must never be allowed to control the ultimate instruments of death.
    GW Bu
  65. Upgrade fever speeds the process by Junks+Jerzey · · Score: 2

    Part of the solution is to avoid knee-jerk changes in format. For example, the Word file format gets changed every few years, but to what end? ASCII may eventually go out of date (as did EBCDIC), but at least a text file tends to be more future proof than a proprietary binary format. In terms of the web, there's already been a lot of nonsense caused by some people using Flash and other people using style sheets and other people using Microsoft or Netscape extensions to HTML. Is it really worth it? Or would it be better to stick to the least common denominator of pure HTML? I say yes, but apparently a significant number of web page creators disagree.

  66. picky, picky, picky by unitron · · Score: 1

    Digital information can probably be preserved in the same manner as any other. I believe the problem here is preserving information (of many types) that's been recorded digitally.

    --

    I see even classic Slashdot is now pretty much unusable on dial up anymore.

  67. as in archival paper... by fantomas · · Score: 1

    It would be interesting to see who's doing research into this. Certainly much research has been carried out into the paper equivalent and as a result it's possible to get books printed onto archive quality paper 'guaranteed' (well, you stand a better chance anyway) of surviving for 500 years.

    The archival paper and inks have been created to be as chemically neutral as possible - the great problem with paper is chemical reactions of constituent parts and outside influences (light, heat, humidity, ink) gradually breaking down the material itself. In the British Library (UK national copyright repository) a vast number of books printed in the C19th are beyond salvation, crumbling away; early mechanically made paper is often very unstable.

    I am sure that research into this must be being carried out in the same way for CDs etc. Any references anyone?

  68. A modest proposal by vlax · · Score: 3

    Archiving is important. I'm actually surprised at the number of /.'ers who just want to let the data die. I remember taking a tour of the Maginot Line in France. Having proven useless as a military outpost, the entire chain of caverns was converted to document storage decades ago. In a thousand years, archeologists will be able to substantially reconstruct life in twentieth century France. Information about births, deaths and marriages need never be lost. Detailed census reports can be preserved so historians can make new theories about the social behaviour of man. I think this is a fairly important task. Imagine how much easier it would be to reconstruct human history if past civilisation hadn't kept shoddy records.

    I suspect the problem of file formats is less serious than people make it out to be. A well-documented format should be reconstructable indefinitely. Few software companies don't document their file formats. Even without documentation, it ought to be easier than reconstructing dead languages. We learned to read Egyptian hieroglyphs primarily from one attested translation and a lot of careful deduction. Given a thousand Word 6 documents, I think a good computer archeologist ought to be able to construct a program to open and edit them.

    Museums of old hardware, and perhaps some sort of custom computer factory to make ancient hardware, strike me as a good idea. It could be like blacksmiths at SCA festivals, "Ye Olde ASIC Mill." :^) I doubt it would ever be profitable, but museums, even working ones, rarely are. Although who knows? A Commodore 64 could be an objet d'art in a hundred years, just as ugly African masks are now.

    The real problem strikes me as the one most heavily emphasised in the article: decaying media. I suspect the best solution with presently foreseeable technology would be to preserve data in crystalised DNA. Even in nature, DNA takes centuries to decay, and if it were crystalised and kept somewhere cool and dry, it would likely last for millennia. A document encoded onto a billion strands of DNA weighs basically nothing, and it would be a very highly redundant storage system.

    It isn't easy to do right now, but I suspect that technology is right around the corner and probably only requires a little bit of research money to become practical.
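
    The encoding side, at least, is easy to picture: four bases means two bits per base. A toy Python sketch, assuming the arbitrary mapping 00=A, 01=C, 10=G, 11=T (any real scheme would need error correction and would have to respect the chemistry, none of which is modelled here):

```python
BASES = "ACGT"  # assumed mapping: 00->A, 01->C, 10->G, 11->T


def bytes_to_dna(data: bytes) -> str:
    """Encode each byte as four bases, most significant bit pair first."""
    out = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out)


def dna_to_bytes(strand: str) -> bytes:
    """Decode a strand produced by bytes_to_dna back into bytes."""
    if len(strand) % 4:
        raise ValueError("strand length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(strand), 4):
        byte = 0
        for base in strand[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)
```

    At this density a 1 MB document is about four million bases per copy; the redundancy comes from making many copies of the same strand.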

  69. Sometimes the 99% is destructive by Tau+Zero · · Score: 2
    I should think that a 99% destruction rate is awful! Kind of defeats the purpose of an "archive" doesn't it?
    If I didn't work so hard to keep down the amount of repetitive joke-list and similar traffic I get, I could easily trash 75-90% of the total volume of the e-mail I get with no loss of information. The real problem is when there is a strong disincentive to keeping data around. For instance, corporate memos. Our hyperlitigious society is turning "documentation" into land mines for companies and individuals alike, so there is a very strong incentive to dispose of everything which is not absolutely necessary to keep. If it doesn't exist, no attorney for the plaintiff bar can subpoena it and use it to sink you under a huge verdict.

    Worse yet, the labor involved with separating out the 1% of stuff that ought to be kept is going to mean a non-zero error rate; people will toss things that are still of value just because they have no time to examine them in detail. What are you going to do....
    --
    Time is Nature's way of keeping everything from happening at once... the bitch.
  70. ... standing on the shoulders of giants by pjwhite · · Score: 0
    Let's assume that we want our descendants a thousand years hence to enjoy a nice game of Quake. What will they need in order to play the game? I'll assume that all existing technology will be long gone, and will need to be recreated.

    Let's see, they'll need:
    1. The software and data files that make up the game. This is easy enough, just some numbers.

    2. A computer, a display, and input devices to make the game work. This will be a little tougher. We'll need to tell them how to build it.
      1. How to build a computer.
      2. CPU, RAM and other integrated circuits

      3. Grow some pure silicon crystals and cut them into thin wafers.
      4. Construct masks for the various etching and doping processes.
      5. Build a clean room and wafer processing equipment.
      6. Perform the chemistry required to process a raw silicon wafer into a large scale integrated circuit.
      7. Package and mount the ICs on a...

    3. Printed circuit board.
      1. Make from glass fibers and epoxy, copper sheets.
      2. Use photographic etch resist process to etch away copper.
      3. Solder components with tin/lead solder

    4. Video, sound, networking, input/output logic, interrupt controller, timers, etc. More ICs. Resistors, capacitors, wire, connectors, switches -- all must be constructed.
    5. Mass storage. This could be a hard disk, or more RAM or EPROM
    6. Power supply. Generate electricity to make the whole thing work.



    I think you get the idea. Today's computer technology is based on a huge amount of knowledge and experience. If we can somehow record all this knowledge in books or other long lasting human readable media, our descendants may be able to recreate what we have today. Maybe in as little as fifty years.

  71. What's wrong with Reverse Engineering? by xant · · Score: 1
    The state of the law at this moment notwithstanding (I refer to the legal actions against the deCSS author, about which I will say nothing more lest I fill this space with a rant), there is nothing wrong with using reverse engineering to fix his third problem, the stone-in-the-shoe, software obsolescence.

    With the current state of technology, data can now outlast Copyrights. If there is anything legally wrong with reverse-engineering a piece of data to extract the human-readable meaning, it won't apply to 20-year-old data since it will only be under copyright if the owner of the intellectual property still values it--in other words, still supports it. Otherwise, the copyright will have lapsed by the time you need to extract that data.

    Data formats are created by humans, have structure, and therefore can be dismantled and examined for their contents. The original programs had to do it, why not trust that future generations, who will be competent humans, will also be able to do it. I postulate a law: Xant's law: software containing meaningful data can always be reverse engineered when the need is great enough. BTW, NOT EVEN ENCRYPTED DATA is excepted from this rule. Today's high-encryption standards are tomorrow's trivial joke. According to Moore's law, which appears in recent years to be accelerating, encryption that today would take a million years for a computer to crack could, 30 years from today, be cracked in a single year; and a project like distributed.net could probably do it in a matter of hours--that's assuming Moore's law does not accelerate. Projects like quantum computing raise the possibility that there is no limit to the speed of our electronic brains, nor to their rate of acceleration. A few hours is not such a high cost to decrypt the sort of data that we might truly find valuable 30 years from now, whatever it might be.
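
    The arithmetic behind that estimate, as a small Python sketch. It assumes computing speed doubles every 18 months (one common reading of Moore's law, and the assumption that makes the numbers above work out) and that cracking time scales inversely with speed; `speedup` is just an illustrative helper.

```python
def speedup(years: float, doubling_months: float = 18.0) -> float:
    """Factor by which computing speed grows over the given span."""
    return 2.0 ** (years * 12.0 / doubling_months)


years_to_crack_today = 1_000_000
factor = speedup(30)                       # 2**20, about a million
years_to_crack_later = years_to_crack_today / factor

print(f"speedup over 30 years: {factor:,.0f}x")
print(f"cracking time then:    {years_to_crack_later:.2f} years")
```

    With an 18-month doubling, 30 years is 20 doublings, so a million-year job drops to just under one year. With a 24-month doubling it is only 15 doublings and the same job still takes about 30 years, so the conclusion is sensitive to the assumed rate.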

    And 30 years from now, who's to say AI won't be good enough to break down any of the "trivial" data formats we have today into human-readable forms. I can theorize a generalized software algorithm for standardizing data formats.

    Hardware obsolescence is a whole 'nother ball of wax, but who's to say in 60 years we won't have a generalized algorithm for pulling data off of hardware?

    --
    It's rare that you're presented with a knob whose only two positions are Make History and Flee Your Glorious Destiny.
  72. Gutenberg Factoid by Savage+Henry+Matisse · · Score: 1

    This is pretty much apropo of nuthin' (sorry), but I thought it might interest folks: the oldest etext available through Proj Gut is a version of Milton's Paradise Lost. It was originally converted to ASCII in 1964 or 65, and had to be input using IBM punch cards-- something like 100,000 of them. Ah, to be young again, manually punching bits out of cardboard with a sharp stick.

    --
    Much Love,
    "S"HM
    *****
    (I refuse to spellcheck out of contempt for your belief system)
    1. Re:Gutenberg Factoid by Anonymous Coward · · Score: 0
      You wimp.

      We did not have sharp sticks, and had to file our front teeth down to the shape of those little rectangular holes.

  73. A contrarian view by Tau+Zero · · Score: 1
    At the moment, Moore's law is the only thing that stops this problem becoming really acute.
    Moore's law pertains to computing speed and not storage density, but that's not the point I wanted to make. I'd argue that the progress being made under the "laws" of Moore and others has created the problem. For thousands of years, knowledge was recorded in books. Almost everyone had the necessary equipment (eyes) to read a book. Today, the equipment to read media becomes obsolete within years. We can still read books hundreds of years old, but 1985's data tapes are being lost to us. There's a lesson here.

    If technology ran into its physical limits, progress would become incremental rather than exponential. This would require that the space devoted to data storage increase more or less linearly with the amount of data, but it would have the side-effect of eliminating the need to change storage technologies. If 9-track tape hadn't become a hopelessly obsolete format due to its bulk, we'd have no problems reading those 1985 tapes (assuming the oxide or binder hadn't decayed or fused, but that's another issue). Sometimes progress, by creating a gulf between the present and the past, cuts us off from our own history.

    I wonder if I could still find a PET emulator and a copy of TOKER someplace... that would be fun to put out and let people play at a party.
    --
    Time is Nature's way of keeping everything from happening at once... the bitch.
    1. Re:A contrarian view by stevelinton · · Score: 2

      You have a fair point. The 1985 media could have been copied in 1995 onto newer media a fraction the size. If this was not done then a problem arises. It's not insuperable, unless the media decay, but it can be expensive. We can always hand-build a tape drive, or read the magnetic field directly off the tape with an electron microscope, but it costs.

      Books are not as good as you think though. People needed to (a) be able to read and (b) speak the relevant language, for archives of old books to be of any use. Neither is completely automatic. Also, to make real use of old books, the readers would need a fair amount of cultural context for them, and that is positively expensive to acquire.

    2. Re:A contrarian view by Tau+Zero · · Score: 2

      Ah, but language and cultural context are "software" issues. I can look at Hebrew or Arabic text and copy it reasonably well even though I can't understand what it says. This would allow me to preserve it even if I couldn't make use of the content myself.
      --
      Time is Nature's way of keeping everything from happening at once... the bitch.
  74. Bono's Criminal Backers Like Copyrights by Anonymous Coward · · Score: 0

    Sonny Bono was the congressional shill for the Scientology organization, an international criminal enterprise masquerading as a "church". They sell L. Ron Hubbard's mad dribblings and pulp sci-fi as very expensive self-"help" techniques, bleeding people of their money, self-determinism, and self-respect.

    They use copyright law (and trade secret law! for a 501-c3???) to obstruct efforts to expose their "courses" for the pseudo-scientific mindfscks they are, and in the process they create huge damage to free speech and civil rights precedents.

    Visit xenu.net for more info.

  75. simplify by whatever3 · · Score: 1

    I think the answer is, for the most part, simple: put data on (acid-free) paper. We have empirical evidence that information can be stored on this medium for several thousand years, courtesy of our Egyptian ancestors, with human eyes being the only hardware needed (plus maybe a magnifying glass), and the human brain the only software (plus maybe a Rosetta stone).
    Anything important can be printed, whether it's scientific results, historical information, statistics, or even just pictures.

    Instead of inventing more complicated machines to store information, we should be looking at the most simple ones we already have.

  76. But what is the solution? by -tji · · Score: 2
    He talks about making upgrades of all documents standard procedure. The example he gives is upgrading all documents from Word 95 to Word 97.

    But, isn't this missing the point?

    The problem exists because products like Word build in incompatibilities to force consumers to always purchase the newest product. We don't have to accept this.

    The solution is to promote open document standards for everything. This should be part of the decision process when organizations are choosing applications.

    Hopefully, in the near future, we will be able to choose an office suite that stores everything in XML format, and uses open object types like PNG or JPG images.

    Also, exporting images to a format like PDF or PostScript would solve a lot of problems. Open Source applications exist for both of these formats, ensuring that you are not at the mercy of the application vendor.

  77. An answer by Anonymous Coward · · Score: 0

    Film negatives.

    Film negatives are already an effective storage method. They can be quite compact (depending on film grain size) and they have a much longer shelflife when compared to digital storage media.

  78. Is this relevant: Deep Time By Gregory Benford???? by Anonymous Coward · · Score: 1


    Deep Time : How Humanity Communicates Across Millennia
    by Gregory Benford

    From Library Journal
    Professor and distinguished sf writer Benford (physics, Univ. of California, Irvine; Foundation's Fear, LJ
    3/15/97) adds another reflective title to his large and rapidly expanding oeuvre. Hearty and compelling, his new
    book elucidates some of the inherent problems humanity faces in communicating over the expanse of time.
    How will the hazards of, say, stored nuclear waste be communicated effectively to future generations? The
    prospect of leaving long-lasting, or "deep-time," messages is perplexing. This slim book addresses
    environmental issues in order to change how we think about the human impact on Earth; the goal is to make
    us good stewards. In the section "Digital Immortality," Benford writes one of the finest brief explanations of
    the limits associated with document preservation in a digital age. Much of the overall analysis seems
    somewhat anecdotal, but given the speculative nature of the subject, this sort of approach may serve as well as
    any other. Recommended for all public and academic libraries.--Dayne Sherman, Hammond, LA

  79. Problems with modern paper by Anonymous Coward · · Score: 1

    Actually, ancient papyrus pieces are more likely to stick around than many modern paper examples. The acidity of most modern papers tends to make them much more fragile than older papers. This has been quite a problem for the Library of Congress.

    1. Re:Problems with modern paper by Anonymous Coward · · Score: 2
      Papyrus only survives under very special circumstances (notably the arid conditions of Middle Eastern deserts). It rots as fast as anything else organic elsewhere in the ancient Mediterranean world and although we know it was used in the north-western provinces of the Roman Empire, all we have in Britain (for example) to prove its presence here are a tiny impression on a pre-Roman coin and a mineral-preserved fragment in a box of rusty Roman armour from near Hadrian's Wall.

      So much for papyrus; the Romans generated vast amounts of data and wrote a lot of it in ink on wooden tablets... a fact we only realised comparatively recently because so few survived. So much for wood...

      If you want a really durable medium for conveying information through the ages (and you have to exclude stone a) because it is much prized for re-use and b) because a lot of it, particularly the soft sandstones the Romans often used for monumental inscriptions, weathers rapidly) you need ceramics. Pots are vulnerable, but potsherds virtually indestructible. Ostraka (graffiti on sherds) survive now in as good condition as the day they were scratched, not something you can say of ancient papyrus, vellum, wood etc etc.

      So, start working out a way to put data onto dinner plates, and you have the perfect storage medium... sort of...

  80. Spinning rings by Anonymous Coward · · Score: 0

    They work for the Eloi, so they're good enough for me.

    As long as the Morlocks haven't bred out the hand/eye coordination necessary to keep spinning them, we should be just fine.

  81. Very relevant project by Sanity · · Score: 2
    The Freenet Project is extremely relevant to this - if it isn't mentioned in some of the links given, it definitely should be.

    --

  82. Sexy Data by Anonymous Coward · · Score: 1

    Nature created the ultimate way of storing digital data, DNA, a long time ago. It's called replication. The SEX OF DATA. It is the _only_ way to keep data around a long time. Otherwise natural disasters, war, mistakes, etc. will conspire to slowly remove it from the pool.

    How do you ensure replication? The same way nature does -- promiscuity -- your data has to be online and available and alive. Data stored on a CD-ROM in a drawer is dead and will disappear simply because of obscurity. How many "old" documents were thrown away in the year 1 AD because they were considered not relevant or important to their contemporaries? Almost all of them. How important are they now? Very.

    Storing data is not about technology -- it is about people.

    The best solution I have seen was an open-source project to turn everyone's spare hard drive into a giant distributed RAID of some kind. Never heard what happened to it.

  83. The web embodies decay ... by shango+dee · · Score: 2

    Strangely enough, this is something I've been dwelling on a lot lately. Everyone praising the web-based news publications (katz! bah) and online magazines always seems to overlook the fact that once an issue is gone off the web, it's usually gone forever. And if not now, where will it be in 40 years? I can still go down into my basement, look through my family's huge collection of periodicals, and find issues / articles from decades ago. With the quick pace of the web, such an act doesn't seem like it's going to be feasible.

    I don't know about anyone else, but there's something disturbing about this fact. Even the first few sites I did back in the early / mid 90's have been lost forever, and while they were fairly insignificant, it's not an uncommon occurrence for information to be lost.

    --
    --[shangodee]
  84. Strategy by i · · Score: 1

    In preserving data for periods of 1000 years or more you can't rely on:

    * Keeping hardware to read the data.
    * Converting data to another media at regular intervals.
    * Any assumption of best/most common format of the data.

    You must presume that:

    * The (eventual) future civilizations can manufacture the necessary hardware.
    * And that they will have a mental ability to reconstruct the data format and software. (An example: the Egyptian hieroglyphs.)

    This means that:

    * The media should rely on minimal "mechanical" requirements (rather a CD-player than a card reader... :-))
    * They should carry a description on the physical container (of the media).
    * The media should preserve the data for at least 10000 years.

    The real problem is the last one above, of course.
    The solution is probably something along the lines of a CD (physical "marks") rather than something like a magnetic tape (magnetic/electrical charges).

    Thomas Berg

    --
    Mundus Vult Decipi
  85. the final answer! by jlb · · Score: 1
    Nanotechnology.

    Nanobots with all the information, self replicating and self maintaining.

    Nanotechnology is the answer to everything. :)

  86. I propose... by Squeeze+Truck · · Score: 1
    I propose we carve the GPL and the Linux 0.01 Kernel source on a giant stone tablet in 3 languages for future civilizations to find.


    And throw in statues of Stallman and Torvalds looking visionary or something.

    --

    "Reactionaries must be deprived of the right to voice their opinions; only the people have that right." - Mao

  87. Ancient Egyptians by Tech · · Score: 1

    This is pure speculation on my part. Bear with me.

    There have been various discussions in things like National Geographic and TV documentaries about the ancient Egyptians. Specifically how they managed to build the pyramids and so on. I vaguely remember one discussion where they suggested that the Egyptians were quite advanced technologically, and they had tools and skill that were subsequently lost over the thousands of years.

    Here's where the speculation comes in. Could it be possible that the ancient Egyptians had technology on a similar scale to what we have today, and the reason we don't realise it is that thousands of years ago they suffered the same problem that we have now, not being able to store information suitably?

    What if, 5000 years from now, all our carefully archived information has also been lost due to the issues raised in the subject article? It could happen that the civilisation that exists in AD 7000 knows about as much about us as we do about the ancient Egyptians.

    (Not intended to be a discussion about Egyptians specifically. I'm curious to know what would result if digital storage really did become a significant problem.)

  88. The Clock of the Long Now by grumling · · Score: 1
    This is not the first time I've heard this subject brought up. Stewart Brand and a bunch of other techno-hippies are building a clock in the desert. One thing they want to include is a library. They are using a laser-etching method to store pages of information.

    --
    "Well, good luck finding a judge that doesn't run a bestiality site."
  89. What about ... by N3mo · · Score: 1

    What about a project where originators could submit their new formats, and then open source developers could incorporate them into a program used for viewing all (similar) formats?

  90. I have the answer by bcilfone · · Score: 1

    I will get the mountain, someone else get the chisels.

    Just remember to keep your 1's straight and your 0's round.

    Jesus may love you, but I think you're garbage wrapped in skin.

    1. Re:I have the answer by Anonymous Coward · · Score: 0

      nah.. the Appalachians have already been torn down and rebuilt 4 times. Rockies are only on the first time around.

  91. Digital media is analog underneath by dialect · · Score: 2

    Reading over many of the comments on decay of digital media it occurs to me that many people are missing the point that digital data is really analog when you get right down to the fundamental formatting. (until we're storing data in quantum media, that is...)

    Even if a standard CD player can't play a degraded CD, if someone wants the data badly enough, they'll build an error-correcting CD player that will reconstruct bits that a normal player can't read. Just like archeologists reconstruct paper or hieroglyphs or fossils today, future archeologists will no doubt reconstruct CDs and hard drives.

    Even today, data recovery specialists can read off multiple generations of files. Maybe archeologists will have optical readers which will read the CD/magnetic surface at many times their original resolution / sensitivity and reconstruct the data. Of course it would be nice for us to leave them some equivalent of the Rosetta Stone so they can decipher the various formats. But overall, I think today's digital media will be far more recoverable than people might think.
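
    Reconstructing bits that could not be read depends on redundancy stored alongside the data. Real CDs use a much stronger Reed-Solomon scheme (CIRC); the sketch below swaps in the far simpler Hamming(7,4) code purely as an illustration, and the function names are made up for this example.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits as a 7-bit codeword laid out p1 p2 d1 p3 d2 d3 d4."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # parity over codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over codeword positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(codeword):
    """Correct at most one flipped bit, then return the 4 data bits."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    # The three checks spell out the 1-based position of the bad bit (0 = none).
    error_pos = s1 + 2 * s2 + 4 * s3
    if error_pos:
        c[error_pos - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


if __name__ == "__main__":
    word = hamming74_encode([1, 0, 1, 1])
    word[4] ^= 1  # simulate one decayed bit on the medium
    print(hamming74_decode(word))
```

    Three extra bits per four data bits buy recovery from any single flipped bit in the group; two flips in the same group would be mis-corrected, which is why real media use stronger codes.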

    Just a thought.
    -dialect

  92. Use the Moon by Anonymous Coward · · Score: 0

    To store the more advanced data (like how to make that super-duper AI supercomputer which will end up destroying the entire human race) you could bury it on the moon -- or maybe leave it floating in space somewhere.

  93. Studying Real Analog Backup by Garund · · Score: 1
    I deal with artwork which I'd like to see preserved for a long time to come.

    One thing I'd like to do is make analog back-ups in case digital preservation methods fail for whatever reason. (Like the fall of human civilization into another dark age, perhaps? -Where we no longer have any sockets to plug our computers into. . .) -Paranoid, sure, but hey. If you prepare against the worst case scenario, then you'll likely do alright against everything else.

    I've found a good paper manufacturer, (no-acid, non-bleached cotton fiber paper capable of lasting hundreds and hundreds of years), but I've been waiting for a decent print technology to come along which can output from a computer at a consumer level cost, (offset, litho and web presses, while capable of the task are WAY too expensive for one-offs), and which offers a high enough dot resolution and which uses a highly stable ink. -And which can print at about 11"x17".

    An Imagesetter might be the best option, and silver halides on photographic paper can be quite stable, but I don't know about the paper substrate itself. . . And either way, most of the earlier artwork was done entirely in 'Analog', but done on deteriorating paper, but I'm having trouble even scanning black & white dot screens, (newspaper grey tones), without disgusting interference patterns popping up.

    Plus I've NEVER seen any computer peripheral company even mention the long-term stability of their printer inks.

    Anybody know anything which might be of use?

  94. Speaking of books: Deep Time by Anonymous Coward · · Score: 0

    Deep Time : How Humanity Communicates Across Millennia

    By none other than physicist and Sci Fi author Gregory Benford.

    The book is non-fiction and takes a serious look at how to convey information across 1000s of years....

  95. Re:How long can it last? (language change) by SPK · · Score: 1

    Actually ... 2 points. 1) languages in isolation do not change much at all. Examples include: small groups in Switzerland and one of the best examples: Iceland (Icelandic is 'practically' Old Norse) 2) languages in contact regularly change. In fact, the 'data' you presented supports this: the areas you mentioned aren't isolated (unless you mean far away from Europe or North America). Within a small area there are many languages; through contact and a need to differentiate themselves they are quite different. If you want more data and more articles on this process, I would be more than willing to provide the linguistic evidence.

    --
    Regnant populi. (The people rule.) Pregnant ropuli. (The snake will soon lay eggs.)
  96. CD-ROMs improving by gibber · · Score: 2
    Newer CD-ROMs have a much better chance of surviving than original CDs did. There have been several major changes.
    • Plastics are now more flexible and less prone to shattering.
    • Thicker plastic places media farther from surface, hence farther from scratches. Scratches can also be filled since the actual media hasn't been damaged.
    • No new CDs from the "Crash Test Dummies" have been released. The last copy of "Mmmmm...Mmmmmm...Mmmmm...Mmmm..." I saw was the victim of a drill bit, a microwave, a utility knife and, eventually, a concrete wall.
    • Going back to aluminum foil presses rather than silver -- a couple of companies (one of the larger British CD manufacturers) attempted silver for a while until someone discovered that silver tarnishes over time.

    +------->
  97. Character sets... by guran · · Score: 2
    Oh my,...

    Some of us don't have English as our mother tongue. That is something that is too often forgotten at places like Redmond.

    (OK, this is slightly OT, but I'll rant anyway.)
    Between DOS and Windows that company we all love to hate decided to change character sets. Suddenly three letters in the Swedish alphabet had a new character code. One and a half decades later (count that in internet time...) we are still struggling with documents with mixed encoding.

    That means every damn application has to provide a way to recode OEM to ANSI. AND deal with users who try to do this conversion on files already converted.
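
    A minimal sketch of that recoding step and of what goes wrong when it is applied twice. It assumes the OEM code page is cp850 and the ANSI code page is cp1252 (the post names neither), and the function name is made up for this example.

```python
OEM_CODEPAGE = "cp850"    # assumption: DOS code page used for the old files
ANSI_CODEPAGE = "cp1252"  # assumption: Windows code page for the new files


def oem_to_ansi(raw: bytes) -> bytes:
    """Reinterpret OEM-encoded bytes and re-encode them as ANSI bytes."""
    text = raw.decode(OEM_CODEPAGE)
    # Characters with no ANSI equivalent (e.g. box drawing) become "?".
    return text.encode(ANSI_CODEPAGE, errors="replace")


if __name__ == "__main__":
    old_file = "smörgås".encode(OEM_CODEPAGE)

    converted_once = oem_to_ansi(old_file)
    print(converted_once.decode(ANSI_CODEPAGE))   # the intended text

    converted_twice = oem_to_ansi(converted_once)
    print(converted_twice.decode(ANSI_CODEPAGE))  # garbled accented letters
```

    Nothing in the bytes themselves says which code page they are in, so the function cannot tell that it is being run on an already converted file. That is the part that keeps costing time.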

    This is *before* dealing with unix and mac files.

    So if we can't read freaking text files after ten years, how are we supposed to read binaries?

    Sometimes I just get too tired...

    --

    All opinions are my own - until criticized

    1. Re:Character sets... by dsplat · · Score: 1
      Between DOS and Windows that company we all love to hate decided to change character sets. Suddenly three letters in the Swedish alphabet had a new character code. One and a half decades later (count that in internet time...) we are still struggling with documents with mixed encoding.

      That means every damn application has to provide a way to recode OEM to ANSI. AND deal with users who try to do this conversion on files already converted.


      The recode program from the GNU Project may be able to do the conversion for you. Take a look at the manual for it to see if it already has the support for the character sets that are troubling you. That doesn't change the fact that changing the encoding out from under people is a dirty trick.
      --
      The net will not be what we demand, but what we make it. Build it well.
    2. Re:Character sets... by guran · · Score: 1
      Thanks for the link.

      However, that was not the problem. Converting is an easy task every time. The problem is that you have to think about it *E*V*E*R*Y* *T*I*M*E*

      Say that this adds one hour of development for a project and five minutes per week for the user. Now add up:
      ((one hour/project) * (X projects per year) + (five minutes/week/user) * (Y users) * (52 weeks /year)) * (Z $$/hour) * (fifteen years)

      Enter some reasonable numbers instead of XYZ...
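
      The formula above written out as a small function, so numbers can be plugged in. The values in the demo (10 projects per year, 1000 users, $50 per hour) are invented purely for illustration and do not come from the post.

```python
def encoding_overhead_cost(projects_per_year, users, dollars_per_hour, years=15):
    """Total cost of the recoding overhead, following the formula above."""
    dev_hours_per_year = 1.0 * projects_per_year   # one hour per project
    user_hours_per_year = (5 / 60) * users * 52    # five minutes per week per user
    hours_per_year = dev_hours_per_year + user_hours_per_year
    return hours_per_year * dollars_per_hour * years


if __name__ == "__main__":
    # Hypothetical X, Y and Z; replace them with your own estimates.
    total = encoding_overhead_cost(projects_per_year=10, users=1000, dollars_per_hour=50)
    print(f"${total:,.0f} over fifteen years")
```

      With those made-up inputs the total comes to roughly $3.26 million, and almost all of it is the five minutes per week per user rather than the development time.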

      Anyway I'm just whining. Hey I get paid to fix stuff like this...

      --

      All opinions are my own - until criticized

  98. Rosetta stone by quiddity · · Score: 1
    Various organisations have tried, or are trying, to produce Rosetta stones.

    Try Longnow.org for more ideas along those lines.

    --
    .
    . hmmm
  99. Umm.. by Anonymous Coward · · Score: 0

    ASCII a trade secret?

    For some reason, i found that very funny.

    1. Re:Umm.. by Teancum · · Score: 2

      Why not? There was some extreme competition back in the early days of computing, and just about everything else was a proprietary standard anyway, so why not the character encoding scheme as well? (which IBM did anyway)

      I think the real reason why ASCII was adopted had nothing to do with the computer industry, but rather with Western Union (which was a part of American Telephone and Telegraph at the time). All of the teletype machines used ASCII, and it proved to be a stock terminal option for many years. The control characters are a legacy of this heritage as well. The reason for the ASCII codes of #13 followed by #10 is that the teletype machines had to be told to physically move the print head back to the left side of the paper and then scroll the paper up one line. How many systems still need these codes, even though all it really means is that the cursor is moved to a new position on the monitor?
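
      For reference, the two control codes in question are carriage return (ASCII 13) and line feed (ASCII 10), conventionally sent in that order. A rough sketch of the cleanup that software reading old text files still ends up doing; the function name is made up for this example.

```python
CR = 13  # carriage return: move the print head back to the left margin
LF = 10  # line feed: advance the paper by one line


def normalize_newlines(data: bytes) -> bytes:
    """Convert CR LF pairs and bare CR line endings to a single LF."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


if __name__ == "__main__":
    teletype_style = b"first line" + bytes([CR, LF]) + b"second line" + bytes([CR, LF])
    print(normalize_newlines(teletype_style))
```

      The CR LF pairs are replaced before the bare CRs so that one teletype-style line ending does not turn into two line feeds.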

  100. Re:Thanks to "proprietary formats" info will be lo by RomulusNR · · Score: 1

    Can you say LZW?

    --
    Terrorists can attack freedom, but only Congress can destroy it.
  101. Re:Ok ... books last? by Lunchmeat · · Score: 1

    The secret of lasting information is copying it over and over. Digital information has to be copied from one medium to a newer one while the hardware and software to read and write it are still available. Interconnectivity is necessary.
    How long do books last?
    Not talking about American paperbacks but about real, leather-bound volumes with acid-free paper; those might last 100-500 years depending on how they are stored and treated.
    The only information which will last is the _written word_, but whatever material you print or write it on, it has to be *accurately* copied over and over.
    Think of the Bible: Professional copyists worked hard to preserve the information and we (at least specialists in Hebrew, Aramaic and Greek) can still read and decipher the 10000s of handwritten copies nowadays. Why? Because the hardware and software to decode it is built into our heads, and this hardware and software replicates at the same pace as the population does! Interconnectivity is included by means of the spoken or written word too. This, of course, implies education. I lament that fewer and fewer people are good readers. Guess which development of the past 50 years this is due to!? I wish more money would be spent on education than on development of technical gimmicks.
    Finally: Who or what decides which information is worth preserving?

  102. even the bible has degraded by HelloKitty · · Score: 1

    most translations have 1000's of words wrong, which can hugely distort the facts that are actually in there.

    remember that it was not originally in English. of course, now the zealots would hate to see the red sea be changed to the reed sea.

  103. Preservation by Chance by beroul · · Score: 1
    Think of the Bible, or the Collected Works of Shakespeare. These continue to exist not because they were recorded perfectly on a perfect medium, but because people found them worthwhile enough to continue them.

    Often, hugely important texts have been preserved by chance, despite people's indifference, sometimes even despite people's efforts to destroy those texts.

    Bach's magnificent sonatas and partitas for solo violin were found among some papers that were destined to be used for wrapping butter.

    Because Tristan and Isolde was considered, in the Middle Ages, to be a morally and politically dangerous story, only one manuscript has survived of the original version, and it's very incomplete.

    The texts that we now have from ancient Greece were preserved by Byzantine scribes who recopied them over and over as the copies decayed. It just so happened that in the 15th century (if I remember correctly), some Byzantine scholars went to Italy, and brought a selection of masterpieces with them. Soon afterward, the Byzantine empire was destroyed, and all its ancient Greek manuscripts were lost; the only ones left were the ones that the scholars had happened to bring with them to Italy. And those are the ones we have, the ones from which the Western world has learned Ancient Greek, which had been forgotten in the intervening period.
    --

  104. Re:So just what *is* the life of a CDR? My results by Sponge! · · Score: 1

    Heck yeah! It is even better if you put like 6 or 7 on top of each other and nuke em! My findings say that the dark blue ones don't make as many coasters from write errors and neither do they make pretty coasters when they come out of the nuke-o-matic.

    --
    Sponge!
  105. Re:VA / Slash-dot Giveaway NOT AGAIN!!! by Kit+Cosper · · Score: 1
    Here we go again....

    This is not an official post from Larry.

    There isn't a Slashdot Giveaway

    This is a bored individual who enjoys misleading people and generating unnecessary email.

    Official VA promotions will always be posted on the VA Linux website.

    Sorry for the confusion that has been created.

    --Kit

    --
    Former Inmate, VA Linux Sanitarium
  106. We are the Internet Archive by bollacker · · Score: 1

    Given our organization's mandate, I thought I should throw in my $.02.
    Although still ramping up and learning how to make things work, we are
    trying to ARCHIVE THE ENTIRE INTERNET FOREVER. Crawling or other
    forms of collection are used to download the information, and we store
    everything on hard drives. We plan to have about 100TB of HTML,
    images, Usenet, streaming media, etc. within two years, and we have
    some collections that reach back to 1996.

    Currently, we do no backups of the hard drives, because given their
    low failure rate (about 1% in our history), it's less lossy overall
    to use that space for new data rather than redundancy. By the time we
    reach equilibrium with the Internet so that our download rate
    approaches the information generation rate of the Internet, we'll have
    some sort of backup mechanism in place. Probably software RAID of
    some form.

    As time passes, we will copy data to new media, but because it will be on
    disk, this will be much easier than if it were on tape or printed. I
    have a vision that in the long run, we may be able to use something
    like an Intermemory (intermemory.org) to create a distributed
    filesystem that is the storage analog to distributed.net. In an
    intermemory, folks donate storage space, so that collectively, a huge
    amount of capacity is available. A lot of redundancy is used so that
    earthquakes, floods, govt. coups, and massive hardware failures are
    still unlikely to result in data loss. As folks' PCs fail or are
    upgraded, they simply plug in the new storage unit (hard drive,
    holographic, etc.) and their part of the intermemory is reconstructed
    (like RAID 5).
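
    To make the "(like RAID 5)" remark concrete, here is a toy sketch of parity-based reconstruction: one extra block holds the XOR of the data blocks, and any single lost block can be rebuilt from the survivors. This is not the actual Intermemory design, which is not described here; the function names and the three-donor setup are assumptions for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks):
    """Rebuild the single missing block from all other blocks plus parity."""
    return xor_blocks(surviving_blocks)


if __name__ == "__main__":
    # Three donated drives each hold one data block; a fourth holds the parity.
    data = [b"poem", b"pics", b"code"]
    parity = xor_blocks(data)

    # The drive holding data[1] fails; rebuild its block onto the replacement.
    recovered = rebuild_missing([data[0], data[2], parity])
    print(recovered)
```

    Single parity survives only one failure per group of blocks, so a system meant to outlast earthquakes and coups would need considerably more redundancy than this.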

    There have also been comments about how to handle (index/search/browse)
    so much data if it is all archived. This is an area of active
    exploration in which we are working with research groups and others.
    Generally, we've found that working with flat ascii files and perl
    scripts is one of the few approaches that scales up to TB of
    information on reasonably priced hardware.

    From a fanciful perspective, I see us eventually being something like
    the "Library Institute" of David Brin's books, or being the digital
    analog to the Library of Alexandria. As we are a non-profit, access to
    our archives is freely available (see archive.org) and we
    encourage users of a broad range of types. If you are interested in
    seeing a large scale implementation of archiving heterogeneous digital
    information, check us out. As a shameless plug, we are also looking
    to hire developers and researchers. What we develop is open source
    and we encourage its dissemination.

    Kurt Bollacker
    Technical Director, The Internet Archive. (www.archive.org)

  107. Worried about the wrong things, a bit. by RomulusNR · · Score: 1

    I think the author (and others), first of all, worries too much about data obsolescence, especially due to software or hardware obsolescence. The author fears that some day, a certain brand/style of computer (or all of them) will become antiquated to the point that the last one will break, and that will be the end of any data stored in a format which "only" that machine can read.

    Not gonna happen! Are there many machines -- not just computers -- from history which later humans haven't been able to repair and get to work again? Or even rebuild from ancient pictures and documents? I can't think of one.

    Nor am I worried that the lack of commercially available or even skilled help for repair of old computers will mean that we will be hopelessly unable to resurrect them. Groups like the l0pht have done wondrous things in the area of resurrecting old computers; from rebuilding an old VAX, to running a web server on a Mac Plus, and various reverse engineering of both antiquated and SOTA devices. Similarly my grandfather, a retired marine engineer, works at the railroad museum in Florida repairing old steam engine trains.

    Should we be worried about people not being able to fix an Apple -- the first of which was built by two guys in a garage -- when three college students can build a nuclear breeder reactor under their bed? (See past /. story on that one.)

    .....

    On the other hand, the author worries a lot about software/hardware obsolescence as a threat to data persistence. What about the bogon factor? Data maintainers are worried about two big things when it comes to losing data: accidental deletion, and hardware failure. They're not worried about, for example, DDS tapes being discontinued, because they know they will eventually have to upgrade their backup methods to new technology. But if their backup tape gets caught in the drive mechanism, or gets immersed in water, or some fool munges the backup, or writes over an old tape... These things are the real big problems, and I dare say much more data is going to fall to the factors of human error and natural disaster than to data formats becoming obsolete.

    Tell me what sort of really important, crucial data is sitting on old media? I know that apparently certain data tapes from mid-20C censuses are supposedly "lost forever" due to hardware obsolescence. But is the data on those tapes really useful? In other words, is there anything useful on those tapes that isn't in another format already (books, documents, etc.)? I think not.

    I'm paranoid about losing my own data. I still migrate old disk drives from new machine to new machine because they contain old data which I can't replace (it's mostly all original work). I even shelled out top dollar for disk drive scanning software to recover data on a disk I was forced to reformat. Eventually I will back it all up, or copy it to a new drive. But if I were to delete any of that stuff by accident, and lose it to other disk operations, it would be gone, gone, gone. That's my real fear.

    --
    Terrorists can attack freedom, but only Congress can destroy it.