On Preservation of Digital Information
Recently there was an Ask Slashdot about the problem of preserving digital material. The basic idea was that we are creating a massive wealth of digital information, but have no clear plan for preserving it. What happens to all of those poems I write when I try to access them for my grandkids? What about the pictures of my kids I took with that digital camera? Can I still get to them in time to embarrass them in the future?
Obsolescence of digital media can happen in three different ways:
- Media Decay: Even when magnetic media are kept in dry conditions, away from sunlight and pollution, and hardly ever accessed, they will still decay. Electrons will wander over the substrate of the media, causing digital information to become lost. CD-ROMs luckily do not have this same problem with electron loss. They are still sensitive to sunlight and pollution, though. Many people mentioned last week that distributors of blank CD media often claim their products will last a hundred years or more. Research seems to indicate the truth is closer to 25 years, which seems like a long time until you consider the factors below. Besides, information professionals often think in terms of centuries rather than decades.
- Hardware obsolescence: Far more dangerous than the degradation of the actual information container is the loss of machines that can read it. For instance, the Inter-university Consortium for Political and Social Research received a bunch of data on old punch cards. The problem was they had no punch card reader. It took a decent chunk of time and a good deal of money before the data could be read off these cards, and some old technicians even had to come out of retirement to help tweak the system. Hardware extinction is hardly a foreign topic to Slashdotters. It happens, and as technology increases its pace of change, it will happen more quickly.
- Software obsolescence: The real stone in the shoe of digital preservation is obsolescence of the software needed to open the digital document. This can include drivers, operating systems, or plain old application software. We all have piles of old software written for older systems, or have come across an old file at the bottom of a drawer and couldn't even remember what application it used.
There are several strategies for preserving digital information. People mentioned some last week:
- Transmogrification: printing the digital document into an analog form and preserving the analog copy. An example would be printing out a Web page and archiving the print of that Web page. This, obviously, takes out the main strength of a Web document, hyperlinking, and may also ignore important color and graphical content. An alternative form of this is the creation of hardcopy binary that could later be data-entered into the computers of the future. The media suggested have ranged from acid-free paper to stainless steel disks etched with the binary code. The two major problems with this idea are that any misrepresentation of the binary could have disastrous results for the renewal of the document, and that transformation to hard copy limits the functionality of many types of digital documents to the point of uselessness.
- Hardware museums: preserving the necessary technology needed to run the outdated software. There are several weaknesses to this plan. Even hardware that is carefully maintained breaks and becomes un-usable. In addition, there is no clear established agency that will be responsible for maintaining these machines. Spare parts eventually become impossible to find and legacy skills are required for maintenance. There must be technicians with the requisite skills to service these preserved machines. Finally, it does not create efficient use if all possible future users must bottleneck to just a handful of viewing sites to have access to the information.
- Standards: reliance on industry-wide standardization of formats to prevent obsolescence. Marketplace pressures give software producers an incentive to differentiate their products from their competitors', which makes universal standardization unrealistic in a capitalistic marketplace. Even so, standards such as SGML have proven successful for large-scale digital document repositories, like the Making of America archive hosted by the University of Michigan. However, many of these large repositories also receive information from donors that is not in a standardized format, and do not feel comfortable turning away those documents.
- Refreshing: moving a digital object from one medium to another. For instance, transferring information on a floppy disk to a CD-ROM. This definitely seemed to be the preferred method of most Slashdotters. While this takes care of degradation and obsolescence of the media, it does not solve the problem of software obsolescence. A perfectly readable copy of a digital document is useless if there is no software program available to translate it into human-readable form.
- Migration: moving the digital document into newer formats. An example might be taking a Word 95 document and saving it as a Word 97 document. Single generation leaps are usually not a problem, so large volumes of information could be saved. Unfortunately, migrations over several generations are often impossible, as is migrating from a document type that was abandoned, and did not evolve. Also, information loss is common in migration, and may cause the document to become unreadable. While this may be the best single method available, it is very labor intensive, and some knowledge of the nature of documents would be essential to determining which information containers to migrate. For instance, often you lose aspects of a document (good and bad) when you migrate it, but which of those aspects are important?
- Emulation: creating a program that will fake the original behavior of the environment in which the digital object resided. This is another very intriguing method that could be used. It's actually already pretty common. For instance, most processor chips include emulators for lower level processors. There also already exists on the Internet a very active group of people who are interested in emulating old computer platforms. Still, we need to do a lot of research yet on the cost of this method, and on what sorts of metadata must be bundled with the digital object to facilitate its eventual emulation. Another problem is the intellectual property hassle caused by emulation. Reverse engineering is a big no-no, and there is no point in making the lawyers rich. This area is actually where Open Source can be of biggest help in preserving the longevity of different kinds of applications.
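As a rough illustration of the refreshing strategy listed above, here is a minimal Python sketch that copies a file to a new medium and checks that the copy is bit-for-bit identical. The function names `sha256_of` and `refresh` are hypothetical, and the choice of SHA-256 is an assumption rather than anything proposed in the discussion. Note that this only guards against media decay; it does nothing about software obsolescence.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def refresh(source: Path, destination: Path) -> str:
    """Copy source to destination and verify the copy against the original.

    Returns the checksum so it can be recorded alongside the archive.
    Raises IOError if the new copy does not match the original.
    """
    expected = sha256_of(source)
    destination.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(source, destination)
    if sha256_of(destination) != expected:
        raise IOError(f"refresh of {source} failed verification")
    return expected
```

Recording the returned checksum with each copy would let a later refresh detect silent decay before the damaged copy is propagated to the next medium.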
Many people in the discussion last week seemed to believe that simple refreshment or migration of the data would be a sufficient answer to the problem. At a personal level that may be true, but for anyone responsible for large amounts of digital information, neither is a completely convincing method. Here are a couple of reasons why:
- Not all documents are the same- In the digital preservation literature, most people talk as if all digital information is in ASCII format. Au contraire. As computing becomes increasingly robust, so do the documents we create. Multimedia games, three dimensional engineering models, recorded speeches, linked spreadsheets, virtual museum exhibits and a host of other documents spurred by the development of the Web have cropped up. How are they going to be affected by migration to a new environment?
- It's so darned expensive- It's a little gauche to talk about, but the Y2K bug caused what ended up being a huge migration of digital information. How much did the US alone spend on that fiasco? $8 billion? For smaller organizations that do not prepare for the preservation of their digital information, the cost of emergency migrations could cause all sorts of budget trouble.
There is some belief that there is no reason to preserve information at all. Most of what is created is just tripe anyway, and we should be more focused on creating content than preserving it. There are two reasons why some sort of preservation is important. First of all, it is inefficient to recreate information that already exists. Human energy is better spent on building upon existing knowledge to create new wisdom. How much do we already spin our wheels as several people collect the same data? What more could we be doing if we spent the energy instead on new pursuits? Secondly, there is some data that is irreplaceable.
Which is not to say that we should keep everything. In a traditional archive, only 1% of documents received are kept. Ninety nine out of one hundred documents are destroyed for various reasons. A similar ratio is not unreasonable for digital documents. Consider that 16 billion email messages are sent each day. It seems ridiculous to keep all of them, but how do we weed out the ones we do want to keep? Appraisal of digital documents for archival purposes is going to become a major issue in the not distant future. There are already examples of data that have been lost, or nearly lost. NASA lost a ton of data off of decayed tapes. The U.S. Census nearly lost the majority of the data from the 1960 census. These huge datasets are important for establishing a scientific record that reveals longitudinal effects.
Increasingly, the record of the human experience is kept in a digital format. The act of preserving that information is the act of creating the future's past, the literal reshaping of our world in the eyes of the future. Nobody knows the best answer yet. There is probably not a single answer that will fit absolutely all situations. Information professionals are just beginning to do research in the form of user testing, cost-benefit analysis and modeling to answer some of the thornier issues raised by the preservation of digital information. There are things out there worth saving; we just need to figure out the best way to do it.
Some links of interest in case you would like to read more:
- a really good bibliography of related sources by Michael Day
- an article by Jeffrey Rothenberg outlining some of the issues
- a site at Leeds University with many related links
Most content could be published in book format. I know books are so ... old-wave, but they work pretty well.
Bad Mojo
"If you can't win by reason, go for volume." -- Calvin
With digital documents, there's no real reason not to save all of it, even if much of it is "tripe".
Information is information, whether or not we find it useful. Some day, someone else might find our tripe is a goldmine of information, if only for anthropological study.
Sakhmet.
(The REAL McCoy)
"The surest way to corrupt a youth is to instruct him to hold in higher esteem those who think alike than those who think differently."
Ban the Nukes! Save the Whales! Screw it. Nuke the Whales!
I'm keeping all those old AOL CD-ROMs. Some software archaeologist will need them to see what Internet pioneers struggled with.
Help end the use of Sigs. Tomorrow
I don't think there's too much trouble with losing games and other applications as the hardware that runs them becomes obsolete... New ones will be created, and the best of the old will be ported.
As to the data already archived on various media, there could indeed be a problem if people fail to move the data to newer media... Think of your pile of 5 1/4" disks that's just rotting in the corner because your new computer only has a 3 1/2" drive -- and that's not even a huge leap in technology.
There's also the question of formats, especially for users of M$. After two revisions of the software, it can't read any of the old data! Try reading a Word 6 document in Word 97 for laughs, especially if you use any special characters ü á € in your documents...
--
It is no measure of health to be well adjusted to a profoundly sick society.
The BBC currently has an article on the same subject. This is a great advantage of Open Source (preaching to the converted, I know) because that is the only open standard (and therefore durable) format. All other proprietary formats will come and go with the companies that make them.
The solution would be to use an optical storage media, but as others have pointed out, CDR storage has a life expectancy of 75-100 years depending on the brand. Which wouldn't be too bad except you have to realize that in 100 years you need to start putting resources into copying all that data off and re-writing it again. After awhile you'll have a snowball effect where you spend more time writing the old data than the new!
What we really need is a piece of technology that doesn't age - an entirely self-contained computer (nuclear powered, maybe?) that has the media, the reading/writing mechanisms and has several failsafe mechanisms to alert you well before any data is lost. Think of it as a computer time capsule - you bury it and in 500 years come back and it has all the human interface necessary to reproduce the data in a usable format. Of course, you'll still need someone who reads English then..
agh, the problems, the problems....
A perfectly readable copy of a digital document is useless if there is not software program available to translate it into human-readable form.
Is there an example of a computer system that doesn't exist anymore, and can't be emulated at a much greater speed than the original using existing software? Even most arcade machines can be emulated these days.
If something is of value and needs to be preserved, it will be preserved somehow (book, updating to a new software or whatever).
If a piece of information has not been preserved and is now unaccessible, it probably means that it was of minimal value anyway.
That's probably not the greatest way to look at this but I'm thinking that half of all the info that's presently out there is useless anyway and is just taking up space for nothing. Maybe it's a good thing that these will be lost with time. It's kind of like a good spring cleaning.
*******************************
This is where I should write something
intelligent or funny but since I'm
To kick off the promotional offers, we're having a contest drawing on March 1st. The winner will receive a VA Linux Systems StartX SP Workstation with a blazing 400MHz Intel(TM) Celeron© processor, (approx $908.00 value)!
Five second place winners will receive a Linux / Slash-dot gift pack, including a "Debian GNU/Linux Box Set" and "Slash-dot" t-shirt (as seen on Copyleft.net), an estimated $40 value.
Remember, this contest is only open to registered Slash-dot users. Look below for instructions on how to enter.
In other news:
I must apologize for referring to Mr. Malda as "Captain Taco" in previous statements. I received over a dozen letters from Slash-dotters like yourselves informing me of my mistake, which brings me to this point: I encourage you to let me know your opinions (and correct me if I misspeak). Within a week a special e-mail address will be set up for this purpose. Only together can we make VA / Andover.net successful. Each and every one of you is part of the team.
Please look for my new weekly newsletter, starting on February 29th!
Sincerely,
Larry M. Augustin
President, Chief Executive Officer and Director
VA Linux Systems
***"VA / Slash-dot Giveaway" Contest Instructions and Rules
How to enter: The "VA / Slash-dot Giveaway" contest (hereafter referred to as the Contest) is open to all registered Slash-dot users. To enter, send one e-mail to "service@valinux.com" with this text exactly in the subject (without the quotes): "SLASHDOT GIVEAWAY". The first line of the message body must be your registered Slash-dot username. Notification of winnings will be sent to the e-mail address on file in your Slash-dot user profile. You will not receive a confirmation e-mail when you enter. Please do not send multiple entries, as they will be discarded, and e-mail abuse ("spamming") may be grounds for Contest disqualification and/or removal of your ID from Slash-dot.
Prize drawing: Winners will be drawn from all e-mails received up until the cutoff date of 1 March 2000 at 00:00UTC. Winners are randomly chosen using HotPicker(TM) software. Winners will be notified of their status by 5 March 2000 by e-mail containing a confirmation claim number. Prizes must be claimed by 31 March 2000.
Prizes: There is one (1) "First place" prize consisting of one (1) "VA Linux Systems StartX SP Linux Workstation" with 400MHZ Intel Celeron processor, 64MB RAM, 6.4GB hard drive, and the VA Linux OS v.6.0 Software Kit. A 17" monitor, keyboard, and mouse are included. Five (5) "Second place" winners will receive a "Linux / Slash-dot gift pack" containing: one (1) Debian GNU / Linux software box set and one (1) Copyleft "Slash-dot" t-shirt. Estimated value of "First place" prize is $908.00**. Estimated value of "Second place" prize is $40.00**.
Disclaimer: VA Linux Systems assumes no liability for e-mail contest entries not received. The Contest is not open to employees of VA Linux Systems and Andover.net, or their immediate relatives. VA Linux Systems reserves the right to reward alternate prizes of equal or greater value, defined by the value estimate stated above.
** All values are in US dollars and do not include state tax and shipping charges.
balancing the endless churning of the web against the need for a stable archive.
Unless we take steps to archive, transcribe and preserve all this information (yes, grits, petrification et al) then we are in effect building a new Library of Alexandria.
It would be the greatest loss ever for archaeologists of the future to be unable to access archives of the WWW. Every day is a unique snapshot of the world as the endless churning of webpage updates/dead link removals changes the WorldWideWeb.
This information Ocean is something unique. Archiving such a huge store of information generates a challenge in itself.
I don't often wax lyrical about the internet but it is in effect becoming a snapshot of our civilisation.
What a loss for future generations if they cannot see the views of ordinary human beings (through the endless websites) preserved.
...Upgrade now to Schrodingers Dog...
IIRC, Orson Scott Card addressed this issue in a story set in Isaac Asimov's universe. The library on Trantor had indices going back thousands of years, but the contents of the library had never been refreshed. The librarians knew exactly what they had lost.
please moderate this idiot down.
When you look back at history, and you look back at documents that are a "mere" thousand years old, the wealth of information in these documents makes you wonder what could be found if all the documents from that time had survived. Just because the format is digital, rather than analog or (eek!) paper, does not mean that this media is impervious to decay.
However, I think that decay is much, much more serious in digital media. The root of the problem is that if you are looking at a physical document with water damage, even though the original "packets" of information (letters and words) are damaged, the human brain can sometimes extract meaning from smearing ink and crumbling paper. When an electron wanders on magnetic media or when a CD begins to decompose, that bit is lost forever. Digital media is much more susceptible to lapsing into unintelligibility than physical media like paper.
Preservation in a medium that will not become obsolete is the key. As mundane as it may sound, plain ASCII text will probably never become obsolete because there is no real reason to come up with a new standard. Some people may scream at me: "*ML! *ML!", but at the rate that these things obsolesce, plain text will still be around when XSGHTML has been long dead.
Just a thought. If you have something to add, feel free to respond.
Brandon Nuttall, the inquisitor of Reinke
This is something that is going to be more of a concern for those of us who conduct a significant portion of our lives online already. Ask yourselves, have you ever had a moment of unusual brilliance in which you posted something to Slashdot or Usenet which was truly worth saving? Can you find it now?
Personally, I encountered the issue of software obsolescence well over a decade ago. I migrated my resume to TeX because it had already been through four other formats and I no longer had access to the tools to read them. I picked TeX because I firmly believed that a tool that I had the source for was likely to continue to be useful to me for a longer period. And the source for the document is ASCII text, which I was able to convert to HTML a couple of years ago with little trouble. I will not rely on the future availability of any tool that I have no control over.
This is one of the reasons that The Unix Philosophy, a fine book, recommends text formats for data. You can manipulate it with a wide variety of tools including text editors. It is unlikely that we will abandon those completely in our lifetimes. It also suggests, if memory serves, keeping notes online in text form. They are more portable and more accessible that way.
One worthwhile source of literature preserved as plain text files is Project Gutenberg. It is probably also the oldest such project around. It is to text in some senses what Free Software is to code. Although they aren't doing collaborative authoring projects, they are collaborating on getting old books whose copyrights have expired into electronic form. If you haven't ever visited their site, take a look.
The net will not be what we demand, but what we make it. Build it well.
This is an excellent summary of the technical challenges to digital media preservation.
But the technical issues are insignificant compared to the legal concerns - copyrights, patents, etc.
Sure, most of these forms of copy limitation do expire, but until a large amount of "digital literature" becomes public domain, nobody's even going to *try* developing a preservation system, for fear of lawsuit by irate copyright-holders.
My university's library collection totals nearly seven million books. Yet extracting information from this huge paper collection has been an incredible hassle... I would be willing to pay a significant annual fee if I could access every page in the library via a Web interface. I leave the juicy technical details to the reader's imagination. (I bet a few people with hand-held scanners and rudimentary OCR could digitize the entire library in a reasonable amount of time).
But guess what - this is never going to happen in my lifetime.
These seven million volumes of knowledge are never going to be preserved, because no library director in his/her right mind would risk slipping up and getting sued for violating a long-lasting copyright.
TROLL
/TROLL
I want a grit cluster out of naked and petrified Beowulfs pouring hot Natalie Portmans down each other's pants!
Addressing the media hardware problem:
I think one solution could be to store all data worth keeping for a long time on standardized media.
In the Old Times (IIRC) nearly every computer manufacturer clung to its own proprietary set of "standardized media" - just remember the seemingly thousands of different formats for the good ol' 5.25" floppy drives. This problem is far less threatening today, because nearly all media (hardware AND logical formats) are standardized. You can read a CD containing i386 Linux on a Macintosh etc.
So one solution to the media problem would be to just keep the official standard specs (like the Books of Many Colors) in a durable format (etching them in titanium plates should be sufficient), so if, in a few thousand years, the need should arise to read that old Quake CD, the archaeologists just have to dig out the plates, build a new CD drive and lo! all that old data which survived World War XXXVII would be accessible again (if kept in a climatized room to slow the media decay).
Unfortunately, Playstation CDs would be out of the game for not being made according to The Standard...
This assumes that information SHOULD be thrown away. I'm not interested in becoming a pack rat, I already have enough "stuff" to keep track of, thanks. I suppose I'm just not all that interested in making my information, no matter how trivial, available to future archaeologists.
In this case, the main problem is not bit-rot (although this will occur sooner or later) but rather problems with not recalling the information for an extended period of time. For example:-
- Reels of tape start to imprint signals onto adjacent layers of tape, causing loud passages to have ghost versions either before or after them.
- Tape actually becomes stuck to itself because of bad binding materials, leading to baking of the tape as a desperate restorative measure.
These and other issues are discussed on www.audio-restoration.com. Does anybody know if there are similar problems associated with digital media (the cross-talk problem will be virtually negligible due to noise-floor issues being irrelevant)? If so then it makes archiving a much more difficult thing if you have to physically do something to the archives every couple of years (especially with the exponential growth rate of information generation).
The only Good System is a Sound System
Books can have a life of hundreds, if not thousands of years if treated right. Even with abuse it will survive for years.
There is a problem of obsolescence of language, although usually there is a Rosetta Stone equivalent.
With modern media, technology is progressing so fast, in an almost throwaway way. At my previous company we had good backups but no way of accessing them: we no longer possessed the devices needed to read the tapes from before we went to DAT and then DLT, nor the disks from before that. It could be argued that with the internet, archiving is going to be more dynamic and fluid, but where does this leave information, especially information for future generations? It is all well and good moving from the printed page to the digital page, but in 2000 years' time will they be able to revive the contents of a hard disc? Will the information on the internet have evolved dynamically, leaving no snapshot? Or will they look through the books of our time?
What will be our dead sea scrolls?
Working for the (other) man
sign me up.
And since the copyright term on data formats is the author's (or company's) life plus 70 years (gee, thanks Sonny Bono for extending this, I won't miss you), we can never hope to see any legal 3rd party readers for these files. If the IP owner decides to sit on an old format and not support it, we are officially hosed.
---
This sig has been temporarily disconnected or is no longer in service
I would argue for the historically tested method of storing data: take a chisel and carve it into rock.
Software obsolescence is not a big problem -- humans (we hope) are going to be around for some time and the brain wiring changes awfully slowly. Languages do get forgotten, but smart people are very good at understanding dead languages and will probably only get better. Readers are also not likely to be a problem: just like brain wiring, eyeball construction is quite stable and not going to be superseded by a better design any time soon.
The media -- provided you pick a good hard rock like granite (avoid limestone and its derivatives like marble, they don't like acid rain) -- does not suffer from bit rot, completely ignores magnetic fields, is stable with regard to solar radiation, and is fairly resistant to pollutants.
You are not limited to ASCII, and even have limited graphical capability. In fact, rock has a huge advantage over current digital media -- it's perfectly possible to create, view, and store 3D objects in rock. Just try that with your 21" monitor!
Just in case you think I am being funny, there is a company which in exchange for a sum of money will take your text, etch it on metal plates (nickel, I believe), and store it in some cave. They are estimating >5,000 years MTBF. I still think a good slab of granite is better, though.
Kaa
Kaa's Law: In any sufficiently large group of people most are idiots.
The common response to this is that we may not know what is worthwhile, or that future ages may not take appropriate care. Lost Greek plays that would be worth millions now were overwritten by some monk's laundry list in a less enlightened age. We feel we must save our information from that fate. But that is an impossible task. Etch the information on steel disks and some future, more barbaric age may melt those disks down for swords.
So forget about trying to save everything. Just work to save what you think is important. Yes, stuff will get lost, but that will happen anyway. You will never get perfection. More likely is that future generations will curse you for the stuff you thought too trivial for your archive project, while finding the information you did archive worthless.
Now, I'm dealing with legacy code, too. One solution of course is to write vanilla code in a common language, but who knows what language is going to be used in 25 years? C+++? Fortran 2020? And vanilla code isn't always optimal, when hardware vendors build cutesy hotrodding tricks into their architecture and compilers.
Somebody just needs to build a giant computer version of babelfish for all languages ever. Starting with cave paintings. :)
I'd rather trust a man who doesn't shout what he's found. -- Genesis
Or if you prefer to avoid the Simpsons reference look at the link. It goes to http://www.hardcoresex.com
(Then there's the fact that the BBC website can probably handle more traffic than Slashdot, so a mirror is pointless.)
I have a question, however, about the other end of the data life-cycle: its birth. Certainly data disappears, but what is the best way to describe or define "data," broadly? What is the best definition anybody here has ever heard for "information"? I'm having trouble finding a straight answer. Is data (information) a representation of something in the real world? Is it like a shadow of something else? We have seen how it can be created, we have seen how it can evolve, and we have seen how it can fade away and die, but what is the best definition of what it is?
This is one of those philosophical questions that just nags at the mind. If anybody can suggest definitions (or resources), I'd be grateful.
A. Keiper
The Center for the Study of Technology and Society
hint: don't store the instructions for reading media, on those media :-)
just my .0001 cents.
...as long as I don't run out of disk space. (Paraphrasing a comment I heard at a DC thinktank.)
It was noted that storage requirements for geographic data (geologic, topographic, etc.) would require petabytes. Multiple petabytes. And a petabyte is 1000 terabytes (right?). And we're thinking 36GB hard drives and DVD-RAM drives have a lot of space...
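To put the petabyte figure in perspective, here is a back-of-the-envelope Python sketch. It assumes decimal units (1 PB = 1000 TB = 10^15 bytes), which answers the "(right?)" above under that convention, and it ignores filesystem overhead and redundancy. The helper name `drives_needed` is hypothetical.

```python
# Decimal (SI) storage units, as drive vendors typically quote them.
BYTES_PER_GB = 10**9
BYTES_PER_TB = 10**12
BYTES_PER_PB = 10**15


def drives_needed(total_bytes: int, drive_bytes: int) -> int:
    """Return how many whole drives are needed to hold total_bytes."""
    # Integer ceiling division avoids floating-point rounding.
    return -(-total_bytes // drive_bytes)


if __name__ == "__main__":
    print(BYTES_PER_PB // BYTES_PER_TB)                     # terabytes per petabyte
    print(drives_needed(BYTES_PER_PB, 36 * BYTES_PER_GB))   # 36GB drives per petabyte
```

Under these assumptions a single petabyte fills roughly 28 thousand 36GB drives, before any extra copies are kept for safety.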
--
The Future: Some assembly required; batteries not included.
There is a project that has started recently here at Stanford to investigate the possibility of using distributed web caches as a means of preserving information on the Web. The project is called "LOCKSS" (Lots Of Copies Keep Stuff Safe), and more information can be found at lockss.stanford.edu.
This project definitely does not address all the issues with digital-document preservation; it definitely does _not_ solve the document-format problem. Its goal is to make digital publishing "immutable" so that publishers cannot modify or withdraw their work after it is published.
Disclaimer: I work for one of the groups which is participating with the LOCKSS project, but I'm not working directly with it.
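The "lots of copies" idea can be illustrated with a toy majority vote in Python. This is only a sketch of the general principle and is not the actual LOCKSS protocol, whose details are not described here; the function names `majority_copy` and `repair` and the use of SHA-256 are assumptions for illustration.

```python
import hashlib
from collections import Counter
from typing import List


def majority_copy(copies: List[bytes]) -> bytes:
    """Return the content that a strict majority of the copies agree on."""
    if not copies:
        raise ValueError("no copies to compare")
    digests = [hashlib.sha256(copy).hexdigest() for copy in copies]
    winner, votes = Counter(digests).most_common(1)[0]
    if votes * 2 <= len(copies):
        raise ValueError("no majority; cannot tell which copy is intact")
    return copies[digests.index(winner)]


def repair(copies: List[bytes]) -> List[bytes]:
    """Replace every copy with the majority version (toy repair step)."""
    good = majority_copy(copies)
    return [good for _ in copies]
```

With three caches holding `b"abc"`, `b"abc"` and a decayed `b"abX"`, the vote would restore the third copy from the other two. As the comment above notes, this kind of scheme says nothing about whether anyone can still read the format.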
I am Jack's complete lack of surprise.
I put some CDRs out in the direct sun here in the Las Vegas desert over the last summer. Blue, gold, green, pale green, and an RW. Both sides of the CDs had their chance to roast in the 100F+ (40C+) sun for several months each. And here are the results of attempting to read the data back on each type:
Old TDK green CDR: dead, nothing readable. Faded to a mostly clear plastic disc!
Ricoh gold/gold CDR: dead, nothing readable. The golds faded visibly first of them all. Area where data was stored faded to clear!
Verbatim (blue): I was stunned. I read back a full and complete iso image of Red Hat 4.2. No fading at all.
Memorex silver/green CDR: mostly dead, some files readable. Faded in a few isolated patchy blotches.
The CDRW... just started this test. No results yet. Looks OK, though.
Overall, I'd say the blue CDRs are the best choice for long term data storage.
About fifteen years ago the Library of Congress did a study to determine how they should be protecting important records. At the time they estimated the life of an optical disk (not a CD-ROM, but similar technology) to be ten years and the life span of a book printed on acid-free paper to be in excess of three hundred. (Books printed on cheap paper using an acid bleaching process last mere decades. Go look at any SF paperback from the fifties or sixties to see what your paperbacks from the seventies will end up looking like.)
The Library of Congress has so many WW II audio recordings that it would take a scholar several lifetimes to listen to them all. (And it would take the same manpower to convert them to a more modern storage medium.) These recordings were made on glass disks, and pose a number of problems. They have only a few players and the disks are very fragile. (Fortunately, when they break it is often possible to recover the data.) The other problem is that the disks are not well indexed, and certainly are not searchable. Most of the recordings are speeches of little value, even to historians, but finding the valuable information requires that the material be converted to a more useful format. (Some day we will have voice recognition to automatically convert a lifetime's worth of audio into text that can be searched, but that still presupposes that someone has digitized the tens of thousands of disks. Where will that manpower come from?)
Media lifespan is a serious issue when you are required to archive materials. Many governments are legally obligated to maintain materials for long periods of time, and replacing paper copies with electronic ones may not satisfy legal requirements. (Think about what happens when your data is on 230 MB optical disks. Remember those? Very popular eight years ago, useful only as coasters today.) None of this is a new problem, however.
Some of you will, no doubt, remember the issue of whether or not Heisenberg was building an atomic bomb for the Nazis, and if so, whether he was actively interfering with the project because he disagreed with the Nazis' goals. It turned out that after the war, Heisenberg and some other scientists were being held in Britain. The British secretly recorded all of their conversations. The medium? Spools of wire. (Think of a spool of wire being used just like a magnetic tape.)
A few years back some scholars wanted to listen to these recordings and had a terrible time finding a player. Eventually they found a collector who had one in working order. Wire recorders have not been made since the fifties. But they eventually found a player and carefully transcribed them. (And it seems that Heisenberg was actively trying to build a bomb, but lacked the resources to do so.)
There are mag tapes from the seventies that cannot be read. I have tapes from the late eighties that would be difficult to read, since I no longer know anyone with a 9 track tape drive. This is a little over ten years, unlike the wire spool recordings.
While most software will read files created by ancient versions of its competitors' software, I wonder how much longer this will last. Open Source doesn't fix the problem posed by data in proprietary formats which cannot be easily migrated.
The issue of emulation is important, but it presupposes sufficient information to write an emulator and sufficient resources to fund the project. Many times special hardware would have to be built to read the data. NASA has this problem, as the tape drives used to store telemetry data have not been made for decades and it is very difficult to find working ones.
I wonder if the period for technological obsolescence is compressing to the point that it will only take ten or so years for older formats to be unreadable.
I was recently looking at the first major programs I ever wrote. I only have printouts, as 20 years ago there was no easy way, at least as a student, to save files and even if there was, it would not matter because I could not read the media today. While I will scan and OCR these someday (for sentimental purposes, as they have no value to anyone else), I count myself lucky that I saved the printouts. I have a floppy formatted as a Unix filesystem for a Lisa running a crippled System III port done by Unisoft (remember them?). It has a few papers on it and some software I wrote. Nothing terribly valuable, although the papers would likely make some plagiaristically inclined college students very happy. Can I read it? Maybe if I ever find someone with such a machine and the floppy has not gone bad. How old is it? A mere fifteen years.
Oh, and the Y2K fiasco cost a lot more than 8 billion. I read somewhere that the New York Stock Exchange spent 600 million, and I know that the big three auto manufacturers spent at least that much apiece. I've seen estimates that the cost was $100 billion.
At the moment, Moore's law is the only thing that stops this problem becoming really acute. Although I keep all my email, and the total size of the archive grows almost exponentially, so does the size of my hard disk, and the speed at which I can run grep over it.
Handling terabyte databases now needs leading-edge hardware and state-of-the-art software specially optimised for the data format. In 20 years, however, we will just be able to haul the terabyte database into emacs, and hack up some macros to reformat it and search it.
If Moore's law ever tops out, then we are in trouble!
Irony of ironies: Data records on floppy disks relating to an archaeological dig decayed by 5 percent in under a decade - after everything had survived the journey from the Bronze Age intact.
A. Keiper
The Center for the Study of Technology and Society
I agree that the problem of preservation isn't exclusive to digital media, but one of the big differences is that analog media tends to degrade MUCH better than digital media. True, old records get scratchy or warp, and tapes can have their oxide coating flake off, but at least there's some data available (i.e. you can still listen to the recording through the clicks or dropouts). With digital media, it's often an all-or-nothing affair. Either it's in perfect shape or it's gone. Of course this isn't always the case (sometimes you can extract some digital data from a damaged source) but it's much more difficult than with analog media.
I'm not nearly as worried by media decay as I am about content just disappearing altogether. The internet saves us from media decay-- if I keep my files on a network-capable machine, then transfer to the next generation machine is easy. Every time I get a new PC, I plug it into the hub, and let the file copying begin! On the other hand, "disappearing info" on the web may result in all sorts of archival losses! Magazines and Newspapers are archived and kept in libraries for years. What about news web sites? I'm sure most large sites keep their own archives, but will anyone ever have access to this data again? Once it is replaced by newer info on a site, is it gone forever? I'm afraid that the popularity of the web may result in the loss of good data archives in libraries for the future.
Regardless of how it's stored, eventually the data itself becomes meaningless. I read an article that made this point last year. Ever try to read Chaucer in the original English? Same language, more or less, but over several hundred years it has become unintelligible to all but a handful of people. The way language is changing today, it could take even less time for all these articles on slashdot to become gibberish. So with a perfect medium, who would we be preserving things for? A handful of scholars, ignored by everyone? No one at all?
circa75.com
As far as I know, there is no copyright protection for file formats. You can copyright a document that describes a file format, but not the file format itself.
Mea navis aericumbens anguillis abundat
Not all information needs to be archived. Most of the e-mail I receive can go in the bit bucket for all I care. The rest, I archive. As for the information that can/should be archived, the author's statements to the contrary, industry standards can be used to archive what should be archived.
Given a format that is a) adequately documented, b) accurately represents the data it encompasses, and c) has sufficient widespread adoption, we can simply archive to that format as we need to.
Let's consider various and sundry data types, the prominent format for handling them, and the potential longevity of those formats.
Text: For raw text of course you have ASCII. While not a permanent fixture, nobody can argue its longevity. We'll call this the baseline. Moving up from ASCII you need some way of defining formatting and such. There are really only a couple of realistic solutions. Either some SGML based system, HTML, or PDF. I'll get into the latter two cases a little further down. Let's say that for plain text, SGML has the best longevity because of widespread adoption, and simplicity.
Rich Text (beyond simple formatting): As above, we need something better than ASCII. I'll vote for PDF here. It's a proprietary format, but it seems to be pretty well understood, and it does an accurate job of representing the original document. Mac OS X groks it very well, and Adobe has ensured that there's a viewer for every platform. If conversion tools can be made, then this is a good format.
Images (bitmap): PNG, JPG, GIF, and TIFF. TIFF seems to be less relevant these days although most scanner software still produces it. JPG/GIF are where the majority of data presently exists, and PNG is where everything should be archived, IMHO... PNG being lossless, and supporting about every feature known to man, this seems to be the best solution. One could crawl the web, grabbing every single GIF or JPG, archive it to PNG format with no further loss of data and quickly build a significant archive.
Image (vector): Sorry, don't know much about the formats used here...
Audio: The obvious solution for archival is uncompressed, raw audio in a well understood format like WAV. This is an area that doesn't seem to be changing much...
Video: Again, I can't really comment on the formats here...
Things become more complicated when you have interactive media, or other very specialized forms of data... But I'd rather save that for the experts...
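On the audio point above, part of what makes WAV attractive is how little there is to understand: a short header followed by raw samples. A minimal sketch using Python's standard wave module, writing one second of a test tone and reading it back; the sample rate, tone, and in-memory buffer are arbitrary choices for the illustration.

```python
import io
import math
import struct
import wave

RATE = 8000  # samples per second (arbitrary for this sketch)

# One second of a 440 Hz sine tone as 16-bit signed little-endian PCM.
samples = [int(12000 * math.sin(2 * math.pi * 440 * n / RATE)) for n in range(RATE)]
pcm = struct.pack("<%dh" % len(samples), *samples)

# Write an uncompressed, mono, 16-bit WAV into an in-memory buffer.
buf = io.BytesIO()
with wave.open(buf, "wb") as writer:
    writer.setnchannels(1)
    writer.setsampwidth(2)      # bytes per sample
    writer.setframerate(RATE)
    writer.writeframes(pcm)

# Read it back: the header tells a future reader everything it needs.
buf.seek(0)
with wave.open(buf, "rb") as reader:
    params = reader.getparams()
    recovered = reader.readframes(params.nframes)

print(params.nchannels, params.sampwidth, params.framerate, params.nframes)
print(recovered == pcm)  # uncompressed PCM round-trips bit for bit
```

Since the samples are stored as-is, someone with only the header layout and a hex dump could still recover the audio.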
The author brings up the "loss of fidelity" issue when updating documents to a new format. I think this really only is an issue when making a lateral move. Converting from JPG to PNG wouldn't be a problem, nor GIF to PNG. Converting from WordPerfect to Word on the other hand, is problematic at best...
Thus the need for archival formats with some longevity. Perhaps a commission should be formed on data archival formats? A group of OSS developers who do nothing but strictly define what format(s) are to be used for "data archival" purposes, and ensure that tools to read/write these formats are readily available on every platform -- including new ones as they come out.
The trick is to avoid lateral conversions at all costs.
MrJoy.com -- Because coding is FUN!
I keep what I loosely term a knowledge base: every bit of useful tech data I run across, or have reason to believe I will need again, gets stuffed into a designated folder on my HD and later archived. I have stuff going back to Phrack 4, WordStar copies of C128 documentation, programs I wrote fifteen years ago for a hardware platform that no longer exists, System 3000 performance data, etc. While at the time I put each of them in I had access to the machinery and software to read and run them, much of it is dead now. Now I take the extra step to make sure anything new will be readable in the future. If it requires a viewer, an emulator, etc, they are saved with it. When the day comes that the ia32 everything runs on and the CD the data is held on are deprecated and forgotten, they will be replaced by DVD-ROM and an ia32 emulator before obsolescence becomes an insurmountable hurdle.
We must actively, and over the course of time, make sure what we do is available for posterity. Next time you burn MP3s to a CD-R, burn a copy of the mpg123 source too. Thirty years down the road, the information will be usable to anyone with the ability to read C and a DVD-ROM, even if MP3 is a forgotten format. When CDROM becomes hard to find, copy it to new media. I started on an Atari, and have managed to propagate that data through audio tape, floppy disc, magnetic tape and CD-R with little effort. Preservation shouldn't be an afterthought. Just do it!
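One way to sketch the "save the reader with the data" habit: pack the data, the source of whatever reads it, and a manifest of checksums into a single archive, then verify the manifest after every copy to new media. The file names and contents below are placeholders, and build_bundle / verify_bundle are hypothetical helpers, not part of any existing tool.

```python
import hashlib
import io
import json
import tarfile

def build_bundle(files):
    """Pack files plus a MANIFEST.json of SHA-256 digests into a tar archive."""
    manifest = {name: hashlib.sha256(data).hexdigest() for name, data in files.items()}
    members = dict(files)
    members["MANIFEST.json"] = json.dumps(manifest, indent=2, sort_keys=True).encode()
    out = io.BytesIO()
    with tarfile.open(fileobj=out, mode="w") as tar:
        for name, data in sorted(members.items()):
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return out.getvalue()

def verify_bundle(bundle):
    """True if every file listed in the manifest still matches its digest."""
    with tarfile.open(fileobj=io.BytesIO(bundle), mode="r") as tar:
        manifest = json.load(tar.extractfile("MANIFEST.json"))
        return all(
            hashlib.sha256(tar.extractfile(name).read()).hexdigest() == expected
            for name, expected in manifest.items()
        )

# Placeholder contents standing in for real data and real reader source.
bundle = build_bundle({
    "music/track01.mp3": b"placeholder audio bytes",
    "tools/decoder-source.tar.gz": b"placeholder source for the decoder",
    "README.txt": b"What this is, and how to build the decoder.\n",
})
print(verify_bundle(bundle))
```

Plain tar is used here only because the format is simple and widely documented; the same idea works with whatever container you expect to outlive the media.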
.sig: Now legally binding!
I can't help but wonder what future humans will think of our efforts to preserve information. Will they even have the records that show that we tried to preserve anything? Will they believe the records we leave behind are factual? How much of the fiction that is floating around will be mistaken for fact? How much of the information we currently have will survive only in fragments yanked out of context?
This leads me to wonder how much context information we need to bequeath to our descendants in order for them to be able to understand the information we leave behind. Consider how much information we have from ancient times which we do not truly understand because we do not have enough contextual information to really understand what was meant by this information. Look at how many conflicting translations there are of many of the documents that do still exist.
Even if we manage to prevent the degradation of the media on which the information is stored and the devices and software necessary to read the information are preserved, what of language shifts and culture gaps across time? We will still have the problem of information being lost as meanings of words change with time or as information is translated from one language to another. This is, in fact, exactly the same problem we face with the various software revisions for products like MS Word.
This is not to say, however, that we shouldn't make a significant effort to preserve information. I would also think that having a significant amount of contextual information (which should come along for the ride while preserving information) should help our descendants comprehend the information we leave behind. However, if our current track record for preserving contextual information is maintained, the outlook is not good for our descendants understanding our information in two or three centuries (assuming the information survives).
Well, that's my 93.2 cents worth on the subject.
If it works in theory, try something else in practice.
So if you need to store formatted documents for archival purposes in a system where you may later need to output the documents in a different form, you should look at TeX...
Cheers,
Ben
My usual seat in the cluetrain is at http://pub4.ezboard.com/biwethey.ht
Um, not to be glib or anything, but there are lots of answers all over: http://www.britannica.com/bcom/eb/article/6/0,5716,109286+3,00.html
A couple of days ago the BBC reported in an article entitled Old computers lose history record how archaeological records are being lost due to exactly the issues raised in this story. The story reports that "[ironically, the] archaeological information held in magnetic format is decaying faster than it ever did in the ground".
:-)
So, it looks like we're going to have to start transferring all those old ZX81 game tapes (Timex Sinclair 1000 for our U.S. cousins) to CD-ROM then. That should be good for another 25 years of '3D Monster Maze'.
--
The gift of death metal does not smile on the good looking.
Ya ever put one in the nukelator for a few seconds? Coolest damn coasters I ever made...
Data preservation is not a new problem, it's one that traditional librarians and archivists have been dealing with for the entire 100 years of modern librarianship, and certainly for much longer than that in less academic ways. Can you say acidic paper? How about the restorations of the Mona Lisa and the ceiling of the Sistine Chapel?
It's not at all surprising, to me at least, that this paper was written by somebody at what was once the UMich school of library science, until they discovered that they could pump up their prestige and funding by going dot-edu.
- David
There is some belief that there is no reason to preserve information at all. Most of what is created is just tripe anyway, and we should be more focused on creating content than preserving it. There are two reasons why some sort of preservation is important. First of all, it is inefficient to recreate information that already exists. [Point 1] Human energy is better spent on building upon existing knowledge to create new wisdom. How much do we already spin our wheels as several people collect the same data? What more could we be doing if we spent the energy instead on new pursuits? [Point 2] Secondly, there is some data that is irreplaceable.
Point 1:
With the amount of data that we produce, archiving it will take an increasing amount of time. How much new content is created daily? At best, we will plateau in a state where as much effort is required to archive content as is needed to create new content.
With the emphasis placed squarely on non-duplication of effort, archiving becomes a secondary issue. Indexing, searching, sorting and categorizing of the archive becomes a first priority, since creative efforts should now check if they are redundant.
If the bold statement is to be a guideline, then the idea of an archive is moot, since all new work depends on old work, and so tracks well with where the author feels human effort should go. Much like with biological evolution, new data is the fittest of the old data that was applicable to the new context. I suppose that the call for archives is little more than a suggestion that we need an organized and deliberate fossil record of how we got to where we will be at some point in the future.
What is needed is an archive, yes, but an archive of what? Not of content, but of the essence of the content. The lessons learned, the conclusions drawn and the optimizations realized in the process of creating the content. The content is fleeting - though arguably of inherent value... Which brings us to...
Point 2:
Yes, some things are irreplaceable. Who decides? Who defines what is art, what is fact, and what deserves eternal life?
Some things are of immediate and significant value, but for an unknown duration. The value of other things can not be realized for a very long time, and so the alternative is to store everything. Furthermore, the value of certain data is totally subjective, and this begs the question of "who's in charge" of defining that 1% that is to be kept.
On the small scale, this will lead to vanity. Any 'artist' will consider their work a masterpiece, and save it. (I have code I wrote in CS101, don't you?) Companies will store and archive all email, all financials, anything that can potentially be used to mine data or identify trends or fertilize litigation. People will pigeon-hole videos of their baby's first steps, though nobody outside themselves really cares - unless the child grows up to be the next Einstein, or Hitler.
"Hitler" raises an interesting question on the larger scale. Who has the responsibility of deciding what 'big' facts to store? And isn't that the path to propaganda, history-making, and such things?
And then, when the leadership changes, and the 'book burning' starts...
To bring the concept down from the paranoid-sphere, let's recall the /. article about Nikola Tesla. His work is not well known to most, because it was not made prominent, and subsequently, not well archived. We know of him, and we can dig for more about him, but the credit goes where it may not necessarily belong.
Same issue with Newton and Leibnitz. Leibnitz was the German Mathematician who beat Newton to the concepts of Calculus. Newton, a member of the Royal Academy of Sciences (or something to that effect) politicised HIS influence, and so was credited with all of the work - where his contribution was not complete.
Some things are not outright lies, but oral histories get lost while written records persist.
Who gets to choose what to write down?
-- What you do today will cost you a day of your life.
Perhaps a "resilient disk" standard ought to be created, for stuff you would really like to last. Perhaps a WORM (write once read many) optical disk, like a CDR, but made to be very resilient, perhaps lasting up to a thousand years.
Perhaps they could even be made to work with existing CDROM drives and perhaps even existing CD writers. Then you just start selling a new kind of disk. Anyone that wants something to last, they put on those. If they want lots of space per penny, they can buy something else.
--
grappler
Vidi, Vici, Veni
I find it interesting to think about this from the perspective of the notion of memes. What has evolved from human consciousness is a rich ecosystem that generates and values an enormous diversity of information. Thinking about what will be preserved, and how, gives rise to an image of our several billion minds, aided by technology as simple (!) as spoken language or as complex as electro/magneto/optical storage, operating as a kind of primordial informatic soup.
Out of this fecund brew maybe, just maybe, a carrier as successful as DNA will emerge, with the capability to preserve the "best" of the information. Maybe it already has, in plain old text, which will be decipherable for as long as the bits can be gotten at, and which then has the benefit of the redundancy of human languages for further decoding and understanding. Then we drop down to the question of how exactly the bits manage to survive, and it seems the only ultimate answer is some human has to care enough to refresh them. Or be clever enough to teach them to take care of themselves.
It also seems clearly impossible that everything can be preserved, and also impossible that what is preserved will always be something to be proud of. Some extinctions, however tragic, are inevitable, and some, however richly deserved, never occur. It's part of the beauty (and maybe mercy) of conscious life that there are moments that will never appear again, can never be adequately captured for later replay. Being aware of that fact is what encourages us once in a while to put down the camcorder, shut off the microphones, maybe even try to still the stream of words in our heads, and just drink it in.
Disclaimer: I know I'm being a bit paranoid, but I think this should be brought up, at least for purposes of discussion. There is probably less to worry about here than in other places, but it still should, I think, be in the back of the mind of anyone trying to solve this problem.
One thing I believe was missed in the original article is intentional change to the historical record. In addition to having to store old information, and worry about how we're going to get to it later, I think we need to pay at least half a thought to intentional modification of the historical record.
With paper and ink, it's rather time consuming and expensive to alter historical documents, even assuming you can get near them. With digital media, the situation may be different - it may become very simple to alter historical documents, especially if you're the guy who's in charge of copying them to the newest form of media.
Aside from the obvious political reasons someone might want to do this (can you think of a fundamentalist movement of any sort that wouldn't modify old documents to read the way they would like, given the chance?), I can also see where money might come into play.
For instance, suppose MassiveDrugCo, Inc. is introducing a new drug which prevents newly detected disease Y. Now, in order to sell a lot of this drug, you have to show that Y harms enough people to worry about. Unfortunately, the historical record being used for retrospective studies doesn't show that. So, instead of going back to the drawing board and finding something else to cure, MassiveDrugCo instead feeds a modified copy of the historical data to unsuspecting independent researchers. These honest and unbribable researchers draw the conclusion desired by MassiveDrugCo - in spite of the reality of the situation.
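One commonly used defence against this kind of quiet rewriting, sketched here as an illustration rather than a complete answer, is to publish a fingerprint of the records at archive time and keep it somewhere the person doing the copying cannot change. Below, each record is folded into a running SHA-256 digest, so altering any earlier record changes the final value. The records are made-up sample data and chain_digest is a hypothetical helper.

```python
import hashlib

def chain_digest(records):
    """Fold records into one SHA-256 digest; each step also covers the previous digest."""
    running = b""
    for record in records:
        running = hashlib.sha256(running + record).digest()
    return running.hex()

# Made-up illustrative records, not real data.
original = [
    b"1950: 12 cases of Y",
    b"1960: 9 cases of Y",
    b"1970: 11 cases of Y",
]
published = chain_digest(original)  # stored out of the copier's reach

# Someone "improves" the middle record while migrating to new media.
altered = list(original)
altered[1] = b"1960: 900 cases of Y"

print(chain_digest(original) == published)  # untouched copy still matches
print(chain_digest(altered) == published)   # altered copy does not
```

This only detects the change; it does nothing if the published digest is kept alongside the data and rewritten at the same time, which is why where it is stored matters more than the code.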
A thousand pounds of wood moving at 300 feet per minute. Don't get in the way.
There is also a very well-written, very accessible article on this topic, titled "Saved", available at Wired magazine's archive. It was written by Steven Gulie, in 1998 and I distinctly remember reading it, thinking it had a profound impact on my thinking about this topic.
Take a look. -Paul
The real Paul Vallee is slashdot userid 2192, and, what do you mean it's not cool to point out your low userid?
I think you have to ask, what are you preserving information for?
Are you trying to preserve episodes of the Simpsons so our relatively near term, technologically advanced descendants can watch them? Well, they're technologically more advanced and thus more clever than we; we just need to have sufficiently stable media (micromachined gold plates would work nicely) and either a simple-minded encoding scheme or an easily readable description of the algorithm prepended. In the 22nd century, some bright Norwegian 16 year old armed with a yottaflop computer will figure out how to read it if he cares enough.
A bigger concern (in my opinion) is what happens when our civilization collapses. Historically, it is almost certain to happen sooner or later. Rome lasted well over a thousand years; if you told a 1st century CE Roman that there would ever be an end to the empire he'd think you were crazy. Yet our civilization is in many ways much more fragile because the information it is based on is in much more ephemeral form (both media and format).
What we need is to devise a bootstrap procedure.
(1) Reading primers in various languages.
(2) Primers on basic technology: mathematics, simple mechanics, mining and elementary metallurgy.
These should be in highly durable form, but the problem is that you don't want people making off with them for building materials. The problem with using gold plates is that you don't want people to have access to them until the information on them is more valuable than the substrate. Perhaps these first items could be carved onto stone pillars inconveniently large to move.
Next, you need repositories on more advanced science and technology: chemical engineering, electronics and so forth. Perhaps you could rig a way to prevent savages from accessing these repositories; a mechanical puzzle perhaps, that requires a certain mathematical sophistication to solve. The most critical records could be kept in forms that could readily be read without mechanical assistance or with only simple mechanical assistance such as optical magnification (my local librarian likes microfilm, because she knows it will be readable for decades). Less critical things like old Simpsons episodes could be on very cryptic media that would require considerable technical finesse to read, but would be cheap to transfer to.
Pretty much, as you go from the most basic and critical information to the least critical information, you go from the easiest to read and most expensive to produce per bit, to the hardest to read and most convenient to produce.
Post may contain irony: discontinue use if experiencing mood swings, nausea or elevated blood pressure.
Sure, poems and photos for the grandkids. That's a hundred years, tops, and migration, translation and CDR covers it, fairly easily. As far as showing pictures to people who will have only vaguely heard of me? Or preserving the IRS tax code for four thousand years? Somewhere I'm sure is codified the idea that data is useless without context. If not, there it is, Nyarly's First Thought on information theory. I'm sure it is though...
But my noodlings with fiction, my code, my photos and graphics won't be any more useful without the cultural context they were created for than an arbitrary collection of 16 bits without a description. Is that a Float or a Fixed? Is that English or Spanish?
And if a modern creator does produce something of Eternal Meaning, there's precedent for its propagation by those it has meaning for. Think of the Bible, or the Collected Works of Shakespeare. These continue to exist not because they were recorded perfectly on a perfect medium, but because people found them worthwhile enough to continue them.
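The 16-bit point is easy to show. Here is a small sketch that reads the same two bytes five different ways with Python's struct module; the byte values are arbitrary, picked only because they also happen to be printable ASCII.

```python
import struct

raw = bytes([0x48, 0x69])  # the same 16 bits throughout

as_uint_be = struct.unpack(">H", raw)[0]  # big-endian unsigned integer
as_uint_le = struct.unpack("<H", raw)[0]  # little-endian unsigned integer
as_half = struct.unpack(">e", raw)[0]     # IEEE 754 half-precision float
as_fixed = as_uint_be / 256               # unsigned 8.8 fixed point
as_text = raw.decode("ascii")             # two ASCII characters

print(as_uint_be)  # 18537
print(as_uint_le)  # 26952
print(as_half)     # 8.8203125
print(as_fixed)    # 72.41015625
print(as_text)     # Hi
```

Nothing in the bits says which reading was intended; that has to travel with them as a description.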
What good would a perfect storage method be, anyway? If people forget it, or if they cease to care, a record could be painted in Liquid Unobtainium on God's backside, and it would be just as lost as if someone had scratched it in sand. Or on the base of a bronze statue. "Look on my works, ye mighty..."
Paper rots, stone erodes, metal corrodes. The only eternal medium is word of mouth. Anything else is just a memory aid.
IP is just rude.
Is there any torture so subl
Noooooo one expects the Spanish Inquisition! Oh wait, the church tried to burn that event out of the history books. Stone tablets last the longest of all. They don't burn, don't rot quickly (even when buried in the wet underground).
Due to this post, a monkey was strapped to the back of a motorcycle, which was then sent at great speed toward a freight train. We had meant for the motorcycle to jump over the train; However, our technicians forgot to set up the ramp.
There was a tremendous impact as the motorcycle slammed into the train. Unfortunately, the monkey did not survive this encounter.
LouZiffer
I've been working with the Linux Video group where we've been trying to make an open source player for DVD discs. The ONLY problem that we're fighting right now is not the know-how to get it done, but rather trying to obtain the file format documents for DVD-Video and being able to use them legally. Indeed, the recent deCSS program is another really good example of how file format specifications can be illegal to implement, even if you have obtained the specifications legally.
The DVD Forum (formerly known as the DVD Consortium, which oversees the DVDCCA... this is the group of companies that cross-license each other's patents and share information regarding DVD development) currently requires you to sign a non-disclosure agreement (NDA) to obtain the specifications, and that NDA also prohibits you from even discussing the specifications with anybody unless they have also signed the same NDA. Since this is covered under the trade secret laws, this particular bit of intellectual property is theirs theoretically forever. At least until you can hire a bunch of lawyers to demonstrate that a DVD is no longer a trade secret.
I've also set up a separate mailing list from the main Linux Video group that is in the process of developing an Open Video Disc specification which is trying to allow people to develop products without having to pay royalties or deal with patent infringements. Fees for most of the current video formats range from over $10,000 (for the DVD specs.... license fees are on top of that) to the terms of the MPEG Licensing Authority, which is being quite reasonable for most closed-source projects, but whose licensing requirements, if you read the details, are contrary to the nature of most open-source projects. It is still possible to write a GPL'ed MPEG player, but it would only be free as in speech and not free as in beer. In fact, you would probably have to charge somebody to download the software. Shareware MPEG players are probably skating on some very thin ice legally, and certainly part of the registration costs would have to go to the MPEGLA.
One of the things that is so nice about HTML is the fact that this standard is open, patent and royalty free. If CERN had tried to put a patent on HTML I doubt that the web would have developed nearly so quickly. Or rather imagine if Apple's hypercard system had been developed with the GPL and file formats were made open for anybody on any platform to use.
One of the things that I believe is killing the Unicode character encoding is that all kinds of intellectual property restrictions are placed on it, and you need to pay royalties to develop much software that uses it. Again, think what would have happened with ASCII had it been kept closed up, and why EBCDIC isn't being used for character encoding.
More importantly, open and free specifications are critical to data preservation, and a point that really hasn't been brought up by Calc (the author of the original post on /.).
Many of the earliest mission datasets from the 60s and 70s are unrecoverable due to media degradation and format incompatibility.
We're experiencing this problem because it's the first time we've really tried to store information for long periods of time, /and cared that we got a verbatim copy./
10,000 years past? Word of mouth. Stories handed down from one generation to the other. Want a copy? Listen and remember. Copy quality? As good as your memory. Portability? As far as word travels.
5,000 years past? Stone tablets, paintings, and the like. Want a copy? Make it yourself. Copy quality? As good as your talent. Portability? Can it be carried?
1,000 years past? Paper, but acid-free by accident, and not design. Want a copy? Hire a scribe, or use a printing press. Copy quality? As good as your proofreader. Portability? As far as the traders can sell.
Now? Binary format on varying media. Want a copy? Needs some special hardware. Copy quality? Perfect. Portability? Speed o' light, anywhere, anytime.
One of the email programs that I use stores everything in a database file. Short of saving messages to files, one at a time, there is no way to extract the messages from the database.
Mea navis aericumbens anguillis abundat
--
Time is Nature's way of keeping everything from happening at once... the bitch.
We lack the originals of most historical documents,
but the important ones have been preserved by
constant copying.
On 'Internet Time' a document may last for months
and the speed of copying is seconds.
This compares to centuries in historical time.
One of the big problems with storing data is the sheer size of it. In astronomy, almost all data collected by telescopes, be they radio, optical or otherwise, goes through a stage known as 'Reduction'. I've put this in quotes, mainly because it doesn't necessarily reduce the size of it. In essence, Reduction is about obtaining the most important or most complete information out of the data and discarding or minimizing the redundant, the useless and the misleading out of the data so that future analysis can be carried out on the important stuff without having to wade through all the noise. For instance, 70 or 80 images of one optical observation in various wavelength bands will be collapsed into three to five optimal images, one for each band. In Radio Astronomy, collating 60 - 80 12hr observations into one file removes all the 'bad' data and is optimal for future reuse.
Making a useful archive requires some filtering of what goes into the archive. Nowadays I work for IBM on DB2 UDB, and the roadmaps suggest that the size of databases is growing exponentially - fortunately this is balanced by a proportional growth in both processor power and storage space and access speed. So while we have terabyte databases today, we could easily be looking at petabyte databases in a few years. These databases will probably hold a vast amount of digitized analogue information - memos, diagrams, papers - which currently is stored in more conventional storage. The advantages of moving to a fully digital archive are great - searching and retrieval are faster, and the space saved by putting scans of 20 boxes of papers onto a hard drive or other storage is also great. However, there is a danger with archives growing out of control - if you initiate a search which will visit every part of a petabyte database, you are going to have to wait for it to finish, even with the best search algorithms and vastly faster hardware. Making sure that information is not multiply duplicated in the database, or that redundant data is not added without regard to the database retrieval performance is extremely important. If we set up a project to 'mirror the web' for archival purposes, we'll be hamstringing ourselves right at the start - most data is not needed for future reference. By applying methods to distill the important information, archives can be updated, maintained and searched without exhausting the available resources.
Cheers,
Toby Haynes
Anything I post is strictly my own thoughts and doesn't necessarily have anything to do with the opinions of IBM.
The Internet Archive is devoted to preserving the information contained in the Internet.
And I have just found an article from Steve Baldwin, the guy from Ghost Sites!
--
__
Men with no respect for life must never be allowed to control the ultimate instruments of death.
GW Bu
Part of the solution is to avoid knee-jerk changes in format. For example, the Word file format gets changed every few years, but to what end? ASCII may eventually go out of date (as did EBCDIC), but at least a text file tends to be more future proof than a proprietary binary format. In terms of the web, there's already been a lot of nonsense caused by some people using Flash and other people using style sheets and other people using Microsoft or Netscape extensions to HTML. Is it really worth it? Or would it be better to stick to the least common denominator of pure HTML? I say yes, but apparently a significant number of web page creators disagree.
Digital information can probably be preserved in the same manner as any other. I believe the problem here is preserving information (of many types) that's been recorded digitally.
I see even classic Slashdot is now pretty much unusable on dial up anymore.
It would be interesting to see who's doing research into this. Certainly much research has been carried out into the paper equivalent, and as a result it's possible to get books printed on archive-quality paper 'guaranteed' (well, you stand a better chance anyway) to survive for 500 years.
The archival paper and inks have been created to be as chemically neutral as possible - the great problem with paper is chemical reactions of constituent parts and outside influences (light, heat, humidity, ink) gradually breaking down the material itself. In the British Library (UK national copyright repository) a vast number of books printed in the C19th are beyond salvation, crumbling away; early mechanically made paper is often very unstable.
I am sure that research into this must be being carried out in the same way for CDs etc. Any references anyone?
Archiving is important. I'm actually surprised at the number of /.'ers who just want to let the data die. I remember taking a tour of the Maginot Line in France. Having proven useless as a military outpost, the entire chain of caverns was converted to document storage decades ago. In a thousand years, archeologists will be able to substantially reconstruct life in twentieth century France. Information about births, deaths and marriages need never be lost. Detailed census reports can be preserved so historians can make new theories about the social behaviour of man. I think this is a fairly important task. Imagine how much easier it would be to reconstruct human history if past civilisations hadn't kept shoddy records.
:^) I doubt it would ever be profitable, but museums, even working ones, rarely are. Although who knows? A Commodore 64 could be an objet d'art in a hundred years, just as ugly African masks are now.
I suspect the problem of file formats is less serious than people make it out to be. A well-documented format should be reconstructable indefinitely. Few software companies don't document their file formats. Even without documentation, it ought to be easier than reconstructing dead languages. We learned to read Egyptian hieroglyphs primarily from one attested translation and a lot of careful deduction. Given a thousand Word 6 documents, I think a good computer archeologist ought to be able to construct a program to open and edit them.
Museums of old hardware, and perhaps some sort of custom computer factory to make ancient hardware, strike me as a good idea. It could be like blacksmiths at SCA festivals, "Ye Olde ASIC Mill."
The real problem strikes me as the one most heavily emphasised in the article: decaying media. I suspect the best solution with presently foreseeable technology would be to preserve data in crystalised DNA. Even in nature, DNA takes centuries to decay, and if it were crystalised and kept somewhere cool and dry, it would likely last for millennia. A document encoded onto a billion strands of DNA weighs basically nothing, and it would be a very highly redundant storage system.
It isn't easy to do right now, but I suspect that technology is right around the corner and probably only requires a little bit of research money to become practical.
Worse yet, the labor involved with separating out the 1% of stuff that ought to be kept is going to mean a non-zero error rate; people will toss things that are still of value just because they have no time to examine them in detail. What are you going to do....
--
Time is Nature's way of keeping everything from happening at once... the bitch.
Let's see, they'll need:
I think you get the idea. Today's computer technology is based on a huge amount of knowledge and experience. If we can somehow record all this knowledge in books or other long lasting human readable media, our descendants may be able to recreate what we have today. Maybe in as little as fifty years.
With the current state of technology, data can now outlast Copyrights. If there is anything legally wrong with reverse-engineering a piece of data to extract the human-readable meaning, it won't apply to 20-year-old data since it will only be under copyright if the owner of the intellectual property still values it--in other words, still supports it. Otherwise, the copyright will have lapsed by the time you need to extract that data.
Data formats are created by humans, have structure, and therefore can be dismantled and examined for their contents. The original programs had to do it, why not trust that future generations, who will be competent humans, will also be able to do it. I postulate a law: Xant's law: software containing meaningful data can always be reverse engineered when the need is great enough. BTW, NOT EVEN ENCRYPTED DATA is excepted from this rule. Today's high-encryption standards are tomorrow's trivial joke. According to Moore's law, which appears in recent years to be accelerating, encryption that today would take a million years for a computer to crack could, 30 years from today, be cracked in a single year; and a project like distributed.net could probably do it in a matter of hours--that's assuming Moore's law does not accelerate. Projects like quantum computing raise the possibility that there is no limit to the speed of our electronic brains, nor to their rate of acceleration. A few hours is not such a high cost to decrypt the sort of data that we might truly find valuable 30 years from now, whatever it might be.
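To put rough numbers on the Moore's law estimate above, here is a small sketch. It assumes computing power doubles every 18 months (one common reading of Moore's law, not a figure given in the post) and that cracking time shrinks in direct proportion to the speedup. The function names are made up for illustration.

```python
def speedup(elapsed_years, doubling_period_years=1.5):
    """Factor by which computing power grows, assuming steady doubling."""
    return 2 ** (elapsed_years / doubling_period_years)


def years_to_crack(years_today, elapsed_years, doubling_period_years=1.5):
    """Cracking time on future hardware, assuming time scales as 1/speed."""
    return years_today / speedup(elapsed_years, doubling_period_years)


if __name__ == "__main__":
    # Assumed 18-month doubling: 30 years is 20 doublings.
    print(speedup(30))
    print(years_to_crack(1_000_000, 30))
    # Assumed slower 24-month doubling, for comparison.
    print(years_to_crack(1_000_000, 30, doubling_period_years=2.0))
```

With an 18-month doubling period, 30 years gives 2^20, roughly a millionfold speedup, which is where "a million years today, about one year then" comes from. With a 24-month doubling period the same job would still take around 30 years, so the conclusion is quite sensitive to the assumed rate.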
And 30 years from now, who's to say AI won't be good enough to break down any of the "trivial" data formats we have today into human-readable forms. I can theorize a generalized software algorithm for standardizing data formats.
Hardware obsolescence is a whole 'nother ball of wax, but who's to say in 60 years we won't have a generalized algorithm for pulling data off of hardware?
It's rare that you're presented with a knob whose only two positions are Make History and Flee Your Glorious Destiny.
This is pretty much apropo of nuthin' (sorry), but I thought it might interest folks: the oldest etext availible through Proj Gut is a version of Milton's Paradise Lost. It was originally converted to ASCII in 1964 or 65, and had to be input using IBM punch cards-- something like 100,000 of them. Ah, to be young again, manually punching bits out of cardboard with a sharp stick.
Much Love,
"S"HM
*****
(I refuse to spellcheck out of contempt for your belief system)
If technology ran into its physical limits, progress would become incremental rather than exponential. This would require that the space devoted to data storage increase more or less linearly with the amount of data, but it would have the side-effect of eliminating the need to change storage technologies. If 9-track tape hadn't become a hopelessly obsolete format due to its bulk, we'd have no problems reading those 1985 tapes (assuming the oxide or binder hadn't decayed or fused, but that's another issue). Sometimes progress, by creating a gulf between the present and the past, cuts us off from our own history.
I wonder if I could still find a PET emulator and a copy of TOKER someplace... that would be fun to put out and let people play at a party.
--
Time is Nature's way of keeping everything from happening at once... the bitch.
Sonny Bono was the congressional shill for the Scientology organization, an international criminal enterprise masquerading as a "church". They sell L. Ron Hubbard's mad dribblings and pulp sci-fi as very expensive self-"help" techniques, bleeding people of their money, self-determinism, and self-respect.
They use copyright law (and trade secret law! for a 501-c3???) to obstruct efforts to expose their "courses" for the pseudo-scientific mindfscks they are, and in the process they create huge damage to free speech and civil rights precedents.
Visit xenu.net for more info.
I think the answer is, for the most part, simple: put data on (acid-free) paper. We have empirical evidence that information can be stored on this medium for several thousand years, courtesy of our Egyptian ancestors, with human eyes being the only hardware needed (plus maybe a magnifying glass), and the human brain the only software (plus maybe a Rosetta stone).
Anything important can be printed, whether it's scientific results, historical information, statistics, or even just pictures.
Instead of inventing more complicated machines to store information, we should be looking at the most simple ones we already have.
But, isn't this missing the point?
The problem exists because products like Word build in incompatibilities to force consumers to always purchase the newest product. We don't have to accept this.
The solution is to promote open document standards for everything. This should be part of the decision process when organizations are choosing applications.
Hopefully, in the near future, we will be able to choose an office suite that stores everything in XML format, and uses open object types like PNG or JPG images.
Also, exporting images to a format like PDF or PostScript would solve a lot of problems. Open Source applications exist for both of these formats, ensuring that you are not at the mercy of the application vendor.
Film negatives.
Film negatives are already an effective storage method. They can be quite compact (depending on film grain size) and they have a much longer shelf life than digital storage media.
Deep Time : How Humanity Communicates Across Millennia
by Gregory Benford
From Library Journal
Professor and distinguished sf writer Benford (physics, Univ. of California, Irvine; Foundation's Fear, LJ
3/15/97) adds another reflective title to his large and rapidly expanding oeuvre. Hearty and compelling, his new
book elucidates some of the inherent problems humanity faces in communicating over the expanse of time.
How will the hazards of, say, stored nuclear waste be communicated effectively to future generations? The
prospect of leaving long-lasting, or "deep-time," messages is perplexing. This slim book addresses
environmental issues in order to change how we think about the human impact on Earth; the goal is to make
us good stewards. In the section "Digital Immortality," Benford writes one of the finest brief explanations of
the limits associated with document preservation in a digital age. Much of the overall analysis seems
somewhat anecdotal, but given the speculative nature of the subject, this sort of approach may serve as well as
any other. Recommended for all public and academic libraries.--Dayne Sherman, Hammond, LA
Actually, ancient papyrus pieces are more likely to stick around than many modern paper examples. The acidity of most modern papers tends to make them much more fragile than older papers. This has been quite a problem for the Library of Congress.
They work for the Eloi, so they're good enough for me.
As long as the Morlocks haven't bred out the hand/eye coordination necessary to keep spinning them, we should be just fine.
--
Nature created the ultimate way of storing digital data, DNA, for a long time. It's called replication. The SEX OF DATA. It is the _only_ way to keep data around a long time. Otherwise natural disasters, war, mistakes, etc. will contrive to slowly remove it from the pool.
How do you ensure replication? The same way nature does -- promiscuity -- your data has to be online and available and alive. Data stored on a CD-ROM in a drawer is dead and will disappear simply because of obscurity. How many "old" documents were thrown away in the year 1 AD because they were considered not relevant or important by their contemporaries? Almost all of them. How important are they now -- very much.
Storing data is not about technology -- it is about people.
The best solution I have seen was an open-source project to turn everyone's spare hard drive into a giant distributed RAID of some kind. Never heard what happened to it.
Strangely enough, this is something I've been dwelling on a lot lately. Everyone praising the web based news publications (katz! bah) and online magazines seems to always overlook the fact that once the issue is gone off the web, it's usually gone forever. And if not now, where will it be in 40 years? I can still go down into my basement, look through my family's huge collection of periodicals, and find issues / articles from decades ago. With the quick pace of the web, such an act doesn't seem like it's going to be feasible.
I don't know about anyone else, but there's something disturbing about this fact. Even the first few sites I did back in the early / mid 90's have been lost forever, and while they were fairly insignificant, it's not an uncommon occurrence for information to be lost.
--[shangodee]
In preserving data for periods of 1000 years and more you can't rely on:
:-))
* Keeping hardware to read the data.
* Converting data to another media at regular intervals.
* Any assumption of best/most common format of the data.
You must presume that:
* The (eventual) future civilizations can manufacture the necessary hardware.
* And that they will have the mental ability to reconstruct the data format and software. (An example: the Egyptian hieroglyphs.)
This means that:
* The media should rely on minimal "mechanical" requirements (rather a CD-player than a card reader...)
* They should have a description on the physical containment (of the media).
* The media should preserve the data for at least 10000 years.
The real problem is the last above, of course.
The solution is probably something along the lines of a CD (physical "marks") rather than something like a magnetic tape (magnetic/electrical charges).
Thomas Berg
Mundus Vult Decipi
Nanobots with all the information, self replicating and self maintaining.
Nanotechnology is the answer to everything. :)
And throw in statues of Stallman and Torvalds looking visionary or something.
"Reactionaries must be deprived of the right to voice their opinions; only the people have that right." - Mao
This is pure speculation on my part. Bear with me.
There have been various discussions in things like National Geographic and TV documentaries about the ancient Egyptians. Specifically how they managed to build the pyramids and so on. I vaguely remember one discussion where they suggested that the Egyptians were quite advanced technologically, and they had tools and skill that were subsequently lost over the thousands of years.
Here's where the speculation comes in. Could it be possible that the ancient Egyptians had technology on a similar scale to what we have today, and the reason we don't realise it is that thousands of years ago they suffered the same problems that we have now, not being able to store information suitably?
What if 5000 years from now, all our carefully archived information will also have been lost due to the issues raised in the subject article? It could happen that the civilisation that exists in AD7000 knows about as much about us as we do about the ancient Egyptians.
(Not intended to be a discussion about Egyptians specifically. I'm curious to know what would result if digital storage really did become a significant problem.)
"Well, good luck finding a judge that doesn't run a bestiality site."
What about a project where originators could submit their new formats, and then open source developers could incorporate them into a program used for viewing all (similar) formats?
I will get the mountain, someone else get the chisels.
Just remember to keep your 1 's straight and your 0 's round.
Jesus may love you, but I think you're garbage wrapped in skin.
A choice of masters is not freedom
Reading over many of the comments on decay of digital media it occurs to me that many people are missing the point that digital data is really analog when you get right down to the fundamental formatting. (until we're storing data in quantum media that is...)
Even if a standard CD player can't play a degraded CD, if someone wants the data badly enough, they'll build an error correcting CD player that will reconstruct bits that a normal player can't read. Just like archeologists reconstruct paper or hieroglyphs or fossils today, future archeologists will no doubt reconstruct CDs and hard drives.
Even today, data recovery specialists can read off multiple generations of files. Maybe archeologists will have optical readers which will read the CD/magnetic surface at many times their original resolution / sensitivity and reconstruct the data. Of course it would be nice for us to leave them some equivalent of the Rosetta Stone so they can decipher the various formats. But overall, I think today's digital media will be far more recoverable than people might think.
Just a thought.
-dialect
To store the more advanced data (like how to make that super-duper AI supercomputer which will end up destroying the entire human race) you could bury it on the moon -- or maybe leave it floating in space somewhere.
One thing I'd like to do is make analog back-ups in case digital preservation methods fail for whatever reason. (Like the fall of human civilization into another dark age, perhaps? -Where we no longer have any sockets to plug our computers into. . .) -Paranoid, sure, but hey. If you prepare against the worst case scenario, then you'll likely do alright against everything else.
I've found a good paper manufacturer, (no-acid, non-bleached cotton fiber paper capable of lasting hundreds and hundreds of years), but I've been waiting for a decent print technology to come along which can output from a computer at a consumer level cost, (offset, litho and web presses, while capable of the task are WAY too expensive for one-offs), and which offers a high enough dot resolution and which uses a highly stable ink. -And which can print at about 11"x17".
An Imagesetter might be the best option, and silver halides on photographic paper can be quite stable, but I don't know about the paper substrate itself. . . And either way, most of the earlier artwork was done entirely in 'Analog', but done on deteriorating paper, but I'm having trouble even scanning black & white dot screens, (newspaper grey tones), without disgusting interference patterns popping up.
Plus I've NEVER seen any computer peripheral company even mention the long-term stability of their printer inks.
Anybody know anything which might be of use?
Deep Time : How Humanity Communicates Across Millennia
By none other than physicist and Sci Fi author Gregory Benford.
The book is non-fiction and takes a serious look at how to convey information across 1000s of years....
Actually ... 2 points. 1) languages in isolation do not change much at all. Examples include: small groups in Switzerland and one of the best examples: Iceland (Icelandic is 'practically' Old Norse) 2) languages in contact regularly change. In fact, the 'data' you presented supports this: the areas you mentioned aren't isolated (unless you mean far away from Europe or North America). Within a small area there are many languages; through contact and a need to differentiate themselves they are quite different. If you want more data and more articles on this process, I would be more than willing to provide the linguistic evidence.
Regnant populi. (The people rule.) Pregnant ropuli. (The snake will soon lay eggs.)
+------->
Some of us don't have English as our mother tongue. That is something that is too often forgotten at places like Redmond.
(OK this is slightly OT, but I'll rant anyway.)
Between DOS and Windows that company we all love to hate decided to change character sets. Suddenly three letters in the Swedish alphabet have a new character code. One and a half decades later (count that in internet time...) we are still struggling with documents with mixed encoding.
That means every damn application has to provide a way to recode OEM to ANSI. AND deal with users who try to do this conversion on files already converted.
This is *before* dealing with unix and mac files.
So if we can't read freaking text files after ten years, how are we supposed to read binaries?
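For anyone who hasn't been bitten by this, here is a rough sketch of the OEM-to-ANSI recoding and of what happens when it is applied twice. It assumes the DOS files use code page 437 and the Windows files use code page 1252; the post doesn't say which code pages were involved, and the function name is just for illustration.

```python
# Assumed code pages: cp437 for the DOS "OEM" set, cp1252 for Windows "ANSI".
OEM = "cp437"
ANSI = "cp1252"


def oem_to_ansi(raw: bytes) -> bytes:
    """Recode DOS (OEM) bytes as Windows (ANSI) bytes."""
    # Characters with no ANSI equivalent become '?' instead of raising.
    return raw.decode(OEM).encode(ANSI, errors="replace")


text = "åäö ÅÄÖ"
dos_bytes = text.encode(OEM)        # how a DOS-era file stores the letters
win_bytes = oem_to_ansi(dos_bytes)  # converted once: still the same letters

once = win_bytes.decode(ANSI)
twice = oem_to_ansi(win_bytes).decode(ANSI)  # converted again: garbled

if __name__ == "__main__":
    print(dos_bytes.hex(), win_bytes.hex())
    print(once == text, twice == text)
```

The same letters get different byte values in the two code pages, so nothing in the file itself tells you whether it has already been converted. That is exactly why the second conversion silently destroys the text.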
Sometimes I just get too tired...
All opinions are my own - until criticized
try Longnow.org for more ideas along those lines.
.
. hmmm
ASCII a trade secret?
For some reason, i found that very funny.
Can you say LZW?
Terrorists can attack freedom, but only Congress can destroy it.
The secret of lasting information is copying it over and over. Digital information has to be copied from one medium to the newer as long as the hard- and software to read and write it is available. Interconnectivity is necessary.
How long do books last?
Not talking about American paperbacks but of real, leather bound volumes with acid-free paper, those might last 100-500 years depending on how they are stored and treated.
The only information that will last is the _written word_, but whatever material you print or write it on, it has to be *accurately* copied over and over.
Think of the Bible: Professional copyists worked hard to preserve the information and we (at least specialists in Hebrew, Aramaic and Greek) can still read and decipher the 10000s of handwritten copies nowadays. Why? Because the hard- and software to decode it is built into our heads and this hard- and software replicates at the same pace as the population! Interconnectivity is included by means of the spoken or written word too. This, of course, implies education. I lament that fewer and fewer people are good readers. Guess which development of the past 50 years this is due to!? I wish more money were spent on education than on the development of technical gimmicks.
Finally: Who or what decides which information is worth preserving?
most translations have 1000's of words wrong, which can hugely distort the facts that are actually in there.
remember that it was not originally in english. of course, now the zealots would hate to see the red sea be changed to the reed sea.
music - http://www.subatomicglue.com
Often, hugely important texts have been preserved by chance, despite people's indifference, sometimes even despite people's efforts to destroy those texts.
Bach's magnificent sonatas and partitas for solo violin were found among some papers that were destined to be used for wrapping butter.
Because Tristan and Isolde was considered, in the Middle Ages, to be a morally and politically dangerous story, only one manuscript has survived of the original version, and it's very incomplete.
The texts that we now have from ancient Greece were preserved by Byzantine scribes who recopied them over and over as the copies decayed. It just so happened that in the 15th century (if I remember correctly), some Byzantine scholars went to Italy, and brought a selection of masterpieces with them. Soon afterward, the Byzantine empire was destroyed, and all its ancient Greek manuscripts were lost; the only ones left were the ones that the scholars had happened to bring with them to Italy. And those are the ones we have, the ones from which the Western world has learned Ancient Greek, which had been forgotten in the intervening period.
--
Heck yeah! It is even better if you put like 6 or 7 on top of each other and nuke em! My findings say that the dark blue ones don't make as many coasters from write errors and neither do they make pretty coasters when they come out of the nuke-o-matic.
Sponge!
This is not an official post from Larry.
There isn't a Slashdot Giveaway
This is a bored individual who enjoys misleading people and generating unnecessary email.
Official VA promotions will always be posted on the VA Linux website.
Sorry for the confusion that has been created.
--Kit
Former Inmate, VA Linux Sanitarium
Given our organization's mandate, I thought I should throw in my $.02.
Although still ramping up and learning how to make things work, we are
trying to ARCHIVE THE ENTIRE INTERNET FOREVER. Crawling or other
forms of collection are used to download the information, and we store
everything on hard drive. We plan to have about 100TB of HTML,
images, Usenet, streaming media, etc. within two years, and we have
some collections that reach back to 1996.
Currently, we do no backups of the hard drives, because given their
low failure rate (about 1% in our history), it's less lossy overall
to use that space for new data rather than redundancy. By the time we
reach equilibrium with the Internet so that our download rate
approaches the information generation rate of the Internet, we'll have
some sort of backup mechanism in place. Probably software RAID of
some form.
As time passes, we will copy data to new media, but because it will be
on disk, this will be much easier than if it were on tape or printed. I
have a vision that in the long run, we may be able to use something
like an Intermemory (intermemory.org) to create a distributed
filesystem that is the storage analog to distributed.net. In an
intermemory, folks donate storage space, so that collectively, a huge
amount of capacity is available. A lot of redundancy is used so that
earthquakes, floods, govt. coups, and massive hardware failures are
still unlikely to result in data loss. As folks' PCs fail or are
upgraded, they simply plug in the new storage unit (hard drive,
holographic, etc.) and their part of the intermemory is reconstructed
(like RAID 5).
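As a rough illustration of the "(like RAID 5)" reconstruction idea, the sketch below keeps one XOR parity block alongside several equal-sized data blocks and rebuilds a single lost block from the survivors. This is only the textbook single-parity scheme, not a description of how Intermemory actually encodes data, and the function names are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_parity(data_blocks):
    """Parity block stored on one more (donated) drive."""
    return xor_blocks(data_blocks)


def rebuild_missing(surviving_blocks, parity):
    """Recover the single lost block from the survivors plus parity."""
    return xor_blocks(list(surviving_blocks) + [parity])


if __name__ == "__main__":
    blocks = [b"abcd", b"efgh", b"ijkl"]  # three donated drives, equal size
    parity = make_parity(blocks)
    # Pretend the drive holding blocks[1] died and was replaced.
    recovered = rebuild_missing([blocks[0], blocks[2]], parity)
    print(recovered == blocks[1])
```

Single parity survives the loss of any one block per group. Riding out earthquakes and coups would need considerably more redundancy than this, spread across many sites.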
There have also been comments about how to handle (index/search/browse)
so much data if it is all archived. This is an area of active
exploration in which we are working with research groups and others.
Generally, we've found that working with flat ascii files and perl
scripts is one of the few approaches that scales up to TB of
information on reasonably priced hardware.
From a fanciful perspective, I see us eventually being something like
the "Library Institute" of David Brin's books, or being the digital
analog to the Library of Alexandria. As we are a non-profit, access to
our archives is freely available (see archive.org) and we
encourage users of a broad range of types. If you are interested in
seeing a large scale implementation of archiving heterogeneous digital
information, check us out. As a shameless plug, we are also looking
to hire developers and researchers. What we develop is open source
and we encourage its dissemination.
Kurt Bollacker
Technical Director, The Internet Archive. (www.archive.org)
I think the author (and others), first of all, worries too much about data obsolescence, especially due to software or hardware obsolescence. The author fears that some day, a certain brand/style of computer (or all of them) will become antiquated to the point that the last one will break, and that will be the end of any data stored in a format which "only" that machine can read.
Not gonna happen! Are there many machines -- not just computers -- from history which later humans haven't been able to repair and get to work again? Or even rebuild from ancient pictures and documents? I can't think of one.
Nor am I worried that the lack of commercially available or even skilled help for repair of old computers will mean that we will be hopelessly unable to resurrect them. Groups like the l0pht have done wondrous things in the area of resurrecting old computers; from rebuilding an old VAX, to running a web server on a Mac Plus, and various reverse engineering of both antiquated and SOTA devices. Similarly my grandfather, a retired marine engineer, works at the railroad museum in Florida repairing old steam engine trains.
Should we be worried about people not being able to fix an Apple -- the first of which was built by two guys in a garage -- when three college students can build a nuclear breeder reactor under their bed? (See past /. story on that one.)
.....
On the other hand, the author worries a lot about software/hardware obsolescence as a threat to data persistence. What about the bogon factor? Data maintainers are worried about two big things when it comes to losing data: accidental deletion, and hardware failure. They're not worried about, for example, DDS tapes being discontinued, because they know they will eventually have to upgrade their backup methods to new technology. But if their backup tape gets caught in the drive mechanism, or gets immersed in water, or some fool munges the backup, or writes over an old tape... These things are the real big problems, and I dare say much more data is going to fall to human error and natural disaster than to data formats becoming obsolete.
Tell me, what sort of really important, crucial data is sitting on old media? I know that certain data tapes from mid-20th-century censuses are supposedly "lost forever" due to hardware obsolescence. But is the data on those tapes really useful? In other words, is there anything useful on those tapes that isn't in another format already (books, documents, etc.)? I think not.
I'm paranoid about losing my own data. I still migrate old disk drives from new machine to new machine because they contain old data which I can't replace (it's mostly all original work). I even shelled out top dollar for disk drive scanning software to recover data on a disk I was forced to reformat. Eventually I will back it all up, or copy it to a new drive. But if I were to delete any of that stuff by accident, and lose it to other disk operations, it would be gone, gone, gone. That's my real fear.
Terrorists can attack freedom, but only Congress can destroy it.