Archiving Digital Data an Unsolved Problem
mattnyc99 writes, "It's a huge challenge: how to store digital files so future generations can access them, from engineering plans to family photos. The documents of our time are being recorded as bits and bytes with no guarantee of readability down the line. And as technologies change, we may find our files frozen in forgotten formats. Popular Mechanics asks: Will an entire era of human history be lost?" From the article: "[US national archivist] Thibodeau hopes to develop a system that preserves any type of document — created on any application and any computing platform, and delivered on any digital media — for as long as the United States remains a republic. Complicating matters further, the archive needs to be searchable. When Thibodeau told the head of a government research lab about his mission, the man replied, 'Your problem is so big, it's probably stupid to try and solve it.'"
I can't wait to hear Microsoft's explanation why the project should use one of their proprietary formats.
Apology to Ubuntu forum.
So, they're shooting for about 10 years then?
than the previous ages where all information was kept on paper or in spoken words? The problem isn't so much how to invent something that will always be readable, but some way to always have the applications to read it. If it were not for the Rosetta Stone, much of what we know about the ancient world might still be a mystery.
Support NYCountryLawyer RIAA vs People
Worked for the Egyptians didn't it?
So rise up, all ye lost ones, as one, we'll claw the clouds.
Working at a university, this is not a subject I'm unfamiliar with. We've had lots of discussions about this. Everyone always talks about how many zillions of "pieces of information" are out there. The number of web pages in existence is always bandied about. My point in these discussions is that most of what's out there is crap. Humanity is not lessened by its loss. Good stuff gets reproduced, reviewed, studied, dissected, etc. and survives. It *is* stupid to try to solve this problem, because the problem doesn't need solving.
There exists no way of exchanging information without making judgments. --Bene Gesserit Axiom
Since I shoot RAW, I also burn a copy of dcraw.c onto every disc - so even if the current platforms fall by the wayside, there will still be code to convert the files.
;)
Storage itself? Currently burning onto Delkin Archival Gold, storing cool and dark, and in two physically distant locations.
They're also stored on my hard disk, and the best are backed up onto a USB drive.
If it looks like the DVD-ROM drive is becoming obsolete I'll burn them onto whatever comes along next.
If you're truly paranoid you can always print them on archival-quality paper using pigment-based inks.
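Keeping copies in two places only helps if you can tell when one of them has gone bad. A minimal sketch of one way to check, assuming you record a SHA-256 digest for every file when you burn the disc and re-check it later. The function names (`sha256_of`, `build_manifest`, `verify_manifest`) are made up for illustration; only `hashlib` and `pathlib` from the Python standard library are used.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root, manifest):
    """Return the sorted relative paths that are missing or have changed."""
    root = Path(root)
    bad = []
    for rel, expected in manifest.items():
        target = root / rel
        if not target.is_file() or sha256_of(target) != expected:
            bad.append(rel)
    return sorted(bad)
```

Build the manifest once from the master copy, store it alongside each disc, and run `verify_manifest` against every copy before migrating to the next medium. Any path it returns should be restored from the other location.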
There are only two ways of doing this: keeping a copy of every program used to create these files (and a system to run them on) or converting them to some open and well-supported format.
For text documents, HTML is probably the best bet. It is so widely used and supported that readers are almost guaranteed to exist as long as computers do in their current form. (And if something ever truly supersedes it, a mass-conversion program will be written anyway.) HTML probably works for basic spreadsheets too. Graphics support for GIF, JPEG, and PNG is probably at that level as well, and MP3 for music.
As a bonus, most of the native programs for the documents to be preserved have translators to these formats already.
Beyond that I have no idea.
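Before converting anything, an archive would need to know which files are already in one of those widely supported formats. A rough sketch, assuming a check of the leading "magic" bytes is good enough for a first pass. The function name `sniff_format` and the table are hypothetical. The signatures are the standard ones for GIF, JPEG, and PNG; MP3 is only recognized here when the file starts with an ID3v2 tag, and HTML has no fixed signature so it is left out.

```python
# Leading byte signatures for a few widely supported formats.
# This table is illustrative, not exhaustive.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"ID3", "MP3 (ID3v2 tag)"),
]


def sniff_format(data):
    """Return a format name if the bytes start with a known signature, else None."""
    for signature, name in SIGNATURES:
        if data.startswith(signature):
            return name
    return None


def sniff_file(path, header_size=16):
    """Read only the first few bytes of a file and classify them."""
    with open(path, "rb") as handle:
        return sniff_format(handle.read(header_size))
```

Anything that comes back as `None` is a candidate for the other option above: keep the original program around, or find a converter.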
'Sensible' is a curse word.
From TSA: "Popular Mechanics asks: Will an entire era of human history be lost?"
Obviously not; Popular Mechanics itself has preserved much of the era in traditional hardcopy formats, making it no less lossy than previous printed-word eras.
Of course, understanding the era from such incomplete and unreliable records will be a challenge to archaeologists and historians; again, not much different from previous eras.
In conclusion: doesn't matter, hardly news.
Any sufficiently well-organized community is indistinguishable from Government.
I'd trust that guy. If there's one thing our government knows, it's stupidity.
"Was it a millionaire who said 'Imagine No Possessions?'" -- Elvis Costello
In this era of virtualization, the solution for x86 software is as easy as retaining a copy of the primary partition of a computer originally used to work with the desired files. Searchability could be a problem for proprietary data formats, but the move to open standards in the future will mitigate that.
The real problem is 60 years of archives of antiquated, proprietary, task-specific and mainframe computer data cards and tapes whose original programmers are halfway to cedar boxes; if the government can't get their support in time it may as well call all the early stuff a loss and hand it over to archaeologists.
(It's never too late to join the Renaissance)
It really isn't a question WHETHER we will be able to read old digital data in the future. After all, humans invented these formats, flawed as they may be, and humans can decipher them with enough effort. We can crack cryptography -- a deliberate attempt to make it as difficult as possible to decipher certain information. So it's hard to imagine any data format that could not be deciphered in the future with some honest effort.
Instead it is a question of whether the data is WORTH the effort. From an anthropological standpoint, this is valuable historical data, and its value is not decreased by our inability to interpret it. The benefit of digital data is that it can be copied even if we don't know what it means. It will not erode or decay like other historical artifacts, if we put in the small effort required to preserve it. Assuming humanity doesn't self-destruct, there will be plenty of time in the future for historians to decipher and interpret the data when a need arises for it.
I believe Ray Bradbury had something to say on this subject.
Perhaps more ironic -- it's a pretty good bet that whatever he wrote on the subject, it's not available online due to copyright restrictions imposed by his publisher or "estate."
"Ladies and gentlemen, my killbot features Lotus Notes and a machine gun. It is the finest available."
Unless I miss my guess, Google will continue towards its stated objective of making all the world's information searchable and retrievable. Want something archived? Google will take care of it. And if Google fails, my suspicion is the entity that takes their place will take it on.
I ask: has this ever happened before?
Not necessarily in electronic bits and bytes. Not the "Alexandria Library" that was mostly duplicated in other libraries or private collections. Maybe like the Inca quipu, mats of knotted strings that recorded all their empire's operational records, other than the ceremonial records in statues and murals. But some quipu survive, despite Spaniards destroying most of them in the mid-1500s. Enough that we can at least recognize that they did have records of lots of transactions.
No, something more transient, as transient as our bits, read/written by something more transient than our metal/plastic/glass machines. Maybe songs or other performed stories, like tribal Australians. Maybe woven in more degradable material, like uncured plant matter. Maybe both, like the Pacific star navigation lore taught in temporary woven sticks, but carried in the mind. Maybe patterns in some other loseable medium, like animal pelt patterns no longer readable now that the code has been lost, or interbred back into "blankness".
If it can happen to us, it could have happened before. Our civilization rose from meager beginnings only about 12K years ago, after the last Ice Age that lasted about 12Ky. There was another one before that, with people accumulating knowledge between. And probably a half-dozen or so others since we became as genetically developed as we are today, between 7Mya and 200Kya. We don't even have many records from the first half of the last 12Ky. Could we be reinventing the wheel, literally, every 25 thousand years?
--
make install -not war
I doubt you'd sell many Nano-Pump (tm) enlargement kits. It's all in the name, even in the future.
SAILING MISHAP
Yep. Microsoft's commitment to their "Plays for Sure" campaign with the Zune really instills confidence in their backwards compatibility.
At least with OpenOffice I can legally archive the source code and install images needed to access the data for that period (say, every year or six months.) Sort of like dropping a copy of TrueCrypt on a DVD full of crypto archives.
With the new DRM keys and license enforcement policies, I dread someday trying to resurrect an old image so I can access data archives, only to find it wants to register with a DRM verification service that no longer runs or is no longer compatible with a 4-5 year old install image.
I do not fail; I succeed at finding out what does not work.
This reminds me of the study done for the Waste Isolation Pilot Plant (http://downlode.org/Etext/wipp/#executivesummary). The study looked at how to mark the site in such a way that the purpose of the site would be indicated for 10,000 years.
While the WIPP site won't have the benefit of constant updating of the media (it's designed to survive on its own for 10,000 years) it does address some of the same points: longevity of the media, a format that will be usable into the future, and ability of future civilizations to understand the message.
Off-topic perhaps but an interesting read.
Government's idea of a balanced budget: take money from the right pocket to balance...oh who am I kidding?
Yo mama so fake, she failed the Turing Test.