Archiving Digital Data an Unsolved Problem
mattnyc99 writes, "It's a huge challenge: how to store digital files so future generations can access them, from engineering plans to family photos. The documents of our time are being recorded as bits and bytes with no guarantee of readability down the line. And as technologies change, we may find our files frozen in forgotten formats. Popular Mechanics asks: Will an entire era of human history be lost?" From the article: "[US national archivist] Thibodeau hopes to develop a system that preserves any type of document — created on any application and any computing platform, and delivered on any digital media — for as long as the United States remains a republic. Complicating matters further, the archive needs to be searchable. When Thibodeau told the head of a government research lab about his mission, the man replied, 'Your problem is so big, it's probably stupid to try and solve it.'"
than the previous ages where all information was kept on paper or in spoken words? The problem isn't so much how to invent something that will always be readable, but some way to always have the applications to read it. If it were not for the Rosetta Stone, much of what we know about the ancient world might still be a mystery.
There are only two ways of doing this: keeping a copy of every program used to create these files (and a system to run them on) or converting them to some open and well-supported format.
For text documents, HTML is probably the best bet. It is so widely used and supported that readers are almost guaranteed to exist as long as computers do in their current form. (And if something ever truly supersedes it, a mass-conversion program will be written anyway.) HTML probably works for basic spreadsheets too. Graphics support for GIF, JPEG, and PNG is probably at that level as well, and MP3 for music.
As a bonus, most of the native programs for the documents to be preserved have translators to these formats already.
Beyond that I have no idea.
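The conversion route described above (re-saving text as HTML) can be sketched roughly as follows. This is only an illustration under simple assumptions: the input is plain text with blank lines between paragraphs, and the function name `text_to_html` and the default title are made up for the example.

```python
import html


def text_to_html(text: str, title: str = "Archived document") -> str:
    """Wrap plain text in a minimal HTML page.

    Assumes paragraphs are separated by blank lines; each one becomes a <p>.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    # html.escape replaces &, < and > (and quotes) so the text cannot be
    # mistaken for markup by a future reader or parser.
    body = "\n".join(f"<p>{html.escape(p)}</p>" for p in paragraphs)
    return (
        "<!DOCTYPE html>\n"
        "<html>\n<head>\n"
        '<meta charset="utf-8">\n'
        f"<title>{html.escape(title)}</title>\n"
        "</head>\n<body>\n"
        f"{body}\n"
        "</body>\n</html>\n"
    )


if __name__ == "__main__":
    print(text_to_html("Engineering plans, rev. 3\n\nLoad limit < 5 tons"))
```

Real word-processor files carry formatting, tables, and images that a sketch like this ignores; the point is only that the target format is simple enough to generate and read with very little tooling.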
'Sensible' is a curse word.
This isn't the '80s, and almost any file being saved in archives is in a format that many programs can open, meaning that the specifications for those formats are known... regardless of whether or not it is legal. Even Word files are viewable by a number of applications, and nobody is archiving historical information with advanced macros, so don't even post with that macro crap.
Also to assume that future generations won't have the sense or ability to figure out how to open files we write is silly.
Just because "some" businesses (or the military, as the article suggests) find opening archived information ON THE FLY difficult doesn't mean a (more technologically advanced) society wanting to learn about its past will have the same limitations. This article is just another example of entry-level "tech writers" and of how low journalistic standards are.
PS
I am not a journalist... so save your grammar and spelling corrections for someone who is.
From TSA: "Popular Mechanics asks: Will an entire era of human history be lost?"
Obviously not; Popular Mechanics itself has preserved much of the era in traditional hardcopy formats, making it no less lossy than previous printed-word eras.
Of course, understanding the era from such incomplete and unreliable records will be a challenge to archaeologists and historians; again, not much different from previous eras.
In conclusion: doesn't matter, hardly news.
Any sufficiently well-organized community is indistinguishable from Government.
Things like music, TV shows, movies, literature, toys, magazines, etc. are all cultural products. For future generations we need to keep records of these items as much as family trees, great stories, buildings, etc.
;)
Besides, who's to decide what is 'crap' or not? To the untrained eye, a clay pot from Egypt might not look interesting. Its color, shape, condition, etc. might tell someone who used it, why, and what cultural value (symbolism, usefulness, etc.) the pot actually had. And culture evolves from culture. Keeping a record of everything we produce allows future generations to inform themselves of who we were and what we did. Quality of the information itself is really unimportant.
Only thing I'd have to add: I wish future generations all the luck in sorting through our garbage piles and recycling/salvaging what they can. If anything, this amount of waste - or crap - is a record of us as much as anything. I can agree with you on this point about crap in our culture!!!
It really isn't a question WHETHER we will be able to read old digital data in the future. After all, humans invented these formats, flawed as they may be, and humans can decipher them with enough effort. We can crack cryptography -- a deliberate attempt to make it as difficult as possible to decipher certain information. So it's hard to imagine any data format that could not be deciphered in the future with some honest effort.
Instead it is a question of whether the data is WORTH the effort. From an anthropological standpoint, this is valuable historical data, and its value is not decreased by our inability to interpret it. The benefit of digital data is that it can be copied even if we don't know what it means. It will not erode or decay like other historical artifacts, if we put in the small effort required to preserve it. Assuming humanity doesn't self-destruct, there will be plenty of time in the future for historians to decipher and interpret the data when a need arises for it.
Expanding copyright protection to a term equal to two lifetimes means that now even some of the good stuff is being lost because it is not allowed to preserve it.
If preservation is outlawed, only outlaws will be preservationists.
I believe Ray Bradbury had something to say on this subject.
KFG
I've been wondering, with our global nature now, will we need archaeologists in the future? While I believe civilizations will surely 'collapse', won't we all be around to immediately take note of it, and update Wikipedia? Seriously, I don't think we're going to be digging for stuff from this time; the global nature of our society leads me to that conclusion. It's not like when Greek society fell.
the trick is... hoping that in a hundred thousand years people still care at all about their past. The slow realization as I read Isaac Asimov's Foundation saga about the origins of the Galactic Empire chilled me, mostly because the people of the empire had become so numb to their past as to have made it vanish entirely.
Yes, exactly. It's the ephemera that tells you what life was like in any given era, not the palaces, official monuments, etc.
I'll wager you could reconstruct far more about the culture of early 21st century from the contents of a convenience store than that of the White House. There's a big gulf between who a people are and the mask they present to the world.
I believe Ray Bradbury had something to say on this subject.
Perhaps more ironic -- it's a pretty good bet that whatever he wrote on the subject, it's not available online due to copyright restrictions imposed by his publisher or "estate."
"Ladies and gentlemen, my killbot features Lotus Notes and a machine gun. It is the finest available."
Expanding copyright protection to a term equal to two lifetimes means that now even some of the good stuff is being lost because it is not allowed to preserve it.
Huh. So the FSF will win by default. You gotta hand it to somebody who is willing to play the long game.
Wonderful plan - but what if you can't find a working C compiler?
Freedom or George Bush
Open and widely published formats are good, of course. But if you're looking for a really long-term solution (as in multiple millennia), then I think the prime requirement other than physical durability should be easy reverse engineering. This way the data has some hope of recovery even if the knowledge of the format has been lost. This generally means that simpler is better. Things like plain ASCII text. Uncompressed and unencrypted image and/or audio data. Verbose ASCII-based vector graphics. Things like that. Put it all on durable, low-density, and simply formatted media that will easily give up their secrets to relatively low-tech and completely non-specialized tools like a microscope. It's not the most efficient way to store data, but it's much more likely to be usable by future archaeologists than things like MS Word files, WMA files, JPGs, MP3s, etc.
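As one concrete illustration of the "uncompressed, verbose ASCII" idea, the plain-text PGM image format (the `P2` variant from the Netpbm family) stores a grayscale picture as readable decimal numbers. The sketch below is only an example of how little it takes to write and read such a file; the function names are made up, and it skips details of the real format such as `#` comment lines and wrapping long rows.

```python
def to_ascii_pgm(pixels, max_value=255):
    """Serialize a 2D list of grayscale values as plain-text (P2) PGM."""
    height = len(pixels)
    width = len(pixels[0]) if height else 0
    if any(len(row) != width for row in pixels):
        raise ValueError("all rows must have the same width")
    # Header: magic number, then width and height, then the maximum gray value.
    lines = ["P2", f"{width} {height}", str(max_value)]
    for row in pixels:
        lines.append(" ".join(str(value) for value in row))
    return "\n".join(lines) + "\n"


def from_ascii_pgm(text):
    """Parse the output of to_ascii_pgm back into a 2D list (no comment support)."""
    tokens = text.split()
    if not tokens or tokens[0] != "P2":
        raise ValueError("not a plain-text PGM image")
    width, height = int(tokens[1]), int(tokens[2])
    values = [int(token) for token in tokens[4:]]  # tokens[3] is the max value
    return [values[r * width:(r + 1) * width] for r in range(height)]


if __name__ == "__main__":
    print(to_ascii_pgm([[0, 128, 255], [255, 128, 0]]), end="")
```

Someone who had never seen the format could plausibly guess the layout from the file alone: two small numbers that multiply to the count of the values that follow, and one number that bounds them.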
As much as anything, it seems like we might worry about people rewriting the past. It'd be hard to edit part of one of the original copies of the US Constitution without anyone being able to tell the difference, because we actually have a really old piece of paper that someone would have to get access to, somehow erase some ink, and write over top with identical ink.
But a historical document in the form of a text file on someone's hard drive? That can be edited without a trace.
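One common way to at least detect such silent edits (not something proposed in the comment above) is to record a cryptographic digest of the file and keep or publish that digest somewhere the editor cannot reach. The sketch below uses SHA-256 from Python's standard `hashlib`; the helper names are hypothetical, and it only detects changes, it does not prevent them.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 digest of the document as a hex string."""
    return hashlib.sha256(data).hexdigest()


def is_unmodified(data: bytes, recorded_digest: str) -> bool:
    """Check a document against a digest recorded when it was archived."""
    return fingerprint(data) == recorded_digest


if __name__ == "__main__":
    original = b"We the People of the United States..."
    recorded = fingerprint(original)  # assumed to be stored separately
    tampered = original.replace(b"People", b"Sheeple")
    print(is_unmodified(original, recorded))
    print(is_unmodified(tampered, recorded))
```

The weak point is the same one the comment identifies: if the recorded digest lives on the same hard drive as the text file, it can be rewritten along with it.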
I doubt that a historian would see it your way. How many records, judged by their contemporaries as irrelevant, have helped historians piece together valuable perspectives about times past! Like the monks who deemed it appropriate to copy over Bach manuscripts, isn't there hubris when we declare with certainty what is and is not worth preserving? Perhaps we don't have enough perspective to reliably do that.
I think you're wrong (and you use a double negative ;-)
Most people are uninterested in history, hence there is no guarantee of a verbal knowledge continuum in the event of widespread hardware failure.
We know that the hardware always eventually fails.
We know that hardware always becomes obsolete.
We know that civilisations always fall.
We also know that these things have happened in the past, resulting in the loss of knowledge (in some cases it was because the language became extinct, and has never been deciphered. In others it was because proper documentation was never made, or was lost, or was destroyed, etc. If you think about archaeology, it only exists because of a _lack_ of documentation. It's trying to piece together data from scattered, incomplete fragments).
The fact that you so easily dismiss this shows a lack of knowledge of history (point 1), and perfectly illustrates that old adage "if there's one thing we can learn from history, it's that we don't learn from history."
Doubtless the Anglo-Saxons felt the same way about their rubbish... and yet archaeologists get orgasmic over the everyday bits and pieces that tell them so much about how those normal people led their lives.
The question isn't IF it will disappear, the question is really WHEN and HOW. Printing to paper-based hardcopy helps for a few hundred years. It can be recopied from paper to paper easily - it's a very low context solution: ink on paper followed by ink on paper. So, important information about our society can be transferred across generations, even if the generations have no electricity at all. This is how we know Shakespeare, for instance.
Many people say "Oh, but we'll have some NEW technology that will take care of it". This assumes that the resource base for a new technology will be as generous and dense as our present resource base provides. This is a VERY unwise presumption, as there is categorically no proof that such will be the case. In fact, there are a variety of intense warning signs that suggest quite the contrary.
From the evidence I have found, and, oddly, I've studied this for a number of years now, I am fairly well convinced that industrial civilisation will simply erase itself from the human record as little more than a horribly polluted stain that destroyed itself through overpopulation and environmental stupidity. All the music you hear, all the shows you watch, all the films you cried at, it will all go away. Poof. This also means that self-absorbed hucksters like Madonna, Britney Spears, Michael Jackson, Tom Cruise, and their supporting technology of TV, Radio, DVD/CD, etc will also disappear - just the flotsam of "entertainment" culture.
The long-term future will be people chasing bison/cows across the prairie or living in small agrarian villages bound by localised population bursts and die-offs. But it will take several centuries to get there. In the meantime we've got our MTV and Orange Crush. The most important thing to remember is this: not getting to that Star Trek future IS NOT A BAD THING. We pissed away the globe's resources on our Xboxes, SUVs, jetset vacationlands, and all the other minutiae and ephemera that make a society "civilised" and provide "leisure activity". All societies have that, to varying degrees. We just had more of it, thanks to our insane and unrelenting exploitation of resources, petroleum, and electrical generation. But it will all go away, and THAT'S OK.
We will disappear. We Are Atlantis.
RS
Shoes for Industry. Shoes for the Dead.
I ask: has this ever happened before?
Not necessarily in electronic bits and bytes. Not the "Alexandria Library" that was mostly duplicated in other libraries or private collections. Maybe like the Inca quipu, mats of knotted strings that recorded all their empire's operational records, other than the ceremonial records in statues and murals. But some quipu survive, despite Spaniards destroying most of them in the mid-1500s. Enough that we can at least recognize that they did have records of lots of transactions.
No, something more transient, as transient as our bits, read/written by something more transient than our metal/plastic/glass machines. Maybe songs or other performed stories, like tribal Australians. Maybe woven in more degradable material, like uncured plant matter. Maybe both, like the Pacific star navigation lore taught in temporary woven sticks, but carried in the mind. Maybe patterns in some other losable medium, like animal pelt patterns no longer readable now that the code has been lost, or interbred back into "blankness".
If it can happen to us, it could have happened before. Our civilization rose from meager beginnings only about 12K years ago, after the last Ice Age that lasted about 12Ky. There was another one before that, with people accumulating knowledge between. And probably a half-dozen or so others since we became as genetically developed as we are today, between 7Mya and 200Kya. We don't even have many records from the first half of the last 12Ky. Could we be reinventing the wheel, literally, every 25 thousand years?
--
make install -not war
Look.
In 100 years, you will be forgotten.
In 1000 years, your country will be forgotten.
In 10000 years, your civilisation will be forgotten.
In 100000 years, your species will be forgotten.
One thing you can absolutely count on is that you and everything you find familiar will be lost and forgotten. Nothing that you accomplish, no matter how famous, infamous or worthy will be remembered in 10,000 years.
There is only one contribution you can make which will have any lasting effect at all, and I'll let you work out what that is for yourself.
As a game developer, it's profoundly disturbing how casually we treat games just a few years old. Hardware will continue to evolve and OSes will change; we really need a way to secure our ability to play old games.
Console games are semi-okay because you can at least keep the (static) hardware around, but PC games are in bad shape. PCs evolve gradually, and it only takes one small OS or video driver change to render a game unplayable. Because games are a commercial medium, games simply aren't supported once it's no longer financially beneficial.
As long as there are programmers out there willing to write emulators, I suppose we're okay... but it still makes me nervous.
Wrong. People DO want to be sheep. How many do you know that willingly trade freedom for security? People don't want to think; they've been educated for years not to. Philosophers have removed the cause-effect link, and so the large majority of people cannot distinguish between a man-made disaster and a natural one. My only solace is that I'll be worm-food by the time the end comes.
I want to delete my account but Slashdot doesn't allow it.
Yep. Microsoft's commitment to their "Plays for Sure" campaign with the Zune really instills confidence in their backwards compatibility.
At least with OpenOffice I can legally archive the source code and install images needed to access the data for that period (say, every year or six months.) Sort of like dropping a copy of TrueCrypt on a DVD full of crypto archives.
With the new DRM keys and license enforcement policies, I dread someday trying to resurrect an old image so I can access data archives, only to find it wants to register with a DRM verification service that no longer runs or is no longer compatible with a 4-5 year old install image.
I do not fail; I succeed at finding out what does not work.
To most people, any files they used on computers before their first "IBM Compatible" are probably lost forever already. Think of how many files are "frozen" on 5.25" floppy disks for the Commodore 64 alone!
That doesn't have to be the case, though: you can retrieve files from disks of hundreds of different '80s-era computers on a modern PC using a Catweasel card. http://www.vesalia.de/e_catweaselmk4.htm
With the Catweasel, a standard 5.25" PC floppy disk drive (hello, eBay), and a 3.5" PC floppy disk drive, there's hardly a floppy disk you won't be able to retrieve your petrified files from.
Finding a program that can do anything with those files is another subject entirely.
... and in the DRM, bind them.