Lockheed Chosen For Electronic Records Archives
TrentL writes "How will we be able to read 1990's email messages in the year 2090? Will GIF files still be accessible in 2105? The US National Archives - tasked with preserving records "for the life of the republic" - has chosen Lockheed Martin to solve exactly this problem. Lockheed was awarded the $308M Electronic Records Archives contract after a year-long design competition. Full Disclosure: I worked on Lockheed's demo team."
We're just lucky that Walt didn't dream up LZW compression while he was working on Steamboat Willie, or we'd have patents lasting for the author's life plus 90 years!
-paul
Pistol caliber is like religion: everyone has their favourite, and theirs is the only right choice.
This has a fundamental chicken and egg problem: So you store the information, you also need to store the format of that information. So then how do you read "format of the information" document? What format is *that* in?
... Do you carve it into stone?
:-(
You see, whatever format you use for anything has to be documented, and you can't use paper because it won't last as long.
Worse still, you need some computer science grads to write up the format exactly, down to how long a char is and the bit/byte order. It is an extremely difficult task even if you don't take into consideration finding a storage medium that will last that long.
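To make the "how long is a char and what is the byte order" point concrete, here is a rough Python sketch. The record layout (a "DEMO" magic, a version field, a length field) is invented purely for illustration and is not taken from any real archive format. It only shows that the same bytes decode to a very different number if the reader guesses the byte order wrong, which is why the spec has to pin it down.

```python
import struct

# Hypothetical record layout, invented for this example:
#   4-byte magic, 2-byte unsigned version, 4-byte unsigned payload length.
# ">" means big-endian with fixed standard sizes and no padding.
HEADER = struct.Struct(">4sHI")


def pack_record(version: int, payload: bytes) -> bytes:
    """Serialize a payload behind the explicit, fixed-width header."""
    return HEADER.pack(b"DEMO", version, len(payload)) + payload


def unpack_record(blob: bytes):
    """Parse a record written by pack_record, checking magic and length."""
    magic, version, length = HEADER.unpack_from(blob, 0)
    if magic != b"DEMO":
        raise ValueError("not a DEMO record")
    payload = blob[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated payload")
    return version, payload


def misread_length(blob: bytes) -> int:
    """Decode the same header as little-endian, as a confused reader might."""
    _, _, length = struct.unpack_from("<4sHI", blob, 0)
    return length
```

A 5-byte payload round-trips fine through `unpack_record`, while `misread_length` reports a length in the tens of millions for the same bytes.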
It's not just the government that needs this. Since we're funding this effort with our taxpayer dollars, I'm hopeful that some of the results from this work will lead to the availability of tools us normal folks can use to make sure our precious data can be preserved and passed down from one generation to the next.
Not sure where I read it, but there was an article about using good old cheap IDE RAID as a tape replacement. Some guy did it on a large scale for a university, at a relatively low cost. Considering the low cost per GB and the easy scalability, why not?
Are you against the National Archives? This program enables the National Archives, into which we've already sunk billions over the centuries, to continue to be (even more) useful in the Information Age. That's our information. Why should we throw it away now?
I'm curious, did you have any criticism for the $300M "bridge to nowhere" in Alaska when it was reported in the new budget this year? And where are you on the $200B+ we're spending in Iraq?
--
make install -not war
For a start, they should stop using stupid proprietary formats like Real Video (the Press Conference Video on their website is only available for Real Player).
tasked with preserving records "for the life of the republic"
Task completed......
Service guarantees Citizenship! Questions Guarantee GITMO.... Amerika Uber Alles!
...all the 1990's pr0n! We need to keep that in a repository for the benefit of mankind for generations to come!
First off, you do not seem to know (or do not remember) that NASA is losing all sorts of data. They have 2 problems. Just 40 years ago, they were storing data on Tape Drives. The tapes are decaying so the data is disappearing. In addition, the formats are disappearing. Back then, all the specs were written down, and yet, the formats are hard to find in mountains of data.
SO now, forward a hundred years. Just 15 years ago, I was working with CDs that would last 100 years (50 bucks a pop). Now, people seem to assume that the current disks will last that long. They will not. The old disks were made out of thin gold sheets in plastic. They are now some plastic in plastic. These CDs/DVDs will last less than 10 years (and probably closer to 5). In addition, the tape drives and hard disks are storing a million times more data than what was on tape in the 60s. That is, the storage density is WAY up. So now, when a small pock mark shows up, it will affect millions of times more data, making recovery very difficult.
I prefer the "u" in honour as it seems to be missing these days.
The specs could easily be lost over a long period of time, and it's very hard to reverse engineer algorithms from scratch (given that in 100 years, newer and more optimal algorithms than LZW will be used). It's predicted that the only image format that will still be around in 100 years is ppm, simply because it only takes about half an hour to implement from scratch!
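For a sense of how small "implement ppm from scratch" really is, here is a minimal Python sketch. It handles only the binary P6 variant with a maximum sample value of 255, which is an assumption made to keep it short; the plain-text P3 variant and 16-bit samples are left out. The function names are made up for the example.

```python
def write_ppm(width, height, pixels):
    """Build a binary PPM (P6) image.

    pixels: bytes of length width*height*3, RGB order, rows top to bottom.
    """
    if len(pixels) != width * height * 3:
        raise ValueError("pixel data does not match dimensions")
    header = f"P6\n{width} {height}\n255\n".encode("ascii")
    return header + bytes(pixels)


def read_ppm(data):
    """Parse a binary PPM (P6, maxval 255) into (width, height, pixels)."""
    pos = 0
    tokens = []
    # The header is four whitespace-separated tokens: magic, width,
    # height, maxval. Lines starting with '#' are comments.
    while len(tokens) < 4:
        while data[pos:pos + 1].isspace():
            pos += 1
        if data[pos:pos + 1] == b"#":
            while data[pos:pos + 1] not in (b"\n", b""):
                pos += 1
            continue
        start = pos
        while pos < len(data) and not data[pos:pos + 1].isspace():
            pos += 1
        tokens.append(data[start:pos])
    pos += 1  # exactly one whitespace byte separates maxval from the pixels
    magic, width, height, maxval = tokens
    if magic != b"P6":
        raise ValueError("not a binary PPM file")
    if int(maxval) != 255:
        raise ValueError("this sketch only supports maxval 255")
    width, height = int(width), int(height)
    pixels = data[pos:pos + width * height * 3]
    if len(pixels) != width * height * 3:
        raise ValueError("truncated pixel data")
    return width, height, pixels
```

The whole format is a short text header followed by raw RGB bytes, so there is no compression algorithm to rediscover.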
Get a free iPod Nano 4GB!
This has a fundamental chicken and egg problem: So you store the information, you also need to store the format of that information. So then how do you read "format of the information" document? What format is *that* in?
Latin, videlicet.
But seriously the problem in records is not going to be collecting the data, but turning it into knowledge. Meaning that humans in the future are likely to seriously misinterpret or be unaware of the intended meanings and social and political contexts of the preserved data.
This is not a technology problem.
They ought to make sure that real professional historians are there.
This is not nearly as difficult as you make it seem: implement the parser in a standardized language. The formal specification of the standardized language can then be included with the source of the parser.
Getting code to run on later architectures is not usually very difficult. I am fairly comfortable with the proposition of porting any code to any future architecture -- the "emulator scene" testifies to the viability of this strategy. The biggest problem to be solved is reading storage media for which no hardware exists.
For example, how do I get to my college research stored on AmigaDOS floppies? Tragically, the easiest solution is to try to get my Amiga running again, and then move the data over a serial cable with kermit. I'm awfully glad I have kermit on that computer, because I don't think I'd be able to find any 2400 baud Amiga BBSes around to download it.
Liberty you never use is liberty you lose.
Did Google compete for this contract? They're the ones with the largest infrastructure for such a project and the brains to give us a really slick interface to it all. Not to mention that they could probably have faster response times than archive.org which totally fuckin' blows.
What is your penile percentile?
While looking through the documentation http://www.archives.gov/era/about/documentation.h
I found a link to the project requirements : http://www.archives.gov/era/about/requirements.cs
Which contains the following line
I know, one typo in one line in several hundred, but why that line ?
Technically I don't see any problem with storing 100PB of data in the next decade and keeping it safe from natural disasters. But how about unnatural disasters, such as an evil administration changing the entire archive to reflect better on itself or protect itself from criminal prosecution? Copies of the archive packages need to be suitably dispersed in multiple jurisdictions or even shot into space in order to make this kind of data destruction infeasible.
"The Electronic Records Archives. By the same man who gave us the Stealth(TM) aircraft".
Hhmm...
This example of format obsolescence just popped into my head. Back when Commodore-Amiga was a going concern, the IFF-ILBM graphics format was pretty widely used. It was a nearly universal standard on Amiga.
A fair number of artists and video producers used Amigas. One of Amiga's advantages was that practically all the graphics programs used ILBM format, which meant you could easily feed the output from one application into another, and then into another. It was good, and it wasn't all that many years ago.
Just try finding a program on Mac OS X or Windows today that can read IFF-ILBM files! Go on, try it! Photoshop, for one, doesn't have a clue about them. The best you can hope is to find some obscure freeware IFF-to-PNG converter that someone has hacked together.
Another example: It's getting harder to find apps that play "tracker module" music, and the programs that are available tend to be awkward and unreliable. Everything went to MP3, and mod music was quickly forgotten.
So if the idea of today's commonplace formats becoming unknown in the future sounds far-fetched at all. . . It's not.
YES! Finally a job after all those years studying Akkadian! Clay tablets are some of the most durable media I know. At least they have a proven record. Vast numbers of documents illustrating the fascinating world of accounting, esp. Sumerian sheep and goat transactions, are available thanks to the scribal choice of clay (combined with hot arid conditions). Will Lockheed HR soon be seeking 8-10 years of prior "Cuneiform/Pictographic" scribal experience? I can also read omens in the entrails of an ox. That can come in handy.
I have been saying for years that the DoD should make an initiative to move towards open standards for this exact reason. The document retention requirements they have are incredible, and yet nearly all the documents generated are saved in proprietary formats. Now that the OASIS (OpenDocument) format is solidifying and there is more than one implementation of it, they wouldn't even have to define a standard for word processing or spreadsheets.
Obviously, open standards are not a panacea. There are countless standards created by the military that never really spread farther than that, and therefore the support for them is limited (and thus companies that do support them can charge a pretty penny for it). But with open standards, at least it is much easier to write an implementation if you need to. Compare this to MS Word, which is a pain to reverse engineer now; just imagine having to do so in the distant future, when it is not as widespread. And of course, for the very long term, nothing is more certain (and more inconvenient) than printing everything out and storing it in a warehouse, which is what is done now. But the longer that can be postponed, the more money can be saved.
As an added bonus, just imagine the competition that would spring up in the word processor market, if the DoD mandated that all new word processor documents generated internally or by contractors be in OASIS format, starting 5-10 years from now. Microsoft would have to support it (and well) or throw away a huge number of Office sales. The DoD would no longer be locked into a single vendor, saving them money upfront in addition to the money they saved on document retention.
Until then, the best plan is likely to convert as much as possible to a few standards like PDF, which is what I expect will happen here.
I'm trying to find out where in our Constitution does the Federal Government find an enumerated power to pay for this.
Wow, you can access the Constitution? I mean it was written in 1787. That's a long time ago. Good thing somebody thought to save it!
We're saving lots of data, because 1) lots of it is important and 2) we have very little perspective on it yet. In 200 years we might very well have a very different idea of what was important today.
So close and yet so far from the world's perfect ID number
All these people whinging about how CDs won't last - I'm pretty confident that if I bother to hold on to the CD-ROMs in my drawer, provided they're kept in their cases and in good condition, they'll be just as playable (on the same hardware) in 100 years. Frankly I hope most (probably all) of the stuff in my e-mail isn't around in 100 years.
The amount of data we are talking about is HUGE. There is no way humans could manually upgrade the data. It would be a technical and policy nightmare. As for preserving emails, the email messages of the executive branch contain much historical significance.
I worked with them for a while, as a data entry person back in the early 90's. Basically, we were responsible for keying in a parcel's 5-9 digit Zip code after it had been scanned into the system. By scanned, I mean the front of the package or envelope showing the send-to and return addresses was presented on a monochrome display, which allowed the person operating the terminal to enter the zip codes for the parcels. Then you'd hit a key and move to the next one, and so on and so on.
The bizarre thing is that I found out a few of the individuals would "pad" their PPM (Parcels Per Minute) by typing in zip codes they were familiar with instead of reading what was on the display, just to enter a dozen or so really quickly. It didn't happen often, but it helped them keep up the pace and "clear" the system queue more quickly, thus gaining them and their workmates an early break. However, I've no idea what damage may have occurred by their lax attitudes, and I really don't want to know now.
Which brings me to my point (I think): how can we be certain the data they're entering is one-hundred percent accurate, regardless of whether the medium lasts a century?
The Chronic *WHAT* les of Narnia!
Even that's pretty generous IMHO. In my experience, recent blank CDs (and DVDs) are lucky to make it 18 months, and many of mine are delaminating or corroding after only 12. I've now gotten into the habit of burning two copies of everything I "archive", and re-burning them every 12 months. Thus far I've had errors, but never errors in the same place on each copy.
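The two-copy habit works well if you also keep per-block checksums from burn time, since then you can tell which copy is good for each block. Here is a rough Python sketch of that idea. The function names and the 2048-byte block size are assumptions for the example, and it assumes the checksum list itself was stored somewhere that survived.

```python
import hashlib

BLOCK = 2048  # assumed block size for this example


def block_digests(data, block=BLOCK):
    """Checksum each block of the original data (done once, at burn time)."""
    return [hashlib.sha256(data[i:i + block]).digest()
            for i in range(0, len(data), block)]


def merge_copies(copy_a, copy_b, digests, block=BLOCK):
    """Rebuild the data from two damaged copies.

    For each block, take whichever copy still matches the stored digest.
    Returns (merged_bytes, list_of_block_numbers_bad_in_both_copies).
    """
    out = bytearray()
    bad = []
    for n, want in enumerate(digests):
        lo = n * block
        for candidate in (copy_a[lo:lo + block], copy_b[lo:lo + block]):
            if hashlib.sha256(candidate).digest() == want:
                out += candidate
                break
        else:
            # Neither copy is intact here; keep copy A's bytes as a
            # placeholder and report the block number.
            bad.append(n)
            out += copy_a[lo:lo + block]
    return bytes(out), bad
```

As long as the errors never land in the same block on both discs, the merged result is the original; when they do, the second return value says which blocks are gone.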
Contrast this to the good old "Kodak Gold" CDs I was burning onto back in 1996, almost all of which are still readable with 0% errors...
Walt's testimony to the House Committee on Un-American Activities, 24 October, 1947
This has been discussed before. The sheer volume of data that would have to be copied every few years is HUGE. How long would it take you to transfer a stadium full of CDs onto DVDs? How much would that cost?
There's good reason people are looking for digital technologies that are as inherently stable, long-lasting, and reliable as writing on paper.
Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant