Lockheed Chosen For Electronic Records Archives
TrentL writes "How will we be able to read 1990's email messages in the year 2090? Will GIF files still be accessible in 2105? The US National Archives - tasked with preserving records "for the life of the republic" - has chosen Lockheed Martin to solve exactly this problem. Lockheed was awarded the $308M Electronic Records Archives contract after a year-long design competition. Full Disclosure: I worked on Lockheed's demo team."
Analog media couldn't be restored because the machines that read it broke (couldn't they make new ones?) but as long as the specs exist, I don't see why they won't be able to read the digital data (assuming we still use two bits in the future).
Send email from the afterlife! Write your e-will at Dead Man's Switch.
We're just lucky that Walt didn't dream up LZW compression while he was working on Steamboat Willie, or we'd have patents lasting for the author's life plus 90 years!
-paul
Pistol caliber is like religion: everyone has their favourite, and theirs is the only right choice.
This has a fundamental chicken and egg problem: So you store the information, you also need to store the format of that information. So then how do you read "format of the information" document? What format is *that* in?
... Do you carve it into stone?
:-(
You see, whatever format you use for anything has to be documented, and you can't use paper because it won't last as long.
Worse still, you need some computer science grads to write up exactly the format, down to how long a char is and the bit/byte order. It is an extremely difficult task even if you don't take into consideration finding a storage medium that will last that long.
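To make the byte-order point concrete, here is a minimal sketch in Python (my choice of language, nothing the archive is known to use). The four bytes are made up for illustration; the only point is that the same bytes give two different numbers depending on which convention the format document specifies.

```python
import struct

# Four raw bytes as they might sit on some archived medium (invented example data).
raw = bytes([0x00, 0x00, 0x01, 0x00])

# Same bytes, two conventions, two different numbers.
big_endian_value = struct.unpack(">I", raw)[0]     # most significant byte first
little_endian_value = struct.unpack("<I", raw)[0]  # least significant byte first

print(big_endian_value, little_endian_value)
```

If the format description doesn't say which of these applies, a future reader has to guess.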
It's not just the government that needs this. Since we're funding this effort with our taxpayer dollars, I'm hopeful that some of the results from this work will lead to the availability of tools us normal folks can use to make sure our precious data can be preserved and passed down from one generation to the next.
Not sure where I read it, but there was an article about using good old cheap IDE RAID as a tape replacement. Some guy did it on a large scale for a university, at a relatively low cost. Considering the low cost per GB and easy scalability, why not?
Are you against the National Archives? This program enables the National Archives, into which we've already sunk billions over the centuries, to continue to be (even more) useful in the Information Age. That's our information. Why should we throw it away now?
I'm curious, did you have any criticism for the $300M "bridge to nowhere" in Alaska when it was reported in the new budget this year? And where are you on the $200B+ we're spending in Iraq?
--
make install -not war
For a start, they should stop using stupid proprietary formats like Real Video (the Press Conference Video on their website is only available for Real Player).
tasked with preserving records "for the life of the republic"
Task completed......
Service guarantees Citizenship! Questions Guarantee GITMO.... Amerika Uber Alles!
...all the 1990's pr0n! We need to keep that in a repository for the benefit of mankind for generations to come!
"The system's "initial operating capability" should be available during Fiscal Year 2007. Weinstein noted that "the system's architecture makes it flexible enough to accommodate evolving policy change," including the importance of "providing public access while protecting privacy and sensitive information.""..HAHAHAHAHA! Anything even *remotely* important or interesting, paid for by tax payers or not, sorry, "terrorism, security", yada yada yada.
Just look back at how much technology has changed in the past 10 years. We had 5.25" floppy drives back in those times, and 3.5" floppies were used as well, and CD burners were just starting to become available at the speedy rates of 1-2x, not to mention hard drives were so small compared to the 500 GB drives we have today... and Windows 95 was just released, a wonderful system based on FAT architecture... not NTFS like we have today...
Computer technology is increasing at such a rapid rate these days. I can only imagine how it will be in 10 years, much less 100 years from now. I am sure by then that clock speed will be in hundreds of gigahertz, memory in the terabytes, and storage in the petabyte range... if not even higher... who knows...
I also wonder, in 2090, will a CD-ROM equivalent even exist to read this storage library? They may have long ago abandoned CD-ROMs for being too slow, and if data is stored in this format, how will it be read? Also, as hard drives get larger and larger, I am sure the IDE, SCSI, and SATA drives of today will not be readable by the BIOS of tomorrow... much less have connectors to fit...
This is a huge undertaking... good luck Lockheed Martin...
Need a Nerd?
Nerd Systems
This has a fundamental chicken and egg problem: So you store the information, you also need to store the format of that information. So then how do you read "format of the information" document? What format is *that* in?
Latin, videlicet.
But seriously the problem in records is not going to be collecting the data, but turning it into knowledge. Meaning that humans in the future are likely to seriously misinterpret or be unaware of the intended meanings and social and political contexts of the preserved data.
This is not a technology problem.
They ought to make sure that real professional historians are there.
This is not nearly as difficult as you make it seem: implement the parser in a standardized language. The formal specification of the standardized language can then be included with the source of the parser.
Getting code to run on later architectures is not usually very difficult. I am fairly comfortable with the proposition of porting any code to any future architecture -- the "emulator scene" testifies to the viability of this strategy. The biggest problem to be solved is reading storage media for which no hardware exists.
For example, how do I get to my college research stored on AmigaDos floppies? Tragically, the easiest solution is to try to get my Amiga running again, and then move the data over a serial cable with kermit. I'm awfully glad I have kermit on that computer, because I don't think I'd be able to find any 2400 baud Amiga BBSes around to download it.
Liberty you never use is liberty you lose.
goatse and tubgirl shall be archived, in all their digital glory for the ages to see.
Did Google compete for this contract? They're the ones with the largest infrastructure for such a project and the brains to give us a really slick interface to it all. Not to mention that they could probably have faster response times than archive.org which totally fuckin' blows.
What is your penile percentile?
While looking through the documentation http://www.archives.gov/era/about/documentation.h
I found a link to the project requirements : http://www.archives.gov/era/about/requirements.cs
Which contains the following line
I know, one typo in one line in several hundred, but why that line ?
Technically I don't see any problem with storing 100PB of data in the next decade, and keeping it safe from natural disasters. But how about unnatural disasters, such as an evil administration changing the entire archive to reflect better on itself or protect itself from criminal prosecution? Copies of the archive packages need to be suitably dispersed in multiple jurisdictions or even shot into space in order to make this kind of data destruction infeasible.
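A rough sketch of how dispersed copies could expose tampering, in Python. Everything here is hypothetical: the site names, the record contents, and the `find_suspect_copies` helper are invented for illustration, and nothing suggests the actual ERA design works this way. The idea is just to hash each copy and flag whichever one disagrees with the majority.

```python
import hashlib
from collections import Counter
from typing import Dict, List

def digest(data: bytes) -> str:
    """SHA-256 fingerprint of one stored copy."""
    return hashlib.sha256(data).hexdigest()

def find_suspect_copies(copies: Dict[str, bytes]) -> List[str]:
    """Return the sites whose copy disagrees with the majority digest."""
    digests = {site: digest(data) for site, data in copies.items()}
    majority_digest, _ = Counter(digests.values()).most_common(1)[0]
    return sorted(site for site, d in digests.items() if d != majority_digest)

# Hypothetical copies held in three jurisdictions; one has been altered.
copies = {
    "site_a": b"record 42: original text",
    "site_b": b"record 42: original text",
    "site_c": b"record 42: revised text",
}
print(find_suspect_copies(copies))
```

A majority vote only helps while most copies are outside the reach of whoever wants to rewrite them, which is the argument for spreading them across jurisdictions in the first place.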
"The Electronic Records Archives. By the same man who gave us the Stealth(TM) aircraft".
Hhmm...
This example of format obsolescence just popped into my head. Back when Commodore-Amiga was a going concern, the IFF-ILBM graphics format was pretty widely used. It was a nearly universal standard on Amiga.
A fair number of artists and video producers used Amigas. One of Amiga's advantages was that practically all the graphics programs used ILBM format, which meant you could easily feed the output from one application into another, and then into another. It was good, and it wasn't all that many years ago.
Just try finding a program on Mac OS X or Windows today that can read IFF-ILBM files! Go on, try it! Photoshop, for one, doesn't have a clue about them. The best you can hope is to find some obscure freeware IFF-to-PNG converter that someone has hacked together.
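For what it's worth, the container side of IFF is simple enough to walk in a few lines, as long as the layout stays documented. Below is a minimal Python sketch that lists the chunks in a FORM file and pulls the width and height out of the BMHD chunk. The sample file is built in memory purely for illustration, `read_iff` and `make_chunk` are names I made up, and decoding the actual bitplanes in BODY is not attempted.

```python
import struct

def read_iff(data: bytes):
    """Return (form_type, [(chunk_id, payload), ...]) for an IFF FORM file."""
    form_id, form_len, form_type = struct.unpack(">4sI4s", data[:12])
    if form_id != b"FORM":
        raise ValueError("not an IFF FORM file")
    chunks = []
    pos, end = 12, 8 + form_len
    while pos + 8 <= end:
        # Each chunk: 4-byte ID, 4-byte big-endian length, then the payload.
        chunk_id, chunk_len = struct.unpack(">4sI", data[pos:pos + 8])
        chunks.append((chunk_id, data[pos + 8:pos + 8 + chunk_len]))
        pos += 8 + chunk_len + (chunk_len & 1)  # odd-length chunks get a pad byte
    return form_type, chunks

def make_chunk(chunk_id: bytes, payload: bytes) -> bytes:
    """Build one chunk for the in-memory sample below."""
    pad = b"\x00" if len(payload) % 2 else b""
    return chunk_id + struct.pack(">I", len(payload)) + payload + pad

# Synthetic 320x200, 5-bitplane header (20-byte BMHD) plus two small chunks.
bmhd = struct.pack(">HHhhBBBBHBBhh", 320, 200, 0, 0, 5, 0, 0, 0, 0, 10, 11, 320, 200)
body = (b"ILBM" + make_chunk(b"BMHD", bmhd)
        + make_chunk(b"ANNO", b"hello") + make_chunk(b"BODY", b"\x00\x00"))
sample = b"FORM" + struct.pack(">I", len(body)) + body

form_type, chunks = read_iff(sample)
width, height = struct.unpack(">HH", chunks[0][1][:4])
print(form_type, [cid for cid, _ in chunks], width, height)
```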
Another example: It's getting harder to find apps that play "tracker module" music, and the programs that are available tend to be awkward and unreliable. Everything went to MP3, and mod music was quickly forgotten.
So if the idea of today's commonplace formats becoming unknown in the future sounds far-fetched at all. . . It's not.
YES! Finally a job after all those years studying Akkadian! Clay tablets are some of the most durable media I know. At least they have a proven record. Vast numbers of documents illustrating the fascinating world of accounting, esp. Sumerian sheep and goat transactions, are available thanks to the scribal choice of clay (combined with hot arid conditions). Will Lockheed HR soon be seeking 8-10 years of prior "Cuneiform/Pictographic" scribal experience? I can also read omens in the entrails of an ox. That can come in handy.
I have been saying for years that the DoD should make an initiative to move towards open standards for this exact reason. The document retention requirements they have are incredible, and yet nearly all the documents generated are saved in proprietary formats. Now that the OASIS (OpenDocument) format is solidifying and there is more than one implementation of it, they wouldn't even have to define a standard for word processing or spreadsheets.
Obviously, open standards are not a panacea. There are countless standards created by the military that never really spread farther than that, and therefore the support for them is limited (and thus companies that do support them can charge a pretty penny for it). But with open standards, it is much easier to write an implementation if you need to. Compare this to MS Word, which is a pain to reverse engineer now; just imagine having to do so in the distant future, when it is not as widespread. And of course, for the very long term, nothing is more certain (and more inconvenient) than printing everything out and storing it in a warehouse, which is what is done now. But the longer that can be postponed, the more money can be saved.
As an added bonus, just imagine the competition that would spring up in the word processor market, if the DoD mandated that all new word processor documents generated internally or by contractors be in OASIS format, starting 5-10 years from now. Microsoft would have to support it (and well) or throw away a huge number of Office sales. The DoD would no longer be locked into a single vendor, saving them money upfront in addition to the money they saved on document retention.
Until then, the best plan is likely to convert as much as possible to a few standards like PDF, which is what I expect will happen here.
I'm trying to find out where in our Constitution does the Federal Government find an enumerated power to pay for this.
Wow, you can access the Constitution? I mean it was written in 1787. That's a long time ago. Good thing somebody thought to save it!
We're saving lots of data, because 1) lots of it is important and 2) we have very little perspective on it yet. In 200 years we might very well have a very different idea of what was important today.
So close and yet so far from the world's perfect ID number
I worked with them for a while, as a data entry person back in the early 90's. Basically, we were responsible for keying in a parcel's 5-9 digit Zip code after it had been scanned into the system. By scanned, I mean the front of the package or envelope showing the send-to and return addresses was presented on a monochrome display, which allowed the person operating the terminal to enter the zip codes for the parcels. Then you'd hit a key and move to the next one, and so on and so on.
The bizarre thing is that I found out a few of the individuals would "pad" their PPM (Parcels Per Minute) by typing in zip codes they were familiar with instead of reading what was on the display, just to enter a dozen or so really quickly. It didn't happen often, but it helped them keep up the pace and "clear" the system queue more quickly, thus gaining them and their workmates an early break. However, I've no idea what damage may have occurred by their lax attitudes, and I really don't want to know now.
Which brings me to my point (I think): how can we be certain the data they're entering is one-hundred percent accurate, regardless of whether the medium lasts a century?
The Chronic *WHAT* les of Narnia!
I have code on a modern HDD that I typed into a BBC computer 15+ years ago from a magazine.
I took it off of tape, via the BBC and a serial lead. I have all my chickens and all my eggs. So long as you move to a newer form of media before the old one perishes, you're going to be OK.
I think it's a Chinese whisper problem, not a chicken and egg problem: what happens when inaccuracies are introduced?
E.g. someone writes a file in an odd charset, and nobody notices that the charset is different from ASCII when they convert the file into Unicode. In 20 years' time, will someone notice that the file has been converted badly, or will they think it's corruption? What happens when there are lots of tiny conversion errors like this?
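A small Python sketch of that failure mode, assuming a file saved as Latin-1 and a conversion tool that wrongly assumes code page 437 (both charsets are just examples I picked). The conversion raises no error, the plain ASCII letters survive, and only the accented character quietly changes.

```python
original = "café"

# The author's machine saved the text in one 8-bit charset...
stored_bytes = original.encode("latin-1")

# ...and the migration tool assumed another. No exception is raised.
misread = stored_bytes.decode("cp437")

print(repr(original), repr(misread))
print(misread == original)
```

Nothing in the converted file records that a guess was made, so the damage is invisible unless someone compares against the original bytes.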
thank God the internet isn't a human right.
Complex data backup solutions and the use of lossless formats have not, for example, kept the critical Pioneer space probe data available after less than 30 years (http://www.planetary.org/news/2005/pioneer_anomaly_faq.html).
How in the fucks sake do you expect this to last 100+ years? Don't use lossy compression? How is that a solution?
Take Windows Bitmap image format. It's not lossy. That doesn't mean that we won't forget how to display the damn thing...
RAID 5? What problem do you think you're solving? Keeping the data around, or making the data accessible for (as the OP makes clear is the LoC's responsibility) as long as the United States exists?
Hmmm - I'd better email myself the GIF spec - maybe along with some source code to read it with - and a C compiler to compile the source with. Ah WTF - I'll just email myself the Linux sources. ...but seriously...there won't be any problem with reading GIF if anyone actually wants to - the file format is documented all over the place and in 100 years, if there are still GIF files on some kind of readable media - then the odds are very good that those documents will be easy to find. Programming a GIF reader (or a reader for almost any documented file format) is easy - presuming you are sufficiently motivated. A historian who is interested in 100 year old documents shouldn't have any problems getting them converted to whatever format is needed.
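As a taste of how little code the documented parts need, here is a minimal Python sketch that reads only the GIF signature, version and logical screen size from the first 10 bytes. The sample bytes are hand-built for illustration and `read_gif_header` is my own name; a real reader would also have to handle colour tables and LZW-compressed image data, which is more work but equally well documented.

```python
import struct

def read_gif_header(data: bytes):
    """Return (version, width, height) from the first 10 bytes of a GIF."""
    # 3-byte signature, 3-byte version, then little-endian 16-bit width and height.
    signature, version, width, height = struct.unpack("<3s3sHH", data[:10])
    if signature != b"GIF":
        raise ValueError("not a GIF file")
    return version.decode("ascii"), width, height

# A hand-built header for illustration: 'GIF89a', 640 x 480.
sample = b"GIF89a" + struct.pack("<HH", 640, 480)
print(read_gif_header(sample))
```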
The HUGE concern is the undocumented, encrypted or (worse) DRM'ed files. Reading those in 100 years may be exceedingly difficult.
We can read documents written in hieroglyphs around 2000 years ago. The only problem is with languages for which no translations *ever* existed.
Survival and longevity of antique media are a much bigger problem.
www.sjbaker.org
The articles were light (to the point of vacuum) on details about the approach proposed by the company.
From the article: "The system's architecture makes it flexible enough to accommodate evolving policy change," including the importance of "providing public access while protecting privacy and sensitive information." From the sound of that I'm betting it's some wonky and ridiculous XML format infected with a sadly pathetic little DRM imp.
The fact is that I can read anything if I have a copy of the software that originally viewed/created it and the machine (or an emulation of the machine) on which the software ran. Adding one more format to the mix just means we have to emulate one more machine and keep track of one more piece of software and all the doubtlessly expensive effort which will be spent in conversion is wasted.
It's great to see the National Archives working on this but I would rather see the tax money farmed out in challenge grants to organizations like the Long Now that have a chance in Hell of delivering something useful than pouring money into yet another defense company to ensure that whatever technology we use to store records can be properly sanitized and locked away according to the whims of government and "changing policy."
The biggest issue facing us right now is that most of the music, words and images created by our civilization are illegal to preserve. Ridiculous copyright extensions have ensured that the huge mass of data for which no rights owner can be found will simply rot instead of being digitized and stored.
A software emulator can ensure that historic file formats are readable in the future, but Big Media would rather squeeze our history to death before letting go of the rights.
This is like 1000 fires at the Library in Alexandria. Future generations will curse us for every scrap of information we allow to rot while we squabble.
[-- Trust the Monkey --]
Walt's testimony to the House Committee on Un-American Activities, 24 October, 1947
Check out the micro-etched data disks used by the Rosetta Project. Their goal is to create a long-lasting archive of the basic elements of 1,000 different languages. The storage medium they're using involves etching readable words on to metal disks. The words are not readable by the naked eye, but all you need to read them is a decent optical microscope -- no special hardware or software.
The Rosetta Project's customized "Rosetta Disk" adds another clever innovation: naked-eye-readable words around the edge of the disk get smaller as they spiral inward, making it clear to anyone who might find this disk in the future that there is more information to be read at greater magnifications.
Humans will be extinct in 100 years. How many of us think we can really last for 100 more years before we have another couple world wars?
If humans do exist in 100 years, I can guarantee that this economy won't.
Many /.-ers would be interested in the Rosetta Project, which aims to preserve many world languages using an extremely failsafe medium. Definitely a cool read -- check it out.
Sure, it may not be terribly convenient, but it's certainly going to be readable 100 to 1000 years from now (by which point we should have adequate OCR to complete the task of reading the disc automatically).
-- If you try to fail and succeed, which have you done? - Uli's moose