Storing Data For the Next 1,000 Years
An anonymous reader writes "This may be an interesting take on creating long-term storage technologies. A team of researchers at UCSC claims to have come up with a power-efficient, scalable way to reliably store data for a theoretical 1,400 years with regular hard drives. TG Daily has an article describing this technology and it sounds intriguing as it uses self-contained but networked storage units. It looks like a complicated solution, but the approach is manageable and may be an effective solution to preserve your data for decades and possibly centuries." Nice to see research on this using the kinds of real-world figures for disk lifetimes that recent studies have been turning up.
Part of the solution to very long-term storage, of course, has to involve a method to read the data you've archived.
:)
I tend to think systems such as the one described in the article aren't good long-term solutions. If their math works on the failure rates, that's fantastic- but just try to hook up a 2028 computer to one of these things to pull the data off.*
(Ever tried to get data off an obsolete tape backup?)
I think the most reliable archival system is going to be an active one, where data is saved on modern storage hardware and always copied to more modern tech as it arrives.
The other side of this is, for anything more advanced than text-- given that you can get at the data, what do you open it with? File types die over time and it's basically impossible to find programs to open certain files nowadays, much less such programs that will run on a modern OS. I think the answer to this has to be virtualization. Store the data *and* programs that can open the filetypes you need opened inside a portable virtual machine (e.g., a Windows vmware image). Over time, you may have to layer virtual machines inside virtual machines as OSes grow obsolete. But that's okay- virtualization is only going to become more elegant, and the end result is that you'd have your data in its original environment, completely accessible by native programs.
*Some elements of this problem could be solved by having backup servers use wireless and filesharing protocols that might stand the test of time- e.g., 802.11n and SMB (via Samba). No need to just pick one 'most likely to be future-proof' combination, either: run Bluetooth and serial access, WebDAV and an HTTP fileserver, etc. Still, *not* storing data on modern hardware is always going to be a risky kludge.
There's probably room for a lucrative business based around this-- figuring out the most elegant way to archive and retain meaningful access to data under various computing/disaster scenarios. Hey, I do consulting.
No, not punch cards... but close!
Stone and chisel. That's the way to store data for 1,000 years. The reason why I say this is simple. The more "religious" the world's populations become, the closer to the dark ages we become. (The reverse is true as well as history illustrates.) I expect there will be a second "dark ages" at which point all other technologies will simply not be available.
From TFA:
Santa Cruz (CA) - Have you ever thought how vulnerable your data may be through the simple fact that you may be storing your entire digital life on a single hard drive? A single drive can hold tens of thousands of pictures, thousands of music files, videos, letters and countless other documents. One malfunctioning drive can wipe out your virtual life in a blink of an eye. A scary thought. On a greater scale, at least portions of the digital information describing our generation may be put at risk by current storage technologies. There are only a few decades of life in tape and disk storage these days, but a team of researchers claims to have come up with a power-efficient, scalable way to reliably store data with regular hard drives for an estimated (theoretical) 1400 years.
My "digital life"? Scary to lose it? Man.. these people never heard of backups, or having a real life, eh? Jeez, I can store my whole "digital life" on a 1 gig USB key, with room to spare.
I've lost my backups more times than I can count, my computers are toys, mostly for communication and play. Amazing how many people put their whole LIVES on a hard disk. Remarkable actually. What would I lose? About a dozen passwords and I'd need to reinstall and re-customize my system... OH WAIT... I backed up the important scripts and source code to a DVD.. TWO in fact. Bummer, guess I don't have to cry endless tears over the loss of my "digital life".
" What luck for rulers that men do not think" - Adolf Hitler
tm
Support TBI Research: http://www.raisinhope.org
Since there will be many holes shot into this theory, let me be one of the first to fire a shot. Electricity (as we know it) may not be around then. I am not predicting the dark ages, but who's to say that far in advance there is still a live socket.
Any storage device that relies on outside power cannot be guaranteed for 100 years, let alone 1400. I would have more faith in a stone tablet.
This is a fine example of "academic" research dollars at work.
Flexible bare-metal recovery for Linux/UNIX
What if they did have petabyte-level holograms and optical storage 12,000 years ago but the whole lot got eaten by a fungus because of the organic dye or something? And all that survived were those fingerpaints up in a French cave originally made by a Down syndrome kid...
Obama likes poor people so much, he wants to make more of them.
Did anyone else notice that the lead researcher's name is Mark Storer? How perfect is that?
One thing remains constant in thousands of years of recovered cave paintings, manuscripts, papyrus drawings, and more. And that constant... is pornography. It lasts, it's popular, and it's always in demand.
Clearly, the answer for long term data storage is to use steganographic techniques to encode your data into various types of creative skinpics. Pick famous folks, pretty folks, strange fetishes... the whole gamut. Pick things that people will keep. A hundred years later, all someone needs is the key phrases to search for.
"We need that Higgs Boson experiment data from 2012, how will we get it? The infocalypse has destroyed all of our cataloged data!"
"No problem, my great grandfather left a note in his journal telling his descendants to search for 'Britney spears enema' and use 'wet riffs' to decode the LHC data in whatever we use for files."
"President Spears? That's crazy!"
Voila!
Does $4.7 million sound a bit more realistic?
BD Phone Home!
Shameless plug. Like you weren't expecting it.
Wouldn't it be a lot easier to simply keep the archive on a live system, and rotate it to new media from time to time as the old media dies and new storage systems become available? After all, if no one is looking after this system, what's to keep it from being forgotten in the basement of a long-abandoned building?
In addition to taking advantage of the falling cost of storage for a fixed-size data set -- making future replacement media purchases much cheaper than redundant media purchases today -- you also have the opportunity to re-process the data into new formats, so that you'll still be able to read it when you want it.
They completely ignored the fact that the chips and memory managing the system will likely have some degree of failure in the 1400 years the data will survive on their media architecture.
Look, I am into genealogy quite a bit and see this as a tremendous problem.
The only thing approaching a viable solution is the Rosetta Disk ( http://www.rosettaproject.org/ ), which uses etched nickel media (rock) in a human-readable format, and for which you could theoretically create a binary cipher to serve as a global archival format.
But, that would take a lot of foresight, which unfortunately us people don't have (yet).
However, seeing that as completely unaffordable for us mere mortals, that leaves me with PAPER, yes, paper, as the only trustworthy medium-term solution.
I do hope everyone here realizes that if we had some sort of cosmic EMP-like event traversing the globe, we'd lose 99% of data and be plunged into the dark ages, right? We couldn't even re-create all of the machines that surround us since virtually all designs are kept digitally now. Factories would just shut down and never be able to be brought back up and every history of our existence would be forgotten in a few generations.
Our civilization is sitting on a house of cards.
Laser engraving, seriously. There's some project out there....
ah yes, here, that seeks to preserve all the languages of the world by laser-engraving them onto stainless steel plates. They've changed things up a bit, but the basic idea is the same: put it somewhere it won't get lost or corrupted, and if it's important, people will figure it out later. If it's not important, then it doesn't matter.
Very few things in the world are really worth keeping for even a lifetime. If your grandkids inherit all of your stuff, what will they save and keep, and what will they throw away? If you know what they will throw away, why not save them the trouble and toss it yourself?
We've gotten ourselves into this mindset where every piece of data you've ever owned ought to be backed up and saved, for no other reason than because it's easy and cheap. I think everyone should have a periodic storage meltdown to force them to reconsider what it is they really need to have.
Given the media, specifications and some time and money, a trio of engineering, electronics and CS students will make a machine that will read any old tape, punchcard, early HDD, etc. A CD is laughably simple technology; an engineer 100 years from now will build a player (in a way that may not look anything like our current players) in no time at all.
Today's technology is even better documented and certainly not beyond the capabilities of future generations to make readers for.
If you find an old tape and want to do it in an afternoon, you are out of luck. If you are an historian that really, really wants to get to the data, it is not all that hard.
There are two sure-fire proven techniques for storing data long term - using a reliable non-volatile storage medium (engraving in a non-oxygen reactive metal will do nicely) and making many redundant copies of them.
Electronic storage is by its very nature unreliable -- electromagnetic properties (like charge accumulation, ferromagnetic hysteresis, etc) are inherently volatile.
And even if you manage to solve the problem of transporting your data into the future, you're still faced with the problem of making sense of it. Electronic formats change (just ask the guy out in California who makes a *FORTUNE* charging law people to retrieve files from obsolete formats and/or media). In the physical realm, this is true as well - languages change and become very difficult to read. (If you don't believe me, try reading Beowulf in its original Old English form, circa 700 AD).
To make laws that man cannot, and will not obey, serves to bring all law into contempt.
--E.C. Stanton
And I mean it literally -- why have any physical storage at all? Why not just bounce chunks of data around forever on the Internet? Presumably the 'net is going to be here for a long, long time. Imagine a mass P2P network where the data being traded is just encrypted chunks of the data of other users. It needn't ever get written to a mass storage device at all -- just received from one peer and immediately sent to others.
A protocol could be developed to allow one peer to request, or steer, the network to locate and deliver requested blocks on demand. This might be a high-cost operation, akin to bringing data in from backup tape. Or, a client could just wait for the right chunk of data to recirculate to its position in the network. But storing data is easy -- just encrypt it, format it a certain way, and inject it into the network.
A natural model for the topology of such a network, and the protocol itself, is the circulatory system. Here, cells move in a fluid, generally in one direction, but through a complex network of vessels, and in a circulatory manner. The immune system might provide inspiration for directed movement of data chunks. (See? The Internet really is just a series of tubes.)
Over time, the infrastructure of the Internet, the P2P clients, and the exchange protocol itself could evolve, as long as enough redundant chunks are allowed to constantly recirculate. Specialized clients could cache data to "long term" storage for periods of a few days or weeks, in case of large, random outages, but permanent data storage would never rely on any specific technology at all -- even TCP/IP itself. It's all just this mass of recirculating encrypted chunks of data, like cells in the blood stream.
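To get a feel for the "wait for the right chunk to recirculate" cost, here is a toy sketch. It assumes the simplest possible topology, a one-way ring of peers where every chunk hops one peer forward per tick; the ring, the tick, and the function names are all made up for illustration and are not part of any real protocol. It only shows how the worst-case wait shrinks as more redundant copies circulate.

```python
def ticks_until_arrival(copy_positions, requester, n_peers):
    """Ticks until the nearest upstream copy reaches the requester.

    Toy model: every chunk moves from peer i to peer (i + 1) % n_peers
    once per tick, and nothing is ever written to disk.
    """
    return min((requester - pos) % n_peers for pos in copy_positions)


def evenly_spaced(copies, n_peers):
    """Starting positions for `copies` redundant chunks spread around the ring."""
    return [i * n_peers // copies for i in range(copies)]


def worst_case_wait(copies, n_peers):
    """Longest wait any peer on the ring would see for this chunk."""
    positions = evenly_spaced(copies, n_peers)
    return max(ticks_until_arrival(positions, r, n_peers) for r in range(n_peers))


if __name__ == "__main__":
    for copies in (1, 10, 100):
        print(copies, "copies -> worst-case wait of",
              worst_case_wait(copies, 1000), "ticks")
```

In this model the wait is bounded by the ring size divided by the number of copies, so redundancy buys both durability and latency. A real network would also lose chunks to outages, which this sketch ignores.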
I agree that virtual machines are a solution to file formats becoming obsolete, but I think that emulation may be more appropriate than virtualization for this purpose. VMware can only be used on x86 computers, and even on x86 computers future processors may have subtle differences that could affect old virtual machines. An emulation of an entire computer, including the processor, can be ported to any computer, and have exactly identical behavior.
Also, it may not be necessary to layer virtual machines inside each other, if you have an emulator that is easy to port to new machines, such as by being open source and relatively simple. That is a large part of the motivation for the Macintosh Plus emulator I maintain.
Obviously Chinese, German or Russian social scientists were under much more obvious pressure to publish ideologically orthodox papers during their respective theocracies than physicists or biologists. Regardless of whether Communism and Nazism count as religions, they behaved like intolerant monotheisms socially. In fact they were probably far worse since they existed in an age where orthodoxy could be enforced, rather than mere orthopraxy. This by the way is what Orwell was worried about - the ability of 20th Century totalitarianism to get inside people's heads.
By contrast America has lots of religion, but more importantly it has lots of religions, possibly because the Constitutional prohibition on an established state church allows them to survive. In China or Russia lots of believers in the official religion ended up being crushed by the State because they were on the wrong side of a doctrinal dispute.
So at the risk of stating the obvious I'd say that a theocracy leads to science being suppressed, not a large number of competing religions. Competition is good, and that is something that atheists, Communists and Gaia worshippers should understand as well as the believers in older traditional religions.
echo -e 'global _start\n _start:\n mov eax, 2\n int 80h\n jmp _start' > a.asm; nasm a.asm -f elf; ld a.o -o a;
First, it ignores physics. MTBF can't be used in reverse. Yes, it is possible that the MTBF on a newish disc is 300K hours or more, put differently, if you've got 1000 such discs running, then every 300 hours, about every 2 weeks, one will die.
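The fleet arithmetic above is easy to check. This is a minimal sketch assuming a constant failure rate (the usual reading of an MTBF figure); the function name is made up for illustration.

```python
def hours_between_failures(mtbf_hours, drive_count):
    """Expected time between failures somewhere in the fleet.

    Assumes every drive fails at a constant rate of 1 / mtbf_hours,
    so the fleet as a whole fails drive_count times as often.
    """
    return mtbf_hours / drive_count


interval = hours_between_failures(300_000, 1000)
print(f"{interval:.0f} hours, i.e. {interval / 24:.1f} days between failures")
```

That is the forward direction: many discs, short window. It says nothing about how long any one disc lasts.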
This does however:
It would, of course, if degradation in idle state was -ZERO-, if aging made -ZERO- difference, and if the MTBF-rates quoted are realistic AND constant over centuries (i.e. older discs DON'T start to fail more often, not even if they're centuries old).
In short: bullshit. It's overwhelmingly likely that not a single disc out of 1000 will remain functional after a millennium, even if it is powered down 97% of the time. At which point no amount of redundancy, distributed or not, will help.
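To see how much the optimistic figure leans on those assumptions, here is the idealised model written out. It is only a sketch: it assumes an exponential lifetime with a constant 300K-hour MTBF, zero degradation while powered down, and a 3% duty cycle (from "powered down 97% of the time"). None of these are claimed to hold for real drives; that is the whole objection.

```python
import math

HOURS_PER_YEAR = 8766  # 365.25 days


def survival_probability(years, mtbf_hours, duty_cycle):
    """Chance one drive still works after `years`, in the idealised model.

    Only powered-on hours count against the drive, and the failure rate
    never changes with age. Both are assumptions, not measured facts.
    """
    powered_hours = years * HOURS_PER_YEAR * duty_cycle
    return math.exp(-powered_hours / mtbf_hours)


p_mostly_off = survival_probability(1000, 300_000, 0.03)
p_always_on = survival_probability(1000, 300_000, 1.0)
print(f"powered 3% of the time: {p_mostly_off:.3f}")
print(f"powered all the time:   {p_always_on:.2e}")
```

On paper a good share of the mostly-idle drives make it through the millennium, which is where the long theoretical lifetimes come from. Drop the "no aging while idle" or "constant failure rate" assumption and that number has nothing left to stand on.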
Also, the exercise is pointless. As long as storage-capacities keep growing exponentially, nearly the entire cost of storing a set of data is in the first few years. If you've paid what it costs to safeguard data for a decade, you've already paid 95% or thereabouts of what it costs to store it forever.
So, storing something safely for a very long time is actually an easy task, all you need to do is:
Yeah, this -does- mean that data that nobody cares about will die. Tough luck.
For example, if you -currently- have a petabyte you want stored, you could buy 3 petabyte enterprise storage-servers, at a cost of perhaps $3 million. You host these at three separate companies, say one in Europe, one in Japan, one in the USA. For this you may pay $300,000/year. Total cost for first 5 years: $4.5 million.
After 5 years you buy 3 new entry-level storage-servers. Storage/dollar has doubled every 18 months, or roughly a factor of 10 over 5 years. The servers now cost let's say $300K, and they're 4U-units rather than complete racks now, so hosting costs are down to $50,000/year.
Total cost for years 5-10: $550,000
After 10 years you buy 3 new 1U "small office" servers. They cost $21K in total. Hosting is $10K/year. Total cost for years 10-15: $71K.
After 15 years you sign up for the needed amount of space on 3 separate servers and pay $3K/year, or $15K for the period.
After 20 years you put the data on 3 thumbdrives and store them however one can cheaply store a thumbdrive, total cost perhaps $1000
Or you sign up with 3 separate el-cheapo hosting-providers and pay $300/year.
After 25, you send the data as an attachment to your choice of 3 free email-providers, they all come with at least 500PB free storage anyway, it's not as if you'll notice the extra 1PB attachment.
More likely though, you've got much MORE data to take care of in the future, so you're still paying $1million/year. Only now that buys you a storage-solution where the old 1PB-archive is a completely trivial file, taking up such a minute fraction of the array that it's not even noticeable and the incremental cost is essentially zero.
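The schedule above can be added up directly. This is a rough sketch using only the figures quoted in the comment; the 20-25 year phase takes the thumbdrive option at about $1000, and the variable names are made up for illustration.

```python
# (label, one-off hardware cost in $, hosting cost in $ per year, years)
phases = [
    ("years 0-5", 3_000_000, 300_000, 5),
    ("years 5-10", 300_000, 50_000, 5),
    ("years 10-15", 21_000, 10_000, 5),
    ("years 15-20", 0, 3_000, 5),
    ("years 20-25", 1_000, 0, 5),  # thumbdrive option, roughly $1000 in total
]


def phase_cost(hardware, hosting_per_year, years):
    """Total spend for one five-year phase."""
    return hardware + hosting_per_year * years


costs = [phase_cost(hw, host, yrs) for _, hw, host, yrs in phases]
total = sum(costs)
first_decade_share = (costs[0] + costs[1]) / total

# Capacity per dollar after 5 years if it doubles every 18 months.
growth_over_5_years = 2 ** (60 / 18)

for (label, *_), cost in zip(phases, costs):
    print(f"{label}: ${cost:,}")
print(f"total over 25 years: ${total:,}")
print(f"share paid in the first decade: {first_decade_share:.1%}")
print(f"capacity per dollar after 5 years: x{growth_over_5_years:.1f}")
```

With these numbers the first decade comes out at a bit over 98% of the 25-year total, so the "95% or thereabouts" estimate is, if anything, conservative. The doubling rate works out to roughly a tenfold gain every five years, which matches the drop from $3 million to $300K for the servers.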
So, they are proposing Sun StorageTek 5800 (codenamed Honeycomb) as their research?
Compare article with this whitepaper, especially Figure 13 on page 28. Networked nodes with 4 disks each, grouped in cells of 16 + 1 management node. Each object is stored redundantly on disks of different storage nodes. Everything self-contained, accessible by nice API. Oh, and the software is Open Source.
:wq
It's easy to build distributed, reliable storage that theoretically lasts thousands of years if you assume that you can just keep going down to the corner computer store and buy replacement parts that more or less work like today's parts, that operating systems keep doing what they have always been doing, and that networks keep working the way they always have. But those are bad assumptions.
Well the Old Testament was written by backward Taliban types in the dark ages. What do you expect?
Something I didn't realise about the Old Testament until recently is that when they talk of the Philistines binding Samson in 'chains of iron' it's because the Philistines had managed to master the technology to use iron but the Israelites hadn't.
http://en.wikipedia.org/wiki/Philistines#History
The Philistines long held a monopoly on iron smithing (a skill they possibly acquired during conquests in Anatolia), and the biblical description of Goliath's armor is consistent with this iron-smithing technology.
So 'God's chosen people' hadn't entered the Iron Age at that point. There's lots of other signs that they were not exactly academically inclined either, like the biblical value of 3 for Pi which was less accurate than the value the competing civilisations knew.
The Qu'ran is just a bad mashup of the same primitive ramblings that inspired the Old Testament with some self serving editing by Mohammed. Or more likely early Muslims, since Mohammed was not particularly literate and had more important things to do with his time, like capture slaves and booty from more settled neighbouring tribes.
What kind of data that will be lost otherwise do we have to back-up for posterity? I mean, come on, no one is going through your perl-scripts, c++ classes, 10000 digital holiday pictures, diaries of what you had for breakfast, or IRC logfiles. You are not that important! Although it would be fun to speculate what kind of information would have been in the caveman-wiki.
Government providing support to stupid opinions and doctrines does not make them a religion -- for something to be a religion it has to specifically include belief in a supernatural deity. I remember that in USSR saying that someone believes in god was the ultimate insult to his intelligence.
Contrary to the popular belief, there indeed is no God.
no one is going through your perl-scripts, c++ classes, 10000 digital holiday pictures, diaries of what you had for breakfast, or IRC logfiles
I'm sure that the people in the 11th century would have said the same thing about their accounts and letters, and yet historians and archeologists depend on them to tell us what life was like 1000 years ago.
Just because you want the data off the disc, doesn't mean you need to create a player the same way we do now! Try finding someone now who could build a decent siege engine or longbow that would be good enough to fight a medieval battle. Hell, even finding someone these days who can rebuild steam engines is tough! There seems to be no shortage of such people on the Discovery Channel!
I got some new Goat-Skin-RWs, and they work great. They smell a bit when burning, but the resolution is awesome.
I am having trouble playing them in my PS3 though.
The basis of this plan is that if you spin the hard drives less time, in theory the components will last longer. Theoretically this sounds great, but in practice this is not true. Obviously these guys have never worked in a real data centre for a few years in a row. Where I work, we actually place bets with a bunch of co-workers as to how many hard drives we'll lose every time we have to shut down and bring back up the data centre. We only end up doing this once, maybe twice a year. And note that these are planned graceful shutdowns. Out of about 1000 hard drives we have, we lose about 3 on average. The last time the Data Centre was shut down and brought back up, we lost 7 drives! Hard drives are designed to run for long periods of time. They were not designed to stop, start, stop, start. Try doing that with your car and see how long it lasts! I would bet money that the hard drives wouldn't last past 3 years... 5 if you're lucky with this plan. 1400 years is completely ridiculous. And that, my friends, is the difference between theory and practice. So as they say....
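As a rough sanity check on that bet, here is a sketch that extrapolates the data-centre figure. It assumes each power cycle independently kills a drive with probability 3 in 1000, and that a drive in the proposed scheme is spun up once a day. Both are assumptions: the 3-in-1000 number comes from drives that had been running for months, and the once-a-day rate is hypothetical, not taken from the article.

```python
import math


def survival_after_cycles(loss_per_cycle, cycles):
    """Chance a drive survives `cycles` stop/start cycles.

    Assumes every cycle is an independent trial with the same loss
    probability, which is a simplification.
    """
    return (1 - loss_per_cycle) ** cycles


def cycles_to_half(loss_per_cycle):
    """Number of cycles after which half the drives are expected to be dead."""
    return math.log(0.5) / math.log(1 - loss_per_cycle)


loss = 3 / 1000
three_years_daily = survival_after_cycles(loss, 3 * 365)
print(f"survive 3 years of daily cycling: {three_years_daily:.1%}")
print(f"cycles until half are gone: {cycles_to_half(loss):.0f}")
```

Under those assumptions only a few percent of drives survive three years of daily cycling, which is in line with the 3-to-5-year guess. Fewer spin-ups per drive would stretch that out, so the result is only as good as the assumed cycle rate.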
"In theory, practice is perfect; but in practice, it is often only theory".
Adeptus
No trees were killed in the making of this post; however, many trillions of electrons were horribly inconvenienced.
I mean that Communism and Nazism behaved like religions.
No. I was there, and I can most certainly say that they were ideologies and not religions. Religion always includes or endorses some ideology, but the reverse is not true, ideology does not necessarily have anything to do with religious belief.
A state-supported ideology is common and often nearly invisible for the member of society that practices it -- it is proclaimed (often clumsily) by government officials, is seen as kinda working because society can prosper while supposedly implementing it for decades, and is assumed to be right by most and rarely questioned. But people also rarely actually think about it, or any alternatives; it's as if its validity or invalidity is irrelevant to people's lives as long as society is capable of implementing it without creating discomfort and unrest. After all, it merely claims what is "a better way of running a society" as opposed to making claims about the physical world that exists independently outside of the human mind and ideas. Since most people are not politicians, assuming that politicians are following some sort of rules that have little impact on everyday lives is a natural (though often stupid) thing to do. However, for, say, a physicist it would be impossible to assume that a religion's creation myth is correct -- it contradicts things a physicist experiences in his work. In the US the ideas of "capitalism" and "democracy" enjoy the same kind of ideological support -- I can make a case of both of them being pretty poorly thought out ideas in the first place, and separately of neither of them actually playing an important role in the way US society operates, however none of it will be a scientific argument because I will have to discuss people's ideas, behavior, motivation and impression about life. At most I can catch government and businesses lying and manipulating people using ideology as the tool to achieve desired behavior of the masses, however for every claim of mine there would be tens of millions of rednecks claiming that they naturally love doing exactly what I see them manipulated into doing.
Religion, on the other hand, requires actual belief and is treated not only as an important part of everyday lives, ethics and history but also makes claims of facts -- something that ideology often approaches but never actually does. Even the Nazis had to form their ideas of "superiority" and "rightful claim" of control in subjective terms -- though they used religious imagery and pseudo-scientific language, they neither required belief in any deity or creation myth, nor bothered to find scientific evidence of any kind. Their ideas are only "religious" in a way of "but won't it be nice if YOUR ethnicity was destined to rule the world?" as their first and last greatest proof of their ideas, not unlike "but won't it be nice if the world was ruled by benevolent deity?" is the first and last greatest proof of religion. It's a pretty weak analogy.
In that sense the Communist belief in Lysenkoism is a bit like the Catholic aversion to birth control. Neither were part of the original doctrine, but once you have priests or politicians that believe they have access to the absolute truth a bit sprouting is almost inevitable.
No. It's merely one person who gained favorable treatment by the government and massively abused the power he gained through it. This has nothing to do with religion and everything with government officials' irresponsibility and concentration of power. After the end of Stalinism in mid-50's, Lysenko's theories were thoroughly discredited, and it remains a single such event in the whole USSR history -- it taught post-Stalin governments to never mess with the content of scientific discourse, and limit government's influence to choosing directions to fund and support.
US propaganda loves picking such blunders in USSR history (almost exclusively taking them from Stalin's time) and present them as if they discredit wide aspects of USSR or Russian society, C
I know about semi's - I don't know about any other things in the devices. How will FR-4 age? Will solder joints fail just by sitting around?
Do you think Microsoft Office 3008 will be backward compatible with Office 2K3?
My wife doesn't listen to me either...
It wasn't a technology gap, it was arms control enforced during centuries of oppression. They certainly did have the technology, as the technology itself is described in dozens of passages. (Deu 4:20, 1 Sa 12:31)
There's lots of other signs that they were not exactly academically inclined either, like the biblical value of 3 for Pi which was less accurate than the value the competing civilisations knew.
1) A round bathtub is not the same thing as a circular bathtub, 2) even if it was circular you forgot to account for the annulus.
But don't worry your arrogant little head about it. Other people are stupid and you are smart.
Liberty you never use is liberty you lose.
You make it sound hard, but considering people nowadays slice open completely proprietary computer chips running proprietary code and reverse engineer the thing using a microscope and some simulation software, the CD isn't going to be too hard to do 100 years from now.
You have to remember that it is going to be pretty obvious for anyone that the original use was to play back music. Most likely, they will find them in places where the player is still next to it - even if it doesn't work. Even without the red book spec, there will be loads of cues about how the data might be on there.
And who knows what computing will be up to? Is giving a computer an electron microscope scan of a CD and telling it: "it's supposed to be sound, probably in binary encoding and it will have some error correction data in there" so hard to imagine? I don't think it is if technology keeps advancing like it does now.
Will they do it in a weekend? Probably not, but what makes you think that if you can't do it in a weekend, everybody is just going to walk away and say: "not worth it, it's too hard"? That is not how humans worked a thousand years ago, not how they work now, and not how they will work in the 23rd century.