Storing Data For the Next 1,000 Years
An anonymous reader writes "This may be an interesting take on creating long-term storage technologies. A team of researchers at UCSC claims to have come up with a power-efficient, scalable way to reliably store data for a theoretical 1,400 years with regular hard drives. TG Daily has an article describing this technology and it sounds intriguing as it uses self-contained but networked storage units. It looks like a complicated solution, but the approach is manageable and may be an effective solution to preserve your data for decades and possibly centuries." Nice to see research on this using the kinds of real-world figures for disk lifetimes that recent studies have been turning up.
No, not punch cards... but close!
Stone and chisel. That's the way to store data for 1,000 years. The reason why I say this is simple. The more "religious" the world's populations become, the closer to the dark ages we become. (The reverse is true as well as history illustrates.) I expect there will be a second "dark ages" at which point all other technologies will simply not be available.
Since there will be many holes shot into this theory, let me be one of the first to fire a shot. Electricity (as we know it) may not be around then. I am not predicting the dark ages, but who's to say that far in advance there is still a live socket.
Any storage device that relies on outside power cannot be guaranteed for 100 years, let alone 1400. I would have more faith in a stone tablet.
This is a fine example of "academic" research dollars at work.
Flexible bare-metal recovery for Linux/UNIX
Wouldn't it be a lot easier to simply keep the archive on a live system, and rotate it to new media from time to time as the old media dies and new storage systems become available? After all, if no one is looking after this system, what's to keep it from being forgotten in the basement of a long-abandoned building?
In addition to taking advantage of the falling cost of storage for a fixed-size data set -- making future replacement media purchases much cheaper than redundant media purchases today -- you also have the opportunity to re-process the data into new formats, so that you'll still be able to read it when you want it.
The more "religious" the world's populations become, the closer to the dark ages we become. (The reverse is true as well as history illustrates.)
I realize that taking swipes at religion at /. is simply common fare and is an easy way to boost karma, but seriously, what? Where is this link between religion - one would assume all religion, as the OP discusses the population of the entire world - and this surge to the dark ages?
From the demographic viewpoint, a simple look at the high rate of belief in deity/practice of religion in the United States - the world economic leader, and still, in spite of some losses in this area, the center of innovation in all (well, at least most) things technological - would seem to indicate that the causal link between a belief in religion and a return to the "dark ages" is tenuous at best. For fun, compare the rate of technological advance in the U.S. with that of the devoutly non-religious Soviet Russia or Communist China throughout the Cold War.
Then, one could look at individuals - Mendel, Newton, a wide assortment of Muslim mathematicians and astronomers, etc. Even a look at more mundane topics, such as engineers and inventors shows a broad array of other religious folks as well. As a Mormon, the first two that come to mind are Browning, a perhaps unrivaled genius to this day in the design of firearms, and Farnsworth - largely responsible for the electronic television.
Now, I'll be the first to concede the point that several religious groups have shown less technological advance over time; Wahhabi Muslims in particular come to mind, but so do numerous others. Some groups have eschewed technology altogether, such as the Amish, but these are exceptional cases. But to argue that the act of being religious at all is somehow tied to a magical turn to the dark ages is absurd, and to argue that a lack of religion has always led to some drive away from the dark ages is no better.
Given the media, specifications and some time and money, a trio of engineering, electronics and CS students will make a machine that will read any old tape, punchcard, early HDD, etc. A CD is laughably simple technology; an engineer 100 years from now will build a player (in a way that may not look anything like our current players) in no time at all.
Today's technology is even better documented and certainly not beyond the capabilities of future generations to make readers for.
If you find an old tape and want to do it in an afternoon, you are out of luck. If you are an historian that really, really wants to get to the data, it is not all that hard.
"Now, what was my password five years ago?"
09F91102 no, 455FE104 nope, F190A1E8 uh-uh, 7A5F8A09 that's not it, C87294CE no. Ah! 452F6E403CDF10714E41DFAA257D313F.
I agree that virtual machines are a solution to file formats becoming obsolete, but I think that emulation may be more appropriate than virtualization for this purpose. VMware can only be used on x86 computers, and even on x86 computers future processors may have subtle differences that could affect old virtual machines. An emulation of an entire computer, including the processor, can be ported to any computer, and have exactly identical behavior.
Also, it may not be necessary to layer virtual machines inside each other, if you have an emulator that is easy to port to new machines, such as by being open source and relatively simple. That is a large part of the motivation for the Macintosh Plus emulator I maintain.
First, it ignores physics. MTBF can't be used in reverse. Yes, it is possible that the MTBF on a newish disc is 300K hours or more. Put differently, if you've got 1000 such discs running, then every 300 hours, or about every 2 weeks, one will die.
This does however:
It would of course if degradation in idle state was -ZERO-. If aging made -ZERO- difference and if the MTBF-rates quoted are realistic AND constant over centuries (i.e. older discs DON'T start to fail more often, not even if they're centuries old).
In short: bullshit. It's overwhelmingly likely that not a single disc out of 1000 will remain functional after a millennium, even if it is powered down 97% of the time. At which point no amount of redundancy, distributed or not, will help.
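For anyone who wants to check the MTBF arithmetic above, here is a rough Python sketch. The function names are made up for illustration, and the second function deliberately uses the optimistic model the comment rejects (constant failure rate, only powered-on hours count, no aging at all), so treat its result as an upper bound under those assumptions rather than a prediction.

```python
import math

HOURS_PER_YEAR = 24 * 365.25  # 8766 hours, averaging in leap years

def hours_between_failures(mtbf_hours, n_disks):
    """Expected time between failures across a fleet of identical disks."""
    return mtbf_hours / n_disks

def idealized_survival(mtbf_hours, years, powered_fraction):
    """Survival probability of one disk under a constant failure rate.

    Assumption (hypothetical, optimistic): only powered-on hours count,
    and the failure rate never rises with age.
    """
    powered_hours = years * HOURS_PER_YEAR * powered_fraction
    return math.exp(-powered_hours / mtbf_hours)

interval = hours_between_failures(300_000, 1000)
print(f"one failure every {interval:.0f} h ({interval / 24:.1f} days)")
print(f"idealized 1000-year survival: {idealized_survival(300_000, 1000, 0.03):.2f}")
```

The survival figure only looks respectable because the model has no wear-out term. Add any idle degradation or age-dependent failure rate and it heads toward zero, which is the objection being made.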
Also, the exercise is pointless. As long as storage-capacities keep growing exponentially, nearly the entire cost of storing a set of data is in the first few years. If you've paid what it costs to safeguard data for a decade, you've already paid 95% or thereabouts of what it costs to store it forever.
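The "95% or thereabouts" claim can be sanity-checked with a geometric series. This is only a sketch: it assumes the yearly cost of keeping a fixed-size data set halves every 18 months, forever, and `fraction_paid` is a hypothetical helper name. If the yearly cost shrinks by a ratio r each year, the share of the all-time total paid in the first N years is 1 - r^N.

```python
def fraction_paid(years, doubling_months=18.0):
    """Share of the infinite-horizon storage cost that falls in the first `years` years.

    Assumes yearly cost falls geometrically, halving every `doubling_months`.
    """
    yearly_ratio = 2.0 ** (-12.0 / doubling_months)  # cost next year / cost this year
    return 1.0 - yearly_ratio ** years

for n in (5, 10, 15):
    print(f"first {n:2d} years: {fraction_paid(n):.1%} of the all-time cost")
```

Under that assumption the first decade comes out well above 95%, so the claim holds with room to spare. A slower price decline would lower the figure.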
So, storing something safely for a very long time is actually an easy task; all you need to do is:
Yeah, this -does- mean that data that nobody cares about will die. Tough luck.
For example, if you -currently- have a petabyte you want stored, you could buy 3 petabyte-sized enterprise storage-servers, at a cost of perhaps $3 million. You host these at three separate companies, say one in Europe, one in Japan, one in the USA. For this you may pay $300,000/year. Total cost for first 5 years: $4.5 million.
After 5 years you buy 3 new entry-level storage-servers. Storage/dollar has doubled every 18 months, or a factor of about 10 over 5 years. The servers now cost let's say $300K, and they're 4U-units rather than complete racks now, so hosting-costs are down to $50,000/year.
Total cost for years 5-10: $550,000
After 10 years you buy 3 new 1U "small office" servers. They cost $21K in total. Hosting is $10K/year. Total cost for years 10-15: $71K.
After 15 years you sign up for the needed amount of space on 3 separate servers and pay $3K/year, or $15K for the period.
After 20 years you put the data on 3 thumbdrives and store them however one can cheaply store a thumbdrive, total cost perhaps $1000
Or you sign up with 3 separate el-cheapo hosting-providers and pay $300/year.
After 25, you send the data as an attachment to your choice of 3 free email-providers; they all come with at least 500PB free storage anyway, and it's not as if you'll notice the extra 1PB attachment.
More likely though, you've got much MORE data to take care of in the future, so you're still paying $1 million/year. Only now that buys you a storage-solution where the old 1PB-archive is a completely trivial file, taking up so minute a fraction of the array that it's not even noticeable and the incremental cost is essentially zero.
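Adding up the schedule above is straightforward. The sketch below uses the hardware and hosting figures from the comment for the first 20 years only (the thumbdrive and free-email stages are left out as negligible). The `PERIODS` table and `period_cost` helper are hypothetical names for illustration.

```python
# (label, one-off hardware cost in $, hosting cost in $/year, years in period)
PERIODS = [
    ("years 0-5", 3_000_000, 300_000, 5),
    ("years 5-10", 300_000, 50_000, 5),
    ("years 10-15", 21_000, 10_000, 5),
    ("years 15-20", 0, 3_000, 5),
]

def period_cost(hardware, hosting_per_year, years):
    """Total spend for one replacement cycle: hardware plus hosting."""
    return hardware + hosting_per_year * years

totals = [period_cost(h, y, n) for _, h, y, n in PERIODS]
grand_total = sum(totals)
first_decade_share = sum(totals[:2]) / grand_total

for (label, *_), cost in zip(PERIODS, totals):
    print(f"{label:12s} ${cost:>10,}")
print(f"20-year total ${grand_total:,}; first decade is {first_decade_share:.1%} of it")
```

With these guessed figures the first ten years account for nearly all of the 20-year spend, in line with the 95% estimate.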
It's easy to build distributed, reliable storage that theoretically lasts thousands of years if you assume that you can just keep going down to the corner computer store and buy replacement parts that more or less work like today's parts, that operating systems keep doing what they have always been doing, and that networks keep working the way they always have. But those are bad assumptions.
While it's not generally too awkward to convert from one character encoding to another, "just text" is a slight oversimplification.
What kind of data that would otherwise be lost do we have to back up for posterity? I mean, come on, no one is going through your perl-scripts, c++ classes, 10000 digital holiday pictures, diaries of what you had for breakfast, or IRC logfiles. You are not that important! Although it would be fun to speculate what kind of information would have been in the caveman-wiki.
no one is going through your perl-scripts, c++ classes, 10000 digital holiday pictures, diaries of what you had for breakfast, or IRC logfiles
I'm sure that the people in the 11th century would have said the same thing about their accounts and letters, and yet historians and archeologists depend on them to tell us what life was like 1000 years ago.
Just because you want the data off the disc, doesn't mean you need to create a player the same way we do now! Try finding someone now who could build a decent siege engine or longbow that would be good enough to fight a medieval battle. Hell, even finding someone these days who can rebuild steam engines is tough! There seems no shortage of such people on the Discovery Channel!
You make it sound hard, but considering people nowadays slice open completely proprietary computer chips running proprietary code and reverse engineer the thing using a microscope and some simulation software, the CD isn't going to be too hard to do 100 years from now.
You have to remember that it is going to be pretty obvious to anyone that the original use was to play back music. Most likely, they will find them in places where the player is still next to it - even if it doesn't work. Even without the Red Book spec, there will be loads of cues about how the data might be on there.
And who knows what computing will be up to? Is giving a computer an electron microscope scan of a CD and telling it: "it's supposed to be sound, probably in binary encoding and it will have some error correction data in there" so hard to imagine? I don't think it is if technology keeps advancing like it does now.
Will they do it in a weekend? Probably not, but what makes you think that if you can't do it in a weekend, everybody is just going to walk away and say: "not worth it, it's too hard"? That is not how humans worked a thousand years ago, it is not how they work now, and it is not how they will work in the 23rd century.