Slashdot Mirror


Storing Data For the Next 1,000 Years

An anonymous reader writes "This may be an interesting take on creating long-term storage technologies. A team of researchers at UCSC claims to have come up with a power-efficient, scalable way to reliably store data for a theoretical 1,400 years with regular hard drives. TG Daily has an article describing this technology and it sounds intriguing as it uses self-contained but networked storage units. It looks like a complicated solution, but the approach is manageable and may be an effective solution to preserve your data for decades and possibly centuries." Nice to see research on this using the kinds of real-world figures for disk lifetimes that recent studies have been turning up.

13 of 243 comments

  1. Only half the problem by Raindance · · Score: 4, Informative

    Part of the solution to very long-term storage, of course, has to involve a method to read the data you've archived.

    I tend to think systems such as the one described in the article aren't good long-term solutions. If their math works on the failure rates, that's fantastic- but just try to hook up a 2028 computer to one of these things to pull the data off.*

    (Ever tried to get data off an obsolete tape backup?)

    I think the most reliable archival system is going to be an active one, where data is saved on modern storage hardware and always copied to more modern tech as it arrives.

    The other side of this is, for anything more advanced than text-- given that you can get at the data, what do you open it with? File types die over time and it's basically impossible to find programs to open certain files nowadays, much less such programs that will run on a modern OS. I think the answer to this has to be virtualization. Store the data *and* programs that can open the filetypes you need opened inside a portable virtual machine (e.g., a Windows vmware image). Over time, you may have to layer virtual machines inside virtual machines as OSes grow obsolete. But that's okay- virtualization is only going to become more elegant, and the end result is that you'd have your data in its original environment, completely accessible by native programs.

    *Some elements of this problem could be solved by having backup servers use wireless and filesharing protocols that might stand the test of time- e.g., 802.11n and SAMBA. No need to just pick one 'most likely to be future-proof' combination, either: run Bluetooth and serial access, WebDAV and an HTTP fileserver, etc. Still, *not* storing data on modern hardware is always going to be a risky kludge.

    There's probably room for a lucrative business based around this-- figuring out the most elegant way to archive and retain meaningful access to data under various computing/disaster scenarios. Hey, I do consulting. :)

    1. Re:Only half the problem by LoudMusic · · Score: 4, Informative

      (Ever tried to get data off an obsolete tape backup?)

      I think the most reliable archival system is going to be an active one, where data is saved on modern storage hardware and always copied to more modern tech as it arrives.

      Oh man, the headaches involved here. It only takes five years and archived data is obsolete. And yes, virtualization can help, but in the past I've resorted to keeping an entire system available, off-line, to guarantee that the client would be able to open their data. Sometimes you get lucky and there's either a plug-in for the old app to export to the new app, or one for the new app to import from the old app. But even on the rare chance that one is available, I've never seen a 100% conversion - even on simple stuff.

      Maybe old data was meant to die.
      --
      No sig for you. YOU GET NO SIG!
    2. Re:Only half the problem by oGMo · · Score: 4, Funny

      There's probably room for a lucrative business based around this-- figuring out the most elegant way to archive and retain meaningful access to data under various computing/disaster scenarios. Hey, I do consulting. :)

      Find a chisel.

      --

      Don't think of it as a flame---it's more like an argument that does 3d6 fire damage

  2. Sometimes old tech is best by erroneus · · Score: 5, Insightful

    No, not punch cards... but close!

    Stone and chisel. That's the way to store data for 1,000 years. The reason why I say this is simple. The more "religious" the world's populations become, the closer to the dark ages we become. (The reverse is true as well, as history illustrates.) I expect there will be a second "dark ages" at which point all other technologies will simply not be available.

    1. Re:Sometimes old tech is best by martin-boundary · · Score: 4, Interesting

      Why not microscopic etching? One advantage over the stone and chisel approach is that you can carry the mountain in your pocket until the next civilization figures out how to read it...

    2. Re:Sometimes old tech is best by evanbd · · Score: 4, Informative

      You could, of course, update the technology a bit: Rosetta Project. High density, readable with a high quality microscope, and partially readable with the naked eye -- the spiral of shrinking text should make the usage instructions obvious: "get a magnifying glass, there's more here."

  3. Maybe /. needs something that lasts a bit longer.. by Tmack · · Score: 4, Funny
    Since those "recent studies" links have already degraded into 404's. Maybe something like what was covered a few days ago?

    tm

    --
    Support TBI Research: http://www.raisinhope.org
  4. Born for this job by Anonymous Coward · · Score: 5, Funny

    Did anyone else notice that the lead researcher's name is Mark Storer? How perfect is that?

  5. Steganography and P2P by Chairboy · · Score: 4, Funny

    One thing remains constant in thousands of years of recovered cave paintings, manuscripts, papyrus drawings, and more. And that constant... is pornography. It lasts, it's popular, and it's always in demand.

    Clearly, the answer for long term data storage is to use steganographic techniques to encode your data into various types of creative skinpics. Pick famous folks, pretty folks, strange fetishes... the whole gamut. Pick things that people will keep. A hundred years later, all someone needs is the key phrases to search for.
    "We need that Higgs Boson experiment data from 2012, how will we get it? The infocalypse has destroyed all of our cataloged data!"
    "No problem, my great grandfather left a note in his journal telling his descendants to search for 'Britney spears enema' and use 'wet riffs' to decode the LHC data in whatever we use for files."
    "President Spears? That's crazy!"

    Voila!

  6. Re:What about filling it up? by Blkdeath · · Score: 4, Informative

    Unless 10 PB (petabytes) means something other than what I think (10,000 terabytes), where did they get the $4700 number? I even read their definition of static cost (You have to go up a few paragraphs) and I still don't know.

    Table 3: Comparison of system and operational costs for 10 PB of storage. All costs are in thousands of dollars and reflect common configurations. Operational costs were calculated assuming energy costs of $0.20/kWh (including cooling costs).

    Does $4.7 million sound a bit more realistic?

    --
    BD Phone Home!

    Shameless plug. Like you weren't expecting it.

  7. Try harder by daBass · · Score: 4, Insightful

    (Ever tried to get data off an obsolete tape backup?)

    There are loads of people that can make this work. The most important thing is having the specs of what is on it and how it was recorded (even just a few hints and some knowledge of how computer systems in that era might have recorded data is enough). That the machine used is no longer functioning and had an interface that doesn't work with your USB-only modern PC anyway is of no relevance.

    Given the media, specifications and some time and money, a trio of engineering, electronics and CS students will make a machine that will read any old tape, punchcard, early HDD, etc. A CD is laughably simple technology; an engineer 100 years from now will build a player (in a way that may not look anything like our current players) in no time at all.

    Today's technology is even better documented, and building readers for it is certainly not beyond the capabilities of future generations.

    If you find an old tape and want to do it in an afternoon, you are out of luck. If you are an historian that really, really wants to get to the data, it is not all that hard.
  8. Idiotic by Eivind · · Score: 4, Insightful
    This is completely idiotic.

    First, it ignores physics. MTBF can't be used in reverse. Yes, it is possible that the MTBF on a newish disc is 300K hours or more. Put differently, if you've got 1000 such discs running, then every 300 hours, about every 2 weeks, one will die.

    This does however:

    • NOT imply that an average disc will last for 300K hours of operation, i.e. about 34 years.
    • NOT imply that a disc that is idle 90% of the time will last for about 340 years.
    • NOT imply that a disc that is idle 97% of the time will last for a millennium.


    It would, of course, if degradation in the idle state were -ZERO-, if aging made -ZERO- difference, and if the MTBF rates quoted were realistic AND constant over centuries (i.e. older discs DON'T start to fail more often, not even if they're centuries old).

    In short: bullshit. It's overwhelmingly likely that not a single disc out of 1000 will remain functional after a millennium, even if it is powered down 97% of the time. At which point no amount of redundancy, distributed or not, will help.
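
    For anyone who wants to check the arithmetic, here is a rough sketch of the naive extrapolation being criticised. The function names are made up for illustration, and the model deliberately bakes in the two assumptions objected to above (no wear while idle, failure rate constant forever). Dividing 300K hours by the 8,766 hours in a year gives roughly 34 years of continuous running.

```python
# Naive MTBF extrapolation. All names here are illustrative, not from the article.
HOURS_PER_YEAR = 365.25 * 24  # 8766 hours


def fleet_failure_interval_hours(mtbf_hours, n_drives):
    """Expected time between failures across a fleet of identical drives."""
    return mtbf_hours / n_drives


def naive_lifetime_years(mtbf_hours, powered_fraction):
    """Calendar years until the powered-on hours add up to the MTBF figure.

    This is the flawed reading of MTBF: it assumes zero degradation while
    idle and a failure rate that never rises as the drive ages.
    """
    return mtbf_hours / (HOURS_PER_YEAR * powered_fraction)


if __name__ == "__main__":
    mtbf = 300_000
    # 1000 drives with a 300K-hour MTBF: one failure roughly every 300 hours.
    print(fleet_failure_interval_hours(mtbf, 1000))
    # Powered 100%, 10%, 5% and 3% of the time.
    for powered in (1.0, 0.10, 0.05, 0.03):
        print(powered, round(naive_lifetime_years(mtbf, powered)), "years")
```

    The numbers only look impressive because the model has no term for aging at all, which is exactly the problem.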

    Also, the exercise is pointless. As long as storage capacities keep growing exponentially, nearly the entire cost of storing a set of data is in the first few years. If you've paid what it costs to safeguard data for a decade, you've already paid 95% or thereabouts of what it costs to store it forever.

    So, storing something safely for a very long time is actually an easy task, all you need to do is:

    • Create multiple copies at geographically distinct sites.
    • Regularly transfer the copies to newer, larger media.


    Yeah, this -does- mean that data that nobody cares about will die. Tough luck.

    For example, if you -currently- have a petabyte you want stored, you could buy 3 one-petabyte enterprise storage-servers, at a cost of perhaps $3 million. You host these at three separate companies, say one in Europe, one in Japan, one in the USA. For this you may pay $300,000/year. Total cost for the first 5 years: $4.5 million.

    After 5 years you buy 3 new entry-level storage-servers. Storage/dollar has doubled every 18 months, or roughly a factor of 10 over 5 years. The servers now cost let's say $300K, and they're 4U-units rather than complete racks now, so hosting costs are down to $50,000/year.
    Total cost for years 5-10: $550,000

    After 10 years you buy 3 new 1U "small office" servers. They cost $21K in total. Hosting is $10K/year. Total cost for years 10-15: $71K.

    After 15 years you sign up for the needed amount of space on 3 separate servers and pay $3K/year, or $15K for the period.

    After 20 years you put the data on 3 thumbdrives and store them however one can cheaply store a thumbdrive, total cost perhaps $1000
    Or you sign up with 3 separate el-cheapo hosting-providers and pay $300/year.

    After 25, you send the data as an attachment to your choice of 3 free email-providers, they all come with at least 500PB free storage anyway, it's not as if you'll notice the extra 1PB attachment.

    More likely though, you've got much MORE data to take care of in the future, so you're still paying $1 million/year. Only now that buys you a storage-solution where the old 1PB-archive is a completely trivial file, taking up so minute a fraction of the array that it's not even noticeable and the incremental cost is essentially zero.
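
    The back-of-envelope figures above can be added up in a few lines. This is only a sketch using the dollar amounts guessed at in this comment; the variable and function names are invented for illustration. Note that doubling every 18 months works out to about a factor of 10 over 5 years.

```python
# Dollar figures are the guesses from the comment above; names are illustrative.
PERIODS = [
    # (label, up-front hardware cost, hosting cost per year, years)
    ("years 0-5", 3_000_000, 300_000, 5),
    ("years 5-10", 300_000, 50_000, 5),
    ("years 10-15", 21_000, 10_000, 5),
    ("years 15-20", 0, 3_000, 5),
]
THUMBDRIVE_COST = 1_000  # one-off cost quoted for year 20 onward


def period_cost(hardware, hosting_per_year, years):
    """Hardware bought at the start of the period plus hosting for its length."""
    return hardware + hosting_per_year * years


def capacity_growth(months, doubling_months=18):
    """Growth factor in storage per dollar if it doubles every doubling_months."""
    return 2 ** (months / doubling_months)


costs = [period_cost(h, y, n) for _, h, y, n in PERIODS]
total = sum(costs) + THUMBDRIVE_COST
first_decade_share = sum(costs[:2]) / total

if __name__ == "__main__":
    for (label, *_), cost in zip(PERIODS, costs):
        print(f"{label}: ${cost:,}")
    print(f"total through year 20 and beyond: ${total:,}")
    print(f"share paid in the first decade: {first_decade_share:.1%}")
    print(f"storage/dollar growth over 5 years: {capacity_growth(60):.1f}x")
```

    With these guesses the first decade accounts for well over 95% of the total, which is the whole point: the tail costs next to nothing.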
  9. Nobody but historians? by argent · · Score: 4, Insightful

    no one is going through your perl-scripts, c++ classes, 10000 digital holiday pictures, diaries of what you had for breakfast, or IRC logfiles

    I'm sure that the people in the 11th century would have said the same thing about their accounts and letters, and yet historians and archeologists depend on them to tell us what life was like 1000 years ago.