Slashdot Mirror


Large IDE Drives as Long-Term Archival Media?

PlatterMan asks: "The question of how to cope with backing up disk drives that are rapidly increasing in size onto tape and other backup devices that aren't scaling as quickly isn't new to Slashdot. Neither is the use of single, RAIDed, and removable disks as backup devices; this has been covered numerous times on Slashdot (e.g. here and here). One thing I haven't really seen discussed, however, is the feasibility of disk drives as medium- to long-term archival media, say 5 to 10 years. Like many people, I now have multiple machines with a combined data pool of about 220 Gig, and backing these up onto DDS or DLT tapes is slow, manual, and expensive in tape costs. So I'm looking to add a removable drive bay to my primary backup machine and pick up a bunch of large IDE drives, so that I can do regular disk-to-disk backups over 100 Meg Ethernet (and, for my machines which are in cages, over the Net), pulling out and alternating the backup drives on a 3-way backup cycle."

"Backups are of no use without offsite archival copies so I plan to take one set of disks out of the pool, and archive them offsite on a quarterly basis.

However, I've heard horror stories about the data retention and usability of older disks that have been shelved for archival. One example is disk stiction, where people try to restore data off a 4- to 5-year-old drive only to find that the disk won't spin up because its lubricants have solidified; others find that the data itself has degraded.

I'd be interested in the Slashdot crowd's opinion on using large IDE drives as archival media. Clearly one possible problem is being able to get hold of a machine in the future with a suitable IDE interface to plug them into for restoration, but I can't see IDE disappearing within 5 years (maybe 10, though). I'm more interested in experiences and opinions on the suitability of the disks themselves for long-term archival.


  • Is stiction still likely to occur on newer makes of IDE drives, or have manufacturers beaten the problems that caused it in the past?
  • Likewise, how likely is bit drop-out and general data degradation over, say, a 5-year and a 10-year period, and what do people think is the likely maximum time that a shelved drive would remain usable?
  • Any suggestions as to how I would need to store drives in order to minimize these types of problem and maximise their feasible life as archival media?
Thanks!"
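The 3-way rotation with a quarterly offsite set described in the question can be sketched as a small scheduling helper. This is a minimal sketch under stated assumptions that the question itself does not specify: one backup per day, three drive sets with the hypothetical labels A, B and C, and the set written on the first day of each calendar quarter being the one that goes offsite.

```python
from datetime import date

# Hypothetical labels for the three rotating backup drive sets.
DRIVE_SETS = ("A", "B", "C")

# Assumption: the first day of each calendar quarter triggers the offsite swap.
QUARTER_START_MONTHS = (1, 4, 7, 10)


def drive_set_for(day: date) -> str:
    """Return the drive set to use for the (assumed daily) backup on `day`.

    date.toordinal() is a running day count, so consecutive days cycle
    A -> B -> C -> A ... regardless of month or year boundaries.
    """
    return DRIVE_SETS[day.toordinal() % len(DRIVE_SETS)]


def goes_offsite(day: date) -> bool:
    """True if the set written on `day` should be taken offsite."""
    return day.day == 1 and day.month in QUARTER_START_MONTHS


if __name__ == "__main__":
    today = date.today()
    print(f"Use drive set {drive_set_for(today)}; offsite today: {goes_offsite(today)}")
```

Cycling on the ordinal day number rather than the day of the month keeps the rotation even across month ends; a weekly rotation would only need a week count in place of the day count.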

15 of 710 comments

  1. Re:rock and chisel by nsample · · Score: 5, Interesting


    I know this parent was modded up as +Funny, but it's actually +Informative. "Rock and chisel" is the best thing we have, and there's a real trend toward using it more. Take a look at Norsam's HD-Rosetta. It's an etched nickel plate designed to last for thousands of years. Vive la Rock & Chisel!

  2. Re:IDE ? by Gudlyf · · Score: 3, Interesting
    "because in 5-10 years from now, IDE may not even exist anymore..."

    In that case, you could always just buy a new, cheap system for the purpose of reading the IDE disks, and keep that in the vault with the drives "just in case".

    I'm not saying that backing up to IDE is a good idea, though. Drop a tape on the floor while you're running to the tape drives for a critical restore, no biggie. Drop a drive on the floor in the same situation, and you'd better hope your resume wasn't one of the files needing a restore.

    --
    Trolls lurk everywhere. Mod them down.
  3. Re:GraniteDigital is what I use by coyote-san · · Score: 5, Interesting

    At the least, toss the media into freezer-weight ziplock bags. Better yet is double-bagging it - put the media in a smaller bag, and then in a larger bag with smaller bag's opening on the 'far' side.

    Paper-rated "fire safes" work by using a material that undergoes a phase change at high temperatures, releasing steam in the process. (Think of the latent heat involved in freezing and melting ice; the same principle keeps the interior of the safe at a reasonable temperature.)

    The only problem is that paper tolerates steam fairly well. Ditto the smoke that can make its way into the safe. The paper may be damaged, but it is still readable. Computer media will be destroyed. Fortunately, freezer-weight plastic is more than adequate to block the steam, leaving only small openings at the seal. Even that leakage is modest, and the second bag is mostly there so you avoid smearing soot onto the media as you remove it from the bag.

    --
    For every complex problem there is an answer that is clear, simple, and wrong. -- H. L. Mencken
  4. Crappy backups better than nothing by jolshefsky · · Score: 5, Interesting
    I don't know how "pro" you want to go with this, but I ran into a similar situation and resigned myself to the same solution. My DDS2 SCSI tape drive is getting to be too small at 4/8GB. I would like to have a tape solution, but it's too expensive for my purpose. I get drives as pulls and last year's models, so I only spent US$150, but with tapes at US$10, even 8GB is absurdly small. If I were to go with new equipment and step up to DDS-4, I'd be out about US$1000 for the drive and another US$20 for each 20-40GB tape. Total cost for a basic 3-tape rotating backup: US$1060.

    On the other hand, I could spend (as I have) US$40 on a basic (a.k.a. el-cheapo) FireWire-IDE case, US$30 for 3 removable IDE enclosures, and (eventually) about US$70 each for 3 60GB IDE drives. Total cost: US$280.

    What do I sacrifice? Not much ... one of the drives might fail. At that point I'd just replace it with another US$70 drive (which would probably be larger). If I needed to restore something from backup, I'm already looking at up to 24-hour-old data, and if that drive happened to die, possibly 48-hour-old data ... it's unlikely that all the drives would fail at once.

    The advantages? I can use the US$780 I save for something else, and I don't have to worry about shelling out another US$1000 every four years just to scale to "current" requirements. I don't know what the upper limit of an IDE drive is these days (i.e. what the ATAPI bus can handle), but even 200GB is pretty big for me right now.
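The arithmetic behind the two totals and the US$780 difference, using only the prices quoted in this comment:

```python
# Prices (US$) exactly as quoted in the comment above.
TAPE_DRIVE = 1000      # new DDS-4 drive
TAPE_EACH = 20         # one 20-40GB DDS-4 tape
FIREWIRE_CASE = 40     # basic FireWire-IDE case
ENCLOSURES = 30        # three removable IDE enclosures, combined
DISK_EACH = 70         # one 60GB IDE drive
ROTATION = 3           # three-way rotating backup set

tape_total = TAPE_DRIVE + ROTATION * TAPE_EACH
disk_total = FIREWIRE_CASE + ENCLOSURES + ROTATION * DISK_EACH
savings = tape_total - disk_total

print(f"tape: ${tape_total}, disk: ${disk_total}, saved: ${savings}")
```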

    Anyway, just a few thoughts. The basic thing is lower cost for nearly the same risk ... tapes fail too, you know. Remember, too, that this story would be very different if I had to handle 50 machines instead of 2.

    --
    --- Jason Olshefsky

    Karma: Poser (mostly affected by adding this line long after everyone else did)

  5. Re:A lot of folks will say.... by Havokmon · · Score: 4, Interesting
    Personally, I think the way to go is just to give up and admit that disk is not cheap. You need to back up your data to a live mirror system with identical storage (hourly rsync does a nice job) and then you need to arrange a service that can back up your data to remote live mirror systems.

    Note that in both cases I said "live mirror". The remote backup part is expensive, but it's the only reliable way. You seed it by tape (full backup to tape, and mail them to the vendor) and then use dedicated lines to keep a regular incremental update going.

    I agree wholeheartedly. Though I would note that IDE is the perfect solution for your redundancy. All you need is space. It doesn't have to be the fastest or the highest-quality mirror. Buying 20 IDE drives and having half of them fail is still cheaper than high-capacity SCSI. Do a RAID 50 (two RAID 5 sets, striped together) offsite, and use rsync to mirror your data over your Inet line. Or chain your mirrors: have your 'backup' offsite RAID rsync off the primary offsite RAID. I'd bet the only people who would have problems with that are the ones doing heavy graphics.
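A minimal sketch of the "rsync your data to an offsite mirror" step, driven from Python. It assumes rsync is installed on both ends and the remote is reachable over SSH; the paths and hostname in the usage comment are hypothetical placeholders, and running it hourly would be left to cron or a similar scheduler.

```python
import subprocess


def build_rsync_cmd(source: str, dest: str, dry_run: bool = False) -> list[str]:
    """Build an rsync command line that makes `dest` a mirror of `source`.

    -a        archive mode: recurse and preserve permissions, times, symlinks
    --delete  remove files from dest that no longer exist in source
    """
    cmd = ["rsync", "-a", "--delete"]
    if dry_run:
        cmd.append("--dry-run")  # report what would change, copy nothing
    # A trailing slash on source copies its *contents* into dest.
    cmd += [source.rstrip("/") + "/", dest]
    return cmd


def mirror(source: str, dest: str) -> None:
    """Run the mirror; raises subprocess.CalledProcessError if rsync fails."""
    subprocess.run(build_rsync_cmd(source, dest), check=True)


# Hypothetical example:
# mirror("/srv/data", "backup@offsite.example.net:/raid/mirror/data")
```

`--delete` is what makes this a mirror rather than an accumulating copy; it also means an accidental deletion on the source propagates on the next run, which is one argument for the chained "backup of the backup" mirror.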

    Check out Rackspace for your offsite needs, I didn't think they were that expensive, at least compared to an actual archival facility. Pick your favorite encryption method to secure it. Hell of a lot cheaper than a point to point.

    Those people yelling 'insecure' apparently don't have an issue with their data being driven all around town. You want banking info? Just steal the grey box out of the '80 Ford Escort. OTOH, a 'man-in-the-middle' attack requires just that. So, if possible, host at your own ISP.

    --
    "I can't give you a brain, so I'll give you a diploma" - The Great Oz (blatently stolen sig)
  6. Re:Um you've pretty much answered your own questio by Bobulusman · · Score: 3, Interesting

    I have a 20 MB (yes, you read that right) hard drive from 1989 that I can still read just fine. I've hooked it up once or twice over the years just for the nostalgia.

    --
    Cogito ergo sum in Slashdot.
  7. Re:Has DLT tape ever worked consistently? by geekoid · · Score: 3, Interesting

    Something is wrong with your system, or something is happening to the tape. I've done a lot of work with DLT, and your failure rate is way out of proportion.
    I would regularly, I mean several times a day, move a tape from system to system for testing purposes.

    --
    The Dunning-Kruger effect explains most posts on /. http://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect
  8. Re:Why would your disks be by Mysticalfruit · · Score: 4, Interesting

    Obviously you haven't purchased any DLT tapes recently...

    Let's just say you go with 40GB DLT tapes...

    220/40 = 5.5 DLT tapes to back up your data.

    DLT tapes cost 50 bucks a piece. 6 tapes * 50 bucks = 300 bucks just for the tapes.

    Oh yeah, now you've gotta buy a DLT drive as well... and if you plan on doing any real backups, you're not going to sit there and load 6 tapes in succession into the drive, so you're going to need a library of some kind. So, tack on 5000 bucks for a library... I'll make the assumption that you're using some free archival software; otherwise you'd have to tack on some big money for that as well...

    So... 5300 dollar tape solution vs. 500 dollar hard drive solution...

    You choose...
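The tape arithmetic above, with the fractional 5.5 tapes rounded up to whole cartridges. The prices are the commenter's rough figures; since the US$5300 total adds no separate drive price, the library figure is assumed here to include the drive.

```python
import math

DATA_GB = 220          # combined data pool from the original question
TAPE_GB = 40           # DLT tape capacity assumed in the comment
TAPE_PRICE = 50        # US$ per tape, as quoted
LIBRARY_PRICE = 5000   # US$ for a tape library (assumed to include the drive)

# You can't buy half a tape: 220 / 40 = 5.5 rounds up to 6.
tapes = math.ceil(DATA_GB / TAPE_GB)
media_cost = tapes * TAPE_PRICE
total = media_cost + LIBRARY_PRICE

print(tapes, media_cost, total)
```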

    --
    Yes Francis, the world has gone crazy.
  9. Re:Steve Gibson by ikeleib · · Score: 4, Interesting

    Actually, using RAID5 on tapes is not unusual. It has the same benefits that RAID5 disk arrays have. It allows for the loss of one tape, as well as increased throughput. This technique can actually be extended to any media.
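The parity idea behind "RAID5 on tapes" can be shown with XOR. This sketch simplifies by using a single dedicated parity member (strictly a RAID 4 layout; real RAID 5 rotates parity across members), and the three 8-byte "tapes" are made-up sample data, but the recovery step is the same: XORing the survivors with the parity reproduces the lost member.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length byte strings together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Hypothetical contents of three data tapes (equal-sized stripes).
tapes = [b"tape-one", b"tape-two", b"tape-3!!"]
parity = xor_blocks(tapes)  # written to a fourth tape

# Suppose the second tape is lost: rebuild it from the other two plus parity.
rebuilt = xor_blocks([tapes[0], tapes[2], parity])
print(rebuilt)  # b'tape-two'
```

Equal lengths matter here: `zip()` silently truncates to the shortest block, so a real implementation would pad the final stripe.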

  10. Re:warranty period by Clover_Kicker · · Score: 4, Interesting
    You are missing the point. What is your backup method for backing up 220GB?

    Tapes are designed for backups. If you seriously need to back up 200GB, then you are looking at DLT or better, and it ain't cheap.

    Oh, you don't backup 220GB of personal data on a regular basis?

    Who the fuck has 220GB of personal data? Seriously, for the cost of backing up that much porn, you can just go down to the store and buy the legit DVDs. While you're at it, you can stop off at the record store and buy some albums so you can re-rip your MP3s.

    Just because you have 220GBs of hard drives in your machines doesn't mean you need to back up every byte.

    C:\WINDOWS>ver

    Windows 98 [Version 4.10.2222]

    C:\WINDOWS>du |sort |tail -1
    353472k ./

    C:\games\Diablo II>du |sort |tail -1
    1378784k ./

    C:\games\Diablo II>du save

    1696k save/old/
    3328k save/

    Pop quiz: if I wanted to back up this machine, do I

    • back up 1.5GB of Windows and Diablo binaries, or
    • back up 3 megs of Diablo II save files (would fit on 2 fucking floppies, FFS)?

    The save files, because I have my Win98SE and Diablo II+LOD CDs on the shelf.

    My documents (resume, web pages, GNU Cash files, email etc.) live on a server, where they are in fact backed up nightly to a second hard drive.

    Every couple of months I burn a CD of the latest backup tarfiles. Cheap CD-Rs are a half-assed long-term archival solution, but the price is right.

    Some things (Mozilla installer, service packs) are so ephemeral that they aren't worth backing up, i.e. when you need them there will probably be a new version available anyway.

    What about my MP3s and pr0n? When I've got enough new stuff I burn a CD full. Every year or so it's worth re-burning the MP3s so that I've got the same genre on a given CD. When you've got Sarah McLachlan, Mozart, Dead Kennedys, Suicidal Tendencies, Reverend Horton Heat and Johnny Cash on the same CD, there isn't a person in the world who won't make fun of you.

    So, you trust having no backup at all over having a backup on an unreliable medium?

    I did not recommend that no backup be performed. I said that I do not trust IDE drives for long-term archival use.

    If you are determined to archive to IDE, fill your boots - it ain't my data.

  11. Re:the absolute surefire way to back something up. by ncc74656 · · Score: 4, Interesting
    And just how many tons of paper are you going to need to reliably back up a terabyte in dots and dashes?

    If you were actually going to produce some kind of machine-readable dead-tree backup, it's more likely that you'd produce a type of 2D barcode that could be scanned back in and read. Assuming an 8x10" grid at 200 dpi (the remaining area can be used for alignment and checksumming), you could get about 390K per page (single-sided...you could also double that by making it a "flippy," and you wouldn't need a notch-cutter :-) ). You're still looking at a little over 5 tons for 1 TB, but it's an improvement. 200 dpi should be well within the abilities of currently-available laser printers and scanners. If you wanted to try 300 dpi, you'd more than double your capacity and get about 879K per page (single-sided).
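The per-page figures check out if each printed dot holds one bit and "K" means 1024 bytes. This sketch ignores the alignment and checksum area the comment sets aside outside the 8x10" grid, so the numbers are an upper bound, and it counts a decimal terabyte (10^12 bytes):

```python
import math


def page_capacity_bytes(dpi: int, width_in: float = 8, height_in: float = 10) -> int:
    """Bytes on one side of a page if every dot in the grid stores one bit."""
    dots = int(dpi * width_in) * int(dpi * height_in)
    return dots // 8


for dpi in (200, 300):
    per_page = page_capacity_bytes(dpi)
    pages_per_tb = math.ceil(10**12 / per_page)  # decimal terabyte
    print(f"{dpi} dpi: {per_page / 1024:.1f}K per page, {pages_per_tb:,} pages per TB")
```

At 200 dpi that is 2.5 million single-sided pages per terabyte; the total weight then depends entirely on the per-sheet weight you assume.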

    --
    20 January 2017: the End of an Error.
  12. Re:the absolute surefire way to back something up. by 2nd+Post! · · Score: 5, Interesting
    But each of your 20k per page can easily encode a Unicode value, which means you can cram 2 bytes per spot, or only 50 tons per terabyte.

    But how about a 600dpi laser printer, 8"x10"?

    For good readability, we can use these 3x3 dot patterns:

      ***        *
      **         **
      *          ***
       1          0

    for 1 and 0 respectively, which gives us 3 dots per bit, or 200 bits per inch. A square inch would then give us 40,000 bits, or 5,000 bytes. A sheet of 8x10 then gives us 400,000 bytes, or, if you tweak the margins, 400k per page. So that's already 20 times your density. Increase the resolution to 1200dpi, and you can increase the data density to 1600k per page.

    We can also use different encodings. Right now we use 9 dots to encode 1 bit of information (really, really redundant). We can probably safely use the following encoding to double our data density: four 3x3 patterns, two built from a horizontal bar of three dots (***) and two from a vertical bar of three dots stacked in a column.

    So this gives us 2 bits of information in the same 3x3 square, which doubles our data density again: 800k or 3200k per page. At 1200dpi, that's about 3 MB per page, so 1 GB == 333 pages, and 1 TB == 333k pages: 67 boxes, or 134 pounds per terabyte.

    There are more variations of course. We can increase density to 4 bits per 3x3 square. With a bit of thought, we can also increase the density up to the theoretical limit of 2^9 values in a 3x3 square, but we want to include some leeway for data redundancy...

    So by doubling to 4 bits per square, we require only 70 pounds per terabyte. By doubling again to 8 bits per square, that's down to 35 pounds.

    That much (little) paper... is actually lighter than a terabyte of digital storage!
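The same arithmetic for the 3x3-cell scheme, parameterized by resolution and bits per cell. Two assumptions are labeled in the code: "mb/tb" are taken as decimal units, and a box holds 5,000 sheets (the figure implied by 333k pages filling 67 boxes). With the exact 3.2 MB per page instead of the rounded 3 MB, 1 TB comes to 312,500 pages, or 63 boxes.

```python
import math

PAGE_SQ_IN = 8 * 10      # 8" x 10" printable area
CELL_DOTS = 3            # each symbol is a 3x3 block of printer dots
SHEETS_PER_BOX = 5000    # assumption: implied by 333k pages ~ 67 boxes
TERABYTE = 10**12        # assumption: decimal terabyte


def bytes_per_page(dpi: int, bits_per_cell: int) -> int:
    """Page capacity when each 3x3 cell carries `bits_per_cell` bits."""
    cells_per_inch = dpi // CELL_DOTS          # 600 dpi -> 200 cells per inch
    cells = cells_per_inch**2 * PAGE_SQ_IN
    return cells * bits_per_cell // 8


def boxes_per_terabyte(dpi: int, bits_per_cell: int) -> int:
    pages = math.ceil(TERABYTE / bytes_per_page(dpi, bits_per_cell))
    return math.ceil(pages / SHEETS_PER_BOX)


for dpi, bits in [(600, 1), (1200, 1), (600, 2), (1200, 2), (1200, 4), (1200, 8)]:
    print(dpi, bits, bytes_per_page(dpi, bits), boxes_per_terabyte(dpi, bits))
```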
  13. Re:the absolute surefire way to back something up. by schmink182 · · Score: 5, Interesting
    To take this a little farther, a helpful reference tells us some useful information.

    2000 sheets of 8-1/2 x 11, 20# laserwriter paper weighs 20 lbs.
    First of all, this changes your estimate of weight from 100 tons to 250 tons.

    Typical yield of paper: 125 lbs per tree
    250 tons (500000 lbs) divided by 125 lbs per tree gives us 4000 trees.

    440 trees per acre
    This, after division, gives us 9 acres of trees destroyed for backing up 1 TB of data. Seem worth it? :)
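The chain of conversions, using the reference figures quoted in the comment and US short tons (2,000 lb):

```python
LB_PER_SHORT_TON = 2000
SHEETS_PER_20_LB = 2000   # 20# letter paper, per the quoted reference
LB_PER_TREE = 125         # typical paper yield per tree, as quoted
TREES_PER_ACRE = 440      # as quoted

tons = 250
pounds = tons * LB_PER_SHORT_TON              # 500,000 lb
sheets = pounds * SHEETS_PER_20_LB // 20      # 50 million sheets
trees = pounds / LB_PER_TREE                  # 4,000 trees
acres = trees / TREES_PER_ACRE                # about 9.1 acres

print(f"{sheets:,} sheets, {trees:,.0f} trees, {acres:.1f} acres")
```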

  14. Re:warranty period by Kaa · · Score: 5, Interesting

    Who the fuck has 220GB of personal data?

    And what's so weird about it?

    A scan of a single frame of a 35mm film, on a high-end consumer film scanner will create a file... let's see:

    The scanner is 4000dpi, so the resulting image is about 4000x6000 pixels. We are working in 16-bit-per-color-channel mode, so that's 6 bytes per pixel. A bit of multiplication gets you 144 MB. As a practical matter, the film frame is slightly smaller, so your output TIFF file is about 120 MB in size. That is for a single 35mm film frame.

    So raw scans of slightly under 2000 film frames will already hit the 220 GB figure.

    Still think it's a ridiculous number?
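The scan-size arithmetic, assuming an uncompressed RGB TIFF with 2 bytes per channel and ignoring header overhead. The second calculation swaps the rounded 4000x6000 pixels for the nominal 36 x 24 mm 35mm frame; the ~120 MB per frame used for the 220 GB count is the comment's own estimate.

```python
MM_PER_INCH = 25.4
DPI = 4000
BYTES_PER_PIXEL = 3 * 2   # three color channels x 16 bits (2 bytes) each


def scan_bytes(width_px: int, height_px: int) -> int:
    """Uncompressed size of an RGB scan, ignoring TIFF header overhead."""
    return width_px * height_px * BYTES_PER_PIXEL


rounded = scan_bytes(4000, 6000)  # the comment's round numbers

# Nominal 36 x 24 mm frame at 4000 dpi.
width_px = round(36 / MM_PER_INCH * DPI)
height_px = round(24 / MM_PER_INCH * DPI)
nominal = scan_bytes(width_px, height_px)

# How many ~120 MB frames fill 220 GB (decimal units throughout).
frames = 220 * 10**9 // (120 * 10**6)

print(rounded, width_px, height_px, nominal, frames)
```

A nominal frame comes to roughly 129 MB, between the comment's 120 MB and 144 MB figures.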

    --

    Kaa
    Kaa's Law: In any sufficiently large group of people most are idiots.
  15. PaperDisk by cameldrv · · Score: 3, Interesting

    www.paperdisk.com claims that they can get either 660K or 1MB on a sheet of paper, depending on resolution. How long a piece of paper will last when encoded at this density is unknown, but with good paper I'd bet it's a hell of a lot longer than any disk. Furthermore, even at that density, there's a huge amount of physical redundancy in the data storage. If the paper gets to be fifty years old or so, I would imagine that the technology would be available to cheaply scan at ultra-high resolution to compensate for any degradation.