Slashdot Mirror


Large IDE Drives as Long-Term Archival Media?

PlatterMan asks: "The question of how to cope with backing up disk drives which are rapidly increasing in size onto tape and other backup devices which aren't scaling in size as quickly isn't new to Slashdot. Neither is the use of single, RAIDed, and removable disks as backup devices; this has been covered numerous times on Slashdot, e.g. here and here. One thing I haven't really seen discussed, however, is the feasibility of disk drives as medium- to long-term archival media, say 5 to 10 years. Like many people, I'm in the position of now having multiple machines with a combined data pool of about 220 Gig, and backing these up onto DDS or DLT tapes is slow and manual to do, and expensive in tape costs. So I'm looking to add a removable drive bay to my primary backup machine and pick up a bunch of large IDE drives, so that I can do regular disk-to-disk backups over 100 Meg Ethernet (and for my machines which are in cages, over the Net), pulling out and alternating the backup drives on a 3-way backup cycle."

"Backups are of no use without offsite archival copies so I plan to take one set of disks out of the pool, and archive them offsite on a quarterly basis.

However, I've heard horror stories about the data retention and usability of older disks which have been shelved for archival, for example disk stiction, where people try to restore data off of a 4 to 5 year old drive only to find that the disk won't spin up due to solidification of lubricants, or that they've experienced data degradation.

I'd be interested in the Slashdot crowd's opinion on using large IDE drives as archival media. Clearly one possible problem is being able to get hold of a machine in the future with a suitable IDE interface to plug them into for restoration, but I can't see IDE disappearing within 5 years (maybe 10 though). I'm more interested in experiences and opinions on the suitability of the disks themselves for long-term archival.


  • Is stiction still likely to occur on newer makes of IDE drives, or have manufacturers beaten the problems which caused this in the past?
  • Likewise, how likely is bit drop-out and general data degradation over, say, a 5 year and 10 year period, and what do people think would be the likely maximum feasible time that a shelved drive would be usable for?
  • Any suggestions as to how I would need to store drives in order to minimize these types of problems and maximise their feasible life as archival media?
Thanks!"

15 of 710 comments

  1. Um you've pretty much answered your own question. by MisterFancypants · · Score: 4, Insightful
    Hard drives are a horrible archival medium.

    Without normal/regular use, you WILL have problems trying to read from them in 4-5 years time. Hell, the way most IDE drives are these days (note the recent reduction in warranty time periods), you'll be lucky if the drives last 2 years even WITH regular use.

  2. warranty period by Clover_Kicker · · Score: 5, Insightful

    Since IDE HD manufacturers recently decreased their warranty period, I'd be *really* reluctant to trust 'em 10 years from now.

  3. Alternatives... by anarchima · · Score: 4, Insightful

    People here are saying, "Don't even think about using IDE!". Well he has no choice, does he? Tape has several drawbacks, as the author mentions in his submission to Slashdot. He has asked for advice on IDE. If this is not a feasible option, recommend some others (besides tape). Or ARE THERE NONE?

  4. A lot of folks will say.... by ajs · · Score: 4, Insightful

    that disks will rot, so you can't trust them.

    I counter with this: tapes rot too. In fact, any tape older than one year that I've had to go back to has been worthless (read: it had deteriorated data).

    Tape is a really bad medium to trust, but we keep buying it because we can't think of a better solution. Personally, I think the way to go is just to give up and admit that disk is not cheap. You need to back up your data to a live mirror system with identical storage (hourly rsync does a nice job) and then you need to arrange a service that can back up your data to remote live mirror systems. Note that in both cases I said "live mirror". You don't want a backup sitting on a cold box because you never know the quality of it until you need it.

    The remote backup part is expensive, but it's the only reliable way. You seed it by tape (full backup to tape, and mail them to the vendor) and then use dedicated lines to keep a regular incremental update going.

    If one of those two backup systems fails, you know about it right away and you fix it. No more tapes rotting on a shelf only to be discovered when your data goes south.
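
    A minimal sketch of the "know about it right away" part, assuming the hourly job is rsync driven from a scheduler such as cron. The function names, paths and host below are hypothetical; `-a` and `--delete` are standard rsync options, and the only failure signal used here is rsync's exit status.

```python
import subprocess


def build_rsync_command(source, destination):
    """Return the argv list for mirroring source onto destination."""
    # -a preserves permissions, times and symlinks; --delete removes files
    # from the mirror that no longer exist at the source.
    return ["rsync", "-a", "--delete", source, destination]


def run_mirror(source, destination, runner=subprocess.run):
    """Run one mirror pass and return True only if rsync exited with 0.

    runner is injectable so the logic can be exercised without rsync.
    """
    result = runner(build_rsync_command(source, destination))
    return result.returncode == 0
```

    An hourly wrapper could call `run_mirror("/data/", "mirrorhost:/data/")` (both values are placeholders) and exit non-zero or send mail when it returns False, so a broken mirror is noticed within the hour rather than at restore time.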

    1. Re:A lot of folks will say.... by Chrisje · · Score: 4, Insightful

      Yes, tape will rot. As will anything that is magnetic.

      DDS tape has a guaranteed data retention period of 2 years, but then you may face head alignment problems if you replace the drive. DLT and LTO have data retention periods of approximately 5 years. Head alignment isn't an issue there because of the nature of the mechanism.

      This is, however, not the point. The point is that a hard drive is not an ARCHIVAL medium. Neither is tape. Hard drives are the workhorses for on-line data and tape is meant as a BACKUP. Backup meaning a copy for safe-keeping for a very limited time (i.e. next week, when Tuesday's tape is run again, or... well, you get the point...).

      CDs (CD-R(W)) offer a theoretical data retention span of 20-100 years depending on who you ask. So that is safer, but still not perfect.

      A Service Level Agreement with a maintenance company would do the trick too, but is expensive.

      But why archive? Doesn't an automated backup to a tape robot with a weekly rolling schedule combined with a RAID 1/5 solution for your single disk failures satisfy your needs? What is so damn important that you need Off-Site ARCHIVAL rather than off-site backups?

      With the falling prices of both tape and disk cost per megabyte, it's affordable to keep all relevant data on the drives of the server and then do backup to tape if needed.

      Just my 2$c.

  5. Tapes *is* the right medium for long term backup by MooRogue · · Score: 5, Insightful

    I'm sorry, but 220GB is easily handled by backup tape. With SDLT and AIT tape capacities exceeding 100GB per tape, two tapes can easily handle your load.

    If you have the budget, get an autoloader so you can perform a full backup in one session, or two tape drives for that matter.

    Personally, I am backing up 600+GB onto tape and it works well. I've had numerous IDE hard disk failures, yet not a single data tape failure so far.

  6. Tapes are NOT a long term archival medium. by silentbozo · · Score: 5, Insightful

    Tapes are fine for backups, but I never expect to pull complete and usable data off of them after 6 months. Why? Tape degrades - it's nothing more than rust on plastic. As humidity and temperature change, you can end up with a solid roll which will stick to your tape drive heads and result in whole patches of magnetic coating coming off. I worked on a project restoring data from 10+ year old reel-to-reel tape, and it was a nightmare. 1 out of 4 tapes was completely unusable.

    Even worse, tape drive formats keep changing - and since tape drives are guaranteed to wear out, where are you going to get a working tape drive to restore data 5, 10, 15 years from now? I've gone through 3 tape drives in the last 8 years - thank god I got a CD burner early, that data I can still read (although it's about time to start recopying stuff from 1996.)

    Basically, if you entrust your data to tape long term, you have to continuously copy that data to new tapes and/or new tape formats. Where tape has traditionally shined is as a short-term backup format, although with the drop in price of DVD-burner drives/media and the high cost of high-capacity tape drives/media, this may no longer be the case (assuming you get some peon to do the big backup on DVDs, and you get to do daily diffs; otherwise, having a bank of tape drives is cheaper on staff time).

  7. Re:Steve Gibson by LoudMusic · · Score: 4, Insightful

    No flame, other than the term 'RAID 5'. Tapes aren't as dangerous as hard drives, but they can still mess up. It's not like they're guaranteed beyond all odds. So a RAID 5 IDE array takes care of your data.

    I'm currently using Dell NAS machines as archival backups.

    Bonuses (as I see them):
    Online 100mbit access to old data.
    Cheap!
    Fits in a physically small space.

    Negatives:
    Higher failure rate than tape. Pop fizzle, your data is gone.
    Difficult to take off site.
    Long-term replacement isn't really an option. (for RAID replacement)

    The way we negate the negatives (double negative, is that a positive?):
    -Failure rate / Data loss is countered by RAID
    -Taking it offsite ... it is possible to cost-effectively mirror an IDE RAID system over broadband Internet and do it securely. If you are a major corporation, surely your campus is large enough to simply run fiber to two corners and put a mirrored backup at each location.
    -Long term replacement of RAID drives ... buy a truckload of disks when you do the initial installation? (:

    --
    No sig for you. YOU GET NO SIG!
  8. Re:Why Tape Is Good by BlankTim · · Score: 5, Insightful

    Obviously, you've never had a tape physically fail.

    Maybe it's just me, but after the experiences I've had the last year with crappy tapes, I'm surprised the "tape as a backup medium" idea hasn't been seen for the farce that it is.

    Backing up to IDE or SCSI? Good short term solution, but I don't think I'd trust my backup drives for more than 1 year, tops.

    Burn to CD? Good long term solution, just not practical due to the file sizes involved. Burn to DVD isn't much better.

    It's time for something new. Hell, maybe it will turn into the next "killer thing" and revitalize the economy.

    I vote for soft bubble memory

    --
    Just once, I'd like it if someone called me "Sir".
    Without adding, "You're creating a scene."
  9. Re:Tapes *is* the right medium for long term backu by Drakantus · · Score: 5, Insightful

    "I have $500 to spend on a backup solution for my 220GB data pool, and I was thinking of buying 4 120GB IDE drives along with an IDE RAID1 card and using the array for backups, anyone have other ideas?"

    "No way, you are insane. IDE is horribly unreliable and you will surely lose your data. You need a $6000 tape drive, if you can't afford it you are better off with no backups at all"

    --
    I love going down to the elementary school, watching all the kids jump and shout, but they dont know I'm using blanks.
  10. Five Points About Archiving by maggard · · Score: 5, Insightful
    1. Accept that you can't just stick magnetic media on a shelf (in a vault, even climate-controlled) and expect it to last forever.

      Bits rot. Even in the most perfectly controlled environment the damn stuff still goes bad. Be realistic, anticipate this, do everything you can to slow it down, but plan for it and make provisions when you first put your archiving strategy in place. Tapes are likely more robust than platters as there are fewer critical parts to go wrong, but nothing is perfect.

    2. Accept that CD & DVD don't have 100-year lifespans, mebbe not 10 year, and possibly far less.

      Yes, they're cheap, but we have far less experience with these media than we do with tape, and studies are showing that the dyes may not be as stable as first thought. Heck, there's even a bug out there that eats some of these. There's also the question of long-term standards in some cases, like DVDs.

    3. Checksums and multiple backups (that reinforce each other) are a necessity.

      Nothing's worse than losing one part of an archive at one site, another part at a different site, and being unable to easily reconcile the two to get a good whole set. Make sure that however you archive things, same media or different media, partial archives can be reconciled.

    4. Everything evolves - Keep updating backups.

      Years ago there was a big scramble to recover the US Govt's 1950 Census. It had been stored on steel tape and the required Unisys readers no longer existed. (Much of the data was available but the entire raw set wasn't.) Eventually a working one was built from cannibalized parts in museum and private collections, but the lesson was clear: Don't depend on the readers. The same goes for the recent BBC Domesday Book debacle - nobody could read the optical disks. Any good archive scheme will call for the material to be re-read and re-transcribed regularly in order to ensure the entire recovery chain still works: Hardware, software, OS's, etc. If recovery becomes difficult, migrate the material.

    5. Be pragmatic about what you archive.

      All too often folks archive everything 'cause they're too lazy to determine what is actually necessary and what isn't. Combine this with the difficulty of later having someone unfamiliar try to winnow down the material and this becomes a real problem. Even worse is later trying to find the useful material among all of the dross. Establish clear policies of what can be archived and make folks justify their material. Just as importantly, make sure the costs are clear up front, even to the point of charging them a rate covering several years of storage initially. Suddenly some pack-rat deciding EVERYTHING they've ever typed is potentially a goldmine isn't so funny. Lastly, run everything past Legal: Some of this they don't want hanging around any longer than necessary.
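
    Point 3 can be sketched roughly as follows, assuming each archive copy is a plain directory tree and that a manifest of SHA-256 checksums was recorded when the archive was made. The function names are made up for illustration, and the choice of SHA-256 is an assumption, not something the post specifies.

```python
import hashlib
import os


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            digest = hashlib.sha256()
            with open(full, "rb") as handle:
                # Read in 1 MiB chunks so large files do not fill memory.
                for chunk in iter(lambda: handle.read(1 << 20), b""):
                    digest.update(chunk)
            manifest[os.path.relpath(full, root)] = digest.hexdigest()
    return manifest


def reconcile(reference, copy_a, copy_b):
    """For each file in the reference manifest, say which copy is still good.

    Returns a dict: path -> "a", "b", or None when neither copy matches.
    """
    plan = {}
    for path, expected in reference.items():
        if copy_a.get(path) == expected:
            plan[path] = "a"
        elif copy_b.get(path) == expected:
            plan[path] = "b"
        else:
            plan[path] = None
    return plan
```

    With the original manifest stored alongside (and apart from) both copies, `reconcile` shows at a glance whether two partially damaged sites still add up to one good set; any path mapped to None is lost at both.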

    --
    I don't read ACs: If a post isn't worth so much as a nom de plume to its author then I wont bother either.
  11. Some advice by Monkelectric · · Score: 4, Insightful
    Let me first go on record and say you are a complete fool if you think this will work ... Bite the bullet and buy a 100gb native DLT drive. At my last job I backed up 2.6TB on a DLT+autoloader. I know 220 gigs *seems* like a lot of data, but you're small time.

    However, if this is going to have *any* chance of working, you will need to read the drives on a regular basis. I would pop each drive in a machine and (in Linux) do a "dd if=/dev/hdc of=/dev/null" to read the entire drive. I would do this monthly.

    Why, you ask? Because modern hard drives are sophisticated and they auto-correct errors *before* they become a problem. Hard drives will do things like correct recoverable errors and rewrite weak sectors when they encounter them. Thus, if you go over every sector of the drive every once in a while, you will use the drive's auto-correction features to your advantage (and protect against the drive fading, which would be my primary concern, not stickage, which is easy to fix).
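
    For a monthly job that should report what happened, the same full read can be sketched in Python. This assumes the drive shows up as a readable device node (reading one normally needs root); `read_scrub` is a made-up name. The code only performs the read and reports the first I/O error; whether the drive rewrites weak sectors as a result is up to its firmware.

```python
def read_scrub(path, chunk_size=1 << 20):
    """Read path from start to end, discarding the data.

    Returns (bytes_read, error); error is None on a clean pass.
    """
    total = 0
    try:
        # buffering=0 gives raw, unbuffered reads straight from the device.
        with open(path, "rb", buffering=0) as device:
            while True:
                chunk = device.read(chunk_size)
                if not chunk:
                    break
                total += len(chunk)
    except OSError as exc:
        # An I/O error here means the read stopped before the end.
        return total, exc
    return total, None
```

    Calling `read_scrub("/dev/hdc")` mirrors the dd command above; logging the returned byte count each month also makes it obvious if a drive suddenly reads short.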

    --

    Religion is a gateway psychosis. -- Dave Foley

  12. Re:Why Tape Is Good by skroz · · Score: 4, Insightful

    One very important thing to consider : With certain types of tape drives, a misaligned head can render your tape media useless in another drive of exactly the same type. DLT is a good example of this. You can write and read to your heart's content on the same drive, but try to read a tape written in one drive on another and you can be sunk (professional data recovery experts with the proper tools can work around this, but it's expensive, and the whole point of this discussion was the need for "professional help" if certain parts of the hardware fail.)

    --
    -- Minds are like parachutes... they work best when open.
  13. Re:the absolute surefire way to back something up. by BeBoxer · · Score: 5, Insightful

    But is printing a whole character per bit, or even byte, efficient? I'm curious how much data a laser printer could store on a piece of paper. Is it realistic to expect individual bits printed at 300dpi to actually be retrievable? Perhaps on a good 600dpi or 1200dpi printer.

    300dpi gives us almost 11KBytes per square inch. Figure 70 square inches on a letter page with 1/2" margins. That's 770KB. Print full duplex and you're looking at 1.5MB per page, or roughly a floppy disk (coincidence?) You wouldn't want to back up your MP3 collection, but for an archival method that is likely to last 100 years it's not too bad. Factor in compression and you are probably getting a 100x increase in storage density over plain text. Kind of a neat thought.
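
    The arithmetic above can be checked with a short sketch. It assumes one bit per printed dot with no error correction or alignment marks, which any real scheme would need, and it uses the post's round figure of 70 square inches (7.5 x 10 inches would give 75). The function name is made up for illustration.

```python
def page_capacity_bytes(dpi, printable_sq_in=70, duplex=True):
    """Raw capacity of one sheet at one bit per printed dot."""
    bits_per_sq_in = dpi * dpi
    bytes_per_side = bits_per_sq_in * printable_sq_in // 8
    return bytes_per_side * (2 if duplex else 1)


for dpi in (300, 600, 1200):
    print(dpi, "dpi:", page_capacity_bytes(dpi), "bytes per duplex sheet")
```

    At 300dpi this works out to 787,500 bytes per side and 1,575,000 bytes per duplex sheet, in line with the 770KB and 1.5MB figures above. Capacity grows with the square of the resolution, so 1200dpi would hold 16 times as much if every dot could be read back.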

  14. Ten year old data by Eric Green · · Score: 5, Insightful
    I actually have a lot of data that is now 16 years old, including the source code (6502 assembly language) for a BBS program that I wrote as a kid. The secret: Regular migration of data to newer/larger media. From 1541 floppy to Amiga via serial port and xmodem, from Amiga to Linux via serial port and uucp, and on Linux, periodic moving of the data to newer hard drives as I upgrade my systems. I also now maintain a copy of my data in CVS, so that if something gets accidentally erased or changed, I can retrieve a copy. My CVS archive, too, periodically gets moved to newer/larger/faster hard drives.

    And to top it all off, I back it all up to a DDS-4 DAT autochanger. Yes, those six tapes will only hold 120gb, but the amount of important data on my disk drive is far less than 120gb (it is actually less than 20gb, including the original 44.1khz .wav recordings of all my original songs, and fits onto one tape easily).

    Do you *REALLY* need a backup of your .mp3 collection?! Probably not. Do you *REALLY* need a backup of all those ISO CDROM images that you downloaded for fifty versions of Linux and a half dozen versions of FreeBSD? Probably not. But those are the sorts of things that are taking up 80gb plus on my hard drives -- i.e., utterly disposable cruft. Which is true for most personal computers.

    --
    Send mail here if you want to reach me.