Slashdot Mirror


Ask Slashdot: Best Offline Storage Method For Large Archives?

An anonymous reader writes "I have a collection of large projects (Indesign files with associated images), which are typically 40GB to 60GB each. In this current climate, what is the 'best' method of archiving these? Spinny magnets? Solid state drives? USB? Tape? Blu-ray? All have pros and cons and price considerations. If I remove the price issue (my data is important to me), does this change the choice?"

31 of 397 comments

  1. Rotational media by BWJones · · Score: 4, Interesting

    For this project, we have multiple multi-terabyte (5-18 terabyte) datasets that need backup. We have online and offline strategies, and the offline strategy is simply multiple, redundant copies on hard drives stored in static-proof containers on-site and off-site.

    Hard drives are *very* cheap all things considered, are easy to store, take up very little physical space, and if things go badly, restoring from them is faster than just about any other method. For datasets in the GB range, it's a no-brainer to go with hard disks.

    --
    Visit Jonesblog and say hello.
    1. Re:Rotational media by Mad+Merlin · · Score: 5, Insightful

      I concur on this point, online storage really makes the most sense. Cheap, high performance (for sequential read/write) and easily expandable. You can get a single machine with dozens of SATA drives in it (including the drives) for way under 5 figures. When drives fail, they're simple to replace, and every couple years, migrate the whole thing to newer (faster, bigger) drives. Mirror your data unless you don't care about it. RAID 1/10 for really small datasets (2-4 drives), RAID 6 for moderate size datasets (5-10 drives) and RAID 60 for anything bigger.
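
      As a rough illustration of the rule of thumb above, here is a minimal Python sketch that picks a RAID level from the drive count and estimates usable capacity. The function name `suggest_raid`, the `raid6_group` parameter, and the eight-drive group size are assumptions made for the example; none of this comes from mdadm or any other tool.

```python
from typing import Tuple


def suggest_raid(drive_count: int, drive_tb: float, raid6_group: int = 8) -> Tuple[str, float]:
    """Return (level, usable_tb) following the rule of thumb in the comment.

    raid6_group is a hypothetical target size for each RAID 6 group
    inside a RAID 60 array.
    """
    if drive_count < 2:
        raise ValueError("mirroring needs at least two drives")
    if drive_count <= 4:
        # Mirrored pairs; an odd drive is left over as a spare.
        pairs = drive_count // 2
        level = "RAID 1" if pairs == 1 else "RAID 10"
        return level, pairs * drive_tb
    if drive_count <= 10:
        # RAID 6 spends two drives' worth of space on parity.
        return "RAID 6", (drive_count - 2) * drive_tb
    # RAID 60 stripes across several RAID 6 groups, two parity drives per group.
    groups = max(2, drive_count // raid6_group)
    per_group = drive_count // groups
    return "RAID 60", groups * (per_group - 2) * drive_tb
```

      With 24 drives of 2TB each, this suggests RAID 60 built from three 8-drive RAID 6 groups, or 36TB usable out of 48TB raw.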

      A very important note to keep in mind... stay away from hardware RAID! When your controller dies, so does all your data, unless you have an identical spare controller card (buy it up front, they won't exist in a couple years). The same goes for fake RAID (ie, software RAID driven by the BIOS), but s/controller card/motherboard/g;. Pure software RAID (ie, using mdadm) is a safe bet.

    2. Re:Rotational media by juventasone · · Score: 2

      Even on a small scale this makes sense. The easiest is 2.5" external drives ($100 for 1TB). This avoids the mess of power adapters. If you need significantly more storage, you may want to consider a dock ($50) and internal desktop drives ($80 for 2TB). Consider this: you can buy from anywhere a USB adapter that will plug into a 20+ year old drive and any OS will mount it. Wish I could say the same about all my removable media...

      Traditionally the way to do this is with tape. As you replace the drive (and you will), your tape capacity increases, but it will be read-compatible with your old tapes. The investment is huge, but it makes it very easy to replicate, take off-site, archive, etc.

    3. Re:Rotational media by Local+ID10T · · Score: 2

      > A very important note to keep in mind... stay away from hardware RAID! When your controller dies, so does all your data, unless you have an identical spare controller card (buy it up front, they won't exist in a couple years).

      I have to disagree... Adaptec RAID cards stand the test of time. Hell, I can still buy a 2940 controller card if I want one... new!

      --
      "You want to know how to help your kids? Leave them the fuck alone." -George Carlin
    4. Re:Rotational media by lucm · · Score: 3, Informative

      > and a _good_ storage controller such as a xyratex.

      I would rather run Windows Home Server on a RAID-0 of IBM DeathStars installed in an HP Pavilion than deal with a Xyratex.

      --
      lucm, indeed.
    5. Re:Rotational media by Keruo · · Score: 4, Informative

      > Online storage makes no sense.
      In storage, online means the data is connected and instantly available (hard drive, etc.) vs. offline (DVD, tape, etc.).

      --
      There are no atheists when recovering from tape backup.
    6. Re:Rotational media by vegiVamp · · Score: 3, Insightful

      Fully concur, and let me sum it up: The best type of long-term storage is "redundant".

      --
      What a depressingly stupid machine.
    7. Re:Rotational media by petermgreen · · Score: 2

      For offline archival I would avoid raid, it's just another thing to potentially go wrong when you try and hook the drives back up and retrieve the data.

      To protect against corruption and drive failures either just keep multiple copies and checksums so you can tell which copy is good (simpler but less efficient in terms of the protection you get relative to the storage space you use) or use something like parchive. Maybe combine the two with two sets of drives in different locations holding the data and then parchive in case the same drive becomes failed/corrupt in both sets.
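
      The "multiple copies and checksums" approach can be sketched in a few lines of Python using only the standard library. This is a minimal sketch, not a replacement for parchive: it only detects damage, and the function names `build_manifest` and `find_damaged` are made up for this example.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so multi-gigabyte archives fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> Dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def find_damaged(root: Path, manifest: Dict[str, str]) -> List[str]:
    """List files under root that are missing or no longer match the manifest."""
    damaged = []
    for rel, expected in manifest.items():
        candidate = root / rel
        if not candidate.is_file() or sha256_of(candidate) != expected:
            damaged.append(rel)
    return damaged
```

      Build the manifest once from the original data, store a copy of it alongside every set of drives, and run `find_damaged` against each copy during the periodic tests. Any file flagged on one drive can then be restored from a copy where it still verifies.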

      Whatever you do with offline hard drives, make sure you do tests once in a while, so you can see when drives are failed/failing and replace them before the situation becomes critical. You should also keep an eye out for drives using obsolete interfaces and migrate the data to more modern drives.

      KISS should be your guiding principle. If you can't retrieve your data with nothing more than the raw drives, a standard PC, a Linux live CD and common FOSS tools, then you are doing it wrong.

      --
      note: i'm known as plugwash most places but i screwd up registering that here somehow in the past and now can't register
    8. Re:Rotational media by the_B0fh · · Score: 2

      *WHAT* century are you living in? In the 1990s, Adaptec was a good brand. Not any more. Not just not good, but actively sucky.

      http://marc.info/?l=openbsd-misc&m=125783114503531&w=2
      http://marc.info/?l=openbsd-misc&m=126775051500581&w=2
      http://marc.info/?l=openbsd-misc&m=128779369427908&w=2

      (Read 'em in order)

      Adaptec: another way of saying "my data is not important"
      Adaptec: unsafe on any platform.

    9. Re:Rotational media by LordLimecat · · Score: 2

      I can't believe my ears. Someone is asking for advice on archival media, has suggested that tape is an option and that price is not a big factor, and the response is "yeah, you should buy less reliable, more expensive rotational media".

      Why on EARTH wouldn't he pick up a cheap LTO3 drive (which can be had for $200 these days), grab a couple of $20 tapes (holding ~300GB), and call it a day? There's your multiple copies, LTO supports WORM tech, and you're basically guaranteed that you will be able to read the tapes for the next 10 years or so (readability is guaranteed 2 generations back in LTO, so LTO5 drives can handle LTO3 media).

      Hard drives just are not generally convenient for keeping multiple media sets (unless you want to find a hot-swap bay and a zillion drive cages), they're prone to disaster, SATA connectors REALLY aren't meant for rotation (they're rated for 50 insertions per connector...), they have motors and bearings that can wear out, etc., and at the end of the day they are more expensive.

      Think of it this way: once you've bitten the bullet on the cost of the tape drive, cost-wise you can afford 3 tapes for every single hard drive. So if your data is really valuable, take into consideration the number of backups you can feasibly afford on tape vs. spinning platter.
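
      To put numbers on that trade-off, here is a small Python sketch of the arithmetic. The tape figures ($200 drive, $20 per ~300GB tape) come from the comment; the $60 price for a 1TB hard drive is an assumption derived from the "3 tapes per drive" ratio, and both function names are hypothetical.

```python
import math


def tape_cost(data_gb: float, copies: int, drive_usd: float = 200.0,
              tape_usd: float = 20.0, tape_gb: float = 300.0) -> float:
    """One-off tape drive cost plus enough tapes for every copy."""
    tapes_per_copy = math.ceil(data_gb / tape_gb)
    return drive_usd + copies * tapes_per_copy * tape_usd


def disk_cost(data_gb: float, copies: int, disk_usd: float = 60.0,
              disk_gb: float = 1000.0) -> float:
    """Enough hard drives for every copy; the default price is an assumption."""
    disks_per_copy = math.ceil(data_gb / disk_gb)
    return copies * disks_per_copy * disk_usd


# Compare one 60GB project at a few redundancy levels.
for copies in (3, 5, 8):
    print(copies, tape_cost(60, copies), disk_cost(60, copies))
```

      For a single 60GB project kept in three copies, this gives $260 for tape against $180 for disk. At these assumed prices tape only pulls ahead once you keep more than five copies, since every extra copy costs $20 on tape and $60 on disk.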

    10. Re:Rotational media by cthulhu11 · · Score: 2

      We had a single JBOD made by Nstor, who were bought by Xyratex. Xyratex in theory serviced it, but we had to rip the interface board out and send it away for several weeks, no advance or on-site replacement, and when it came back, it was still broken. This array actually had two disks per tray, which meant that for online servicing one had to know that in advance (not always obvious when one is remote) and lay out volumes very carefully. Mind you, this wasn't a Xyratex product per se, but the way they handled it didn't impress me.

      The Linux md driver is meh, but at least it doesn't require compiling metadevice layout into the kernel. It's there and it works, but like SVM/SDS/ODS from 15 years ago it's dated and limited, and mdadm et al. are clumsy at best.

      "Hardware RAID" means different things to different people. Two main divisions:

      o HBA RAID. Claimed advantage in that mirror/parity writes don't clog up the host channel. Another is that on systems with anachronistic BIOS stupidity, booting from a mirror when the primary fails can be difficult or often impossible to set up. Big disadvantages:
        - No model that I've yet seen does 3-way mirrors. None. Disks fail, and more than once I've had a mirror fail while replacing the other side. It can take time to get bad disks replaced, and if remote local hands yank the good side of a mirror pair instead of the bad, you're screwed.
        - SPoF. Volumes can't span HBAs, and yes, HBAs do go bad.
        - Crummy monitoring. Some vendors supply CLIs for some OSes, but ongoing support is uncertain, and the interfaces are always downright horrid. raidctl, for example, on my systems behaves in at least three distinct ways.
        - Want to use one for the boot device? Be prepared to hit a multi-key sequence during a split-second at the BIOS level after POST. These often assume that you're sitting in front of a desktop, and can enter Alt-keys or function keys to invoke. Using a remote serial console? Sorry, you're screwed.

      o Chassis RAID: SPoF unless 2-3 identical arrays are mirrored in software. No monitoring to speak of. If you get a serial console, it may need a proprietary adapter cable, null-modem, *and* DB9-RJ45 adapter, as many mistake a 1/8" round headphone-style jack for a serial connector.

      Linux storage is miserable, still stuck in the mindset of a desktop user with 1-2 drives. The sda/sdb/sdc/etc. naming convention is meant to echo legacy MS-DOS drive letters, and obscures vital information about which drives are in what slot of what array. cXtXdX FTW. When adding an HBA or disks, existing disks can even suffer name changes. ZFS / btrfs is *desperately* needed, but there is no indication of viability anytime soon.

  2. Tape by sirsnork · · Score: 2

    You probably need to define "best". How long do you really want to keep them for, and in what sort of environment?

    Traditionally the answer is tape, and probably will be in your case too for files of that size. Optical isn't proven enough (at least for the sizes you're talking about) to be trusted, and HDDs need to be run up fairly regularly to keep working.

    --

    Normal people worry me!
  3. Large removable disk on the low end, tape highend by mlts · · Score: 4, Informative

    BD-R disks are an idea, and relatively inexpensive, but your best bang per buck would be large removable disks in the 2-3 TB range. The reason I state "disks" plural is for obvious reasons.

    I would also use a program like WinRAR with a recovery record, or one of the PAR utilities used for USENET to store your files in. This way, you can tell if there was file corruption, and have a good chance of recovering from it.

    For serious stuff where money is less of an issue, I'd consider a LTO-5 tape drive and multiple tapes. Tapes tend to last longer than HDDs because they have very few moving parts.

    Don't forget to see about copying your archives to new media every couple years. It isn't uncommon to be able to pop a 10+ year old tape or HDD in and pull off the contents... but it isn't uncommon either to find the HDD clicking, or the tape full of hard errors.

  4. Re:VHS by ZorinLynx · · Score: 2

    Whoosh!

  5. Bare Drives via Hot Pluggable Trayless SATA by bynick · · Score: 3, Informative

    Screw tape... you pay $2,000 USD for the drive, $50+ per tape for a couple of hundred gigs. Go with bare drive external: Install a trayless SATA bay for 3.5" hard drives... this will run you $12. Buy some bare SATA drives.. these run $50 for 1TB and are available up to 3TB. I buy bare drive hard cases for about $3 each. My Intel ICH10R on-board RAID controller supports hot-swap -- so in effect it's a big 3.5" floppy.. that's right. If your tape drive breaks, you're out another two grand. This is far less expensive, faster, higher density, and random access. In addition, you can boot from it. Want RAID0? Install two trayless SATA bays for a total of $24 and back up in pairs.

  6. We really need..... by pcjunky · · Score: 4, Informative

    Eternally Yours, The case for the development of a reliable repository for the preservation of personal digital objects.

    http://explorer.cyberstreet.com/CET4970H-Peterson-Thesis.pdf

  7. The practices are the same... by fuzzyfuzzyfungus · · Score: 2

    Depends on price: HDDs are crazy cheap for the capacity, but untrustworthy. However, thanks to the cheapness, redundancy (preferably in multiple locations) and periodic testing/copying to newer disks are fairly affordable. Make sure that you have hashes/checksums (either manually, at the utility level, or at the FS level) and hope for the best. LTOs are rather more durable, having fewer moving parts in the storage media, but the cost of entry is substantially higher. All the same principles apply, though.

    There are no truly reliable storage mechanisms for large quantities of digital data, only storage mechanisms cheap enough that you can duplicate your way to reliability.

  8. Tape/climate control by triffid_98 · · Score: 3, Informative

    You can't argue with Tape. It's been proven to last since the 1960's if kept in a climate controlled space (dry/cool). Just make sure to keep a spare tape drive handy (just ask NASA), because spare parts for 40 year old tape drives are surprisingly difficult to locate.

    Optical isn't even close, assuming you're talking burned discs. Taiyo Yuden claims a 70 year shelf life, but they have only been around for what, 8 years tops?

    Hard drives are an option if you've built a redundant array, but even with that you're still going to be out of luck if you burn up your raid controller.

  9. You neglect the most important question... by stox · · Score: 2

    How long?

    What is good for a decade may not be good for a century, and vice versa.

    For millennium+ archives, nothing beats punch cards.

    --
    "To those who are overly cautious, everything is impossible. "
    1. Re:You neglect the most important question... by artor3 · · Score: 2

      For millennium+ archives, you better just make it a book. Computers have been around for how many centuries? What's that you say, approximately zero? Well, then we probably shouldn't assume that they'll look the same in ten. Heck, even your writing will probably look like "Hwaet! We Gardena in geardagum, theodcyninga, thrym gefrunon, hu dha aethelingas ellen fremedon," to people in 3011, but there will probably be historians who can manage it. Maybe there'll be historians who specialize in all the different file formats we use today, but I wouldn't hold my breath.

  10. Torvalds quote by bmo · · Score: 3, Interesting

    "Only wimps use tape backup: _real_ men just upload their important stuff on ftp, and let the rest of the world mirror it" - Linus Torvalds

    --
    BMO

  11. Dedup or Tape by lucm · · Score: 3, Interesting

    If price is not an issue, a great solution is to go with a data-deduplication device (such as EMC DataDomain or IBM Protectier). If you were to host one unit in your basement and the other in coloc environment far from your home, you could setup replication and have a very reliable archive. Coloc of a 1U device can be quite cheap, I have one of them for which I pay less than 100$ a month.

    If you have a smaller budget, then the best cost-benefit is still found on tape, and it can even work in case of network disruption. Like Andrew Tanenbaum said: "Never underestimate the bandwidth of a station wagon full of tapes". A single LTO-5 tape is very cheap (50-60$) and can store 1.5TB (can easily double that with dedup).
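
    The station wagon quip is easy to check with a back-of-the-envelope calculation. The sketch below uses the 1.5TB LTO-5 capacity from above; the tape count and transit time in the example are made-up inputs, and `sneakernet_mbit_per_s` is a hypothetical helper name.

```python
def sneakernet_mbit_per_s(tapes: int, tape_tb: float, transit_hours: float) -> float:
    """Effective throughput, in megabits per second, of physically moving tapes."""
    bits = tapes * tape_tb * 1e12 * 8      # decimal terabytes to bits
    seconds = transit_hours * 3600
    return bits / seconds / 1e6


# Assumed scenario: 100 LTO-5 tapes driven to the off-site location in 24 hours.
print(round(sneakernet_mbit_per_s(100, 1.5, 24)))
```

    Under those assumptions the box of tapes works out to roughly 13.9 Gbit/s sustained, ignoring the time needed to write and read the tapes at either end.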

    There are other interesting technologies out there, such as MAID, which you can use as a VTL with a good backup software to maintain a reliable archive, however cheap disks are cheap and in a MAID configuration they might not last as long as typical disks because of the on/off behavior.

    --
    lucm, indeed.
  12. Re:Online Storage by lucm · · Score: 2

    The title chosen by the author of the original post: "Best Offline Storage Method For Large Archives?"
    Your answer: "Why not go with an online storage solution such as Amazon S3"

    I suspect that one of you is off-topic, but I also wanted to say that S3 is really a great service and quite cheap.

    --
    lucm, indeed.
  13. Re:Hard drives by JMJimmy · · Score: 2

    Put it in the cloud! *waves arms like it's something mystical*

    Seriously though, there is no great solution. Burned discs separate over time, there's not enough data on SSDs yet but it's not looking promising, platter drives are susceptible to radiation, tape to magnetic fields and degradation. HDD in triplicate, replace every 7-10 years is the "best" method right now. So despite being modded down, serkit is right. Hard drives.

  14. Obvious answer by phizi0n · · Score: 2

    OP: "If I remove the price issue (my data is important to me), does this change the choice?"

    ME: If price isn't an issue then you don't choose one, you choose them all.

  15. Re:BitTorrent hash check by QRDeNameland · · Score: 3, Insightful

    Yeah, that's a great solution if all you want to do is detect corruption, but note the GP's point about having "a good chance of recovering from it". The only way to recover with BitTorrent is to have another copy available to replace any bad blocks. PAR2, on the other hand, is able to recover any random missing X% of data from a dataset as long as X% of PAR2 data was generated.
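
    To make the difference between detecting and recovering concrete, here is a toy Python sketch. PAR2 itself uses Reed-Solomon codes, which can rebuild several missing blocks; this example deliberately swaps in a much simpler technique, a single XOR parity block, which can rebuild only one. The function names are invented for the illustration.

```python
from typing import List, Optional


def xor_blocks(blocks: List[bytes]) -> bytes:
    """XOR equal-sized blocks together byte by byte."""
    size = len(blocks[0])
    if any(len(block) != size for block in blocks):
        raise ValueError("all blocks must be the same size")
    result = bytearray(size)
    for block in blocks:
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)


def recover_missing(blocks: List[Optional[bytes]], parity: bytes) -> List[bytes]:
    """Rebuild at most one missing block (marked None) from the parity block."""
    missing = [i for i, block in enumerate(blocks) if block is None]
    if len(missing) > 1:
        raise ValueError("single XOR parity can rebuild only one block")
    restored = list(blocks)
    if missing:
        survivors = [block for block in blocks if block is not None]
        # XOR of the parity with every surviving block yields the lost block.
        restored[missing[0]] = xor_blocks(survivors + [parity])
    return restored


data = [b"abcd", b"efgh", b"ijkl"]
parity = xor_blocks(data)
print(recover_missing([b"abcd", None, b"ijkl"], parity) == data)
```

    A hash check alone would only report that the second block is gone. With the extra parity block stored next to the data, the block can be rebuilt without a second full copy, which is the property the GP was after.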

    --
    Momentarily, the need for the construction of new light will no longer exist.
  16. Re:Large removable disk on the low end, tape highe by AmiMoJo · · Score: 2

    I think the bottom line is that no medium is bulletproof. If you really care about the data and money is no object then a combination of at least two different mediums is the way to go.

    Aside from the usual suspects like tape and HDD, I'd suggest looking at flash memory. Expensive per GB but also not prone to mechanical problems. Most flash memory is specified for 10 years of data retention, but it is a little bit more complicated than that. Every time you write data to a flash memory device it "refreshes" and the 10 year counter for that data starts again. To be safe you should probably be imaging and re-writing the flash every year or two.

    --
    const int one = 65536; (Silvermoon, Texture.cs)
    SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC
  17. Lots of goofy options by KahabutDieDrake · · Score: 2

    I'm seeing a lot of really goofy suggestions in here. I'm going to make my own. First, let me say that my last job was to create massive image archives sourced from disparate media, and store them, permanently. Massive, as in 30tb a year. (maybe not that massive, but we were a tiny company, with a matching budget).

    First, let me tell you what won't work. Optical media. Just DON'T. It's unreliable, slow and generally a pain in the ass. I worked at a place that burned 150 CDs a day for distribution, we had consistent failure rates within 20 days of 50%. Granted, that's using the cheapest possible media, but that's still awful. Further our "archive" had thousands of discs in it, was stored well, and as a whole, had a 41% failure rate over 10 years. Optical media is crap for long term storage.

    Something else that won't work, TAPE. I know, heresy. But listen for a minute... do you know anything about tape? Ever used it? No? Then don't touch it, unless you plan to hire someone that is an expert to build out the system and keep it running. Were you planning to hire a full time systems manager? I didn't think so. Alternately, if you happen to have experience with tape, hell, use it. You can't beat the density or reliability.

    Now, a suggestion that does work. Build your own NAS (or buy one if you don't have the chops to build it). You ought to be able to build/buy a 5TB array for under 3k, give or take. It will quietly hum along in the closet doing its thing for pretty much the next few years. After 3 years, start a swap program to replace each and every hard drive. Doing this all at once allows you to store the old RAID in cold storage (box it up and stick it in the corner). Doing this at the rate of one drive per month allows you to absorb the costs a little easier. Continue forever.

    Now, if you are really nuts, and you actually think your data is valuable (you know, like you can trade it for money at some point), then you build out the NAS, order three of them, and keep one at your mom's house (or wherever), then you buy co-lo rack space and put the third unit (did I mention you need 3?) in there and sync all three as often as you can afford the bandwidth. This is, for all intents and purposes, how google backs up data. 3 systems, in 3 locations, each with a complete copy of the data. It's not exactly CHEAP, but neither is redoing all that work.

    I'm going to leave out suggestions like using a Kodak image writer to burn the images to microfilm that is digitally indexed. Why? Because you don't know the first thing about a system like that, and because you want "backups" not permanent archives. Also, you can't afford this method. I'll also skip the really wacky shit, like using BD discs, or SSD arrays (in the terabyte range? Fuck off$$$), or anything that involves the clouds.

    Storing relatively large groups of data has been dirt cheap and easy for the last 5 or so years. Even before that it wasn't that hard. Don't invent a difficult system, or buy into enterprise gear. You don't need difficult, and you don't need a NAS that performs 100,000 IO ops a second with a fiber channel back haul. You need a couple of raided drives in a box in the corner, powered up pretty much all the time.

    Oh yeah, and do you know the single greatest cause of HDD failure? Cold storage. TURN THE FUCKING THINGS ON, and leave them that way. They last MUCH, much longer. God it was hard to teach people that concept at my last company. No, putting the drives in a box in the storage locker does not make them last longer, in fact, they started failing the minute you unplugged them. (yes, I know, physical shock is probably actually higher up on the list, as is manufacturing defects, a little hyperbole never hurt anyone)

  18. Real solution by jcoy42 · · Score: 5, Funny

    There are only 2 real solutions if you want real long term storage. The first is you become Linus and just dump it on a server and let the rest of the world back it up, and the second is you make your data a religious text somehow. Because those guys will translate it for centuries to come, even if it means sitting 50 dudes in a room for 3 years with nothing but a feather, ink, and parchment.

    come to think of it, same thing.

    --
    Never trust an atom. They make up everything.
  19. Re:Multiple copies by Joce640k · · Score: 2

    Unless you buy extremely good archival grade discs, optical media is the worst suggestion.

    Even with archival-grade disks it's still the worst suggestion.

    Apart from tape - yeah, let's put all our data on something that can't be read without specialized hardware! (where will you get a tape drive from in an emergency?)

    Hard disks can be connected to any PC, they're cheap, they're fast. The only problem I've ever had with USB disks is failure of the cheap-ass wall-warts they supply them with. Luckily all USB drives use either 5V or 12V, so it's easy to wire them up to a spare PC power supply. I have one under the desk and any USB disk which is switched on all day gets connected to that. The wall-wart goes in a drawer for emergencies.

    All other considerations aside though, the only thing that's going to guarantee long-term success is:
    a) Use something that can be read on any machine with no special hardware or drivers.
    b) Make multiple copies of the data and store it in different locations.
    c) Use some widely used, non-proprietary format for combining/compressing the files (eg. zip).

    Base whatever you do on this philosophy and you should be OK.

    --
    No sig today...
  20. Don't use DVD media by Jon+Harms · · Score: 2

    I'm a PhD student studying magnetism, and one thing I can say for sure is that DVD/BR media is not the way to go. Professionally printed media (the silver bottom) uses a stamp to make a mechanical impression, not unlike vinyl records. Once sealed, it lasts forever. Writable media uses a dye, and unless you store your media at 0K, finite temperatures will cause the dye to diffuse and the media to become useless. This takes much less time than people think. Good disks will last 10 years, cheap ones only a few years. The problem is that it's impossible to tell anymore who is making the good disks, since all of the production lines get shared by many brands.

    That said, magnetic storage isn't that great either (tapes or HDDs). For both a HDD and tape, thermal fluctuations cause random data to be lost, but hard drives are designed to recover this data and correct it. If you pull your hard drive offline for several years, it doesn't have the opportunity to constantly scan itself and check for these errors, so never expect an unpowered hard drive to store data for long periods of time: they just are not designed to do this.

    As previous users have pointed out, software raid is the only way to go. Hardware raid provides a single point of failure, and is really only suitable for high performance and short term reliability, not long term reliability.

    Tape drives also have the same thermal fluctuations issue, but because the magnetic grains can be much larger (tapes have 1000's of times more surface area to store the same amount of data) they can go much longer. I would still "refresh" my tapes every year or two though.

    Based on your requirements, I would suggest tape first, then a large software raid of HDDs. Anything else is just not safe!