Ask Slashdot: Best Offline Storage Method For Large Archives?
An anonymous reader writes "I have a collection of large projects (Indesign files with associated images), which are typically 40GB to 60GB each. In this current climate, what is the 'best' method of archiving these? Spinny magnets? Solid state drives? USB? Tape? Blu-ray? All have pros and cons and price considerations. If I remove the price issue (my data is important to me), does this change the choice?"
For this project, we have multiple multi-terabyte (5-18 TB) datasets that need backup. We have online and offline strategies, and the offline strategy is simply multiple redundant copies on hard drives stored in static-proof containers on-site and off-site.
Hard drives are *very* cheap, all things considered. They are easy to store and take up very little physical space, and if things go badly, restoring from them is faster than just about any other method. For datasets in the GB range, it's a no-brainer to go with hard disks.
Visit Jonesblog and say hello.
You probably need to define "best". How long do you really want to keep them, and in what sort of environment?
Traditionally the answer is tape, and it probably will be in your case too for files of that size. Optical isn't proven enough (at least at the sizes you're talking about) to be trusted, and HDDs need to be spun up fairly regularly to keep working.
BD-R discs are an option, and relatively inexpensive, but your best bang for the buck would be large removable disks in the 2-3 TB range. I say "disks", plural, for the obvious reason: redundancy.
I would also store your files in an archive made with a program like WinRAR with a recovery record, or with one of the PAR utilities used on USENET. That way you can tell if there was file corruption, and have a good chance of recovering from it.
For serious stuff where money is less of an issue, I'd consider an LTO-5 tape drive and multiple tapes. Tapes tend to last longer than HDDs because they have very few moving parts.
Don't forget to copy your archives to new media every couple of years. It isn't uncommon to be able to pop in a 10+ year old tape or HDD and pull off the contents... but it isn't uncommon either to find the HDD clicking, or the tape full of hard errors.
Whoosh!
Screw tape. You pay $2,000 USD for the drive and $50+ per tape for a couple of hundred gigs. Go with bare external drives instead: install a trayless SATA bay for 3.5" hard drives, which will run you $12, and buy some bare SATA drives. These run $50 for 1 TB and are available up to 3 TB. I buy bare-drive hard cases for about $3 each. My Intel ICH10R on-board RAID controller supports hot-swap, so in effect it's a big 3.5" floppy. That's right. If your tape drive breaks, you're out another two grand. This is far less expensive, faster, higher density, and random access. In addition, you can boot from it. Want RAID 0? Install two trayless SATA bays for a total of $24 and back up in pairs.
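To make that comparison concrete, here is a rough up-front cost model in Python using only the prices quoted in this post. It is a sketch under stated assumptions: "a couple of hundred gigs" per tape is taken to mean 200 GB, one $3 case is bought per drive, and the function names are hypothetical.

```python
import math

# Prices as quoted in the post above (USD). TAPE_GB is an ASSUMPTION:
# "a couple of hundred gigs" is read as 200 GB per tape.
TAPE_DRIVE = 2000.0   # one-time cost of the tape drive
TAPE_PRICE = 50.0     # per tape
TAPE_GB = 200         # assumed capacity per tape

SATA_BAY = 12.0       # one-time cost of a trayless SATA bay
DRIVE_PRICE = 50.0    # per 1 TB bare SATA drive
DRIVE_GB = 1000
CASE_PRICE = 3.0      # bare-drive hard case, assumed one per drive


def tape_cost(total_gb: float) -> float:
    """Up-front cost to hold total_gb on tape: the drive plus enough tapes."""
    return TAPE_DRIVE + math.ceil(total_gb / TAPE_GB) * TAPE_PRICE


def disk_cost(total_gb: float) -> float:
    """Up-front cost to hold total_gb on bare drives swapped through one bay."""
    drives = math.ceil(total_gb / DRIVE_GB)
    return SATA_BAY + drives * (DRIVE_PRICE + CASE_PRICE)


for tb in (1, 3, 10):
    gb = tb * 1000
    print(f"{tb:>3} TB: tape ${tape_cost(gb):,.0f}  disk ${disk_cost(gb):,.0f}")
```

With these numbers, 1 TB comes out at $2,250 on tape versus $65 on disk, mostly because of the one-time drive cost. The model says nothing about shelf life or about the cheaper per-TB tapes of newer generations mentioned elsewhere in the thread.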
Eternally Yours, The case for the development of a reliable repository for the preservation of personal digital objects.
http://explorer.cyberstreet.com/CET4970H-Peterson-Thesis.pdf
It depends on price. HDDs are crazy cheap for the capacity, but untrustworthy. However, thanks to the cheapness, redundancy (preferably in multiple locations) and periodic testing/copying to newer disks are fairly affordable. Make sure that you have hashes/checksums (whether generated manually, at the utility level, or at the filesystem level) and hope for the best. LTO tapes are rather more durable, having fewer moving parts in the storage media, but the cost of entry is substantially higher. All the same principles apply, though.
There are no truly reliable storage mechanisms for large quantities of digital data, only storage mechanisms cheap enough that you can duplicate your way to reliability.
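Checksums at the utility level can be as simple as a manifest of SHA-256 hashes kept next to each copy of an archive and re-checked whenever that copy is tested or migrated. A minimal Python sketch using only the standard library; the function names are hypothetical, and the manifest file should live outside the directory being hashed.

```python
import hashlib
import json
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so 40-60 GB projects never need to fit in RAM."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: Path) -> dict:
    """Map every file under root (as a relative POSIX path) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): sha256_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def save_manifest(root: Path, manifest_path: Path) -> None:
    """Write the manifest as JSON; keep manifest_path OUTSIDE root so it isn't hashed."""
    manifest_path.write_text(json.dumps(build_manifest(root), indent=2))


def verify(root: Path, manifest: dict) -> list:
    """Return relative paths that are missing or whose contents no longer match."""
    bad = []
    for rel, expected in manifest.items():
        p = root / rel
        if not p.is_file() or sha256_file(p) != expected:
            bad.append(rel)
    return bad
```

An empty list from `verify` means the copy still matches what was archived; any listed path should be restored from another copy.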
You can't argue with tape. It's been proven to last since the 1960s if kept in a climate-controlled space (dry/cool). Just make sure to keep a spare tape drive handy (just ask NASA), because spare parts for 40-year-old tape drives are surprisingly difficult to locate.
Optical isn't even close, assuming you're talking about burned discs. Taiyo Yuden claims a 70-year shelf life, but they have only been around for, what, 8 years tops?
Hard drives are an option if you've built a redundant array, but even then you're still going to be out of luck if your RAID controller burns up.
How long?
What is good for a decade may not be good for a century, and vice versa.
For millennium-plus archives, nothing beats punch cards.
"To those who are overly cautious, everything is impossible. "
"Only wimps use tape backup: _real_ men just upload their important stuff on ftp, and let the rest of the world mirror it" - Linus Torvalds
--
BMO
If price is not an issue, a great solution is to go with a data-deduplication device (such as EMC Data Domain or IBM ProtecTIER). If you were to host one unit in your basement and the other in a colo environment far from your home, you could set up replication and have a very reliable archive. Colo for a 1U device can be quite cheap; I have one for which I pay less than $100 a month.
If you have a smaller budget, the best cost-benefit is still found in tape, and it even works in case of network disruption. As Andrew Tanenbaum said: "Never underestimate the bandwidth of a station wagon full of tapes." A single LTO-5 tape is very cheap ($50-60) and can store 1.5 TB (you can easily double that with dedup).
There are other interesting technologies out there, such as MAID (massive array of idle disks), which you can use as a VTL (virtual tape library) with good backup software to maintain a reliable archive. However, cheap disks are cheap, and in a MAID configuration they might not last as long as typical disks because of the on/off cycling.
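Applied to the original question's 40-60 GB projects, the 1.5 TB figure works out as follows. This Python sketch uses LTO-5's native capacity only (no compression or dedup gains), assumes a project is never split across tapes, and counts a single copy; the helper names are hypothetical.

```python
import math

LTO5_NATIVE_GB = 1500  # 1.5 TB per LTO-5 tape, as quoted above (no compression/dedup)


def projects_per_tape(project_gb: float) -> int:
    """Whole projects that fit on one tape, assuming no project is split across tapes."""
    return int(LTO5_NATIVE_GB // project_gb)


def tapes_needed(num_projects: int, project_gb: float) -> int:
    """Tapes needed for ONE copy of num_projects projects of the given size."""
    return math.ceil(num_projects / projects_per_tape(project_gb))


def media_cost_per_tb(tape_price: float) -> float:
    """Tape media cost per TB of native capacity (drive cost excluded)."""
    return tape_price / (LTO5_NATIVE_GB / 1000)


for size in (40, 60):
    print(f"{size} GB projects: {projects_per_tape(size)} per tape, "
          f"{tapes_needed(100, size)} tapes for 100 projects")
print(f"media: ${media_cost_per_tb(50):.2f}-${media_cost_per_tb(60):.2f} per TB")
```

Keeping two or three copies, as most posters here recommend, multiplies the tape count accordingly.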
lucm, indeed.
The title chosen by the author of the original post: "Best Offline Storage Method For Large Archives?"
Your answer: "Why not go with an online storage solution such as Amazon S3"
I suspect that one of you is off-topic, but I also wanted to say that S3 is really a great service and quite cheap.
Put it in the cloud! *waves arms like it's something mystical*
Seriously, though, there is no great solution. Burned discs separate over time; there's not enough data on SSDs yet, but it's not looking promising; platter drives are susceptible to radiation, and tape to magnetic fields and degradation. HDDs in triplicate, replaced every 7-10 years, is the "best" method right now. So despite being modded down, serkit is right: hard drives.
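The arithmetic behind "HDD in triplicate" is easy to sketch: if each copy independently has probability p of being unreadable when you need it, all n copies are lost with probability p^n. Independence is an optimistic assumption (drives from the same batch, or stored in the same room, tend to fail together), and the 5% figure below is purely illustrative, not a measured failure rate.

```python
def prob_all_copies_lost(p_single: float, copies: int) -> float:
    """Probability that every copy fails, ASSUMING failures are independent."""
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be a probability between 0 and 1")
    return p_single ** copies


# Illustrative only: assume a 5% chance that any one drive is dead at restore time.
for n in (1, 2, 3):
    print(f"{n} copies: {prob_all_copies_lost(0.05, n):.6f}")
```

Under that assumption, going from one copy to three drops the chance of total loss from 5% to about 0.0125%, which is why the thread keeps coming back to "duplicate your way to reliability".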
OP: "If I remove the price issue (my data is important to me), does this change the choice?"
ME: If price isn't an issue then you don't choose one, you choose them all.
Yeah, that's a great solution if all you want to do is detect corruption, but note the GP's point about having "a good chance of recovering from it". The only way to recover with BitTorrent is to have another copy available to replace any bad blocks. PAR2, on the other hand, can recover any random missing X% of a dataset, as long as X% worth of PAR2 recovery data was generated.
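PAR2 uses Reed-Solomon coding, which is why it can rebuild several missing blocks. The simplest member of the same family is a single XOR parity block (the RAID 5 idea), which can rebuild any one missing block. The Python sketch below shows only that simplified case; it is not PAR2's algorithm or file format, the names are hypothetical, and it assumes equal-length blocks.

```python
from functools import reduce
from typing import List, Optional


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("blocks must all be the same (non-zero count) length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_parity(blocks: List[bytes]) -> bytes:
    """One parity block: the XOR of all data blocks."""
    return xor_blocks(blocks)


def recover(blocks: List[Optional[bytes]], parity: bytes) -> List[bytes]:
    """Rebuild at most one missing block (marked None) from the survivors plus parity."""
    missing = [i for i, b in enumerate(blocks) if b is None]
    if len(missing) > 1:
        raise ValueError("single XOR parity can rebuild only one missing block")
    restored = list(blocks)
    if missing:
        survivors = [b for b in blocks if b is not None]
        # XOR of the parity and all surviving blocks leaves exactly the lost block.
        restored[missing[0]] = xor_blocks(survivors + [parity])
    return restored
```

Here the parity costs 1/N of the data and survives one loss; PAR2 generalizes this so you choose the redundancy percentage and can survive that much loss in any blocks.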
I think the bottom line is that no medium is bulletproof. If you really care about the data and money is no object, then a combination of at least two different media is the way to go.
Aside from the usual suspects like tape and HDD, I'd suggest looking at flash memory. It's expensive per GB, but it's not prone to mechanical problems. Most flash memory is specified for 10 years of data retention, but it is a little more complicated than that. Every time you write data to a flash memory device, it "refreshes" and the 10-year counter for that data starts again. To be safe, you should probably be imaging and re-writing the flash every year or two.
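A "re-write every year or two" policy is easy to track with a small log of when each device was last imaged and rewritten. A Python sketch; the device names and the 365-day default are assumptions for illustration, not figures from any flash vendor.

```python
from datetime import date, timedelta
from typing import Dict, List


def refresh_due(last_rewritten: Dict[str, date], today: date,
                max_age_days: int = 365) -> List[str]:
    """Devices whose data was last rewritten more than max_age_days ago (sorted)."""
    limit = timedelta(days=max_age_days)
    return sorted(name for name, when in last_rewritten.items() if today - when > limit)


# Hypothetical log: device name -> date it was last imaged and rewritten.
log = {"usb-stick-a": date(2011, 1, 10), "usb-stick-b": date(2012, 3, 1)}
print(refresh_due(log, today=date(2012, 6, 1)))
```

Raising `max_age_days` to 730 corresponds to the "every two years" end of the suggested range.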
I'm seeing a lot of really goofy suggestions in here, so I'm going to make my own. First, let me say that my last job was to create massive image archives sourced from disparate media and store them permanently. Massive, as in 30 TB a year (maybe not that massive, but we were a tiny company with a matching budget).
First, let me tell you what won't work: optical media. Just DON'T. It's unreliable, slow and generally a pain in the ass. I worked at a place that burned 150 CDs a day for distribution, and we consistently saw failure rates of 50% within 20 days. Granted, that was using the cheapest possible media, but that's still awful. Further, our "archive" had thousands of discs in it, was stored well, and as a whole had a 41% failure rate over 10 years. Optical media is crap for long-term storage.
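To compare that 41%-over-10-years figure with yearly failure rates quoted for other media, it can be converted into an equivalent constant annual rate. This Python sketch assumes every disc has the same, independent chance of failing in each year, which real dye-based media (degrading over time) will not strictly follow; the function name is hypothetical.

```python
def annualized_failure_rate(cumulative: float, years: float) -> float:
    """Constant yearly failure probability that compounds to `cumulative` over `years`.

    ASSUMES a constant, independent per-year failure chance.
    """
    survival = 1.0 - cumulative
    return 1.0 - survival ** (1.0 / years)


rate = annualized_failure_rate(0.41, 10)
print(f"{rate:.1%} per year")
```

Under that assumption the archive lost roughly 5% of its remaining discs each year.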
Something else that won't work, TAPE. I know, heresy. But listen for a minute... do you know anything about tape? Ever used it? No? Then don't touch it, unless you plan to hire someone that is an expert to build out the system and keep it running. Were you planning to hire a full time systems manager? I didn't think so. Alternately, if you happen to have experience with tape, hell, use it. You can't beat the density or reliability.
Now, a suggestion that does work. Build your own NAS (or buy one if you don't have the chops to build it). You ought to be able to build or buy a 5 TB array for under $3k, give or take. It will quietly hum along in the closet doing its thing for pretty much the next few years. After 3 years, start a swap program to replace each and every hard drive. Doing this all at once allows you to store the old RAID in cold storage (box it up and stick it in the corner). Doing this at the rate of one drive per month allows you to absorb the costs a little more easily. Continue forever.
Now, if you are really nuts, and you actually think your data is valuable (you know, like you can trade it for money at some point), then you build out the NAS, order three of them, and keep one at your mom's house (or wherever). Then you buy co-lo rack space, put the third unit (did I mention you need 3?) in there, and sync all three as often as you can afford the bandwidth. This is, for all intents and purposes, how Google backs up data: 3 systems, in 3 locations, each with a complete copy of the data. It's not exactly CHEAP, but neither is redoing all that work.
I'm going to leave out suggestions like using a Kodak image writer to burn the images to digitally indexed microfilm. Why? Because you don't know the first thing about a system like that, and because you want "backups", not permanent archives. Also, you can't afford this method. I'll also skip the really wacky shit, like using BD discs, or SSD arrays (in the terabyte range? Fuck off, $$$), or anything that involves the cloud.
Storing relatively large groups of data has been dirt cheap and easy for the last 5 or so years, and even before that it wasn't that hard. Don't invent a difficult system, or buy into enterprise gear. You don't need difficult, and you don't need a NAS that performs 100,000 IOPS with a Fibre Channel backhaul. You need a couple of RAIDed drives in a box in the corner, powered up pretty much all the time.
Oh yeah, and do you know the single greatest cause of HDD failure? Cold storage. TURN THE FUCKING THINGS ON, and leave them that way. They last MUCH, much longer. God, it was hard to teach people that concept at my last company. No, putting the drives in a box in the storage locker does not make them last longer; in fact, they started failing the minute you unplugged them. (Yes, I know, physical shock is probably actually higher up on the list, as are manufacturing defects; a little hyperbole never hurt anyone.)
There are only 2 real solutions if you want truly long-term storage. The first is that you become Linus and just dump it on a server and let the rest of the world back it up; the second is that you somehow make your data a religious text, because those guys will translate it for centuries to come, even if it means sitting 50 dudes in a room for 3 years with nothing but a feather, ink, and parchment.
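The one-drive-per-month swap program can be planned ahead of time. A Python sketch, assuming replacements start three years after installation and happen on the first of each month; the `disk0`, `disk1`, ... labels and the helper names are hypothetical.

```python
from datetime import date
from typing import List, Tuple


def add_months(d: date, months: int) -> date:
    """First day of the month that is `months` after d's month (avoids day overflow)."""
    years, month_index = divmod(d.month - 1 + months, 12)
    return date(d.year + years, month_index + 1, 1)


def swap_schedule(installed: date, drive_count: int,
                  start_after_years: int = 3) -> List[Tuple[str, date]]:
    """One replacement per month, starting start_after_years after installation."""
    first = add_months(installed, 12 * start_after_years)
    return [(f"disk{i}", add_months(first, i)) for i in range(drive_count)]


for disk, when in swap_schedule(date(2012, 5, 15), drive_count=4):
    print(disk, when.isoformat())
```

To "continue forever", rerun the schedule from the date of the last swap once the whole array has been replaced.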
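With three synced copies you can also tell which one has gone bad: hash the same file at each site and let the majority win. A Python sketch; the site names and short hash strings are hypothetical, and in practice the hashes would come from something like a per-file SHA-256 manifest.

```python
from collections import Counter
from typing import Dict, List, Tuple


def reconcile(hashes_by_site: Dict[str, str]) -> Tuple[str, List[str]]:
    """Return (majority hash, sites whose copy disagrees).

    Raises ValueError when there is no strict majority, e.g. all three differ.
    """
    if not hashes_by_site:
        raise ValueError("no copies to compare")
    winner, votes = Counter(hashes_by_site.values()).most_common(1)[0]
    if votes * 2 <= len(hashes_by_site):
        raise ValueError("no majority: inspect every copy by hand")
    outliers = sorted(site for site, h in hashes_by_site.items() if h != winner)
    return winner, outliers


good, bad = reconcile({"home": "9f2c", "moms-house": "9f2c", "colo": "07aa"})
print(good, bad)
```

This is also why three copies beat two: with only two, a mismatch tells you something is wrong but not which copy to trust.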
come to think of it, same thing.
Unless you buy extremely good archival grade discs, optical media is the worst suggestion.
Even with archival-grade discs it's still the worst suggestion.
Apart from tape - yeah, let's put all our data on something that can't be read without specialized hardware! (where will you get a tape drive from in an emergency?)
Hard disks can be connected to any PC; they're cheap and they're fast. The only problems I've ever had with USB disks are failures of the cheap-ass wall-warts supplied with them. Luckily, all USB drives use 5V and/or 12V, so it's easy to wire them up to a spare PC power supply. I have one under the desk, and any USB disk that is switched on all day gets connected to that. The wall-wart goes in a drawer for emergencies.
All other considerations aside, though, the only thing that's going to guarantee long-term success is:
a) Use something that can be read on any machine with no special hardware or drivers.
b) Make multiple copies of the data and store it in different locations.
c) Use some widely used, non-proprietary format for combining/compressing the files (eg. zip).
Base whatever you do on this philosophy and you should be OK.
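Point (c) in practice: Python's standard-library `zipfile` module writes ordinary ZIP archives and can later re-read every member and check its CRC-32 with `testzip()`. A minimal sketch; the function names are hypothetical. Projects of 40-60 GB need the ZIP64 extensions, so it is worth confirming that the unzip tools you expect to restore with handle them.

```python
import zipfile
from pathlib import Path


def zip_project(src_dir: Path, archive: Path) -> int:
    """Pack every file under src_dir into a ZIP; return how many files were added.

    Write `archive` OUTSIDE src_dir, or it would try to include itself.
    """
    count = 0
    # allowZip64=True lets members and the archive exceed the classic 4 GB limit.
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED,
                         allowZip64=True) as zf:
        for p in sorted(src_dir.rglob("*")):
            if p.is_file():
                zf.write(p, arcname=p.relative_to(src_dir).as_posix())
                count += 1
    return count


def zip_is_intact(archive: Path) -> bool:
    """testzip() reads every member and returns the first bad name, or None if all pass."""
    with zipfile.ZipFile(archive) as zf:
        return zf.testzip() is None
```

Running `zip_is_intact` on each copy whenever you rotate media is a cheap first check before relying on it.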
I'm a PhD student studying magnetism, and one thing I can say for sure is that DVD/Blu-ray media is not the way to go. Professionally pressed media (the silver bottom) uses a stamp to make a mechanical impression, not unlike vinyl records. Once sealed, it lasts forever. Writable media uses a dye, and unless you store your media at 0 K, finite temperatures will cause the dye to diffuse and the media to become useless. This takes much less time than people think. Good discs will last 10 years, cheap ones only a few years. The problem is that it's now impossible to tell who is making the good discs, since all of the production lines get shared by many brands.
That said, magnetic storage (tapes or HDDs) isn't that great either. For both HDDs and tape, thermal fluctuations cause random data to be lost, but hard drives are designed to recover this data and correct it. If you take your hard drive offline for several years, it doesn't have the opportunity to constantly scan itself and check for these errors, so never expect an unpowered hard drive to store data for long periods of time; they just are not designed to do this.
As previous users have pointed out, software RAID is the only way to go. Hardware RAID introduces a single point of failure (the controller) and is really only suitable for high performance and short-term reliability, not long-term reliability.
Tape also has the same thermal-fluctuation issue, but because the magnetic grains can be much larger (tapes have thousands of times more surface area to store the same amount of data), it can go much longer. I would still "refresh" my tapes every year or two, though.
Based on your requirements, I would suggest tape first, then a large software raid of HDDs. Anything else is just not safe!