Ask Slashdot: Best Offline Storage Method For Large Archives?
An anonymous reader writes "I have a collection of large projects (Indesign files with associated images), which are typically 40GB to 60GB each. In this current climate, what is the 'best' method of archiving these? Spinny magnets? Solid state drives? USB? Tape? Blu-ray? All have pros and cons and price considerations. If I remove the price issue (my data is important to me), does this change the choice?"
Put it all on 5 1/4" floppies :)
"Freedom in the USA is not the ability to do what you want. It is the ability to stop others from doing what THEY want"
Definitely hard drives
If you're worried about long-term reliability, try a RAID-1 array of a spinning drive mirrored against an SSD. Make monthly backups to optical. That way, if your SSD fails, you still have two other options. This is probably the most affordable way we mere mortals can hope to store long-term data with pretty good reliability. Unless you can get your hands on a second-hand tape drive and some 500GB tapes.
For this project, we have multiple multi-terabyte (5-18 terabyte) datasets that need backup. We have online and offline strategies and the offline strategy is simply multiple, redundant copies on hard drives stored in static proof containers onsite and off site.
Hard drives are *very* cheap all things considered, are easy to store, take up very little physical space and, if things go badly, restoring from them is faster than just about any other method. For datasets in the GB range, it's a no-brainer to go with hard disks.
You probably need to define "best". How long do you really want to keep them for, and in what sort of environment?
Traditionally the answer is tape, and probably will be in your case too for files of that size. Optical isn't proven enough (at least for the sizes you're talking about) to be trusted, and HDDs need to be run up fairly regularly to keep working.
BD-R disks are an idea, and relatively inexpensive, but your best bang per buck would be large removable disks in the 2-3 TB range. The reason I state "disks" plural is for obvious reasons.
I would also use a program like WinRAR with a recovery record, or one of the PAR utilities used for USENET to store your files in. This way, you can tell if there was file corruption, and have a good chance of recovering from it.
For serious stuff where money is less of an issue, I'd consider an LTO-5 tape drive and multiple tapes. Tapes tend to last longer than HDDs because they have very few moving parts.
Don't forget to see about copying your archives to new media every couple years. It isn't uncommon to be able to pop a 10+ year old tape or HDD in and pull off the contents... but it isn't uncommon either to find the HDD clicking, or the tape full of hard errors.
Use SATA drives, possibly in hot-pluggable trays. Treat them right, store them right, spin them up periodically, and use a filesystem (like ZFS) that can do data integrity checking. And as others have said, if it's important, mirror it (RAID 1, etc).
http://en.wikipedia.org/wiki/ArVid
Get a bunch of 3TB HDDs and put them in a RAID 6; then you are protected against multiple drive failures. Also do monthly/bi-monthly tape backups.
Multiple spinning drives, regularly upgraded. I have 1.9 TB of my photography that I try to keep archived. I maintain HDs of different brands at work, another set at home and another set at my parents' place 100 miles away. Copy and duplicate these regularly. Right now I'm integrating 300GB of photos into my system and it's a PITA, but worth it. Anything short of the Ruskies frying every HD in America with an EMP burst has me sleeping soundly.
And though I would love to, the thought of burning some 500 archival DVDs or ~100 Blu-rays has me staying away from optical media (this is what I used to use, and I currently have close to 1,000 CDs and DVDs of photos and projects sitting in my storage unit).
Same goes for the half TB of music and movies, but my photos are more precious and therefore receive the extra attention.
Hard disk NAS storage would work best. Spinning disk has been around long enough to make it reliable and cheap. For $250 you can get a really good NAS setup with 2 to 3 TB.
There is no such thing as offline storage any more, except as a transient backup of spinning media.
RAID it and spool a copy out to tape.
Screw tape... you pay $2,000 USD for the drive, $50+ per tape for a couple of hundred gigs. Go with bare drive external: Install a trayless SATA bay for 3.5" hard drives... this will run you $12. Buy some bare SATA drives.. these run $50 for 1TB and are available up to 3TB. I buy bare drive hard cases for about $3 each. My Intel ICH10R on-board RAID controller supports hot-swap -- so in effect it's a big 3.5" floppy.. that's right. If your tape drive breaks, you're out another two grand. This is far less expensive, faster, higher density, and random access. In addition, you can boot from it. Want RAID0? Install two trayless SATA bays for a total of $24 and back up in pairs.
http://www.drobo.com/
Make sure it is hot-pluggable with USB (if one exists yet), as both IDE and SCSI have changed many times, with incompatible adapters and cables with different plugs. Odds are they will change again and be unreadable in a couple of years.
DVDs suffer from rot, in which the thin metallic layer peels off. They say it is caused by UV light damage, but I found a Gentoo CD from 2005 under a dark bed in a blinded room that is rotting away as we speak. So Blu-ray discs are out of the question.
Another slashdotter mentioned an external hard drive, but magnetic interference from the Earth would erase it like my old audio tapes within a decade or two.
Whatever you choose, make sure it is external, as USB and FireWire like to remain backwards compatible and this makes it easy to share between machines. Something solid state is the best way to make sure the data remains secure. Or find an internet provider you can upload it to, if you do not mind paying per month or year.
Eternally Yours, The case for the development of a reliable repository for the preservation of personal digital objects.
http://explorer.cyberstreet.com/CET4970H-Peterson-Thesis.pdf
Depends on price: HDDs are crazy cheap for the capacity, but untrustworthy. However, thanks to the cheapness, redundancy (preferably in multiple locations) and periodic testing/copying to newer disks are fairly affordable. Make sure that you have hashes/checksums (either manually, at the utility level, or at the FS level) and hope for the best. LTOs are rather more durable, having fewer moving parts in the storage media, but the cost of entry is substantially higher. All the same principles apply, though.
There are no truly reliable storage mechanisms for large quantities of digital data, only storage mechanisms cheap enough that you can duplicate your way to reliability.
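For the "hashes/checksums, manually" option mentioned above, a minimal Python sketch might look like the following. The function names (`sha256_of`, `build_manifest`, `verify`) are made up for illustration, and SHA-256 is just one reasonable choice of hash; the idea is to record a digest per file when the archive is written and recheck it whenever a copy is tested or migrated.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large project files never sit fully in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root, manifest):
    """Return the relative paths that are missing or whose digest changed."""
    root = Path(root)
    bad = []
    for rel, expected in manifest.items():
        target = root / rel
        if not target.is_file() or sha256_of(target) != expected:
            bad.append(rel)
    return bad
```

The manifest is a plain dict, so it can be saved next to each copy (as JSON, say) and checked against every duplicate. This only detects damage; repairing it still needs a second copy or recovery data such as PAR files.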
Buy three SATA drives and make multiple copies, and store them separately. Hard drives are dirt cheap now.
You're talking about archiving, which I assume means permanent to long term (10 years or more) storage. I wouldn't trust flash, writable optical media, or rotational media for that time frame. This is what tapes were built for. Say LTO3 will let you put 10 projects on a tape, and new tapes are ~$30 each. The initial outlay isn't super-cheap ($600-1k for the drive), but you'll make up for it in the long run.
Why not go with an online storage solution such as Amazon S3 and let them be your backup, and not worry about doing it yourself? I know that you can ship them a hard drive so you don't need to spend time uploading data.
Oh, no. I've been in this argument before.
Ok, so Seagate sells a USB 3.0 SATA drive where the adapter can be removed and used for other drives. Get one of those, and when it fills up, just start buying 1TB laptop SATA drives. They're small, cheap-ish, and more impact-resistant than their desktop counterparts.
1: Current online storage
2: online backups (live, hourly, daily, whatever...): a backup drive ready to take the place of the online storage at any time
3: offline backups
every month:
2 becomes 3
1 becomes 2
either 3 becomes 1 or new drive becomes 1
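The monthly rotation above can be sketched as a tiny Python function. The role names `live`, `standby` and `offline` are my own labels for slots 1, 2 and 3, and the drive labels are placeholders; the sketch only tracks which drive plays which role and does not copy any data.

```python
def rotate(roles, new_drive=None):
    """Apply one monthly rotation to a {"live", "standby", "offline"} mapping.

    The standby drive (2) becomes the offline copy (3), the live drive (1)
    becomes the standby (2), and the live slot is filled by new_drive if one
    is given, otherwise by the old offline drive.
    """
    return {
        "live": new_drive if new_drive is not None else roles["offline"],
        "standby": roles["live"],
        "offline": roles["standby"],
    }
```

With three drives and no replacements, every drive cycles through all three roles and the layout repeats every three months. Passing `new_drive` retires the oldest offline copy instead of reusing it.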
You can't argue with Tape. It's been proven to last since the 1960's if kept in a climate controlled space (dry/cool). Just make sure to keep a spare tape drive handy (just ask NASA), because spare parts for 40 year old tape drives are surprisingly difficult to locate.
Optical isn't even close, assuming you're talking burned discs. Taiyo Yuden claims a 70 year shelf life, but they have only been around for what, 8 years tops?
Hard drives are an option if you've built a redundant array, but even with that you're still going to be out of luck if you burn up your RAID controller.
In this current climate, what is the 'best' method of archiving these? Spinny magnets? Solid state drives? USB? Tape? Blu-ray?
If the data is important why settle for one type of media? At least external HD and tape, maybe external SSD too. Move to newer media periodically.
My QIC-80 tapes from the 90s are probably unreadable by now. However, their contents were moved from tape to CD to DVD over the years. My backups just sort of accumulate and grow over the years. It's practical for me to do this since I religiously keep things in /src and /doc hierarchies and only back these up. I don't bother with operating systems and applications; they can simply be reinstalled.
Punch cards made out of steel. As long as they don't deteriorate, you're good to go.
Last I looked into this, the best format in terms of reliability was magneto-optical. It heats up the disc with a laser before the magnetic bits are able to be manipulated, so it's unlikely to be corrupted by only magnetic interference or only light/heat. You can get a 9GB rewritable disc or 30GB write-once for ~$50.
There's also tape, which has massive capacities, but every anecdote about tape I've heard ends in "but the tape backup was partially/completely corrupted". Make two tape backups from the source (not copied from one tape to the other) using different technologies from different manufacturers if you absolutely must use tape. Keep one copy on-site for a few days if you'd normally ship backups to a storage center, so that you may not be required to recall the truck if the server explodes and the on-hand copy works.
Carve the files onto titanium plates and store them in an underground bunker somewhere with little seismic activity.
If you have more than a few of these projects, SSDs are not yet a good choice for backups. You say price is no issue, but I doubt you'd want to buy 50 SSDs at their current prices. I'd suggest a few "spinny magnets," perhaps in an array if you need more space than a couple of terabytes. Pros: low cost per terabyte, reasonable transfer speed, decent reliability, easy to implement as a tried and true technology. And of course for added safety, mirror the drive/drives to a second set. USB (if you are referring to removable thumb drives) would not be your best choice, though tape might be worth considering, especially as a secondary backup. I know nothing about Blu-ray, so I won't comment, though the capacity of the discs is a little low, isn't it? Personally, I'd go with redundant spinny magnets to avoid having your collection spread over multiple removable drives/discs/whatevers that can be lost.
Hook up a redundant RAID array, or two arrays, put them in a safe place, forget it. Use tapes or a portable HD array taken off-site to guard against destruction by fires, tsunamis, tornados, hurricanes, Godzilla, and bombs. How much data you have, and how frequently you will add to the collection, are factors that need to be considered but aren't mentioned here. My suggestion assumes that you will back up frequently and have a lot of your 40-60GB projects. Less data or less need to back it up might steer you towards something else.
If money is no object, dump it all on standard HDDs and burn discs for backup. I don't know how long you plan to keep it, but here's a relevant quote:
"How long will a Blu-ray Disc last?
It is expected to last 30 years or more when stored at room temperature. The optimum temperature is 68F, and the optimum relative humidity is 40%."
from relevant source http://www.tapestockonline.com/sotdfubldibd.html
It doesn't really matter what you choose to back up to. Just make sure you have multiple copies (stored in multiple locations), and also look at having some kind of error-recovery method built in that you can use should you be unable to get the data back (see http://en.wikipedia.org/wiki/Parchive). Also, if you're going the CD/DVD/Blu-ray route, make sure you verify the burns in a different drive than the one you burned them in (a different manufacturer also helps; md5sums of the files are fine). And make sure you store the media in a cool, dry spot that doesn't see the light of day (http://en.wikipedia.org/wiki/Humidor ??). It will take a little extra time, but it's well worth it if your data is indeed important. Using an offsite backup service is the easy way if you can schedule how often you will want data offsite.
Put your important work on paper.
It's been proven to last long and can contain quite a lot of information.
No matter what you do, it'll all be lost in the next giant solar flare that gets shot at Earth, or the next EMP attack. Nothing will survive in the way of computer equipment to be able to read it.
Forgot what it is called, but it is where multi-colored lines and patterns are drawn out on a sheet of paper/fabric/other material. This kind of storage as I recall can do up to several terabytes (maybe up to 50TB?).
The real answer is ... hire someone that knows what they are doing, as by asking the question you clearly don't.
Yes, that's a shitty answer that you're not going to like, but it's the right answer.
The longer answer is ... you back it up in the same place you back all your important data up. Which could be done any number of ways.
Spinning platters are fine if you maintain them, as is every other method of data access under the sun.
Stop trying to stick the data in some sort of long term storage and just keep all your data active, as YOU MOVE to new storage mediums, you move ALL your data with you at the same time. So you are always using current technology and worrying about pulling those bits off something that is hard to find in 10 years won't be an issue because you'll not be using something hard to find in 10 years, you'll be using whatever is popular in 10 years.
This is really easy to accomplish.
You have server A and server B. You work on server A: it's close to you, has redundant storage and fast access ... and automatically syncs to server B, which is also full of redundant storage and several thousand miles away from you for disaster recovery purposes.
How long?
What is good for a decade may not be good for a century, and vice versa.
For millennium+ archives, nothing beats punch cards.
"To those who are overly cautious, everything is impossible. "
The key to data protection is risk mitigation. Depending on how important your data is, you should probably consider employing multiple methods of protection, such as a Disk or SSD based copy with a Tape or Optical based copy.
Personally, I'd keep a near-online copy by means of an External Drive or NAS device which can be powered down if necessary, but if you want to go further you could lock that in a fire-resistant safe/filing cabinet, but you should definitely have another copy offsite somewhere.
You could even use an online storage provider? Let them worry about maintaining the hardware? But you still need a second (offline, offsite) copy, imho.
Multiple copies
Multiple formats
Multiple locations
If your data is truly worth something to you, this is the only and best approach. LTO-5, BD-R, and RAID are not bad ideas.
Trayless SATA - http://www.newegg.com/Product/Product.aspx?Item=N82E16817998041 - This isn't the exact brand I used, but this is the style. Do some comparison shopping. The case I use for each drive is the ADIDT HS-1 for 3.5" HD. I bought them off ebay for about half of newegg's price. I couldn't find them listed at the moment on ebay, but there are plenty of hits on the web.. hit google and you'll see the pics.. assorted colors. They're stackable too and have spaces for labels. This is a strong case -- it takes two fingers for me to open the snap. I also print numeric tags with a label maker to stick on the drive for identification in the corner. 1 TB hard drives - http://www.newegg.com/Product/ProductList.aspx?Submit=ENE&N=100007603%20600003269&IsNodeId=1&name=1TB%20and%20higher - my pick is the Western Digital Green drives.. read up on their soft seek technology which made them the quietest drive at the time I researched them. They come in consumer and RAID versions. The consumer version works well for both applications and costs less. For the cost savings using this method, you can double up in drives which is a given for storing any data -- always have at minimum two copies. Because they're just plain drives, you won't need special hardware to read them if your PC is destroyed by natural disaster or stolen. Store one set off-site... safe-deposit box works good. Encryption is a plus http://www.truecrypt.org/.
Spinny magnets? Solid state drives? USB? Tape? Blu-ray?
One of these things is not like the others. USB is not a way to store your data: it's a way to transfer it.
On the original topic, though: since you said that price isn't an object, whatever you go with, get enough of it to store multiple copies. Your best storage-certainty per dollar is probably from hard drives for small volumes, and tape for large volumes (since you need to pay for the tape drive).
"Only wimps use tape backup: _real_ men just upload their important stuff on ftp, and let the rest of the world mirror it" - Linus Torvalds
--
BMO
I'll second the 2.5" (laptop) hard drives -- I have seen them take a fall to the floor while powered up and survive. They are extremely durable. My pick is the WD Passport (run the toolkit to remove the virtual driver / backup disk and it's as close as you can get to a 'plain drive' these days without the need for drivers or other junk / bells / whistles). It's good in USB 2.0 or 3.0. 3.0 is backwards compatible with 2.0. The 2.0 is $10 less expensive and the cable is a bit lighter. A USB 3.0 cable is comparable to Ethernet in feel.
If price is not an issue, a great solution is to go with a data-deduplication device (such as EMC Data Domain or IBM ProtecTIER). If you were to host one unit in your basement and the other in a colo environment far from your home, you could set up replication and have a very reliable archive. Colo of a 1U device can be quite cheap; I have one of them for which I pay less than $100 a month.
If you have a smaller budget, then the best cost-benefit is still found on tape, and it can even work in case of network disruption. Like Andrew Tanenbaum said: "Never underestimate the bandwidth of a station wagon full of tapes". A single LTO-5 tape is very cheap ($50-60) and can store 1.5TB (can easily double that with dedup).
There are other interesting technologies out there, such as MAID, which you can use as a VTL with a good backup software to maintain a reliable archive, however cheap disks are cheap and in a MAID configuration they might not last as long as typical disks because of the on/off behavior.
lucm, indeed.
You can always rebuild it if you have to from proper printouts. Get them professionally published to archive quality. Then to save your actual digital stuff, rotational disk archiving. No digital storage medium will be 100% effective, so use a RAID array and plan to upgrade it every 3-4 years to a new array of better/bigger drives. Even then though, your absolute best bet is that first part, then put them into archival storage in multiple locations.
I guess this isn't a very popular suggestion. And you seemed to imply you wanted a local archive for your data, something you do yourself.
I would just use a large iSCSI NAS. 2TB drives are really cheap these days. FreeNAS even lets you flag a drive as a hot spare so you don't have to worry as much about failures.
Then back this NAS up to at least two online storage services. Make sure they're not both the same thing on the back end (like Amazon's S3). Actually Carbonite personal can't distinguish iSCSI from a local drive and is unlimited storage for personal use. I'm sure that violates some terms somewhere, but technically it's possible. Pick another high-capacity online service for redundancy.
Also, encrypt the data locally *before* it's uploaded (it's just a good idea).
You didn't say how much total data you have to archive, how fast (if at all) it is growing, or how often you would need to access it. I have seen amateurs making 40TB storage servers from component parts. Honestly I can't think of a reason to go with anything other than large-capacity drives. I assume 2TB drives don't have anywhere to go but down in price.
"UNIX is very simple, it just needs a genius to understand its simplicity." -Dennis Ritchie
Why are the OPs in-thread questions hidden? And what's up with bashing him for asking questions? I took the time to answer him/her twice. To the person that did that you make me sick. I hope he/she can find the reply since it met the same fate otherwise this site is a waste of bandwidth.
If all you want is a by-the-megabyte file corruption check, go with BitTorrent. Create a .torrent of each project directory. You can fill the tracker field with some bogus server name, say http://127.0.0.1./ The beauty of a BitTorrent hash file is that you can pinpoint exactly where an error occurs in a file, give or take a megabyte or whatever chunk size you set for your BitTorrent file. This is unlike an ordinary md5 or sha1 checksum, where all you know is whether a file is corrupted or not.
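The per-chunk idea can be illustrated without a torrent client. The sketch below does not produce a .torrent file; it only borrows the technique of hashing fixed-size pieces with SHA-1, and the names `piece_hashes` and `damaged_pieces` are hypothetical helpers, not part of any BitTorrent library.

```python
import hashlib


def piece_hashes(path, piece_size=1 << 20):
    """Return one SHA-1 digest per fixed-size piece of the file, in order."""
    hashes = []
    with open(path, "rb") as handle:
        for piece in iter(lambda: handle.read(piece_size), b""):
            hashes.append(hashlib.sha1(piece).hexdigest())
    return hashes


def damaged_pieces(path, expected, piece_size=1 << 20):
    """Return the indices of pieces whose digest differs from the stored list."""
    actual = piece_hashes(path, piece_size)
    longest = max(len(actual), len(expected))
    return [
        i for i in range(longest)
        if i >= len(actual) or i >= len(expected) or actual[i] != expected[i]
    ]
```

Piece index `i` covers bytes `i * piece_size` up to `(i + 1) * piece_size`, so a reported index narrows the damage to that range. A smaller piece size pinpoints errors more precisely at the cost of a longer hash list.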
It does look like one though, I admit.
Some time ago I read about the BackBlaze box here on slashdot. Essentially it's a 4U server chassis design that holds 45 LFF SATA drives and a server motherboard, plus the requisite connector bits and power and so on. BackBlaze is a storage provider that offers some online storage service and they designed the chassis to do high-density storage and hired a company, Protocase, to build it. BackBlaze doesn't sell servers, or server designs. They designed it because they needed it and shared it in the hope others would give back design improvements.
BackBlaze open-sourced the design and authorized Protocase to sell it. I learned about this when I followed up on the story with Protocase because I'm in the server trade and the storage density was intriguing. We went back and forth but I never bought the thing.
Purely by coincidence I got an email from Protocase just today. They're now selling the thing as a fully built server with everything you need (motherboard, processor, PSU, expanders, drive controllers, etc.) except drives, for $5395.00 (1-4 units) and $4995.00 (5-9 units). Their website won't sell it; you have to contact lpodgursky@protocase.com via email for how to buy this, because they're not geeks like us - they bend sheet metal for a living. At the time of the slashdot story this would store 67TB, but nowadays it's twice that. 3TB drives now cost $120, so chassis plus 45 drives comes to roughly $10,800 for 135TB raw or probably 110TB usable - which puts it at about $100 per served terabyte. Some folks would consider that a bargain. You'll want the 10Gbps links, as that much volume will be link constrained for volume migrations. For storage density that's 1.35PB (raw) per rack, which is about as good as it gets right now. Bring cash or AmEx because Protocase is a tiny company and can't offer terms for new customers.
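The numbers above can be checked with a few lines of Python. Prices and capacities are taken from the post; the ten-units-per-rack figure is my assumption, inferred from the 1.35PB-per-rack claim and a 4U chassis.

```python
CHASSIS_USD = 5395      # quoted price for 1-4 units, drives not included
DRIVE_BAYS = 45         # drives per chassis, from the post
DRIVE_USD = 120         # quoted price of one 3 TB drive
DRIVE_TB = 3
USABLE_TB = 110         # the poster's estimate after redundancy overhead
UNITS_PER_RACK = 10     # assumption: ten 4U chassis in one rack

total_usd = CHASSIS_USD + DRIVE_BAYS * DRIVE_USD       # chassis plus drives
raw_tb = DRIVE_BAYS * DRIVE_TB                         # raw capacity per unit
usd_per_usable_tb = total_usd / USABLE_TB              # cost per served terabyte
rack_raw_pb = UNITS_PER_RACK * raw_tb / 1000           # raw petabytes per rack

print(total_usd, raw_tb, round(usd_per_usable_tb, 2), rack_raw_pb)
# prints: 10795 135 98.14 1.35
```

So the "roughly $10,800" and "$100 per served terabyte" figures hold up once the chassis price is included alongside the drives.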
Of course for stuff that's commercially valuable that much data would cost a lot to recreate. I would probably want two of these at least, and store multiple copies on each one. Advances in HDD density should take care of expansion needs and migration needs if your data is currently less than 50TB. For software look into OpenFiler, which is free to use and has commercial support available.
This is not an advertisement. I don't work for any of these people. I don't care if you buy this thing. But if it was my money and my data and it was worth $50K or more... I'd buy several of these and find some geographically diverse locations to put them and devise a strategy for replicating and migrating my data as the hardware grew stale.
So as long as I'm posting this... to totally sexy this up with automatically tiered storage for performance I'd add a couple Fusion-IO IODrive Octals per unit with Fusion-IO's directCache software to front this storage with 10TB of SSD cache per 135TB of slow SATA disk. That should get you up to over 1M 512B iops per node if you've stepped up to Infiniband QDR to handle the bandwidth. And I don't work with them either. This last bit will cost several times all of the rest of it. Probably layer lustre file system on top of that for large volume needs. If you need less volume, look into drobo.
I've already gone overlength for this post, so I may as well go completely nuts. So here's some of Lewis Carroll's "The Hunting of the Snark":
* "Just the place for a Snark!" the Bellman cried,
As he landed his crew with care;
Supporting each man on the top of the tide
By a finger entwined in his hair.
o Fit the First : The Landing
* "Just the place for a Snark! I have said it twice:
That alone should encourage the crew.
Just the place for a Snark! I have said it thrice:
What I tell you three times is true."
"Three times" is a good rule for data. If you put data in three disparate places it's less likely to be lost. Alice in Wonderland is a great reference manual for just about everything. The Reverend Dodgson was a wise man.
I know it is a PITA, but duplicating across pairs of hard drives is cheap per GB, and allows you to move the data to an offsite location / *SNL Al Gore Voice* lockbox. I have ~2TB of video project data that I store using this method.
DLT technology is proven to work for 30+ years; LTO should be comparable but is not proven (i.e. it has not existed for 30+ years).
Most modern writable optical discs (CD, DVD, Blu-ray) are not sealed and do not use a non-corrosive material (gold) as the data layer, so they may or may not keep your data for more than 5 years. Hard disks will keep your data but may have some issues with bearings.
Paper has proven to be effective long-term storage when kept properly, but its data density is too low.
I got an old Dell PowerEdge tower ($150) and put Ubuntu and Samba on it (free). Bought a cheap Adaptec SATA RAID card ($65, took some searching) and set up a terabyte RAID array (3 500GB drives at, then, $85/pop). I use it to host Ghost backups of my desktop, and my and my girlfriend's laptops. A RAID 5 array (and other configuration) means I get an email when there's a failed drive and I can simply replace it (now a drive is about $50). Remember, RAID is not backup in itself. So I took it a step further and used a portable drive and Ghost's offsite backup feature. So I hook the drive up to my desktop on my first day off work, Ghost backs up existing backups to it and then I keep it in the bag I take to work. So should my house burn down, I have some selected data (software projects, pictures, video, and music) with minimal losses. No single backup method should be considered best. It's a multi-faceted solution that takes some regular, dedicated work on your part.
Chewbacon
OP: "If I remove the price issue (my data is important to me), does this change the choice?"
ME: If price isn't an issue then you don't choose one, you choose them all.
It can hold all of your data, and it will withstand the test of time.
Depending on the period you want to keep it.
Back up to multiple destinations:
- external HDD/disks/tapes - initial cost, plus some cost to refresh it from time to time
- Online storage (Crashplan, SpiderOak, Amazon S3....) - will incur a monthly/yearly cost but it's usually very reliable.
I, for one, welcome our data cloud overlords.
http://www.freenas.org/ You can use several inexpensive (and not reliable) disks to be used by the ZFS (software-raid) filesystem. Easy to upgrade/replace/manage. Online storage seems the second best alternative for now , IMHO...
The first question you need to ask is who you're saving these projects for. The second question you need to ask is whether or not you need every bit of data in them.
First question: If you're saving these files for a client, just increase your price to cover the cost of a RAID NAS (or two) and cloud backup--have the storage fee as a line item on your invoice. If they're for friends, family, or people who can't pay, put a dollar value on your own attachment to the projects, then spend that much. And no more.
Second question: Do you need every bit of data? Can you flatten some of those PSDs? Reduce some of those image sizes? Save some of those tiffs as jpgs? Heck, for some projects I'll bet you can get away with just exporting a couple PDFs (high quality print-ready, high quality web-ready, low quality web ready) and trashing all your IDDs.
My personal approach would be three fold:
Short term: save the projects, as is, in the same place as all my vital data--redundancy and everything.
Midterm: export PDFs, save the PDFs with all my vital data, and move the projects to a less critical set of hard drives.
Long term: Move the PDFs to a less critical set of hard drives and let the original project files die from bit rot on whatever antiquated hardware you have running in your closet.
This is my approach for everything--including the extremely nice family history photo albums I made in indesign. No one except me is ever going to want the original IDDs or PSDs. All anyone else will want are PDFs, text, and jpgs.
ALL OF THEM.
But yeah, tapes are normally what people are fans of. Ideally you have 2-3 backups no matter what, though.
There is no such thing as too many backups.
Paperback, a printer and some paper: http://www.ollydbg.de/Paperbak/index.html#1
Ok, your projects are about 50GB each, so you can fit about 20 of them per Terabyte. How many of them do you want to keep? If they're something that you generate 100 of them per day, all year, you're looking at a much different solution than if a project takes you a month to tweak all the pixels lovingly by hand.
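To put rough numbers on those two extremes, here is a back-of-the-envelope sketch in Python. The 50GB figure comes from the paragraph above; the `copies` parameter and the function name `yearly_tb` are illustrative additions of mine.

```python
PROJECT_GB = 50          # midpoint of the 40-60 GB range in the question
GB_PER_TB = 1000         # decimal terabytes, as drive vendors count them

projects_per_tb = GB_PER_TB // PROJECT_GB   # 20 projects fit in one terabyte


def yearly_tb(projects_per_year, copies=1):
    """Terabytes needed per year for a given project rate and number of copies."""
    return projects_per_year * PROJECT_GB * copies / GB_PER_TB
```

One project a month works out to `yearly_tb(12)`, or 0.6 TB a year per copy, which a single drive absorbs easily. A hundred a day is `yearly_tb(100 * 365)`, or 1825 TB a year, which is a completely different class of problem.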
For a few Terabytes, just use 3.5" hard disk, make backups, keep one copy offsite. If you want to keep 10 or so TB handy and online, maybe you'll want to do a RAID thing, or maybe you just want JBOD, and 128-256GB of SSD for the project you're working on right now (but you're still copying it to hard disk once it's baked.)
If you want to keep much more than that online, you'll need to think about fancier storage architectures, and more money. You can go with NAS (Network Attached Storage, a bit pricier, not much faster, high-density disks), or you can go with SAN (Storage Area Networks, much more expensive, blazingly fast, very large.) If you're the bureaucrats who run our IT department, you understand how to support SAN at $8000/TB, and don't have a clue how to support NAS at more like $100/TB, which is ok if you want a big blazingly fast database system, and way out of line if you want to keep a lot of log files that will be Write Once, Read Never (well, Hardly Ever.)
Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
Lots has been written on the subject of archiving, and with lots of valuable and eventually irreplaceable data, it would pay BIG dividends to read a few books and look at some of the companies that manage data for their living.
Others here have noted the variables with respect to media, hardware & software and the fact that over time they all change and eventually become obsolete. Then come the factors of where you store it and how many places you store duplicates in to prevent fire, flood or whatever war from wiping the cache of data out.
That is the "best" method if you have unlimited money. Laser cut titanium punch cards will last for thousands and thousands and thousands of years. Go for it!
Print many copies of the data to microfilm (microfiche) and properly store the copies in various safe offsite facilities.
in the group is palpable tonight. Full moon?
Computer memory is just fancy paper, CPUs just fancy pens with fancy erasers; the 'net is just a fancy backyard fence.
I think the bottom line is that no medium is bulletproof. If you really care about the data and money is no object then a combination of at least two different mediums is the way to go.
Aside from the usual suspects like tape and HDD I'd suggest looking at flash memory. Expensive per GB but also not prone to mechanical problems. Most flash memory states data retention for 10 years, but it is a little bit more complicated than that. Every time you write data to a flash memory device it "refreshes" and the 10 year counter for that data starts again. To be safe you should probably be imaging and re-writing the flash every year or two.
const int one = 65536; (Silvermoon, Texture.cs)
SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC
I can just see some poor schmuck holding paper tape up to the light, squinting, and reading into a mike attached to a computer:
one
oh
oh
one, no oh
oh, darn it, one
one
oh
oh
one, uh oh
(Calculating in my head, my memory is about 2 mm/byte, so a terabyte (base 10 tera) would be, erm, about 2 gigameters long.)
Computer memory is just fancy paper, CPUs just fancy pens with fancy erasers; the 'net is just a fancy backyard fence.
Hardware gets cheaper and better by the minute. That means your choices are degrading your chances to recover archived information as soon as it enters storage. It's a paradox. I have no answer to that; I've been bitten by storage on Apple ][+ floppies which can't be read on any current hardware I've got. Hopefully, there's an aftermarket for people who can afford to read the obsolete hardware of the past and transfer it to the nonexistent hardware of the future. Maybe there's a standard that won't be intentionally subverted by market forces (emphasis on force), but I dunno what it is. Pray that all that expensive data remains decryptable, if it's encrypted. Your best bet may be to pay for redundancy at every weak point in your system.
``Tension, apprehension & dissension have begun!'' - Duffy Wyg&, in Alfred Bester's _The Demolished Man_
freeze your whole system in carbonite, should be good for a few hundred thou, obviously having copied to a new system 1st, don't forget that bit!
I was going to say, Data Domain might be a nice option since it is cheaper than EMC..... but apparently EMC bought it already ;)
It's basically the same, just the cheap(er) solution than the big EMC SAS solutions.
Do note, you will need DD's own disks, since they run their own little firmware, which is needed. (I smell $$)
I would suggest laser-etched stone tablets buried in clay sediment with a religiously depicted stone monolith above them that would be likely to become a historical monument. If you're talking seriously long times, I mean.
I'm seeing a lot of really goofy suggestions in here. I'm going to make my own. First, let me say that my last job was to create massive image archives sourced from disparate media, and store them, permanently. Massive, as in 30tb a year. (maybe not that massive, but we were a tiny company, with a matching budget).
First, let me tell you what won't work. Optical media. Just DON'T. It's unreliable, slow and generally a pain in the ass. I worked at a place that burned 150 CDs a day for distribution, we had consistent failure rates within 20 days of 50%. Granted, that's using the cheapest possible media, but that's still awful. Further our "archive" had thousands of discs in it, was stored well, and as a whole, had a 41% failure rate over 10 years. Optical media is crap for long term storage.
Something else that won't work, TAPE. I know, heresy. But listen for a minute... do you know anything about tape? Ever used it? No? Then don't touch it, unless you plan to hire someone that is an expert to build out the system and keep it running. Were you planning to hire a full time systems manager? I didn't think so. Alternately, if you happen to have experience with tape, hell, use it. You can't beat the density or reliability.
Now, a suggestion that does work. Build your own NAS (or buy one if you don't have the chops to build it). You ought to be able to build/buy a 5TB array for under $3k, give or take. It will quietly hum along in the closet doing its thing for pretty much the next few years. After 3 years, start a swap program to replace each and every hard drive. Doing this all at once allows you to store the old RAID in cold storage (box it up and stick it in the corner). Doing this at the rate of one drive per month allows you to absorb the costs a little easier. Continue forever.
Now, if you are really nuts, and you actually think your data is valuable (you know, like you can trade it for money at some point), then you build out the NAS, order three of them, and keep one at your mom's house (or wherever), then you buy co-lo rack space and put the third unit (did I mention you need 3?) in there and sync all three as often as you can afford the bandwidth. This is, for all intents and purposes, how google backs up data. 3 systems, in 3 locations, each with a complete copy of the data. It's not exactly CHEAP, but neither is redoing all that work.
I'm going to leave out suggestions like using a Kodak image writer to burn the images to microfilm that is digitally indexed. Why? Because you don't know the first thing about a system like that, and because you want "backups" not permanent archives. Also, you can't afford this method. I'll also skip the really wacky shit, like using BD discs, or SSD arrays (in the terabyte range? Fuck off$$$), or anything that involves the clouds.
Storing relatively large groups of data has been dirt cheap and easy for the last 5 or so years. Even before that it wasn't that hard. Don't invent a difficult system, or buy into enterprise gear. You don't need difficult, and you don't need a NAS that performs 100,000 IO ops a second with a fiber channel back haul. You need a couple of raided drives in a box in the corner, powered up pretty much all the time.
Oh yeah, and do you know the single greatest cause of HDD failure? Cold storage. TURN THE FUCKING THINGS ON, and leave them that way. They last MUCH, much longer. God it was hard to teach people that concept at my last company. No, putting the drives in a box in the storage locker does not make them last longer, in fact, they started failing the minute you unplugged them. (yes, I know, physical shock is probably actually higher up on the list, as is manufacturing defects, a little hyperbole never hurt anyone)
There are only 2 real solutions if you want real long term storage. The first is you become Linus and just dump it on a server and let the rest of the world back it up, and the second is you make your data a religious text somehow. Because those guys will translate it for centuries to come, even if it means sitting 50 dudes in a room for 3 years with nothing but a feather, ink, and parchment.
come to think of it, same thing.
Never trust an atom. They make up everything.
Delkin offers Archival Quality Disks tested via ISO 18927-2002 standards with an estimated life expectancy of 200 years. You can buy them at most any mid to high end photography supply store for around $10 each for 24GB of storage. I use them for all of my research projects.
Bad Panda! No Bamboo for you! In matters of importance ACs will not be responded to. Want to say something critical,OK
when going down the hard drive path. I have an old fc-storage unit with 16 SATA 400GB disks, each 4+ years old, on which I use ZFS (zfs-fuse) on a zpool with two 8-disk raidz3 vdevs. Gives me a total of about 4TB on dirt-cheap hardware and I should be able to lose up to 6 disks (3 per vdev) and have the ability to fix silent data corruption (or so it's advertised). Safe enough (ignoring the possibility of non-drive hardware failures on the fc-storage unit) for my data (which is replicated to other places as well anyway) and for an acceptable budget. But if budget wasn't an issue, I'd swap it for tape any day.
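For anyone checking the arithmetic on that pool, a quick sketch. It assumes every vdev is identical and ignores ZFS metadata and padding overhead, so real usable space will come out a bit lower; the function names are illustrative only.

```python
def raidz_usable_gb(vdevs: int, disks_per_vdev: int, parity: int, disk_gb: int) -> int:
    """Raw data capacity of a pool of identical raidz vdevs (overhead ignored)."""
    return vdevs * (disks_per_vdev - parity) * disk_gb

def max_tolerated_failures(vdevs: int, parity: int) -> int:
    """Best case only: failures spread so that no vdev loses more than `parity` disks."""
    return vdevs * parity

print(raidz_usable_gb(2, 8, 3, 400))   # 4000 GB, i.e. about 4TB
print(max_tolerated_failures(2, 3))    # 6
```

Note the 6-disk figure is a best case: a fourth failure inside the same raidz3 vdev takes the whole pool with it, no matter how healthy the other vdev is.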
...and not optical one?
I can't claim to have any professional knowledge of storage technologies but I would assume that somebody wanting to archive large amounts of data would usually prefer an optical over a magnetic storage medium.
Wouldn't laser disks, CD/DVD/BD-ROM (not -R!), ... pass the test of time much better than harddisks, tapes, ... ?
Magnetic storage is nice because it can be overwritten very easily - but when you want to archive data the reusability of the storage medium doesn't matter and this benefit does rather look like a curse.
Immune to light, UV radiation, humidity, high temperatures or magnetism
Backwards compatible with all common burners and readers including Blu-ray writers.
http://www.datatresordisc.eu/introduction-page-dtd.html
Sorry. It's not going to be popular with the Slashdot crowd, but dumping it onto a cloud storage service seems to make the most sense.
I keep seeing ads for a service that gives you 2TB backups for £5/mo. For that you get full redundancy, and let them worry about replacing broken hardware etc. Cheaper than buying the hardware yourself, over the first year, and bound to get cheaper.
If you're genuinely worried about your cloud provider having a catastrophe of some kind that their own fault-tolerance approach doesn't cover, then dupe your archive across two cloud storage suppliers.
Single-layer BD-R disks and 2TB SATA disks are currently matched at $0.04/GB. I will assume that the OP's data, which contains images, is already compressed sufficiently.
The BD-R disks have an unknown lifespan and the OP's dataset would have to span 2-3 disks per project. The 2TB disks would hold multiple projects. There is an argument to be had that it is less expensive and more reliable to use the BD-R disks from the perspective of adding a single parity disk. The loss of any disk set would lose that project, not multiple projects. The data would be immediately offlined. As optical media tends not to fail by-disk, but by block, a filesystem like ZFS may be safest.
Contrast to the 2TB solution where you could use RAID-5, fill the array, & then offline for archival. For as long as the drives are online, there is an increased risk of failure. The loss of the array would lose multiple projects (~66 projects). Your individual drives are arguably more reliable, but you have fewer disks at a greater capacity, so the impact of a disk failure is much greater than with the more distributed BD-R model.
The benefits of hard disk storage here are ease-of-use and a better known MTBF. With fewer disks, it is easier & faster to online & verify your archives every so often. Even with ZFS-on-BDR, I'm not sure how well BDR disks will last over 10 years in a humidor, let alone on a random shelf.
If you want true longevity of archives, it isn't about finding a format that will not ever die, because they all can. It is about making copies. The brilliance of digital storage is perfect copies for an unlimited number of generations. So you take advantage of that. Have more than one backup, and test the backups. If one fails, make a new copy from the good data. Also, check the expected life of the backup medium, and replace it with new copies when it starts to age.
Along those lines, to keep it useful, make sure to convert it to a new format, when appropriate. This mostly means new backup media format, like if you are using LTO-5 now you'd probably move to a newer LTO, 7 or 8 or something in a decade, but also the data itself. Like say your data was images stored in TIFF format. Ok fine, but maybe convert them to PNG, since TIFF has less support these days and is becoming a relic in some ways. Some time in the future maybe you'd again convert it to a new image format.
The reason to do those things is otherwise in the far future, maybe you run into a problem. Let's say it is 2060 and you need the data. It is all on LTO-5 tapes, however, and the world moved to a holographic storage medium 20 years earlier and a working LTO-5 drive is nearly impossible to find. Then you do get it off and the format is something no software reads anymore, so you have to break out an emulator to convert to a newer format, and then again, until you finally get to something you can use. If you can't do all that, then the data might as well be lost since you can't access it.
Keep plenty of copies and keep them up to date (and tested) and you are good. The only other thing is to protect them from damage. That means storing them some place that is secure against various things. A good fire safe would be a good idea, if it is really important maybe a vault some place else.
The thing to do is to ask yourself at what point do you stop caring about your data? That point does exist. Then design something that can withstand more or less anything below that.
As an example I helped my parents get backups set up for their business. They care about them only so long as the business survives. If the building burns down, or floods, nothing on them matters. So the backups are in a good safe, but on the premises. There are plenty of things that could result in data loss, but only the things that would also result in the business being lost and them not caring. On the other hand at work we have data that needs to survive pretty much everything short of a nuclear war. If our building goes down, if we all die, it still needs to be intact. So we take copies of it to another building, into an underground vault. It would take a pretty catastrophic event to get it all, and that would be large enough that then it wouldn't matter.
I have had two disks in a RAID 10 fail me directly after each other, once. Guess which two? Yay!
Especially for backups where write speed is not much of an issue, you want RAID 6 or above. Never RAID 10.
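A small enumeration shows why. This is only a sketch: it assumes every pair of disks is equally likely to be the two that fail (real-world failures are often correlated, as the story above suggests), and the disk numbering and function names are hypothetical.

```python
from itertools import combinations

def raid10_survives(failed, mirror_pairs) -> bool:
    """RAID 10 dies as soon as both members of any one mirror pair are gone."""
    failed = set(failed)
    return not any(set(pair) <= failed for pair in mirror_pairs)

def raid6_survives(failed) -> bool:
    """RAID 6 carries two parity blocks per stripe, so any two failures are survivable."""
    return len(set(failed)) <= 2

def fatal_fraction_raid10(mirror_pairs, n_failures: int = 2) -> float:
    """Fraction of all n-disk failure combinations that destroy the array."""
    disks = [d for pair in mirror_pairs for d in pair]
    combos = list(combinations(disks, n_failures))
    fatal = sum(1 for c in combos if not raid10_survives(c, mirror_pairs))
    return fatal / len(combos)

four_disk = [(0, 1), (2, 3)]
print("RAID 10, 4 disks, fatal share of double failures:", fatal_fraction_raid10(four_disk))
print("RAID 6 survives any double failure:", all(raid6_survives(c) for c in combinations(range(4), 2)))
```

With four disks, one in three double failures hits a mirror pair and kills a RAID 10; with eight disks it is one in seven. RAID 6 gives up write speed to make that number zero.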
Best for long-time archives is still stone - although kB/kg ratio makes this a rather unmovable storage ;)
Offsite is great, but nothing beats one secure backup like two secure backups.
You could adopt the British Broadcasting Corporation's approach to valuable archival holdings c.1972: throw everything into a furnace.
My web domain.
1. Buy a single drive LTO-2 or 3 off Ebay. I see an 8 slot loader for sale for $270, but you really only need a single drive at 200-400GB/tape. Let's say $150 for the drive and $100 for 5 tapes, another hundred bucks for your controller. Write all your projects to LTO-2 TWICE. Send a copy to a relative's house for safekeeping.
2. In about 4 years, check Ebay again and buy yourself an LTO4 drive for around the same dollars. Read in all your LTO2 and write it back to an LTO4 tape.
3. Rinse and repeat. You could probably get similar value out of a single 2nd hand LTO 3 drive now and wait for LTO5 to be cheap in 4-5 years.
Your media has a shelf life of 25+ years if kept in a cool dry place. Just try and use really common software to cut the backup (like Linux tar). The LTO standard says that the next generation has to be able to read and write the current generation tapes and must be able to read the previous generation tapes. New LTO standards come out every couple of years.
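If you would rather script it, Python's standard tarfile module writes archives the ordinary tar command can read, so nothing special is needed to get the data back later. A minimal sketch, with placeholder directory and file names; writing to an actual tape device is left out.

```python
import tarfile
from pathlib import Path

def make_archive(project_dir: str, archive_path: str) -> None:
    """Pack one project directory into an uncompressed POSIX (pax) tar file."""
    with tarfile.open(archive_path, "w", format=tarfile.PAX_FORMAT) as tar:
        # arcname keeps only the project folder name, not the full source path
        tar.add(str(project_dir), arcname=Path(project_dir).name)

def list_archive(archive_path: str) -> list:
    """Read the archive back and return the member names, as a basic sanity check."""
    with tarfile.open(archive_path, "r") as tar:
        return tar.getnames()
```

Calling list_archive right after make_archive is a cheap first check that the archive is at least readable before it goes on the shelf.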
If it's just you, and just one computer, why not carbonite (or another reputable online storage service) AND ALSO 1 or two external usb hard drives, keeping one off site and periodically rotating them. Like, weekly.
If carbonite implodes, you have the hard drives. If you lose one hard drive, you have the other. If you lose both hard drives you have carbonite.
If you never lose the working data, then you aren't out TOO much, as carbonite is not TOO expensive and external usb hard drives are also reasonable.
No, I don't work for carbonite, just using them as a ubiquitous brand name like kleenex. I could have just as easily said dropbox. Oh, wait, no, not dropbox. Nevermind.
Flappinbooger isn't my real name
Yeah, it's not offline, but it's cheap online backup with unlimited data and support for seeding data by getting a hard drive sent via mail. Same for restoring large amounts of data. If you don't trust a single company, also add Backblaze.
Forgive my jaded perspective - respondents to this query are almost without exception fan boys of particular techie solutions. The real solution is far more commonsensical. I have every file I ever created from my 486 SX25 (circa 1990) onwards through a wealth of "blindingly fast" iterations of Pentium machines - my data, insofar as I ever wanted to keep it - is complete and has survived hard drive crashes, laptop and desktop thefts, floods, fire, misguided backup solutions involving CD and DVD, and the most malignant viruses the world felt able to bless me with. I have never had a raid array, a tape backup system - and I hasten to add - I spit in the general direction of your cloud solutions. Clouds are soft, vaporous and wholly subject to evaporation into nothingness. And I have never lost a file I wanted... The painfully obvious answer is - back up your hard drives - keep two copies (at least) of everything (preferably in different locations - I use family member backup and it has never failed) currently I have about 6TB of personal data - all backed up locally plus in at least one external location - this can be done with a handful of drives for an outlay of just a few hundred dollars - add a hot-swappable 3.5 inch drive dock or two and all your data is independent of all your computers.
Just remember the rules:
1) The data on your computer is all temporary storage - never rely on it in the longer term - you should be able to reformat at the drop of a hat if you are doing it right
2) One copy is your interim (I don't care if I lose it) position
3) A 'cloud' copy is your 'this is convenient - but let's not pretend this is long term' solution for when you are traveling or using multiple computers in different locations
4) Two copies on site (on separate external drives) is your provisionally safe position (better still - keep one at the office)
5) Three copies with at least one in a remote location means you actually own your data - it is going nowhere without your say so and you will be able to bequeath your digital estate to those who are deserving (they in turn will be able to retain it - but only if they follow the rules above...)
There! That's not so hard is it?
The OP failed to mention how many of these things needed archiving. A couple hundred? Redundant disks (don't even bother with RAID) spun up once a year or so. Ten thousand? Tape. No question. It has proven and well-known long-term reliability. But you must meet the media's storage requirements to achieve the media life specs. (If you can't do that, there are any number of off-site tape storage places that can.)
I would also use a program like ... one of the PAR utilities used for USENET
Thank you! That's a darn good idea!
I'm in the process of setting up an off-site backup system, and I've been a little paranoid about my backup getting corrupted. That would give me a little peace of mind.
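For the curious, the basic idea behind parity files can be sketched with plain XOR. Real PAR2 tools use Reed-Solomon coding and can repair several damaged blocks; this simplified stand-in can rebuild exactly one missing block, and the function names are made up for illustration.

```python
def xor_parity(blocks):
    """XOR equal-length byte blocks together into a single parity block."""
    size = len(blocks[0])
    if any(len(b) != size for b in blocks):
        raise ValueError("all blocks must be the same length")
    parity = bytearray(size)
    for block in blocks:
        for i, byte in enumerate(block):
            parity[i] ^= byte
    return bytes(parity)

def rebuild_missing(blocks, parity):
    """Recover the one block marked None by XOR-ing the survivors with the parity."""
    missing = [i for i, b in enumerate(blocks) if b is None]
    if len(missing) != 1:
        raise ValueError("simple XOR parity can rebuild exactly one missing block")
    survivors = [b for b in blocks if b is not None]
    return xor_parity(survivors + [parity])

data = [b"abcd", b"efgh", b"ijkl"]
parity = xor_parity(data)
print(rebuild_missing([b"abcd", None, b"ijkl"], parity))  # b'efgh'
```

The catch is that you need to know which block is bad, which is why parity tools also store per-block checksums alongside the recovery data.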
http://www.snseurope.com/ We do large genome sequencing runs and processing of the raw data with data sets not unlike yours. As other people have said how long do you need to keep the data or be able to retrieve the original and re-do the data analysis?
My home media collection is around 20TB. For the longest time, I was dealing with redundancy by just maintaining a second, synchronized set of file servers. Each server has either a 16 port SAS controller or two 8 port controllers and a total of 16 storage drives in RAID6 including the hot spares. Each machine probably cost me $2500 to set up and I have four of them. And that's with getting the RAID cards and rack chassis from Ebay.
The truth is, the chance of having a non-recoverable error while doing a RAID rebuild is really, disgustingly high. Hopefully I wouldn't run into a scenario in which both servers in a synchronized pair had issues at the same time, but that wasn't giving me warm fuzzies the more I read about the reliability of RAID5/6 for large volumes.
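A rough way to put a number on that. The sketch below assumes an unrecoverable read error (URE) rate of 1 per 10^14 bits read, a figure commonly quoted on consumer drive spec sheets but not taken from this thread, and it treats errors as independent. It also ignores that RAID6 can still correct a read error while rebuilding a single failed drive, so read it as the RAID5 case.

```python
import math

def p_read_error(bytes_read: float, ure_per_bit: float = 1e-14) -> float:
    """Probability of at least one URE while reading `bytes_read` bytes.

    Computes 1 - (1 - p)^bits using log1p/expm1 to stay accurate for tiny p.
    """
    bits = bytes_read * 8
    return -math.expm1(bits * math.log1p(-ure_per_bit))

# A rebuild has to read every surviving drive end to end.
for tb in (2, 10, 20):
    print(f"{tb} TB read: {p_read_error(tb * 1e12):.0%} chance of at least one URE")
```

Under those assumptions, reading 10TB without a single error is close to a coin flip, which is the kind of number that makes a second, independent copy look cheap.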
So when it came time for me to upgrade my storage setup, I chose to go a different direction. I bought a lightly used LTO4 changer. Every tape holds 800GB and costs about $20. The tapes can be taken off site (shipped to my parents) and can grow to deal with whatever expansion of storage I make in the future. In the near term I will probably purchase a second LTO drive to store with my tapes, but I expect that I'll be in a much better place for dealing with my needs for at least the next several years.
It's not a solution for everyone, but it was the right move for me.
-- I wanna decide who lives and who dies - Crow T. Robot, MST3K
Spinning disks consume power. Archival storage on HDD is unproven as HDDs are intended to be in use. The correct choice is tape - it has the lowest energy consumption, the environmental impact of manufacturing is far lower than for HDDs and they consume no power while stored.
WinRAR? PAR? Seriously. It would also be tediously slow and be a micromanaging solution that only covers the files but fails to consider the need to also keep a working backup image of the system and a properly licensed version of InDesign to ensure he can still reopen the files.
Tapes make sense especially since they can easily be couriered for offsite backup. A well thought out disaster recovery plan must include offsite backup.
The submitter mentions 60GB InDesign files and sounds like a small professional or high-end amateur, so the extra cost of a RAID setup sounds like it would be a sensible investment.
Copying to new media sounds sensible but is not exactly the right answer; the correct solution is not only to make backups but also to check that you are able to restore from backup.
On the one hand you say your project is large, but later in the sentence you say it's 60 GB. So which is it? Is it large or is it 60 GB?
My HD video projects are regularly 400 - 500 GB.
Personally I double backup on hard drives. My HP desktop has a built-in hard drive dock that makes it relatively painless to backup.
I agree -- tape drives are perfect for backups. Like someone mentioned, tar volumes from the 1970s are readable on tapes today.
Tapes are an ideal backup medium, provided you use more than one tape for archiving, and periodically go through and recopy files to new media every couple years or so. Newer tape drives offer WORM capability, so data can only be destroyed, not tampered with.
However, why I mention tapes secondary is that they are so expensive for meaningful capacity. Yes, you can buy tapes with less capacity cheaper, but there is a point where you are better off with multiple hard disks than trying to copy an archive onto 50-100 tapes. Same with optical media.
Take a LTO-5 drive, which is par for the course, and has enough storage capacity to be useful. It costs about $2500.00. However, it needs a SAS card, and it also needs I/O. Similar to old CD-Rs, a tape that doesn't get enough data streaming to it starts shoe-shining, which jumps the chance of errors and adds considerable wear to the heads and the tape. So, the machine that tape drives need to be attached to either has to be fairly high end, or a dedicated machine just for moving stuff to tape with no other functions.
If you can afford tape, it can be argued as the best backup media out there. However, most people can't, so external HDDs (laptop drives are better as they do not require power supplies) are the second best choice. They are nowhere near perfect, but for those who can't afford a new tape drive, are pretty much the only game in town for large files.
I have a few friends who own print design agencies. Here's how they do it (I asked them last year when I was setting up project storage for my company):
- A few do the old-school library checkout system and get the drive from storage and use it with their desktop/laptop. Most often they use USB drives.
- The more sophisticated ones have a multi drive ESATA box and request that a particular archive be put online. An admin gets the drive and mounts it as needed. Live projects are stored online. Backups are done to another hard drive.
- The most sophisticated have a big old NAS or file server and just leave everything online, and back up to HD.
The second option is really the most popular.
-- $G
It's probably already in the comments here, but a lot of it was tl;dr. I do like slaker's LTO approach but would also recommend keeping a dedicated server backup system attached to a NAS device or something similar. Right now I don't have that much data, only about 1TB so I keep two external HDD's and any small documents that I'd definitely want to keep I use dropbox to store away. I may not be the best source for personal storage, but have been working in an enterprise environment for a few years now and a tape + NAS backup system has suited us quite well.
Hard drives are the cheapest storage. Put some Terabyte drives in a computer and add a removable drive slot. Put your data on the drives, RAID is a waste of time. SATA is fast enough for most people.
If this is archival data that does not change, create a schedule where you back it up once a month, once a quarter, or annually depending on value. The backup will be to a drive you put in and then remove for offline storage. You can do some fancy hard-linking of backups to maintain versioning, or you can encrypt your backups. The most important part is that they are in two locations, live on your server, and on an offline hard drive. You may consider hashing data to check for corruption.
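The hard-linking trick can be sketched in a few lines. This is one possible approach, not necessarily what the poster runs: each snapshot is a full directory tree, but files that are byte-for-byte unchanged since the previous snapshot are hard links rather than new copies. It assumes both snapshots live on the same filesystem and that the filesystem supports hard links; the function name and layout are hypothetical.

```python
import filecmp
import os
import shutil

def snapshot(src: str, dest: str, previous: str = None):
    """Copy `src` into `dest`, hard-linking files unchanged since `previous`.

    Returns (linked, copied) counts.
    """
    linked = copied = 0
    for root, _dirs, files in os.walk(src):
        rel = os.path.relpath(root, src)
        target_dir = os.path.normpath(os.path.join(dest, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in files:
            source = os.path.join(root, name)
            target = os.path.join(target_dir, name)
            old = os.path.normpath(os.path.join(previous, rel, name)) if previous else None
            # shallow=False forces a full content comparison, not just size and mtime
            if old and os.path.isfile(old) and filecmp.cmp(source, old, shallow=False):
                os.link(old, target)   # same data blocks, new directory entry
                linked += 1
            else:
                shutil.copy2(source, target)
                copied += 1
    return linked, copied
```

Because hard-linked versions share the same blocks on disk, a corrupted block damages every snapshot that links to it, so this gives versioning, not redundancy. The second drive is still needed.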
This setup is fast, it's easy to access, it's not expensive, and it provides robust data security. You should reevaluate in 5 years, purge unnecessary data, or see if better options have appeared.
Check out my site, I specialize in this sort of setup.
Cheap storage VM.
If you want something that is extremely stable and will last for a hundred years then you want data glyphs. You use a printer, paper, and a flatbed scanner or hi res camera and you have a viable backup solution. The most common data glyph is the bar code.
As with everything there are downsides:
- You need to use a quality paper and ink if you're backing up for the long haul
- Printing takes time especially if you're using
- Storing a ream of paper for each back up
Below is a list of some glyph formats. There used to be a site for a full Xerox solution but I think they licensed it out to another company.
http://www.adams1.com/stack.html
You say things that offend me and I can deal with it. Can you?
If you have a NAS with multiple HDD then that'll work
Ramyphotography a portrait and wedding photography
Make lots of copies.
Hard drives are cheap; any important data gets copied onto a few (2-3) drives stored in different locations (bank safe deposit boxes and a parent's basement are good ones).
I also keep my data on my main server, so it's included in every new backup, but I only have a few hundred gigs.
If cost is really no object, I would have added tape and filed them away too, but tape is much more expensive, and my data isn't _that_ important.
Offline, I would suggest a pair of HDD docks/hot-swap bays. SATA, with normal HDDs. Put them in a ZFS mirror and keep track of which ones go together. ZFS detects bit-rot, the mirror allows you to correct for it. Add in another backup layer as desired, for that size range, you're probably into LTO tape or multi-layer BD... Just make sure to store a checksum so you know if your data is good when you try to read it later. MD5 would probably work fine.
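Storing the checksum can be as simple as a text file of digests kept next to (or better, separately from) the archive. A minimal sketch using MD5 as suggested above; the manifest layout mimics md5sum output, and the function names and file names are placeholders. MD5 is fine for catching accidental corruption but not tampering; swapping in hashlib.sha256 is a one-line change.

```python
import hashlib
import os

def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large project files never sit in RAM whole."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def write_manifest(root: str, manifest_path: str) -> None:
    """Record 'digest  relative/path' for every file under root."""
    with open(manifest_path, "w", encoding="utf-8") as out:
        for dirpath, dirnames, filenames in os.walk(root):
            dirnames.sort()  # deterministic walk order
            for name in sorted(filenames):
                full = os.path.join(dirpath, name)
                rel = os.path.relpath(full, root)
                out.write(f"{md5_of_file(full)}  {rel}\n")

def verify_manifest(root: str, manifest_path: str) -> list:
    """Return the relative paths that are missing or whose digest no longer matches."""
    bad = []
    with open(manifest_path, encoding="utf-8") as f:
        for line in f:
            digest, rel = line.rstrip("\n").split("  ", 1)
            full = os.path.join(root, rel)
            if not os.path.isfile(full) or md5_of_file(full) != digest:
                bad.append(rel)
    return bad
```

Run write_manifest once when the archive is baked, then verify_manifest whenever the drive comes off the shelf; an empty list means every file still reads back as written.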
If you're talking ONLINE storage, I would still use HDDs and ZFS, but include a cron job that runs "zpool scrub" on the array to keep it checked for bit-rot. Check the logs and "zpool status" for errors, replace bad drives as they show up. Make sure to use mirrors or RAIDZ. For large HDDs (>1TB?) I'd go with RAIDZ2 or 3-way mirrors. The reconstruction time can fail a second disk...
Many commentators will mention tape drives for high volume/capacity data storage. My experience tells me not to trust tapes or tape drives.
Unless you have humidity controlled and air conditioned storage facility tapes are a bad idea. Still tapes fail. Specifications change. LTO 4 becomes LTO 5 and so on. They are notoriously prone to slightest jerks or shocks.
The best, cheapest option I found is the RAID 2 external SATA hard drives of 2TB from Western Digital. Keep two of these units...you have 4 hard drives with your data. That's how I store DPX files (each file is around 9MB, 24 files per second) from my feature film projects.
Tat Tvam Asi
True. Current tape formats are nice, provided that you have a reliable Grandfather/Father/Son overwrite policy, with appropriate verification. You can't expect too much of it, though. However, I have to admit that my point of view is coloured by repeated experience (in the '80s and '90s) of multi-volume backup sets proving to be worthless as a result of bad I/O in the middle of the set. However, with modern equipment YMMV.
RAID is a great system for reliability (so long as it isn't RAID-0) but can't be relied on as a backup at all, since it is constantly plugged in.
Amazon, BackBlaze, etc... have pretty decent services that would allow you to backup to their service and not have to worry about the details. You can TrueCrypt and parity-protect your files before uploading to protect against their service either snooping or corrupting your data. Heck, if you are really paranoid, you could upload to 2 or 3 such services for the same price as rolling your own or even have periodic hashing for consistency.
The only downside I can see here is that you need to have sufficient upload bandwidth (and a compliant ISP) for the volume of data that needs to be backed up. The upsides are manifest: probably cheaper, better tech, not having to worry about implementation details, wasting less of your time managing your creations and more time creating.
This is one of those (rare) cases where offloading to the cloud makes a ton of sense.
1. use multiple technologies - one backup on hard disks, one on optical, one on tape, one "somebody else's problem".
2. use standard formats - FAT32 and ZIP, ext3 and tar.gz - avoid proprietary hardware & software, you want stuff which you can plug into Linux and read.
3. give yourself the ability to recover from both partial failure and complete failure of a single disk/tape.
4. redo your backups before you lose the ability to recover them.
Under 2 years? Multiple backups on hard disk, no question.
2-10 years? Hard disks with integrity checks of every drive every 6 months so you know you always have 2 good copies at all times.
10+ years? Same as above but migrate data to new media as it becomes available, OR plan on migrating the data to a proven archival storage in the future.
Proven archival storage available today:
Archival-quality optical media.
Archival-quality tape.
Archival paper - bits can be printed as graphics.
Archival microfilm or other "microscopic paper"-equivalents.
Proven archival storage available in 2021: ???
Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
The gold standard for archiving (distinct from backup) is LTO-4 tape. I know this is expensive. I know that a lot of people go "ewww, tape, STFU grampa", but you should have a serious look at the efficiency of SAS tape drives, the simplicity of the solution for really large amounts of archival data, and the reliability of the medium.
-fb Everything not expressly forbidden is now mandatory.
They don't call it a REDUNDANT Array of INEXPENSIVE Disks for nothing.
You can buy 2TB for $80 now. Buy 8 of them. You now have 16TB. Set up a mirrored RAID. You now have 8TB. Run it off a cheap low end PC (that has 4 SATA RAID of course). Costs you under $1000.
Except for MOD (where the capacities did not keep up), archival tape is the only long-lived option. All other options either require regular maintenance or are unreliable.
You will find that this quality of tape is more expensive than HDDs per GB and that the drives run you a couple thousand dollars.
Also, keep in mind that you need to archive the software used to access the data in a form that you can run later, e.g. as VM images.
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
Here is *the* solution...
Print it out!
Buy a NAS that is large enough for you. Plug it in your network.
You said you had 40-60GB datasets, but not how many datasets you're backing up.
Tape may very well be the "best" solution at the moment, though hard drives actually make a lot of sense. I would consider a part/parity system with enough parity to rebuild the entire file (yes, 2x the storage space), with the resulting set copied to two separate hard drives. 2TB hard drives are fairly commonplace, and can be had for $80 each. So you've got 100GB (avg) backup sets, and you're making two copies. You should be able to write to the drives simultaneously using two eSATA docking stations (about $40 each) and get proper cases for the drives (~$5 each). You shouldn't be out more than $200 for 20 files backed up, or $10 per project. Might as well store them in two separate physical locations.
Finally, plan a migration path on a two-to-four-year cycle. The migration should involve purchasing new media (presumably with 2x+ the initial storage density) and copying all of your files over. That will act as your bit-check, though in theory you could do the bit check without migration. Even with migration, the long term media cost should be less than $20-$25 per project, exclusive of the manpower to do the transfers.
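To sanity-check those numbers, here is a small worked calculation. It assumes the prices quoted above ($80 per 2TB drive, $5 per case, two copies, 100GB per set) and further assumes that each migration buys drives at the same price with double the capacity. The one-time cost of the two docking stations is left out, and both function names are invented for this sketch.

```python
def media_cost_per_project(drive_price=80.0, case_price=5.0, copies=2,
                           drive_tb=2.0, set_gb=100.0):
    """Cost of one generation of media, spread over the sets one drive holds."""
    sets_per_drive = int(drive_tb * 1000 // set_gb)  # 2 TB / 100 GB = 20 sets
    media = copies * (drive_price + case_price)      # drives plus cases
    return media / sets_per_drive


def lifetime_cost_per_project(first_cost, generations, density_growth=2.0):
    """Sum the per-project cost over several media generations.

    Assumes every new generation costs the same per drive but holds
    density_growth times as many sets as the one before it.
    """
    return sum(first_cost / density_growth ** g for g in range(generations))


first = media_cost_per_project()             # 8.5 dollars per project
total = lifetime_cost_per_project(first, 4)  # 15.9375 dollars over 4 generations
```

Under those assumptions the media cost stays below the $20-$25 ceiling even after four generations. Adding the $80 for the docks to the first purchase raises the first-generation figure to $12.50 per project.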
Is it just my observation, or are there way too many stupid people in the world?
Tape was the way to go in 1972. Today, we use RAID arrays of hard drives to provide reliable off site managed backups at 10 cents per GB.
From the description, it looks like you want to store your projects for 5-10 years (after all: would it still make sense to open them afterwards?). If this is true, DVDs or even Blu-ray make sense - or even cloud (but then use at least 2 independent providers and check often).
However, if you want to store them for more than 5-10 years, first ask yourself the question: How are you going to archive the programs that you need to open your projects? Open formats for the content, and open source for the programs, would be a huge help. Then you can think about the archive media for it.
I agree -- tape drives are perfect for backups. Like someone mentioned, tar volumes from the 1970s are readable on tapes today.
I just read a VHS archive made in 1987 and it worked fine. Funny that most of the commercials were for food or clothes. Only one for a car and one for aspirin.
Since when is a 40 GB project "large"??? If it fits on a single USB thumb drive it ain't large.
Less than 2 TB = fits on a single hard drive = small.
Less than 20 TB = fits on a single RAID = medium.
Once you're up into the hundreds of terabytes (e.g. backing up all source material for a feature film), then yes, you have issues.
There is a free download (that isn't exactly working) for those who purchased after June 6, 2011.
http://www.apple.com/macosx/uptodate/
All in all, it depends on what your limitations and needs are. I can zip a file that is 60 GB down to a small 10 GB, then break it into chunks and upload it to Hotmail accounts using automated upload software over IMAP. It needs a few accounts, but it's doable.
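The chunking step could look something like this. It is only a sketch: `split_file` and `join_files` are hypothetical names, the `.partNNNN` naming is an arbitrary choice, and the upload side is left out entirely.

```python
from pathlib import Path


def split_file(src: Path, out_dir: Path, chunk_bytes: int) -> list[Path]:
    """Cut src into numbered pieces of at most chunk_bytes each."""
    out_dir.mkdir(parents=True, exist_ok=True)
    parts = []
    with src.open("rb") as fh:
        index = 0
        while True:
            data = fh.read(chunk_bytes)  # one whole chunk is held in memory
            if not data:
                break
            # Zero-padded index so the pieces sort back into the right order.
            part = out_dir / f"{src.name}.part{index:04d}"
            part.write_bytes(data)
            parts.append(part)
            index += 1
    return parts


def join_files(parts: list[Path], dest: Path) -> None:
    """Concatenate the pieces, in name order, back into one file."""
    with dest.open("wb") as out:
        for part in sorted(parts):
            out.write(part.read_bytes())
```

Pick `chunk_bytes` to stay under whatever attachment limit the account imposes, and keep a checksum of the original so you can confirm the rejoined file is identical.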
A quick google search returns this: "LTO 4 is rated for 11200 end to end passes and 200 full read/write cycles". Doesn't sound very durable unless he buys several tapes and uses them sparingly. Hard drives are rated for orders of magnitude more use.
Thanks all for your suggestions. HDs at $100/TB is certainly hard to beat! My main machine is raid-1, with nightly backups to a raid-5 server. An annual copy to new discs and an offsite backup round out my current regime. As I'm not in the TB+ archive category, this still seems to be the most sensible way to proceed. As a side question, what would you pay per TB to know for sure you'd never have to check your backups for integrity? $1E3/TB? $1E4/TB?
The Supermicro SC936E26-R1200B Storage Chassis holds 16 3.5" drives. (This same chassis comes in cheaper models, but the savings are not worth the compromises - learned that one the hard way.) Then get an appropriate motherboard, memory, processor, and an LSI MegaRAID SAS 9280-24i4e raid card. Set up your Seagate Barracuda XT drives in Raid 60 drive groups, say 8 drives to a raid 6 stripe. You'll get up to 2 stripes, so with 6 data drives per stripe that's up to 12 data drives in the chassis. At 2 terabytes per drive that's 24 terabytes, or with the new 3TB drives, that's 36 terabytes of reasonable performance, cost effective, dead reliable storage. That's not enough? Add up to 6 additional chassis to plug into your LSI card, (no motherboard needed), for a total of nearly a quarter petabyte. If you want a different cost/performance ratio substitute 5200 rpm drives, or Hitachis or whatever, and/or make smaller or larger Raid 6 stripes.
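For anyone checking the arithmetic on a build like this, the usable capacity can be worked out in a few lines. This sketch only assumes that each RAID 6 group gives up two drives' worth of space to parity; the function name and its defaults are illustrative.

```python
def raid60_usable_tb(chassis=7, groups_per_chassis=2,
                     drives_per_group=8, drive_tb=3.0):
    """Usable capacity in TB for RAID 60 built from identical RAID 6 groups."""
    data_drives = drives_per_group - 2  # RAID 6 spends two drives on parity
    return chassis * groups_per_chassis * data_drives * drive_tb


one_chassis_2tb = raid60_usable_tb(chassis=1, drive_tb=2.0)  # 24.0 TB
one_chassis_3tb = raid60_usable_tb(chassis=1, drive_tb=3.0)  # 36.0 TB
seven_chassis = raid60_usable_tb()                           # 252.0 TB
```

Seven chassis of 3TB drives come to 252 TB, which is about a quarter of a petabyte.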
Don't tell *anyone* where the thing is!
Social Credit would solve everything...
I'm a PhD student studying magnetism, and one thing I can say for sure is that DVD/BR media is not the way to go. Professionally printed media (the silver bottom) uses a stamp to make a mechanical impression, not unlike vinyl records. Once sealed, it lasts forever. Writable media uses a dye, and unless you store your media at 0K, finite temperatures will cause the dye to diffuse and the media to become useless. This takes much less time than people think. Good disks will last 10 years, cheap ones only a few years. The problem is that it's impossible to tell anymore who is making the good disks, since all of the production lines get shared by many brands.
Alternatively, magnetic storage isn't that great either (tapes or HDDs). For both a HDD and tape, thermal fluctuations cause random data to be lost, but hard drives are designed to recover this data and correct it. If you pull your hard drive off line for several years, it doesn't have the opportunity to constantly scan itself and check for these errors, so never expect an unpowered hard drive to store data for long periods of time - they just are not designed to do this.
As previous users have pointed out, software raid is the only way to go. Hardware raid provides a single point of failure, and is really only suitable for high performance and short term reliability, not long term reliability.
Tape drives also have the same thermal fluctuations issue, but because the magnetic grains can be much larger (tapes have 1000's of times more surface area to store the same amount of data) they can go much longer. I would still "refresh" my tapes every year or two though.
Based on your requirements, I would suggest tape first, then a large software raid of HDDs. Anything else is just not safe!
Build 3 Petabyte servers using Backblaze's instructions, place 1 in your house, and find 2 offsite locations to place the remaining pods.
From my experience, you should have 4 external storage devices that are separate from your PC. There are special DVDs for such purposes as well. Thanks.
http://blog.backblaze.com/2011/07/20/petabytes-on-a-budget-v2-0revealing-more-secrets/
Read
it's a bit hard to recommend backups for an unknown backup size ....
The Cloud - because you don't care if your apps and data are up in the air.
http://millenniata.com/ makes a DVD burner that will burn DVDs that last forever. I'm using it for my long term storage (stuff like family pictures, genealogy, journal scans, etc.) I did not see it listed here.