Ask Slashdot: Best Offline Storage Method For Large Archives?
An anonymous reader writes "I have a collection of large projects (InDesign files with associated images), which are typically 40GB to 60GB each. In this current climate, what is the 'best' method of archiving these? Spinny magnets? Solid state drives? USB? Tape? Blu-ray? All have pros and cons and price considerations. If I remove the price issue (my data is important to me), does this change the choice?"
For this project, we have multiple multi-terabyte (5-18 terabyte) datasets that need backup. We have online and offline strategies, and the offline strategy is simply multiple, redundant copies on hard drives stored in static-proof containers on-site and off-site.
Hard drives are *very* cheap all things considered, are easy to store, take up very little physical space and, if things go badly, restoring from them is faster than just about any other method. For datasets in the GB range, it's a no-brainer to go with hard disks.
Visit Jonesblog and say hello.
You probably need to define "best". How long do you really want to keep them for, and in what sort of environment?
Traditionally the answer is tape, and it probably will be in your case too for files of that size. Optical isn't proven enough (at least for the sizes you're talking about) to be trusted, and HDDs need to be run up fairly regularly to keep working.
Normal people worry me!
BD-R disks are an idea, and relatively inexpensive, but your best bang per buck would be large removable disks in the 2-3 TB range. The reason I state "disks" plural is for obvious reasons.
I would also use a program like WinRAR with a recovery record, or one of the PAR utilities used for USENET to store your files in. This way, you can tell if there was file corruption, and have a good chance of recovering from it.
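To make the recovery-record idea concrete, here is a minimal Python sketch. It is not how WinRAR or the PAR utilities work internally (PAR2 uses Reed-Solomon coding and can repair several damaged blocks); it uses a single XOR parity block, which is the simplest scheme that can rebuild one lost block. The function names and the block size are made up for illustration.

```python
from __future__ import annotations

from functools import reduce


def xor_blocks(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def split_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Split data into fixed-size blocks, zero-padding the last one."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    if blocks and len(blocks[-1]) < block_size:
        blocks[-1] = blocks[-1].ljust(block_size, b"\0")
    return blocks


def make_parity(blocks: list[bytes]) -> bytes:
    """The parity block is the XOR of every data block."""
    return reduce(xor_blocks, blocks)


def rebuild_block(blocks: list[bytes | None], parity: bytes) -> bytes:
    """Rebuild the single missing block (marked None) from the others plus parity."""
    missing = [i for i, block in enumerate(blocks) if block is None]
    if len(missing) != 1:
        raise ValueError("XOR parity can rebuild exactly one missing block")
    survivors = [block for block in blocks if block is not None]
    # XOR-ing the parity with every surviving block leaves the missing one.
    return reduce(xor_blocks, survivors, parity)
```

Storing the parity block next to the data costs one extra block of space; real recovery records let you choose how much redundancy to pay for.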
For serious stuff where money is less of an issue, I'd consider an LTO-5 tape drive and multiple tapes. Tapes tend to last longer than HDDs because they have very few moving parts.
Don't forget to see about copying your archives to new media every couple years. It isn't uncommon to be able to pop a 10+ year old tape or HDD in and pull off the contents... but it isn't uncommon either to find the HDD clicking, or the tape full of hard errors.
Whoosh!
Screw tape... you pay $2,000 USD for the drive, $50+ per tape for a couple of hundred gigs. Go with bare drive external: Install a trayless SATA bay for 3.5" hard drives... this will run you $12. Buy some bare SATA drives.. these run $50 for 1TB and are available up to 3TB. I buy bare drive hard cases for about $3 each. My Intel ICH10R on-board RAID controller supports hot-swap -- so in effect it's a big 3.5" floppy.. that's right. If your tape drive breaks, you're out another two grand. This is far less expensive, faster, higher density, and random access. In addition, you can boot from it. Want RAID0? Install two trayless SATA bays for a total of $24 and back up in pairs.
Make sure it is hot-pluggable with USB (if one exists yet), as both IDE and SCSI have changed many times, with incompatible adapters and cables with different plugs. Odds are they will change again and be unreadable in a couple of years.
DVDs suffer from rot, in which the thin metallic layer peels off. They say it is caused by UV light damage, but I found a Gentoo CD from 2005 under a dark bed in a blinded room that is rotting away as we speak. So Blu-ray discs are out of the question.
Another slashdotter mentioned an external hard drive but magnetic interference from the Earth would erase it like my old audio tapes within a decade or two.
Whatever you choose, make sure it is external, as USB and FireWire like to remain backwards compatible, and this makes it easy to share between machines. Something solid state is the best way to make sure the data remains secure. Or find an internet provider you can upload it to, if you do not mind paying per month or year.
http://saveie6.com/
Eternally Yours, The case for the development of a reliable repository for the preservation of personal digital objects.
http://explorer.cyberstreet.com/CET4970H-Peterson-Thesis.pdf
Depends on price: HDDs are crazy cheap for the capacity, but untrustworthy. However, thanks to the cheapness, redundancy (preferably in multiple locations), periodic testing, copying to newer disks, etc. are fairly affordable. Make sure that you have hashes/checksums (either manually, at the utility level, or at the FS level) and hope for the best. LTOs are rather more durable, having fewer moving parts in the storage media, but the cost of entry is substantially higher. All the same principles apply, though.
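For the "manually" end of the hashes/checksums advice, a rough Python sketch of a checksum manifest follows. It assumes SHA-256 and a one-directory-per-project layout; the function names are invented for the example, and a filesystem with built-in checksumming would make this unnecessary.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so 40-60GB files never sit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map every file under root (by relative path) to its SHA-256 hex digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return relative paths that are missing or whose hash no longer matches."""
    bad = []
    for rel, expected in manifest.items():
        target = root / rel
        if not target.is_file() or sha256_of(target) != expected:
            bad.append(rel)
    return bad
```

Build the manifest once when the project is archived, keep a copy of it with every duplicate, and run `verify` each time the data is tested or copied to newer disks.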
There are no truly reliable storage mechanisms for large quantities of digital data, only storage mechanisms cheap enough that you can duplicate your way to reliability.
Why not go with an online storage solution such as Amazon S3 and let them be your backup, so you don't have to worry about doing it yourself? I know that you can ship them a hard drive so you don't need to spend time uploading data.
Papertape is the way to go. Not susceptible to magnetism at all. And it'll be easy to reinvent a way of reading it when the 4th Reich comes to power, as there will always be some way to tell a computer "yes, I can see a light, or no, I cannot see a light".
Sleep your way to a whiter smile...date a dentist!
1: Current online storage
2: online backups (live, hourly, daily, whatever...): a backup drive ready to take the place of the online storage at any time
3: offline backups
every month:
2 becomes 3
1 becomes 2
either 3 becomes 1 or new drive becomes 1
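The monthly rotation above can be sketched in Python. This only tracks which drive label holds which role; copying the data onto whichever drive becomes number 1 is left out, and the `Roles` class and `rotate` function are names invented for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Roles:
    live: str      # 1: current online storage
    standby: str   # 2: online backup, ready to take over
    offline: str   # 3: offline backup


def rotate(roles: Roles, new_drive: Optional[str] = None) -> Roles:
    """Apply one monthly rotation.

    2 becomes 3, 1 becomes 2, and either the old 3 or a new drive becomes 1.
    When a new drive is supplied, the old offline drive is retired.
    """
    return Roles(
        live=new_drive if new_drive is not None else roles.offline,
        standby=roles.live,
        offline=roles.standby,
    )
```

For example, `rotate(Roles("A", "B", "C"))` gives `Roles(live="C", standby="A", offline="B")`, so every drive spends time in every role unless a fresh one is brought in.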
You can't argue with tape. It's been proven to last since the 1960s if kept in a climate-controlled space (dry/cool). Just make sure to keep a spare tape drive handy (just ask NASA), because spare parts for 40-year-old tape drives are surprisingly difficult to locate.
Optical isn't even close, assuming you're talking burned discs. Taiyo Yuden claims a 70 year shelf life, but they have only been around for what, 8 years tops?
Hard drives are an option if you've built a redundant array, but even with that you're still going to be out of luck if you burn up your RAID controller.
Punch cards made out of steel. As long as they don't deteriorate, you're good to go.
Last I looked into this, the best format in terms of reliability was magneto-optical. It heats up the disc with a laser before the magnetic bits are able to be manipulated, so it's unlikely to be corrupted by only magnetic interference or only light/heat. You can get a 9GB rewritable disc or 30GB write-once for ~$50.
There's also tape, which has massive capacities, but every anecdote about tape I've heard ends in "but the tape backup was partially/completely corrupted". Make two tape backups from the source (not copied from one tape to the other) using different technologies from different manufacturers if you absolutely must use tape. Keep one copy on-site for a few days if you'd normally ship backups to a storage center, so that you may not be required to recall the truck if the server explodes and the on-hand copy works.
Corruption is convincing someone that the selfless ideal is the same as their selfish ideal.
Carve the files onto titanium plates and store them in an underground bunker somewhere with little seismic activity.
This signature can save you $400 on your car insurance!
If you have more than a few of these projects, SSDs are not yet a good choice for backups. You say price is no issue, but I doubt you'd want to buy 50 SSDs at their current prices. I'd suggest a few "spinny magnets," perhaps in an array if you need more space than a couple of terabytes. Pros: low cost per terabyte, reasonable transfer speed, decent reliability, easy to implement as a tried and true technology. And of course, for added safety, mirror the drive/drives to a second set. USB (if you are referring to removable thumb drives) would not be your best choice, though tape might be worth considering, especially as a secondary backup. I know nothing about Blu-ray, so I won't comment, though the capacity of the discs is a little low, isn't it? Personally, I'd go with redundant spinny magnets to avoid having your collection on multiple removable drives/discs/whatevers that can be lost.
Hook up a redundant RAID array, or two arrays, put them in a safe place, forget it. Use tapes or a portable HD array taken off-site to guard against destruction by fires, tsunamis, tornados, hurricanes, Godzilla, and bombs. How much data you have, and how frequently you will add to the collection, are factors that need to be considered but aren't mentioned here. My suggestion assumes that you will back up frequently and have a lot of your 40-60GB projects. Less data or less need to back it up might steer you towards something else.
This is a hacked account, for which the owner can not be held responsible.
Hard disk NAS storage would work best. Spinning disk has been around long enough to make it reliable and cheap. For $250 you can get a really good NAS setup with 2 to 3 TB.
And some NAS devices have multiple drives and can be configured for RAID.
You are kidding right? Use 8-in floppies that can store 180k per disk.
Sorry, but gray text on gray background is making my eyes bleed.
No matter what you do, it'll all be lost in the next giant solar flare that gets shot at Earth, or the next EMP attack. Nothing will survive in the way of computer equipment to be able to read it.
Right, 1.2 MB used to be less reliable for me and the guy said that his data was important to him.
Everything I write is lies, read between the lines.
Print it
Like food, this sig will also pass
Can you imagine how long it would take to toss even a single gig of data onto 360k floppy disk? The funny part would be how much money you would waste by sitting in front of a computer switching disks every 60 seconds or so and then writing labels for them all, and sticking them on (straight of course or you have to carefully peel it off and put on a new one).
Let's see... Wolfram Alpha says about 121 days of nonstop disk swapping, or roughly 364 working days at 8 hours a day http://www.wolframalpha.com/input/?i=60gib+%2F+360kib+*+60+seconds
For a total of 174,763 floppies. That would be a stack of floppies (http://www.wolframalpha.com/input/?i=174763+*+2mm) ~350 meters tall, or just taller (1.2x) than the Eiffel Tower.
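Redoing the floppy arithmetic in Python, under the same assumptions as the post (60 GiB of data, 360 KiB per disk, one disk swap every 60 seconds, 2 mm per disk), is a quick way to double-check the figures quoted above. The function name and the dictionary keys are made up for the example.

```python
KIB = 1024
GIB = 1024 ** 3


def floppy_estimate(total_bytes: int,
                    floppy_bytes: int = 360 * KIB,
                    seconds_per_swap: int = 60,
                    thickness_mm: float = 2.0,
                    workday_hours: int = 8) -> dict:
    """Estimate disk count, swapping time and stack height for a floppy archive."""
    # Ceiling division: a partly filled last disk still counts as a disk.
    floppies = -(-total_bytes // floppy_bytes)
    seconds = floppies * seconds_per_swap
    return {
        "floppies": floppies,
        "continuous_days": seconds / 86400,
        "workdays": seconds / (workday_hours * 3600),
        "stack_m": floppies * thickness_mm / 1000,
    }


estimate = floppy_estimate(60 * GIB)
```

With these inputs the estimate comes to 174,763 disks, a little over 121 days of continuous swapping (about 364 eight-hour days), and a stack just under 350 meters.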
Might be easier to just buy a bunch of TB hard disks and put them in a RAID 5 configuration (with hot spares of course) and be done with it.
"Freedom in the USA is not the ability to do what you want. It is the ability to stop others from doing what THEY want"
The real answer is ... hire someone that knows what they are doing, as by asking the question you clearly don't.
Yes, that's a shitty answer that you're not going to like, but it's the right answer.
The longer answer is ... you back it up in the same place you back all your important data up. Which could be done any number of ways.
Spinning platters are fine if you maintain them, as is every other method of data access under the sun.
Stop trying to stick the data in some sort of long-term storage and just keep all your data active. As YOU MOVE to new storage media, you move ALL your data with you at the same time. That way you are always using current technology, and pulling those bits off something that is hard to find in 10 years won't be an issue, because you won't be using something hard to find in 10 years; you'll be using whatever is popular in 10 years.
This is really easy to accomplish.
You have server A and server B. You work on server A: it's close to you, has redundant storage and fast access ... and automatically syncs to server B, which is also full of redundant storage and several thousand miles away from you for disaster recovery purposes.
Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
How long?
What is good for a decade may not be good for a century, and vice versa.
For millennium+ archives, nothing beats punch cards.
"To those who are overly cautious, everything is impossible. "
The key to data protection is risk mitigation. Depending on how important your data is, you should probably consider employing multiple methods of protection, such as a Disk or SSD based copy with a Tape or Optical based copy.
Personally, I'd keep a near-online copy by means of an External Drive or NAS device which can be powered down if necessary, but if you want to go further you could lock that in a fire-resistant safe/filing cabinet, but you should definitely have another copy offsite somewhere.
You could even use an online storage provider? Let them worry about maintaining the hardware? But you still need a second (offline, offsite) copy, imho.
Trayless SATA - http://www.newegg.com/Product/Product.aspx?Item=N82E16817998041 - This isn't the exact brand I used, but this is the style. Do some comparison shopping. The case I use for each drive is the ADIDT HS-1 for 3.5" HD. I bought them off ebay for about half of newegg's price. I couldn't find them listed at the moment on ebay, but there are plenty of hits on the web.. hit google and you'll see the pics.. assorted colors. They're stackable too and have spaces for labels. This is a strong case -- it takes two fingers for me to open the snap. I also print numeric tags with a label maker to stick on the drive for identification in the corner. 1 TB hard drives - http://www.newegg.com/Product/ProductList.aspx?Submit=ENE&N=100007603%20600003269&IsNodeId=1&name=1TB%20and%20higher - my pick is the Western Digital Green drives.. read up on their soft seek technology which made them the quietest drive at the time I researched them. They come in consumer and RAID versions. The consumer version works well for both applications and costs less. For the cost savings using this method, you can double up in drives which is a given for storing any data -- always have at minimum two copies. Because they're just plain drives, you won't need special hardware to read them if your PC is destroyed by natural disaster or stolen. Store one set off-site... safe-deposit box works good. Encryption is a plus http://www.truecrypt.org/.
"Only wimps use tape backup: _real_ men just upload their important stuff on ftp, and let the rest of the world mirror it" - Linus Torvalds
--
BMO
I'll second the 2.5" (laptop) hard drives -- I have seen them take a fall to the floor while powered up and survive. They are extremely durable. My pick is the WD Passport (run the toolkit to remove the virtual driver / backup disk and it's as close as you can get to a 'plain drive' these days without the need for drivers or other junk / bells / whistles). It's good in USB 2.0 or 3.0. 3.0 is backwards compatible with 2.0. The 2.0 is $10 less expensive and the cable is a bit lighter. A USB 3.0 cable is comparable to Ethernet in feel.
If price is not an issue, a great solution is to go with a data-deduplication device (such as EMC DataDomain or IBM ProtecTIER). If you were to host one unit in your basement and the other in a coloc environment far from your home, you could set up replication and have a very reliable archive. Coloc of a 1U device can be quite cheap; I have one of them for which I pay less than $100 a month.
If you have a smaller budget, then the best cost-benefit is still found on tape, and it can even work in case of network disruption. Like Andrew Tanenbaum said: "Never underestimate the bandwidth of a station wagon full of tapes". A single LTO-5 tape is very cheap ($50-60) and can store 1.5TB (you can easily double that with dedup).
There are other interesting technologies out there, such as MAID, which you can use as a VTL with a good backup software to maintain a reliable archive, however cheap disks are cheap and in a MAID configuration they might not last as long as typical disks because of the on/off behavior.
lucm, indeed.
What is this "MB" you speak of? My 1541 drive stores 170KB per floppy. That should surely be enough for anyone.
The CB App. What's your 20?
You'd be surprised. Just for grins I restored my (circa 1989) QIC-80 tapes a year ago. No problems at all.
...That said, your point is totally valid. Multiple archive copies is the safe way to go. If you want to be even more secure, go with PAR. PAR or RAR recovery records will tell you when chunks are corrupt and can allow you to recover an uncorrupted copy even if both archives are damaged.
Optical, I've had both DVD and CD bitrot, even on the old Kodak 'gold' discs.
wow, those would be five really tiny floppies.
-- Flame me and I will happily flame you back. Bring it!
Don't forget to punch the floppy so you can use the back side!
I'll wait until they have it on Blu-Ray or at least Tivo
No cost consideration? 2 servers with RAID 7 arrays, using enterprise drives of course. LTO-5 stacker with weekly Iron Mountain pickups. This is the same setup I would suggest for my clients for seismic data. I'd be happy to set it up for you for $125k. Depending on your opinions and general level of paranoia, we can discuss online backups; the pricey part always seems to be the fat pipe.
I guess this isn't a very popular suggestion. And you seemed to imply you wanted a local archive for your data, something you do yourself.
I would just use a large iSCSI NAS. 2TB drives are really cheap these days. FreeNAS even lets you flag a drive as a hot spare so you don't have to worry as much about failures.
Then back this NAS up to at least two online storage services. Make sure they're not both the same thing on the back end (like Amazon's S3). Actually, Carbonite personal can't distinguish iSCSI from a local drive and is unlimited storage for personal use. I'm sure that violates some terms somewhere, but technically it's possible. Pick another high-capacity online service for redundancy.
Also, encrypt the data locally *before* it's uploaded (it's just a good idea).
You didn't say how much total data you have to archive, how fast (if at all) it is growing, or how often you would need to access it. I have seen amateurs making 40TB storage servers from component parts. Honestly, I can't think of a reason to go with anything other than large-capacity drives. I assume 2TB drives don't have anywhere to go but down in price.
"UNIX is very simple, it just needs a genius to understand its simplicity." -Dennis Ritchie
Why are the OP's in-thread questions hidden? And what's up with bashing him for asking questions? I took the time to answer him/her twice. To the person that did that: you make me sick. I hope he/she can find the reply, since it met the same fate; otherwise this site is a waste of bandwidth.
Only makes sense if you're doing small scale stuff, and (because you don't understand depreciation) will want to hang on to old drives.
Drobo is very flexible, but horribly slow.
Put it in the cloud! *waves arms like it's something mystical*
Seriously though, there is no great solution. Burned discs separate over time, there's not enough data on SSDs yet but it's not looking promising, platter drives are susceptible to radiation, tape to magnetic fields and degradation. HDD in triplicate, replace every 7-10 years is the "best" method right now. So despite being modded down, serkit is right. Hard drives.
Uh oh. MichaelKristopeit is on my side? I don't know how I should feel about that.
If all you want is a by-the-megabyte file corruption check, go with BitTorrent. Create a .torrent of each project directory. You can fill the tracker field with some bogus server name, say http://127.0.0.1./ The beauty of a BitTorrent hash file is that you can pinpoint exactly where an error occurs in a file, give or take a megabyte or whatever chunk size you set for your BitTorrent file. This is unlike an ordinary md5 or sha1 checksum, where all you know is whether a file is corrupted or not.
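The per-chunk idea behind a .torrent can be sketched in a few lines of Python. This is not a .torrent writer; it only mimics the part that matters here, a SHA-1 hash per fixed-size piece (which is what the original BitTorrent format stores). It works on bytes already in memory to stay short, and the function names are invented for the example.

```python
from __future__ import annotations

import hashlib


def piece_hashes(data: bytes, piece_size: int = 1 << 20) -> list[bytes]:
    """Return one SHA-1 digest per piece (1 MiB pieces by default)."""
    return [
        hashlib.sha1(data[i:i + piece_size]).digest()
        for i in range(0, len(data), piece_size)
    ]


def damaged_pieces(data: bytes, expected: list[bytes],
                   piece_size: int = 1 << 20) -> list[int]:
    """Return the indexes of pieces whose hash differs from the stored list."""
    actual = piece_hashes(data, piece_size)
    bad = [i for i, (a, e) in enumerate(zip(actual, expected)) if a != e]
    # Pieces that exist on only one side (truncated or grown data) are bad too.
    shorter, longer = sorted((len(actual), len(expected)))
    bad.extend(range(shorter, longer))
    return bad
```

A damaged piece index multiplied by the piece size gives the byte offset where the corruption starts, which is also what lets you patch a bad copy from a good one piece by piece.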
It does look like one though, I admit.
Some time ago I read about the BackBlaze box here on slashdot. Essentially it's a 4U server chassis design that holds 45 LFF SATA drives and a server motherboard, plus the requisite connector bits and power and so on. BackBlaze is a storage provider that offers some online storage service and they designed the chassis to do high-density storage and hired a company, Protocase, to build it. BackBlaze doesn't sell servers, or server designs. They designed it because they needed it and shared it in the hope others would give back design improvements.
BackBlaze open-sourced the design and authorized Protocase to sell it. I learned about this when I followed up on the story with Protocase because I'm in the server trade and the storage density was intriguing. We went back and forth but I never bought the thing.
Purely by coincidence I got an email from Protocase just today. They're selling the thing now as a fully built server with everything you need (motherboard, processor, PSU, expanders, drive controllers, etc) -- except drives now for $5395.00 (1-4 units) and $4995.00 (5-9 units). Their website won't sell it, you have to contact lpodgursky@protocase.com via email for how to buy this because they're not geeks like us - they bend sheet metal for a living. At the time of the slashdot story this would store 67TB, but nowadays it's twice that. 3TB drives now cost $120, which would be $10,800 roughly for 135TB raw or probably 110TB usable - which puts it at $100 per served terabyte. Some folks would consider that a bargain. You'll want the 10Gbps links as that much volume will be link constrained for volume migrations. For storage density that's 1.35PB (raw) per rack, which is about as good as it gets right now. Bring cash or AmEx because Protocase is a tiny company and can't offer terms for new customers.
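The cost figures in this post can be reproduced with a short Python calculation. The chassis price, drive price, 45 bays and the 110TB usable estimate come from the post; ten units per rack is an assumption inferred from the 1.35PB figure (ten 4U units), and the function name is made up.

```python
def pod_cost(chassis_usd: float = 5395.0,
             drive_count: int = 45,
             drive_tb: float = 3.0,
             drive_usd: float = 120.0,
             usable_tb: float = 110.0,
             pods_per_rack: int = 10) -> dict:
    """Work out raw capacity, total cost and cost per terabyte for one unit."""
    raw_tb = drive_count * drive_tb
    total_usd = chassis_usd + drive_count * drive_usd
    return {
        "raw_tb": raw_tb,
        "total_usd": total_usd,
        "usd_per_raw_tb": total_usd / raw_tb,
        "usd_per_usable_tb": total_usd / usable_tb,
        # pods_per_rack is an assumption, not a figure from the post.
        "rack_raw_pb": raw_tb * pods_per_rack / 1000,
    }


costs = pod_cost()
```

With the defaults this gives 135TB raw for $10,795, a bit under $100 per usable terabyte, which matches the rounded numbers above.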
Of course for stuff that's commercially valuable that much data would cost a lot to recreate. I would probably want two of these at least, and store multiple copies on each one. Advances in HDD density should take care of expansion needs and migration needs if your data is currently less than 50TB. For software look into OpenFiler, which is free to use and has commercial support available.
This is not an advertisement. I don't work for any of these people. I don't care if you buy this thing. But if it was my money and my data and it was worth $50K or more... I'd buy several of these and find some geographically diverse locations to put them and devise a strategy for replicating and migrating my data as the hardware grew stale.
So as long as I'm posting this... to totally sexy this up with automatically tiered storage for performance I'd add a couple Fusion-IO IODrive Octals per unit with Fusion-IO's directCache software to front this storage with 10TB of SSD cache per 135TB of slow SATA disk. That should get you up to over 1M 512B iops per node if you've stepped up to Infiniband QDR to handle the bandwidth. And I don't work with them either. This last bit will cost several times all of the rest of it. Probably layer lustre file system on top of that for large volume needs. If you need less volume, look into drobo.
I've already gone overlength for this post, so I may as well go completely nuts. So here's some of Lewis Carroll's "The Hunting of the Snark":
* "Just the place for a Snark!" the Bellman cried,
As he landed his crew with care;
Supporting each man on the top of the tide
By a finger entwined in his hair.
o Fit the First : The Landing
* "Just the place for a Snark! I have said it twice:
That alone should encourage the crew.
Just the place for a Snark! I have said it thrice:
What I tell you three times is true."
"Three times" is a good rule for data. If you put data in three disparate places it's less likely to be lost. Alice in Wonderland is a great reference manual for just about everything. The Reverend Dodgson was a wise man.
Help stamp out iliturcy.
This is exactly what I was going to recommend.
A lot of people assume that if one is going to store data on a hard drive, then that drive must be powered up all the time. All reliability figures are based upon the assumption that the drives will be powered up and in use for most of a working day. However, if you only power up a drive when you need to store or retrieve data - data that is written only for archival purposes - then the drive could last a lifetime.
In my system I use an external, dual-drive, eSATA connected, setup. (I like this one.) I only turn on the drive when I need to transfer files to it. When I don't need a drive in the dock, I put it in an anti-static bag with a desiccant packet (just as they came from the manufacturer), squeeze the whole thing into a slightly modified old VHS case (I cut out the things that go into the reel holes in the tape), and put it on the shelf - labeled, of course.
I prefer the dual dock so I can simply do a full-drive copy to make backups of my archive disks. At full eSATA speeds it doesn't take nearly as long or take up nearly as much real-world space as tape, and it is less expensive as well.
I know it is a PITA, but duplicating across pairs of hard drives is cheap per GB, and allows you to move the data to an offsite location / *SNL Al Gore Voice* lockbox. I have ~2TB of video project data that I store using this method.
As both a printer and a graphic designer, I have to say ---> THIS
There's no such thing as one perfect solution. If you REALLY care, you'll apply multiple solutions. Put it on DVD, then put it on at least one hard drive, then put it into print. If both DVDs and hard drive are ruined then at least you can reproduce the paper documents.
I got an old Dell PowerEdge tower ($150) and put Ubuntu and Samba on it (free). Bought a cheap Adaptec SATA RAID card ($65, took some searching) and set up a terabyte RAID array (3 500GB drives at, then, $85/pop). I use it to host Ghost backups of my desktop, and my and my girlfriend's laptops. A RAID 5 array (and other configuration) means I get an email when there's a failed drive and I can simply replace it (now a drive is about $50). Remember, RAID is not backup in itself. So I took it a step further and used a portable drive and Ghost's offsite backup feature. So I hook the drive up to my desktop on my first day off work, Ghost backs up existing backups to it, and then I keep it in the bag I take to work. So should my house burn down, I have some selected data (software projects, pictures, video, and music) with minimal losses. No single backup method should be considered best. It's a multi-faceted solution that takes some regular, dedicated work on your part.
Chewbacon
The Bible is like Wikipedia: written by a bunch of people and verifiable by questionable sources.
OP: "If I remove the price issue (my data is important to me), does this change the choice?"
ME: If price isn't an issue then you don't choose one, you choose them all.
It depends on how long you want to keep it.
Backup to multiple destinations:
- external HDD/disks/tapes - initial cost, plus some cost to refresh it from time to time
- Online storage (Crashplan, SpiderOak, Amazon S3....) - will incur a monthly/yearly cost but it's usually very reliable.
Has anyone done any studies of the effects of turning on and off repeatedly? Turning things on is a not infrequent cause of failure.
What changed under Obama? Nothing Good
Has anyone done any studies of the effects of turning on and off repeatedly? Turning things on is a not infrequent cause of failure.
Even if they have, it is irrelevant to this situation. Remember, one does not turn on a hard drive to archive a major project multiple times per day. Perhaps not even once per day. Only when the project is complete and ready to be archived. Certainly not "repeatedly." Therefore, this is fewer power cycles than a normal hard drive endures. I can only presume that when a reliability test says the drive is on a certain number of hours per day, that they had to turn the thing off during the other hours of the day. Thus, at least one power cycle per day is incorporated into the reliability tests. If I am doing fewer power cycles, then my drives will last longer. Heck, I only turn one drive in my collection on about once a week or so, and only leave it on for a few minutes at a time. So, again, I figure my drives will last a lifetime.
Paperback, a printer and some paper: http://www.ollydbg.de/Paperbak/index.html#1
Ok, your projects are about 50GB each, so you can fit about 20 of them per Terabyte. How many of them do you want to keep? If they're something that you generate 100 of them per day, all year, you're looking at a much different solution than if a project takes you a month to tweak all the pixels lovingly by hand.
For a few Terabytes, just use 3.5" hard disks, make backups, keep one copy offsite. If you want to keep 10 or so TB handy and online, maybe you'll want to do a RAID thing, or maybe you just want JBOD, and 128-256GB of SSD for the project you're working on right now (but you're still copying it to hard disk once it's baked.)
If you want to keep much more than that online, you'll need to think about fancier storage architectures, and more money. You can go with NAS (Network Attached Storage, a bit pricier, not much faster, high-density disks), or you can go with SAN (Storage Area Networks, much more expensive, blazingly fast, very large.) If you're the bureaucrats who run our IT department, you understand how to support SAN at $8000/TB, and don't have a clue how to support NAS at more like $100/TB, which is ok if you want a big blazingly fast database system, and way out of line if you want to keep a lot of log files that will be Write Once, Read Never (well, Hardly Ever.)
Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
Lots has been written on the subject of archiving, and with lots of valuable and eventually irreplaceable data, it would pay BIG dividends to read a few books and look at some of the companies that manage data for their living.
Others here have noted the variables with respect to media, hardware & software and the fact that over time they all change and eventually become obsolete. Then come the factors of where you store it and how many places you store duplicates in, to prevent fire, flood or whatever war from wiping the cache of data out.
This is my anecdotal experience: I've been using a 160 GB 3.5" HD for daily backups since about 2005. I switch it on in the morning and run the backup script, which also pulls some data from some servers and switches the drive off at the end. It usually stays on for less than an hour. The only thing that failed was the power switch. Luckily I could replace it.
in the group is palpable tonight. Full moon?
Computer memory is just fancy paper, CPUs just fancy pens with fancy erasers; the 'net is just a fancy backyard fence.
Right now it seems like the best solution is to have servers that replicate the data between them and use classic hard disks.
Two or three servers with a RAID setup on each. Hot-swap disks are preferred since disks do fail now and then. Synchronize between them using rsync, and have the servers at different geographical locations (at least have one hosted at a friend's, or at work).
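As a rough sketch of the rsync step, the Python below builds and runs an rsync command from one server to another. The host name, paths and function names are placeholders, and it assumes rsync and SSH access are already set up on both machines. It deliberately leaves out `--delete`, so a file removed by accident on the source is not also removed from the copy.

```python
from __future__ import annotations

import subprocess


def rsync_command(source_dir: str, remote: str, dest_dir: str,
                  dry_run: bool = True) -> list[str]:
    """Build an rsync invocation that mirrors source_dir to remote:dest_dir."""
    cmd = ["rsync", "--archive", "--checksum"]
    if dry_run:
        # Report what would be transferred without changing anything.
        cmd.append("--dry-run")
    # A trailing slash on the source copies its contents, not the directory itself.
    cmd += [source_dir.rstrip("/") + "/", f"{remote}:{dest_dir}"]
    return cmd


def mirror(source_dir: str, remote: str, dest_dir: str, dry_run: bool = True) -> None:
    """Run the transfer and raise CalledProcessError if rsync reports a failure."""
    subprocess.run(rsync_command(source_dir, remote, dest_dir, dry_run), check=True)
```

Running it with `dry_run=True` first, then again with `dry_run=False` from a scheduled job, is one cautious way to keep the servers in step.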
If builders built buildings the way programmers wrote programs, then the first woodpecker would destroy civilization.
I think the bottom line is that no medium is bulletproof. If you really care about the data and money is no object then a combination of at least two different mediums is the way to go.
Aside from the usual suspects like tape and HDD I'd suggest looking at flash memory. Expensive per GB but also not prone to mechanical problems. Most flash memory states data retention for 10 years, but it is a little bit more complicated than that. Every time you write data to a flash memory device it "refreshes" and the 10 year counter for that data starts again. To be safe you should probably be imaging and re-writing the flash every year or two.
const int one = 65536; (Silvermoon, Texture.cs)
SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC
I can just see some poor schmuck holding paper tape up to the light, squinting, and reading into a mike attached to a computer:
one
oh
oh
one, no oh
oh, darn it, one
one
oh
oh
one, uh oh
(Calculating in my head, my memory is about 2 mm/byte, so a terabyte (base 10 tera) would be, erm, about 2 gigameters long.)
Computer memory is just fancy paper, CPUs just fancy pens with fancy erasers; the 'net is just a fancy backyard fence.
Hardware gets cheaper and better by the minute. That means your choices are degrading your chances to recover archived information as soon as it enters storage. It's a paradox. I have no answer to that; I've been bitten by storage on Apple ][+ floppies which can't be read on any current hardware I've got. Hopefully, there's an aftermarket for people who can afford to read the obsolete hardware of the past and transfer it to the nonexistent hardware of the future. Maybe there's a standard that won't be intentionally subverted by market forces (emphasis on force), but I dunno what it is. Pray that all that expensive data remains decryptable, if it's encrypted. Your best bet may be to pay for redundancy at every weak point in your system.
``Tension, apprehension & dissension have begun!'' - Duffy Wyg&, in Alfred Bester's _The Demolished Man_
I was going to say, Data Domain might be a nice option since it is cheaper than EMC..... but apparently EMC bought it already ;)
It's basically the same, just the cheap(er) solution compared to the big EMC SAS solutions.
Do note, you will need DD's own disks, since they run their own little firmware which is needed. (I smell $$)
I'm seeing a lot of really goofy suggestions in here. I'm going to make my own. First, let me say that my last job was to create massive image archives sourced from disparate media, and store them, permanently. Massive, as in 30tb a year. (maybe not that massive, but we were a tiny company, with a matching budget).
First, let me tell you what won't work. Optical media. Just DON'T. It's unreliable, slow and generally a pain in the ass. I worked at a place that burned 150 CDs a day for distribution, we had consistent failure rates within 20 days of 50%. Granted, that's using the cheapest possible media, but that's still awful. Further our "archive" had thousands of discs in it, was stored well, and as a whole, had a 41% failure rate over 10 years. Optical media is crap for long term storage.
Something else that won't work, TAPE. I know, heresy. But listen for a minute... do you know anything about tape? Ever used it? No? Then don't touch it, unless you plan to hire someone that is an expert to build out the system and keep it running. Were you planning to hire a full time systems manager? I didn't think so. Alternately, if you happen to have experience with tape, hell, use it. You can't beat the density or reliability.
Now, a suggestion that does work. Build your own NAS (or buy one if you don't have the chops to build it). You ought to be able to build/buy a 5tb array for under 3k, give or take. It will quietly hum along in the closet doing its thing for pretty much the next few years. After 3 years, start a swap program to replace each and every hard drive. Doing this all at once allows you to store the old raid in cold storage (box it up and stick it in the corner). Doing this at the rate of one drive per month allows you to absorb the costs a little easier. Continue forever.
Now, if you are really nuts, and you actually think your data is valuable (you know, like you can trade it for money at some point), then you build out the NAS, order three of them, and keep one at your mom's house (or wherever), then you buy co-lo rack space and put the third unit (did I mention you need 3?) in there and sync all three as often as you can afford the bandwidth. This is, for all intents and purposes, how google backs up data. 3 systems, in 3 locations, each with a complete copy of the data. It's not exactly CHEAP, but neither is redoing all that work.
I'm going to leave out suggestions like using a kodak image writer to burn the images to microfilm that is digitally indexed. Why? Because you don't know the first thing about a system like that, and because you want "backups" not permanent archives. Also, you can't afford this method. I'll also skip the really wacky shit, like using BD discs, or SSD arrays (in the terabyte range? Fuck off$$$), or anything that involves the clouds.
Storing relatively large groups of data has been dirt cheap and easy for the last 5 or so years. Even before that it wasn't that hard. Don't invent a difficult system, or buy into enterprise gear. You don't need difficult, and you don't need a NAS that performs 100,000 IO ops a second with a fiber channel back haul. You need a couple of raided drives in a box in the corner, powered up pretty much all the time.
Oh yeah, and do you know the single greatest cause of HDD failure? Cold storage. TURN THE FUCKING THINGS ON, and leave them that way. They last MUCH, much longer. God it was hard to teach people that concept at my last company. No, putting the drives in a box in the storage locker does not make them last longer, in fact, they started failing the minute you unplugged them. (yes, I know, physical shock is probably actually higher up on the list, as is manufacturing defects, a little hyperbole never hurt anyone)
There are only 2 real solutions if you want real long term storage. The first is you become Linus and just dump it on a server and let the rest of the world back it up, and the second is you make your data a religious text somehow. Because those guys will translate it for centuries to come, even if it means sitting 50 dudes in a room for 3 years with nothing but a feather, ink, and parchment.
come to think of it, same thing.
Never trust an atom. They make up everything.
Sorry. It's not going to be popular with the Slashdot crowd, but dumping it onto a cloud storage service seems to make the most sense.
I keep seeing ads for a service that gives you 2TB backups for £5/mo. For that you get full redundancy, and let them worry about replacing broken hardware etc. Cheaper than buying the hardware yourself, over the first year, and bound to get cheaper.
If you're genuinely worried about your cloud provider having a catastrophe of some kind that their own fault-tolerance approach doesn't cover, then dupe your archive across two cloud storage suppliers.
Definitely hard drives
I don't know why that got modded down. Thanks to SATA having a standard plug positioning and hot swap connectors, there are some nice solutions that allow you to just slot in a SATA drive and pull it when you are done with it. You can get them direct SATA that plug into the front of the machine or external USB/Firewire models.
Single-layer BD-R disks and 2TB SATA disks are currently matched at $0.04/GB. I will assume that the OP's data, which contains images, is already compressed sufficiently.
The BD-R disks have an unknown lifespan and the OP's dataset would have to span 2-3 disks per project. The 2TB disks would hold multiple projects. There is an argument to be had that it is less expensive and more reliable to use the BD-R disks from the perspective of adding a single parity disk. The loss of any disk set would lose that project, not multiple projects. The data would be immediately offlined. As optical media tends not to fail by-disk, but by block, a filesystem like ZFS may be safest.
Contrast to the 2TB solution where you could use RAID-5, fill the array, & then offline for archival. For as long as the drives are online, there is an increased risk of failure. The loss of the array would lose multiple projects (~66 projects). Your individual drives are arguably more reliable, but you have fewer disks at a greater capacity, so the impact of a disk failure is much greater than with the more distributed BD-R model.
The benefits of hard disk storage here are ease-of-use and a better known MTBF. With fewer disks, it is easier & faster to online & verify your archives every so often. Even with ZFS-on-BDR, I'm not sure how well BDR disks will last over 10 years in a humidor, let alone on a random shelf.
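The numbers in this comparison can be reproduced with a few lines of arithmetic. This sketch assumes single-layer 25GB BD-R discs, decimal gigabytes, no filesystem overhead, and a three-drive RAID-5 array of 2TB disks; the array size is my guess at what produces the ~66 figure, since the comment does not state it.

```python
import math

BD_R_GB = 25   # single-layer BD-R capacity
HDD_GB = 2000  # one 2TB drive, decimal gigabytes


def discs_per_project(project_gb, disc_gb=BD_R_GB, parity_discs=0):
    """Discs needed to hold one project, plus optional parity discs."""
    return math.ceil(project_gb / disc_gb) + parity_discs


def raid5_usable_gb(drive_count, drive_gb=HDD_GB):
    """RAID-5 gives up one drive's worth of capacity to parity."""
    return (drive_count - 1) * drive_gb


def projects_per_array(drive_count, project_gb, drive_gb=HDD_GB):
    """Whole projects that fit on the array's usable space."""
    return raid5_usable_gb(drive_count, drive_gb) // project_gb


print(discs_per_project(40))                  # 2 discs for a 40GB project
print(discs_per_project(60))                  # 3 discs for a 60GB project
print(discs_per_project(60, parity_discs=1))  # 4 with one parity disc added
print(projects_per_array(3, 60))              # 66 projects at risk per array
```

So a bad disc set costs one project, while a lost array costs dozens, which is the trade-off being described.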
If you want true longevity of archives, it isn't about finding a format that will not ever die, because they all can. It is about making copies. The brilliance of digital storage is perfect copies for an unlimited number of generations. So you take advantage of that. Have more than one backup, and test the backups. If one fails, make a new copy from the good data. Also, check the expected life of the backup medium, and replace it with new copies when it starts to age.
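One way to picture "if one fails, make a new copy from the good data": hash every copy, let the majority decide which digest is correct, and flag the odd ones out for re-copying. This is only a sketch; the copy labels are invented, and it assumes you keep at least three copies so that a majority exists.

```python
import hashlib
from collections import Counter


def split_good_and_bad(digests):
    """digests maps a copy's label to the hash of the data read back from it.

    Returns (good_labels, bad_labels), where "good" means agreeing with the
    majority. Raises ValueError when no strict majority exists.
    """
    majority_digest, votes = Counter(digests.values()).most_common(1)[0]
    if votes * 2 <= len(digests):
        raise ValueError("no majority: cannot tell which copy is good")
    good = sorted(label for label, d in digests.items() if d == majority_digest)
    bad = sorted(label for label, d in digests.items() if d != majority_digest)
    return good, bad


# Toy demonstration with in-memory data; one copy has a flipped character.
copies = {
    "home-drive": b"project archive contents",
    "office-drive": b"project archive contents",
    "safe-deposit": b"project archive c0ntents",
}
digests = {label: hashlib.sha256(data).hexdigest() for label, data in copies.items()}
good, bad = split_good_and_bad(digests)
print("restore", bad, "from", good[0])
```

With only two copies a mismatch tells you something is wrong but not which side, which is one more argument for a third copy or a stored checksum.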
Along those lines, to keep it useful, make sure to convert it to a new format, when appropriate. This mostly means new backup media format, like if you are using LTO-5 now you'd probably move to a newer LTO, 7 or 8 or something in a decade, but also the data itself. Like say your data was images stored in TIFF format. Ok fine, but maybe convert them to PNG, since TIFF has less support these days and is becoming a relic in some ways. Some time in the future maybe you'd again convert it to a new image format.
The reason to do those things is otherwise in the far future, maybe you run into a problem. Let's say it is 2060 and you need the data. It is all on LTO-5 tapes, however, and the world moved to a holographic storage medium 20 years earlier and a working LTO-5 drive is nearly impossible to find. Then you do get it off and the format is something no software reads anymore, so you have to break out an emulator to convert to a newer format, and then again, until you finally get to something you can use. If you can't do all that, then the data might as well be lost since you can't access it.
Keep plenty of copies and keep them up to date (and tested) and you are good. The only other thing is to protect them from damage. That means storing them some place that is secure against various things. A good fire safe would be a good idea, if it is really important maybe a vault some place else.
The thing to do is to ask yourself at what point do you stop caring about your data? That point does exist. Then design something that can withstand more or less anything below that.
As an example I helped my parents get backups set up for their business. They care about them only so long as the business survives. If the building burns down, or floods, nothing on them matters. So the backups are in a good safe, but on the premises. There are plenty of things that could result in data loss, but only the things that would also result in the business being lost and them not caring. On the other hand at work we have data that needs to survive pretty much everything short of a nuclear war. If our building goes down, if we all die, it still needs to be intact. So we take copies of it to another building, into an underground vault. It would take a pretty catastrophic event to get it all, and that would be large enough that then it wouldn't matter.
Unless you buy extremely good archival grade discs, optical media is the worst suggestion.
Even with archival-grade disks it's still the worst suggestion.
Apart from tape - yeah, let's put all our data on something that can't be read without specialized hardware! (where will you get a tape drive from in an emergency?)
Hard disks can be connected to any PC, they're cheap, they're fast. The only problems I've ever had with USB disks is failure of the cheap-ass wall-warts they supply them with. Luckily all USB drives use either 5V/12V so it's easy to wire them up to a spare PC power supply. I have one under the desk and any USB disk which is switched on all day gets connected to that. The wall-wart goes in a drawer for emergencies.
All other considerations aside though, the only thing that's going to guarantee long-term success is:
a) Use something that can be read on any machine with no special hardware or drivers.
b) Make multiple copies of the data and store it in different locations.
c) Use some widely used, non-proprietary format for combining/compressing the files (eg. zip).
Base whatever you do on this philosophy and you should be OK.
No sig today...
I have had two disks in a RAID 10 fail me directly after each other, once. Guess which two? Yay!
Especially for backups where write speed is not much of an issue, you want RAID 6 or above. Never RAID 10.
Or, to make things even simpler, use laptop disks so you can pull the power through USB.
You could adopt the British Broadcasting Corporation's approach to valuable archival holdings c.1972: throw everything into a furnace.
My web domain.
She says you're a little on the small side.
If it's just you, and just one computer, why not carbonite (or another reputable online storage service) AND ALSO 1 or two external usb hard drives, keeping one off site and periodically rotating them. Like, weekly.
If carbonite implodes, you have the hard drives. If you lose one hard drive, you have the other. If you lose both hard drives you have carbonite.
If you never lose the working data, then you aren't out TOO much, as carbonite is not TOO expensive and external usb hard drives are also reasonable.
No, I don't work for carbonite, just using them as a ubiquitous brand name like kleenex. I could have just as easily said dropbox. Oh, wait, no, not dropbox. Nevermind.
Flappinbooger isn't my real name
Forgive my jaded perspective - respondents to this query are almost without exception fan boys of particular techie solutions. The real solution is far more commonsensical. I have every file I ever created from my 486 SX25 (circa 1990) onwards through a wealth of "blindingly fast" iterations of Pentium machines - my data, insofar as I ever wanted to keep it - is complete and has survived hard drive crashes, laptop and desktop thefts, floods, fire, misguided backup solutions involving CD and DVD, and the most malignant viruses the world felt able to bless me with. I have never had a raid array, a tape backup system - and I hasten to add - I spit in the general direction of your cloud solutions. Clouds are soft, vaporous and wholly subject to evaporation into nothingness. And I have never lost a file I wanted... The painfully obvious answer is - back up your hard drives - keep two copies (at least) of everything (preferably in different locations - I use family member backup and it has never failed). Currently I have about 6TB of personal data - all backed up locally plus in at least one external location - this can be done with a handful of drives for an outlay of just a few hundred dollars - add a hot-swappable 3.5 inch drive dock or two and all your data is independent of all your computers.
Just remember the rules:
1) The data on your computer is all temporary storage - never rely on it in the longer term - you should be able to reformat at the drop of a hat if you are doing it right
2) One copy is your interim (I don't care if I lose it) position
3) A 'cloud' copy is your 'this is convenient - but let's not pretend this is long term' solution for when you are traveling or using multiple computers in different locations
4) Two copies on site (on separate external drives) is your provisionally safe position (better still - keep one at the office)
5) Three copies with at least one in a remote location means you actually own your data - it is going nowhere without your say so and you will be able to bequeath your digital estate to those who are deserving (they in turn will be able to retain it - but only if they follow the rules above...)
There! That's not so hard is it?
The OP failed to mention how many of these things needed archiving. A couple hundred? Redundant disks (don't even bother with RAID) spun up once a year or so. Ten thousand? Tape. No question. It has proven and well-known long-term reliability. But you must meet the media's storage requirements to achieve the media life specs. (If you can't do that, there are any number of off-site tape storage places that can.)
http://www.snseurope.com/ We do large genome sequencing runs and processing of the raw data with data sets not unlike yours. As other people have said how long do you need to keep the data or be able to retrieve the original and re-do the data analysis?
Unless it's D-VHS. Those store 25GB per DF-240 tape and only take 2 hours per tape.
Optical, I've had both DVD and CD bitrot, even on the old Kodak 'gold' discs.
Use good media, and don't fall for the "gold archival" hype, those MAM-A discs (also sold under the Delkin brand) are/were subpar and overpriced. I'm willing to bet most of those bitrot discs were cheapo CMC Magnetics, Princo, or Ritek made discs. Problem is it's getting harder to find known good blank media.
My home media collection is around 20TB. For the longest time, I was dealing with redundancy by just maintaining a second, synchronized set of file servers. Each server has either a 16 port SAS controller or two 8 port controllers and a total of 16 storage drives in RAID6 including the hot spares. Each machine probably cost me $2500 to set up and I have four of them. And that's with getting the RAID cards and rack chassis from Ebay.
The truth is, the chance of having a non-recoverable error while doing a RAID rebuild is really, disgustingly high. Hopefully I wouldn't run into a scenario in which both servers in a synchronized pair had issues at the same time, but that wasn't giving me warm fuzzies the more I read about the reliability of RAID5/6 for large volumes.
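To put a rough number on "disgustingly high", here is a back-of-the-envelope estimate. It assumes an unrecoverable read error rate of 1 in 10^14 bits, a figure commonly quoted on consumer drive spec sheets; real drives may do better or worse, and errors are treated as independent, which is a simplification. It models the single-parity (RAID 5) case, where any read error during the rebuild is fatal to that stripe. RAID 6 has a second parity to fall back on while rebuilding one drive.

```python
import math


def read_error_probability(bytes_read, ure_per_bit=1e-14):
    """Chance of hitting at least one unrecoverable read error.

    Computes 1 - (1 - p)**bits in a numerically stable way.
    ure_per_bit is an assumed spec-sheet value, not a measured one.
    """
    bits = bytes_read * 8
    return -math.expm1(bits * math.log1p(-ure_per_bit))


TB = 10**12
for surviving_tb in (2, 6, 12, 30):
    p = read_error_probability(surviving_tb * TB)
    print(f"read {surviving_tb:>2} TB during rebuild: {p:.0%} chance of a read error")
```

Under these assumptions, reading 12 TB of surviving data comes out at roughly a 60% chance of at least one error, which is why large single-parity arrays make people nervous.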
So when it came time for me to upgrade my storage setup, I chose to go a different direction. I bought a lightly used LTO4 changer. Every tape holds 800GB and costs about $20. The tapes can be taken off site (shipped to my parents) and can grow to deal with whatever expansion of storage I make in the future. In the near term I will probably purchase a second LTO drive to store with my tapes, but I expect that I'll be in a much better place for dealing with my needs for at least the next several years.
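For a sense of scale, the media cost works out as follows. This sketch uses the 800GB native LTO-4 capacity and the roughly $20 per tape quoted above, assumes the data is already compressed so the native figure applies, and leaves out the price of the changer and any spare drive.

```python
import math


def tapes_needed(dataset_gb, tape_gb=800):
    """Tapes required for one full copy of the dataset."""
    return math.ceil(dataset_gb / tape_gb)


def tape_media_cost(dataset_gb, copies=1, tape_gb=800, tape_price=20):
    """Media cost in dollars for the given number of full copies."""
    return tapes_needed(dataset_gb, tape_gb) * copies * tape_price


print(tapes_needed(20_000))               # 25 tapes for 20TB
print(tape_media_cost(20_000))            # 500 dollars for one full copy
print(tape_media_cost(20_000, copies=2))  # 1000 dollars for on-site plus off-site
```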
It's not a solution for everyone, but it was the right move for me.
-- I wanna decide who lives and who dies - Crow T. Robot, MST3K
WinRAR? PAR? Seriously. It would also be tediously slow and be a micromanaging solution that only covers the files but fails to consider the need to also keep a working backup image of the system and a properly licensed version of InDesign to ensure he can still reopen the files.
Tapes make sense especially since they can easily be couriered for offsite backup. A well thought out disaster recovery plan must include offsite backup.
The submitter mentions 60GB InDesign files and sounds like a small professional or high-end amateur, so the extra cost of a RAID setup sounds like it would be a sensible investment.
Copying to new media sounds sensible but is not exactly the right answer; the correct solution is not only to make backups but also to check that you are able to restore from them.
This is right on. I actually do backups for an enterprise company (and numerous smaller ones). My enterprise customer uses raid for live data, raided disk arrays for current backups, intranet transfer to an alternate facility on another continent for mirroring and current backups, daily and weekly to tape and tapes are rotated out to Iron Mountain weekly for a month of offsite data. It is expensive, but so is losing the information your business relies on. For this person though, let's be realistic - get a couple external hard drives, flash storage media, Blu-ray, or tapes - pick any two and keep them in two different locations. Test them on a regular basis to make sure they are readable and refresh the media as specified by the manufacturer (replace optical at least every year, replace tape based on wear or every 3-5 years and replace hard drives and flash media based on wear or when they are out of warranty).
Get a web developer
Laptop disks tend to be either much lower capacity than equivalently priced 'desktop' drives or much more expensive than equivalently sized ones.
Plus ... you soon run out of USB ports if you want half a dozen of them (a lot of the bigger ones need two USB cables to get enough current). This means buying powered USB hubs ... and ... the hubs are usually powered by cheap-ass wall-warts!
No sig today...
I agree -- tape drives are perfect for backups. Like someone mentioned, tar volumes from the 1970s are readable on tapes today.
Tapes are an ideal backup medium, provided you use more than one tape for archiving, and periodically go through and recopy files to new media every couple years or so. Newer tape drives offer WORM capability, so data can only be destroyed, not tampered with.
However, the reason I mention tapes second is that they are so expensive for meaningful capacity. Yes, you can buy tapes with less capacity cheaper, but there is a point where you are better off with multiple hard disks than trying to copy an archive onto 50-100 tapes. Same with optical media.
Take a LTO-5 drive, which is par for the course, and has enough storage capacity to be useful. It costs about $2500.00. However, it needs a SAS card, and it also needs I/O. Similar to old CD-Rs, a tape that doesn't get enough data streaming to it starts shoe-shining, which jumps the chance of errors and adds considerable wear to the heads and the tape. So, the machine that tape drives need to be attached to either has to be fairly high end, or a dedicated machine just for moving stuff to tape with no other functions.
If you can afford tape, it can be argued as the best backup media out there. However, most people can't, so external HDDs (laptop drives are better as they do not require power supplies) are the second best choice. They are nowhere near perfect, but for those who can't afford a new tape drive, are pretty much the only game in town for large files.
I have a few friends who own print design agencies. Here's how they do it (I asked them last year when I was setting up project storage for my company):
- A few do the old-school library checkout system and get the drive from storage and use it with their desktop/laptop. Most often they use USB drives.
- The more sophisticated ones have a multi drive ESATA box and request that a particular archive be put online. An admin gets the drive and mounts it as needed. Live projects are stored online. Backups are done to another hard drive.
- The most sophisticated have a big old NAS or file server and just leave everything online, and back up to HD.
The second option is really the most popular.
-- $G
If you think tape is a bad idea you clearly haven't done any serious work in archiving or backup. Where would I get a tape drive in an emergency? I would probably use the one mounted in the rack attached to the backup server like I did yesterday, or if that failed I would use the spare we keep in case of failures, and if it all burned down I would drive over to the other facility and use one of the two drives located there.
Get a web developer
Put it all on 5 1/4" floppies :)
I don't get it. Sure, I realize that it would take a HUGE number of floppies to accomplish this, but I'm a bit lost as to why that's "funny" as in the "smiley face" emoticon. Can you explain?
More than 20 years ago, I'd back up a 30 Megabyte disk on 1.2MB 5 1/4" floppies. The process took two or three 10-pack boxes of disks and several hours of babysitting the computer, swapping out floppies, labeling them in order, hoping they were all good and finally labeling and storing away all the boxes now full of my most recent backup. Now imagine how tedious (expensive, error-prone, ridiculous, utterly impractical) this would be for tens or hundreds of gigabytes.
If that doesn't do it, then imagine filling a swimming pool with a teaspoon. Mowing the lawn with scissors. Dialing every possible phone number because you forgot the last four digits.
If none of these things sounds funny, I guess this just isn't your style of humor. I didn't LOL, but I did chuckle.
I am not a crackpot.
bad idea. Those usb powered drives tend to fail more often because many draw more power than the usb spec allows (which is why some ship with two usb connectors), sure they might spin up, but the low amperage is pretty hard on them long term.
Get a web developer
It's probably already in the comments here, but a lot of it was tl;dr. I do like slaker's LTO approach but would also recommend keeping a dedicated server backup system attached to a NAS device or something similar. Right now I don't have that much data, only about 1TB so I keep two external HDD's and any small documents that I'd definitely want to keep I use dropbox to store away. I may not be the best source for personal storage, but have been working in an enterprise environment for a few years now and a tape + NAS backup system has suited us quite well.
Hard drives are the cheapest storage. Put some Terabyte drives in a computer and add a removable drive slot. Put your data on the drives, RAID is a waste of time. SATA is fast enough for most people.
If this is archival data that does not change, create a schedule where you back it up once a month, once a quarter, or annually depending on value. The backup will be to a drive you put in and then remove for offline storage. You can do some fancy hard-linking of backups to maintain versioning, or you can encrypt your backups. The most important part is that they are in two locations, live on your server, and on an offline hard drive. You may consider hashing data to check for corruption.
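Hashing for corruption checks can be as simple as a manifest written at backup time and re-checked whenever the offline drive is plugged in. A minimal sketch, assuming SHA-256 and a JSON manifest; the function names and file layout are illustrative, not any standard tool.

```python
import hashlib
import json
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large archives never sit fully in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): file_sha256(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def verify_manifest(root, manifest):
    """Return a list of (relative_path, problem) for missing or changed files."""
    root = Path(root)
    problems = []
    for relative_path, expected in manifest.items():
        target = root / relative_path
        if not target.is_file():
            problems.append((relative_path, "missing"))
        elif file_sha256(target) != expected:
            problems.append((relative_path, "corrupt"))
    return problems


def save_manifest(manifest, manifest_path):
    """Write the manifest as JSON; keep it somewhere other than the backup drive too."""
    Path(manifest_path).write_text(json.dumps(manifest, indent=2))
```

Build the manifest from the live copy, save it next to (and away from) the backup, then run `verify_manifest` against the offline drive on each scheduled check. An empty list means every file still matches.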
This setup is fast, it's easy to access, it's not expensive, and it provides robust data security. You should reevaluate in 5 years, purge unnecessary data, or see if better options have appeared.
Check out my site, I specialize in this sort of setup.
Cheap storage VM.
If you want something that is extremely stable and will last for a hundred years then you want data glyphs. You use a printer, paper, and a flatbed scanner or hi res camera and you have a viable backup solution. The most common data glyph is the bar code.
As with everything there are downsides:
- You need to use a quality paper and ink if you're backing up for the long haul
- Printing takes time especially if you're using
- Storing a ream of paper for each back up
Below is a list of some glyph formats. There used to be a site for a full Xerox solution but I think they licensed it out to another company.
http://www.adams1.com/stack.html
You say things that offend me and I can deal with it. Can you?
If you have a NAS with multiple HDD then that'll work
Ramyphotography a portrait and wedding photography
Make lots of copies.
Hard drives are cheap. Any important data gets copied onto a few (2-3) drives stored in different locations (bank safe deposit boxes and a parent's basement are good ones).
I also keep my data on my main server, so it's included in every new backup, but I only have a few hundred gigs.
If cost is really no object, I would have added tape and filed them away too, but tape is much more expensive, and my data isn't _that_ important.
Offline, I would suggest a pair of HDD docks/hot-swap bays. SATA, with normal HDDs. Put them in a ZFS mirror and keep track of which ones go together. ZFS detects bit-rot, the mirror allows you to correct for it. Add in another backup layer as desired, for that size range, you're probably into LTO tape or multi-layer BD... Just make sure to store a checksum so you know if your data is good when you try to read it later. MD5 would probably work fine.
If you're talking ONLINE storage, I would still use HDDs and ZFS, but include a cron job that runs "zpool scrub" on the array to keep it checked for bit-rot. Check the logs and "zpool status" for errors, replace bad drives as they show up. Make sure to use mirrors or RAIDZ. For large HDDs (>1TB?) I'd go with RAIDZ2 or 3-way mirrors. The reconstruction time can fail a second disk...
There used to be a guy, maybe there still is, working at a disc duplication facility who would periodically evaluate various manufacturers' media. I haven't checked in years and manufacturing may have moved so I won't mention the old results. Maybe some googling can (re)discover such an effort?
Many commentators will mention tape drives for high volume/capacity data storage. My experience tells me not to trust tapes or tape drives.
Unless you have a humidity-controlled, air-conditioned storage facility, tapes are a bad idea. Even then, tapes fail. Specifications change. LTO 4 becomes LTO 5 and so on. They are notoriously sensitive to the slightest jerks or shocks.
The best, cheapest option I found is the RAID 2 external SATA hard drives of 2TB from Western Digital. Keep two of these units...you have 4 hard drives with your data. That's how I store DPX files (each file is around 9MB, 24 files per second) from my feature film projects.
Tat Tvam Asi
At over $1k each, tape drives are out of the budget for most home users. Tape is also much slower than most people realize, much slower...
Cheap storage VM.
True. Current tape formats are nice, provided that you have a reliable Grandfather/Father/Son overwrite policy, with appropriate verification. You can't expect too much of it, though. However, I have to admit that my point of view is coloured by repeated experience (in the '80s and '90s) of multi-volume backup sets proving to be worthless as a result of bad I/O in the middle of the set. However, with modern equipment YMMV.
RAID is a great system for reliability (so long as it isn't RAID-0) but can't be relied on as a backup at all, since it is constantly plugged in.
FWIW, http://www.digitalfaq.com/reviews/dvd-media.htm
Amazon, BackBlaze, etc... have pretty decent services that would allow you to backup to their service and not have to worry about the details. You can TrueCrypt and parity-protect your files before uploading to protect against their service either snooping or corrupting your data. Heck, if you are really paranoid, you could upload to 2 or 3 such services for the same price as rolling your own or even have periodic hashing for consistency.
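For anyone unsure what "parity-protect" buys you, here is the idea in its simplest form. Real PAR2 tools use Reed-Solomon codes and can survive several damaged blocks; this sketch swaps in plain XOR parity, which survives the loss of exactly one block, purely to show the principle. Function names are made up.

```python
def xor_blocks(blocks):
    """XOR equally sized byte blocks together."""
    length = len(blocks[0])
    result = bytearray(length)
    for block in blocks:
        if len(block) != length:
            raise ValueError("all blocks must be the same size")
        for index, value in enumerate(block):
            result[index] ^= value
    return bytes(result)


def split_with_parity(data, block_count):
    """Split data into block_count zero-padded blocks plus one parity block."""
    block_size = -(-len(data) // block_count)  # ceiling division
    padded = data.ljust(block_size * block_count, b"\0")
    blocks = [
        padded[i * block_size:(i + 1) * block_size] for i in range(block_count)
    ]
    return blocks, xor_blocks(blocks)


def recover_block(blocks, parity, missing_index):
    """Rebuild one lost block from the surviving blocks and the parity block."""
    survivors = [b for i, b in enumerate(blocks) if i != missing_index]
    return xor_blocks(survivors + [parity])


data = b"InDesign project bytes"
blocks, parity = split_with_parity(data, 4)
rebuilt = recover_block(blocks, parity, missing_index=2)
print(rebuilt == blocks[2])
```

The storage overhead here is one extra block per set. Paying for more parity, as PAR2 lets you, is what buys tolerance for more than one bad block.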
The only downside I can see here is that you need to have sufficient upload bandwidth (and a compliant ISP) for the volume of data that needs to be backed up. The upsides are manifest: probably cheaper, better tech, not having to worry about implementation details, wasting less of your time managing your creations and more time creating.
This is one of those (rare) cases where offloading to the cloud makes a ton of sense.
1. use multiple technologies - one backup on hard disks, one on optical, one on tape, one "somebody else's problem".
2. use standard formats - FAT32 and ZIP, ext3 and tar.gz - avoid proprietary hardware & software, you want stuff which you can plug into Linux and read.
3. give yourself the ability to recover from both partial failure and complete failure of a single disk/tape.
4. redo your backups before you lose the ability to recover them.
Under 2 years? Multiple backups on hard disk, no question.
2-10 years? Hard disks with integrity checks of every drive every 6 months so you know you always have 2 good copies at all times.
10+ years? Same as above but migrate data to new media as it becomes available, OR plan on migrating the data to a proven archival storage in the future.
Proven archival storage available today:
Archival-quality optical media.
Archival-quality tape.
Archival paper - bits can be printed as graphics.
Archival microfilm or other "microscopic paper"-equivalents.
Proven archival storage available in 2021: ???
Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
The gold standard for archiving (distinct from backup) is LTO-4 tape. I know this is expensive. I know that a lot of people go "ewww, tape, STFU grampa", but you should have a serious look at the efficiency of SAS tape drives, the simplicity of the solution for really large amounts of archival data, and the reliability of the medium.
-fb Everything not expressly forbidden is now mandatory.
Heh, that brings back memories of 88KB on my Atari's floppies, which I doubled by cutting a notch on the other side and flipping it over, even if they were sold as Single-Sided Single Density.
As for the OP, so far I'm keeping stuff on external hard drives, but I know it's not the best solution for long-term. Where are those multi-layer FMD discs I kept hearing about?
"The only legitimate use of a computer is to play games." - Eugene Jarvis
They don't call it a REDUNDANT Array of INEXPENSIVE Disks for nothing.
You can buy 2TB for $80 now. Buy 8 of them. You now have 16TB. Set up a mirrored RAID. You now have 8TB. Run it off a cheap low end PC (that has 4 SATA RAID of course). It'll cost you under $1000.
There's always PaperDisk, apparently able to store 1 meg per sheet. They quote 4MB, but that's after compression and BS to people like us. :-)
"The only legitimate use of a computer is to play games." - Eugene Jarvis
Except for MOD (where the capacities did not keep up), archival tape is the only long-lived option. All other options either require regular maintenance or are unreliable.
You will find that this quality of tape is more expensive than HDDs per GB and that the drives run you a couple thousand dollars.
Also, keep in mind that you need to archive the software used to access the data in a form that you can run later, e.g. as VM images.
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
Buy a NAS that is large enough for you. Plug it in your network.
You said you had 40-60GB datasets, but not how many datasets you're backing up.
Tape may very well be the "best" solution at the moment, though hard drives actually make a lot of sense. I would consider a part/parity system with enough parity to rebuild the entire file (yes, 2x the storage space), with the resulting set copied to two separate hard drives. 2TB hard drives are fairly commonplace, and can be had for $80 each. So you've got 100GB (avg) backup sets, and you're making two copies. You should be able to write to the drives simultaneously using two eSATA docking stations (about $40 each) and get proper cases for the drives (~$5 each). You shouldn't be out more than $200 for 20 files backed up, or $10 per project. Might as well store them in two separate physical locations.
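The arithmetic above, as a rough Python sketch. All prices and sizes are the ones quoted in this comment. Treating the two docking stations as a one-time purchase, separate from the per-batch media cost, is my assumption about how the $200 figure was reached.

```python
# Figures quoted in the comment above.
DRIVE_TB = 2          # capacity of one hard drive
DRIVE_PRICE = 80      # dollars per drive
CASE_PRICE = 5        # dollars per drive case
DOCK_PRICE = 40       # dollars per eSATA docking station
SET_GB = 100          # average backup set: ~50GB project + equal parity
COPIES = 2            # two drives, each holding every backup set

# How many 100GB backup sets fit on one 2TB drive (decimal units).
sets_per_drive = (DRIVE_TB * 1000) // SET_GB

# Recurring media cost for one batch: two drives plus two cases.
media_cost = COPIES * (DRIVE_PRICE + CASE_PRICE)

# Assumed one-time cost: one dock per drive so both can be written at once.
one_time_cost = COPIES * DOCK_PRICE

media_cost_per_project = media_cost / sets_per_drive
first_batch_total = media_cost + one_time_cost

print(f"{sets_per_drive} projects per pair of drives")
print(f"media: ${media_cost}, or ${media_cost_per_project:.2f} per project")
print(f"first batch including docks: ${first_batch_total}")
```

On these numbers the media alone comes in under the $200 and $10-per-project estimates. Only the first batch, which includes the docks, goes over.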
Finally, plan a migration path on a two-to-four-year cycle. The migration should involve purchasing new media (presumably with 2x+ the initial storage density) and copying all of your files over. That will act as your bit-check, though in theory you could do the bit check without migration. Even with migration, the long term media cost should be less than $20-$25 per project, exclusive of the manpower to do the transfers.
Is it just my observation, or are there way too many stupid people in the world?
That's nice, but it means spanning multiple discs for even a non-parity protected version. Three discs per project would be required. Not only time consuming, but kind of a pain in the ass.
Is it just my observation, or are there way too many stupid people in the world?
Don't forget to use checksums for bit rot detection.
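A minimal sketch of what that could look like in Python, assuming you keep a manifest of SHA-256 hashes alongside (or apart from) the archive and re-check it periodically. The function names and the manifest layout are illustrative, not any particular tool's format.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, block_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB blocks so large archives don't need to fit in RAM."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for block in iter(lambda: fh.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def find_damaged(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return the manifest entries that are now missing or hash differently."""
    damaged = []
    for rel, expected in manifest.items():
        p = root / rel
        if not p.is_file() or sha256_of(p) != expected:
            damaged.append(rel)
    return damaged


# Usage (paths are hypothetical):
#   manifest = build_manifest(Path("/mnt/archive"))   # when writing the backup
#   bad = find_damaged(Path("/mnt/archive"), manifest)  # at each later check
```

A checksum only tells you that something rotted. You still need a second copy or parity data to repair it.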
For archival disks (assuming that I turn them off when not actively being used) I'd be more concerned with them going dead on the shelf while not in use. I see comments on that but don't have hard data on those failure rates. Also, that's the sort of thing that seems it would be very dependent on the manufacturer/model.
Ruskies could also take a personal interest in just getting rid of YOUR data.
where will you get a tape drive from in an emergency
That is certainly something you should have addressed in your disaster recovery plan, but I would not characterize it as a hard problem to solve.
Repeal the 17th Amendment TODAY! Also Please Read http://www.gnu.org/philosophy/right-to-read.html
From the description, it looks like you want to store your projects for 5-10 years (after all: would it still make sense to open them afterwards?). If this is true, DVDs or even Blu-ray make sense - or even cloud (but then use at least 2 independent providers and check often).
However, if you want to store them for more than 5-10 years, first ask yourself the question: How are you going to archive the programs that you need to open your projects? Open formats for the content, and open source for the programs, would be a huge help. Then you can think about the archive media for it.
I agree -- tape drives are perfect for backups. Like someone mentioned, tar volumes from the 1970s are readable on tapes today.
I just read a VHS archive made in 1987 and it worked fine. Funny that most of the commercials were for food or clothes. Only one for a car and one for aspirin.
Since when is a 40 GB project "large"??? If it fits on a single USB thumb drive it ain't large.
Less than 2 TB = fits on a single hard drive = small.
Less than 20 TB = fits on a single RAID = medium.
Once you're up into the hundreds of terabytes (e.g. backing up all source material for a feature film), then yes, you have issues.
expectancy doesn't do much for me...
Cheap storage VM.
There is a free download (that isn't exactly working) for those who purchased after June 6, 2011.
http://www.apple.com/macosx/uptodate/
all in all, it depends what your limitations and needs are....i can zip a file that is 60 gb to a small 10 gb, then break it off in chunks, and upload it using an automated upload to hotmail account software using imap, needing a few accounts, but doable....
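The "break it off in chunks" step can be sketched in Python as below. This covers only splitting and rejoining an already-compressed archive. The upload side is left out, and the function names and part-naming scheme are my own assumptions.

```python
from pathlib import Path


def split_file(src: Path, out_dir: Path, chunk_size: int) -> list[Path]:
    """Write src as numbered parts of at most chunk_size bytes each."""
    out_dir.mkdir(parents=True, exist_ok=True)
    parts = []
    with src.open("rb") as fh:
        index = 0
        while True:
            data = fh.read(chunk_size)
            if not data:
                break
            part = out_dir / f"{src.name}.{index:04d}"
            part.write_bytes(data)
            parts.append(part)
            index += 1
    return parts


def join_files(parts: list[Path], dest: Path) -> None:
    """Concatenate the parts, in the order given, back into one file."""
    with dest.open("wb") as out:
        for part in parts:
            out.write(part.read_bytes())


# Usage (paths and size are hypothetical):
#   parts = split_file(Path("project.zip"), Path("parts"), 20 * 1024 * 1024)
#   join_files(parts, Path("project-restored.zip"))
```

Every part has to come back intact for the rejoined archive to open, so this pairs naturally with checksums or parity files.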
A quick google search returns this: "LTO 4 is rated for 11200 end to end passes and 200 full read/write cycles". Doesn't sound very durable unless he buys several tapes and uses them sparingly. Hard drives are rated for orders of magnitude more use.
It's not as if they are fruit or anything.
I have some crazy old IDE drives sitting around that I use for things like storing all the installation files for almost every version of every program I have ever installed. In fact, I have one fired up as I write this. They still work just fine. Sure, they may go bad sooner than other drives - especially if I start running and accessing them all day - because they are older. But I have no worries that any of them will go bad on me just sitting there in the ancient IDE-USB drive enclosure it is sitting in.
I have been dinking around with computers since the days when we had to store our programs on cassette tape. I have continuously finagled all kinds of drive setups to make use of older / cheaper drives. I have never heard of any modern drive going bad just sitting on the shelf. Sure, I guess it is possible for the magnetic field to dissipate over time. This used to happen on some of the very first PC hard drives (does anyone remember RLL and connecting two data cables to each drive?), back in the eighties, but I haven't heard of it happening since. So, you can just run SpinRite on them once a year or so to refresh the magnetization and you are good to go.
The Supermicro SC936E26-R1200B Storage Chassis holds 16 3.5" drives. (This same chassis comes in cheaper models, but the savings are not worth the compromises - learned that one the hard way.) Then get an appropriate motherboard, memory, processor, and an LSI MegaRAID SAS 9280-24i4e RAID card. Set up your Seagate Barracuda XT drives in RAID 60 drive groups, say 8 drives to a RAID 6 stripe. You'll get up to 2 stripes, so with 6 data drives per stripe that's up to 12 data drives in the chassis. At 2 terabytes per drive that's 24 terabytes, or with the new 3TB drives, that's 36 terabytes of reasonable performance, cost effective, dead reliable storage. That's not enough? Add up to 6 additional chassis to plug into your LSI card (no motherboard needed), for a total of nearly a quarter petabyte. If you want a different cost/performance ratio substitute 5200 rpm drives, or Hitachis or whatever, and/or make smaller or larger RAID 6 stripes.
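The capacity math above can be checked with a few lines of Python. This sketch assumes the layout described in this comment (RAID 6 groups that each give up two drives' worth of space to parity, striped together as RAID 60) and ignores filesystem overhead and hot spares. The function name is made up.

```python
def raid60_usable_tb(total_drives: int, drives_per_group: int, drive_tb: float) -> float:
    """Usable capacity of a RAID 60 set built from equal-sized RAID 6 groups."""
    groups = total_drives // drives_per_group
    # Each RAID 6 group loses two drives' worth of capacity to parity.
    data_drives = groups * (drives_per_group - 2)
    return data_drives * drive_tb


per_chassis_2tb = raid60_usable_tb(16, 8, 2)   # 12 data drives x 2TB
per_chassis_3tb = raid60_usable_tb(16, 8, 3)   # 12 data drives x 3TB

# One head unit plus six expansion chassis, all filled with 3TB drives.
seven_chassis_3tb = 7 * per_chassis_3tb

print(per_chassis_2tb, per_chassis_3tb, seven_chassis_3tb)
```

Seven chassis of 3TB drives comes to roughly 252 TB usable, which is about a quarter of a petabyte.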
Don't tell *anyone* where the thing is!
Social Credit would solve everything...
I'm a PhD student studying magnetism, and one thing I can say for sure is that DVD/BR media is not the way to go. Professionally printed media (the silver bottom) uses a stamp to make a mechanical impression, not unlike vinyl records. Once sealed, it lasts forever. Writable media uses a dye, and unless you store your media at 0K, finite temperatures will cause the dye to diffuse and the media to become useless. This takes much less time than people think. Good disks will last 10 years, cheap ones only a few years. The problem is that it's impossible to tell anymore who is making the good disks, since all of the production lines get shared by many brands.
Alternatively, magnetic storage isn't that great either (tapes or HDDs). For both a HDD and tape, thermal fluctuations cause random data to be lost, but hard drives are designed to recover this data and correct it. If you pull your hard drive off line for several years, it doesn't have the opportunity to constantly scan itself and check for these errors, so never expect an unpowered hard drive to store data for long periods of time - they just are not designed to do this.
As previous users have pointed out, software raid is the only way to go. Hardware raid provides a single point of failure, and is really only suitable for high performance and short term reliability, not long term reliability.
Tape drives also have the same thermal fluctuations issue, but because the magnetic grains can be much larger (tapes have 1000's of times more surface area to store the same amount of data) they can go much longer. I would still "refresh" my tapes every year or two though.
Based on your requirements, I would suggest tape first, then a large software raid of HDDs. Anything else is just not safe!
... Mowing the lawn with scissors. ...
Did that in USAF Basic Training.
If you want news from today, you have to come back tomorrow.
Build 3 Petabyte servers using Backblaze's instructions, place 1 in your house, and find 2 offsite locations to place the remaining pods.
From my experience, you should have 4 external storage devices that are separate from your PC. There are special DVDs for such purposes as well. Thanks.
it's a bit hard to recommend backups for an unknown backup size ....
The Cloud - because you don't care if your apps and data are up in the air.