How Do You Backup 20TB of Data?
Sean0michael writes "Recently I had a friend lose their entire electronic collection of music and movies by erasing a RAID array on their home server. He had 20TB of data on his rack at home that had survived a dozen hard drive failures over the years. But he didn't have a good way to backup that much data, so he never took one. Now he wishes he had.
Asking around among our tech-savvy friends though, no one has a good answer to the question, 'how would you backup 20TB of data?'. It's not like you could just plug in an external drive, and using any cloud service would be terribly expensive. Blu-Ray discs can hold a lot of data, but that's a lot of time (and money) spent burning discs that you likely will never need. Tape drives are another possibility, but are they right for this kind of problem? I don't know. There might be something else out there, but I still have no feasible solution.
So I ask fellow slashdotters: for a home user, how do you backup 20TB of Data?" Even Amazon Glacier is pretty pricey for that much data.
I would say use floppies, but I'm kind of old and out of touch now.
At home, I didn't feel like paying for 2 large arrays to store my data, so if I rip any media, I always rip it to DIVX. 800 MB for a DVD or even bluray rip is a great economy, saves me money on primary storage and also enables me to back it up. I accept the loss of quality as I can always reference the original media if I want.
Another option in the future may be subscription services which have HD content, thus eliminating my need to roll my own. We'll see what happens there.
Crashplan has unlimited storage. I use their home plan; it's unlimited for up to 10 machines. I think I am backing up about 6TB there now.
I have a 16 TB media collection at home that I just back up on more hard drives.
External hard drives in USB cases + Robocopy works great for me.
I don't respond to AC's.
> It's not like you could just plug in an external drive [...]
Why not? Maybe not one, but 10 or 20 of them.
BackBlaze offers unlimited backup storage for home users for around $5/mo - encrypted with asymmetric keys. I've got about 750 GB on there myself, works great. Although they may not *like* you backing up 20 TB of stuff, they should accept it. And, if they don't, you're out about five bucks. Probably worth a try.
"My friend (read I) lost 20TB of pirated content! What should my friend have done different?"
How about, ask yourself, how much of that content were you intending to ever consume again. Yeah, you can most likely delete 95% of it, that's 1TB of content that you might use again.
Hoarders! *lol*
A quick check at one service that lists prices for such large amounts shows you would be looking at almost $20k/year to keep a single offsite copy of that. That is the posted price, however; I imagine that is enough data that you could shop around and find a deal, but even a deal is still going to be prohibitive for most people.
At 20 TB I would start thinking about one of two things: Tape, and/or git-annex.
Unless prices have changed since I last looked and the scales tipped, tape has the advantage of being cheap. Of course, you will need to test your tapes occasionally and likely want 2 copies just in case, but, at that point you are invested in tape, may as well.
The other possibility is git-annex and lots of drives, but you can mix types. That way you can keep a catalog of your library and information on where it all is, and how many copies of each thing you have.
Of course, any way you slice it, each physical piece of media is something that can fail so you have to occasionally test to ensure redundancy.
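For the "occasionally test" part, here is a minimal sketch in Python of one way to do it: record a SHA-256 checksum for every file in the library, then re-check any copy against that record. The function names and the manifest layout are assumptions made up for illustration; this is not how git-annex or any tape tool does it internally.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large media files never sit fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_copy(manifest: dict[str, str], copy_root: Path) -> list[str]:
    """Return the relative paths that are missing or differ in the copy."""
    bad = []
    for rel, expected in manifest.items():
        candidate = copy_root / rel
        if not candidate.is_file() or sha256_of(candidate) != expected:
            bad.append(rel)
    return bad
```

Run `build_manifest` once on the master copy and keep the result with the catalog; `verify_copy` then lists the files that are missing or no longer match on a given drive or restored tape.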
"I opened my eyes, and everything went dark again"
He could have always bought a sufficiently large tape-library from ebay - but I guess the data wasn't worth that much.
That's always the first pair of questions to ask: how much is it worth and how much would it cost to recreate?
If the answer is somewhere between "I don't know" and "Well, it's not that much", then he just should stop hoarding that much stuff.
He could have built a filer with ZFS and sent daily snapshots to a 2nd filer - but that wouldn't have helped him if the house burnt down...
Windows 2000 - from the guys who brought us edlin
If you want to back up 20TB of data, you have to pay for it.
Build another server and rsync hourly.
Figure out the theory of everything.
Then you can always recompute your data from scratch.
If Pandora's box is destined to be opened, *I* want to be the one to open it.
With a second array, or tape backup. The second array is going to be the easiest solution, but tape backup provides you the option of storing the tapes off-site, which is important for any real backup plan. After all, your friend could just as easily wipe out the 2nd array by mistake, or a disaster could wipe out the physical location. LTO-6 tapes are cheap and can hold 2.5-6.25TB of data depending on compression. Tape drives are perfect for backup, so why even ask if it's right?
As you noted, Bluray holds a lot of data, but would take some time. Since it's audio/video media, odds are most of it is pretty stagnant. I'd do an initial rsync job to write out to Bluray... then once a month or so repeat the job, but now rsync will only get what's changed. Depending on the media type and age, you could also look at dedup'ing it, and if the dedup'd copy is significantly smaller than the source you might be able to put that onto, say, one or two 3-4TB drives.
You could always just call up the NSA and ask them to restore the data. Odds are good they have a copy of it...
Same as always.
Whenever you buy storage, you should buy the necessary backup capacity at the same time. You should never buy storage without buying backup capacity. Budget for it right from the start. If you can't afford the backup, you can't afford the storage. This may mean getting half as much storage as you'd like, but that's just the way it has to be. You probably wouldn't buy a car without an engine. It wouldn't do its job. So don't buy storage without backup. If you do, you have a storage system that can't do its job.
I agree, I've been using Crashplan for three years and the unlimited space is really great, BUT... I'm not sure about the bandwidth they provide: how long will it take to upload 20 TB?
Anyway, I don't see what the problem is with using external drives for backup. Here in my lab I've realized that the best way to back up X Terabytes is to have another storage system with X Terabytes...
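On the "how long will it take to upload 20 TB" question, a rough back-of-the-envelope sketch in Python. The uplink speeds and the 80% efficiency factor are assumptions picked for illustration, not figures from Crashplan or any ISP.

```python
def upload_days(data_tb: float, uplink_mbps: float, efficiency: float = 0.8) -> float:
    """Days needed to push data_tb (decimal terabytes) through an uplink
    rated at uplink_mbps megabits per second.

    efficiency is an assumed fraction of the rated speed actually achieved.
    """
    if uplink_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("uplink_mbps must be > 0 and efficiency in (0, 1]")
    bits_to_send = data_tb * 1e12 * 8
    effective_bits_per_second = uplink_mbps * 1e6 * efficiency
    return bits_to_send / effective_bits_per_second / 86_400


# Hypothetical home uplink speeds, in Mbit/s.
for mbps in (5, 10, 100):
    print(f"{mbps:>4} Mbit/s: {upload_days(20, mbps):7.1f} days")
```

With those assumed numbers, 20 TB over a 10 Mbit/s uplink comes out to roughly 231 days of continuous uploading, which is why the initial upload matters more than the monthly price.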
> It's not like you could just plug in an external drive, and using any cloud service would be terribly expensive. Blu-Ray discs can hold a lot of data, but that's a lot of time (and money) spent burning discs that you likely will never need. Tape drives are another possibility, but are they right for this kind of problem? I don't know. There might be something else out there, but I still have no feasible solution.
Let's start from the top: You *can* plug in an external drive; it's called a complete hardware duplicate of your array (or perhaps, for space/cost considerations, a single disk-based copy held offline and synced regularly). Not hard and not terribly expensive (I would go with this solution personally). Cloud? Yep, the bandwidth and storage even on something like Amazon Glacier would be prohibitive to all but the most financially independent geeks. Bluray doesn't hold enough (even at 50GB/disc you need 400 of them, groan). So, tapes? You bet your ass tapes are designed to do exactly this task; why do you think they are still in use? You can get individual tapes at 1/1.5TB, but for a one man operation they are probably going to cost you more than the first solution (offline spinning disks) and they are a pain to manage properly.
Now what is this doing on ask slashdot? A pencil, some scratch paper, and 15 minutes between amazon.com and newegg.com would tell you the prices of every solution. Oh, right, they need a chance to tee up some targeted ads for Carbonite, Mozy, Crashplan, etc.
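The scratch-paper part is easy to sketch in Python. The capacities below are the ones quoted in the comment above (50 GB Blu-ray, 1 TB and 1.5 TB tapes), taken as decimal gigabytes; prices are left out on purpose since they vary.

```python
def units_needed(data_gb: int, unit_capacity_gb: int) -> int:
    """Smallest number of whole media units that can hold data_gb."""
    if data_gb < 0 or unit_capacity_gb <= 0:
        raise ValueError("data_gb must be >= 0 and unit_capacity_gb > 0")
    # Ceiling division on integers avoids floating-point rounding surprises.
    return -(-data_gb // unit_capacity_gb)


DATA_GB = 20_000  # 20 TB, decimal

MEDIA_GB = {
    "Blu-ray disc (50 GB)": 50,
    "tape (1 TB)": 1_000,
    "tape (1.5 TB)": 1_500,
}

for name, capacity_gb in MEDIA_GB.items():
    print(f"{name}: {units_needed(DATA_GB, capacity_gb)} needed")
```

That gives 400 discs, 20 of the 1 TB tapes, or 14 of the 1.5 TB tapes for a single copy, before any compression or second set.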
How about backing up only the crown jewels of the collection?
Make a directory like /entertainment/premium and put the best stuff there, with a 4 TB limit. Rotate two external 4 TB HDDs and copy the stuff over periodically. Put a little sticker or some other mark on the newest, so you remember which one it is. If your main RAID array fails, build a new one, and restore the premium stuff from the most recent one of the two external disks.
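If the sticker falls off, the same rotation rule can be kept in a tiny script. A minimal sketch in Python, assuming you record the last backup time per disk somewhere; the disk labels and dates here are hypothetical.

```python
from datetime import datetime


def next_backup_target(last_backup: dict[str, datetime]) -> str:
    """Pick the disk with the oldest backup, so the newest copy is never overwritten."""
    return min(last_backup, key=last_backup.__getitem__)


def restore_source(last_backup: dict[str, datetime]) -> str:
    """Pick the disk with the most recent backup to restore from."""
    return max(last_backup, key=last_backup.__getitem__)


# Hypothetical state of the two rotating 4 TB disks.
disks = {
    "disk-A": datetime(2014, 2, 1),
    "disk-B": datetime(2014, 3, 1),
}
print("write next backup to:", next_backup_target(disks))
print("restore from:", restore_source(disks))
```

Always writing to the older disk means a backup that dies halfway through still leaves the previous good copy untouched on the other one.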
These "unlimited" claims always turn out to be lies. When will we learn?
My friend paid for an "unlimited" account from JustCloud for backup. He stored 1.8 TB on it and then they "fair use"'d his ass and canceled his account. They didn't even give him a refund for the rest of the money he prepaid.
I use Glacier and it's great. 20 TB is about $200 a month, which to me does not seem like all that much money for backing up that much data. The biggest problem from a home user's perspective is getting all of that data to Amazon. Hopefully he lives somewhere where fiber is available to his house.
md prnt dwn
Connect a raspberry pi, configure it as a backup server and let it copy everything to /dev/null... Then put aside the money you would have invested in a "better" solution, put it in a safe bank (under your mattress) and wait until you need to restore something. Most probably you'll enjoy the money more.
Catalogue the contents and when you lose it all you can spend 10 minutes searching for the 2% of the content you really want to download again and feel good that you now have 98% of your storage space back to start filling with more crap :D
That's why it is supposed to be used with caution, as no 'rm' supports it. ;)
Glacier at $20 per month for 20TB is ridiculously cheap by today's standards. And at those sizes, you'd want to ship those drives to Amazon instead of uploading. We do this all the time and it's not that hard.
The price of TBs of storage of course will come down without question. But by today's standards $20/month for a medium that won't "bit rot" on you is an amazing deal.
You missed a 0: he has 20,000GB and the cost for Glacier is $0.01/GB/mo (not including upload charges). So, Glacier would cost him $200 a month or $2400 a year. Not hugely expensive, but if you are OK with a quasi-local copy (offline and stored in a fire safe, perhaps) you could do it for less after you hit the 1 year mark.
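The arithmetic, as a short Python sketch. The 1 cent per GB per month rate is the one quoted above and ignores upload and retrieval charges; the $2,000 price for a local setup is a hypothetical number used only to show the break-even calculation.

```python
def monthly_storage_cents(data_gb: int, cents_per_gb_month: int = 1) -> int:
    """Monthly storage cost in cents (integer cents avoid float rounding)."""
    return data_gb * cents_per_gb_month


def months_to_break_even(one_time_cents: int, monthly_cents: int) -> int:
    """Whole months of cloud fees it takes to reach a one-time local cost."""
    if monthly_cents <= 0:
        raise ValueError("monthly_cents must be > 0")
    return -(-one_time_cents // monthly_cents)  # ceiling division


monthly = monthly_storage_cents(20_000)  # 20 TB = 20,000 GB
print(f"per month: ${monthly / 100:,.2f}")
print(f"per year:  ${monthly * 12 / 100:,.2f}")

# Hypothetical one-time cost of offline disks plus a fire safe: $2,000.
print("break-even after", months_to_break_even(200_000, monthly), "months")
```

Under that assumed $2,000 figure, the local copy pays for itself after 10 months of Glacier fees; plug in real quotes to get your own number.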
I was going down the route of buying an expensive RAID NAS / DAS, but then I remembered when I got broken into in the Canary Islands and the thieves took both of my backup drives from two separate rooms. I'm now settled on a simple external drive, with the whole lot backed up offsite. I was looking for... + Unlimited backup, so I don't need to think +The ability to backup attached drives (NAS, DAS, USB, etc) + To feel that my data is safe with a 2nd layer of encryption You can try it free here: http://bit.ly/1bRNax1 My blog post about this: http://www.bentristem.com/1/po... Enjoy!
Ben Tristem I'd love to know more about you in this short survey... http://bit.ly/1oM7Fvl
No one will ever see this anonymous post, but a cheap used robot changer can be had on ebay for anywhere from a few hundred to a few thousand dollars. Most of us are geeks and love technology. I use two such devices, couldn't imagine life without them. LTO4 is still the sweet spot in storage cost (media) and capacity. The tapes hold 800GB and can be purchased for around $22 each.
1- if you need to backup 20 TB today, you need to budget for 40TB in the medium term.
2- a backup is off-line, off-site, tested, and multiple. The "multiple" part is pricey, and the other 3 you can get cheapest with a PC filled with HDs. Or two (I'm making do with one). $200 for the PC plus 5 x $150 per 4TB HD = $950. Hide that backup in a place safe from theft, floods, fire...
The Cloud - because you don't care if your apps and data are up in the air.
There are many > 1TB tape back up systems, many with very high speeds, assuming you can feed it data fast enough.
I have to wonder though... 20TB for a single person? I'm not gonna do the math, but that sounds like so much stuff that it would be impossible to listen to or watch all of it.
But at least he has proven once again, RAID is not a backup. RAID will merrily do what ever you wish, including copying drive corruption.
Just assuming that your friend had a fully legal collection, I would think that all he needs to do is ask the media companies for a new copy. Because the media industry tells us that we do not buy music, we buy licenses, right?? So even if we lose the bits-and-bytes which are easy to replace, then we still hold a license and the media companies should facilitate that your friend can exercise his licensed rights..
[/sarcasm]
To Terminate, or not to Terminate, that's the question - SCSIROB
If IBM punch cards were used, 1 GB equals approximately 47 cubic yards (assuming 80 bytes per 187x86x0.18mm per card) and about 70,000 lbs (at 2.42 g per card), so one standard railroad boxcar (limited by both cubic capacity and weight) could hold about 3 GB. 20 TB would need over 6000 boxcars of punch cards; at 60 feet per boxcar, that's a freight train about 70 miles long.
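For anyone who wants to check the punch card math, a sketch in Python. Card size, bytes per card, card weight, the 3 GB per boxcar figure and the 60 foot boxcar all come from the comment above; treating 1 GB as 10^9 bytes is an assumption.

```python
import math

CARD_MM = (187.0, 86.0, 0.18)      # card dimensions from the comment
CARD_BYTES = 80
CARD_GRAMS = 2.42
BYTES_PER_GB = 10**9               # assumption: decimal gigabytes
M3_PER_CUBIC_YARD = 0.9144 ** 3    # one yard is exactly 0.9144 m
GRAMS_PER_POUND = 453.59237
FEET_PER_MILE = 5280


def cards_for(gigabytes: float) -> float:
    """Number of 80-byte cards needed for the given amount of data."""
    return gigabytes * BYTES_PER_GB / CARD_BYTES


def cubic_yards(cards: float) -> float:
    """Solid volume of that many cards, with no packing overhead."""
    card_m3 = math.prod(CARD_MM) * 1e-9  # mm^3 to m^3
    return cards * card_m3 / M3_PER_CUBIC_YARD


def pounds(cards: float) -> float:
    return cards * CARD_GRAMS / GRAMS_PER_POUND


def train(data_gb: float, gb_per_boxcar: float = 3.0,
          feet_per_boxcar: float = 60.0) -> tuple[int, float]:
    """Return (boxcars, train length in miles)."""
    boxcars = math.ceil(data_gb / gb_per_boxcar)
    return boxcars, boxcars * feet_per_boxcar / FEET_PER_MILE


one_gb = cards_for(1)
print(f"1 GB: {one_gb:,.0f} cards, {cubic_yards(one_gb):.1f} cubic yards, "
      f"{pounds(one_gb):,.0f} lbs")
boxcars, miles = train(20_000)
print(f"20 TB: {boxcars:,} boxcars, about {miles:.0f} miles of train")
```

The per-gigabyte numbers land close to the ones quoted (about 47 cubic yards and a bit under 70,000 lbs). At exactly 3 GB per boxcar the train comes out slightly longer than the rounded figures in the comment.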
What does it mean that he didn't have "a good way to backup that much data, so he never took one"?
The concepts behind backing up data have not changed. You need to manage the size of your data to redundantly fit into the storage of your system. So either pony up the cash and time to properly store your files, stop collecting TBs of crap, or stop complaining about losing it when your system crashes.
It's frustrating to see people continuously complaining about how they have too much data to back up cheaply and conveniently. It's even more frustrating to see them complaining about losing all of their data because they didn't back it up properly.
I think that the main issue is that most people do not realistically or conservatively plan their actual storage capability. For example, it seems like 90% of computer users believe that having 4 TB of hard drive space means that they can safely store 4 TB of data.
After a conversation about scratch space, redundant drives, and timestamped backups, they then will grudgingly agree to allocate 25% of their available storage to RAID/Backup space, which obviously does not get the job done! Very few are willing to accept using 66% of their available hard drive space for RAID and Backups, which is really the minimum metric for any sort of storage longevity.
20 TB is an awkward amount of data for a non-corporate individual to be storing. It's more data than most people actually need for their media and it is getting into a very expensive price range to backup for basic music/movie content. (By expensive, I mean that it would be cheaper to just re-purchase the media rather than back it up.)
To /.ers saying that 1TB+ tapes would be a good idea to do this backup, please:
Add some references and price of such hardware and media that would suit best home usage.
You can get an LTO4 SAS drive for ~$50 on ebay, they do 800GB native per tape, so typically ~1.2TB per tape for mixed content (obviously if it's all compressed media it will be much closer to native). 10-20 tapes doesn't seem that bad (we send that many offsite daily). The tapes will cost you ~$20 each unless you're willing to go used (ewww).
There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
Do you really need to back-up that much data?
I'm just speaking generally here, there are certainly cases where someone would need to back up this much data, but for your home media library? If we're talking movies, 20 TB is roughly 20,000 movies (for sake of argument, I'm not considering music). At what point is this just digital hoarding? I used to keep a large collection of movies, mostly pirated, and eventually realized that:
a) I was spending more time and money managing the collection than I wanted to.
b) I rarely watched many of the items in my library.
c) I was placing myself in legal jeopardy by storing so many illegal copies.
d) Anything I did want to re-watch I could get from Netflix, the public library, or download.
Music would be slightly different, as I could see where music is in some kind of constant rotation, but again, how much of it are you actively using? I'm just playing devil's advocate here, but I think this kind of collecting/hoarding is a byproduct of pre-internet scarcity.
Even with bandwidth, there are caps and fees here in the US. Try moving 1TB of data via LTE, and the telco will likely hand the person a five digit bill next month. Do it on some cable company plans, and you will be greeted by a $300 bill. So, large data via the Internet isn't going to happen.
There are a number of solutions for this problem:
1: One of the better ones is a server with decent backup software and a LTO tape drive. Then eight tapes will save the 20 TB. Expensive, but the job is done right.
2: One can always have a 20TB RAID, then plug in removable HDDs and use them with WinRAR or another utility as volumes. I'd buy 5-6 4TB removable drives, use WinRAR to make segments, and have at least one recovery volume so that data can be recovered if a HDD fails.
3: One can always buy an external RAID enclosure, add drives, and use that as a large volume. Then use multiple enclosures that are swapped around as volumes (with an offsite rotation), so a failure or loss of everything at the site wouldn't mean everything is gone.
4: Buy a RDX drive and media, and use 2TB disks. The drive costs $600, the 2TB disk cartridges around $360. However, this just needs a USB connection, no fast box, no SAS interface required.
If I wanted to do the job "right", I'd buy a dedicated server with its own RAID array, use a decent backup utility, and dump the data to multiple sets of tapes, one set being stored offsite. If the server and where it sits gets destroyed, it can be re-bought. LTO-4 and newer have built in AES-256 tape encryption so just set a long passphrase that you can remember and call it done.
The benefits of tape are:
Data will probably last 30 years (I have read 30-year-old tapes myself). HD interfaces go out of fashion every few years.
You can have a pool of tapes, and recycle them when you no longer need the data.
Tapes will survive serious abuse (a lot more than HDs, anyway), definitely including the back of a station wagon (except in tropical climates).
You can use Amanda, Bacula or tar for free. (I recommend tar if you want to keep the data for 30 years).
Sent from my ASR33 using ASCII
You're dating yourself. LTO-5 is 1.5TB native, 3TB compressed at $25 per tape. LTO-6 is 2.5TB native and 6.25TB compressed. Both of those compressed numbers are using the built-in compression in the drive.
A 10-pack of LTO-5 tapes is about $250.
You can easily encrypt the tapes and take them offsite. You can keep a copy onsite and offsite. You're simply not doing that with disk.
Your speed is also off - an LTO-5 can write at 280MB/sec. The limiting factor is not the write time on the media but the read time from disk.
Restore times are typically limited by the write rate on the destination raidset, not the read rate from tape.
LTO-6 can hold 2.5TB per tape; a tape costs ~$70, and the drives cost $2000. That's still more expensive than just buying more HDDs for 20TB, but at >50TB it might be worth it.
Now this joke has really come full circle.
I really did get a kick out of some of these responses. I sell data protection products for a living and 20TB is what I would consider an average small/medium customer. Every business these days has tens of terabytes of data. Of course they all need to backup their data, so there is nothing novel here. We have plenty of customers backing up hundreds of petabytes of data. Every dataset just needs a plan for backup, pretty simple.
The way I see it, this guy has a few options. One option is to just get more disk and make a redundant copy. This would have saved him in this case of the mistakenly erased RAID, depending on how smart his sync script is. But a redundant copy is not a genuine backup plan. So many types of failures will show the holes in a dumb redundant copy.
The other option for a home user who's not looking to spend a bunch of money, is LTO6. They hold a sufficiently large amount of data, so only a handful of tapes will be needed. LTO6 drives are cheap enough, they won't break the bank. Since the data is on tape, you can shuttle the tapes to an off site location. Seems pretty simple.
I'm backing up my 40TB music library on Jacquard loom punch cards.
Added bonus: You can use the punched cards to make fabric. ...as a sweater!
Right now I'm wearing Justin Bieber's "Love Me"
https://web.duke.edu/isis/gess...
http://www.flexraid.com/
http://lime-technology.com/ (UnRaid)
Best solution for big media collections.
All data is stored separately on each drive, and one separate parity drive can protect up to 21 drives (as long as it's as big as or bigger than any one of those 21 drives).
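Why one parity drive can cover many data drives: a minimal sketch in Python, assuming plain XOR parity. This is an illustration of the idea, not code from FlexRAID or UnRaid, and it uses equal-length blocks for simplicity (a real setup would treat the missing tail of a smaller drive as zeros, which is why the parity drive has to be at least as big as the largest data drive).

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length byte blocks together, position by position."""
    if not blocks:
        raise ValueError("need at least one block")
    length = len(blocks[0])
    if any(len(block) != length for block in blocks):
        raise ValueError("all blocks must have the same length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def compute_parity(data_drives: list[bytes]) -> bytes:
    """The parity drive holds the XOR of every data drive."""
    return xor_blocks(data_drives)


def rebuild_missing(surviving_drives: list[bytes], parity: bytes) -> bytes:
    """XOR of the survivors and the parity gives back the one lost drive."""
    return xor_blocks(surviving_drives + [parity])


# Three tiny stand-in "drives".
drives = [b"movies..", b"music...", b"photos.."]
parity = compute_parity(drives)

# Pretend the second drive died and rebuild it from the rest.
recovered = rebuild_missing([drives[0], drives[2]], parity)
print(recovered == drives[1])
```

This only survives one failed drive at a time, and because each drive keeps its own files, losing two drives costs you just the contents of those two rather than the whole array.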