How Do You Backup 20TB of Data?
Sean0michael writes "Recently I had a friend lose their entire electronic collection of music and movies by erasing a RAID array on their home server. He had 20TB of data on his rack at home that had survived a dozen hard drive failures over the years. But he didn't have a good way to backup that much data, so he never took one. Now he wishes he had.
Asking around among our tech-savvy friends though, no one has a good answer to the question, 'how would you backup 20TB of data?'. It's not like you could just plug in an external drive, and using any cloud service would be terribly expensive. Blu-Ray discs can hold a lot of data, but that's a lot of time (and money) spent burning discs that you likely will never need. Tape drives are another possibility, but are they right for this kind of problem? I don't know. There might be something else out there, but I still have no feasible solution.
So I ask fellow slashdotters: for a home user, how do you backup 20TB of Data?" Even Amazon Glacier is pretty pricey for that much data.
I would say use floppies, but I'm kind of old and out of touch now.
At home, I didn't feel like paying for 2 large arrays to store my data, so if I rip any media, I always rip it to DIVX. 800 MB for a DVD or even bluray rip is a great economy, saves me money on primary storage and also enables me to back it up. I accept the loss of quality as I can always reference the original media if I want.
Another option in the future may be subscription services which have HD content, thus eliminating my need to roll my own. We'll see what happens there.
Crashplan has unlimited storage. I use their home plan; it's unlimited for up to 10 machines. I think I am backing up about 6TB there now.
Use a ZFS pool using a combination of a mirror, a raidz3 & spares. Add new disks as hot spares when money can be allocated. Easy, somewhat affordable & allows for failure.
I really doubt anyone actually uses 20TB of movies & music. It just sits there.
I have a 16 TB media collection at home that I just back up on more hard drives.
External hard drives in USB cases + Robocopy works great for me.
I don't respond to AC's.
but you need real backup software. As each drive fills up, you swap in the next one and continue until you have a full backup. This way you can take them off site. Like any other backup solution, make sure you test the drives every few months to confirm that your data is not corrupt and that no drive has failed.
> It's not like you could just plug in an external drive [...]
Why not? Maybe not one, but 10 or 20 of them.
I have a similar situation: an 18.6 TB RAID-Z at home (eight 3TB drives) using FreeNAS. With the new update it shows it was initially set up using a non-native block size (I was a bit naive regarding the settings when I first set it up), and I'd like to rebuild it, but I have no way to back up 14+ TB. Also, I would like to have a backup in case more than one drive dies (1 parity works well but I could still suffer a catastrophic failure). I've looked into tape backup, but anything that seems like it'd have enough storage to be practical (1+ TB per tape) seems excessively expensive, and the 100GB tapes seem like they'd be unmanageable.
-SaNo
If your ISP doesn't have data caps, look at Backblaze ( http://www.backblaze.com/ ). $5 / month for unlimited storage for one computer. Only available for Mac and Windows, but I'm sure a virtual instance of Windows if you're using a Linux box would work... These are the folks that opensourced their hardware design for their storage "pods." http://blog.backblaze.com/2011...
Why not store it all in 20000 github repositories?
BackBlaze offers unlimited backup storage for home users for around $5/mo - encrypted with asymmetric keys. I've got about 750 GB on there myself, works great. Although they may not *like* you backing up 20 TB of stuff, they should accept it. And, if they don't, you're out about five bucks. Probably worth a try.
"My friend (read I) lost 20TB of pirated content! What should my friend have done different?"
How about, ask yourself, how much of that content were you intending to ever consume again. Yeah, you can most likely delete 95% of it, that's 1TB of content that you might use again.
Hoarders! *lol*
But punched paper tape is slow and makes a lot of noise.
I'll see your senator, and I'll raise you two judges.
I have about 7TB. I built 2 RAID devices, and back one up to the other.
Totally not a "backup" solution but raidz2 to protect the data from many types of failures, and hourly snapshots to protect the data from the operator....
Now if your box catches fire, floods, etc. you are in trouble, but I agree the problem is not easy to fix.
You either spend a ton of time (and money) writing Blu-rays, or pay for expensive tape solutions, etc.
At the end of the day you might find it is cheaper to just have two boxes with separate raidz2 pools and sync them.
Heck, you can even use the snapshots to support offline replication where you power up the second box and dump the snapshots across and power it down again.
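For the offline-replication idea, here is a rough sketch of the commands involved, assuming made-up dataset names (`tank/media`, `backup/media`) and a made-up host (`backupbox`). It only builds the command lines so you can read them before running anything; the snapshot naming scheme is my own assumption, not something from the post.

```python
import shlex


def snapshot_cmd(dataset, name):
    """Command (as an argument list) that takes a ZFS snapshot dataset@name."""
    return ["zfs", "snapshot", f"{dataset}@{name}"]


def replicate_pipeline(dataset, prev, new, host, target):
    """Shell pipeline that sends the changes between two snapshots to a second box.

    `zfs send -i` emits an incremental stream from snapshot `prev` to `new`;
    `zfs receive` on the other side applies it to `target`.
    Host and dataset names are hypothetical placeholders.
    """
    send = ["zfs", "send", "-i", f"{dataset}@{prev}", f"{dataset}@{new}"]
    recv = ["ssh", host, "zfs", "receive", target]
    return shlex.join(send) + " | " + shlex.join(recv)


if __name__ == "__main__":
    print(shlex.join(snapshot_cmd("tank/media", "2014-03-02")))
    print(replicate_pipeline("tank/media", "2014-03-01", "2014-03-02",
                             "backupbox", "backup/media"))
```

The first full copy would need a plain `zfs send` without `-i`; after that, the second box only has to be powered up long enough to receive the increments.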
A quick check at one service which lists such large amounts, you would be looking at almost $20k/year to keep a single offsite copy of that. That is the posted price however, I imagine that is enough that you could shop around and find a deal, but, a deal is still going to be prohibitive for most people.
At 20 TB I would start thinking about one of two things: Tape, and/or git-annex.
Unless prices have changed since I last looked and the scales tipped, tape has the advantage of being cheap. Of course, you will need to test your tapes occasionally and likely want 2 copies just in case, but, at that point you are invested in tape, may as well.
The other possibility is git-annex and lots of drives, but you can mix types. That way you can keep a catalog of your library and information on where it all is, and how many copies of each thing you have.
Of course, any way you slice it, each physical piece of media is something that can fail so you have to occasionally test to ensure redundancy.
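One way to do the "occasionally test" part on disk-based copies: a minimal sketch that records a SHA-256 checksum per file and later reports anything missing or changed. The function names and the in-memory manifest are my own assumptions; a real setup would save the manifest somewhere other than the drive being checked.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large media files never sit fully in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its checksum."""
    root = Path(root)
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def find_damage(root, manifest):
    """Return (relative path, problem) pairs for files that are gone or differ."""
    root = Path(root)
    problems = []
    for rel, expected in manifest.items():
        target = root / rel
        if not target.is_file():
            problems.append((rel, "missing"))
        elif sha256_of(target) != expected:
            problems.append((rel, "checksum mismatch"))
    return problems
```

Build the manifest right after a backup finishes, then run `find_damage` against each copy every few months; an empty list means every file still matches.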
"I opened my eyes, and everything went dark again"
Glacier at $20 per month for 20TB is ridiculously cheap by today's standards. And at those sizes, you'd want to ship those drives to Amazon instead of uploading. We do this all the time and it's not that hard.
The price of TBs of storage of course will come down without question. But by today's standards $20/month for a medium that won't "bit rot" on you is an amazing deal.
He could have always bought a sufficiently large tape-library from ebay - but I guess the data wasn't worth that much.
That's always the first pair of questions to ask: how much is it worth and how much would it cost to recreate?
If the answer is somewhere between "I don't know" and "Well, it's not that much", then he just should stop hoarding that much stuff.
He could have built a filer with ZFS and sent daily snapshots to a 2nd filer - but that wouldn't have helped him if the house burnt down...
Windows 2000 - from the guys who brought us edlin
If you want to back up 20TB of data, you have to pay for it.
Build another server and rsync hourly.
Figure out the theory of everything.
Then you can always recompute your data from scratch.
If Pandora's box is destined to be opened, *I* want to be the one to open it.
With a second array, or tape backup. The second array is going to be the easiest solution, but tape backup provides you the option of storing the tapes off-site, which is important for any real backup plan. After all, your friend could just as easily wipe out the 2nd array by mistake, or a disaster could wipe out the physical location. LTO-6 tapes are cheap and can hold 2.5-6.25TB of data depending on compression. Tape drives are perfect for backup, so why even ask if it's right?
In my brief search I wasn't able to find a version of 'rm' that accepted a '-a' option.
As you noted, Blu-ray holds a lot of data, but would take some time. Since it's audio/video media, odds are most of it is pretty stagnant. I'd do an initial rsync job to write out to Blu-ray... then once a month or so repeat the job, but now rsync will only get what's changed. Depending on the media type and age, you could also look at dedup'ing it, and if the dedup'd copy is significantly smaller than the source you might be able to put that onto, say, one or two 3-4TB drives.
I use ZFS on NAS4Free at home and have two 48TB arrays. The second array is at a neighbor's house, connected over MikroTik SXT PTP links. In trade for keeping my secondary server at his house, he gets internet and access to the movie storage/backup array. With ZFS I am not worried about a RAID failure: I just had a controller card fail and kill two drives in each of my pools. I didn't have any problems rebuilding the array, and had I, I would have just pulled from the backup server. Also, with ZFS you get RAID-Z2 along with snapshots, which have been very handy at client locations for recovering deleted documents (more than once from a disgruntled employee). All of our machines also back up to a backup area; the kids have more than once gotten malware, and restoring from a snapshot is easy!
Cheers!
Ironic since from your description it would appear the RAID architecture served sufficiently well here (as it should have). It would appear you are seeking a solution to operator error, not equipment failure or other acts of God. Good luck.
You could always just call up the NSA and ask them to restore the data. Odds are good they have a copy of it...
Same as always.
One byte at a time.
Whenever you buy storage, you should buy the necessary backup capacity at the same time. You should never buy storage without buying backup capacity. Budget for it right from the start. If you can't afford the backup, you can't afford the storage. This may mean getting half as much storage as you'd like, but that's just the way it has to be. You probably wouldn't buy a car without an engine. It wouldn't do its job. So don't buy storage without backup. If you do, you have a storage system that can't do its job.
AC, please point us mere mortals in the direction in which we may find these DVDs with a storage capacity of 4TB...
I agree, I've been using Crashplan for three years and the unlimited space is really great BUT... ...I'm not sure about the bandwidth they provide: how long will it take to upload 20 TB?
Anyway, I don't see what's the problem in using external drives for backup. Here in my lab I've realized that the best way to backup X Terabytes is to have another storage with X Terabytes...
From my own (painful) experience: if you don't plan for it up front, you are always fighting fires (playing catch-up). Organizing your data can help a LOT! If it is media, arrange it by genre (e.g. video animation or video classical or whatever) to keep a particular grouping small enough to backup easily. If it is data, arrange by some category that works for you (e.g. current financial projects or past analytic projects).
The most useful guide I have found for resources allocated to backup: how much is it worth to me to re-create this resource? ("Worth" can be money, time, sentiment, or any other measure(s) or combination you chose.)
My current feelings: disk is the most versatile and cost effective.
Obviously you get 5,000 women pregnant, and ask each one of them to backup just one DVD-R!
It's not like you could just plug in an external drive, and using any cloud service would be terribly expensive. Blu-Ray discs can hold a lot of data, but that's a lot of time (and money) spent burning discs that you likely will never need. Tape drives are another possibility, but are they right for this kind of problem? I don't know. There might be something else out there, but I still have no feasible solution.
Let's start from the top: You *can* plug in an external drive; it's called a complete hardware duplicate of your array (or perhaps, for space/cost considerations, a single disk-based copy held offline and synced regularly). Not hard and not terribly expensive (I would go with this solution personally). Cloud? Yep, the bandwidth and storage even on something like Amazon Glacier would be prohibitive to all but the most financially independent geeks. Blu-ray doesn't hold enough (even at 50GB/disc you need 400 of them, groan). So, tapes? You bet your ass tapes are designed to do exactly this task; why do you think they are still in use? You can get individual tapes at 1/1.5TB, but for a one-man operation they are probably going to cost you more than the first solution (offline spinning disks), and they are a pain to manage properly.
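The media counts are easy to check. A minimal sketch, assuming decimal units (1 TB = 1000 GB) and that every disc or tape is filled completely, which is optimistic:

```python
import math


def units_needed(total_tb, unit_gb):
    """How many pieces of media of unit_gb each it takes to hold total_tb.

    Assumes decimal units (1 TB = 1000 GB) and no per-disc overhead.
    """
    return math.ceil(total_tb * 1000 / unit_gb)


if __name__ == "__main__":
    for label, size_gb in [("50GB Blu-ray", 50), ("1TB tape", 1000), ("1.5TB tape", 1500)]:
        print(label, units_needed(20, size_gb))
```

That reproduces the 400-disc figure for 50GB Blu-rays, versus roughly 14 to 20 tapes at the 1/1.5TB sizes mentioned.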
Now what is this doing on ask slashdot? A pencil, some scratch paper, and 15 minutes between amazon.com and newegg.com would tell you the prices of every solution. Oh, right, they need a chance to tee up some targeted ads for Carbonite, Mozy, Crashplan, etc.
If you took all the punch tape ever produced, I'm not sure you'd have 20TB worth of storage... I wonder how many times THAT would go around the earth?
"File to fit, pound to insert, paint to match" - Aircraft Maintenance 101
How about backing up only the crown jewels of the collection?
Make a directory like /entertainment/premium and put the best stuff there, with a 4 TB limit. Rotate two external 4 TB HDDs and copy the stuff over periodically. Put a little sticker or some other mark on the newest, so you remember which one it is. If your main RAID array fails, build a new one, and restore the premium stuff from the most recent one of the two external disks.
What, no one mentioned duplicity yet?
http://duplicity.nongnu.org/
Why, yes! I AM new here.
These "unlimited" claims always turn out to be lies. When will we learn?
My friend paid for an "unlimited" account from JustCloud for backup. He stored 1.8 TB on it and then they "fair use"'d his ass and canceled his account. They didn't even give him a refund for the rest of the money he prepaid.
Simply compress and encrypt your backup data, then post it on a torrent tracker as "New Julian Assange insurance file, decryption key to be released if extradited". Thousands of other people will make backup copies for you.
I use Glacier and it's great. 20 TB is about $200 a month, which to me does not seem like all that much money for backing up that much data. The biggest problem from a home user's perspective is getting all of that data to Amazon. Hopefully he lives somewhere where fiber is available to his house.
md prnt dwn
Connect a raspberry pi and configure it as a backup server and let it copy all to /dev/null... ...
Then put aside the money you would have invested in a "better" solution, put it in a safe bank (under your mattress)
and wait until you need to restore something..
Most probably you'll enjoy the money more
-=Skip
When you have that much data your only real viable option is to have a second storage array just as large. RAID 5/6 isn't backup. It provides fault tolerance which means you can still access (read/write) the array as you normally would if a disk fails.
It's almost a no-duh solution in lieu of tape or other cumbersome removable storage options. Even backing up to 50GB or 100GB Blu-ray discs would be rather pointless, as the cost of a single disc is the cost of buying a movie on Blu-ray. Even if you could fit 4+ movies on a 100GB BD-ROM, is it worth the hassle and cost?
Why is another array better? It's quite simple. You don't have to shuffle discs or tapes to make backup sets. You also aren't stuck with a format that could become obsolete. That LTO tape drive might look good, but what if it fails? Can you find another for a reasonable cost? Will you periodically test your backup tapes or disks for bit-rot? A tape or BD-ROM rotting in a safe deposit box, safe or shoe box under your bed is useless. If a disk fails you can replace it quite easily, as almost every disk supports SATA or SAS and controllers are found on every motherboard. A failure of a disk will be reported and you can handle it.
Here is another question that is kind of burning in my head. If your friend legally purchased all of his movies and music, wouldn't he have the original sources? If not, and I'm not judging, I have a collection of both music and movies that I pilfered over the years, you have to be smart and make copies. My collection is just about 2TB, mostly some hard to come by movies and TV shows (entire MST3K library). My home server used to be on 24/7 but it was a waste of power. I almost lost that array of 5 500GB disks until I made copies to 1TB drives. I now have a few 2TB drives that have copies of the server data on them. One drive is even at my place of work in a USB box. Even if my home burns down I still have a copy somewhere.
15 million of them.
Troll is not a replacement for I disagree.
Tape is the cheapest way to go as far as media, and surprisingly it is making a comeback due to high storage requirements. It can be expensive as far as hardware and software, depending on what you buy. We back up about the same amount of data in our production environment for offsite storage. Latest tapes can hold 4 TB per tape.
Catalogue the contents and when you lose it all you can spend 10 minutes searching for the 2% of the content you really want to download again and feel good that you now have 98% of your storage space back to start filling with more crap :D
That's why it is supposed to be used with caution, as no 'rm' supports it. ;)
I did 'rm -rf /' once years back. It did not delete the entire filesystem. It stopped when it erased '/lib/libc.so'.
lot of time (and money) spent burning discs that you likely will never need
If you have any data, over a long enough period of time you WILL need a backup. Saying "I will likely never need this backup" is a non-sensical statement, because (a) you probably will, and (b) the cost of NOT having the backup is essentially infinite in pain and grief.
"There is more worth loving than we have strength to love." - Brian Jay Stanley
Even if 100% of that 20TB is legally owned content, recovery is a huge process: re-ripping hard media is still awfully slow -- if you can even find where you stashed it (I think a few CD's have walked off the reservation)
Purchased digital media is no better: you've got many sources to find it from, and it may disappear: preview tracks, live tracks, etc. may disappear when they stop updating their MySpace, or a local distributor goes belly-up. That's also assuming you're still using the same providers: if you had download privs on some of the music servers of the 2000's, you'd have 'ownership' of that media, but you may not be able to get it again if you aren't still paying for the account.
The most economical and reliable is probably a mirror RAID array. It sounds like this guy accidentally issued a command to erase the content, rather than a RAID failure. Ordinarily, the RAID should be good for most stupidities, but this falls a little outside that. The question is, if you have mirroring software, how frequently does it try to match, and would it clean off the mirror too?
Design for Use, not Construction!
20TB of music and movies? How many of those could be downloaded again tomorrow? My guess is "most." The only thing that's really "mine" on my computers, and not backed up, is my own pictures. I upload those to image sharing sites on the internet. Most docs are done on Google Docs for portability reasons, and other things I've created are already on Dropbox.
I experienced a catastrophic hard drive failure a year or so ago. After replacing the hard drive, and about one day of downloading and installing the programs I needed, I was up and running again. It took 24 hours to download enough of the series that I was watching to pick up where I left off again. And if I ever get a hankering for watching something I've seen before, well, I can get it from the internet again in a matter of hours or days.
It may look like I'm doing nothing, but I'm actively waiting for my problems to go away.
--Scott Adams
I was going down the route of buying an expensive RAID NAS / DAS, but then I remembered when I got broken into in the Canary Islands and the thieves took both of my backup drives from two separate rooms. I'm now settled on a simple external drive, with the whole lot backed up offsite. I was looking for... + Unlimited backup, so I don't need to think +The ability to backup attached drives (NAS, DAS, USB, etc) + To feel that my data is safe with a 2nd layer of encryption You can try it free here: http://bit.ly/1bRNax1 My blog post about this: http://www.bentristem.com/1/po... Enjoy!
Ben Tristem I'd love to know more about you in this short survey... http://bit.ly/1oM7Fvl
No one will ever see this anonymous post but a cheap robot changer (used) on ebay can be had from between a few hundred to a few thousand dollars. Most of us are geeks and love technology. I use two such devices, couldn't imagine life without them. LTO4 is still the sweet spot in storage cost (media) and capacity. The tapes hold 800GB and can be purchased for around $22 each.
1- if you need to backup 20 TB today, you need to budget for 40TB in the medium term.
2- a backup is off-line, off-site, tested, and multiple. The "multiple" part is pricey, and the other 3 you can get cheapest with a PC filled with HDs. Or two (I'm making do with one). $200 for the PC, $150 per 4TB HD x 5 = $950. Hide that backup in a place safe from theft, floods, fire...
The Cloud - because you don't care if your apps and data are up in the air.
Tell him to stop hoarding. Streaming is good!
The only irreplaceable data that I possess is digital home videos and digital photos of the family/kids. I have a lot, but it's super easy to back up to an external HDD that I keep locked up at work. To protect against HDD failure between those monthly off-site backups, I replicate the data partition on our main PC to a second PC in my house using a scheduled SyncToy job.
Cheap, very effective.
This was a hard lesson for me to learn, but a worthwhile one. Things that are unique - stuff I've created, programs, stories, resume, art I've scanned, pictures - those things need to be backed up. (I use DropBox for my text-based stuff and have shifted pictures through media over the years: floppies to a ZIP disk to a CD-R to a DVD-R -- next stop will be a recordable Blu-ray one of these years, most likely.)
13 GB from a pirated copy of a TV series does not need to be backed up. Odds are you can either watch it again on-demand from a streaming site, or purchase a legit copy of the series on DVD for $20 if Netflix or Hulu have failed you.
Occasionally living proof of the Ballmer peak.
First "LOL", sorry had to get that out of the way.
1st choice for that much and to be reliable: use tape. 2nd choice: multiple hard drives, and by that I mean multiple backups, not just one. 3rd choice: Blu-ray discs. After that, backing up starts using bandwidth; Amazon Glacier is cheap to upload and store, they get you when you want it back.
Or you could sign up for Mega at 50 GB a pop.
20 Mega accounts = 1 TB, yes go ahead and laugh but I have a terabyte of storage online for free. (use their sync app)
I used a Gmail account to sign up, so let's say your Gmail is "turtle@gmail.com"; well, did you know you can use "turtle+01@gmail.com" and it will go to "turtle@gmail.com"?
Yeah it will, so my Mega sign up scheme is turtle+01@, turtle+02@, etc, and I can manage it all from "turtle@gmail.com"
Finally I would like to offer an apology to "turtle@gmail.com", nothing personal it just popped into my head.
"If any question why we died, Tell them because our fathers lied."
20TB of backup drives.
Not that hard, although if it was really important I suggest 40TB of backup drives and 2 full backups. I'm a fan of tape for large capacity. SDLT600 would be his best solution with a small 10-tape carousel. Do a full backup monthly and then incrementals every week.
20TB means he has to spend money. If someone freaks out at the cost of a real backup solution, then the data was worth less than the backup.
So manually back up to 20 2TB hard drives on an eSATA interface, or use an automated tape system. Either way it is not going to be cheap.
Do not look at laser with remaining good eye.
It stopped when it erased '/lib/libc.so'.
Why? I can use rm to remove /lib/libc.so. The problems do not start until after rm is finished...
Finally! A year of moderation! Ready for 2019?
There are many > 1TB tape back up systems, many with very high speeds, assuming you can feed it data fast enough.
I have to wonder though.. 20TB for a single person? I'm not gonna do the math but that sounds like so much stuff to be impossible to listen/watch all of it.
But at least he has proven once again, RAID is not a backup. RAID will merrily do what ever you wish, including copying drive corruption.
Most data that people have on their hard drives can be readily re-obtained via BitTorrent or in other ways. The simple and probably best strategy is to figure out the 500 GB or less that is actually irreplaceable, and make several copies of that. I have three or more copies of my most important data.
Or, looking at the problem another way, 4 TB hard drives are selling for $160 right now including shipping. A complete insurance policy would cost $800 plus your time. What I would have done if I just had to save everything would be to simply copy all of the data in 4 TB hunks, and put each hard drive one by one into a fireproof safe, or in a safe-deposit box at the bank. A second RAID would be complete overkill, unless time to recovery is of the essence or the data churn rate is high. More than 90% of my data simply accretes over the years, and I'm sure that is true for most people.
$800 is a small price to pay for your data. I seem to recall that it cost a company I worked for over $1,000 to recover a 9 GB IBM hard drive that failed about 15 years ago.
According to this article, Seagate is promising 20 TB hard drives by 2020:
http://www.computerworld.com/s...
I don't know about you, but I'm pretty sure my ISP would flip out if I tried to transfer even 1TB in a month. Even if they didn't care about the amount of data being backed up, it would still take me around 231 days to upload that much. Any kind of online backup would be infeasible for the initial dataset, but it's also probably not a great option to ship in a box of hard drives.
Let's be honest: any large dataset like that is going to cost some serious coin to backup. You can probably "cheat" by incrementally backing stuff up to Crashplan (with its "unlimited" storage), but it'll take so long to seed that initial dataset that you're likely to experience some kind of data loss before it's done.
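To put numbers on the upload problem, a minimal sketch assuming a perfectly sustained uplink with no protocol overhead and decimal units. The 8 Mbps figure is my own inference: it is the uplink speed at which 20TB works out to about 231 days, and the post does not state a speed.

```python
def upload_days(data_tb, uplink_mbps):
    """Days needed to push data_tb terabytes through an uplink of uplink_mbps.

    Assumes the link runs flat out 24/7 with zero overhead, so real uploads
    will take longer.
    """
    bits = data_tb * 1e12 * 8          # terabytes -> bits (decimal units)
    seconds = bits / (uplink_mbps * 1e6)
    return seconds / 86400             # seconds per day


if __name__ == "__main__":
    for mbps in (1, 8, 100):
        print(f"{mbps:>4} Mbps: {upload_days(20, mbps):,.0f} days")
```

At 1 Mbps the same 20TB is over five years of continuous uploading, which is why seeding by shipping drives keeps coming up in this thread.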
There is a difference between "insightful" and "inciteful" other than spelling.
As in "How Do You Back Up 20TB of Data?" "Backup" is a noun. Verb conjugations are "back up", "backs up", "backed up", etc.
It would be relatively easy to back this up on "only" four 4TB disks. They could be in one USB3 enclosure each, or in an outdated PC (Pentium 4 or something) that is turned on for backups only, whatever.
A simple mechanism to make them appear as a single ~16TB volume or directory would be nice. Or perhaps optional. Or just use some real backup software.
Maybe the backup will be so painfully long (days?) that a drive failure may be a concern.
On another note, I'd like a very easy and nice to use program that simply backs up the file names etc.; I can more easily afford to lose music/movies if I have a list of what I actually had, so the good, easy-to-find stuff can be found again and reconstituted.
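A file-name-only backup is small enough to script. Here is a minimal sketch, assuming a CSV catalog of relative path, size and modification time is good enough; the function name and column layout are my own choices, not an existing tool.

```python
import csv
import os


def write_catalog(root, out_csv):
    """Write one CSV row per file under root: relative path, size, mtime.

    Keep out_csv outside root (ideally on another machine), otherwise the
    catalog lists itself and dies along with the array. Returns the file count.
    """
    count = 0
    with open(out_csv, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["path", "bytes", "mtime"])
        for folder, dirs, files in os.walk(root):
            dirs.sort()  # walk subdirectories in a stable order
            for name in sorted(files):
                full = os.path.join(folder, name)
                info = os.stat(full)
                writer.writerow([os.path.relpath(full, root),
                                 info.st_size, int(info.st_mtime)])
                count += 1
    return count
```

Even for a collection this size the resulting list should be tiny next to the media itself, so it fits on any free cloud account or a USB stick.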
I've used Crashplan for years at clients, friends, and personally, and it's generally been good. They have 2 options that may work here.
The first is their all-you-can-eat backup service, but they may well balk when you tell them it's 20TB -- they might shove you to a $120/year business plan.
The other is buying a pack of Crashplan ProE licenses, which let you host your own cloud backup service. You can use any PC as the "server" (just make sure it's reliable and on 24/7) and it handles diffs like a champ. It also verifies backups to avoid bit rot.
While I agree that the likely use case here was pirated movies or porn, there are several very legitimate use cases such as a small / home business doing video production (someone I know has a contract to do several local school sports games as an example).
Just assuming that your friend had a fully legal collection, I would think that all he needs to do is ask the media companies for a new copy. Because the media industry tells us that we do not buy music, we buy licenses, right?? So even if we lose the bits-and-bytes which are easy to replace, then we still hold a license and the media companies should facilitate that your friend can exercise his licensed rights..
[/sarcasm]
To Terminate, or not to Terminate, that's the question - SCSIROB
he didn't have a good way to backup that much data
But he did. Another RAID array of the same size would have sufficed. Oh, now I see what you mean. He didn't want to spend the money on a good way to backup that much data.
Another issue entirely :-)
8 of 13 people found this answer helpful. Did you?
Having a similar setup myself, and having looked into exactly your question, you have exactly one realistic answer:
You back it up to an identical (or larger) disk array.
If possible (though not necessary), you'll want to do the initial backup with both arrays directly connected to the same host; but after that, just rsync --link-dest (to make hardlinked differential snapshots) them on a nightly basis.
For a media server, where the typical use case consists of adding large files slowly over time and little ever changes, your backup shouldn't take up much more room than the primary storage.
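For reference, a minimal sketch of what the nightly job might look like, assuming one dated directory per night under a backup root. The paths, the date-based naming and the `--delete` flag are my assumptions; only `--link-dest` comes from the description above. The function just builds the argument list, so you can inspect it before handing it to `subprocess.run`.

```python
import datetime
import posixpath


def rsync_snapshot_cmd(source, backup_root, today=None, previous=None):
    """Build an rsync command that writes tonight's snapshot directory.

    With `previous` set, --link-dest makes unchanged files hard links into
    last night's snapshot, so only new or changed files take extra space.
    """
    today = today or datetime.date.today().isoformat()
    dest = posixpath.join(backup_root, today)
    cmd = ["rsync", "-a", "--delete"]
    if previous:
        cmd.append("--link-dest=" + posixpath.join(backup_root, previous))
    # Trailing slashes: copy the contents of source into the dated directory.
    cmd += [source.rstrip("/") + "/", dest + "/"]
    return cmd


if __name__ == "__main__":
    print(rsync_snapshot_cmd("/tank/media", "/backup/media",
                             today="2014-03-02", previous="2014-03-01"))
```

Each dated directory then looks like a full copy, and deleting an old one only frees the files that no newer snapshot still links to.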
Seriously, who needs 20TB of data at home? This is like a digital version of "Hoarders" or something. Time to clean house and organize.
First, it's time to TAKE OUT THE TRASH. I'll bet the large majority of this data is stuff you never use, don't know you have or is simply out of date and unnecessary. Toss it.
Second, de-duplicate what's left as best you can. No need to have multiple copies of the same pictures at different resolutions, or the same video encoded multiple ways in your backups. Keep the best resolution stuff in your backup, forget the rest. Don't back up anything you can re-rip from the original media (i.e. that DVD collection. Oh, don't have the DVDs anymore? Turn yourself in to the MPAA...)
Third, Compress what's left.
If you find that 20TB is what you need to keep, then stop asking Slashdot for advice and go buy yourself a professional tape drive and some brand new tapes and start doing backups like a professional. If this is too expensive, start over at step 1 and really take out the trash this time.
"File to fit, pound to insert, paint to match" - Aircraft Maintenance 101
On the bright side, my math indicates that 8e13 paper chads would take up about 20 thousand cubic meters of space. That would probably be enough free fuel to heat your home for a lifetime.
If IBM punch cards were used, 1 GB equals approximately 47 cubic yards (assuming 80 bytes and 187x86x0.18mm per card) and about 70,000 lbs (at 2.42 g per card), so one standard railroad boxcar (limited by both cubic capacity and weight) could hold about 3 GB. 20 TB would need over 6000 boxcars of punch cards; at 60 feet per boxcar, that's a freight train about 70 miles long.
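The punch-card numbers hold up reasonably well. A minimal sketch that redoes the arithmetic from the figures above (80 bytes, 187x86x0.18mm and 2.42 g per card, 3 GB and 60 feet per boxcar), assuming decimal gigabytes and terabytes:

```python
import math

CARD_BYTES = 80
CARD_VOLUME_M3 = 0.187 * 0.086 * 0.00018   # 187 x 86 x 0.18 mm
CARD_GRAMS = 2.42
M3_PER_CUBIC_YARD = 0.9144 ** 3
GRAMS_PER_POUND = 453.59237
FEET_PER_MILE = 5280


def cards_for(num_bytes):
    """Number of 80-byte cards needed for num_bytes of data."""
    return math.ceil(num_bytes / CARD_BYTES)


def gigabyte_of_cards():
    """Return (cards, cubic yards, pounds) for 1 GB of punch cards."""
    cards = cards_for(1e9)
    cubic_yards = cards * CARD_VOLUME_M3 / M3_PER_CUBIC_YARD
    pounds = cards * CARD_GRAMS / GRAMS_PER_POUND
    return cards, cubic_yards, pounds


def train_for(total_bytes, boxcar_bytes=3e9, boxcar_feet=60):
    """Return (boxcars, train length in miles) using the post's 3 GB per boxcar."""
    boxcars = math.ceil(total_bytes / boxcar_bytes)
    return boxcars, boxcars * boxcar_feet / FEET_PER_MILE


if __name__ == "__main__":
    print(gigabyte_of_cards())
    print(train_for(20e12))
```

By this arithmetic 1 GB is 12.5 million cards, and the 20 TB train comes out a few miles longer than the 70 quoted, which is the same ballpark.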
70 TB tape backup; http://www.zdnet.com/blog/stor...
>>"ad space available -- low rates!!!"
Take the storage media he has, and then duplicate it.
Really, was that hard?
The Kruger Dunning explains most post on
This is Slashdot, until someone makes a Beowulf cluster of punch-card processing machines, we can't call ourselves geeks!
And, someone needs to compute how many punchcards it would take to back up Google. Oh yeah, I'll just google that;
Let's assume Google has a storage capacity of 15 exabytes, or 15,000,000,000,000,000,000 bytes. A punch card can hold about 80 characters, and a box of cards holds 2000 cards. 15 exabytes of punch cards would be enough to cover my home region, New England, to a depth of about 4.5 kilometers. That's three times deeper than the ice sheets that covered the region during the last advance of the glaciers.
http://gizmodo.com/if-data-was...
I think it's important to know that Google's data would be three times thicker than the glaciers during the ice age. It's strangely comforting.
20 TB is an awful lot of data for backing up over the net.
What I do is back up over the net to my brother's NAS. (He lives in another country.) I use rsync and it works like a charm. It is a bit of a bother when I have been taking a lot of pictures, but as it works in the background and is traffic shaped with low priority, it is manageable. I've got a fairly slow 1Mbps/6Mbps connection, so it takes some time. 20 TB would take the better part of a year, but since I do it incrementally as I get the data, it has been manageable so far. The Raspberry Pi server at my brother's replicates it to a friend's NAS as they both have 10/10 Mbps lines.
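A quick sanity check of the "better part of a year" estimate, as a sketch. It assumes the link runs flat out with no protocol overhead, and the comment doesn't say which of the two speeds is the upload side, so both are shown.

```python
SECONDS_PER_DAY = 86_400


def transfer_days(n_bytes, megabits_per_second):
    """Days needed to push n_bytes through a link of the given speed."""
    seconds = n_bytes * 8 / (megabits_per_second * 1e6)
    return seconds / SECONDS_PER_DAY


if __name__ == "__main__":
    for mbps in (1, 6):
        print(f"{mbps} Mbps: {transfer_days(20e12, mbps):.0f} days")
```

At 6 Mbps the full 20 TB takes a little over 300 days, which matches the estimate; at 1 Mbps it would be about five years.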
I keep a local copy on a Raspberry Pi with a couple of USB drives, just for the fun of it.
Worst case scenario that my house burns down or similar total catastrophe: My brother copies my data to an external disk and sends that by courier to me. Downtime around 24 hours.
And, obviously it is fairly easy to restore individual files over the net.
What does it mean that he didn't have "a good way to backup that much data, so he never took one"?
The concepts behind backing up data have not changed. You need to manage the size of your data to redundantly fit into the storage of your system. So either pony up the cash and time to properly store your files, stop collecting TBs of crap, or stop complaining about losing it when your system crashes.
It's frustrating to see people continuously complaining about how they have too much data to back up cheaply and conveniently. It's even more frustrating to see them complaining about losing all of their data because they didn't back it up properly.
I think that the main issue is that most people do not realistically or conservatively plan their actual storage capability. For example, it seems like 90% of computer users believe that having 4 TB of hard drive space means that they can safely store 4 TB of data.
After a conversation about scratch space, redundant drives, and timestamped backups, they then will grudgingly agree to allocate 25% of their available storage to RAID/Backup space, which obviously does not get the job done! Very few are willing to accept using 66% of their available hard drive space for RAID and Backups, which is really the minimum metric for any sort of storage longevity.
20 TB is an awkward amount of data for a non-corporate individual to be storing. It's more data than most people actually need for their media and it is getting into a very expensive price range to back up for basic music/movie content. (By expensive, I mean that it would be cheaper to just re-purchase the media rather than back it up.)
To /.ers saying that 1TB+ tapes would be a good idea to do this backup, please:
Add some references and prices for such hardware and media that would best suit home usage.
Why bother? It's rarely-used, practically useless bits anyway. A quote from John Nash: "Facts are available where direct memory fails in many circumstances." In this context, that could mean use spotify and netflix to stream your dumb music and movies, rather than saving them indefinitely.
I think the feds have just busted an international ring of people who like to do that sort of thing.
How do you listen to 20 terabytes of music? You won't repeat a song for at least a year, at a guess.
He should first assess his needs for backing up files. What kind of data for home use could possibly fill 20TB? Does he need to keep a backup of everything? Either way, you could always get a second (third?) NAS server and HDDs and set up automatic backups. A NAS with 2 bays will set you back 100$ or less. Add 2x 4TB for 400$. Better yet, build a custom file server for under 200$ with a nice case with plenty of room for disks. Be sure to choose a motherboard with 4 or more SATA ports and fill it with HDDs. FreeNAS is a great OS with many options. Five 4TB (=20TB) would cost under 1000$, so for 1200$ you have a nice long term solution with great flexibility.
The chaff would get to be a problem too, after the first several dumpsters full.
Seriously, where do you get 20TB from? I mean, if you rip 250 DVDs at home, you've got 1TB. So for 20, you'd have to rip 5000 (or a bit less), without further compression. If you compress them a bit, 20TB would store over 20.000 movies. So, what's in this 20TB and where is it from?
Anyway, simplest solution: 10 $99 2TB disk units and some time.
Part of your choice of solutions will depend on the nature of your data. Is it changing often? At all?
I use Bacula for my backups. My wife has a photography business and her collection of images is about 6TB and is being added to constantly and occasionally edited. The Archive is about 5TB and is stuff that is unlikely to change. Then there's the Working array, which is 1.5TB (max) and generally clocks in around 700GB. This is current work that hasn't been delivered to the client yet (RAW files from recent weddings/portraits, JPEGs where the client is still picking out what they want, PSD files for current album designs, etc). Both are on RAID5 arrays, and the Working array has a hot spare. At the end of each month, the Working folders are gone over for bodies of work that have been delivered to the client and are unlikely to change. This work is then moved to the Archive, backups are burned to DVD and also copied to an external hard drive.
For the Working backups, I have JBOD on another server. I think it's 6 1TB disks. These are set aside for different Bacula pools of volumes. There's two full backup pools, two differential backup pools, and two incremental backup pools.
Every night at 5AM a Bacula job kicks off. On the first Sunday of the month, a full backup of the Working array is dumped to the JBOD. Bacula makes a backup copy of everything (~700 GB). On the other Sundays of the month, a differential backup dumps everything that's changed since the previous full backup to another set of volumes on the JBOD (~80GB). On every non-Sunday of the month, an incremental backup copies over everything that's changed since the previous incremental backup (or differential or full backup if it's a Monday). I have two sets of these pools. On odd months, it uses Full-Pool1, Diff-Pool1, and Inc-Pool1. On even months it uses Full-Pool2, Diff-Pool2, and Inc-Pool2. This way I have two sets of backup copies of everything so I don't have to delete last month's full backup to make this month's full backup.
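The rotation above can be restated in a few lines of Python. This is only a sketch of the scheduling logic as described, not Bacula configuration; the function name is made up, and the pool names are the ones from the comment.

```python
from datetime import date
from typing import Tuple


def backup_plan(day: date) -> Tuple[str, str]:
    """Return (backup level, pool name) for the job that runs on `day`."""
    # Odd months use the first set of pools, even months the second.
    pool_set = 1 if day.month % 2 == 1 else 2
    if day.weekday() == 6:  # Sunday
        # The first Sunday always falls on day 1 through 7 of the month.
        level = "Full" if day.day <= 7 else "Diff"
    else:
        level = "Inc"
    return level, f"{level}-Pool{pool_set}"


if __name__ == "__main__":
    for d in (date(2014, 3, 2), date(2014, 3, 3), date(2014, 3, 9)):
        print(d, backup_plan(d))
```

Alternating pool sets by month parity is what keeps last month's full backup intact while this month's is being written.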
It works pretty well, and every morning I get an email telling me that all the backups worked fine and the arrays are stable. I know it's a little anal, but well, I couldn't imagine having to tell a bride "Hey we lost your wedding photos. Hard drive crash. Too bad." With the system I've got, unless the house burns to the ground I'm fine. And if the house burns to the ground, I've got bigger problems. I wouldn't mind an off-site solution, but I don't see how I can transfer the several TB of backup data I have at any given time someplace else, except by carrying hard drives out of the house every day, and I don't think that's something I'd be able to stick with for very long.
We don't have a state-run media we have a media-run state.
The earliest tape had 128B/inch, so that would be 3.9e6km, or 100 times the earth's circumference.
Wikipedia however mentions: "As of 2010, the record for highest data on magnetic tape was 29.5GB per square inch." So assuming tape one inch wide, that would be just 17 meters, or 4.3e-7 times the earth's circumference.
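A sketch to reproduce both tape-length figures. The densities are the ones quoted above; the Earth circumference of 40,075 km and the one-inch tape width are the assumptions.

```python
INCH_M = 0.0254
EARTH_CIRCUMFERENCE_M = 40_075_000


def tape_length_m(n_bytes, bytes_per_inch):
    """Length in meters of tape needed at a given linear density."""
    return n_bytes / bytes_per_inch * INCH_M


if __name__ == "__main__":
    for density in (128, 29.5e9):  # bytes per inch of one-inch-wide tape
        meters = tape_length_m(20e12, density)
        print(f"{meters:.3g} m, {meters / EARTH_CIRCUMFERENCE_M:.3g} circumferences")
```

This comes out at about 3.97e6 km (roughly 99 circumferences) for the early tape and about 17.2 m for the 2010 record density.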
First of all, don't run RAID at home for data storage. RAID systems are for corporate high availability. They are inherently dangerous the moment you have to touch the config and only worth it if you need a drive system available 24/7 with hot spare. Truly stable RAID systems are also huge power hogs and heat sources. You can build highly redundant file systems for a fraction of the cost, with a small fraction of the power.
This is easily the 15th time I've heard of someone losing a huge amount of personal data to RAID. The last one I heard of lost everything, the poor fool: wedding pictures, kids' pictures, etc...
Beyond that, I have about 20TB myself. I use DFSR to keep it highly available, then a one way rsync job with no purge. That way if I mess up one of my replicas, it won't get purged from the rsync target. I take an encrypted version of the rsync target to a friend's house regularly so there's no chance of massive loss. I also back up limited encrypted data to the cloud, but only documents, code, and pictures.
Don't add complication where you don't need it.
Sure the information density is pretty low, but it lasts forever!
Ken
Choose Cheap, Quick and Correct, but you can only choose two.
Sent from my TARDIS
It likely didn't cost him a dime to build up that collection...
Ken
I thought I was bad with my media consumption. 5TB of visi media, 3TB of audio, 2TB of um... not porn. I really only back up the audio. Do you really expect to rewatch all of the other stuff? If you did how hard would it be to reacquire?
Back up all your data to stone tablets.
You do realize that some people create their own data?
At 300GB per tape for the best, most expensive tapes that would require 69 tapes (and 69 hours) for a full backup. At around $120 per tape, that is over $8000, just in tape. A second RAID with full redundancy would be far cheaper - the 10 4TB hard drives coming in at around $1700. Doing it with 2-drive redundancy would only require 7 drives.
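The tape versus disk numbers can be checked with a short sketch. It assumes 1 TB is counted as 1024 GB (which is what gives 69 tapes rather than 67) and a per-drive price of $170, derived from the $1700 quoted for ten drives.

```python
import math

DATA_GB = 20 * 1024   # 20 TB counted as 1024 GB per TB (assumption)
TAPE_GB = 300
TAPE_PRICE = 120
DRIVE_PRICE = 170     # $1700 / 10 drives, from the comment


def tape_cost(data_gb=DATA_GB):
    """Return (number of tapes, total tape cost in dollars)."""
    tapes = math.ceil(data_gb / TAPE_GB)
    return tapes, tapes * TAPE_PRICE


def drive_cost(n_drives):
    """Total cost in dollars of n_drives 4TB hard drives."""
    return n_drives * DRIVE_PRICE


if __name__ == "__main__":
    print(tape_cost())      # tapes needed and what they cost
    print(drive_cost(10))   # fully mirrored set
    print(drive_cost(7))    # five data drives plus two for redundancy
```

That gives 69 tapes at $8,280 against $1,700 for ten drives, or $1,190 for the seven-drive layout.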
W..w..W - Willy Waterloo washes Warren Wiggins who is washing Waldo Woo.
If I only had mod points today.
My thought was a second RAID device; however, my outdated experience with tapes had a 2GB limit.
If something is so important that you feel the need to post it on the internet... It probably isn't that important.
The first step is to classify the data in two groups: what you would not want to lose at any cost, and the redundant data (movies, music, etc) that you could survive without. This is the most important step.
The second step is to back up the important data using an external 1 TB drive, tape or similar.
Optionally, the third step is to delete the remaining 19 TB.
After a certain point, you have to go big or get out.
A tape drive able to handle 20 TB is going to be $3k+.
Online backup is out of the question. If it takes two weeks to back up 300 GB to Crashplan or Amazon Glacier, it'll take two and a half years for the 20 TB.
Being a Jottacloud customer for a long time, I really like their backup. Unlimited storage is 6$ per month. You can specify when to back up, and you can exclude subfolders from sync, and you can limit the bandwidth used.
I guess it's not very well known in the US, but it's been around for several years in Europe. All servers are located in Norway.
Unlimited is limited to one computer.
Jottacloud.com: Jottacloud.
(I am in no way affiliated with Jottacloud)
Do you need to back up all 20 TB? Or is half of it crap you got from usenet/torrents?
I run a 24 TB usable ZFS array that I snapshot regularly so I can recover from an event like me being a dumbass and doing an rm -rf /Array/.
As far as backups I separate my content into 3 major categories.
original content - this stuff I back up regularly to 2 locations. It contains things like home movies, pictures, documents, etc. I copy it to a USB drive and to a cloud backup service (I use CrashPlan). It's stuff I cannot replace and would be devastated if I lost it.
rare content - stuff that's hard to find. I back this up too, but only to 1 location. It's mostly static; it consists of things that took a lot of time and effort to find but are probably still replaceable. I back it up to the cloud only.
replaceable content - stuff that's backed up already on the BitTorrent network. It's mostly media I just hoard that I download off of usenet. If I lose it, it's not a big deal.
Just mail them to get your data back? I tried it and it worked like a charm. The next day the latest copy of my document was in my mailbox.
They even had gone through the trouble of correcting a few spelling errors, a misspelled name and a glitch in the layout.
They did censor the part about privacy though.
Try using karate-chopped tape instead. As long as you disable the dramatic shouts it's considerably quieter.
--- Most topics have many sides worth arguing, allow me to take one opposite you.
20 TB is an awkward amount of data for a non-corporate individual to be storing.
4K movies will be out shortly. We will be looking at 50-100GB per movie. Some people want to back up their disks and have them accessible for their HTPC because it's significantly more convenient.
And before you suggest, no I do not want to compress my movies to a lesser quality... That's why I got the BluRay in the first place. Because I want high quality. I watch them on a 136" projection screen. I can tell when it's been compressed...
Redundancy is the way you do it.
You have a system identical to the first one, with 30TB of drive space, and every night you copy the data over to the other system. That will cover you for anything short of a house fire/tornado/earthquake/flood.
Or you can download another copy from Torrent (since you're already a thief) whenever you want to watch it.
Agent K: A *person* is smart. People are dumb, stupid, panicky animals, and you know it.
4TB drives sell for $165 right now on Newegg. So five of those would cost $825. Your friend could stick them all into a large PC with multiple bays and create an enormous RAID-0 array of 20TB. Then he could use FreeFileSync to copy those files. Or he could set up another NAS with those 5 4TB drives and just do a copy/sync. It'd take days for the initial load, but it would be backed up. The problem with having that much data on CrashPlan (also my cloud backup of choice) is that it would take so long to restore it -- too long, I'd think. You'd blow all sorts of bandwidth limit to do so. And you can't use their restore-to-door plan for backups greater than 3.5TB. Until we have Google Fiber running everywhere, bandwidth just doesn't make it feasible to push all that data to the cloud.
The cost of 2 independent sets of five 5TB hard drives is NOT enough to worry about compared to the cost of obtaining the 20TB of data.
You can price the cost of this in a few minutes, period, end of discussion.
If you are half way clued in, take those backup/clones over to two different physical locations.
Don't have that much data. See, it really was that easy.
http://www.backblaze.com/
Unlimited storage $5 a month. You're welcome
Tubby or not tubby. Fat is the question
Multiple 4TB drives. Best you can do.
http://hubic.com/
Solved.
-- /. is now http://soylentnews.org/
My
There are plenty of people who do 1:1 backups of movies and music. It's extremely convenient. I don't handle any physical media more than once. It keeps the house tidy and the disks in pristine shape if I ever need to re-rip.
Around 6 months ago I had a similar problem to the story. My media drive died a sudden death (Seagate drive, never again). I had all of my family pictures, home movies, music, and movies on that drive. I had done backups and stored them remotely and was able to recover most of what I had. A few re-rips of some movies and I was done.
The time investment necessary to rip a 1:1 copy for a large collection is not insignificant. I probably should set up RAID + parity at some point but right now I'm only doing a clone of my stuff. I don't have bandwidth capacity at home to use any sort of cloud storage.
A 4 TB slowish Seagate hard disk can be had for about $160ish if you look around. Five of them are $800. An inexpensive JBOD tower such as a TowerRAID 4 Bay eSATA RAID runs about $150. Get two of them.
Total cost is around $1100 and the solution is expandable.
----- In Your Cubicle No One Can Hear You Scream...
A couple of things. First, do you really need to back up all 20 TB of data? How much of that can be recovered by other means? For instance, is it reasonable to back up the OS if you would probably just reinstall anyway? How much of your content did you acquire electronically? Would it be easier to go back to the source?
Thing two: If you really have to back up all 20 TB, the only really practical, cost-effective way to back up that much data is to another set of hard drives. Build up a second array, replicate, and then turn the backup array off. Leave it off except for periodic backups.
For incremental backups, dedicate one removable SATA slot. (I use one of those "hard drive toasters" that plug into a USB slot and allow you to hot-plug a SATA drive.) Plug in a drive on a regular schedule, and copy over the files that have changed recently. Mark it with a sharpie and put it in a safe place.
The idea is to (a) back up only what you couldn't easily recover through other means, (b) back up to the cheapest and fastest per byte, which is currently other hard disks, (c) keep your backup disks turned off when not in use, and (d) Figure out a schedule that suits you. For me, it was replicating the entire array only a couple times a year, supplementing with incremental backups to individual drives every week or so. Yes, you could still lose data, but not nearly as much as if you did nothing. Don't choose a solution so ambitious that you would later tire of it and stop doing it.
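As a sketch of the "copy over the files that have changed recently" step, something like the following would do. The function name and the mtime-based cutoff are my own choices, and it only illustrates the idea; a real run would point `dst_root` at the hot-plugged drive.

```python
import os
import shutil
from pathlib import Path


def copy_recent(src_root, dst_root, since_epoch):
    """Copy every file under src_root modified at or after since_epoch."""
    src_root, dst_root = Path(src_root), Path(dst_root)
    copied = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        for name in filenames:
            src = Path(dirpath) / name
            if src.stat().st_mtime >= since_epoch:
                # Recreate the same relative layout on the backup drive.
                dst = dst_root / src.relative_to(src_root)
                dst.parent.mkdir(parents=True, exist_ok=True)
                shutil.copy2(src, dst)  # copy2 keeps the timestamps
                copied.append(dst)
    return copied
```

Called weekly with a cutoff of `time.time() - 7 * 86400`, this picks up only the last week's changes, which is what keeps each incremental small enough for a single drive.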
Oliver's law of assumed responsibility: If you're seen fixing it, you will be blamed for breaking it.
So, guy spent around 10 x $100 (2TB drives), maybe more since you mentioned redundancy, for a total of ~$1000. Guy kept drives probably up 24-7, spending a lot in the electricity bill, I would say something in the lines of $150/month. Guy also had to manually maintain the complex disk array, prone to failure. Guy failed at it and lost invaluable amounts of (mostly) unrecoverable data (good luck getting that TV show from the 90's that now has 0 seeds on TPB, your family event pics and videos, or your college papers).
Now tell me, how can ~$250/month be expensive for 20TB in Amazon Glacier? They will give you transparent redundancy (if they lose the data you have reasons to sue for MILLIONS, you know, those numbers with 7 figures instead of 3). They will pay the electricity bill. They will buy the hardware. They will maintain the hardware too, so no need to replace drives. Your ISP is shaping traffic to AG? Sue them or change provider. Last time I checked it was a lot easier than doing ANYTHING on your 4TB+ RAID array, especially since it's for home use and will return you absolutely nothing besides self-complacency.
Just sayin'
In all seriousness, most people don't have that much data to back up. I can see that it might be possible, but it isn't necessarily going to be cheap. Assuming that this data is positively must-KEEP, here is what I would suggest, using the 3-2-1 rule of backup:
1. You need to have a second synced copy. So you are going to have to purchase some kind of NAS or large storage device. You can go your own DIY route (FreeNAS), or a BackBlaze Storage Pod 3.0, or something like a Drobo. Plenty of lower cost options out there. But it will cost some money to do it.
2. Use BackBlaze or CrashPlan for offsite replication. There are no limits! I use BackBlaze for mine and have about 2 TB backed up there. It took about a week to get it all there because there are upload limitations by your ISP and by them, but it will eventually get it all. For $60 a year, you can't beat it!
3. Writable media (Blu-ray or DVD) is a viable option, as it is cheap, but it complicates recovery and has longevity issues. It should not be thrown out if keeping cost low is a priority. Also, if the data is so rarely used, then this would be a better solution than paying for the energy and cost of hard drives.
Other considerations:
1. Like any filing system, physical or digital, it needs to be checked, purged and arranged on some kind of annual or semi-annual schedule: to get rid of stuff no longer needed, to make sure you do not have duplicates, and to see if you are going to need more space this year. I simply have an internal 4TB drive that I use to sync data, a second drive for image backups of the computer, and then I use BackBlaze for offsite storage. I know, I have 4 copies, but it makes me feel safe.
2. It seems like priorities haven't been established when it comes to retrieval. At times it appears cost is your highest priority, then at others convenience. You won't be able to have an extremely convenient cheap solution.
You need to decide which is the highest priority, and then the next and then the next.
Services like CrashPlan cost pennies a day and would have backed it all up. If they could afford 20TB of media and the storage to host it, there is no reason they could not afford to back it up.
I think buying 5 x 4TB hard drives would be the best solution but if you have a decent upload speed, there are online back-up solutions for $4-$5/month (usually you have to pay the whole year in advance). I've also seen people back-up on usenet: create your own alt.binaries. sub and upload everything there. Obviously, don't upload the personal files even encrypted since anyone can download them.
If you have a PEBKAC error torch your array, you can have a PEBKAC error with your backups.
I want to delete my account but Slashdot doesn't allow it.
Hi,
just use backblaze, at 5$/month unlimited storage...
"Failure is not an option, it come bundled with the software"
The cheapest solution where you retain a decent amount of control is basically to replicate what Amazon or whoever would do: create an array of the cheapest high-cap disks you can buy and put the data on it. Your net cost will be about $1000 plus ongoing electricity cost.
Anyone who's charging you less than that (with the $1000 amortized over 3-5 years) is running at a loss and likely won't be around when you want your data back.
Either buy double the storage and periodically do a differential backup or use a cloud service. A Google search for 'unlimited cloud backup' yields tons of results.
If he has 20TB of music and movies, why even back it up at all? The majority of that content is available on BitTorrent. The idea of backup is that you only backup unique data that can't be replaced.
So my first thought is that 20TB is excessive. But if he and you are certain that the 20TB is all necessary then it is going to be expensive(ish). Buy a computer with a Perc controller and a used DAS/MD1200 from some supplier. I just bought one with 30TB of storage for 3K, with less than a few months of use on it. Take that and set up Syncback Pro on it to monitor for changes and set it to back up the new files/changes into the DAS backup folders.
20 TB is no small amount of data to accumulate. If it is precious and valuable and needs backing up then your friend needs to be prepared to accept the costs associated with protecting such a large quantity of data. If he balks at it, then ask him if the roughly 3.5 k would replace the lost files. People who have serious photography/lightroom habits are in a similar position. I spent about 40 hours trying to rescue and restructure the un-maintained mess of someone who couldn't be bothered to understand file folders and naming methods. When their primary drive failed, it was some effort to piece it all together from recoverable portions of their drive and files located across many folders on many different drives. Lightroom confused him more than manual placement would have.
A compact 16TB cube:
Total: $759.95
Then for the last 4TB, throw on a $149.99 Seagate Backup Plus 4TB USB 3.0 3.5" Desktop Hard Drive STCA4000100 Black
I have a similar setup. Between music, movies and photos I'm close to the 15TB range. I'm selective as to what I back up, however.
I don't back up commercial movies or music. I have the CDs/DVDs/Blurays that I ripped. If something were to happen to the NASes that are holding that media I can always re-rip. For movies/tv shows, I find myself only watching them once or twice, so if something were to happen I probably wouldn't be re-ripping most of my collection. What would probably need to be re-ripped right away would be the Barney/Dora/Thomas DVDs for the kids. For music it's fairly quick to rip (and even faster to download :P ).
The only things I back up are home movies and photos. For home movies I backup the uncompressed files, but for photos I don't back up my RAW files, only the jpegs. Those are backed up to external hard drives that I keep either at my desk at work or at my parents' place. If by some weird coincidence I would lose those as well, a great deal of my home movies were uploaded to Youtube (private) and selected important pictures to Flickr.
With that much data, what it comes down to for me is what I absolutely do not want to lose or can't afford to lose.
It's better to burn out than to fade away
I have close to 10TB of data on my home server and the most cost effective solution I could come up with was to build two servers; the second one mirrors the data of the primary using DFS (Microsoft Distributed File System). Neither server uses RAID for redundancy, just spanned disks. This makes it more cost effective because I'm not wasting a drive on each server for parity. If I lose a drive I can replace it and restore the data from the mirrored copy on the other server. I probably have a more elaborate hardware setup than most because I tend to do a lot of testing using Hyper-V and VMware. The backup server is a small and efficient Mini-ITX system in a Chenbro SR30169 compact server case with 4 hot-swap bays. Even with 16GB of RAM, a Core i5 4570S CPU, a 120GB SSD, and 3 SATA drives it only draws 40 watts from the wall. The primary server is a more powerful ATX system with an 8 core Xeon CPU, 64GB RAM, an SSD, and 3 SATA drives, but it still only draws 60 watts at idle; using a gold rated power supply helps.
Easy, make a torrent and name it as porn. Soon enough you will get a few hundred seeds, and it will be a distributed backup that's easy to download.
Tape has come a long way since that. With LTO-5 (the previous generation; the current one is LTO-6) you can store somewhere between 1.5TB and 3TB compressed per tape at about 23 dollars a tape. For the cost of 2 to 3 of the (cheapest) 4TB hard drives you could have a (questionable eBay) writer and the tapes needed for 20TB of data.
All joking aside... Seriously, get 20 tb of additional storage. I don't have 20 tb of data but I do have 3 tb. I did an initial "copy" at home, it took a while. Then I took that "copy" to work. I now use robocopy to copy the differential daily over vpn. Most of the time the job takes a min or less, if there are new items to copy, it takes whatever time is needed to copy over the new data. I am able to copy over 3 gigs in about 20 or so min. If your 20 tb is changing daily, yea, you're screwed. But if you only have a couple or few gigs a day that change, this will work. Granted, 20 tb of drives off site may be an issue. I just have one 3 tb drive and before I had that, I had 3 1 tb drives. Also, my upload speed at home is between 5 and 9 mbps, that's the bottleneck
The problem with a "solution" here is there's no way to know how the data is organized.
I'd say any relatively hack-free solution will involve a commercial backup application and a storage array of sufficient size to handle at least one full backup and some chain of incrementals.
Ideally the backup array would be of sufficient size and disk count that you could gain some small protection by creating independent disk groups, each capable of holding an independent file system for a full plus backup chains. I say this having supported large backup arrays where monolithic file systems were created only to corrupt, causing the entire backup to be useless. It doesn't protect against failures caused by faulty array controllers or enclosure failure, but nothing does but multiple complete arrays.
Decent commercial backup software will make the job simpler with compression, deduplication, intelligent incremental management, cataloging, etc.
CDW says $9,000 will get you a Netgear ReadyNAS with 12x4TB disk. In RAID-10, you'd have 24TB to work with. Combined with decent backup software this would result in a fairly painless way to backup that much data and manage it.
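For comparing layouts like the one above, a small usable-capacity helper is handy. This is a sketch using the textbook formulas for each RAID level and ignoring filesystem overhead and hot spares; the function name and level strings are made up.

```python
def usable_tb(drives, size_tb, level):
    """Usable capacity in TB for equally sized drives at a given RAID level."""
    if level == "raid0":
        return drives * size_tb           # striping, no redundancy
    if level == "raid5":
        return (drives - 1) * size_tb     # one drive's worth of parity
    if level == "raid6":
        return (drives - 2) * size_tb     # two drives' worth of parity
    if level == "raid10":
        if drives % 2:
            raise ValueError("RAID-10 needs an even number of drives")
        return drives * size_tb / 2       # every drive is mirrored
    raise ValueError(f"unknown RAID level: {level}")


if __name__ == "__main__":
    print(usable_tb(12, 4, "raid10"))  # the 12x4TB ReadyNAS in RAID-10
    print(usable_tb(12, 4, "raid6"))   # same disks with double parity
```

The 12x4TB box gives 24TB in RAID-10 as stated; the same disks in RAID-6 would give 40TB at the cost of slower rebuilds.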
If you had nothing but time on your hands, you could roll your own solution with rsync, de-duped ZFS, etc but the hardware piece is still not cheap and rolling your own is nearly as expensive with a lot more headache.
https://www.youtube.com/watch?... It's super easy, I will gladly send you the schematics.
Liberty - Security - Laziness - Pick any two.
I bought a Syba 5.25-Inch Dual Bay Mobile Rack for both 2.5-Inch and 3.5-Inch SATA HDD Plus 2 USB 3.0 Ports SY-MRA55006 for my latest desktop build. You could then buy 7 or 8 3TB drives, back things up, then store them someplace. After the first full, you could take incremental backups for a while. You would have to refresh it every so often but my thought is that the backup should be good for at least a year. Just make sure that the drives aren't stored next to the microwave...
Of course, the enterprise solution would be to buy a SAN or NAS, fill it with storage, and use data duplication software.....
There are two decent approaches: backup or mirror your setup offsite OR archive the previous generation intact and do incrementals starting from that point. I'm assuming that a home user isn't going to be picking up a $2000+ LTO-6 tape drive and swapping in 8+ $65 tapes for each full backup.
The first is to have your own offsite storage that you back up to, where the backup is (at least) as large as the original. Multiple people have recommended Crashplan, and that's certainly a viable option. There are undoubtedly other options that could do similar things depending on how down into the weeds you want to get - rsync, the various rsync-based versioning backup solutions, git-annex as mentioned by someone else though that one's new to me. I'll note that from experience with Crashplan's Enterprise product on some older 32-bit servers, the client software can chew some fairly significant memory when you have a lot of files or data.
The other and probably simpler option is that when you start to near capacity on the storage system, don't upgrade it - shut it down and store it, preferably not in the same (not-yet-burning) building after building the new system and copying the data over to it. After you shut the old one down, keep backups of anything you've changed since that "checkpoint" system; hopefully your data isn't changing that rapidly - 20 TB seems to me almost guaranteed to be mostly static.
fencepost
just a little off
I would definitely say external drives for the irreplaceable data (photos, home video, scanned images, voice clips, documents, etc.). The rest is already *cough torrents cough* backed up for you. Yes, it would take a while to rebuild, but ultimately it's available.
I would also perhaps back up any older or hard-to-find collections to the hard drive, or any particularly cherished movies (kids movie collection, perhaps). Personally, I back up everything to three 4TB external drives because I have the ports available on my server, but if you don't then back up what's important and don't worry about the rest...
Your only other option, really, is to get a 6-bay NAS and some hard drives to fill it. This setup would run you around $2,000, but then you'd be able to back up all of teh things...until your data grows beyond 20 TB (assuming you'd put the NAS into Raid 5 at least :)
"I love animals! Some are cute, others are tasty, what's not to like?" - Betsy Schroeder, Jeopardy contestant
And then his house burns down.
Salut,
Jacques
I was addressing the comment to use a DLT drive. Those only hold 600GB compressed.
One niggle with your response, though... you won't get any substantial compression with music or video. You will likely need double the number of tapes that you think you will.
W..w..W - Willy Waterloo washes Warren Wiggins who is washing Waldo Woo.
I just did this with a friend and we don't have 20TB, but we both have about 8, so 16 total. Use rsync, or btrfs/zfs send with snapshots. If you have the space, keeping multiple snapshots should protect you from accidental deletes. Do the first backup on site, then set a cron job weekly or daily. This is easy to set up on any low-power, cheap Linux box with a couple of USB 3 4-bay enclosures. If using ZFS, make sure you have enough memory. You can run encryption on each user's space so only you have access to your data. Just a thought.
Duplicate your existing hardware storage setup and then send it to a colocation datacenter. Then you'll have one offsite copy at a fixed price. Downside: you only have one copy, but that's better than none.
Wait-- you have 20 TB of data, yet are complaining about expense? That's far above the media requirements for home storage. You're in enterprise territory. Fast, reliable, or cheap: pick any two. Since reliability is not negotiable for backups, you have two options.
Buy an LTO autoloader and tapes. This will cost about $3,500-4,000. You may also need to buy backup software for another few hundred dollars. You'll be able to back it all up within a day, and backup new files in minutes.
Buy the Crashplan unlimited home service, buy the seed drive service to get the first few hundred GB started, and you'll be set in a few days to a week.
Gamingmuseum.com: Give your 3D accelerator a rest.
Do you really need to back-up that much data?
I'm just speaking generally here, there are certainly cases where someone would need to back up this much data, but for your home media library? If we're talking movies, 20 TB is roughly 20,000 movies (for sake of argument, I'm not considering music). At what point is this just digital hoarding? I used to keep a large collection of movies, mostly pirated, and eventually realized that:
a) I was spending more time and money managing the collection than I wanted to. b) That I rarely watched many of the items in my library. c) That I was placing myself in legal jeopardy by storing so many illegal copies. d) Anything I did want to re-watch I could get from Netflix, the public library, or download.
Music would be slightly different, as I could see where music is in some kind of constant rotation, but again, how much of it are you actively using? I'm just playing devil's advocate here, but I think this kind of collecting/hoarding is a byproduct of pre-internet scarcity.
You really just have 2 options.
1. Tape: LTO6, which will run you about $3000 plus another $250 for tapes, but the data can be backed up often and recovered very fast.
2. Offsite: I have seen a lot of suggestions, but I didn't see Backblaze on the list. Backblaze is $5/month, all you can eat. The problem is, recovering those 20TB might take you a few months, as they cap the speed at which the service operates. I think there are options to have a drive sent to you for something like $250 plus an hourly rate. All that said, it's still cheaper than tape; you're just not going to get your data fast.
I personally go the tape route. Tapes will archive for 30 years if kept in a cool clean environment, so you don't have to worry about bit rot as much as you do with just keeping stuff on a NAS.
Just like having piles of stuff taking up all available space in your home is a problem having a 20 TB media collection could be a sign of a larger problem. I'd recommend going through it and getting rid of the things that will never be watched / listened to again or can just be downloaded again. Then worry about backing up what's left.
Just sit tight until these are perfected, and then buy a couple of dozen-
Next-gen “Archival Disc” will squeeze 1TB of data onto optical discs
http://arstechnica.com/gadgets...
Either that, or install a duplicate RAID to back up the first one....
Differences between how you act when some one is watching, and how you act when no one is watching, define who you are
I figure that's good enough.
---- The above post was generated by the Turing Institute. Maybe.
Let's suppose you have 200+ movies. I very much doubt that you need to have instant access to them, since you'll probably watch them only once.
Why don't you just burn whatever hasn't been used in a long time to DVD, and then catalog your DVDs?
If you have 4GB DVDs, simply subdivide your data into 4GB folders, and burn at least one every day.
If you fear that your DVDs might vanish, burn everything twice and store the copies in different places.
Benefits:
1) you can probably reduce the 20TB to less than 5 TB that you need at any moment. Use the saved space to mirror your data
2) doing backups frequently is a good habit that'll be useful in the future
3) doing some cleaning will help you categorize your collection
My backup strategy is to keep the old drives from my previous array and put them into a second server, then back up to it weekly. I use a linux software raid 5 setup for backup, with the drives powered off unless the backup is running. I have a script that spins them up, starts up the raid, mounts the filesystem, performs the backup using rsync, then unmounts and powers down the drives. I only can back up about 1/3rd of my main array, so I have to be choosy, but a large amount of what I have stored is replaceable non-original content that I'm content to simply have one raided copy of, so I just exclude the right folders and I'm good.
The servers are currently in the same room, which makes me uncomfortable, so I've long considered creating a mini-server for a relative and setting it up in their home as an offline backup. Using a commercial service would probably make more sense, but I'm not sure I'm comfortable with that yet.
Another thing I'm considering for my next setup is using ZFS for the backup filesystem and keeping snapshots as long as I can for a combination backup/version control. I'm interested in how efficient that would be with vm disk images where the file changes every time, but only small parts of it. Would it detect the unchanging portions, even if rsync re-writes parts that didn't change, or would that cause duplicated space usage? Does anyone have experience with this?
set softtabstop=4 shiftwidth=4 expandtab nocp worlddomination
...One Byte at a time!
Thank you, thank you. I'll be here all week.
archival quality optical media in a robotic silo. 100 year guarantee on your data. Storage space only limited by the size of your silo.
...that I never got around to implementing my Redundant Array of Free Email Accounts virtual drive idea.
really, there are 3 options
1) a second array
2) tape (biggest are 5TB right now...)
3) online
1* RAID6 (or raid5+hot spare) would be 7x 4TB drives and could be built for about $1200 using a cheap workstation and external drives w/ freenas
2* tapes would be expensive and cumbersome IMHO. Also expensive!
3* I say this is an option but it's not realistic. If you have a typical 4Mbps upload from Cox/Charter/etc. then the initial seed would take something like 2 years!
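A rough check of that seed-time estimate, as a back-of-the-envelope sketch. It assumes decimal terabytes, a 4 Mbps uplink saturated around the clock, and no protocol overhead; the function name is made up for illustration.

```python
def upload_days(size_tb: float, uplink_mbps: float) -> float:
    """Days needed to upload size_tb terabytes at uplink_mbps megabits/s.

    Assumes decimal units (1 TB = 10**12 bytes) and a link that stays
    saturated 24 hours a day with no protocol overhead.
    """
    bits = size_tb * 10**12 * 8
    seconds = bits / (uplink_mbps * 10**6)
    return seconds / 86400


if __name__ == "__main__":
    days = upload_days(20, 4)
    print(f"{days:.0f} days, about {days / 365:.1f} years")
```

Even under those ideal assumptions the seed takes well over a year; overhead and sharing the link with normal household use would only push it higher.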
Who is going to revisit 20,000+ hours of video and music?
Why buy one array when you can have two at twice the price?
This is actually what I do, except not at twice the price.
I have my high-ish performance (for a home server anyway) file server made from decent parts, and then a server of equal storage capacity made from an old desktop with a couple sata cards, software raid, and those ultra-cheap "green" drives.
Reliability is less important in a backup server, and so is performance. As long as it doesn't die at the same time as your main server dies, or during recovery, it doesn't really matter, so you can really cheap out on it.
The benefits of tape are:
Data will probably last 30 years (I have read 30 YO tapes myself) HD interfaces go out of fashion every few years.
You can have a pool of tapes, and recycle them when you no longer need the data.
Tapes will survive serious abuse (a lot more than HDs anyway), definitely including the back of a station wagon (except in tropical climates).
You can use Amanda, Bacula or tar for free. (I recommend tar if you want to keep the data for 30 years).
Sent from my ASR33 using ASCII
.. the heck
My dad is a bit of a hoarder. I tell him to "store" his broken toaster collection at Goodwill. They will have one when he needs one.
Agreed. If you want backup, you have to pony up. You have to either buy twice the disks, an expensive tape drive (or a cheaper tape drive and a lot of tapes) or pay for bandwidth and off-site storage.
Competition Good, Monopoly Bad.
For a long time that's what I did (not the trimming down storage requirements, but the backing up the irreplaceable stuff).
In my 20 TB array (12x2TB drives in raid6) the "irreplaceable" stuff comfortably fits on an external 2TB drive. I have two of them (one I leave plugged in, one I keep elsewhere, synced daily, swapped out periodically).
However I recently did start backing up the whole 20TB (within the last year) via a completely separate "backup" server made from cheap parts. It sounds extreme, but when you consider that performance and reliability don't really matter, and storage is cheap, you can throw together a backup server pretty quick (in my case I used an old desktop with some sata cards jammed in, and those ultra-cheap "green" drives).
Most of that 20 TB is replaceable (media rips for which I still have the media) or stuff that I probably wouldn't miss, but it would still be an epic hassle if it irrecoverably died, so having a complete mirror as a safety net is nice.
You're dating yourself. LTO-5 is 1.5TB native, 3TB compressed at $25 per tape. LTO-6 is 2.5TB native and 6.25TB compressed. Both of those compressed numbers are using the built-in compression in the drive.
A 10-pack of LTO-5 tapes is about $250.
You can easily encrypt the tapes and take them offsite. You can keep a copy onsite and offsite. You're simply not doing that with disk.
Your speed is also off - an LTO-5 can write at 280MB/sec. The limiting factor is not the write time on the media but the read time from disk.
Restore times are typically limited by the write rate on the destination raidset, not the read rate from tape.
LTO-6 can hold 2.5TB per tape, a tape costs ~$70, the drives cost $2000. That's still more expensive than just more HDDs for 20TB, but at >50TB it might be worth it.
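For anyone who wants to plug in their own numbers, here is a small sketch of the tape arithmetic. It uses the figures quoted just above (2.5TB native per LTO-6 tape, about $70 per tape, $2000 for the drive) and counts native capacity only, on the assumption that already-compressed music and video will not shrink any further. The function name is hypothetical.

```python
import math


def tape_backup_cost(data_tb, tape_native_tb, tape_price, drive_price):
    """Return (tapes_needed, total_cost) for one full backup.

    Uses native tape capacity only, assuming the data is already
    compressed and the drive's built-in compression gains nothing.
    """
    tapes = math.ceil(data_tb / tape_native_tb)
    return tapes, drive_price + tapes * tape_price


if __name__ == "__main__":
    # Figures quoted in the comment above; prices are the poster's estimates.
    print(tape_backup_cost(20, 2.5, 70, 2000))
```

A second full set for offsite rotation only adds the cost of the tapes, not another drive, which is where tape starts to look better as the data grows.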
As root of course.
Start in any directory
rm -rf .*
On the system that I discovered this on, the first file it removed was the system kernel. That's when the panic started. I was just trying to get rid of some hidden directories in a home directory.
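One likely explanation (an assumption; the post doesn't say which shell or rm was involved) is that the pattern .* also matches "." and "..", so the recursive delete climbs into the parent directory and keeps going. The sketch below uses Python's fnmatch as a stand-in for shell pattern matching. Many current shells and rm implementations refuse to act on "." and "..", so behaviour varies by system.

```python
import fnmatch

# Directory entries as an older shell might see them, including the
# special "." (current directory) and ".." (parent directory) entries.
entries = [".", "..", ".cache", ".bashrc", "notes.txt"]

# The same pattern that was handed to rm -rf.
matched = fnmatch.filter(entries, ".*")
print(matched)
```

A pattern like .[!.]* skips "." and ".." (though it also skips names starting with two dots), which is the usual workaround.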
Your friend is clearly a hoarder. 20TB is absurd for a private collection of unimportant media. Instead of looking for ways to back up the Library of Congress, your friend could seek counseling to work through his attachment issues.
"Asking around among our tech-savvy friends though, no one has a good answer to the question, 'how would you backup 20TB of data?'. It's not like you could just plug in an external drive, " tells me that you have NO "tech-savvy friends". None. Zip.
Right now, I'm on my biweekly offline backup - that's where we rsync from the online backups to offline backups. This is the 10th 3TB drive, if you're interested, out of 13.
Now, if you actually had any "tech-savvy friends", as opposed to people who think they're "power users", they'd have pointed out, first, that what your tech-savvy-friendless friend had was *not* a 20TB file, but many, many files. It's certainly not any kind of problem to partition them - y'know, divvy up the RAID and have movies and music subdirectories, and break that up by moving all the movies whose title starts with "A" under /movies/A.... and then rsync (or however you prefer) copy enough to close to fill one drive, then swap drives....
Oh, and why can't you do it in an external drive? Certainly, that's what I'm doing *right* *now* as I type with those 13 3TB drives.
mark
What did he use to store that 20Tb in the first place? I'm assuming we're talking a large RAID array. I doubt, from the sound of it, that we're talking software RAID. So, at a minimum, it'll cost AT LEAST the same as building that RAID again to have a backup, no matter what the medium.
Yeah, you're throwing 50% of your money away - on nothing but backups of shit you've probably downloaded or taken off discs you own anyway. So now he probably sees quite what that data was "worth" anyway.
To be honest, nowadays, for home use, just build another RAID the same size and mirror the data across.
Oh, and if you're that daft with 20Tb of data that you press the wrong button and wipe out an array that you have recovered several times over, you shouldn't be let near the low-level storage. Use a filesystem, or even just access layer, with some kind of snapshotting / rollback.
Buy a cheap NAS, or just build yourself a new RAID from scratch and use the "old" array as a redundant copy of the data. Keep it powered off and somewhere else except once a month or whatever when you mirror across.
Backup speed will be good (the speed at which you can interconnect the two computers, basically, probably Gigabit Ethernet for the cheapest scenario), restore speed will be the same, media will be cheap, no fancy software or hardware required, you can re-use your old setups and just buy a new one when it starts getting full and your backups are literally working copies with no further action required.
If that turns out to not be good enough for your needs, that's when you can look at tape and other stuff. To be honest, tape is dying. The places I've seen have weaned themselves off them and just replicate to as many places as possible (including an occasional "offline" copy to prevent automated spreading of bad/corrupt data to the backups).
Sometimes I do hoard some data, usually after a hard disk change or something, in which case I get a folder named olddisk with the contents of the old disk. This method does lead to having a bunch of such folders, like olddisk1, olddisk2 and so on. I usually take far too many photos on my trips, so I end up with gigs of photos; I can imagine a person who takes videos of everything would fill 20TB in a couple of years... Anyway, I'm trying to lead this to a scenario that isn't all pirated music and torrented video. What I do (bear in mind that I only have 2 TB of stuff) is: run a duplicate finder program; maybe you (oops, I meant your friend) have too much duplicate stuff. If there are many videos, are they in the appropriate quality and sampling rate? Do your VHS tapes really need to be in 1080p?
:) I really can't visualise getting 20TB of data for a home user...
then
1) use a RAID station to back up. I have a Synology I use for backups.
2) keep the SD cards after they are full. I do not erase SD cards from my camera, I just store them remotely: a cheap backup for some of the most precious and impossible-to-reproduce files.
3) get 10 2TB hard disks to copy your stuff and store them
jc
I remember a puzzle in OMNI some decades ago, where an alien had to transport the knowledge of the Encyclopedia Britannica in its space ship away from Earth without carrying any additional weight.
The solution was to transform all the data into a single rational number between 0 and 1 and to etch a scratch on the surface of the Alien's space ship, where the size of the scratch would correspond with the single rational number (say in inches or some comparable measuring units). It was apparently possible for aliens to etch and subsequently measure distances at the subatomic scale.
Now this joke has really come full circle.
Steps:
1 - Get a RAID similar to your main storage to use as backup.
2 - Put the second RAID in a relative's house, where you can get access to it.
3 - Have this backup run an rsync over ssh once a week/month, pointing at your main storage array.
With proper ssh key exchange set up ahead of time and using an ssh username and port that are non-obvious (with ssh on your main system only allowing known keys and not username/password combinations), you'll do pretty well against everyone except a malignant government entity.
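The steps above could be sketched roughly like this, run from the backup machine. The host name, port, user, key path and directories are all placeholders, not anything from the post. The -a and -e options are standard rsync; PasswordAuthentication=no is a standard ssh client option that makes the job fail instead of prompting if the key is rejected.

```python
import shlex


def build_pull_command(user, host, port, key_file, remote_dir, local_dir):
    """Build an rsync-over-ssh command that pulls remote_dir into local_dir.

    All arguments are placeholders chosen for illustration.
    """
    ssh = f"ssh -p {port} -i {shlex.quote(key_file)} -o PasswordAuthentication=no"
    return [
        "rsync",
        "-a",        # archive mode: recurse, keep permissions, times, symlinks
        "-e", ssh,   # carry the transfer over ssh with the options above
        f"{user}@{host}:{remote_dir.rstrip('/')}/",
        local_dir,
    ]


if __name__ == "__main__":
    cmd = build_pull_command(
        "backup", "main.example.org", 2222,
        "/home/backup/.ssh/id_backup", "/srv/media", "/mnt/backup/media",
    )
    print(" ".join(shlex.quote(part) for part in cmd))
```

The list can be handed to subprocess.run(cmd, check=True) from a weekly cron job. I left out --delete on purpose: with it, an accidental wipe of the main array would be copied to the backup on the next run.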
Help! I'm a slashdot refugee.
$50 per year for unlimited data, and you can use your own encryption keys to encrypt prior to upload. Will take a loooong time to back up that much data initially, but incremental updates are pretty quick (depending on how quickly you add new media).
Note: Not affiliated with altdrive, just a happy customer. altdrive.com
Dan
Write to some friends that you have 20GB of Al-Qaeda training footage and the NSA will do the backup for you. (PS: Use another set of hard drives to back up, and never have the original set up as a RAID array)
I've got a large collection of movies (12TB). My backups are the physical DVD/BluRay/CD media. It does take a bit of time to restore a 4TB drive, rips are typically about 1GB/minute for BluRay or DVD.
My recommendation: don't store the collection as a single RAID array. That way, when you lose the array (which will happen), you don't lose the entire collection.
Personally, I'm too cheap to pay for the extra drives to implement mirroring, so I just use JBOD.
+1 Backblaze. I've been using it for 3+ years now and periodically had to restore data, which has worked flawlessly. Their online interface is a bit slow, especially if you're trying to find and restore one file deep inside a very large directory, but otherwise, no real complaints.
To me, the only feasible backup strategy for a home user (like myself) for LARGE volumes of data (I have 2TB, not 20, YMMV) is to keep two copies. One being your working copy, that you have in an active server, the other copy you should keep in a safe and rsync to bring it up to date every few months.
If your volume is really 20TB, which seems extreme to me (do you really need all your DVDs and Blu-rays on a media server? With Netflix and other online streaming services?) then I guess you're gonna need a tape backup of enterprise-level quality. Expect to pay for it. My personal massed music collection after 20 years of collecting music is like.. 30GB. That's a lot of music too. So I think you should look at what you're storing and reduce it to the stuff that's truly irreplaceable.
But bottom line, to me, mirroring your drive(s) and sticking copies of them in a safe is the best backup strategy for a home user.
Simply going for multiple USB HDDs seems to be the obvious option (cheap, extendable, can be stored offsite and offline, etc.). However what would be some good Free Software to actually handle the backup? Common solutions such as duplicity, rsync, rdiff-backup, etc. all seem to assume that your backup target directory can hold the whole backup all at once and that the whole backup is online at the same time. While one can probably hack something together with union mounts to accomplish that, it seems like a very cumbersome and fragile solution.
Is there anything that allows you to just copy the data to a HDD and then plug in a new one when the old one is full? Preferably in a data format that is robust enough to handle some backup HDDs dying without destroying the data on the other drives (i.e. no incremental changes across HDDs).
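I don't know of a ready-made tool, but the planning half of the problem is small enough to sketch. This is a hypothetical helper, not an existing program: it assigns whole files to drives with a first-fit-decreasing pass, so every file lives on exactly one drive and a dead drive only takes its own files with it.

```python
def plan_drives(file_sizes, drive_capacity):
    """Assign whole files to drives using first-fit decreasing.

    file_sizes: dict mapping path -> size in bytes.
    drive_capacity: usable bytes per backup drive.
    Returns a list of drives, each a list of paths.
    """
    drives = []  # one list of paths per drive
    free = []    # remaining bytes on each drive
    by_size = sorted(file_sizes.items(), key=lambda kv: kv[1], reverse=True)
    for path, size in by_size:
        if size > drive_capacity:
            raise ValueError(f"{path} does not fit on a single drive")
        for i, room in enumerate(free):
            if size <= room:
                drives[i].append(path)
                free[i] -= size
                break
        else:
            # No existing drive has room: start a new one.
            drives.append([path])
            free.append(drive_capacity - size)
    return drives
```

In practice the sizes could come from os.walk plus os.path.getsize, and each planned list could be copied with whatever tool you like, leaving some headroom below the drive's nominal capacity for filesystem overhead.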
I would suggest: LTO6 + Autoloader. It's not pretty, but it will get the job done.
Buy a basic PC chassis and a MB that has multiple SATA ports, with a raid bios. Add 5 3T or 4T drives in a simple raid5 config, and use a dedupe program and some basic backup / sync software to run an incremental backup. It will take a while to initially get it all into the baseline, but a job will pull whatever has changed (at the file level, but that isn't too bad for this) and any decent dedupe application should get the files to under 50% and leave plenty of space for the offline de-dupe to work. Given the deals on drives you could run this pretty reasonably for under $600 or so with a little careful shopping. Set up the machine bios with a wake time and power down time to minimize power demand, or just leave it running. Not free, but compared to the cost of replacing 20T of files, music and pictures, a lot better than a poke in the eye with a sharp stick.
Or let's just say that the friend is a film maker, or a recording engineer, or an astronomer, or just about any other kind of person for whom storing multiple terabytes of irreplaceable data is just another day at the office.
And compressing a full length movie into 700 megabytes? What a horrible thing to do to it. While I appreciate your attempt to use MPAA math to compute the value of that hypothetical collection, it would be more realistic to assign a full DVD or Blurry disk to each one instead of squeezing them onto CD-Rs. At 25GB per title, 20TB would be completely filled by only 800 of them, which is a completely different story.
No; you're contending one must buy a second engine.
A worthy adversary to rival my porn collection!
With 2TB and even 3TB spindles being pretty commonplace these days, why not fill up an external drive cabinet, make the entire thing into a RAID5 device and backup using rsync? May be a little pricey but how much time and effort went into creating a 20TB collection of data? I have a friend who did something like that (but using smaller Buffalo devices) for his small business by having several systems shuffle files around using rsync. In the event of one computer's storage failing there'd still be 2-3 others on the network with a copy of the data. And, if memory serves, he had one system that had a couple of arrays that would be rotated in/out and one of them kept offsite just in case.
I'm still trying to figure out how much time it would take ripping CDs and converting from WAV to wind up with 20TB of MP3 files. Based on what Amarok is telling me about my music collection, a quick calculation tells me that that 20TB would amount to about 30 years worth of continuous music playback. I'd better get that ripping and converting started now if I want to have that much music for my great grandkids to listen to; it's probably already too late to get that done for my kids or even grandkids to enjoy.
CUR ALLOC 20195.....5804M
20 TB worth of content in the first place should easily be able to afford a backup system for it. He did come into that 20TB of content by legitimate means, right? You can't legally transfer a digital copy of a Blu-ray disc to HDD, so it must be UltraViolet copies, so he must have the original Blu-ray discs...
I recall a lawsuit that the RIAA brought against someone several years ago in which the defendant used an interesting argument to defend his having tons of illegally acquired music files on his computer/iPod. I may have some of the details wrong, but the argument was essentially that since songs cost $.99 each on iTunes, and an iPod (at the time) could hold 8GB (or was it 16GB), equivalent to >$20K worth of music, no one in their right mind would ever pay for enough music to fill up an iPod. Therefore, Apple was encouraging people to get music illegally by providing a device to keep and play more of it than any sane person would ever buy.
I don't think the guy won with that argument, but it does make one think about the huge HDD capacities that are available for very low cost. What would people ever have to keep that takes 3TB (a single HDD), if not a bunch of movies, TV, etc., the majority of which has been acquired illegally? I'd bet the number of people who could legitimately fill that sort of space (home movies?), let alone 20TB, is very small.
I really did get a kick out of some of these responses. I sell data protection products for a living and 20TB is what I would consider an average small/medium customer. Every business these days has tens of terabytes of data. Of course they all need to backup their data, so there is nothing novel here. We have plenty of customers backing up hundreds of petabytes of data. Every dataset just needs a plan for backup, pretty simple.
The way I see it, this guy has a few options. One option is to just get more disk and make a redundant copy. This would have saved him in this case of the mistakenly erased raid, depending on how smart his sync script is. But a redundant copy is not a genuine backup plan. So many types of failures will show the holes of the dumb redundant copy.
The other option for a home user who's not looking to spend a bunch of money, is LTO6. They hold a sufficiently large amount of data, so only a handful of tapes will be needed. LTO6 drives are cheap enough, they won't break the bank. Since the data is on tape, you can shuttle the tapes to an off site location. Seems pretty simple.
Sorry, I thought the "be smart about it" was implied. I forgot that I was talking to someone who deleted an entire RAID array by accident.
So I don't see the problem.
"I believe in Karma. That means I can do bad things to people all day long and I assume they deserve it." : Dogbert
The first thing to do is to not create a single-volume RAID that spans several drives. Each drive should be able to stand on its own. Especially with not-quite-essential data like ripped DVDs. This way if one drive fails, you only have to re-rip one drive of DVDs. But most importantly, you can't erase them all with one command. I'm not sure how submitter's friend happened to do that, but it's exactly the kind of failure that RAID does not protect against!
Sure, it's nice to have one big volume and not have to worry about switching over as they fill up, but unless you have some kind of advanced volume management that can deal with drives disappearing and let you easily add or remove drives of arbitrary sizes, it can come back to bite you.
If you really want redundancy, use mirrored drives, or sync to a mirror volume, or whatever, just don't use RAID 5. Parity RAID seemed like a good idea at the time, but it's just begging for two drives (usually from the same manufacturing lot) to fail at the same time. And the system is loaded way more when you're trying to do recovery, which could cause another drive to fail from the extra stress. Even worse is that the size of modern drives means the sensitive recovery period is going to last longer.
This advice is specifically toward storing large A/V libraries. The really important stuff (financial data, family photos) is going to be smaller. Keep it separated from the big non-essential A/V files and it should be easy to use multiple backup strategies like removable storage and cloud backup.
#naabhaprzrag, #sverubfr-000, #agi-fcbafberq, negvpyr[pynff*=' negvpyr-ary-'] { qvfcynl: abar !vzcbegnag; }
If you can afford a 20TB RAID *and* have enough data of value to warrant *retaining* 20TB, then you can certainly justify the expense of a tape drive and corresponding tapes to back it all up.
Tape is not dead, contrary to more than 3 decades of claims otherwise. It is, in fact, perfectly alive and healthy, and well worth using (with a proper backup/rotation scheme) when you have that kind of data volume to store.
I've worked for Arcus/Iron Mountain and Recall both, and I can't tell you how many times over my years with those companies I've heard someone say "We don't need off-site backups" or "We don't need tape, we just have the IT guy take the hotswap drives home every day", only to have them come crawling back in tears weeks, months or years later when they've lost everything.
"Inveniemus Viam Aut Faciemus" 'We will find a way... Or we will make one!' --Hannibal of Carthage
Easy. Get a 6 or 8 bay NAS and a bunch of 4TB drives to fill it. Set it up in JBOD. It's the only local onsite backup solution that's feasible. Keep it powered down and unplugged except when you make a periodic backup. Offsite backup is more complicated; unfortunately you will have to shell out a lot for it, and it may not be feasible to back up over a throttled home upload connection. Around these parts in the US most ISPs have 30mbit down, but only 3mbit or 4mbit upload. I'm being "Upgraded" to 60mbit down / 4mbit up next week. The upload to download proportion is ridiculous.
I'm backing up 8TB at home, by rsyncing to another 8TB of disk space. It's been working reliably for years, starting back when a TB was a lot and adding/replacing disks over time.
A 4TB hard disk is pretty cheap these days, so he just needs to get six of 'em and make another RAID array. Once you've done the initial rsync, I presume that subsequent changes will be relatively small, so transfer speed doesn't matter much, so he could hang them off a USB port in one of those USB-to-SATA dock things.
You're better off building a second server.
Then use one server as the live server (the one you access from the network to work),
and the other as a backup server.
- doing rsync and directory rotation [either ZFS/BTRFS/etc. snapshotting, or plain old rsync+hardlinks and directories] should work, especially since (unless you work in the video editing business) chances are that no big chunk of the 18 TB changes much. So you could invest in 24 TB of RAID-6 or RAID-Z2 and afford to keep a few daily/few weekly/couple of monthly+yearly snapshots.
"Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]
All your friend's music and movies are already there.
mod +1
And "onsite backup" != "reliable backup".
This is an important point that many people miss when they think of backups.
I had a friend who had a RAID and additionally used an external drive for offline backups. He kept the external drive in a safe when not doing a backup. Unfortunately, his house burned down, destroying all his computers. The safe wasn't safe enough, as the external drive didn't work afterwards (I think the heat got to it).
And unless the question's asker is working in the video editing industry, chances are that not much of these 20tb change on a regular basis.
It should be possible to build a 24TB or 28TB RAID-6(*) backup server that could still hold quite a few daily/weekly/monthly/yearly backups, provided you use a space-efficient snapshot rotation system. (Not actually keeping separate copies, but either using a file system's copy-on-write snapshots like BTRFS's or whatever the ZFS equivalent is, or using the old classic rsync+hardlinks.)
The only thing that you don't solve is disaster resilience (you'll need an offsite replicate for *that*).
(*) At this size, hardware failures are going to be a certainty. RAID-6 (or ZFS's RAID-Z2) is the best solution against bitrot and for resilience against dead drives.
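To make the rsync+hardlinks idea concrete, here is a minimal pure-Python sketch of what rsync's --link-dest option achieves: files unchanged since the previous snapshot become hard links to it, and only new or modified files take up fresh space. The function name and directory layout are hypothetical. It ignores symlinks and special files, and expects dest to be a fresh directory on the same filesystem as previous.

```python
import filecmp
import os
import shutil


def snapshot(source, dest, previous=None):
    """Copy source into dest, hard-linking files unchanged since previous.

    Unchanged files share their data with the previous snapshot; new or
    modified files are copied. Symlinks and special files are ignored.
    """
    for root, _dirs, files in os.walk(source):
        rel = os.path.relpath(root, source)
        target_dir = os.path.normpath(os.path.join(dest, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in files:
            src = os.path.join(root, name)
            new = os.path.join(target_dir, name)
            old = None
            if previous:
                old = os.path.normpath(os.path.join(previous, rel, name))
            if old and os.path.isfile(old) and filecmp.cmp(src, old, shallow=False):
                os.link(old, new)       # unchanged: reuse the existing data
            else:
                shutil.copy2(src, new)  # new or modified: store a fresh copy
```

Each snapshot directory then looks like a full copy, and deleting an old one only frees the data no other snapshot still links to. The byte-for-byte comparison here is slow on 20TB; rsync itself normally decides by size and modification time.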
"Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]
Yes, I was responding to the DLT suggestion. I should have been more explicit.
W..w..W - Willy Waterloo washes Warren Wiggins who is washing Waldo Woo.
What about 4 5TB mirrored drives in a fireproof safe?
I have 14 x 2TB of storage and 25TB of data. My solution is to have a backup system. I do a refresh every week or so. It isn't perfect. I have had JPEG images get corrupted, and the backup process copies the fault over. Some videos get damaged, but knowing that is hard, as you can only really see it when you watch the 90-minute or 2-hour film. One way would be to do a CRC check and repeat that monthly - very time consuming. The backup system protects against some damage and accidental deletion, but the two systems are in the same room. The next stage is a third system that is swapped out with the first backup system every month and stored somewhere else. If the material is commercial (ebooks, transfers of DVDs and CDs for streaming) then it could be built up again. All the family home movies going back 30 years, and the photographs from negatives and digital cameras, can only be done again if the original tape or negative is still around, along with the technology to extract it. Some of my early video tapes can be used again, but it would be difficult to get a good image from them - the digital copy done 15 years ago is a better copy. I have hard drives still running from '98 but a pile of broken ones from only three years ago. These large 2-4TB drives need checking often. RAID would help, but that would mean 28 drives for each system. Even with costs coming down, the costs soon mount.
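The monthly check mentioned above could look something like this sketch. It swaps the CRC for SHA-256 from Python's standard library (my substitution, not the poster's setup) and keeps a manifest of digests to compare between runs. The function names are made up for illustration.

```python
import hashlib
import os


def build_manifest(root):
    """Return {relative_path: sha256 hex digest} for every file under root."""
    manifest = {}
    for folder, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            digest = hashlib.sha256()
            with open(path, "rb") as handle:
                # Read in 1 MiB chunks so large videos don't fill memory.
                for chunk in iter(lambda: handle.read(1 << 20), b""):
                    digest.update(chunk)
            manifest[os.path.relpath(path, root)] = digest.hexdigest()
    return manifest


def changed_files(old, new):
    """Paths present in both manifests whose contents differ."""
    return sorted(p for p in old if p in new and old[p] != new[p])
```

A file that shows up in changed_files without having been edited is a corruption suspect, and is worth restoring from the backup rather than copying over it. Reading every byte of 25TB is still slow, so this doesn't remove the time cost, it just automates the checking.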
I'm definitely getting older, but it was not me who brought up DLT. I think the AC was recommending that the submitter find an old DLT drive on eBay or some such place and use that. While that would make the drive itself cost-effective, I wanted to show that there is a reason those old things are available for a low cost.
If you do modern tape, compression is worthless on video and music, which is presumably already compressed - so you will need the full tape capacity. The tape drives themselves cost more than buying 40TB worth of hard drives. Tape is great for many use cases, but backing up your home media center probably isn't one of them.
W..w..W - Willy Waterloo washes Warren Wiggins who is washing Waldo Woo.
Holograms with images of QR codes. How many holographic images can you put on a small glass cube? With Holographs, making duplicates would be a snap... Get it?
you need RAIRANASD...
Redundant Array of Inexpensive Raid Array Network Area Storage Devices
----------------------------
Esobofh - Currently drinking fresh mango juice.
Is to simply call the NSA or subpoena them since they probably have a copy or two of your data somewhere.
Wait, he "erased" his 20TB RAID array? What, with a giant electro-magnet or something? Did he Select-All > Delete and then go to bed thinking all was chugging along ok? Run a script that secretly had rm -rf * tucked away in it that he left running overnight? Cripes. Well,.. bum luck to that then.
Yeah, LTO5 or 6 cassettes are your best option, really, since you can additionally get those off-site, avoiding the catastrophe of a fire or flood taking out your next 20TB array.
Better, though, is to PRIORITIZE: Identify which 5-8TB of data is "most critical" and make sure at least *that* is backed up, (onto removable HDs?). You can get to the other 12-15TB as time and expenses allow, or just let it be at risk.
Of course at this point he has 0TB of data, so he could start small with a cloud service, and then scale the backup as his hoarding expands again and takes over his life.
I guess this proves the maxim: "If your data is not in two places, it's already gone."
That's what I was thinking. Legal "hard" copies of all that stuff would be far cheaper than any archival backup technology (tape, blu-ray, etc.)
It would take a lot of Blu-rays, but as the content is entirely static, it wouldn't be that much work to back up. (Actually, "archive" is the correct term.)
DLT-S4, the last generation, holds 800G native. But it's dead-end technology now... it hasn't been manufactured in years, and finding actually new tapes is next to impossible. (No one on eBay is selling "new" S4 tapes. I don't give a shit what they claim. The eMAM proves them liars -- any tape without a SN has been bulk erased, RUN away from those.) Also, when you do find "new" (as in never used) tapes, they're old and freakin' expensive.
Since he owned the 20 TB, he could re-rip them
It's obvious that at 20 TB, he only was moving digital copies of material that he personally owned - so it should be a simple matter of his re-ripping the material.
Tape vs. hard drive is a wash for the first 20TB. (minus a controller, HD is a bit cheaper) HOWEVER, with tape, capacity scales very cheaply... the next 20TB costs less than a single HD.
For something that large, and presumably something you may not want certain organizations with 4-letter acronyms that end in 'AA' to be able to subpoena a 3rd party and gain access to without your knowledge, build your own redundancy. It may cost more upfront, but ultimately building a second raid array on separate hardware and using an automated process like DRBD to keep them in sync seems like the most sane approach.
There's no reason to be backing up 20TB of data if you're not Amazon or Google. Separate out the essential data that you can't live without. Your music collection, your work files and your photos. The rest is disposable. Then go get yourself 2 nice identical hardware RAID cards, set up a 4+ drive RAID5 fileserver using 1.5 or 2TB drives, with 3 active drives and one drive as a hot spare. Buy at least one extra drive for when you need to put your hot spare into action and replace a dead drive. Put all your important data on that raid, put it in a closet somewhere, put in a ventilation fan of some sort, set up email alerts to tell you when there's SMART errors or the raid is degraded, then check your raid status software once a month just to be sure it's all good. Then get another cheap external RAID enclosure with built-in raid5, (something like a StarTech SAT3540U3ER - which is iffy, but works for me) and fill it with 3TB or 4TB drives (plus another spare for when THAT raid fails) and use that to back up your first raid. The backup raid should be large enough to hold at least 2 full backups of the first raid -- choose your drive sizes accordingly. Then back up your first raid onto the 2nd and smile because you've finally achieved relative safety for your important data. Then take a deep breath and say to yourself "I can live without all those videos if something goes wrong. It will suck, but it won't be life-ending, after all that's what bittorrent is for". If you know where to go, you can find almost any movie or tv show and download it in under an hour via bittorrent. Hell, you can download 15 seasons of South Park in h264 format in under an hour. Most HD movies in 1080p format take 30 minutes or less over a decent connection. And if you really care about saving your videos, make an offline library and burn them to DVDs. 
Not really feasible if you've waited until you filled 20TB of drive space with movies and tv shows, but I have a series of 4 filing cabinets with DVD-sized drawers full of around 2000 CD-Rs and DVD+Rs (and a few BD-Rs for my 170GB collection of BBC Horizon documentaries) I've burned since about the year 2000 when DIVX and XVID format movies started to appear en masse. Every few months I'd spend an evening or two burning my latest batch of movies to DVD and then removing them from my hard drives. But I've found that most of the time these days I don't even touch my archives when I want to watch one of my movies, because it's easier just to download a fresh copy in 1080p, which is generally better than the archived version I downloaded years earlier. I expect that trend will continue, which is why I've recently stopped burning my movies altogether and now I just add another shared hard drive to one of my HTPCs as they fill up, or delete stuff I know I'll never watch again.
And last, but not least, I'd be remiss if I didn't mention unRAID. Though I don't use it myself because I've long since gone down the path of the aforementioned RAID5 setup, unRAID could be the best option if you're starting from scratch. unRAID gives you RAID5-like redundancy, but with arbitrary disks, and with the benefit of only losing the data from specific failed drives in the rare event that it can't be rebuilt from parity data. If you want to know more about unRAID, google it. And forget about backing up 20TB of data for at least another 5 years. No one has that kind of time.
I was editing some off-air material and deleted the sub-directory on the wrong computer. Some of these were new files and not yet backed up. The source for the edits had already been overwritten. I have Microsoft Home Server. I took the drives out of the computer and the external 4-drive housing and used RecoverMyFiles to look for the files on a desktop PC. The videos had already been spread across three drives, but I was able to recover them all. The software cost more than the data was worth, since a DVD will be released this year. My existing version of the software could not find these files, but at least the latest version does work and will be useful again in the future. I have also used RecoverMyFiles to search through a dying drive that wouldn't allow access to the directories. I needed to check what I had lost and not backed up. I could recover files, particularly documents and databases, but not video. I was able to list what I had lost and redo the work, and to check that I had not lost anything important.
Yeah, it seemed like a weird thing to suggest. I see my numbers were off, but it's absurd even with 1/3 the number of tapes.
W..w..W - Willy Waterloo washes Warren Wiggins who is washing Waldo Woo.
I know you were kidding, but that's what my backup is.
I have five large crates of CDs sitting in a cellar - and that's my backup of my music collection. Of course, my music collection is only about 250 GB of FLAC files, not 20 TB... if the guy's not generating his own video, it's hard to imagine where he'd get 20 TB of legally owned content without any sort of initial distribution media.
I'm backing up my 40TB music library on Jacquard loom punch cards.
Added bonus: You can use the punched cards to make fabric. ...as a sweater!
Right now I'm wearing Justin Bieber's "Love Me"
https://web.duke.edu/isis/gess...
Wrong. Movies and music are generally already compressed and 7zip or other similar file compression tools won't do much besides waste your time. That's why you should generally turn off backup compression on video and music backups.
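If you want to see this for yourself, here is a quick Python check. It's a sketch under one assumption: random bytes from `os.urandom` stand in for already-compressed media, since a well-compressed stream looks statistically close to random to a general-purpose compressor. The helper name `compression_ratio` is made up.

```python
import os
import zlib


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower means better compression)."""
    return len(zlib.compress(data, 9)) / len(data)


# Highly repetitive text compresses very well...
text_like = b"the quick brown fox jumps over the lazy dog\n" * 20_000
# ...while random bytes (our stand-in for MP3/H.264 payloads) do not.
media_like = os.urandom(len(text_like))

print(f"repetitive text: {compression_ratio(text_like):.3f}")
print(f"random bytes:    {compression_ratio(media_like):.3f}")
```

The second ratio comes out at roughly 1, i.e. no saving at all, so on media you pay the CPU time for nothing.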
The reasoning for not using a DVD or Blu-ray writer is pretty flimsy. They might be more expensive per TB than a hard disk, but they'll last 6 times longer. The majority of the data sounds like it might not even change. Movies, songs... it's all static data. Get a good incremental backup solution in place and it won't be hard to make sure everything is backed up.
As for the whole "you likely will never need it" argument... well, your friend just needed it.
No matter what, you will need a backup. At some point in time you will not be able to buy replacement RAID cards that will work with the volumes you've created. Hardware will be obsoleted, and you'll have to replace it all... that means your backups will need to be replaced too. If you want it to last 50 years, then that's what it takes!
their entire electronic collection of music and movies
Explain to me how this 20TB digital collection isn't already a backup?
Someone once told me that backups were a flat circle.
Compress with ZeoSync (100:1 lossless compression of random data). Repeat until compressed file is small enough to backup.
"by erasing a RAID array on their home server" - don't use a RAID array. If it fails or you make a mistake, you lose everything. unraid is ideal for this. It isn't fast but it has disk level parity and a thriving support community to help. If you lose a disk, you can rebuild the array, and if you lose 2 or more disks, you only lose 1 or more disks of data - the rest of the disks are readable so most fo your data will survive.
'how would you backup 20TB of data?' - wrong question. Think instead how to replicate it off-site. My own setup has 16TB of films/music etc, and I have two 16TB copies at different sites which are loaded from a 1TB portable drive I move around. The bonus is whoever has the other servers gets to share your collection too.
Cost? - seriously, get over it. 3TB drives are mainstream and getting cheaper. HP's microservers are around £100 each and will take 5 drives.
You could set up another server with 20TB (or more, for versions) and ssh rsync with a shell script & cron (or whatever) job. (Linux box, obviously). I did this for years, using keychain for authentication between servers.
Now we've switched to Windows Server, and I've got to find a way to replicate it using Win. Slow going.
I have a Mac Mini server set up hosting about 8 TB of primary data on mirrored USB3 drives. I then have it running Time Machine on all of that to a 16 TB RAID5 array on a NAS. Total cost (not including the server itself)? About $1,000... and that's for two sets of backups, one for drive errors (primarily) and one that has an always-available actual backup.
-Daniel
It's doable with S4, but it's going to be insanely expensive and increasingly harder to find the tapes. LTO 4, 5, or 6 would be a better choice.
*I* use S4 for the high volume systems, but I'm only doing a full dump once per quarter (if that.) The majority of that data never changes. But I need to be able to rebuild any of those systems if they fail. (which is a growing likelihood -- those drives are getting really old.)
(But for archiving stuff that doesn't change, blu-ray is a perfect choice. It's not like he'd be storing 20TB every month.)
One option is to store your files as metadata in a huge number of "small" JPEGs. You can also use steganography if you like. Then upload them to sites that allow an unlimited number of pictures (Facebook, Picasa, etc). Or simply get five 4TB HDs ($165 each).
Well, to be fair, that's not a good analogy. An Egyptian gov't might only last 4-5 months anyway... ;-)
I have a 3x2TB RAID 5 array (4TB total) and I'm constantly bumping up against my limits. It's like 3/4 downloaded movies, some music, some software, and some personal pictures and data. Honestly, the personal stuff, the stuff I couldn't just re-download, might take up a whole 10-15 gigs. That stuff is backed up multiple times over via Dropbox, Google Drive, and scattered Blu-rays and flash drives.
http://www.flexraid.com/
http://lime-technology.com/ (UnRaid)
Best solution for big media collections.
All data is stored separately on each drive, and 1 separate parity drive can protect up to 21 drives (as long as it's as big as or bigger than any 1 of those 21 drives).
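To show why one parity drive can cover many data drives, here is a toy Python sketch. It assumes plain bytewise XOR parity, which is the usual single-parity scheme; the tiny byte strings standing in for drives are obviously hypothetical, and real products handle unequal drive sizes by treating the missing tail of smaller drives as zeros.

```python
from functools import reduce


def xor_blocks(blocks: list) -> bytes:
    """Bytewise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three hypothetical equal-sized "drives", each holding its own files.
data_drives = [b"movies..", b"music...", b"photos.."]
parity = xor_blocks(data_drives)

# Lose drive 1, then rebuild it from the survivors plus the parity drive.
survivors = [data_drives[0], data_drives[2]]
rebuilt = xor_blocks(survivors + [parity])
```

Here `rebuilt` equals the lost drive's contents. With two drives gone the XOR no longer has a unique answer, but because each drive carries its own file system the surviving drives stay readable.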
Even with Windows you can run rsync if you install Cygwin and its sshd.
If you can put the second machine in a distant room (garden shed, detached garage) that's unlikely to go up in the fire, that's better.
"It if was easy to do, we'd find someone cheaper than you to do it."
Considering that you've got to be running something larger than your average desktop PC to hold that much data, I'd consider looking at a tape library like this:
http://www.tigerdirect.com/app... ($3750)
8 slots for Ultrium 6 tapes, non-compressed will hold 20TB, 50TB if you can get decent compression...which I'm guessing you might not. I think tapes can be found for just under $65 each, depending on how you shop them.
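The arithmetic behind those figures, as a small Python sketch. The capacities are the published LTO-6 numbers (2.5 TB native, 6.25 TB at the vendor's assumed 2.5:1 compression); the prices are just the ones quoted above, and the function names are made up.

```python
import math

LTO6_NATIVE_TB = 2.5        # native LTO-6 (Ultrium 6) cartridge capacity
LTO6_COMPRESSED_TB = 6.25   # vendor figure, assumes 2.5:1 compression
SLOTS = 8                   # slots in the library mentioned above
LIBRARY_USD = 3750          # library price quoted above
TAPE_USD = 65               # per-tape price quoted above


def tapes_needed(data_tb: float, tape_tb: float = LTO6_NATIVE_TB) -> int:
    """Whole tapes required to hold data_tb terabytes."""
    return math.ceil(data_tb / tape_tb)


def first_full_backup_cost(data_tb: float) -> int:
    """Library plus enough tapes for one uncompressed full backup."""
    return LIBRARY_USD + tapes_needed(data_tb) * TAPE_USD


print(SLOTS * LTO6_NATIVE_TB, SLOTS * LTO6_COMPRESSED_TB)
print(tapes_needed(20), first_full_backup_cost(20))
```

Since media files barely compress, plan on the native figure: one full load of the library is exactly one 20TB backup, and every extra generation is another 8 tapes.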
I guess it depends on how many tapes you want to back up to after that.
Awk! Pieces of eight. Pieces of eight. Pieces of seven... ERROR: General Protection Fault. [Paroty Error.]
That's from XKCD. No need to use an aggregator site.
Well, I might have a way, but it only works on a semi spherical planet in a vacuum.
Assuming you don't need RAID on the backup device itself, then a cheap desktop PC (usually from a custom white box builder - most OEM PCs don't come with enough SATA connectors/hard drive bays) with 5 or 6 4TB SATA hard drives does the trick. Sure, it'll cost you a fair amount for the hardware (in the UK, probably around 1,000 pounds or so), but it might be the most flexible solution (e.g. could be located offsite if you're paranoid, though you'd need a fast connection to it - at least 100 Mbits/sec I'd have thought - for that amount of data).
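To put a number on the connection speed, here is a rough Python estimate. It assumes decimal units (1 TB = 10^12 bytes, 1 Mbit = 10^6 bits) and, by default, a link that runs flat out with no protocol overhead, so real transfers will be slower. The function name is made up.

```python
def transfer_days(data_tb: float, link_mbit_per_s: float, efficiency: float = 1.0) -> float:
    """Days needed to move data_tb terabytes over a link of the given speed."""
    bits = data_tb * 1e12 * 8
    seconds = bits / (link_mbit_per_s * 1e6 * efficiency)
    return seconds / 86_400


for speed in (10, 100, 1000):
    print(f"{speed:>5} Mbit/s: {transfer_days(20, speed):.1f} days")
```

At 100 Mbit/s the initial 20TB copy takes on the order of two and a half weeks, which is why seeding the offsite box locally and only syncing changes over the wire is the more practical route.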
Of course, if you then want to keep multiple archive copies, then you'd have to look at compressing the backups and/or perhaps using backup software that does incrementals (e.g. Amanda on Linux or whatever). Another much pricier alternative is multiple spanning Ultrium 5 tapes in 24-slot autoloader attached to a machine with little local storage (1-2 TB free for holding space), but we're talking 5,000 pounds or so for this solution.
My storage array is only 8TB, but I doubled down and built another 8TB array in a small Lian Li case. I do a mirror backup to it once a week or so, and carry it to work with me. That way, it's offsite in case something physically happens to my home.
So thats like..almost 700 movies? I dont think I've watched even half that many in my entire life.. Not telling people what to do by any means but I do think some people are becoming digital pack rats / digital hoarders. Perhaps that is a new category needing intervention by social services!
I'm in a similar situation and I actually have planned for a worst-case scenario. However, my storage needs are slightly more modest at about 5TB (give or take).
My main, active archive exists on my primary desktop and is the location that will get the most changes. That, in turn, is backed up to a dedicated NAS server (currently an 8-bay Synology unit packed with 3TB disks) in my home. THAT, in turn, is backed up, off-site to a friend's NAS units of similar construction and capacity via CrashPlan. The free version offers "backup to a friend's computer" as an option, though the paid subscription offers to store data on CrashPlan's servers, instead. The cost is fairly reasonable for that option if none of your friends has enough storage for you.
One other last point - it might not make sense to back up EVERYTHING you have. Photos, critical documents, etc. (things you can't easily replace) should absolutely be backed up. Copies of game files, software installations, etc. (things that can be replaced relatively easily from the original media) should probably be left out of the backup set. That limits the amount of remote storage required as well as the time it takes to back up those items in the first place.
My sources are unreliable, but their information is fascinating. -- Ashleigh Brilliant
http://www.symantec.com/connec...
Or let's just say that the friend is a film maker, or a recording engineer, or an astronomer, or just about any other kind of person for whom storing multiple terabytes of irreplaceable data is just another day at the office.
I think this is the OP's problem: the question wasn't "how do I do this?", but "how do I do this and not spend more than $10?"
"Irreplaceable" data is worth spending money to keep.
Who cares, really? There's no way this guy legally purchased that much A/V media over the years. Easy come...easy go...
It is a bit more expensive, but tape would be the way I would go. There are cartridge tapes that can run as fast as the hard disk transfers can take place.
Many tape backup systems rely on a base backup, and then on incremental backups. Each incremental backup was taken relative to the base backup. Once a month we would create a new base backup. (It was a business application, with daily changes.)
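As a rough illustration of that scheme, here is a Python sketch of picking what goes into an incremental run: everything modified since the base backup was taken. The function name is made up, and relying on file modification times is an assumption; real backup software usually keeps its own catalog.

```python
from pathlib import Path


def changed_since(root: Path, base_time: float) -> list:
    """Files under root modified after the base backup's timestamp."""
    return sorted(p for p in root.rglob("*")
                  if p.is_file() and p.stat().st_mtime > base_time)
```

Because each run is measured against the base rather than against the previous run, a restore needs only two tapes (the base plus the latest incremental), at the cost of the incrementals growing until the next monthly base.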
I made sticky labels to indicate the backup date and generation number. These labels went onto the tape cartridge's plastic case.
In a very large shop, tape backups run from a feeder machine, with major automation and cataloging to allow reasonable file recovery time.
Tapes are checked a day before reuse, to ensure no lost oxide or fading. That function was part of the tape backup system.
Weekly tapes were duplicated and moved off-site.
All it takes is gelt and time.
Leslie Satenstein Montreal Quebec Canada
The only and best way to back it up is with another 6TB Array. You could set up your array card to write to both arrays simultaneously. Once you get that much data, it becomes a major chore to keep track of the tapes, and another array would afford you real-time, up-to-the-millisecond backup availability. One thing the tapes will do for you is let you keep versions of your backup images from different dates. I would recommend something like a small Dell library that holds about 30 tapes and maybe two LTO6 tape drives. The tapes will allow you to go back in time where the 2nd array will not, but both would be good, reliable restore options in times of disaster.
Paul E. Bahre
http://en.m.wikipedia.org/wiki...
YBs are an active element of public policy. A Ym is sort of the size of the universe, and at that size the cosmologist types have multiple definitions of distance. But a Tm is about the size of the solar system. Pluto is about 6 Tm out and Voyager 1 is about 18 Tm out. The next star is about 40 Pm out.
If i buy high end SD cards and load them up with a YB then they will fit very comfortably in the space shuttle assembly building.
i notice that people are talking about TB USB sticks.
Yours was the best of the bunch (minus formatting html tags), though I enjoyed reading about the trials and tribulations of punch tape vs punch cards vs tape/DAT backup systems. The biggest problem I had many years ago was using a DAT format system that I could no longer purchase hardware for. So I had tapes, but no way to read them. That taught me a lesson: never use a medium that I might not be able to read 10 years from today. Thus I only back up on hard disks today.
I agree that to backup music, videos and other static content that has been downloaded via the internet (and not personally created) is a waste of time and space. As you pointed out, with even a throttled cable connection you can download this fairly quickly. So never waste time backing it up. Totally agree with you.
Now the one exception to video, pictures and music, are those that you create yourself. For your own personal pictures and personally created video. That needs to be backed up and I would suggest a harddrive (or multiple hard disks) for this purpose.
If you work in the video / movie industry creating content, obviously this comment does not apply to you... check into creating your own Linux video server farm for while-you-sleep rendering and a homemade Linux SAN like the one in "Petabytes on a budget: How to build cheap cloud storage". You will have to learn some Linux to do this, but it would be well worth it if you have the need. This article should help you: "Thoughts about this DIY-Thumper and storage in general".
Just as with industrial and union jobs of yesterday, white collar IT jobs, your movie editing jobs are now being offshored to India and when I was in LA a couple of years ago, a number of studios were relocating to Canada because it was cheaper for them...fyi.
For home users not in an industry creating massive videos, the next few paragraphs should cover you. Give thought to what you really need and why. Don't back up anything you do not have to, like software and operating systems; only focus on the data you create.
Plan your locations for different types of data, since you can label (mount point) your directory whatever you want. You could have one for video, one for audio (music), one for non picture images (your digital camera) and one for everything else. If you have the need, perhaps a DB directory as well. This would look as follows:
/video/ ~ for downloaded video, not home movies, never backed up (this will be your largest directory for most)
/music/ ~ for downloaded music, not self created, never backed up (you could write this to DVD or copy it to a USB thumb drive if you want, the files are NOT that big. A 64GB thumb drive costs less than $30 on sale. Get a Micro USB adapter and only purchase micro SD cards and get very large ones. I used to use 8GB in my Nokia N800, now my zareason ZT2 Tablet has a 32GB micro SD card in it. Since I am using it for books, PHP development and research only, it will take a very long time to fill up.)
/myvideo/ ~ personally made video, back it up
/mymusic/ ~ personally created music, back it up
/images/ ~ digital images from your digital camera, back it up
/db/ ~ custom database stuff, back it up
/data/ ~ everything else, back it up
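The layout above maps naturally onto a small include/exclude table. A Python sketch follows; the directory names are the ones from the list, while the table name, the function, and the choice to back up anything outside the listed directories are my own assumptions.

```python
from pathlib import PurePosixPath

# True = back it up, False = downloaded and replaceable, so skip it.
BACKUP_POLICY = {
    "/video": False,
    "/music": False,
    "/myvideo": True,
    "/mymusic": True,
    "/images": True,
    "/db": True,
    "/data": True,
}


def should_back_up(path: str) -> bool:
    """Decide from the top-level directory whether a file belongs in the backup set."""
    p = PurePosixPath(path)
    for top, keep in BACKUP_POLICY.items():
        top_dir = PurePosixPath(top)
        if p == top_dir or top_dir in p.parents:
            return keep
    return True  # assumption: anything outside the known layout is kept to be safe
```

Matching on path components rather than string prefixes matters here, since "/video" is a prefix of nothing else in the list but a naive check would also catch a directory like "/videos".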
For the majority of you reading this, /myvideo/ through /data/ (five different directories) will easily fit on one 500GB drive. If you are smart and compress it when you back up, you can probably fit a month's worth of backups on that 500GB drive, if not more. Linux comes with built-in compression / backup commands, and you can use PKZIP (or another compression program) on Windows to shrink your data and make your backup space go further. Even mo
Tape would be best, though kind of pricey. Either that or hard drives: either cheap slow disks or, to be more pricey, a duplicate of your live setup. It's not gonna be cheap for 20 TB of home use data. I'm guessing most of the size comes from video and audio, which probably could be reacquired if need be. Back up your most prized data (personal documents, pictures, video, etc. that cannot be replaced) and take your chances with RAID 6 on the rest.
One byte at a time
For a one time charge (in the high 4 digits or low 5 digits), guys like this http://pivot3.com/surveillance... have solutions that claim to use something called RAID 6. In my experience, this is a good solution if the data is more write intensive than read intensive. At first glance, good for storing movies and music for personal use, bad for streaming to multiple subscribers.
Quite simply, your solution is money dependent: a) Get another storage server with the exact same specs and run an rsync cron job to back up data once a night to the other system. b) This time, use only half the storage in a RAID 10. I recommend FreeBSD or FreeNAS with ZFS, enabling compression as well, to get the most out of your space.
I just can't see why you haven't thought of grabbing 10 or so 2TB or 3TB drives, copying a segment of the data to them and putting them back in their anti-static bags on the shelf somewhere. SATA drive docks are cheap, or plug it into your eSATA port to access the drive. Perform updates once per week on only the lead drive. I think the problem that remains is software to index it all and know which files need to be backed up and which are on drive X of Y.
Sadly, a Libertarian cannot force his views on another, and freedom cannot spread as does the cancer known as religion.
Buy a cheap large USB disk.
Sort the files by creation time, remove the ones in your list of files already backed up (empty at start) and fill your USB disk using the oldest files first.
Add list of backed up files to your collection of disk indexes, (excluding files that you want the backup to process again).
Label the disk as "files from xxx to yyy"
Repeat until you're duplicating current files into a current disk.
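The steps above can be sketched in a few lines of Python. This is not the awk script mentioned in this comment, just an illustration; `FileInfo` and `next_disk` are made-up names, and sizes are in bytes.

```python
from typing import Iterable, NamedTuple


class FileInfo(NamedTuple):
    path: str
    created: float  # creation timestamp
    size: int       # bytes


def next_disk(files: Iterable[FileInfo], already_backed_up: set,
              disk_capacity: int) -> list:
    """Pick the oldest not-yet-backed-up files that fit on one disk."""
    pending = sorted((f for f in files if f.path not in already_backed_up),
                     key=lambda f: f.created)
    batch, used = [], 0
    for f in pending:
        if used + f.size > disk_capacity:
            break  # stop at the first misfit so each disk covers one contiguous date range
        batch.append(f)
        used += f.size
    return batch
```

The first and last `created` values of the returned batch give the "files from xxx to yyy" label, and adding the batch's paths to `already_backed_up` sets up the next disk. A single file larger than the disk would stall this loop, so a real script needs to handle that case.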
If the data is sensitive then truecrypt the disks before writing the data, being aware that losing your password would also lose your backup and be prepared to wait days building the encrypted container.
I wrote an awk script to do this, though at the time I was writing 4Gb DVDs rather than multi-terabyte USB disks.
At some point in the future, when disk sizes have grown you can copy multiple old smaller disks into larger ones to avoid being stuck with old technology. I expect after the submitter has moved to a 100Tb NAS in 2018 they'll be using the 20 as a backup.
-- Don't believe everything you read, hear or think
Bit by bit ... a little bit here, a little bit there.
Sure enough, the cow costume was hanging up next to the superhero outfit and sailors uniform. (S,Spud)
Why do you need that huge data?
What I would do is build a second NAS.
Do an initial backup of it and move it to a friends basement/rack and sync it up every week or so.
"Recently I had a friend lose their entire electronic collection of music and movies" You can stream almost everything for free and if you want the files back then apparently there is this new thing they invented called bit torrent.
Most films are so crap that they really can only be watched once anyway, if at all. How many times a year was he watching the same movies? I doubt that in 1 year he would watch 20Tb of movies.
No. Not at all. Licenses allow reasonable copying of the material in case you lose the disks or they break. What you pay for is the digital information content, not the disk.
20TB for a home user is likely to be media data. It doesn't change much and it's usually possible to recover (re-rip, re-d/l, etc.), so it's probably OK to live with a higher risk of loss than a business would need for their backup of 20TB of data. With those assumptions, I'd focus on minimizing the risk of loss and opt for snapshot raid. (If he wants true backup, backup to disk is my preferred option, using a decent backup program. If offsite is needed, carry the data that doesn't change offsite and arrange to send the stuff that changes daily over the web.) True raid 5/6 seems like a good way to minimize the risk of loss, but it's not nearly as good as snapshot raid for media data. True raid is too likely to fail during a recovery, when the disks and controller are heavily stressed. I've lost raid arrays from both controller failure and multiple disk failure. Plus, a loss beyond the redundancy level loses everything with true raid. Snapshot raid with pooling software is much better for media backup. If more drives fail than you have redundancy/parity drives, you only lose the data on the disks that actually died. You can add additional parity drives at any time to increase redundancy. For Windows or *nix systems, SnapRAID is a free snapshot raid option that works great. It comes with pooling capabilities that will make the entire array look like a single drive or, for more advanced pooling needs, there are multiple 3rd-party options. Liquesce is free for Windows and there are even more free options for *nix systems that need pooling.
Currently, SD cards or memory sticks can be obtained for slightly less than $1 per GB. Suitable for storing the most precious data... In a few years, this technology will hopefully be more economical than tape.
Know your pads. One time pad: good for cryptography. Two timing pad: where to take your mistress.
Curiously, in the labs there is this highly redundant DNA storage scheme. Sort of refrigerator size. Currently a YB of SD sticks at retail, at about $1 per GB, would be 10**24/10**9 = 10**15 dollars, or a thousand T$. Better called a peta $. Or a quadrillion bucks.