Ask Slashdot: Simple Way To Backup 24TB of Data Onto USB HDDs ?
An anonymous reader writes "Hi there ! I'm looking for a simple solution to backup a big data set consisting of files between 3MB and 20GB, for a total of 24TB, onto multiple hard drives (usb, firewire, whatever) I am aware of many backup tools which split the backup onto multiple DVDs with the infamous 'insert disc N and press continue', but I haven't come across one that can do it with external hard drives (insert next USB device...). OS not relevant, but Linux (console) or MacOS (GUI) preferred... Did I miss something or is there no such thing already done, and am I doomed to code it myself ?"
I'm guessing you don't have enough space to split a backup on the original storage medium and then mirror the splits onto each drive?
Given the size requirements, it seems that might be prohibitive, but it would make things easier for you:
How to Create a Multi Part Tar File with Linux
For that much data you want a RAID since drives tend to fail if left sitting on the shelf, and they also tend (for different reasons) if they are spinning.
Basically: buy a RAID enclosure, insert drives so it looks like one giant drive, then copy files.
For 24TB you can use eight 4TB drives for a 6+2 RAID-6 setup. Then if any two of the drives fail you can still recover the data.
multi volume tarJust mount a new usb disk whenever it is full.
However to have reasonable retrieve rate (going through 24 TB of data will rake some days over USB2), You better split the dataset in multiple smaller sets. That also has the advantage that if one disk chrashes (AND Consumer grade USB disk will chrash!) not your entire dataset is lost.
For that reason (diskfailure), do not use some linux spanning disk feature. File systems are lost when one of the disks they write on are lost. Unless you use a feature that can handle lost disks (Raid/ Zraid)
And last but not least: Test your backup. I have seen myself cheap USB interfaces failing to write the data to disk without a good error messages. All looks ok until you retreive the data and some files are corrupted.
No kidding. For $2400, you get 24x TB HDs and a bookkeeping nightmare if you ever actually resort to the "backup." For $3k, you get a network-ready tape autoloader with 50-100TB capacity and easy access through any number of highly refined backup and recovery systems.
Now, if the USB requirement is because that's the only way to access the files you want to steal from an employer or government agency, then the time required to transfer across the USB will almost guarantee you get caught. Even over the weekend. You should come up with a different method for extracting the data.
Yes, Bacula is the only real solution out there that isn't going to cost you an arm and a leg, and that allows you to switch easily between any backup medium.
Except for good old tar, which is present on all systems.
Most people are probably not aware that tar has the ability to create split tar archives. Add the following options to tar: ... where myscript.sh echoes out the name to use for the next tar file in the series. It can be as easy as a for loop checking where the tar file already exists and returning the next hooked up volume where it doesn't.
-L <max-size-in-k-per-tarfile> -M myscript.sh
Or it could even unmount the current volume and automount the next volume for you. Or display a dialogue telling you to replace the drive.
One advantage is that you can easily extract from just one of the tar files; you don't need all of them or the first-and-last like with most backup systems. Each tar file is a valid one, and at most you need two tar files to extract any file, and most of them just one.
Tar multivolume can, of course, be combined with tar's built in compression.
No. It's slower. Informative, my ass.