Ask Slashdot: Simple Way To Backup 24TB of Data Onto USB HDDs ?
An anonymous reader writes "Hi there! I'm looking for a simple solution to back up a big data set consisting of files between 3MB and 20GB, for a total of 24TB, onto multiple hard drives (USB, FireWire, whatever). I am aware of many backup tools which split the backup onto multiple DVDs with the infamous 'insert disc N and press continue', but I haven't come across one that can do it with external hard drives (insert next USB device...). OS not relevant, but Linux (console) or MacOS (GUI) preferred... Did I miss something, or is there no such thing already done, and am I doomed to code it myself?"
I'm guessing you don't have enough space to split a backup on the original storage medium and then mirror the splits onto each drive?
Given the size requirements, it seems that might be prohibitive, but it would make things easier for you:
How to Create a Multi Part Tar File with Linux
For that much data you want a RAID, since drives tend to fail if left sitting on the shelf, and they also tend to fail (for different reasons) if they are spinning.
Basically: buy a RAID enclosure, insert drives so it looks like one giant drive, then copy files.
For 24TB you can use eight 4TB drives for a 6+2 RAID-6 setup. Then if any two of the drives fail you can still recover the data.
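The capacity arithmetic behind that 6+2 layout can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: it assumes equal-size drives, ignores filesystem overhead, and the function name is made up for illustration.

```python
def raid6_usable_tb(drive_count, drive_tb):
    """Usable capacity of a RAID-6 array: two drives' worth goes to parity."""
    if drive_count < 4:
        raise ValueError("RAID-6 needs at least four drives")
    return (drive_count - 2) * drive_tb


# Eight 4TB drives in a 6+2 layout, as suggested above.
print(raid6_usable_tb(8, 4))  # → 24
```

Note that this leaves no headroom at all: the array would be full the moment the 24TB is copied.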
Out on bail mate?
Why not tape, backup RAID, SAN or some other dedicated backup hardware solution?
24TB is well within the range where a professional solution is required.
Given a hard disk size of ~1TB, making a single backup to 24 disks isn't a backup; it's throwing data in a garbage can.
More than likely at least one of those disks will die before its time.
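To put a rough number on "more than likely", here is a small Python sketch. The 3% per-disk failure chance is purely an assumption picked for illustration, not a measured figure, and the disks are assumed to fail independently.

```python
def chance_of_any_failure(disk_count, per_disk_failure_rate):
    """Probability that at least one of disk_count independent disks fails."""
    return 1 - (1 - per_disk_failure_rate) ** disk_count


# ASSUMPTION: each disk has a 3% chance of failing over the storage period.
print(f"{chance_of_any_failure(24, 0.03):.0%}")
```

Under that assumption the chance of losing at least one of the 24 disks comes out a bit over one half, and with an unmirrored backup one lost disk means lost data.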
Slashdot social media options: AIM, ICQ, Yahoo, Jabber and Mobile Text. Why no MySpace?
Evidently, our UNIX founding fathers had similar challenges...
Porn is a renewable resource, there's no need to store so much of it.
multi volume tar
Just mount a new usb disk whenever it is full.
However, to get a reasonable retrieval rate (going through 24 TB of data will take some days over USB2), you had better split the dataset into multiple smaller sets. That also has the advantage that if one disk crashes (and consumer-grade USB disks will crash!), your entire dataset is not lost.
For that reason (disk failure), do not use a Linux feature that spans one file system across several disks. A file system is lost when one of the disks it is written on is lost, unless you use a feature that can handle lost disks (RAID / RAID-Z).
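One way to plan such independent sets is to assign whole directories (or files) to disks so that every disk carries its own self-contained file system. The sketch below uses a first-fit-decreasing heuristic in Python; the function name and the example sizes are invented for illustration, and it only produces a plan, it does not copy anything.

```python
def plan_disks(sizes, capacity):
    """Assign each item to the first disk that still has room for it.

    sizes maps an item name (e.g. a directory) to its size; capacity is the
    usable size of one disk, in the same unit. Returns one list of names
    per disk.
    """
    disks = []  # each entry is [free_space, [item names]]
    # Placing the largest items first tends to waste less space.
    for name, size in sorted(sizes.items(), key=lambda kv: kv[1], reverse=True):
        if size > capacity:
            raise ValueError(f"{name} does not fit on a single disk")
        for disk in disks:
            if disk[0] >= size:
                disk[0] -= size
                disk[1].append(name)
                break
        else:
            # No existing disk has room, so start a new one.
            disks.append([capacity - size, [name]])
    return [names for _, names in disks]


# Hypothetical directory sizes in GB, packed onto 1000 GB disks.
print(plan_disks({"a": 600, "b": 500, "c": 400, "d": 300, "e": 200}, 1000))
```

If one disk dies, only the directories listed for that disk need to be backed up again.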
And last but not least: test your backup. I have seen cheap USB interfaces fail to write the data to disk without giving a proper error message. Everything looks OK until you retrieve the data and find that some files are corrupted.
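A simple way to test a plain file copy is to compare checksums of the source files against the copies. The following Python sketch assumes the backup disk holds an ordinary directory tree mirroring the source (not a tar archive); the function names are made up for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so that 20GB files do not have to fit in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_bad_copies(source_dir, backup_dir):
    """Return relative paths that are missing or differ in the backup."""
    source_dir, backup_dir = Path(source_dir), Path(backup_dir)
    bad = []
    for src in sorted(source_dir.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source_dir)
        dst = backup_dir / rel
        if not dst.is_file() or sha256_of(src) != sha256_of(dst):
            bad.append(str(rel))
    return bad
```

Running the check after unplugging and re-attaching the drive is probably wise, so that the data is read back from the disk itself rather than from the operating system's cache.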
If the OP's porn collection can be logically broken up at some level, eg:
then the backup software could create one job for each directory, and multiple USB disks could be attached at once giving increased throughput. USB3 also increases speed to the point where the 7200RPM disk itself will become the bottleneck.
So at 100MB/second per disk write speed with 4 disks going at once (assuming the source disks are capable of supplying this volume of data and there are no other throughput limitations), you could do it in under 17 hours, or 24 hours with more realistic margins.
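The estimate is easy to redo for other disk counts or speeds. A minimal Python sketch, assuming decimal units (1TB = 10^12 bytes, 1MB = 10^6 bytes) and perfectly parallel writes; the function name is illustrative only.

```python
def transfer_hours(total_tb, mb_per_second_per_disk, parallel_disks):
    """Hours needed to write total_tb across disks that are written in parallel."""
    total_bytes = total_tb * 1e12  # decimal terabytes
    bytes_per_second = mb_per_second_per_disk * 1e6 * parallel_disks
    return total_bytes / bytes_per_second / 3600


# 24TB at 100MB/second per disk with 4 disks going at once.
print(round(transfer_hours(24, 100, 4), 1))  # → 16.7
```

With a single disk at the same speed the same call gives close to 67 hours, which is why running several disks at once matters.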
If it turns out that the source data is not porn (unlikely) and is highly compressible, then it could be done in far less time.
Bacula can do all of this.
Agreed. Best thing I ever did was get a computer case with a SATA sled bay, like one of these. It won't help with breaking up the files, but a plain SATA connection will be many times faster and many times cheaper than getting external USB drives (because you don't have to keep paying for external case + power supply). After you copy it over, you just store the bare drives in a nice safe place.
This assumes it's a one-time or rare thing. If you do want access or the backup process is a regular thing, then an NAS or RAID setup is probably more convenient so that you don't have to keep swapping drives in and out.
Yes, Bacula is the only real solution out there that isn't going to cost you an arm and a leg, and that allows you to switch easily between any backup medium.
Except for good old tar, which is present on all systems.
Most people are probably not aware that tar has the ability to create split tar archives. Add the following options to tar:
-L <max-size-in-k-per-tarfile> -M -F myscript.sh
where myscript.sh supplies the name to use for the next tar file in the series (with GNU tar, by writing it to the file descriptor given in $TAR_FD). It can be as easy as a for loop checking where the tar file already exists and returning the next hooked-up volume where it doesn't.
Or it could even unmount the current volume and automount the next volume for you. Or display a dialogue telling you to replace the drive.
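As a rough sketch of what such a volume script might look like, here it is in Python rather than shell. It assumes GNU tar, which is normally handed the script via -F (--new-volume-script) alongside -M, exports TAR_ARCHIVE and TAR_VOLUME to the script, and reads the next archive name from the file descriptor in TAR_FD. The "name-N" naming scheme is an assumption chosen for illustration; a real script would more likely point at the mount point of the next drive.

```python
#!/usr/bin/env python3
import os
import re


def next_volume_name(current_archive, volume_number):
    """Build the next archive name, e.g. backup.tar -> backup.tar-2."""
    # Drop the "-N" suffix a previous volume switch may have added.
    base = re.sub(r"-\d+$", "", current_archive)
    return f"{base}-{volume_number}"


def main(env=os.environ):
    # TAR_VOLUME is the number of the volume tar is about to start.
    name = next_volume_name(env["TAR_ARCHIVE"], int(env["TAR_VOLUME"]))
    # Hand the new archive name back to tar through its descriptor.
    os.write(int(env["TAR_FD"]), (name + "\n").encode())


# Only act when tar has actually invoked the script.
if __name__ == "__main__" and "TAR_FD" in os.environ:
    main()
```

This is also the place where a prompt to swap drives, or an unmount and mount of the next disk, could be added before the name is written back.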
One advantage is that you can easily extract from just one of the tar files; you don't need all of them or the first-and-last like with most backup systems. Each tar file is a valid one, and at most you need two tar files to extract any file, and most of them just one.
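One caveat: GNU tar's multi-volume mode cannot be combined with tar's built-in compression (it rejects -M together with -z or -j), so compress the files themselves beforehand if you need to.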
Bacula went on to be Enterprise grade software.
No. It's slower. Informative, my ass.