Slashdot Mirror


Ask Slashdot: Simple Way To Backup 24TB of Data Onto USB HDDs ?

An anonymous reader writes "Hi there! I'm looking for a simple solution to back up a big data set consisting of files between 3MB and 20GB, for a total of 24TB, onto multiple hard drives (USB, FireWire, whatever). I am aware of many backup tools which split the backup onto multiple DVDs with the infamous 'insert disc N and press continue', but I haven't come across one that can do it with external hard drives (insert next USB device...). OS not relevant, but Linux (console) or MacOS (GUI) preferred... Did I miss something or is there no such thing already done, and am I doomed to code it myself?"

39 of 405 comments

  1. USB and disk Speed by gagol · · Score: 4, Insightful

    May be your limiting factor here.

    --
    Tomorrow is another day...
    1. Re:USB and disk Speed by gagol · · Score: 4, Informative

      If you can achieve a sustained write speed of 50 megabytes per second, you are in for 140 hours of data transfer. I hope it is not a daily backup!

      --
      Tomorrow is another day...
    2. Re:USB and disk Speed by drsmithy · · Score: 3, Insightful

      If you can achieve a sustained write speed of 50 megabytes per second, you are in for 140 hours of data transfer. I hope it is not a daily backup!

      I'd be willing to bet his change rate isn't 24TB/day.

    3. Re:USB and disk Speed by jamesh · · Score: 5, Funny

      If the OP's porn collection can be logically broken up at some level, eg:

      /porn/blonde
      /porn/brunette
      /porn/redhead

      then the backup software could create one job for each directory, and multiple USB disks could be attached at once giving increased throughput. USB3 also increases speed to the point where the 7200RPM disk itself will become the bottleneck.

      So at 100MB/second per disk write speed with 4 disks going at once (assuming the source disks are capable of supplying this volume of data and there are no other throughput limitations), you could do it in 16 hours, or 24 hours with more realistic margins.
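      The arithmetic behind these estimates is easy to redo for your own numbers. A quick sketch, assuming decimal units (1TB = 10^12 bytes, 1MB = 10^6 bytes) and perfectly sustained throughput; the function name is made up for illustration:

```python
def transfer_hours(total_bytes: float, bytes_per_second: float, streams: int = 1) -> float:
    """Hours needed to copy total_bytes at a sustained per-stream write rate."""
    return total_bytes / (bytes_per_second * streams) / 3600


TB = 10**12  # decimal terabyte (assumption)
MB = 10**6   # decimal megabyte (assumption)

single = transfer_hours(24 * TB, 50 * MB)                # one disk at 50 MB/s
parallel = transfer_hours(24 * TB, 100 * MB, streams=4)  # four disks at 100 MB/s each

print(f"1 x 50 MB/s:  {single:.1f} h")
print(f"4 x 100 MB/s: {parallel:.1f} h")
```

      Under those assumptions the results land close to the figures quoted in this thread; real runs will be slower once seeks, small files and bus contention come into play.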

      If it turns out that the source data is not porn (unlikely) and is highly compressible, then it could be done in far less time.

      Bacula can do all of this.

    4. Re:USB and disk Speed by Anonymous Coward · · Score: 5, Interesting

      Agreed. Best thing I ever did was get a computer case with a SATA sled bay, like one of these. It won't help with breaking up the files, but a plain SATA connection will be many times faster and many times cheaper than getting external USB drives (because you don't have to keep paying for external case + power supply). After you copy it over, you just store the bare drives in a nice safe place.

      This assumes it's a one-time or rare thing. If you do want access or the backup process is a regular thing, then an NAS or RAID setup is probably more convenient so that you don't have to keep swapping drives in and out.

    5. Re:USB and disk Speed by Pieroxy · · Score: 4, Funny

      then the backup software could create one job for each directory,

      Is that what we call a blow job?

    6. Re:USB and disk Speed by ilikejam · · Score: 4, Funny

      No. No it is not.

      --
      C-x C-s C-x k
    7. Re:USB and disk Speed by Anonymous Coward · · Score: 3, Funny

      Bacula can do all of this

      So he quantum leaps into you, and isn't allowed to leave until he performs the backup? Oh wait! Bacula, not Bakula.

    8. Re:USB and disk Speed by v1 · · Score: 4, Informative

      I have a setup here where the server's video media is about 8TB in size. That backs up via rsync to the backup server, which is in another room. It contains a large number of internal and external drives. None of them are over 2TB in capacity. The main drive has data separated into subfolders and the rsync jobs back up specific folders to specific drives.

      A few times I've had to do some rearranging of data on the main and backup drives when a volume filled up. So it helps to plan ahead to save time down the road. But it works well for me here.

      The only thing with rsync you need to worry about is users moving large trees or renaming root folders in large trees. This tends to cause rsync to want to delete a few TB of data and then turn around and copy it all over again on the backup drive. It doesn't follow files and folders by inode, it just goes by exact location and name.
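      One way to spot this before a sync run is to look for files whose content exists on both sides but at different paths. A minimal sketch, assuming both trees are mounted locally and that hashing every file is affordable (on multi-TB trees you would probably compare size and mtime first); all function names here are hypothetical:

```python
import hashlib
from pathlib import Path


def fingerprint(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in chunks so large files do not fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def index_tree(root: Path) -> dict[str, str]:
    """Map relative path -> content hash for every regular file under root."""
    return {
        str(p.relative_to(root)): fingerprint(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def find_moves(source: Path, backup: Path) -> list[tuple[str, str]]:
    """Return (old_path, new_path) pairs: same content, different location."""
    src = index_tree(source)
    dst = index_tree(backup)
    # Files that exist only on the backup side are candidates for "old" names.
    orphans: dict[str, list[str]] = {}
    for rel, digest in dst.items():
        if rel not in src:
            orphans.setdefault(digest, []).append(rel)
    moves = []
    for rel, digest in src.items():
        if rel not in dst and orphans.get(digest):
            moves.append((orphans[digest].pop(), rel))
    return moves
```

      Each pair returned could then be renamed on the backup drive by hand before rsync runs, so nothing has to be deleted and copied again.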

      I help mitigate this by hiding the root folders from the users. The share points are a couple levels deeper so they can't cause TOO big of a problem if someone decides to "tidy up". If they REALLY need something at a lower level moved or renamed, I do it myself, on both the source and the backup drives at the same time.

      Another alternative is to get something like a Drobo where you can have a fairly inexpensive large pool of backup storage space that can match your primary storage. This prevents the problem of smaller backup volumes filling up and requiring data shuffling, but does nothing for the issue of users mucking with the lower levels of the tree.

      --
      I work for the Department of Redundancy Department.
    9. Re:USB and disk Speed by deniable · · Score: 4, Funny

      Send error messages to a Blackberry and it's a RIM job.

    10. Re:USB and disk Speed by deniable · · Score: 5, Funny

      Bacula went on to be Enterprise grade software.

    11. Re:USB and disk Speed by MMC+Monster · · Score: 3, Funny

      Maybe he's personally backing up CERN?

      --
      Help! I'm a slashdot refugee.
    12. Re:USB and disk Speed by voltorb · · Score: 3, Informative
    13. Re:USB and disk Speed by hairyfeet · · Score: 4, Funny

      USB would be just the most retarded way to go for something like this, it's too slow and he's gonna be swapping worse than when we used to have to back up things to CDs.

      I'm guessing he's going USB because he doesn't have the cash to buy a NAS of that size, but you can always jury-rig a NAS; it's really not hard. We did something similar at the last shop I worked at when the boss scored a ton of SCSI drives at an auction and ended up with nearly a TB of NAS when the average HDD was 40GB. Here is how you do it:

      You take a couple of full size towers, bigger the better, preferably twinkies as it makes the job a LOT easier. You strip 'em to the frames and use a couple of spot welds to make them into one giant case, along with another couple of welds to mount a shitload of drive cages into the case. Then you take a cheap server or even desktop board, all that matters is it has a shitload of PCI slots which you fill with controller cards (SCSI in our case, but SATA today), mount the board along with a big PSU to feed the drives and voila! One big ass DIY NAS unit that can hold a huge pile of drives. Just to finish our white trash conversion we tied on a Walmart box fan to keep the sucker cool and stuck it in a corner, worked great.

      The only software that I think would work with USB is Paragon Drive Backup, as you can have it split by just about any size you want. They also have their own Linux-based recovery media, but damned if I know if you can get the software as a Linux installer, never ran into a situation where I needed it that way. I know it's worked great for me making OS images and backing up files and folders onto USB drives, but if you're gonna be splitting to a ton of little drives then you are just gonna have to swap, no way out of that. If you want to fill the drives up then set Paragon to a small size, say 700MB, but good fucking luck checking your backup as the amount of swapping you're gonna do is just insane.

      --
      ACs don't waste your time replying, your posts are never seen by me.
  2. Bacula is your friend by bernywork · · Score: 4, Informative
    --
    Curiosity was framed; ignorance killed the cat. -- Author unknown
    1. Re:Bacula is your friend by Anonymous Coward · · Score: 3, Informative

      Yes, Bacula is the only real solution out there that isn't going to cost you an arm and a leg, and that allows you to switch easily between any backup medium. As long as your MySQL catalog is intact, restoration is a cinch...

      Did I mention it supports backup archiving as well if you want duplicate copies for Tapes being shipped off site...

    2. Re:Bacula is your friend by arth1 · · Score: 5, Informative

      Yes, Bacula is the only real solution out there that isn't going to cost you an arm and a leg, and that allows you to switch easily between any backup medium.

      Except for good old tar, which is present on all systems.

      Most people are probably not aware that tar has the ability to create split tar archives. Add the following options to tar:
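      -L <max-size-in-k-per-tarfile> -M -F myscript.sh ... where -M turns on multi-volume mode and -F names myscript.sh, a script that tells tar the name to use for the next tar file in the series (with current GNU tar, by writing it to the file descriptor given in $TAR_FD). It can be as easy as a for loop checking where the tar file already exists and returning the next hooked up volume where it doesn't.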
      Or it could even unmount the current volume and automount the next volume for you. Or display a dialogue telling you to replace the drive.

      One advantage is that you can easily extract from just one of the tar files; you don't need all of them or the first-and-last like with most backup systems. Each tar file is a valid one, and at most you need two tar files to extract any file, and most of them just one.

      One catch: GNU tar's multi-volume mode cannot be combined with tar's built-in compression (tar refuses the combination), so compress the files beforehand if you need that.
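      A new-volume script for GNU tar's multi-volume mode does not have to be a shell script. Here is a rough Python sketch, assuming a GNU tar recent enough to export TAR_ARCHIVE, TAR_VOLUME and TAR_FD to the script; the naming scheme (backup.tar, backup.tar-2, backup.tar-3, ...) and the function name are my own choices, not anything tar mandates:

```python
#!/usr/bin/env python3
import os
import sys


def next_volume_name(archive: str, volume: int) -> str:
    """Name for the given volume number, e.g. backup.tar -> backup.tar-2."""
    base, sep, suffix = archive.rpartition("-")
    if sep and suffix.isdigit():
        archive = base  # strip the previous volume's numeric suffix
    return f"{archive}-{volume}"


def main() -> int:
    archive = os.environ["TAR_ARCHIVE"]      # archive tar was just writing
    volume = int(os.environ["TAR_VOLUME"])   # number of the volume about to start
    reply_fd = int(os.environ["TAR_FD"])     # tar reads the new name from this fd
    # A real script could unmount/mount drives or prompt the operator here.
    os.write(reply_fd, (next_volume_name(archive, volume) + "\n").encode())
    return 0  # a non-zero exit status makes tar abort


# Only act when tar invoked us, i.e. when TAR_FD is present in the environment.
if __name__ == "__main__" and "TAR_FD" in os.environ:
    sys.exit(main())
```

      It would be invoked along the lines of tar -c -M -L 1048576 -F ./next_volume.py -f /mnt/usb/backup.tar /data, where the paths and script name are hypothetical and -L counts in units of 1024 bytes.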

  3. Split into multiple tar files? by Anonymous Coward · · Score: 5, Informative

    I'm guessing you don't have enough space to split a backup on the original storage medium and then mirror the splits onto each drive?

    Given the size requirements, it seems that might be prohibitive, but it would make things easier for you:

    How to Create a Multi Part Tar File with Linux

  4. RAID by Anonymous Coward · · Score: 5, Informative

    For that much data you want a RAID, since drives tend to fail if left sitting on the shelf, and they also tend to fail (for different reasons) if they are spinning.

    Basically: buy a RAID enclosure, insert drives so it looks like one giant drive, then copy files.

    For 24TB you can use eight 4TB drives for a 6+2 RAID-6 setup. Then if any two of the drives fail you can still recover the data.

    1. Re:RAID by Sarten-X · · Score: 3, Informative

      As mentioned already, RAID is not a backup solution. While it will likely work fine for a while, the risk of a catastrophic failure rises as drive capacity increases. From the linked article:

      With a twelve-terabyte array the chances of complete data loss during a resilver operation begin to approach one hundred percent - meaning that RAID 5 has no functionality whatsoever in that case. There is always a chance of survival, but it is very low.

      Granted, this is talking about RAID 5, so let's naively assume that doubling the parity disks for RAID 6 will halve the risk... but then since we're trying to duplicate 24 terabytes instead of twelve, we can also assume the risk doubles again, and we're back to being practically guaranteed a failure.
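      For a rough feel of where numbers like these come from, you can compute the chance of hitting at least one unrecoverable read error (URE) while reading a whole array back. This is only a back-of-the-envelope sketch: the rate of one error per 10^14 bits is an assumed, commonly quoted consumer-drive spec rather than a figure from this thread, errors are treated as independent, and the function name is made up:

```python
import math


def p_read_error(bytes_read: float, ure_per_bit: float = 1e-14) -> float:
    """Probability of at least one URE while reading bytes_read bytes.

    Assumes independent errors at a fixed per-bit rate (a simplification).
    """
    bits = bytes_read * 8
    # 1 - (1 - rate)^bits, computed stably for tiny rates.
    return -math.expm1(bits * math.log1p(-ure_per_bit))


TB = 10**12  # decimal terabyte (assumption)

for size in (12, 24):
    print(f"{size} TB read: {p_read_error(size * TB):.0%} chance of at least one URE")
```

      A single URE is what breaks a single-parity rebuild; with double parity and only one failed disk it is survivable, so treat the output as an upper-bound style intuition rather than a prediction.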

      Bottom line is that 24 terabytes is still a huge amount of data. There is no reliable solution I can think of for backing it all up that will be cheap. At that point, you're looking at file-level redundancy managed by a backup manager like Backup Exec (or whatever you prefer) with the data split across a dozen drives. As also mentioned already, the problem becomes much easier if you're able to reduce that volume of data somewhat.

      --
      You do not have a moral or legal right to do absolutely anything you want.
    2. Re:RAID by louic · · Score: 3, Informative

      As mentioned already, RAID is not a backup solution.

      Nevertheless, there is nothing wrong with using disks that happen to be in a RAID configuration as backup disks. In fact, it is probably a pretty good idea for large files and large amounts of data.

  5. Julian? by WinstonWolfIT · · Score: 5, Funny

    Out on bail mate?

  6. git-annex by Anonymous Coward · · Score: 4, Informative

    You might want to look into git-annex:
    http://git-annex.branchable.com/

    I've not tried it, but it sounds like an ideal solution for your request, especially if your data is already compressed.

  7. Tape? by mwvdlee · · Score: 5, Insightful

    Why not tape, backup RAID, SAN or some other dedicated backup hardware solution?
    24TB is well within the range where a professional solution would be required.
    Given a hard disk size of ~1TB, making a single backup to 24 disks isn't a backup; it's throwing data in a garbage can.
    More than likely at least one of those disks will die before its time.

    --
    Slashdot social media options: AIM, ICQ, Yahoo, Jabber and Mobile Text. Why no MySpace?
    1. Re:Tape? by Lumpy · · Score: 4, Insightful

      Yup. Spool to tape. Get an SDLT600 tape cabinet and call it done. If you get a 52-tape robot cabinet you will have space to hold not only a complete backup but also a second full backup in incrementals, all of which will run automatically. Plus it has the highest reliability.

      And to anyone whining about the cost: if your 24TB of data is not worth that much, then why are you bothering to back it up?

      --
      Do not look at laser with remaining good eye.
    2. Re:Tape? by Anonymous Coward · · Score: 5, Informative

      No kidding. For $2400, you get 24x TB HDs and a bookkeeping nightmare if you ever actually resort to the "backup." For $3k, you get a network-ready tape autoloader with 50-100TB capacity and easy access through any number of highly refined backup and recovery systems.

      Now, if the USB requirement is because that's the only way to access the files you want to steal from an employer or government agency, then the time required to transfer across the USB will almost guarantee you get caught. Even over the weekend. You should come up with a different method for extracting the data.

  8. tar --multi-volume by jegerjensen · · Score: 5, Interesting

    Evidently, our UNIX founding fathers had similar challenges...

  9. Tar already does this by cyocum · · Score: 3, Informative

    Have a look at tar and its "multi-volume" option.

    1. Re:Tar already does this by leuk_he · · Score: 5, Informative

      Multi-volume tar: just mount a new USB disk whenever the current one is full.

      However, to have a reasonable retrieval rate (going through 24TB of data will take some days over USB2), you'd better split the dataset into multiple smaller sets. That also has the advantage that if one disk crashes (AND consumer-grade USB disks will crash!), your entire dataset is not lost.

      For that reason (disk failure), do not use a Linux disk-spanning feature. A file system is lost when one of the disks it spans is lost, unless you use a feature that can handle lost disks (RAID / RAID-Z).

      And last but not least: test your backup. I have seen cheap USB interfaces fail to write the data to disk without a proper error message. All looks OK until you retrieve the data and find some files are corrupted.
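      Testing can be as simple as re-reading every file from the backup drive and comparing checksums against the source. A minimal sketch, assuming both trees are mounted at the same time and mirror each other's layout; the function names are hypothetical:

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in chunks so 20GB files do not fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(source: Path, backup: Path) -> list[str]:
    """Return relative paths that are missing from backup or differ in content."""
    problems = []
    for src_file in sorted(source.rglob("*")):
        if not src_file.is_file():
            continue
        rel = src_file.relative_to(source)
        dst_file = backup / rel
        if not dst_file.is_file() or sha256_of(src_file) != sha256_of(dst_file):
            problems.append(str(rel))
    return problems
```

      An empty list means every source file was found and read back intact. To be sure the data really comes off the disk rather than out of the OS cache, it is safer to unplug and re-attach the drive before verifying.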

  10. You know... by marsu_k · · Score: 5, Funny

    Porn is a renewable resource, there's no need to store so much of it.

  11. Seriously: Build your own homebrew NAS. by Qbertino · · Score: 4, Interesting

    What you're attempting isn't easy, it's actually difficult.
    Buy a cheap and big refurbished workstation or rackmount server, install a few extra SATA controllers and maybe a new power supply, hook up 12 2TB drives, install Debian, check out LVM and you're all set.

    Messing around with 12 - 24 external HDDs and their power supplies is a big hassle and asking for trouble. Don't do it. Do seriously go through the possibility of building your own NAS. You'll be thankful in the end and it won't take much longer, it might even go faster and be cheaper if you can get the parts fast.

    My 2 cents.

    --
    We suffer more in our imagination than in reality. - Seneca
    1. Re:Seriously: Build your own homebrew NAS. by DRJlaw · · Score: 4, Interesting

      What you're attempting isn't easy, it's actually difficult.
      Buy a cheap and big refurbished workstation or rackmount server, install a few extra SATA controllers and maybe a new power supply, hook up 12 2TB drives, install Debian, check out LVM and you're all set.

      Messing around with 12 - 24 external HDDs and their power supplies is a big hassle and asking for trouble. Don't do it. Do seriously go through the possibility of building your own NAS. You'll be thankful in the end and it won't take much longer, it might even go faster and be cheaper if you can get the parts fast.

      Way to redefine the problem instead of working within the specifications.

      Perhaps:
      1. The poster ALREADY has a NAS and wants to have airgapped or even offsite/offline backup.

      2. External HDDs are fast, common, reasonably cheap, and do not have a single point of failure (e.g., the tape backup drive in many suggested alternatives)

      I'm interested in this question. I use this general setup, but on a smaller scale. I cannot put a NAS in a safety deposit box. I cannot ensure that my "backup" NAS would not be drowned in a flood, burned in a fire, fried by a lightning strike...

      Let's pretend the poster is not an idiot, and answer the actual question. If he has 24TB of data, IT'S ALREADY ON DAS/NAS. Geesh.

  12. Bash.... by djsmiley · · Score: 4, Informative

    First bash script to grab the size of the "current" storage;

    compress the files up until that size;

    Move compressed file onto storage;

    request new storage, start again.

    ----------

    Or, if you've got all the storage already connected: for i in $(seq 0 "$x"); do cp "$archive$i" "/mount/$i/"; done :D
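    The fill-then-swap loop described above boils down to a planning step: walk the file list in order and start a new drive whenever the next file no longer fits. A sketch of just that planning, assuming you already know each file's size and each drive's free space in bytes; plan_volumes is a made-up name, and no copying or compression is done here:

```python
def plan_volumes(files: list[tuple[str, int]], capacities: list[int]) -> list[list[str]]:
    """Assign files to drives in order, moving on when the current drive is full.

    files is a list of (name, size_in_bytes); capacities holds the free bytes
    per drive. Drives are never revisited, like swapping discs one at a time.
    """
    plan: list[list[str]] = [[] for _ in capacities]
    drive = 0
    free = capacities[0] if capacities else 0
    for name, size in files:
        # Skip ahead until a drive with enough room is found.
        while drive < len(capacities) and size > free:
            drive += 1
            free = capacities[drive] if drive < len(capacities) else 0
        if drive >= len(capacities):
            raise ValueError(f"out of space at {name!r}")
        plan[drive].append(name)
        free -= size
    return plan


example = plan_volumes([("a", 6), ("b", 5), ("c", 4)], [10, 10])
print(example)
```

    Because a drive is abandoned as soon as one file does not fit, some space is wasted; sorting the files largest-first before planning usually packs tighter, at the cost of scattering directories across drives.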

    --
    - http://www.milkme.co.uk
  13. Re:solution by aglider · · Score: 4, Informative

    3. samba

    Uh? Why?
    cp -a is all you need once you put the HDD inside the target machine.
    And if you put it into another machine on the same network, then rsync is the answer.
    Forget about the buggy and slow SAMBA.

    --
    Sent as ripples into the electromagnetic field. No single photon has been harmed in the process.
  14. PAR by fa2k · · Score: 3, Informative

    I have just seen "PAR" a couple of times here on slashdot, haven't used it, but it seems great for this: http://en.wikipedia.org/wiki/Parchive . You need enough redundancy to allow one USB drive to fail. And I would rather get a SATA bay and use "internal" drives than having to deal with external USB drives. Get "green" drives, they are slow but cheap.
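    To illustrate the "survive one lost drive" idea, here is a toy sketch using plain XOR parity. This is a deliberately simpler, swapped-in technique and not the Reed-Solomon coding that Parchive itself uses; it assumes equal-length blocks, one block per drive, plus one extra drive holding the parity:

```python
def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length byte blocks together into a single block."""
    if not blocks:
        raise ValueError("need at least one block")
    length = len(blocks[0])
    if any(len(block) != length for block in blocks):
        raise ValueError("all blocks must have the same length")
    out = bytearray(length)
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


# One block per data drive, plus a parity block stored on an extra drive.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Pretend the second drive died: rebuild its block from the survivors + parity.
recovered = xor_blocks([data[0], data[2], parity])
assert recovered == data[1]
```

    XOR parity only ever covers a single lost block per stripe; tools like PAR2 let you pick how much redundancy to generate.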

  15. Read it from Torvalds' lips by zapyon · · Score: 4, Funny

    "Only wimps use tape[*] backup: real men just upload their important stuff on ftp, and let the rest of the world mirror it ;)"
    Linus Torvalds (1996) http://en.wikiquote.org/wiki/Linus_Torvalds

    (Isn't that prescience of "The Cloud"?)

    ––––––––––
    * replace this with your favorite backup media of today ;-)

    --
    I like my spaghetti with source.
  16. Re:solution by AvitarX · · Score: 4, Insightful

    Is it that much faster for 3MB to 20GB files?

    --
    Wow, sent an e-mail as suggested when clicking on "use classic" banner, and got a fast response that addressed my msg
  17. Re:solution by fnj · · Score: 5, Informative

    No. It's slower. Informative, my ass.

  18. Re:DaisyChain by Painted · · Score: 4, Informative

    DON'T DO THIS.

    We did this exact thing using WD Green drives for our 18TB backup problem. Got two Drobo units, planning on using their built-in rsync for onsite/offsite copies of the data. Unfortunately, the units never broke 1MB/s transfer, and no amount of work with Drobo yielded reliably faster performance. Both of our units are now sitting unused ($2500 each!), and we put the drives into a RAID-50 8-bay USB3 enclosure. The new unit runs about 150x faster, and ended up costing $400 (prices are for enclosures only, drives were additional).

    Most disappointing was Drobo's support: they just seemed to shrug a lot, and were hyper-aggressive about closing trouble tickets.

    --
    http://marsandmore.com - Posters of space, spacecraft, and astronomy.