Slashdot Mirror


Ask Slashdot: Simple Way To Backup 24TB of Data Onto USB HDDs ?

An anonymous reader writes "Hi there ! I'm looking for a simple solution to backup a big data set consisting of files between 3MB and 20GB, for a total of 24TB, onto multiple hard drives (usb, firewire, whatever) I am aware of many backup tools which split the backup onto multiple DVDs with the infamous 'insert disc N and press continue', but I haven't come across one that can do it with external hard drives (insert next USB device...). OS not relevant, but Linux (console) or MacOS (GUI) preferred... Did I miss something or is there no such thing already done, and am I doomed to code it myself ?"

76 of 405 comments

  1. USB and disk Speed by gagol · · Score: 4, Insightful

    May be your limiting factor here.

    --
    Tomorrow is another day...
    1. Re:USB and disk Speed by gagol · · Score: 4, Informative

      If you can achieve a sustained write speed of 50 megabytes per second, you are in for 140 hours of data transfer. I hope it is not a daily backup!

      --
      Tomorrow is another day...
    2. Re:USB and disk Speed by drsmithy · · Score: 3, Insightful

      If you can achieve a sustained write speed of 50 megabytes per second, you are in for 140 hours of data transfer. I hope it is not a daily backup!

      I'd be willing to bet his change rate isn't 24TB/day.

    3. Re:USB and disk Speed by jamesh · · Score: 5, Funny

      If the OP's porn collection can be logically broken up at some level, eg:

      /porn/blonde
      /porn/brunette
      /porn/redhead

      then the backup software could create one job for each directory, and multiple USB disks could be attached at once giving increased throughput. USB3 also increases speed to the point where the 7200RPM disk itself will become the bottleneck.

      So at 100MB/second per disk write speed with 4 disks going at once (assuming the source disks are capable of supplying this volume of data and there are no other throughput limitations), you could do it in about 17 hours, or 24 hours with more realistic margins.
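      As a rough check on the transfer-time figures in this thread, here is a minimal sketch of the arithmetic. It assumes decimal units (1 TB = 10^12 bytes) and a perfectly sustained write rate, neither of which the posters state; the function name is made up for illustration.

```python
def transfer_hours(total_bytes, bytes_per_second_per_disk, parallel_disks=1):
    """Hours needed to write total_bytes at a sustained per-disk rate."""
    aggregate_rate = bytes_per_second_per_disk * parallel_disks
    return total_bytes / aggregate_rate / 3600


TB = 10**12  # decimal terabyte (assumption: the way drive vendors count)
MB = 10**6

single = transfer_hours(24 * TB, 50 * MB)        # one disk at 50 MB/s
parallel = transfer_hours(24 * TB, 100 * MB, 4)  # four disks at 100 MB/s each
print(f"{single:.0f} h on one disk, {parallel:.1f} h on four in parallel")
```

      With those assumptions the single-disk case lands a little under the 140 hours quoted earlier, and the four-disk case a little over 16 hours. Real runs will be slower once filesystem overhead and small files come into play.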

      If it turns out that the source data is not porn (unlikely) and is highly compressible, then it could be done in far less time.

      Bacula can do all of this.

    4. Re:USB and disk Speed by Anonymous Coward · · Score: 5, Interesting

      Agreed. Best thing I ever did was get a computer case with a SATA sled bay, like one of these. It won't help with breaking up the files, but a plain SATA connection will be many times faster and many times cheaper than getting external USB drives (because you don't have to keep paying for external case + power supply). After you copy it over, you just store the bare drives in a nice safe place.

      This assumes it's a one-time or rare thing. If you do want access or the backup process is a regular thing, then an NAS or RAID setup is probably more convenient so that you don't have to keep swapping drives in and out.

    5. Re:USB and disk Speed by Anonymous Coward · · Score: 2, Funny

      Or, he could watch the content as it is copied. At 600 Mbytes/hour (assuming standard mpeg compression), it would be about four and a half years of 24/7 nonstop action!

      "- Hey boss, I need to, uhh, work from home for the next four years or so to handle the backup..."

    6. Re:USB and disk Speed by Pieroxy · · Score: 4, Funny

      then the backup software could create one job for each directory,

      Is that what we call a blow job?

    7. Re:USB and disk Speed by shokk · · Score: 2

      If he's looking for reliability in a backup, then his choice of disks is going to be a factor. A drive with consumer grade chances of URE is going to die in a handful of writes and reads. USB grade drives (Caviar Green anyone?) aren't known for their reliability. Something like a Hitachi Ultrastar RE has a very very low chance of encountering a URE, so will be much more reliable.

      --
      "Beware of he who would deny you access to information, for in his heart, he dreams himself your master."
    8. Re:USB and disk Speed by ilikejam · · Score: 4, Funny

      No. No it is not.

      --
      C-x C-s C-x k
    9. Re:USB and disk Speed by Anonymous Coward · · Score: 2, Informative

      It's "nudge-nudge", not "notch-notch".

      Also, you left out "wink-wink".

      Yes, I know, I should get a life..

    10. Re:USB and disk Speed by Anonymous Coward · · Score: 3, Funny

      Bacula can do all of this

      So he quantum leaps into you, and isn't allowed to leave until he performs the backup? Oh wait! Bacula, not Bakula.

    11. Re:USB and disk Speed by v1 · · Score: 4, Informative

      I have a setup here where the server's video media is about 8TB in size. That backs up via rsync to the backup server, which is in another room. It contains a large number of internal and external drives. None of them are over 2TB in capacity. The main drive has data separated into subfolders and the rsync jobs back up specific folders to specific drives.

      A few times I've had to do some rearranging of data on the main and backup drives when a volume filled up. So it helps to plan ahead to save time down the road. But it works well for me here.
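      The "plan ahead" step can be sketched as a small packing problem: decide up front which folder goes to which drive, leaving some room to grow. This is only an illustration using first-fit decreasing; the folder names, sizes and the `reserve` parameter are hypothetical and not taken from the setup described above.

```python
def assign_folders(folder_sizes, drive_capacity, reserve=0):
    """First-fit decreasing: place the biggest folders first.

    folder_sizes:   dict of folder name -> size (any unit, e.g. GB)
    drive_capacity: capacity of each backup drive, same unit
    reserve:        space kept free on every drive for later growth
    Returns a list of drives, each a list of folder names.
    """
    usable = drive_capacity - reserve
    drives = []  # each entry: [remaining space, [folder names]]
    by_size = sorted(folder_sizes.items(), key=lambda kv: kv[1], reverse=True)
    for name, size in by_size:
        if size > usable:
            raise ValueError(f"{name} does not fit on one drive; split it first")
        for drive in drives:
            if drive[0] >= size:
                drive[0] -= size
                drive[1].append(name)
                break
        else:
            # No existing drive has room, so start a new one.
            drives.append([usable - size, [name]])
    return [names for _, names in drives]


# Hypothetical folder sizes in GB, 2000 GB drives, 200 GB kept free on each.
plan = assign_folders(
    {"movies": 1500, "series": 1200, "docs": 300, "music": 400},
    drive_capacity=2000,
    reserve=200,
)
for number, folders in enumerate(plan, start=1):
    print(f"drive {number}: {', '.join(folders)}")
```

      Each resulting group would then become one rsync job pointed at one drive. First-fit decreasing is not optimal, but it is simple and usually close enough for a handful of drives.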

      The only thing with rsync you need to worry about is users moving large trees or renaming root folders in large trees. This tends to cause rsync to want to delete a few TB of data and then turn around and copy it all over again on the backup drive. It doesn't follow files and folders by inode, it just goes by exact location and name.

      I help mitigate this by hiding the root folders from the users. The share points are a couple levels deeper so they can't cause TOO big of a problem if someone decides to "tidy up". If they REALLY need something at a lower level moved or renamed, I do it myself, on both the source and the backup drives at the same time.

      Another alternative is to get something like a Drobo where you can have a fairly inexpensive large pool of backup storage space that can match your primary storage. This prevents the problem of smaller backup volumes filling up and requiring data shuffling, but does nothing for the issue of users mucking with the lower levels of the tree.

      --
      I work for the Department of Redundancy Department.
    12. Re:USB and disk Speed by deniable · · Score: 4, Funny

      Send error messages to a Blackberry and it's a RIM job.

    13. Re:USB and disk Speed by deniable · · Score: 5, Funny

      Bacula went on to be Enterprise grade software.

    14. Re:USB and disk Speed by MMC+Monster · · Score: 3, Funny

      Maybe he's personally backing up CERN?

      --
      Help! I'm a slashdot refugee.
    15. Re:USB and disk Speed by milgr · · Score: 2, Informative

      The LHC generates a petabyte per second.

      --
      Where law ends, tyranny begins -- William Pitt
    16. Re:USB and disk Speed by jedidiah · · Score: 2

      Actually, they are roughly the same price.

      Although SATA is more widespread and avoids any reduction in performance you might get from putting an intermediate layer in front of the native interface of the drive. A large drive is going to require a wall wart and all of those will need to be looked after.

      The problem with case+power supply is not the cost but the fact that it is something else to lose. This goes for the extra cabling too.

      Plus with a bare drive you can buy with performance in mind since the drive will likely be your bottleneck.

      --
      A Pirate and a Puritan look the same on a balance sheet.
    17. Re:USB and disk Speed by voltorb · · Score: 3, Informative
    18. Re:USB and disk Speed by thereitis · · Score: 2

      If you're just using RAID to make a bunch of disks look like a single logical unit, consider mhddfs. It's a FUSE filesystem which makes a bunch of disks look like a single unit. I've used it for storing backups - it works as advertised.

      IIRC there were one or two caveats like a lack of hard link support so make sure you try all your use cases before relying on it.

    19. Re:USB and disk Speed by hairyfeet · · Score: 4, Funny

      USB would be about the worst way to go for something like this, it's too slow and he's gonna be swapping worse than when we used to have to back up things to CDs.

      I'm guessing he's going USB because he doesn't have the cash to buy a NAS of that size but you can always jury-rig yourself a NAS, it's really not hard. We did something similar at the last shop I worked at when the boss scored a ton of SCSI drives at an auction and ended up with nearly a 1TB NAS when the average HDD was 40GB. Here is how you do it..

      You take a couple of full size towers, bigger the better, preferably twinkies as it makes the job a LOT easier. You strip 'em to the frames and use a couple of spot welds to make them into one giant case along with another couple of weld to mount a shitload of drive cages into the case. Then you take a cheap server or even desktop board, all that matters is it has a shitload of PCI slots which you fill with controller cards, SCSI in our case but SATA today, mount the board along with a big PSU to feed the drives and voila! One big ass DIY NAS unit that can hold a huge pile of drives. Just to finish our white trash conversion we tied on a Walmart box fan to keep the sucker cool and stuck it in a corner, worked great.

      The only software that I think would work with USB is Paragon Drive Backup as you can have it split by just about any size you want. They also have their own Linux based recovery media but damned if I know if you can get the software as a Linux installer, never ran into that situation to need it in that way. I know it's worked great for me making OS images and backing up files and folders onto USB drives but if you're gonna be splitting to a ton of little drives then you are just gonna have to swap, no way out of that. If you want to fill the drives up then set Paragon to a small size, say 700MB, but good fucking luck checking your backup as the amount of swapping you're gonna do is just insane.

      --
      ACs don't waste your time replying, your posts are never seen by me.
  2. Bacula is your friend by bernywork · · Score: 4, Informative
    --
    Curiosity was framed; ignorance killed the cat. -- Author unknown
    1. Re:Bacula is your friend by Anonymous Coward · · Score: 3, Informative

      Yes, Bacula is the only real solution out there that isn't going to cost you an arm and a leg, and that allows you to switch easily between any backup medium. As long as your MySQL catalog is intact restoration is a cinch...

      Did I mention it supports backup archiving as well if you want duplicate copies for Tapes being shipped off site...

    2. Re:Bacula is your friend by arth1 · · Score: 5, Informative

      Yes, Bacula is the only real solution out there that isn't going to cost you an arm and a leg, and that allows you to switch easily between any backup medium.

      Except for good old tar, which is present on all systems.

      Most people are probably not aware that tar has the ability to create split tar archives. Add the following options to tar:
      -L <max-size-in-k-per-tarfile> -M -F myscript.sh ... where myscript.sh tells tar the name to use for the next tar file in the series (with GNU tar, by writing it to the file descriptor in $TAR_FD). It can be as easy as a for loop checking where the tar file already exists and returning the next hooked up volume where it doesn't.
      Or it could even unmount the current volume and automount the next volume for you. Or display a dialogue telling you to replace the drive.
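      A minimal sketch of such a volume-change script, written in Python rather than shell. It assumes GNU tar, which (as far as I recall from its manual) exports TAR_VOLUME and TAR_FD to the script given with -F / --info-script. The mount points and the backup-N.tar naming are made up for illustration, and the first volume is assumed to have been given to tar with -f /mnt/usb1/backup-1.tar.

```python
#!/usr/bin/env python3
"""Hypothetical info script for GNU tar's multi-volume mode (-M -F)."""
import os

# Assumed mount points, one per backup drive, in the order they get filled.
MOUNT_POINTS = ["/mnt/usb1", "/mnt/usb2", "/mnt/usb3"]


def volume_path(volume_number, mount_points=MOUNT_POINTS, basename="backup"):
    """Return the archive path for a 1-based volume number."""
    if not 1 <= volume_number <= len(mount_points):
        raise IndexError(f"no drive configured for volume {volume_number}")
    drive = mount_points[volume_number - 1]
    return os.path.join(drive, f"{basename}-{volume_number}.tar")


def main():
    volume = int(os.environ["TAR_VOLUME"])  # volume tar is about to start
    fd = int(os.environ["TAR_FD"])          # tar reads the new name from here
    os.write(fd, (volume_path(volume) + "\n").encode())


# Only act when tar has actually invoked us with its environment set.
if __name__ == "__main__" and "TAR_FD" in os.environ:
    main()
```

      If the script runs out of configured drives it exits with an error, which should make tar stop rather than overwrite anything. A real version would probably also check that the target drive is mounted before answering.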

      One advantage is that you can easily extract from just one of the tar files; you don't need all of them or the first-and-last like with most backup systems. Each tar file is a valid one, and at most you need two tar files to extract any file, and most of them just one.

      One caveat: GNU tar refuses to combine multi-volume mode with its built-in compression (-z, -j and friends), so compress the data beforehand if you need to.

    3. Re:Bacula is your friend by Orsmo · · Score: 2

      > Yes, Bacula is the only real solution

      Wait a minute. Really?

      OP is asking for a Linux console application that can perform a backup over multiple block devices (in this case externally attached hot-pluggable drives like USB), and Bacula is what you come up with as the *only* real solution? Obviously you've never heard of dump.

      http://linux.about.com/od/commands/l/blcmdl8_dump.htm

      --
      -- Begin thoughtfuly, end insensitively.
      It has more impact that way.
    4. Re:Bacula is your friend by arth1 · · Score: 2

      I know you tried to make an asshat joke, but I'll respond anyhow:

      Yes, Microsoft provides tar (and many other useful apps primarily associated with Unix and Linux).

      Quoting Wikipedia:
      "Interix versions 5.2 and 6.0 are respective components of Microsoft Windows Server 2003 R2, Windows Vista Enterprise, Windows Vista Ultimate, and Windows Server 2008 as Subsystem for Unix-based Applications[1] (SUA[2]). Version 6.1 is included in Windows 7 (Enterprise and Ultimate editions), and in Windows Server 2008 R2 (all editions).[3]"

      If you have XP, W2k or a lesser version of Windows Vista or 7, you need to register with Microsoft to download Services For Unix.
      If you have Windows Server 2003 or newer, or Windows Vista/7 Ultimate or Enterprise, you can turn it on or off through the Windows features in the control panel.

  3. Split into multiple tar files? by Anonymous Coward · · Score: 5, Informative

    I'm guessing you don't have enough space to split a backup on the original storage medium and then mirror the splits onto each drive?

    Given the size requirements, it seems that might be prohibitive, but it would make things easier for you:

    How to Create a Multi Part Tar File with Linux

  4. A Full 24TB using only 2 USB ports by Bondolon · · Score: 2

    Assuming you're not worried about backup speed, you could use a four-bay external hard-drive enclosure in combination with rsync and LVM on any Linux variety. I don't know if they all do, but the MediaSonic HF2-SU3S2 supports 3TB hard drives per bay, which means that two of them could be used in conjunction to provide 24TB of backup storage. Since you can make a large volume out of the full 24TB using LVM, you could even use something like dd to write to the disk (rsync with the archive option would be a better choice though, imho).

  5. RAID by Anonymous Coward · · Score: 5, Informative

    For that much data you want a RAID since drives tend to fail if left sitting on the shelf, and they also tend to fail (for different reasons) if they are spinning.

    Basically: buy a RAID enclosure, insert drives so it looks like one giant drive, then copy files.

    For 24TB you can use eight 4TB drives for a 6+2 RAID-6 setup. Then if any two of the drives fail you can still recover the data.

    1. Re:RAID by Kjella · · Score: 2

      For that much data you want a RAID since drives tend to fail if left sitting on the shelf, and they also tend to fail (for different reasons) if they are spinning. Basically: buy a RAID enclosure, insert drives so it looks like one giant drive, then copy files. For 24TB you can use eight 4TB drives for a 6+2 RAID-6 setup. Then if any two of the drives fail you can still recover the data.

      Yeah... though I suspect with the price premium for 4TB drives - they're huge - and the cost of an 8-port RAID6 capable RAID card you're considerably above the budget he was going for. If this is like "projects" or something I'd probably suggest the human archiving method - split your live disk into three areas, "work in progress" and "to archive" and "archive". Your WIP you back up completely every time, your "to archive" you add to the latest archive disk (plain, no RAID), and make an index of it so you can easily find on which archive disk it is then move it to "archive" on the live disk. Very low tech incremental backup but this seems like a hobby project. I certainly hope it's not a company's backup / disaster recovery plan...
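      The "make an index of it" step could look something like the sketch below: walk an archive disk, record which label each file lives on, and append that to one CSV kept on the live disk. The disk label, the CSV layout and the function names are all assumptions made for illustration.

```python
import csv
import os


def index_disk(label, root):
    """List every file under root as (disk label, relative path, size)."""
    rows = []
    for dirpath, _dirs, filenames in os.walk(root):
        for filename in filenames:
            full = os.path.join(dirpath, filename)
            rows.append((label, os.path.relpath(full, root), os.path.getsize(full)))
    return sorted(rows)


def append_to_index(index_csv, rows):
    """Append rows to the master index kept on the live disk."""
    with open(index_csv, "a", newline="") as handle:
        csv.writer(handle).writerows(rows)


def find_in_index(index_csv, needle):
    """Return (label, path) for every indexed path containing needle."""
    with open(index_csv, newline="") as handle:
        return [
            (label, path)
            for label, path, _size in csv.reader(handle)
            if needle in path
        ]
```

      Usage would be along the lines of `append_to_index("archive-index.csv", index_disk("ARCHIVE-07", "/mnt/archive"))` after filling a disk, and `find_in_index("archive-index.csv", "holiday")` when you need to know which disk to pull off the shelf.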

      --
      Live today, because you never know what tomorrow brings
    2. Re:RAID by Sarten-X · · Score: 3, Informative

      As mentioned already, RAID is not a backup solution. While it will likely work fine for a while, the risk of a catastrophic failure rises as drive capacity increases. From the linked article:

      With a twelve-terabyte array the chances of complete data loss during a resilver operation begin to approach one hundred percent - meaning that RAID 5 has no functionality whatsoever in that case. There is always a chance of survival, but it is very low.

      Granted, this is talking about RAID 5, so let's naively assume that doubling the parity disks for RAID 6 will halve the risk... but then since we're trying to duplicate 24 terabytes instead of twelve, we can also assume the risk doubles again, and we're back to being practically guaranteed a failure.
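      For a feel of where numbers like these come from, here is a back-of-envelope sketch. It assumes an unrecoverable read error (URE) rate of one per 10^14 bits, a figure commonly quoted on consumer drive spec sheets but not stated in this thread, and it treats errors as independent, which real drives do not guarantee. It models a single full read of the given size, not the extra protection of RAID 6's second parity.

```python
import math


def p_unrecoverable_read(bytes_read, ure_per_bit=1e-14):
    """Chance of at least one URE while reading bytes_read bytes."""
    bits = bytes_read * 8
    # 1 - (1 - rate)^bits, computed in a numerically stable way for tiny rates
    return -math.expm1(bits * math.log1p(-ure_per_bit))


TB = 10**12  # decimal terabyte (assumption)
for size_tb in (12, 24):
    chance = p_unrecoverable_read(size_tb * TB)
    print(f"reading {size_tb} TB: {chance:.0%} chance of hitting a URE")
```

      Under those assumptions a 12 TB read comes out at roughly 60%, and the risk keeps climbing with size. Enterprise drives are often specified an order of magnitude better, which you can try by passing `ure_per_bit=1e-15`.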

      Bottom line is that 24 terabytes is still a huge amount of data. There is no reliable solution I can think of for backing it all up that will be cheap. At that point, you're looking at file-level redundancy managed by a backup manager like Backup Exec (or whatever you prefer) with the data split across a dozen drives. As also mentioned already, the problem becomes much easier if you're able to reduce that volume of data somewhat.

      --
      You do not have a moral or legal right to do absolutely anything you want.
    3. Re:RAID by the_B0fh · · Score: 2

      You should check out ZFS. These issues go away. And with RAID-Z3, up to 3 drives can die before you have a problem.

    4. Re:RAID by the_B0fh · · Score: 2

      ZFS + snapshots. problem solved. Though you do need more drives than a 8x3TB.

    5. Re:RAID by d3vi1 · · Score: 2

      You didn't read what I said. Yes, ZFS+Snapshots, but you also need at least Sun Cluster replication and tape backup. ZFS + Snapshots doesn't save you from fires, floods, software bugs and ill-will. It does save you from idiots, and disk failure though.

      --
      UNIX was not designed to stop you from doing stupid things, because that would also stop you from doing clever ones.
    6. Re:RAID by the_B0fh · · Score: 2

      Sure it does, when you have a second set of them. Where you store tapes, store the drives instead. What do you think all those virtual tape vaults are made of?

    7. Re:RAID by louic · · Score: 3, Informative

      As mentioned already, RAID is not a backup solution.

      Nevertheless, there is nothing wrong with using disks that happen to be in a RAID configuration as backup disks. In fact, it is probably a pretty good idea for large files and large amounts of data.

    8. Re:RAID by meddle99 · · Score: 2

      For that much data, I'd recommend just keeping the original HD-DVDs and Blu-ray media for your "20 GB" files, and the original CDs for your "3 MB" files. We are helping someone back up his music and audio collection, aren't we? And they won't be pirated, will they?

      The idea of re-ripping ~2400 assorted CDs, DVDs, and BR (total 15tb) really does not appeal to me. So- I back it up, even though I have every one of the originals. Just because you don't understand something is not a reason to assume the person is doing something illegal.

    9. Re:RAID by Sarten-X · · Score: 2

      Quite the contrary, and that's my point. The errors here aren't just "let's try again" failures. They're unrecoverable, final, data-is-gone-forever errors, and the chances of encountering one are very high with so much data. Resilvering such a large array is practically impossible (as described in the article I linked to). Without resilvering and having blocks spread among disks, losing one disk means you've lost a little bit of everything, so all your data is corrupt, rather than just the fraction that was stored on the failing drive. Add to that hassle the extra expense of having more disks, controllers, and setup time, and the submitter would be better off writing a few thousand DVDs.

      --
      You do not have a moral or legal right to do absolutely anything you want.
  6. Julian? by WinstonWolfIT · · Score: 5, Funny

    Out on bail mate?

  7. Re:DaisyChain by Captain+Hook · · Score: 2

    It's not mentioned by the author, so I might be assuming too much, but if he's trying to write to USB drives as opposed to a RAID of some sort I figured he wanted to be able to read the drives individually, perhaps on a different machine without a network connection between them.

    The Drobo won't allow that, the file system is spread across all the drives.

    I guess it kind of depends on what the author needs to do with the drives when he's finished writing to them.

    --
    These comments are my personal opinions and do not necessarily reflect the opinions of the other voices in my head.
  8. git-annex by Anonymous Coward · · Score: 4, Informative

    You might want to look into git-annex:
    http://git-annex.branchable.com/

    I've not tried it, but it sounds like an ideal solution for your request, especially if your data is already compressed.

  9. Tape? by mwvdlee · · Score: 5, Insightful

    Why not tape, backup RAID, SAN or some other dedicated backup hardware solution?
    24TB is well within the range that a professional solution would be required.
    Given a harddisk size of ~1TB, making a single backup to 24 disks isn't a backup; it's throwing data in a garbage can.
    More than likely at least one of those disks will die before its time.

    --
    Slashdot social media options: AIM, ICQ, Yahoo, Jabber and Mobile Text. Why no MySpace?
    1. Re:Tape? by Lumpy · · Score: 4, Insightful

      Yup. Spool to tape. Get an SDLT600 tape cabinet and call it done. If you get a 52 tape robot cabinet you will have space to not only hold a complete backup but a second full backup in incrementals that will all run automatically. Plus it has the highest reliability.

      And anyone whining about the cost: if your 24TB of data is not worth that much then why are you bothering to back it up?

      --
      Do not look at laser with remaining good eye.
    2. Re:Tape? by Anonymous Coward · · Score: 5, Informative

      No kidding. For $2400, you get 24x 1TB HDs and a bookkeeping nightmare if you ever actually resort to the "backup." For $3k, you get a network-ready tape autoloader with 50-100TB capacity and easy access through any number of highly refined backup and recovery systems.

      Now, if the USB requirement is because that's the only way to access the files you want to steal from an employer or government agency, then the time required to transfer across the USB will almost guarantee you get caught. Even over the weekend. You should come up with a different method for extracting the data.

  10. tar --multi-volume by jegerjensen · · Score: 5, Interesting

    Evidently, our UNIX founding fathers had similar challenges...

  11. Tar already does this by cyocum · · Score: 3, Informative

    Have a look at tar and its "multi-volume" option.

    1. Re:Tar already does this by leuk_he · · Score: 5, Informative

      multi volume tar: just mount a new USB disk whenever it is full.

      However, to have a reasonable retrieval rate (going through 24 TB of data will take some days over USB2), you'd better split the dataset into multiple smaller sets. That also has the advantage that if one disk crashes (AND consumer-grade USB disks will crash!), your entire dataset is not lost.

      For that reason (disk failure), do not use some Linux disk-spanning feature. File systems are lost when one of the disks they write on is lost, unless you use a feature that can handle lost disks (RAID / RAID-Z).

      And last but not least: test your backup. I have seen cheap USB interfaces fail to write the data to disk without a good error message. All looks OK until you retrieve the data and some files are corrupted.
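      One way to test a plain file-level copy is to compare checksums between the source tree and the backup tree. The sketch below assumes the backup mirrors the source directory layout (as a cp -a or rsync copy would); it does not apply to tar volumes, and the function names are made up for illustration.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large files never sit fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_copy(source_root, backup_root):
    """Return relative paths that are missing or differ in the backup."""
    problems = []
    for dirpath, _dirs, filenames in os.walk(source_root):
        for filename in filenames:
            src = os.path.join(dirpath, filename)
            rel = os.path.relpath(src, source_root)
            dst = os.path.join(backup_root, rel)
            if not os.path.isfile(dst) or sha256_of(src) != sha256_of(dst):
                problems.append(rel)
    return sorted(problems)
```

      It is worth unplugging and re-mounting the drive before verifying; otherwise the read may be served from the operating system's cache instead of the disk, which would hide exactly the silent write failures described above.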

  12. Linuxquestions thread on multi-disk backups by Anonymous Coward · · Score: 2, Informative

    Here's a Linuxquestions thread outlining multi-disk backup strategies.

    The gist of the discussion is to use DAR.

  13. You know... by marsu_k · · Score: 5, Funny

    Porn is a renewable resource, there's no need to store so much of it.

  14. Seriously: Build your own homebrew NAS. by Qbertino · · Score: 4, Interesting

    What you're attempting isn't easy, it's actually difficult.
    Buy a cheap and big refurbished workstation or rackmount server, install a few extra SATA controllers and maybe a new power supply, hook up 12 2TB drives, install Debian, check out LVM and you're all set.

    Messing around with 12 - 24 external HDDs and their power supplies is a big hassle and asking for trouble. Don't do it. Do seriously go through the possibility of building your own NAS. You'll be thankful in the end and it won't take much longer, it might even go faster and be cheaper if you can get the parts fast.

    My 2 cents.

    --
    We suffer more in our imagination than in reality. - Seneca
    1. Re:Seriously: Build your own homebrew NAS. by DRJlaw · · Score: 4, Interesting

      What you're attempting isn't easy, it's actually difficult.
      Buy a cheap and big refurbished workstation or rackmount server, install a few extra SATA controllers and maybe a new power supply, hook up 12 2TB drives, install Debian, check out LVM and you're all set.

      Messing around with 12 - 24 external HDDs and their power supplies is a big hassle and asking for trouble. Don't do it. Do seriously go through the possibility of building your own NAS. You'll be thankful in the end and it won't take much longer, it might even go faster and be cheaper if you can get the parts fast.

      Way to redefine the problem instead of working within the specifications.

      Perhaps:
      1. The poster ALREADY has a NAS and wants to have airgapped or even offsite/offline backup.

      2. External HDDs are fast, common, reasonably cheap, and do not have a single point of failure (e.g., the tape backup drive in many suggested alternatives)

      I'm interested in this question. I use this general setup, but on a smaller scale. I cannot put a NAS in a safety deposit box. I cannot ensure that my "backup" NAS would not be drowned in a flood, burned in a fire, fried by a lightning strike...

      Let's pretend the poster is not an idiot, and answer the actual question. If he has 24TB of data, IT'S ALREADY ON DAS/NAS. Geesh.

  15. Bash.... by djsmiley · · Score: 4, Informative

    First bash script to grab the size of the "current" storage;

    compress the files up until that size;

    Move compressed file onto storage;

    request new storage, start again.
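    A rough sketch of that loop, in Python instead of bash and without the compression step: copy files in order onto the current drive, and when the next file no longer fits, ask for a new drive. The `next_target` callback (for example, a prompt that waits for the operator to plug in a disk and returns its mount point) and the `free_space` hook are hypothetical names introduced here.

```python
import os
import shutil


def spill_copy(files, next_target, free_space=None, reserve=0):
    """Copy files in order, moving to a new target whenever one fills up.

    files:       iterable of source file paths
    next_target: callable returning the mount point of the next empty drive
    free_space:  callable(path) -> free bytes; defaults to shutil.disk_usage
    reserve:     bytes to leave unused on every drive
    Returns a dict mapping each source path to the drive it landed on.
    """
    if free_space is None:
        free_space = lambda path: shutil.disk_usage(path).free
    target = next_target()
    placed = {}
    for src in files:
        size = os.path.getsize(src)
        # Keep asking for drives until one has room for this file.
        while free_space(target) - reserve < size:
            target = next_target()
        # Note: copies flat into the drive root; same-named files would collide.
        shutil.copy2(src, os.path.join(target, os.path.basename(src)))
        placed[src] = target
    return placed
```

    The returned mapping doubles as a catalogue of which file went to which drive. A file bigger than any single drive would make this loop ask for drives forever, so a real script should check for that up front.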

    ----------

    Or, if you've got all the storage already connected (with $n the number of the last drive): for x in $(seq 0 "$n"); do cp "$archive$x" "/mount/$x/"; done :D

    --
    - http://www.milkme.co.uk
  16. Use DAR or KDAR by pegasustonans · · Score: 2, Informative

    If you don't want to invest in new hardware, you could use DAR or KDAR (KDE front-end for DAR).

    With KDAR, what you want is the slicing settings.

    There's an option to pause between slices, which gives you time to mount a new disk.

    --
    And all our yesterdays have lighted fools The way to dusty death. --Will
  17. Re:solution by aglider · · Score: 4, Informative

    3.samba

    Uh? Why?
    cp -a is all you need once you put the HDD inside the target machine.
    And if you put it into another machine on the same network, then rsync is the answer.
    Forget about the buggy and slow SAMBA.

    --
    Sent as ripples into the electromagnetic field. No single photon has been harmed in the process.
  18. Re:JBOD or more accurately, spanned volume by sumdumass · · Score: 2

    how transportable is that though?

    I mean, if I copied 200 gig across 3 drives in a JBOD RAID, could I plug just one drive in to access the information on another machine? Suppose my laptop only has 2 USB ports and I do not have a hub plus I'm running a different OS, does this mean I can't look for information on the set?

    I have never used JBOD for RAID, I have however used regular mirrored and striped RAIDs with and without fault tolerance (RAID 5 and 10 or a mirrored stripe for instance) and know this can be a problem. In fact, I've even seen issues reading a complete RAID set across systems when you aren't using a true hardware RAID controller.

  19. Re:DaisyChain by GCsoftware · · Score: 2

    Actually 8x4 TB disks will do it, with the overhead etc, giving you 24.96 TB usable space.

  20. PAR by fa2k · · Score: 3, Informative

    I have just seen "PAR" a couple of times here on slashdot, haven't used it, but it seems great for this: http://en.wikipedia.org/wiki/Parchive . You need enough redundancy to allow one USB drive to fail. And I would rather get a SATA bay and use "internal" drives than having to deal with external USB drives. Get "green" drives, they are slow but cheap.
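    To illustrate the idea of surviving one lost drive, here is a deliberately simplified sketch using plain XOR parity. This is not how PAR works internally (Parchive uses Reed-Solomon codes and can tolerate more than one loss); it only shows the principle that one extra parity block lets you rebuild any single missing block.

```python
def xor_parity(blocks):
    """XOR all blocks together, treating shorter blocks as zero-padded."""
    parity = bytearray(max(len(block) for block in blocks))
    for block in blocks:
        for i, byte in enumerate(block):
            parity[i] ^= byte
    return bytes(parity)


def recover_missing(surviving_blocks, parity, missing_length):
    """Rebuild the one lost block from the survivors plus the parity block."""
    rebuilt = xor_parity(list(surviving_blocks) + [parity])
    return rebuilt[:missing_length]


# Three "drives" worth of data plus one parity "drive".
data = [b"alpha", b"bravo!!", b"c"]
parity = xor_parity(data)

# Pretend the second drive died; rebuild it from the other two and the parity.
restored = recover_missing([data[0], data[2]], parity, len(data[1]))
print(restored == data[1])
```

    The cost is one extra drive's worth of space for the parity, and the original length of each block has to be recorded somewhere so the padding can be trimmed off again.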

  21. NAS by Wolfling1 · · Score: 2

    A 24TB NAS is not very hard to assemble. Relatively cheap, and basically transfers data at Gb speed - assuming that you populate it with fast disks. Set one up with RAID and you're away. Personally, I would do it with a low end server and a big-ass RAID array. That way, you can really control its behaviour via the OS. Linux is ferpect for this kind of thing.

  22. Re:JBOD or more accurately, spanned volume by hawkinspeter · · Score: 2

    Seems like a very bad idea to me. You'll have trouble creating a JBOD device without connecting all the drives simultaneously. Also, you're basically increasing the chance that the entire JBOD volume will be broken as the number of drives goes up. If you've got one drive failing, you'll be lucky to get any data back at all.

    To my mind, Bacula would be a good choice as you can set up virtual tapes that will correspond to the drives and you can set the backup to wait for the operator to swap over the drive and then continue the backup. Also, once you've got Bacula installed and working, it's easy to do incremental backups and thus not need to write out the whole dataset again.

    --
    You're a temporary arrangement of matter sliding towards oblivion in a cold, uncaring universe
  23. I know by Anonymous Coward · · Score: 2, Funny

    The iCloud! ;-)

  24. Read it from Torvalds' lips by zapyon · · Score: 4, Funny

    "Only wimps use tape[*] backup: real men just upload their important stuff on ftp, and let the rest of the world mirror it ;)"
    Linus Torvalds (1996) http://en.wikiquote.org/wiki/Linus_Torvalds

    (Isn't that prescience of "The Cloud"?)

    ––––––––––
    * replace this with your favorite backup media of today ;-)

    --
    I like my spaghetti with source.
    1. Re:Read it from Torvalds' lips by rvw · · Score: 2

      "Only wimps use tape[*] backup: real men just upload their important stuff on ftp, and let the rest of the world mirror it ;)"

      Linus Torvalds (1996) http://en.wikiquote.org/wiki/Linus_Torvalds

      (Isn't that prescience of "The Cloud"?)

      ––––––––––

      * replace this with your favorite backup media of today ;-)

      "Only wimps use ftp[*] backup: real men just upload their important stuff to the iCloud, and let the rest of the world mirror it ;)"

      An Amazon support employee (2012)

  25. Count Bacula by freaker_TuC · · Score: 2

    Count Bacula as your friend ;) -> http://www.bacula.org/

    --
    --- I am known for the ones who want to find me on the net. Is that a privacy risk or a privilege? One might wonder..
  26. Backup advice by Anonymous Coward · · Score: 2, Insightful

    I do things like this all the time with a data set about half of that, ~ 12TB. You didnt say anything about what the data is but from the request and the fact you mentioned USB I would gather this is your typical warez hording mp3/flac, mkv, apps and also a personal picture and video collection of fam.

    Here is a checklist I would execute, similar to mine. I find the most reliable way to keep your data over the years is by following a checklist or procedure and choosing when to move to the next storage platform.

    Step 0: Get USB out of your head. Pop open the drive and attach it to the native bus, PATA or SATA. If SATA, you may want to invest in eSATA cases. It's not solely the speed. I have done stupid things like this, in which the data backup takes over 2 days, and on the 2nd day some unrelated event affecting my USB bus causes all kinds of problems with the transfer. Over time doing cheesy things like this affects other things, like doing stupid shit in real life, usually with duct tape or Gorilla Glue, then you have your wife on you. Right now your wife may not catch on to this, but it will escalate. Just do shit the right way.

    Step 1: Organize. Actually understand what you are backing up. I never got into these tools like Google Desktop that allow a user to accept the fact that he/she has no idea where their files are. Understand and make an effort to organize your files before you back them up, and know the capacity of each 'genre' of crap you are backing up. Run a tool like 'jdiskreport' to find this information out after you organize. Create a mapping on paper of where shit is going, Zork style. If you have really important shit like family pictures, taking up say 200GB, and your mkv collection is 12TB, you may want to make 2x copies of your family shit. Anything you download off the internet is easily replaceable, despite how obscure your tastes may be, and will turn up again. I would question even backing it up but that is another conversation.

    Step 2: Label your drives according to your documentation.

    Step 3: Format the drives in the native format you are most likely to use and are familiar with. If you are a noob Linux guy who runs Windows 7 all the time, don't be an idiot and experiment with your backup on ext3. It is not that ext3 is a bad filesystem, but you may not be the most skilled at restoring your data in various scenarios. For example, I'm a Linux and Solaris geek but am just getting into Macs; I'm not comfortable enough with Mac failures to store my crap on a Mac fs. Whatever your skillset is, don't use the most optimal file system on paper, use what you know, even if it is NTFS (which IMO is very reliable).

    Step 4: Copy your shit over using your knowledge of your data organization and native OS commands or tools.

    Step 5: Run a checksum on your important stuff and store the hashes to verify everything is fine over time. Odd situations occur when backing up data. I have run into cases where I didn't realize the files I was about to back up were bad/corrupt until I saw the good copy on a backup drive I was about to incrementally overwrite.

    Step 6: Store the shit somewhere else if you can reasonably do this and feel confident in the security of your data. If you have to start encrypting your crap, you add some more complexity that can affect the reliability of your restoration, but again if you proceduralize and keep up on it you will be fine.

    Backup design and integrity is hard work and serious business when dealing with large volumes. It reminds me of the Seinfeld episode where he goes to the car rental place and they don't have his car and he goes into his "Anyone can take the ticket" diatribe. Anyone can back up their data. But can you get it back? I am not an expert in this area and don't pretend to be; I am just a seasoned IT administrator who has performed a lot of backups in my day and has managed to keep most of my data safe over the years.
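
    A minimal sketch of the Step 5 idea, assuming Python is available on the machine doing the backup: hash every file under a directory, keep the resulting manifest, and compare it against a later run. The function names (sha256_of, build_manifest, changed_files) and the choice of SHA-256 are illustrative assumptions, not something the checklist prescribes.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in 1 MiB chunks so a 20GB file never has to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file path (relative to root) to its SHA-256 hex digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            manifest[os.path.relpath(full_path, root)] = sha256_of(full_path)
    return manifest


def changed_files(old_manifest, new_manifest):
    """List files from the old manifest that are now missing or hash differently."""
    return sorted(
        path
        for path, digest in old_manifest.items()
        if new_manifest.get(path) != digest
    )
```

    Running build_manifest on the source before the copy and on the backup drive afterwards, then passing both to changed_files, should come back empty if the copy is intact. Saving the manifest somewhere other than the backup drive makes it possible to spot silent corruption later.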

  27. Keep it simple by jampola · · Score: 2

    # rsync -avz /this /that. Split your directories to match the sizes of your drives. If on Linux, run smartctl -H /dev/sdX to check your disk health and, if possible, take the HDDs out of their USB enclosures and connect them directly to SATA for faster transfer speeds. 9 times out of 10 these drives will mount just like a normal drive, since usually they are just a normal drive housed in an enclosure.
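
    One way to work out which directories go on which drive is a first-fit-decreasing pass over the directory sizes. This is only a sketch: the function name assign_to_drives, the directory names, and the sizes are made up for illustration, and it assumes no single directory is bigger than one drive.

```python
def assign_to_drives(dir_sizes, drive_capacity):
    """First-fit decreasing: place the largest directories first.

    dir_sizes maps a directory name to its size; drive_capacity is in the
    same unit. Returns one list of directory names per drive.
    """
    drives = []  # each entry is [free_space, [directory names]]
    ordered = sorted(dir_sizes.items(), key=lambda item: item[1], reverse=True)
    for name, size in ordered:
        if size > drive_capacity:
            raise ValueError(f"{name} does not fit on one drive; split it further")
        for drive in drives:
            if drive[0] >= size:
                drive[0] -= size
                drive[1].append(name)
                break
        else:
            # No existing drive has room, so start a new one.
            drives.append([drive_capacity - size, [name]])
    return [names for _free, names in drives]


# Hypothetical sizes in GB, packed onto 2000GB drives.
example = {"video": 1500, "photos": 200, "music": 700, "apps": 900, "docs": 50}
plan = assign_to_drives(example, 2000)
```

    Each inner list in plan is then one rsync job per drive. Leaving some headroom in drive_capacity for filesystem overhead is probably wise.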

    Good luck :)

  28. Damn! by robbie73 · · Score: 2

    Damn, that's a lotta pr0n!

  29. Re:No. by asdf7890 · · Score: 2

    USB 2.0 provides 480Mbps of (theoretical) bandwidth. So unless you go Gigabit all over your network (not unreasonable), you won't beat it with a NAS. Even then, it's only 1-and-a-bit times as fast as USB working flat-out (and the difference being if you have multiple USB busses, you can get multiple drives working at once).

    The 480Mbps is nowhere near what you will see in practise, unlike network speeds which are far closer to the rated maximum. Most USB drives I've seen top out at somewhere between 25 and 30MByte/sec, and if there are no other bottlenecks it isn't unusual to see 100Mbyte/sec from a gbit switched network. My main desktop pulls things from the fileserver at around 80Mbyte/sec, which is as fast as local reads tend to be on that array. So you are right about 100mbit networks: that'll be the bottleneck not USB, but gbit networking should outdo USB2 by at least a factor of 2, possibly 3, maybe even more if you have better drives in your main storage array than I do.
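
    To put those rates in terms of the 24TB in the question, here is a rough calculation. It assumes decimal units (1TB = 10^12 bytes, 1MB = 10^6 bytes) and a perfectly sustained rate, which real transfers will not quite manage; transfer_hours is just an illustrative helper.

```python
TB = 10**12
MB = 10**6


def transfer_hours(total_bytes, bytes_per_second):
    """Hours needed to move total_bytes at a constant rate."""
    return total_bytes / bytes_per_second / 3600


# Rates taken from the figures quoted above.
rates = [
    ("USB 2.0, low end", 25 * MB),
    ("USB 2.0, high end", 30 * MB),
    ("Gigabit network", 100 * MB),
]
for label, rate in rates:
    print(f"{label}: {transfer_hours(24 * TB, rate):.1f} hours")
```

    That works out to roughly 9 to 11 days over a single USB 2.0 link against a little under 3 days over gigabit, before any other bottleneck is counted.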

    Before trying to run several USB drives to max out your network bandwidth, consider that you will be taxing the source disks too. Unless they are SSDs, having 2, 3, or more concurrent bulk reads going on may not be any faster than a single read, as all the extra head movements will wipe out the bulk speed potential. If the OP's 24TB is spread over numerous physical drives this need not be an issue, though (with planning careful enough to ensure there aren't two bulk processes reading from the same physical devices).

    And USB 3.0 would beat it again.

    That it would. I have an SSD in a USB3 enclosure, and it can happily consume 80Mbyte/sec read over my little network. It might even be able to do better than that: I've not measured a bulk write fed from the internal SSD yet.

    And 10Gb between the client and a server is an expensive network to deploy still.
    Granted, eSATA would probably be faster but there's nothing wrong with USB for such tasks if you *don't* want to provide Gigabit connections everywhere and (presumably) greater-than-gigabit backbones.

    If I wanted more speed than USB3+gbit can provide (due to the size of data being backed up on each run) I'd be plugging the backup device(s) in locally to the source (via eSATA, USB3, or such) rather than using the network (though again taking note to be careful how things are done if trying to use more than one backup device at once).

    For the size of data being described, I'd not want a set of USB drives to be my primary backup solution though.

  30. Re:solution by AvitarX · · Score: 4, Insightful

    Is it that much faster for 3MB to 20GB files?

    --
    Wow, sent an e-mail as suggested when clicking on "use classic" banner, and got a fast response that addressed my msg
  31. Re:solution by fnj · · Score: 5, Informative

    No. It's slower. Informative, my ass.

  32. Re:Use purpose designed backup media. by tomtomtom · · Score: 2

    Whether tape or disk is appropriate really depends what you are intending to use the backup for and how important your data is. You might even choose to use a mixture of the two.

    If it's your only backup, I would suggest that it's not wise to leave it permanently online in the way you suggest; that leaves you open to any number of potential issues which your backup is supposed to protect you from (OS bug, misconfiguration, lightning strike, power failure, overheating, ...). Tape libraries have the same issue although at least there you are exposed to a different set of software bugs and the other tapes in the library might be OK if they are not physically in use when the worst happens.

    For the inadvertent file deletion, you can cover this with better tools using true online storage - effectively some form of regular snapshotting (ZFS snapshots, rdiff-backup, Windows VSS, etc) to keep a (shortish) recent history. This should cover a good proportion of restore requests depending on how much history you can keep. For the rest, you're right that if you need to restore files very regularly then you might need a second drive and/or robot. Whether you need to do that or not will just depend on your use case.

    Even if you do go with disk, make sure you use something which can properly keep multiple versions of files - just rsync'ing a big directory onto another disk is a recipe for disaster. My personal favourites are rdiff-backup and DAR (which can handle multiple volumes as others have pointed out) but there are others out there too, eg bacula.

  33. Re:solution by bastafidli · · Score: 2

    cp doesn't preserve the exact timestamp. If you want to do rsync later, it will copy all files all over. Just do

    rsync --dry-run --archive --stats --progress --whole-file --exclude "/lost+found" --delete-after /source/ /destination

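    which is reproducible and on later runs will copy only the new or changed files (drop --dry-run once the preview looks right, since with it rsync only reports what it would do).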
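
    If you would rather script it, a thin wrapper along these lines keeps the preview as the default and makes the real copy an explicit choice. It is only a sketch: rsync_command and run_backup are made-up names, and it assumes rsync is installed and on the PATH. The flags are the ones from the command above.

```python
import subprocess


def rsync_command(source, destination, dry_run=True):
    """Build the rsync argument list; dry_run=True only previews changes."""
    command = [
        "rsync",
        "--archive",
        "--stats",
        "--progress",
        "--whole-file",
        "--exclude", "/lost+found",
        "--delete-after",
    ]
    if dry_run:
        command.append("--dry-run")
    # A trailing slash on the source copies its contents, not the directory itself.
    command += [source.rstrip("/") + "/", destination]
    return command


def run_backup(source, destination, dry_run=True):
    """Run rsync and raise CalledProcessError if it exits non-zero."""
    return subprocess.run(rsync_command(source, destination, dry_run), check=True)
```

    Calling run_backup("/source", "/destination") previews the transfer; passing dry_run=False performs it, including the deletions that --delete-after implies.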

  34. BackBlaze by minijedimaster · · Score: 2

    A cloud backup service released information on how they build their own disk based backup servers. Maybe something that would help with your endeavor? http://blog.backblaze.com/2011/07/20/petabytes-on-a-budget-v2-0revealing-more-secrets/

  35. Simple answer: don't. by dandaman32 · · Score: 2

    I work for a data backup company as a dev monkey/admin/jack-of-all-trades.

    Do you ever want to restore these backups? If the answer is "yes" (and it should be, otherwise why are you backing up in the first place...?), then you need to be guarded against failure of an individual disk. That means you need some sort of RAID solution.

    For reference, Datto's 3U nodes store 20TB across 14 2TB drives, and the next larger size of node we have is somewhere around 55TB in 4U. No, I'm not trying to sell you our hardware (we only sell to resellers anyway) but hear me out. You really are going to save yourself some headache if you build a NAS device.

    USB 2.0 is SLOW AS BALLS. I see our USB seed drives (HDDs we mail out to customers to get their initial datasets up into the ether) max out at 20-30MB/sec on a good day. By comparison, Gigabit Ethernet will give you 112MB/sec after NFS/TCP/Ethernet overhead -- much better. For this reason, and because it's just so impractical to handle large collections of failure-prone USB drives, our largest round trip drive that is shipped as USB is 4TB. After that, we actually ship our customers NAS devices (usually a returned/development box with a different OS image on it).

    Go with NAS. You need the resilience against disk failure, you need the additional speed, and while yes, it's a greater investment, the alternative is utter agony when one of your 12 2TB disks takes a dump.

  36. Re:DaisyChain by Painted · · Score: 4, Informative

    DON'T DO THIS.

    We did this exact thing using WD Green drives for our 18TB backup problem. Got two of 'em, planning on using their built-in rsync for onsite/off siting the data. Unfortunately, the units never broke 1MB/s transfer, and no amount of work with Drobo yielded faster performance reliably. Both of our units are now sitting unused, ($2500 each!), and we put the drives into a RAID-50 8 bay USB3 enclosure. The new unit runs about 150x faster, and ended up costing $400 (prices are for enclosures only, drives were additional).

    Most disappointing was Drobo's support: they just seemed to shrug a lot, and were hyper-aggressive about closing trouble tickets.

    --
    http://marsandmore.com - Posters of space, spacecraft, and astronomy.
  37. External RAID enclosure by kimvette · · Score: 2

    You buy one of these:

    http://www.newegg.com/Product/Product.aspx?Item=N82E16816322007

    populate it with 4TB drives and create two RAID5 arrays (or one RAID6 array), then you've got 24 or 28 TB of backup space, without having to change drives or break up your backup into smaller chunks.
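
    The capacity figures can be sanity-checked with the usual rule that RAID5 gives up one drive's worth of space to parity and RAID6 gives up two. The sketch below assumes an 8-bay enclosure filled with 4TB drives; the bay count is an assumption, not something stated above, and usable_tb is an illustrative helper.

```python
RAID5_PARITY = 1  # one drive's worth of capacity goes to parity
RAID6_PARITY = 2  # two drives' worth of capacity go to parity


def usable_tb(drive_count, drive_tb, parity_drives):
    """Usable capacity of a single parity-RAID array, in TB."""
    if drive_count <= parity_drives:
        raise ValueError("array needs more drives than its parity overhead")
    return (drive_count - parity_drives) * drive_tb


two_raid5_arrays = 2 * usable_tb(4, 4, RAID5_PARITY)  # two 4-drive arrays
one_raid6_array = usable_tb(8, 4, RAID6_PARITY)       # one 8-drive array
one_raid5_array = usable_tb(8, 4, RAID5_PARITY)       # one 8-drive array
print(two_raid5_arrays, one_raid6_array, one_raid5_array)
```

    Under those assumptions the two-RAID5 and single-RAID6 layouts both come to 24TB, and the 28TB figure matches a single 8-drive RAID5 array.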

    But really, your backup methodology is broken; you need to organize the data into manageable chunks because aside from a large dedicated backup server/SAN, there is no reliable (don't tell me tape is reliable) backup solution for such a large quantity of data in a single chunk.

    What I do for backups: in my 24-bay server I have eight large drives in a (HARDWARE) RAID5 array (were 4TB drives available at the time I'd have gone RAID6) and rsync the virtualized server contents to that, then archive them into tarballs, and send copies of them across the LAN to another server that is running (HARDWARE) RAID5 as well. Every once in a while I back up the critical data (source, scripts, financial data, production web sites, /etc, and so forth but not the program binaries nor system binaries which are easily recreated or reinstalled, respectively) to optical media and external hard drives.

    So what I have in summary is:
    * Massive server with a backup array separate from the production array
    * Separate backup server running another array (again, using a quality HARDWARE RAID controller. Safeguard your data and don't bother with Intel, Adaptec, Promise, or Highpoint "hybrid" RAID)
    * Periodic backups of non-recreatable data to USB drives and optical media that are moved off site.

    --
    The Christian Right is Neither (Christian nor right). See: Matthew 23, Matthew 25, Ezekiel 16:48-50
  38. Re:solution by Richy_T · · Score: 2

    Yes. The above tar command is really from a time when cp did not have r and p options (and still likely doesn't on some systems so it's worth knowing). OTOH, you can add in the z option (compress) if you're doing something networky (though you'll probably want to throw in netcat or ssh too in that case). Of course, if you're doing that, rsync is probably the better option if available and leads to some interesting backup options going forward.

  39. Re:NAS Box by Richy_T · · Score: 2

    The 200GB range drives in my main server have been trundling along for many years while I have a pile of 0.5-2TB hard drives I need to go through and get warrantied (three of them Caviar blacks). Not impressed with the big drives.