Slashdot Mirror


Ask Slashdot: Simple Way To Back Up 24TB of Data Onto USB HDDs?

An anonymous reader writes "Hi there! I'm looking for a simple solution to back up a big data set consisting of files between 3MB and 20GB, for a total of 24TB, onto multiple hard drives (USB, FireWire, whatever). I am aware of many backup tools which split the backup onto multiple DVDs with the infamous 'insert disc N and press continue', but I haven't come across one that can do it with external hard drives (insert next USB device...). OS not relevant, but Linux (console) or MacOS (GUI) preferred... Did I miss something, or is there no such thing already done, and am I doomed to code it myself?"
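
One way to approach the question itself is to plan which file goes on which drive before copying anything. The sketch below is a minimal first-fit planner, not an existing backup tool: it assumes a list of `size<TAB>path` lines on standard input and a single drive capacity in bytes, and the function name `plan_drives` is made up for illustration.

```bash
#!/usr/bin/env bash
# plan_drives CAPACITY_BYTES
# Reads "size<TAB>path" lines on stdin and prints "drive<TAB>path",
# placing each file on the first drive that still has room (first-fit).
plan_drives() {
  local capacity=$1
  local -a free=()            # free[i] = bytes left on drive i+1
  local size path i placed
  while IFS=$'\t' read -r size path; do
    placed=0
    for i in "${!free[@]}"; do
      if (( size <= free[i] )); then
        free[i]=$(( free[i] - size ))
        printf '%d\t%s\n' "$(( i + 1 ))" "$path"
        placed=1
        break
      fi
    done
    if (( placed == 0 )); then
      if (( size > capacity )); then
        # A file larger than one drive cannot be placed without splitting it.
        printf 'too big for one drive: %s\n' "$path" >&2
        continue
      fi
      free+=( $(( capacity - size )) )          # start a new drive
      printf '%d\t%s\n' "${#free[@]}" "$path"
    fi
  done
}
```

With GNU find, `find /data -type f -printf '%s\t%p\n' | plan_drives 2000000000000` would produce such a plan for nominal 2 TB drives (the path is a placeholder). Sorting the list largest-first beforehand usually packs tighter, and a formatted drive holds less than its nominal size, so leaving headroom is wise.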

405 comments

  1. USB and disk Speed by gagol · · Score: 4, Insightful

    May be your limiting factor here.

    --
    Tomorrow is another day...
    1. Re:USB and disk Speed by gagol · · Score: 4, Informative

      If you can achieve a sustained write speed of 50 megabytes per second, you are in for 140 hours of data transfer. I hope it is not a daily backup!
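
The 140-hour figure can be checked with a few lines of shell. This sketch assumes binary units (1 TB = 1024 × 1024 MB), which is what puts 24 TB at 50 MB/s close to 140 hours; with decimal units the result is nearer 133 hours. `transfer_hours` is an illustrative name, and the optional third argument models several disks being written at once.

```bash
# transfer_hours SIZE_TB RATE_MB_PER_S [PARALLEL_DISKS]
# Prints the transfer time in hours, assuming the rate is sustained
# and that parallel disks scale throughput linearly.
transfer_hours() {
  local tb=$1 rate=$2 disks=${3:-1}
  LC_ALL=C awk -v tb="$tb" -v rate="$rate" -v n="$disks" \
    'BEGIN { printf "%.1f\n", tb * 1024 * 1024 / (rate * n) / 3600 }'
}
```

For example, `transfer_hours 24 50` covers the single-disk case, and `transfer_hours 24 100 4` the case of four disks at 100 MB/s each.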

      --
      Tomorrow is another day...
    2. Re:USB and disk Speed by drsmithy · · Score: 3, Insightful

      If you can achieve a sustained write speed of 50 megabytes per second, you are in for 140 hours of data transfer. I hope it is not a daily backup!

      I'd be willing to bet his change rate isn't 24TB/day.

    3. Re:USB and disk Speed by jamesh · · Score: 5, Funny

      If the OP's porn collection can be logically broken up at some level, eg:

      /porn/blonde
      /porn/brunette
      /porn/redhead

      then the backup software could create one job for each directory, and multiple USB disks could be attached at once giving increased throughput. USB3 also increases speed to the point where the 7200RPM disk itself will become the bottleneck.

      So at 100MB/second per disk write speed with 4 disks going at once (assuming the source disks are capable of supplying this volume of data and there are no other throughput limitations), you could do it in 16 hours, or 24 hours with more realistic margins.

      If it turns out that the source data is not porn (unlikely) and is highly compressible, then it could be done in far less time.

      Bacula can do all of this.
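
The one-job-per-directory idea can be sketched without Bacula. The function below is a stand-in, not Bacula's job mechanism: it assumes GNU `cp -a`, takes hypothetical `name=mountpoint` pairs, and starts one background copy per target disk so the disks are written concurrently.

```bash
# parallel_copy SRC_ROOT NAME=DEST_MOUNT [NAME=DEST_MOUNT ...]
# Copies SRC_ROOT/NAME into DEST_MOUNT/ for every pair, all at once.
parallel_copy() {
  local src_root=$1; shift
  local pair name dest pid failed=0
  local -a pids=()
  for pair in "$@"; do
    name=${pair%%=*}                      # text before the first "="
    dest=${pair#*=}                       # text after the first "="
    cp -a "$src_root/$name" "$dest/" &    # one background job per target disk
    pids+=( "$!" )
  done
  for pid in "${pids[@]}"; do
    wait "$pid" || failed=1               # remember if any copy failed
  done
  return "$failed"
}
```

A call might look like `parallel_copy /data videos=/mnt/usb1 photos=/mnt/usb2` (all paths are placeholders). Total throughput is still capped by how fast the source can be read.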

    4. Re:USB and disk Speed by Anonymous Coward · · Score: 5, Interesting

      Agreed. Best thing I ever did was get a computer case with a SATA sled bay, like one of these. It won't help with breaking up the files, but a plain SATA connection will be many times faster and many times cheaper than getting external USB drives (because you don't have to keep paying for external case + power supply). After you copy it over, you just store the bare drives in a nice safe place.

      This assumes it's a one-time or rare thing. If you do want access or the backup process is a regular thing, then an NAS or RAID setup is probably more convenient so that you don't have to keep swapping drives in and out.

    5. Re:USB and disk Speed by Anonymous Coward · · Score: 2, Funny

      Or, he could watch the content as it is copied. At 600 Mbytes/hour (assuming standard mpeg compression), it would be a month of 24/7 nonstop action!

      "- Hey boss, I need to, uhh, work from home for the next four weeks to handle the backup..."

    6. Re:USB and disk Speed by Pieroxy · · Score: 4, Funny

      then the backup software could create one job for each directory,

      Is that what we call a blow job?

    7. Re:USB and disk Speed by shokk · · Score: 2

      If he's looking for reliability in a backup, then his choice of disks is going to be a factor. A drive with consumer grade chances of URE is going to die in a handful of writes and reads. USB grade drives (Caviar Green anyone?) aren't known for their reliability. Something like a Hitachi Ultrastar RE has a very very low chance of encountering a URE, so will be much more reliable.

      --
      "Beware of he who would deny you access to information, for in his heart, he dreams himself your master."
    8. Re:USB and disk Speed by Anonymous Coward · · Score: 1

      Is that what we call a blow job?

      We're talking about multiple "directories" (know-what-I-mean? notch-notch. say-no-more), so I think we would call it blow bang, DP, gang bang or orgy.

      Now, imagine a beowulf cluster fuck of those... :-)

    9. Re:USB and disk Speed by ilikejam · · Score: 4, Funny

      No. No it is not.

      --
      C-x C-s C-x k
    10. Re:USB and disk Speed by Anonymous Coward · · Score: 2, Informative

      It's "nudge-nudge", not "notch-notch".

      Also, you left out "wink-wink".

      Yes, I know, I should get a life..

    11. Re:USB and disk Speed by Anonymous Coward · · Score: 3, Funny

      Bacula can do all of this

      So he quantum leaps into you, and isn't allowed to leave until he performs the backup? Oh wait! Bacula, not Bakula.

    12. Re:USB and disk Speed by v1 · · Score: 4, Informative

      I have a setup here where the server's video media is about 8TB in size. That backs up via rsync to the backup server, which is in another room. The backup server contains a large number of internal and external drives. None of them are over 2TB in capacity. The main drive has data separated into subfolders and the rsync jobs back up specific folders to specific drives.

      A few times I've had to do some rearranging of data on the main and backup drives when a volume filled up. So it helps to plan ahead to save time down the road. But it works well for me here.

      The only thing with rsync you need to worry about is users moving large trees or renaming root folders in large trees. This tends to cause rsync to want to delete a few TB of data and then turn around and copy it all over again on the backup drive. It doesn't follow files and folders by inode, it just goes by exact location and name.
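
Since a rename on the source shows up to rsync as a delete plus a fresh copy, it can help to preview deletions before a real run. This is a minimal sketch, assuming rsync's itemized output marks removals with a leading `*deleting`; the helper name `count_deletions` and the paths in the usage note are hypothetical.

```bash
# count_deletions
# Reads `rsync --dry-run --itemize-changes` output on stdin and prints
# how many entries rsync would delete on the destination.
count_deletions() {
  grep -c '^\*deleting' || true   # grep -c prints 0 but exits non-zero on no match
}
```

Usage would be along the lines of `rsync -a --delete --dry-run --itemize-changes /media/ /mnt/backup1/ | count_deletions`; an unexpectedly large number is a hint that someone moved or renamed a big tree. rsync's `--max-delete=NUM` option can also cap deletions on the real run.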

      I help mitigate this by hiding the root folders from the users. The share points are a couple levels deeper so they can't cause TOO big of a problem if someone decides to "tidy up". If they REALLY need something at a lower level moved or renamed, I do it myself, on both the source and the backup drives at the same time.

      Another alternative is to get something like a Drobo where you can have a fairly inexpensive large pool of backup storage space that can match your primary storage. This prevents the problem of smaller backup volumes filling up and requiring data shuffling, but does nothing for the issue of users mucking with the lower levels of the tree.

      --
      I work for the Department of Redundancy Department.
    13. Re:USB and disk Speed by deniable · · Score: 4, Funny

      Send error messages to a Blackberry and it's a RIM job.

    14. Re:USB and disk Speed by deniable · · Score: 5, Funny

      Bacula went on to be Enterprise grade software.

    15. Re:USB and disk Speed by postbigbang · · Score: 1

      SATA is good, and a set of SATA cages might be the cheapest solution.

      More expensive is getting SAS drives (if the interface is there) and it'll be quicker.

      More expensive, but not TOO expensive (these days) is a set of SATA cages and some SSDs, which will do the job far faster still (than conventional SATA drives).

      USB3, if supported, with SSDs, would do the job nicely, too.

      The software? Good old tar can be used, self-managed. Tar never seems to go out of style, and using it in conjunction with drive-spanning is a well-known exercise.

      --
      ---- Teach Peace. It's Cheaper Than War.
    16. Re:USB and disk Speed by CastrTroy · · Score: 1

      You can even get add-on cages for your existing case like this one. Big advantage is that you can fit 3 drives in two 5.25" slots, which very rarely have much use anymore, yet most cases seem to still have 3 or 4 of them. You can get other models that will fit 5 drives in 3 bays, or have 1 in 1 depending on your needs.

      --

      Anthropic principle: We see the universe the way it is because if it were different we would not be here to see it.
    17. Re:USB and disk Speed by MMC+Monster · · Score: 3, Funny

      Maybe he's personally backing up CERN?

      --
      Help! I'm a slashdot refugee.
    18. Re:USB and disk Speed by milgr · · Score: 2, Informative

      The LHC generates a petabyte per second.

      --
      Where law ends, tyranny begins -- William Pitt
    19. Re:USB and disk Speed by Anonymous Coward · · Score: 0

      many times cheaper than getting external USB drives (because you don't have to keep paying for external case + power supply).

      In theory, you'd be right.

      Except that at current prices, USB3 HDDs are cheaper than bare-metal drives of same capacity.

    20. Re:USB and disk Speed by jedidiah · · Score: 2

      Actually, they are roughly the same price.

      Although SATA is more widespread and avoids any reduction in performance you might get from putting an intermediate layer in front of the native interface of the drive. A large drive is going to require a wall wart and all of those will need to be looked after.

      The problem with case+power supply is not the cost but the fact that it is something else to lose. This goes for the extra cabling too.

      Plus with a bare drive you can buy with performance in mind since the drive will likely be your bottleneck.

      --
      A Pirate and a Puritan look the same on a balance sheet.
    21. Re:USB and disk Speed by Laebshade · · Score: 1

      I had to re-read this several times before getting it. Bravo.

    22. Re:USB and disk Speed by GameboyRMH · · Score: 1

      And better yet you can get them as 5 1/4 bay accessories so you don't have to change your case or end up with obsolete tech permanently stuck in your case 5 years from now! (Since these things have maximum capacities they can support).

      --
      "When information is power, privacy is freedom" - Jah-Wren Ryel
    23. Re:USB and disk Speed by noh8rz6 · · Score: 0

      I read once, and if I don't get it I move on. 99% of the time the person is being illogical. No point in wasting my time on crazies.

      --
      Don't be a h8r.
    24. Re:USB and disk Speed by jamstar7 · · Score: 1

      Total failure on many levels? I see your point...

      --
      Understanding the scope of the problem is the first step on the path to true panic.
    25. Re:USB and disk Speed by Anonymous Coward · · Score: 0

      *RIM shot*

    26. Re:USB and disk Speed by sglewis100 · · Score: 1

      More expensive, but not TOO expensive (these days) is a set of SATA cages and some SSDs, which will do the job far faster still (than conventional SATA drives).

      24TB worth of SSDs? That's a lot of cages, and yes, it would be TOO expensive.

    27. Re:USB and disk Speed by jemenake · · Score: 1

      You can even get add-on cages for your existing case...

      I was going to suggest the same kinda thing. Easier than sliding them in and out of the main case and futzing with power and SATA cables. In fact, the ones you suggested are the very cases I use in my 2U rack servers that I build. I like 'em for what I use them for, but I'm going to make a different recommendation for the OP. The problem is that that rack requires that the drives be in special sleds/trays. If the OP is going to be swapping drives, that means swapping sleds.

      The new love of my SATA rack world are these StarTech trayless babies. The doors open with a little pull-latch... like most car-doors, and there's no tray. Just slide the drive in and close the door.

    28. Re:USB and disk Speed by voltorb · · Score: 3, Informative
    29. Re:USB and disk Speed by Anonymous Coward · · Score: 0
    30. Re:USB and disk Speed by Anonymous Coward · · Score: 0

      Well given the fact he can't make USB faster, I'd have to say his best option would be a program which creates the backups, then performs the write operations and allows multiple write operations to occur at once.

      If one write at a time takes 140, that means having two going at once would be 70, four would be 35, etc.

      It still seems doing it this way (backups to USB drives) is kind of absurd, but whatever.

      Ok, so what he'd need is temporary storage to cache the backups, a box of USB drives, and a computer with a lot of USB controllers on it (since a USB hub is just sharing bandwidth). Oh, and obviously someone who wants to spend their entire day swapping out USB drives and keeping them organized.

    31. Re:USB and disk Speed by postbigbang · · Score: 1

      Not as much as even a year ago.... not quite half as expensive. SSDs are plentifully fast.

      SAS is faster than SATA is faster than USB, generally speaking. The SATA II SSDs are becoming almost commodity. The cage is the same cost, as it's form factor and power supply.

      --
      ---- Teach Peace. It's Cheaper Than War.
    32. Re:USB and disk Speed by Anonymous Coward · · Score: 0

      you're doing it wrong

    33. Re:USB and disk Speed by Anonymous Coward · · Score: 0

      Use the unison file synchronizer, it will detect moves and renames (and it is extremely fast).

    34. Re:USB and disk Speed by Pieroxy · · Score: 1

      you're doing it wrong

      Maybe I'm holding it wrong?

    35. Re:USB and disk Speed by robi2106 · · Score: 1

      I have two SATA over USB docks for bare SATA harddrives. I have ~7TB of unique data I keep (video editing business generates a lot of data).

      I copy data using BeyondCompare from the edit RAID array to each twin pair of SATA drives. That gives me some redundancy for the data, but is a PITA to track.

      I also would love to have a software solution that lets me just number the drives, tell the system to keep two copies of everything, and then would just instruct me which drive # to pop in the dock in order to back up file X.
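
A catalog like the one wished for here can start as a plain text file. The sketch assumes a hypothetical tab-separated catalog with one `drive-number<TAB>path` line per stored copy; `which_drives` is an illustrative helper, not an existing tool.

```bash
# which_drives CATALOG_FILE PATH
# Prints the number of every drive that holds a copy of PATH, one per line.
which_drives() {
  awk -F'\t' -v want="$2" '$2 == want { print $1 }' "$1"
}
```

With two copies of everything, `which_drives catalog.tsv projects/wedding/raw.mov` would print two drive numbers (file names are placeholders), and either drive can go in the dock.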

    36. Re:USB and disk Speed by Anonymous Coward · · Score: 0

      One source and two destinations is what we call a three-way

    37. Re:USB and disk Speed by oldmac31310 · · Score: 1

      A nod's as good as a wink to a blind bat.

      --
      http://www.acetonestudio.com
    38. Re:USB and disk Speed by Immerman · · Score: 1

      I would be hesitant to use SSDs for long-term backup though. Hard drives, like tape, are a time-tested technology, in fact they're basically the same technology in a different form factor, though hard drives add the additional point of failure in their mechanical and electrical components.

      SSDs, on the other hand, are still a relatively new technology, especially at the feature sizes being used in modern drives. You don't have the mechanical failures to worry about - but your storage medium is essentially billions of tiny capacitors, and the one thing you can absolutely guarantee about a capacitor is that it won't hold its charge indefinitely. And as drive capacities increase, feature size decreases, and with it your charge-per-bit ratio, which can only bode ill for long-term storage.

      The more I look at SSDs, the more convinced I am that while SSDs may take over many niches (mobile storage especially) magnetic storage isn't going away any time soon. On the other hand as the controller technology develops the potential of using SSDs as cache for slower storage is becoming quite dramatic, and once it reaches a mature level that may open the way for other reliable high-capacity storage technologies to start entering the field "behind the cache", where poor performance of early generations wouldn't be as big a drawback

      --
      --- Most topics have many sides worth arguing, allow me to take one opposite you.
    39. Re:USB and disk Speed by thereitis · · Score: 2

      If you're just using RAID to make a bunch of disks look like a single logical unit, consider mhddfs. It's a FUSE filesystem which makes a bunch of disks look like a single unit. I've used it for storing backups - it works as advertised.

      IIRC there were one or two caveats like a lack of hard link support so make sure you try all your use cases before relying on it.

    40. Re:USB and disk Speed by jcoy42 · · Score: 1

      Is that what we call a blow job?

      Yes, that would probably be an effective way to blow your job.

      --
      Never trust an atom. They make up everything.
    41. Re:USB and disk Speed by hairyfeet · · Score: 4, Funny

      USB would be just the most retarded way to go for something like this, it's too slow and he's gonna be swapping worse than when we used to have to back up things to CDs.

      I'm guessing he's going USB because he don't have the cash to buy a NAS of that size but you can always jury rig you a NAS, it's really not hard. We did something similar at the last shop I worked at when the boss scored a ton of SCSI drives at an auction and ended up with nearly a TB NAS when the average HDD was 40GB. Here is how you do it..

      You take a couple of full size towers, bigger the better, preferably twinkies as it makes the job a LOT easier. You strip 'em to the frames and use a couple of spot welds to make them into one giant case along with another couple of weld to mount a shitload of drive cages into the case. Then you take a cheap server or even desktop board, all that matters is it has a shitload of PCI slots which you fill with controller cards, SCSI in our case but SATA today, mount the board along with a big PSU to feed the drives and voila! One big ass DIY NAS unit that can hold a huge pile of drives. Just to finish our white trash conversion we tied on a Walmart box fan to keep the sucker cool and stuck it in a corner, worked great.

      The only software that I think would work with USB is Paragon Drive Backup as you can have it split by just about any size you want. They also have their own Linux based recovery media but damned if I know if you can get the software as a Linux installer, never ran into that situation to need it in that way. I know it's worked great for me making OS images and backing up files and folders onto USB drives but if you're gonna be splitting to a ton of little drives then you are just gonna have to swap, no way out of that. If you want to fill the drives up then set Paragon to a small size, say 700MB, but good fucking luck checking your backup as the amount of swapping you're gonna do is just insane.

      --
      ACs don't waste your time replying, your posts are never seen by me.
    42. Re:USB and disk Speed by Anonymous Coward · · Score: 0

      Check out www.hudzee.com for the best plastic case for a bare internal hard drive.

    43. Re:USB and disk Speed by Anonymous Coward · · Score: 0

      ... and because of that you can't use it for neither blow jobs, nor RIM jobs?

    44. Re:USB and disk Speed by Anonymous Coward · · Score: 0

      Actually, you make a good point. It used to be a greater difference, although you have to be careful to compare drive mechanisms. They usually put pretty low-end drives in the enclosures, and in my experience the failure rates are disappointingly high for those. Ever since the prices spiked after the flooding in SE Asia there have been some weird anomalies in the pricing of hard drives. I don't quite understand it.

    45. Re:USB and disk Speed by deniable · · Score: 1

      Don't worry T'Pol, it's a human thing.

    46. Re:USB and disk Speed by screwdriver · · Score: 1

      Unless you're using USB 3.0, which I use for backups. It's just as fast as if the HD were internal.

    47. Re:USB and disk Speed by Anonymous Coward · · Score: 0

      its too slow and he's gonna be swapping worse

      I have 8 multi-TB drives plugged into a 10 port USB2 hub on a laptop. Works great. Cheap and convenient. It's not going to win any speed records but it's not exceptionally slow either and USB3 would be faster.

    48. Re:USB and disk Speed by Anonymous Coward · · Score: 0

      Honestly, man, get a SAN or go nuts with off site and pay for it. Backing up to external drives will take time, money, and heartache if you go USB or Firewire. Heck, even eBay has decent deals if you're not picky. I also like Geeks.com but that is for rack boxen such as my HP DL360 G5 :)

    49. Re:USB and disk Speed by HArchH · · Score: 1

      24x7?

    50. Re:USB and disk Speed by atamido · · Score: 1

      If he's looking for reliability in a backup, then his choice of disks is going to be a factor. A drive with consumer grade chances of URE is going to die in a handful of writes and reads. USB grade drives (Caviar Green anyone?) aren't known for their reliability. Something like a Hitachi Ultrastar RE has a very very low chance of encountering a URE, so will be much more reliable.

      Citation required. Information I've read from the manufacturers themselves indicates a similar error rate and MTBF between various drive lines. The difference between consumer and enterprise grade drives tends to be in the firmware where the enterprise drive will give up faster when encountering a read error so as to not risk dropping the drive out of the RAID array.

    51. Re:USB and disk Speed by Anonymous Coward · · Score: 0

      Why not get one that's a little more suited to the job? http://www.newegg.com/Product/Product.aspx?Item=N82E16817576012

  2. solution by Anonymous Coward · · Score: 1

    1. take all hard drives out of USB enclosures
    2. install in PC with multiple SATA cards
    3. samba

    1. Re:solution by aglider · · Score: 4, Informative

      3.samba

      Uh? Why?
      cp -a is all you need once you put the HDD inside the target machine.
      And if you put it into another machine on the same network, then rsync is the answer.
      Forget about the buggy and slow SAMBA.

      --
      Sent as ripples into the electromagnetic field. No single photon has been harmed in the process.
    2. Re:solution by Pieroxy · · Score: 1

      Agreed. Samba should be at the very bottom of the list. It is the best solution only when there's no other solution.

    3. Re:solution by myowntrueself · · Score: 1, Informative

      3.samba

      Uh? Why?
      cp -a is all you need once you put the HDD inside the target machine.
      And if you put it into another machine on the same network, then rsync is the answer.
      Forget about the buggy and slow SAMBA.

      cp copies file by file.

      A more efficient way is something like

      tar -cf - . | (cd /somewhere/ && tar -xf -)

      tar treats the directory contents as a data stream. It's much faster for large amounts of files and data.

      --
      In the free world the media isn't government run; the government is media run.
    4. Re:solution by AvitarX · · Score: 4, Insightful

      Is it that much faster for 3MB to 20GB files?

      --
      Wow, sent an e-mail as suggested when clicking on "use classic" banner, and got a fast response that addressed my msg
    5. Re:solution by fnj · · Score: 5, Informative

      No. It's slower. Informative, my ass.

    6. Re:solution by Anonymous Coward · · Score: 1

      This simple tar pipe has saved my ass too many times to count!

      By the way, simplified version:
      tar -cC [sourceDir] . | tar -xC [destinationDir]

    7. Re:solution by Anonymous Coward · · Score: 0

      Build a Backblaze-style server. They cram 135TB in 4U.

    8. Re:solution by bastafidli · · Score: 2

      cp doesn't preserve the exact timestamp. If you want to do rsync later, it will copy all files all over again. Just do

      rsync --dry-run --archive --stats --progress --whole-file --exclude "/lost+found" --delete-after /source/ /destination

      which is reproducible and later on will copy only the newer files.

    9. Re:solution by GameboyRMH · · Score: 1

      cp has options to preserve timestamps and everything, RTFM.
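
Whether a given copy kept its timestamps is easy to check, which matters because rsync's default quick check skips files whose size and modification time already match. This is a small sketch, assuming GNU `stat` (the `-c %Y` format is not portable to BSD or macOS); `same_mtime` is an illustrative name.

```bash
# same_mtime FILE_A FILE_B
# Succeeds if both files have the same modification time (whole seconds).
same_mtime() {
  [ "$(stat -c %Y "$1")" = "$(stat -c %Y "$2")" ]
}
```

After `cp -p original copy`, `same_mtime original copy` should succeed; after a plain `cp`, the copy normally carries the time of the copy operation instead.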

      --
      "When information is power, privacy is freedom" - Jah-Wren Ryel
    10. Re:solution by Cormacus · · Score: 1

      Well, it might not be faster for the example in question, but that's a neat idea if someone reading this has a slightly different need in mind - ie backing up many small files.

      --
      Mon chien, il n'a pas du nez. Comment scent-il? Très mauvais!
    11. Re:solution by bastafidli · · Score: 1

      Didn't know that, will have to take a look at it, thanks.

    12. Re:solution by Culture20 · · Score: 1

      But it preserves more filesystem data than cp. And running a tar stream like this over ssh is easy. Of course I'd use rsync if I wanted to do it more than once.

    13. Re:solution by fnj · · Score: 1

      Not any more than cp -rp

      Examine the command. He's packing the files into a tar format stream, then unpacking it back into files on the other end. He's doing the same number of source opens and destination creates, but he's adding a step.

    14. Re:solution by fnj · · Score: 1

      Why?

      Examine the command. He's packing the files into a tar format stream, then unpacking it back into files on the other end. He's doing the same number of source opens and destination creates; moving the same number of bytes; but he's adding a step.

    15. Re:solution by Cormacus · · Score: 1

      Ah, I had not analyzed the command. I assumed that since it was using tar, one archive file was being created, and that your derision of that example was therefore about the ratio of file overhead to file size.

      --
      Mon chien, il n'a pas du nez. Comment scent-il? Très mauvais!
    16. Re:solution by berashith · · Score: 1

      this one looks like GNU tar, where the original deals with the crappiness that can be the native tar in solaris. I tend to use the original posters version out of habit as it moves to the dir and unpacks here (.) . Also, it can be fun to ssh stuff to different systems this way.

    17. Re:solution by Richy_T · · Score: 2

      Yes. The above tar command is really from a time when cp did not have r and p options (and still likely doesn't on some systems so it's worth knowing). OTOH, you can add in the z option (compress) if you're doing something networky (though you'll probably want to throw in netcat or ssh too in that case). Of course, if you're doing that, rsync is probably the better option if available and leads to some interesting backup options going forward.

    18. Re:solution by flargleblarg · · Score: 1

      Don't use cp.
      cp doesn't preserve hard links and symlinks. When cp copies a symlink, it follows the link and copies that file, rather than copying the symlink.

      cp is a very bad idea for archiving.

      rsync or tar are what you want.

    19. Re:solution by Score+Whore · · Score: 1

      tar treats the directory contents as a data stream. It's much faster for large amounts of files and data.

      WTF? Tar creates a stream, but the source data is still files and still have to be read as individual files and are still scattered all over your platters like the individual files that they are. Tar cannot magically transform random reads into sequential reads any more than I can have a four way with Jessica Biel, Scarlett Johansen and Rhianna.

      However if you recognize the fact that you have two different sets of media and they will have their own read/write characteristics and there is no need for them to be in any kind of lock step you could track down mbuffer and do something like this which should shave some time off the copy:

      tar -cf - * | mbuffer -s 256k -m 512m -o - | (cd /destination && tar -xpf -)

      Plus it gives you a nice little status line so you can see how fast it's going and how much has been transferred.

    20. Re:solution by Anonymous Coward · · Score: 0

      Ever wondered what cp -P does? RTFM
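
The disagreement about symlinks comes down to which flags are used. The sketch below, assuming a `cp` that supports `-a` and `-L` (GNU coreutils does), copies a tiny tree both ways and reports what the link became; the function names are illustrative.

```bash
# link_kind PATH: report whether PATH is a symlink or a regular file.
link_kind() {
  if [ -L "$1" ]; then echo symlink
  elif [ -f "$1" ]; then echo file
  else echo other
  fi
}

# symlink_demo: copy a small tree with and without dereferencing links.
symlink_demo() {
  local tmp
  tmp=$(mktemp -d) || return 1
  mkdir "$tmp/src"
  echo data > "$tmp/src/target"
  ln -s target "$tmp/src/link"
  cp -a  "$tmp/src" "$tmp/archive"   # archive mode: links stay links
  cp -rL "$tmp/src" "$tmp/deref"     # -L: follow links, copy what they point to
  echo "archive: $(link_kind "$tmp/archive/link")"
  echo "deref: $(link_kind "$tmp/deref/link")"
  rm -rf "$tmp"
}
```

So the symlink is only replaced by a copy of its target when cp is told, or defaults, to dereference; archive mode keeps it intact.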

    21. Re:solution by Tanktalus · · Score: 1

      But he's also setting up reads and writes in different threads (processes, actually). Opening a new file and starting to read it will happen while the other thread is still finishing its writes and closing (syncing) the previous file.

      cp probably opens input, opens output, loops { reads input, writes output }, closes input, closes output. All in serial. Probably. (Well, the -p flag adds a bit more reading and writing, which, again, the tar solution would buffer between the processes.)

      I'm not convinced that one method is necessarily much better than the other. That said, I use the tar solution largely because it does the job more portably: I can copy all the files, symlinks, permissions, etc., recursively, on more platforms, than cp (I'm pretty sure the -R and/or -p flags haven't existed everywhere that I've used it). Maybe that has changed, I've not looked for a while. Also, the tar command allows me to split up the work. Not just different processors, but different machines: tar cf - . | ssh bigassserver 'cd /bigassdrive; tar xvf -' (Or reverse it if I want to pull instead of push my data.) (Or both can be ssh to use my public keys to get access to two remote machines, though I think the performance will suck at this point.)
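
Rather than argue which is faster, both methods can be timed on a sample of the real data. This is a minimal sketch, assuming GNU `cp` and `tar` and a destination directory that already exists; the function names and the paths in the comments are illustrative.

```bash
# copy_with_cp SRC_DIR DEST_DIR: copy the contents of SRC_DIR with cp.
copy_with_cp() {
  cp -a "$1/." "$2/"
}

# copy_with_tar SRC_DIR DEST_DIR: same copy through a tar pipe,
# so reading and writing happen in two separate processes.
copy_with_tar() {
  tar -C "$1" -cf - . | tar -C "$2" -xpf -
}

# Example timing run in bash (placeholder paths):
#   time copy_with_cp  /data/sample /mnt/usb1/test_cp
#   time copy_with_tar /data/sample /mnt/usb1/test_tar
```

The page cache can make whichever run goes second look faster, so a fair comparison uses a sample larger than RAM or a fresh boot between runs.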

    22. Re:solution by amorsen · · Score: 1

      He's doing the same number of source opens and destination creates, but he's adding a step.

      An extremely cheap step, free in fact because the task is I/O limited. If cp is unable to read one file while writing the next (quite likely), it will lose badly.

      --
      Finally! A year of moderation! Ready for 2019?
    23. Re:solution by amorsen · · Score: 1

      With two tars going on different devices, either the reading tar or the writing tar will be slowest, and the other will not contribute anything much to the wall clock time. Compare to cp -r which requires threading (or built-in multiple processes, or async I/O) to achieve the same speed. I bet GNU cp isn't using any of those.

      I would expect the built-in pipe buffer in a modern OS to be large enough on its own, but perhaps mbuffer will help. On Linux you can adjust the pipe buffer size with fcntl, so you could implement the equivalent of mbuffer with just a small preloaded library.

      --
      Finally! A year of moderation! Ready for 2019?
    24. Re:solution by Score+Whore · · Score: 1

      What you say is true in certain situations, but if the two storage mediums have similar characteristics -- both SATA or maybe SATA & SAS or SATA & IEEE-1394 -- then mbuffer will help keep all your devices going all the time. It's not going to make an enormous difference, but for the minimal effort involved, why not take advantage of the 1% improvement it gives you over a day or two of copying?

      Additionally mbuffer gives you the possibility to have multiple writers, so you can make multiple copies without having to reread the source material.

    25. Re:solution by Demena · · Score: 1

      cp originally did one thing with a few options. Scrubbing it up to do the functions that cpio was written for could be regarded as system bloat.

    26. Re:solution by allo · · Score: 1

      But the buffer of a pipe is a few kB, afaik. So when the first process runs out of data, the second one does not have much buffer to work on. And if the second one is blocking, the first one has no buffer space to fill.
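
The pipe buffer size can be measured rather than guessed. This sketch fills a pipe whose reader is deliberately asleep and counts what the writer managed to queue; it relies on fixed sleeps, so treat it as a rough probe. On a typical modern Linux system the default is expected to be 64 KiB rather than a few kB, but that is an assumption about the platform, not something the code guarantees. `pipe_capacity` is an illustrative name.

```bash
# pipe_capacity
# Prints how many bytes a pipe accepted before the writer blocked.
pipe_capacity() {
  (
    dd if=/dev/zero bs=1k 2>/dev/null &   # writer: fills the pipe, then blocks
    sleep 1                               # give it time to fill the buffer
    kill "$!"                             # stop the blocked writer
  ) | (
    sleep 2                               # reader: wait until the writer is gone
    wc -c | tr -d ' '                     # count what was queued in the pipe
  )
}
```

If the number turns out to be the limiting factor, a larger buffer between the two tar processes (for example mbuffer, as suggested above) is the simpler fix.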

    27. Re:solution by tywjohn · · Score: 0

      Samba is not that bad. In fact in my NAS, I have to use samba instead of NFS because it doesn't work well with ZFS on FreeBSD. I know this is a little off topic but it also handles ACLs and user access much better than NFS.

  3. DaisyChain by Anonymous Coward · · Score: 1

    I believe you can daisy chain external drives together if you have the right cases.
    For ease though, I'd consider a DroBo http://www.drobo.com/products/professionals/drobo-5d/index.php

    1. Re:DaisyChain by Captain+Hook · · Score: 2

      It's not mentioned by the author, so I might be assuming too much, but if he's trying to write to USB drives as opposed to a RAID of some sort, I figured he wanted to be able to read the drives individually, perhaps on a different machine without a network connection between them.

      The drobo won't allow that, the file system is spread across all the drives.

      I guess it kind of depends on what the author needs to do with the drives when he's finished writing to them.

      --
      These comments are my personal opinions and do not necessarily reflect the opinions of the other voices in my head.
    2. Re:DaisyChain by GCsoftware · · Score: 1

      An 8 drive DroboPro with 3 TB disks might just about do it.

      Check out:
      http://www.drobo.com/products/professionals/drobo-pro/index.php

    3. Re:DaisyChain by GCsoftware · · Score: 2

      Actually 8x4 TB disks will do it, with the overhead etc, giving you 24.96 TB usable space.

    4. Re:DaisyChain by Painted · · Score: 4, Informative

      DON'T DO THIS.

      We did this exact thing using WD Green drives for our 18TB backup problem. Got two of 'em, planning on using their built-in rsync for onsite/off siting the data. Unfortunately, the units never broke 1MB/s transfer, and no amount of work with Drobo yielded faster performance reliably. Both of our units are now sitting unused ($2500 each!), and we put the drives into a RAID-50 8 bay USB3 enclosure. The new unit runs about 150x faster, and ended up costing $400 (prices are for enclosures only, drives were additional).

      Most disappointing was Drobo's support- they just seemed to shrug a lot, and were hyper-aggressive about closing trouble tickets.

      --
      http://marsandmore.com - Posters of space, spacecraft, and astronomy.
    5. Re:DaisyChain by robi2106 · · Score: 1

      your problem wasn't the Drobo, which can be connected with eSATA or GigEthernet. The problem is using crappy "eco friendly low power" green drives. If your concern is performance, then don't get the slowest most energy efficient drives you can find. Get yourself some 10k RPM drives with 64MB cache. Like these:
      http://www.newegg.com/Product/Product.aspx?Item=N82E16822236243

      And then use a RAID 5 setup in the Drobo.

    6. Re:DaisyChain by Ost99 · · Score: 1

      Seriously? You're blaming 1MB/s on the drives?

      There must be something seriously wrong with the Drobo for the performance to be that bad.
      I have a 6 drive array (raid6) with WD Green 3TB drives, and I get ~300MB/s read and write speed.

      Disabling the 8sec "green" head park timer improves response times, but it should not impact transfer speed in any significant way if left at its default.

      --
      ---- Sig. gone.
    7. Re:DaisyChain by grim4593 · · Score: 1

      Even an "eco friendly low power" drive is between 1-2 orders of magnitude faster than 1MB/s.

    8. Re:DaisyChain by Anonymous Coward · · Score: 0

      I've personally used these WD green drives and I can assure you that the performance is comparable with regular drives. Yes, you won't be setting any benchmark records. But, you can definitely write faster than 1 MB/s. Even when striping across 4-8 of them.

      The only reason I stopped using these drives was because conventional (non-green) drives caught up with them in terms of power consumption; so I just went with the cheapest.

      I'll also say that the only thing you get by using 10k RPM drives is the sound of a jet-engine in your data center. For random-access you might see a performance improvement in the benchmarks, but it isn't going to be more than 5% and you would be much better served by a high-quality SSD (which will cost roughly the same amount).

      The note about the cache is good; a large cache will do more to improve throughput than just about anything else.

    9. Re:DaisyChain by robi2106 · · Score: 1

      and yes I know RAID 5 has a big performance hit. It is a reasonable tradeoff between redundancy & speed.

    10. Re:DaisyChain by flargleblarg · · Score: 1

      I get ~80 MB/sec read/write on my WD Green drives over FireWire 800.

    11. Re:DaisyChain by StillAnonymous · · Score: 1

      I'm tired of hearing this "crappy slow" line about the green drives. Sure, they're not as fast as the regular 7200 performance line, but 1MB/sec?? Come on, the problem is obviously NOT the drives.

      I have a number of these drives, and individually they can read between 60-80MB/sec when copying video files off them. In Linux software RAID with 4 drives, I've seen writing of 140MB/sec on a crappy Silicon Image controller.

      I've seen frequent complaints about the Drobo units being slow (in the 20-40MB/sec range), probably due to a slow CPU in the thing, but 1MB/sec there's something really wrong there.

    12. Re:DaisyChain by Circuit+Breaker · · Score: 1

      Can you say which model your new model is, and what it is used for?

      I've had good results with a Synology setup.

    13. Re:DaisyChain by Nogami_Saeko · · Score: 1

      Curious what that problem could have been - Drobos aren't fast, but at 1MB/s, something is seriously wrong.

      Mine has 8TB of storage currently and the transfer rate is about 30MB/s. Not fast, but quick enough for media storage and such. That said, I'd probably buy Synology hardware in the future as it is quite a bit faster, less expensive and more versatile.

      N.

      --
      "Nothing strengthens authority so much as silence." - Charles de Gaulle
    14. Re:DaisyChain by pegdhcp · · Score: 1

      .................... Most disappointing was Drobo's support- they just seemed to shrug a lot, and were hyper-aggressive about closing trouble tickets.

      You know, this is what I hate most about some tech support organisations. Some idiot in the support team management with an engineering degree and an MBA on top of that, with no field experience, sets some aggressive KPIs for support jockeys. They try to meet their quota. For the first six to nine months he has a very happy upper management. Then customer complaints reach upper management via sales, in the form of customers threatening to move their business away. Upper management fires the support manager. The KPI stays, as they cannot believe a customer would prefer slower service if it means quality (remember, those guys were the ones who assigned an MBA holder to support management in the first place). The company eventually fails.... Management by objectives my ass.....

    15. Re:DaisyChain by Cili · · Score: 1

      That's because raid6 actually means something. A 6-drive raid6 system should have about 4x throughput.

    16. Re:DaisyChain by Anonymous Coward · · Score: 0

      Your problem is most likely those Green drives. If you want performance, then buy it. I bought an 8 drive Drobo and filled it with WD Black drives. A single LUN sustained two simultaneous transfers at 35MB/s each.

      Green = slower, power efficient, small cache.
      Blue = mid range, normal power consumption, decent cache
      Black = high performance, fast drives, large cache, longest MTBF, and 5 year warranty from WD

    17. Re:DaisyChain by flargleblarg · · Score: 1

      No, it's because the drives are natively capable of that speed over FireWire. Not using RAID of any sort.

  4. JBOD or more accurately, spanned volume by Anonymous Coward · · Score: 0

    http://en.wikipedia.org/wiki/Spanned_volume
    http://macs.about.com/od/usingyourmac/ss/raidjbod.htm

    JBOD allows you to create a large virtual disk drive by concatenating two or more smaller drives together. The individual hard drives that make up a JBOD RAID can be of different sizes and manufacturers. The total size of the JBOD RAID is the combined total of all the individual drives in the set.
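
    To make the concatenation concrete, here is a small sketch of how a byte offset in the virtual disk maps onto one member drive. The drive sizes and the function name locate are invented for illustration; real volume managers also reserve space for metadata, which is ignored here:

```python
TB = 10**12


def locate(offset, drive_sizes):
    """Map a byte offset in the spanned volume to (drive index, offset on that drive)."""
    if offset < 0:
        raise ValueError("offset must not be negative")
    start = 0
    for index, size in enumerate(drive_sizes):
        if offset < start + size:
            return index, offset - start
        start += size
    raise ValueError("offset is past the end of the spanned volume")


# Three drives of different sizes concatenated into one 6 TB volume.
sizes = [2 * TB, 3 * TB, 1 * TB]
total = sum(sizes)
print(total // TB, "TB total;", "offset 2 TB lands on", locate(2 * TB, sizes))
```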

    1. Re:JBOD or more accurately, spanned volume by sumdumass · · Score: 2

      how transportable is that though?

      I mean, if i copied 200 gig across 3 drives in a jbod raid, could i plug just one drive in to access the information on another machine? Suppose my laptop only has 2 usb ports and i do not have a hub plus i'm running a different OS, does this mean i can't look for information on the set?

      I have never used JBOD for raid. I have, however, used regular mirrored and striped raids with and without fault tolerance (raid 5 and 10 or a mirrored stripe for instance) and know this can be a problem. In fact, I've even seen issues reading a complete raid set across systems when you aren't using a true hardware raid controller.

    2. Re:JBOD or more accurately, spanned volume by hawkinspeter · · Score: 2

      Seems like a very bad idea to me. You'll have trouble creating a JBOD device without connecting all the drives simultaneously. Also, you're basically increasing the chance that the entire JBOD volume will be broken as the number of drives goes up. If you've got one drive failing, you'll be lucky to get any data back at all.
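
      The "more drives, more risk" point is plain probability. A rough sketch, assuming each drive fails independently with the same probability over the period you care about (the 3% figure is only an illustrative assumption, not a measured rate):

```python
def p_any_failure(per_drive_probability, drive_count):
    """Chance that at least one of drive_count independent drives fails."""
    return 1 - (1 - per_drive_probability) ** drive_count


for drives in (1, 4, 8, 12):
    print(drives, "drives:", round(p_any_failure(0.03, drives), 3))
```

      With no redundancy, a spanned volume is in trouble as soon as any single member fails, so this is the number that matters.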

      To my mind, Bacula would be a good choice as you can set up virtual tapes that will correspond to the drives and you can set the backup to wait for the operator to swap over the drive and then continue the backup. Also, once you've got Bacula installed and working, it's easy to do incremental backups and thus not need to write out the whole dataset again.

      --
      You're a temporary arrangement of matter sliding towards oblivion in a cold, uncaring universe
    3. Re:JBOD or more accurately, spanned volume by leptons · · Score: 1

      "I mean, if i copied 200 gig across 3 drives in a jbod raid, could i plug just one drive in to access the information on another machine? Suppose my laptop only has 2 usb ports and i do not have a hub plus i'm running a different OS, does this mean i can't look for information on the set?"

      This falls outside of what OP is requesting. He just wants to backup 24TB of data onto multiple USB drives.

      USB can support up to 127 devices connected to a single host controller, so with a few hubs OP could connect all the drives he'd need for the back up all at once. I've run my own drives via external USB for a time, probably around 8TB of various sized drives using cheap USB-to-SATA adapters ($3.00 on ebay), and cheap 7-port USB hubs ($5.00 on ebay). It's not the fastest solution but it never gave me a problem. It was an experiment to see how many drives I could hook up with cheap Chinese parts. I had it running for a year before I started switching things over to USB3.

  5. Bacula is your friend by bernywork · · Score: 4, Informative
    --
    Curiosity was framed; ignorance killed the cat. -- Author unknown
    1. Re:Bacula is your friend by richlv · · Score: 1

      was going to suggest bacula as well, but came a bit late :)

      --
      Rich
    2. Re:Bacula is your friend by Anonymous Coward · · Score: 3, Informative

      Yes, Bacula is the only real solution out there that isn't going to cost you an arm and a leg, and that allows you to switch easily between any backup medium. As long as your mySQL catalog is intact restoration is a synch...

      Did I mention it supports backup archiving as well if you want duplicate copies for Tapes being shipped off site...

    3. Re:Bacula is your friend by hoover · · Score: 1

      Another thumbs up for bacula if you need more than a single backup of your data (like copying it to drives only once)

      --
      Ever wondered whats wrong with the world? http://www.ishmael.org/
    4. Re:Bacula is your friend by arth1 · · Score: 5, Informative

      Yes, Bacula is the only real solution out there that isn't going to cost you an arm and a leg, and that allows you to switch easily between any backup medium.

      Except for good old tar, which is present on all systems.

      Most people are probably not aware that tar has the ability to create split tar archives. Add the following options to tar:
      -M -L <max-size-in-k-per-tarfile> -F myscript.sh ... where myscript.sh supplies the name to use for the next tar file in the series (GNU tar reads it from the file descriptor named in $TAR_FD). It can be as easy as a for loop checking whether the tar file already exists and returning the next hooked-up volume where it doesn't.
      Or it could even unmount the current volume and automount the next volume for you. Or display a dialogue telling you to replace the drive.

      One advantage is that you can easily extract from just one of the tar files; you don't need all of them or the first-and-last like with most backup systems. Each tar file is a valid one, and at most you need two tar files to extract any file, and most of them just one.

      Note that GNU tar refuses to combine multi-volume archives with its built-in compression, so compress the data beforehand if you need to.
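
      A sketch of such a volume-change script, written for GNU tar's --info-script (-F) hook. GNU tar runs the script at the end of each volume with TAR_VOLUME and TAR_FD set in the environment and reads the next archive name from the descriptor in TAR_FD. The mount points and the backup-NNN.tar naming are hypothetical:

```python
#!/usr/bin/env python3
import os
import sys

# Hypothetical mount points, one per external drive, in the order to fill them.
DRIVES = ["/mnt/usb1", "/mnt/usb2", "/mnt/usb3"]


def next_archive(volume_number, drives=DRIVES):
    """Return the archive path for a 1-based volume number, or None if out of drives."""
    if not 1 <= volume_number <= len(drives):
        return None
    return os.path.join(drives[volume_number - 1], f"backup-{volume_number:03d}.tar")


def main():
    volume = int(os.environ["TAR_VOLUME"])  # the volume tar is about to start
    path = next_archive(volume)
    if path is None:
        return 1  # a non-zero exit status makes tar stop
    # tar reads the new archive name from the descriptor named in TAR_FD.
    os.write(int(os.environ["TAR_FD"]), (path + "\n").encode())
    return 0


# Only act when tar has actually invoked the script.
if __name__ == "__main__" and "TAR_FD" in os.environ:
    sys.exit(main())
```

      Saved as an executable next_volume.py (a made-up name), it would be used along the lines of: tar -c -M -L <size> -F ./next_volume.py -f /mnt/usb1/backup-001.tar /data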

    5. Re:Bacula is your friend by Anonymous Coward · · Score: 0

      Bacula is really good, from what I've seen, but a little difficult to install, configure and get running. I believe tar is a good and fast solution too! You can start with it and then take the time to set up Bacula. BTW the array of disks is a real fast-easy-cheap-non-handwork-requiring solution compared with USB HDDs.

    6. Re:Bacula is your friend by robbie73 · · Score: 1

      "came a bit late" - for real, after all this huge amount of pr0n, no wonder...! ;)

    7. Re:Bacula is your friend by h4rr4r · · Score: 1

      You can extract from just one bacula file, you do not need the first and last. Bacula really just automates all this stuff for you. It really is the way to go.

    8. Re:Bacula is your friend by Qzukk · · Score: 1

      Did I mention it supports backup archiving as well if you want duplicate copies for Tapes being shipped off site...

      We had a DLT tape break inside the drive about 8 years ago and we had no other backup of the backup. Back then I wasn't able to figure out how to get bacula to mirror backups so we switched to mirrored USB drives to ensure that it never happened again.

      Any links to a guide for this? Googling bacula "backup archive" or "backup mirror" isn't coming up with anything helpful and I'm not seeing anything like it described in the bacula manual. I've got two very expensive tape drives just sitting idle that I could be putting back to use...

      --
      If I have been able to see further than others, it is because I bought a pair of binoculars.
    9. Re:Bacula is your friend by Orsmo · · Score: 2

      > Yes, Bacula is the only real solution

      Wait a minute. Really?

      OP is asking for a linux console application that can perform a backup over multiple block devices (in this case externally attached hot-pluggable drives like USB), and Bacula is what you come up with as the *only* real solution? Obviously you've never heard of dump.

      http://linux.about.com/od/commands/l/blcmdl8_dump.htm

      --
      -- Begin thoughtfuly, end insensitively.
      It has more impact that way.
    10. Re:Bacula is your friend by Anonymous Coward · · Score: 0

      This seems like it takes a quantum leap in making backups.

    11. Re:Bacula is your friend by datapharmer · · Score: 1
      --
      Get a web developer
    12. Re:Bacula is your friend by mdielmann · · Score: 1

      To be pedantic, you only need the tar files that contain part of the file being extracted. This could be one or more (even all).

      --
      Sure I'm paranoid, but am I paranoid enough?
    13. Re:Bacula is your friend by Anonymous Coward · · Score: 1

      Hmm, I have a "sTARt menu" folder in my c:\users\default... folder, but no TAR command. You said it is present on all systems; can you tell me where I can find it?

    14. Re:Bacula is your friend by Anonymous Coward · · Score: 0

      You are correct in your observation that toy systems lack it.

    15. Re:Bacula is your friend by Stuarticus · · Score: 1

      RTFM, it's been too long since I've seen that advice, brings a warm glow to my heart.

      --
      If you think someone isn't free to have a different definition of "freedom" you may be a tyrant.
    16. Re:Bacula is your friend by Anonymous Coward · · Score: 0

      Except for good old tar, which is present on all systems.

      Most people are probably not aware that tar has the ability to create split tar archives.

      That's amazing, if only every other major archiving program like Winzip and Winrar could do that too. Oh wait...

    17. Re:Bacula is your friend by DigiShaman · · Score: 1

      I'm evaluating ARC Serve D2D actually. I needed a BRM recovery that also supports full with incremental backups. But most importantly, I needed a solution that will backup MS Exchange 2010 and supports native 4k disk sector sizes. Acronis still does not support Exchange 2010 (WTF?, so late now), and MS 2008 native Windows Backup will not support 4k sector sizes on newer drives. Supposedly 512e emulated support may work. But with newer external drives that use 4k sector sizes, the VSS writer will fail! As for Backup Exec 2010 and 2012, I'm fucking done with what amounts to a hacked up "API". I want a turn-key product, not a fucking roll-your-own enterprise solution for a SMB market.

      And to extend my rant even further. It's the year 2012. Why on this 3rd planet from the Sun do we not have any viable working backup solutions out there!!! Every G-Damned one of them has their major problems and/or incompatibilities. WTF is up with that shit?! Blows my mind.

      --
      Life is not for the lazy.
    18. Re:Bacula is your friend by Qzukk · · Score: 1

      Well, now I know what bacula calls it.

      It's still not quite what I was hoping to do (something like sending the same data to two storage daemons at once to write two tapes at once) since I can't copy from a broken tape (and honestly, I could copy from a non-broken tape just fine without bacula's help).

      --
      If I have been able to see further than others, it is because I bought a pair of binoculars.
    19. Re:Bacula is your friend by g1zmo · · Score: 1

      The way to not be a jerk about it would be to inform the guy that Bacula uses the term migration rather than the industry standard archive.

      Linking to the relevant chapter in the manual was a nice thing to do, however, so I don't think you were a jerk about it.

      --
      I have found there are just two ways to go.
      It all comes down to livin' fast or dyin' slow.
      -REK, Jr.
    20. Re:Bacula is your friend by Onymous+Coward · · Score: 1

      s/synch/cinch/

      A cinch is a sure thing, easy.

      "Synch" would be pronounced "SEENK", be more generally spelled "sync", and mean "to synchronize" or "a synchronization".

      Though I suppose you might mean that restoration is actually some kind of synchronization (from the backup to the live server), in which case this is a very clever pun. But I doubt it.

    21. Re:Bacula is your friend by Anonymous Coward · · Score: 0

      You need to install the cygwin base packages. Doesn't that go without saying on such a toy system?

    22. Re:Bacula is your friend by arth1 · · Score: 1

      That's amazing, if only every other major archiving program like Winzip and Winrar could do that too. Oh wait...

      They can't split to separate target locations, no. Splitting a 5 TB backup file into ten pieces isn't very useful if all of the pieces have to be created on a single volume that can't hold 5 TB.

      With tar, that job is easy.

    23. Re:Bacula is your friend by arth1 · · Score: 2

      I know you tried to make an asshat joke, but I'll respond anyhow:

      Yes, Microsoft provides tar (and many other useful apps primarily associated with Unix and Linux).

      Quoting Wikipedia:
      "Interix versions 5.2 and 6.0 are respective components of Microsoft Windows Server 2003 R2, Windows Vista Enterprise, Windows Vista Ultimate, and Windows Server 2008 as Subsystem for Unix-based Applications[1] (SUA[2]). Version 6.1 is included in Windows 7 (Enterprise and Ultimate editions), and in Windows Server 2008 R2 (all editions).[3]"

      If you have XP, W2k or a lesser version of Windows Vista or 7, you need to register with Microsoft to download Services For Unix.
      If you have Windows Server 2003 or newer, or Windows Vista/7 Ultimate or Enterprise, you can turn it on or off through the Windows features in the control panel.

    24. Re:Bacula is your friend by pegdhcp · · Score: 1

      :) The producer of the unnamed toy OS used to supply a toolkit free of charge (probably as an oversight) that provided UNIX utilities. I guess it was part of their attempt to be POSIX compatible back then. Fortunately I am no longer with the company whose general manager believed that any UNIX jockey must be able to run M$ products as well, so I have lost track of the current status of the toolbox I mentioned....

  6. use 'dd' in linux by Anonymous Coward · · Score: 1

    Use 'dd' in linux

  7. Why USB HDDs? by Anonymous Coward · · Score: 1, Interesting

    Are you REALLY sure that you want to use USB HDDs? The cost savings of using a box of HDDs may well be offset by the hassle in finding the backup software, the manual labor of swapping them, finding the correct drive to retrieve a certain file, etc.

    How about a pair of Synology DS1512+ NASes? In addition to getting all of the storage online at all times, you get RAID support, etc.

    1. Re:Why USB HDDs? by jamesh · · Score: 1

      Are you REALLY sure that you want to use USB HDDs? The cost savings of using a box of HDDs may well be offset by the hassle in finding the backup software, the manual labor of swapping them, finding the correct drive to retrieve a certain file, etc.

      How about a pair of Synology DS1512+ NASes? In addition to getting all of the storage online at all times, you get RAID support, etc.

      No reason why they can't all be attached at once. With 3TB disks and 8 USB3 ports, you'd only need to plug them all in to do the backup, then remove them all to take them offsite when the backup is done.

      A few portable NAS's holding 4 disks each might be a better option, but don't exclude USB for its simplicity.

    2. Re:Why USB HDDs? by aaarrrgggh · · Score: 1

      Hell, USB is a pain, but going with single-drive NAS units is the easiest (portable) approach I can think of. The WD MyBookLive units are Linux based, so you can even run an rsync daemon on them.

      Our setup is just for offsite backup, but we run a cron job every day to detect what drives are present, determine which drives to use by even/odd week, and rsync portions of the system to each. The drives are put into sleep mode and can be rotated in the morning. If you are backing up from a server with high throughput, you can run rsync in parallel so the individual drives aren't a limiting factor.
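
      The drive-selection step of a job like that might look roughly like this. The mount points, the two-set rotation and the function name drives_for are assumptions for illustration, not our actual script:

```python
import datetime
import os

# Hypothetical layout: two sets of drives that alternate weekly.
DRIVE_SETS = {
    "even": ["/mnt/backup-a1", "/mnt/backup-a2"],
    "odd": ["/mnt/backup-b1", "/mnt/backup-b2"],
}


def drives_for(day, present=os.path.ismount):
    """Return the drives of this week's set that are actually attached."""
    week = day.isocalendar()[1]  # ISO week number
    parity = "even" if week % 2 == 0 else "odd"
    return [drive for drive in DRIVE_SETS[parity] if present(drive)]


targets = drives_for(datetime.date.today())
print("rsync targets this run:", targets)
```

      Each returned mount point would then get its own rsync of one portion of the data.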

      You have to plan for how the data will change; in the /blonde /brunette /redhead case, changing tastes could put more stress on a single drive... But it is easy to sub-divide later if need be as your collection grows. If, however, those directories reside in a 24TB TrueCrypt container you make things much less portable, needing access to all drives for any useful data transfer.

    3. Re:Why USB HDDs? by SQL+Error · · Score: 1

      After a lot of messing around with USB drives, we bought half a dozen Synology DS1812+s. Backup problems solved.

    4. Re:Why USB HDDs? by Anonymous Coward · · Score: 0

      I just bought one of these units a few days ago. It's fucking awesome! With the WD Red 3TB drives it's damn near silent too.

  8. Split into multiple tar files? by Anonymous Coward · · Score: 5, Informative

    I'm guessing you don't have enough space to split a backup on the original storage medium and then mirror the splits onto each drive?

    Given the size requirements, it seems that might be prohibitive, but it would make things easier for you:

    How to Create a Multi Part Tar File with Linux

  9. A Full 24TB using only 2 USB ports by Bondolon · · Score: 2

    Assuming you're not worried about backup speed, you could use a four-bay external hard-drive enclosure in combination with RSYNC and LVM on any linux variety. I don't know if they all do, but the MediaSonic HF2-SU3S2 supports 3TB hard drives per bay, which means that two of them could be used in conjunction to provide 24TB of backup storage. Since you can make a large volume out of the full 24TB using LVM, you could even use something like dd to write to the disk (RSYNC with the archive option would be a better choice though, imho).

  10. RAID by Anonymous Coward · · Score: 5, Informative

    For that much data you want a RAID since drives tend to fail if left sitting on the shelf, and they also tend to fail (for different reasons) if they are spinning.

    Basically: buy a RAID enclosure, insert drives so it looks like one giant drive, then copy files.

    For 24TB you can use eight 4TB drives for a 6+2 RAID-6 setup. Then if any two of the drives fail you can still recover the data.
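
    The 6+2 arithmetic, as a quick sketch. It counts raw decimal terabytes only and ignores filesystem overhead, so real usable space will come out somewhat lower; the function names are made up:

```python
import math


def raid6_usable_tb(drive_count, drive_tb):
    """Usable capacity of a RAID-6 set: two drives' worth goes to parity."""
    if drive_count < 4:
        raise ValueError("RAID-6 needs at least four drives")
    return (drive_count - 2) * drive_tb


def raid6_drives_needed(target_tb, drive_tb):
    """Smallest RAID-6 set that holds target_tb."""
    return max(4, math.ceil(target_tb / drive_tb) + 2)


print(raid6_usable_tb(8, 4), "TB usable from eight 4TB drives")
print(raid6_drives_needed(24, 3), "drives needed with 3TB disks")
```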

    1. Re:RAID by dutchwhizzman · · Score: 1

      If not a RAID (those tend to fail just as hard) get at least two, possibly three copies of each file on separate drives. The last thing you want is to wait for RAIDs to recover and watch them fail during recovery, with your only copy of a file on them.

      --
      I was promised a flying car. Where is my flying car?
    2. Re:RAID by Kjella · · Score: 2

      For that much data you want a RAID since drives tend to fail if left sitting on the shelf, and they also tend to fail (for different reasons) if they are spinning. Basically: buy a RAID enclosure, insert drives so it looks like one giant drive, then copy files. For 24TB you can use eight 4TB drives for a 6+2 RAID-6 setup. Then if any two of the drives fail you can still recover the data.

      Yeah... though I suspect with the price premium for 4TB drives - they're huge - and the cost of an 8-port RAID6 capable RAID card you're considerably above the budget he was going for. If this is like "projects" or something I'd probably suggest the human archiving method - split your live disk into three areas, "work in progress" and "to archive" and "archive". Your WIP you back up completely every time, your "to archive" you add to the latest archive disk (plain, no RAID), and make an index of it so you can easily find on which archive disk it is then move it to "archive" on the live disk. Very low tech incremental backup but this seems like a hobby project. I certainly hope it's not a company's backup / disaster recovery plan...
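
      The index part of that low-tech scheme could be as simple as a CSV file kept on the live disk. A minimal sketch, with invented function names and a three-column layout (disk label, relative path, size) chosen just for the example:

```python
import csv
import os


def index_disk(disk_root, disk_label, index_path):
    """Append one row per file under disk_root to the index; return the file count."""
    count = 0
    with open(index_path, "a", newline="") as handle:
        writer = csv.writer(handle)
        for folder, _dirs, files in os.walk(disk_root):
            for name in files:
                full = os.path.join(folder, name)
                relative = os.path.relpath(full, disk_root)
                writer.writerow([disk_label, relative, os.path.getsize(full)])
                count += 1
    return count


def find(index_path, needle):
    """Return (disk label, relative path) for every indexed path containing needle."""
    with open(index_path, newline="") as handle:
        return [(row[0], row[1]) for row in csv.reader(handle) if needle in row[1]]
```

      Run index_disk once after filling each archive disk, then find tells you which disk to fetch from the shelf. Keep the index file outside the directory being indexed.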

      --
      Live today, because you never know what tomorrow brings
    3. Re:RAID by DRJlaw · · Score: 1

      We are helping someone back up his music and audio collection, aren't we?

      Well, actually you're not helping someone do anything. You're just vomiting up speculative accusations.

      And they won't be pirated, will they?

      See above. Then ignore children, CD rot, and every other legitimate reason for backing up the optical media that you've spent your hard-earned money on.

      Now run along and sue everyone who's provided actual, helpful advice. However, you may want to look up the standard for "contributory infringement" first...

    4. Re:RAID by Anonymous Coward · · Score: 1

      Very low tech incremental backup but this seems like a hobby project.

      Call me old, but I wasn't expecting anyone calling 24TB a hobby project...

      Now get off my lawn!

    5. Re:RAID by MachineShedFred · · Score: 1

      More to the point - Do what the parent post said, but use something like FreeBSD or Solaris and ZFS with a raidz3 setup (essentially RAID6), which gives you block level dedup, snapshotting, compression, encryption, etc.

      --
      Slashdot still doesn't support Unicode after it was added to the HTML standard in 1997.
    6. Re:RAID by d3vi1 · · Score: 0, Troll

      RAID is a method of reducing the chances of a disk failure being fatal to the data. RAID is not a backup solution. Anyone who answers a question about backup with RAID is an IDIOT who doesn't deserve his oxygen quota and should be put down.
      Disk failure is not the only reason for using backups. More often than not you run into an idiot user (who happens to be executive) that deleted stuff by mistake and you need it back.
      Furthermore, disk failure can happen on all the disks at once. You have: fires, idiots, floods, more idiots, bad wiring, idiot admins, software bugs, and my personal favourite: tired admins.
      Always have an off-line back and an off-site replica is my personal favourite.

      --
      UNIX was not designed to stop you from doing stupid things, because that would also stop you from doing clever ones.
    7. Re:RAID by Sarten-X · · Score: 3, Informative

      As mentioned already, RAID is not a backup solution. While it will likely work fine for a while, the risk of a catastrophic failure rises as drive capacity increases. From the linked article:

      With a twelve-terabyte array the chances of complete data loss during a resilver operation begin to approach one hundred percent - meaning that RAID 5 has no functionality whatsoever in that case. There is always a chance of survival, but it is very low.

      Granted, this is talking about RAID 5, so let's naively assume that doubling the parity disks for RAID 6 will halve the risk... but then since we're trying to duplicate 24 terabytes instead of twelve, we can also assume the risk doubles again, and we're back to being practically guaranteed a failure.
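
      For anyone who wants to put rough numbers on this, the usual back-of-the-envelope model is the chance of hitting at least one unrecoverable read error while reading the whole array during a rebuild. The sketch below assumes one error per 10^14 bits read (a commonly quoted consumer-drive spec, used here purely as an assumption) and independent errors; it is not the calculation from the quoted article:

```python
import math


def p_read_error(bytes_read, bit_error_rate=1e-14):
    """Chance of at least one unrecoverable read error while reading bytes_read."""
    bits = bytes_read * 8
    # log1p keeps precision when the per-bit error rate is tiny.
    return 1 - math.exp(bits * math.log1p(-bit_error_rate))


TB = 10**12
for size in (4, 12, 24):
    print(size, "TB read:", round(p_read_error(size * TB), 2))
```

      Under these assumptions the risk climbs steeply with array size, and it is very sensitive to the error rate you plug in.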

      Bottom line is that 24 terabytes is still a huge amount of data. There is no reliable solution I can think of for backing it all up that will be cheap. At that point, you're looking at file-level redundancy managed by a backup manager like Backup Exec (or whatever you prefer) with the data split across a dozen drives. As also mentioned already, the problem becomes much easier if you're able to reduce that volume of data somewhat.

      --
      You do not have a moral or legal right to do absolutely anything you want.
    8. Re:RAID by the_B0fh · · Score: 2

      You should check out ZFS. These issues go away. And with RAID-Z3, up to 3 drives can die before you have a problem.

    9. Re:RAID by the_B0fh · · Score: 2

      ZFS + snapshots. problem solved. Though you do need more drives than a 8x3TB.

    10. Re:RAID by Qzukk · · Score: 1

      In this case, RAID is a method of reducing the chances of a disk failure being fatal to the backup.

      --
      If I have been able to see further than others, it is because I bought a pair of binoculars.
    11. Re:RAID by d3vi1 · · Score: 2

      You didn't read what I said. Yes, ZFS+Snapshots, but you also need at least Sun Cluster replication and tape backup. ZFS + Snapshots doesn't save you from fires, floods, software bugs and ill-will. It does save you from idiots, and disk failure though.

      --
      UNIX was not designed to stop you from doing stupid things, because that would also stop you from doing clever ones.
    12. Re:RAID by Amouth · · Score: 1

      if you go with 3TB drives you would need 10 for the R6, you can get that + controller + enclosure for ~$2,600 or ~10-11 cents per gig. not too bad really.
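
      The per-gig figure, worked through as a sketch. It divides the total price by usable (not raw) capacity and uses decimal gigabytes, both of which are assumptions about how the number was reached:

```python
def cents_per_gb(total_dollars, usable_tb):
    """Cost in cents per decimal gigabyte of usable space."""
    return total_dollars * 100 / (usable_tb * 1000)


usable_tb = (10 - 2) * 3  # ten 3TB drives in RAID-6, two drives' worth of parity
print(usable_tb, "TB usable at", round(cents_per_gb(2600, usable_tb), 1), "cents per GB")
```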

      --
      '...if only "Jumping to a Conclusion" was an event in the Olympics.'
    13. Re:RAID by jedidiah · · Score: 1

      You can get an 8 bay SATA JBOD enclosure for about $300 and that will include an extra SATA card. It won't be a proper RAID setup but it will allow you to mount 8 drives at once and it won't cost more than your car.

      --
      A Pirate and a Puritan look the same on a balance sheet.
    14. Re:RAID by the_B0fh · · Score: 2

      Sure it does, when you have a second set of them. Where you store tapes, store the drives instead. What do you think all those virtual tape vaults are made of?

    15. Re:RAID by jedidiah · · Score: 1

      10TB is certainly in hobby territory. Putting 24TB in that same territory is not really that much of a stretch.

      Times change. Tech improves. Before you know it that $100K SSD you had to borrow from a client has been replicated by cheap consumer grade stuff.

      --
      A Pirate and a Puritan look the same on a balance sheet.
    16. Re:RAID by jedidiah · · Score: 1

      Do you want your media to fail?

      Do you want a media failure to be intolerable?

      If not, then RAID is a reasonable part of a backup strategy. It also turns an O(n) problem into an O(c) problem as you don't have to worry about n+1 little disks and how nothing is a clean 1:1 mapping from one drive to another.

      This guy is basically trying to replicate the classic 80s PC backup where you would have a stack of floppy disks except the size of the media is bigger.

      --
      A Pirate and a Puritan look the same on a balance sheet.
    17. Re:RAID by kesuki · · Score: 1

      in my experience shelved hdd live as long or longer than powered ones, but i do not have 24tb of data. at least not on hdd. then again i've never had optical disks fail either, despite having owned a 2x cdrw, ymmv

    18. Re:RAID by Anonymous Coward · · Score: 1

      The problem with a large raid like that is recovering from a failed drive. Even with a raid 6.. if one drive fails and you add a new drive to rebuild.. rebuilding 4 TB of data is going to take quite a few hours of heavy heavy heavy disk usage. There is a good chance of a second drive failing during rebuild, which means as soon as you are done rebuilding the drive you added you are adding another drive to rebuild... Remember the larger the array the longer the rebuild time. 8 drives has a pretty big risk of failure.

      Realistically, if you are backing up 24 TB of data and this is a home project and not business or enterprise (which good solutions exist, albeit expensive), you should really cut down your data. You are looking at $1000/yr to keep that data alive.

    19. Re:RAID by jones_supa · · Score: 1

      Yes, but if it's not practical for someone to backup a huge amount of data at all, housing it in a RAID at least increases the reliability a bit.

    20. Re:RAID by ixidor · · Score: 1

      no, not in and of itself, but i would think it would be hard to argue that a second copy of his data is not a backup. simply because that 2nd set resides on a raid should not automatically invalidate it as a backup.

    21. Re:RAID by louic · · Score: 3, Informative

      As mentioned already, RAID is not a backup solution.

      Nevertheless, there is nothing wrong with using disks that happen to be in a RAID configuration as backup disks. In fact, it is probably a pretty good idea for large files and large amounts of data.

    22. Re:RAID by Jawnn · · Score: 1

      Granted, this is talking about RAID 5, so let's naively assume that doubling the parity disks for RAID 6 will halve the risk... but then since we're trying to duplicate 24 terabytes instead of twelve, we can also assume the risk doubles again, and we're back to being practically guaranteed a failure.

      Bottom line is that 24 terabytes is still a huge amount of data. There is no reliable solution I can think of for backing it all up that will be cheap. At that point, you're looking at file-level redundancy managed by a backup manager like Backup Exec (or whatever you prefer) with the data split across a dozen drives. As also mentioned already, the problem becomes much easier if you're able to reduce that volume of data somewhat.

      Quite so.
      So the correct answer to the question in TFS is "There is no 'simple' way to do this."
      Better questions would be along the lines of, "How can I organize my data better, so that I don't have to perform one huge monolithic backup of 24 TB?" or, "Where can I go to learn about basic principles of IT architecture?"

    23. Re:RAID by AJodock · · Score: 1

      RAID is not a backup solution

      RAID certainly isn't a backup solution, however he is making a copy of his data to put onto the RAID array (aka a backup).

    24. Re:RAID by meddle99 · · Score: 2

      For that much data, I'd recommend just keeping the original HD-DVDs and Blu-ray media for your "20 GB" files, and the original CDs for your "3 MB" files. We are helping someone back up his music and audio collection, aren't we? And they won't be pirated, will they?

      The idea of re-ripping ~2400 assorted CDs, DVDs, and BR (total 15tb) really does not appeal to me. So- I back it up, even though I have every one of the originals. Just because you don't understand something is not a reason to assume the person is doing something illegal.

    25. Re:RAID by tlhIngan · · Score: 1

      You can get an 8 bay SATA JBOD enclosure for about $300 and that will include an extra SATA card. It won't be a proper RAID setup but it will allow you to mount 8 drives at once and it won't cost more than your car.

      For backup purposes, I'd say skip the RAID card and use software RAID for multiple reasons.

      First and foremost, if the card dies, you'd be hooped in trying to find a replacement with correct firmware down the line.

      Instead, use Linux's software RAID support - the formats don't change that much (and there's lots of legacy support), plus being open source, and you're able to find a PC able to run Linux enough to be able to recover data from it - as long as ONE computer can run Linux, it'll work (even booting off a CD).

      Between LVM and md, they should be supported for a good long time so your backup media will be accessible for ages.

    26. Re:RAID by johanwanderer · · Score: 1

      An alternative to RAID is to get a Drobo unit (http://www.drobo.com/), fill it up with drives, and copy the data over, then remove the drives.

      The advantage is that the Drobo disk set can have dual-disk redundancy, just like RAID-6, and the drives contain enough meta data that you can insert them in any order. Just make sure you have the Drobo unit in hand to read them out.

      I have not had great experiences with the enterprise model (B1200i) but the consumer models (4- or 5-bay) seem to work great.

    27. Re:RAID by Anonymous Coward · · Score: 0

      I have about 24TB in my hobby system and I put that together about 3 years ago.
      In fact, it is about time for me to upgrade again. I'm looking at 48 TB next, which I hope to put together in the next 2-3 months.

    28. Re:RAID by Richy_T · · Score: 1

      +1. Software raid also allows you more flexibility. At home, I have two drives, one much larger than the other. The first 4 partitions are mirrored and the last one on the larger drive is not. I get to use all the size of the bigger drive but have the security of RAID for important stuff. Also, one drive is IDE and one SATA.

    29. Re:RAID by Sarten-X · · Score: 2

      Quite the contrary, and that's my point. The errors here aren't just "let's try again" failures. They're unrecoverable, final, data-is-gone-forever errors, and the chances of encountering one are very high with so much data. Resilvering such a large array is practically impossible (as described in the article I linked to). Without resilvering and having blocks spread among disks, losing one disk means you've lost a little bit of everything, so all your data is corrupt, rather than just the fraction that was stored on the failing drive. Add to that hassle the extra expense of having more disks, controllers, and setup time, and the submitter would be better off writing a few thousand DVDs.
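
      To put a rough number on that risk, here is a back-of-the-envelope sketch. It assumes independent bit errors at a rate of one unrecoverable read error per 10^14 bits read, a figure commonly quoted on consumer drive spec sheets. That rate is an assumption, not something stated in this thread, and real drives may do better or worse.

```python
import math


def p_unrecoverable_read(tb_read: float, bit_error_rate: float = 1e-14) -> float:
    """Probability of at least one unrecoverable read error while reading
    tb_read decimal terabytes, assuming independent errors per bit."""
    bits = tb_read * 1e12 * 8
    # 1 - (1 - rate)**bits, computed via log1p/expm1 to avoid rounding loss
    return -math.expm1(bits * math.log1p(-bit_error_rate))


for tb in (3, 12, 24):
    print(f"{tb} TB read: {p_unrecoverable_read(tb):.0%}")
```

      Under that assumed rate, reading all 24 TB back without a single bad sector is the unlikely outcome, at roughly 85% odds of hitting at least one error. A drive rated at 1 in 10^15 changes the picture considerably, so check the spec sheet before trusting either number.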

      --
      You do not have a moral or legal right to do absolutely anything you want.
    30. Re:RAID by Sarten-X · · Score: 1

      I always seem to forget ZFS, which is a shame. It certainly seems magical, and far more reliable than plain RAID. I'm afraid I don't know enough about it, though, to determine if the same resilvering risks apply, or if there's enough additional layers of error correction to mitigate the risk of read errors.

      As I write this, a question occurs to me: Does ZFS even do disk-at-once resilvering to recover from a failed disk? If not, what's it do?

      --
      You do not have a moral or legal right to do absolutely anything you want.
    31. Re:RAID by darkmeridian · · Score: 1

      RAID is not a backup solution if you are using the drives in production. The concern is that if you delete or otherwise corrupt the file system on the RAID, you have a perfectly redundant copy of garbage with no backup to restore from. However, if you write to a RAID array, then put the device into cold storage, then you have a redundant backup system. That definitely makes sense. Even if this dude gets a dozen 2 TB hard drives, he still isn't going to have redundancy. Copying everything onto RAID will make his life easier as long as he doesn't use the RAID in production.

      --
      A NYC lawyer blogs. http://www.chuangblog.com/
    32. Re:RAID by CityZen · · Score: 1

      I'm jumping in here to point out some possible loose ends:
      - What other medium can realistically hold a couple dozen terabytes of data? (Thousands of DVDs probably isn't viable.)
      - The original poster didn't say what kind of backups he needed: long term (cold storage), available (warm), or online (hot).
          But presumably, hard drives that are turned off and put away have a different MTBF than ones that are running constantly.
      - In any case, the point that any single backup can fail (spectacularly) should be well taken.

    33. Re:RAID by NikeHerc · · Score: 1

      in my experience shelved hdd live as long or longer than powered ones...

      Hard data point: I opened two sealed 60gb disks Tuesday (2012/08/07) and stuck'em into a server. The manufacture date was 2002 on both drives. One drive is DOA, one is working normally. YMMV.

      --
      Circle the wagons and fire inward. Entropy increases without bounds.
    34. Re:RAID by the_B0fh · · Score: 1

      yes yes yes and more yesses. It is bloody magical.

      I just upgraded from 7x1TB to 7x3TB drives by replacing them one at a time, waiting for resilvering to finish, and then swapping the next drive out (telling ZFS that I've replaced each drive with the new one of course).

      In place replacement. When all of them are done, I suddenly have a 12 TB RAID-Z3 disk set. Raid-Z3 = 3 redundant disks. Also, peak reads and writes at 500MB/s. Bloody impressive.

      And if you want to mirror a snapshot, it's just one command to send that mirrored snapshot to an external array, or another computer or another network.

      And end-to-end checksums, so your data *IS NEVER* corrupted, as long as you have sufficient copies of it around, unlike Raid, where if you read the corrupted data in, and write it back out, you're screwed.

    35. Re:RAID by Anonymous Coward · · Score: 0

      That's not much of a "data point" because you have no idea whether the DOA drive died during its time on the shelf or was DOA at the time of purchase.

      Most HDDs today retract their heads off the platters when powered down, which eliminates the possibility of the heads sticking to the platter when the drive is off. That removes one of the most common on-the-shelf failure modes. The real thing to watch out for is handling: HDDs cannot take a lot of mechanical shock and if a drive is going to sit on a shelf for a year or three you should do something to ensure that it either never moves or isn't subject to harsh handling during that time. If they're stored bare, ESD (static electricity) exposure is also a concern. Probably the #1 cause of DOA drives is mishandling during shipping.

    36. Re:RAID by dfghjk · · Score: 1

      ...so let's naively assume that doubling the parity disks for RAID 6 will halve the risk...

      Why assume that when it's grossly in error? If you're going to spend effort teaching us about RAID not being a backup solution, why not put a little effort into getting things right?

    37. Re:RAID by Anonymous Coward · · Score: 0

      24TB is a huge amount of data? I manage over 30PB every day, and do have SRDF jobs that have a DAILY change rate far higher than 24TB. And get off my lawn with the RAID is not a backup shit. It's a protection level, for disks. and your data. /Storage Architect.

    38. Re:RAID by bingoUV · · Score: 1

      True, but -1 for RAID just for s

      --
      Bingo Dictionary - Pragmatist, n. A myopic idealist.
    39. Re:RAID by bingoUV · · Score: 1

      My bad, submitted by mistake.

      -1 for RAID just for security. This makes your SATA drive perform equal to the IDE drive for writes. Also, a periodic rsnapshot from IDE drive to SATA drive would give more security and from more vectors (data corruption through misbehaving software, human operator or hardware).

      --
      Bingo Dictionary - Pragmatist, n. A myopic idealist.
    40. Re:RAID by bingoUV · · Score: 1

      There is no saviour from idiots yet known to mankind

      --
      Bingo Dictionary - Pragmatist, n. A myopic idealist.
    41. Re:RAID by Anonymous Coward · · Score: 0

      You didn't read what I said.

      Can't blame the guy.

    42. Re:RAID by atamido · · Score: 1

      Granted, this is talking about RAID 5, so let's naively assume that doubling the parity disks for RAID 6 will halve the risk... but then since we're trying to duplicate 24 terabytes instead of twelve, we can also assume the risk doubles again, and we're back to being practically guaranteed a failure.

      Bottom line is that 24 terabytes is still a huge amount of data. There is no reliable solution I can think of for backing it all up that will be cheap. At that point, you're looking at file-level redundancy managed by a backup manager like Backup Exec (or whatever you prefer) with the data split across a dozen drives. As also mentioned already, the problem becomes much easier if you're able to reduce that volume of data somewhat.

      Use Greyhole.
      http://www.greyhole.net/

      It distributes files across multiple storage locations, with a user defined level of duplication (lots of options for per file type and location duplication levels). Full drive failures aren't terribly common, but read errors are. To survive a drive failure AND a read error while reduplicating/redistributing files to recover from the failed drive, you would need 3x redundancy. If you used a 3x redundancy with 4TB drives, it would be 18 drives, which is a lot but not unreasonable.

      Really though, that's for more of an onsite backup. To duplicate the data for taking offsite you'd only need six 4TB drives because you only need a single copy in the backup. Set up a system with the six drives, and tell Greyhole to store one copy of every file on that system. When you're ready, shut down that system and take it offsite. For best results have two of these systems that you rotate. When you hook it up to the network, the main onsite Greyhole system will see it show up and start syncing files to it. The best part is that a full drive failure on the backup doesn't impact files on the other drives. Just replace the failed drive and wait for everything to sync back up.

      In a crisis where the onsite location is destroyed, and all but one backup drive fails, you can still hook up the one working drive to any system and copy the files off.
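
      The drive counts are simple enough to check, as a rough sketch. It only divides total bytes by drive size, so it ignores filesystem overhead and the fact that whole files won't pack perfectly; `drives_needed` is a made-up helper, not part of Greyhole.

```python
import math


def drives_needed(data_tb: float, copies: int, drive_tb: float) -> int:
    """Drives required to hold `copies` full copies of the data,
    ignoring filesystem overhead and uneven file placement."""
    return math.ceil(data_tb * copies / drive_tb)


print(drives_needed(24, 3, 4))  # onsite pool, 3 copies on 4TB drives
print(drives_needed(24, 1, 4))  # offsite set, single copy on 4TB drives
print(drives_needed(24, 1, 3))  # offsite set, single copy on 3TB drives
```

      That gives 18 drives for the 3x onsite pool, and for a single offsite copy six drives at 4TB or eight at 3TB. In practice you would want some headroom on top of that.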

    43. Re:RAID by Anonymous Coward · · Score: 0

      Why not just use RAID 1 across three drives? (Linux software RAID can handle things like this, a regular RAID controller will be like ???)

  11. Julian? by WinstonWolfIT · · Score: 5, Funny

    Out on bail mate?

    1. Re:Julian? by Anonymous Coward · · Score: 0

      Kim DotCom?

  12. git-annex by Anonymous Coward · · Score: 4, Informative

    You might want to look into git-annex:
    http://git-annex.branchable.com/

    I've not tried it, but it sounds like an ideal solution for your request, especially if your data is already compressed.

    1. Re:git-annex by Anonymous Coward · · Score: 0

      This ^

      I have a ~27TB array spread around a bunch of disks, and git annex works fantastically. I _highly_ recommend it. You can also always see all your files, and if you try to access one that's not on that particular media, git annex will helpfully tell you exactly which drive you need to pop in.

  13. NAS Box by second_coming · · Score: 1

    http://www.synology.com/products/product.php?product_name=DS2411%2B&lang=uk Still portable enough to do your backup then take offsite.

    1. Re:NAS Box by Anonymous Coward · · Score: 0

      This raises the question: who does the data belong to, and where will it go when it's copied? And honestly, why would you use USB drives for backup? There are better professional solutions out there. If you like experimenting, why not go for tape? I'm told a lot of major institutions/corporations use them. Is that exotic enough for ya?

    2. Re:NAS Box by symes · · Score: 1

      I have a Synology NAS and I'm very pleased with it. I don't have anywhere near the volume of data the OP has though. One thing with a NAS is that you'll be subject to the network's available bandwidth and, depending on your setup, this could make backing up lots of data pretty darn tedious. And might annoy admin (and other users). So while a decent portable raid might be the better option, it might be better to find one that just plugs in rather than use the network. Might find one that can be set up to use SSDs as well.

    3. Re:NAS Box by Anonymous Coward · · Score: 0

      24 TB SSD ??? It would severely stress MY bank account :-(

    4. Re:NAS Box by second_coming · · Score: 1

      I use a DS1511+ at work with 5 x 3TB in a RAID5 setup and I love it. It's rock solid, good performance and reasonably cheap to set up. I have all my CentOS servers rsync to it every night.

    5. Re:NAS Box by Overzeetop · · Score: 1

      Easy - this is his personal music, photo, and movie collection - most mp3 and older digital camera files are 3MB, and BR discs are about 20-25GB when ripped to HD.

      I'm at about 9TB at the moment (well, 7TB actual usage, 11TB total disc space, 2TB of which is parity) and I have a fairly small collection. With thousands of RAW family photos, several thousand music tracks (most from a 30 year collection of CDs), and about 400 DVDs & BR discs I've ripped, it's not hard to see how I've ended up with 7TB. I know many movie collectors with libraries several times the size of mine.

      I presume he has a bunch of old HDs he can put into USB enclosures or a docking station - perhaps having upgraded from a 16-20 disc 1-2TB/disc NAS to a 6x4TB NAS. There are several people on the unRaid forums with larger arrays. Rather than pitch the old hardware, they could be used for a backup.

      --
      Is it just my observation, or are there way too many stupid people in the world?
    6. Re:NAS Box by jedidiah · · Score: 1

      The problem with small drives is the fact that every drive needs its own port or needs to share part of a common bus. Old drives also tend to be slow drives. So you end up with something that is closer to dying, slower, and more complicated.

      Sometimes it's just time to chuck the old stuff.

      If a drive is going to keep your solution from being able to saturate a gigabit network connection then it needs to go.

      --
      A Pirate and a Puritan look the same on a balance sheet.
    7. Re:NAS Box by Anonymous Coward · · Score: 0

      If the OP already has the data on external drives (or separate drives) most NAS boxes have eSata on the back for increased throughput. Even if they are backup/restore only from the device's web interface you can SSH onto the NAS and use it for input. I've done so with mine.

    8. Re:NAS Box by Richy_T · · Score: 2

      The 200GB range drives in my main server have been trundling along for many years while I have a pile of 0.5-2TB hard drives I need to go through and get warrantied (three of them Caviar blacks). Not impressed with the big drives.

    9. Re:NAS Box by bingoUV · · Score: 1

      Closer to dying is still of positive value if it is only holding a backup copy. Slower doesn't matter if it will sit in a usb2 enclosure when used and cupboard/some place away from home to be sort of offsite backup when not actively being used. And of course, a huge data amount for hobby might make most people value money, so chucking will be seen as wasteful.

      So there are lots of people for whom your advice doesn't make sense.

      --
      Bingo Dictionary - Pragmatist, n. A myopic idealist.
  14. Sometimes Simple is Harder... by Anonymous Coward · · Score: 0

    If you have 24TB of data to backup, it would be easier to just build another 24TB storage array. The amount of time you would spend swapping disks and then validating that disks don't go bad would sap any "savings" of not building a big array to begin with.

    So, I would buy up some cheap dual-core dual processor xeon systems that ebay is flooded with currently, buy as much raid 5 and sata disks as it takes to get to 24tb with raid 5, and then you can actually do a meaningful backup that doesn't have a labor cost factored to each iteration.

    I'm assuming the original 24tb exists in RAID 5 already, so if you have access to the existing hardware infrastructure, just build a RAID 5 mirror. If you're doing a web mirror, RAID5 should be good enough and if you lose more than one disk then worry about restoring from the other mirror members.

  15. Tape? by mwvdlee · · Score: 5, Insightful

    Why not tape, backup RAID, SAN or some other dedicated backup hardware solution?
    24TB is well within the range that a professional solution would be required.
    Given a harddisk size of ~1TB, making a single backup to 24 disks isn't a backup; it's throwing data in a garbage can.
    More than likely at least one of those disks will die before its time.

    --
    Slashdot social media options: AIM, ICQ, Yahoo, Jabber and Mobile Text. Why no MySpace?
    1. Re:Tape? by Lumpy · · Score: 4, Insightful

      Yup. spool to tape. get a SDLT600 tape cabinet and call it done. if you get a 52 tape robot cabinet you will have space to not only hold a complete backup but a second full backup in incrementals that will all run automatically. Plus it has the highest reliability.

      And anyone whining about the cost. If your 24Tb of data is not worth that much then why are you bothering to back it up?

      --
      Do not look at laser with remaining good eye.
    2. Re:Tape? by Anonymous Coward · · Score: 0

      "24TB is well within the range that a professional solution would be required."

      Wot? It's my movie collection!

    3. Re:Tape? by Anonymous Coward · · Score: 0

      Parent is right on the money, this type of backup requires a professional solution and tape or some sort of SAN is definitely the method that you will want to use. I've seen plenty of Hadoop systems use this method.

    4. Re:Tape? by mwvdlee · · Score: 1

      Assuming the 24TB is worthy to backup as a single backup.
      Your movie collection is probably (A) not worthy of backup and (B) far more easily backed up as individual movies.

      --
      Slashdot social media options: AIM, ICQ, Yahoo, Jabber and Mobile Text. Why no MySpace?
    5. Re:Tape? by Anonymous Coward · · Score: 5, Informative

      No kidding. For $2400, you get 24x TB HDs and a bookkeeping nightmare if you ever actually resort to the "backup." For $3k, you get a network-ready tape autoloader with 50-100TB capacity and easy access through any number of highly refined backup and recovery systems.

      Now, if the USB requirement is because that's the only way to access the files you want to steal from an employer or government agency, then the time required to transfer across the USB will almost guarantee you get caught. Even over the weekend. You should come up with a different method for extracting the data.

    6. Re:Tape? by Anonymous Coward · · Score: 0

      More than likely at least one of those disks will die before its time.

      Depending on what is stored(porn collection?) and how those 24 disks are handled it might not be a big problem. You lose 4.16% of the data if one disk fails. But this is the backup so you would have to lose the original and the backup at pretty much the same time to actually lose data.

      Having them in a "one-fail-everyone-fail"-mode would be idiotic.

    7. Re:Tape? by aaarrrgggh · · Score: 1

      It depends on the purpose, but I am guessing 20 3TB NAS drives in a weekly rotation at $4,000 is going to be cheaper and more portable than even a drobo. The reliability, assuming it isn't used as JBOD and each drive is independent, will likely be better than you can get with anything else.

      If it is for "live" data, then the argument goes out the window and the OP should go with a SAN. But I have trouble finding a case where tape is a better solution.

    8. Re:Tape? by SlashDev · · Score: 1

      I don't think tape is a good idea for large backups, mostly because of the restore times it would take, tapes are sequentially read/written so a search and restore for data at the end of the tape would take a very long time.

      --

      TOP DSLR Cameras Reviews of the top DSLRs
    9. Re:Tape? by Anonymous Coward · · Score: 0

      Buy an Oracle T10000C tape drive and 6 tapes. That's enough storage for the latest 3 generational copies of your backup. (Disclosure: I work for Oracle)

    10. Re:Tape? by Anonymous Coward · · Score: 0

      Yep. Tape is the way to go here, mate, and that's the reason why this medium is still available... DUH!

    11. Re:Tape? by SecurityGuy · · Score: 1

      Riiiight. And what do T10KC drives cost again?

    12. Re:Tape? by mjwx · · Score: 1

      Yup. spool to tape. get a SDLT600 tape cabinet and call it done. if you get a 52 tape robot cabinet you will have space to not only hold a complete backup but a second full backup in incrementals that will all run automatically. Plus it has the highest reliability.

      And anyone whining about the cost. If your 24Tb of data is not worth that much then why are you bothering to back it up?

      And having 2 full backups + incrementals in your tape drive when the office burns down is going to be a great help.

      Backups are useless unless you cycle them. At the very least the monthlies must go off site.

      GIS produces a shitload of data that needs to be preserved. 30 GIS analysts can produce 8 TB in no time, Thank Buddha that LTO4 came out when it did. I was backing up 18 TB in a full backup with a weekly change of 4-6 TB (depending on work volume) back in 2008. I could have a single 24 tape loader do full backups, I insisted on 2 but they never found the money...

      Yep, I handled up to 6 TB of unique data generated per week and they still called me a storage nazi.

      --
      Calling someone a "hater" only means you can not rationally rebut their argument.
    13. Re:Tape? by Minwee · · Score: 1

      Riiiight. And what do T10KC drives cost again?

      If you are the kind of person who asks questions like that, then Oracle is clearly not for you.

    14. Re:Tape? by Anonymous Coward · · Score: 0

      Don't forget another ~$40 each for the 50 tapes, bringing your total to ~$5k (40*50+3k) not just $3k.

    15. Re:Tape? by Lumpy · · Score: 1

      if you are still housing critical data gear in buildings that can burn, you are in the wrong business. Cement bunker/ box building. steel roof. nothing at all will burn and when the halon bombs go off, the only fire that is a concern is a magnesium fire. and we have a rule not allowing magnesium in the server building.

      Mitigate your fire risk. and yes it's built on a hill so no flood risk either. nor can a tornado do squat to it; this is the cool part of using an old Civil Defense shelter as a server farm. we even think it's bomb proof.

      --
      Do not look at laser with remaining good eye.
    16. Re:Tape? by mjwx · · Score: 1

      if you are still housing critical data gear in buildings that can burn, you are in the wrong business.

      Don't ask me, ask management who decided the ideal place for a server room was a wooden shack mounted to the second story of the side of their warehouse.

      I cringe every time I see this.

      --
      Calling someone a "hater" only means you can not rationally rebut their argument.
  16. Just plain old rsync... by Anonymous Coward · · Score: 0

    The Btrfs filesystem allows you to merge multiple physical disks to a single filesystem.
    (AFAIK it's not stable yet, but it just had to be mentioned!)

    1. Re:Just plain old rsync... by ducman · · Score: 1

      Don't use btrfs. My approach (I only have 8Tb) was two low-powered linux file servers. The "main" one was running btrfs over a mixed set of disks with a nightly rsync to the backup server. A power outage that was more than my UPS could handle resulted in a corrupted btrfs filesystem. After a couple days of trying to fix the btrfs filesystem, I gave up and restored from the backup server. Fortunately it was using LVM and ext4.

      Now I have the "main" fileserver running LVM and ext4, and a backup server running FreeBSD with zfs. The two are in physically different locations, and I use rsync with --only-create-batch and a USB hard drive to move changes from the main server to the backup.

      --
      "We have nothing in common, your attitude annoys me, and your political views are appalling."
    2. Re:Just plain old rsync... by mark_osmd · · Score: 1

      Your UPS wasn't set up correctly, it's supposed to signal the server to gracefully shutdown before the battery goes dry.

  17. tar --multi-volume by jegerjensen · · Score: 5, Interesting

    Evidently, our UNIX founding fathers had similar challenges...

  18. Tar already does this by cyocum · · Score: 3, Informative

    Have a look at tar and its "multi-volume" option.

    1. Re:Tar already does this by leuk_he · · Score: 5, Informative

      multi volume tar. Just mount a new usb disk whenever it is full.

      However, to have a reasonable retrieval rate (going through 24 TB of data will take some days over USB2), you had better split the dataset into multiple smaller sets. That also has the advantage that if one disk crashes (AND consumer grade USB disks will crash!) your entire dataset is not lost.

      For that reason (disk failure), do not use some linux spanning disk feature. File systems are lost when one of the disks they are written on is lost. Unless you use a feature that can handle lost disks (RAID / RAID-Z).

      And last but not least: Test your backup. I have myself seen cheap USB interfaces fail to write the data to disk without a good error message. All looks ok until you retrieve the data and some files are corrupted.
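
      One way to act on "test your backup", as a minimal sketch: hash every file in the source tree and compare it against the copy on the backup disk. The function names are hypothetical, and the sketch assumes the backup is a plain file-by-file copy with the same directory layout (not a tar archive).

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so multi-gigabyte files never sit in RAM."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(source_root: Path, backup_root: Path) -> list:
    """Return relative paths that are missing or differ in the backup."""
    problems = []
    for src in sorted(source_root.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source_root)
        dst = backup_root / rel
        if not dst.is_file():
            problems.append(f"missing: {rel}")
        elif sha256_of(src) != sha256_of(dst):
            problems.append(f"corrupt: {rel}")
    return problems
```

      Calling `verify_backup(Path("/data"), Path("/mnt/usb1"))` (both paths are placeholders) returns an empty list when everything matches. It is probably worth unmounting and remounting the USB disk before verifying, since otherwise the reads may be served from the OS cache rather than from the disk itself.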

    2. Re:Tar already does this by coldsalmon · · Score: 1

      Yes, and don't forget the danger of bad blocks in a job this size. A single bad block can take out an entire archive. I used to backup my 300GB of data into a single .tar file on external USB drives. Unfortunately, when I tried to restore after a system failure, I discovered that BOTH of my backups had bad blocks about 30GB into the .tar file. After the bad blocks, the rest of the file was unreadable. I attribute the bad blocks to the high temperatures inside fanless external HDD enclosures (this is also a danger in your scheme). Now I use rsync, so bad blocks won't kill more than one file at a time. If you must use an archiving backup program (like tar), break up the backups into small files, and make sure that they can be restored individually.
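
      If you go the plain-files route, deciding which file goes on which drive is a bin packing problem. A minimal sketch using first-fit decreasing follows; `plan_drives` is a made-up name, the sizes are assumed to be collected beforehand (for example with `os.path.getsize`), and the capacity you pass in should leave some headroom for filesystem overhead.

```python
def plan_drives(file_sizes: dict, drive_capacity: int) -> list:
    """Assign whole files to drives with first-fit decreasing bin packing.

    file_sizes maps a path to its size in bytes; the result is one
    list of paths per drive. Files are never split across drives.
    """
    drives = []      # list of path lists, one per drive
    free = []        # remaining bytes on each drive
    by_size = sorted(file_sizes.items(), key=lambda kv: kv[1], reverse=True)
    for path, size in by_size:
        if size > drive_capacity:
            raise ValueError(f"{path} does not fit on a single drive")
        for i, room in enumerate(free):
            if size <= room:
                drives[i].append(path)
                free[i] -= size
                break
        else:
            # no existing drive has room, so start a new one
            drives.append([path])
            free.append(drive_capacity - size)
    return drives
```

      Each resulting list can then be handed to rsync or a plain copy, one drive at a time. Since every drive holds ordinary files, a bad block costs you one file rather than everything after it in an archive.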

    3. Re:Tar already does this by Richy_T · · Score: 1

      Tar can seek past bad data FWIW (though if you zipped, you're in trouble).

  19. Linuxquestions thread on multi-disk backups by Anonymous Coward · · Score: 2, Informative

    Here's a Linuxquestions thread outlining multi-disk backup strategies.

    The gist of the discussion is to use DAR.

  20. No. by AdmV0rl0n · · Score: 1, Insightful

    I'm not sure if you posed the question out of being naive, or if it's just being daft. You don't want to be moving 24TB over the USB bus. End of discussion really - at least in terms of USB.

    Whoever or however you ended up looking at USB for this was wrong/wrong way.

    You have lots of choice in terms of boxes, servers, NAS boxes, locally attached storage. 24TB is in the range of midrange NAS boxes.

    Once you have this, you can start to make choices on the many backup, replication, and duplication bits of software that already exist, both free and proprietary.

    --
    We`re all equal .. Just some of us are less equal than others.
    1. Re:No. by ledow · · Score: 1, Informative

      USB 2.0 provides 480Mbps of (theoretical) bandwidth. So unless you go Gigabit all over your network (not unreasonable), you won't beat it with a NAS. Even then, it's only 1-and-a-bit times as fast as USB working flat-out (and the difference being if you have multiple USB busses, you can get multiple drives working at once). And USB 3.0 would beat it again. And 10Gb between the client and a server is an expensive network to deploy still.

      Granted, eSATA would probably be faster but there's nothing wrong with USB for such tasks if you *don't* want to provide Gigabit connections everywhere and (presumably) greater-than-gigabit backbones.

    2. Re:No. by Anonymous Coward · · Score: 0

      Re: picking the media

      Go with SAS, eSATA, or USB 3.0. (in that order.)
      Your main constraints are the speed at which the first array can read blocks and the speed at which the second array can write blocks. And, the risk of hitting a data hazard at some later point.
      You may be wise to wrap the output using a Run Length Limited (RLL) coding scheme and a FEC coding scheme.

      Re: Archiving by splitting across multiple disks.
      There is always dd and tar. ;-)

    3. Re:No. by Bill_the_Engineer · · Score: 1

      You don't need a Gigabit connection everywhere, just on your computer and the NAS directly connected to your computer.

      USB2 is not a very good option. For some reason, I've been getting poor performance from Linux with storage mounted via USB. Your best bet is eSATA. If you can't install eSATA, but have a Gigabit Ethernet connection, then go that route. USB2 is the connection of last resort when talking about backing up 24TB.

      --
      These comments are my own and do not necessarily reflect the views or opinions of my employer or colleagues...
    4. Re:No. by tippe · · Score: 1

      USB 2 is half duplex and from what I'm told has quite a bit of overhead, not to mention the fact that depending on how things are connected together (i.e one controller vs many), the available BW might actually be shared amongst multiple devices. Ethernet can in some cases suffer from some of these problems as well, but it is at least full duplex, and in this day and age, who doesn't run Gigabit Ethernet?

    5. Re:No. by asdf7890 · · Score: 2

      USB 2.0 provides 480Mbps of (theoretical) bandwidth. So unless you go Gigabit all over your network (not unreasonable), you won't beat it with a NAS. Even then, it's only 1-and-a-bit times as fast as USB working flat-out (and the difference being if you have multiple USB busses, you can get multiple drives working at once).

      The 480Mbps is nowhere near what you will see in practise, unlike network speeds which are far closer to the rated maximum. Most USB drives I've seen top out at somewhere between 25 and 30MByte/sec, and if there are no other bottlenecks it isn't unusual to see 100Mbyte/sec from a gbit switched network. My main desktop pulls things from the fileserver at around 80Mbyte/sec, which is as fast as local reads tend to be on that array. So you are right about 100mbit networks: that'll be the bottleneck not USB, but gbit networking should outdo USB2 by at least a factor of 2, possibly 3, maybe even more if you have better drives in your main storage array than I do.
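
      To put those rates against the OP's 24TB, here is a quick back-of-the-envelope sketch. It assumes decimal units (1 TB = 1,000,000 MB) and a perfectly sustained rate, which real transfers won't manage.

```python
def transfer_hours(total_tb, rate_mb_per_s):
    """Hours needed to move total_tb terabytes at a sustained rate in MB/s."""
    total_mb = total_tb * 1_000_000
    return total_mb / rate_mb_per_s / 3600


rates = [
    ("USB 2.0, ~30 MB/s", 30),
    ("Samba over gigabit, ~80 MB/s", 80),
    ("gigabit, ~100 MB/s", 100),
]
for label, rate in rates:
    print(f"{label}: {transfer_hours(24, rate):.0f} hours")
```

      That works out to roughly 222, 83 and 67 hours respectively, so the factor of 2 to 3 is the difference between about nine days and three or four.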

      Before trying to run several USB drives to max out your network bandwidth, consider that you will be taxing the source disks too. Unless they are SSDs, having 2, 3, or more concurrent bulk reads going on may not be any faster than one, as all the extra head movements will wipe out the bulk speed potential. If the OP's 24Tb is spread over numerous physical drives this need not be an issue though (with planning careful enough to ensure there aren't two bulk processes reading from the same physical devices).

      And USB 3.0 would beat it again.

      That it would. I have an SSD in a USB3 enclosure, and it can happily consume 80Mbyte/sec read over my little network. It might even be able to do better than that: I've not measured a bulk write read from the internal SSD yet.

      And 10Gb between the client and a server is an expensive network to deploy still.
      Granted, eSATA would probably be faster but there's nothing wrong with USB for such tasks if you *don't* want to provide Gigabit connections everywhere and (presumably) greater-than-gigabit backbones.

      If I wanted more speed than USB3+gbit can provide (due to the size of data being backed up on each run) I'd be plugging the backup device(s) in locally to the source (via eSATA, USB3, or such) rather than using the network (though again taking note to be careful how things are done if trying to use more than one backup device at once).

      For the size of data being described, I'd not want a set of USB drives to be my primary backup solution though.

    6. Re:No. by AdmV0rl0n · · Score: 1

      You just wrote that because you think you're clever and you like an argument. You're not clever, and it's no argument.

      Gigabit ethernet and a midrange NAS box roughly complement each other's throughput, and having a NAS that's set up, possibly offering RAID, is a far better solution than your idea of entertaining multiple drives over USB. And theoretical bandwidth is meaningless. USB2 in the real world tends to be limited to the 33MB/s ballpark. Doing something badly multiple times doesn't stop it from being bad.

      You are better on your eSATA point, but I regard that as lost given you wrote drivel about thinking USB2 is an ok way to go for this.

      It's not.

      And who the fuck gave you (3) for your posting. Idiots

      --
      We`re all equal .. Just some of us are less equal than others.
    7. Re:No. by jedidiah · · Score: 1

      > That it would. I have an SSD in a USB3 enclosure, and it can happily consume 80Mbyte/sec read over my little network. It might even be able to do better than that: I've not measured a bulk write read from the internal SSD yet.

      Unless you've got a slow SSD drive, it should have more performance potential than just 80MB/s.

      I have gotten 100MB/s transferring between two SATA spinny drives across my gigabit network.

      Now this is a lot of data we're talking about. Creating the backup and restoring the backup both will be performance sensitive. Every little bit you can do to help push things along will be very worthwhile in the end. It probably makes sense to be as close to the bare metal as you can be.

      You might even want to buy more expensive drives to get better performance. Perhaps a better RAID card too.

      --
      A Pirate and a Puritan look the same on a balance sheet.
    8. Re:No. by jones_supa · · Score: 1

      Whoever or however you ended up looking at USB for this was wrong/wrong way.

      Wrong/wrong? You could have written "wrong/{wrong way}" to make clear which words are part of the two options.

    9. Re:No. by asdf7890 · · Score: 1

      The 80Mbyte seems to be a network limit, not a bulk disc throughput limit. A quick test reading ~10Gb (in two large files, not many small ones) locally on the server gets just under 120Mbyte/sec overall. I might be able to get better from those drives, but I wasn't overly careful about stripe alignment and such (they are 4K sector 2Tb disks using Linux's software RAID). I'm not sure if those particular files are closer to the slow end of the spinning metal or closer to the fast end.

      Pulling the same files to the Windows box from a Samba share is where I mentioned 80Mbyte (in fact a quick test with the same two files shows it top out at just over 65Mbyte). The server doesn't have anything SSD but the desktop box uses them for internal storage. I've not benchmarked copying from them to the external SSD though (I have copied large things that way, but just not paid attention while it happened, whereas I have watched the transfer rates while copying from the server). The server has no USB3, so plugging the drive in directly would not get any benefit from the local connection due to USB2 throughput limits.

      I've not investigated where that drop from ~120Mbyte/s to ~65 comes from. It could be a mix of Samba not being terribly efficient (it does sit eating >65% of a CPU core while the operation proceeds), me having cheap NICs and a cheap switch, me not having jumbo frames enabled anywhere (the network is shared with devices that I think don't support larger frames), and probably some other factors I've not considered.

  21. You know... by marsu_k · · Score: 5, Funny

    Porn is a renewable resource, there's no need to store so much of it.

    1. Re:You know... by Anonymous Coward · · Score: 0

      Now if only our cars could run on porn.

    2. Re:You know... by Anonymous Coward · · Score: 1

      So renewable, you can even make your own..

    3. Re:You know... by Anonymous Coward · · Score: 0

      Porn is a renewable resource, there's no need to store so much of it.

      But he doesn't know when his Dad's high speed internet is going to be shut down due to bittorrent bandwidth abuse.

    4. Re:You know... by couchslug · · Score: 1

      "Porn is a renewable resource, there's no need to store so much of it."

      There was only ONE Beatrice Arthur.

      --
      "This post is an artistic work of fiction and falsehood. Only a fool would take anything posted here as fact."
  22. Yeah, you missed basic common Linux knowledge by Anonymous Coward · · Score: 1

    Script your own solution for your specific problems.

    That’s kinda the whole point of having a computer... as opposed to a set of appliances that happen to run on a computer you never use directly.

  23. Seriously: Build your own homebrew NAS. by Qbertino · · Score: 4, Interesting

    What you're attempting isn't easy, it's actually difficult.
    Buy a cheap and big refurbished workstation or rackmount server, install a few extra SATA controllers and maybe a new power supply, hook up 12 2TB drives, install Debian, check out LVM and you're all set.

    Messing around with 12 - 24 external HDDs and their power supplies is a big hassle and asking for trouble. Don't do it. Do seriously consider the possibility of building your own NAS. You'll be thankful in the end and it won't take much longer; it might even go faster and be cheaper if you can get the parts fast.

    My 2 cents.

    --
    We suffer more in our imagination than in reality. - Seneca
    1. Re:Seriously: Build your own homebrew NAS. by DRJlaw · · Score: 4, Interesting

      What you're attempting isn't easy, it's actually difficult.
      Buy a cheap and big refurbished workstation or rackmount server, install a few extra SATA controllers and maybe a new power supply, hook up 12 2TB drives, install Debian, check out LVM and you're all set.

      Messing around with 12 - 24 external HDDs and their power supplies is a big hassle and asking for trouble. Don't do it. Do seriously consider the possibility of building your own NAS. You'll be thankful in the end and it won't take much longer; it might even go faster and be cheaper if you can get the parts fast.

      Way to redefine the problem instead of working within the specifications.

      Perhaps:
      1. The poster ALREADY has a NAS and wants to have airgapped or even offsite/offline backup.

      2. External HDDs are fast, common, reasonably cheap, and do not have a single point of failure (e.g., the tape backup drive in many suggested alternatives)

      I'm interested in this question. I use this general setup, but on a smaller scale. I cannot put a NAS in a safety deposit box. I cannot ensure that my "backup" NAS would not be drowned in a flood, burned in a fire, fried by a lightning strike...

      Let's pretend the poster is not an idiot, and answer the actual question. If he has 24TB of data, IT'S ALREADY ON DAS/NAS. Geesh.

    2. Re:Seriously: Build your own homebrew NAS. by Anonymous Coward · · Score: 0

      OK, lets assume that it is already on NAS.

      NAS is still my suggestion. rsync local-NAS remote-NAS

      Use backuppc to do it and you get full and incremental backups.

      Need it secure? There are lawyers that will let you put a NAS in a vault in their office. Very secure.

    3. Re:Seriously: Build your own homebrew NAS. by Bill_the_Engineer · · Score: 1

      Let's pretend the poster is not an idiot, and answer the actual question. If he has 24TB of data, IT'S ALREADY ON DAS/NAS. Geesh.

      Don't assume he was the one that created his current storage solution. It could be a turnkey solution that he purchased, like one of those movie storage devices we read about on slashdot earlier this year.

      If he installed his current storage configuration himself then why did he need to ask this question on Slashdot? I don't see any particular bad answers, and no one is insulting him. I'm sure he is mature enough to filter all these responses and pick the right one for his situation.

      --
      These comments are my own and do not necessarily reflect the views or opinions of my employer or colleagues...
    4. Re:Seriously: Build your own homebrew NAS. by asdf7890 · · Score: 1

      For 24Tb of data you want more than 12x 2Tb drives. At least 18x 2Tb so you can apply RAID5 (or some compound RAID based on arrays of RAID5) or 24x2Tb for RAID1, so that you can survive at least some drive failure situations without data loss. Be careful of array rebuild times with drives that big though. ZFS is something to consider with an array this size (with smaller drives preferably as re-image/re-checksum times will not be short with this over large drives either).
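
      The drive counts above follow from simple capacity arithmetic. A rough sketch, ignoring filesystem overhead and hot spares; the `groups` parameter is my own shorthand for striping several equal RAID5/6 sets together (RAID50/60):

```python
def usable_tb(drives, drive_tb, level, groups=1):
    """Usable capacity in TB for simple RAID layouts.

    level:  "raid1" (mirrored pairs), "raid5" (1 parity drive per group)
            or "raid6" (2 parity drives per group).
    groups: number of equal-sized parity groups striped together.
    """
    if level == "raid1":
        return drives // 2 * drive_tb
    parity = {"raid5": 1, "raid6": 2}[level]
    per_group = drives // groups
    return groups * (per_group - parity) * drive_tb


print(usable_tb(12, 2, "raid5"))            # 12 drives, one RAID5 set
print(usable_tb(18, 2, "raid5", groups=3))  # three 6-drive RAID5 sets
print(usable_tb(24, 2, "raid1"))            # 12 mirrored pairs
```

      Twelve 2TB drives in a single RAID5 come to 22TB, which is short of the data set, while 18 drives in three RAID5 groups give 30TB and 24 mirrored drives give exactly 24TB.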

      You are right that anything like this would be much less hassle in the long run than many separate external drives, and their PSUs (something I'd not considered).

    5. Re:Seriously: Build your own homebrew NAS. by gl4ss · · Score: 1

      it's pretty probable that the data is already on such device, but needs to be moved around to somewhere else on something cheap.

      --
      world was created 5 seconds before this post as it is.
    6. Re:Seriously: Build your own homebrew NAS. by Joce640k · · Score: 1

      Let's pretend the poster is not an idiot, and answer the actual question.

      On slashdot...?

      "Ask slashdot" is never about reading the question or formulating a relevant answer.

      --
      No sig today...
    7. Re:Seriously: Build your own homebrew NAS. by Anonymous Coward · · Score: 0

      use a glass connection to a remote location.

    8. Re:Seriously: Build your own homebrew NAS. by Anonymous Coward · · Score: 0

      In which case the best way to back up 24TB of data on a NAS is either to run a tape system or another NAS/FreeNAS with ZFS and snapshots. External USB drives are not ever the way to go *unless* say, for example, you are a photographer/videographer and the data can be split up into specific projects/clients whereby 24 into 24x1 would make sense. If the data cannot be split then best of luck keeping track of what is where.

    9. Re:Seriously: Build your own homebrew NAS. by Anonymous Coward · · Score: 0

      There simply is no other sane option but to redefine this problem.

      Perhaps:
      1. The poster ALREADY has a NAS and wants to have airgapped or even offsite/offline backup.

      Then put your backup NAS in the garage or a garden shed, only power the NAS while needed and yank out the network cable if you have to.

      2. External HDDs are fast, common, reasonably cheap, and do not have a single point of failure (e.g., the tape backup drive in many suggested alternatives)

      External HDDs means USB, and USB 2.0 is not fast (30-35 MB/s, faced with 24 TB source storage this is very slow, GBit/s Ethernet can give you 80 MB/s to 110 MB/s). The price for the individual disks is about identical. A cheap (but not old) system that takes 8-10 4 TB HDDs should cost less than the disks themselves (you're dealing with 24 TB of source data, so don't try to be too cheap and go with smaller drives to save a few bucks; that just increases the disk count and the problems of keeping them running). Add RAID5/6 and lvm2 (zfs if you like and are already familiar with it) and you're done in an easy and maintainable way, without having to think about where your chunks of data will fit.

      Dealing with ~10 external disks guarantees just one thing, that you'll never actually do the backup.

    10. Re:Seriously: Build your own homebrew NAS. by DRJlaw · · Score: 1

      Then put your backup NAS in the garage or a garden shed, only power the NAS while needed and yank out the network cable if you have to.

      Attached garages burn and get broken into. Tenth floor condominiums don't have garden sheds. I can knock down any alternative that you propose just as easily as you can ignore the fact that while alternatives are nice, the Ask Slashdot request was for a solution to a particular problem, not a general method of backing up data.

      External HDDs means USB, and USB 2.0 is not fast (30-35 MB/s, faced with 24 TB source storage this is very slow, GBit/s Ethernet can give you 80 MB/s to 110 MB/s).

      External HDDs means USB or eSATA, and USB3.0 is as fast as any 1-3 HDDs. Hence the Seagate GoFlex 3 TB USB3 drives that I purchased for $99 each at Best Buy last Black Friday.

      RAID5/6 and lvm2 (zfs if you like and are already familiar with it) and you're done in an easy and maintainable way

      Except that any two or three drive failures, such as caused by a lightning strike, can eliminate all that data. Particularly if your NAS and your backup NAS are in the same location. My 'NAS' is already RAID6. Fat lot of good it'll do me if the house burns down.

      Dealing with ~10 external disks guarantees just one thing, that you'll never actually do the backup.

      No, it guarantees that you won't have a current backup of frequently changing data. If your NAS has a large quantity of stale but online data (because you want the availability, or it's a media collection, or it's 5 years of home movies from a camera-head), frequency may be almost a non-issue.

      External HDDs have a number of advantages that the "insane" may desire:

      1. If it's not hooked up to power, it can't be hacked and can't be fried except by EMP.

      2. If it's not bundled together in a RAID, nothing short of complete failure will cause complete data loss.

      3. External HDDs are very easy to haul between home/office and a remote site.

      4. (because other comments keep raising this issue) If you buy the same model eHDDs, you're only managing as many power supplies as you need simultaneous connections. The aforementioned Seagate drives also use essentially the same power supply as my prior FreeAgent 750GB USB2/eSATA/FW400 drives.

      Organizing your storage so that the live and changing data is well segregated from the live and stale data eliminates much of this problem. For example, a live data drive with operating system, client computer backups, user profiles, and other live data could be automatically backed up to an identical eHDD or -- crazy, I know -- an iHDD installed in a hotswap dock. A stale data section of a NAS could be manually backed up on eHDDs. The stale data backups, and any live backup that you happen to have at the time, can be dropped off in your safety deposit box -- which for most normal people will not fit a NAS.

      So... there is a sane option in answering the stated problem rather than proposing your own defective solution based on your own priorities and parameters. You simply don't know a relevant answer and choose to attack the question instead.
      Grade: C-.

    11. Re:Seriously: Build your own homebrew NAS. by rastoboy29 · · Score: 1

      You have never worked with business grade tape.  It is way more reliable than HDD.

      And it can be put in a safe deposit box.

      There is a reason it still exists--there is nothing better, yet.

    12. Re:Seriously: Build your own homebrew NAS. by Anonymous Coward · · Score: 0

      I like the idea of having a bunch of large USB drives for a backup. Before you hit that reply button, remember that the original data counts as a copy of it.

      I assume that your data is a permanent archive of fixed data like media; not data that changes like SQL databases for a business for which you really need something a little more RAIDish.

      I add the requirement that different subdirectories are backed up on different drives so that I only need to fire up one backup drive to access a particular file. I like a simple copy of directory trees, but if your data is compressible, tar or some other compressor might be better.

      - I use NTFS as a backup file system because it is easily accessible on Linux and Windows.
      - Mentally partition your data into roughly equal "USB drive sized" chunks.
      - Copy one chunk to one USB drive.
      - Make a written label for each hard drive "primary - redheads". Masking tape and Magic Marker works.
      - If the original cannot be counted as a backup (e.g. will be deleted), duplicate the critical backups (I duplicate the whole set).
      - Verify the backups. If both backups were copied from the original data source, it is of course good enough to simply compare the two backups. If the second backup was simply copied from the first backup, you need to verify against the original data. My data is a simple copy (not even tar'ed) of directory trees. I wrote some very simple scripts to do md5sums of both backups (or original and backup), cat the massaged md5sum lists together, sort, and uniq the whole thing to find unduplicated files or checksum differences. I found that my USB ID 152d:2338 JMicron JM20337 (older version) USB -> IDE/SATA adapter was giving one-bit errors every few hundred gigs without logging any error!!! (Found by binary diffing the files that had different md5sums.) I had fought this as a memory error for a year or two!!! Look it up; the hardware solution is to remove R15. I verified that this fix works. You lose hot swapping. Yes, I know how to drop_caches.
      - You can usually have a production line going where you are making both backups here, and verifying the backups in another place.
      - Store one backup set off site
      - Grow your additional backups (plus possibly small incremental backups) onto another USB drive till you can't stand for that USB drive to fail, at which time dupe it and store the dupe offsite.
      - A few years down the road when you have more data and hard drives are bigger and have new connections (e.g. USB3 / eSATA with 8TB drives), buy ONE set of those and repeat with one old set of drives and one new set of drives. You have even updated your hardware technology as part of the process.
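
      The "mentally partition your data" step in the list above can also be done by a script. This is a sketch of one simple approach, first-fit decreasing bin packing, which is my choice and not necessarily how anyone in this thread does it. Sizes and capacity are in whatever unit you like, as long as it is the same for both.

```python
def assign_to_drives(dir_sizes, drive_capacity):
    """Place each directory on the first drive that still has room for it.

    dir_sizes: mapping of directory name -> size.
    Returns a list of drives, each a list of directory names.
    """
    drives, free = [], []
    # Largest directories first; ties broken by name for a stable result.
    for name, size in sorted(dir_sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        if size > drive_capacity:
            raise ValueError(f"{name} ({size}) does not fit on one drive")
        for i, room in enumerate(free):
            if size <= room:
                drives[i].append(name)
                free[i] -= size
                break
        else:
            # No existing drive has room: start a new one.
            drives.append([name])
            free.append(drive_capacity - size)
    return drives
```

      Because whole directories stay together, you still only need to fire up one drive to reach a given file. Leave some headroom below the nominal drive size for filesystem overhead.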

  24. Bash.... by djsmiley · · Score: 4, Informative

    First bash script to grab the size of the "current" storage;

    compress the files up until that size;

    Move compressed file onto storage;

    request new storage, start again.

    ----------

    Or, if you've got all the storage already connected: for i in $(seq 0 "$x"); do cp "$archive$i" "/mount/$i/"; done :D
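
    The first recipe (fill the current drive, then ask for the next one) can be sketched as a planning step. This is an illustration only: the function name is made up, it skips compression entirely, and it takes free space as plain numbers. In a real script you could get that figure from `shutil.disk_usage(mount_point).free`.

```python
def plan_batches(files, capacities):
    """Assign files, in order, to drives that are attached one after another.

    files:      iterable of (name, size) in the order they should be written.
    capacities: iterable of free space per drive, in attachment order.
    Returns a list of batches, one list of file names per drive used.
    """
    drives = iter(capacities)
    plan, room = [], 0
    for name, size in files:
        if not plan or size > room:
            # Current drive is full (or none attached yet): "request new storage".
            try:
                room = next(drives)
            except StopIteration:
                raise RuntimeError("ran out of drives") from None
            if size > room:
                raise ValueError(f"{name} does not fit on an empty drive")
            plan.append([])
        plan[-1].append(name)
        room -= size
    return plan
```

    Each batch is what gets copied (or archived) before the "insert next drive" prompt.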

    --
    - http://www.milkme.co.uk
    1. Re:Bash.... by Anonymous Coward · · Score: 0

      The OP is basically backing up their media library of mp3 and mkv files. They don't compress very well seeing as they're already compressed.

  25. we generate a lot of data (3 GB/min)... by acidfast7 · · Score: 1

    ... by employing a detector with a size of 2463 x 2527 pixels (6M) at 12 Hz (12 times / sec). When run continuously for a set of data (roughly 900 degrees) ...

    we collect 900 frames in roughly 2 minutes including hardware limitations for starting/stopping.

    In proper format for processing, this works out to about 6MB/image and roughly 3GB/min for 2 minutes.

    With an experienced crew of 3-4 people ... one handling the samples, one handling the liquid nitrogen, one running the software and one taking notes (overall monitoring also) ... we can run through 600 samples in a 24-hour shift ...

    Which roughly works out to about 600 x 6GB = 3.6 TB on a "working" day.
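
    For anyone checking the numbers, here is the same arithmetic as a small sketch. It uses only the figures quoted above (900 frames, about 6MB each, 2 minutes per sample, 600 samples per shift) and assumes decimal units.

```python
def shift_volume(frames=900, mb_per_frame=6, minutes=2, samples=600):
    """Return (GB per sample, GB per minute, TB per shift) in decimal units."""
    gb_per_sample = frames * mb_per_frame / 1000
    gb_per_minute = gb_per_sample / minutes
    tb_per_shift = samples * gb_per_sample / 1000
    return gb_per_sample, gb_per_minute, tb_per_shift


per_sample, per_minute, per_shift = shift_volume()
print(f"{per_sample:.1f} GB/sample, {per_minute:.1f} GB/min, {per_shift:.2f} TB/shift")
```

    Unrounded, that is about 5.4GB per sample and a little over 3.2TB per shift; the 3.6TB figure comes from rounding each sample up to 6GB.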

    To answer your question ... we never make physical copies of stuff ... the data stays online in multiple places on multiple continents ... and when something is published the data becomes publicly available in a central database

    Why do you need a physical copy anyway?

    1. Re:we generate a lot of data (3 GB/min)... by ledow · · Score: 1

      I'm not the OP but:

      Because downloading 3.6Tb to restore from a backup for just one day is pretty ridiculous for someone on a home broadband?

      Backup to external servers is ridiculous for anyone without university-sized access to the net. Hell, the school I work for try to back up 10Gb to a remote server each night and it often fails because it took too long (and we're only allowed to do that because we're a school - the limits for even business use on the same connection are about 100Gb a month).

      Absent a stupidly fast connection for a home, you have to have a physical copy that you can put somewhere else.

      The fact that you *don't* see that, tells me that you probably have far too much hardware and connectivity available to you.

    2. Re:we generate a lot of data (3 GB/min)... by acidfast7 · · Score: 1

      Two quick things:

      1. Why do a complete restore of the 3.6TB? Just take the files that you want to use again/have been lost.

      2. Why work at home? It's home, not work.

    3. Re:we generate a lot of data (3 GB/min)... by Bill_the_Engineer · · Score: 1

      I agree. I have similar data generation and storage requirements. Our science team is spread over two continents and we too use multiple storage locations for our online backup. Having said that, I just had our IT guy install a brand new high capacity tape drive so that we can start archiving the data into long term storage. So the two backup schemes aren't mutually exclusive.

      --
      These comments are my own and do not necessarily reflect the views or opinions of my employer or colleagues...
    4. Re:we generate a lot of data (3 GB/min)... by History's+Coming+To · · Score: 1

      1: Yes, syncing makes more sense than full backups, but you still have to do the initial 100% "sync". If you lose your whole HD then before you can restore a single file you need to reinstall the OS and apps, plus any custom scripts and other customisations you might have made. A full disc image is a life saver if your main desktop goes to crap.

      2: In my case, because my job doesn't have a physical location. If I didn't work from home I'd be working from the beach, the car or the top of a mountain. It's happened, but home is much more comfortable, has all of my books etc, and I can claim back rent, electricity, phone bills etc against tax. I could hire some office space, but I'd have the extra expense and a 1.5hr commute each day, and I couldn't claim back some of my household bills.

      --
      Please consider this account deleted, I just can't be bothered with the spam anymore.
    5. Re:we generate a lot of data (3 GB/min)... by acidfast7 · · Score: 1

      1. not an issue

      2. sorry man, that kinda sucks.

    6. Re:we generate a lot of data (3 GB/min)... by Anonymous Coward · · Score: 0

      Where do you go to university?
      In Canada many universities are part of a gigabit network between universities.
      I regularly transfer hundreds of GB a month back and forth.

      Also even at the residential level here we have 250/30Mbps with no cap for only 249.95.

      On the 30/30 package I use over 2TB of bandwidth a month.

    7. Re:we generate a lot of data (3 GB/min)... by Richy_T · · Score: 1

      2. Huh?

    8. Re:we generate a lot of data (3 GB/min)... by Anonymous Coward · · Score: 0

      Are you a structural biologist/chemist using x-rays?

    9. Re:we generate a lot of data (3 GB/min)... by flargleblarg · · Score: 1

      > Because downloading 3.6Tb to restore from a backup for just one day is pretty ridiculous for someone on a home broadband?

      3.6 terabits isn't that much.

      > Hell, the school I work for try to back up 10Gb to a remote server each night and it often fails because it took too long

      100 gigabits isn't that much.

  26. USB is not for backup by aglider · · Score: 1

    USB is for a second working copy.
    Backups should also ensure durability of the copy, while USB HDD have a shorter lifespan than a normal HDD which in turn has shorter lifespan than tapes, the usual medium for durable backups.

    --
    Sent as ripples into the electromagnetic field. No single photon has been harmed in the process.
    1. Re:USB is not for backup by Anonymous Coward · · Score: 1

      USB HDD have a shorter lifespan than a normal HDD which in turn has shorter lifespan than tapes

      What planet are you from?

    2. Re:USB is not for backup by phillymjs · · Score: 1

      I think he might mean that a HDD sitting in a server and running 24/7 will likely last longer than a HDD that's in an external enclosure and gets physically moved around and powered on/off frequently.

    3. Re:USB is not for backup by History's+Coming+To · · Score: 1

      Are you confusing Flash storage (eg a USB stick) with a normal platter-based HDD which uses USB cables for transfer? USB != flash, HDD != USB, Ethernet, Firewire etc. OP is talking about HDDs connecting via USB as far as I can tell.

      --
      Please consider this account deleted, I just can't be bothered with the spam anymore.
  27. Use DAR or KDAR by pegasustonans · · Score: 2, Informative

    If you don't want to invest in new hardware, you could use DAR or KDAR (KDE front-end for DAR).

    With KDAR, what you want is the slicing settings.

    There's an option to pause between slices, which gives you time to mount a new disk.
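
    For the command-line route, the equivalent dar invocation can be assembled like this. It is a sketch based on my reading of the dar man page (`-c` create, `-R` root of the tree to save, `-s` slice size, `-p` pause between slices); the paths and the 1900G slice size are placeholders, so check the options against your installed version before relying on them.

```python
def dar_create_command(archive_base, source_root, slice_size="1900G"):
    """Build a dar command that writes fixed-size slices and pauses between them."""
    return [
        "dar",
        "-c", archive_base,  # create; slices are named <base>.1.dar, <base>.2.dar, ...
        "-R", source_root,   # root of the directory tree to back up
        "-s", slice_size,    # size of each slice, chosen to fit one drive
        "-p",                # pause before each new slice so a disk can be swapped
    ]


cmd = dar_create_command("/mnt/usb/backup", "/data")
print(" ".join(cmd))
```

    To actually run it you would pass the list to `subprocess.run(cmd, check=True)` from a terminal, since dar waits for a keypress at each pause.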

    --
    And all our yesterdays have lighted fools The way to dusty death. --Will
  28. eSATA instead of USB or Firewire by Anonymous Coward · · Score: 0

    My experience is that eSATA II (3G) is about 4X faster than USB2. The benchmarks I have seen show that it is still faster than USB3. Today you can probably get eSATA III (6G).

  29. Use purpose designed backup media. by gallondr00nk · · Score: 1

    Backup tapes were designed precisely for the problem you have. LTO-5 tapes are about 1.5TB, if I remember right. Stored correctly they shouldn't give any problems when you come to retrieve whatever is backed up. Most archiving efforts use backup tape, and they can't all be wrong :)

    1. Re:Use purpose designed backup media. by Antique+Geekmeister · · Score: 1

      Actually handling all those tapes and recovering data from them is very expensive in manpower and time, and can be very awkward for recovering data. Those tapes, and tape drives, are also _expensive_. They're useful for sites that require secure off-site storage, or encrypted off-site storage, but for most environments today they are pointless. Easily detachable physical storage has become very inexpensive, far more economical, and is far less vulnerable to the vulnerabilities of mishandling SCSI connections. I've seen far, far too many SCSI setups for tape drives and external media fail due to misconfiguration, miscabling, and the very poor driver integration of SCSI controllers in far too many operating systems. USB has proven startlingly simple, resilient, and _cheap_ to manage.

      I use the external drive approach very frequently for data center migration and virtualization OS image migration, though usually I only back up the configuration files from the virtualized hosts, not the complete images. It's very effective. 24 TB is bigger than I've personally done this way, but it's certainly feasible if it's not treated as a single lump. If the data can be factored reasonably before transferring, don't simply duplicate it every time. Split it up into reasonably sized chunks and _mirror_ it onto the USB drives, so the first backup is lengthy but following backups are far more efficient.

      Assuming that the backup system is Linux based, the "rsync" tool can be written into a script to see which media is attached, to mount those media, and to mirror the contents of a set of directories to those media. It's also reasonable to use a USB hub to allow mounting multiple USB devices simultaneously so it can be done all at one time, rather than having to swap media.
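
      A minimal sketch of that kind of script, in Python rather than shell. The directory-to-drive mapping and the mount points are made up for illustration, and mounting the media is left out: the sketch only checks which destinations are already mounted and builds one rsync command for each.

```python
import os
import subprocess

# Hypothetical layout: which source directory is mirrored to which drive.
BACKUP_MAP = {
    "/data/projects": "/mnt/bak1",
    "/data/archive": "/mnt/bak2",
}


def plan_mirror_jobs(backup_map, is_mounted=os.path.ismount, delete=False):
    """Return one rsync argument list per destination that is currently mounted."""
    jobs = []
    for source, mount_point in sorted(backup_map.items()):
        if not is_mounted(mount_point):
            continue  # that drive is not attached right now, so skip it
        cmd = ["rsync", "-a"]
        if delete:
            # True mirror: files removed at the source are removed on the backup too.
            cmd.append("--delete")
        # A trailing slash on the source copies its contents, not the directory itself.
        cmd += [source.rstrip("/") + "/", mount_point.rstrip("/") + "/"]
        jobs.append(cmd)
    return jobs


def run_jobs(jobs):
    """Run the planned rsync commands one after another, stopping on the first error."""
    for cmd in jobs:
        subprocess.run(cmd, check=True)
```

      Calling run_jobs(plan_mirror_jobs(BACKUP_MAP)) would then mirror whatever happens to be plugged in. The delete option is off by default because a plain mirror also propagates accidental deletions.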

    2. Re:Use purpose designed backup media. by tomtomtom · · Score: 1

      Actually, for a data set this large it will probably work out only very slightly more expensive - and the benefit to be gained is worth it IMHO (in speed if nothing else - USB disks are *slow* and eat a lot of CPU). I live in the UK so I'll work in GBP. I think US prices are likely to be cheaper but the relative sizes will be similar.

      I'd figure around ~£1100 for drive and SAS interface plus £500-700 for 24TB worth of media. Throw in an extra 2TB drive to spool to before you write to tape as well for say £150 (if you are buying SAS) and you get to around £81/TB or less (which works out roughly the same as current external hard drive costs). If your data is precious though you'll want double the amount of media so you can store offsite (or at least have a spare backup). Then the lower marginal cost of tape vs disk will become apparent.
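
      The per-TB figure can be checked with a few lines of Python. This is only the arithmetic from the estimates above (drive plus interface, media, spool disk, 24TB of capacity); the function name is made up.

```python
def cost_per_tb(drive_and_interface, media, spool_disk, capacity_tb):
    """Total outlay divided by the capacity being backed up, in GBP per TB."""
    return (drive_and_interface + media + spool_disk) / capacity_tb


# Low and high ends of the media estimate (500 to 700 GBP for 24TB of tape).
low = cost_per_tb(1100, 500, 150, 24)
high = cost_per_tb(1100, 700, 150, 24)
print(f"{low:.2f} to {high:.2f} GBP per TB")
```

      With the top end of the media estimate this comes out at £81.25/TB, and at roughly £73/TB with the bottom end.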

      Yes, tape can be harder to configure correctly and swapping tapes over etc will be a pain for a set this large. But that's equally true for disk; and we all know that it's not a backup until you've checked that you can restore from it. User error in configuration of the backup scripts is way more likely to cause an issue than any kind of hardware error and for that reason alone, you are stupid in my opinion if you don't test your backups. If you test them, then you will spot any SCSI misconfiguration etc immediately.

      I agree that for moving data around, disk (or network) is much much easier. But that wasn't what the submitter asked about.

    3. Re:Use purpose designed backup media. by Antique+Geekmeister · · Score: 1

      The slow disk is why you use rsync or other such efficient mirroring technologies. The tapes have a limited lifespan, they require significant maintenance, and have been prone to far too many mechanical failures and expensive downtime in my experience. The disks can actually be simultaneously connected for casual "read" access with a reasonable USB hub and possibly an additional USB card.

      You've also left out the cost of recovery time for users. Swapping tapes to get recovery of arbitrary files is rather awkward, and requires physical access to the server. Either you invest in a tape library that can hold that 24 TB of tape online and provide user access to the tape library (which means two drives in the library, one for backup activity and one for read activity), or you invest in onsite manpower to swap tapes. If you've staff already in the computer room to swap tapes for you, that's workable, but it's very expensive in a small environment, and it interferes with backup operations if you've only the one tape drive.

      Swapping USB ports, or even leaving the multiple backup drives constantly on, is much quicker to access the same data, and leaves the full backed up systems or designated pieces of it directly accessible for network exporting, such as with a read-only NFS mount or read-only CIFS mount, for clients to easily recover their own files rather than requiring backup system access. You do have to be cautious with export configurations for performance and security reasons, and to leave the drives easy to swap out, but it's workable.

    4. Re:Use purpose designed backup media. by Bill_the_Engineer · · Score: 1

      No matter the backup solution being proposed there are hardware costs involved. When the backup size reaches 24TB, dedicated backup devices become quite competitive. Especially when you consider that some of these tape drives are designed for autonomous operations with a tape jukebox. Sure the hard drive may be slightly less expensive, but they consume much more electricity and require more cooling than a tape drive. Also if you do purchase one of those jukebox tape drives, you won't have to waste time changing media every time you want to make the backup.

      The easier and less time consuming the backup procedure is, the more likely regular backups will be made.

      --
      These comments are my own and do not necessarily reflect the views or opinions of my employer or colleagues...
    5. Re:Use purpose designed backup media. by tomtomtom · · Score: 2

      Whether tape or disk is appropriate really depends what you are intending to use the backup for and how important your data is. You might even choose to use a mixture of the two.

      If it's your only backup, I would suggest that it's not wise to leave it permanently online in the way you suggest; that leaves you open to any number of potential issues which your backup is supposed to protect you from (OS bug, misconfiguration, lightning strike, power failure, overheating, ...). Tape libraries have the same issue although at least there you are exposed to a different set of software bugs and the other tapes in the library might be OK if they are not physically in use when the worst happens.

      For the inadvertent file deletion, you can cover this with better tools using true online storage - effectively some form of regular snapshotting (ZFS snapshots, rdiff-backup, Windows VSS, etc) to keep a (shortish) recent history. This should cover a good proportion of restore requests depending on how much history you can keep. For the rest, you're right that if you need to restore files very regularly then you might need a second drive and/or robot. Whether you need to do that or not will just depend on your use case.

      Even if you do go with disk, make sure you use something which can properly keep multiple versions of files - just rsync'ing a big directory onto another disk is a recipe for disaster. My personal favourites are rdiff-backup and DAR (which can handle multiple volumes as others have pointed out) but there are others out there too, eg bacula.

  30. use zip by Anonymous Coward · · Score: 0

    " I am aware of many backup tools which split the backup onto multiple DVDs with the infamous 'insert disc N and press continue', but I haven't come across one that can do it with external hard drives (insert next USB device...)" - the split archive functions in the Linux zip program might be able to do this. But, I've never used this feature in Linux but remember using it on good old pkzip on dos when trying to span files across multiple floppy disks.

  31. PAR by fa2k · · Score: 3, Informative

    I have just seen "PAR" a couple of times here on slashdot, haven't used it, but it seems great for this: http://en.wikipedia.org/wiki/Parchive . You need enough redundancy to allow one USB drive to fail. And I would rather get a SATA bay and use "internal" drives than having to deal with external USB drives. Get "green" drives, they are slow but cheap.

    1. Re:PAR by Anonymous Coward · · Score: 0

      Do not get green drives - I have now had multiple failures of the "green" circuitry on such drives. They are a liability. Get normal or, if you can afford it, enterprise drives. Power saving is just not an issue if you are using for backup rather than online.

    2. Re:PAR by Anonymous Coward · · Score: 0

      Do not get green drives - I have now had multiple failures of the "green" circuitry on such drives. They are a liability.

      You're an idiot. There is no such thing as special "green" circuitry on "green" drives. It's often literally the same stuff as non-green drives.

      "Green" is just marketdroidspeak for "the platter spins at lower RPMs and the seek voice coil motor (VCM) is not as powerful". These two design choices reduce power (and performance!), but the electronics to control the drive, read and write data, etc., remain more or less the same. They probably do a unique firmware image too, one which is more aggressive about saving power, but I can just about guarantee the silicon is the same.

  32. 24TB? by codman1 · · Score: 1

    Some sort of NAS or tape would be your best option without knowing more. How often do you need to do the "backup"? Is it really a "backup" or data replication, eg. are you needing to restore the data after a serious failure? Have a look at this; it seems to have some good advice and I think it could be a solution to your issue, as I see the big problem being the amount of time and the restorability of the data after a failure. http://www.smallnetbuilder.com/nas/nas-howto/31485-build-your-own-fibre-channel-san-for-less-than-1000-part-1

  33. Work-related? Get a REAL backup solution by Anonymous Coward · · Score: 0

    If this is work-related, and the 24 TB of data is critical to your company, DON'T FUCK AROUND WITH TOYS.

    Get a real backup solution - before they get a real sysadmin.

  34. NAS by Wolfling1 · · Score: 2

    A 24TB NAS is not very hard to assemble. Relatively cheap, and basically transfers data at Gb speed - assuming that you populate it with fast disks. Set one up with RAID and you're away. Personally, I would do it with a low end server and a big-ass RAID array. That way, you can really control its behaviour via the OS. Linux is ferpect for this kind of thing.

  35. Madness by Anonymous Coward · · Score: 0

    What I want to know is this:

    Who would have managed to get 24TB of data, without already having a backup solution in place?

    24TB is a lot of data. It isn't something you get overnight. It should have been apparent a *long* time ago that some kind of backup was going to be needed.

    If this is business data, then someone has been negligent.

  36. eSATA by tmshort · · Score: 1

    Your best bet for speed is likely to be eSATA.

    Have you looked into something like this:
    http://eshop.macsales.com/shop/NewerTech/Voyager/Hard_Drive_Dock

    The cost becomes noise when you consider how many drives you will end up needing, and per TB, will be cheaper than USB solutions.

    I don't know how your data is organized, but if possible, you may want to back it up by project/directory/etc.

    There are also online backup systems that can do what you want, but it'll take an extremely long time...

  37. I know by Anonymous Coward · · Score: 2, Funny

    The iCloud! ;-)

  38. Going about it all wrong by Charliemopps · · Score: 1

    Get an old computer... anything will work really. You have to know someone that has one lying in their basement. Plug your drives into that. Share the drives on your network. Use any general backup software and sequentially back up what you need to back up over the network. Now it will do it overnight and you really don't care how long it takes. It can even do it every night. If you want it safe from fire and such.... build a box out of 2x4s and drywall scraps from Home Depot. Make it 5 sheets thick and it'll withstand any housefire you could possibly have. If you really want to go hardcore you can pour a box out of concrete, but that'll be hard to move.

    1. Re:Going about it all wrong by DarkOx · · Score: 1

      build a box out of 2x4s and drywall scraps from Home Depot. Make it 5 sheets thick and it'll withstand any housefire you could possibly have

      I find that statement suspect. I am not saying you are wrong but extraordinary claims require extraordinary evidence.

      I have seen some pretty nasty house fires, the kind where the fire department sprays water on the neighbors' houses to keep them from catching rather than try to do anything about the one that is actually burning. With all the modern synthetic materials in furniture, carpeting, and other flooring a house fire can hit 600 degrees and stay that way for hours.

      If five sheets of drywall (.5" or .75"?) were sufficient to insulate a hard disk such that it will be usable later, in what amounts to a 600 degree oven for an extended period, I suspect all our outside walls would be five layers thick with the stuff. Cheap as drywall is, the energy costs in terms of heating and cooling would be recouped in a few years if it were that good.

      --
      Repeal the 17th Amendment TODAY! Also Please Read http://www.gnu.org/philosophy/right-to-read.html
    2. Re:Going about it all wrong by Anonymous Coward · · Score: 0

      It's only an extraordinary claim to someone who posts all day on Slashdot and doesn't otherwise know jack.

      Here's another extraordinary claim.

      Even morons know enough not to put DRYwall on the rainy side of the house.
      Your fire department sucks.

  39. Read it from Torvald's lips by zapyon · · Score: 4, Funny

    "Only wimps use tape[*] backup: real men just upload their important stuff on ftp, and let the rest of the world mirror it ;)"
    Linus Torvalds (1996) http://en.wikiquote.org/wiki/Linus_Torvalds

    (Isn't that prescience of "The Cloud"?)

    ––––––––––
    * replace this with your favorite backup media of today ;-)

    --
    I like my spaghetti with source.
    1. Re:Read it from Torvald's lips by rvw · · Score: 2

      "Only wimps use tape[*] backup: real men just upload their important stuff on ftp, and let the rest of the world mirror it ;)"

      Linus Torvalds (1996) http://en.wikiquote.org/wiki/Linus_Torvalds

      (Isn't that prescience of "The Cloud"?)

      ––––––––––

      * replace this with your favorite backup media of today ;-)

      "Only wimps use ftp[*] backup: real men just upload their important stuff to the iCloud, and let the rest of the world mirror it ;)"

      An Amazon support employee (2012)

    2. Re:Read it from Torvald's lips by wvmarle · · Score: 1

      Linus should have patented his idea! He could have become a rich man.

  40. ZFS by Anonymous Coward · · Score: 0

    Connect 8 x 3 TB USB drives (more if you want RAIDZ), add all the disks to a pool and copy your data. If you need more space later, just add more disks to the pool. This will obviously be slow, but if what you need is a navigable copy of a lot of data, once you've made the copy, it won't matter.

    This is what ZFS was designed for. I use Solaris & OpenIndiana, but there's a MacOS port, MacZFS.

    1. Re:ZFS by jampola · · Score: 1

      Problem with ZFS is that he'll need a shit ton of RAM. Once ZFS eats into swap, everything goes slowmo!

    2. Re:ZFS by repetty · · Score: 1

      Problem with ZFS is that he'll need a shit ton of RAM. Once ZFS eats into swap, everything goes slowmo!

      RAM is free, or nearly so.

    3. Re:ZFS by sh3rp4 · · Score: 1

      I don't think 12 gig is a shit ton. if you are not doing dedupe ram usage is reasonable. and there is no reason to do dedupe on this.

    4. Re:ZFS by ixidor · · Score: 1

      serious question: i am getting to the point where i will have 24 x 2tb drives connected. i have heard a rule of thumb you need 1 gb of ram per 1 tb of drive space, will i really need ~48 gb of ram? am considering a server grade board that can support 32gb ...

  41. Considerations by Anonymous Coward · · Score: 0

    1 You need to maximise your computer's RAM
    2 Firewire is definitely preferable to USB, it's much faster, and you can get discs which offer both.
    If you are actually going to be copying from another USB device and not an internal hard disc, that's even more necessary
    3 Use high capacity, high speed discs, minimum a terabyte
    4 I use the OS facilities from my internal drives to the BU drives and just copy the drive. If there is space left for another drive I'll put more on. I never split a drive across one or more BU drives. Thus it's easy to keep track of what's where. Wastes space, but GB are cheap.
    5 If you are going to overwrite data on a BU drive then format it first, it will then work faster and there is less likelihood of errors.
    6 If the data is really important to you, then you have an additional bind. For that sort of data, you need to also have a copy in another location in case where the computer is gets wrecked for some reason – that takes out both the original and the backups. If it's at work then your home should suffice to store the BU.
    JS

  42. Kim Dotcom by Albinoman · · Score: 1

    It's a little late to be asking that now.

  43. ZFS + Tape Backup by Anonymous Coward · · Score: 0

    Online RaidZ ZFS with dual parity + ongoing offsite tape backups is the only way I would conduct this backup.

  44. This is the key question by Anonymous Coward · · Score: 0

    Let's say both the primary file and a 1TB backup disk fails. Is the damage felt by OP equal to 1/24 of his happiness or less? Then multiple drives is very justified. Is there a chance this 1TB drive contains a database that makes half the data completely useless when it crashes? Then multiple backups/redundancy is required. This is a vital piece of information to make a recommendation.

  45. Count Bacula by freaker_TuC · · Score: 2

    Count Bacula as your friend ;) -> http://www.bacula.org/

    --
    --- I am known for the ones who want to find me on the net. Is that a privacy risk or a privilege? One might wonder..
  46. What do you have now? by DarwinSurvivor · · Score: 1

    Sometimes the easiest way to duplicate (back up) data is to simply duplicate the hardware it's already on. If it's on a 16-disk (x 2TB) NAS system, build another one. If it's on tape, buy more tapes, if it's on random HDD's scattered all over the place, then you have bigger problems to deal with first (like building a NAS box)!

  47. Backup advice by Anonymous Coward · · Score: 2, Insightful

    I do things like this all the time with a data set about half of that, ~ 12TB. You didn't say anything about what the data is, but from the request and the fact you mentioned USB I would gather this is your typical warez hoarding: mp3/flac, mkv, apps, and also a personal picture and video collection of fam.

    Here is a checklist I would execute, similar to mine. I find the most reliable way to keep your data over the years is by following a checklist or procedure and choosing when to move to the next storage platform.

    Step 0: Get USB out of your head. Pop open the drive and attach it to the native bus, PATA or SATA. If SATA, you may want to invest in eSATA cases. It's not solely the speed. I have done stupid things like this, in which the data backup takes over 2 days, and on the 2nd day some unrelated event affecting my USB bus causes all kinds of problems with the transfer. Over time doing cheesy things like this affects other things, like doing stupid shit in real life, usually with duct tape or Gorilla Glue, then you have your wife on you. Right now your wife may not catch on to this, but it will escalate. Just do shit the right way.

    Step 1: Organize. Actually understand what you are backing up. I never got into these tools like Google Desktop that allow a user to accept the fact that he/she has no idea where their files are. Understand and make an effort to organize your files before you back them up and know the capacity of each 'genre' of crap you are backing up. Run a tool like 'jdiskreport' to find this information out after you organize. Create a mapping on paper of where shit is going, Zork style. If you have really important shit like family pictures, taking up say 200GB, and your mkv collection is 12TB, you may want to make 2x copies of your family shit. Anything you download off the internet is easily replaceable despite how obscure your tastes may be and will turn up again. I would question even backing it up but that is another conversation.

    Step 2. Label your drives accordingly to your documentation.

    Step 3. Format the drives in the most likely native format you will use and are familiar with. If you are a noob Linux guy who runs Windows 7 all the time, don't be an idiot and experiment with your backup on ext3. It is not that ext3 is a bad filesystem, but you may not be the most skilled in restoring your data in various scenarios. For example I'm a Linux and Solaris geek but am just getting into Macs --- I'm not comfortable enough with Mac failures to store my crap on a Mac fs. Whatever your skillset is, don't use the most optimal file system on paper, use what you know, even if it is NTFS (which IMO is very reliable).

    Step 4. Copy your shit over using your knowledge of your data organization and native OS commands or tools.

    Step 5. Run a checksum on your important stuff and store the hashes to verify everything is fine over time. Odd situations occur when backing up data. I have run into cases where I didn't realize the files I was about to back up were bad/corrupt until I saw the good copy on a backup drive I was about to incrementally overwrite.

    Step 6. Store the shit somewhere else if you can reasonably do this and feel confident in the security of your data. If you have to start encrypting your crap, you add some more complexity that can affect the reliability of your restoration, but again if you proceduralize and keep up on it you will be fine.
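
    For Step 1, a rough Python stand-in for a disk usage report, assuming each 'genre' lives in its own top-level directory. The function name is invented for this sketch, and it only totals file sizes; it does not account for filesystem overhead on the destination.

```python
import os


def directory_sizes(root):
    """Map each immediate subdirectory of root to the total size of its files, in bytes."""
    sizes = {}
    for entry in os.scandir(root):
        if not entry.is_dir(follow_symlinks=False):
            continue  # only report on real top-level directories
        total = 0
        for dirpath, _dirnames, filenames in os.walk(entry.path):
            for name in filenames:
                path = os.path.join(dirpath, name)
                if not os.path.islink(path):  # do not count symlink targets
                    total += os.path.getsize(path)
        sizes[entry.name] = total
    return sizes
```

    The resulting numbers are what go onto the paper mapping of which directory lands on which labelled drive.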

    Backup design and integrity is hard work and serious business when dealing with large volumes. It reminds me of the Seinfeld episode where he goes to the car rental place and they don't have his car and he goes into his "Anyone can take the ticket" diatribe. Anyone can back up their data. But can you get it back? I am not an expert in this area and don't pretend to be, I am just a seasoned IT administrator who has performed a lot of backups in my day and has managed to keep most of my data safe over the years.

    1. Re:Backup advice by Anonymous Coward · · Score: 0

      Potty mouth detected. :p

  48. Use checksums (MD5 etc.) against bit errors by Anonymous Coward · · Score: 0

    When moving really large amounts of data it is not unlikely to see an incidental bit error, especially when new hardware is involved. Data on disk is generally safe because of ECC. But pumping that much data through RAM, associated controllers and all the non-ECC protected buses on a mainboard will increase the chance of experiencing bitrot because of tolerance or thermal issues. At some point it is just a matter of statistics.
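
    One way to catch that kind of error is to hash every file on both sides after the copy and compare. A minimal sketch in Python, using SHA-256 rather than MD5; the function names and the assumption that the backup mirrors the source tree path for path are mine.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so that a 20GB file never has to fit in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(source_root, backup_root):
    """Return relative paths that are missing from the backup or differ from the source."""
    problems = []
    for dirpath, _dirnames, filenames in os.walk(source_root):
        for name in filenames:
            src = os.path.join(dirpath, name)
            rel = os.path.relpath(src, source_root)
            dst = os.path.join(backup_root, rel)
            if not os.path.isfile(dst) or sha256_of(src) != sha256_of(dst):
                problems.append(rel)
    return sorted(problems)
```

    An empty list means every source file has an identical copy. Checking right after the copy may read some data back from the OS cache rather than the disk, so remounting the backup drive first makes the check more meaningful.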

  49. Why USB? by Anonymous Coward · · Score: 0

    I really have to ask why USB? You're looking at a top speed of 40MB/s on USB 2.0, more commonly you get 20 to 30MB/s.

    Either a cable designed to hot swap drives, or a drive bay would work a lot better if a NAS is out of the question. The cable solution involves a SATA cable which has both the data and power lines bundled together on the HDD side. Reduces the risk of damaging the HDD when pulling the plug or plugging another in. A drive bay would cost a bit more, but is significantly less risky and much easier to use.

    Even putting the destination drives in another machine and connecting the machines with a crossover cable will be much faster than USB 2.0 speeds. I really wouldn't suggest going through a router unless you take care to keep the router cool (a desk fan should be sufficient).

    That leads me to my last point. When transferring that much data, the machine(s) are going to get much hotter than they do even in intense computational work. You're going to want to pop the side of the case off and set up a good fan to pull heat away from it. Easiest way to kill a HDD is to let it run hot for long periods of time.

  50. Keep it simple by jampola · · Score: 2

    # rsync -avz /this /that. Split your directories corresponding to the sizes of your drives. If on Linux, run smartctl -H /dev/sdX to check your disk health and if possible, take the HDDs out of their USB enclosures and connect them directly to SATA for faster xfer speeds. Nine times out of ten these drives will mount just like a normal drive, since usually they are just a normal drive housed in an enclosure.
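
    Splitting directories across drives is a bin-packing problem. A rough sketch in Python using the first-fit decreasing heuristic, which is my choice and not something rsync does for you; the directory names and sizes below are invented, and all drives are assumed to have the same capacity.

```python
def assign_to_drives(dir_sizes, drive_capacity):
    """Group directories onto as few equal-sized drives as first-fit decreasing manages.

    dir_sizes maps a directory name to its size; drive_capacity uses the same unit.
    Returns a list of drives, each a list of directory names.
    """
    drives = []  # each drive is tracked as {"free": remaining space, "dirs": [names]}
    # Place the biggest directories first; ties are broken by name for stable output.
    for name, size in sorted(dir_sizes.items(), key=lambda item: (-item[1], item[0])):
        if size > drive_capacity:
            raise ValueError(f"{name} does not fit on a single drive")
        for drive in drives:
            if drive["free"] >= size:
                drive["free"] -= size
                drive["dirs"].append(name)
                break
        else:
            # No existing drive has room, so start a new one.
            drives.append({"free": drive_capacity - size, "dirs": [name]})
    return [drive["dirs"] for drive in drives]


# Invented sizes in GB, packed onto drives with 2000 GB of usable space each.
plan = assign_to_drives({"raw": 1200, "video": 1000, "photos": 800, "docs": 600}, 2000)
print(plan)
```

    Each inner list is then one rsync run onto one drive. A directory bigger than a single drive has to be broken up by hand first, which is why the sketch raises an error instead of guessing.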

    Good luck :)

  51. Crashplan by Anonymous Coward · · Score: 0

    Why not use Crashplan http://www.crashplan.com/consumer/compare.html
    50$/Year for one computer and unlimited data or 120$ for 2-10 computers and unlimited data.
    Cloud is the way to go!

  52. 20GB + 3MB = 24TB ????? by Anonymous Coward · · Score: 0

    Someone doesn't know how to count... The answer to this question will be easier if we know whether we are helping you with a solution to back up ~20GigaBytes of stuff or 24TeraBytes....

  53. man tar by flyingfsck · · Score: 1

    Plug all the disks into a USB hub. Ensure that each one has a unique volume name eg bak1, bak2... The old skool way is to make a little tar script and use volume spanning. Otherwise, configure all the disks as a single JBOD and run DejaDup.

    --
    Excuse me, but please get off my Pennisetum Clandestinum, eh!
  54. reliability, separate, ? by Anonymous Coward · · Score: 0

    It depends on whether you are looking for reliability or a separate copy.

    For reliability, you should be using a large raid array (Raid-6)?

    For a separate backup, cobbling together a collection of external hard drives sounds painful. Recovery from them would be even more painful. You want a tape drive.

    Or you may consider creating archive copies of each of your files, possibly writing them to a blu-ray disk. Kinda slow, depends on how often you need to backup the files.

  55. Why!? by Anonymous Coward · · Score: 0

    24 TB of backups is not done with hdd. They invented tapes for that kind of work.

    24TB = 25165824MB. Even with a 20MB/sec write speed (I assume 24TB SSD will be too expensive) it will take 14 days.
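
    The arithmetic, as a quick Python check. It assumes binary units (1TB = 1024 x 1024 MB, matching the figure above) and a perfectly sustained write speed, which real drives will not deliver.

```python
def transfer_time_days(total_tb, mb_per_second):
    """Days needed to write total_tb terabytes at a sustained mb_per_second."""
    total_mb = total_tb * 1024 * 1024  # binary units: 24TB = 25165824MB
    seconds = total_mb / mb_per_second
    return seconds / 86400  # seconds per day


for speed in (20, 50, 100):
    days = transfer_time_days(24, speed)
    print(f"{speed} MB/s: {days:.1f} days ({days * 24:.0f} hours)")
```

    At 20MB/s that is about 14.6 days, and at 50MB/s roughly 140 hours, which matches the estimate near the top of the thread.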

  56. Software solution by Anonymous Coward · · Score: 0

    A new piece of software coming out in alpha this month will solve your problem. Check out Infinit (infinit.io). It's a distributed network that integrates into your file management system like DropBox. It encrypts all of your data and stores it in chunks on a p2p network that you can create with other users. If you're going to buy hardware to connect to your own network, then it makes everything you're storing accessible on demand from any device. It's secure and safe and it also will give you instant access to all of your files via streaming. There's a wikipedia article here: http://en.wikipedia.org/wiki/Infinit

  57. Linus did become ... by zapyon · · Score: 1

    a rich man by NOT patenting stuff (i.e. using the GPL2 for Linux). So why shouldn't he do the same with other stuff? Also, I guess there is loads of "prior art" regarding this cloudy PR talk of today.

    --
    I like my spaghetti with source.
    1. Re:Linus did become ... by CastrTroy · · Score: 1

      I guess it depends on your definition of "rich". While he probably isn't hard up for cash, neither are most extremely competent software developers. Bill Gates on the other hand is undeniably rich, although he is probably not as good of a software developer as Linus Torvalds. I don't think that Linus wrote Linux to become rich. Linus probably has a lot of money, and he didn't patent stuff, but correlation is not causation. A smart guy like that could have probably been quite a lot richer if he had chosen the other side.

      --

      Anthropic principle: We see the universe the way it is because if it were different we would not be here to see it.
  58. Fishy by PopeRatzo · · Score: 1

    "Hi there ! I'm looking for a simple solution to backup a big data set consisting of files between 3MB and 20GB, for a total of 24TB, onto multiple hard drives (usb, firewire, whatever)

    Private Manning, is that you?

    --
    You are welcome on my lawn.
  59. Damn! by robbie73 · · Score: 2

    Damn, that's a lotta pr0n!

    1. Re:Damn! by Anonymous Coward · · Score: 0

      Not really. What, honey?

  60. My experience by Anonymous Coward · · Score: 0

    When faced with similar situations, cheaper can be better. Put multiple DVD burners in the computer (all separate buses, or else burn speed suffers), run QuickPar on the source directories and burn those directories and par files endlessly until the job is done. The discs already have error correcting codes, plus with the recovery of QuickPar it should last about a decade - even with cheap media if you avoid a variable or harsh storage environment.

    Discs are one step up from etched in stone, IMHO. They just don't make stones these days (last) like they used to. I look forward to the day where archivists use a plasma cutter to burn barcodes into granite (or something else as effective). That'd be a permanent optical medium.

    1. Re:My experience by Anonymous Coward · · Score: 0

      I look forward to the day where archivists use a plasma cutter to burn barcodes into granite

      Plasma cutters don't normally work on rock because it's an insulator. It'd have to be really thin rock on top of metal (low dielectric rock, quartz perhaps), carbonaceous (slightly conductive), or there'd have to be a thin metal sheet over the rock, and the rock gets marred by the cutting of the metal.

      For example, any white stone covered in a thin layer of black graphite would do the trick (assuming reading it back with a special laser barcode scanner is intended). If the graphite wore off eventually, the scarred rock beneath would still hold the information.

      The idea would be to avoid precious metals, because that would compromise their safety in the long run.

  61. arj -xvf by gl4ss · · Score: 1

    was it that?

    and to extract arj -va.

    there, problem solved.

    --
    world was created 5 seconds before this post as it is.
  62. shhhh blizzard might see by Anonymous Coward · · Score: 0

    shhhh blizzard might see

  63. ZFS snapshots by the_B0fh · · Score: 1

    Why go through all that? Set up a ZFS volume, and snapshot it to another ZFS volume, and then offline that. Put it on a sata cage, and you can just take it with you when it is done.

  64. Backing up Linux large datasets via USB by Anonymous Coward · · Score: 0

    I'm assuming you have a bunch of disks that are old and of different sizes. I'd recommend assuming no disks will crash in your backup set (you really won't be using it for extended periods) and creating a large JBOD partition using mdadm in Linux. I'd also recommend using ext4 for a Linux filesystem, which probably means getting the latest code from GIT because probably some features are missing from your version of ext4 to create a large enough partition. Assuming you don't need anything fancy for file permissions, rsync will probably be your best copy tool. As others have pointed out, the physical transfer will take days and will bottleneck at the USB interface. A "problem" with Linux is that the hard disk portion of the transfer is fast in comparison, so as much RAM as possible will be internally used by the Linux kernel for buffers to support the transfer (this is not the disk cache incidentally). This means when you want to run a program while the large transfer is in progress, you will have to wait a long time for sufficient memory to free up. You can work around this by inserting a large number in /proc/sys/vm/min_free_kbytes

  65. Why don't you just... by Anonymous Coward · · Score: 0

    Takes the drives out of the enclosures, put them in a system and copy the data over network.

  66. use mhddfs by Anonymous Coward · · Score: 0
  67. LTO tapes by Anonymous Coward · · Score: 0

    Tape!

    Tape it out. You don't store giant data sets on hard drives as a back-up. Store them on LTO-5 tapes, which are 1.5 TB each. LTO-4 tapes are 800 GB, and I don't know how pricing works out for the tape writers and tapes to decide which is better for you (I'd suggest LTO-5 if you're making a long term investment and likely to have many data sets to deal with in the future).

    LTO tapes are meant to be archival quality. They're meant to store your data for 15+ years. Hard drives, by contrast, fail easily and aren't meant for archiving anything. Naturally, you should still make 2 copies if this is very important to you.

    If you subsequently go somewhere that doesn't have an LTO drive, then any local data recovery service will be able to get your data off tape and back onto hard drives easily if and when you need it.

    This is how we make backups of large data sets in nuclear physics. My career depends on these data sets, so I have a lot invested in backing them up correctly.

  68. One word: cpio by SlashDev · · Score: 1

    Check out cpio under Linux or many Unix-flavored OSes; cpio can span backups over multiple target media. Make sure to test backup AND, just as important, restore.

    --

    TOP DSLR Cameras Reviews of the top DSLRs
  69. on the cheap solution by Anonymous Coward · · Score: 0

    All the answers are what I expected: there are lots of professional or high cost solutions to your problem like raids, based, tapes but if you are after an on-the-cheap solution this is what I did: I bought a 10-way USB board on ebay and I have a stack of external drives plugged into it. On my Linux server I have all the drives combined into one with mhddfs. From the Linux side the drives look like a single directory. Obviously this solution lacks the error-checking or redundancy features of pro solutions but it is CHEAP - you don't need anything but an extra USB hub and a power board to plug all the external drive plug packs into...

  70. Time versus money. by Anonymous Coward · · Score: 0

    If this is a one or two time occurrence, then just bite the awful time bullet and do the transfers. It will get done eventually. If this is an ongoing thing then Panasas in Pittsburgh specializes in terabyte and petabyte parallel backup systems. When I dealt with them maybe 5 years ago, their prices were not that bad. There should be similar companies by now, too.

  71. Time Machine? by Anonymous Coward · · Score: 0

    Since MacOS is an option, doesn't the latest Time Machine support multiple drives?

  72. LVM by bastafidli · · Score: 1

    LVM is another possibility. If you can get SATA drives and plug them all in, you can then create an LVM volume spanning all the drives and simply copy the data over to one large volume. LVM will take care of how to span it across the drives.

    1. Re:LVM by Wolfrider · · Score: 1

      --LVM is not RAID. Trust me on this, I speak from experience. LVM in a virtual machine can be nice. However, after having an entire Linux LVM fail IRL, I switched to Freebsd + ZFS and never looked back.

      --
      .
      == WolfriderV6 == I'm willing to admit that *I just might* be wrong... Are you??
  73. Disk Utility RAID in OS X by Anonymous Coward · · Score: 1

    I have actually had to do this in an OS X environment before. We have an xserve hosting up about 30TB of data in small files, and we are scheduled to move away from the system, but we need backups in the meantime. My solution for the short-term was to create a concatenated "RAID" of 35TB worth of external hard drives connected via firewire AND usb, (the external drives range from 6TB to 12TB), and use retrospect to back up to the resulting volume. There is no room for anything but an up-to-date backup, but it's getting the job done until we move to a large RAID with offsite backup.

    Apple's software RAID as configured through Disk Utility is surprisingly versatile, and though my transfer speed is slow when the data hits a USB drive, it is entirely transparent to the software when switching between FW and USB. It is also fairly robust, because if there is a hardware failure on our server, we can take the disks, plug them into another mac, and the RAID configuration is maintained without any futzing around (as the config is listed on the beginning of each volume).

    Now, before everyone goes apeshit on me for using a concatenated set instead of a RAID solution, there were a couple limiting factors in my decision to concatenate rather than RAID 5/0, the major one being the range of sizes of external drives that we have, and a lack of funds available to purchase more. OS X's software RAID goes by the lowest common denominator (6TB in my case), so I would lose ~1/2 to 1/3 of my space if I used ANY of the RAID options, and I didn't have any space to spare.

    I feel your pain, and good luck.

  74. Not really a solution to backing up but... by Anonymous Coward · · Score: 0

    Has anyone posted ^A Shift-Del yet?

  75. "insert disc N and press continue"... by Anonymous Coward · · Score: 0

    I would advise against using anything like that. If one disc fails...your entire backup is gone. Unless you can logically separate the data as the guy says regarding brunette/blonde etc.. lol.. then you can just backup each portion onto a different disc

  76. Rar files? by Agent0013 · · Score: 1

    I guess there could be issues with space while making the rar files, but they can break the archive up into chunks of any size you desire. You will need all of them accessible to unpack them again though. Perhaps it isn't the greatest solution, but it may do what the poster wants.

    --

    -- ssoorrrryy,, dduupplleexx sswwiittcchh oonn.. -Quote found on actual fortune cookie.
  77. The Cloud? by fongaboo · · Score: 1

    Just put it in the cloud... *rimshot*

  78. BackBlaze by minijedimaster · · Score: 2

    A cloud backup service released information on how they build their own disk based backup servers. Maybe something that would help with your endeavor? http://blog.backblaze.com/2011/07/20/petabytes-on-a-budget-v2-0revealing-more-secrets/

  79. Online backup: USB 3 ... or thunderbolt. by sonamchauhan · · Score: 1

    Get a PC with 6 USB 3 ports, connect a powered, 4-port USB 3 hub to each PC USB port. Then connect 24 1TB external USB HDDs (or SSDs) to the hubs, format as necessary and run your backup software.

    Thunderbolt may be higher performance and have daisy chaining capabilities. But the USB solution should work just fine.

  80. Simple answer: don't. by dandaman32 · · Score: 2

    I work for a data backup company as a dev monkey/admin/jack-of-all-trades.

    Do you ever want to restore these backups? If the answer is "yes" (and it should be, otherwise why are you backing up in the first place...?), then you need to be guarded against failure of an individual disk. That means you need some sort of RAID solution.

    For reference, Datto's 3U nodes store 20TB across 14 2TB drives, and the next larger size of node we have is somewhere around 55TB in 4U. No, I'm not trying to sell you our hardware (we only sell to resellers anyway) but hear me out. You really are going to save yourself some headache if you build a NAS device.

    USB 2.0 is SLOW AS BALLS. I see our USB seed drives (HDDs we mail out to customers to get their initial datasets up into the ether) max out at 20-30MB/sec on a good day. By comparison, Gigabit Ethernet will give you 112MB/sec after NFS/TCP/Ethernet overhead -- much better. For this reason, and because it's just so impractical to handle large collections of failure-prone USB drives, our largest round trip drive that is shipped as USB is 4TB. After that, we actually ship our customers NAS devices (usually a returned/development box with a different OS image on it).
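
    To put those rates in perspective, here is a rough back-of-the-envelope sketch of how long the OP's 24TB would take at each speed. It assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes), takes 25MB/sec as the midpoint of the USB range quoted above, and assumes a perfectly sustained rate, which real transfers won't hit. The function name is made up for illustration.

```python
def transfer_hours(total_tb: float, rate_mb_per_s: float) -> float:
    """Hours needed to move total_tb terabytes at a sustained rate.

    Assumes decimal units: 1 TB = 1e12 bytes, 1 MB = 1e6 bytes.
    """
    total_bytes = total_tb * 1e12
    bytes_per_second = rate_mb_per_s * 1e6
    return total_bytes / bytes_per_second / 3600


if __name__ == "__main__":
    # 25 MB/s is an assumed midpoint of the 20-30 MB/s USB 2.0 figure.
    for label, rate in [("USB 2.0 at 25 MB/s", 25), ("Gigabit Ethernet at 112 MB/s", 112)]:
        print(f"{label}: about {transfer_hours(24, rate):.0f} hours")
```

    That works out to roughly 267 hours over USB 2.0 versus about 60 hours over Gigabit Ethernet, before any filesystem or protocol stalls.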

    Go with NAS. You need the resilience against disk failure, you need the additional speed, and while yes, it's a greater investment, the alternative is utter agony when one of your 12 2TB disks takes a dump.

  81. How many USB ports do you have? by DarthVain · · Score: 1

    I know you are likely trying to do this for a cheap alternative, but just don't. It is really an unworkable solution for that amount of data.

    Some have mentioned Tape, which I know very little of. However I would simply build another RAID machine to copy to, or use a NAS if you can find one big enough, as it amounts to pretty much the same thing, but more specialized.

    If this isn't sensitive data, another option might be to cloud it. Amazon and a few others have some competitive prices. The advantage here is you additionally get off site backup.

    I guess one of the key factors in your decision will be how often this 24TB of data is refreshed. Will it only get occasional updates, or will a big chunk need to be backed up regularly? That is the other question: how often will you need to back up? Lastly, how quickly do you need recovery?

  82. No offense meant, but ... by Anonymous Coward · · Score: 0

    You're doing it wrong.

    Single drives sitting on a shelf is not a "backup." You need to invest in renting or purchasing a tape drive and some tapes. THAT is long-term reliable backup. Those USB hard drives are a disaster waiting to happen.

    1. Re:No offense meant, but ... by Outtascope · · Score: 1

      That's a load of crap brought to you by the people who would rather that you pay 10 times as much for 1/10th the performance and 1/10th the capacity.

      But let me rebut that in a more logical fashion. Tapes take considerably longer, meaning the backup strategy ends up backing up less than is optimal.

      Sure, disk backups are fragile. But if your system is going to be borked by the failure of a backup volume or two, then I would posit that your backup strategy is a disaster waiting to happen regardless of the media that you use.

  83. JBOD Drive Array by Brewster+Jennings · · Score: 1

    Sans Digital Makes an 8 slot drive enclosure with either a PCI-E or USB 3.0 interface for about 350 bucks. Put 8 3tb drives in it, run it JBOD. You can buy the cheap 3tb drives because you're going to run them JBOD. At 150 bucks a drive, Your total cost is about $1600.

    You might be able to get Windows to do Incrementals to those drives, although I haven't tried it myself. And remember to run the enclosure sparingly, because non-enterprise drives aren't rated for the same number of spin-up hours.

    Of course, it's not as safe as putting everything on a billion optical disks. But even using a BD-rom (at 46gig a pop), you're talking about 534 Blu-rays, and that's pretty much ridiculous, unless you have an intern you really dislike or something.

  84. Data Deduplication by Aphonia · · Score: 1

    USB seems inane and insane for that level of data. How redundant is this 24 tb of data as well? Running it through a data de duplicator (possibly to reduce storage requirements depending on the type of data) and then a tape drive or raid array may be a cheaper and more time effective option.

  85. aufs by Anonymous Coward · · Score: 0

    I backup a 10tb array to multiple usb HD's using aufs. I have the aufs mount configured to drop new files onto the drive with the most amount of free space ( I simply add drives as the data in the array gets larger ) but aufs supports other modes like round robin. I then rsync the data from the array to the aufs mount as a nightly cron job.
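
    For anyone curious what a "most free space" placement policy boils down to, here is a minimal sketch in Python. This is not how aufs implements it, just an illustration of the idea; the mount point names and function names are hypothetical.

```python
import shutil
from typing import Callable, Iterable


def free_bytes(path: str) -> int:
    """Free space, in bytes, on the filesystem that contains path."""
    return shutil.disk_usage(path).free


def pick_most_free(
    mount_points: Iterable[str],
    free_of: Callable[[str], int] = free_bytes,
) -> str:
    """Return the mount point that currently has the most free space.

    free_of can be swapped out (for example in tests) so the policy
    can be exercised without real drives attached.
    """
    return max(mount_points, key=free_of)


if __name__ == "__main__":
    # Hypothetical mount points; replace with wherever your USB drives live.
    drives = ["/mnt/usb1", "/mnt/usb2", "/mnt/usb3"]
    print("Next new file would go to:", pick_most_free(drives))
```

    A union filesystem does this for you on every file creation; the sketch only shows the decision being made.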

  86. Impact of USB xfers on Linux performance by Anonymous Coward · · Score: 0

    How about Linux system performance while doing even a single 2.0 USB copy? It seems to gank things up.

    My system gets really sluggish, in odd ways. Mouse focus updates are very slow, and sometimes the mouse pointer gets left in odd states because of oddness that my window mgr (Enlightenment) doesn't seem to anticipate.

    I find I need to slow down my mousing and browsing to avoid issues.

    Debian Squeeze user on a Thinkpad T61, 2.6.32.

  87. Not simple! Geez. by whitelabrat · · Score: 1

    "I'm looking for a simple solution to backup"

    And USB drives are your idea of simple? Seriously? Please hand the lady your Admin card at the door when you leave.

    For 24TB if you want to have a job after someone asks you to restore a chunk of that you'll want to insist on tape. Or perhaps an equally sized NAS or SAN array. USB? Hope your resume is up to date.

  88. External RAID enclosure by kimvette · · Score: 2

    You buy one of these:

    http://www.newegg.com/Product/Product.aspx?Item=N82E16816322007

    populate it with 4TB drives and create two RAID5 arrays (or one RAID6 array), then you've got 24 or 28 TB of backup space, without having to change drives or break up your backup into smaller chunks.

    But really, your backup methodology is broken; you need to organize the data into manageable chunks because aside from a large dedicated backup server/SAN, there is no reliable (don't tell me tape is reliable) backup solution for a such a large quantity of data in a single chunk.

    What I do for backups: in my 24-bay server I have eight large drives in a (HARDWARE) RAID5 array (were 4TB drives available at the time I'd have gone RAID6) and rsync the virtualized server contents to that, then archive them into tarballs, and send copies of them across the LAN to another server that is running (HARDWARE) RAID5 as well. Every once in a while I back up the critical data (source, scripts, financial data, production web sites, /etc, and so forth but not the program binaries nor system binaries which are easily recreated or reinstalled, respectively) to optical media and external hard drives.

    So what I have in summary is:
    * Massive server with a backup array separate from the production array
    * Separate backup server running another array (again, using a quality HARDWARE RAID controller. Safeguard your data and don't bother with Intel, Adaptec, Promise, or Highpoint "hybrid" RAID)
    * Periodic backups of non-recreatable data to USB drives and optical media that are moved off site.

    --
    The Christian Right is Neither (Christian nor right). See: Matthew 23, Matthew 25, Ezekiel 16:48-50
    1. Re:External RAID enclosure by AmiMoJo · · Score: 1

      Hardware RAID is a bad idea. If the card fails you need a compatible one. With software RAID any machine running the same OS can read it back.

      --
      const int one = 65536; (Silvermoon, Texture.cs)
      SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC
    2. Re:External RAID enclosure by jamesskaar · · Score: 1

      and i've heard that linux software raid outperforms many hardware raid varieties...

  89. Windows 8 Storage Space by Barlo_Mung_42 · · Score: 1

    Plug the drives in. Tell Win8 to treat them as one large drive. Good to go.

  90. NAS by Anonymous Coward · · Score: 0

    Make a NAS server with enough space in a RAID5 array and do the transfer via the network.

  91. its possible, but risky by Supp0rtLinux · · Score: 1

    If you're willing to deal with the time it will take to write it all out, then it's doable. You need backup software that supports VTL (virtual tape library). With this, the physical drives are seen as tape devices. So it will start writing to drive #1 and when it's full it will say "out of media" and it *should* pause for new media. You "eject" the drive, attach a fresh one, and hit continue. Then wash, rinse, repeat til complete. As others pointed out, it will take some time. You can speed it up with eSATA or USB 3. If you're on a Mac, you can speed it up using t-bolt. I believe Arkeia still offers a free version and they did/do support VTL. Haven't been current on free backup wares for a while. One thing to bear in mind as well: once you write this 24TB to a collection of media, any single media failure will result in all data being unrecoverable. So you might opt for doubling your backup window and making a duplicate copy. Otherwise your best bet is to put all the drives in a NAS configuration (think FreeNAS) with a RAID6 structure, then have the backup s/w use this as its destination. You could do this with an 8 drive chassis of 8x4TB SATA disks (2 lost for RAID6, leaves 6x4TB=24TB raw). A similar idea could be accomplished with ZFS, but its future is somewhat unknown with Oracle these days. If you need longevity, I'd stick with a more open/compatible filesystem. If you manage to set it up correctly and use exFAT, you could mount the backup volume to any current Linux, Windows, or Mac system and if the backup s/w runs on all platforms you'd have a lot more compatibility and recovery options.

    1. Re:its possible, but risky by Wolfrider · · Score: 1

      --Dude - ZFS has been ported to FreeBSD, and has been running on that OS for some years now. And as ZFS is the best thing to come along since the invention of the hard drive, Oracle would have to be completely insane in the membrane to try and drop it.

      --
      .
      == WolfriderV6 == I'm willing to admit that *I just might* be wrong... Are you??
  92. Symlinks script by Anonymous Coward · · Score: 0

    Make a script that creates symbolic links to all your files. Not much extra space required.
    Script figures out which files will fit based on output disk size (X GB), and puts links in created \Disk1 \Disk2 subdirs.

    Then copy the DiskX subdir (follow symlinks) to HDX. Something like this surely exists by now.
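
    A rough sketch of what such a script could look like in Python. It uses plain first-fit-decreasing packing, which is a heuristic and not guaranteed to be optimal, and it assumes every drive has the same capacity and that the link directory lives outside the source tree. All function and directory names here are made up for illustration.

```python
import os
from pathlib import Path


def plan_disks(file_sizes, capacity):
    """Group files into disks using first-fit decreasing.

    file_sizes maps a file identifier to its size in bytes.
    Returns a list of lists, one inner list of identifiers per disk.
    """
    disks = []  # each entry is [remaining_bytes, [identifiers]]
    by_size = sorted(file_sizes.items(), key=lambda item: item[1], reverse=True)
    for name, size in by_size:
        if size > capacity:
            raise ValueError(f"{name} ({size} bytes) does not fit on one disk")
        for disk in disks:
            if disk[0] >= size:
                disk[0] -= size
                disk[1].append(name)
                break
        else:
            # No existing disk had room, so start a new one.
            disks.append([capacity - size, [name]])
    return [names for _, names in disks]


def make_link_dirs(source_root, link_root, capacity):
    """Create link_root/Disk1, Disk2, ... filled with symlinks to the files."""
    source_root = Path(source_root).resolve()
    sizes = {p: p.stat().st_size for p in source_root.rglob("*") if p.is_file()}
    for number, paths in enumerate(plan_disks(sizes, capacity), start=1):
        for path in paths:
            link = Path(link_root) / f"Disk{number}" / path.relative_to(source_root)
            link.parent.mkdir(parents=True, exist_ok=True)
            os.symlink(path, link)
```

    Each DiskN directory can then be copied with a tool that follows symlinks, for example rsync with its -L (--copy-links) option.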

    1. Re:Symlinks script by Anonymous Coward · · Score: 0

      man dirsplit

      http://linux.die.net/man/1/dirsplit

      dirsplit is designed to for a simple purpose: convert a directory with many multiple files (which are all smaller
      than a certain medium, eg. DVD) and "splits" it into "volumes", looking for the optimal order to get the best
      space/medium-number efficiency.

      The actual action is either adding the files to mkisofs catalogs or real moving of files into new directories (or
      creating links/symlinks). The method is not limited to files, whole directories can also be handled this way
        (see various filesystem exploration modes).

        -m|--move Move files to target dirs (default: create mkisofs catalogs)
        -l|--symlink similar to -m but just creates symlinks in the target dirs

  93. Use a portable file system like ZFS by sh3rp4 · · Score: 1

    we do much the same thing. we have a backup nas that we then rsync to a set of "offsite" drives.

    My recommendation would be to investigate ZFS. (picture software raid and LVM rolled into one with filesystem encryption and compression built in.) Easy to compile and install on linux.

    We created a pool for the offsite drives, then rsync to that file system. "Export" the file system and take the drives out. (Hot swap in trays, buy extra trays for rotation drives.) When you need to put in the next set just put them in and import it. Order and placement does not matter as long as enough drives are in. You could even have one or two parity drives in case a drive fails.

    We have a cron job that rsyncs to the offsite drives, then exports them and emails the admins that it is ready for rotation. We keep 2 sets, one is in all the time and the other rotates offsite. You can swap on whatever schedule you are comfortable with. With compression, depending on data you could easily cut your drive requirements in half. Turn on encryption to keep your porn safe while in transit. All you need is a hot swap JBOD chassis. you could backup directly to the removable filespace, or do what we do, backup to a set local (local to datacenter, not to machine) filespace and rsync it over regularly.

    It is something else to learn, etc. But it is a system that works well.

  94. Simple way by Anonymous Coward · · Score: 0

    Prepare a week long vacation, before you leave, copy & then paste.

  95. Linux FUSE: mhddfs filesystem by Anonymous Coward · · Score: 0

    The mhddfs FUSE (filesystem in user space) for Linux is good at this sort of thing.
    It combines a bunch of "real" filesystems into a large single-filesystem storage pool.

    So take eight drives, each 3TB in size. Partition/format each of them with a single large ext4/xfs/whatever filesystem.
    Mount all eight of them. Then issue a mhddfs command to create a new mount that pools the storage from all eight drives
    into a single 24TB filesystem. Copy your data there. mhddfs will allocate individual files to individual drives, and the underlying
    filesystems can be accessed without mhddfs involved if you like.

    A very powerful tool -- should really be in the kernel, but isn't.

  96. Np-complete by Anonymous Coward · · Score: 0

    This problem is NP-complete. It is a bin packing problem regardless.

    Use an archival tool that allows you to specify chunk size and set it to be 1/drivesize or something. Use 10% parity files.

    Or back up onto s3 with something like s3ql.
    That would probably be cheaper and less of a headache, since space isn't a concern.

  97. doing it wrong by Anonymous Coward · · Score: 0

    Stop doing it wrong. There is no reason to do this over USB.

    Buy an md3000, pack it out with 1tb disks. If that is not enough, get another md1000, and daisy chain it to the md3000.

    In another job, I had put 2 md1000's on a single box, and use it as a backup server. The total was 24TB.

  98. Partimage by Anonymous Coward · · Score: 0

    Partimage can do this.

  99. Comment removed by account_deleted · · Score: 1

    Comment removed based on user account deletion

  100. skip the USB by whitroth · · Score: 1

    If you've got that much data, with a setup like that, you can afford to buy something better than USB. Consider eSATA, though I, personally, would push for a simple, fast backup server.

    However you go, and I admit to not having read all the comments, you didn't mention how often the backups need to occur. Here, where we've got terabytes of data on many systems, we do a nightly rsync, and use hard links, which speeds it up and decreases space usage.

                    mark

  101. The only answer is by herve_masson · · Score: 1

    42

  102. 8 disk USB3 enclosure + 4TB drives by ekimminau · · Score: 1
    --
    Armaments, 2-9-21 And Saint Attila raised the hand grenade up on high, saying, 'O Lord, bless this Thy hand grenade' N
  103. GAFFitter by Anonymous Coward · · Score: 0

    If you want to minimize the number of volumes required to pack your files I recommend GAFFitter (http://gaffitter.sf.net/), which reorders a set of files/directories to best fit the volumes and so to avoid waste of space.

  104. compressible porn by davidwr · · Score: 1

    If it turns out that the source data is not porn (unlikely) and is highly compressible

    Would dirty photos of his blow-up doll count as "compressible?"

    --
    Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
  105. I guess it depends on HIS definition of "richt": by zapyon · · Score: 1

    quoting Linus again: "First off, I'm actually perfectly well off. I live in a good-sized house, with a nice yard, with deer occasionally showing up and eating the roses (my wife likes the roses more, I like the deer more, so we don't really mind). I've got three kids, and I know I can pay for their education. What more do I need? The thing is, being a good programmer actually pays pretty well; being acknowledged as being world-class pays even better. I simply didn't need to start a commercial company. And it's just about the least interesting thing I can even imagine. I absolutely hate paperwork. I couldn't take care of employees if I tried. A company that I started would never have succeeded – it's simply not what I'm interested in! So instead, I have a very good life, doing something that I think is really interesting, and something that I think actually matters for people, not just me. And that makes me feel good." http://en.wikiquote.org/wiki/Linus_Torvalds

    --
    I like my spaghetti with source.
  106. That should have been "rich" (n.t.) by zapyon · · Score: 1

    no text

    --
    I like my spaghetti with source.
  107. A few comments on TAR by Anonymous Coward · · Score: 0

    I've done this (tar backups) for ages, partly from back in the day when I *had* to backup to cdrs/dvdrs, and partly from a desire to be able to more easily restore a partially corrupted backup.

    Watch out for ACLs and Sparse files. They can cause grief. (Test before you rely on tar's -S flag.)

    Watch out for using bzip2 (-j) and gzip (-z) compression. Aside from greatly slowing things down, the output stream is compressed rather than the individual files and thus a single bitflip can render the remaining tarfile (tarfiles?) unreadable.

    Watch out for system pseudo-directories (/sys, /proc, /dev, etc). Letting tar backup /dev/hda can be a mistake.

    You can mount all your backup drives and specify --file= multiple times without a tape-changing (disk-mounting) script.

    You can use -F, --info-script=NAME, or --new-volume-script=NAME to run a script at the end of each file (tape), umounting and mounting new disks.

    Older versions of tar used to have problems with long file/path names. Shouldn't be a problem these days, but it gave me headaches half a decade ago.

    One Grand Final Rule: Don't backup the backup tarfiles. It just doesn't end well.

    Oh, and consider eSATA or firewire or at least USB3. Disk throughput will (obviously) be a huge issue on a backup of this magnitude.

    --Anon. (Don't have time to find my old Slashdot password right now. Machine it was stored on died and was replaced, about 6 times over now. Someday I have to dig through my old backups and find that thing. Really glad I used tar and not something obscure/closed-source/obsolete.)

  108. Build a USB array by freshlimesoda · · Score: 1

    Use FreeNAS to manage RAID on the array. And rsync. Yes, you may have to do some handiwork yourself. GTG!

    --
    I come to Slashdot only to read sigs. One you are reading is mine.
  109. Another option by sootman · · Score: 1

    I know this has nothing to do with USB and maybe the OP has very good reasons for wanting it on USB. In any case...

    Amazon S3 pricing:
    First 1 TB / month: $0.125 per GB
    Next 49 TB / month: $0.110 per GB

    (1 x 0.125 + 23 x 0.11) * 12 = about $32 per year for 24 TB. That's a lot less than buying a bunch of hard drives.

    --
    Dear Slashdot: next time you want to mess with the site, add a rich-text editor for comments.
    1. Re:Another option by Anonymous Coward · · Score: 0

      Uhm GB != TB
      I'm thinking there might be some scaling in there somewhere (like closer to $32k?)

    2. Re:Another option by Anonymous Coward · · Score: 0

      Or $3.2k per year even...

    3. Re:Another option by Synesthes · · Score: 1

      I think you might be off with your maths there.

      1TB/month @ $0.125 per GB would be $128/month
      Next 49 TB/month @ $0.110 per GB would be $lots/month

    4. Re:Another option by Anonymous Coward · · Score: 0

      If it is $ per GB, then for TB your maths ain't right and it's actually around $32,600 per year. You'll then also need either a lot of time or a lot of connectivity to get it there, and pay for the data transfer too. link http://aws.amazon.com/s3/#pricing
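
      For anyone checking the arithmetic, here is a quick sketch using only the two tier prices quoted in the parent comment. It assumes 1 TB = 1024 GB and counts storage charges only (no request or transfer fees); tiers above 50 TB are not quoted in this thread, so the function refuses them.

```python
GB_PER_TB = 1024  # assumption: binary terabytes

FIRST_TIER_PRICE = 0.125   # $ per GB per month, first 1 TB (as quoted above)
SECOND_TIER_PRICE = 0.110  # $ per GB per month, next 49 TB (as quoted above)


def monthly_storage_cost(total_tb: float) -> float:
    """Monthly storage cost in dollars for up to 50 TB at the quoted tiers."""
    if total_tb < 0 or total_tb > 50:
        raise ValueError("only the first 50 TB of pricing is quoted here")
    first_tier_tb = min(total_tb, 1)
    second_tier_tb = max(total_tb - 1, 0)
    gb_weighted = first_tier_tb * FIRST_TIER_PRICE + second_tier_tb * SECOND_TIER_PRICE
    return gb_weighted * GB_PER_TB


if __name__ == "__main__":
    monthly = monthly_storage_cost(24)
    print(f"per month: ${monthly:,.2f}")
    print(f"per year:  ${monthly * 12:,.2f}")
```

      For 24 TB that comes to about $2,719 a month, or roughly $32,600 a year.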

    5. Re:Another option by sootman · · Score: 1

      Ugh. Everyone is right, I'm an idiot. I misread the whole thing. I got as far as "First TB" and stopped, so yeah, I'm way off. Never mind me.

      I thought that sounded pretty low, but then I just thought "meh, maybe Amazon is just so huge they can do that."

      --
      Dear Slashdot: next time you want to mess with the site, add a rich-text editor for comments.
  110. JAFM by Anonymous Coward · · Score: 0

    But only 1GB/s is recorded

    Only?

    1. Re:JAFM by voltorb · · Score: 1

      I take it you have never logged into CERN.

  111. Just use rsync by Anonymous Coward · · Score: 0

    You can do all of this with rsync and some text editing/a script.

    rsync -av /mnt/srctree /mnt/destdrive > backup-list

    When that copy runs out of drive space, insert next disk,

    rsync -av /mnt/srctree /mnt/destdrive --exclude-from=backup-list >> backup-list

    You do have to do some maintenance on "backup-list" in two ways. First, rsync will list the parent directory before individual files as output when copying. Unfortunately if it doesn't FINISH that directory, and you then provide it in the --exclude-from, it will skip the entire directory. A simple script to run through the backup-list file and remove any entry with a trailing forward slash, e.g. "parent/child/child/" instead of "parent/child/child/porn.jpg", will cause rsync to inspect each directory next time and catch anything it missed.

    Secondly, the very last line of the "backup-list" file will most likely be in error. This will be the file that rsync was copying when disk space ran out on the destination. Delete that line and it will be caught on the next disk.

    Writing the script takes about a minute, rsync is in every distro, et voila, your complete backup solution for large volumes to a series of different sized small volumes.
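
    A minimal sketch of that cleanup script in Python, under the assumptions described above: every entry ending in a forward slash is a directory and gets dropped, and the last entry is the file that was in flight when the disk filled up. It is meant to be run once after each disk fills, before the next rsync. The function names are made up for illustration, and any other lines rsync prints into the list are left alone.

```python
from pathlib import Path


def clean_backup_list(lines):
    """Return the entries that are safe to pass to --exclude-from.

    Drops blank lines, the final entry (assumed to be the file that was
    being copied when the destination ran out of space), and every
    directory entry, i.e. anything ending in a forward slash.
    """
    entries = [line.rstrip("\n") for line in lines]
    entries = [entry for entry in entries if entry]
    entries = entries[:-1]
    return [entry for entry in entries if not entry.endswith("/")]


def rewrite_backup_list(path="backup-list"):
    """Clean the list file in place."""
    list_file = Path(path)
    cleaned = clean_backup_list(list_file.read_text().splitlines())
    list_file.write_text("".join(entry + "\n" for entry in cleaned))


if __name__ == "__main__":
    rewrite_backup_list()
```

    Keeping a copy of the untouched list before cleaning is cheap insurance in case the last-line assumption turns out to be wrong for a given run.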

  112. The real issue here is hardware. by Anonymous Coward · · Score: 0

    Most external HDD enclosures are limited in capacity to around 2 TB. Using larger drives is quite possible, but very unsafe for your data, since a single connection to a BIOS unable to handle it and you could lose your file system, effectively erasing your drives. Be advised that NTFS is very vulnerable to this.

    Large drives have a hard time being on the same controller, because you are exceeding hardware limitations, and that goes for both the enclosure and the computer side controllers.

    SATA is GOOD, since it's 1 controller per drive.
    Firewire is BAD, limit yourself to less than 2^32 (3.7 TB) total or you lose everything.
    USB is BAD, since they can handle multiple drives per controller and you would need to look carefully at the hardware of any computer touching those drives.

    I propose to build yourself a file server with multiple drives, like a smaller NAS enclosure. The objective is to keep the drives and the hardware that operate them together.

    If you want to move all this data in a decent amount of time, you'll need to look into optical network cards. This might require computers on both ends.

  113. LVM+RAID by Anonymous Coward · · Score: 0

    Buy some USB hubs and maybe some SansDigital multi-drive enclosures. Hook them all up at once, build RAIDs out of each SansDigital chassis, and use LVM to aggregate the chassis. lvcreate, mkfs, and start copying the data.

  114. Question about throughput by TheLoneGundam · · Score: 1

    "I'm just a simple caveman, ..." with a mainframe background, so I have a question of curiosity here At what point does the bandwidth/throughput of the DMA start limiting the performance of your backup? In my world, DMA for I/O is called a "channel". We have many, and while there are a lot of nuances we could discuss, basically we try to segregate the I/O for the input to backup (disk) and the output of backup (usually tape) , and have the backup task process in parallel as much as possible - my nightly backup, for example, runs 9 parallel tasks, 9 being the limit that this particular backup program has. I could run multiple instances of the program, but then I have to have mechanisms to make sure I don't back up the same disk twice between two concurrent executions; with one instance and 9 tasks I can just say 'back up everything that's online at the moment'. So, the throughput is limited by the performance of the slowest devices, multiplied by the parallelism we are able to achieve. In the PC / server environment, does the DMA limit the I/O capability?

    1. Re:Question about throughput by amorsen · · Score: 1

      At what point does the bandwidth/throughput of the DMA start limiting the performance of your backup?

      With USB2? Never, it does PIO. The CPU gets to babysit the transfer. That is very much the exception in the PC world of today; every other bus in use today offloads nicely.

      In general, storage buses on PCs are rarely fast enough to saturate system buses, and therefore DMA does not limit throughput. If you are really lucky you can get perhaps 2GBps through a fast PC, but you can probably copy memory around at 10 times that speed. So just like the mainframe, it will be limited by the performance of the slowest devices multiplied by parallelism.
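
      As a rough illustration of "slowest device multiplied by parallelism", here is a minimal Python sketch. The function name and the optional bus cap are assumptions made for the example; the 50 MB/s and 4 x 100 MB/s figures are the ones quoted earlier in this thread, not measurements.

```python
TB = 10**12  # decimal terabyte, as drive vendors count it
MB = 10**6


def backup_hours(total_bytes, per_device_rate, streams, bus_limit=None):
    """Estimate hours for a backup limited by device speed times parallelism.

    per_device_rate and bus_limit are in bytes per second. bus_limit is an
    optional cap for whatever shared link (USB controller, network) sits in
    front of the devices.
    """
    aggregate = per_device_rate * streams
    if bus_limit is not None:
        aggregate = min(aggregate, bus_limit)
    return total_bytes / aggregate / 3600


single = backup_hours(24 * TB, 50 * MB, 1)   # one disk at 50 MB/s
four = backup_hours(24 * TB, 100 * MB, 4)    # four disks at 100 MB/s each
print(f"{single:.1f} h with one stream, {four:.1f} h with four")
```

      Passing a bus_limit lower than the combined device rate shows how a shared controller would erase the gain from adding more disks.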

      --
      Finally! A year of moderation! Ready for 2019?
  115. Backup Exec by Vrtigo1 · · Score: 1

    Backup Exec does exactly what you are asking for. Free 30 day trial.

  116. MHDDFS FUSE module by n3xu5 · · Score: 1

    I ran across a FUSE module (mhddfs) that seemed relevant when I wanted to combine several USB drives into a single file system. My main goal was to make each drive usable independently for file recovery if I had to move it to another system.

    The module appears to be a fairly thin wrapper over an existing file system. It only appears to choose which of the sub-file systems to write new data to, automatically writing files to whichever drive has the most space. This provides nothing in the way of redundancy, however.

    What is nice is that you can easily access the files on a drive without needing the other drives. May be helpful for someone.

    http://romanrm.ru/en/mhddfs
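
    For anyone who would rather script the same idea than mount a FUSE layer, here is a minimal Python sketch of the placement policy as described above: pick whichever drive currently has the most free space. It is not mhddfs code, and the function name and mount points are made up for illustration.

```python
import shutil


def pick_drive(mount_points, file_size, free_space=None):
    """Return the mount point with the most free space that can hold file_size.

    free_space is a function mapping a path to free bytes; by default it asks
    the operating system through shutil.disk_usage.
    """
    if free_space is None:
        free_space = lambda path: shutil.disk_usage(path).free
    best = max(mount_points, key=free_space)
    if free_space(best) < file_size:
        raise OSError("no drive has enough free space for this file")
    return best
```

    Something like pick_drive(["/mnt/usb1", "/mnt/usb2"], os.path.getsize(src)) before each copy keeps every file whole on a single drive, so each drive stays readable on its own.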

  117. How to eat an... by Anonymous Coward · · Score: 0

    One byte at a time

  118. Use 2 - 3 networked HP N40L MicroServers by Anonymous Coward · · Score: 0

    Each has at least 6 external USB 2.0 ports, an eSATA port and,
    for use as a possible back-up, up to 4x internal SATA HDDs,
    not to mention a Gigabit wired-network port.

    Use 3 TB external USB HDDs of the same brand & model,
    running, e.g., FreeNAS or your fav x86 (32- or 64-bit) op sys;
    booting from an internal USB stick frees an internal SATA drive for
    use as back-up.

    It may not be the fastest, but it's a SIMPLE solution, that fits
    in a small space.

  119. Skip usb by Outtascope · · Score: 1

    Honestly, I know it isn't your question, but skip USB. Too slow. WAY too expensive. Get yourself a RocketRAID card or similar, a SAS expander, and an 8+ trayless disk enclosure. I use a 12 disk enclosure (8 for regular backups, 4 for all the one-off stuff I do) with 2TB drives. I wrote a program in Java using NIO that stripes the backups across the disks so that it can saturate the bus. A solution like this will ultimately be faster and cheaper. One day I will port the code to native, as the Java program was just a proof of concept that has worked so well I haven't gotten around to it. This setup works exponentially better than the VXA-3 tape backup we were using before, and I couldn't imagine having to do it with USB drives, either from a cost or a logistics perspective.

  120. USB is trouble by EricScott · · Score: 1

    USB will copy files, but the copies are not always identical. Firewire is better.

    But the best/cheapest solution is a Dell MD-1000. It will take 2TB generic drives.

    1. Re:USB is trouble by gl4ss · · Score: 1

      are you implying that usb inherently corrupts files?

      --
      world was created 5 seconds before this post as it is.
    2. Re:USB is trouble by Anonymous Coward · · Score: 0

      That has been my experience when copying multi-gigabyte sized files. It's not the copy method either -- I've tried several methods. It is correlated to when the CPU goes high, but I've never been able to quantify it more.

  121. nfs over lvm2? by jamesskaar · · Score: 1

    If you build a small, cheapish system (an ITX board with 6 SATA ports, each connected to a port multiplier, OS on flash), you could have 30TB from 1TB drives. Set it all up as an LVM2 volume; then you can slap the drives back in a new system any way you want, and they'll come back up in the right order. rsync (backupmypc) will keep the backup in good condition. You'd of course need spares, in case the verify shows a drive failure, and a duplicate system. Using Linux RAID to turn the two NFS mounts into a RAID 1 array would be nice, but parity would be better, yet it'd take even more drives. Yah, bigger drives would be a good first start.

  122. should be modded up by gl4ss · · Score: 1

    This sounds like exactly what the guy is looking for.

    --
    world was created 5 seconds before this post as it is.
  123. The Easy Way by jep305 · · Score: 1

    Step 1: Buy yourself something like this: http://www.aberdeeninc.com/abcatg/Stirling-X339.htm
    Step 2: Install it
    Step 3: rsync
    Step 4: Go do something else -- this is going to take a while

    --
    In Reason We Trust
  124. Well not too hard... by niftymitch · · Score: 1

    At $35 each you get a dozen Raspberry Pis.

    While not fast you have a USB port and can connect them
    via ethernet and ssh and start tinkering.

    A good USB hub can turn one USB port into four.
    The local Costco has 3TB USB disks. Yes
    you have to organize your data into 2.8TB chunks
    or so with some script foo but rsync can help
    verify the bits.

    N.B. this is 10/100 ethernet not GigE and USB2 (at best)
    and they share a single USB link to an onboard USB hub.

    But you could automate the thing and not have to
    swap out USB cables for a week.

    MD5 checksums and an index...
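
    A minimal Python sketch of what such an index could look like: walk a directory tree, record one MD5 digest per file, and compare the source index against the one built from the copy. All function names here are illustrative, not from any existing tool.

```python
import hashlib
import os


def md5_of_file(path, block_size=1 << 20):
    """Hash a file in 1 MiB blocks so multi-gigabyte files never sit in RAM."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def build_index(root):
    """Map each file's path (relative to root) to its MD5 hex digest."""
    index = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            index[os.path.relpath(full, root)] = md5_of_file(full)
    return index


def mismatches(source_index, copy_index):
    """Return source paths that are missing from the copy or hash differently."""
    return sorted(
        path for path, digest in source_index.items()
        if copy_index.get(path) != digest
    )
```

    Saving the index alongside the data on each disk also tells you which disk holds which file without plugging them all in.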

    Let us know how it goes. ;-)

    No matter what you do you will have to do some
    scripting. Do label each of the USB disks
    (physical and logical names that match).

    Did someone say that this was a marginal
    idea? Backing up to USB has some value but does not
    sound magical and error free.

    Since 24TB is a lot of junk -- good luck
    but with the crazy big USB disks -- what the hey.
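
    The "script foo" for organizing files into roughly 2.8TB chunks could be as simple as the first-fit-decreasing sketch below. It is a hedged Python illustration: the names are made up, files are kept whole, and the result is a reasonable packing rather than a guaranteed optimal one.

```python
CHUNK_BYTES = 28 * 10**11  # about 2.8 TB usable per 3TB disk (assumed figure)


def pack_files(sizes, capacity=CHUNK_BYTES):
    """Assign whole files to disks, largest first, using first-fit.

    sizes maps a file path to its size in bytes. Returns a list with one
    entry per disk, each entry being the list of paths stored on that disk.
    """
    disks = []  # each entry is [remaining_bytes, [paths]]
    by_size = sorted(sizes.items(), key=lambda item: item[1], reverse=True)
    for path, size in by_size:
        if size > capacity:
            raise ValueError(f"{path} does not fit on a single disk")
        for disk in disks:
            if disk[0] >= size:
                disk[0] -= size
                disk[1].append(path)
                break
        else:
            # No existing disk has room, so start a new one.
            disks.append([capacity - size, [path]])
    return [paths for _remaining, paths in disks]
```

    Feeding it a dict built from os.path.getsize over the source tree gives one file list per disk, which can then be handed to rsync disk by disk.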

    --
    Truth is stranger than fiction, but it is because Fiction is obliged to stick to possibilities; Truth isn't. Mark Twain.
  125. DLT? by HArchH · · Score: 1

    Shouldn't you be looking at DLT devices for this kind of data set size?

  126. Network attached removable harddrive units by BagOBones · · Score: 1

    http://www.high-rely.com/

    We ran some of these for off siting data in rotation... Way faster than tape and designed for swapping... Might not be the best for long term storage.

    --
    EA David Gardner -"... but the consumers have proven that actually what they want is fun."
  127. It's nice by aglider · · Score: 1

    to see someone here knows this.

    --
    Sent as ripples into the electromagnetic field. No single photon has been harmed in the process.