Ask Slashdot: Simple Way To Backup 24TB of Data Onto USB HDDs ?
An anonymous reader writes "Hi there ! I'm looking for a simple solution to backup a big data set consisting of files between 3MB and 20GB, for a total of 24TB, onto multiple hard drives (usb, firewire, whatever) I am aware of many backup tools which split the backup onto multiple DVDs with the infamous 'insert disc N and press continue', but I haven't come across one that can do it with external hard drives (insert next USB device...). OS not relevant, but Linux (console) or MacOS (GUI) preferred... Did I miss something or is there no such thing already done, and am I doomed to code it myself ?"
May be your limiting factor here.
Tomorrow is another day...
1.take all hard drives out of USB enclosures
2.install in PC with multiple SATA cards
3.samba
I believe you can daisy chain external drives together if you have the right cases.
For ease though, I'd consider a DroBo http://www.drobo.com/products/professionals/drobo-5d/index.php
http://www.bacula.org/en/
There's even a howto here:
http://wiki.bacula.org/doku.php?id=removable_disk
Curiosity was framed; ignorance killed the cat. -- Author unknown
Use 'dd' in linux
Are you REALLY sure that you want to use USB HDDs? The cost savings of using a box of HDDs may well be offset by the hassle in finding the backup software, the manual labor of swapping them, finding the correct drive to retrieve a certain file, etc.
How about a pair of Synology DS1512+ NASes? In addition to getting all of the storage online at all times, you get RAID support, etc.
I'm guessing you don't have enough space to split a backup on the original storage medium and then mirror the splits onto each drive?
Given the size requirements, it seems that might be prohibitive, but it would make things easier for you:
How to Create a Multi Part Tar File with Linux
Assuming you're not worried about backup speed, you could use a four-bay external hard-drive enclosure in combination with RSYNC and LVM on any linux variety. I don't know if they all do, but the MediaSonic HF2-SU3S2 supports 3TB hard drives per bay, which means that two of them could be used in conjunction to provide 24TB of backup storage. Since you can make a large volume out of the full 24TB using LVM, you could even use something like dd to write to the disk (RSYNC with the archive option would be a better choice though, imho).
For that much data you want a RAID, since drives tend to fail if left sitting on the shelf, and they also tend to fail (for different reasons) if they are spinning.
Basically: buy a RAID enclosure, insert drives so it looks like one giant drive, then copy files.
For 24TB you can use eight 4TB drives for a 6+2 RAID-6 setup. Then if any two of the drives fail you can still recover the data.
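The 6+2 arithmetic above can be checked with a small sketch. It assumes equal-size drives and the usual rule that RAID-5 gives up one drive's worth of space to parity and RAID-6 gives up two; the function name `usable_tb` is made up for illustration and filesystem overhead is ignored.

```python
def usable_tb(drives: int, drive_tb: float, level: str) -> float:
    """Usable capacity in TB for a few common layouts of equal-size drives."""
    # Number of drives' worth of space given up to parity per layout.
    parity_drives = {"jbod": 0, "raid0": 0, "raid5": 1, "raid6": 2}
    if level not in parity_drives:
        raise ValueError(f"unsupported level: {level}")
    lost = parity_drives[level]
    if drives <= lost:
        raise ValueError("not enough drives for this level")
    return (drives - lost) * drive_tb


if __name__ == "__main__":
    for level in ("jbod", "raid5", "raid6"):
        print(level, usable_tb(8, 4, level), "TB usable")
```

With eight 4TB drives, RAID-6 leaves six drives of data, which is where the 24TB figure comes from.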
Out on bail mate?
You might want to look into git-annex:
http://git-annex.branchable.com/
I've not tried it, but it sounds like an ideal solution for your request, especially if your data is already compressed.
http://www.synology.com/products/product.php?product_name=DS2411%2B&lang=uk Still portable enough to do your backup then take offsite.
Why not tape, backup RAID, SAN or some other dedicated backup hardware solution?
24TB is well within the range that a professional solution would be required.
Given a hard disk size of ~1TB, making a single backup to 24 disks isn't a backup; it's throwing data in a garbage can.
More than likely at least one of those disks will die before its time.
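A rough way to put a number on that claim, as a sketch only: it assumes drives fail independently and that each has the same failure probability over the period you care about. The 5% figure in the demo is a made-up placeholder, not a measured rate.

```python
def p_any_failure(n_drives: int, p_single: float) -> float:
    """Probability that at least one of n independent drives fails."""
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be between 0 and 1")
    # Complement of "every drive survives".
    return 1.0 - (1.0 - p_single) ** n_drives


if __name__ == "__main__":
    # Hypothetical 5% per-drive failure chance over the storage period.
    for n in (1, 8, 24):
        print(n, "drives:", round(p_any_failure(n, 0.05), 3))
```

Without redundancy, any one of those failures costs you part of the backup, so the risk grows quickly with the number of drives.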
Slashdot social media options: AIM, ICQ, Yahoo, Jabber and Mobile Text. Why no MySpace?
Evidently, our UNIX founding fathers had similar challenges...
Have a look at tar and its "multi-volume" option.
Here's a Linuxquestions thread outlining multi-disk backup strategies.
The gist of the discussion is to use DAR.
I'm not sure if you posed the question out of being naive, or if it's just being daft. You don't want to be moving 24TB over the USB bus. End of discussion really - at least in terms of USB.
However you ended up looking at USB for this, it was the wrong way to go.
You have lots of choice in terms of boxes, servers, NAS boxes, locally attached storage. 24TB is in the range of midrange NAS boxes.
Once you have this, you can start to make choices on the many backup, replication, and duplication bits of software that already exist, both free and proprietary.
We`re all equal
Porn is a renewable resource, there's no need to store so much of it.
Script your own solution for your specific problems.
That’s kinda the whole point of having a computer... as opposed to a set of appliances that happen to run on a computer you never use directly.
What you're attempting isn't easy, it's actually difficult.
Buy a cheap and big refurbished workstation or rackmount server, install a few extra SATA controllers and maybe a new power supply, hook up 12 2TB drives, install Debian, check out LVM and you're all set.
Messing around with 12 - 24 external HDDs and their power supplies is a big hassle and asking for trouble. Don't do it. Do seriously go through the possibility of building your own NAS. You'll be thankful in the end and it won't take much longer; it might even go faster and be cheaper if you can get the parts fast.
My 2 cents.
We suffer more in our imagination than in reality. - Seneca
First bash script to grab the size of the "current" storage;
compress the files up until that size;
Move compressed file onto storage;
request new storage, start again.
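The four steps above boil down to a planning loop. Here is a minimal sketch of just that loop, assuming you already know each file's size and each drive's capacity in bytes. It skips the compression step entirely and only decides which files go on which drive; `plan_volumes` and its arguments are names invented for this example.

```python
from typing import Iterable, List, Tuple


def plan_volumes(files: Iterable[Tuple[str, int]],
                 capacities: Iterable[int]) -> List[List[str]]:
    """Assign files, in order, to drives; move to the next drive when one fills up."""
    caps = iter(capacities)
    volumes: List[List[str]] = []
    free = 0
    for name, size in files:
        if not volumes or size > free:
            try:
                free = next(caps)  # "request new storage"
            except StopIteration:
                raise RuntimeError("ran out of drives") from None
            if size > free:
                raise ValueError(f"{name} does not fit on an empty drive")
            volumes.append([])
        volumes[-1].append(name)
        free -= size
    return volumes
```

Each inner list is the set of files for one drive, so a real script would copy that list, prompt for the next drive, and carry on. Files are taken strictly in order, which wastes some space at the end of each drive but keeps the restore order predictable.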
----------
Or, if you've got all the storage already connected: for i in $(seq 0 "$x"); do cp "$archive$i" "/mount/$i/"; done :D
- http://www.milkme.co.uk
... by employing a detector with a size of 2463 x 2527 pixels (6M) at 12 Hz (12 times / sec). When run continuously for a set of data (roughly 900 degrees) ...
we collect 900 frames in roughly 2 minutes including hardware limitations for starting/stopping.
In proper format for processing, this works out to about 6MB/image and roughly 3GB/min for 2 minutes.
With an experienced crew of 3-4 people ... one handling the samples, one handling the liquid nitrogen, one running the software and one taking notes (overall monitoring also) ... we can run through 600 samples in a 24-hour shift ...
Which roughly works out to about 600 x 6GB = 3.6 TB on a "working" day.
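The figures above can be reproduced in a few lines. This is only a sketch using the numbers quoted in the post (900 frames per sample, about 6MB per frame, 600 samples per shift) with decimal units assumed; the function names are invented for the example.

```python
FRAMES_PER_SAMPLE = 900   # from the post
MB_PER_FRAME = 6          # from the post, approximate
SAMPLES_PER_SHIFT = 600   # from the post


def gb_per_sample(frames: int = FRAMES_PER_SAMPLE,
                  mb_per_frame: float = MB_PER_FRAME) -> float:
    """Data volume of one sample in GB (1 GB = 1000 MB assumed)."""
    return frames * mb_per_frame / 1000


def tb_per_shift(samples: int = SAMPLES_PER_SHIFT,
                 gb_each: float = 6.0) -> float:
    """Data volume of one shift in TB, using the post's rounded 6 GB per sample."""
    return samples * gb_each / 1000
```

Unrounded, 900 frames at 6MB come to about 5.4GB per sample; the post rounds that to 6GB, which gives the 3.6TB per working day.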
To answer your question ... we never make physical copies of stuff ... the data stays online in multiple places on multiple continents ... and when something is published the data becomes publicly available in a central database
Why do you need a physical copy anyway?
USB is for a second working copy.
Backups should also ensure durability of the copy, while USB HDDs have a shorter lifespan than normal HDDs, which in turn have a shorter lifespan than tapes, the usual medium for durable backups.
Sent as ripples into the electromagnetic field. No single photon has been harmed in the process.
If you don't want to invest in new hardware, you could use DAR or KDAR (KDE front-end for DAR).
With KDAR, what you want is the slicing settings.
There's an option to pause between slices, which gives you time to mount a new disk.
And all our yesterdays have lighted fools The way to dusty death. --Will
Backup tapes were designed precisely for the problem you have. LTO-5 tapes are about 1.5TB, if I remember right. Stored correctly they shouldn't give any problems when you come to retrieve whatever is backed up. Most archiving efforts use backup tape, and they can't all be wrong :)
how transportable is that though?
I mean, if i copied 200 gig across 3 drives in a jbod raid, could i plug just one drive in to access the information on another machine? Suppose my laptop only has 2 usb ports and i do not have a hub plus i'm running a different OS, does this mean i can't look for information on the set?
I have never used JBOD for raid, I have however used regular mirrored and striped raids with and without fault tolerance (raid 5 and 10 or a mirrored stripe for instance) and know this can be a problem. In fact, I've even seen issues reading a complete raid set across systems when you aren't using a true hardware raid controller.
I have just seen "PAR" a couple of times here on slashdot, haven't used it, but it seems great for this: http://en.wikipedia.org/wiki/Parchive . You need enough redundancy to allow one USB drive to fail. And I would rather get a SATA bay and use "internal" drives than having to deal with external USB drives. Get "green" drives, they are slow but cheap.
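To show what "enough redundancy to allow one USB drive to fail" means, here is a toy sketch. It uses plain XOR parity rather than the Reed-Solomon coding that Parchive actually uses, and it treats each drive as an equal-length block of bytes, so it illustrates the idea and is not a replacement for PAR.

```python
from functools import reduce
from typing import List, Optional


def xor_parity(blocks: List[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("need at least one block, all of the same length")
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def recover(blocks: List[Optional[bytes]], parity: bytes) -> bytes:
    """Rebuild the single missing block (marked None) from the rest plus parity."""
    if blocks.count(None) != 1:
        raise ValueError("exactly one block must be missing")
    survivors = [b for b in blocks if b is not None]
    return xor_parity(survivors + [parity])
```

One extra drive's worth of parity is enough to rebuild any single lost drive; losing two at once needs a stronger code, which is what PAR2 and RAID-6 provide.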
Some sort of NAS or tape would be your best option without knowing more. How often do you need to do the "backup"? Is it really a "backup" or data replication, i.e. do you need to be able to restore the data after a serious failure? Have a look at this; it seems to have some good advice and I think it could be a solution to your issue, as I see the big problems being the amount of time and the restorability of the data after a failure. http://www.smallnetbuilder.com/nas/nas-howto/31485-build-your-own-fibre-channel-san-for-less-than-1000-part-1
A 24TB NAS is not very hard to assemble. Relatively cheap, and basically transfers data at Gb speed - assuming that you populate it with fast disks. Set one up with RAID and you're away. Personally, I would do it with a low end server and a big-ass RAID array. That way, you can really control its behaviour via the OS. Linux is ferpect for this kind of thing.
Your best bet for speed is likely to be eSATA.
Have you looked into something like this:
http://eshop.macsales.com/shop/NewerTech/Voyager/Hard_Drive_Dock
The cost becomes noise when you consider how many drives you will end up needing, and per TB, will be cheaper than USB solutions.
I don't know how your data is organized, but if possible, you may want to back it up by project/directory/etc.
There are also online backup systems that can do what you want, but it'll take an extremely long time...
Seems like a very bad idea to me. You'll have trouble creating a JBOD device without connecting all the drives simultaneously. Also, you're basically increasing the chance that the entire JBOD volume will be broken as the number of drives goes up. If you've got one drive failing, you'll be lucky to get any data back at all.
To my mind, Bacula would be a good choice as you can set up virtual tapes that will correspond to the drives and you can set the backup to wait for the operator to swap over the drive and then continue the backup. Also, once you've got Bacula installed and working, it's easy to do incremental backups and thus not need to write out the whole dataset again.
You're a temporary arrangement of matter sliding towards oblivion in a cold, uncaring universe
The iCloud! ;-)
Get an old computer... anything will work really. You have to know someone who has one lying around in their basement. Plug your drives into that. Share the drives on your network. Use any general backup software and sequentially back up what you need to back up over the network. Now it will do it overnight and you really don't care how long it takes. It can even do it every night. If you want it safe from fire and such... build a box out of 2x4s and drywall scraps from Home Depot. Make it 5 sheets thick and it'll withstand any house fire you could possibly have. If you really want to go hardcore you can pour a box out of concrete, but that'll be hard to move.
"Only wimps use tape[*] backup: real men just upload their important stuff on ftp, and let the rest of the world mirror it ;)"
Linus Torvalds (1996) http://en.wikiquote.org/wiki/Linus_Torvalds
(Isn't that prescience of "The Cloud"?)
–––––––––– ;-)
* replace this with your favorite backup media of today
I like my spaghetti with source.
It's a little late to be asking that now.
Count Bacula as your friend ;) -> http://www.bacula.org/
--- I am known for the ones who want to find me on the net. Is that a privacy risk or a privilege? One might wonder..
Sometimes the easiest way to duplicate (back up) data is to simply duplicate the hardware it's already on. If it's on a 16-disk (x 2TB) NAS system, build another one. If it's on tape, buy more tapes, if it's on random HDD's scattered all over the place, then you have bigger problems to deal with first (like building a NAS box)!
I do things like this all the time with a data set about half of that, ~ 12TB. You didn't say anything about what the data is, but from the request and the fact you mentioned USB I would gather this is your typical warez hoarding (mp3/flac, mkv, apps) plus a personal picture and video collection of the fam.
Here is a checklist I would execute, similar to mine. I find the most reliable way to keep your data over the years is by following a checklist or procedure and choosing when to move to the next storage platform.
Step 0: Get USB out of your head. Pop open the drive and attach it to the native bus, PATA or SATA. If SATA, you may want to invest in eSATA cases. It's not solely the speed. I have done stupid things like this, in which the data backup takes over 2 days, and on the 2nd day some unrelated event affecting my USB bus causes all kinds of problems with the transfer. Over time doing cheesy things like this affects other things, like doing stupid shit in real life, usually with duct tape or Gorilla Glue, then you have your wife on you. Right now your wife may not catch on to this, but it will escalate. Just do shit the right way.
Step 1: Organize. Actually understand what you are backing up. I never got into these tools like Google Desktop that allow a user to accept the fact that he/she has no idea where their files are. Understand and make an effort to organize your files before you back them up, and know the capacity of each 'genre' of crap you are backing up. Run a tool like 'jdiskreport' to find this information out after you organize. Create a mapping on paper of where shit is going, Zork style. If you have really important shit like family pictures, taking up say 200GB, and your mkv collection is 12TB, you may want to make 2x copies of your family shit. Anything you download off the internet is easily replaceable, despite how obscure your tastes may be, and will turn up again. I would question even backing it up, but that is another conversation.
Step 2. Label your drives accordingly to your documentation.
Step 3. Format the drives in the most likely native format you will use and are familiar with. If you are a noob Linux guy who runs Windows 7 all the time, don't be an idiot and experiment with your backup on ext3. It is not that ext3 is a bad filesystem, but you may not be the most skilled in restoring your data in various scenarios. For example, I'm a Linux and Solaris geek but am just getting into Macs; I'm not comfortable enough with Mac failures to store my crap on a Mac fs. Whatever your skillset is, don't use the most optimal file system on paper, use what you know, even if it is NTFS (which imo is very reliable).
Step 4. Copy your shit over using your knowledge of your data organization and native OS commands or tools.
Step 5. Run a checksum on your important stuff and store the hashes to verify everything is fine over time. Odd situations occur when backing up data. I have run into cases where I didn't realize the files I was about to back up were bad/corrupt until I saw the good copy on a backup drive I was about to incrementally overwrite.
Step 6. Store the shit somewhere else if you can reasonably do this and feel confident in the security of your data. If you have to start encrypting your crap, you add some more complexity that can affect the reliability of your restoration, but again if you proceduralize and keep up on it you will be fine.
Backup design and integrity is hard work and serious business when dealing with large volumes. It reminds me of the Seinfeld episode where he goes to the car rental place and they don't have his car and he goes into his "Anyone can take the ticket" diatribe. Anyone can back up their data. But can you get it back? I am not an expert in this area and don't pretend to be, I am just a seasoned IT administrator who has performed a lot of backups in my day and has managed to keep most of my data safe over the years.
# rsync -avz /this /that. Split your directories corresponding to the sizes of your drives. If on Linux, run smartctl -H /dev/sdX to check your disk health and, if possible, take the HDDs out of their USB enclosures and connect them directly to SATA for faster xfer speeds. These drives will 9/10 mount just like a normal drive, since usually they are just a normal drive housed in an enclosure.
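Step 5 of the checklist above (hash the important stuff, keep the hashes, re-check later) could look something like this. It is a sketch using only the Python standard library; SHA-256 and the function names are choices made for the example, not something the post specifies.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files never have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> Dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*")) if p.is_file()
    }


def changed_files(root: Path, manifest: Dict[str, str]) -> List[str]:
    """Return manifest entries that are now missing or hash differently."""
    current = build_manifest(root)
    return sorted(name for name, digest in manifest.items()
                  if current.get(name) != digest)
```

Save the manifest somewhere other than the drive it describes (as JSON, say), and run `changed_files` against both the source and the backup before you overwrite anything.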
:)
Good luck
Problem with ZFS is that he'll need a shit ton of RAM. Once ZFS eats into swap, everything goes slowmo!
Plug all the disks into a USB hub. Ensure that each one has a unique volume name eg bak1, bak2... The old skool way is to make a little tar script and use volume spanning. Otherwise, configure all the disks as a single JBOD and run DejaDup.
Excuse me, but please get off my Pennisetum Clandestinum, eh!
a rich man by NOT patenting stuff (i.e. using the GPL2 for Linux). So why shouldn't he do the same with other stuff? Also, I guess there is loads of "prior art" regarding this cloudy PR talk of today.
I like my spaghetti with source.
Private Manning, is that you?
You are welcome on my lawn.
Damn, that's a lotta pr0n!
was it that?
and to extract arj -va.
there, problem solved.
world was created 5 seconds before this post as it is.
Why go through all that? Set up a ZFS volume, and snapshot it to another ZFS volume, and then offline that. Put it on a sata cage, and you can just take it with you when it is done.
Check out cpio under Linux or many Unix flavor OS, cpio can span backups over multiple target media. Make sure to test backup AND just as important: restore.
LVM is another possibility. If you can get SATA drives and plug them all in, you can then create an LVM volume spanning all the drives and simply copy the data over to one large volume. LVM will take care of how to span it across the drives.
I have actually had to do this in an OS X environment before. We have an xserve hosting up about 30TB of data in small files, and we are scheduled to move away from the system, but we need backups in the meantime. My solution for the short-term was to create a concatenated "RAID" of 35TB worth of external hard drives connected via firewire AND usb, (the external drives range from 6TB to 12TB), and use retrospect to back up to the resulting volume. There is no room for anything but an up-to-date backup, but it's getting the job done until we move to a large RAID with offsite backup.
Apple's software RAID as configured through Disk Utility is surprisingly versatile, and though my transfer speed is slow when the data hits a USB drive, it is entirely transparent to the software when switching between FW and USB. It is also fairly robust, because if there is a hardware failure on our server, we can take the disks, plug them into another mac, and the RAID configuration is maintained without any futzing around (as the config is listed on the beginning of each volume).
Now, before everyone goes apeshit on me for using a concatenated set instead of a RAID solution, there were a couple limiting factors in my decision to concatenate rather than RAID 5/0, the major one being the range of sizes of external drives that we have, and a lack of funds available to purchase more. OS X's software RAID goes by the lowest common denominator (6TB in my case), so I would lose ~1/2 to 1/3 of my space if I used ANY of the RAID options, and I didn't have any space to spare.
I feel your pain, and good luck.
I guess there could be issues with space while making the rar files, but they can break the archive up into chunks of any size you desire. You will need all of them accessible to unpack them again though. Perhaps it isn't the greatest solution, but it may do what the poster wants.
-- ssoorrrryy,, dduupplleexx sswwiittcchh oonn.. -Quote found on actual fortune cookie.
Just put it in the cloud... *rimshot*
A cloud backup service released information on how they build their own disk based backup servers. Maybe something that would help with your endeavor? http://blog.backblaze.com/2011/07/20/petabytes-on-a-budget-v2-0revealing-more-secrets/
Get a PC with 6 USB 3 ports, connect a powered, 4-port USB 3 hub to each PC USB port. Then connect 24 1TB external USB HDDs (or SSDs) to the hubs, format as necessary and run your backup software.
Thunderbolt may be higher performance and have daisy chaining capabilities. But the USB solution should work just fine.
I work for a data backup company as a dev monkey/admin/jack-of-all-trades.
Do you ever want to restore these backups? If the answer is "yes" (and it should be, otherwise why are you backing up in the first place...?), then you need to be guarded against failure of an individual disk. That means you need some sort of RAID solution.
For reference, Datto's 3U nodes store 20TB across 14 2TB drives, and the next larger size of node we have is somewhere around 55TB in 4U. No, I'm not trying to sell you our hardware (we only sell to resellers anyway) but hear me out. You really are going to save yourself some headache if you build a NAS device.
USB 2.0 is SLOW AS BALLS. I see our USB seed drives (HDDs we mail out to customers to get their initial datasets up into the ether) max out at 20-30MB/sec on a good day. By comparison, Gigabit Ethernet will give you 112MB/sec after NFS/TCP/Ethernet overhead -- much better. For this reason, and because it's just so impractical to handle large collections of failure-prone USB drives, our largest round trip drive that is shipped as USB is 4TB. After that, we actually ship our customers NAS devices (usually a returned/development box with a different OS image on it).
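To see what those throughput numbers mean for 24TB, here is a back-of-the-envelope sketch. It assumes decimal units (1TB = 1,000,000MB) and a perfectly sustained rate with no pauses for swapping drives, so real runs will be slower.

```python
def transfer_hours(total_tb: float, mb_per_sec: float) -> float:
    """Hours needed to move total_tb terabytes at a sustained mb_per_sec."""
    if mb_per_sec <= 0:
        raise ValueError("throughput must be positive")
    total_mb = total_tb * 1_000_000
    return total_mb / mb_per_sec / 3600


if __name__ == "__main__":
    for label, rate in (("USB 2.0", 25), ("Gigabit Ethernet", 112)):
        hours = transfer_hours(24, rate)
        print(f"{label}: {hours:.0f} hours ({hours / 24:.1f} days)")
```

At 25MB/sec the full 24TB takes roughly 11 days of continuous copying; at 112MB/sec it is about two and a half days.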
Go with NAS. You need the resilience against disk failure, you need the additional speed, and while yes, it's a greater investment, the alternative is utter agony when one of your 12 2TB disks takes a dump.
I know you are likely trying to do this for a cheap alternative, but just don't. It is really an unworkable solution for that amount of data.
Some have mentioned Tape, which I know very little of. However I would simply build another RAID machine to copy to, or use a NAS if you can find one big enough, as it amounts to pretty much the same thing, but more specialized.
If this isn't sensitive data, another option might be to cloud it. Amazon and a few others have some competitive prices. The advantage here is you additionally get off site backup.
I guess one of the key factors in your decision will be how often this 24TB of data changes. Will it only get occasional updates, or will a big chunk need to be backed up regularly? That is the other question: how often will you need to back up? Lastly, how quickly do you need recovery?
Problem with ZFS is that he'll need a shit ton of RAM. Once ZFS eats into swap, everything goes slowmo!
RAM is free, or nearly so.
Sans Digital Makes an 8 slot drive enclosure with either a PCI-E or USB 3.0 interface for about 350 bucks. Put 8 3tb drives in it, run it JBOD. You can buy the cheap 3tb drives because you're going to run them JBOD. At 150 bucks a drive, Your total cost is about $1600.
You might be able to get Windows to do Incrementals to those drives, although I haven't tried it myself. And remember to run the enclosure sparingly, because non-enterprise drives aren't rated for the same number of spin-up hours.
Of course, it's not as safe as putting everything on a billion optical disks. But even using a BD-rom (at 46gig a pop), you're talking about 534 Blu-rays, and that's pretty much ridiculous, unless you have an intern you really dislike or something.
USB seems inane and insane for that level of data. How redundant is this 24TB of data as well? Running it through a data deduplicator (possibly to reduce storage requirements, depending on the type of data) and then onto a tape drive or RAID array may be a cheaper and more time-effective option.
Don't use btrfs. My approach (I only have 8Tb) was two low-powered linux file servers. The "main" one was running btrfs over a mixed set of disks with a nightly rsync to the backup server. A power outage that was more than my UPS could handle resulted in a corrupted btrfs filesystem. After a couple days of trying to fix the btrfs filesystem, I gave up and restored from the backup server. Fortunately it was using LVM and ext4.
Now I have the "main" fileserver running LVM and ext4, and a backup server running FreeBSD with zfs. The two are in physically different locations, and I use rsync with --only-write-batch and a USB hard drive to move changes from the main server to the backup.
"We have nothing in common, your attitude annoys me, and your political views are appalling."
"I'm looking for a simple solution to backup"
And USB drives are your idea of simple? Seriously? Please hand the lady your Admin card at the door when you leave.
For 24TB, if you want to have a job after someone asks you to restore a chunk of that, you'll want to insist on tape. Or perhaps an equally sized NAS or SAN array. USB? Hope your resume is up to date.
You buy one of these:
http://www.newegg.com/Product/Product.aspx?Item=N82E16816322007
populate it with 4TB drives and create two RAID5 arrays (or one RAID6 array), then you've got 24 or 28 TB of backup space, without having to change drives or break up your backup into smaller chunks.
But really, your backup methodology is broken; you need to organize the data into manageable chunks because aside from a large dedicated backup server/SAN, there is no reliable (don't tell me tape is reliable) backup solution for a such a large quantity of data in a single chunk.
What I do for backups: in my 24-bay server I have eight large drives in a (HARDWARE) RAID5 array (were 4TB drives available at the time I'd have gone RAID6) and rsync the virtualized server contents to that, then archive them into tarballs, and send copies of them across the LAN to another server that is running (HARDWARE) RAID5 as well. Every once in a while I back up the critical data (source, scripts, financial data, production web sites, /etc, and so forth but not the program binaries nor system binaries which are easily recreated or reinstalled, respectively) to optical media and external hard drives.
So what I have in summary is:
* Massive server with a backup array separate from the production array
* Separate backup server running another array (again, using a quality HARDWARE RAID controller. Safeguard your data and don't bother with Intel, Adaptec, Promise, or Highpoint "hybrid" RAID)
* Periodic backups of non-recreatable data to USB drives and optical media that are moved off site.
The Christian Right is Neither (Christian nor right). See: Matthew 23, Matthew 25, Ezekiel 16:48-50
Plug the drives in. Tell Win8 to treat them as one large drive. Good to go.
If you're willing to deal with the time it will take to write it all out, then it's doable. You need backup software that supports VTL (virtual tape library). With this, the physical drives are seen as tape devices. So it will start writing to drive #1, and when it's full it will say "out of media" and it *should* pause for new media. You "eject" the drive, attach a fresh one, and hit continue. Then wash, rinse, repeat til complete.
As others pointed out, it will take some time. You can speed it up with eSATA or USB 3. If you're on a Mac, you can speed it up using t-bolt. I believe Arkeia still offers a free version and they did/do support VTL. Haven't been current on free backup wares for a while.
One thing to bear in mind as well: once you write this 24TB to a collection of media, any single media failure will result in all data being unrecoverable. So you might opt for doubling your backup window and making a duplicate copy.
Otherwise your best bet is to put all the drives in a NAS configuration (think FreeNAS) with a RAID6 structure, then have the backup s/w use this as its destination. You could do this with an 8 drive chassis of 8x4TB SATA disks (2 lost for RAID6, leaves 6x4TB=24TB usable). A similar idea could be accomplished with ZFS, but its future is somewhat unknown with Oracle these days. If you need longevity, I'd stick with a more open/compatible filesystem. If you manage to set it up correctly and use exFAT, you could mount the backup volume on any current Linux, Windows, or Mac system, and if the backup s/w runs on all platforms you'd have a lot more compatibility and recovery options.
we do much the same thing. we have a backup nas that we then rsync to a set of "offsite" drives.
My recommendation would be to investigate ZFS. (picture software raid and LVM rolled into one with filesystem encryption and compression built in.) Easy to compile and install on linux.
We created a pool for the offsite drives, then rsync to that file system. "Export" the file system and take the drives out. (Hot swap in trays, buy extra trays for rotation drives.) When you need to put in the next set just put them in and import it. Order and placement does not matter as long as enough drives are in. You could even have one or two parity drives in case a drive fails.
We have a cron job that rsyncs to the offsite drives, then exports them and emails the admins that it is ready for rotation. We keep 2 sets, one is in all the time and the other rotates offsite. You can swap on whatever schedule you are comfortable with. With compression, depending on data you could easily cut your drive requirements in half. Turn on encryption to keep your porn safe while in transit. All you need is a hot swap JBOD chassis. you could backup directly to the removable filespace, or do what we do, backup to a set local (local to datacenter, not to machine) filespace and rsync it over regularly.
It is something else to learn, etc. But it is a system that works well.
I don't think 12 gig is a shit ton. if you are not doing dedupe ram usage is reasonable. and there is no reason to do dedupe on this.
If you've got that much data, with a setup like that, you can afford to buy something better than USB. Consider eSATA, though I, personally, would push for a simple, fast backup server.
However you go, and I admit to not having read all the comments, you didn't mention how often the backups need to occur. Here, where we've got terabytes of data on many systems, we do a nightly rsync, and use hard links, which speeds it up and decreases space usage.
mark
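For anyone wondering how the hard-link trick saves space: each night's snapshot is a full directory tree, but files that have not changed since the previous snapshot are hard links to the same data instead of fresh copies. rsync can do this itself with its --link-dest option; the sketch below only imitates the idea in Python, comparing size and modification time the way rsync's default quick check does. The function names are invented for the example, and symlinks and special files are not handled.

```python
import os
import shutil
from pathlib import Path
from typing import Optional


def _same(a: Path, b: Path) -> bool:
    """Quick check: same size and same modification time (to the second)."""
    sa, sb = a.stat(), b.stat()
    return sa.st_size == sb.st_size and int(sa.st_mtime) == int(sb.st_mtime)


def snapshot(source: Path, dest: Path, previous: Optional[Path] = None) -> None:
    """Copy source into dest, hard-linking files unchanged since previous."""
    dest.mkdir(parents=True, exist_ok=True)
    for src in sorted(source.rglob("*")):
        rel = src.relative_to(source)
        target = dest / rel
        if src.is_dir():
            target.mkdir(parents=True, exist_ok=True)
            continue
        target.parent.mkdir(parents=True, exist_ok=True)
        old = previous / rel if previous is not None else None
        if old is not None and old.is_file() and _same(src, old):
            os.link(old, target)       # unchanged: share data with the old snapshot
        else:
            shutil.copy2(src, target)  # new or modified: store a fresh copy
```

Hard links only work within one filesystem, so all snapshots have to live on the same backup volume.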
"I mean, if i copied 200 gig across 3 drives in a jbod raid, could i plug just one drive in to access the information on another machine? Suppose my laptop only has 2 usb ports and i do not have a hub plus i'm running a different OS, does this mean i can't look for information on the set?"
This falls outside of what OP is requesting. He just wants to backup 24TB of data onto multiple USB drives.
USB can support up to 127 devices connected to a single host controller, so with a few hubs OP could connect all the drives he'd need for the back up all at once. I've run my own drives via external USB for a time, probably around 8TB of various sized drives using cheap USB-to-SATA adapters ($3.00 on ebay), and cheap 7-port USB hubs ($5.00 on ebay). It's not the fastest solution but it never gave me a problem. It was an experiment to see how many drives I could hook up with cheap Chinese parts. I had it running for a year before I started switching things over to USB3.
42
Buy 8 x 4TB drives + this enclosure
http://www.amazon.com/Mediasonic-H8R2-SU3S2-ProRaid-External-Enclosure/dp/B005GYDMYQ
Armaments, 2-9-21 And Saint Attila raised the hand grenade up on high, saying, 'O Lord, bless this Thy hand grenade' N
If it turns out that the source data is not porn (unlikely) and is highly compressible
Would dirty photos of his blow-up doll count as "compressible?"
Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
quoting Linus again: "First off, I'm actually perfectly well off. I live in a good-sized house, with a nice yard, with deer occasionally showing up and eating the roses (my wife likes the roses more, I like the deer more, so we don't really mind). I've got three kids, and I know I can pay for their education. What more do I need? The thing is, being a good programmer actually pays pretty well; being acknowledged as being world-class pays even better. I simply didn't need to start a commercial company. And it's just about the least interesting thing I can even imagine. I absolutely hate paperwork. I couldn't take care of employees if I tried. A company that I started would never have succeeded – it's simply not what I'm interested in! So instead, I have a very good life, doing something that I think is really interesting, and something that I think actually matters for people, not just me. And that makes me feel good." http://en.wikiquote.org/wiki/Linus_Torvalds
I like my spaghetti with source.
no text
I like my spaghetti with source.
serious question: i am getting to the point where i will have 24 x 2tb drives connected. i have heard a rule of thumb you need 1 gb of ram per 1 tb of drive space, will i really need ~48 gb of ram? am considering a server grade board that can support 32gb ...
Your UPS wasn't set up correctly; it's supposed to signal the server to shut down gracefully before the battery goes dry.
Use FreeNAS to manage RAID on the array. And rsync. Yes, you may have to do some handiwork yourself. GTG!
I come to Slashdot only to read sigs. One you are reading is mine.
I know this has nothing to do with USB and maybe the OP has very good reasons for wanting it on USB. In any case...
Amazon S3 pricing:
First 1 TB / month: $0.125 per GB
Next 49 TB / month: $0.110 per GB
(1,000 x 0.125 + 23,000 x 0.11) * 12 = about $31,860 per year for 24 TB, since those prices are per GB, not per TB. That's a lot more than buying a bunch of hard drives.
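For anyone checking the arithmetic, here is a minimal sketch of the tiered calculation. It applies the two prices per GB exactly as listed and assumes 1 TB = 1,000 GB (my assumption; using 1,024 would push the total slightly higher). `TIERS` and `monthly_cost` are illustrative names, not anything from Amazon.

```python
# Tiered storage pricing as quoted in the post (USD per GB per month).
# Assumption: 1 TB = 1,000 GB.
TIERS = [
    (1_000, 0.125),   # first 1 TB
    (49_000, 0.110),  # next 49 TB
]


def monthly_cost(total_gb):
    """Walk the tiers in order, charging each slice at its own rate."""
    remaining = total_gb
    cost = 0.0
    for tier_gb, price_per_gb in TIERS:
        used = min(remaining, tier_gb)
        cost += used * price_per_gb
        remaining -= used
        if remaining <= 0:
            break
    if remaining > 0:
        raise ValueError("data set is larger than the tiers listed")
    return cost


monthly = monthly_cost(24_000)  # 24 TB expressed in GB
yearly = monthly * 12
print(f"${monthly:,.2f} per month, ${yearly:,.2f} per year")
```

Under those assumptions 24 TB works out to roughly $2,655 per month for storage alone, before any transfer or request charges.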
Dear Slashdot: next time you want to mess with the site, add a rich-text editor for comments.
"I'm just a simple caveman, ..." with a mainframe background, so I have a question of curiosity here
At what point does the bandwidth/throughput of the DMA start limiting the performance of your backup?
In my world, DMA for I/O is called a "channel". We have many, and while there are a lot of nuances we could discuss, basically we try to segregate the I/O for the input to backup (disk) and the output of backup (usually tape), and have the backup task process in parallel as much as possible. My nightly backup, for example, runs 9 parallel tasks, 9 being the limit that this particular backup program has. I could run multiple instances of the program, but then I have to have mechanisms to make sure I don't back up the same disk twice between two concurrent executions; with one instance and 9 tasks I can just say 'back up everything that's online at the moment'. So, the throughput is limited by the performance of the slowest devices, multiplied by the parallelism we are able to achieve. In the PC / server environment, does the DMA limit the I/O capability?
Backup Exec does exactly what you are asking for. Free 30 day trial.
I ran across a FUSE module (mhddfs) that seemed relevant when I wanted to combine several USB drives into a single file system. My main goal was to make each drive usable independently for file recovery if I had to move it to another system.
The module appears to be a fairly thin wrapper over an existing file system. It only appears to choose which of the sub-file systems to write new data to, automatically writing files to whichever drive has the most space. This provides nothing in the way of redundancy, however.
What is nice is that you can easily access the files on a drive without needing the other drives. May be helpful for someone.
http://romanrm.ru/en/mhddfs
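To make the placement rule concrete, here is a minimal sketch of "write each new file, whole, to whichever drive has the most free space", as described above. It is not mhddfs's actual code, and `free_bytes`, `pick_target` and `store` are hypothetical names of my own. Because every file lands intact on exactly one drive, each drive stays readable by itself.

```python
import os
import shutil


def free_bytes(path):
    """Free space, in bytes, on the filesystem that holds path."""
    return shutil.disk_usage(path).free


def pick_target(mount_points, free_space=free_bytes):
    """Return the mount point that currently has the most free space."""
    if not mount_points:
        raise ValueError("no drives given")
    return max(mount_points, key=free_space)


def store(src_path, mount_points):
    """Copy one file, whole, onto the emptiest drive; return the new path."""
    target_dir = pick_target(mount_points)
    dest = os.path.join(target_dir, os.path.basename(src_path))
    shutil.copy2(src_path, dest)  # copy2 also preserves timestamps
    return dest
```

Like the module itself as described, this gives no redundancy: lose a drive and you lose the files that were placed on it.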
Honestly, I know it isn't your question, but skip USB. Too slow. WAY too expensive. Get yourself a RocketRAID card or similar, a SAS expander, and an 8+ trayless disk enclosure. I use a 12 disk enclosure (8 for regular backups, 4 for all the one off stuff I do) with 2TB drives. I wrote a program in Java using NIO that stripes the backups across the disks so that it can saturate the bus. A solution like this will ultimately be faster and cheaper. One day I will port the code to native as the Java program was just a proof of concept that has worked so well I haven't gotten around to it. This setup works far better than the VXA-3 tape backup we were using before, and I couldn't imagine having to do it with USB drives, either from a cost or a logistics perspective.
That's a load of crap brought to you by the people who would rather that you pay 10 times as much for 1/10th the performance and 1/10th the capacity.
But let me rebut that in a more logical fashion. Tapes take considerably longer, meaning the backup strategy ends up backing up less than is optimal.
Sure, disk backups are fragile. But if your system is going to be borked by the failure of a backup volume or two, then I would posit that your backup strategy is a disaster waiting to happen regardless of the media that you use.
USB will copy files, but not identical copies. Firewire is better.
But the best/cheapest solution is a Dell MD-1000. It will take 2TB generic drives.
if you build a small system, cheapish, an ITX board with 6 SATA ports, each connected to a port multiplier, os on flash... you could have 30TB from 1TB drives. set it all up as an lvm2 volume, then you can slap the drives back in a new system any way you want, and they'll come back up in the right order. rsync(backupmypc) will keep the backup in good condition, you'd of course need spares, in case the verify shows a drive failure, a duplicate system. using linux raid to turn the two nfs mounts into a raid 1 array would be nice, but parity would be better, yet it'd take even more drives. yah, bigger drives would be a good first start.
this sounds exactly like what the guy is looking for.
world was created 5 seconds before this post as it is.
Step 1: Buy yourself something like this: http://www.aberdeeninc.com/abcatg/Stirling-X339.htm
Step 2: Install it
Step 3: rsync
Step 4: Go do something else -- this is going to take a while
In Reason We Trust
At $35 each you get a dozen Raspberry Pis.
While not fast you have a USB port and can connect them
via ethernet and ssh and start tinkering.
A good USB hub can turn one USB to four
The local Costco has 3TB USB disks. Yes
you have to organize your data into 2.8TB chunks
or so with some script foo but rsync can help
verify the bits.
N.B. this is 10/100 ethernet not GigE and USB2 (at best)
and they share a single USB link to an onboard USB hub.
But you could automate the thing and not have to
swap out USB cables for a week.
MD5 checksums and an index...
Let us know how it goes. ;-)
No matter what you do you will have to do some
scripting. Do label each of the USB disks
(physical and logical names that match).
Did someone say that this was a marginal
idea? Backing up to USB has some value but does not
sound magical and error free.
Since 24TB is a lot of junk -- good luck
but with the crazy big USB disks -- what the hey.
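The "script foo" part might look something like this, as a rough sketch only.
It assigns whole files to disks with a first-fit-decreasing pass, and streams
each file through MD5 for the index. `plan_disks` and `md5_of` are names I made
up, and the capacity you pass in (say 2.8TB in bytes) is your call.

```python
import hashlib


def plan_disks(files, capacity):
    """First-fit decreasing packing.

    files: dict mapping file name -> size in bytes.
    capacity: usable bytes per disk.
    Returns one list of file names per disk.
    """
    disks = []  # each entry is [free_bytes, [names]]
    by_size = sorted(files.items(), key=lambda kv: kv[1], reverse=True)
    for name, size in by_size:
        if size > capacity:
            raise ValueError(f"{name} does not fit on a single disk")
        for disk in disks:
            if disk[0] >= size:  # first disk with enough room wins
                disk[0] -= size
                disk[1].append(name)
                break
        else:
            disks.append([capacity - size, [name]])  # start a new disk
    return [names for _, names in disks]


def md5_of(path, block_size=1 << 20):
    """Hash a file in 1 MiB blocks so a 20GB file never sits in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()
```

Write the plan and the checksums to an index file kept somewhere other than the
backup disks, and label each disk to match its position in the plan.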
Truth is stranger than fiction, but it is because Fiction is obliged to stick to possibilities; Truth isn't. Mark Twain.
Shouldn't you be looking at DLT devices for this kind of data set size?
http://www.high-rely.com/
We ran some of these for off siting data in rotation... Way faster than tape and designed for swapping... Might not be the best for long term storage.
EA David Gardner -"... but the consumers have proven that actually what they want is fun."
I take it you have never logged into CERN.
to see someone here knows this.
Sent as ripples into the electromagnetic field. No single photon has been harmed in the process.