Slashdot Mirror


Backing Up 100 Gigs in an Hour?

cybrthng asks: "I am faced with finding a backup solution capable of archiving to tape about 200 gigs of financial data in a 2 hour window. I originally looked into DLT8000 Jukeboxes with 2-4 drives but have recently discovered the new LTO drives. I am interested in knowing real world experiences with these drives as there has to be a catch. I mean there is a threefold performance increase in data transfers, a twofold increase in tape capacity and a minimal price increase overall. With these drastic differences is there something I'm giving up with LTO over DLT or vice versa? Which backup applications are more geared to handling volume and integrate with Oracle RDBMS? Restoring speed is even more critical than backup speed so I'm curious about how these two drives compare and which applications are best geared for this much data on a nightly basis. Mind you there will also be about 500 gigs of data in an end-of-week backup as well."

8 of 79 comments

  1. Why... by Stone+Rhino · · Score: 5, Insightful

    Does the backup medium have to be tape? Hard drives are in fact more reliable than some tape, and would have a faster data transfer rate. A pair of hard drives hooked into a RAID array could back up over 200 GB of data and then be taken offsite just as easily as tape. Considering the fact that the drives would likely cost $400, tops, and could be reused many more times than tapes, I don't understand why people bother with tape anymore.

    --
    Remember, there were no nuclear weapons before women were allowed to vote.
    1. Re:Why... by duffbeer703 · · Score: 5, Funny

      Hard drives may be more reliable than tapes, but when the server room has water spewing from the AC and your controllers short out, guess what?

      Your "backups" are toast.

      Floods, tornadoes, fires, etc happen. Sometimes people fly planes into buildings. When that happens, tapes are the only thing that keeps your business in business.

      --
      Conformity is the jailer of freedom and enemy of growth. -JFK
    2. Re:Why... by foobar104 · · Score: 5, Informative

      Floods, tornadoes, fires, etc happen. Sometimes people fly planes into buildings. When that happens, tapes are the only thing that keeps your business in business.

      I know this is completely off topic, but sometimes tape just isn't cost effective, particularly when you figure in the costs of manually storing and maintaining a library of data tapes in a vault somewhere. (Most of that cost is in head count: you have to pay somebody to do that work, and that's not a $19,000 a year job.)

      We're presently doing the cost analysis on a kind of radical idea. We're storing many terabytes of data in a data center in San Jose, California. The data center is as good as it can be, but there's still the danger (however unlikely) of earthquake or some other drastic event.

      Rather than trying to back everything up to data tape, we've gotten pricing from a telco on a dark fiber link between the San Jose data center and another data center somewhere in Colorado-- can't remember where. Since we're already putting an HDS 9960 in the San Jose data center, we can put an identical one in Colorado and use the 9960's internal "NanoCopy" software to keep them in sync.

      Believe it or not, it's working out to be more cost effective. One of the big reasons is that keeping that much tape on-line in a data center would require a StorageTek PowderHorn silo, and data center floor space is expensive. The difference in cost between the floor space and the dark fiber is so small that they cancel out.

      Like I said, I realize this is light-years away from what the poster was originally asking about, but it's kinda neat nonetheless.

  2. is removable necessary? by stilwebm · · Score: 4, Insightful

    You never mentioned it, so I thought I'd ask. If you don't need something easily removable, you can still have the data backed up to the other side of the data center, or possibly even the other side of the campus. With a storage array on a fibre loop you can back data up hourly, all 100GB in a full backup. Even a Gbit ethernet link could do this in under an hour, provided the link is not shared too much. Then you could run daily or twice daily tape backups off of that archive to send to your safe offsite archive location.
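
As a rough check on the Gbit ethernet claim above, here is a minimal sketch of the arithmetic. The function name is hypothetical, and the 70% link efficiency default is an assumed figure for illustration, not a measurement.

```python
def transfer_minutes(size_gb: float, link_mbit_s: float, efficiency: float = 0.7) -> float:
    """Minutes needed to move size_gb (decimal gigabytes) over a network link.

    link_mbit_s is the nominal line rate in megabits per second.
    efficiency is an assumed fraction of that rate actually achieved.
    """
    total_bits = size_gb * 1e9 * 8
    usable_bits_per_s = link_mbit_s * 1e6 * efficiency
    return total_bits / usable_bits_per_s / 60
```

Under these assumptions, `transfer_minutes(100, 1000)` comes out at roughly 19 minutes, so 100GB in under an hour looks plausible as long as the link really is mostly unshared and the disks on both ends can keep up.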

  3. Easy - History. by tdyson · · Score: 5, Informative

    Backing up to a staging array and then moving to tape is a great idea. Backing up to HD alone means you only get to keep the last couple of backups. The reason to use tape is to go way back in time. With a good grandfather-father-son rotation, you can keep a lot of history with a reasonable number of tapes. I have backups dating to my first week at this company several years ago. I can raise the dead from almost any 2 week checkpoint since then. I have had requests for e-mail that has been gone for 6 months and documents that have been gone for more than 18 months. It makes you look really good to just smile a friendly smile when somebody asks sheepishly, "I need a file from an employee who quit last Easter. I'm not really sure what it was called, but I think I know where it was on the network. Can you help me?"
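
For readers who have not met grandfather-father-son rotation, a minimal sketch of one common variant follows. The rules here are assumptions for illustration (the comment does not say which schedule is used): the last Friday of each month is the "grandfather", other Fridays are "father" tapes, and all other days are "son" tapes. The function name is hypothetical.

```python
from datetime import date, timedelta


def gfs_level(day: date) -> str:
    """Classify a backup date under an assumed grandfather-father-son scheme."""
    if day.weekday() != 4:  # Monday is 0, so Friday is 4
        return "son"  # daily tape, reused soonest
    if (day + timedelta(days=7)).month != day.month:
        return "grandfather"  # last Friday of the month, kept longest
    return "father"  # ordinary weekly tape
```

The point of the scheme is that "son" tapes are overwritten within a week or two and "father" tapes within a month or so, while "grandfather" tapes are kept long term, which is how a modest tape pool can reach back months or years.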

  4. Not such a big deal... by j.e.hahn · · Score: 4, Insightful

    It's by no means an easy feat, but the following should probably get you there.

    First, you need to consider how the data is getting to your backup server. This looks like a job for gig-e (since you don't really want to run your DBs on the same machine as your backup server). You should use multiple streams: either break the job into multiple smaller jobs or enable the multiple-streams option in your backup software if it has one. Many do. It's hard to flood even a 10base network with a single TCP/IP connection. (Your bandwidth utilization decreases in inverse proportion to your latency. I forget the exact formula though.)
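
The relationship usually quoted for this is that a single TCP connection cannot move more than one window of data per round trip, so throughput is bounded by window size divided by round-trip time. A minimal sketch, assuming the classic 65,535-byte maximum window (no window scaling); the function name and the example round-trip times are illustrative, not taken from the comment:

```python
def max_tcp_throughput_mbit(window_bytes: int = 65535, rtt_seconds: float = 0.010) -> float:
    """Upper bound on one TCP stream's throughput, in megabits per second.

    Bound: at most one full window can be in flight per round trip,
    so throughput <= window_bytes * 8 / rtt_seconds.
    """
    return window_bytes * 8 / rtt_seconds / 1e6


def streams_to_fill(link_mbit_s: float, window_bytes: int = 65535, rtt_seconds: float = 0.010) -> int:
    """Smallest number of parallel streams whose combined bound reaches the link rate."""
    per_stream = max_tcp_throughput_mbit(window_bytes, rtt_seconds)
    streams = 1
    while streams * per_stream < link_mbit_s:
        streams += 1
    return streams
```

With those assumed numbers, one stream over a 10 ms round trip is capped at about 52 Mbit/s, which is one reason several parallel streams help on a gigabit link. On a quiet LAN the round trip is much shorter and the cap is correspondingly higher.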

    Next there's how you're getting it to tape. I recommend running the backups to disk first if you can. This means you won't stall a network connection if you change tapes, or the like. But it does mean you need a lot of storage on the backup server. Also, if that's a 2-hour window from DB to tape, this might not be useful. However, barring using a SAN and snapshots (or the like), your only other option is to go straight to tape.

    To go straight to tape you'll need at least 6 DLT drives, assuming you can keep the tape streaming and get 6 Mbytes/s per drive, and you balance them across a wide enough SCSI and PCI bus (or whatever system bus you choose). This will give you N+1 redundancy and meet your bandwidth requirement of 28 Mbytes/s.
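
The drive count above can be reproduced with a few lines of arithmetic. This is a sketch using the figures from the question and the comment (200 GB, a 2 hour window, 6 Mbytes/s per drive, one spare); the function names are hypothetical and decimal units (1 GB = 1000 MB) are assumed.

```python
import math


def required_rate_mb_s(data_gb: float, window_hours: float) -> float:
    """Sustained rate in MB/s needed to move data_gb within the window."""
    return data_gb * 1000 / (window_hours * 3600)


def drives_needed(data_gb: float, window_hours: float, drive_mb_s: float, spares: int = 1) -> int:
    """Drives required to sustain the rate, plus spares for N+1 redundancy."""
    working = math.ceil(required_rate_mb_s(data_gb, window_hours) / drive_mb_s)
    return working + spares
```

Here `required_rate_mb_s(200, 2)` is about 27.8 MB/s, and `drives_needed(200, 2, 6.0)` gives 5 working drives plus 1 spare, matching the 6 drives quoted. The estimate only holds if every drive is kept streaming for the whole window.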

    As for the LTO/DLT trade-off: we're moving to an LTO solution where I work, and it generally seems to be the way to go. It's worth evaluating, but I don't think your choice of tape either way should be your restricting factor. And there is something to be said for the reliability of DLT.

  5. A different take on the HD idea by sigemund · · Score: 5, Insightful

    By no means do I know a whole lot about backup technologies or any of that, but I do have a suggestion that kind of takes a different angle on the hard drive idea.

    I understand that you would want and need to keep the data off-site on tape (requirement). However, getting that transfer rate is going to be difficult. Perhaps you could do something like this:

    Use the hard drive backup (SCSI RAID perhaps?) idea to back up the data quickly and reliably. THEN, you've got it backed up in your time limit. Now, you can back up that backup to tape, but you don't have the incredible time requirement. Get it?

    Concept:

    Original Data on Hard Drive
    --> Back it up onto a separate Hard Drive within the time limit
    --> Now, back up that hard drive that has just backed up the original. You have a backup done already, so you've met the time needed. Now, you can back it up with tape or whatever without having to do it within such a short amount of time. You can use the technology you desire to back up the hard drive copy while the original data drive keeps working.

    Then to restore, you can do it from whatever the removable media is.

    Again, I don't know a ton about this, but it's just a thought of another way to accomplish this.
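
The two-stage idea in the comment above can be sketched in a few lines. This is only an illustration under stated assumptions: a plain directory copy stands in for the fast disk-to-disk step, a tar file stands in for the tape drive, and every name and path (`stage_to_disk`, `archive_from_staging`, `financials`, `tape.tar`) is hypothetical. A real database would also need a consistent snapshot or hot-backup mode before copying.

```python
import shutil
import tarfile
import tempfile
from pathlib import Path


def stage_to_disk(source: Path, staging: Path) -> Path:
    """Step 1: fast disk-to-disk copy. Only this step must fit the backup window."""
    target = staging / source.name
    shutil.copytree(source, target)
    return target


def archive_from_staging(staged: Path, archive: Path) -> list[str]:
    """Step 2: write the staged copy to the slower medium, outside the window."""
    with tarfile.open(archive, "w") as tar:
        tar.add(staged, arcname=staged.name)
    with tarfile.open(archive, "r") as tar:
        return sorted(tar.getnames())  # list what actually landed in the archive


def demo() -> list[str]:
    """Run both steps on a throwaway directory and return the archived names."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        source = root / "financials"
        source.mkdir()
        (source / "ledger.dat").write_text("example data")
        staging = root / "staging"
        staging.mkdir()
        staged = stage_to_disk(source, staging)
        return archive_from_staging(staged, root / "tape.tar")
```

The design choice is that the production data is only read once, during step 1; step 2 reads from the staging copy, so a slow or stalled tape drive never holds up the original system.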

  6. it's straight forward by tachijuan · · Score: 4, Informative

    Here's a copy of the email I sent to cybrthng:

    Storage is all that I do for a living. Here's a quick summary of how you could do it:

    1) First you have to make sure that the drives you have your data on are going to be able to give you the read rates that you need. I highly suggest that you go with a RAID array of some sort, preferably one with substantial cache in front of it. The RAID controller should be smart enough to do what's called "read ahead" caching. That reduces the read miss ratio and speeds up sequential (i.e. backup) applications tremendously.
    2) Get whatever tape drive you choose. Base your decision on the speed of the device - use only the native performance. LTO is either 15 MB/s or 16 MB/s depending on whose drive you get. It is safe to assume that you will get about 1.2x compression. So for LTO that would get you either 18 MB/s or 19.2 MB/s. Assuming you get the 15 MB/s drive you can realistically expect to get ~64 GB/hr per drive.
    3) You have to get some sort of advanced backup package to support those rates. I would suggest that you go with either Veritas NetBackup, Legato Networker, SyncSort Backup Express, or any of the enterprise class products. Don't go with cheap software - in general they do not have the performance coding necessary to move data at very high rates efficiently. This is a hard choice, but if you stick with the three I told you, you should do fine.
    4) Get a library that can handle several drives, so that you can use them in parallel.
    5) Put each tape drive on a separate SCSI bus, or if you go with Fibre Channel put at most 3 drives on the channel. There are a ton of ways of architecting this side of the house, but in general if you stick with those numbers, you should be fine.
    6) Try not to send data over the network. Even with GigE, the effective rates are going to be drastically slower than those of direct attached devices. GigE also severely impacts the server - TCP/IP overhead is a bear for high throughput environments. There are ways around this, but that's outside the scope of this email :-).
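
The per-drive figure in step 2 follows directly from the native rate and the assumed compression ratio. A small sketch using the numbers from the list above (15 MB/s native, 1.2x compression); the function names are hypothetical and decimal units are assumed.

```python
def effective_gb_per_hour(native_mb_s: float = 15.0, compression: float = 1.2) -> float:
    """Per-drive throughput in GB/hr after the assumed compression ratio."""
    return native_mb_s * compression * 3600 / 1000


def backup_hours(data_gb: float, drives: int,
                 native_mb_s: float = 15.0, compression: float = 1.2) -> float:
    """Hours to write data_gb across several drives running in parallel."""
    return data_gb / (drives * effective_gb_per_hour(native_mb_s, compression))
```

With these figures one drive manages about 64.8 GB/hr, so a single drive would need a little over 3 hours for 200 GB, while two drives in parallel would take roughly 1.5 hours. That assumes the disks and buses from steps 1 and 5 can feed both drives fast enough to keep them streaming.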

    That should do it for you for the traditional backup methodology. There are other ways of doing backups: making mirrors that you can split off, taking snapshots, and so on. There are a ton of products that can help you with this. Some of them are software based packages that sit on the server with the data. Some of them are hardware/software devices that sit on a SAN or a NAS. Again, going into this is quite lengthy, but it can be done. I have a customer for which we are doing over 1TB/hr backups using a combination of lots of tools. Your problem is quite a bit simpler.

    --
    -- thoughts on one of those things: http://amuyu.com/