What is Your Backup Policy?
higuita asks: "A few days ago, I was asked to check our backup policy, how it is being applied, and to try to make it safer and more useful. Being new to the company, I started to check what is being done right now and found several problems. Since I don't have much experience with enterprise backups, what are the most used backup policies, software and global ideas about this issue? We have fewer than 1000 workstations (Windows and Macs), about 20 Oracle and Exchange servers (split between Windows, Solaris, and Linux), and it all needs to be backed up. Right now, we use HP Data Protector with several tapes, where most things have a weekly full backup and daily incremental backups, and most full backups are archived permanently in a safe we have for this purpose. We also have off-site storage for backups. What practices and policies do Slashdot users implement for backups they perform at their office (I am not interested in home backup practices)?"
"I've investigated Veritas NetBackup and other solutions, and I'm also curious if Amanda could be better or at least approximate the features offered by HP Data Protector. What backup software have you used that you found enjoyable with the least bit of hassle?
I've thought about using Dirvish to back up the users' homes to a cheap server with several HDs, and only back up to tapes once every 15 days or even once a month. They will lose their Windows permissions, but I don't think that matters much, since this is just for safekeeping the users' work. I thought about making full backups of the servers every 15 days with daily incremental backups. This way I will free up tape drives' time and gain more flexibility with the backup schedule.
I would love it if users worked off of file servers, but right now this just isn't possible. It's a planned addition that we still don't have the time to make."
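One trade-off in moving from weekly to 15-day fulls is the length of the restore chain. Here is a rough Python sketch of that arithmetic, assuming each incremental holds only the changes since the previous backup of any kind (some products call that "incremental", others behave differently). The function names and the day-index convention are made up for illustration.

```python
def backup_type(day_index: int, full_every_days: int) -> str:
    """Return which kind of backup runs on a given day (day 0 is a full)."""
    if full_every_days <= 0:
        raise ValueError("full_every_days must be positive")
    return "full" if day_index % full_every_days == 0 else "incremental"


def sets_needed_to_restore(day_index: int, full_every_days: int) -> int:
    """Count backup sets read to restore to day_index.

    Assumption: a restore needs the most recent full plus every
    incremental taken since that full.
    """
    if full_every_days <= 0:
        raise ValueError("full_every_days must be positive")
    return 1 + day_index % full_every_days


if __name__ == "__main__":
    # Worst case is the day just before the next full.
    print("weekly fulls:", sets_needed_to_restore(6, 7), "sets")
    print("15-day fulls:", sets_needed_to_restore(14, 15), "sets")
```

Under that assumption the longer cycle frees tape drive time, but the worst-case restore reads roughly twice as many sets, and any one bad incremental in the chain hurts.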
I can't think of any good reason to do that. All the important data should be on the server. If the user wants to save a picture on the local disk to use as a background or something that's one thing (although I wouldn't allow that myself) but nothing important should be on those disks.
Past that, I don't have the experience to help you. All I can do is reiterate what another poster has already put up. Check the backups. I can't tell you how many stories I've heard about backups that "went fine" until someone needed data. Stories where the tapes were so old they almost shredded themselves in the drives. Stories of "backing up" for at least 6 months onto a cleaning tape (I bet the drive was in good condition though!). Stories of the backup data being garbage because of a faulty cable or something. The backup is worthless if you can't get the data back off it successfully.
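For what "check the backups" could look like in practice, here is a minimal Python sketch: do a test restore into a scratch directory and compare checksums against the source. The function names and directory layout are assumptions for illustration; it is not tied to any particular backup product, and it only makes sense for data that has not changed since the backup was taken.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def tree_digests(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its digest."""
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_restore(source_root: Path, restored_root: Path) -> dict[str, list[str]]:
    """Compare a test restore with the source tree.

    Returns files missing from the restore and files whose contents differ.
    """
    src = tree_digests(source_root)
    dst = tree_digests(restored_root)
    return {
        "missing": sorted(set(src) - set(dst)),
        "mismatched": sorted(k for k in src.keys() & dst.keys() if src[k] != dst[k]),
    }
```

An empty "missing" and "mismatched" list on a regular test restore is worth far more than a backup log that says the job "went fine".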
Comment forecast: Bits of genius surrounded by a sea of mediocrity.
I think you're jumping the gun a little here.
The first question you need to ask is:
What is the time frame for your servers to be restored in should servers and such completely fail?
If you don't know the answer to that question, then how does your company know how much money to budget? Are you bound by HIPAA or Sarbanes-Oxley? You should know how much your company's data is worth prior to assigning a budget.
Are some of your database servers supposed to be up 24x7? Maybe you should look at distributed transactions across databases located at different sites so if one server fails you still have everything live? Have you timed how long it takes to rebuild your servers to confirm your allotted time in your disaster recovery plan? Has your company considered imaging servers? Is it possible to?
Have you consulted your disaster recovery plan? Have you checked with suppliers to see how long replacement parts will take to order? I can't tell you how many administrators get caught out by buying an expensive tape drive only to have it fail along with the server and nothing can be restored until a new one can be sourced.
Without requirements and a disaster recovery time frame, you will never be in control in the event of a disaster.
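As a back-of-the-envelope illustration of the time-frame question, here is a small Python sketch that adds up hardware lead time, rebuild time and data transfer time, then compares the total with a target. All names and figures are hypothetical; real numbers have to come from timing an actual restore and asking your suppliers.

```python
def estimated_restore_hours(data_gb: float, throughput_mb_s: float,
                            rebuild_hours: float = 0.0,
                            hardware_lead_hours: float = 0.0) -> float:
    """Estimate total recovery time in hours.

    Sums the wait for replacement hardware, the time to rebuild the
    server, and the time to stream the data back at a sustained rate.
    """
    if throughput_mb_s <= 0:
        raise ValueError("throughput_mb_s must be positive")
    transfer_hours = (data_gb * 1024) / throughput_mb_s / 3600
    return hardware_lead_hours + rebuild_hours + transfer_hours


def meets_target(estimate_hours: float, target_hours: float) -> bool:
    """True if the estimate fits inside the agreed recovery time frame."""
    return estimate_hours <= target_hours


if __name__ == "__main__":
    # Hypothetical server: 500 GB, 30 MB/s sustained restore, 3 h rebuild.
    spare_on_shelf = estimated_restore_hours(500, 30, rebuild_hours=3)
    drive_on_order = estimated_restore_hours(500, 30, rebuild_hours=3,
                                             hardware_lead_hours=48)
    print(f"spare on shelf: {spare_on_shelf:.1f} h, ok: {meets_target(spare_on_shelf, 8)}")
    print(f"drive on order: {drive_on_order:.1f} h, ok: {meets_target(drive_on_order, 8)}")
```

The point of the second case is that the hardware lead time can dwarf everything else in the estimate.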
Your company's board of directors/owners will need this information. It's called operating under conditions of "due care and diligence".
If something goes wrong and you can't tell your boss exactly what is required and how long it will take to recover then you're working in the wrong job - a big part of being a network administrator is planning for ANY event.
Oh, most of the time my customers are happy with Robocopy. I hate paying for expensive hardware and backup software solutions when I can write something much simpler and document it properly rather than depending on someone else's buggy software. Of course this depends on the industry and their requirements.
Make sure that your boss completely understands these questions and issues. Ask him to see the current Business Continuity plan and Disaster Recovery documentation before you touch anything on those servers - can't stress that enough.
Hope that helps, sorry it's brief but if you're in charge of backups it's your job to be ANAL and PEDANTIC.
"You'll need to identify each application that is being used, where its data is being stored and what type of "backup" is needed for it."
I second this. Nothing's worse than someone telling you "back up this system, full once a week, incrementals every other day, all local drives, blah blah" and then not telling you they've got some database on it (you can't back up a live database by just copying the files.) Of course, when failure hits, guess what needs to be restored and isn't usable?
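To illustrate the "don't just copy the files" point, here is a minimal Python sketch using SQLite as a stand-in, since its online backup API ships in the standard library. This is only an analogy: Oracle and Exchange have their own backup mechanisms, which are not shown here, and the function names are made up for the example.

```python
import sqlite3


def consistent_copy(live_db_path: str, backup_path: str) -> None:
    """Copy a live SQLite database through the engine's online backup API.

    Unlike a plain file copy, this yields a consistent copy even if
    the database is in use while the backup runs.
    """
    src = sqlite3.connect(live_db_path)
    dst = sqlite3.connect(backup_path)
    try:
        src.backup(dst)
    finally:
        dst.close()
        src.close()


def is_usable(db_path: str) -> bool:
    """Run SQLite's integrity check on the copy before trusting it."""
    conn = sqlite3.connect(db_path)
    try:
        (result,) = conn.execute("PRAGMA integrity_check").fetchone()
        return result == "ok"
    finally:
        conn.close()
```

The same two-step habit applies whatever the database: use the engine's own backup path, then prove the result opens.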
BMR (bare-metal restore) has been standard for years.
I've seen attempts to build large enterprise backup environments with "simple open" software. They melt down somewhat short of the size that the original questioner is asking about, typically.
I've built environments with NBU and used Legato, at large sites (much larger than the original questioner). They just work. Configuring them initially can be non-trivial if you have no prior experience with them, but once set up right they just work.
Throwing a bunch of open source tech at the wall and seeing if it sticks will kill you here. I've been at places which were big enough to use 40, 60, 180, multi-hundred, 5000 tape changers. They use professional grade stuff. It works. If you can't make it work, don't go to work for large sites. Don't wire your whole five hundred cube building up with daisy-chained 8 port 10/100 switches, and don't use toy backup equipment if you're an enterprise class environment. Data backups aren't a tech game: they're how you survive the statistically likely disk outages and statistically unlikely building fires/floods/earthquakes/tornados/etc. This is important, and half-assed solutions shouldn't apply.
You missed a few:
#4: User deletes a file deemed by somebody important to be critical and you have to get it back.
It's amazing how much money is spent planning for the once-in-a-lifetime Twin-Towers disaster event, and how little is spent on the daily occurrence of user error. Unfortunately "User is an idiot" doesn't wash when it's the company's financial records or the birthday party shots of the CEO's kid.
- Don't permit users to save things to their local disks. Ensure all files go onto a share that can be centrally backed up. Important people (CEO, COO, etc) need to be treated as exceptions and have their personal PCs, PDAs and even phone memories backed up somehow.
#5: Your CFO is found to be embezzling money from the company, and you have to show compliance with whichever standards.
This is actually not a backup issue, but an archiving issue, but should be addressed at the same time as backup to ensure you have no holes in your solution.
You've opened a huge can of worms, but one that rightly should be opened. My advice to you is to call in EMC, Sun StorageTek and Symantec and get their presales engineers to do as much legwork as possible before they try to turn it into a chargeable engagement. Having all three in there means th
(DON'T call HP - Omniback is a dog). This will give you enough information to present to your manager or whoever is appropriate and get some idea of what budget they will give you. That's going to be your single biggest constraint. Backup/DR/BC is something that will easily absorb all the cash that you throw at it.
Just remember - No matter what EMC say, Tape's not dead - not even close - though it is no longer necessarily the best solution for quick restores of recently changed/deleted information.
There are experts in this stuff - trust me I am one - and we get paid a shitload. Trick is, we don't really know that much more than you, we just do it every day. Exploit vendors' presales engineers. That's what they are there for.
Norman Cook's Ode to Sl
I use plenty of stuff for which I have the source code. Going back to the 4.2mumble BSDs, through SunOS, Linux, Solaris, the various x86 BSDs, and plenty of applications (this is Mozilla I'm /.ing with, and before that a long line of other open source browsers). I have no problem with installing large Linux farms, using Apache for an enterprise web deployment, using MySQL for moderate sized databases (or PostgreSQL, though I haven't deployed it personally).
Tape backup... NBU wins. Legato's a close second. Sorry, charlie. Open source as a category does not suck. The open source backup stuff doesn't suck, for small to medium sized sites. It's not enterprise class, though, and most of the trick to succeeding in IT is knowing when the tools you use aren't applicable anymore and how to figure out what are.
NBU can't RAIT, but it can stream across multiple tapes, and can write duplicate tapes if you want redundancy. And you can extract the files off tape with tar if you have to.
Amanda certainly doesn't suck, but it's not NBU.
I would suggest, at minimum, different zip codes; different time zones would be best.
Sounds funny but very true. Backups across town aren't terribly useful if across town is flat too. Sound farfetched? Ask a sysadmin in Miami how far off he ships his backups. If he was there when Andrew visited, I'll bet they're in New Mexico.
This may seem a tad offtopic, but it is relevant:
You have to think through both distance from and access to your backups as a part of disaster recovery planning. Backup isn't just recovering the CEO's email, though that is a (hopefully) far more frequent occurrence than recovering from a hurricane/fire/mudslide/blizzard. Easy access to the backup media is important for daily operations. Recovery from disaster is quite a bit more complex. Your backup solution needs to be able to cover the full spectrum - from yesterday's lost spreadsheet to the area flattened by mother nature.
Personally, I keep two backups - one here locally, one 1000 miles away in another state. Backup to CD here, online rsync in NC.
"Never underestimate the bandwidth of a station wagon full of tapes hurtling down the highway." - Variously attributed, frequently to Andrew Tanenbaum
-- "Never underestimate the power of human stupidity." - R.A.H.