Scheduling Large-Scale Server Upgrades/Outages?
thesandbender asks: "I've inherited my company's DST patching project and I have to schedule upgrades for 7000+ servers over the course of the next few weeks. Of course, each group inside the company has different SLAs and outage windows. I need to somehow turn the pile of spreadsheets I have into a database and create a schedule that spreads the load over our pool of system administrators. There is no way I can reasonably accomplish this by hand, and even software for other industries/applications that could take a few steps out of the process would be appreciated. Does anyone know of a rule-based scheduling system where I provide the available outage windows and a priority ranking for each system, and the scheduler recommends the order in which they should be upgraded?"
I think if I had to do this, I'd establish a priority ranking of the systems, taking critical paths and cascading dependencies into consideration, and then assign the highest-priority ones first. When an admin finishes one, they come back to the pool for the next high-priority job. When you're out of high-priority jobs in the pool, move on to mid-priority, and so on. Trying to keep a bunch of interrelated steps in sync will drive you, and your sysadmins, crazy. Set priorities and let the big boys and girls do their job.
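The "shared pool, highest priority first" approach is essentially greedy list scheduling. A minimal Python sketch, assuming each server already has a numeric priority (lower number = more urgent) and an estimated duration in arbitrary time units; the `Job` class, `dispatch` function, and admin names are hypothetical illustrations, not part of any existing tool:

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Job:
    # Only priority takes part in ordering; lower number = patch sooner (assumption).
    priority: int
    hostname: str = field(compare=False)
    # Hypothetical duration estimate in arbitrary time units.
    duration: int = field(default=1, compare=False)


def dispatch(jobs, admins):
    """Greedy list scheduling: whichever admin is free earliest takes the
    most urgent job still in the shared pool."""
    pool = list(jobs)
    heapq.heapify(pool)
    # (time the admin is next free, admin name); everyone starts free at 0.
    free_at = [(0, name) for name in admins]
    heapq.heapify(free_at)
    schedule = []
    while pool:
        job = heapq.heappop(pool)
        start, admin = heapq.heappop(free_at)
        schedule.append((start, admin, job.hostname))
        heapq.heappush(free_at, (start + job.duration, admin))
    return schedule


if __name__ == "__main__":
    jobs = [Job(2, "web01"), Job(1, "db01"), Job(3, "dev01")]
    for start, admin, host in dispatch(jobs, ["alice", "bob"]):
        print(start, admin, host)
```

Nobody has to coordinate a master timeline here; the pool ordering does the work, which is the point the comment makes. Dependencies would still have to be folded into the priority numbers by hand.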
shutdown -h now
Fuck the users! They exist solely to bemuse the sysadmin! Odds are they've been getting uppity lately and need to be taught a lesson, anyway.
If you just put this off for a few months, the problem will probably just go away...
"Not an actor, but he plays one on TV."
This used to be my problem. For the DST change, we have thousands of servers and workstations to deal with. I was getting worried, but instead of taking it on, we found a PM and now it's their problem.
The moral of the story: never try.
We emerge from our mother's womb an unformatted diskette; our culture formats us. - Douglas Coupland
If you have 7000+ Windows servers, you should already be running a software patching solution such as SMS, WSUS, etc.
Sure, you'll spend a large amount of time sorting out which server[s] (server group[s]) should be patched when, but once that is done, you should be able to schedule them within your chosen solution.
Take WSUS for example. Organise your servers into groups, approve the update and set each group's Windows Update GP properties appropriately.
When computers get overloaded with work like this (host lookups, for example), they ask for help from other computers. As my stupid first try, how about asking each sysadmin to run a spreadsheet column of hostnames through an MD5 hash and convert the servers whose hash starts with '1' on the first day, 'a' on the tenth day, etc.?
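A minimal Python sketch of that hash-bucketing idea, assuming the first hex digit of the MD5 of the lower-cased hostname picks the day ('1' → day 1, ..., 'a' → day 10, ..., 'f' → day 15). Sending '0' to day 16 is an assumption, since the comment doesn't say; `md5_day` and `split_by_day` are hypothetical names:

```python
import hashlib
from collections import defaultdict


def md5_day(hostname: str) -> int:
    """Map a hostname to a patch day (1-16) from the first hex digit of its MD5."""
    # Normalise so "WEB01" and "web01 " land on the same day.
    digest = hashlib.md5(hostname.strip().lower().encode("utf-8")).hexdigest()
    digit = int(digest[0], 16)  # 0..15
    # '1'..'f' -> days 1..15; '0' -> day 16 (assumption, not from the original post).
    return digit or 16


def split_by_day(hostnames):
    """Group hostnames into per-day batches, ordered by day."""
    days = defaultdict(list)
    for host in hostnames:
        days[md5_day(host)].append(host)
    return dict(sorted(days.items()))


if __name__ == "__main__":
    for day, hosts in split_by_day(["web01", "db01", "dns01", "mail01"]).items():
        print(day, hosts)
```

The hash only spreads the load roughly evenly. It knows nothing about SLAs or outage windows, which is why the commenter calls it a first try.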
"Provided by the management for your protection."
I hate to say it, but Microsoft Access fits your needs almost perfectly, in this case. It can import the data from your spreadsheets, if they're properly formatted. (And they'd have to be, if you wanted to have software make your schedule for you.)
Once your data is in place, you write a query that includes a calculated field for the heuristics you're looking for. Then run a second query on top of that one that checks the results against a table containing your available time slots, and you'll have the data you're looking for. (Or, at least, something that will do most of the work for you.)
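The same two-step idea (a query with a calculated ranking field, then a match against a table of outage windows) can be sketched outside Access. This Python version uses the standard-library `sqlite3` module as a stand-in; the table layout, the `priority * 10 - dependents` heuristic, and the per-window capacity are all hypothetical choices for illustration:

```python
import sqlite3


def build_schedule(servers, slots, capacity_per_slot=2):
    """servers: (hostname, unit, priority, dependents); slots: (slot, unit).
    Returns (schedule, unscheduled)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE servers (hostname TEXT, unit TEXT,"
                " priority INTEGER, dependents INTEGER)")
    con.execute("CREATE TABLE slots (slot TEXT, unit TEXT)")
    con.executemany("INSERT INTO servers VALUES (?, ?, ?, ?)", servers)
    con.executemany("INSERT INTO slots VALUES (?, ?)", slots)

    # Step 1: calculated field. Hypothetical heuristic: lower score = patch sooner,
    # and servers with many dependents move up the list.
    ranked = con.execute(
        "SELECT hostname, unit, priority * 10 - dependents AS score"
        " FROM servers ORDER BY score, hostname").fetchall()

    # Step 2: match against the outage windows each business unit allows.
    windows = {}
    for slot, unit in con.execute("SELECT slot, unit FROM slots ORDER BY slot"):
        windows.setdefault(unit, []).append([slot, 0])  # [slot name, used count]
    con.close()

    schedule, unscheduled = [], []
    for hostname, unit, _score in ranked:
        for window in windows.get(unit, []):
            if window[1] < capacity_per_slot:
                window[1] += 1
                schedule.append((window[0], hostname))
                break
        else:  # no window for this unit had room left
            unscheduled.append(hostname)
    return schedule, unscheduled
```

Servers that come back unscheduled are the ones that need a conversation with their business unit about extra outage windows.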
You've got to patch 7000 servers in four weeks. Do you really want to spend a few days learning a new software package that will do everything, when you could take a piece of software you probably already know and simplify the problem in only a day?
How do you defiantly look at a product?
SCREW YOU! I'M GOING TO REVIEW YOU, AND IF I LIKE YOU, I'M GOING TO IMPLEMENT YOU, AND YOU'LL LIKE IT!
(Lameness filter says I have too many caps. But I think they were appropriate. Bah.)
Managers must manage.
You don't have the time to put in a system, but you can craft a one off solution.
Your solution starts by sub-dividing your 7k servers into groups based on business units. Poke around to find out what their SLA is, and then _tell_ them that you are going to bend the SLA a little in order to get this 'OMG CRITICAL PATCH' onto your farm.
No offense, but I have found scripting abilities in Unix/Linux shops to be of much higher quality than in Windows shops. Nevertheless, you do have some talent, whether you know it or not. Enlist this talent and use scripting for a lot of the nitty-gritty details.
Quest Fastlane Reporter, WinBatch, and native WMI are great ways to report on the pre- and post-conditions of servers.
Delegate, delegate, delegate. Let your team plan the methods and schedules for each business unit's servers.
Once over the crisis, use the information you have gathered to generate a requirements document and go shopping.
Remember, the key to delegating is trust. You are in charge of managing the 7k servers; you are not in charge of doing the individual upgrades/patches.
I'm sorry to take a bit of a condescending tone, but I'm trying to be clear, not flatter your ego. To reiterate, the bottom line here is that with the time you have, you will be doing an automated manual upgrade. You may find that the process you cobble together will actually become a great plan B when critical patches need to be made, especially if you design it with that goal in mind.
Use the 'scare' from the event quickly to get budget money for a Real Patch System(TM).
Good luck!