Scheduling Large Scale Server Upgrades/Outages?
thesandbender asks: "I've inherited my company's DST patching project and I have to schedule upgrades for 7000+ servers over the course of the next few weeks. Of course each group inside the company has different SLAs and outage windows. I need to somehow turn the pile of spreadsheets I have into a database and create a schedule that spreads the load over our pool of system administrators. There is no way I can reasonably accomplish this by hand, and even software for other industries/applications that could take a few steps out of the process would be appreciated. Does anyone know of a rule-based scheduling system where I provide the available outage windows and a priority ranking for each system and the scheduler will recommend the order in which they should be upgraded?"
I think if I had to do this, I'd establish a priority ranking of the systems, taking into consideration critical path and cascading dependencies, and then assign the highest priority ones first. When an admin finishes one, they come back to the pool for the next high priority job. When you're out of high priority jobs in the pool, move on to mid-priority, and so on. Trying to keep a bunch of inter-related steps in sync will drive you, and your sysadmins, crazy. Set priorities and let the big boys and girls do their job.
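For what it's worth, the "work the pool by priority" idea can be sketched in a few lines of Python. This is only a rough illustration: the server names, admin names, and the function assign_by_priority are made up, priority 1 is assumed to mean "most urgent", and dependencies are assumed to be already folded into the ranking.

```python
import heapq

def assign_by_priority(servers, admins):
    """Hand out servers in priority order to whichever admin has the fewest jobs.

    servers: list of (name, priority) tuples; a lower number means more urgent
             (an assumption made for this sketch).
    admins:  list of admin names (hypothetical).
    Returns a dict mapping each admin to an ordered list of server names.
    """
    # Min-heap of (jobs assigned so far, admin name): the least-loaded admin pops first.
    load = [(0, admin) for admin in sorted(admins)]
    heapq.heapify(load)
    plan = {admin: [] for admin in admins}

    # Highest priority first; ties are broken by name so the result is repeatable.
    for name, _priority in sorted(servers, key=lambda s: (s[1], s[0])):
        count, admin = heapq.heappop(load)
        plan[admin].append(name)
        heapq.heappush(load, (count + 1, admin))
    return plan

# Made-up sample data.
servers = [("web1", 2), ("db1", 1), ("app1", 1), ("file1", 3)]
plan = assign_by_priority(servers, ["alice", "bob"])
for admin, jobs in plan.items():
    print(admin, jobs)
```

It counts jobs rather than hours, so if some upgrades take much longer than others you would want to weight the load by an estimated duration instead.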
This used to be my problem ... for the DST change, we have thousands of servers and workstations to deal with. I was getting worried, but instead of taking it on, we found a PM and now it's their problem.
The moral of the story: never try.
We emerge from our mother's womb an unformatted diskette; our culture formats us. - Douglas Coupland
I hate to say it, but Microsoft Access fits your needs almost perfectly, in this case. It can import the data from your spreadsheets, if they're properly formatted. (And they'd have to be, if you wanted to have software make your schedule for you.)
Once your data is in place, you write a query that includes a calculated field for the heuristics you're looking for. Then run a second query on top of that one, matching it against a table containing your available time slots, and you'll have the data you're looking for. (Or, at least, something that will do most of the work for you.)
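To give a feel for the shape of that query, here is a small sketch using SQLite from Python rather than Access (Access SQL syntax differs, so treat this as an illustration, not something to paste in). The table names, column names, sample rows, and the scoring formula are all invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE servers (name TEXT, unit TEXT, priority INTEGER, dependents INTEGER);
CREATE TABLE windows (unit TEXT, starts TEXT, ends TEXT);
""")

# Made-up rows standing in for the imported spreadsheets.
conn.executemany("INSERT INTO servers VALUES (?, ?, ?, ?)", [
    ("db1", "finance", 1, 5),
    ("web1", "finance", 2, 0),
    ("app1", "sales", 1, 2),
])
conn.executemany("INSERT INTO windows VALUES (?, ?, ?)", [
    ("finance", "2007-03-03 22:00", "2007-03-04 02:00"),
    ("sales", "2007-03-02 20:00", "2007-03-02 23:00"),
])

# "score" is the calculated field: a lower score means upgrade sooner.
# The formula (priority * 10 - dependents) is only a placeholder heuristic.
rows = conn.execute("""
SELECT s.name, w.starts, s.priority * 10 - s.dependents AS score
FROM servers AS s
JOIN windows AS w ON w.unit = s.unit
ORDER BY w.starts, score
""").fetchall()

for name, starts, score in rows:
    print(starts, name, score)
```

The result is one row per server per available window, ordered by when the window opens and then by score, which is most of a first-draft schedule.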
You've got to patch 7000 servers in four weeks. Do you really want to spend a few days learning a new software package that will do everything when you could take a piece of software you probably already know and simplify the problem in only a day?
Managers must manage.
You don't have the time to put in a system, but you can craft a one-off solution.
Your solution starts by sub-dividing your 7k servers into groups based on business units. Poke around to find out what their SLA is, and then _tell_ them that you are going to bend the SLA a little in order to get this 'OMG CRITICAL PATCH' onto your farm.
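The sub-dividing step is easy to script once the spreadsheets are exported to CSV. A minimal sketch, assuming columns named server and business_unit (both hypothetical, as are the sample rows and the function name group_by_unit):

```python
import csv
import io
from collections import defaultdict

def group_by_unit(csv_text):
    """Return {business unit: [server names]} from CSV text with a header row."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[row["business_unit"].strip()].append(row["server"].strip())
    return dict(groups)

# Made-up sample export; the real input would be the exported spreadsheets.
SAMPLE = """server,business_unit
db1,finance
web1,finance
app1,sales
"""

groups = group_by_unit(SAMPLE)
for unit, names in groups.items():
    print(unit, len(names), names)
```

Each group can then be handed to whoever owns that business unit's servers.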
No offense, but I have found scripting abilities in Unix/Linux shops to be of much higher quality than in Windows shops. Nevertheless, you do have some talent whether you know it or not. Enlist this talent and use scripting for a lot of the nitty-gritty details.
Quest Fastlane Reporter, Winbatch, and native WMI are great ways to report on pre and post conditions of servers.
Delegate, delegate, delegate. Let your team plan the methods and schedules for each business unit's servers.
Once over the crisis, use the information you have gathered to generate a requirements document and go shopping.
Remember, the key to delegating is trust. You are in charge of managing the 7k servers; you are not in charge of doing the individual upgrades/patches.
I'm sorry to take a bit of a condescending tone, but I'm trying to be clear, not flatter your ego. To reiterate, the bottom line here is that with the time you have, you will be doing an automated manual upgrade. You may find that the process you cobble together will actually become a great plan B when critical patches need to be made; especially if you design with that goal in mind.
Use the 'scare' from the event quickly to get budget money for a Real Patch System(TM).
Good luck!