Scheduling Large Scale Server Upgrades/Outages?
thesandbender asks: "I've inherited my company's DST patching project and I have to schedule upgrades for 7000+ servers over the course of the next few weeks. Of course each group inside the company has different SLAs and outage windows. I need to somehow turn the pile of spreadsheets I have into a database and create a schedule that spreads the load over our pool of system administrators. There is no way I can reasonably accomplish this by hand, and even software for other industries/applications that could take a few steps out of the process would be appreciated. Does anyone know of a rule-based scheduling system where I provide the available outage windows and a priority ranking for each system and the scheduler will recommend the order in which they should be upgraded?"
I think if I had to do this, I'd establish a priority ranking of the systems, taking into consideration critical path and cascading dependencies, and then assign the highest priority ones first. When an admin finishes a job, they come back to the pool for the next high priority job. When you're out of high priority jobs in the pool, move on to mid-priority, and so on. Trying to keep a bunch of inter-related steps in sync will drive you, and your sysadmins, crazy. Set priorities and let the big boys and girls do their job.
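For what it's worth, the "work through the pool in priority order" idea can be sketched in a few lines of Python. This is only a rough illustration, not a real scheduler: the server names, priorities, time estimates, and admin names are all made up, and it assumes a lower priority number means more urgent and that each job goes to whichever admin currently has the least work assigned.

```python
import heapq
from collections import defaultdict

def assign_by_priority(servers, admins):
    """servers: list of (priority, name, est_minutes); lower priority = more urgent."""
    # Min-heap of (minutes assigned so far, admin): the least-loaded admin pops first.
    load = [(0, admin) for admin in admins]
    heapq.heapify(load)
    plan = defaultdict(list)
    # Walk the pool in priority order, most urgent first.
    for priority, name, minutes in sorted(servers):
        busy, admin = heapq.heappop(load)
        plan[admin].append(name)
        heapq.heappush(load, (busy + minutes, admin))
    return dict(plan)

# Hypothetical sample data, for illustration only.
servers = [
    (2, "web-01", 30),
    (1, "db-01", 60),
    (1, "dns-01", 20),
    (3, "test-01", 15),
]
plan = assign_by_priority(servers, ["alice", "bob"])
for admin, jobs in sorted(plan.items()):
    print(admin, jobs)
```

It ignores outage windows and dependencies entirely, so treat it as a starting point for the ranking step only.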
>I have to schedule upgrades for 7000+ servers ... pile of spreadsheets ...
Somebody bought 7000 servers with no plan for upgrades?
(Patching for DST, get a new OS...)
This used to be my problem ... for the DST change, we have thousands of servers and workstations to deal with. I was getting worried, but instead of taking it on, we found a PM and now it's their problem.
The moral of the story: never try.
We emerge from our mother's womb an unformatted diskette; our culture formats us. - Douglas Coupland
There are also some GPL tools that may work, though I can't think of them offhand. If these are *nix desktops/servers, you have plenty of time to write a Perl/bash/Python script to accomplish the task. Some other Slashdot user is going to have to give you advice for a Windows environment at this stage of the game.
Hi, I'm "some other Slashdot user," and my advice for the Windows environment is the same as for Linux. Well...almost. If you are running Windows XP on the desktop or 2003 on the server (or later) then Microsoft has already released a patch that should have been part of your regular patch cycle. If not, it's time to dig out WSUS (it's free from Microsoft) or whatever patch management system you are using to manage your 7000 servers. If you truly have 7000 servers and no patch management system in place, then you are not only screwed, but you are stupid as well.
Now, for anything that is Windows 2000 or older, you will have to manually patch the system, and without the benefit of a patch from Microsoft. No problem. Just hit their TechNet article about the issue here and read up on what it entails. Basically, you manually patch one machine of each OS type, export the relevant registry keys, and then import them on the rest of the machines of the matching OS type. Or you can script the install. The referenced site even provides the batch files necessary, but if you want to get fancy you could script it with VBS, Perl, or JavaScript (assuming that all of the machines to be patched have WSH installed). You could spend a couple of days perfecting the technique and then let the patching script run until it is finished. It shouldn't take too long.
And as far as I'm aware, none of the DST patches (or registry fixes) requires a reboot to complete. All they do is change the dates on which the DST shift occurs.
I hate to say it, but Microsoft Access fits your needs almost perfectly, in this case. It can import the data from your spreadsheets, if they're properly formatted. (And they'd have to be, if you wanted to have software make your schedule for you.)
Once your data is in place, you write a query that includes a calculated field for the heuristics you're looking for. Run a second query against that one, checking it against a table containing your available time slots, and you'll have the data you're looking for. (Or, at least, something that will do most of the work for you.)
You've got to patch 7000 servers in four weeks. Do you really want to spend a few days learning a new software package that will do everything when you could take a piece of software you probably already know and simplify the problem in only a day?
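To make the "calculated field plus time-slot table" idea concrete, here is a rough sketch. It uses SQLite through Python's sqlite3 module rather than Access, purely so it can be run anywhere; the same query shape should carry over. The table names, column names, the scoring formula, and every row of data are assumptions invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE servers (name TEXT, business_unit TEXT, priority INTEGER, dependents INTEGER);
CREATE TABLE windows (business_unit TEXT, starts TEXT, ends TEXT);
""")
# Hypothetical rows standing in for the imported spreadsheets.
conn.executemany("INSERT INTO servers VALUES (?, ?, ?, ?)", [
    ("db-01", "finance", 1, 12),
    ("web-01", "sales", 2, 3),
    ("app-07", "finance", 2, 0),
])
conn.executemany("INSERT INTO windows VALUES (?, ?, ?)", [
    ("finance", "2007-02-17 22:00", "2007-02-18 02:00"),
    ("sales", "2007-02-18 01:00", "2007-02-18 05:00"),
])

# "score" is the calculated field: a made-up heuristic where a lower value
# means patch sooner (urgent priority, many systems depending on it).
rows = conn.execute("""
    SELECT s.name, w.starts, (s.priority * 100 - s.dependents) AS score
    FROM servers AS s
    JOIN windows AS w ON w.business_unit = s.business_unit
    ORDER BY w.starts, score
""").fetchall()

for name, starts, score in rows:
    print(starts, name, score)
```

Each server is matched to its business unit's outage window, then ordered by window start and score, which is roughly the recommended patch order the original question asks for.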
Some people, as I post this, have sort of strongly hinted at this, but nobody else has directly asked this yet.
What are you already using to patch your 7000+ servers? By the time you reach 7000+, this should have been a problem long solved. Hell, I'd expect it to be solved by the 100+ point.
What's so special about this DST patch that your current process can't handle it?
Because if the answer is "we have no process", you've long since lost, and good odds your systems are already seething piles of unpatched, compromised machines.
If you do have a process but it's inadequate, and Slashdot might actually be able to help you, you'll need to be a little more clear on exactly what the problem is, if it isn't "we have no process".
(What is it with people lobbing questions onto Ask Slashdot and almost, but not quite, never following up? Is the lead on Ask Slashdot so long that people die before it gets posted, or just give up? Obviously I ask this before I can tell whether "thesandbender" is one of the rare exceptions... as of this writing, no, unless (s)he's been modded into oblivion.)
Microsoft (from what I've heard from my desktop folks at work) does have a patch for Windows 2000 - it's just not exactly published yet.
Let's just say the company I work for doesn't have more than 1% WinXP....
Yes, the word is that there is a "patch" for Windows 2000. But since Windows 2000 is out of mainstream support, Microsoft is only making it available to companies that have purchased extended support agreements for their Windows 2000 systems. Yes, it probably is part of Microsoft's strategy to push customers into upgrading to Windows XP/2003/Vista/Longhorn. Yes, Microsoft will undoubtedly take some heat for it, but they are also freely providing documentation on how to manually resolve the issue and script the fix, and that should be more than sufficient for any admin worth half the title to be able to fix it.
But let's be honest here, we all know (or should know) that if you are running a Microsoft OS that is two or more generations old then there are going to be some issues. If you are still running Windows 2000 in your environment (and my company is, so I speak from experience), then this is undoubtedly not the first issue that you've run into that required a workaround, nor will it be the last. Fixing them is part of what we get paid to do. Eventually there will be a point where it becomes more cost-effective to upgrade, and that's what we'll do.
Managers must manage.
You don't have the time to put in a system, but you can craft a one-off solution.
Your solution starts by sub-dividing your 7k servers into groups based on business units. Poke around to find out what their SLA is, and then _tell_ them that you are going to bend the SLA a little in order to get this 'OMG CRITICAL PATCH' onto your farm.
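The sub-dividing step is easy to script once the spreadsheets are exported to CSV. A minimal sketch, assuming columns called server, business_unit, and sla_window (those names and the sample rows are invented here, so adjust them to whatever your spreadsheets actually contain):

```python
import csv
import io
from collections import defaultdict

# Hypothetical export of one of the spreadsheets.
SAMPLE = """server,business_unit,sla_window
db-01,finance,Sat 22:00-02:00
web-01,sales,Sun 01:00-05:00
app-07,finance,Sat 22:00-02:00
"""

def group_by_unit(csv_text):
    """Return {business_unit: [server, ...]} from CSV text."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[row["business_unit"]].append(row["server"])
    return dict(groups)

groups = group_by_unit(SAMPLE)
for unit, names in sorted(groups.items()):
    print(f"{unit}: {len(names)} servers -> {', '.join(names)}")
```

With one list per business unit in hand, each list can be handed to whoever is responsible for that unit.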
No offense, but I have found scripting abilities in Unix/Linux shops to be of a lot higher quality than in Windows shops. Nevertheless, you do have some talent whether you know it or not. Enlist this talent and use scripting for a lot of the nitty-gritty details.
Quest Fastlane Reporter, Winbatch, and native WMI are great ways to report on pre and post conditions of servers.
Delegate, delegate, delegate. Let your team plan the methods and schedules for each business unit's servers.
Once over the crisis, use the information you have gathered to generate a requirements document and go shopping.
Remember, the key to delegating is trust. You are in charge of managing the 7k servers; you are not in charge of doing the individual upgrades/patches.
I'm sorry to take a bit of a condescending tone, but I'm trying to be clear, not flatter your ego. To reiterate, the bottom line here is that with the time you have, you will be doing an automated manual upgrade. You may find that the process you cobble together will actually become a great plan B when critical patches need to be made; especially if you design with that goal in mind.
Use the 'scare' from the event quickly to get budget money for a Real Patch System(TM).
Good luck!