Slashdot Mirror


Scheduling Large Scale Server Upgrades/Outages?

thesandbender asks: "I've inherited my company's DST patching project and I have to schedule upgrades for 7000+ servers over the course of the next few weeks. Of course each group inside the company has different SLAs and outage windows. I need to somehow turn the pile of spreadsheets I have into a database and create a schedule that spreads the load over our pool of system administrators. There is no way I can reasonably accomplish this by hand, and even software for other industries/applications that could take a few steps out of the process would be appreciated. Does anyone know of a rule-based scheduling system where I provide the available outage windows and a priority ranking for each system and the scheduler will recommend the order in which they should be upgraded?"
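
As a rough illustration of the kind of rule-based scheduler the question describes, here is a minimal greedy sketch in Python. Everything in it is an assumption made for illustration: the field names (`name`, `priority`, `windows`), the admin names, the dates, and the per-window capacity rule are hypothetical, and a real schedule would also need dependencies and SLA rules.

```python
from collections import defaultdict

def build_schedule(servers, admins, per_admin_per_window=2):
    """Greedy sketch: most urgent server first, earliest allowed outage
    window, least-loaded admin. Lower priority number = more urgent."""
    load = defaultdict(int)    # (admin, window) -> servers assigned in that window
    total = defaultdict(int)   # admin -> servers assigned overall
    schedule, unscheduled = [], []
    for srv in sorted(servers, key=lambda s: (s["priority"], s["name"])):
        placed = False
        # ISO-style timestamps sort chronologically as plain strings.
        for window in sorted(srv["windows"]):
            free = [a for a in admins if load[(a, window)] < per_admin_per_window]
            if free:
                admin = min(free, key=lambda a: (total[a], a))
                load[(admin, window)] += 1
                total[admin] += 1
                schedule.append((window, srv["name"], admin))
                placed = True
                break
        if not placed:
            unscheduled.append(srv["name"])  # needs a human decision
    return sorted(schedule), unscheduled

# Hypothetical sample data standing in for the spreadsheets.
servers = [
    {"name": "db01", "priority": 1, "windows": ["2007-02-24 01:00", "2007-03-03 01:00"]},
    {"name": "web01", "priority": 2, "windows": ["2007-02-24 01:00"]},
    {"name": "web02", "priority": 2, "windows": ["2007-02-24 01:00"]},
]
schedule, unscheduled = build_schedule(servers, ["alice", "bob"], per_admin_per_window=1)
```

Servers that cannot be placed land in `unscheduled` rather than being forced into a window, which is where negotiating with the owning group would start.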

10 of 85 comments

  1. Why micromanage this? by djh101010 · · Score: 3, Insightful

    I think if I had to do this, I'd establish a priority ranking of the systems, taking into consideration critical path and cascading dependencies, and then assign the highest priority ones first. When you finish that, come back to the pool for the next high priority job. When you're out of high priority jobs in the pool, move on to mid-priority, and so on. Trying to keep a bunch of inter-related steps in sync will drive you, and your sysadmins, crazy. Set priorities and let the big boys and girls do their job.
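
The pull-from-the-pool approach above can be sketched as a priority queue that admins drain as they become free. This is only a sketch under assumptions: the job tuples, the hour estimates, and the admin names are hypothetical, and dependency ordering (the "critical path" part) is assumed to be folded into the priority number already.

```python
import heapq

def drain_pool(jobs, admins):
    """jobs: (priority, name, hours) tuples; lower priority number goes first.
    Returns (start_hour, admin, job_name) in the order jobs were picked up."""
    pool = list(jobs)
    heapq.heapify(pool)                          # ordered by priority, then name
    free_at = [(0, admin) for admin in admins]   # (hour admin is next free, admin)
    heapq.heapify(free_at)
    log = []
    while pool:
        _priority, name, hours = heapq.heappop(pool)
        start, admin = heapq.heappop(free_at)    # whoever frees up first takes it
        log.append((start, admin, name))
        heapq.heappush(free_at, (start + hours, admin))
    return log

# Hypothetical jobs: two high priority, one mid, one low.
jobs = [(2, "web01", 1), (1, "db01", 2), (3, "test01", 1), (1, "dns01", 1)]
log = drain_pool(jobs, ["alice", "bob"])
```

Nobody is assigned a fixed timetable here; each admin simply takes the most urgent remaining job when the previous one is done.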

  2. Run away, into a hole somewhere by Marcion · · Score: 1, Insightful

    >I have to schedule upgrades for 7000+ servers ... pile of spreadsheets ...

    Somebody bought 7000 servers with no plan for upgrades?

    (Patching for DST, get a new OS...)

    1. Re:Run away, into a hole somewhere by GuyverDH · · Score: 2, Insightful

      "(Patching for DST, Get a new OS...)"

      Sorry friend, but every OS in the world that's used in the United States and implements automatic time shifting for Daylight Savings Time / Daylight Standard Time changes has to be patched.

      The reason being, the start and stop dates changed.
      Why? Because someone told GWB that it was a good idea, and that it would help in the war on terror. Who really knows for sure, unless he just bought stock in the consumer electronics companies that stand to make a killing on new replacements to devices that hardcoded the DST time changes.

      Anyway, the only operating systems that won't need to be patched, are those that aren't automatic.
      This of course will require loads of admins to wait for 1AM, then push the time back to 12AM manually.

      Now as to the SLAs vs patching.

      If a business unit requires a very high SLA, then the servers should be clustered, and then it's a non issue.
      You just take down half the cluster, patch it, bring it back up, then take down the other half of the cluster.
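
The half-at-a-time procedure can be sketched as below. The `patch` and `is_healthy` callables are hypothetical stand-ins for whatever actually patches and checks a node; the sketch only shows the ordering and the stop-on-failure rule, which is an added assumption rather than something stated above.

```python
def rolling_patch(nodes, patch, is_healthy):
    """Patch a cluster in two halves so one half is always serving.
    Stops before touching the second half if the first comes back unhealthy."""
    half = (len(nodes) + 1) // 2
    batches = [nodes[:half], nodes[half:]]
    for batch in batches:
        for node in batch:
            patch(node)                      # take down, patch, bring back up
        bad = [n for n in batch if not is_healthy(n)]
        if bad:
            raise RuntimeError(f"stopping, unhealthy after patch: {bad}")
    return batches

# Usage with stand-in callables that just record what happened.
patched = []
batches = rolling_patch(["n1", "n2", "n3", "n4"], patched.append, lambda n: True)
```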

      If the business units expect high SLAs without Clustering, then someone needs to explain that it's not a good idea to make business decisions while smoking crack.

      --
      Who is general failure, and why is he reading my hard drive?
    2. Re:Run away, into a hole somewhere by GuyverDH · · Score: 2, Insightful

      That.... was a joke, as was the line of bullshit that others fed the President. They told him it would reduce energy consumption? It's not going to reduce consumption, if anything, it will just increase energy consumption. What with all the goods that will have to be re-manufactured with new chips, plus all the overtime, burning the late night oil, patching all the boxes to make them work with the new standard. Then add in all the new hardware upgrades that will have to be purchased, or operating systems upgrades that will have to be bought. It's all money wasted because either the hardware or operating system is out of support, has been for years, but worked fine - until this change - and now has to be replaced.

      Cost savings? Where? All we're going to see is big companies that manufacture products (or sell oil related products) making more money, while everyone else has to spend more to keep things functioning properly, which in turn will increase costs for everything else, which in turn increases inflation, which in turn .... you get the idea.

      Just another idiotic idea, stupidly implemented, and signed into law by someone who seems to have the comprehension of a 12 year old.

      --
      Who is general failure, and why is he reading my hard drive?
  3. Used to be my problem by RabidMonkey · · Score: 5, Insightful

    This used to be my problem ... for the DST change, we have thousands of servers and workstations to deal with. I was getting worried, but instead of taking it on, we found a PM and now it's their problem.

    The moral of the story: never try.

    --
    We emerge from our mother's womb an unformatted diskette; our culture formats us. - Douglas Coupland
  4. Re:This request surprises me for this many machine by ocbwilg · · Score: 2, Insightful

    >There are also some GPL things that may work. Can't think of them right off hand. If these are *nux desktops/servers, you have plenty of time to write a perl/bash/python to accomplish the task. Some other slashdot user is going to have to give you advice for a windows environment at this stage of the game you are in.

    Hi, I'm "some other Slashdot user," and my advice for the Windows environment is the same as for Linux. Well...almost. If you are running Windows XP on the desktop or 2003 on the server (or later) then Microsoft already has released a patch that should have been part of your regular patch cycle. If not, it's time to dig out WSUS (it's free from Microsoft) or whatever patch management system that you are using to manage your 7000 servers. If you truly have 7000 servers and no patch management system in place, then you are not only screwed, but you are stupid as well.

    Now, for anything that is Windows 2000 or older, you will have to manually patch the system, and without the benefit of a patch from Microsoft. No problem. Just hit their Technet article about the issue here and read up on what it entails. Basically, you manually patch one machine of each OS type, export the relevant registry keys, and then import them on the rest of the machines of the matching OS type. Or you can script the install. The referenced site even provides the batch files necessary, but if you want to get fancy you could script it with VBS, Perl, or Javascript (assuming that all of the machines to be patched have WSH installed). You could spend a couple days perfecting the technique and then let the patching script run until it is finished. It shouldn't take too long.

    And as far as I'm aware, none of the DST patches (or registry fixes) requires a reboot to complete. All it does is change the date that the DST shift occurs.

  5. I'm gonna get mod-bombed... by Short+Circuit · · Score: 3, Insightful

    I hate to say it, but Microsoft Access fits your needs almost perfectly, in this case. It can import the data from your spreadsheets, if they're properly formatted. (And they'd have to be, if you wanted to have software make your schedule for you.)

    Once your data is in place, you write a query that includes a calculated field for the heuristics you're looking for. Run a second query against that one, checking it against a table containing your available time slots, and you'll have the data you're looking for. (Or, at least, something that will do most of the work for you.)
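
The same calculated-field-plus-time-slots idea can be sketched with Python's built-in sqlite3 module standing in for Access (a swapped-in tool, not what the comment recommends). The table layout, column names, sample rows, and the scoring formula are all assumptions made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE servers (name TEXT, unit TEXT, sla_tier INTEGER, dependents INTEGER);
CREATE TABLE slots (unit TEXT, starts TEXT);
INSERT INTO servers VALUES
  ('db01',  'finance', 1, 12),
  ('web01', 'retail',  2, 3),
  ('rpt01', 'finance', 3, 0);
INSERT INTO slots VALUES
  ('finance', '2007-02-24 01:00'),
  ('retail',  '2007-02-25 02:00');
""")

# "score" is the calculated field: a tighter SLA tier and more dependent
# systems both push a server toward the front of its unit's outage window.
rows = conn.execute("""
SELECT s.name, t.starts, (4 - s.sla_tier) * 10 + s.dependents AS score
FROM servers AS s
JOIN slots AS t ON t.unit = s.unit
ORDER BY t.starts, score DESC
""").fetchall()

for name, starts, score in rows:
    print(starts, name, score)
```

In Access the equivalent would be a saved query with the calculated field, joined to the time-slot table in a second query.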

    You've got to patch 7000 servers in four weeks. Do you really want to spend a few days learning a new software package that will do everything when you could take a piece of software you probably already know and simplify the problem in only a day?

  6. How are you already handling this? by Jerf · · Score: 2, Insightful

    Some people, as I post this, have sort of strongly hinted at this, but nobody else has directly asked this yet.

    What are you already using to patch your 7000+ servers? By the time you reach 7000+, this should have been a problem long solved. Hell, I'd expect it to be solved by the 100+ point.

    What's so special about this DST patch that your current process can't handle it?

    Because if the answer is "we have no process", you've long since lost, and good odds your systems are already seething piles of unpatched, compromised machines.

    If you do have a process but it's inadequate, and Slashdot might actually be able to help you, you'll need to be a little more clear on exactly what the problem is, if it isn't "we have no process".

    (What is it with people lobbing questions onto Ask Slashdot and almost, but not quite, never following up? Is the lead on Ask Slashdot so long that people die before it gets posted, or just give up? Obviously I ask this before I can tell whether "thesandbender" is one of the rare exceptions... as of this writing, no, unless (s)he's been modded into oblivion.)

  7. Re:This request surprises me for this many machine by ocbwilg · · Score: 2, Insightful

    >Microsoft (from what I've heard from my desktop folks at work) do have a patch for Windows 2000 - it's just not exactly published yet.

    >Let's just say the company I work for doesn't have more than 1% WinXP....


    Yes, the word is that there is a "patch" for Windows 2000. But since Windows 2000 is out of mainstream support Microsoft is only making it available to companies that have purchased extended support agreements for their Windows 2000 systems. Yes, it probably is part of Microsoft's strategy to push customers into upgrading to Windows XP/2003/Vista/Longhorn. Yes, Microsoft will undoubtedly take some heat for it, but they are also freely providing documentation on how to manually resolve the issue and script the fix, and that should be more than sufficient for any admin worth half the title to be able to fix it.

    But let's be honest here, we all know (or should know) that if you are running a Microsoft OS that is two or more generations old then there are going to be some issues. If you are still running Windows 2000 in your environment (and my company is, so I speak from experience), then this is undoubtedly not the first issue that you've run into that required a workaround, nor will it be the last. Fixing them is part of what we get paid to do. Eventually there will be a point where it becomes more cost-effective to upgrade, and that's what we'll do.

  8. automated by hand by tsstahl · · Score: 3, Insightful

    Managers must manage.

    You don't have the time to put in a system, but you can craft a one off solution.

    Your solution starts by sub-dividing your 7k servers into groups based on business units. Poke around to find out what their SLA is, and then _tell_ them that you are going to bend the SLA a little in order to get this 'OMG CRITICAL PATCH' onto your farm.

    No offense, but I have found scripting abilities in Unix/Linux shops to be of a lot higher quality than in Windows shops. Nevertheless, you do have some talent whether you know it or not. Enlist this talent and use scripting for a lot of the nitty gritty details.

    Quest Fastlane Reporter, Winbatch, and native WMI are great ways to report on pre and post conditions of servers.

    Delegate, delegate, delegate. Let your team plan the methods and schedules for each business unit's servers.

    Once over the crisis, use the information you have gathered to generate a requirements document and go shopping.

    Remember, the key to delegating is trust. You are in charge of managing the 7k servers; you are not in charge of doing the individual upgrades/patches.

    I'm sorry to take a bit of a condescending tone, but I'm trying to be clear, not flatter your ego. To reiterate, the bottom line here is that with the time you have, you will be doing an automated manual upgrade. You may find that the process you cobble together will actually become a great plan B when critical patches need to be made; especially if you design with that goal in mind.

    Use the 'scare' from the event quickly to get budget money for a Real Patch System(TM).

    Good luck!