Slashdot Mirror


Scheduling Large Scale Server Upgrades/Outages?

thesandbender asks: "I've inherited my company's DST patching project and I have to schedule upgrades for 7000+ servers over the course of the next few weeks. Of course each group inside the company has different SLAs and outage windows. I need to somehow turn the pile of spreadsheets I have into a database and create a schedule that spreads the load over our pool of system administrators. There is no way I can reasonably accomplish this by hand, and even software for other industries/applications that could take a few steps out of the process would be appreciated. Does anyone know of a rule-based scheduling system where I provide the available outage windows and a priority ranking for each system and the scheduler will recommend the order in which they should be upgraded?"

85 comments

  1. Why micromanage this? by djh101010 · · Score: 3, Insightful

    I think if I had to do this, I'd establish a priority ranking of the systems, taking into consideration critical path and cascading dependencies, and then assign the highest priority ones first. When you finish that, come back to the pool for the next high priority job. When you're out of high priority jobs in the pool, move on to mid-priority, and so on. Trying to keep a bunch of inter-related steps in sync will drive you, and your sysadmins, crazy. Set priorities and let the big boys and girls do their job.
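
    A minimal sketch of the "pool of jobs" idea above, assuming each server already has a numeric priority (lower number means more urgent). The server names and priorities are made up for illustration, and dependencies between systems are not modelled here.

```python
import heapq

def drain_by_priority(jobs):
    """Yield server names from a pool, most urgent first.

    jobs is an iterable of (priority, name) tuples; a lower priority
    number means the server should be handled earlier.
    """
    heap = list(jobs)
    heapq.heapify(heap)  # smallest (priority, name) tuple ends up on top
    while heap:
        _priority, name = heapq.heappop(heap)
        yield name

# Hypothetical pool: two high-priority boxes, one mid, one low.
pool = [(2, "app01"), (1, "db01"), (3, "dev07"), (1, "auth01")]
order = list(drain_by_priority(pool))
```

    Each admin would simply take the next name from the generator when they finish their current box, so nobody has to maintain a fixed per-person schedule.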

    1. Re:Why micromanage this? by Anonymous Coward · · Score: 0

      Amen.

      Group servers - you have, presumably, dev boxes, QAT, production, and so on. Dev boxes go first. QAT second. Production last, after it's been shown that the earlier moves tested out ok.

      If you have that many servers, there should be some sort of system set up whereby sysadmin A is responsible for servers X, Y, and Z; sysadmin B is responsible for U, V, and W; and so on. Get them to help out with the scheduling for those boxes; they should have a good working relationship with the people who use the systems, and hence be able to coordinate everything.

      In other words, draw up a plan in broad, and then get the people on the ground to fill in the details. Seven thousand servers is way too many for a single person to try to coordinate, and there's not really any point in trying to optimise it to minimise the overall time it takes to upgrade them (unless it's a security patch, in which case you probably end up blatting it out wide scale and hoping it doesn't break things ...)
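
      The staged rollout described above (dev first, then QAT, then production) could be sketched roughly like this. The environment labels and host names are assumptions for illustration; the real inventory would come from whatever the spreadsheets contain.

```python
from collections import defaultdict

# Assumed environment labels, in the order the waves should run.
WAVE_ORDER = ["dev", "qat", "prod"]

def plan_waves(servers):
    """Group (name, environment) pairs into ordered upgrade waves."""
    by_env = defaultdict(list)
    for name, env in servers:
        by_env[env].append(name)
    # Skip environments that have no servers at all.
    return [(env, sorted(by_env[env])) for env in WAVE_ORDER if by_env[env]]

# Hypothetical inventory.
inventory = [("web2", "prod"), ("build1", "dev"), ("web1", "prod"), ("test1", "qat")]
waves = plan_waves(inventory)
```

      Each wave would only start once the previous one has tested out ok; the responsible sysadmins fill in the exact times within a wave.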

    2. Re: Why micromanage this? by Anonymous Coward · · Score: 0

      I know Aduva http://www.aduva.com/ is one, but it only works for Solaris and Linux, not Windows (or at least it didn't). Once you got it set up and configured it worked well, but the main issue was describing the rules for when a system could be patched, such as the ranking of the systems. The big thing it allowed was rolling back: if you had something you needed to roll back, you could do it across multiple servers, just like deploying the patch.

      But like any software package, it takes time to deploy and learn it to be effective, otherwise it becomes shelf-ware...

  2. You just pull the plug by Colin+Smith · · Score: 0

    Then plug it back in real quick.

    --
    Deleted
  3. My advice: by Guppy06 · · Score: 4, Funny

    shutdown -h now

    Fuck the users! They exist solely to bemuse the sysadmin! Odds are they've been getting uppity lately and need to be taught a lesson, anyway.

    1. Re:My advice: by Anonymous Coward · · Score: 0

      noob adm1n1strat0r... r3al h@ck3rz use init 0!

    2. Re:My advice: by avronius · · Score: 1

      You just go ahead and try that on a Solaris box - it'll take you to a nice OBP, and 50% of the time it will leave you there, waiting for someone to walk by and type "boot".

      Some administrators believe that if a server dies for whatever reason, leave it off - this way they're sure to be aware of the outage. These folks will set the eeprom to not automatically boot the box. After the power spins up the obp, it stops at an "OK" prompt.

      Others believe that the server should just come up after a crash - sadly, this can result in you never knowing the root cause. These folks will set the eeprom to "True", and the box will automatically try to boot to the OS.

      Which one is right? Depends on how volatile your environment is, I guess.

  4. some almost advice by ILuvRamen · · Score: 0, Flamebait

    1. Stop using acronyms that nobody knows, slashdotters hate that!
    2. never ever ever use spreadsheets ever to hold data ever because you'll eventually want to do database operations on it
    3. and as for the specialized software suites that do all that logic and notification and stuff, it'd take longer to set up and configure that than to do it manually, and it'd cost an ungodly amount of money for licensing. Plus it never gets the logic right, because tons of human reasoning is involved in which to drop when and stuff, and computers can't handle that. If I were you, I'd stick some blank transparencies in the printer and print color-coded, graphical timeline sort of outage window schedules from each department or whatever, and then just lay them on top of each other in logical ways until you come up with something that works. The main object is to go through the first day and pack as many possible downtimes together in a row as you can, then go to the next day and do the same thing until they all have a scheduled time for when they are allowed to be down. Make sure every single upgrade time has at least one secondary possible time in case the one before it takes longer than it should (which will happen a lot). If you have them arrangeable in overlapping transparencies that way and they can be easily rearranged and examined visually, it's better than any computer program, except you're doing all the logic, but that really shouldn't take much longer than an hour or two if you use the logical pattern I said. Hope that made sense cuz it did in my head lol.
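
    For comparison, the packing rule described above (fill the earliest allowed day, then move on, and keep a secondary slot for each box) can be sketched as a simple first-fit pass. This is only an illustration under stated assumptions: outage windows are reduced to whole day numbers, every upgrade costs one slot, and handling the servers with the fewest options first is a heuristic added here, not something from the comment.

```python
def pack_by_day(windows, capacity_per_day):
    """First-fit packing of upgrades into allowed days.

    windows maps a server name to a sorted list of day numbers on which
    it may be down. Returns {server: (primary_day, fallback_day)}; either
    value is None when no suitable day exists.
    """
    load = {}      # day -> number of upgrades already placed on it
    schedule = {}
    # Heuristic (an assumption): place the least flexible servers first.
    for server in sorted(windows, key=lambda s: len(windows[s])):
        schedule[server] = (None, None)
        for day in windows[server]:
            if load.get(day, 0) < capacity_per_day:
                load[day] = load.get(day, 0) + 1
                later = [d for d in windows[server] if d > day]
                schedule[server] = (day, later[0] if later else None)
                break
    return schedule

# Hypothetical windows: day numbers counted from the start of the project.
plan = pack_by_day({"a": [1, 2], "b": [1], "c": [1, 3]}, capacity_per_day=2)
```

    Note that the fallback day is only recorded, not reserved, so a slipped upgrade could still collide with a full day.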

    --
    Google's Super Secret Search Algorithm: SELECT @search_results FROM internet WHERE @search_results = 'good'
    1. Re:some almost advice by djh101010 · · Score: 1

      1. Stop using acronyms that nobody knows, slashdotters hate that!

      DST? SLA? I don't think either of those are obscure... but I manage servers in a LSDC so maybe it's just part of that world.

    2. Re:some almost advice by ILuvRamen · · Score: 1

      okay, the odds that everyone that reads slashdot is a server administrator, repair technician, network technician, programmer, web developer, AND hardware salesman are pretty low. I'm only a repairer, programmer, and web developer so I have no idea what any of those mean.

      --
      Google's Super Secret Search Algorithm: SELECT @search_results FROM internet WHERE @search_results = 'good'
    3. Re:some almost advice by Anonymous Coward · · Score: 0

      Which is also why I'm sure he's not interested in your inexpert advice. If you don't understand the question, you're probably not qualified to answer it.

    4. Re:some almost advice by Intron · · Score: 1

      MPU

      --
      Intron: the portion of DNA which expresses nothing useful.
    5. Re:some almost advice by djh101010 · · Score: 1

      Sorry, but a snarky comment rather than 10 seconds typing "dst acronym" and "sla acronym" into google, well, I don't see what positive thing it accomplishes to do that.

    6. Re:some almost advice by mollymoo · · Score: 1

      I Googled. Daylight Savings Time? Dynamic Stress Test? Data Systems Test? Data Storage and Transfer?

      --
      Chernobyl 'not a wildlife haven' - BBC News
    7. Re:some almost advice by 0racle · · Score: 1

      You probably also don't have an answer for the question either so not knowing the acronyms isn't much of a problem.

      --
      "I use a Mac because I'm just better than you are."
    8. Re:some almost advice by djh101010 · · Score: 1

      I Googled. Daylight Savings Time? Dynamic Stress Test? Data Systems Test? Data Storage and Transfer?

      Lovely. So now this is google and boolean lesson time? You may have noticed, that daylight savings time was changed this year in the US. Or maybe you didn't. It's kicking in on a different time than planned, you see. Systems know when they _think_ it is to change, and that's not the right week based on the recent legislation.

      It's _fine_ if someone doesn't understand an acronym. Really, it is. What is pointless and a waste of time, is posting a snarky comment whining that they don't know what it is, or which acronym it is, etc etc etc. Context of the thread either makes it clear, or makes it clear that that's not the central point of the post.

      In this case, it obviously doesn't matter what DST means. Sure, it's the timezone thing, but it doesn't matter. Dude in question has 7000 boxes to patch, and is looking for suggestions on how to manage it. The answer doesn't change depending on what DST stands for, does it?
    9. Re:some almost advice by ILuvRamen · · Score: 1

      actually, I posted the best answer so far. It's the cheapest most efficient solution.

      --
      Google's Super Secret Search Algorithm: SELECT @search_results FROM internet WHERE @search_results = 'good'
    10. Re:some almost advice by Anonymous Coward · · Score: 0

      What, you think that Ask Slashdot questions are supposed to be dumbed down enough that "everyone" can have a shot at answering them? Sorry, dude: They're not for the benefit of Slashdot readers; they're for the benefit of the person asking the question. Posting a question here is intended to take advantage of the fact that some of the people reading will be expert enough to have useful answers... and the hope that the rest will have the sense to keep their clueless guesses to themselves. Apparently not.

    11. Re:some almost advice by SausageOfDoom · · Score: 1

      Does too.

    12. Re:some almost advice by SatireWolf · · Score: 1

      You manage servers in Local Spin Density Correlation or Low Slump Dense Concrete?

  5. Procrastinate by mkcmkc · · Score: 4, Funny

    If you just put this off for a few months, the problem will probably just go away...

    --
    "Not an actor, but he plays one on TV."
  6. Run away, into a hole somewhere by Marcion · · Score: 1, Insightful

    >I have to schedule upgrades for 7000+ servers ... pile of spreadsheets ...

    Somebody bought 7000 servers with no plan for upgrades?

    (Patching for DST, get a new OS...)

    1. Re:Run away, into a hole somewhere by GuyverDH · · Score: 2, Insightful

      "(Patching for DST, Get a new OS...)"

      Sorry friend, but every OS in the world, that's used in the United States, that implements automatic time shifting due to Daylight Savings Time / Daylight Standard Time changes, has to be patched.

      The reason being, the start and stop dates changed.
      Why? Because someone told GWB that it was a good idea, and that it would help in the war on terror. Who really knows for sure, unless he just bought stock in the consumer electronics companies that stand to make a killing on new replacements to devices that hardcoded the DST time changes.

      Anyway, the only operating systems that won't need to be patched, are those that aren't automatic.
      This of course will require loads of admins to wait for 1AM, then push the time back to 12AM manually.

      Now as to the SLAs vs patching.

      If a business unit requires a very high SLA, then the servers should be clustered, and then it's a non issue.
      You just take down half the cluster, patch it, bring it back up, then take down the other half of the cluster.

      If the business units expect high SLAs without Clustering, then someone needs to explain that it's not a good idea to make business decisions while smoking crack.
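
      A rough sketch of the half-at-a-time cluster patching described above. The `patch` and `is_healthy` callables are hypothetical placeholders for whatever your cluster tooling provides; nothing here is tied to a specific product.

```python
def rolling_patch(nodes, patch, is_healthy):
    """Patch a cluster in two halves so one half keeps serving.

    patch(node) and is_healthy(node) are caller-supplied functions
    (assumptions for this sketch). Returns the nodes patched, in order.
    """
    if len(nodes) < 2:
        raise ValueError("need at least two nodes to patch without an outage")
    half = len(nodes) // 2
    done = []
    for batch in (nodes[:half], nodes[half:]):
        for node in batch:
            patch(node)  # this node is out of service while patching
        bad = [n for n in batch if not is_healthy(n)]
        if bad:
            # Stop before touching the half that is still serving traffic.
            raise RuntimeError("unhealthy after patch: " + ", ".join(bad))
        done.extend(batch)
    return done
```

      The health check between halves is the important part: if the first half does not come back cleanly, the second half is left alone and the service stays up.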

      --
      Who is general failure, and why is he reading my hard drive?
    2. Re:Run away, into a hole somewhere by ErikTheRed · · Score: 2, Interesting
      Why? Because someone told GWB that it was a good idea, and that it would help in the war on terror.
      Bzzzzt. Wrong. Somebody told him it would reduce national energy consumption. But thank you for playing.

      That being said, IMHO the whole DST thing is stupid and obnoxious.
      --

      Help save the critically endangered Blue Iguana
    3. Re:Run away, into a hole somewhere by jazman_777 · · Score: 2, Funny
      This of course will require loads of admins to wait for 1AM, then push the time back to 12AM manually.


      How could this have been modded insightful, when everyone knows that you turn it back at 2AM?

      --
      Slashdot: Failed Car Analogies. Amateur Lawyering. Anecdote Battles.
    4. Re:Run away, into a hole somewhere by GuyverDH · · Score: 1

      Have you tried to keep databases in sync across multiple timezones, without shifting the time simultaneously?

      ie - for 1 hour, there's a 2 hour difference?

      ehh.... sorry - thanks again - in this instance, both systems revert simultaneously.

      One at 2AM, the other at 1AM - in this case, my office works out of CST, while the corporate is EST.

      For what it's worth - almost everyone rolls their clocks back BEFORE THEY GO TO SLEEP, not at 2AM.
      But that's okay, I forgive you. It's not often that you get to one up another geek, so I'll let you have this one.

      --
      Who is general failure, and why is he reading my hard drive?
    5. Re:Run away, into a hole somewhere by GuyverDH · · Score: 2, Insightful

      That.... was a joke, as was the line of bullshit that others fed the President. They told him it would reduce energy consumption? It's not going to reduce consumption, if anything, it will just increase energy consumption. What with all the goods that will have to be re-manufactured with new chips, plus all the overtime, burning the late night oil, patching all the boxes to make them work with the new standard. Then add in all the new hardware upgrades that will have to be purchased, or operating systems upgrades that will have to be bought. It's all money wasted because either the hardware or operating system are out of support, have been for years, but worked fine - until this change, that now have to be replaced.

      Cost savings? Where? All we're going to see is big companies that manufacture products (or sell oil related products) making more money, while everyone else has to spend more to keep things functioning properly, which in turn will increase costs for everything else, which in turn increases inflation, which in turn .... you get the idea.

      Just another idiotic idea, stupidly implemented, and signed into law by someone who seems to have the comprehension of a 12 year old.

      --
      Who is general failure, and why is he reading my hard drive?
    6. Re:Run away, into a hole somewhere by tsstahl · · Score: 1

      Just another idiotic idea, stupidly implemented, and signed into law by someone who seems to have the comprehension of a 12 year old.

      I feel compelled to point out the 435 other someones had to act before the pen touched the paper.

    7. Re:Run away, into a hole somewhere by GuyverDH · · Score: 1

      Notice I said "Signed into law" - I didn't claim that he made the law, or wrote it, just that he signed it - without signature statement I might add.

      --
      Who is general failure, and why is he reading my hard drive?
    8. Re:Run away, into a hole somewhere by ASCIIMan · · Score: 1

      And who keeps databases using local times? Seriously. UTC exists for a reason.

    9. Re:Run away, into a hole somewhere by Anonymous Coward · · Score: 0

      Only 219 of those someones (simple majority of both houses) had to vote for it, however.

    10. Re:Run away, into a hole somewhere by GuyverDH · · Score: 1

      Not at the actual database layer, but in the field (or I guess column) layer - some of the programmers actually keep track of local time and tz - for what, I know not - I'm not a DBA, and thankfully don't have to wade through their reasoning.

      --
      Who is general failure, and why is he reading my hard drive?
    11. Re:Run away, into a hole somewhere by Jerf · · Score: 1

      You need to be a bit more careful. DST incontrovertibly "saves" energy, vs. non-DST. It's statistical. You can look it up. The energy savings are in fact quite significant.

      What I find stupid, and what you may want to glom onto as the reason to find DST stupid, is that the root problem is the idea that we should get up at 7am, regardless of 7am's relationship to the sun itself. I also find it stupid that everybody has to get off of work at the exact same minute. If we were more flexible (for real, not just lip service to "flex time"), the need for DST would just melt away as people naturally migrated towards more realistic schedules.

      To the best of my knowledge (and I'd be fine with being proven wrong), there is no good way to blame the government for our need to be told by our clocks that we should get up at different times. Except for the government setting worktimes for its own employees (which it has the right to do, more or less), our societal obsession with all working the same schedule, dictated to us unnaturally by our clocks, is the root problem and has just naturally evolved.

      Presumably there are some benefits too or we'd have migrated away from this by now, although it is conceivable that there are off-setting benefits to this setup that outweigh the staggering societal energy costs, and lost man-hours spent in needlessly-bad traffic jams. (There'll always be traffic jams, but they wouldn't be as bad if we staggered our times better. Well, we'd probably just compensate by building fewer roads, but for a few years it'd be nice...) Or it's simply a holdover from factory days that we still haven't flushed out of our collective systems. (Kinda like our school system, although that's a whole new flamewar.)

    12. Re:Run away, into a hole somewhere by Anonymous Coward · · Score: 0


      If a business unit requires a very high SLA, then the servers should be clustered, and then it's a non issue.
      You just take down half the cluster, patch it, bring it back up, then take down the other half of the cluster.

      If the business units expect high SLAs without Clustering, then someone needs to explain that it's not a good idea to make business decisions while smoking crack.


      You obviously don't work for SBC^W AT&T. We do that all the time, and get crapped on by management every time we try to tell them that it's a bad idea.

      Then they do it all over again and bitch about the problems. One boggles at the lack of logic. I blame it on cowardice.

  7. Used to be my problem by RabidMonkey · · Score: 5, Insightful

    This used to be my problem ... for the DST change, we have thousands of servers and workstations to deal with. I was getting worried, but instead of taking it on, we found a PM and now it's their problem.

    The moral of the story: never try.

    --
    We emerge from our mother's womb an unformatted diskette; our culture formats us. - Douglas Coupland
    1. Re:Used to be my problem by Odin_Tiger · · Score: 1

      "The moral of the story: never try."

      If I could mod this up, I would.

      --
      Unpleasantries.
    2. Re:Used to be my problem by pAnkRat · · Score: 1

      Homer: remember son, if something is hard to do, it is not worth trying.

      --
      we need an "-1 Plain wrong" moderation option!
    3. Re:Used to be my problem by ThumpSlice · · Score: 1

      Parent is right. A PM should be able to use their tool of choice to do this easily. For example, MS Project can import your spreadsheet data. Once imported, the PM should be able to assign priorities to groups of servers/patches, then manipulate the schedule and assign resources. Then, when the PM tells you that meeting your schedule with the available resources is completely impossible, you can both transfer somewhere warm and sunny.

      --
      -- If you're posting to be funny, and your sig is funnier . . . .
  8. Use Windows Correctly by CelestialWizard · · Score: 3, Informative

    If you have 7000+ Windows Servers you should already be running a software patching solution such as SMS, WSUS, etc...

    Sure you'll spend a large amount of time sorting out which server[s] (server group[s])should be patched when, but once that is done - you should be able to schedule them within your chosen solution.

    Take WSUS for example. Organise your servers into groups, approve the update and set each group's Windows Update GP properties appropriately.

    1. Re:Use Windows Correctly by Dukebytes · · Score: 1

      This post is right on the point. You don't need a spreadsheet or database, you need a good management box to handle it for you.

      WSUS would work, but there are better products out there that would give you a lot more functionality. Hercules, from Citadel, is a good one and can handle 7000 boxes with a few systems in the right place. It is not limited to Windows-only patches; you can custom-write your own upgrades for any of the apps on the box. They have scheduling, an inventory (your database), good reporting and several other things.

      And NO I don't work for them.

      Check them out, or PatchLink, BigFix, LANDesk, etc. You REALLY need something like this.

      Duke

      --

      FreeBSD: Nothing runs like a daemon with a pitch fork.
    2. Re:Use Windows Correctly by The+Great+Skeeve · · Score: 1

      I may have missed it, but I don't think he said what OS he was using. These might be 7000 Unix or Linux boxes or, more likely, a mixture of all the above.

  9. BladeLogic by Webdude · · Score: 2, Interesting

    I interviewed a while ago with a company called BladeLogic; they provide a suite of products for these types of tasks and all types of DataCenter management. I would defiantly give them a look; they could help out on this project and many, many in the future. http://www.bladelogic.com/
    B

    1. Re:BladeLogic by Wog · · Score: 5, Funny

      How do you defiantly look at a product?

      SCREW YOU! I'M GOING TO REVIEW YOU, AND IF I LIKE YOU, I'M GOING TO IMPLEMENT YOU, AND YOU'LL LIKE IT!

      (Lameness filter says I have too many caps. But I think they were appropriate. Bah.)

    2. Re:BladeLogic by Anonymous Coward · · Score: 0

      BladeLogic is only nice when everything works perfectly. The support is next to useless and their provisioning manager is alpha quality software. It is best when used only to audit processors. Anything else will be a hair pulling experience.

    3. Re:BladeLogic by undercanopy · · Score: 1

      How do you defiantly look at a product?

      get your boss to forbid you to look at it... then look at it.

      --
      -- D-23994, Muff#2613
    4. Re:BladeLogic by The+Great+Skeeve · · Score: 1

      Amen! I'm trying to learn the product myself right now and there is no support or training to be found. I'm told the software is "touchy", as in buggy.

  10. Script it? by Odin_Tiger · · Score: 1

    Ok, so you have spreadsheets with admins, windows, servers, priorities, etc., in them, and you're just looking for a way to schedule everything? Can you just export the spreadsheets to CSV and write a script to do it for you?
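
    A minimal sketch of what such a script might look like, assuming the spreadsheets can be exported to a CSV with `server`, `admin`, `priority` and `window` columns. Those column names, the sample rows and the dates are all made up for illustration.

```python
import csv
import io

# Hypothetical export; a real one would be read from a file instead.
SAMPLE = """server,admin,priority,window
db01,alice,1,2007-02-24 22:00
web03,bob,2,2007-02-24 23:00
dev07,alice,3,2007-02-25 01:00
auth01,bob,1,2007-02-25 22:00
"""

def load_schedule(csv_text):
    """Parse the CSV and order rows by priority, then by window start."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # ISO-style timestamps sort chronologically as plain strings.
    rows.sort(key=lambda r: (int(r["priority"]), r["window"]))
    return rows

def per_admin(rows):
    """Split the ordered rows into one work list per administrator."""
    work = {}
    for row in rows:
        work.setdefault(row["admin"], []).append(row["server"])
    return work

rows = load_schedule(SAMPLE)
work = per_admin(rows)
```

    From there it is a short step to printing one list per admin or loading the rows into a real database.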

    --
    Unpleasantries.
  11. MP2 by div_2n · · Score: 1

    Check out MP2. Our maintenance guys use it to schedule and track maintenance of everything in the plant. They swear by it. I believe you could use it for server maintenance, but I haven't tried it.

    I don't know much about it, but I found one site that discusses it here.

    1. Re:MP2 by Anonymous Coward · · Score: 0

      Ummm..... MP2 (Maintenance Planner 2) is for tracking maintenance in an industrial facility (think heavy duty mechanics, shipyards, mining operations, etc). It's not for scheduling server reboots!

  12. Daylight Saving Time patching ?! by pieleric · · Score: 2, Funny

    Hi, I can't help you. I've no knowledge at all about this field. However, could someone make me a little bit less stupid and explain those acronyms to me? acronymattic has 197 TLAs for "DST" but I couldn't find the one which would fit for sure! SLA, that was standing for "Site Level Aggregator", right?

    1. Re:Daylight Saving Time patching ?! by prothid · · Score: 1

      DST = Daylight savings time
      SLA = Service level agreement

    2. Re:Daylight Saving Time patching ?! by Anonymous Coward · · Score: 0

      Most likely answers:

      DST -- Daylight saving time. The start and stop times have changed in the US, requiring mods to servers & workstation time handling rules

      SLA -- Service Level Agreement. The contract between those responsible for the servers and those who use them about acceptable outage timeframes.
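
      To make the change concrete: from 2007 the US rules moved the start of DST to the second Sunday in March and the end to the first Sunday in November, where previously it ran from the first Sunday in April to the last Sunday in October. A small sketch that computes both sets of dates follows; the function names are made up for this example.

```python
import calendar

def nth_sunday(year, month, n):
    """Return the n-th Sunday of a month; n=-1 means the last Sunday."""
    sundays = [d for d in calendar.Calendar().itermonthdates(year, month)
               if d.month == month and d.weekday() == calendar.SUNDAY]
    return sundays[n - 1] if n > 0 else sundays[n]

def us_dst_bounds(year, new_rules=True):
    """Return (start, end) dates of US DST under the new or old rules."""
    if new_rules:
        # Second Sunday in March to first Sunday in November.
        return nth_sunday(year, 3, 2), nth_sunday(year, 11, 1)
    # First Sunday in April to last Sunday in October.
    return nth_sunday(year, 4, 1), nth_sunday(year, 10, -1)

new_start, new_end = us_dst_bounds(2007, new_rules=True)
old_start, old_end = us_dst_bounds(2007, new_rules=False)
```

      An unpatched system keeps using the old pair of dates, so its local clock is an hour off in the weeks between the two starts and between the two ends.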

  13. WSUS/Shutdown Command by TooMuchToDo · · Score: 1
    How about using WSUS?

    http://www.microsoft.com/windowsserversystem/updateservices/default.mspx

    That, along with proper scripting of "shutdown -r /m \\computername" should get you through it.

  14. I don't get it by Anonymous Coward · · Score: 0

    It's a simple timezone change, why on earth do the servers need a reboot or any downtime?
    Ok maybe a custom app freaks out but the OS should not be affected by the change.

    1. Re:I don't get it by Anonymous Coward · · Score: 2, Informative

      Crazy, huh? It gets better! For some systems, it's a MAJOR patch. Take AIX for example. It's going to take us 1.3GB worth of updates to fulfill the dependencies of the package that actually needs updating.

      Oh, and Java needs to be patched separately too. They store their timezones internally, instead of consulting the operating system.

  15. You are fucked by eln · · Score: 1

    Hi. DST changeover is in early March. If you aren't already halfway done with your 7000 server project, and they all require downtime, you are hosed. Find a new job.

    The good news is most Linux systems don't require a reboot for this change, so they can be done sans outage.

  16. Delegate by 4of12 · · Score: 3, Funny

    When computers get overloaded with work like this (host lookups, for example) they ask for help from other computers. As my stupid first try, how about asking each sysadmin to run a spreadsheet column of hostnames through an md5hash and let them convert servers with a '1' on the first day, 'a' on the tenth day, etc.?
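
    Taken at face value, the hashing idea above might look like the sketch below: hash each hostname and use the first hex digit of the digest as the day number, so '1' means day 1 and 'a' means day 10. Treating '0' as day 0 is an assumption; the comment does not say what to do with it.

```python
import hashlib

def upgrade_day(hostname):
    """Map a hostname to a day number 0..15 from its MD5 digest."""
    digest = hashlib.md5(hostname.encode("utf-8")).hexdigest()
    return int(digest[0], 16)  # first hex digit: '1' -> 1, 'a' -> 10

def bucket_hosts(hostnames):
    """Group hostnames by their computed day number."""
    days = {}
    for host in hostnames:
        days.setdefault(upgrade_day(host), []).append(host)
    return days
```

    The appeal is that every admin computes the same day for the same host with no central coordination; the drawback is that it ignores outage windows and priorities entirely.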

    --
    "Provided by the management for your protection."
  17. This request surprises me for this many machines. by nortcele · · Score: 1

    I would think that a company managing 7000+ servers would have an automated patch scheduling system similar to BMC Marimba, Altiris, or Opsware. You surely don't have time to purchase and install one of these monsters now, but it might be wise to pursue in the future.

    There are also some GPL things that may work. Can't think of them right off hand. If these are *nux desktops/servers, you have plenty of time to write a perl/bash/python to accomplish the task. Some other slashdot user is going to have to give you advice for a windows environment at this stage of the game you are in.

  18. Zero downtime by Anonymous Coward · · Score: 0

    Best way to do it is to eliminate downtime altogether. Back up your server. Replicate the server over to a second server at a different IP running in parallel and being updated in real time. When the upgrade time comes, switch the DNS over to the spare server. Patch the first server, then replicate the changes of the data content of the second server (not OS!!!) back to the first server. Switch the DNS back. This way you get a complete backup of all your servers *and* you get your patching done with no disruption. By the 2nd thousand servers you should be able to do this in your sleep.

  19. 7000 servers? by Anonymous Coward · · Score: 0

    Holy chit!

    Dude you are so fucked. Actually, if they are Unix-based, you're probably okay, but still....

    I recommend what the guy(s) above said: bunch them into high-level groups, and delegate each group to SOMEONE ELSE. Problem solved.

    Also, in case you haven't figured it out yet, you should do the LEAST IMPORTANT servers first. Use those to fine-tune your scripts or procedures.

    I have less than 100 FreeBSD servers to deal with and what I usually do is create a robust upgrade script on the dev servers, copy them out to all the other servers, and run in parallel. I have a set of scripts to capture all the output and run commands in parallel so it's pretty easy once the script is done. Takes a couple hours to write the script.
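
    The poster's own scripts are not shown, so the following is only a sketch of the general pattern (run one command per host in parallel and capture every host's output). `run_one` is a hypothetical callable supplied by the caller; in practice it might wrap something like `subprocess.run(["ssh", host, "..."], capture_output=True, text=True)`.

```python
from concurrent.futures import ThreadPoolExecutor

def run_everywhere(hosts, run_one, max_workers=8):
    """Call run_one(host) for every host in parallel.

    Returns {host: (ok, output_or_error)} so that failures on one host
    are recorded instead of aborting the whole run.
    """
    def safe(host):
        try:
            return host, True, run_one(host)
        except Exception as exc:  # keep going; report the error per host
            return host, False, str(exc)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return {host: (ok, out) for host, ok, out in pool.map(safe, hosts)}
```

    Keeping the per-host results in one dictionary makes it easy to list the boxes that failed and rerun only those.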

    Good luck!

  20. Shameless plug by Avian+visitor · · Score: 1

    Tablix is a free software package for solving various types of scheduling problems. If you have enough time on your hands to write the necessary modules for your particular problem I'm sure it can schedule your upgrades in the most efficient way.

  21. Re:This request surprises me for this many machine by ocbwilg · · Score: 2, Insightful

    There are also some GPL things that may work. Can't think of them right off hand. If these are *nux desktops/servers, you have plenty of time to write a perl/bash/python to accomplish the task. Some other slashdot user is going to have to give you advice for a windows environment at this stage of the game you are in.

    Hi, I'm "some other Slashdot user," and my advice for the Windows environment is the same as for Linux. Well...almost. If you are running Windows XP on the desktop or 2003 on the server (or later) then Microsoft already has released a patch that should have been part of your regular patch cycle. If not, it's time to dig out WSUS (it's free from Microsoft) or whatever patch management system that you are using to manage your 7000 servers. If you truly have 7000 servers and no patch management system in place, then you are not only screwed, but you are stupid as well.

    Now, for anything that is Windows 2000 or older, you will have to manually patch the system, and without the benefit of a patch from Microsoft. No problem. Just hit their Technet article about the issue here and read up on what it entails. Basically, you manually patch one machine of each OS type, export the relevant registry keys, and then import them on the rest of the machines of the matching OS type. Or you can script the install. The referenced site even provides the batch files necessary, but if you want to get fancy you could script it with VBS, Perl, or Javascript (assuming that all of the machines to be patched have WSH installed). You could spend a couple days perfecting the technique and then let the patching script run until it is finished. It shouldn't take too long.

    And as far as I'm aware, none of the DST patches (or registry fixes) requires a reboot to complete. All it does is change the date that the DST shift occurs.

  22. This is not a technical problem by the+right+sock · · Score: 1

    This is a political problem.

    The best you can do is come up with a realistic schedule for the actual timeframe you have available. And by realistic, I mean working off-hours. Then whoever is at the top of the chain tells everyone else that the upgrade happens at this time, and that's that.

  23. Which reminds me... by mkcmkc · · Score: 1

    Once upon a time I worked in operations for a Very Large Telecommunications Company (TM). One of my primary duties was to compile an onerous weekly report on server uptimes and send it to one of the directors, via his secretary. One day I found out that his secretary was moving to a different department, so I stopped sending them, to see what would happen. No one ever asked me about those reports again.

    --
    "Not an actor, but he plays one on TV."
    1. Re:Which reminds me... by j-pimp · · Score: 1

      One day I found out that his secretary was moving to a different department, so I stopped sending them, to see what would happen. No one ever asked me about those reports again.

      The question is, was he not reading them or did he have someone else prepare them because he had no idea how his old secretary got them.

      --
      --- Justin Dearing http://www.justaprogrammer.net/ We're just programmers.
    2. Re:Which reminds me... by avronius · · Score: 1

      At least now I know why I got asked to start compiling that damned report. But don't worry, I know where you work ;)

      [not really...]

    3. Re:Which reminds me... by pnutjam · · Score: 1

      Does it matter to mkcmkc?

    4. Re:Which reminds me... by mkcmkc · · Score: 1

      To answer your question, I was the only person covering the (mission critical!) systems in question, so no one else could have prepared the reports.

      --
      "Not an actor, but he plays one on TV."
    5. Re:Which reminds me... by mkcmkc · · Score: 1

      Yeah, but you don't know what I did six months ago...here...in the northern hemisphere... ;-)

      --
      "Not an actor, but he plays one on TV."
  24. Re:This request surprises me for this many machine by Digital+Killer · · Score: 1

    With all of the comments, only ocbwilg knew or bothered to set this guy straight? The DST change is simple, even if you do not have a patch management system. Nothing a simple script, a file with all of the server names and some time to let it run won't take care of. No reboot is required, so SLAs do not need to be considered here. I do agree that any company with seven thousand servers needs patch management. In fact, I call bullshit. There is no way in hell they even operate without one.
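
    As a rough illustration of the "simple script plus a file of server names" idea, here is a minimal Python sketch. It is not the poster's script: `load_hosts`, `patch_all`, and `fake_fix` are hypothetical names, and the actual per-host step (however you push the registry change or patch) is left as a callable you would supply yourself.

```python
from concurrent.futures import ThreadPoolExecutor


def load_hosts(text):
    """Parse one host name per line, skipping blank lines and '#' comments."""
    hosts = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            hosts.append(line)
    return hosts


def patch_all(hosts, apply_fix, workers=20):
    """Run apply_fix(host) on every host; return (succeeded, failed)."""
    def attempt(host):
        try:
            apply_fix(host)
            return host, None
        except Exception as exc:  # record the failure and keep going
            return host, str(exc)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(attempt, hosts))  # results keep input order

    succeeded = [h for h, err in results if err is None]
    failed = {h: err for h, err in results if err is not None}
    return succeeded, failed


if __name__ == "__main__":
    def fake_fix(host):
        # Hypothetical stand-in: replace with the real per-host step.
        if host.endswith("bad"):
            raise RuntimeError("unreachable")

    ok, bad = patch_all(load_hosts("web01\nweb02\n# skip\ndb-bad\n"), fake_fix)
    print("patched:", ok)
    print("retry:", bad)
```

    The failed hosts come back with their error text, so the same list can be fed into a second pass once the unreachable machines are sorted out.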

  25. Re:This request surprises me for this many machine by plopez · · Score: 1

    I would think that a company managing 7000+ servers would have an automated patch scheduling system

    Nah. From personal experience I would say that most of them are pretty disorganized. And since they are very much cost driven they don't have cash for luxuries such as automated patch/upgrade tools. I mean, spreadsheets are free as is overtime for salaried employees, right?

    --
    putting the 'B' in LGBTQ+
  26. I'm gonna get mod-bombed... by Short+Circuit · · Score: 3, Insightful

    I hate to say it, but Microsoft Access fits your needs almost perfectly, in this case. It can import the data from your spreadsheets, if they're properly formatted. (And they'd have to be, if you wanted to have software make your schedule for you.)

    Once your data is in place, you write a query that includes a calculated field for the heuristics you're looking for. Run a query against that that checks against a table containing your available time slots, and you'll have the data you're looking for. (Or, at least, something that will do most of the work for you.)

    You've got to patch 7000 servers in four weeks. Do you really want to spend a few days learning a new software package that will do everything when you could take a piece of software you probably already know and simplify the problem in only a day?
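
    To make the "query against a table of time slots" idea concrete, here is a small sketch using Python's built-in sqlite3 module as a stand-in for Access. The table names, column names, and sample rows are all invented for illustration, and Access's SQL dialect differs in places, so treat this as the shape of the query rather than something to paste in.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE servers (name TEXT, unit TEXT, priority INTEGER);
    CREATE TABLE windows (unit TEXT, starts TEXT, ends TEXT);
""")
# Made-up sample data: in practice this would be imported from the spreadsheets.
conn.executemany("INSERT INTO servers VALUES (?, ?, ?)", [
    ("db01", "finance", 1),
    ("web01", "sales", 2),
    ("web02", "sales", 3),
])
conn.executemany("INSERT INTO windows VALUES (?, ?, ?)", [
    ("finance", "2007-03-03 22:00", "2007-03-04 02:00"),
    ("sales", "2007-03-02 20:00", "2007-03-02 23:00"),
    ("sales", "2007-03-05 20:00", "2007-03-05 23:00"),
])

# For each server, find the earliest outage window its business unit allows,
# then order the work list by priority (1 = most urgent).
SCHEDULE_SQL = """
    SELECT s.name, s.priority, MIN(w.starts) AS first_window
    FROM servers AS s
    JOIN windows AS w ON w.unit = s.unit
    GROUP BY s.name, s.priority
    ORDER BY s.priority, first_window
"""
schedule = conn.execute(SCHEDULE_SQL).fetchall()
for name, priority, first_window in schedule:
    print(priority, name, first_window)
```

    A real version would also need to cap how many servers land in one window so the admins are not overloaded; that part is left out here.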

  27. Re:This request surprises me for this many machine by karnal · · Score: 1

    Microsoft (from what I've heard from my desktop folks at work) do have a patch for Windows 2000 - it's just not exactly published yet.

    Let's just say the company I work for doesn't have more than 1% WinXP....

    --
    Karnal
  28. How are you already handling this? by Jerf · · Score: 2, Insightful

    Some people, as I post this, have sort of strongly hinted at this, but nobody else has directly asked this yet.

    What are you already using to patch your 7000+ servers? By the time you reach 7000+, this should have been a problem long solved. Hell, I'd expect it to be solved by the 100+ point.

    What's so special about this DST patch that your current process can't handle it?

    Because if the answer is "we have no process", you've long since lost, and good odds your systems are already seething piles of unpatched, compromised machines.

    If you do have a process but it's inadequate, and Slashdot might actually be able to help you, you'll need to be a little more clear on exactly what the problem is, if it isn't "we have no process".

    (What is it with people lobbing questions onto Ask Slashdot and almost, but not quite, never following up? Is the lead on Ask Slashdot so long that people die before it gets posted, or just give up? Obviously I ask this before I can tell whether "thesandbender" is one of the rare exceptions... as of this writing, no, unless (s)he's been modded into oblivion.)

  29. 7000 servers and by phoebe · · Score: 1

    no redundancy?

    If you have that number of servers you can just take one, upgrade, test, move on to the next and keep on going. There should be 0% downtime.
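
    The take-one, upgrade, test, move-on loop can be sketched in a few lines of Python. The names `rolling_upgrade`, `upgrade`, and `healthy` are hypothetical placeholders; what "upgrade" and "test" actually mean depends entirely on your environment.

```python
def rolling_upgrade(servers, upgrade, healthy):
    """Upgrade servers one at a time, stopping at the first failed health check.

    Returns (done, failed): the servers that passed, and the server that
    failed its check (or None if every server passed).
    """
    done = []
    for server in servers:
        upgrade(server)          # take one node out and patch it
        if not healthy(server):  # test before touching the next one
            return done, server
        done.append(server)
    return done, None


if __name__ == "__main__":
    patched = []
    done, failed = rolling_upgrade(
        ["node1", "node2", "node3"],
        patched.append,              # stand-in for the real upgrade step
        lambda s: s != "node2",      # stand-in check: pretend node2 fails
    )
    print("done:", done, "stopped at:", failed)
```

    Stopping at the first failure means a bad patch takes out one redundant node rather than the whole pool.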

    However if you have crapware that cannot cope in such situations maybe you should be badgering the vendor so that it can be rolled out in a more sensible manner.

  30. Re:This request surprises me for this many machine by ocbwilg · · Score: 2, Insightful

    Microsoft (from what I've heard from my desktop folks at work) do have a patch for Windows 2000 - it's just not exactly published yet.

    Let's just say the company I work for doesn't have more than 1% WinXP....


    Yes, the word is that there is a "patch" for Windows 2000. But since Windows 2000 is out of mainstream support Microsoft is only making it available to companies that have purchased extended support agreements for their Windows 2000 systems. Yes, it probably is part of Microsoft's strategy to push customers into upgrading to Windows XP/2003/Vista/Longhorn. Yes, Microsoft will undoubtedly take some heat for it, but they are also freely providing documentation on how to manually resolve the issue and script the fix, and that should be more than sufficient for any admin worth half the title to be able to fix it.

    But let's be honest here, we all know (or should know) that if you are running a Microsoft OS that is two or more generations old then there are going to be some issues. If you are still running Windows 2000 in your environment (and my company is, so I speak from experience), then this is undoubtedly not the first issue that you've run into that required a work around, nor will it be the last. Fixing them is part of what we get paid to do. Eventually there will be a point where it becomes more cost-effective to upgrade, and that's what we'll do.

  31. automated by hand by tsstahl · · Score: 3, Insightful

    Managers must manage.

    You don't have the time to put in a system, but you can craft a one off solution.

    Your solution starts by sub-dividing your 7k servers into groups based on business units. Poke around to find out what their SLA is, and then _tell_ them that you are going to bend the SLA a little in order to get this 'OMG CRITICAL PATCH' onto your farm.

    No offense, but I have found scripting abilities in Unix/Linux shops to be of a lot higher quality than in Windows shops. Nevertheless, you do have some talent whether you know it or not. Enlist this talent and use scripting for a lot of the nitty gritty details.

    Quest Fastlane Reporter, Winbatch, and native WMI are great ways to report on pre and post conditions of servers.

    Delegate, delegate, delegate. Let your team plan the methods and schedules for each business unit's servers.

    Once over the crisis, use the information you have gathered to generate a requirements document and go shopping.

    Remember, the key to delegating is trust. You are in charge of managing the 7k servers; you are not in charge of doing the individual upgrades/patches.

    I'm sorry to take a bit of a condescending tone, but I'm trying to be clear, not flatter your ego. To reiterate, the bottom line here is that with the time you have, you will be doing an automated manual upgrade. You may find that the process you cobble together will actually become a great plan B when critical patches need to be made; especially if you design with that goal in mind.

    Use the 'scare' from the event quickly to get budget money for a Real Patch System(TM).

    Good luck!

  32. the cure for your DST woes by FreeBSD+evangelist · · Score: 2, Interesting

    Move to Arizona, where we don't have Daylight Savings Time.

  33. EDL by jnieuwen · · Score: 1

    Scheduling using the earliest deadline late algorithm from the real time computing field might work. Based on the maintenance windows you should have different deadlines for different systems.
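
    For a feel of deadline-driven ordering, here is a minimal Python sketch. Note that it uses plain earliest-deadline-first (EDF) ordering for a single worker, which is a simpler relative of EDL and not EDL itself. The `Job` fields and the sample numbers are assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    hours: float     # estimated time to patch
    deadline: float  # hours from now by which the job must be finished


def edf_schedule(jobs):
    """Order jobs by earliest deadline; return (plan, late) for one worker.

    plan is a list of (name, start, end) tuples; late lists the jobs that
    would finish after their deadline under this ordering.
    """
    clock = 0.0
    plan, late = [], []
    for job in sorted(jobs, key=lambda j: j.deadline):
        start, clock = clock, clock + job.hours
        plan.append((job.name, start, clock))
        if clock > job.deadline:
            late.append(job.name)
    return plan, late


if __name__ == "__main__":
    jobs = [Job("mail", 2, 10), Job("db", 3, 4), Job("web", 1, 5)]
    plan, late = edf_schedule(jobs)
    for name, start, end in plan:
        print(f"{name}: {start:.0f}h -> {end:.0f}h")
    print("would miss deadline:", late)
```

    Anything that shows up in the late list is a signal that one admin is not enough for that group of systems, or that a window needs renegotiating.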

  34. Umm... Minkowsky? Google Calendar? by mengel · · Score: 1

    I think folks are focusing too much on the patching mechanism (i.e. how do I patch 7000 machines), and missing the point of the scheduling of the upgrade (*when* should I patch each group of machines).

    Take a package like Minkowsky, or other group calendar package, enter each of the groups you have an SLA with, and block out their you-can't-do-maintenance-here windows as "meetings" for them.

    Then try to schedule a "meeting" with as many of them as possible to do the upgrade, and a second meeting with as many as possible of the remaining batch, etc.
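
    The same "biggest meeting first" idea can be approximated without a calendar package. The Python sketch below is a greedy approximation under invented names (`batch_by_free_slot`, the group and slot labels): it repeatedly picks the slot that the most unscheduled groups are free in. It ignores how many admins are available per slot.

```python
def batch_by_free_slot(blocked, slots):
    """Greedily assign each group to a slot in which it is not blocked.

    blocked: dict mapping group name -> set of slots where maintenance is forbidden
    slots:   list of candidate maintenance slots
    Returns (assignments, unscheduled); assignments maps slot -> sorted group names.
    """
    remaining = set(blocked)
    available = list(slots)
    assignments = {}
    while remaining and available:
        # Pick the slot that the most still-unscheduled groups can attend.
        best = max(
            available,
            key=lambda s: sum(s not in blocked[g] for g in remaining),
        )
        free = sorted(g for g in remaining if best not in blocked[g])
        if not free:
            break  # nobody left can make any remaining slot
        assignments[best] = free
        remaining -= set(free)
        available.remove(best)
    return assignments, sorted(remaining)


if __name__ == "__main__":
    blocked = {
        "finance": {"sat"},
        "sales": {"sun"},
        "ops": set(),
        "hr": {"sat", "sun"},
    }
    plan, leftover = batch_by_free_slot(blocked, ["sat", "sun"])
    print(plan)
    print("needs another window:", leftover)
```

    Groups that end up in the leftover list are the ones whose blocked-out windows cover every candidate slot, so they need a separate conversation.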

    --
    - "History shows again and again how nature points out the folly of men" -- Blue Oyster Cult, 'Godzilla'
  35. Patch Management products are your answer. by chopkins1 · · Score: 1

    Any decent, current PM system (Altiris PM, MS-WSUS, MS-SMS, etc.) - using a SQL or other database back end - should have a method to identify the devices to patch and build a collection and allow you to specify a certain time frame for applying the patch to the selected groupings on separate schedules and perform any necessary reboots all in an automated fashion. (Sorry about the run-on sentence).

    Altiris (or any other vendor, this is just the one I am most familiar with) would probably LOVE to have the opportunity to get their products in the door by doing a limited Proof-Of-Concept PM implementation. Or, you could download a demo copy of Altiris' Client Management Suite (http://www.altiris.com/Download.aspx) with a 30-day set of licenses and POC it yourself (1GB Mem Windows Server 2kN with SQL Server 2kN required). (PS - They also have a Server Management suite with plugins that are appropriate for Dell and HP Servers.)

    Of course one would hope that with 7k devices you would already have such a system in place.

    Best of luck.

  36. Re:This request surprises me for this many machine by Bastardchyld · · Score: 1

    Mod +1 Snarky

    Yeah, because the cost of WSUS ($0) is just too much to turn a profit when factoring in the jobs they are creating.

    For the Acronym Illiterate WSUS = Windows Server Update Services

    --
    $diff terrorists hippies
    $
    $rm -rf *terrorists *hippies
  37. A script anyone? by Robber+Baron · · Score: 1

    You know you could...oh I don't know...maybe patch ONE server and then write a script that would sync the other 6999 servers with it!

    --

    You're using her as bait, Master!

  38. Outage? What outage? by Brewskibrew · · Score: 1

    Reboot, Deny, Deny works well.

    *Admin reboots server*

    User: I'm getting an Outlook error.
    Admin: Reboot your computer.
    User: Okay, it's working now.
    Admin: Must have been your workstation.

    *Click*

    --
    For sale: Signature. One owner. Low miles. Always garaged. New punctuation, just installed!
  39. Re:This request surprises me for this many machine by capitalsoftware · · Score: 1

    We are doing this with Tivoli for some very large shops. See capitalsoftware.com/Forums. We have written a compliance report GUI and we can queue 1000's of machines at one time. Also, for SUN and IBM all you have to do (in most cases) is update the TZD files using their utilities. You can contact me at john_williscapitalsoftware.com