Slashdot Mirror


Network Monitoring and Alerting?

SpamMonkey asks: "At work I am trying to implement a central monitoring and alerting service. We have in excess of 250 Windows servers, approx 15 AIX servers and another 30 Linux servers (mainly SLES/Suse). My investigation into systems that will allow us to monitor critical areas on each of these systems has so far led me to a clustered Linux server running Nagios with passive and active checks. What I'm curious about though is how Slashdot readers are carrying out their own jobs and how they can comfortably sit back, without having to repeatedly check that various systems are still operational and how to cut down their own response times when something goes wrong."

4 of 59 comments

  1. Nagios by codejnki · · Score: 2, Insightful
    I implemented Nagios at my work and am very happy with it. We are mostly a Citrix shop, so rogue processes on Windows servers are the bane of things running smoothly. We monitor average CPU load on all our application servers.


    All in all I'm monitoring about 200 different processes across our network, as well as running MRTG on the same box. I never once felt I needed to cluster.

    --
    "War doesn't determine who's right, just who's left"

    Steven Wright

    1. Re:Nagios by dubious9 · · Score: 2, Insightful

      I like Nagios, but it can be difficult to set up. To obtain the same level of functionality as MS MOM, I had to write a number of plugins. Plugin writing is really easy if you know some Perl; it's basically just creating a CLI utility that outputs in a certain format that Nagios can understand.
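      For illustration, here is a minimal sketch of such a plugin, written in Python rather than the Perl mentioned above. It follows the usual Nagios plugin convention: one status line on stdout (optionally followed by performance data after a pipe) and an exit code of 0, 1, 2 or 3 for OK, WARNING, CRITICAL or UNKNOWN. The script name, the `evaluate` helper and the choice of the 1-minute load average are assumptions made for the example, not anything from the original post.

```python
#!/usr/bin/env python3
"""Minimal Nagios-style check: compares the 1-minute load average to thresholds."""
import os
import sys

# Exit codes from the Nagios plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(load, warn, crit):
    """Return (exit_code, status_line) for a load value (hypothetical helper)."""
    if load >= crit:
        code = CRITICAL
    elif load >= warn:
        code = WARNING
    else:
        code = OK
    # Human-readable text first, then optional perfdata after the pipe.
    line = f"LOAD {LABELS[code]} - load1={load:.2f} | load1={load:.2f};{warn};{crit}"
    return code, line


def main(argv):
    try:
        warn, crit = float(argv[1]), float(argv[2])
        load = os.getloadavg()[0]  # available on Unix-like systems only
    except (IndexError, ValueError, OSError, AttributeError):
        print("LOAD UNKNOWN - usage: check_load_sketch.py WARN CRIT")
        return UNKNOWN
    code, line = evaluate(load, warn, crit)
    print(line)
    return code


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

      Run as `check_load_sketch.py 4 8`, it would warn at a load of 4 and go critical at 8; Nagios reads the exit code for the state and shows the printed line in its interface.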

      Remote services can be checked with ease, but stuff that needs to query the local system, disk space or load for example, needs a different setup. I ran it through SSH, but I'm not sure what kind of load that would put on 200+ machines and the network. SSH was never meant to be lightweight. You could also do some MRTG hacking and SNMP, but it's a lot of work. If the network were mostly Linux machines, I'd go with Nagios. However, since this problem is mostly with Windows machines, I'd use MS MOM as the main tool, and go from there.
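      A rough sketch of the run-it-through-SSH approach, assuming key-based login is already set up and that a check command exists on the remote host. The function names, the host name and the remote plugin path in the usage note are all hypothetical; the SSH options used (`BatchMode`, `ConnectTimeout`) are standard OpenSSH client options.

```python
import subprocess

UNKNOWN = 3  # Nagios exit code for "could not determine state"


def build_ssh_command(host, remote_command, timeout=10):
    """Build the argv for running one check command on a remote host."""
    return [
        "ssh",
        "-o", "BatchMode=yes",              # fail instead of prompting for a password
        "-o", f"ConnectTimeout={timeout}",  # give up on unreachable hosts
        host,
        remote_command,
    ]


def normalize_exit(returncode):
    """Pass through 0/1/2 from the remote plugin; treat anything else as UNKNOWN."""
    return returncode if returncode in (0, 1, 2) else UNKNOWN


def run_remote_check(host, remote_command, timeout=10):
    """Run the check over SSH and return (exit_code, first_output_line)."""
    try:
        result = subprocess.run(
            build_ssh_command(host, remote_command, timeout),
            capture_output=True, text=True, timeout=timeout + 5,
        )
    except (subprocess.TimeoutExpired, OSError) as exc:
        return UNKNOWN, f"UNKNOWN - ssh to {host} failed: {exc}"
    output = result.stdout.strip() or result.stderr.strip()
    first_line = output.splitlines()[0] if output else "(no output)"
    return normalize_exit(result.returncode), first_line
```

      A call such as `run_remote_check("web01", "/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /")` would open a fresh SSH session for every check, which is exactly the per-check overhead in question when multiplied across 200+ machines.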

      --
      Why, o why must the sky fall when I've learned to fly?
  2. Re:Also curious by BlurredWeasel · · Score: 2, Insightful

    Sorry to be shilling for the company I work for, but we can do exactly what you need. Indicative Service Directory can collect constant data, but then only upload to the central collection server once a day, or even less frequently if desired.

  3. Re:Zabbix by egon · · Score: 2, Insightful

    I've played with Nagios a fair bit. When I saw the Zabbix article in the recent Linux Magazine, I thought I'd check it out. The graphing looked much easier to do, and I liked the concept of "screens".

    Having used it for a couple of weeks now, I would call it "software with potential". It's not quite there yet, and has the feeling of being immature software. This is not a dig - quite the contrary. I think that Zabbix has a *lot* of potential, but I think it needs a little more time before it's ready.

    Just my $.02.

    --
    Give a man a match, you keep him warm for an evening.
    Light him on fire, he's warm for the rest of his life