Slashdot Mirror


What Do You Use for SNMP Monitoring?

linuxi386 wonders: "My company is in the process of implementing a global frame relay system. The network will cover 20+ states, several European and Asian countries, and Australia. It will have a 5 point full mesh fail-over, with each coast/country having about 20 PPP links and about 30 servers, a mix of Linux and Windows, plus a Windows 2003 domain controller at each site. I have been looking for a really decent, cheap, web-based monitoring application to maintain the entire system. So far I have looked at SolarWinds' Orion and AdventNet's OpManager. I like the look of Orion, but while I prefer the feature base of OpManager, I cannot stand its pricing model or the XP Playskool-style theme it uses. I am trying to avoid writing my own system to manage this if at all possible. What would you folks recommend and why?"

13 of 103 comments

  1. Cacti! by sampowers · · Score: 5, Informative

    We have a medium-sized setup and for us, Cacti works great. http://www.cacti.net/

    1. Re:Cacti! by merreborn · · Score: 5, Interesting

      I've found Munin much easier to configure and extend than Cacti.

      Quite frankly, I found Cacti's interface, abstractions, and terminology very difficult to grasp.

      Munin, on the other hand, I've written a half dozen plugins for.

      Admittedly, Cacti is more powerful, but that didn't do me much good, as I couldn't for the life of me harness that power.
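
For a sense of why Munin plugins are quick to write, here is a minimal sketch of one in Python. It relies only on the basic plugin convention (print graph metadata when called with `config`, otherwise print `field.value` lines). The graph title, the field name `load`, and the helper `munin_output` are illustrative assumptions, not anything from the comment above.

```python
import os
import sys


def munin_output(argv, read_load=lambda: os.getloadavg()[0]):
    """Return the lines a Munin plugin would print for these arguments."""
    if len(argv) > 1 and argv[1] == "config":
        # "config" describes the graph and its data series to Munin.
        return [
            "graph_title Load average (example)",
            "graph_vlabel load",
            "graph_category system",
            "load.label load",
        ]
    # With no argument the plugin reports the current value of each series.
    return [f"load.value {read_load():.2f}"]


if __name__ == "__main__":
    print("\n".join(munin_output(sys.argv)))
```

The `read_load` parameter is only there so the value source can be swapped out (for a test, or for some other metric); a real plugin would normally be dropped into the node's plugin directory and made executable.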

  2. Google it!!!! by Knetzar · · Score: 5, Funny

    Just google for "full mesh fail-over" "ppp links" and...no, wait, forget that....

  3. What we use by Anonymous Coward · · Score: 5, Informative

    I'm posting as an AC so I don't breach any IP agreements and/or NDAs.

    At the companies I've worked at, we have typically started with the free monitoring software package Nagios and, after a short period of time, purchased the commercial product Netcool. Netcool is everything you could ever ask for... assuming you have a few months to tweak the rules to set the event levels correctly... But I guess all monitoring systems are like that.

    Depending on the size of your NOC, your datacenter, and your client base, I would recommend starting with Nagios and, if it proves to be too small for your needs, moving to Netcool. (Just be prepared to pay serious $$$ for Netcool.)

    HTH

    A.Coward

    1. Re:What we use by BrookHarty · · Score: 5, Informative

      Yup, Nagios is great, and you can customize it to work on anything. I don't see a reason to buy an expensive professional enterprise solution when Nagios is an enterprise solution.

      Plus, when you start using it, you find yourself adding new scripts to monitor more and more because it's that easy. I'm using it to monitor TCP/UDP ports, processes, Oracle RAC instances, Oracle queues, SwiftMQ queues, hardware NICs, hardware stats, memory/CPU/etc., log sizes, etc.

      So, not sure why I'd buy Netcool when Nagios is free and works great. The time you spend configuring Nagios is cheap and easy. And it works with NetExpert too.

      I like having a nice dashboard for my NOC, so they can keep a good eye on the health of a service, without lots of training.
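
To illustrate how small such a monitoring script can be, here is a minimal sketch of a Nagios-style check in Python. It follows the standard plugin convention (one status line on stdout, exit code 0/1/2/3 for OK/WARNING/CRITICAL/UNKNOWN). The names `check_tcp`, `main` and `check_tcp_sketch` are hypothetical, and the check itself (a plain TCP connect) is an assumption chosen for brevity.

```python
import socket

# Exit codes defined by the Nagios plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_tcp(host, port, timeout=5.0, connect=socket.create_connection):
    """Return (exit_code, status_line) for a simple TCP connect check."""
    try:
        conn = connect((host, port), timeout)
    except OSError as exc:
        # Refused, unreachable and timed-out connections all end up here.
        return CRITICAL, f"TCP CRITICAL - {host}:{port} unreachable ({exc})"
    conn.close()
    return OK, f"TCP OK - {host}:{port} accepting connections"


def main(argv):
    """Parse HOST and PORT, print one status line, return the exit code."""
    if len(argv) != 3 or not argv[2].isdigit():
        print("TCP UNKNOWN - usage: check_tcp_sketch HOST PORT")
        return UNKNOWN
    code, line = check_tcp(argv[1], int(argv[2]))
    print(line)
    return code

# A real plugin file would end with: import sys; sys.exit(main(sys.argv))
```

Nagios only looks at the exit code and the first line of output, which is why checks for queues, processes or log sizes tend to follow the same shape with a different probe in the middle.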

  4. Either go big or go home by saxman57 · · Score: 5, Insightful

    If your company is willing to spend that much money on the network, a 'cheap' NMS tool is the wrong solution. Too often companies invest in technology only to skimp on the management of that technology. The end result is overall poor performance and dissatisfaction with the technology. I would suggest a real NMS tool such as OpenView.

    1. Re:Either go big or go home by arivanov · · Score: 5, Interesting

      OpenView is not necessarily the answer.

      It is one of the best fault-oriented NMSes on the market, but its performance monitoring side has always sucked bricks through a thin straw sideways. Based on the packages mentioned in the original post, the poster is trying to monitor performance and utilisation, not faults, so OpenView is the wrong tool.

      I am an old school person (been doing this for 10+ years now on networks from 10 nodes to global telco), so my first choice for performance monitoring in a 30 node setup would be the classic - MRTG (though I use it with an RRD backend nowadays). I have run it for up to 600 monitored variables. It works. For a 30 node full mesh this will be a no-brainer. Its main disadvantage is that it does not preserve long-term historical data (which managers sometimes require). The main advantage is that you can also plug in non-network data (CPU, environmental, application performance) from the Linux part with ease. The next choice would obviously be InfoVista (its original stuff, not the stuff it acquired recently). It costs money though. No idea how much nowadays. It also has a learning curve associated with it.

      As far as the utilities mentioned in the original post - they are winhoze stuff, so I am not very familiar with them. I have seen some other products under the same brands (SolarWinds TFTP server) and they are laughable.
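
The arithmetic behind this kind of utilisation graph is simple enough to sketch. The Python below is an illustration under stated assumptions, not MRTG's own code: it takes two samples of an interface octet counter (for example `ifInOctets`, a 32-bit counter), allows for a single counter wrap between polls, and converts the delta into percent of link speed. The function names and the example figures are hypothetical.

```python
def counter_delta(prev, curr, bits=32):
    """Difference between two counter samples, allowing for one wrap."""
    if curr >= prev:
        return curr - prev
    # The counter rolled over past 2**bits between the two polls.
    return curr + (1 << bits) - prev


def utilisation_percent(prev_octets, curr_octets, interval_s, link_bps, bits=32):
    """Average link utilisation over the polling interval, in percent."""
    delta_bits = counter_delta(prev_octets, curr_octets, bits) * 8
    return 100.0 * delta_bits / (interval_s * link_bps)


# Example: 3,750,000 octets in a 300 s poll on a 1 Mbit/s link.
example = utilisation_percent(0, 3_750_000, 300, 1_000_000)
```

On fast links a 32-bit counter can wrap more than once per polling interval, in which case this delta is wrong; 64-bit counters (`bits=64`) avoid the problem where the device supports them.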

      --
      Baker's Law: Misery no longer loves company. Nowadays it insists on it
      http://www.sigsegv.cx/
  5. Check out Nagios by ErikTheRed · · Score: 5, Informative

    Nagios is a fairly easy-to-learn, extremely extensible (can you use a scripting language?) monitoring system. It scales reasonably well, supports distributed stat gathering, can respond to SNMP traps, etc. It's not the easiest out of the box (you'll spend a day or two learning to use it and set it up), but there's very little you can't make it do.

    --

    Help save the critically endangered Blue Iguana
  6. Correlation, your best asset by spinash · · Score: 3, Interesting

    If your network has a certain size and you do everything by SNMP, you need to be able to correlate the events to avoid alarm floods when one link goes down. We have used OpenService's NerveCenter with great success, coupled with Netcool from IBM. The pricing is steep, but the products are top-notch. In our configuration, we monitor about 8,000 network devices (Cisco, 3Com, Bay, Nokia-IPSO, Consentry, etc.) using 2 NerveCenter instances running on 2 Sun 480 boxes.

    (I'm not affiliated with these companies or products)
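
The correlation idea above can be sketched in a few lines of Python. This is a deliberately simplified illustration, not how NerveCenter or Netcool work: it assumes a hypothetical `upstream` map from each device to the device it is reached through, and reports only those down devices that have no other down device on their path towards the monitoring station.

```python
def root_causes(down, upstream):
    """Return the down devices that are not explained by a down ancestor."""
    down = set(down)
    roots = set()
    for dev in down:
        parent = upstream.get(dev)
        seen = {dev}
        # Walk towards the monitoring station until we hit a down device,
        # the top of the tree, or a loop in the (possibly bad) topology data.
        while parent is not None and parent not in down and parent not in seen:
            seen.add(parent)
            parent = upstream.get(parent)
        if parent is None or parent in seen:
            roots.add(dev)
    return roots


# Hypothetical topology: two switches behind a core router, servers behind sw1.
upstream = {"sw1": "core", "sw2": "core", "srv1": "sw1", "srv2": "sw1"}
alarms = root_causes({"sw1", "srv1", "srv2"}, upstream)  # only sw1 is reported
```

With the example topology, losing `sw1` raises one alarm instead of three; the servers behind it are treated as symptoms until `sw1` is back.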

  7. Use a Proper Tool by Michael+Snoswell · · Score: 5, Insightful

    How is it that you're obviously spending a huge amount on the network infrastructure and want to cut costs so much on network monitoring? After going to all the effort of setting up you'll want a decent tool that tells you the instant something is wrong - and before the users tell you!

    Something like HP OpenView does the job. Cisco have a sw tool but not as good, as do Sun and IBM. CA Unicenter is overkill and too expensive to my mind. For small jobs (less than 100 nodes) I've used Ipswitch WhatsUp Professional. You want something that goes inside your switches and has agents for all your servers if you want to monitor properly.

    In the dim past (10+ yrs ago) I used Scotty (a Tcl/Tk freeware tool) and at other times wrote my own in Python/Tk with Perl daemons/services.

    net-snmp on SourceForge has tools you can use, but to my mind these days, again I'd say - it's an expensive (and I presume important) network you've got there, so spend some money to monitor it properly. The expensive tools ($30k+) all have ready-made agents or know about a huge variety of hw, so you don't have to customise MIBs and code (though Unicenter takes a lot of customisation to work well, and they all need customisation of sorts). It might take you 3 months to do a half-decent job coding yourself what a commercial package could do with more features in a few weeks, and you've got support and someone to complain to if there are problems. How much money would be lost when the network goes down in those three months? Just one hour for a large corporation would cover the cost of the sw.

    I do agree it's great fun rolling your own (I'm sure you're a great programmer) if you have the time, the corporate managers don't appreciate the need to monitor things properly, and you can't convince them to spend the dollars. But when it goes down it'll be your arse, and the managers'/company's money being lost, while you sweat to fix things. They'll quickly tell you then (and rightly so) that it would have been worth doing it right the first time (you didn't think they'd take the heat for this, now did you?), no matter how good your code will look in just another month's time.

    At worst, write some emails as evidence that you requested such and such a package, with official quotes, and keep on record their replies refusing to spend the money on it. I know of one company (a chain of retail stores) that went to the wall when the network went down: a series of seemingly small faults on critical days (like the last shopping days before Christmas) meant the company went under, and the IT consultants who designed the system took the blame in court in the end. It cost them $30m (plus a few hundred people lost their jobs).

    Now if this is just some academic network or it's not your responsibility then fine (mind you many research places are even more fussy about their networks than corporate users).

    Unfortunately there are times when jumping into coding, no matter how well intentioned, isn't the most pragmatic or best solution.

    --
    pithy comment
  8. Who needs SNMP? by Anonymous Coward · · Score: 5, Funny

    SNMP? That's complicated stuff to set up.

    At work we rely on the much more robust and easy-to-use URMP ("User Resource Management Protocol") to monitor our systems. When the systems go down, the users let us know about it.

  9. What the hell? by jevring · · Score: 3, Interesting

    So, let me get this straight, you're building a GLOBAL frame relay system, with nodes in 20+ states, with massive redundancy, and you're looking for a CHEAP system?

    Get yourself together and look for a GOOD system. If you're already spending TONS of money, you might as well spend some more to get exactly what you want, instead of settling for something. It might turn out that a free system is the best system for you, but please, good HAS GOT TO come before cheap!

    --
    Move sig!
  10. Simple and Elegant: MON by rasjani · · Score: 3, Informative

    http://www.kernel.org/software/mon/ I was one of the implementation crew for a small NOC (about 7 people incl. managers) and approx. 150 machines in various locations. I reviewed quite a lot of free software, and while most of it looked quite nice (Nagios/Big Brother/etc.), almost all of it was filled with features that were not really essential just to "monitor the health of the system", so I ended up with mon. Mon, for me, was really the "Unix way" of creating stuff: make things easy/simple and extend them with other tools. The generic layout we used was net-snmp on the client hosts, either being polled at intervals or sending traps to the main machines.

    --
    yush