What Do You Use for SNMP Monitoring?
linuxi386 wonders: "My company is in the process of implementing a global frame relay system. The network will cover 20+ states, several European and Asian countries, and Australia. It will have a 5-point full mesh fail-over, with each coast/country having about 20 PPP links and about 30 servers mixed between Linux and Windows, plus a 2003 domain controller at each site. I have been looking for a really decent, cheap, web-based monitoring application to maintain the entire system. So far I have looked at SolarWinds' Orion and AdventNet's OpManager. I like the look of Orion, but while I prefer the feature base of OpManager, I cannot stand its pricing model or the XP Playskool-style theme it uses. I am trying to avoid writing my own system to manage this if at all possible. What would you folks recommend and why?"
We have a medium-sized setup, and for us Cacti works great. http://www.cacti.net/
Just google for "full mesh fail-over" "ppp links" and...no, wait, forget that....
I'm posting as an AC so I don't break any I.P. agreements and/or NDAs.
At the companies I've worked at, we have typically started with the free monitoring software package Nagios and, after a short period of time, purchased the commercial product NetCool. NetCool is everything you could ever ask for... assuming you have a few months to tweak the rules to set the event levels correctly... But I guess all monitoring systems are like that.
Depending on the size of your NOC, your datacenter, and your client base, I would recommend starting with Nagios and, if it proves to be too small for your needs, moving to NetCool. (Just be prepared to pay serious $$$ for NetCool.)
HTH
A.Coward
If your company is willing to spend that much money on the network, a 'cheap' NMS tool is the wrong solution. Too often companies invest in technology only to skimp on the management of that technology. The end result is overall poor performance and dissatisfaction with the technology. I would suggest a real NMS tool such as OpenView.
Nagios is a fairly easy-to-learn, extremely extensible (can you use a scripting language?) monitoring system. It scales reasonably well, supports distributed stat gathering, can respond to SNMP traps, etc. It's not the easiest out of the box (you'll spend a day or two learning to use it and set it up), but there's very little you can't make it do.
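To give an idea of what "can you use a scripting language?" means in practice: a Nagios check is just a program that prints one status line and exits with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). Below is a minimal sketch of a disk-usage check in Python. The function names, the default path and the 80/90 thresholds are made up for illustration and are not from any shipped plugin.

```python
import shutil

# Exit codes that Nagios plugins use to report status.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate(percent_used, warn, crit):
    """Map a usage percentage to a (exit_code, status_line) pair."""
    if percent_used >= crit:
        return CRITICAL, f"DISK CRITICAL - {percent_used:.1f}% used"
    if percent_used >= warn:
        return WARNING, f"DISK WARNING - {percent_used:.1f}% used"
    return OK, f"DISK OK - {percent_used:.1f}% used"


def check_disk(path="/", warn=80.0, crit=90.0):
    """Print the status line and return the exit code for one filesystem.

    The path and thresholds are illustrative defaults (assumptions).
    """
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        print(f"DISK UNKNOWN - {exc}")
        return UNKNOWN
    percent = 100.0 * usage.used / usage.total
    code, message = evaluate(percent, warn, crit)
    print(message)
    return code

# Run as a standalone plugin, the script would end with:
#     import sys; sys.exit(check_disk())
```

Keeping the threshold logic in its own function makes it easy to test without touching a real filesystem.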
If your network has a certain size and you do everything by SNMP, you need to be able to correlate the events to avoid alarm floods when one link goes down. We have used OpenService's NerveCenter with great success, coupled with NetCool from IBM. The pricing is steep, but the products are top-notch. In our configuration, we monitor about 8,000 network devices (Cisco, 3Com, Bay, Nokia-IPSO, Consentry, etc.) using 2 NerveCenter instances running on 2 Sun 480 boxes.
(I'm not affiliated with these companies or products)
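For anyone wondering what "correlating events" boils down to, here is a rough sketch in Python. It is not how NerveCenter or NetCool work internally; it only shows the basic idea of topology-based suppression: if a device's upstream neighbour is also down, the device is probably just unreachable, so only the topmost failure gets reported. The device names and the `upstream` map are hypothetical.

```python
from typing import Dict, Iterable, List, Optional


def root_causes(down: Iterable[str],
                upstream: Dict[str, Optional[str]]) -> List[str]:
    """Return the down devices that have no down ancestor.

    `upstream` maps each device to the next hop toward the monitoring
    station (None for the top of the tree). Anything behind a failed
    ancestor is treated as a symptom and suppressed.
    """
    down_set = set(down)
    causes = []
    for device in sorted(down_set):
        masked = False
        seen = {device}  # guards against loops in a bad topology map
        node = upstream.get(device)
        while node is not None and node not in seen:
            if node in down_set:
                masked = True
                break
            seen.add(node)
            node = upstream.get(node)
        if not masked:
            causes.append(device)
    return causes


# Hypothetical topology: one core, two routers, some gear behind them.
topology = {
    "core": None,
    "router-a": "core",
    "switch-a1": "router-a",
    "server-a1": "switch-a1",
    "router-b": "core",
    "server-b1": "router-b",
}

alarms = ["router-a", "switch-a1", "server-a1", "server-b1"]
print(root_causes(alarms, topology))  # ['router-a', 'server-b1']
```

Four alarms collapse into two actionable ones. Real products add time windows, rules and state machines on top of this, which is where the months of tuning go.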
How is it that you're obviously spending a huge amount on the network infrastructure and want to cut costs so much on network monitoring? After going to all the effort of setting it up, you'll want a decent tool that tells you the instant something is wrong, and before the users tell you!
Something like HP OpenView does the job. Cisco has a software tool, but it is not as good; Sun and IBM have their own as well. CA Unicenter is overkill and too expensive to my mind. For small jobs (fewer than 100 nodes) I've used Ipswitch WhatsUp Professional. You want something that goes inside your switches and has agents for all your servers if you want to monitor properly.
In the dim past (10+ yrs ago) I used Scotty (a Tcl/Tk freeware tool) and at other times wrote my own in Python/Tk with Perl daemons/services.
net-snmp on SourceForge has tools you can use, but to my mind these days, again I'd say: it's an expensive (and I presume important) network you've got there, so spend some money to monitor it properly. The expensive tools ($30k+) all have ready-made agents or know about a huge variety of hardware, so you don't have to customise MIBs and code (though Unicenter takes a lot of customisation to work well, and they all need customisation of sorts). It might take you 3 months to do a half-decent job coding yourself what a commercial package could do with more features in a few weeks, and with the package you've got support and someone to complain to if there are problems. How much money would be lost when the network goes down in those three months? Just one hour for a large corporation would cover the cost of the software.
I do agree it's great fun rolling your own (I'm sure you're a great programmer) if you have the time, the corporate managers don't appreciate the need to monitor things properly, and you can't convince them to spend the dollars. But when it goes down, it'll be your arse and the managers'/company's money being lost while you sweat to fix things. They'll quickly tell you then (and rightly so) that it would have been worth doing it right the first time (you didn't think they'd take the heat for this, now did you?), no matter how good your code will look in just another month's time.
At worst, write some emails as evidence that you requested such and such a package, with official quotes, and have their replies on record that they refused to spend the money on it. I know of one company (a chain of retail stores) that went to the wall when the network went down: a series of seemingly small faults on critical days (like the last shopping days before Christmas) meant the company went under, and the IT consultants who designed the system took the blame in court in the end. It cost them $30m (plus a few hundred people lost their jobs).
Now if this is just some academic network or it's not your responsibility then fine (mind you many research places are even more fussy about their networks than corporate users).
Unfortunately, there are times when jumping into coding, no matter how well intentioned, isn't the most pragmatic or best solution.
SNMP? That's complicated stuff to set up.
At work we rely on the much more robust and easy-to-use URMP ("User Resource Management Protocol") to monitor our systems. When the systems go down, the users let us know about it.
So, let me get this straight, you're building a GLOBAL frame relay system, with nodes in 20+ states, with massive redundancy, and you're looking for a CHEAP system?
Get yourself together and look for a GOOD system. If you're already spending TONS of money, you might as well spend some more to get exactly what you want, instead of settling for something. It might turn out that a free system is the best system for you, but please, good HAS GOT TO come before cheap!
http://www.kernel.org/software/mon/ I was one of the implementation crew for a small NOC (about 7 people incl. managers) and approx. 150 machines in various locations. I reviewed quite a lot of free software, and while most of the packages looked quite nice (Nagios, Big Brother, etc.), almost all of them were filled with features that were really not essential for just "monitoring the health of the system", so I ended up with mon. Mon, for me, was really the "Unix way" of creating stuff: make things easy/simple and extend them with other tools. The generic layout we used was net-snmp on client hosts, either being polled at intervals or sending traps to the main machines.
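The polling half of that layout can be sketched in a few lines of Python. This is not mon's code or configuration, just an illustration of the pattern: probe every host each interval, remember the last known state, and raise an event only when a host changes state, so a dead box does not page you on every cycle. The `probe` callable is a placeholder (an assumption); in a real setup it might wrap an SNMP query against the net-snmp agent on each client.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def poll_once(hosts: Iterable[str],
              probe: Callable[[str], bool],
              last_state: Dict[str, bool]) -> List[Tuple[str, str]]:
    """Probe each host once and return (host, "UP"/"DOWN") transitions.

    Hosts not seen before are assumed to have been up, so a host that
    fails on the very first poll still produces a DOWN event.
    """
    events = []
    for host in hosts:
        try:
            up = bool(probe(host))
        except Exception:
            # A probe that blows up counts as a failed check.
            up = False
        if last_state.get(host, True) != up:
            events.append((host, "UP" if up else "DOWN"))
        last_state[host] = up
    return events


# Hypothetical usage with a fake probe instead of a real SNMP query.
state: Dict[str, bool] = {}
hosts = ["web1", "db1"]
print(poll_once(hosts, lambda h: h != "db1", state))  # [('db1', 'DOWN')]
print(poll_once(hosts, lambda h: h != "db1", state))  # []
print(poll_once(hosts, lambda h: True, state))        # [('db1', 'UP')]
```

A scheduler (cron, or a loop with a sleep) would call `poll_once` at each interval; traps cover the opposite direction, where the client reports a problem without waiting to be asked.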