What Do You Use for SNMP Monitoring?
linuxi386 wonders: "My company is in the process of implementing a global frame relay system. The network will cover 20+ states, several European and Asian countries, and Australia. It will have a five-point full-mesh failover, with each coast/country having about 20 PPP links and about 30 servers (a mix of Linux and Windows), plus a Windows 2003 domain controller at each site. I have been looking for a decent, cheap, web-based monitoring application to manage the entire system. So far I have looked at SolarWinds' Orion and AdventNet's OpManager. I like the look of Orion; I prefer OpManager's feature set, but I cannot stand its pricing model or the Playskool-style XP theme it uses. I am trying to avoid writing my own system to manage this if at all possible. What would you folks recommend and why?"
We have a medium sized setup and for us, Cacti works great. http://www.cacti.net/
Just google for "full mesh fail-over" "ppp links" and...no, wait, forget that....
I'm posting as an AC so I don't break any IP agreements and/or NDAs.
At the companies I've worked at, we have typically started with the free monitoring package Nagios and, after a short period of time, purchased the commercial product NetCool. NetCool is everything you could ever ask for... assuming you have a few months to tweak the rules to set the event levels correctly... But I guess all monitoring systems are like that.
Depending on the size of your NOC, your datacenter, and your client base, I would recommend starting with Nagios and, if it proves too small for your needs, moving to NetCool. (Just be prepared to pay serious $$$ for NetCool.)
HTH
A.Coward
If your company is willing to spend that much money on the network, a 'cheap' NMS tool is the wrong solution. Too often companies invest in technology only to skimp on the management of that technology. The end result is overall poor performance and dissatisfaction with the technology. I would suggest a real NMS tool such as OpenView.
Nagios is a fairly easy-to-learn, extremely extensible (can you use a scripting language?) monitoring system. It scales reasonably well, supports distributed stat gathering, can respond to SNMP traps, etc. It's not the easiest to get going out of the box (you'll spend a day or two learning it and setting it up), but there's very little you can't make it do.
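To illustrate the extensibility point: a Nagios check is just a program that prints one status line and exits with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). Below is a minimal sketch in Python; the plugin name `check_link.py`, the link-utilisation metric and the thresholds are hypothetical, and in practice the value would come from an SNMP poll rather than the command line.

```python
"""Sketch of a Nagios-style check plugin for link utilisation (hypothetical)."""
from typing import List, Tuple

# Standard Nagios plugin exit codes: the exit status, not the text, sets the state.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(value: float, warn: float, crit: float) -> Tuple[int, str]:
    """Map a 'higher is worse' metric onto a Nagios state and one status line."""
    if value >= crit:
        code = CRITICAL
    elif value >= warn:
        code = WARNING
    else:
        code = OK
    # Text before '|' is shown to operators; text after it is performance data
    # in the form label=value[UOM];warn;crit, which graphing add-ons can parse.
    line = (f"LINK {LABELS[code]} - utilisation {value:g}% "
            f"| util={value:g}%;{warn:g};{crit:g}")
    return code, line


def main(argv: List[str]) -> int:
    """Hypothetical invocation: check_link.py VALUE WARN CRIT."""
    try:
        value, warn, crit = (float(arg) for arg in argv[1:4])
    except ValueError:
        # Bad or missing arguments: report UNKNOWN rather than guessing a state.
        print("LINK UNKNOWN - usage: check_link.py VALUE WARN CRIT")
        return UNKNOWN
    code, line = evaluate(value, warn, crit)
    print(line)
    return code
```

Installed as a real plugin, the script would end with `if __name__ == "__main__": sys.exit(main(sys.argv))` (plus `import sys`) so that Nagios sees the exit code. With the arguments `42 80 95` it prints `LINK OK - utilisation 42% | util=42%;80;95` and returns 0.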
Is a 'theme' really going to turn you off a piece of software? Ask the company if they can re-brand it for you. Many companies will do this for free, especially for web-based tools... and if they don't, well, it's web-based: there are stylesheets, graphics and HTML, so making some radical visual changes really shouldn't take much work.
So go with the tool that works best; looks are pretty easy to adjust, as long as the usability is there to begin with. If it's clunky, confusing, and you hate how it looks, fixing it takes a bigger commitment than just changing the looks, but it's been done before. For example, I once completely redesigned the UI for Bugzilla: canned queries, new workflows, collapsing panes, calendar widgets, color coding and more. It was worth it in the end, and that company still uses it 90% the way I left it, so it wasn't wasted effort.
Well, think about it anyways.
On a large cluster, we considered OpsManager, Cacti, and Ganglia, and have run all 3.
OpsManager has some really nice features that made it easy to display and group results, especially for non-engineering people (good built-in graphing tools, etc.), but we found it didn't perform as well as the other two. Additionally, you have to pay for it.
Cacti was nice because of its built-in hooks for Apache and MySQL, but it lacked some features we wanted (automatic host discovery, certain data summarization).
We use Ganglia now. It's open source, has a good track record on large clusters, and has proven speed and reliability. It will auto-discover hosts, but the downside was that it had no built-in hooks for MySQL and Apache, which we did want to monitor.
So consider the set of data you're interested in monitoring, and how far you intend to scale it.
If your network is of a certain size and you do everything by SNMP, you need to be able to correlate events to avoid alarm floods when one link goes down (every device behind that link stops answering at once). We have used OpenService's NerveCenter with great success, coupled with NetCool from IBM. The pricing is steep, but the products are top-notch. In our configuration, we monitor about 8,000 network devices (Cisco, 3Com, Bay, Nokia IPSO, ConSentry, etc.) using two NerveCenter instances running on two Sun 480 boxes.
(I'm not affiliated with these companies or products)
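The correlation idea can be sketched as a small Python function: given which devices failed their poll and a map of which upstream device each one is reached through, group every failure under its topmost failed ancestor and raise one alarm per group. This is a simplified parent/child dependency scheme (similar in spirit to Nagios's parent hosts), not how NerveCenter or NetCool actually correlate events, and the device names are hypothetical.

```python
from typing import Dict, List, Optional, Set


def correlate(down: Set[str], parent: Dict[str, Optional[str]]) -> Dict[str, List[str]]:
    """Group failed devices under their topmost failed upstream device.

    `parent` maps each device to the device it is reached through
    (None when it sits directly next to the poller).
    """
    groups: Dict[str, List[str]] = {}
    for dev in sorted(down):
        root = dev
        node = parent.get(dev)
        seen = {dev}
        # Walk upstream; any failed ancestor explains this device's failure.
        while node is not None and node not in seen:
            seen.add(node)
            if node in down:
                root = node
            node = parent.get(node)
        groups.setdefault(root, []).append(dev)
    return groups


def alarms(groups: Dict[str, List[str]]) -> List[str]:
    """Emit one alarm per root cause instead of one per unreachable device."""
    return [f"{root} DOWN ({len(members) - 1} dependent alarms suppressed)"
            for root, members in sorted(groups.items())]


# Hypothetical topology: a European WAN router feeding a Paris switch and servers.
topology = {"wan-eu": None, "sw-paris": "wan-eu",
            "srv-paris-1": "sw-paris", "srv-paris-2": "sw-paris", "srv-ny-1": None}
print(alarms(correlate({"wan-eu", "sw-paris", "srv-paris-1", "srv-paris-2"}, topology)))
```

Here four polling failures collapse into the single alarm `wan-eu DOWN (3 dependent alarms suppressed)`; an unrelated failure such as `srv-ny-1` would still get its own alarm.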
How is it that you're obviously spending a huge amount on the network infrastructure and want to cut costs so much on network monitoring? After going to all the effort of setting it up, you'll want a decent tool that tells you the instant something is wrong, and before the users tell you!
Something like HP OpenView does the job. Cisco has a software tool, though not as good, as do Sun and IBM. CA Unicenter is overkill and too expensive to my mind. For small jobs (fewer than 100 nodes) I've used Ipswitch WhatsUp Professional. If you want to monitor properly, you want something that looks inside your switches and has agents for all your servers.
In the dim past (10+ years ago) I used Scotty (a Tcl/Tk freeware tool), and at other times wrote my own in Python/Tk with Perl daemons/services.
net-snmp on SourceForge has tools you can use, but these days, to my mind, I'd say again: it's an expensive (and I presume important) network you've got there, so spend some money to monitor it properly. The expensive tools ($30k+) all have ready-made agents or know about a huge variety of hardware, so you don't have to customise MIBs and code (though Unicenter takes a lot of customisation to work well, and they all need customisation of sorts). It might take you three months of coding to do a half-decent job yourself, where a commercial package could do it, with more features, in a few weeks, and you'd have support and someone to complain to if there are problems. How much money would be lost if the network went down during those three months? Just one hour of downtime for a large corporation would cover the cost of the software.
I do agree it's great fun rolling your own (I'm sure you're a great programmer) if you have the time and the corporate managers don't appreciate the need to monitor things properly and you can't convince them to spend the dollars. But when it goes down, it'll be your arse on the line and the managers'/company's money being lost while you sweat to fix things. They'll quickly tell you then (and rightly so) that it would have been worth doing it right the first time (you didn't think they'd take the heat for this, did you?), no matter how good your code will look in another month's time.
At worst, write some emails as evidence that you requested such-and-such a package with official quotes, and keep their replies on record showing they refused to spend the money on it. I know of one company (a chain of retail stores) that went to the wall when the network went down: a series of seemingly small faults on critical days (like the last shopping days before Christmas) meant the company went under, and the IT consultants who designed the system took the blame in court in the end. It cost them $30m (plus a few hundred people lost their jobs).
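The "one hour pays for it" argument is simple division: break-even hours = tool cost ÷ outage cost per hour. A sketch with hypothetical figures; the $50,000/hour outage cost is an assumption for illustration, not a number from this thread, so substitute your own.

```python
def breakeven_hours(tool_cost: float, downtime_cost_per_hour: float) -> float:
    """Hours of avoided outage after which a monitoring tool has paid for itself."""
    if downtime_cost_per_hour <= 0:
        raise ValueError("downtime cost per hour must be positive")
    return tool_cost / downtime_cost_per_hour


# Hypothetical figures: a $30k package (the low end of the range mentioned)
# versus an assumed $50k/hour cost of a network outage for a large company.
TOOL_COST = 30_000
OUTAGE_COST_PER_HOUR = 50_000

hours = breakeven_hours(TOOL_COST, OUTAGE_COST_PER_HOUR)
print(f"Pays for itself after {hours:.1f} hours of avoided downtime")
```

With these assumed numbers the package pays for itself after 0.6 hours of avoided downtime; even at a tenth of that outage cost it would take only six hours.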
Now if this is just some academic network or it's not your responsibility then fine (mind you many research places are even more fussy about their networks than corporate users).
Unfortunately, there are times when jumping into coding, no matter how well intentioned, isn't the most pragmatic or best solution.
pithy comment
SNMP? That's complicated stuff to set up.
At work we rely on the much more robust and easy-to-use URMP ("User Resource Management Protocol") to monitor our systems. When the systems go down, the users let us know about it.
So, let me get this straight, you're building a GLOBAL frame relay system, with nodes in 20+ states, with massive redundancy, and you're looking for a CHEAP system?
Get yourself together and look for a GOOD system. If you're already spending TONS of money, you might as well spend some more to get exactly what you want, instead of settling for something. It might turn out that a free system is the best system for you, but please, good HAS GOT TO come before cheap!
I'm a fan of InterMapper, powerful but not overly complicated, and easily extensible. It also runs on MacOSX, Windows, Linux, Solaris, and FreeBSD. It was originally developed at Dartmouth College to support their network, and has been marketed commercially since 1996.
http://www.kernel.org/software/mon/
I was one of the implementation crew for a small NOC (about 7 people including managers) and approx. 150 machines in various locations. I reviewed quite a lot of free software, and while most of it looked quite nice (Nagios/Big Brother/etc.), almost all of it was filled with features that were not really essential just to "monitor the health of the system", so I ended up with mon. Mon, for me, was really the "Unix way" of creating stuff: keep things easy/simple and extend them with other tools. The generic layout we used was net-snmp on the client hosts, either polled at intervals or sending traps to the main machines.
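A mon monitor in that spirit can stay tiny. The sketch below assumes mon's usual monitor convention (hostnames passed as arguments, a nonzero exit status on failure, and a first output line listing the failed hosts) and polls each host with net-snmp's `snmpget`; the community string `public` and the choice of `sysUpTime.0` as a liveness probe are assumptions.

```python
import subprocess
from typing import Dict, List, Tuple

# Assumptions: SNMP v2c with community "public", polling sysUpTime.0.
COMMUNITY = "public"
OID = "SNMPv2-MIB::sysUpTime.0"


def snmp_alive(host: str, timeout: int = 2) -> bool:
    """Poll one host with net-snmp's snmpget; True if the agent answered."""
    cmd = ["snmpget", "-v2c", "-c", COMMUNITY, "-t", str(timeout), "-r", "1",
           "-Oqv", host, OID]
    try:
        result = subprocess.run(cmd, capture_output=True, text=True,
                                timeout=timeout * 3)
    except (OSError, subprocess.TimeoutExpired):
        # snmpget missing or hung: treat the host as failed.
        return False
    return result.returncode == 0


def mon_report(status: Dict[str, bool]) -> Tuple[int, str]:
    """First output line lists failed hosts; a nonzero exit signals failure."""
    failed = sorted(host for host, ok in status.items() if not ok)
    if not failed:
        return 0, ""
    details = "\n".join(f"{host}: no SNMP response" for host in failed)
    return 1, " ".join(failed) + "\n" + details


def main(hosts: List[str]) -> int:
    """Poll every host given on the command line and report mon-style."""
    code, output = mon_report({host: snmp_alive(host) for host in hosts})
    if output:
        print(output)
    return code
```

Saved as a monitor script, it would finish with `sys.exit(main(sys.argv[1:]))` (after `import sys`) so mon sees the exit status; alerting and trap handling stay with mon and net-snmp's own tools.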
yush
For a network like yours, you do not want to "do it yourself" with Nagios. Nagios is the best network monitoring package available, but unless you have a full-time sysadmin dedicated to it, you will be in a world of pain. A better plan would be to look at GroundWork Monitor Professional (www.groundworkopensource.com). The core of GMP is Nagios, but GroundWork have added plenty of integration goodness: profiles of service checks for particular servers (got an Exchange box but don't know which services to monitor? No problem, just use the Exchange profile containing all of the important service checks for Exchange), full GUI configuration, SNMP traps, graphing, the whole shebang. It costs US$16,000 a year for unlimited devices plus support. Get Sheila at GroundWork to walk you through a WebEx presentation, and download Rich Trezza's VMware appliance from http://richard.trezza.us/vmach/index.html
The VM only contains the basic open source functionality, but it still beats any other Nagios configuration package available.
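The profile idea is easy to picture: a profile is a named list of service checks that gets stamped out for every matching host. A rough Python sketch that expands a hypothetical profile into plain Nagios `define service` objects; the profile contents, the host name `exch01` and the `generic-service` template are assumptions, not GroundWork's actual Exchange profile or output format.

```python
from typing import Dict, List, Tuple

# Hypothetical profile: (service description, check_command) pairs. These are
# illustrative only; they assume matching command definitions and a
# generic-service template already exist in your Nagios configuration.
PROFILES: Dict[str, List[Tuple[str, str]]] = {
    "exchange": [
        ("SMTP", "check_smtp"),
        ("OWA HTTP", "check_http"),
        ("Ping", "check_ping!100.0,20%!500.0,60%"),
    ],
}


def services_for(host: str, profile: str) -> str:
    """Expand a named profile into Nagios service definitions for one host."""
    blocks = []
    for description, command in PROFILES[profile]:
        blocks.append(
            "define service{\n"
            "    use                  generic-service\n"
            f"    host_name            {host}\n"
            f"    service_description  {description}\n"
            f"    check_command        {command}\n"
            "}"
        )
    return "\n\n".join(blocks)


print(services_for("exch01", "exchange"))
```

Adding a second Exchange server then means one more call rather than another hand-written set of service blocks, which is the convenience the profile feature is selling.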