What Do You Use for SNMP Monitoring?

Posted by ryuzaki0 on Thursday September 7, 2006 @04:45PM from the simple-network-info-traps dept.

linuxi386 wonders: "My company is in the process of implementing a global frame relay system. The network will cover 20+ states, several European and Asian countries, and Australia. It will have a 5-point full mesh fail-over, with each coast/country having about 20 PPP links, about 30 servers mixed between Linux and Windows, and a Windows 2003 domain controller at each site. I have been looking for a really decent, cheap, web-based monitoring application to manage the entire system. So far I have looked at SolarWinds' Orion and AdventNet's OpManager. I like the look of Orion, and while I prefer OpManager's feature set, I cannot stand its pricing model or the XP Playskool-style theme it uses. I am trying to avoid writing my own system to manage this if at all possible. What would you folks recommend, and why?"

23 of 103 comments

Cacti! by sampowers · 2006-09-07 16:56 · Score: 5, Informative

We have a medium-sized setup, and for us Cacti works great. http://www.cacti.net/
1. Re:Cacti! by merreborn · 2006-09-07 17:05 · Score: 5, Interesting
  
  I've found Munin much easier to configure and extend than Cacti.
  
  Quite frankly, I found Cacti's interface, abstractions, and terminology very difficult to grasp.
  
  Munin, on the other hand, I've written a half dozen plugins for.
  
  Admittedly, Cacti is more powerful, but that didn't do me much good, as I couldn't for the life of me harness that power.
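  
  Part of what makes Munin easy to extend is its plugin protocol: a plugin is an executable that munin-node runs with the argument `config` to get the graph description, and with no argument to get current values as `<field>.value <number>` lines. A minimal Python sketch; the `load` field name and the labels are illustrative choices:

```python
#!/usr/bin/env python3
"""Minimal Munin plugin sketch: graphs the 1-minute load average.

The field name "load" and the graph labels are illustrative choices.
"""
import os
import sys


def config_lines():
    # Emitted when munin-node calls the plugin with the "config" argument.
    return [
        "graph_title Load average",
        "graph_vlabel load",
        "graph_category system",
        "load.label load",
    ]


def value_lines(load):
    # Emitted on a normal fetch: one "<field>.value <number>" line per field.
    return [f"load.value {load:.2f}"]


if __name__ == "__main__":
    if len(sys.argv) > 1 and sys.argv[1] == "config":
        print("\n".join(config_lines()))
    else:
        # os.getloadavg() is Unix-only; swap in your own data source elsewhere.
        print("\n".join(value_lines(os.getloadavg()[0])))
```

  Once linked into the node's plugin directory, it can typically be tried by hand with `munin-run <plugin-name>` and `munin-run <plugin-name> config`.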
2. Re:Cacti! by lanner · 2006-09-07 18:02 · Score: 2, Informative
  
  I use Cacti too, both for personal use as well as at the workplace. We monitor Cisco routers, Linux systems, Windows systems, Network Appliance Storage Filers, Cisco PIX firewalls, and a few other miscellaneous things. At a previous employer of mine, we had about 200 different devices being polled.
  
  You can write your own scripts to poll items via the command line or SNMP, and then create your graph templates to draw the graphs the way you want.
  
  One of the best features of Cacti is that you can create templates for graphs and data sources, and export them for others to use.
  
  Cacti still needs some work, but it's a pretty good product for free. Releases in the last year have been slow, but I think that is because development effort has shifted to a major upcoming version.
  
  Other tools that I have heard about are JFFNMS and Zabbix, though I have used neither.
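  
  For the "write your own scripts" route, a Cacti data input method runs a command and parses its standard output; for a method with several output fields, the script prints them on one line as space-separated `name:value` pairs. A sketch assuming hypothetical output fields `total`, `used`, and `free` (filesystem bytes), which would need to match the fields defined for the data input method:

```python
#!/usr/bin/env python3
"""Sketch of a Cacti data input script reporting filesystem usage.

The output field names (total, used, free) are hypothetical; they must match
the output fields configured for the data input method in Cacti.
"""
import shutil
import sys


def disk_stats(path):
    # shutil.disk_usage returns a named tuple of (total, used, free) in bytes.
    usage = shutil.disk_usage(path)
    return {"total": usage.total, "used": usage.used, "free": usage.free}


def format_cacti_output(values):
    # Multiple outputs go on a single line as "name:value" pairs separated by spaces.
    return " ".join(f"{name}:{value}" for name, value in values.items())


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "/"
    print(format_cacti_output(disk_stats(target)))
```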
3. Re:Cacti! by cowwie · 2006-09-08 00:46 · Score: 2, Interesting
  
  Crap... that was me and I'm too stupid to log in. Let me post again so it doesn't get filtered out.
  ---------
  I just set up CactiEZ from cactiusers.org to test out some stuff on my network. It's a basic distro built on CentOS 4 that installs just what you need, has most things preconfigured out of the box (the MySQL backend, the cron jobs, etc.), and is just generally EASY to use.
  
  Personally, I'm running it in a VMware virtual machine without seeing much of a performance hit on the Win2k3 server hosting it. Then again, I'm only using it to monitor a handful of firewalls, routers, and UPSes.
  
  Either way, don't let Cacti's complexity scare you off; the CactiEZ distro is incredibly quick to set up and get going.
Google it!!!! by Knetzar · 2006-09-07 17:01 · Score: 5, Funny

Just google for "full mesh fail-over" "ppp links" and...no, wait, forget that....
1. Re:Google it!!!! by Raistlin77 · 2006-09-07 17:22 · Score: 2, Funny
  
  Just google for "full mesh fail-over" "ppp links" and...no, wait, forget that....
  
  Shhhhhh, you wouldn't want Google to send you a warning letter, would you?
What we use by Anonymous Coward · 2006-09-07 17:16 · Score: 5, Informative

I'm posting as an AC so I don't violate any IP agreements or NDAs.

At the companies I've worked at, we have typically started with the free monitoring package Nagios and, after a short period of time, purchased the commercial product NetCool. NetCool is everything you could ever ask for... assuming you have a few months to tweak the rules to set the event levels correctly. But I guess all monitoring systems are like that.

Depending on the size of your NOC, your datacenter, and your client base, I would recommend starting with Nagios and, if it proves too small for your needs, moving to NetCool. (Just be prepared to pay serious $$$ for NetCool.)

HTH

A.Coward
1. Re:What we use by BrookHarty · 2006-09-07 17:55 · Score: 5, Informative
  
  Yup, Nagios is great, and you can customize it to work on anything. I don't see a reason to buy an expensive professional enterprise solution when Nagios is an enterprise solution.
  
  Plus, once you start using it, you find yourself adding new scripts to monitor more and more, because it's that easy. I'm using it to monitor TCP/UDP ports, processes, Oracle RAC instances, Oracle queues, SwiftMQ queues, hardware NICs, hardware stats, memory/CPU/etc., log sizes, etc.
  
  So, not sure why I'd buy Netcool when Nagios is free, and works great. The time you spend configuring Nagios is cheap and easy. And it works with netexpert too.
  
  I like having a nice dashboard for my NOC, so they can keep a good eye on the health of a service, without lots of training.
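  
  The scripts in question are Nagios plugins: any executable that prints a one-line status (optionally followed by `|` and performance data) and exits with 0 for OK, 1 for WARNING, 2 for CRITICAL, or 3 for UNKNOWN. A minimal sketch of a log-size check; the thresholds and the script name `check_logsize.py` are hypothetical:

```python
#!/usr/bin/env python3
"""Sketch of a Nagios plugin that alerts on log file size.

Thresholds and the script name (check_logsize.py) are hypothetical examples.
"""
import os
import sys

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(size, warn, crit):
    # Compare the measured size against the thresholds, worst case first.
    if size >= crit:
        return CRITICAL
    if size >= warn:
        return WARNING
    return OK


def check_log_size(path, warn, crit):
    # Returns (exit_code, one-line status text with performance data after "|").
    try:
        size = os.path.getsize(path)
    except OSError as exc:
        return UNKNOWN, f"LOGSIZE UNKNOWN - cannot stat {path}: {exc.strerror}"
    code = evaluate(size, warn, crit)
    return code, f"LOGSIZE {LABELS[code]} - {path} is {size} bytes | size={size}B;{warn};{crit};0"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Invoked by Nagios as: check_logsize.py <path>
    status, text = check_log_size(sys.argv[1], warn=100_000_000, crit=500_000_000)
    print(text)
    sys.exit(status)
```

  Keeping the threshold logic in a plain function makes it easy to reuse the same pattern for ports, queues, or any of the other checks listed above.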
2. Re:What we use by dr_d_19 · 2006-09-07 22:46 · Score: 2, Insightful
  
  So, not sure why I'd buy Netcool when Nagios is free, and works great. The time you spend configuring Nagios is cheap and easy. And it works with netexpert too.
  
  Considering that your time is not free, I think you've answered your own question. I love open source, but in most projects I have been involved in where the choice was between open source (as in, no cost) and closed software (as in, pay up), the difference in TCO was minimal. I think people should stop using license costs as a way to promote open source. There are so many better arguments for it!
3. Re:What we use by macdaddy · 2006-09-07 22:59 · Score: 2, Informative
  
  The problem is you can't extend NetCool or the other closed source apps to anything other than what they've allowed you to monitor. I'm monitoring the state of BGP peering sessions with Nagios. Try doing that with NetCool. Sometimes you don't get what you pay for. Sometimes you have to use a little of your own ingenuity.
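  
  BGP session state is exposed through the standard BGP4-MIB as `bgpPeerState` (OID 1.3.6.1.2.1.15.3.1.2, indexed by the peer's address), where 6 means established. A sketch of such a Nagios check, assuming the router exposes BGP4-MIB over SNMPv2c and net-snmp's `snmpget` is installed; the script name and arguments are hypothetical:

```python
#!/usr/bin/env python3
"""Sketch of a Nagios check for one BGP peering session via BGP4-MIB.

Assumes the router exposes BGP4-MIB over SNMPv2c and that net-snmp's snmpget
is installed; router, community string, and peer address come from argv.
"""
import subprocess
import sys

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# bgpPeerState column of bgpPeerTable, indexed by the peer's IP address.
BGP_PEER_STATE_OID = "1.3.6.1.2.1.15.3.1.2"
STATES = {1: "idle", 2: "connect", 3: "active", 4: "opensent", 5: "openconfirm", 6: "established"}


def classify(state):
    # Only "established" (6) means the session is up; any other known state is critical.
    if state == 6:
        return OK, "BGP OK - session established"
    if state in STATES:
        return CRITICAL, f"BGP CRITICAL - session in state {STATES[state]}"
    return UNKNOWN, f"BGP UNKNOWN - unexpected state value {state}"


def fetch_state(router, community, peer_ip):
    # -Oqve prints only the value, with enumerations shown as numbers.
    out = subprocess.run(
        ["snmpget", "-v2c", "-c", community, "-Oqve", router, f"{BGP_PEER_STATE_OID}.{peer_ip}"],
        capture_output=True, text=True, timeout=10, check=True,
    ).stdout
    return int(out.strip())


if __name__ == "__main__" and len(sys.argv) == 4:
    # Usage (hypothetical script name): check_bgp_peer.py <router> <community> <peer-ip>
    try:
        code, text = classify(fetch_state(*sys.argv[1:4]))
    except (subprocess.SubprocessError, OSError, ValueError) as exc:
        code, text = UNKNOWN, f"BGP UNKNOWN - SNMP query failed: {exc}"
    print(text)
    sys.exit(code)
```

  One Nagios service per peering session keeps the alerts specific to the neighbour that dropped.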
Either go big or go home by saxman57 · 2006-09-07 17:16 · Score: 5, Insightful

If your company is willing to spend that much money on the network, a 'cheap' NMS tool is the wrong solution. Too often companies invest in technology only to skimp on the management of that technology. The end result is overall poor performance and dissatisfaction with the technology. I would suggest a real NMS tool such as OpenView.
1. Re:Either go big or go home by arivanov · 2006-09-07 18:27 · Score: 5, Interesting
  
  OpenView is not necessarily the answer.
  
  It is one of the best fault-oriented NMSes on the market, but its performance monitoring side has always sucked bricks through a thin straw sideways. Based on the packages mentioned in the original post, the poster is trying to monitor performance and utilisation, not faults, so OpenView is the wrong tool.
  
  I am an old-school person (been doing this for 10+ years now, on networks from 10 nodes to a global telco), so my first choice for performance monitoring in a 30-node setup would be the classic: MRTG (though I use it with an RRD backend nowadays). I have run it for up to 600 monitored variables. It works. For a 30-node full mesh this will be a no-brainer. Its main disadvantage is that it does not preserve long-term historical data (which managers sometimes require). Its main advantage is that you can also plug in non-network data (CPU, environmental, application performance) from the Linux part with ease. The next choice would obviously be InfoVista (its original stuff, not the stuff it acquired recently). It costs money, though; no idea how much nowadays. It also has a learning curve associated with it.
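  
  The arithmetic MRTG performs on each poll is simple enough to sketch: take the difference between two samples of an interface octet counter, allowing for the 32-bit counter wrapping, and divide by what the link could have carried in the polling interval. A sketch assuming 32-bit `ifInOctets`-style counters and at most one wrap between polls; the link speed and counter values in the example are made up:

```python
"""Sketch of MRTG-style utilisation math from two SNMP octet-counter samples.

Assumes 32-bit counters (e.g. IF-MIB ifInOctets) sampled at a fixed interval,
and that the counter wrapped at most once between samples. For 64-bit
ifHCInOctets counters, pass modulus=2**64 instead.
"""

COUNTER32_MAX = 2**32


def counter_delta(previous, current, modulus=COUNTER32_MAX):
    # A smaller current reading is treated as a single wrap past the counter maximum.
    return (current - previous) % modulus


def utilisation_percent(previous, current, interval_s, if_speed_bps):
    # Octets -> bits, divided by what the link could have carried in the interval.
    bits = counter_delta(previous, current) * 8
    return 100.0 * bits / (interval_s * if_speed_bps)


if __name__ == "__main__":
    # Hypothetical 256 kbit/s link polled every 300 s; the counter wrapped between polls.
    print(f"{utilisation_percent(4_294_000_000, 1_000_000, 300, 256_000):.2f}%")  # 20.49%
```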
  
  As for the utilities mentioned in the original post, they are winhoze stuff, so I am not very familiar with them. I have seen some other products under the same brands (the SolarWinds TFTP server) and they are laughable.
  
  --
  Baker's Law: Misery no longer loves company. Nowadays it insists on it
  http://www.sigsegv.cx/
Check out Nagios by ErikTheRed · 2006-09-07 18:01 · Score: 5, Informative

Nagios is a fairly easy-to-learn, extremely extensible (can you use a scripting language?) monitoring system. It scales reasonably well, supports distributed stat gathering, can respond to SNMP traps, etc. It's not the easiest out of the box (you'll spend a day or two learning to use it and setting it up), but there's very little you can't make it do.

--

Help save the critically endangered Blue Iguana
1. Re:Check out Nagios by nuintari · 2006-09-08 02:30 · Score: 2, Informative
  
  I have to second this; Nagios is amazing for network monitoring. You can actively poll for availability and build a complex dependency tree based on your network's actual layout. It scales very well: you can have the main web interface server in a good central spot, and have servers that do the actual checking and report back littered throughout your network. It can handle SNMP traps with the addition of net-snmp, you can write your own checks and plugins, and you can customize notifications. It is really an amazing framework.
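  
  The "servers that do the actual checking and report back" pattern usually relies on passive check results: the remote checker sends a line in Nagios's external-command format to the central server (commonly via the NSCA add-on), which reads it from its command file. A sketch of building and submitting such a line; the command-file path and the host/service names are assumptions that vary by installation:

```python
"""Sketch: format a passive service check result for the Nagios external command file.

In a distributed setup the remote checker ships this result to the central server
(e.g. with NSCA); the command-file path below is an assumption that varies by install.
"""
import time

COMMAND_FILE = "/usr/local/nagios/var/rw/nagios.cmd"  # assumed location


def passive_result_line(host, service, return_code, output, timestamp=None):
    # Format: [time] PROCESS_SERVICE_CHECK_RESULT;host;service;code;output
    ts = int(time.time() if timestamp is None else timestamp)
    return f"[{ts}] PROCESS_SERVICE_CHECK_RESULT;{host};{service};{return_code};{output}\n"


def submit(line, command_file=COMMAND_FILE):
    # The command file is a named pipe that Nagios reads; write one line per result.
    with open(command_file, "a") as pipe:
        pipe.write(line)
```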
  
  Grab the No Starch Press book on Nagios; it has examples of how to do everything I just said, and much more.
  
  --
  --Nuintari
  
  slashdot : where an opinion can be wrong.
Like the function, dislike the look? by foniksonik · 2006-09-07 18:07 · Score: 2, Interesting

Is a 'theme' really going to turn you off a piece of software? Ask the company if you can have it re-branded. Many companies will do this for free, especially for web-based tools... and if they don't, well, it's web-based: there are stylesheets, graphics, and HTML, so it shouldn't take much work to make some radical visual changes.

So go with the tool that works best; looks are pretty easy to adjust, as long as the usability is there to begin with. If it's clunky, confusing, and you hate how it looks... well, that would take a bigger commitment to fix than just looks, but it's been done before. For example, I once completely redesigned the UI for Bugzilla: canned queries, new workflows, collapsing panes, calendar widgets, color coding, and more. It was worth it in the end, and that company still uses it 90% the way I left it, which means it wasn't wasted effort.

Well, think about it anyways.

--
A fool throws a stone into a well and a thousand sages can not remove it.
We've tried lots by neurosis101 · 2006-09-07 18:37 · Score: 2

On a large cluster, we considered OpManager, Cacti, and Ganglia, and have run all three.

OpManager has some really nice features that made it easy to display and group results, especially for non-engineering people (good built-in graphing tools, etc.), but we found it didn't perform as well as the other two. Additionally, you have to pay for it.

Cacti was nice because of the built-in hooks for Apache and MySQL, but it lacked some features we wanted (automatic host discovery, certain kinds of data summarization).

We use Ganglia now. It's open source, has a good track record on large clusters, and has proven speed and reliability. It will do automatic discovery of hosts; the downside was that there were no built-in hooks for MySQL and Apache, which we did want to monitor.

So consider the set of data that you're interested in monitoring, and how far you intend to scale.
Correlation, your best asset by spinash · 2006-09-07 18:40 · Score: 3, Interesting

If your network has a certain size and you do everything by SNMP, you need to be able to correlate events to avoid alarm floods when one link goes down. We have used OpenService's NerveCenter with great success, coupled with NetCool from IBM. The pricing is steep, but the products are top-notch. In our configuration, we monitor about 8,000 network devices (Cisco, 3Com, Bay, Nokia IPSO, ConSentry, etc.) using two NerveCenter instances running on two Sun 480 boxes.
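
The core idea behind such correlation can be illustrated with a topology map: if a device's upstream link or router is already down, its own alarm is a symptom rather than a separate root cause. A simplified Python sketch of that suppression rule; it is not how NerveCenter or NetCool work internally, and the node names are hypothetical:

```python
"""Simplified sketch of topology-based alarm suppression.

Not how NerveCenter or NetCool work internally; it only illustrates the idea:
when an upstream link or router is down, alarms from everything behind it are
treated as symptoms rather than separate root causes.
"""


def root_causes(parents, down):
    """Return the down nodes that have no down ancestor.

    parents maps each node to its upstream node (None for the monitoring root).
    """
    causes = set()
    for node in down:
        upstream = parents.get(node)
        suppressed = False
        seen = set()
        while upstream is not None and upstream not in seen:
            seen.add(upstream)  # guard against accidental loops in the map
            if upstream in down:
                suppressed = True
                break
            upstream = parents.get(upstream)
        if not suppressed:
            causes.add(node)
    return causes
```

With a hypothetical map where `syd-web` and `syd-dc` sit behind `syd-rtr`, which sits behind `core`, a site outage that marks all three Sydney devices down reports only `syd-rtr`, while unrelated failures on other branches are still reported separately.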

(I'm not affiliated with these companies or products)
Use a Proper Tool by Michael+Snoswell · 2006-09-07 18:45 · Score: 5, Insightful

How is it that you're obviously spending a huge amount on the network infrastructure and want to cut costs so much on network monitoring? After going to all the effort of setting it up, you'll want a decent tool that tells you the instant something is wrong, before the users tell you!

Something like HP OpenView does the job. Cisco has a software tool, though it's not as good, as do Sun and IBM. CA Unicenter is overkill and too expensive, to my mind. For small jobs (fewer than 100 nodes) I've used Ipswitch WhatsUp Professional. If you want to monitor properly, you want something that goes inside your switches and has agents for all your servers.

In the dim past (10+ years ago) I used Scotty (a Tcl/Tk freeware tool), and at other times I wrote my own in Python/Tk with Perl daemons/services.

net-snmp on SourceForge has tools you can use, but these days I'd say again: it's an expensive (and I presume important) network you've got there, so spend some money to monitor it properly. The expensive tools ($30k+) all have ready-made agents or know about a huge variety of hardware, so you don't have to customise MIBs and code (though Unicenter takes a lot of customisation to work well, and they all need customisation of sorts). It might take you 3 months of coding to do a half-decent job that a commercial package could do, with more features, in a few weeks, and with the package you've got support and someone to complain to if there are problems. How much money would be lost if the network went down during those three months? Just one hour of downtime for a large corporation would cover the cost of the software.

I do agree it's great fun rolling your own (I'm sure you're a great programmer) if you have the time and the corporate managers don't appreciate the need to monitor things properly and you can't convince them to spend the dollars. But when it goes down, it'll be your arse on the line and the managers'/company's money being lost while you sweat to fix things. They'll quickly tell you then (and rightly so) that it would have been worth doing it right the first time (you didn't think they'd take the heat for this, did you?), no matter how good your code will look in another month's time.

At worst, write some emails as evidence that you requested such-and-such a package with official quotes, and keep their replies on record showing they refused to spend the money on it. I know of one company (a chain of retail stores) that went to the wall when its network went down: a series of seemingly small faults on critical days (like the last shopping days before Christmas) meant the company went under, and the IT consultants who designed the system took the blame in court in the end. It cost them $30m (plus a few hundred people lost their jobs).

Now if this is just some academic network, or it's not your responsibility, then fine (mind you, many research places are even more fussy about their networks than corporate users).

Unfortunately, there are times when jumping into coding, no matter how well intentioned, isn't the most pragmatic or best solution.

--
pithy comment
Who needs SNMP? by Anonymous Coward · 2006-09-07 18:48 · Score: 5, Funny

SNMP? That's complicated stuff to set up.

At work we rely on the much more robust and easy-to-use URMP ("User Resource Management Protocol") to monitor our systems. When the systems go down, the users let us know about it.
What the hell? by jevring · 2006-09-07 23:52 · Score: 3, Interesting

So, let me get this straight, you're building a GLOBAL frame relay system, with nodes in 20+ states, with massive redundancy, and you're looking for a CHEAP system?

Get yourself together and look for a GOOD system. If you're already spending TONS of money, you might as well spend some more to get exactly what you want, instead of settling for something. It might turn out that a free system is the best system for you, but please, good HAS GOT TO come before cheap!

--
Move sig!
InterMapper by sam1am · 2006-09-08 01:07 · Score: 2, Informative

I'm a fan of InterMapper, powerful but not overly complicated, and easily extensible. It also runs on MacOSX, Windows, Linux, Solaris, and FreeBSD. It was originally developed at Dartmouth College to support their network, and has been marketed commercially since 1996.
Simple and Elegant: MON by rasjani · 2006-09-08 01:52 · Score: 3, Informative

http://www.kernel.org/software/mon/

I was on the implementation crew for a small NOC (about 7 people including managers) and approximately 150 machines in various locations. I reviewed quite a lot of free software, and while most of it looked quite nice (Nagios, Big Brother, etc.), almost all of it was filled with features that were not really essential just to "monitor the health of the system", so I ended up with mon. Mon, for me, was really the "unix way" of creating stuff: keep things easy and simple, and extend them with other tools. The generic layout we used was net-snmp on client hosts, either polled at intervals or sending traps to the main machines.
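
Mon's "unix way" shows in its monitors, which are ordinary programs. Assuming the usual mon convention, a monitor receives the hosts to check as arguments, exits non-zero on failure, and prints the failed hosts on the first line of its output, with details after. A sketch of a TCP-port monitor along those lines; port 22 and the timeout are illustrative assumptions:

```python
#!/usr/bin/env python3
"""Sketch of a mon-style monitor: checks that a TCP port answers on each host.

Assumes the usual mon monitor convention: hosts arrive as arguments, the exit
status is non-zero on failure, and the first output line lists the failed hosts.
The port (22) and timeout are illustrative defaults.
"""
import socket
import sys


def port_open(host, port, timeout=5.0):
    # A completed TCP handshake is treated as "up".
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def report(results):
    # results maps host -> bool; returns (exit_status, output_text).
    failed = sorted(host for host, ok in results.items() if not ok)
    if not failed:
        return 0, ""
    details = "\n".join(f"{host}: TCP connect failed" for host in failed)
    return 1, " ".join(failed) + "\n" + details


if __name__ == "__main__" and len(sys.argv) > 1:
    status, text = report({host: port_open(host, 22) for host in sys.argv[1:]})
    if text:
        print(text)
    sys.exit(status)
```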

--
yush
Groundwork Monitor Professional by _termx23 · 2006-09-08 02:25 · Score: 2, Informative

For a network like yours, you do not want to "do it yourself" with Nagios. Nagios is the best network monitoring package available, but unless you have a full-time system admin dedicated to it, you will be in a world of pain. A better plan would be to look at Groundwork Monitor Professional (www.groundworkopensource.com). The core of GMP is Nagios, but Groundwork have added plenty of integration goodness (profiles of service checks for particular servers: got an Exchange box but don't know which services to monitor? No problem, just use the Exchange profile containing all of the important service checks for Exchange). Full GUI configuration, SNMP traps, graphing, the whole shebang. US$16,000 a year for unlimited devices plus support. Get Sheila at Groundwork to walk you through a WebEx presentation, and download Rich Trezza's VMware appliance from http://richard.trezza.us/vmach/index.html
The VM only contains the basic open source functionality, but it still beats any available Nagios configuration package.