What Would You Want In a Large-Scale Monitoring System?
Krneki writes "I've been developing monitoring solutions for the last five years. I have used Cacti, Nagios, WhatsUP, PRTG, OpManager, MOM, Perl-script solutions, ... Today I have changed employer and I have been asked to develop a new monitoring solution from scratch (5,000 devices). My objective is to deliver a solution that will cover network devices, servers and applications. The final product must be very easy to understand, as it will also be used by help support to diagnose problems during the night. I need a powerful tool that will cover all I need and yet deliver a nice 2D map of the company IT infrastructure. I like Cacti, but usually I use it only for performance monitoring, since polling can't be set to a 5 or 10 second interval for huge networks. I'm thinking about Nagios (but the 2D map is hard to understand), or maybe OpManager. What monitoring solution do you use and why?"
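For a rough sense of why short polling intervals get hard at this scale, here is a back-of-the-envelope sketch. The 5,000 devices and the 5 and 10 second intervals come from the question; the function name, the one-check-per-device default and the 300 second comparison interval are assumptions for illustration only.

```python
def polls_per_second(devices: int, interval_s: float, checks_per_device: int = 1) -> float:
    """Average polls per second the poller must sustain.

    checks_per_device is a hypothetical knob: real devices usually
    expose many data points, which multiplies the load.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return devices * checks_per_device / interval_s


if __name__ == "__main__":
    # 5 s and 10 s are from the question; 300 s is an assumed comparison point.
    for interval in (5, 10, 300):
        rate = polls_per_second(5_000, interval)
        print(f"{interval:>3} s interval: {rate:,.1f} polls/s")
```

Even with a single check per device, a 5 second interval means sustaining on the order of a thousand polls every second, before retries and timeouts are counted.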
I am going through this right now, and I am using or have used all of the above-mentioned solutions. We are leaning towards System Center Operations Manager. http://www.microsoft.com/systemcenter/operationsmanager/en/us/default.aspx If you had told me 6 months ago that it would be the way to go, I would have said over my dead body, but it has come a very long way in terms of usability and ease of setup.
I use OpenNMS as well. I actually migrated off of Nagios to OpenNMS. Tried out Zenoss and Cacti as well. While any of these are better than OpenView IMHO, I liked OpenNMS's full suite of functionality without having to pay for the 'commercial' version.
Cons:
The big questions are:
Will your solution need to support SNMPv3?
Do the devices you talk to have published OIDs?
Do you need source code to extend it?
If yes to these, OpenNMS is a great bet.
I really don't like the "War Room" video wall concept. I suspect such walls are made to look cool rather than to monitor.
What you want in large-scale monitoring is:
Etcetera. These are some of the things that make sane large monitoring systems. I don't think any open source product has all of them, alas.
Zabbix allows you to build some fairly powerful rulesets and chains of overrides using its web GUI. It's not perfect, but it keeps improving, and the attitude of the developers is friendly, unlike some of the other projects.
And this is why we (OpenNMS) don't play the per-node pricing game. It's not any harder to run OpenNMS when managing 1000 nodes than when managing 100; you only need to scale hardware appropriately. Per-node pricing is an artificial limitation.
We also don't play the "you get a special price behind closed doors" game. Our support prices are public, fair, and the same for everyone -- and that's only if you need commercial support -- our prices are $0 if you don't need or want support.
If you do the math, it's $0 for the software, plus $14,995/year for support for any number of nodes, and the software is 100% open-source and fully capable of replacing or exceeding OpenView. ;)
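To make that math concrete, here is a minimal sketch of how a flat support fee spreads out per node. The $14,995/year figure is the one quoted above; the node counts (5,000 matches the original question) and the function name are illustrative assumptions, not anything from a price list.

```python
# Flat annual support fee quoted in the post above (the software itself is $0).
SUPPORT_FEE_PER_YEAR = 14_995


def cost_per_node(nodes: int, annual_fee: float = SUPPORT_FEE_PER_YEAR) -> float:
    """Effective yearly cost per node under a flat (not per-node) fee."""
    if nodes <= 0:
        raise ValueError("nodes must be positive")
    return annual_fee / nodes


if __name__ == "__main__":
    # Node counts are illustrative; 5,000 is the size from the original question.
    for nodes in (100, 1_000, 5_000):
        print(f"{nodes:>5} nodes: ${cost_per_node(nodes):,.2f} per node per year")
```

With a flat fee the per-node cost only falls as the network grows, which is the opposite of what per-node licensing does.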
WWJD? JWRTFM!!!