Slashdot Mirror


What Would You Want In a Large-Scale Monitoring System?

Krneki writes "I've been developing monitoring solutions for the last five years. I have used Cacti, Nagios, WhatsUP, PRTG, OpManager, MOM, Perl-scripts solutions, ... Today I have changed employer and I have been asked to develop a new monitoring solution from scratch (5,000 devices). My objective is to deliver a solution that will cover both the network devices, servers and applications. The final product must be very easy to understand as it will be used also by help support to diagnose problems during the night. I need a powerful tool that will cover all I need and yet deliver a nice 2D map of the company IT infrastructure. I like Cacti, but usually I use it only for performance monitoring, since pooling can't be set to 5 or 10 sec interval for huge networks. I'm thinking about Nagios (but the 2D map is hard to understand), or maybe OpManager. What monitoring solution do you use and why?"

1 of 342 comments (clear)

  1. ZenOSS all the way by Midnight+Warrior · · Score: 5, Interesting
    We use ZenOSS exclusively at work and have enjoyed every minute of it. Pro's include:
    • 2D map with status of all nodes or submaps, organized by network
    • Application monitoring, with more advanced maps available for purchase (Oracle, JBoss, Cisco) for those things you already paid a lot of money for
    • Performance monitoring via SNMP or other data sources using RRDtool internally which includes graphs linked to each other during zoom in/out or panning
    • Nagios plugins already do some of the heavy lifting
    • Built-in support for watching Windows servers (any metric accessible via WMI)
    • Access control using at least LDAP and Active Directory
    • Secondary data collectors for those networks which are too big for just one central source
    • Highly customizable through Python
    • It has so, so much more than pathetic commercial solutions like OpenView

    Cons:

    • You have to keep your eye on the back end database
    • It still takes a long, long time to tune it to remove noise events
    • If you don't know Python, it can be tough in a few places
    • Proper support is not cheap