(More) Intelligent Network Monitors?
Genady asks: "Maybe I'm getting old. I've been looking around lately at all the little scripts I've created to watch log files, drive space, web pages, general SysAdmin stuff really. It's really a mish-mash of stuff I've written and acquired over the years. I've used the higher end enterprise management frameworks, as well as lower end apps like NetSaint. My problem with these has always been lack of intelligence. Does anyone know of a project to do monitoring/alerting coupled with some artificial intelligence that can learn that I don't care about particular servers after a certain time of day?"
OpenNMS has some pretty good built-in functionality, and tries to make it easy to plug in more intelligence.
Larry
No artificial intelligence or learning is involved in the system, but simply specifying the hours for each host and service does get the job done (and probably in a more straightforward and predictable way than a neural network or some such).
You're required to specify hours for contacts as well. E.g., the on-call pager only gets messages outside office hours, individual sysadmin pagers only get messages during office hours, and so on. Contact settings are also broken down by host and service, so, for instance, you can set things up so the Oracle DBA won't get paged when a host goes down, but the Unix admin will.
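To make the contact rules above concrete, here is a toy Python model of hour- and service-based alert routing. It is not how Nagios configures or implements notifications; the `Contact` class, the `recipients` function, the contact names, and the service labels are all hypothetical, and weekdays and holidays are ignored for brevity.

```python
from dataclasses import dataclass
from datetime import datetime, time
from typing import FrozenSet, List

# Hypothetical model of hour- and service-based alert routing.
# It is NOT Nagios's configuration syntax or internal logic.


@dataclass
class Contact:
    name: str
    start: time  # start of the notification window
    end: time    # end of the window (exclusive)
    services: FrozenSet[str] = frozenset()  # empty means "notify for everything"

    def in_window(self, now: time) -> bool:
        if self.start <= self.end:
            return self.start <= now < self.end
        # The window wraps past midnight, e.g. 17:00-09:00 for an on-call pager.
        return now >= self.start or now < self.end

    def wants(self, service: str) -> bool:
        return not self.services or service in self.services


def recipients(contacts: List[Contact], service: str, when: datetime) -> List[str]:
    """Names of the contacts who should be paged about `service` at `when`."""
    now = when.time()
    return [c.name for c in contacts if c.in_window(now) and c.wants(service)]


contacts = [
    Contact("oncall-pager", time(17, 0), time(9, 0)),  # outside office hours
    Contact("unix-admin", time(9, 0), time(17, 0)),    # office hours, every event
    Contact("oracle-dba", time(9, 0), time(17, 0), frozenset({"oracle"})),  # database only
]
```

With these contacts, a `"host-down"` event at 10:30 reaches only `unix-admin` (the DBA is filtered out by service), while the same event at 22:00 reaches only `oncall-pager`.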
I've only been using Nagios for a few weeks, but I've been really impressed with it. All the shortcomings I saw in other monitoring systems are fixed. The dependencies keep me from getting 20 pages when a router goes down. check_by_ssh lets me use a separate key for each thing I want to check on a host (such as load), without running any additional daemons, and without giving the monitoring system a shell on the system. Event handlers let me capture information at the time of the alert, such as running top on a host with high load, or traceroute when a ping response time is abnormally high. Scheduled maintenance windows let me simply visit a web page and set a maintenance time for something, and no alarms go off during the maintenance.
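The dependency behaviour described above (one page for the router rather than twenty for the hosts behind it) can be sketched as a walk up a parent map. This is a simplified illustration, roughly analogous to how Nagios uses the `parents` directive to treat hosts behind a failed router as UNREACHABLE rather than DOWN; it is not Nagios's actual algorithm, and the `has_down_ancestor` and `pages_to_send` functions and host names are hypothetical.

```python
from typing import Dict, Optional, Set

# Hypothetical sketch: suppress pages for hosts whose upstream device is down.


def has_down_ancestor(host: str, parents: Dict[str, Optional[str]], down: Set[str]) -> bool:
    """True if any device between `host` and the root of the map is down."""
    seen = set()
    parent = parents.get(host)
    while parent is not None and parent not in seen:  # `seen` guards against cycles
        if parent in down:
            return True
        seen.add(parent)
        parent = parents.get(parent)
    return False


def pages_to_send(parents: Dict[str, Optional[str]], down: Set[str]) -> Set[str]:
    """Page only for down hosts whose failure is not explained by an upstream outage."""
    return {h for h in down if not has_down_ancestor(h, parents, down)}


# Twenty web servers sitting behind one router.
parents: Dict[str, Optional[str]] = {"router1": None}
parents.update({f"web{i}": "router1" for i in range(1, 21)})
```

If the router and all twenty servers are down, only `router1` is paged; if just a couple of servers fail while the router is up, each of them is paged individually.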
Inheritance in the template-based configuration files also lets you specify all the basics for a host or service in one place, so you only need a few lines to define the actual host or service to be checked. And since a definition can list several host names separated by commas, covering a number of similar machines takes little repetition.
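A rough model of the inheritance and comma-separated host lists is sketched below in Python. It loosely mirrors the `use` directive and comma-separated `host_name` values in Nagios object definitions, but the `resolve` and `expand` functions and the template contents are hypothetical. Real Nagios has further rules (some directives, such as `register`, are not inherited) that this sketch ignores.

```python
from typing import Dict, List

# Hypothetical model of template inheritance; keys are illustrative only.
Definition = Dict[str, str]


def resolve(defn: Definition, templates: Dict[str, Definition]) -> Definition:
    """Merge `defn` over the template chain named by its "use" key.

    Values set on the child override values inherited from its template.
    Always returns a fresh dict, so the templates themselves are never modified.
    """
    if "use" not in defn:
        return dict(defn)
    merged = resolve(templates[defn["use"]], templates)
    merged.update({k: v for k, v in defn.items() if k != "use"})
    return merged


def expand(defn: Definition, templates: Dict[str, Definition]) -> List[Definition]:
    """Resolve inheritance, then emit one concrete definition per listed host."""
    full = resolve(defn, templates)
    hosts = [h.strip() for h in full["host_name"].split(",") if h.strip()]
    return [{**full, "host_name": h} for h in hosts]


templates = {
    "generic-service": {"max_check_attempts": "3", "contact_groups": "admins"},
    "web-service": {"use": "generic-service", "contact_groups": "web-admins"},
}

# A few lines describe the same check on three similar machines.
http_check = {
    "use": "web-service",
    "host_name": "web1, web2, web3",
    "service_description": "HTTP",
    "check_command": "check_http",
}
services = expand(http_check, templates)
```

Each entry in `services` carries `max_check_attempts` from `generic-service`, the overridden `contact_groups` from `web-service`, and its own single `host_name`.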
In other words, I wouldn't call it low-end any more. :)