(More) Intelligent Network Monitors?
Genady asks: "Maybe I'm getting old. I've been looking around lately at all the little scripts I've created to watch log files, drive space, web pages, general SysAdmin stuff really. It's really a mishmash of stuff I've written and acquired over the years. I've used the higher end enterprise management frameworks, as well as lower end apps like NetSaint. My problem with these has always been lack of intelligence. Does anyone know of a project to do monitoring/alerting coupled with some artificial intelligence that can learn that I don't care about particular servers after a certain time of day?"
Sounds as if you're looking for a silver bullet; you probably won't find one... Refining your monitoring criteria takes a lot of work.
There are open source programs that do a bit, but personally one of the best programs I found when I was in the business was ACE-SNMP. It's been sold back to the original developer and can now be found at http://www.snmx.com/Download/
I'm not sure of the pricing and other restrictions, enterprise license and all, but I believe he was trying to market it to general customers as well.
/* TODO: Spawn child process, interest child in technology, have child write a new sig */
OpenNMS has some pretty good built-in functionality, and tries to make it easy to plug in more intelligence.
Larry
No artificial intelligence or learning is involved in the system, but just specifying it does get the job done (and probably in a more straightforward and predictable way than a neural network or some such).
You're required to specify hours for contacts, as well. E.g., the on-call pager only gets messages outside of office hours, individual sysadmin pagers only get messages during office hours, etc. The contact settings are broken down by host and service, too, so, for instance, you can have it so the Oracle DBA won't get a page when a host goes down, but the unix admin will.
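To make the "who gets paged, for what, and when" idea above concrete, here is a minimal Python sketch. It is not how Nagios implements contacts; the contact names, office hours (09:00 to 17:00), and service labels are all made up for illustration.

```python
from datetime import time

# Hypothetical contacts: each has a notification window and the
# kinds of problems it wants to hear about. None of these names
# come from a real configuration.
CONTACTS = [
    {"name": "oncall-pager", "window": (time(17, 0), time(9, 0)),
     "services": {"host", "oracle"}},
    {"name": "unix-admin", "window": (time(9, 0), time(17, 0)),
     "services": {"host"}},
    {"name": "oracle-dba", "window": (time(9, 0), time(17, 0)),
     "services": {"oracle"}},
]


def in_window(now, window):
    """Return True if `now` falls inside the (start, end) window."""
    start, end = window
    if start <= end:
        return start <= now < end
    # Window wraps past midnight, e.g. 17:00 -> 09:00.
    return now >= start or now < end


def recipients(service, now):
    """Names of contacts that should be notified about `service` at `now`."""
    return [
        c["name"]
        for c in CONTACTS
        if service in c["services"] and in_window(now, c["window"])
    ]
```

With these assumed settings, a host going down at 10:00 pages only the unix admin and not the Oracle DBA, while the same failure at 23:00 goes to the on-call pager.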
I've only been using Nagios for a few weeks, but I've been really impressed with it. All the shortcomings I saw with other monitoring systems are fixed. The dependencies keep me from getting 20 pages when a router goes down. check_by_ssh allows me to have an individual key for each thing I want to check on a host (such as load), without running any additional daemons - and without giving the monitoring system a shell on the system. Events allow me to get information from the time of the alert - such as by running top on a host with high load, or traceroute for an abnormally high ping response time. Scheduled maintenance windows allow me to simply visit a web page and set a maintenance time for something, and none of the alarms go off during maintenance.
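The "no 20 pages when a router goes down" behaviour boils down to something like the following Python sketch. It is a simplified illustration, not Nagios's actual dependency logic; the topology and host names are invented.

```python
# Hypothetical topology: each host maps to the device it is reached
# through (None means it has no parent). These names are invented.
PARENTS = {
    "router1": None,
    "web1": "router1",
    "web2": "router1",
    "db1": "web1",
}


def has_down_ancestor(host, down):
    """True if any device between `host` and the monitor is itself down."""
    parent = PARENTS.get(host)
    while parent is not None:
        if parent in down:
            return True
        parent = PARENTS.get(parent)
    return False


def hosts_to_page(down):
    """Keep only the root causes: down hosts with no down ancestor."""
    return sorted(h for h in down if not has_down_ancestor(h, down))
```

If router1, web1, web2 and db1 all stop answering at once, only router1 is reported; the hosts behind it are treated as unreachable rather than as separate failures.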
Inheritance in the template-based configuration files allows you to specify all the basics for a host or service in a single place, too, so you only need a few lines to specify the actual host or service to be checked. Since the host names can be separated by commas in the definition, it doesn't take lots of repetition for a number of similar machines.
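Roughly, the inheritance and comma-separated host names described above work like this Python sketch. The `use` and `host_name` keys mirror the directive names in Nagios object definitions, but the template names and values here are assumptions, and the real configuration lives in Nagios's own config file format, not Python.

```python
# Hypothetical templates; "use" names the parent template to inherit from.
TEMPLATES = {
    "generic-service": {"check_interval": 5, "max_check_attempts": 3},
    "ssh-check": {"use": "generic-service", "check_command": "check_by_ssh"},
}


def resolve(definition):
    """Merge a definition with everything it inherits via "use"."""
    merged = {}
    parent = definition.get("use")
    if parent is not None:
        merged.update(resolve(TEMPLATES[parent]))
    # Settings in the definition itself override inherited ones.
    merged.update({k: v for k, v in definition.items() if k != "use"})
    return merged


def expand(definition):
    """Produce one resolved service per comma-separated host name."""
    resolved = resolve(definition)
    names = [n.strip() for n in resolved.pop("host_name").split(",")]
    return [{"host_name": n, **resolved} for n in names]


services = expand({
    "use": "ssh-check",
    "host_name": "web1, web2",
    "service_description": "load",
})
```

Here three lines of definition yield two fully specified services, one each for web1 and web2, with the interval and retry settings coming from the templates.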
In other words, I wouldn't call it low-end any more. :)
You could try JFFNMS (Just for Fun Network Management System).
If the feature you want isn't there yet, you can create it easily.
Someone bored today? Give it a try : )
- Smells Like Open Source Code
Spong (demo) works for me. Runs on pretty well any Perl 5 installation, some support for NT, and it's reasonably easy to extend.
Oh, and the degree of customization possible on "who gets notified about which services on which machines at what time, and at what severity" is truly mind-boggling. Or perhaps I boggle easily.
Netcool does a lot of what you describe. It is often used to correlate events received by a framework such as HP OpenView. It can probe everything from SNMP devices to telephone switches. It takes a while to set up and tune, but then it is easy to maintain. It also costs mucho dinero.
Demarc PureSecure was one software suite I looked at about a year ago. It's free for personal use, fee for commercial use. I'm not sure how their prices compare to other software packages, but it would be worth looking at. It's being marketed as a Total Intrusion Detection System, and monitors Snort logs, log files, disk space/usage, open ports, and more.
Experiments must be reproducible; they should all fail in the same way.