Monitoring Your Unix Boxen?
"I know a few people who 'tail -f' the main log files, or who run 'top' every so often. These require constant monitoring though, and you could miss essential error messages if you step away for too long. Are there any projects that do this successfully? I've seen a couple out there that started to do this, but they appear to be abandoned.
Ideally, I would like some type of all-in-one that generates a daily (email/web) report of network statistics, user logins, and (web)server traffic/hits, as well as anything 'suspicious' that might be happening, such as which apps have been taking most of the processor time, or whether any of the daemons have been busier than they normally would be. I know there probably isn't one single app out there that does all of this, so what's the best configuration for keeping tabs on multiple machines, something I can skim for a minute or two each day to make sure things are the way they should be? I want to know what works best, and just as importantly, what *doesn't* work (I do realize that relying on a single solution would be bad here too, so if you have more than one suggestion, that would be appreciated)."
I've used Big Brother for many years and it is very configurable. You can monitor anything from CPU usage, memory, disk space, and available services to random things like the weather and server room temperature.
All that being said, I found it to be flaky in its behavior. Sometimes it would report that everything was not responding, and it had to be punted before I would get the all clear. The other negative is the license. The program consists of nothing more than shell/perl scripts so it's obviously open, but it has some strange clauses about non-commercial use.
Overall, I'd recommend trying something else, because BB was unreliable in my use, but YMMV.
'top' apparently is the best tool for monitoring boxen. :)
http://www.remix.net/
The extensions for BB are at http://www.deadcat.net/
I also like Tripwire. It keeps checksums of files on the system so you know if important files have been changed. The last time I used Tripwire it had email alerts. The paid-for version has an enterprise monitor.
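For anyone who hasn't seen the checksum idea in action, here is a minimal sketch of it in Python. This is not how Tripwire itself is implemented; the function names (`file_digest`, `snapshot`, `changed_files`) are made up for illustration, and a real tool would also protect the stored baseline from tampering.

```python
import hashlib
import os


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of one file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot(paths):
    """Map each path to its digest; a missing file maps to None."""
    return {p: file_digest(p) if os.path.isfile(p) else None for p in paths}


def changed_files(baseline, current):
    """Return the sorted paths whose digest differs from the baseline."""
    return sorted(p for p in baseline if baseline[p] != current.get(p))
```

You would take one `snapshot` when the box is known to be clean, save it somewhere safe, and compare a fresh snapshot against it from cron; anything returned by `changed_files` goes into the alert email.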
LogWatch is another. It generates email reports.
Go through your Linux and BSD daily, hourly and weekly scripts to see all the tools they run by default. These can be moved to most Unixes. Since most of these are shell and perl programs, some might be adaptable under Windows using ActivePerl or Cygwin.
The hardest part is fine tuning the emails and alerts to those things you really care about.
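To make the tuning problem concrete, here is a rough Python sketch of the usual approach: a list of patterns you want to hear about and a list of known noise to drop first. The patterns below are hypothetical examples, not taken from LogWatch or any other tool, and `summarize` is just a name picked for this sketch.

```python
import re
from collections import Counter

# Hypothetical patterns; tune both lists for your own logs.
ALERT_PATTERNS = [r"error", r"fail(ed|ure)", r"denied"]
IGNORE_PATTERNS = [r"session opened for user cron"]


def summarize(lines, alert=ALERT_PATTERNS, ignore=IGNORE_PATTERNS):
    """Count lines per alert pattern, skipping known noise first."""
    alert_re = [re.compile(p, re.IGNORECASE) for p in alert]
    ignore_re = [re.compile(p, re.IGNORECASE) for p in ignore]
    counts = Counter()
    for line in lines:
        if any(r.search(line) for r in ignore_re):
            continue  # known noise, never counted
        for pattern, regex in zip(alert, alert_re):
            if regex.search(line):
                counts[pattern] += 1
                break  # count each line once, under the first match
    return counts
```

Most of the tuning time goes into the ignore list: each time the daily email contains something you don't care about, add a pattern for it.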
MRTG is a great SNMP tool and ties in with Big Brother.
I've had to set these up for security purposes at one site, for monitoring a server farm at another site, and for a compile farm doing builds at my current job.
I'm running Nagios. It used to be known as NetSaint. I've also used Big Sister before. That's a pretty good Big Brother clone. Nagios will do what you're after though. Just remember that whatever you build will probably take a while. Creating the config files takes forever.
/* oops I accidentally made a comment, sorry */
Owing to the fact that almost no product will fit everyone's needs,
here are aspects you can use to compare what you find.
aspects of monitoring:
-availability
-uptime(subtly different from availability)
-performance
-security
-capacity
-log or otherwise event-based monitoring
nature of tools:
-web based
-daemon with web based front end
-daemon without web based front end
-other
language tool is written in, license and source
-closed source, nuff said, available in licensed per cpu, licensed per target/service, etc...
-open source, but with paid-for license that includes support(shameless plug... I do support for this kinda thing)
-open source, roll your own support
-perl
-php
-java
-python
-c/c++
integration with other products
-by snmp traps
-by snmp agent extensibility(smux/agentx/proxysnmp,etc...)
-by proprietary methods
-by sharing a RDBMS with another monitoring tool(usually used for things like remedy ARS)
measure of performance/capacity/throughput/usage
-by the exec family of functions
-by the language of choice's own internal library conventions
-by snmp
-by proprietary methods to a Manager of Manager or NMS system
-by ciscoflow/other hardware vendor's protocol
-by parsing logs
-by exec-over-ssh-connexion
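To illustrate the difference between the first two entries in that list, here is a small Python sketch that reads disk usage both ways: through the language's own library, and by exec'ing `df -P` and parsing its output. The function names are made up for this example, and the parser assumes POSIX-format `df -P` output with no spaces in the filesystem name.

```python
import shutil
import subprocess


def percent_used_native(path="/"):
    """Disk usage via the language's own library, no child process."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def parse_df(output):
    """Parse `df -P` output into {mount_point: percent_used}."""
    result = {}
    for line in output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) >= 6 and fields[4].endswith("%"):
            result[" ".join(fields[5:])] = int(fields[4].rstrip("%"))
    return result


def percent_used_exec(command=("df", "-P")):
    """Disk usage by exec'ing an external command and parsing its output."""
    completed = subprocess.run(
        list(command), capture_output=True, text=True, check=True
    )
    return parse_df(completed.stdout)
```

The exec route extends to the exec-over-ssh case by changing the command, for example `("ssh", "somehost", "df", "-P")`, where `somehost` is a placeholder. The two routes can disagree by a few percent, because `df` reports capacity against the space available to unprivileged users.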
Examples that don't fit neatly into any category that comes to mind include monitoring of backups (were they performed, how much, which files were skipped, the location in the jukebox of which tape for which file, etc.).
Hope this helps you draw the lines towards evaluating the product that meets YOUR needs.