Server Monitoring Solutions?
bwhaley asks: "The University I work for has asked me to research software solutions for server monitoring. More specifically, a piece of software that will monitor server variables such as load, swap usage, POP/IMAP processes, total processes, and all the other interesting data about a server's health. Watching these variables can give administrators advance warning about potential problems with the server. We are currently using an in-house solution written in Perl but its age is showing. I have found plenty of proprietary solutions such as HP OpenView and Sun Management Center, but these cost thousands of dollars. What solutions do Slashdot readers use? Are there any powerful open source solutions that I'm missing? Is anyone else running homegrown software that they are happy with? We are running an entirely Solaris environment but I am interested in any UNIX solution."
I would suggest talking to whoever teaches computer science and software. Get the students who are studying this to rewrite your Perl scripts so they do the same job.
That's something you can pass off as helping everybody: it saves y'all money and teaches the CompSci kids how to work with the computers and OSes.
Error 407 - No creative sig found
Check out bb4.com.
Nagios might be what you're looking for. Cheers.
Have you heard of Nagios?
Revolutions are never about freedom or justice. They're about who's going to be top dog. -- Kilgore Trout
I haven't used it but it seems like Nagios is what you want. It's GPL and is supposedly very powerful.
Big Brother
There's a vibrant community with lots of scripts to extend functionality.
It's free as in beer (but not freedom) for almost all uses, and is open source. You only have to pay if you use it to generate money.
Big monitor, gkrellm over remote X and someone to sit there and watch :)
top is terrific
Pretty pictures are more fun to look at! Check out Cacti for all of your process/bandwidth/load/usage graphing needs. It's available at raxnet.net.
Nagios is a great server monitoring system and seems to have what you need.
It's meant for Linux but works under most *NIX variants.
If you don't want to pay for Big Brother, take a look at Big Sister. It does much of the same thing, but it's free (as in beer and speech).
That's easy: use Nagios. It's what I use and it's great. For the holes it doesn't fill, go try out MRTG. :-)
/* oops I accidentally made a comment, sorry */
Second, you might try Sun netconnect since you are running all Solaris. I haven't used it myself, but some people at my nameless company have and think well of it.
"He who would learn astronomy, and other recondite arts, let him go elsewhere. " -- John Calvin, commenting on Genesis 1
How about Nagios?
Gyrate Dot Org - "Where high-tech meets low-life"
If your Unix system doesn't come with an SNMP agent, Net-SNMP will install on many of them.
The SNMP daemon by default understands how to monitor Load Avg, Memory, Processes, and so forth. It may not be able to tell you details of the process, such as what user is logged into the POP3 daemon, but it will tell you that you have 500 of them running, and alert you (via SNMP Traps) of that fact.
All you need to do, once you have checked the documentation for your SNMP agent and configured it, is set up a single machine (OK, maybe 2 or 3) to send your traps to so you can kick off alerts. With some simple scripting in $FAVORITE_SCRIPTING_LANGUAGE you can email, page, text message, update a web page, or $OTHER.
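To make the "simple scripting" part concrete, here is a minimal Python sketch of routing an already-received alert to notification channels by severity. It does not receive SNMP traps itself; the alert dictionary layout, the `routes` table, and the handler names are all assumptions made up for illustration, and the handlers would be whatever mail or pager commands you already have.

```python
from typing import Callable, Dict, List

# Hypothetical alert shape: {"severity": ..., "host": ..., "message": ...}
Alert = Dict[str, str]


def dispatch_alert(
    alert: Alert,
    routes: Dict[str, List[str]],
    handlers: Dict[str, Callable[[str], None]],
) -> List[str]:
    """Send the alert to every channel configured for its severity.

    Returns the list of channels that were actually notified.
    """
    used = []
    text = f"[{alert['severity']}] {alert['host']}: {alert['message']}"
    for channel in routes.get(alert["severity"], []):
        handler = handlers.get(channel)
        if handler is None:
            # A channel is listed in the routes but nothing implements it.
            continue
        handler(text)
        used.append(channel)
    return used
```

In practice the script your trap receiver runs would build the alert and call `dispatch_alert` with handlers that send mail or page someone.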
Cricket and MRTG are nice utilities that poll the servers in question (by default every 5 minutes) and produce graphs. MRTG was designed to graph bandwidth utilization on network equipment, but with a change to the SNMP string it will graph anything. Cricket is the same concept but does things a little differently: it uses a tree-structured configuration for property inheritance and generates graphs on the fly rather than at poll time, as MRTG does.
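For anyone curious how a bandwidth graph comes out of a poll every 5 minutes: SNMP interface counters only ever count up, so the grapher works from the difference between two polls. Below is a rough Python sketch of that arithmetic, assuming a 32-bit counter that wraps at most once between polls. The function name and defaults are mine, not taken from MRTG or Cricket.

```python
COUNTER32_MODULUS = 2 ** 32  # a 32-bit SNMP counter wraps back to 0 here


def counter_rate(previous, current, interval_seconds=300,
                 modulus=COUNTER32_MODULUS):
    """Average per-second rate between two polls of an increasing counter.

    The modulo handles a single wraparound between the two samples.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    delta = (current - previous) % modulus
    return delta / interval_seconds
```

For example, `counter_rate(1000, 301000)` is 1000.0 units per second over the default 300-second interval.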
And last but not least, Transmeta produced a very good Perl monitoring package known simply as Mon. It actively polls the servers, including issuing a transaction to the service you are monitoring. Because it monitors this way, you can see whether the remote machine is really alive by actually using the service, instead of relying on the "I can ping it, it must be up" mentality some people have.
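To show the difference between "I can ping it" and actually using the service, here is a small Python sketch of an active POP3 check. This is not Mon's code (Mon is Perl); it only illustrates the idea. It relies on the fact that a POP3 server greets a new connection with a line starting with "+OK"; the function names and the timeout are my own choices.

```python
import socket


def greeting_is_ok(banner: bytes) -> bool:
    """A healthy POP3 server opens the session with a '+OK' status line."""
    return banner.startswith(b"+OK")


def check_pop3(host: str, port: int = 110, timeout: float = 5.0) -> bool:
    """Connect to the POP3 port and verify the server actually answers.

    Returns False on refused connections, timeouts, or a bad greeting.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.settimeout(timeout)
            banner = conn.recv(512)
    except OSError:
        # Covers connection refused, unreachable hosts, and timeouts.
        return False
    return greeting_is_ok(banner)
```

A fuller check would also log in and issue a command, but even reading the greeting catches a daemon that accepts connections and then hangs.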
The best part about all of the software mentioned above is that it is released under OSI-approved open source licenses. This means you don't spend anything but TIME, and possibly a few machines to do the actual monitoring with.
You may wonder about the impact on system performance of monitoring with SNMP, MRTG/Cricket, and Mon. The short answer is that I couldn't detect a noticeable increase. Other utilities such as Argent (commercial, pay-for software) would take an HP-UX V-Class machine with 8 CPUs and 8GB of RAM from 0% to about 20% on ALL 8 CPUs while it telnetted to the machine, created about 150KB of test scripts, and then ran them.
The program isn't debugged until the last user is dead.
You could use my project!
JFFNMS - Just for Fun Network Management System.
The site is JFFNMS.org
Look at the features, it has all you need, and of course the screenshots.
It will work on any Unix with PHP support, and it will monitor any standards-compliant SNMP device or TCP port. If you have SNMP enabled, it will also tell you how many connections you have to the specified port, in addition to the connection delay.
It's open source and fully supported; I just made the latest release a few days ago.
You could also look at the two working demos.
I hope some of you can use it. It really shows a lot about a host, whether that's a server or a router.
- Smells Like Open Source Code
I know you're looking for something free, but others here with some dollars to spend might like this. ProactiveNet does standard monitoring of network devices and can grab any variable available via SNMP or Microsoft Perfmon counters, or use shell scripts to parse data and return the values you wish to monitor. It also has very extensive monitoring capabilities for just about any kind of database (it can execute any query you wish or monitor performance tables), and many kinds of middleware.
It keeps a database that tracks the normal values throughout the day and sets high and low thresholds. So, if you have a problem, it can use this data to try to pin down where the problem actually lies. It actually works quite well, well enough that I just bought it. I evaluated several different products, including the standard HP and CA stuff, but the ProactiveNet stuff kicked the crap out of these in features, price, performance, and usability.
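I don't know how ProactiveNet computes its baselines internally, but the general idea of "normal values throughout the day" can be sketched in a few lines of Python. This version is purely an assumption on my part: it groups past samples by hour of day and treats anything more than `width` standard deviations from that hour's mean as abnormal.

```python
from collections import defaultdict
from statistics import mean, pstdev


def build_baseline(history, width=2.0):
    """Build {hour: (low, high)} thresholds from (hour, value) samples."""
    by_hour = defaultdict(list)
    for hour, value in history:
        by_hour[hour].append(value)

    baseline = {}
    for hour, values in by_hour.items():
        centre = mean(values)
        spread = pstdev(values)  # population standard deviation
        baseline[hour] = (centre - width * spread, centre + width * spread)
    return baseline


def is_abnormal(baseline, hour, value):
    """True if value falls outside the learned band for that hour.

    Hours with no history are never flagged.
    """
    if hour not in baseline:
        return False
    low, high = baseline[hour]
    return value < low or value > high
```

The point of keying on the hour is that a load of 5 during the nightly backup can be normal while the same load at 3pm is worth a page.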
Nagios
Works great, easy to configure, and can do all of the things you are requiring (CPU load/memory/processes/etc). It has a very robust dependency mechanism, and has many levels of notifications.
I've been using it for 3 years now with zero problems. It looks like v2.0 will be out in beta form by the end of the month.
Check out moodss ( http://jfontain.free.fr/moodss/ )
It's a modular monitor framework that does incredible things.
It comes with modules to monitor machines (both local and remote), network devices, database, etc. But the best part is you can write your own modules to monitor whatever you desire.
Nagios - I'll say it again.
I am at this very moment experimenting with OpenNMS (www.opennms.org) in my testlab. Perhaps that is worth some investigation.
For a specifically Solaris solution, look at Orcallator, but read my experiences with that and SARGE first.
I'd second the various Nagios recommendations. The object templating configuration is very powerful once you get your head round it.
Ade_
/
Big Bubbles (no troubles) - what sucks, who sucks and you suck
Lrrd is great for graphing. You can graph anything through a simple script, and a lot of example scripts are already included.
Lrrd uses a single server that polls one or more clients for information.
Nagios is better at monitoring the network as a whole, and responding to events. If, for example, a router goes down, Nagios knows that the servers behind it will be unreachable as well, and won't bother you with alerts for them. As Nagios can also react to events, it would be possible to change the default route to route around the broken router.
Yes, Nagios is the best. I've had it running entirely on Solaris, and you can also hack in Windows support. With the right plugins you can also monitor load, disk space, etc.
Rus
I've been using the very inexpensive ServersAlive from Woodstone since 1999, and I've been very pleased with it. It's much friendlier to use than Big Brother or MRTG (and yes, I use both of those as well). The user interface is great, very easy to point-click your way through, and you can also SSH or Telnet into it to do other administrative tasks.
It can check everything from pings and SNMP to databases, web pages, services, processes, ports, and more. For whatever it doesn't check, you can design external checks, and users share their external checks for things like Lotus Notes and file counts.
The alerting is absolutely top-notch: you can set up teams and people, and each person can have their own notification settings & schedules via ICQ, MSN, email, pager, and more. I love it because I can have my alerts delivered to the right place at the right time.
The user community is very active: there's a great email list with a lot of helpful people. I've personally written lots of web templates for it, and other users have added external checks for stuff like Lotus Notes, ODBC database checking, and more. The developers are also extremely responsive, and they do beta builds every few days with new features. For example, MSN recently turned off their old protocols, but Woodstone had already made available a new version that works with the new protocol, and explained to the email list what the ramifications were.
The newest version 4 added an Enterprise Version that can log to ODBC, so you can build web-based analytical reporting as well. That version goes for $179, but there's a free 10-check-only version and a $99 normal version. Can't say enough good stuff about this - it's outlasted four network admins at my company because the alerting from my house (using ServersAlive) has always outperformed every solution we've put in at the office, including Big Brother, WhatsUpGold, and a few others.
What's your damage, Heather?
My project, Loggerithim, is right up your alley.
We have had great success with Nagios. We even wrote custom plugins to monitor certain other aspects of our custom system (in PHP, no less).
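For anyone wondering what a custom plugin involves: a Nagios plugin is just a program that prints one line of status text and exits with 0 (OK), 1 (WARNING), 2 (CRITICAL), or 3 (UNKNOWN). Here is a minimal sketch in Python rather than PHP. The thresholds of 4.0 and 8.0 and the message wording are arbitrary assumptions, not anything from our setup.

```python
import os

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify_load(load1, warn, crit):
    """Map a 1-minute load average to (exit_code, status_line)."""
    if load1 >= crit:
        return CRITICAL, f"LOAD CRITICAL - load1={load1:.2f}"
    if load1 >= warn:
        return WARNING, f"LOAD WARNING - load1={load1:.2f}"
    return OK, f"LOAD OK - load1={load1:.2f}"


def main(warn=4.0, crit=8.0):
    """Print the status line and return the exit code for Nagios."""
    try:
        load1 = os.getloadavg()[0]
    except (OSError, AttributeError):
        # getloadavg is unavailable or failed on this platform.
        print("LOAD UNKNOWN - could not read load average")
        return UNKNOWN
    status, message = classify_load(load1, warn, crit)
    print(message)
    return status

# A real plugin script would finish with: sys.exit(main())
```

Nagios reads the exit code to decide the state and shows the printed line in the web interface and in notifications.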
S
If you want a GUI, you might want to check out OpenMapper.
At work here we use a combination of two things to monitor our servers. First is Nagios (previously NetSaint). Nagios is good because it can do anything from very basic checks, like pinging a server (or network routers, switches, firewalls, printers, etc.) to see if it's up, to checking whether a certain service is up, such as requesting a web page to make sure that your HTTP server is running, or making an SMTP or FTP request to check that those services respond too. (It also does more, but there's no use in listing it all here.) We have Nagios set up to send out pages whenever a server is reported as going down.
What we also use is a simple implementation of SNMP plus Cricket (a grapher in the MRTG tradition, built on RRDtool) to graph the SNMP data over time. That tells us things like CPU load, memory and swap usage, and a number of other things. Both products work pretty well and give us a very good idea of what is going on with our servers. And on the bright side, they're free! The only cost is the hardware to run them on.
And if you really want to get fancy, you could always try something like SmokePing, which tells you the latency to your servers over time. It reports the average time for a ping reply, plus a graph of how far from the norm each ping is. It works great if you want to see whether a server's network response time slows down at various points of the day, during heavy CPU load, and so on. It's a very nice product, and it's built on RRDtool just like Cricket is, so you don't even need a separate box for it.
-Through the server, over the router, off the firewall... Nothing but 'Net!
We use Nagios to monitor our ISP network of 125+ machines and nearly 600 independent services. It's completely customizable, with plug-in modules to monitor anything you like.
I remember an older one called Big Brother that was a little lighter weight.
What about spong?
.
.
.
description: A systems and network monitoring system -- server programs
This package includes the spong daemon, which collects and stores
information from the spong client programs, and the program for sending
out messages when problems occur.
Spong is a simple systems and network monitoring package. It does not
compete with Tivoli, OpenView, UniCenter, or any other commercial
packages. It is not SNMP based, it communicates via simple TCP based
messages. It is written in perl and easily modifiable.
Its features include:
* client based monitoring (CPU, disk, processes, logs, etc.)
* monitoring of network services (smtp, http, ping, pop, dns, etc.)
* grouping of hosts (routers, servers, workstations, PCs)
* rules based messaging when problems occur
* configurable on a host by host basis
* results displayed via text or web based interface
* history of problems
* verbose information to help diagnose problems
* modular programs to make it easy to add or replace check functions
or features
* Big Brother BBSERVER emulation to allow Big Brother Clients to be used
Network Management with Nagios is an article about deploying Nagios for a large mixed Linux/Unix/Microsoft environment at John Deere.
There was a brief mention of OpenNMS earlier; clearly this needs some more input. Nagios is a great tool too, but it is not as geared towards enterprise use.
OpenNMS is.
OpenNMS handles all common port services and SNMP/MIB capability (as any NMS should do). It does everything all the tools mentioned above here can do (and even incorporates a few).
It has a front end powered by Apache Tomcat 4 and uses PostgreSQL (like Nagios) for its database. It has commercial support, is easily deployed on multiple architectures including Solaris and Linux, and has packages for Debian, Red Hat, etc. (Email me for the latest stepwise Debian deployment docs.)
The reporting capabilities approach those of Concord's tool, with availability reports emailed out from the server in PDF format. RRDtool graphs handle response-time reporting on any monitored service, with a user interface for specifying graph output intervals. SNMP graphs for mib2.system OIDs are built in.
There is a MIB compiler for integrating any SNMP event. Custom scripts can be executed on specific events.
The pollers are very advanced, checking for specific versions and responses. They have dynamic poll frequency change on outages, and built-in down-time calendars.
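To illustrate what "dynamic poll frequency change on outages" means in practice: poll quickly right after something goes down, then back off as the outage drags on. The Python sketch below is only an illustration of that idea, not OpenNMS code or configuration, and the step values (30 seconds for the first 5 minutes, and so on) are assumptions I picked for the example.

```python
def next_poll_interval(
    seconds_down,
    normal_interval=300,
    downtime_steps=((300, 30), (43200, 300)),
    fallback_interval=600,
):
    """Pick the next polling interval, in seconds.

    seconds_down is how long the service has been down (0 if it is up).
    downtime_steps is a sequence of (outage_age_limit, interval) pairs,
    checked in order; past the last limit the fallback interval applies.
    """
    if seconds_down <= 0:
        return normal_interval
    for limit, interval in downtime_steps:
        if seconds_down < limit:
            return interval
    return fallback_interval
```

Polling fast at the start means a short blip is timed accurately, while a long outage doesn't keep hammering a dead box.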
I could go on, but I suggest instead that you join the opennms-discuss list and continue your research there.
Watch out OpenView, Tivoli and Spectrum. With experience on these tools, I believe that a large part of the future of enterprise NMS based management lies within the OpenNMS community.
Best of all, the community has great people involved who have a good perspective on the connection between business processes and the monitoring tools. And everyone wants to help you.
One thing Nagios has that has not been a part of OpenNMS until recently is the GUI map. This is due in part to the OpenNMS focus on enterprise functionality rather than 'slickness'.
With nearly 0.1 terabytes of downloads a month and a 25MB binary release it is easy to see the popularity of this tool. (OpenNMS.org posts this information)
To be fair, I am going to fully deploy Nagios over here to see how it is doing, though I don't think it can scale like the OpenNMS Java backend.
You could use Big Brother as mentioned above and help Quest Software pay off that $6.6 million (USD) purchase (SEC Form 10-Q). I guess that works out to $3.3 million (USD) each.
Or try http://www.nagios.org and help them by purchasing some SWAG!
Nagios. Simple as that. You won't regret it.
There is a better alternative to Nagios. It's called Zabbix. Check the screenshots! The software is very simple to use and lets you see performance graphs at any resolution (up to 1 second). It also has excellent notification options. We are using it here to monitor a network of more than 40 servers (HP-UX, Solaris) running all sorts of applications (Oracle, SAP, Domino). I've spoken to the author; v1.0 will be released very soon ;)
Big Sister (BS) is a rewrite of BB4 (Big Brother). BB4 uses actual shell scripts; the rewrite makes the modules use Perl and be much more "correctly" modular.
Two days and no one's mentioned Nagios or OpenNMS? Both massively popular and useful.
"Nothing was broken, and it's been fixed." -- Jon Carroll