Server Monitoring Solutions?

Posted by Cliff on Wednesday October 15, 2003 @03:25PM from the keeping-an-eye-on-things dept.

bwhaley asks: "The University I work for has asked me to research software solutions for server monitoring. More specifically, a piece of software that will monitor server variables such as load, swap usage, POP/IMAP processes, total processes, and all the other interesting data about a server's health. Watching these variables can give administrators advance warning about potential problems with the server. We are currently using an in-house solution written in Perl but its age is showing. I have found plenty of proprietary solutions such as HP OpenView and Sun Management Center, but these cost thousands of dollars. What solutions do Slashdot readers use? Are there any powerful open source solutions that I'm missing? Is anyone else running homegrown software that they are happy with? We are running an entirely Solaris environment but I am interested in any UNIX solution."

58 comments

CompSci students? by agent+dero · 2003-10-15 15:28 · Score: 2, Insightful

I would suggest talking to whoever teaches computer science and software. Get the kids doing this for an education to rewrite your perl scripts that do the same job.

That's something you can pass off as helping everybody, saving y'all money and teaching compSci kids how to work with the computers and OSes

--
Error 407 - No creative sig found
1. Re:CompSci students? by Glonoinha · 2003-10-15 17:17 · Score: 1
  
  Oh man I was reading this thinking 'yes! Yes! YES! I have an answer!' right up until I read the last few lines about actual platforms.
  
  I was headed towards setting up perfmon as a service and having one machine lookup the values from all the other machines and display them in either graph or save it as data - but this is obviously not the answer you were looking for.
  
  Hey, I tried. I am only just now coming up to speed on Linux so it will be a while before I am useful in that arena. Slashdot motto : if you can't be right, at least post early (hey, I tried.)
  
  Gotta do something while XP installs over the course of the next 37 minutes.
  
  --
  Glonoinha the MebiByte Slayer
2. Re:CompSci students? by Anonymous Coward · 2003-10-18 11:58 · Score: 0
  
  Then watch as the comp sci students read all the other students emails, install their own programs, read prof emails, and hack their way through the whole system.
Big Brother by macx666 · 2003-10-15 15:28 · Score: 1

Check out bb4.com.
1. Re:Big Brother by Anonymous Coward · 2003-10-19 07:40 · Score: 0
  
  Uh, if you have to pay to use it to generate money then it isn't open source. Source code may be available but that doesn't mean it is open source.
Nagios... by pardey · 2003-10-15 15:28 · Score: 1

Nagios might be what you're looking for. Cheers.
1. Re:Nagios... by pardey · 2003-10-15 15:36 · Score: 1
  
  Or not - I'll read the post more carefully next time...
Nagios by ralphus · 2003-10-15 15:29 · Score: 1

Have you heard of Nagios?

--
Revolutions are never about freedom or justice. They're about who's going to be top dog. -- Kilgore Trout
1. Re:Nagios by Aparthy · 2003-10-15 15:38 · Score: 1
  
  I second nagios. We use it at work to monitor around 700 hosts and all of their services. Just don't have one machine monitor more than a few hundred hosts; it tends to get a bit behind at times.
2. Re:Nagios by Sentry21 · 2003-10-15 15:44 · Score: 3, Insightful
  
  I second Nagios. I set it up as a technology test I was doing a while back to monitor our internal network and some remote servers (arbitrary web servers on the internet) for a lark - got it telling uptime, system load, swap, memory usage, processors, network load and the like on our Linux and Win2K machines (including various network interfaces - when the wired interface on the laptop was disconnected, it paged me - useless for our situation, but good for multihomed machines).
  
  It can monitor all kinds of machines, services, ports, networks, pings, traceroutes, anything. Beautiful setup, and highly recommended.
  
  --Dan
3. Re:Nagios by ajayrockrock · 2003-10-15 16:01 · Score: 1
  
  Yup, Nagios is pretty much what you're after. I had it up and monitoring all my servers in about a day.
  
  The only advice I can give is take the time and read the docs. They are very good and understanding what's going on will save you loads of time down the road when you want to add stuff.
  
  later,
  ajay
4. Re:Nagios by omegaman_1 · 2003-10-16 08:57 · Score: 1
  
  I'll third nagios (www.nagios.org) as I've used it and its previous incarnation (netsaint) in production environments. It has a very extensible setup. It has a very active development community as well. You could probably set up a limited test of its functionality on a spare box in a weekend.
Nagios by merphant · 2003-10-15 15:31 · Score: 1

I haven't used it but it seems like Nagios is what you want. It's GPL and is supposedly very powerful.
Big Brother by Kowh · 2003-10-15 15:32 · Score: 1

Big Brother

There's a vibrant community with lots of scripts to extend functionality.

It's free as in beer (but not freedom) for almost all uses, and is open source. You only have to pay if you use it to generate money.
Easy by keesh · 2003-10-15 15:33 · Score: 1

Big monitor, gkrellm over remote X and someone to sit there and watch :)
1. Re:Easy by bohlke · 2003-10-15 15:40 · Score: 1
  
  I am just looking at my 10 remote gkrellm now :-)
  It's a big bunch of information :-)
  
  It is fun to find some degree of patterns ;-)
2. Re:Easy by kidlinux · 2003-10-16 06:35 · Score: 1
  
  Better yet, run a local copy of gkrellm and connect to the remote gkrellmd. gkrellm is nice for quick glances but doesn't keep any history of what it monitors, which I imagine is part of what the poster is looking for.
  
  It's nice to be able to analyze the historical data to make predictions and such.
  
  --
  -kidlinux.
3. Re:Easy by keesh · 2003-10-16 06:56 · Score: 1
  
  Unfortunately, gkrellmd doesn't yet handle plugins entirely correctly...
top by pizza_milkshake · 2003-10-15 15:35 · Score: 1

top is terrific
1. Re:top by ader · 2003-10-15 21:29 · Score: 1
  
  OK, the version of top to which you're referring is actually here, and it only works on Linux anyway.
  
  top for Solaris and other Unices is here. It's great for monitoring a single system in real time, but it's not what the poster is seeking.
  
  Ade_
  /
  
  --
  Big Bubbles (no troubles) - what sucks, who sucks and you suck
Alarms are good, but... by keiferb · 2003-10-15 15:45 · Score: 1

pretty pictures are more fun to look at! Check out cacti for all of your process/bandwidth/load/usage graphing needs. It's available at raxnet.net
What about Nagios? by a.koepke · 2003-10-15 15:47 · Score: 1

Nagios is a great server monitoring system and seems to have what you need.

It's meant for Linux but works under most *NIX variants.

--

(\(\
(^.^)
(")")
*This is the cute bunny virus, please copy this into your sig so it can spread
Big Sister by Quixotic137 · 2003-10-15 15:57 · Score: 2, Informative

If you don't want to pay for Big Brother, take a look at Big Sister. It does much the same thing, but free (as in beer and speech).
1. Re:Big Sister by bakes · 2003-10-15 18:59 · Score: 1
  
  I quite like Big Sister as well. At my last job I was using it to monitor around 50 servers, shown split into their four different functional groups.
  
  Service failures generated emails, and we also configured it to send an SMS to us out of office hours. The servers were mostly Windows NT boxes, so when a BSOD took out a web or FTP server, we were alerted within a few minutes. The default was about 20 minutes; I had to tweak that setting. That was easy because it's all written in perl (with the exception of some of the interfaces to the Windows performance counters, I think).
  
  I also added extra links to run scripts to show network activity graphs from MRTG for the switches. It was a pretty sweet setup once I had it the way I wanted.
  
  Big Sister can check for a response on a TCP port, check for running processes, memory or swap space, monitor the run queue length, file system free space, or most other things you need, plus you can add your own easily. You can also configure thresholds so you can be notified if they are reached.
  
  It's obviously not as pretty as the many-multiple-thousands-of-dollars solutions, but it's pretty good.
  
  --
  Ho! Haha! Guard! Turn! Parry! Dodge! Spin! Ha! Thrust!
2. Re:Big Sister by teemu.s · 2003-10-15 21:16 · Score: 1
  
  According to this page, the author of Big Sister is not willing to maintain the Windows port anymore without sponsoring (which IMHO is a good way to go).
3. Re:Big Sister by leitz · 2003-10-15 23:58 · Score: 1
  
  Might want to verify, but BB probably wouldn't cost for a Uni. My understanding is that even a commercial entity can use it for free if the servers being monitored are non-commerce; i.e. your QA and development servers.
4. Re:Big Sister by adam872 · 2003-10-16 13:09 · Score: 1
  
  Big Sister is pretty powerful and quite extensible too. Be aware that it takes a non-trivial amount of effort to set up, as I found out. It works on all major O/S flavours though, which is a plus. It also interfaces with other packages, such as OpenView, should you ever need it to.
  
  We are doing a similar evaluation where I work. I think we'll end up with OpenView if the costs work out OK. There are other good commercial solutions on the market, such as Foglight, Storage Profiler, Sun Management Console, Tivoli. It really depends on how much one wants to pay.
Nagios by nocomment · 2003-10-15 16:04 · Score: 1

That's easy, use nagios. It's what I use and it's great. For the holes it doesn't fill, go try out mrtg. :-)

--
/* oops I accidentally made a comment, sorry */
/* http://allyourbasearebelongto.us */
Two suggestions by Fished · 2003-10-15 16:13 · Score: 1

First, try nagios, which is open source from www.nagios.org. It takes a small commitment to setup, but works *very* well.
Second, you might try Sun netconnect since you are running all Solaris. I haven't used it myself, but some people at my nameless company have and think well of it.

--
"He who would learn astronomy, and other recondite arts, let him go elsewhere. " -- John Calvin, commenting on Genesis 1
nagios by gyratedotorg · 2003-10-15 16:20 · Score: 1

how about nagios?

--
Gyrate Dot Org - "Where high-tech meets low-life"
SNMP + MRTG/Cricket/... + Mon by fdragon · 2003-10-15 16:28 · Score: 4, Informative

I don't know why everyone forgets the default solution. SNMP comes with almost all Unix systems and Microsoft Windows.
If your Unix system doesn't come with one, Net-SNMP will install on many of them.
The SNMP daemon by default understands how to monitor Load Avg, Memory, Processes, and so forth. It may not be able to tell you details of the process, such as what user is logged into the POP3 daemon, but it will tell you that you have 500 of them running, and alert you (via SNMP Traps) of that fact.
All you need to do, once you have checked the documentation for your SNMP agent and configured it, is set up a single machine (OK, maybe 2 or 3) to send your traps to so you can kick off alerts. With some simple scripting in $FAVORITE_SCRIPTING_LANGUAGE you can email, page, text message, update a web page, or $OTHER.
Cricket or MRTG are nice utilities that will poll the servers in question (by default every 5 minutes) and produce graphs. MRTG was designed to handle network equipment and graph the bandwidth utilization, but with a change to the SNMP string, will graph anything. Cricket is the same concept but does things a little differently by using a tree configuration system for property inheritance and does graph generation on the fly instead of the at poll time method MRTG uses.
And last but not least, Transmeta produced a very good perl script monitoring package known simply as Mon. This package will do active polling of the servers including issuing a transaction to the service you are monitoring. Due to the way this software monitors, you can actually see if the remote machine is alive by actually utilizing the service to monitor instead of just the "I can ping it, it must be up" mentality some people have.
Best part about all the above mentioned software is that they are all applications with an OSI Approved OpenSource license. This means you don't spend anything but TIME, and possibly a few machines to do the actual monitoring with.
And you may wonder about the impact on system performance of monitoring by SNMP, MRTG/Cricket, and Mon. The short answer is that I couldn't detect a noticeable increase. Other utilities such as Argent (Commercial Pay For Software) would take an HP-UX V Class machine with 8 CPUs and 8GB RAM from 0% on all 8 CPUs to about 20% on ALL 8 CPUs while it telneted to the machine, created about 150KB of test scripts, and then ran them.
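
The point about Mon, that a real transaction tells you more than a ping, can be sketched in a few lines of Python. This is only an illustration of the idea and not Mon's own code: the function names `judge_greeting` and `check_service` are made up for this example, and reading the server greeting is the only "transaction" performed. It relies on the fact that a POP3 server greets a new connection with a line starting with `+OK`.

```python
import socket


def judge_greeting(banner, expected_prefix):
    """Decide whether the bytes a service sent on connect look healthy."""
    if banner.startswith(expected_prefix):
        return True, banner.decode("latin-1").strip()
    return False, f"unexpected greeting: {banner!r}"


def check_service(host, port, expected_prefix, timeout=5.0):
    """Open a TCP connection and verify the service greeting.

    Returns (ok, detail). A listening port that never sends the expected
    greeting counts as a failure, which a plain ping would not reveal.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            banner = conn.recv(512)
    except OSError as exc:
        # Refused, unreachable, or timed out: the service is not usable.
        return False, f"connection failed: {exc}"
    return judge_greeting(banner, expected_prefix)
```

A call such as `check_service("mail.example.edu", 110, b"+OK")` (the hostname is hypothetical) returns a pass/fail flag plus a detail string that a wrapper script could then email or page out.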

--
The program isn't debugged until the last user is dead.
1. Re:SNMP + MRTG/Cricket/... + Mon by Anonymous Coward · 2003-10-15 18:02 · Score: 0
  
  This means you don't spend anything but TIME, and possibly a few machines to do the actual monitoring with.
  
  Whew, good thing my boss pays me $0/hour!
  
  Seriously, the good thing about Free Software is that it gives you freedom.. it still requires an outlay of cash.
JFFNMS by szysz · 2003-10-15 16:36 · Score: 3, Informative

You could use my project !

JFFNMS - Just for Fun Network Management System.

The site is JFFNMS.org
Look at the features, it has all you need, and of course the screenshots.

It will work on any Unix with PHP support, and it will monitor any standards-compliant SNMP device or TCP port. If you have SNMP enabled, it will also tell you how many connections you have to the specified port, in addition to the connection delay.

It's open source and fully supported; I just made the latest release a few days ago.

You could also look at the two working demos.

I hope some of you can use it; it really shows a lot about a host, whether that is a server or a router.

--
- Smells Like Open Source Code
1. Re:JFFNMS by szysz · 2003-10-15 16:47 · Score: 1
  
  Ohh.. I forgot to tell you that we are number 2 in Google for Network Management System
  
  And that we have really nice graphs to show the server health, and also have a good trigger/action system so you can get emails or sms messages when something happens.
  
  If you have any question, please ask it on the JFFNMS List.
  
  Javier
  
  --
  - Smells Like Open Source Code
ProactiveNet by austad · 2003-10-15 16:46 · Score: 1

I know you're looking for something free, but others here with some dollars to spend might like this. ProactiveNet does standard monitoring of network devices, and can grab any variable available via SNMP or Microsoft perfmon counters, or even use shell scripts to parse data and return values you wish to monitor. It also has very extensive monitoring capabilities for just about any kind of database (it can execute any query you wish or monitor performance tables), and many kinds of middleware.

It keeps a database which tracks the normal values throughout the day and sets high and low thresholds. So, if you have a problem, it can use this data to try to pin down where your problem actually lies. It actually works quite well, well enough that I just bought it. I evaluated several different products, including the standard HP and CA stuff, but the ProactiveNet stuff kicked the crap out of these in features, price, performance, and usability.
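
The general idea of learning normal values by time of day and deriving high and low thresholds from them can be sketched as follows. This is a generic illustration, not ProactiveNet's actual algorithm (which is not described here): the function names, the per-hour grouping, and the "mean plus or minus k standard deviations" band are all assumptions chosen for the example.

```python
from collections import defaultdict
from statistics import mean, stdev


def build_baseline(samples, k=3.0):
    """Learn a (low, high) band per hour of day.

    samples: iterable of (hour_of_day, value) pairs from past measurements.
    k: how many standard deviations wide the band is on each side.
    """
    by_hour = defaultdict(list)
    for hour, value in samples:
        by_hour[hour].append(value)
    baseline = {}
    for hour, values in by_hour.items():
        if len(values) < 2:
            continue  # not enough history to estimate a spread
        centre = mean(values)
        spread = stdev(values)
        baseline[hour] = (centre - k * spread, centre + k * spread)
    return baseline


def is_anomalous(baseline, hour, value):
    """True when value falls outside the learned band for that hour."""
    if hour not in baseline:
        return False  # no baseline yet, so make no judgement
    low, high = baseline[hour]
    return value < low or value > high
```

With a band per hour, a load average of 2.0 can be normal at 2pm and still be flagged at 3am, which a single fixed threshold cannot express.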

--
Need Free Juniper/NetScreen Support? JuniperForum
One word.... by anderiv · 2003-10-15 17:29 · Score: 1

Nagios

Works great, easy to configure, and can do all of the things you are requiring (CPU load/memory/processes/etc). It has a very robust dependency mechanism, and has many levels of notifications.

I've been using it for 3 years now with zero problems. It looks like v2.0 will be out in beta form by the end of the month.
Moodss by Anonymous Coward · 2003-10-15 17:30 · Score: 0

Check out moodss ( http://jfontain.free.fr/moodss/ )

It's a modular monitor framework that does incredible things.

It comes with modules to monitor machines (both local and remote), network devices, database, etc. But the best part is you can write your own modules to monitor whatever you desire.
Nagios by Karora · 2003-10-15 20:22 · Score: 1

Sheesh, is Slashdot a substitute for research?
Nagios - I'll say it again.

--

...heellpppp! I've been captured by little green penguins!
OpenNMS by winchester · 2003-10-15 21:09 · Score: 1

I am at this very moment experimenting with OpenNMS (www.opennms.org) in my testlab. Perhaps that is worth some investigation.
Been there, done that... by ader · 2003-10-15 21:14 · Score: 1

For a specifically Solaris solution, look at Orcallator, but read my experiences with that and SARGE first.

I'd second the various Nagios recommendations. The object templating configuration is very powerful once you get your head round it.

Ade_
/

--
Big Bubbles (no troubles) - what sucks, who sucks and you suck
lrrd & nagios by CAPSLOCK2000 · 2003-10-15 21:31 · Score: 1

Lrrd is great for graphing. You can graph anything through a simple script, and a lot of example scripts are already included.
Lrrd uses a single server that polls one or more clients for information.
Nagios is better at monitoring the network as a whole, and responding to events. If for example a router goes down, nagios knows that the servers behind it will be unreachable as well, and won't bother you with alerts for them. As nagios can also react to events, it would be possible to change the default route to route around the broken router.
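
The router example boils down to walking each failed host's path back toward the monitoring machine and checking whether something closer has also failed. The sketch below is a generic illustration of that logic, not Nagios code; the function name `classify_outages` and the `parents` dictionary layout are assumptions made for the example.

```python
def classify_outages(parents, failed):
    """Split failed hosts into root causes and hosts hidden behind them.

    parents: {host: parent_host or None}, the next hop toward the monitor.
    failed:  set of hosts whose checks are currently failing.
    Returns (down, unreachable); only 'down' hosts deserve an alert.
    """
    down, unreachable = set(), set()
    for host in failed:
        ancestor = parents.get(host)
        blocked = False
        seen = set()  # guards against accidental loops in the parent map
        while ancestor is not None and ancestor not in seen:
            seen.add(ancestor)
            if ancestor in failed:
                blocked = True
                break
            ancestor = parents.get(ancestor)
        (unreachable if blocked else down).add(host)
    return down, unreachable
```

If `router1` fails along with the two servers behind it, only `router1` ends up in the `down` set, so one page goes out instead of three.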
Again Nagios by rf0 · 2003-10-16 00:02 · Score: 1

Yes, nagios is the best. I've had it running totally on Solaris and you can also hack in Windows support. Also, with the right plugins you can monitor load, disk space etc...

Rus

--
Cheap UK and US VPS
Windows guys should check out ServersAlive by Brento · 2003-10-16 00:16 · Score: 1

I've been using the very inexpensive ServersAlive from Woodstone since 1999, and I've been very pleased with it. It's much friendlier to use than Big Brother or MRTG (and yes, I use both of those as well). The user interface is great, very easy to point-click your way through, and you can also SSH or Telnet into it to do other administrative tasks.

It can check everything from pings, snmp, databases, web pages, services, processes, port checks, and more. For whatever it doesn't check, you can design external checks, and users share their external checks for things like Lotus Notes and file counts.

The alerting is absolutely top-notch: you can set up teams and people, and each person can have their own notification settings & schedules via ICQ, MSN, email, pager, and more. I love it because I can have my alerts delivered to the right place at the right time.

The user community is very active: there's a great email list with a lot of helpful people. I've personally written lots of web templates for it, and other users have added external checks for stuff like Lotus Notes, ODBC database checking, and more. The developers are also extremely responsive, and they do beta builds every few days with new features. For example, MSN recently turned off their old protocols, but Woodstone had already made available a new version that works with the new protocol, and explained to the email list what the ramifications were.

The newest version 4 added an Enterprise Version that can log to ODBC, so you can build web-based analytical reporting as well. That version goes for $179, but there's a free 10-check-only version and a $99 normal version. Can't say enough good stuff about this - it's outlasted four network admins at my company because the alerting from my house (using ServersAlive) has always outperformed every solution we've put in at the office, including Big Brother, WhatsUpGold, and a few others.

--
What's your damage, Heather?
Loggerithim by gphat · 2003-10-16 00:34 · Score: 1

My project, Loggerithim is right up your alley.
Nagios by TheTomcat · 2003-10-16 00:49 · Score: 1

We have had great success with Nagios. We even wrote custom plugins to monitor certain other aspects of our custom system (in PHP, no less).

S
OpenMapper by Anonymous Coward · 2003-10-16 02:06 · Score: 0

If you want a GUI, you might want to check out OpenMapper.
Nagios + Cricket + SNMP by EvilOpie · 2003-10-16 02:18 · Score: 1

At work here we use a combination of two things to monitor our servers. First is Nagios (previously NetSaint). Nagios is good because it can do very basic checks from just pinging a server to see if it's up (and network routers, switches, firewalls, printers, etc...) to actually checking to see if a certain service is up. Such as requesting a webpage to make sure that your HTTP server is running, or making an SMTP or FTP request to check that those services respond too. (it also does more, but there's no use in listing them all here.) We have nagios setup to send out pages whenever a server is reported as going down.

Also what we use is just a simple implementation of SNMP plus Cricket (an interface for MRTG) to graph the SNMP data over time. That tells us things like CPU load, memory + swap usage, and a number of other things. Both products work pretty well and they give us a very good idea as to what is going on with our servers and such. And on the bright side, they're free! The only cost you need is the hardware to run them on.

And if you really wanted to get fancy, you could always try something like Smoke Ping which tells you the latency to your servers over time. It'll report the average time for a ping reply, plus a graph of how far away from the norm a ping is. Works great for if you want to see things like if a server's network response time slows down at various points of the day, or during heavy CPU load and things like that. It's a very nice product, and it sits on MRTG just like Cricket does, so you don't even need a separate box for it.
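
To give a feel for the kind of numbers a latency graph like that is built from, here is a small Python sketch that reduces one round of ping probes to a typical value, a spread, and a loss percentage. It is not Smoke Ping's code: the function name `summarize_round` is made up, the choice of median and min-to-max spread is an assumption, and actually sending the probes is left out.

```python
from statistics import median


def summarize_round(rtts_ms):
    """Summarize one round of probes.

    rtts_ms: list of round-trip times in milliseconds, None for a lost probe.
    """
    answered = sorted(r for r in rtts_ms if r is not None)
    lost = len(rtts_ms) - len(answered)
    loss_pct = 100.0 * lost / len(rtts_ms) if rtts_ms else 0.0
    if not answered:
        return {"median_ms": None, "spread_ms": None, "loss_pct": loss_pct}
    return {
        "median_ms": median(answered),             # typical latency this round
        "spread_ms": answered[-1] - answered[0],   # how far probes strayed
        "loss_pct": loss_pct,
    }
```

Storing one such summary per polling interval is enough to plot latency over the day and spot the slow periods mentioned above.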

--
-Through the server, over the router, off the firewall... Nothing but 'Net!
NAGIOS is the best I have seen by Grizzletooth · 2003-10-16 03:33 · Score: 1

We use NAGIOS to monitor our ISP network of 125+ machines and nearly 600 independent services. Completely customizable with plug-in modules to monitor anything you like.
I remember an older one called Big Brother that was a little lighter weight.
Server Monitoring Solutions? by krishnaD · 2003-10-16 06:15 · Score: 1

What about spong?

Description: a systems and network monitoring system (server programs).

This package includes the spong daemon, which collects and stores information from the spong client programs, and the program for sending out messages when problems occur.

Spong is a simple systems and network monitoring package. It does not compete with Tivoli, OpenView, UniCenter, or any other commercial packages. It is not SNMP based; it communicates via simple TCP based messages. It is written in perl and easily modifiable.

Its features include:

* client based monitoring (CPU, disk, processes, logs, etc.)
* monitoring of network services (smtp, http, ping, pop, dns, etc.)
* grouping of hosts (routers, servers, workstations, PCs)
* rules based messaging when problems occur
* configurable on a host by host basis
* results displayed via text or web based interface
* history of problems
* verbose information to help diagnose problems
* modular programs to make it easy to add or replace check functions or features
* Big Brother BBSERVER emulation to allow Big Brother clients to be used

Network Management with Nagios is an article about deploying Nagios for a large mixed Linux/Unix/Microsoft environment at John Deere.
OpenNMS is going to lead the way. by iMacorIBM · 2003-10-16 09:23 · Score: 1

There was a brief mention of OpenNMS earlier; clearly this needs some more input. Nagios is a great tool too, but it is not as geared towards enterprise use.
OpenNMS is.
OpenNMS handles all common port services and SNMP/MIB capability (as any NMS should do). It does everything all the tools mentioned above here can do (and even incorporates a few).
It has a front-end powered by Apache Tomcat 4 and uses PostgreSQL (like Nagios) for its database. It has commercial support, is easily deployed on multiple architectures including Solaris and Linux, and has packages for Debian, Redhat, etc. (Email me for the latest in stepwise Debian deployment docs.)
The reporting capabilities approach those of Concord's tool, with availability reports emailed out from the server in PDF format. RRDtool graphs handle response time reporting on any monitored service, with a user interface for specifying specific graph output intervals. SNMP graphs for mib2.system OIDs are built in.
There is a MIB compiler for integrating any SNMP event. Custom scripts can be executed on specific events.
The pollers are very advanced, checking for specific versions and responses. They have dynamic poll frequency change on outages, and built-in down-time calendars.
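
As a rough picture of what changing poll frequency during an outage means, here is a Python sketch. It is not OpenNMS code or configuration: the function name `next_poll_delay`, the 5 minute normal interval, and the two-step outage schedule are all assumptions picked for the example.

```python
def next_poll_delay(seconds_down, normal=300, schedule=((300, 30), (3600, 60))):
    """Return the number of seconds to wait before the next poll.

    seconds_down: None while the service is up, else how long it has been down.
    normal: polling interval for a healthy service.
    schedule: (outage_age_limit, delay) pairs, checked in order.
    """
    if seconds_down is None:
        return normal
    for age_limit, delay in schedule:
        if seconds_down < age_limit:
            return delay
    return normal  # long outage: fall back to the normal interval
```

Polling faster right after a failure means short outages are measured more precisely, while long outages stop generating extra load.
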
I could go on, but I suggest instead that you join the opennms-discuss list and continue your research there.
Watch out OpenView, Tivoli and Spectrum. With experience on these tools, I believe that a large part of the future of enterprise NMS based management lies within the OpenNMS community.
Best of all, the community has great people involved that have good perspective on the connection between business processes and the monitoring tools. And everyone wants to help you.
One thing Nagios has that has not been a part of OpenNMS until recently is the GUI map. This is due in part to the OpenNMS focus on enterprise functionality rather than 'slickness'.
With nearly 0.1 terabytes of downloads a month and a 25MB binary release it is easy to see the popularity of this tool. (OpenNMS.org posts this information)
To be fair, I am going to fully deploy Nagios over here to see how it is doing, though I don't think it can scale like the OpenNMS java backend.
Help nagios with swag, bigbro is rich enough by Anonymous Coward · 2003-10-16 12:11 · Score: 0

You could use Big Brother as mentioned above and help Quest Software pay off that $6.6(USD) million dollar purchase (SEC Form 10-Q). I guess that works to $3.3(USD) million each.

Or try http://www.nagios.org and help them by purchasing some SWAG!
Nagios by macdaddy · 2003-10-16 14:29 · Score: 1

Nagios. Simple as that. You won't regret it.
Better alternative by Anonymous Coward · 2003-10-16 19:34 · Score: 0

There is a better alternative to Nagios. It's called Zabbix. Check the screenshots! The software is very simple to use and lets you see performance graphs of any resolution (up to 1 sec). Also, it has excellent notification possibilities. We are using it here to monitor a network of more than 40 servers (HP-UX, Solaris) running all sorts of applications (Oracle, SAP, Domino). I've spoken to the author, v1.0 will be released very soon ;)
BigSister by TBone · 2003-10-17 08:13 · Score: 1

BS is the rewrite of BB4, which uses actual shell scripts, to make the modules use Perl and be much more "correctly" modular.

--

This space for rent. Call 1-800-STEAK4U
for crying out loud by Stinking+Pig · 2003-10-17 11:22 · Score: 1

Two days and no one's mentioned Nagios or OpenNMS? Both massively popular and useful.

--
"Nothing was broken, and it's been fixed." -- Jon Carroll