Nagios System and Network Monitoring
David Martinjak writes "Nagios is an open source application for monitoring hosts, services, and conditions over a network. Availability of daemons and services can be tested, and specific statistics can be checked by Nagios to provide system and network administrators with vital information to help sustain uptime and prevent outages. Nagios: System and Network Monitoring is for everyone who has a network to run." Read on for the rest of the review.
Nagios: System and Network Monitoring
author: Wolfgang Barth
pages: 464
publisher: No Starch Press
rating: 9
reviewer: David Martinjak
ISBN: 1593270704
summary: Covers installing, configuring, and deploying Nagios to monitor systems and services on a network.
The book is authored by Wolfgang Barth and published by No Starch Press. The publisher hosts a Web page which contains an online copy of the table of contents, portions of reviews, links to purchase the electronic and print versions of the book, and a sample chapter ("Chapter 7: Testing Local Resources") in PDF format.
An amusing note to begin: this is one of the few books I have read where the introduction was actually worth reading closely. Many books seem to talk about the background or history of the subject without providing much pertinent information, if any at all. In Nagios: System and Network Monitoring, Wolfgang Barth begins with a hypothetical anecdote to illustrate the usefulness of Nagios. The most important section in the introduction, however, is the explanation of states in Nagios. While monitoring a resource, Nagios will return one of four states. OK indicates nominal status, WARNING shows a potentially problematic circumstance, CRITICAL signifies an emergency situation, and UNKNOWN usually means there is an operating error with Nagios or the corresponding plugin. The definitions for each of these states are determined by the person or team who administers Nagios so that relevant thresholds can be set for the WARNING and CRITICAL status levels.
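To make the four states concrete, here is a minimal Python sketch of how a check might map a measurement onto them. The numeric values follow the exit codes Nagios plugins conventionally use (0 through 3); the `classify` helper, the "higher is worse" rule, and the 80/90 thresholds are assumptions chosen for illustration, not anything taken from the book.

```python
from enum import IntEnum


class State(IntEnum):
    """The four states, numbered with the conventional plugin exit codes."""
    OK = 0
    WARNING = 1
    CRITICAL = 2
    UNKNOWN = 3


def classify(value, warning, critical):
    """Map a measurement to a state; higher values are assumed to be worse."""
    try:
        value = float(value)
    except (TypeError, ValueError):
        # The check itself failed to produce a usable number.
        return State.UNKNOWN
    if value >= critical:
        return State.CRITICAL
    if value >= warning:
        return State.WARNING
    return State.OK


if __name__ == "__main__":
    # Hypothetical thresholds: warn at 80% usage, go critical at 90%.
    for sample in (42, 85, 97, None):
        state = classify(sample, warning=80, critical=90)
        print(f"{sample!r:>5} -> {state.name} (exit code {int(state)})")
```

The administrator's job, as the review notes, is choosing the `warning` and `critical` values so that each state means something useful in their environment.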
The first chapter walks the reader through installing Nagios to the filesystem. All steps are shown, which proves to be very helpful if you are unfamiliar with unpacking archives or compiling from source. Users who are either new to Linux, or cannot install Nagios through a package manager, will appreciate the verbosity offered here. Fortunately, the level of detail is consistent throughout the book.
Chapter 2 explains the configuration structure of Nagios to the reader. This chapter may contain the most important material in the book as understanding the layout of Nagios is essential to a successful deployment in any environment. The book moves right into enumerating the uses and purposes of the config files, objects, groupings, and templates. All of this information is valuable and presented in a descriptive manner to help the reader set up a properly configured installation of Nagios. My biggest stumbling block in using Nagios was wrapping my brain around the relationships of the config files and objects. This chapter clears up all of the ambiguities I remember having to work out for myself. If only this book had been around a few years ago!
The sixth chapter dives into the details of plugins that are available for monitoring network services. This chapter explains using the check_icmp plugin to ping both a host and a specific service for verifying reachability. Additional examples include monitoring mail servers, LDAP, web servers, and DNS among others. There is even a section for testing TCP and UDP ports.
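For readers who have not seen a port test before, the idea can be sketched in a few lines of Python. This is not the plugin shipped with Nagios, only an illustration of the principle: try to open a TCP connection and report OK (0) or CRITICAL (2). The `probe_tcp` name, the message format, and the timeout are assumptions.

```python
import socket
import time


def probe_tcp(host, port, timeout=5.0):
    """Return (exit_code, message): 0 if the port accepts a connection, else 2."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            elapsed = time.monotonic() - start
    except OSError as exc:
        # Covers refused connections, timeouts, and name resolution failures.
        return 2, f"TCP CRITICAL - {host}:{port} unreachable ({exc})"
    return 0, f"TCP OK - {host}:{port} answered in {elapsed:.3f}s"


if __name__ == "__main__":
    # Demonstrate against a throwaway listener on the loopback interface.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        server.listen(1)
        port = server.getsockname()[1]
        print(probe_tcp("127.0.0.1", port, timeout=1.0)[1])
```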
Next, the book covers checking the status of local resources on systems. At work, we have a system in production that could have been partitioned better. Unfortunately, /var is a bit smaller than it should be, and tends to fill up relatively frequently. Thankfully, Nagios can trigger a warning when there is a low amount of free space left on the partition. From there, we have Nagios execute a script that cleans out certain items in /var so we don't have to bother with it. We can also receive notification if the situation does not improve, and requires further attention. In addition to monitoring hard drive usage, the book includes examples for checking swap utilization, system load, number of logged-in users, and even Nagios itself.
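A cleanup like the one described above is typically wired up as an event handler, to which Nagios can pass the service state, the state type (SOFT or HARD), and the current check attempt. The Python sketch below shows only the decision logic such a handler might use. The `should_clean` name and the policy of acting on the third soft attempt are assumptions for illustration; it is not the script we actually run.

```python
def should_clean(state, state_type, attempt, soft_attempt=3):
    """Decide whether the cleanup should run for this state change."""
    if state not in ("WARNING", "CRITICAL"):
        return False  # OK or UNKNOWN: nothing to clean up
    if state_type == "HARD":
        return True  # problem confirmed after all retries
    # SOFT state: act once, on a chosen retry, before notifications go out.
    return int(attempt) == soft_attempt


if __name__ == "__main__":
    # Hypothetical sequence of state changes handed to the handler.
    events = [
        ("OK", "HARD", 1),
        ("WARNING", "SOFT", 1),
        ("WARNING", "SOFT", 3),
        ("CRITICAL", "HARD", 4),
    ]
    for event in events:
        action = "clean /var" if should_clean(*event) else "no action"
        print(event, "->", action)
```

Acting during the soft state gives the cleanup a chance to fix the problem before anyone is paged; the hard-state branch is the fallback if it did not.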
Chapter 12 discusses the notification system in Nagios. You provide who, what, when, where, and how in the configs, and Nagios does the rest. The book does a fantastic job of explaining what exactly triggers a notification, and how to efficiently configure Nagios to ensure the proper parties are being informed of relevant issues at reasonable intervals. For example, the server team might be interested to know that /var is 90% full on one of the LDAP servers; however, they don't need to be notified of this every thirty seconds. This chapter also covers an important aspect of Nagios known as flapping. Flapping occurs when a monitored resource quickly alternates between states. Nagios can be configured for a certain tolerance against rapid alternating changes in states. This means Nagios won't sound the alarm if the problem will resolve itself in a short period of time. Flapping is usually caused by an external factor temporarily influencing the results of the test, and therefore has no long-term impact.
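The flapping tolerance can be pictured as a percentage of recent checks that changed state, compared against a low and a high threshold. The Python sketch below is a simplification under stated assumptions: it counts every state change equally, whereas Nagios weights recent changes more heavily, and the function names and the 5%/20% thresholds are illustrative values only.

```python
def percent_state_change(history):
    """Percentage of adjacent check results that differ (unweighted)."""
    if len(history) < 2:
        return 0.0
    changes = sum(1 for prev, cur in zip(history, history[1:]) if prev != cur)
    return 100.0 * changes / (len(history) - 1)


def update_flapping(is_flapping, history, low=5.0, high=20.0):
    """Apply hysteresis: start flapping at or above high, stop below low."""
    pct = percent_state_change(history)
    if not is_flapping and pct >= high:
        return True
    if is_flapping and pct < low:
        return False
    return is_flapping  # in between the thresholds: keep the current verdict


if __name__ == "__main__":
    steady = ["OK"] * 21
    noisy = ["OK", "CRITICAL"] * 10 + ["OK"]
    for name, history in (("steady", steady), ("noisy", noisy)):
        pct = percent_state_change(history)
        print(f"{name}: {pct:.0f}% change, flapping={update_flapping(False, history)}")
```

The gap between the two thresholds is what keeps the flapping verdict itself from flapping.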
The last major chapter to mention here deals with essentially anything and everything about the Nagios Web interface. The main point of interaction between the administrator and Nagios is the fully featured Web interface. This chapter covers recognizing and working on problems, planning downtimes, making configuration changes, and more. I especially like that the book gives an overview of each of the individual CGI programs that make up the Web interface, as these files are important for UI customization.
The only aspect of this book that I did not care for was that the book reads like a reference manual at times. The first several chapters start out more conversational in tone with great explanations of the procedures and files, but later it sometimes feels like I am repeatedly reading an iterated piece-by-piece structure, filled in with the content for that chapter. That is not necessarily bad altogether, as it does provide consistency in the presentation of the information. Additionally, the level of detail is outstanding throughout the book. The explanations are never too short or too long. This is definitely a valuable book for administrators at all levels with fantastic breadth and depth of material. Administrators who are interested in proactive management of their systems and networks should be pleased with Nagios: System and Network Monitoring.
Nagios is licensed under the GNU General Public License Version 2, and can be downloaded from http://nagios.org.
David Martinjak is a programmer, GNU/Linux addict, and the director of 2600 in Cincinnati, Ohio. He can be reached at david.martinjak@gmail.com.
You can purchase Nagios: System and Network Monitoring from amazon.com. Slashdot welcomes readers' book reviews -- to see your own review here, read the book review guidelines, then visit the submission page.
Ok, man, I swear this isn't a troll, but I have to know, what the heck, are you doing to these books?
I mean, it's none of my business, but do you have some insane reading technique?
maybe he is reading them in his oil filled server room?
Indeed.
;)
It's typically not wise to fold books over backwards when reading. The unpleasant sound the book makes when you do this the first few times might have been a good indication that you should stop
lol.. Now i wanna know too.
Doot!
init 11 - for when you need that edge.
Please forgive my anonymous coward use: my comments would reveal my name too well.
I'm an *OLD* Netsaint and Nagios user, and have contributed to both. Guides are great, playing with it is great, and it does a lot of things very well. But what Nagios has never had is a way to publish the URLs of specific queries or reports in a way that can be bookmarked and sent to someone else for reference. It's a big, big, big flaw in the system, common to a lot of web-based projects.
The other huge, huge flaw of Nagios is configuring it. It shouldn't take a reference book from O'Reilly to do this efficiently, but I'm afraid it does. There are easily a dozen different configuration tools at www.nagiosexchange.org and sourceforge.net, and *every single one of them* has major problems that could be solved with 10% of the time spent on Nagios itself. Most are abandonware, exciting but uncompleted projects that are never going to be completed. Others rely on hand-compiling Nagios itself with strange local modifications and local configurations that are very difficult to import a working Nagios to, or export from. Others have absolutely *no* security model, incapable of securing access to them or relying on locally stored plain-text password setups; others rely on non-privileged accounts to edit the Nagios configurations, including the password files for databases or proxy services, in semi-public repositories. Others rely on installing every file in a browseable web directory, permitting unauthorized local users to poke at their guts and exploit the security flaws. (Yes, you perl idiots who execute random file and directory creation without checking if it's empty first or protecting it from being written into by other people before you copy its contents, I mean you!)
Other configuration tools have beautiful "artist conception" interfaces that will make your eyes bleed after 20 minutes of working with them. Every last one of them listed at Sourceforge and NagiosExchange suffers from one or more of the major open source GUI flaws Eric Raymond ranted about in his CUPS horror story, years ago.
It's unfortunately so bad that I've had to throw away weeks of work and switch to Altiris on a major project, which is fairly painful to switch to but at *LEAST* has a usable interface.
zabbix
jffnms
opennms
etc.
I found nagios rather clunky compared to some of the others.
I thoroughly enjoy the event handler capabilities built into Nagios. Just that single feature has made my day to day administrative tasks easier, and well worth the hours to write the scripts and get it all configured properly.
For example, it's so nice to have the spooler service on a win32 box restart automatically if it has locked or died unexpectedly, and not have to wait for the calls to come in when users can't print.
Never start vast projects with half-vast ideas.
Much easier to set up and get running - http://www.hyperic.com/ Not to mention supports more platforms than all of the others.
We bought this monitoring solution and are pretty happy with it. The agents on the windows side are just as good as on linux but the linux agents are restricted to certain distros. The reporting server is nice, gives us pretty reports to take to the customer to show them why they need the upgrades we are recommending. The SLA's can be proven since everything is tracked for years. I like it :)
You can take a look at a few screenshots here: http://www.friendlysol.com/managed_services.php
Use the contact form if someone has a question about it. The cool part is that we can either use it as a tool for our clients, or "rent" the tool to IT departments. Makes sense for the monitoring server to be outside somebody's office.
At the risk of getting off-topic, I'm tired of stuff that doesn't quite work. (can't comment on the actual book because I haven't read it) However, I can't see how Nagios can even begin to satisfy the needs of most modern IT operations folks. These days, most people need to know a lot more than whether machine X is up. They need to know which part(s) of their web apps are not functioning correctly. They need a lot more intricate detail than is possible with Nagios or SNMP-based monitoring tools. Really, the only monitoring tool that does it for me is Hyperic.
Groundwork is a great unification of Nagios and other tools that provides the missing configuration interface Nagios lacks.
http://www.groundworkopensource.com/products/os-overview.html
There's a VMware appliance available if you want to take it for a quick spin around the block.
http://sourceforge.net/project/showfiles.php?group_id=160654&package_id=222764
Choose your future. Choose to sysadmin.
I'm surprised people still use these 'svn co && ./configure && make install && edit config files' systems. You can download Hyperic HQ, install it, and be monitoring your software and hardware in 30 minutes -- no joke. Want alerts when your disks are full? Cake. Want to autodiscover your Apache server? Cake. Want an alert when a process goes haywire? Cake.
And since it has a pluggable framework, you can monitor anything that you want -- network devices, software, hardware, etc.
It's Open Source and has an active community, so if you really long for the days of 'svn co', that's also provided.
Disclaimer: I work for Hyperic ... and it's objectively better.
I tried nagios and I would agree with most that it is clunky. We use Hobbit because it's free (and yes, it's clunky too). I can't program for anything and wish either Castlerock or someone else would make SNMPc for an open source OS. It is superior to any monitoring out there and I'd gladly pay the $5000 for it, but it doesn't run on Linux. Anyone that knows of anything that is comparable, please let me know.
So long, and thanks for all the fish.
He's using them to prop a rack up I bet =)
I had to learn everything from code already working and tweaked the hell out of it. Was actually a fun project for my internship. Sure wish I had a book back then.
Agreed; basically Nagios is a mess, but it's pretty much the standard unfortunately, as it kinda-sorta gets the job done.
My main problem with the current crop of monitoring tools is that they are all either about availability (Nagios, et al) or performance (MRTG, Cacti). Currently I'm using Nagios+Cacti, which kinda-sorta works for me, but it would be nice to have a single coherent interface to my systems. Zenoss also looks interesting, although I haven't tried it yet, but I'd like to hear of any other possibilities.
The comfort you demanded is now mandatory - Jello Biafra
Have a look at hobbit. It always seems to be overlooked when comparing various free monitoring systems.
It's really very good.
While Nagios is a powerful solution, it is well known that it's a beast to setup/maintain/configure/etc. I have used Nagios in the past, and have recently switched to ZenOSS (http://www.zenoss.com). If you just can't get the hang of Nagios, or feel that it's not user friendly/feature rich enough, give ZenOSS a try. No, I am not a member of the ZenOSS team -- just a very happy user.
When Hyperic HQ became open-sourced, our company tried it out, and have been stupendously pleased. We've started eradicating all of our Nagios and Sitescope implementations because HQ is so much easier to drive, and the interfaces are open. The crew at Hyperic is always helpful, even when you're trying to implement something that duplicates the functionality of its pay-fer Enterprise version.
-- Brad Felmey
...or is it fanbois? :)
And yeah, our users are responding. Thanks for noticing.
-John Mark
Hyperic Community Manager
of course Nagios is flexible. It's the time to setup and maintenance that costs you.
:)
And as far as "hatred of nagios" I've witnessed that firsthand when I've run BoF's on Nagios, and I've run a few - at LISA and LinuxWorld.
But I love your snarky comments. They r0x0r
Oh, and I almost hate to ask, but can you install RPM's on Windows? (har har)
-John Mark
Hyperic Community Manager
Looks like they've come out with another fine book. I've known those guys for a long time... now if they could just publish a book on Hyperic... ;)
Hyperic Community Manager
I'm just curious, in case someone is bored enough to explain this to me. Who modded the posts here as trolls? I'm finding that if you look for it, there are actual trolls here and there on slashdot who throw in crap, often not on topic, that specifically incites fighting and complication. The two posts I just read above:
Good book
(Score:-1, Troll)
by 2.7182 (819680) on Wednesday April 11, @03:24PM (#18693547)
It tells you a lot of things I don't know other sources for. But my binding cracked after a week, which is a bummer.
I'm not associated with them, but
(Score:-1, Troll)
by zappepcs (820751) on Wednesday April 11, @03:41PM (#18693769)
They seem like a really easy group to work with so far. Much easier than commercial groups we've worked with.... I've got no complaints
These are two very brief, fairly positive observations about the user's experience with the topic posted. There's no fighting, there's no controversy...what makes them labeled as trolls? My request for an explanation of this moderating is 10 times more a troll than these two posts.