Nagios 3 Enterprise Network Monitoring
jgoguen writes "Nagios, originally known as Netsaint, has been a long-time favourite for network and device monitoring due to its flexibility, ease of use, and efficiency. Nagios provided, and still provides today, a low-cost, versatile alternative to commercial network monitoring applications. Nagios 3 takes a huge step forward compared to Nagios 2, providing improved flexibility, ease of use and extensibility, all while also making significant performance enhancements. Due to its extensibility and ease of use, no device or situation has yet been found that cannot be monitored using Nagios and a pre-made or custom script, plug-in or enhancement." Read on for the rest of jgoguen's review.
Nagios 3: Enterprise Network Monitoring
author
Max Schubert, Derrick Bennett, Jonathan Gines, Andrew Hay, John Strand
pages
339
publisher
Syngress
rating
8
reviewer
jgoguen
ISBN
978-1-59749-267-6
summary
Making Nagios 3 work for you and your business.
The first chapter is devoted to new features in Nagios 3. The major changes implemented for Nagios 3, which includes changes to data storage options and locations, checks, configuration objects, and macros, are discussed here. Operational, performance, and usability enhancements are also discussed here. Users upgrading from Nagios 2, or users who may already be familiar with Nagios 2, will gain the most from this chapter. New users will still gain value from this chapter, however, since a number of changes also involve some of the major features of Nagios. In addition, users who may be referring to configuration file samples created for Nagios 2 will save a great deal of time referring to this chapter for changes. Using Nagios 2 configuration files directly prevents users from enjoying some new features of Nagios 3. Users who will only be writing plug-ins and scripts for their local Nagios deployment might not find Chapter 1 very useful.
Chapters 2 and 3 deal with scaling Nagios to work efficiently within large deployments. First, designing a Nagios configuration for large organizations is shown. This is something that all Nagios administrators should make use of when designing configurations, not only administrators in large organizations, because a properly done configuration for a small organization will easily scale up as the organization grows. I was impressed to see that the authors stress the importance of the end user's input when designing configurations. Administrators who ignore this piece of advice risk the success of Nagios in their organization. Various diagrams help to explain the relationships between the various Nagios configuration objects. A good amount of detail is provided regarding allowing various groups within an organization to have semi-independent control over how Nagios interacts with their hosts and services, and how Nagios alerts their staff. The authors have included numerous configuration file snippets, which allows a Nagios administrator to very quickly create a configuration file and then tweak the configuration parameters to suit local requirements.
Scaling the Nagios graphical user interface (GUI) follows a very simple concept: use a "less is more" approach. Although the specific details here deal with Nagios, the general idea is equally applicable to anyone displaying information they expect their users to actually pay attention to. In general, users should be able to see as much as they want (limited by resources and permissions) but only be shown what they need to know about by default. For example, the system administrator for marketing probably does not need to know when the development disk image server goes down, while the development system administrator would probably be very interested. Utilizing user accounts allows the administrator to allow various groups to have access to Nagios filtered by its fine-grained permissions system. Users from various groups can also be shown only what they need to be shown by default, without the need to select a particular area first. Utilizing user accounts also prevents users who need to view Nagios from having full administrative control, and allows for records of each user's actions to be made. Using a patch provided with the book's download package will enable Nagios to have read-only accounts as well, which is great for organizations who would like to grant certain users (or groups) access to view Nagios but not make any changes. As an example, an organization's help desk could use Nagios to determine quickly whether users are unable to access services because of an outage, or if further troubleshooting is necessary.
The authors continue on here to discuss clustering, failover, and the future of the Nagios GUI. I'm not convinced that these belong in a chapter devoted to scaling the Nagios GUI, since these seem to mostly deal with scaling the entire Nagios deployment. Regardless, they are all very important topics, especially when Nagios is heavily relied upon. Clustering allows remote sites to have a Nagios instance local to the site monitoring hosts and devices rather than requiring a central Nagios instance to monitor remote hosts and services. Not only would monitoring hosts and services take much longer due to the WAN links between the central instance and remote locations, but also due to the security implications of allowing the checks to be done. The authors don't discuss the security side of clustering, but it's still something that every Nagios administrator (and everyone else!) should keep in mind. The clustering section deals primarily with the rationale behind clustering and how to configure the local and remote instances of Nagios properly, but the authors include a good deal of information here that a less experienced Nagios administrator might overlook. Most notable is their discussion about the display of service status when a service is reachable from the master server but not from a remote instance. While Nagios can translate the remote instance's check result to be displayed from its own perspective, it may be more desirable to have the master Nagios GUI display the results from the perspective of the server which made the check. After implementing clustering, some sort of fallback mechanism is required. Failover and redundancy are the two main choices, and that's what the authors discuss next. They don't spend much time on redundancy, since this would require each redundant Nagios instance to perform its own set of checks, which can significantly raise the load on both the monitored hosts and the network in general. Given the problems it can introduce, the authors have spent more time on redundancy than most administrators should spend considering. Failover is a much better solution, and the authors do a great job of covering the setup of a proper failover setup. As usual, they make sure to remind readers of some things that are easily overlooked, especially when you're trying to get Nagios back up and running when the master server crashes.
Chapters 4 and 5 discuss Nagios plug-ins, add-ons, and enhancements. These chapters alone are worth the price of the book because of how much time they can save. It's much faster to copy a script and make minor tweaks than it is to try reinventing the wheel, and with the number of scenarios covered here combined with the Nagios user community there aren't very many things that haven't been done already. Whether you want to test command-line interfaces, CPU usage, memory utilization, bandwidth utilization, HTML pages, LDAP services, or even specialized hardware, there's probably already a plug-in written for it. Most common scenarios actually have a plug-in already included in this book. The available add-ons and plug-ins are equally varied, providing ways to monitor hosts across security zones, configure read-only displays that live in a security zone other than the one Nagios is in, interface with Cacti, and even read out alerts. Even more scenarios can be handled by other scripts provided by the Nagios community.
Chapter 6 goes into detail on how to integrate Nagios into an enterprise environment. This chapter goes into just enough detail to get Nagios configured to work with a large number of third-party services, such as LDAP authentication, Cacti, Puppet, and Splunk. Emphasis here is always placed on the human element; how to use Nagios to help help desk and/or NOC staff do their jobs more efficiently and effectively, and how to gain maximum support for Nagios within the organization. The importance of the human element, in all its forms, simply cannot be overstated, and the the authors have done a wonderful job of outlining a good way to make Nagios an integral part of an organization. A lot of the material towards the end of the chapter, especially the section on smaller Network Operation Centres, could be used by anyone looking for ways to help a small group work together effectively.
Chapter 7 is another chapter with a lot of content easily applicable outside of a Nagios environment. The chapter begins with the authors reminding you to know your network and to watch out for session hijack attacks, then show you how to use Nagios to do both. Nagios can't replace a competent network administrator, but it can make their lives easier and the authors show you here how the configuration you've already done on Nagios already shows you a potential session hijack attack and how it forces you to properly know your network. Nagios forces you to know your network not only by how it's built and by what devices are in use, but it also requires that you have a solid handle on what constitutes normal conditions for all your devices and services.
Another area which is very important to companies, especially companies operating in the United States, that Nagios can assist with is regulatory compliance. The authors outline how a company could use Nagios to assist with compliance with Sarbanes-Oxley (SOX) with COBIT or COSO, Payment Card Industry (PCI) Data Security Standard (DSS), Director of Central Intelligence Directive (DCID) 6/3 and Department of Defence (DoD) Information Assurance Certification and Accreditation Process (DIACAP). Nagios alone isn't enough to be compliant, at the very least detailed documentation will also be required, but the authors give a good overview of how Nagios can assist with compliance in all of these regulations.
The final chapter helps to bring the rest of the book together by walking through a full Nagios configuration for a fictional Fortune 500 corporation. The bulk of this chapter covers the pre-deployment stage of a Nagios deployment, but that doesn't mean that there isn't a lot to learn about deploying Nagios. A major hurdle towards deploying Nagios in an organization is the pre-deployment phase, and the authors outline here how to easily turn this major challenge into a series of simple steps to increase the chances of Nagios' success in your organization. From the very beginning, you can see how involving the customer early and starting small, along with everything else, becomes a part of a process. Although it's specific to Nagios, the process followed here could be easily adapted to integrating any sort of monitoring service. The remainder of the chapter is devoted to how you might integrate Nagios into a Fortune 500 company, finishing the book off with some good advice for integrating Nagios.
Despite all the book's strengths, there is some room for improvement. In chapter 2, it may have been more effective to outline the relationships between the Nagios configuration objects before discussing configuration planning. I found it much easier to think of a configuration for a large organization after knowing about how Nagios' configuration objects relate to each other.
Throughout the book, the authors have included configuration file snippets, scripts, and example script output in the main text. While all of these are quite useful and serve to enhance the book, I think it would have been better if these were all included in an appendix instead, perhaps keeping only the relevant parts of configuration snippets in the main text for clarification.
At the end of chapter 3, the sections on the future of Nagios and the CGI front end are informational and interesting, but they would be better placed in a separate chapter dealing with the potential future of Nagios in general. These and the other major areas of Nagios combined would provide more than enough material for a full chapter on their collective futures.
Overall, this is a great book for anyone using Nagios as more than a casual user, and is still very informative for the casual user. A few of these chapters alone would be worth the price of the whole book.
Disclaimer: I worked with one author when I was asked to review this book.
You can purchase Nagios 3: Enterprise Network Monitoring from amazon.com. Slashdot welcomes readers' book reviews -- to see your own review here, read the book review guidelines, then visit the submission page.
Chapters 2 and 3 deal with scaling Nagios to work efficiently within large deployments. First, designing a Nagios configuration for large organizations is shown. This is something that all Nagios administrators should make use of when designing configurations, not only administrators in large organizations, because a properly done configuration for a small organization will easily scale up as the organization grows. I was impressed to see that the authors stress the importance of the end user's input when designing configurations. Administrators who ignore this piece of advice risk the success of Nagios in their organization. Various diagrams help to explain the relationships between the various Nagios configuration objects. A good amount of detail is provided regarding allowing various groups within an organization to have semi-independent control over how Nagios interacts with their hosts and services, and how Nagios alerts their staff. The authors have included numerous configuration file snippets, which allows a Nagios administrator to very quickly create a configuration file and then tweak the configuration parameters to suit local requirements.
Scaling the Nagios graphical user interface (GUI) follows a very simple concept: use a "less is more" approach. Although the specific details here deal with Nagios, the general idea is equally applicable to anyone displaying information they expect their users to actually pay attention to. In general, users should be able to see as much as they want (limited by resources and permissions) but only be shown what they need to know about by default. For example, the system administrator for marketing probably does not need to know when the development disk image server goes down, while the development system administrator would probably be very interested. Utilizing user accounts allows the administrator to allow various groups to have access to Nagios filtered by its fine-grained permissions system. Users from various groups can also be shown only what they need to be shown by default, without the need to select a particular area first. Utilizing user accounts also prevents users who need to view Nagios from having full administrative control, and allows for records of each user's actions to be made. Using a patch provided with the book's download package will enable Nagios to have read-only accounts as well, which is great for organizations who would like to grant certain users (or groups) access to view Nagios but not make any changes. As an example, an organization's help desk could use Nagios to determine quickly whether users are unable to access services because of an outage, or if further troubleshooting is necessary.
The authors continue on here to discuss clustering, failover, and the future of the Nagios GUI. I'm not convinced that these belong in a chapter devoted to scaling the Nagios GUI, since these seem to mostly deal with scaling the entire Nagios deployment. Regardless, they are all very important topics, especially when Nagios is heavily relied upon. Clustering allows remote sites to have a Nagios instance local to the site monitoring hosts and devices rather than requiring a central Nagios instance to monitor remote hosts and services. Not only would monitoring hosts and services take much longer due to the WAN links between the central instance and remote locations, but also due to the security implications of allowing the checks to be done. The authors don't discuss the security side of clustering, but it's still something that every Nagios administrator (and everyone else!) should keep in mind. The clustering section deals primarily with the rationale behind clustering and how to configure the local and remote instances of Nagios properly, but the authors include a good deal of information here that a less experienced Nagios administrator might overlook. Most notable is their discussion about the display of service status when a service is reachable from the master server but not from a remote instance. While Nagios can translate the remote instance's check result to be displayed from its own perspective, it may be more desirable to have the master Nagios GUI display the results from the perspective of the server which made the check. After implementing clustering, some sort of fallback mechanism is required. Failover and redundancy are the two main choices, and that's what the authors discuss next. They don't spend much time on redundancy, since this would require each redundant Nagios instance to perform its own set of checks, which can significantly raise the load on both the monitored hosts and the network in general. Given the problems it can introduce, the authors have spent more time on redundancy than most administrators should spend considering. Failover is a much better solution, and the authors do a great job of covering the setup of a proper failover setup. As usual, they make sure to remind readers of some things that are easily overlooked, especially when you're trying to get Nagios back up and running when the master server crashes.
Chapters 4 and 5 discuss Nagios plug-ins, add-ons, and enhancements. These chapters alone are worth the price of the book because of how much time they can save. It's much faster to copy a script and make minor tweaks than it is to try reinventing the wheel, and with the number of scenarios covered here combined with the Nagios user community there aren't very many things that haven't been done already. Whether you want to test command-line interfaces, CPU usage, memory utilization, bandwidth utilization, HTML pages, LDAP services, or even specialized hardware, there's probably already a plug-in written for it. Most common scenarios actually have a plug-in already included in this book. The available add-ons and plug-ins are equally varied, providing ways to monitor hosts across security zones, configure read-only displays that live in a security zone other than the one Nagios is in, interface with Cacti, and even read out alerts. Even more scenarios can be handled by other scripts provided by the Nagios community.
Chapter 6 goes into detail on how to integrate Nagios into an enterprise environment. This chapter goes into just enough detail to get Nagios configured to work with a large number of third-party services, such as LDAP authentication, Cacti, Puppet, and Splunk. Emphasis here is always placed on the human element; how to use Nagios to help help desk and/or NOC staff do their jobs more efficiently and effectively, and how to gain maximum support for Nagios within the organization. The importance of the human element, in all its forms, simply cannot be overstated, and the the authors have done a wonderful job of outlining a good way to make Nagios an integral part of an organization. A lot of the material towards the end of the chapter, especially the section on smaller Network Operation Centres, could be used by anyone looking for ways to help a small group work together effectively.
Chapter 7 is another chapter with a lot of content easily applicable outside of a Nagios environment. The chapter begins with the authors reminding you to know your network and to watch out for session hijack attacks, then show you how to use Nagios to do both. Nagios can't replace a competent network administrator, but it can make their lives easier and the authors show you here how the configuration you've already done on Nagios already shows you a potential session hijack attack and how it forces you to properly know your network. Nagios forces you to know your network not only by how it's built and by what devices are in use, but it also requires that you have a solid handle on what constitutes normal conditions for all your devices and services.
Another area which is very important to companies, especially companies operating in the United States, that Nagios can assist with is regulatory compliance. The authors outline how a company could use Nagios to assist with compliance with Sarbanes-Oxley (SOX) with COBIT or COSO, Payment Card Industry (PCI) Data Security Standard (DSS), Director of Central Intelligence Directive (DCID) 6/3 and Department of Defence (DoD) Information Assurance Certification and Accreditation Process (DIACAP). Nagios alone isn't enough to be compliant, at the very least detailed documentation will also be required, but the authors give a good overview of how Nagios can assist with compliance in all of these regulations.
The final chapter helps to bring the rest of the book together by walking through a full Nagios configuration for a fictional Fortune 500 corporation. The bulk of this chapter covers the pre-deployment stage of a Nagios deployment, but that doesn't mean that there isn't a lot to learn about deploying Nagios. A major hurdle towards deploying Nagios in an organization is the pre-deployment phase, and the authors outline here how to easily turn this major challenge into a series of simple steps to increase the chances of Nagios' success in your organization. From the very beginning, you can see how involving the customer early and starting small, along with everything else, becomes a part of a process. Although it's specific to Nagios, the process followed here could be easily adapted to integrating any sort of monitoring service. The remainder of the chapter is devoted to how you might integrate Nagios into a Fortune 500 company, finishing the book off with some good advice for integrating Nagios.
Despite all the book's strengths, there is some room for improvement. In chapter 2, it may have been more effective to outline the relationships between the Nagios configuration objects before discussing configuration planning. I found it much easier to think of a configuration for a large organization after knowing about how Nagios' configuration objects relate to each other.
Throughout the book, the authors have included configuration file snippets, scripts, and example script output in the main text. While all of these are quite useful and serve to enhance the book, I think it would have been better if these were all included in an appendix instead, perhaps keeping only the relevant parts of configuration snippets in the main text for clarification.
At the end of chapter 3, the sections on the future of Nagios and the CGI front end are informational and interesting, but they would be better placed in a separate chapter dealing with the potential future of Nagios in general. These and the other major areas of Nagios combined would provide more than enough material for a full chapter on their collective futures.
Overall, this is a great book for anyone using Nagios as more than a casual user, and is still very informative for the casual user. A few of these chapters alone would be worth the price of the whole book.
Disclaimer: I worked with one author when I was asked to review this book.
You can purchase Nagios 3: Enterprise Network Monitoring from amazon.com. Slashdot welcomes readers' book reviews -- to see your own review here, read the book review guidelines, then visit the submission page.
A review, by an associate of the author, of an obscure product, with a picture of the book plastered on the front page of Slashdot. Who was paid off for that?
Nagios is great but even version 3 is by no means easy to configure. Like all too many F/OSS projects, the documentation is lacking or even incorrect in spots, and supplied examples barely scratch the surface of what the application can do.
I've been running it and it's great - I have it monitoring a bunch of servers (email, hosting, backup, file, etc.) with custom scripts and it works great -- once it's configured.
The Christian Right is Neither (Christian nor right). See: Matthew 23, Matthew 25, Ezekiel 16:48-50
Is it extensible? Is it easy to use? I didn't get it the first time, better repeat it a few more times...
My personal experience is that Nagios is probably the LEAST easy to use of any piece of software, period. I hope they changed it in a major way, because last time I tried to use it I was forced to dig through configuration files and learn syntax just to get the thing to see if some server was responding to pings.
A slashdotter who didn't build his own computer is like a Jedi who didn't build his own lightsaber.
I've used Nagios, but found Cacti and haven't turned back. Any other Cacti users out there? I found Cacti to be much easier to setup than Nagios and fairly extensible for the advance user.
Some of us still have the iBall legacy reader for the .DTF (Dead Tree Format) file type!
I gave up using Nagios when I found that there was no way to send an email to management linking to a preprepared (by month) Nagios SLA report.
What's the point of having a great monitoring tool if you can't share the reports in an useful and practical manner.
So I switched to Zabbix. Not happy with that either.
Try Pandora FMS. It does the same as Nagios, is open source but only requires a minimum knowledge of shell scripting to get it working and can monitor everything you can think of inside (using an agent) or outside a host. I monitor about 100 hosts with it and have about 1200 data points every 5-10 minutes (temperatures, network packets, processes etc.) but it scales much larger (using MySQL as backend) even on simple hardware.
Custom electronics and digital signage for your business: www.evcircuits.com
I actually managed to get a girlfriend. And she is a real hottie. I don't want to sound paranoid, but I need to monitor her to make sure she is not cheating, and keeping her AV up to date, you know what I mean.
So when I read this, "no device or situation has yet been found that cannot be monitored using Nagios and a pre-made or custom script, plug-in or enhancement", I thought this would be perfect.
I think a Nagios plug in would be best, preferably something with sharp blades. So how do I install this in my gf, and have Nagios monitor it?
Long live Hyperic. Free and extensible without the nonsense to set up that is Nagios. You can use Nagios plugins for it if you so wish.
1st Paragraph: Paraphrase of Foreword.
2nd Paragraph: What the initial chapter(s) is (are) about.
3rd Paragraph: What the next chapter is about.
4th Paragraph: What the chapter after that is about.
5th Paragraph: What the last chapter(s) is(are) about.
6th Paragraph: Pithy criticisms for balance.
7th Paragraph: Conclusion with the required, "This book is useful if you are like me" statement, as in, "Overall, this is a great book for anyone using Nagios as more than a casual user, and is still very informative for the casual user."
When I was a kid, we only had one Darth.
This book is not a big leap over the supplied Nagios documentation. I bought it out of guilt, but I doubt I have gotten my moneys worth. This is not so much a criticism of the book as praise for the supplied documentation (which is rather decent, given the topic).
Getting Nagios (or OpenView or whatever management system you have) working is a big job which will not be solved w/a $40 book and a afternoon.
For all of you who complain about Nagios being complicated, I hope you never see OpenView (et al).
If you haven't seen Nagios, there is a daemon which performs the collection. The UI is browser based (Apache HTTPD CGI applications). Sometimes there are agents on remote machines to collect status like process tables, disk utilization, etc.
Nagios is essentially a job scheduler/messaging system. Monitoring is performed by invoking little programs dedicated to collecting information, and these are easy enough to create. There are lots of hooks if you need to extend the system.
Since the UI is owned by HTTPD, so is access control. Who doesn't know how to set up LDAP or a auth file for Apache? Most of the CGI plugins are implemented in C and are not ugly to look at.
The agent issue is a little clouded because there are many agents to choose from. I usually just use the Net-SNMP agent because I have a lengthy SNMP background, but that is just my personal choice.
I will stop here since the article is about a book and not Nagios. I merely wanted to address some of the criticisms of Nagios.
Nagios was ok but the mapping on Zabbix is way better. I have it running in a virtual server on top of ubuntu. works great! Very flexible, can create your own icons, the graphs are easy to setup..
We used what's up gold before. which has a nice interface but when you consider buying a server license, buying what's up for $2,500 for 100 devices, and then dealing with ms sql, Zabbix wins.
... come back to me when nagios v3 is marked stable. :P
If you environment is big enough so you can employ at least 1 person to fully work with Nagios, then it's a great product. But out of the box it needs too much time to become usable. I'm talking of Nagios 2, I have no experience (yet) with Nagios 3.
Love many, trust a few, do harm to none.
I don't know why OpenNMS doesn't get more credit, maybe because it's a Java app, but it's a damned good one.
Get a very basic OpenNMS configuration going, point it at a range of IP addresses, and it will auto-discover most of what's out there. If you've got your SNMP agents up and running properly, it'll automatically start checking the more important OID's for you and graphing them with an RRD back end. Most of the setup can be done through the web interface instead of through vi. You don't have to restart the daemon every time you add a node.
If Nagios drives you a bit batty, check out OpenNMS.
I used nagios for years.. many many years. It has to be, as many have already pointed out.. the most difficult to configure OSS project ever made.
That said, it was fairly powerful once configured properly.
The thing is, though, that is has many shortcomings. I found a much better (although not necessarily as scalable) monitoring and data-gathering solution in Zabbix. They recently released a new version as well that adds many really nice capabilities like ipmi support.
I currently monitor about 250 hosts with Hobbit (http://sourceforge.net/projects/hobbitmon) and have had good success with it. It has trending (RRD graphs) and alerting thresholds (ie 85% full, email, 95% full pagers) built in together. It is also customizable. We have created several perl scripts that check random applications for various things that are also tracked with Hobbit. How much data Netbackup backed up last night. How many users are logged into our portal. Are the tape drives within Netbackup up or down? The list goes on and on. It runs on Unix and Windows, although the Windows client isn't as robust.
There is no such thing as secure systems, only secure admins.
I like Nagios but I can't really imagine how to apply it in large (think ten thousand hosts) setup in multiple regional/organizational branches and so on.
Also Nagios *is* painful to setup. First of all AFAIK there is no way to delegate administration f.e. to organizational branches. Configuration is just a big pile of config files included from some other config files etc. There is no autodiscovery/autoconfiguration of hosts since Nagios team belives it is BAD etc.
Well IMHO Nagios is grat but it is like, a big fat pile of hacked scripts and configs. Not too elegant but working.
Now... I am (well we are in my organization) using Zabbix and I find it great. It is much better organised/elegant than Nagios.
In Zabbix architecture you have well designed atomic elements like checks, items, services (groups), etc. It also gathers fine tuned historical data for trends and historical review. You can compact the data (lower the resolution) after a given time and so on. It is in fact a very complete monitoring framework with its own internal condition language, escalation engine. You can gather data from network checks, SNMP, custom scripts, Zabbix agents (aviable for most platforms) etc.
And it has normal configuration, not crude text config files. I have nothing against text files but sometime I don't really want to open my text editor only to quickly setup an ad-hoc overwiev screen with maps, graphs, status displays, clocks and you can have few screens of such rotating on your big screens in NOC. All with mouse clicking.
I can give it as a tool for sysadmin and he or she can work with it without having to study manuals. Not everybody in your organization is an unix hacker you know...
We have dozens of branch servers which are managed by local sysadmins and a farm of central servers which is managed by central staff.
Zabbix works in distributed manner so a local branch can have very detailed view on their infrastructure and at central level I can have an functional/business overview of entire infrastructure, core services (like business systems, transactions etc.) Not just simple checks if RAID is OK - I don't care if RAID in some server is OK. I need to know why (where, who to blame) given service (be it MQ/WebSphere) is not working as desired.
And also it is free, open source and aviable in most linux/unix distributions as a standard package. So when considering enterprise monitoring platform do yourself a favour and also check Zabbix.
http://www.zabbix.com/downloads/ZABBIX%20Manual%20v1.6.pdf
By the very nature of network administration the associated tools are going to be complex. Nagios offers functionality and control options that are appropriate given the type of operations for which most admins are responsible. I'm looking forward to digging into V.3. Like the man said, don't hate the hammer hate the house!
If you haven't already, take a look at Zenoss. Aside from having a pretty well designed UI (which as I get older I'm beginning to feel deserves more credit in the usefulness dept), supports SNMP by default (I'm not a big fan of clients unless I REALLY need them) *plus* it supports Nagios plugins.
And I'm not trying to steal any thunder here, I think Nagios is a great option.
Quack, quack.
I had put in a comment earlier about zabbix and it seems to be missing.
anyway, I third on zabbix. It works great and is very flexible. I really like the mapping!!
I've been using PandoraFMS 1.3.1 and now pandorafms 2.0 and for me it's the best monitoring tool nowadays.
Nagios, for me has a very poor reporting which is not helpful whatsoever, and the interface is awful.
What I like of pandorafms is the graphs and how easy is to monitor a host from the webconsole, nagios it's horrible to setup.
my 2 cents
no, it really is the biggest pile of cow crap I've ever seen in my entire life.
it sucks ass worse than a hoover. I'm not joking.
get a *real* monitoring system
Right now I wanted to check Nagios documentation for simple thing - configuration file syntax. This is the basic stuff. It is the first thing that should be defined in reference manual. I like to know how the files are processed. How do I do comments. How do I define multiple line commands and so on.
Please point me out that I am blind or stupid since I really cannot find it in manuals here:
http://nagios.sourceforge.net/docs/3_0/toc.html
Also I find the online manual quite retarded/clunky. It doesn't even has a search! I wonder why they havent use somekind of wiki (any serious wiki system has search) or similar.
Blech, nagios is probably the most disgusting hack currently in wide use. It was overdue for a complete rewrite after Nagios2 - but nagios hackers don't seem to have any pain treshold. Nowadays it's not even funny anymore. Nagios has gone *way* over its expiration date. The closest analogy would be a pot of milk that has been sitting in direct sunlight for 6 months straight...
I strongly suggest that anyone looking for a monitoring solution stays away from the dead horse and looks at the modern alternatives first. There are plenty: Munin, Cacti, Zenoss, Pandora, OpenNMS, just to name a few.
Most importantly: Take your time before you decide and evaluate thoroughly. A monitoring solution will stick with you for a long time and migrating to a different software is usually a very painful process. Which, btw, is the main reason why so many sites still ride the dead horse...
For those of you that aren't particularly fond of the complexity of Nagios' configuration, check out GroundWork. It's basically Nagios + a fairly easy-to-use web interface. We've been using it up at my work for over a year and it works great.
I used nagios for years.. many many years. It has to be, as many have already pointed out.. the most difficult to configure OSS project ever made.
R$+@$=W $@$1@$H user@thishost -> user@hub
R$=W!$+ $@$2@$H thishost!user -> user@hub
R@$=W:$+ $@@$H:$2 @thishost:something
R$+%$=W $@$>3$1@$2 user%thishost
Sendmail...
Nagios is easy, but it only makes sense if you have dozens or hundreds of systems, for less, get something simpler, and it will only work if you understand how to group your hosts, services etc.
Deleted
You are comparing apples with oranges, nagios is for service monitoring, cacti is for diagrams.
If you're polling the devices anyway, why can't you feed the data into a database and draw charts? Why do you need two systems? It all comes from the same source.
So Nagios is hard to set up? Probably, you can't go from zero to running in 5 minutes. It's a steep learning curve, but if the initial investment of a book (I used building a monitoring environment with nagios) and a few hours, you shouldn't be monitoring things. You won't do it correctly, you may as well throw some cron jobs together.
The first step in monitoring is working out what you want to monitor. The second step is working out what you really want to monitor. The third step is working out how you want to display problems. When you have 60 people in support working on a 6 shift 24/7 pattern, you can't expect emails to be any use. "Service problems" in nagios is fine, but there's a lot of issues that 2nd line don't need to know about -- solaris security patches on an intranet for example, can wait until the 9-5 admins get in.
Nagios is painfully easy to administer, if you set it up right. Once you know what you're doing (or even know enough to be dangerous, like myself), you can deploy a new nagios installation in about 20 minutes, add a new device that follows existing rules (new web server for example) in under 5 minutes, and a new device with new plugins in half an hour.
Nagios then grows organically. When something strange and new breaks we cobble a plugin together,
Configuration is in plain text files, one for each device on the network. I have these as an subversion working copy, which gives me the ability to track changes and easily roll back any configuration problems.
We have dozens of weird bespoke plugins, one uses WWW:Mechanize and Perl to run through a workflow on a specifc webpage, another looks at the rate of change of growth of a jboss logfile, and the proportion of stack traces, one logs into a remote machine and checks jumbo pings are working through the network.
We find nagios essential to monitor the service we provide. I don't particularly care if the server an oracle database runs on is pingable, I care if I can log in and run "select 1 from dual" (or usually something more application specific).
The small system we monitor is made up of about 800 services over 190 devices.
The number of people complaining in this topic about the ease of use of nagios shows that nagios is lacking. Trying to figure out nagios is a waste of time when there are so many alternatives out there that are much easier to use.
i.e. cacti(mostly for graphing, but can be used for alerts using plugins), pandora FMS, groundworks, sitescope, the Dude.
I have never once personally had any dealings with a properly implemented Nagios system. Every single time it was obviously tossed up by someone who had minimal knowledge of how to properly monitor the infrastructure.
The biggest complaint I hear is "too many alerts". So set your dependencies properly! You say you did that but you still get 600 alerts when the router dies? That's because you told it you wanted the alerts. See that "u" in "notification_options". That means "unreachable". You want to be alerted when the box can't be reached. You probably wanted "d,r", not "d,r.u".
The next complaint. It's so much work to add a system. Huh? It takes me about 30 seconds to add another system and all the tests I need. The trick is using host groups to automatically assign tests to a system. For example, using a generic LAMP type server. What can we assume about this? It's running Linux, Apache, MySQL, and Perl or PHP. That's a bunch of tests right off. In my world, SNMP is assumed on all systems (because I made it that way, that's why). So we define a bunch of service checks using SNMP, but instead of using "host_name some_hostname", we use "hostgroup_name lamp-servers". Now when I add a new server, I add "hostgroups lamp-servers" to the definition and like magic it gets all the tests I need: snmp port responding, ssh access, apache daemon running, mysql daemon running, web page accessible, disk space good (defined in snmpd.conf), CPU usage, load average, plus sone automatic dependencies: all snmp tests depend on the snmp port responding. Web pages are dependent on the apache daemon running, etc. I even have some simple graphing included automatically. Even the O/S icons are defined by the hostgroups. Each distro has its own hostgroup which takes care of that detail (e.g. centos-system and ubuntu-system).
Ten simple lines to define a new hosts can result in 20 service checks. I rarely need to define a new service check. And when a router goes out? One alert for the router.
Not every system is going to be generic like this, but any time I have more than one system require a specific service check, I create a hostgroup to handle it.
-- Will program for bandwidth
I always found it an absolute pig to configure.
I tried Nagios and getting it installed, set up and running seemed to be an exercise in futility. The purpose of an NMS is supposed to accomplish saving you some work, not creating more work to set it up and babysit it.
I then tried OpenNMS and never looked back. Installation and configs were a piece of cake. I installed it to a fresh install of Ubuntu Hardy Heron desktop edition and was up and running in about two hours. Everything just simply worked right the first time.
I've been using OpenNMS to monitor a 1000 user voip network with tons of cisco switches, routers and call manager servers scattered across 8 buildings for about four months now and it's working like a dream.
When I was evaluating Network Monitoring packages I installed most of the major packages in VM's to test drive all of their features. Out of all of the packages I installed Nagios was the most unrealistic and unscalable package of them all. Their approach of editing a single configuration file makes it impossible to use on a "real" network with many thousand hosts. Once you get above 10 servers it gets really cumbersome to manage because of the syntax of their config file. If you have a few crappy windows systems and linux hosts to monitor Nagios is tolerable. For a real enterprise monitoring tool its a joke.
Get used to seeing the message
** PROBLEM alert - nagios is down **
I have configured nagios for mid size organization for monitoring servers such as Linux, M$ plus over 1200 services and also for monitoring cisco routers and switches. nagios is not difficult to install/configure however while monitoring 1000+ can't be done in single day. most of the time one have to copy/paste the configuration for identical hosts. Once installed and configured properly nagios is a great NMS for sysadmins in term of notifications of server or service failure. Beauty of nagios is that you can integrated it to varity of services such jabber, email, sms, alarms. nagios will notify service/server failure by email in my case i have configured it to integrate it with our sms gateway (running gnokii) to notify not only just giving alerts on web interface but also by sending emails to conern ppl and also SMS. I found it a great asset for any IT dept and no organziation these days lives without a NMS and nagios is great NMS.
http://askaralikhan.blogspot.com/
Hobbit is much easier to configure and use
I migrated from Nagios to OpenNMS/Hyperic and never looked back. Between the Hyperic agent and its granular statistics, and OpenNMS's alerting, mapping, and autodiscovery, I would never use Nagios in an enterprise environment again.
I prefer autodiscovery. When a new device comes up, I don't have to do anything. It magically is part of the correct groups, is graphing, and alerts are done without me lifing a finger. Much better than needing to connect to the box, editing the config files, and reloading the config.
I noticed that the cover says "free e-book download" in the upper right. this is slightly off-topic (of Nagios), and somewhat on-topic (Syngress). if you buy these books with the understanding that you will get a free PDF copy to have around, this is no longer true since they were bought by elsevier. The old web site where you could register the books and get free downloads is gone and there is a new service on the elsevier web site, but no way to register your recently purchased books. there is a faq that tells you how to "register" your book, but this requires a "redemption code" that is not in the book. i've written to them to get a redemption code and they haven't responded. buyer beware.
"Due to its extensibility and ease of use, no device or situation has yet been found that cannot be monitored using Nagios and a pre-made or custom script, plug-in or enhancement."
It's so true! We put an air-actuated sensor on his pants, and now we get e-mail and SMS whenever grandpa farts. Thank you Nagios!
+++OK ATH
Hi,
Our publisher really dropped the ball on this and all the authors involved with the book (including me) are very sorry about the problems purchasers of the book have been having getting access to the code etc in the book.
The author team created this site
http://www.nagios3book.com/
because we saw early on the problems Syngress was having getting it's act together; on the site you will also find the names and email addresses of the book's management team, they have all promised to do their best to help anyone out who has purchased the book and has complaints about the electronic content that was supposed to accompany the book.