Nagios System and Network Monitoring

Posted by ryuzaki0 on Wednesday April 11, 2007 @07:22AM from the keep-an-eye-on-things dept.

David Martinjak writes "Nagios is an open source application for monitoring hosts, services, and conditions over a network. Availability of daemons and services can be tested, and specific statistics can be checked by Nagios to provide system and network administrators with vital information to help sustain uptime and prevent outages. Nagios: System and Network Monitoring is for everyone who has a network to run." Read on for the rest of the review.

Nagios: System and Network Monitoring
author: Wolfgang Barth
pages: 464
publisher: No Starch Press
rating: 9
reviewer: David Martinjak
ISBN: 1593270704
summary: Covers installing, configuring, and deploying Nagios to monitor systems and services on a network.

The book is authored by Wolfgang Barth and published by No Starch Press. The publisher hosts a Web page which contains an online copy of the table of contents, portions of reviews, links to purchase the electronic and print versions of the book, and a sample chapter ("Chapter 7: Testing Local Resources") in PDF format.

An amusing note to begin: this is one of the few books I have read where the introduction was actually worth reading closely. Many books seem to talk about background or history of the subject without providing much pertinent information, if any at all. In Nagios: System and Network Monitoring, Wolfgang Barth begins with a hypothetical anecdote to illustrate the usefulness of Nagios. The most important section in the introduction, however, is the explanation of states in Nagios. While monitoring a resource, Nagios will return one of four states. OK indicates nominal status, WARNING shows a potentially problematic circumstance, CRITICAL signifies an emergency situation, and UNKNOWN usually means there is an operating error with Nagios or the corresponding plugin. The definitions for each of these states are determined by the person or team who administers Nagios so that relevant thresholds can be set for the WARNING and CRITICAL status levels.
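To make the four states concrete, here is a minimal Python sketch of how a check might map a measurement onto them. Plugins report their state to Nagios through the exit code (0, 1, 2, 3 in the order listed above); the `classify` helper, its "higher is worse" assumption, and the 80/90 thresholds are illustrative choices of mine, not something taken from the book.

```python
from enum import IntEnum


class State(IntEnum):
    """Plugin exit codes that Nagios maps to the four states."""
    OK = 0
    WARNING = 1
    CRITICAL = 2
    UNKNOWN = 3


def classify(value, warn, crit):
    """Map a measurement to a state; higher values are assumed to be worse."""
    if value is None:
        return State.UNKNOWN  # the check itself could not produce a reading
    if value >= crit:
        return State.CRITICAL
    if value >= warn:
        return State.WARNING
    return State.OK


if __name__ == "__main__":
    # Hypothetical thresholds: warn at 80% used, critical at 90% used.
    for reading in (42.0, 85.0, 97.5, None):
        state = classify(reading, warn=80.0, crit=90.0)
        print(f"{reading!r}: {state.name} (exit code {int(state)})")
```

Checking the critical threshold first matters: a reading that crosses both thresholds should be reported as CRITICAL, not WARNING.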

The first chapter walks the reader through installing Nagios to the filesystem. All steps are shown, which proves to be very helpful if you are unfamiliar with unpacking archives or compiling from source. Users who are either new to Linux, or cannot install Nagios through a package manager, will appreciate the verbosity offered here. Fortunately, the level of detail is consistent throughout the book.

Chapter 2 explains the configuration structure of Nagios to the reader. This chapter may contain the most important material in the book as understanding the layout of Nagios is essential to a successful deployment in any environment. The book moves right into enumerating the uses and purposes of the config files, objects, groupings, and templates. All of this information is valuable and presented in a descriptive manner to help the reader set up a properly configured installation of Nagios. My biggest stumbling block in using Nagios was wrapping my brain around the relationships of the config files and objects. This chapter clears up all of the ambiguities I remember having to work out for myself. If only this book had been around a few years ago!
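The relationship between objects and templates is easier to see in miniature. The sketch below models it in Python rather than in Nagios's own config syntax: each object is a dict of directives, and `use` names a template to inherit from, with local values overriding inherited ones. The object names and values are made up for illustration, and real Nagios inheritance has more rules than this simplified single-parent model.

```python
def resolve(name, objects, _seen=()):
    """Return the effective directives for `name`, following `use` links."""
    if name in _seen:
        raise ValueError(f"template loop involving {name!r}")
    definition = objects[name]
    merged = {}
    parent = definition.get("use")
    if parent is not None:
        merged.update(resolve(parent, objects, _seen + (name,)))
    # Local directives override anything inherited from the template.
    merged.update({k: v for k, v in definition.items() if k != "use"})
    return merged


# Hypothetical definitions: a base template, a role template, and one host.
OBJECTS = {
    "generic-host": {
        "max_check_attempts": 5,
        "notification_interval": 60,
        "check_period": "24x7",
    },
    "ldap-server": {"use": "generic-host", "notification_interval": 30},
    "ldap01": {"use": "ldap-server", "address": "192.0.2.10"},
}

if __name__ == "__main__":
    for directive, value in sorted(resolve("ldap01", OBJECTS).items()):
        print(f"{directive} = {value}")
```

Here `ldap01` ends up with the address it defines itself, the 30-minute notification interval from `ldap-server`, and everything else from `generic-host`.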

The sixth chapter dives into the details of plugins that are available for monitoring network services. This chapter explains using the check_icmp plugin to ping both a host and a specific service for verifying reachability. Additional examples include monitoring mail servers, LDAP, web servers, and DNS among others. There is even a section for testing TCP and UDP ports.
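For a sense of what a TCP port test boils down to, here is a small Python sketch: try to open a connection and translate the outcome into a plugin-style exit code and message. This is not the code of any stock Nagios plugin; the `check_tcp` name, the message wording, and the 3-second default timeout are assumptions for illustration.

```python
import socket

OK, CRITICAL = 0, 2  # plugin exit codes for the two outcomes used here


def check_tcp(host, port, timeout=3.0):
    """Return (exit_code, message) for a single TCP connection attempt."""
    try:
        # create_connection resolves the name and performs the handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return OK, f"TCP OK - {host}:{port} accepts connections"
    except OSError as exc:
        # Covers refused connections, timeouts, and resolution failures.
        return CRITICAL, f"TCP CRITICAL - {host}:{port}: {exc}"
```

A wrapper script would print the message and exit with the code, for example `code, msg = check_tcp("mail.example.com", 25)` followed by `print(msg)` and `sys.exit(code)`.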

Next, the book covers checking the status of local resources on systems. At work, we have a system in production that could have been partitioned better. Unfortunately, /var is a bit smaller than it should be, and tends to fill up relatively frequently. Thankfully, Nagios can trigger a warning when there is a low amount of free space left on the partition. From there, we have Nagios execute a script that cleans out certain items in /var so we don't have to bother with it. We can also receive notification if the situation does not improve, and requires further attention. In addition to monitoring hard drive usage, the book includes examples for checking swap utilization, system load, number of logged-in users, and even Nagios itself.
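The /var scenario can be sketched as a free-space check in Python. This is an illustration of the idea, not the plugin the book describes: the function names and the 20%/10% free-space thresholds are hypothetical, and the cleanup script mentioned above is left out entirely.

```python
import shutil

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def state_for_free(pct_free, warn_free=20.0, crit_free=10.0):
    """Less free space is worse, so compare against the critical level first."""
    if pct_free <= crit_free:
        return CRITICAL
    if pct_free <= warn_free:
        return WARNING
    return OK


def check_disk(path, warn_free=20.0, crit_free=10.0):
    """Return (exit_code, message) describing free space on `path`."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        return UNKNOWN, f"DISK UNKNOWN - {path}: {exc}"
    if usage.total == 0:
        return UNKNOWN, f"DISK UNKNOWN - {path}: reported size is zero"
    pct_free = 100.0 * usage.free / usage.total
    state = state_for_free(pct_free, warn_free, crit_free)
    label = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}[state]
    return state, f"DISK {label} - {pct_free:.1f}% free on {path}"


if __name__ == "__main__":
    code, message = check_disk("/var")
    print(message)
    raise SystemExit(code)
```

Returning UNKNOWN when the path cannot be read keeps a broken check from being mistaken for a healthy partition.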

Chapter 12 discusses the notification system in Nagios. You provide who, what, when, where, and how in the configs, and Nagios does the rest. The book does a fantastic job of explaining what exactly triggers a notification, and how to efficiently configure Nagios to ensure the proper parties are being informed of relevant issues at reasonable intervals. For example, the server team might be interested to know that /var is 90% full on one of the LDAP servers; however, they don't need to be notified of this every thirty seconds. This chapter also covers an important aspect of Nagios known as flapping. Flapping occurs when a monitored resource quickly alternates between states. Nagios can be configured for a certain tolerance against rapid alternating changes in states. This means Nagios won't sound the alarm if the problem will resolve itself in a short period of time. Usually flapping is caused by an external factor temporarily influencing the results of the test from Nagios, and therefore has no long-term impact.
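Flap detection can be pictured as measuring how often recent check results changed state and comparing that against two thresholds. The Python sketch below is deliberately simplified: it counts every state change equally, whereas Nagios itself gives recent changes more weight, and the 25%/50% thresholds are placeholder values rather than defaults from the book.

```python
def percent_state_change(history):
    """Percentage of adjacent check results that differ (unweighted)."""
    if len(history) < 2:
        return 0.0
    changes = sum(1 for prev, cur in zip(history, history[1:]) if prev != cur)
    return 100.0 * changes / (len(history) - 1)


def update_flapping(is_flapping, history, low=25.0, high=50.0):
    """Apply hysteresis: start flapping above `high`, stop below `low`."""
    pct = percent_state_change(history)
    if not is_flapping and pct > high:
        return True
    if is_flapping and pct < low:
        return False
    return is_flapping  # between the thresholds, keep the previous verdict


if __name__ == "__main__":
    noisy = ["OK", "CRITICAL"] * 10 + ["OK"]  # changes on every check
    quiet = ["OK"] * 21                       # never changes
    print(percent_state_change(noisy), update_flapping(False, noisy))
    print(percent_state_change(quiet), update_flapping(True, quiet))
```

Using two thresholds instead of one prevents the flapping verdict itself from flapping when the percentage hovers near a single cutoff.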

The last major chapter to mention here deals with essentially anything and everything about the Nagios Web interface. The main point of interaction between the administrator and Nagios is the fully featured Web interface. This chapter covers recognizing and working on problems, planning downtimes, making configuration changes, and more. I especially like that the book gives an overview of each of the individual CGI programs that make up the Web interface, as these files are important for UI customization.

The only aspect of this book that I did not care for was that the book reads like a reference manual at times. The first several chapters start out more conversational in tone with great explanations of the procedures and files, but later it sometimes feels like I am repeatedly reading an iterated piece-by-piece structure, filled in with the content for that chapter. That is not necessarily bad altogether, as it does provide consistency in the presentation of the information. Additionally, the level of detail is outstanding throughout the book. The explanations are never too short or too long. This is definitely a valuable book for administrators at all levels with fantastic breadth and depth of material. Administrators who are interested in proactive management of their systems and networks should be pleased with Nagios: System and Network Monitoring.

Nagios is licensed under the GNU General Public License Version 2, and can be downloaded from http://nagios.org.

David Martinjak is a programmer, GNU/Linux addict, and the director of 2600 in Cincinnati, Ohio. He can be reached at david.martinjak@gmail.com.

You can purchase Nagios: System and Network Monitoring from amazon.com. Slashdot welcomes readers' book reviews -- to see your own review here, read the book review guidelines, then visit the submission page.

116 comments

Re:Good book (this is killing me) by Critical+Facilities · 2007-04-11 07:32 · Score: 5, Interesting

Ok, man, I swear this isn't a troll, but I have to know, what the heck, are you doing to these books?

I mean, it's none of my business, but do you have some insane reading technique?
Re:Good book (this is killing me) by Anonymous Coward · 2007-04-11 07:39 · Score: 0

maybe he is reading them in his oil filled server room?
Re:Good book (this is killing me) by wframe9109 · 2007-04-11 07:44 · Score: 1

Indeed.

It's typically not wise to fold books over backwards when reading. The unpleasant sound the book makes when you do this the first few times might have been a good indication that you should stop ;)
Re:Good book (this is killing me) by scrollios · 2007-04-11 07:45 · Score: 0

lol.. Now i wanna know too.

--
Doot!
Missing Module by Silver+Sloth · 2007-04-11 07:50 · Score: 1, Funny

This chapter also covers an important aspect of Nagios known as flapping. Flapping occurs when a monitored resource quickly alternates between states. Nagios can be configured for a certain tolerance against rapid alternating changes in states.

But I can't find out how to set the alarm when the boss flaps.

--
init 11 - for when you need that edge.
1. Re:Missing Module by Anonymous Coward · 2007-04-11 09:26 · Score: 0, Flamebait
  
  It's all very well joking about it, but would you really trust an open-sores solution in a mission critical environment where real money is at stake (for example a major stock exchange or trading floor)?
  
  As a senior vice president in charge of operations at a major financial institution I need to know my monitoring solution has been proven in the field under stress. Your mom's basement doesn't count.
  
  Fortunately there are plenty of rock solid alternatives out there from the likes of Tivoli, BMC, Managed Objects, Integrasolv etc.
2. Re:Missing Module by Anarke_Incarnate · 2007-04-11 09:33 · Score: 1
  
  You would want to make sure you went with something else, completely avoiding openSSH, apache, GNU/Linux, BSD, etc. Nope.....Windows for you, Sir
3. Re:Missing Module by Anonymous Coward · 2007-04-11 10:56 · Score: 0
  
  Well, Solaris would be my personal choice, but running the London stock exchange for 6 years without any outage is still quite impressive.
4. Re:Missing Module by itwerx · 2007-04-11 14:31 · Score: 1
  
  would you really trust an open-sores solution in a mission critical environment where real money is at stake
  
  Er, you really don't know anything about Nagios do you?
  Try doing a little Googling on Nagios and then see if you can make the FUD a little more subtle next time.
5. Re:Missing Module by Anonymous Coward · 2007-04-11 20:42 · Score: 0
  
  Ermmm. Yes. I would.
  
  Oh, and I do.
  
  4 datacenters in fact.
6. Re:Missing Module by catman · 2007-04-12 01:17 · Score: 1
  Two points:
  
  If you need Tivoli, you know where to get it, if not, Nagios may be what you need.
  
  Lose the attitude, son.
  
  (I do not normally say "son" to an AC, but the childishness of the post plus my belief that I am older made me do it :-) )
7. Re:Missing Module by hearnz · 2007-04-12 01:32 · Score: 1
  
  I would absolutely trust it in a mission critical environment.
  
  I've worked in large organisations that have ditched both BMC Patrol and HP OpenView in favour of Nagios, monitoring tens of thousands of services (many of them business critical) on thousands of servers, over geographically dispersed WANs. It was extremely reliable, easier and faster to set up, more flexible, much easier to implement custom service check plugins, required fewer hardware resources to run, and gave a massive dollar saving as well.
  
  If you cared to spend a couple of minutes with Google, you'd find there are even larger organisations using Nagios to monitor environments that make the ones I mentioned seem insignificant.
  
  There *are* certain specialised features of products from BMC, Tivoli, HP and so on which aren't available in something like Nagios - and they can be worth the money if you really need them (e.g. I use Tivoli products to closely monitor our WebSphere applications); but if you are only choosing those (extremely expensive) commercial products out of a fear of open-source software, then you are simply a fool.
  
  I guess this goes to show that being a "senior vice president in charge of operations" doesn't mean you have a clue what you are talking about...
8. Re:Missing Module by JWSmythe · 2007-04-12 05:14 · Score: 1
  
  I once worked for someone who insisted that you get what you pay for. Stable solutions must be paid for. Anything else wouldn't work.
  
  Our expensive monitoring system was nice and all, but I wrote my own just for fun on my own personal time. I wrote it to do exactly what we needed, and not too much else.
  
  We needed specific services monitored. They must be monitored once per minute. We needed audible notification immediately upon a problem (festival to do tts). At 5 minutes, and every 30 minutes after that, pages were sent out.
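The paging schedule described here (first page at 5 minutes, then every 30 minutes) is simple enough to sketch in Python. This is my own reconstruction of the rule as stated, not the poster's code; `should_page` and its parameters are hypothetical names, and it assumes the monitor calls it once per minute with the outage duration in whole minutes.

```python
def should_page(minutes_down, first_page=5, repeat_every=30):
    """True on the minutes when a page goes out for an ongoing outage."""
    if minutes_down < first_page:
        return False  # still inside the grace period; audible alert only
    return (minutes_down - first_page) % repeat_every == 0


if __name__ == "__main__":
    # Minutes on which a page would be sent during a 70-minute outage.
    print([m for m in range(70) if should_page(m)])
```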
  
  I configured it to monitor our servers, and left it running on my desktop. Usually notifications coincided with the commercial package, at least within a few seconds of each other. Mine was usually first. A few seconds is ok, as long as we were notified.
  
  One day, my software started "talking" about an error. I then received the page from my software. I checked the problem, and confirmed it existed. An essential service was not functioning correctly. I checked the commercial program, and it indicated that the service was running properly. Of course, I fixed the problem.
  
  How long would we have waited for the problem to be resolved, if we depended on only the commercial software? Until an owner found it wasn't working? Until customers were complaining?
  
  There's usually no harm in using multiple monitors. Sure, go with the COTS program, since you've already paid for it. Add a second (third, or fourth) monitoring solution, just in case.
  
  We eventually dropped the commercial program, as we made improvements to my software. It positively identified problems, without sending false alerts.
  
  So, would I trust an enterprise on open source software? sure. The significant difference between my software and something like Nagios is that there are a whole lot more eyes looking at the Nagios code to fix problems.
  
  --
  Serious? Seriousness is well above my pay grade.
9. Re:Missing Module by Anonymous Coward · 2007-04-12 08:44 · Score: 0
  
  You'd love to believe that I dismissed Nagios without evaluating it, wouldn't you?
  Do you really really think that major financial corporations would spend money on software and services that they could get for free?
  
  Contrary to the groupthink attitudes of the basement-dwelling Linux Zealots that infest this site, "pointy headed bosses" are NOT the norm, and these corporations are making $$$s because their employees are smart enough to differentiate between immature "hobbyist" software and something robust that you could trust with your bottom line.
  
  I have to account for every last cent I spend. If Nagios could do what real enterprise monitoring software could do believe me, I would be using it.
  As it is, it is borderline unusable for all but the most trivial and simple monitoring tasks.
  
  Get back to your basement and carry on with your blinkered 'open sores is best' outlook. Reality will catch up with you in the end...
10. Re:Missing Module by itwerx · 2007-04-13 03:52 · Score: 1
  
  You'd love to believe that I dismissed Nagios without evaluating it, wouldn't you?
  Well, yes, actually, I do believe that. Or if you did evaluate it you didn't get far enough to even understand how it works.
  
  Do you really really think that major financial corporations would spend money on software and services that they could get for free?
  Absolutely! I've been in this industry for twenty years and have consulted with many hundreds of organizations. That's hardly the stupidest move I've ever seen the IT department make...
  
  Contrary to the groupthink attitudes of the basement-dwelling Linux Zealots that infest this site, "pointy headed bosses" are NOT the norm, and these corporations are making $$$s because their employees are smart enough to differentiate between immature "hobbyist" software and something robust that you could trust with your bottom line.
  Er, no, most corporations really don't give a damn about hardware or software. IT is simply a means to an end. You'd be amazed at some of the spit and baling wire solutions that even Fortune 100 companies have in place as parts of their core business. But they frankly could care less as long as it gets the job done. Bosses in general are not pointy-headed, that's true, but they are also generally not IT geeks. They depend on IT people to determine the best solution-set and there are definitely a lot of pointy-headed IT people out there!
  
  I have to account for every last cent I spend.
  Hmm, and you're also the evaluator. Lemme guess, you work for a small company, probably a few hundred employees or less, probably a slightly more complex than usual IT infrastructure that's been cobbled together over the years and is a bit unstable. You don't have the funds to buy something like OpenView but you desperately need something to help you manage the fires. Yep, people like you keep my company in business. :)
  
  If Nagios could do what real enterprise monitoring software could do believe me, I would be using it.
  As it is, it is borderline unusable for all but the most trivial and simple monitoring tasks.
  Need some more detail here. Nagios can't do everything, but neither can any other application. But my guess is, again, you couldn't figure it out in the first place so you never got that far anyway. OSS is not like commercial software, it is very modular. I'm going to say you downloaded the basic installation package without doing any research on plugins or the available GUI front ends for it. So you're sitting there looking at a ton of configuration files going WTF is this?!? And you poked around at a couple of the samples, got one or two of them to work, but didn't get far enough in the documentation to understand the architecture so when you tried something more complex it didn't work.
  Now if you'd been smart and downloaded a few things like that Groundwork Monitor GUI and maybe some plugins relevant to your environment like NRPE-NT (I'm going to be snippy here and assume it's all Windows), and heck, even read a HowTo if you didn't feel like digesting the full documentation then hey presto! It would have worked just fine. (Well, maybe, your posts above don't exactly generate a vote of confidence).
  
  Get back to your basement and carry on with your blinkered 'open sores is best' outlook. Reality will catch up with you in the end...
  Funny you should mention that, my wife and I have been discussing redoing one end of the house and we were trying to decide whether or not to have a basement level...
  Here's the thing, I do understand reality, and I do understand both OSS and commercial software. And about 90% of our clientbase is standardized on Windows and we regularly recommend and implement commercial monitoring systems like HP OpenView (we're an HP reseller so we do tend to be a little biased), but the fact remains that OSS has its place, and for a sufficiently savvy IT staff, or a sufficiently cost-co
Old NetSaint and Nagios geek comments by Anonymous Coward · 2007-04-11 07:50 · Score: 5, Informative

Please forgive my anonymous coward use: my comments would reveal my name too well.

I'm an *OLD* Netsaint and Nagios user, and have contributed to both. Guides are great, playing with it is great, and it does a lot of things very well. But what Nagios has never had is a way to publish the URL's of specific queries or reports in a way that can be bookmarked and sent to someone else for reference. It's a big, big, big flaw in the system, common to a lot of web-based projects.

The other huge, huge flaw of Nagios is configuring it. It shouldn't take a reference book from O'Reilly to do this efficiently, but I'm afraid it does. There are easily a dozen different configuration tools at www.nagiosexchange.org and sourceforge.net, and *every single one of them* has major problems that could be solved with 10% of the time spent on Nagios itself. Most are abandonware, exciting but uncompleted projects that are never going to be completed. Others rely on hand-compiling Nagios itself with strange local modifications and local configurations that are very difficult to import a working Nagios to, or export from. Others have absolutely *no* security model, incapable of securing access to them or relying on locally stored plain-text password setups: others rely on non-privileged accounts to edit the Nagios configurations, including the password files for databases or proxy services, in semi-public repositories. Others rely on installing every file in a browseable web directory, permitting unauthorized local users to poke at the guts and exploit the security flaws. (Yes, you perl idiots who execute random file and directory creation without checking if it's empty first or protecting it from being written into by other people before you copy its contents, I mean you!)

Other configuration tools have beautiful "artist conception" interfaces that will make your eyes bleed after 20 minutes of working with them. Every last one of them listed at Sourceforge and NagiosExchange suffers from one or more of the major open source GUI flaws Eric Raymond ranted about in his CUPS horror story, years ago.

It's unfortunately so bad that I've had to throw away weeks of work and switch to Altiris on a major project, which is fairly painful to switch to but at *LEAST* has a usable interface.
1. Re:Old NetSaint and Nagios geek comments by rprague · 2007-04-11 08:08 · Score: 1
  
  Being a long time nagios and netsaint user and contributor to the community, I have to say your comments are 100% dead on.
  
  The configuration of nagios is confusing even for a seasoned user, the security models are nonexistent and adding even simple graphing and historical data to nagios requires another entire level of ridiculous configurations.
  
  Nagios was a fantastic tool, in 2001. However, it is basically the exact same tool today that it was in 2001 and there are far better tools available now that do the same thing, but they're easier to configure, manage and use.
2. Re:Old NetSaint and Nagios geek comments by walt-sjc · 2007-04-11 08:08 · Score: 2, Informative
  
  I've been using nagios for nearly 2 years too, to monitor about 80 servers. Also running the NRPE plugins to monitor things like disk space, load, and a number of other aspects.
  
  I agree that the configuration is pretty bad, and your other points on the interface. Dependencies are a nightmare to configure.
  
  That said, it does work, and requires very little maintenance once it's setup. It helps to use one file per server too, since you can include entire directories that contain configuration files. What I did was write a simple perl script that I "check off" which services I want to monitor, and it creates the nrpe.conf and nagios conf file for each specific machine. Frequently have to hand-tweak though for the dependencies.
  
  I never read any book on it, just the base docs. A book would have helped. I also haven't found any good open source alternative however.
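The one-file-per-server approach described above can be sketched in Python. This is not the poster's perl script: `render_host_config` is a hypothetical helper, and the template names (`generic-host`, `generic-service`) and the `check_nrpe!...` command strings are assumptions that would have to match definitions in your own installation.

```python
def _block(kind, directives):
    """Format one `define <kind> { ... }` object definition."""
    lines = [f"define {kind} {{"]
    lines += [f"    {key:<22}{value}" for key, value in directives]
    lines.append("}")
    return "\n".join(lines)


def render_host_config(host_name, address, services):
    """Render one host definition plus one service definition per check."""
    blocks = [
        _block("host", [
            ("use", "generic-host"),        # assumed template name
            ("host_name", host_name),
            ("address", address),
        ])
    ]
    for description, command in services:
        blocks.append(_block("service", [
            ("use", "generic-service"),     # assumed template name
            ("host_name", host_name),
            ("service_description", description),
            ("check_command", command),
        ]))
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    checked_off = [
        ("Disk /var", "check_nrpe!check_disk"),  # hypothetical commands
        ("Load", "check_nrpe!check_load"),
    ]
    print(render_host_config("web01", "192.0.2.20", checked_off))
```

Writing each result to its own file in a directory that the main config includes keeps adding or removing a server down to creating or deleting one file.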
3. Re:Old NetSaint and Nagios geek comments by Anonymous Coward · 2007-04-11 08:11 · Score: 0
  
  I agree that a major flaw of Nagios is the fact that it has no management interface. Why would you have a web based monitoring interface and not pair it with a web based management interface? It makes the software difficult to deploy and manage for anyone who is not a Systems Administrator. (i.e. Network Engineers)
4. Re:Old NetSaint and Nagios geek comments by wgadmin · 2007-04-11 08:15 · Score: 1
  
  Which free tools would you rather suggest? I am sincerely interested.
5. Re:Old NetSaint and Nagios geek comments by m0i · 2007-04-11 08:40 · Score: 1
  
  It's unfortunately so bad that I've had to throw away weeks of work and switch to Altiris on a major project, which is fairly painful to switch to but at *LEAST* has a usable interface.
  altiris, just bought by Symantec.. expect the best, prepare for the worst.
  
  --
  have you been defaced today?
6. Re:Old NetSaint and Nagios geek comments by Anonymous Coward · 2007-04-11 08:50 · Score: 0
  
  www.nagiosexchange.org
  I still don't know who this Nagio guy is and you want me to click on a link with that name?
7. Re:Old NetSaint and Nagios geek comments by jimicus · 2007-04-11 08:53 · Score: 1
  
  Forgiven.
  
  I use Nagios myself as I was looking for a quick and dirty replacement for Big Brother.
  
  While it's a fantastic tool, my biggest beef with it by a LONG way is that configuring a new server to monitor is always a case of "hand-edit this config file and that, figure out what's important to monitor and what isn't, realise 3 months later that you missed out something you really should be monitoring...." aarrgh. Templates help hugely but they're only part of the solution.
  
  If you're going to make monitoring easy with a pretty-pretty web-based user interface, then why on Earth can't what the monitor itself does be configured through the same web-based interface?
  
  Right now I'm using ZenOSS (http://www.zenoss.com/) which solves that particular problem very neatly - and unlike Nagios, has very good SNMP support. But it's not as easy to hack as Nagios, mainly because it doesn't have such a strong community surrounding it so punching "zenoss monitor (insert obscure thing here)" into Google doesn't work so well.
  
  Ah, for something with the out-of-the-box functionality (and "out of the box" means "out of the box", not "out of the box provided you spend another 2 hours setting up a bunch of config files to support it") and ease of use of ZenOSS, but the hackability and community of Nagios. Never gonna happen.
8. Re:Old NetSaint and Nagios geek comments by schlick · 2007-04-11 09:03 · Score: 2, Interesting
  
  Have you looked at Hyperic? http://www.hyperic.com/ I'm using the open source version and I like it a lot.
  
  --
  "It's because they're stupid, that's why. That's why everybody does everything." -Homer Simpson
9. Re:Old NetSaint and Nagios geek comments by sysmanman · 2007-04-11 09:07 · Score: 1
  
  Very good points. Looks like this article agrees with you - http://searchenterpriselinux.techtarget.com/originalContent/0,289142,sid39_gci1250897,00.html
10. Re:Old NetSaint and Nagios geek comments by Nutty_Irishman · 2007-04-11 09:20 · Score: 1
  
  The other huge, huge flaw of Nagios is configuring it. It shouldn't take a reference book from O'Reilly to do this efficiently, but I'm afraid it does. There are easily a dozen different configuration tools at www.nagiosexchange.org and sourceforge.net...
  
  I think the other huge flaw is that 50% of the users visiting www.nagiosexchange.org are probably looking for different "configuration tools" in the first place.
11. Re:Old NetSaint and Nagios geek comments by Anonymous Coward · 2007-04-11 09:25 · Score: 0
  
  But what Nagios has never had is a way to publish the URL's of specific queries or reports in a way that can be bookmarked and sent to someone else for reference. It's a big, big, big flaw in the system, common to a lot of web-based projects.
  
  Er, they are GET requests, what's so hard about cutting and pasting them? The fact they're in frames? Just view the frames alone; not hard to do with any browser I know of.
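Since the views are plain GET requests, a shareable link is just the CGI path plus a query string. A minimal Python sketch, assuming the `status.cgi` program with a `host` parameter and a made-up base URL; check the parameter names against the URLs your own installation generates before relying on them.

```python
from urllib.parse import urlencode


def status_url(cgi_base, host):
    """Build a bookmarkable link to the status view for one host."""
    query = urlencode({"host": host})  # escapes spaces and special characters
    return f"{cgi_base.rstrip('/')}/status.cgi?{query}"


if __name__ == "__main__":
    # Hypothetical server name and CGI path.
    print(status_url("http://nagios.example.com/nagios/cgi-bin/", "ldap01"))
```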
12. Re:Old NetSaint and Nagios geek comments by Anonymous Coward · 2007-04-11 09:52 · Score: 2, Informative
  
  I've got to agree. We use it at an ISP level to monitor various functions, both leased line and server functions based on customised scripts; easily several thousand devices are being monitored primarily through Nagios. The theory being we can contact customers pro-actively when they experience connectivity issues, as a free function of business. As a natural side effect of having acquired other ISPs over the years our monitoring system is multi-faceted depending on each ISP's platform quirks. Great for them, a PITA for us that now have to monitor multiple copies of Nagios/Netsaint. Whilst the situation is none of the Nagios developers' fault, as someone that routinely uses them we can see issues on ease of use that could do with some improvements. In an attempt to consolidate this, one of our engineers, using a mix of open source code and some of his own, produced a DB-based back end for Nagios that on a cron'd basis produces new config files for Nagios.
  The platform has been running for a while and stuff is getting transferred across to it, but we keep hitting occasional almost inexplicable quirks with Nagios; such as bizarre limits on the number of characters that can be used in a string of host names, which required editing of the source code and recompilation of Nagios! No obvious reason for it to be as low as it was, plenty of people seem to be butting their heads against it too. One of the most useful changes the engineer made was a link back from Nagios to the database, so when an alarm occurs it's a two-click process to view the specifics of the alarm, and then view the details in the database relating to that device. The resulting hybrid system is too customised unfortunately for it to be appropriate to be released as open source.
  
  The whole interface looks, to be frank, ugly and extremely dated. Tactical overview is still too crowded a display.
  The lack of a simple quick visual history is detrimental to trend analysis. On one of our paid for (license per machine) monitoring platforms it takes barely a minute to view anything from a 24 hour view through to year long views for the devices, great for spotting quirks like customers always turning off routers at end of play before a bank holiday weekend for example, or quirks with a server always occurring at the same time each day. Instead to do trend analysis one is forced to work your way through a not at all helpful textual history. Cacti, a free rrdtool based SNMP monitoring platform produces graphs quite happily, so it's not as if it's unheard of in the open source community either.
  
  Most modern monitoring systems use simple interfaces for managing devices. Nagios is stuck with what can be annoying files to edit. Want to add a device? Got to do it by hand, making sure to add it in all the locations it needs to be added in. Same for removing, you've got to find every instance of that device in a text file or you're stuffed. Rule of thumb: backup the text file first before editing it. The verify tool is helpful, but still on a modern system is beyond what any user should be expected to handle, and instantly raises the required technical ability of the operator or maintainer. In my opinion monitoring should be a no brainer process: See alarm, inform relevant party (customer, network team, server maintainers.) Adding and removing devices should also not be a difficult task. A name, an IP, and a few tick boxes for choice of monitoring functions should be all that is necessary for the day to day work. To require someone to edit the text file is utterly ridiculous in this day and age.
  If the Nagios project wishes to continue to be of as much use as it has been in the past and continue to be used by companies then routine comparisons with other projects are essential and stubbornness and refusal to change core methodology of the platform should be overthrown if it results in positive and ongoing improvements to the platform, even if general and feature progress has to take a back seat to it. It seems to be a common issue that with many programs the incentive is more to add features than deal with fundamental operational processes. I guess once you've written code to do something once it must be quite boring to re-write it again.
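The "find every instance of that device" chore mentioned above is the kind of thing a few lines of Python can take over. The sketch below is an illustration, not part of Nagios: `find_host_references` is a hypothetical helper, the sample config is invented, and it assumes `#` starts a comment line and `;` starts a trailing comment.

```python
import re

SAMPLE = """\
define host {
    host_name   router7
    address     192.0.2.7
}
define hostgroup {
    hostgroup_name  core
    members         router7,router70
}
# router7 was renamed last year
define service {
    host_name            router70
    service_description  PING
}
"""


def find_host_references(config_text, host_name):
    """Return (line_number, line) pairs that mention the host as a whole name."""
    # Reject matches embedded in longer names such as router70 or 192.0.2.7.
    pattern = re.compile(rf"(?<![\w.-]){re.escape(host_name)}(?![\w.-])")
    hits = []
    for number, line in enumerate(config_text.splitlines(), start=1):
        body = line.split(";", 1)[0]  # drop any trailing ';' comment
        if body.lstrip().startswith("#"):
            continue  # whole-line comment
        if pattern.search(body):
            hits.append((number, line.strip()))
    return hits


if __name__ == "__main__":
    for number, line in find_host_references(SAMPLE, "router7"):
        print(f"line {number}: {line}")
```

Running a scan like this before deleting a device gives a checklist of lines to edit, after which the configuration can still be verified as usual.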
13. Re:Old NetSaint and Nagios geek comments by TooMuchToDo · 2007-04-11 09:58 · Score: 1
  
  I guess that's the main problem. I shouldn't need to write complex config files for it to work. I would even PAY to have someone write a web interface similar to Keynote's Red Alert. I simply want to specify the host to monitor, the services to monitor, who to page when something breaks, and be done with it.
14. Re:Old NetSaint and Nagios geek comments by Anonymous Coward · 2007-04-11 10:12 · Score: 0
  
  I use Red Alert to perform supplemental monitoring of my services in addition to Nagios. But I use Nagios for monitoring the majority of my network. Point and clicking your way to a monitor in Red Alert is trivial, but when you are configuring several hundred hosts with several thousand services you're going to rapidly point and click your way to carpal tunnel. I appreciate the flat file configuration of Nagios because it lends itself to easily scripted solutions.
15. Re:Old NetSaint and Nagios geek comments by calethix · 2007-04-11 10:24 · Score: 1
  
  You might want to check out Groundwork OpenSource.
  
  It's built on Nagios and several other projects. So basically Nagios with a really nice GUI front end to get things set up. I've been messing around with the free version to evaluate it as a replacement for Big Brother.
  
  It took me a little while to get all the connections straight in my head but would probably be more intuitive to someone with more experience in the area.
16. Re:Old NetSaint and Nagios geek comments by Colin+Smith · 2007-04-11 11:24 · Score: 3, Interesting
  
  Personally. Zabbix.
  
  Big Brother/Sister don't really scale.
  
  Nagios is horrible to administer.
  
  Jffnms is nice, the most feature complete, but not robust enough.
  
  OpenNMS looks interesting but I've never had the time to set it up.
  
  Cacti/MRTG are trending systems.
  
  Zabbix or OpenNMS.
  
  --
  Deleted
17. Re:Old NetSaint and Nagios geek comments by Mark+Bainter · 2007-04-11 14:39 · Score: 2, Insightful
  
  I see the Hyperic fanboys (aka marketing team) are out in full force.
  
  --
  "No nation could preserve its freedom in the midst of continual warfare."
  --James Madison
18. Re:Old NetSaint and Nagios geek comments by Mark+Bainter · 2007-04-11 14:50 · Score: 1
  
  While it's a fantastic tool, my biggest beef with it by a LONG way is that configuring a new server to monitor is always a case of "hand-edit this config file and that,
  What is the deal with people being afraid of config files?
  figure out what's important to monitor and what isn't
  What? You're complaining because Nagios makes you figure out what is important to monitor?
  realise 3 months later that you missed out something you really should be monitoring
  So...your lack of planning is somehow nagios' fault?
  If you're going to make monitoring easy with a pretty-pretty web-based user interface, then why on Earth can't what the monitor itself does be configured through the same web-based interface?
  Because web based interfaces are notoriously inflexible. I don't get why people have such an aversion to config files these days. It's a unix system. Nagios is just not that hard to configure. If you had a web front end, then every time a change to the capabilities was made you'd have to go back through and update the front end to make it support it, which means more time between releases, longer turnarounds for new features, and likely less flexibility in the system in general.
  Nagios is extremely flexible. This is a great thing, but it makes developing any kind of gui front end to the config files limiting by nature. There's all kinds of things you can do with the config files that would be almost impossible to predict when designing a web UI. If you don't care about having a great monitoring system, and just want to throw something together and vomit it onto the problem then use one of the many other inferior tools. Don't dumb down a great tool and ruin it for those of us who need and/or appreciate the power it offers.
  But it's not as easy to hack as Nagios, mainly because it doesn't have such a strong community surrounding it
  Hrm. Those two things don't go together. Not being easy to hack is a function of its design, not a function of its community. Now, it may not be as easy to monitor things that it doesn't already have support for - because there aren't a lot of people writing monitors for it. That's not the same thing as 'not being easy to hack'. Downloading someone else's work and installing it in your monitoring system doesn't qualify as 'hacking'.
  Ah, for something with the out-of-the-box functionality (and "out of the box" means "out of the box", not "out of the box provided you spend another 2 hours setting up a bunch of config files to support it") and ease of use of ZenOSS, but the hackability and community of Nagios. Never gonna happen.
  No, it's not. And why? Because you simply can't have your cake and eat it too. If you want flexibility and power you're going to have complexity. If you want simplicity, you're going to have a limiting system that is made for a certain spectrum of use and you'll have a hard time applying it outside of that problem set.
  
  --
  "No nation could preserve its freedom in the midst of continual warfare."
  --James Madison
19. Re:Old NetSaint and Nagios geek comments by jimicus · 2007-04-11 20:53 · Score: 1
  
  Nagios is just not that hard to configure. If you had a web front end, then every time a change to the capabilities was made you'd have to go back through and update the front end to make it support it, which means more time between releases, longer turnarounds for new features, and likely less flexibility in the system in general.
  
  You'd better tell the developers who are sat next to me that one. They think they're using a toolkit which practically gives them a web-based interface for free when they develop the command-line interface.
  
  No, it's not. And why? Because you simply can't have your cake and eat it too. If you want flexibility and power you're going to have complexity.
  
  You know, rather than just tell me that I'm asking for the moon, you could try ZenOSS. You get a heck of a lot of flexibility and power with substantially less complexity.
  
  There seems to be a certain idea in the Linux community that just because "it's a community-developed Unix" it has to be bloody awkward to get basic things to work. Eric Raymond has already pointed out that this makes no sense at all:
  
  http://www.catb.org/~esr/writings/cups-horror.html
  
  The only way I can make any sense of this opinion persisting is that most of the Linux developers are secretly into S&M and when they shut the computer down for the night, they wander into their dungeon for some light whipping and flagellation.
20. Re:Old NetSaint and Nagios geek comments by The_Real_Clipper · 2007-04-11 22:57 · Score: 1
  
  I am always amazed how people can complain about free stuff. Nagios is Open Source, so it's free on one side, and maybe clumsy on the other, but at least you still have the choice of:
  1) Using a 3rd party to install it for you if you're not smart enough
  2) Tuning the code if you don't like it, and sharing that with others, if you're smart enough
  3) Giving a try (and tens of thousands of $) to CA-Unicenter, HPOV and the like.
  Having checked out 3) already, I would rather hire a 100% monitoring guy tinkering with Nagios to have a monitoring system perfectly shaped to my needs than use one of these commercial products that never do what you want.
  Clipper
  
  --
  Clipper, Gray Hat
21. Re:Old NetSaint and Nagios geek comments by Mark+Bainter · 2007-04-12 02:35 · Score: 1
  
  You'd better tell the developers who are sat next to me that one. They think they're using a toolkit which practically gives them a web-based interface for free when they develop the command-line interface.
  Er - that has absolutely nothing to do with what I just said. Command line interface != Config File.
  You know, rather than just tell me that I'm asking for the moon, you could try ZenOSS. You get a heck of a lot of flexibility and power with substantially less complexity.
  I've looked at ZenOSS. I'm sure that with your (apparently) limited experience and use of network monitoring that it seems like a lot of flexibility and power. That's not meant to be insulting, just an observation.
  There seems to be a certain idea in the Linux community that just because "it's a community-developed Unix" it has to be bloody awkward to get basic things to work.
  Well, again, I deny that it's "bloody awkward" to get basic things to work in Nagios. I can have nagios up and running for basic needs in minutes.
  I think the real problem here is this attitude that has been growing steadily within our side of the IT industry that somehow things should be "push a button easy". It's not just in monitoring, it's cropping up all over the place. If people actually have to spend some time learning something to use it, or have to put forward some brain power to get the most out of it then it's just too hard.
  I utterly reject that.
  I don't know for sure where it's coming from, but I have a suspicion that it comes from too many unskilled users flooding into the various *nix camps. For years people who should've known better advocated how we needed to win the war on the desktop and so we have to make it *easy*. So lots of people focused on making it easy. Now many of the people who took advantage of the inroads we did make are now looking at more advanced areas of the platform and complaining that it's too hard. They've been using LinSpire, or Xandros on the desktop long enough that they've decided they're unix admins.
  I'm not saying you necessarily fall into this category, but even if you haven't, I believe these people and their complaints have infected many of the rest of the camp much like the media and others who complained about the difficulty of installing linux in prior years did.
  If we don't wake up at some point we're going to have dumbed down our OS so much it'll be OS-X. I'm not dogging OS-X, I use it myself for certain things. But you have to admit that it has a *lot* less flexibility and power than BSD/Linux/Solaris/etc platforms do. And that's by design because of the people they want to use it. The unix guys there barely convinced them to leave the terminal in.
  That's not where I want my apps and systems to go. I'm not saying there aren't poorly developed apps out there. There certainly are. I'm saying that some people obviously can't tell the difference between a poorly developed app, and an app they don't understand. Here's a hint - if it's been the lead product in the space for years and years and has huge community support around it - it's probably *not* a poorly developed application.
  There's plenty of room in the space for apps like ZenOSS, Hyperic, etc. They're welcome to compete. Just don't try to tear down Nagios to do it. Use what you want, what meets your needs, and stop with this absurd war against Nagios.
  
  --
  "No nation could preserve its freedom in the midst of continual warfare."
  --James Madison
22. Re:Old NetSaint and Nagios geek comments by dave.josephsen · 2007-04-12 04:36 · Score: 1
  
  Guys, it really doesn't need to be that hard. The primary nagios.cfg allows you to arrange the configuration any way you choose. Most people use large, object-specific config files, but there's no reason you couldn't break your configs down into host-based entities which contain a host, the services on that host, and the contacts/contactgroups for that host. A really lightweight shell wrapper could be used to cat together pre-configured "templates" into a hostX.cfg file. It could ask you questions like "what is the name of the host?" and you could answer the questions and have a config. Check out NACE, which can create a configuration for you given piped output from Nmap. In general, in Nagios land, if things are hard, it's because you're making them hard. Stop for a second and think about how you want things to work, and Nagios can be made to operate in the manner that you specify. All it takes is a bit of thought on your part.
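
A minimal sketch of the per-host file idea described above, written in Python rather than as a shell wrapper. The template names generic-host and generic-service, the contact group admins, and the check command names are assumptions borrowed from typical sample configs, not anything the comment specifies; render_host_config is a hypothetical helper.

```python
from string import Template

# Object definitions in the usual Nagios config syntax.
HOST_TEMPLATE = Template("""\
define host{
    use             $host_template
    host_name       $name
    alias           $name
    address         $address
    contact_groups  $contact_group
}
""")

SERVICE_TEMPLATE = Template("""\
define service{
    use                  $service_template
    host_name            $name
    service_description  $description
    check_command        $check_command
}
""")


def render_host_config(name, address, services, contact_group="admins",
                       host_template="generic-host",
                       service_template="generic-service"):
    """Return the text of one per-host file: the host plus its services.

    `services` is a list of (description, check_command) pairs.
    """
    blocks = [HOST_TEMPLATE.substitute(
        host_template=host_template, name=name, address=address,
        contact_group=contact_group)]
    for description, check_command in services:
        blocks.append(SERVICE_TEMPLATE.substitute(
            service_template=service_template, name=name,
            description=description, check_command=check_command))
    return "\n".join(blocks)
```

Calling render_host_config("web01", "192.168.1.10", [("HTTP", "check_http")]) and writing the result to something like web01.cfg would give one self-contained file per host, which is roughly what the comment is suggesting.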
23. Re:Old NetSaint and Nagios geek comments by dave.josephsen · 2007-04-12 06:04 · Score: 1
  
  I'm an *OLD* Netsaint and Nagios user, and have contributed to both. Guides are great, playing with it is great, and it does a lot of things very well. But what Nagios has never had is a way to publish the URLs of specific queries or reports in a way that can be bookmarked and sent to someone else for reference. It's a big, big, big flaw in the system, common to a lot of web-based projects.
  I'm also an "OLD" Nagios user, as well as the author of the Addison-Wesley Nagios book, so I might be biased, but I think you're kind of missing the point. Nagios is just a task-efficient scheduling and notification engine. Its job is to schedule the execution of monitoring plugins, interpret their output, and take user-defined actions based on that output. Its flexibility derives from this core minimalist approach. The plugins and web front end are separate entities, and shouldn't really be considered "Nagios" in any meaningful way. So, in general, if you are saying something like "Nagios has no way of doing X", then Nagios probably isn't doing that by design, and you should be looking elsewhere for that functionality. If you spend some time on the nagios-users list, you get a good feel for what other people are doing to solve your sorts of problems. Based on my experience therein, I can tell you that the NLG (Nagios Looking Glass) is probably what most people are using for the type of problem you list above.
  
  The other huge, huge flaw of Nagios is configuring it. It shouldn't take a reference book from O'Reilly to do this efficiently, but I'm afraid it does.
  
  I certainly hope that it doesn't take an O'Reilly book, because they don't have one yet ;-) (only Apress, No Starch, and AWP respectively). I've noticed an important difference between the kinds of programs I enjoy working with, and those that I do not. The programs I most enjoy (the programs that stick with me for decades) have in common a complicated initial setup. This comes from the fact that the programs are themselves complex and flexible, and the fact that I am opinionated about how I'd like them to work. If you read the online Nagios docs, and decide how you want it to work for you, then you will have a few hours of initial setup, followed by a decade of trouble-free operation. If you read the docs, and plan well, then adding new hosts, services, etc. will take less time than making the ssh connection. On the other hand, if you install MOM, Patrol, Openview, or hyperwhatever, you will be running in a half hour, and you will be lost in an infinitely recursive system of menus, filling out web forms for every host and service change for the next year or two until the compulsory upgrade comes along. Then you will rip it all apart and start over.
  That having been said, the initial setup with Nagios is really not all that hard, guys. I'd tell you to go install Monarch if I thought that filling out web forms was easier, but I don't think having an input field on a web page called "host" is necessarily easier than having some text in a file called "host". I will recommend, however, that you check out NACE, which makes host configuration as simple as "nmap -sP 192.168.X.0/24 | grep 'Status: UP' | cut -d' ' -f2 | WriteHosts.pl"
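
To make the pipeline above concrete, here is a rough Python stand-in for its last three stages. This is not NACE's WriteHosts.pl, whose behavior isn't described in the comment; it assumes input lines shaped like Nmap's grepable output ("Host: ADDRESS (NAME)  Status: Up"), and the generic-host template and the host-... naming scheme are made up for illustration.

```python
def addresses_from_nmap(lines):
    """Yield the address field of every host line reported as up.

    Mirrors `grep 'Status: UP' | cut -d' ' -f2`, but ignores case.
    """
    for line in lines:
        fields = line.split()
        if (len(fields) >= 2 and fields[0] == "Host:"
                and "status: up" in line.lower()):
            yield fields[1]


def host_definition(address, template="generic-host"):
    """Return a Nagios host definition for one discovered address."""
    # Hypothetical naming scheme: 192.168.1.10 -> host-192-168-1-10
    name = "host-" + address.replace(".", "-")
    return (
        "define host{\n"
        f"    use        {template}\n"
        f"    host_name  {name}\n"
        f"    address    {address}\n"
        "}\n"
    )
```

Feeding sys.stdin to addresses_from_nmap and printing host_definition for each address would turn a ping sweep into a starting hosts file that can then be edited by hand.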
  
  There are easily a dozen different configuration tools at www.nagiosexchange.org and sourceforge.net, and *every single one of them* has major problems that could be solved with 10% of the time spent on Nagios itself. Most are abandonware, exciting but uncompleted projects that are never going to be completed. Others rely on hand-compiling Nagios itself with strange local modifications and local configurations that are very difficult to import a working Nagios to, or export from.
  Again, some time on the lists would give you a good feel for what is wheat and what is chaff on NagiosExchange. Not all contributions are equal, and that's to be expected in any community. When someone does it right, other people use
24. Re:Old NetSaint and Nagios geek comments by resistor3672 · 2007-04-17 08:43 · Score: 1
  
  Dave has it right. Use management software packages for what they were designed to do. Nagios is designed to collect data on the availability and performance of servers, apps, and network devices, and to bug you when they have a problem. So far, the IT management tool industry doesn't seem mature enough to wrap all the possible functions we as sysadmins (or our management) need and want into a coherent package, open source or proprietary. One approach to getting around this is that of GroundWork Open Source. It integrates the Nagios package, a config GUI for it that lessens the pain, graphing for the performance data that Nagios gathers, and the chance to add your own web pages to the UI pretty painlessly. The web service API to the database backend is also useful for crafting your own UIs, after the fact. So, use Nagios where it is appropriate, and if you need more, add on. It works for GroundWork.
Others by Colin+Smith · 2007-04-11 07:51 · Score: 3, Informative

zabbix
jffnms
opennms

etc.

I found nagios rather clunky compared to some of the others.

--
Deleted
1. Re:Others by bomonguny · 2007-04-11 08:02 · Score: 1
  
  What others? Inquiring minds want to know.
  
  --
  and to you, I say,.. good day
2. Re:Others by Antique+Geekmeister · 2007-04-11 08:03 · Score: 1
  
  The Nagios codebase is considerably older. It was written before mod_perl and PHP were in broad use, when binaries in a webpage meant using cgi-bin.
  
  There are plenty more monitoring tools. Bigbrother and Bigsister come to mind, although Bigbrother was ruined when it went commercial. And despite the claims of the anonymous coward above, there are some workable GUIs, although I admit that they do need work to make commercial or production grade.
  
  A good Nagios book would certainly be welcome on my bookshelf: it's used for the same reason sendmail is still used, it's stable and familiar and has stood up to the test of time.
3. Re:Others by phish · 2007-04-11 08:29 · Score: 3, Interesting
  
  Try Hyperic: http://www.hyperic.com/
  
  GPL, 30-minute or less setup time, auto discovery and built in support for monitoring, controlling, and log tracking for anything you can think of. 9 OS's, 42 apps, network devices, extensible plugins....
  
  Nagios is great, but I agree with the parent that the time it takes to set up and maintain is unreasonable. Oh, and yes, I'm biased. I work for Hyperic.
  
  -javier
4. Re:Others by Auntie+Virus · 2007-04-11 08:47 · Score: 1
  
  I'm fairly geeky, though by no means a programmer, I can get most FOSS programs compiled and working, but holy crap Nagios is a PITA! BigBrother is still pretty good, though somewhat lacking in many places. I'm switching to Zenoss here, it looks to be quite good, though I'll miss the main status screen of Big Brother.
  
  --
  Why yes, I *AM* new here. Why?
5. Re:Others by estevon07 · 2007-04-11 08:56 · Score: 1
  
  I couldn't agree more - Nagios has stood the test of time. The fact that No Starch Press is publishing a Nagios book is a good indication of just how widely used it is. I agree the web interface is a bit dated, but the real power of Nagios isn't in the front end - it's the robust notification and complete extensibility that keep me using it. I monitor just over 500 servers representing about 5000 checks (around 10 per server). There has not been an issue yet that I couldn't get Nagios to somehow monitor. The custom check plugins can be written in any language - just return -1, 0, 1, 2. Passive checking with syslog-ng regular expressions is also very powerful. It's trivial to write messages to the Nagios named pipe for custom alarming. I also agree it's a pain to set up initially, but if you're smart with host and service groups then adding new hosts and checks is really pretty easy. I'm actually working on a pre-configured Nagios appliance using Xen. I'm hoping folks will have an easier time starting with a reasonably robust canned instance that includes graphing and other niceties. I haven't yet figured out how to distribute it.
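
A small sketch of the two mechanisms mentioned above: a plugin's return code and a message for the named pipe. Note that it uses the 0/1/2/3 codes (OK, WARNING, CRITICAL, UNKNOWN) from the Nagios plugin guidelines rather than the -1 the comment mentions. The threshold logic and function names are illustrative assumptions; the PROCESS_SERVICE_CHECK_RESULT line follows the external command format for passive service results.

```python
import time

# Exit codes per the Nagios plugin guidelines.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify(value, warn, crit):
    """Map a measurement to a plugin exit code (higher value is worse)."""
    if value is None:
        return UNKNOWN
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


def passive_result(host, service, code, output, now=None):
    """Format one passive service check result for the command pipe."""
    timestamp = int(time.time() if now is None else now)
    return (f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;"
            f"{host};{service};{code};{output}\n")
```

An active plugin would print a one-line status and exit with the code from classify. A passive submitter would append the passive_result line to the Nagios command file, whose path depends on the install (often something like /usr/local/nagios/var/rw/nagios.cmd, but treat that as an assumption).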
6. Re:Others by halfloaded · 2007-04-11 09:13 · Score: 3, Interesting
  
  Zenoss
  Cacti
  BixData
  MRTG
  etc, etc, etc...
  
  This site has the biggest database of NMS's around.
7. Re:Others by BagOBones · 2007-04-11 09:28 · Score: 1
  
  I think Cacti is the only one listed that has a reasonable install learning curve. Everything else requires deep investment in setup.
  
  For a smaller operation or smaller feature set needs I really like Cacti.
  
  --
  EA David Gardner -"... but the consumers have proven that actually what they want is fun."
8. Re:Others by aclark4life · 2007-04-11 14:05 · Score: 2, Interesting
  
  There's also ZENOSS (http://www.zenoss.com/). I didn't see anyone else mention it, so I thought I would. Haven't tried it yet but I like that it's Zope based (because I am a Zope consultant).
  
  --
  Alex Clark
9. Re:Others by tumutbound · 2007-04-11 15:28 · Score: 1
  
  Anyone who was running the (almost) open source Bigbrother would be better off moving to Hobbit http://hobbitmon.sourceforge.net/
  It monitors everything I want in Linux and Windows systems and can support SNMP
10. Re:Others by Antique+Geekmeister · 2007-04-11 21:49 · Score: 1
  
  You got Zenoss working? Under which OS distribution? And did you publish notes, or use an installation guide?
11. Re:Others by Bimo_Dude · 2007-04-12 03:06 · Score: 1
  I just finished getting Zenoss working on a test box two days ago (CentOS 4.3 VM -- not the Zenoss VM image, though). I'm currently testing it to monitor some Windows servers, as that is what we mostly have.
  There were a few things missing from the manual installation docs. Here are the steps I used to get it up and running:
  
  1) rpm -Uvh perl-Socket6-0.19-1.2.el4.rf.i386.rpm
  2) rpm -Uvh perl-Crypt-DES-2.05-3.2.el4.rf.i386.rpm
  3) rpm -Uvh perl-Net-SNMP-5.2.0-1.2.el4.rf.noarch.rpm
  4) rpm -Uvh MySQL-client-standard-5.0.24a-0.rhel4.i386.rpm
  5) rpm -Uvh MySQL-server-standard-5.0.24a-0.rhel4.i386.rpm
  6) yum -y install net-snmp net-snmp-utils perl-Digest-HMAC perl-DBI
  7) rpm -Uvh zenoss-1.1.1-0.i386.rpm
  8) rpm -Uvh Zenoss-Plugins-1.1.1-1.py23.noarch.rpm
  9) /etc/init.d/snmpd start
  10) /etc/init.d/zenoss start
  11) Go to http://zenosshost:8080/ and login admin/zenoss; change admin password
  12) Install on a Windows host, version 2.4 of Python
  13) Install on same Windows host, pywin32
  14) Install Zenwin
  15) Add the path to the Python executable to the System PATH environment variable
  16) Edit the Zenwin configuration files to point to the CentOS Zenoss host, and change the username/password in the config files to use the new admin password [WARNING - passwords stored in plain text]
  17) Install Zenwin as a service, running under an account in the local admin group on the Windows host.
  18) Add the Zenwin service account username/password into the Zenoss web interface.
  IIRC, that was the way I got it working. I had to fool around with it a little bit to get it to work.
  As for my opinion of it, it seems pretty cool so far, but I'm reserving my final opinion for when I've had a chance to play with it a little more.
  --
  "Teleporting Rodents with D-Cell Battery Displacement" theory -- IgnoramusMaximus (692000)
12. Re:Others by Mark+Bainter · 2007-04-12 03:18 · Score: 1
  Install on a Windows host, version 2.4 of Python
  
  Install on same Windows host, pywin32
  
  Install Zenwin
  
  If you control all the servers you might be able to get away with that. Try working in a company where there are so many servers that responsibilities are distributed between groups, then convince them they need to install python/pywin32 so you can monitor their systems. Good luck with that.
  --
  "No nation could preserve its freedom in the midst of continual warfare."
  --James Madison
13. Re:Others by Mark+Bainter · 2007-04-12 03:20 · Score: 1
  
  The Nagios codebase is considerably older. It was written before mod_perl and PHP were in broad use, when binaries in a webpage meant using cgi-bin.
  And you think cgi-bin binaries are a thing of the past? I can assure you they are not. And new != better. The code in nagios is very good, and there's no reason to abandon it for a rewrite in PHP just because it's the latest fad. Nothing against PHP, but there's no compelling argument for moving to that just because. (Not that you were necessarily advocating that, but it is a common argument)
  
  --
  "No nation could preserve its freedom in the midst of continual warfare."
  --James Madison
14. Re:Others by Bimo_Dude · 2007-04-12 03:33 · Score: 1
  
  If you control all the servers you might be able to get away with that. Try working in a company where there are so many servers that responsibilities are distributed between groups, then convince them they need to install python/pywin32 so you can monitor their systems. Good luck with that.
  Fortunately, I am now in a smaller environment (~ 400 servers) where I can do just that if I need to. I do understand what you mean, though; where I last worked (> 5000 servers, several admin groups), I never would have been able to do such a thing because of all of the politics and territorialism (is that a word?). That is one of the main reasons that I no longer work there.
  I was simply responding to somebody's question about if/how they got Zenoss up and working, from a technical perspective, not a political one.
  
  --
  "Teleporting Rodents with D-Cell Battery Displacement" theory -- IgnoramusMaximus (692000)
15. Re:Others by Mark+Bainter · 2007-04-12 03:38 · Score: 1
  
  I was simply responding to somebody's question about if/how they got Zenoss up and working, from a technical perspective, not a political one.
  I understand, and that's how I read you. However, for the benefit of the larger discussion going on it was a good opportunity to point out the political reality many (most?) admins have to deal with that so often gets neglected in the slick presentations by these companies.
  
  --
  "No nation could preserve its freedom in the midst of continual warfare."
  --James Madison
16. Re:Others by Antique+Geekmeister · 2007-04-12 05:00 · Score: 1
  
  OK, I'm afraid this is typical of a lot of unfinished open-source tools.
  
  * I see no PGP or GPG signatures on the Zenoss RPMs. This is always bad, especially for software doing core infrastructure tasks like system monitoring.
  
  * Install an RPM for MySQL that conflicts with the built-in version of every deployed OS known to Linuxkind. That's understandable, but it means you've left out a critical step: start with a clean box with no MySQL installed on it, because they can't be parallel installed and it *will* modify if not break your existing MySQL installation. And step away from ever being able to get security or bugfixes from the OS vendor by doing so.
  
  * Install a Zenoss-Plugins package with a messed up name (switching case in package names is always a bad sign!) that has no SRPM and apparently cannot be compiled from the tarball without considerable difficulty. That's a very secure software installation practice, don't you agree?
  
  * You also have to grab the various perl modules from a third-party repository: OK, DAG's repositories have most of them, but that dependency is not documented by the ZenOSS authors or you. I know how to set up those, but the ZenOSS SRPM doesn't list those dependencies, does it?
  
  * Notice then that you spend a lot of time talking about ZenWin. I didn't ask about ZenWin, I asked about ZenOSS. This is an irrelevant feature and distracts from the usefulness of your answer.
  
  Please excuse me for being so harsh about this. The list of the steps you used is appreciated, but this sort of thing is quite typical in the Nagios community and is one of the things that really detracts from the use of these tools.
17. Re:Others by Antique+Geekmeister · 2007-04-12 06:32 · Score: 1
  
  Oh, I'm not advocating the discarding of cgi-bin at all! It's just that compared to some of the modern, prettier ways of doing things, it certainly does *look* clunky.
18. Re:Others by Bimo_Dude · 2007-04-12 07:01 · Score: 1
  
  Install an RPM for MySQL that conflicts with the built-in version of every deployed OS known to Linuxkind. That's understandable, but it means you've left out a critical step: start with a clean box with no MySQL installed on it, because they can't be parallel installed and it *will* modify if not break your existing MySQL installation. And step away from ever being able to get security or bugfixes from the OS vendor by doing so.
  D'oh! I did forget to put that step in. I did have to start with a clean box without MySQL.
  You also have to grab the various perl modules from a third-party repository: OK, DAG's repositories have most of them, but that dependency is not documented by the ZenOSS authors or you. I know how to set up those, but the ZenOSS SRPM doesn't list those dependencies, does it?
  The Zenoss SRPM does not list the dependencies for the Perl modules, but they are listed in the manual install instructions, as well as step 6 of the list I posted.
  Notice then that you spend a lot of time talking about ZenWin. I didn't ask about ZenWin, I asked about ZenOSS. This is an irrelevant feature and distracts from the usefulness of your answer.
  Yep - I did spend the last eight steps on ZenWin. I acknowledge that you did not ask about it, but thought I'd add that in just in case there may be other people who may be interested.
  Please excuse me for being so harsh about this. The list of the steps you used is appreciated, but this sort of thing is quite typical in the Nagios community and is one of the things that really detracts from the use of these tools.
  I'll have to agree with you there :) Zenoss was moderately easy to set up, but there really should be SRPMS and the dependencies should be "Automagically" installed with the Zenoss RPM.
  Just to make sure that I'm clear: I am not defending Zenoss in any way. I just listed (admittedly a little briefly / incompletely) the steps I had to take to get it to work, not all of which were in the Zenoss documentation.
  
  --
  "Teleporting Rodents with D-Cell Battery Displacement" theory -- IgnoramusMaximus (692000)
Powerful, but can be a pain to configure by mrmagos · 2007-04-11 08:23 · Score: 1

I thoroughly enjoy the event handler capabilities built into Nagios. Just that single feature has made my day to day administrative tasks easier, and well worth the hours to write the scripts and get it all configured properly.
For example, it's so nice to have the spooler service on a win32 box restart automatically if it has locked or died unexpectedly, and not have to wait for the calls to come in when users can't print.
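
For anyone wondering what such an event handler looks like, here is a rough sketch of the decision logic, assuming the handler is invoked with the $SERVICESTATE$, $SERVICESTATETYPE$ and $SERVICEATTEMPT$ macros as its first three arguments, followed by whatever restart command you choose. The function names and the "restart on the third soft failure" policy are illustrative choices, not the poster's actual script.

```python
import subprocess


def should_restart(state, state_type, attempt, soft_attempt=3):
    """Restart on the Nth soft CRITICAL retry, or once the state is HARD."""
    if state != "CRITICAL":
        return False  # OK, WARNING and UNKNOWN are left alone
    if state_type == "HARD":
        return True
    return state_type == "SOFT" and int(attempt) == soft_attempt


def main(argv):
    """argv: state, state type, attempt number, then the restart command."""
    if len(argv) < 3:
        return 2
    state, state_type, attempt, *restart_command = argv
    if should_restart(state, state_type, attempt) and restart_command:
        # The restart command is entirely site-specific (hypothetical here).
        subprocess.run(restart_command, check=False)
    return 0
```

Acting on a late soft retry gives the service a chance to be fixed before the state goes hard and notifications are sent, while the HARD branch is a fallback in case the earlier attempt didn't help.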

--
Never start vast projects with half-vast ideas.
1. Re:Powerful, but can be a pain to configure by Anonymous Coward · 2007-04-11 11:45 · Score: 0
  
  "For example, it's so nice to have the spooler service on a win32 box restart automatically if it has locked or died unexpectedly, and not have to wait for the calls to come in when users can't print." yeah...or you could have just set the service to auto restart in the services console. no nagios needed.
2. Re:Powerful, but can be a pain to configure by mrmagos · 2007-04-11 14:04 · Score: 1
  
  Yes, because we all know how reliable that can be. It will only try to restart the service 3 times, in quick succession, then give up. Or, how about my other example, when the service locks the system by monopolizing CPU time? The service hasn't technically stopped, but it's not doing what it's supposed to, so you still have to rely on users to tell you what's wrong.
  
  Would you prefer a different example? Nagios can check a URL, looking for a particular string (like something from the main page of a web app). If it doesn't return the proper result (like it lost the connection to the database, or it's being DDoS'd, etc.), the service or daemon responsible can be reinitialized, or start a backup process, etc.
  
  Oh, and Nagios will email or SMS the on-call person during all this.
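
A minimal sketch of the URL check described above: fetch a page, look for a string that should be there, and report a plugin-style status. The function names are hypothetical, and a real deployment might well use a stock plugin instead; this only shows the shape of the check.

```python
import urllib.request

OK, CRITICAL = 0, 2


def fetch(url, timeout=10):
    """Return the page body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def check_page(url, expected, fetcher=fetch):
    """Return (exit_code, message) for a page that should contain `expected`."""
    try:
        body = fetcher(url)
    except Exception as exc:  # refused connection, timeout, HTTP error, ...
        return CRITICAL, f"CRITICAL - could not fetch {url}: {exc}"
    if expected in body:
        return OK, f"OK - found {expected!r} at {url}"
    return CRITICAL, f"CRITICAL - {expected!r} missing from {url}"
```

Run as a plugin, this would print the message and exit with the code. A CRITICAL result is what would then trigger the event handler and the email or SMS mentioned above.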
  
  --
  Never start vast projects with half-vast ideas.
3. Re:Powerful, but can be a pain to configure by dmihalko · 2007-04-12 00:52 · Score: 1
  
  oh so very true... through the use of openssh for win32 and public key authentication... one can accomplish all kinds of useful scripts to attempt to automate the recovery of downed services on any windows/*nix server. i love it.
Hyperic by sysmanman · 2007-04-11 08:32 · Score: 1

Much easier to set up and get running - http://www.hyperic.com/ Not to mention supports more platforms than all of the others.
1. Re:Hyperic by stacey7165 · 2007-04-11 09:10 · Score: 1
  
  MyNewPlace.com just replaced a painful two-year investment in Nagios. Turns out SNMP isn't the best way to manage *everything*. Latency caused huge alert storms. Anyway, they looked at HP and after recovering from the sticker shock and realizing that it was going to take an army of consultants to build workarounds to functionality that wasn't there, they landed on Hyperic. Took 1.5 hours to convince them. Article: "Nagios network monitoring felled by SNMP false alarms"
2. Re:Hyperic by SuiteSisterMary · 2007-04-12 01:15 · Score: 1
  
  Not, I'm sure, that you're intentionally trying to be misleading, but from the linked article, and emphasis mine:
  
  "Nagios was not really the problem," Shin said. "It was the JVM stack not being able to respond to it correctly. It was recording events in SNMP that were then watched by Nagios and that made things crawl. There were a lot of man hours wasted, and it would trigger the 4 a.m. pages."
  
  In other words, tool, job, GIGO.
  
  --
  Vintage computer games and RPG books available. Email me if you're interested.
3. Re:Hyperic by Mark+Bainter · 2007-04-12 03:35 · Score: 1
  
  Hahaha. As the other fellow said, this is hardly a "painful investment in nagios". And just FYI, Nagios doesn't monitor with SNMP, it provides a framework for monitoring and this company built on that - poorly. They didn't get the results they wanted. Again, not Nagios' fault. What they really needed was to make a better interface for monitoring their App than SNMP. Instead, they switched to Hyperic and it worked for them.
  This article is a prime example of the absurd war against Nagios currently being waged by these commercial companies. Right in the article the guy states that it wasn't Nagios' fault, it was the SNMP tool/JVM that was the problem. Yet look at the headline. And read the quote from the ZenOSS CEO:
  "The maintainers never thought of it as a project that an IT manager would use to monitor an entire enterprise environment,"
  What? Yes they did. I used it to monitor systems in one of the largest companies in the US, with 400 offices around the world and 4 datacenters. It's more than capable if you use it right. When I read that article, this is the impression I get of what really happened. The guys were given no money to set up monitoring. They put Nagios up because it's the default, and worked within timeline limits to get it working. They tried the obvious first: expose SNMP interfaces and query them. They wrestled with the impacts, and upper management got tired of waiting and started looking at commercial products.
  He looked at the big dog first: "Whoa, I'm not going to spend that kind of money". Someone stumbles on Resin support in Hyperic, and they avoid the near catastrophe that is OpenView. It works, the IT guys are (probably) happy. (At least some of them). Upper Management is ecstatic and in their ignorance blames Nagios, and (possibly) works out a deal to promote Hyperic in the media and slander Nagios. Note too that in the article they're *still* using some of the Nagios components. I wonder why. (Honestly, not sarcastically. Can it not monitor all that the Nagios plugins can? Was it too much effort to rewrite the Nagios plugins for Hyperic? Did they not want to run the agent on their Linux systems?)
  What do I base this on? Experience. Experience with this sort of sequence of events lets you read between the lines. Plus, they're running Strongmail, which provides a lot of support for ignorant and overbearing upper management. It is possible they have Linux admins there who are ignorant enough about what is out there to want to run Strongmail, or to have been taken in by their sales guys - but it's unlikely.
  
  --
  "No nation could preserve its freedom in the midst of continual warfare."
  --James Madison
Shameless plug by rafalusa · 2007-04-11 08:38 · Score: 0, Redundant

We bought this monitoring solution and are pretty happy with it. The agents on the Windows side are just as good as on Linux, but the Linux agents are restricted to certain distros. The reporting server is nice, gives us pretty reports to take to the customer to show them why they need the upgrades we are recommending. The SLAs can be proven since everything is tracked for years. I like it :)

You can take a look at a few screenshots here: http://www.friendlysol.com/managed_services.php

Use the contact form if someone has a question about it. The cool part is that we can either use it as a tool for our clients, or "rent" the tool to IT departments. Makes sense for the monitoring server to be outside somebody's office.
1. Re:Shameless plug by SlamMan · 2007-04-11 08:50 · Score: 1
  
  It makes sense to have one of your monitoring servers outside of your office. I'm not about to make all the internal services we use viewable to an outside IP, but 1) having a monitoring service make sure the internal monitor is functioning and 2) making sure public services are functional to the outside world has a lot of value.
  
  --
  Mod point free since 2001
2. Re:Shameless plug by rafalusa · 2007-04-11 08:58 · Score: 1
  
  I should put a diagram on that page. The agents and probes report BACK via HTTPS to the monitoring server, which is outside of the network. So no one has access from the outside. If the agents or probes don't respond, that triggers an alert.
  
  So, monitor via:
  - outside server - out of the network - (can ping the outside ip)
  - probe - inside of the network ( can monitor any device via snmp, wmi, scripts )
  - agent - inside of the network ( can monitor the device it is installed on )
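  
  The "no report triggers an alert" idea can be sketched as a simple staleness check on the server side. This is a generic illustration, not the vendor's implementation: the function name, the timestamp bookkeeping, and the 300-second default are all assumptions.

```python
def stale_agents(last_seen, now, max_silence=300.0):
    """Return the names of agents whose last report is older than max_silence.

    last_seen maps agent name -> time of its last check-in, in seconds
    (for example from time.time()); now is the current time on the same clock.
    """
    return sorted(
        name for name, seen_at in last_seen.items()
        if now - seen_at > max_silence
    )
```

  Because the agents only make outbound connections, the server never has to reach into the monitored network; it only has to notice when a check-in is overdue.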
3. Re:Shameless plug by osbjmg · 2007-04-11 14:30 · Score: 1
  
  Does anyone else see that flicker on the friendlysolutions site with an LCD? ouch.
There is a better way... by sysmanman · 2007-04-11 08:39 · Score: 2, Interesting

At the risk of getting off-topic, I'm tired of stuff that doesn't quite work. (can't comment on the actual book because I haven't read it) However, I can't see how Nagios can even begin to satisfy the needs of most modern IT operations folks. These days, most people need to know a lot more than whether machine X is up. They need to know which part(s) of their web apps are not functioning correctly. They need a lot more intricate detail than is possible with Nagios or SNMP-based monitoring tools. Really, the only monitoring tool that does it for me is Hyperic.
1. Re:There is a better way... by Anonymous Coward · 2007-04-11 08:51 · Score: 0
  
  Aware ( http://www.elegant-software.com/software/aware/ ). It can integrate network events and events from agents with log watchers, etc. in a nice UI. Also, it is really fast and easy to configure.
2. Re:There is a better way... by osbjmg · 2007-04-11 09:18 · Score: 1
  
  I love when people say "these days". So back in 1995 things were different? When did they change? Why the time reference?
3. Re:There is a better way... by bartwol · 2007-04-11 10:51 · Score: 1
  
  You know...these days, when people really understand computers and apply system/application monitoring paradigms that didn't exist in the pre-iPod era.
  
  You know...don't you?
  
  (Nagios does a great job for me doing the stuff the parent poster talks about; he's as transparently shallow as you suggest.)
4. Re:There is a better way... by Cylix · 2007-04-11 11:58 · Score: 1
  
  Depending on the web app...
  
  Nagios functionality can be easily extended with a custom check script that would interact with all or some of an application's web functions on that host.
  
  It would be a matter of parsing the return material and simply passing a check var.
  
  Yeah, not extremely involved, but I merely posed it as a 'well, yeah, it kinda can' idea. Some of the other features I noticed (i.e., monitoring, get/post, bytes) could be implemented as well with some minor reporting.
  
  With that said, all of these discussions have certainly opened my eyes to a variety of software packages and techniques.
  
  Best /. article I've read in a while.
  
  --
  "You should always go to other people's funerals; otherwise, they won't come to yours." -- Yogi Berra
5. Re:There is a better way... by osbjmg · 2007-04-11 12:39 · Score: 1
  
  I would argue that computer proliferation to the masses means that your average computer user knows LESS "these days" than the average computer user a few years ago. Laptops outsell desktops in the consumer market and the few make it easier for the many. The more people on the network, the easier it gets to find help and guides to do just about anything. More work is done for you and you don't have to be on your own. Think about maybe your grandmother using the emailz on the interweb, it's possible now but probably not likely 10 years ago. The iPod argument is also part of my point - my little sister can use an iPod because it's designed to be easy and intuitive. These days just means more people are using your application or system. I still don't think that means the systems running your application or service have changed that much in terms of what is supposed to be monitored, sure there may be more parts. Is it up? - is still paramount, but obviously I think it's a good idea to get more info. I've used nagios and it's just not for me, but that's cool. I believe these days is just overused.
6. Re:There is a better way... by Door-opening+Fascist · 2007-04-11 14:42 · Score: 1
  
  Since Nagios can execute arbitrary scripts, couldn't you rig up a Perl script using Test::Harness and WWW::Mechanize to parse the web app and catch the return codes off that script?
7. Re:There is a better way... by Mark+Bainter · 2007-04-11 14:59 · Score: 1
  
  I'm not sure what the deal is, but lately I've noticed there seems to be almost a hatred of Nagios coming out of the Hyperic people. I think it's probably fear...
  Anyway, you said... At the risk of getting off-topic, I'm tired of stuff that doesn't quite work. (can't comment on the actual book because I haven't read it) However, I can't see how Nagios can even begin to satisfy the needs of most modern IT operations folks.
  Well, maybe you need to spend some time as an actual modern IT Operations "folk". Perhaps in a company that does serious monitoring rather than making sure their exchange server is pingable. I can assure you it far more than satisfies our needs.
  These days, most people need to know a lot more than whether machine X is up. They need to know which part(s) of their web apps are not functioning correctly.
  And? Are you suggesting that's all Nagios can do? If so, it's quite clear that Nagios is way out of your league. If so, that's fine. Use your whats up/hyperic/whatever and be happy. I'm glad it gives you what you need. But don't criticize what you don't understand. "I don't see how a Ferrari can meet the needs of today's driver. People need more than a car that drives. They need to make it go fast. There's only one car for me, a Kia. With a spoiler and flames on the side."
  They need a lot more intricate detail than is possible with Nagios or SNMP-based monitoring tools. Really, the only monitoring tool that does it for me is Hyperic.
  More intricate detail than is possible....again, you obviously just aren't familiar with its real capabilities. Installing an rpm, starting the service and looking at the web page doesn't qualify as an evaluation. If you like Hyperic, fine. Use it. Extol the things you like about it. Be my guest. But don't try to paint Nagios as something it isn't to make your tool look better.
  
  --
  "No nation could preserve its freedom in the midst of continual warfare."
  --James Madison
8. Re:There is a better way... by vidarh · 2007-04-11 20:43 · Score: 1
  
  These days, most people need to know a lot more than whether machine X is up. They need to know which part(s) of their web apps are not functioning correctly. They need a lot more intricate detail than is possible with Nagios or SNMP-based monitoring tools.
  
  Seriously, if you think that you have no clue what Nagios does/can do. The monitoring in Nagios is 100% based around probes that are not built into Nagios, though a typical Nagios install comes with a huge number of standard probes. The only hard requirement for a probe is that it returns a single line of output giving the status of whatever service or services it checks, though Nagios will also collect any other output from the probe (which can be used to feed into performance monitoring etc.).
  Since writing custom probes is so trivial, covering things like which parts of your web apps are not functioning properly is just a matter of writing a probe (in whatever language you can think of) that spits out OK/WARNING/CRITICAL to standard output together with any additional info, and add a few lines to the config files.
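  
  A minimal sketch of such a probe, assuming a numeric measurement with warning and critical thresholds. Two details beyond the comment above: Nagios takes the state from the plugin's exit code (0, 1, 2, 3) and shows the first output line as status text, and anything after a `|` is treated as performance data. The `queue` metric and the function names are hypothetical.

```python
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify(value, warn, crit):
    """Map a measurement to a state; higher values are worse."""
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


def format_result(name, value, warn, crit):
    """Build (exit_code, output_line) with perfdata after the '|'."""
    code = classify(value, warn, crit)
    line = (
        f"{name.upper()} {LABELS[code]} - {name} is {value}"
        f" | {name}={value};{warn};{crit}"
    )
    return code, line
```

  For example, `format_result("queue", 42, 50, 100)` returns `(0, "QUEUE OK - queue is 42 | queue=42;50;100")`; a real probe would print the line and exit with the code.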
9. Re:There is a better way... by Antique+Geekmeister · 2007-04-12 05:11 · Score: 1
  
  Many people here have contributed to various Nagios plugins. Pretending that the plugins are separate from your sandwich is like pretending that the bread is not part of a hamburger, or like kernels without modules. A few people use them that way, but they're quite rare.
Groundwork OpenSource by 3r33tguy · 2007-04-11 09:16 · Score: 1

Groundwork is a great unification of Nagios and other tools that provides the configuration interface Nagios lacks.

http://www.groundworkopensource.com/products/os-overview.html

There's a VMware appliance available if you want to take it for a quick spin around the block.

http://sourceforge.net/project/showfiles.php?group_id=160654&package_id=222764

--
Choose your future. Choose to sysadmin.
1. Re:Groundwork OpenSource by WebMistress · 2007-04-17 08:15 · Score: 1
  
  The downloadable ISO is built on CentOS and is also a really great way to take it for a spin. Plus, it's a bootable CD.
  
  You can get it here: http://sourceforge.net/project/platformdownload.php?group_id=160654&sel_platform=1491
  
  There are also some great WMI plugins for monitoring windows events:
  
  http://sourceforge.net/project/platformdownload.php?group_id=160654&sel_platform=1493
From 0 to Monitoring and Alerting in 30 minutes by Jick · 2007-04-11 09:21 · Score: 2, Informative

I'm surprised people still use these 'svn co && ./configure && make install && edit config files' systems. You can download Hyperic HQ, install it, and be monitoring your software and hardware in 30 minutes -- no joke. Want alerts when your disks are full? Cake. Want to autodiscover your Apache server? Cake. Want an alert when a process goes haywire? Cake.

And since it has a pluggable framework, you can monitor anything that you want -- network devices, software, hardware, etc.

It's Open Source and has an active community, so if you really long for the days of 'svn co', that's also provided.

Disclaimer: I work for Hyperic ... and it's objectively better.
1. Re:From 0 to Monitoring and Alerting in 30 minutes by Mark+Bainter · 2007-04-11 09:42 · Score: 0, Troll
  
  You have to be kidding. Objectively better? Perhaps you'd care to quantify that?
  
  Maybe you can start with the fact that it runs in Java. Including the agent. Nagios is light years ahead of Hyperic, but this one fact alone is enough to disqualify Hyperic from ever showing up in my production environment. In fact, I might make this a new interview question for disqualifying candidates. "Would you run Hyperic as a monitoring system?"
  
  Anything other than "Hell no!" and the interview is over.
  
  --
  "No nation could preserve its freedom in the midst of continual warfare."
  --James Madison
2. Re:From 0 to Monitoring and Alerting in 30 minutes by Jick · 2007-04-11 10:08 · Score: 2, Insightful
  
  Why is Java bad? This isn't 1996 anymore. Have you ever run HQ? It would be a shame to throw someone out of your interview over that! ;-) If you want to argue about objective features, then point them out.
  
  Look at the installation procedure: Nagios documentation starts out with telling you that you'll need root access, a compiler, libGD, etc. Hyperic HQ comes with an installer that does all the work for you.
  
  Where do the 'light years' come into play? Feature for Feature, Nagios and HQ have a lot of the same features. Are you getting performance management, autodiscovery, correlations between alerts and monitoring data in Nagios? HQ provides all those and all without ever compiling or chmoding anything.
  
  With such a low barrier to try it, it's worth it for people to take the 30 minutes and see for themselves how easy monitoring and managing your infrastructure can be.
3. Re:From 0 to Monitoring and Alerting in 30 minutes by Guider · 2007-04-11 14:08 · Score: 1
  
  We've been running Hyperic (both free and enterprise versions) for quite a few months now, both in-house and at client sites all across the US. We monitor everything from a single, stand-alone Apache server on Linux, to a multi-site network running custom apps/Tomcat/Apache/Oracle/MySQL on Linux/HP-UX/Windows, multiple firewalls, routers and switches.
  
  We've used Nagios. We've used Zabbix. We've used OpenView. We've used Cacti(different class, I know). We've tried countless other monitoring tools/solutions. We USE Hyperic.
  
  Features, to me, are meaningless if it takes a PhD to build/configure/maintain them. One of Hyperic's strengths is its ease of installation and auto-discovery. You literally CAN be up and running in under 30 min...UNDER. The variety of metrics that are available is almost overwhelming if you dig into it, and the power and flexibility of the plugins is dizzying. And as you learn the product you can tighten your install, tweak things, and make them exactly as you want. The inline and online help is very good, and improving constantly. Is it perfect? No, nothing is, but Hyperic is constantly making improvements and additions. The developers are active in the user community, answering questions, taking suggestions, and, I feel, genuinely listening to their users. I've been involved with too many open-source companies that go commercial and become downright abusive of their users and their questions.
  
  Personally, I would hate to work for anybody with such a strong prejudice, so don't look for my resume anytime soon. Yes, some Java code can have issues when people take short-cuts or just write sloppy, improper code, but Hyperic's memory use and CPU utilization are minimal compared to everything else running on these systems.
  
  Folks, give Hyperic a try. I guarantee you'll get a better feel for this product with, say, 1 hour of investment than you will with a week's worth in most others.
4. Re:From 0 to Monitoring and Alerting in 30 minutes by Mark+Bainter · 2007-04-11 14:36 · Score: 1
  
  Why is Java bad? This isn't 1996 anymore. Have you ever run HQ? It would be a shame to throw someone out of your interview over that! ;-) If you want to argue about objective features, then point them out.
  
  Java is bad because it's a huge runtime environment for something as simple as an agent. Linux could probably handle it, but *why*? On windows I would never dream of installing Java + anything else and still expect it to perform, anymore than I would any other two apps on the same server. You're just asking for trouble.
  If I'm going to install an agent, I want it to be small, non-intrusive, have little or no dependencies and be reliable. I don't ascribe any of those things to a java based agent.
  Oh, and you're the one who wanted to talk about objective truth. You presented the positive case, so the onus is on you to defend it. If Hyperic is "objectively better" then start proving it. What does it have that Nagios doesn't? Why is it better? And no, "My grandma can install it" isn't a valid argument.
  Look at the installation procedure: Nagios documentation starts out with telling you that you'll need root access, a compiler, libGD, etc. Hyperic HQ comes with an installer that does all the work for you.
  Well, I guess if your target market is my grandmother that might mean a hill of beans to me. But see, I'm a Unix sysadmin. Root access, compilers, and libraries don't scare me. Not to mention the bogus presentation. Hey, guess what, I can do 'yum --enablerepo=dag install nagios' on my RHEL boxes. Done. "Easy" install procedures don't make a good product.
  Where do the 'light years' come into play? Feature for Feature, Nagios and HQ have a lot of the same features.
  Really? Cause when I talked to your reps at LISA last year it was quite clear they weren't even close. Did you rewrite the whole thing in the last four and a half months? Again, if you're an ignorant user, or a grandma who for some bizarre reason needs a monitoring tool, then I suppose a Java app you can install on your servers and point at your network and have it get a rough approximation of a generic monitoring setup might be useful to you.
  For those of us who have real problems to solve, it isn't.
  With such a low barrier to try it, it's worth it for people to take the 30 minutes and see for themselves how easy monitoring and managing your infrastructure can be.
  Hell, if easy is what you're going for, why not install What's Up? It'll scan your network, set up network maps, and it's a Windows app. It's crap, but hey, it's easy.
  
  --
  "No nation could preserve its freedom in the midst of continual warfare."
  --James Madison
5. Re:From 0 to Monitoring and Alerting in 30 minutes by Mark+Bainter · 2007-04-11 15:11 · Score: 1
  
  I'm glad you like it, and that it works for you. You presented actual positive feedback about a product that did what you needed, which I think is valid. What isn't valid is going onto a web forum and trying to make your product look better by denigrating a product you don't really understand (as the gentleman from Hyperic did). (And yes, I feel perfectly safe saying that. I've been using Nagios since the NetSaint days, and it has never failed me. It works like a charm.) I've tried most other monitoring products out there, including commercial ones, and none of them even comes close to what I can make happen with Nagios. I trust Nagios with my systems, with my career. I can't say that about any other monitoring system. If I have a monitoring need, I know I can apply Nagios to solve it. Oh, and I don't have a PhD. I just know that sometimes, if you want to do things right, it's going to require a little effort on your part. There will always be shortcuts and "easier" ways of doing things if you're willing to sacrifice something. I'm usually not.
  Personally, I would hate to work for anybody with such a strong prejudice, so don't look for my resume anytime soon.
  I don't recall asking for it.
  Yes, some Java code can have issues when people take short-cuts or just write sloppy, improper code, but Hyperic's memory use and CPU utilization are minimal compared to everything else running on these systems.
  Of course...you're running tomcat. ;-)
  Folks, give Hyperic a try. I guarantee you'll get a better feel for this product with, say, 1 hour of investment than you will with a week's worth in most others.
  Or, invest in yourself. Instead of learning to build Sauder furniture with a screwdriver, learn to use a table saw, a router, etc. and build fine furniture.
  
  --
  "No nation could preserve its freedom in the midst of continual warfare."
  --James Madison
6. Re:From 0 to Monitoring and Alerting in 30 minutes by vidarh · 2007-04-11 20:22 · Score: 1
  
  Features, to me, are meaningless if it takes a PhD to build/configure/maintain them.
  Seriously, if you think it's that hard to build something like Nagios you should not be allowed anywhere near any production servers.
7. Re:From 0 to Monitoring and Alerting in 30 minutes by vidarh · 2007-04-11 20:32 · Score: 1
  
  If I'm going to install an agent, I want it to be small, non-intrusive, have little or no dependencies and be reliable. I don't ascribe any of those things to a java based agent.
  
  I'd like to second that... We have Nagios probes written in C, Perl and Ruby so far. Nagios is ugly, but it works, and the fact that the only real requirement for a probe is that it does something and spits out a string that starts with OK/WARNING/CRITICAL to standard out is one of the important features. Setting up monitoring is a pain. Not configuring the monitoring system, but writing all the custom probes needed for full coverage of failure modes for our own apps.
  Being able to pick whichever language works based on whatever people prefer or whatever you already use is a big thing. Being able to write probes with no support code to depend on etc. is a big thing.
8. Re:From 0 to Monitoring and Alerting in 30 minutes by LizardKing · 2007-04-11 21:47 · Score: 1
  
  If your attitude towards Java is anything to go by then I doubt you are in an important decision making position anyway, but if you are, then I definitely wouldn't want to rely on you to look into possible solutions for systems that I develop. Let me guess: you're a PHP guy.
9. Re:From 0 to Monitoring and Alerting in 30 minutes by Mark+Bainter · 2007-04-12 03:08 · Score: 1
  
  If your attitude towards Java is anything to go by then I doubt you are in an important decision making position anyway, but if you are, then I definitely wouldn't want to rely on you to look into possible solutions for systems that I develop. Let me guess: you're a PHP guy.
  Heh. Thankfully, being an adoring fan of Java isn't a requirement for "important decision making positions". I'm not sure where you got the idea that it was.
  That being said, my "attitude" wasn't towards Java, it was towards using Java for the wrong things. I run Java apps. In fact, one of my favorite apps is Zoe which is a Java app. My phone is Java based, and I even learned Java so I could write some stuff for it. My primary issue is with using it as an agent. Secondarily I would have a hard time trusting it for a mission critical piece like Systems Monitoring. That is probably prejudice based on my long experience with Java, and could be overcome if the product was worth it. But in this case, it is not.
  I'm not a "PHP" guy. I know PHP, but I also know (some) ruby/python, shell, perl, C, C++, D, etc. It's not my primary focus however and the language itself is really not the point of this discussion.
  
  --
  "No nation could preserve its freedom in the midst of continual warfare."
  --James Madison
10. Re:From 0 to Monitoring and Alerting in 30 minutes by Emrys · 2007-04-12 03:50 · Score: 2, Interesting
  
  You know, I was reasonably interested in Hyperic and ZenOSS when they were first announced. Competition is good, and though I'm quite happy with what I've been doing with Netsaint and then Nagios (yes, "in the Enterprise"), I was glad to look at them and see what new things they brought to the table.
  
  So far I've been utterly disgusted by the FUD and BS you guys are spewing, and I've lost about all interest in caring what you think you're bringing to the table. I've yet to hear any of you actually do a meaningful technical comparison beyond "uh, Nagios is like, hard, you know?" and "ZOMG 30 minutes, auto-discovery FTW!!!11!". Well, guess what: if you only have 30 minutes to spend configuring your monitoring solution "in the Enterprise", you're pretty well doomed to spend a lot more time than that dealing with false alerts (both positive and negative) and irate users and admins. Knowing you have an apache server on port 8080 of server X is about 2% of the problem. It's a lot more important to know what application sits there and what other services and hosts it depends on so you can implement sane end-to-end monitoring that can do a full test of actual application functionality and if something is broken tell you which part of the tree actually has the problem, not just "oh noes teh port 8080 is down!!" (or better yet, "teh port 8080 is up!! no problemz!!" when the app you actually care about is returning a dead page instead of processing data). So tell me: is all this also "cake" under Hyperic? And if so, how is it "objectively better" done than Nagios does it?
  
  Auto-discovery is a marketing feature, but if that's all some inexperienced admin thinks they need it's not even hard to do with the 80 Nagios helper utilities that do it for you. As for a "pluggable framework", you'd be very hard-pressed to demonstrate anything more flexible than Nagios. Hell, we've been known to use it to monitor business processes and workflow efficiencies. But please do at least try, and stop talking like a marketdroid.
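  
  The "which part of the tree actually has the problem" point can be illustrated with a small sketch. It is not how Nagios implements host parents or service dependencies; it only shows the idea, and the node names and the `depends_on` mapping are hypothetical.

```python
def root_causes(depends_on, failed):
    """Return failed nodes that have no failed dependency beneath them.

    depends_on maps a node to the nodes it needs in order to work;
    failed is the set of nodes whose checks are currently failing.
    """
    def has_failed_dependency(node, seen):
        for dep in depends_on.get(node, ()):
            if dep in seen:
                continue              # already visited, also guards against cycles
            seen.add(dep)
            if dep in failed or has_failed_dependency(dep, seen):
                return True
        return False

    return sorted(n for n in failed if not has_failed_dependency(n, set()))
```

  With a web app that depends on Apache and a database, a failing app plus a failing database reports only the database, so the alert points at the piece that needs fixing rather than at every symptom.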
11. Re:From 0 to Monitoring and Alerting in 30 minutes by Guider · 2007-04-12 06:22 · Score: 1
  
  Let's not take things to extremes, and don't take my comment out of context, as you both have.
  
  Nagios is complicated compared to many other products. The simple fact that some rather large books are available points to that fact. But as others have pointed out, it doesn't have to be that way, as Hyperic shows. If you have two tools that have the same features, but one takes a month to install and the other a week, which do you choose? I don't shy away from a process simply because of complexity, but needless complexity is just a waste of my time. I have a lot more things to do than I will ever have time for, and most sys admins for SMBs would agree (not that large company admins don't, but they tend to have a lot more resources available to them.)
  
  So now if anybody wants to get personal, come on over and we can match skills. Otherwise drop the flamebait and keep on topic.
12. Re:From 0 to Monitoring and Alerting in 30 minutes by Mark+Bainter · 2007-04-12 07:38 · Score: 1
  
  First, I don't think I took your comments out of context or strayed from the topic, but if I'm mistaken feel free to demonstrate where specifically I did that.
  Nagios is complicated compared to many other products. The simple fact that some rather large books are available points to that fact. That doesn't necessarily follow. Are you really going to argue that the size of the books available indicates the complexity of the software in question?
  But as others have pointed out, it doesn't have to be that way, and as Hyperic shows. If you have two tools that have the same features, but one takes a month to install and the other a week, which do you choose? What Hyperic shows is that just like most of the commercial tools, if you make it easy, given a slick presentation, and badmouth the competition you can get some people to buy/use your product. I don't deny that Hyperic does at least part of what it claims. (I don't know the extent of it) I do deny that it is capable of doing everything Nagios can do. I feel safe doing that because they are radically different products and different approaches to the same problem.
  Again, it's the difference between a pre-manufactured desk and a screwdriver, and a pile of wood and a set of power tools. The first is easier, and if all you want is a stock desk and that meets your needs bully for you. But don't look at my finished custom desk that meets every one of my needs that I built by hand with the tools made available and my own skills and equate it with your Sauder desk. Yes, mine took longer to make, but it does what I want.
  I don't shy away from a process simply because of complexity, but needless complexity is just a waste of my time. I have a lot more things to do than I will ever have time for, and most sys admins for SMB's would agree (not that large company admins don't, but they tend to have a lot more resources availablle to them.) Heh. No, not really. Maybe in a university they have more resources. In large companies we still scramble for them. Which is why I want my monitoring tool to be able to do more than these johnny-come-lately systems can. They're more than welcome in the space, but don't try to tell me my monitoring system sucks because it's not as easy to use as the one that can't do what I need. I'm busy too. I likewise have more on my plate than I can possibly ever accomplish. Which is why I spend the time to do monitoring right the first time. Typical sysadmin fashion. You spend more time up front, to spend less time later. Thanks to the way mine is configured it can make decisions about initial corrective actions and take them without paging me. It can handle doing basic initial troubleshooting and determining if it's a problem worth paging about in a given time period. If it is, it can send me data that I need to save me time doing that troubleshooting myself. It can determine whether the failure of a single piece of a system represents a failure of the whole, and evaluate whether alerting is necessary.
  It presents an upper-management-friendly network map (not the tactical view) with real status information. It's tied into systems in other departments so that when they do maintenance (particularly database maintenance) it automatically handles maintenance changes and the requisite application changes. When cold standby failures happen, it handles failing over, verifying the failover, and alerting me appropriately. Most days it's more effective than a team of NOC monkeys at keeping things going when I want to sleep. Once again, use Hyperic if you want, people. Just don't think slandering Nagios makes it a better product.
  
  --
  "No nation could preserve its freedom in the midst of continual warfare."
  --James Madison
Nagios vs SNMPc by Anonymous Coward · 2007-04-11 09:47 · Score: 0

I tried Nagios and I would agree with most that it is clunky. We use Hobbit because it's free (and yes, it's clunky too). I can't program for anything and wish either Castlerock or someone else would make SNMPc for an open source OS. It is superior to any monitoring out there and I'd gladly pay the $5000 for it, but it doesn't run on Linux. Anyone who knows of anything that is comparable, please let me know.

So long, and thanks for all the fish.
1. Re:Nagios vs SNMPc by BigPhatPhuck · 2007-04-11 10:58 · Score: 1
  
  Anyone tried lithium? http://www.lithiumcorp.com/ Looks pretty good on paper.
Re:Good book (this is killing me) by TooMuchToDo · 2007-04-11 10:01 · Score: 1

He's using them to prop a rack up I bet =)
Was fun by post.scriptum · 2007-04-11 12:21 · Score: 1

I had to learn everything from code already working and tweaked the hell out of it. Was actually a fun project for my internship. Sure wish I had a book back then.
Availability /and/ performance monitoring by tarka69 · 2007-04-11 13:25 · Score: 1

Agreed; basically Nagios is a mess, but it's pretty much the standard unfortunately, as it kinda-sorta gets the job done.

My main problem with the current crop of monitoring tools is that they are all either about availability (Nagios, et al) or performance (MRTG, Cacti). Currently I'm using Nagios+Cacti, which kinda-sorta works for me, but it would be nice to have a single coherent interface to my systems. Zenoss also looks interesting, although I haven't tried it yet, but I'd like to hear of any other possibilities.

--
The comfort you demanded is now mandatory - Jello Biafra
1. Re:Availability /and/ performance monitoring by thurgoodj187 · 2007-04-11 14:21 · Score: 2, Interesting
  
  Zenoss has a virtual appliance out on the VMware site, which makes it real easy to test and evaluate! I've got it running (whenever I've got my laptop up).
Hobbit by Anonymous Coward · 2007-04-11 16:24 · Score: 0

Have a look at hobbit. It always seems to be overlooked when comparing various free monitoring systems.
It's really very good.
In the market for an easier to manage OMS? by corrupt2k · 2007-04-12 01:35 · Score: 0

While Nagios is a powerful solution, it is well known that it's a beast to set up/maintain/configure/etc. I have used Nagios in the past, and have recently switched to ZenOSS (http://www.zenoss.com). If you just can't get the hang of Nagios, or feel that it's not user friendly/feature rich enough, give ZenOSS a try. No, I am not a member of the ZenOSS team -- just a very happy user.
Hyperic HQ a godsend by bfe369 · 2007-04-12 03:09 · Score: 0

When Hyperic HQ became open-sourced, our company tried it out, and have been stupendously pleased. We've started eradicating all of our Nagios and Sitescope implementations because HQ is so much easier to drive, and the interfaces are open. The crew at Hyperic is always helpful, even when you're trying to implement something that duplicates the functionality of its pay-fer Enterprise version.

--
-- Brad Felmey
We like fanboys by porkrind · 2007-04-12 08:01 · Score: 1

...or is it fanbois? :)

And yeah, our users are responding. Thanks for noticing.

-John Mark

--
Hyperic Community Manager
1. Re:We like fanboys by Mark+Bainter · 2007-04-12 15:33 · Score: 1
  
  I can understand why. When selling your product amounts to slandering the competition, people with more zeal than sense are your best friend.
  
  --
  "No nation could preserve its freedom in the midst of continual warfare."
  --James Madison
Nobody said Nagios wasn't flexible by porkrind · 2007-04-12 08:11 · Score: 1

of course Nagios is flexible. It's the time to setup and maintenance that costs you.

And as far as "hatred of nagios" I've witnessed that firsthand when I've run BoF's on Nagios, and I've run a few - at LISA and LinuxWorld.

But I love your snarky comments. They r0x0r :)

Oh, and I almost hate to ask, but can you install RPM's on Windows? (har har)

-John Mark

--
Hyperic Community Manager
1. Re:Nobody said Nagios wasn't flexible by Mark+Bainter · 2007-04-12 15:48 · Score: 1
  
  "of course Nagios is flexible. It's the time to setup and maintenance that costs you."
  Ah yes. The old "if it's complex, then it's a waste of time" canard. Interesting, the last company I heard push that line hard was Microsoft against Linux. SFDD: same FUD, different day.
  "And as far as 'hatred of nagios' I've witnessed that firsthand when I've run BoF's on Nagios, and I've run a few - at LISA and LinuxWorld."
  Yup. I've witnessed it too. Much the same as I see it here. Doesn't make it rational. Like I said, there's a general trend that says if I can't push a button and have it be done, it's too hard. Again, I reject that as absurd and flawed on its face.
  When I go looking for a *nix systems admin these days, I go through hundreds of applications, and dozens of interviews. Why? Because most of them aren't worth their salt. They've got nothing to offer. They think they're a Sr Admin because they installed Red Hat once at a former company, and popped in a few "do everything for you" apps and it worked. Our industry is in a sad state.
  "But I love your snarky comments. They r0x0r :)"
  Whatever.
  "Oh, and I almost hate to ask, but can you install RPM's on Windows? (har har)"
  Why would I? Are you suggesting that your product is better cause it runs on Windows? Cause allow me to disabuse you of that notion. Windows is not a good platform for doing monitoring. In the last oh, 3-5 years I've never missed a page. Nagios has never failed me. In the same time period, the Windows guys I work with have moved from monitoring platform to monitoring platform, because they all fail. Why? Not necessarily because of the monitoring platform, but because the OS simply isn't reliable enough.
  Also, just FYI, I was pointing out that the argument that everyone has to build from source has no merit - that was my point.
  
  --
  "No nation could preserve its freedom in the midst of continual warfare."
  --James Madison
2. Re:Nobody said Nagios wasn't flexible by Emrys · 2007-04-13 09:10 · Score: 1
  
  If you're going to respond in this thread, I for one would really like to see you address the questions about technical depth of Hyperic. What we are tired of is the "use Hyperic because Nagios is hard" ad hominems. There's no meat behind them. Even if we grant Nagios is hard (which IMO is baseless, but whatever), if Hyperic can't do what we need, and Nagios can, who cares? There is such a thing as necessary complexity.
  
  So look at http://slashdot.org/comments.pl?sid=230333&cid=18707053 and http://slashdot.org/comments.pl?sid=230333&cid=18703061 and tell us how Hyperic does the kinds of intelligent state handling and end-to-end monitoring described there. And if Hyperic doesn't do or allow for those things, tell us why we should use it anyway.
  
  I'm perfectly willing to believe I'm doing it all wrong now if you can give a rationale for a better way, but so far I don't see anything resembling a rational argument.
By the Way - Props to No Starch Press by porkrind · 2007-04-12 08:15 · Score: 1

Looks like they've come out with another fine book. I've known those guys for a long time... now if they could just publish a book on Hyperic... ;)

--
Hyperic Community Manager
Re:I'm not associated with them, but by xmousex · 2007-04-12 08:29 · Score: 1

I'm just curious, in case someone is bored enough to explain this to me: who modded the posts here as trolls? I'm finding that if you look for it, there are actual trolls here and there on Slashdot who throw in crap, often not on topic, that specifically incites fighting and complication. The two posts I just read above:

Good book
(Score:-1, Troll)
by 2.7182 (819680) on Wednesday April 11, @03:24PM (#18693547)
It tells you a lot of things I don't know other sources for. But my binding cracked after a week, which is a bummer.

I'm not associated with them, but
(Score:-1, Troll)
by zappepcs (820751) on Wednesday April 11, @03:41PM (#18693769)
They seem like a really easy group to work with so far. Much easier than commercial groups we've worked with.... I've got no complaints

These are two very brief, fairly positive observations about the users' experience with the topic posted. There's no fighting, there's no controversy... what makes them labeled as trolls? My request for an explanation of this moderating is 10 times more a troll than these two posts.