Slashdot Mirror


Server Monitoring With Munin And Monit

hausmasta writes "In this article I will describe how to monitor your server with munin and monit. munin produces nifty little graphics about nearly every aspect of your server (load average, memory usage, CPU usage, MySQL throughput, eth0 traffic, etc.) without much configuration, whereas monit checks the availability of services like Apache, MySQL and Postfix and takes the appropriate action, such as a restart, if it finds a service is not behaving as expected. The combination of the two gives you full monitoring: graphics that let you recognize current or upcoming problems (like "We need a bigger server soon, our load average is increasing rapidly."), and a watchdog that ensures the availability of the monitored services."

124 comments

  1. But can I run this on Windows? by Steve_Jobs_HNIC · · Score: 5, Funny

    .... been waiting a while to say that.

    1. Re:But can I run this on Windows? by hackstraw · · Score: 2, Interesting

      Can it run on Windows .... been waiting a while to say that.

      Dunno. Don't care either, but it might. It's based on rrdtool, which does run on Windows. I don't know if this article is a slashvertisement or just devoid of information. I've linked to rrdtool, and here is the munin homepage.

      There are _tons_ of these things running around. In my opinion, rrdtool is one of the best tools to come to computing in a long time. It's awesome. Other packages that use rrdtool include cricket, ganglia, and many others. I believe the rrdtool site has a listing of some of these.

      For those not familiar with it, rrdtool is a database designed for time-series data. It's kinda like a smart FIFO that loses detail the further back in time you go, because it stores averages over progressively longer intervals. I have rolled my own monitoring with rrdtool and perl to track CPU, load, temperatures, you name it. One of the cool things about rrdtool is that the database is fixed in size. rrdtool is not easy to set up and work with at first, but the effort is definitely worth it.
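To make the "smart FIFO" idea concrete, here is a toy model in Python. It is only a sketch of the round-robin concept under stated assumptions, not rrdtool's file format or API: the class name `RoundRobinArchive` and its `rows`/`steps` parameters are hypothetical, and averaging is assumed as the consolidation function.

```python
from collections import deque


class RoundRobinArchive:
    """Keep at most `rows` consolidated values; older ones fall off the end.

    Every `steps` raw samples are averaged into one row, so detail is lost
    as data ages while the storage used never grows.
    """

    def __init__(self, rows, steps):
        self.rows = deque(maxlen=rows)  # bounded: the oldest row is discarded
        self.steps = steps
        self._pending = []              # raw samples not yet consolidated

    def update(self, value):
        self._pending.append(value)
        if len(self._pending) == self.steps:
            self.rows.append(sum(self._pending) / self.steps)  # average
            self._pending.clear()


# The same samples fed to a fine-grained and a coarse archive.
recent = RoundRobinArchive(rows=4, steps=1)
history = RoundRobinArchive(rows=4, steps=3)
for sample in range(1, 10):
    recent.update(sample)
    history.update(sample)

print(list(recent.rows))   # [6.0, 7.0, 8.0, 9.0]
print(list(history.rows))  # [2.0, 5.0, 8.0]
```

Each archive holds at most `rows` values however long it runs; the coarse one trades resolution for a longer time span, which is the trade-off that keeps the database a fixed size.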

      Basically, if you're a sysadmin in 2006 and you don't have rrdtool-based monitoring going, well, maybe the job is not for you. It's that important and that good. A single click on a link to a webpage with an rrdtool graph can demonstrate to even the pointiest-haired PHB that you need more equipment, that a trend is under way, or whatever.

      This is the kind of stuff I would like to see more talked about here on slashdot.

    2. Re:But can I run this on Windows? by Vancorps · · Score: 2, Interesting
      I'll be setting up the Linux tools on the DB servers; I have to find out whether they work all right with Oracle.

      As for the Windows servers, the monitoring is nothing new. Microsoft Operations Manager (MOM) has been around for 6 years now and is exceedingly friendly to both set up and use. It also works with all servers and workstations, flagging alerts like low disk space or high CPU utilization so you can see if some new virus is coming at you. They even have agents for Linux and OS X.

      I'll have to check out rrdtool, though; it's new to me. Most of the Linux boxes I have in production do only one task each, and there aren't that many servers (20 in total that I manage), so it's fairly easy to check availability and go over the logs quickly by hand. Time is always against me, but now that it's summer I should have time to get my house in order.
    3. Re:But can I run this on Windows? by Anonymous Coward · · Score: 0

      I've been using munin ever since I read some very similar articles on it at the debian-administration site. There's also an article there on monit:

      Monitoring systems with munin

      Monitoring windows systems with munin and snmp

      Monitor Debian servers with monit

    4. Re:But can I run this on Windows? by Anonymous Coward · · Score: 0

      I think you're a tad off on the age of MOM. The "2000" in the name of the version 1.0 product was not the release year (it was released around June 2001); it was there to tie the product in with Windows 2000.

    5. Re:But can I run this on Windows? by wobblie · · Score: 1

      munin is quite flexible - if you can write a shell script or perl script to spit out the data you want tracked, munin can graph it. It's that simple.
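As a rough illustration of "spit out the data and munin graphs it", here is a hypothetical plugin written in Python rather than shell or perl. It assumes the usual munin-node convention: when invoked with the argument `config` the plugin prints graph metadata, and otherwise it prints `<field>.value` lines. The field name `load` and the warning/critical values of 10 and 20 are made-up examples; check the munin plugin documentation for your version.

```python
#!/usr/bin/env python3
"""Hypothetical munin plugin graphing the 1-minute load average."""
import os
import sys


def config():
    # Printed when munin-node runs the plugin with the "config" argument.
    return "\n".join([
        "graph_title Load average (example)",
        "graph_vlabel load",
        "graph_category system",
        "load.label load",
        "load.warning 10",
        "load.critical 20",
    ])


def fetch(load=None):
    # Printed on a normal run: one "<field>.value <number>" line per field.
    if load is None:
        load = os.getloadavg()[0]  # Unix only
    return f"load.value {load:.2f}"


if __name__ == "__main__":
    if len(sys.argv) > 1 and sys.argv[1] == "config":
        print(config())
    else:
        print(fetch())
```

To try it, the file would be made executable and placed (or symlinked) in munin-node's plugin directory (on Debian that is typically /etc/munin/plugins), then munin-node restarted.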

      As far as Windows is concerned, as long as you have perl and the right perl modules installed (mainly Net::Server), it should work. The problem there would be getting a script (perl or cmd.exe) to spit out the data you want to track.

    6. Re:But can I run this on Windows? by Vancorps · · Score: 1

      Flexibility is great; it's definitely a viable solution for the Linux boxes in my world. For Windows I'll just use MOM, and if the built-in reporting isn't enough, it's all stored in a SQL backend, so it's easy to make your own graphs using Excel.

  2. RTFA! by Elf_h34d3r · · Score: 0

    ...been waiting a while to say that. FTFA: The easiest way to follow this tutorial is to use a command line client/SSH client (like PuTTY for Windows)

    1. Re:RTFA! by remembertomorrow · · Score: 2, Informative

      He was simply playing on the "But does it run on Linux?" post that appears in tons of threads. He doesn't need to RTFA. :)

      --
      Registered Linux user #421033
    2. Re:RTFA! by Anonymous Coward · · Score: 0

      Do you even know what an SSH client is?

    3. Re:RTFA! by Tim+C · · Score: 1

      If you need to use PuTTY, then the answer is almost certainly "no, it doesn't".

      (Oh, and as for your sig - in binary, 10+10 = 100, not 1000.)

    4. Re:RTFA! by ccarson · · Score: 1

      In all seriousness: I'm a network admin who could really use a munin-type analysis program that's Windows-based. Does anyone have any suggestions? Munin seems like a great program, but right now I need to get data on some of my Windows servers.

  3. swatch? by haluness · · Score: 1

    Doesn't swatch already do the job of monit? It works very nicely for me, watching servers as well as processes that generate log files.

    1. Re:swatch? by Whanana · · Score: 2, Informative

      This sounds a lot like Nagios. From TFA I couldn't see anything Munin and Monit would do that you can't do on Nagios with a few plugins. Just a plug - Nagios is beautiful, it makes nice graphical representations of load, hits, throughput, and about anything else you can think of.

    2. Re:swatch? by The+Barking+Dog · · Score: 1

      Swatch monitors logfiles - monit can do that and so much more. It can connect to sockets or ports and test that the services are running. It can access a webpage and test for the presence of a string. It can checksum a file and take action if it changes. It can monitor the size of a file. It can take action based on memory usage or load average. You can configure it to take action if a test fails x times out of y (to account for false positives). I work for a small company where I'm the only admin and basically on call all the time. I can't imagine life without monit.
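The "fails x times out of y" rule is what keeps a single dropped packet from triggering a restart. A minimal sketch of that counting logic follows; it is not monit's implementation (monit expresses the rule in its own control-file syntax), and `FailureWindow` is a hypothetical name.

```python
from collections import deque


class FailureWindow:
    """Fire only when at least `x` of the last `y` checks failed."""

    def __init__(self, x, y):
        self.x = x
        self.results = deque(maxlen=y)  # True means the check failed

    def record(self, failed):
        self.results.append(failed)
        return sum(self.results) >= self.x  # True: take action now


w = FailureWindow(x=3, y=5)
outcomes = [w.record(f) for f in [True, False, True, False, True]]
print(outcomes)  # [False, False, False, False, True]
```

Isolated failures age out of the window, so only a sustained problem reaches the threshold.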

  4. Cacti by mtenhagen · · Score: 4, Insightful

    How is this different from cacti?

    --
    200GB/2TB $7.95 Coupon: SAVE90DOLLAR
    1. Re:Cacti by isolationism · · Score: 3, Informative

      Munin isn't at all different from Cacti, really, except that Cacti is 100% web based and perhaps a bit more mature (I use Cacti and like it a lot more than at least 4-5 other similar products out there). Cacti won't do service-testing though; maybe this is a good walkthrough for people who just want something up and running in 15 minutes (I wouldn't know, I'm not inclined to read the whole thing since a cursory glance shows there's nothing here that I don't have a running alternative for already).

    2. Re:Cacti by carlosGames · · Score: 0

      I think the most important difference is that Cacti looks more mature, which matters if you are going to use it to monitor a production server.

    3. Re:Cacti by perbu · · Score: 1

      Munin has close integration with Nagios. You can set thresholds in munin and connect it to Nagios, and Nagios will alert you if the thresholds are broken. I have no idea how this is done with Cacti, but it's dead simple with Munin.
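The threshold comparison itself is simple. The sketch below maps a sample onto the four standard Nagios plugin states (exit codes 0 to 3). It models only the comparison, not how Munin hands results to Nagios, and it assumes upper-bound thresholds where a value strictly above the limit trips the state; `classify` is a hypothetical helper.

```python
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3  # Nagios plugin exit codes


def classify(value, warning, critical):
    """Map a sampled value to a Nagios state; limits are upper bounds."""
    if value is None:
        return UNKNOWN  # no data collected
    if value > critical:
        return CRITICAL
    if value > warning:
        return WARNING
    return OK


# Example: a 1-minute load average checked against hypothetical limits.
print(classify(12.0, warning=10, critical=20))  # prints 1 (WARNING)
```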

    4. Re:Cacti by OverlordQ · · Score: 1

      Munin doesn't have remote command execution problems like cacti did/does?

      --
      Your hair look like poop, Bob! - Wanker.
    5. Re:Cacti by Cigamit · · Score: 1

      There is a Threshold Plugin for Cacti, it requires the Plugin Architecture, which is set to be rewritten and integrated in the next major release of Cacti. I think it works extremely well, but I may be biased considering I helped write it.

    6. Re:Cacti by Danny+Rathjens · · Score: 1
      I'm not inclined to read the whole thing since a cursory glance shows there's nothing here that I don't have a running alternative for already

      Same here; sysmon and my own scripts grabbing stats to plug into mrtg graphs already do all of that for me. :) There are many variations on the theme. It's not a difficult problem area, so it has a low barrier to entry. ;)

    7. Re:Cacti by Tom · · Score: 1

      It isn't a frontend to rrdtool. IOW, it's a different application for a similar purpose, but it ain't the same.

      Also, I very much enjoyed the fact that on a single machine you have it up and running in 5 min. tops.

      --
      Assorted stuff I do sometimes: Lemuria.org
  5. Naggios by Anonymous Coward · · Score: 0

    What advantage does this have over Naggios?

  6. RMTTFFL by Steve_Jobs_HNIC · · Score: 1

    RMTTFFL (Read More Than The First Fucking Line) ....never used that one.

    Server Monitoring on Windows != "follow this tutorial is to use a command line client/SSH client (like PuTTY for Windows)"

    having said that, a good, free, open source server monitoring solution (including Windows Servers) is MRTG.

  7. /. effect by weird7192 · · Score: 1, Redundant
    "We need a bigger server soon, our load average is increasing rapidly."

    The dude will definitely need a bigger server now that every Slashdot geek is rushing to view his website.

    1. Re:/. effect by Will2k_is_here · · Score: 1

      "We need a bigger server soon, our load average is increasing rapidly."

      Heh, it reads like an IT ad. Here's the next line...

      But thanks to the miracle of Monit and Munin, we've managed to keep our server...alive!

  8. Insignificanct in the trails of NAGIOS? by pl1ght · · Score: 2, Interesting

    I'm not sure I follow why this is newsworthy. NAGIOS is OSS and is an extremely mature product with a community writing modules and plugins etc etc, to monitor any aspect you wanted of your Servers/Routers/Networks/room temperatures, I mean anything. Why would anyone bother?

    1. Re:Insignificanct in the trails of NAGIOS? by Anonymous Coward · · Score: 0

      Windows does everything anyone needs...

      Word Processing, Database, Web Server, Web Browsing, Games, Active Directory...

      Why would anyone bother?

    2. Re:Insignificanct in the trails of NAGIOS? by Anonymous Coward · · Score: 0

      Nagios is a huge ugly bloated piece of shit that requires sendmail.cf-like prowess to configure and get working decently.
      No standard graphing utilities, no standard plugin architecture, no standard tcp port string checking, pain in the ass to migrate from/to...

    3. Re:Insignificanct in the trails of NAGIOS? by Antique+Geekmeister · · Score: 1

      Nagios is fairly CPU intensive, and the client plug-ins to report on system load and other local characteristics are not well integrated, since they basically date back to NetSaint and a lot of legacy oddness that could stand a complete rewrite. A lighter-weight monitoring tool would be good, or a rebuild of Nagios, especially if most of the worst-built Nagios plug-ins were thrown out due to the extremely poor quality of the Perl or shell code involved.

      But there is no hint that this particular set of monitoring and system management tools have anything better than Nagios or the dozens of other monitoring tools also available. Nagios, for example, is a bit complex because it has a *lot* of features that you may not want in your first 15 minutes of monitoring but may really need in your next week of operation. MRTG is the same way.

    4. Re:Insignificanct in the trails of NAGIOS? by clydemaxwell · · Score: 1

      I am sorry your experiences were poor but I set up nagios for a medium-sized datacentre in a few hours with zero hassle. I'd never used it before, but the config files were easy to read and understand. As for plugins, it will run any linux util you point it at. How's that for plugins?
      it's basically just a cron job of linux checking commands with a web interface. How is it bloated?
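Read that way, the core of such a system is running a command and interpreting its exit status with the Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) plus the first line of output. The sketch below does just that; scheduling, retries and the web interface are left out, and the check names and one-liner "plugins" are hypothetical stand-ins.

```python
import subprocess
import sys

STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def run_check(command, timeout=10):
    """Run one check command and translate its exit status to a state name."""
    try:
        proc = subprocess.run(command, capture_output=True, text=True,
                              timeout=timeout)
    except (OSError, subprocess.TimeoutExpired):
        return "UNKNOWN", ""  # could not run it, or it hung
    # Plugins print a one-line status; exit codes outside 0-3 count as UNKNOWN.
    first_line = proc.stdout.splitlines()[0] if proc.stdout else ""
    return STATES.get(proc.returncode, "UNKNOWN"), first_line


# Hypothetical "plugins": Python one-liners that exit with a fixed code.
checks = {
    "disk": [sys.executable, "-c", "print('DISK OK'); raise SystemExit(0)"],
    "mail": [sys.executable, "-c", "print('SMTP CRITICAL'); raise SystemExit(2)"],
}
results = {name: run_check(cmd) for name, cmd in checks.items()}
print(results)
```

In practice you would call `run_check` from a loop or from cron and store the results; real Nagios adds dependencies, notification logic and much more on top.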

      --
      Browsing with classic discussion, noscript, at -1 and nested
      no hidden comments and I only mod UP
    5. Re:Insignificanct in the trails of NAGIOS? by SillyNickName4me · · Score: 1

      I'm not sure I follow why this is newsworthy.

      I guess because it is another option? One that few people know about so far, so for most it counts as 'news'...

      NAGIOS is OSS and is an extremely mature product with a community writing modules and plugins etc etc, to monitor any aspect you wanted of your Servers/Routers/Networks/room temperatures

      Same for Zabbix...

      I mean anything. Why would anyone bother?

      1. because people want to do things differently, there isn't a single best solution for many problems.

      2. because having a monoculture is bad. Choice is only possible when there is more than one option.

      And some more reasons..

    6. Re:Insignificanct in the trails of NAGIOS? by arivanov · · Score: 1

      I would agree with that. And this is exactly the reason why I still use mon. It provides most of the functionality you need for a small-to-medium network. I have been using it on anything from a single server to 50-60 systems. Its CPU requirements are minimal, its configurability and flexibility are similar to those provided by NagIOS (if not better on some counts), and writing extensions is trivial. Most importantly, the monitoring itself is just a shell around a set of very well written perl modules. The code in them can be reused for all kinds of other server monitoring, statistics and control.

      --
      Baker's Law: Misery no longer loves company. Nowadays it insists on it
      http://www.sigsegv.cx/
    7. Re:Insignificanct in the trails of NAGIOS? by secolactico · · Score: 1

      Maybe he's thinking of Netsaint, Nagios' predecessor. As far as I remember, Netsaint didn't support templates, and its config file (a single file) was a bit of a bear to read and edit without a tool to manipulate it.

      Nagios, on the other hand, is a snap to configure and maintain. And the config files' syntax is extremely easy to read and interpret. And that includes dependencies, custom-made plugins and notification commands.

      --
      No sig
    8. Re:Insignificanct in the trails of NAGIOS? by Nanoda · · Score: 1

      He might be, but my experience with Nagios a few months back was very similar to his description. In a couple of hours over a week or so, I was hard pressed to get it to stay running, and was unable to get it to report any information at all to me. Part of the problem may be that it's built to support multinational corporations, and I wanted it to run in an 8 person office monitoring three machines.

      I'm really hoping this thread gives me some other options to look at on Monday... it sure seems like it has already. :)

    9. Re:Insignificanct in the trails of NAGIOS? by Linker3000 · · Score: 1

      I daresay Nagios is OTT for your app, as you might as well have someone walk round and check the machines! Nevertheless, have another go with it, as once you get over the steep part of the learning curve it gets better.

      My installation is monitoring 30 sites; that's 30 ADSL routers, 15 Win2K servers and 5 Linux boxes. Once the basics are in place (which means getting to grips with the interactions between the various config files), things get easier.

      --
      AT&ROFLMAO
    10. Re:Insignificanct in the trails of NAGIOS? by ADRA · · Score: 1

      Same experiences. Nagios does a great job of reporting service availability with very accurate numbers. Think it's too CPU intensive? Turn down the service check rate.

      I had it running against 30 machines, around 300 service checks, with performance numbers saved. Around half the systems had on-system agents for CPU/memory/disk/etc. Of course it takes some CPU on the host system to support that. Mind you, the system held up on a P5 PC; for that many services to survive on a little P5, I thought it was pretty decent.

      --
      Bye!
    11. Re:Insignificanct in the trails of NAGIOS? by molarmass192 · · Score: 1

      Ummmm ... for server security, performance, flexibility, ease of administration, lack of license restrictions, and cost? Linux does everything we need on the server, why even bother looking at anything else?

      --

      Good people do not need laws to tell them to act responsibly, while bad people will find a way around the laws-Plato
    12. Re:Insignificanct in the trails of NAGIOS? by Stinking+Pig · · Score: 3, Interesting

      because in software-land, "mature" is rapidly followed by "obsolete." I love Nagios, but I'm hesitant to recommend it to anyone who's not comfortable spending a week on building and configuring software.

      Packages for it are often broken or from the old 1.3 tree, which makes for confusion when following examples that use 2.0 syntax.

      Configuration is extremely challenging to start from scratch with, especially if you want to do anything custom.

      There are a number of external dependencies, particularly if you want to compile the plugins.

      That said, Nagios still whips the pants off quite a few commercial monitoring products I've evaluated.

      --
      "Nothing was broken, and it's been fixed." -- Jon Carroll
    13. Re:Insignificanct in the trails of NAGIOS? by Anonymous Coward · · Score: 0

      Try using Oreon as a front-end to nagios. I got tired of cut-and-paste to add new hosts. Oreon has a duplicate function that helps adding new hosts.

      http://www.oreon-project.org/screenshots-oreon-en.html

    14. Re:Insignificanct in the trails of NAGIOS? by pclminion · · Score: 1

      Yep. All coders, smash your keyboards! Everything that can be invented already has been.

  9. Wrong idea by nurb432 · · Score: 1

    I don't think he meant the *client* side. He wants to monitor Windows servers. Or at least that's what I would be asking in that situation.

    --
    ---- Booth was a patriot ----
  10. Hobbit by Anonymous Coward · · Score: 2, Informative

    Don't forget about the big brother clone, hobbit.

    SF.net at: http://hobbitmon.sourceforge.net/
    Live example at: http://www.hswn.dk/hobbit/

  11. Automatic restarts are bad by Erik+Hensema · · Score: 5, Insightful
    • A restart usually kills hanging processes, making the actual cause of the hang impossible to determine afterwards.
    • Automatic restarts make some admins lazy. Instead of debugging the problem, they accept apache/whatever service is restarted once a day.

    However, making graphs and monitoring your services is a very good thing. Graphs are invaluable in determining trends, such as memory leaks or steadily increasing load. Monitoring saves lots of downtime and unhappy customers ;-)
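Trend spotting from a graph can also be done numerically. Below is a minimal sketch that fits a least-squares straight line to daily samples and estimates how many days remain before a limit is reached. It is an illustration only, not something Munin, rrdtool or moodss is claimed to do, and the sample data and the limit of 8.0 are made up.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def days_until(xs, ys, limit):
    """Days after the last sample until the fitted line reaches `limit`, or None."""
    slope, intercept = linear_fit(xs, ys)
    if slope <= 0:
        return None  # flat or falling trend: no crossing ahead
    crossing = (limit - intercept) / slope
    return max(0.0, crossing - xs[-1])


# Hypothetical daily peak load averages over one week.
days = [0, 1, 2, 3, 4, 5, 6]
load = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
print(days_until(days, load, limit=8.0))  # prints 8.0
```

A straight line is a crude model (real growth is often seasonal or accelerating), so treat the result as a prompt to look at the graph rather than a forecast.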

    Personally I use nagios for monitoring and DIY scripts for graphing. The latter mostly because I started making graphs before decent off-the-shelf software was available ;-)

    PS. what's this subject got to do with debian?

    --

    This is your sig. There are thousands more, but this one is yours.

    1. Re:Automatic restarts are bad by Jeff+DeMaagd · · Score: 3, Insightful

      Point taken, but I think an automatic restart is necessary to minimize intrusions into off-work time for maintenance and such. If the service hangs and there's no one there to tend to it, it will stay hung until someone notices. That is not good if you want to keep going and not lose potential business while the site is down.

      Anyway, I'm glad I'm not a server admin. I'd like to live my private life NOT being on-call.

    2. Re:Automatic restarts are bad by Burv · · Score: 2, Insightful
      Good points. However, there's something to be said for automating things to increase uptime and lessen the load on the sysadmin, especially if it's critical that the service be available and you always go through the same checks anyway (e.g. check /var/adm/messages, look at the process table, load, etc.). There's also a tradeoff between learning exactly what caused the problem and getting back up quickly when every minute your server is down your company is, or could be, losing money, as at someplace like eBay.

      Oh, and I think these packages are installed as part of debian, either by default or optionally. That's why the article mentioned apt-get.

    3. Re:Automatic restarts are bad by Anonymous Coward · · Score: 0

      > However, making graphs and monitoring your services is a very good thing. Graphs are invaluable in determining trends, such as memory leaks or steadily increasing load. Monitoring saves lots of downtime and unhappy customers ;-)

      moodss can do it all, even predict the future, using sophisticated statistical methods and artificial neural networks, and can therefore be used for capacity planning.

    4. Re:Automatic restarts are bad by Postmaster+General · · Score: 1

      I have seen better how-to's. This one only tells a user how to install and configure a package in Debian Sarge and then enable basic password authentication for an application's web interface. Anyone can read the INSTALL and README files in a package and get just as much info out of them as out of this how-to.

      In fact, that "how-to" should probably be called "How-To: Install and Configure Web-based Applications in Debian Sarge and Enabling Basic Password Authentication"

    5. Re:Automatic restarts are bad by StormReaver · · Score: 1

      "PS. what's this subject got to do with debian?"

      The article is presented from the perspective of a Debian admin.

    6. Re:Automatic restarts are bad by marcosdumay · · Score: 1

      If anything that is optional on Debian received that icon, /. would put the icon on the static template of the page...

    7. Re:Automatic restarts are bad by TeamSPAM · · Score: 1

      Sometimes you know the cause of the problem and sometimes you don't. When the shit hits the fan, you shoot first and ask questions later. Getting the system running takes priority over figuring out why it happened. Once it's running, you figure out what caused it as best you can and try to take steps to prevent it from happening again. This may not be the best approach, but it aligns with the goal of maintaining/improving uptime that most operations groups are given. I should know; I lived it in the dot-com era.

      --
      Brought to you by Team SPAM! where we believe: "Information in the noise!"
  12. Restarting services... by fimbulvetr · · Score: 2, Insightful

    It always bothers me when people use utilities to restart services that die/have been killed. Shouldn't a daemon be designed to run indefinitely? Doesn't the fact that a process died mean that something is wrong and needs to be fixed? For instance, if my apache daemon dies because the logfile is larger than it can handle, what good is restarting it going to do? It's just going to beat the crap out of a server - process dies - watcher daemon starts it up - process dies...etc.
    Or, if the OOM killer kills my ftp server because it's hogging the memory, doesn't that mean I have bigger problems than just doing a restart (I need more memory, the ftp server has a mem leak, etc.)?

    None of my hundreds of critical daemons die for no reason whatsoever; all of them require some type of human interaction if they have died. It doesn't happen very often, maybe once every several months.

    Not that I care about this software in general, I use hobbit for my trending/graphing/service availability, but I hate to see bad admin'ing, even if I'm not involved.

    1. Re:Restarting services... by Anonymous Coward · · Score: 0

      You're right in general, but when you have a large system with hundreds of hosts and software written by lots of people, the reality is that you need to have auto-restarts to maintain a running system. Yes, it often means a bug in the software when a service needs to be restarted, but you cannot stop operations and fix all the bugs right away, and you still need the service to function until it is fixed.

    2. Re:Restarting services... by mtenhagen · · Score: 1

      For some (maybe even most) servers the admin isn't available 24/7.
      Some issues, like memory leaks or other bugs, can't be solved by the admin in a short period of time.

      In an attempt to keep the services available as much as possible, an automated restart can be helpful. Of course, the cause of the event should still be found and resolved.

      --
      200GB/2TB $7.95 Coupon: SAVE90DOLLAR
    3. Re:Restarting services... by NevarMore · · Score: 3, Interesting

      Egads! My education is useful!

      We're discussing such issues in a class I'm taking on software fault tolerance. In discussing selective restarts and backup processes, Apache is frequently cited as an example of how software should fail gracefully, consistently, and then handle that failure itself. The lecture slides can be found here: http://wwwse.inf.tu-dresden.de/index.php?language=English&site=courses&course=ss06vl02

      Apache has some memory leaks in it. It is not bad, it happens, especially in a piece of software like that which is expected to run constantly and NEVER fail. So what the Apache software does is every so often, or when it detects that its memory usage is getting out of hand, it fires up a second copy of itself and then kills itself letting the new not-yet-leaky copy take over.

      So to you (IT/admin) that daemon may run forever, but that's because my people (CS/developer) did our jobs (for once) and ensured that the application cleaned up its own messes.
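The recycling idea can be modelled without real processes. In Apache the relevant knob is the `MaxRequestsPerChild` directive (renamed `MaxConnectionsPerChild` in Apache 2.4), which retires a child process after a set number of requests. The simulation below uses hypothetical `LeakyWorker` and `Pool` classes with a fake per-request leak to show why recycling bounds the damage of a leak without fixing it.

```python
class LeakyWorker:
    """Stand-in for a child process whose memory grows with every request."""

    def __init__(self, leak_per_request):
        self.memory = 0
        self.handled = 0
        self.leak = leak_per_request

    def handle(self):
        self.handled += 1
        self.memory += self.leak  # simulated leak, never freed


class Pool:
    """Replace the worker after `max_requests` requests, so leaks stay bounded."""

    def __init__(self, max_requests, leak_per_request):
        self.max_requests = max_requests
        self.leak = leak_per_request
        self.worker = LeakyWorker(leak_per_request)
        self.recycled = 0

    def serve(self):
        self.worker.handle()
        if self.worker.handled >= self.max_requests:
            # Retire the leaky worker and start a fresh one.
            self.worker = LeakyWorker(self.leak)
            self.recycled += 1


pool = Pool(max_requests=100, leak_per_request=1)
for _ in range(1050):
    pool.serve()
print(pool.recycled, pool.worker.memory)  # prints 10 50
```

No single worker ever accumulates more than `max_requests * leak_per_request` units of leaked memory, which caps the leak but does not remove it.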

    4. Re:Restarting services... by Artifakt · · Score: 1

      A:
      In many, but not all cases, Yes.
      Yes. (ignoring intentional termination, as answer 1).
      None whatsoever, unless you want to run the process 1 more time and get details on the bug to be sure it's the logfile size that crashed it.
      Yes, but you might want to restart the FTP server to see just who is placing demands on it, and particularly if they are an authorized user.

      I'm not trying to be flip here by answering what are obviously rhetorical questions, but I want to make a point: you probably answer those questions without all the qualifications I just used, and that, along with the rest of your comment, suggests you are thinking only in terms of a two-valued logical system: autorespawn EVERYthing or manually admin ALL processes.

      There are at least three whole categories of solutions that are feasible, depending on the process, and these have lots of subcategories:
      1. respawn automatically, if the admin is confident this won't lead to problems such as the one you describe, particularly the combination of recursive respawn and resource drain. Note: this is not the same as respawning automatically only if it will lead to no problems at all; some problems are small enough to bear with until time x (for example, until the weekend's over).
      2. the admin does the work him/herself, under (I hope) intelligent control. This is best for either:
         a. problems such as the ones you described, or
         b. lesser problems that may thus get fixed (i.e. if the admin files a bug report).
      3. automatic respawn runs under controlled circumstances only, such as:
         a. notify the admin always.
         b. respawn X number of times only.
         c. log processes at X (increasing) level of verbosity.
         d. combinations of the above, e.g. respawn no more than 6x in 24 hrs; stop respawning if system load exceeds 75%, the number of logged-in users exceeds 120, or free memory falls below 1 GB; e-mail the admin after the second respawn in the same period; e-mail again if any of the limiting conditions cause the respawn process to quit.
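Option 3d can be written down as a small decision function. The sketch below hard-codes the example limits from that item (6 respawns per 24 hours, 75% load, 120 users, 1 GB free memory, mail after the second respawn); the `Conditions` fields and `respawn_decision` are hypothetical names, and gathering the measurements and sending the mail are left out.

```python
from dataclasses import dataclass


@dataclass
class Conditions:
    restarts_last_24h: int  # respawns already done in the current period
    load_percent: float
    logged_in_users: int
    free_memory_gb: float


def respawn_decision(c):
    """Return (respawn, notify_admin) using the example limits from option 3d."""
    blocked = (
        c.restarts_last_24h >= 6  # no more than 6 respawns in 24 hours
        or c.load_percent > 75
        or c.logged_in_users > 120
        or c.free_memory_gb < 1
    )
    if blocked:
        # Stop respawning and mail the admin that a limit was hit.
        return False, True
    # Respawn; mail the admin if this is the second (or later) respawn.
    return True, c.restarts_last_24h >= 1


print(respawn_decision(Conditions(1, 30.0, 40, 2.5)))  # prints (True, True)
```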

      It's only once you work out which options you need as an admin that you can decide whether Munin (for example) can help you, and this often ends up being influenced more by real-world questions, such as whether the boss even wants you to cancel taking the kid to Six Flags over Foo just to support the guys in shipping and receiving who are the only people there on Saturdays, than by the desire to see that every process runs properly and all bad code gets fixed.

      --
      Who is John Cabal?
    5. Re:Restarting services... by Rudolf · · Score: 1

      Apache has some memory leaks in it. .... thats because my people (CS/developer) did our jobs (for once) and ensured that the application cleaned up its own messes.

      Okay, so you programmed the app to clean up its mess. Isn't the job of CS/developer to make sure the application doesn't make a mess? Wouldn't it be better to just fix the memory leak rather than not?

    6. Re:Restarting services... by gronofer · · Score: 1
      Shouldn't a daemon be designed to run indefinitely?

      Yeah, they should. But in the real world nothing is perfect, and sometimes I'd like to run a daemon even if I know it has a few bugs. User error can be a problem too, e.g. accidentally killing an sshd on a distant server and not being able to reconnect to fix it.

      The builtin solution that Unix/Linux provides, init, supposedly does the job, but it's not very convenient.

    7. Re:Restarting services... by anon101 · · Score: 1

      Ah, but this way it can defend itself against unknown/undiscovered bugs. If it monitors memory usage and takes action for ANY memory leak, then it doesn't matter where the leak is. Of course it should be tracked down and fixed, but this is a pretty good short-term fix until the memory leak is patched. Although it's not fixing the problem (the initial memory leak), it is reducing the impact of the problem.

      The ideal solution would be not to have any memory leaks in the code to start with. But if you can manage to write all your programs without any, then you should probably stop writing "hello world" programs in 100 different languages and write a real application ;)

      Of course there are going to be bugs, but we need to reduce how damaging they are. I wonder if it logs these problems so they can be reported to developers? But then you would need debugging compiled in, which production machines often don't have.

    8. Re:Restarting services... by Anonymous Coward · · Score: 0

      That depends... what if it's not Apache that has the memory leak, but the mod_php module, or the app that it loads and cannot unload due to shoddy programming by a non-Apache developer?

    9. Re:Restarting services... by Proteus · · Score: 1
      Doesn't the fact that a process died mean that something is wrong and needs to be fixed?

      Yep. It also means that the services the process was providing are not available to my customers. Like most things, you have to weigh the tradeoffs before deciding to roll out a watchdog.

      Ideally, you'd set up a watchdog to do something like:
      1. Note problem with service
      2. Restart the service, saving off logs to a problem record
      3. Send an e-mail to the admin, attach the logs (or point to them)
      4. If it's restarting too often (n times in x minutes), leave service down and open an incident/page the admin/etc.
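      Step 4 (give up after n restarts in x minutes) is the part that keeps a watchdog from restarting a broken service forever. The following is a minimal sketch of that policy in plain Python, assuming timestamps in seconds are passed in by the caller. The class name and the 3-restarts-per-600-seconds values are illustrative and are not taken from monit.

```python
from collections import deque


class RestartPolicy:
    """Allow at most max_restarts within window_seconds, then escalate instead."""

    def __init__(self, max_restarts: int, window_seconds: float):
        self.max_restarts = max_restarts
        self.window_seconds = window_seconds
        self._history = deque()  # timestamps of recent restarts, oldest first

    def on_failure(self, now: float) -> str:
        # Forget restarts that have slid out of the time window.
        while self._history and now - self._history[0] > self.window_seconds:
            self._history.popleft()
        if len(self._history) >= self.max_restarts:
            return "escalate"  # leave the service down and page the admin
        self._history.append(now)
        return "restart"  # restart, save logs, e-mail the admin


policy = RestartPolicy(max_restarts=3, window_seconds=600)
print([policy.on_failure(t) for t in (0, 60, 120, 180)])
# ['restart', 'restart', 'restart', 'escalate']
```

      A sliding window is used rather than a fixed counter, so a service that fails once a week is never escalated, while one that crashes in a tight loop is escalated quickly.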

      That's a pretty good balance between making sure the issue gets fixed and continuing to make services available. Of course, it doesn't hold a candle to true high-availability configurations (clusters and the like), but it can work very well in a pinch.
      --
      We may not imagine how our lives could be more frustrating and complex—but Congress can. – Cullen Hightower
    10. Re:Restarting services... by NevarMore · · Score: 1

      Yes of course it would be better to fix the memory leak. Sometimes things like that are hard to track down and it is more effective to deal with it when it happens rather than trying to prevent it.

      Software bugs are inevitable. As a developer I do my best to fix as many bugs as I can, but I still know that something will go wrong. Since I know that something will go wrong sooner or later, I also make provisions to recover from failures.

      Many faults are completely out of my control, say a disk failure while I'm trying to write a file. I certainly can't stop the disk failure, but I can write my application to do something nice when it can't write rather than just crashing.

    11. Re:Restarting services... by dubl-u · · Score: 1

      It always bothers me when people use utilities to restart services that die/have been killed. Shouldn't a daemon be designed to run indefinitely?

      You say it like these two things are mutually exclusive. Why not strive for both? I pay for insurance that I strive to never need. For me, system monitoring tools are in the same category.

  13. no but use perfmon by badriram · · Score: 2, Informative

    Performance monitor is one of the best utilities on windows. It is very detailed, and most MS apps have additional counters for other detailed views. It also does remote logging, basic graphing, alerts etc.

    1. Re:no but use perfmon by killjoe · · Score: 1

      What it doesn't do is let you write your own monitors, monitor remote systems by pinging or attempting to connect to ports, make custom screens with history, etc.

      Zabbix does all that and more and even lets you create your own counters and submit them via a REST interface.
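      The "attempt to connect to ports" style of check is the simplest availability test that tools like Zabbix, Nagios, or monit perform. A minimal sketch, assuming a plain TCP connect is a good enough test (it does not speak the service's protocol) and using an arbitrary 3-second default timeout:

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, DNS failure, or timed out
        return False


# e.g. port_is_open("127.0.0.1", 3306) to see whether a local MySQL is listening
```

      A successful connect only proves that something is listening. A real monitor would usually also send a request (an HTTP GET, an SMTP banner check) and validate the reply.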

      --
      evil is as evil does
    2. Re:no but use perfmon by Thundersnatch · · Score: 1

      It's certainly possible, and not too difficult, to write your own performance monitors on Windows that plug into the standard perfmon architecture.

      Note to open-source advocates: before posting "I can't do X on Windows because it is closed", search MSDN and you'll discover that you're wrong most of the time.

    3. Re:no but use perfmon by killjoe · · Score: 1

      You completely missed my point. I set up a zabbix server. I define the host on it. I define my own counters. Zabbix keeps track of them for me. None of that requires programming.

      For example, if I have a host running mysql, I can send the output of "mysql -V" to zabbix and configure it to alert me if the version changes on any of my hosts.

      The "send to zabbix" part can be done via a binary or by opening up a socket and sending a string (basically three lines of ruby).
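      The "open a socket and send a string" approach might look like this in Python. This sketch assumes the JSON-based sender protocol used by later Zabbix releases: a `ZBXD` header with flag byte 0x01, an 8-byte little-endian payload length, the default trapper port 10051, and an item of type "Zabbix trapper" defined on the server. The 2006-era protocol the poster used may have differed. The item key `mysql.version` and the helper names are hypothetical.

```python
import json
import socket
import struct


def build_sender_packet(host: str, key: str, value: str) -> bytes:
    """Frame one item value in the (assumed) Zabbix sender wire format."""
    payload = json.dumps({
        "request": "sender data",
        "data": [{"host": host, "key": key, "value": value}],
    }).encode("utf-8")
    # Header: b"ZBXD", protocol flag 0x01, then payload length as 8-byte little-endian.
    return b"ZBXD\x01" + struct.pack("<Q", len(payload)) + payload


def send_to_zabbix(server: str, host: str, key: str, value: str,
                   port: int = 10051) -> bytes:
    """Push one value to a Zabbix server and return the first chunk of its reply."""
    packet = build_sender_packet(host, key, value)
    with socket.create_connection((server, port), timeout=5) as sock:
        sock.sendall(packet)
        return sock.recv(4096)  # the reply is itself ZBXD-framed JSON
```

      For the version-change alert described above, you would pass the output of `mysql -V` as `value` and define a trigger on the server that fires when the item's value changes. The alerting logic lives in Zabbix, not in the sender.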

      This means you can keep track of all aspects of all your servers all around the world on a single machine. Since each server can "push" the data to the central zabbix server they can be behind firewalls too. As an option you can run agents on each machine which can receive requests from the central server. These agents can also periodically send any performance information to the server too. If the agents are running in windows then you can even send performance monitor items.

      Oh and it can also take SNMP traffic too.

      All that for free, you gotta love it.

      --
      evil is as evil does
  14. Looks nice so far by remembertomorrow · · Score: 1

    I host 2 websites (LAMP), some other assorted stuff (DNS, some perl scripts, screen + irssi), and sometimes a game server (Half-Life or Counter-Strike or something similar) off of a low-horsepower box here. This program seems to be something I could have really used all along, but never thought about.

    Now I can see what is really hogging most of that machine's limited resources. :) My stats look somewhat bland now, but I'm sure they'll be very pretty in a day or two.

    Cheers on an informative article and a simple-to-install program.

    --
    Registered Linux user #421033
  15. Orca by otisg · · Score: 2, Insightful

    I'm a happy user of Orca, which I use to graph all kinds of aspects of the system that runs Simpy's cluster.

    --
    Simpy
  16. Seems a lot less clunky than Nagios or Cacti by Burv · · Score: 3, Informative
    I've tried both Nagios and Cacti for years. They work great, are very feature rich, and seem to have a strong community.

    The one thing that annoys me about them is that, out of the box, they don't have much configured, and to install/configure stuff, you have to jump through a lot of hoops.

    In the case of cacti, it's mostly through a web-based GUI, which is OK if you have one server with one thing you want to measure, say %CPU usage, but if you want to do it for a server farm or even a couple of machines, it's a pain in the butt. They do have a templating system, but you still have to do a lot through the GUI. I've posted on their forums before to this effect, and they have suggestions for making changes like this en masse, but again, it doesn't work out of the box. Bottom line, the designers of cacti seem to be focused on the Web GUI, which is kinda nice for newbies, but a huge pain for people like me who like to script things.

    It's the same thing with Nagios, though at least they let you edit text files for the settings. The number of files (about 20) reflects how feature-rich it is, but it also makes it a hassle to set up. Here's an article at samag.com that illustrates the process you need to go through... imagine this for a couple hundred servers, and you can see how arduous setting up nagios could be.

    So, although munin may not be as mature and well known as cacti, and monit not as popular as nagios, I think they're still worth trying out.

    1. Re:Seems a lot less clunky than Nagios or Cacti by Anonymous Coward · · Score: 0
      The one thing that annoys me about them is that, out of the box, they don't have much configured, and to install/configure stuff, you have to jump through a lot of hoops.

      Ganglia is quite useful for performance monitoring of large server farms / grids. It works in either multicast or unicast, allows nesting grids and the config files are easy to generate by script.
      Be sure to check out the developers' mailing list archives: recent posts featured patches for sampling external devices and graph templating.

    2. Re:Seems a lot less clunky than Nagios or Cacti by Anonymous Coward · · Score: 0

      I just installed munin on my "breezy" (Ubuntu) box with all the default params. The munin.conf file has only localhost in it. I ran 'sudo /etc/init.d/munin restart'. Opening file:///var/www/munin in a browser shows the starting munin page (Overview), but going to 'localhost' shows an empty page (I mean you can see the raven logo and a couple of hyperlinks, but NO GRAPHS). I know it takes a couple of minutes for the graphs to show up, but it had already been about 30 minutes. The munin-html log keeps printing the same 4 lines over and over:

      May 07 15:15:02 - Starting munin-html, checking lock
      May 07 15:15:02 - processing domain: localdomain
      May 07 15:15:02 - processing node: localhost.localdomain
      May 07 15:15:02 - munin-html finished

      Going through the man page, FAQ and online docs did not help.

      It MIGHT be that munin is easy to use, but for me it did not work out "out of the box" on a very basic system in a simplest basic setup, and I can't say there is plenty of documentation to help me.

    3. Re:Seems a lot less clunky than Nagios or Cacti by Linker3000 · · Score: 1

      Have to admit that I am a bit of a Nagios fan - it's monitoring servers (Windows and Linux) and broadband links on 30 sites for me.

      I'd just like to disagree with your comment that Nagios can get arduous to setup if you're looking at a lot of servers - in reality once you have found a configuration and set of monitoring parameters that suits you, adding more servers becomes a simple cut/paste + edit job to create new definitions for the servers - not so bad.

      --
      AT&ROFLMAO
    4. Re:Seems a lot less clunky than Nagios or Cacti by Artichoke · · Score: 1

      You did install munin and munin-node?

      --
      __
      Arse
  17. These Guys ROCK! by thehunger · · Score: 2, Informative

    I don't know anything about Munin, but the guys who wrote Munin absolutely rock! The company is Linpro, and they've been doing Linux and open source for over 10 years now. They do hosted management, remote management, development, and Linux and OSS training. They have also begun to package Linux and OSS based solutions for groupware, VoIP, management, etc.

    The point is, they've been doing server management for years (using Nagios) and wrote Munin to -complement- it, not compete with it.

    Check them out, they absolutely rock.

    1. Re:These Guys ROCK! by Sk0yern · · Score: 1

      Halleluja?

    2. Re:These Guys ROCK! by Anonymous Coward · · Score: 0

      So you DO know something... go be a plant somewhere else.

    3. Re:These Guys ROCK! by Anonymous Coward · · Score: 0


      You should mention that you work for Linpro when commenting things like this. Otherwise people may think you're astroturfing.

  18. practical experience by routerguy666 · · Score: 2, Insightful

    I've tried a number of these monitoring apps as they've come out. To date, I still can't find a combination better than MRTG and Nagios. If you know a bit about SNMP and how to find the OID of what you are interested in (and where to get mibs), it's hard to find a simpler, cleaner pair of monitoring products.

    Although in all honesty, Nagios' only real benefit is the ability to send out alerts. I'm more fortunate than others, I know, in that I've had the resources available to build redundancy in at every level of our production networks so when something does die (and with modern platforms this is becoming a once every two years event) it doesn't create a major catastrophe.

    Other than that, all the trending info I want/need on bandwidth, cpu, disk space, user loads, etc, etc, I can pull out of any device via snmp and track it with MRTG. Plus each MRTG release doesn't require me to rewrite umpteen config files to match the author's latest greatest idea of how they should be formatted (my only real gripe about nagios/netsaint).

    In the end I guess you use what you are familiar with, and I cut my teeth on these.

    1. Re:practical experience by Stinking+Pig · · Score: 1

      I'd say that Nagios' real strength is actually its dirt-simple plugin architecture. Use any language you like to figure out any state that you want, and you can have Nagios monitor, alert, or take corrective action on it. Monitoring a single machine is easy; what Nagios is really for is jobs like using Perl to step through several sections of your entire website, Expect to log in to your RADIUS/PPPoE infrastructure, or bash to make sure that Mailman is still receiving and resending emails.
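      The whole plugin contract is an exit status (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN) plus a line of text on stdout that Nagios shows as the status. Here is a minimal sketch of a disk-usage plugin in Python. The 80%/90% thresholds and the `DISK ...` message format are illustrative choices, not the ones used by Nagios' own check_disk.

```python
import shutil
import sys

# Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify(used_pct: float, warn_pct: float, crit_pct: float) -> int:
    """Map a usage percentage onto a Nagios state."""
    if used_pct >= crit_pct:
        return CRITICAL
    if used_pct >= warn_pct:
        return WARNING
    return OK


def main(argv: list[str]) -> int:
    path = argv[0] if argv else "/"
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        print(f"DISK UNKNOWN - {exc}")
        return UNKNOWN
    used_pct = 100.0 * usage.used / usage.total
    state = classify(used_pct, warn_pct=80.0, crit_pct=90.0)
    label = ("OK", "WARNING", "CRITICAL", "UNKNOWN")[state]
    # Nagios displays the first line of output as the status text.
    print(f"DISK {label} - {path} is {used_pct:.1f}% full")
    return state


def run() -> None:
    """Plugin entry point: the exit status is what Nagios acts on."""
    sys.exit(main(sys.argv[1:]))
```

      As a standalone plugin, you would call `run()` at the bottom of the script and point a Nagios command definition at it. The same shape works in Perl, Expect, or bash, as long as the exit status follows this convention.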

      --
      "Nothing was broken, and it's been fixed." -- Jon Carroll
  19. bigger == better? by crayz · · Score: 1

    "We need a bigger server soon, our load average is increasing rapidly."

    I'm a bit unclear on this...is server performance now measured directly by the amount of space it takes up?

    1. Re:bigger == better? by 0racle · · Score: 1

      Yes.

      --
      "I use a Mac because I'm just better than you are."
    2. Re:bigger == better? by Anonymous Coward · · Score: 0

      >> We need a bigger server soon, our load average is increasing rapidly.
      >
      > I'm a bit unclear on this...is server performance now measured directly by
      > the amount of space it takes up?

      Saying "bigger server" is usually meant metaphorically - the new server may not be physically bigger, but it will have bigger performance numbers: faster cpu, more disk space, faster I/O, more memory.

  20. Add OpenNMS by nrc · · Score: 3, Informative

    Add OpenNMS to the list of stuff that this duplicates or overlaps with. Not that anyone in OSS needs permission to reinvent the wheel. You've got an itch - you scratch as it pleases you.

    1. Re:Add OpenNMS by Antique+Geekmeister · · Score: 1

      Although I have to admit, if people would concentrate on clearing out the poison ivy instead of scratching their personal itch, there'd need to be a lot less scratching. The "poison ivy" is the plethora of badly written tools already in place, with seriously unfortunate user interfaces.

      A famous write-up by Eric Raymond on the failures of user interfaces and configuration tools in open source, which got slashdotted when he wrote it a couple of years ago, is at http://www.catb.org/~esr/writings/cups-horror.html . It's even funnier because the CUPS authors responded very graciously to his complaints by saying they'd fix at least some of them, and don't seem to have actually fixed *ANY* of the things Eric griped about in the 2 years since then.

    2. Re:Add OpenNMS by ximenes · · Score: 1

      There hasn't been a majorly changed CUPS release since then I don't think. CUPS 1.2 is supposed to be out in the semi-near future, perhaps some of the issues will have finally been addressed then.

      It's certainly the best of the slim crop of printing systems (LPRng and PPR being the only other two, I believe), but it leaves a lot to be desired in some areas.

  21. Cacti & Nagios by Alives · · Score: 1

    Yeah, I've been running cacti and nagios for a year now, and Nagios seems a little superior to this monitoring program. The grapher is just an RRD poller, same as cacti, it seems. Have you tried cacti or nagios as well?

  22. JFFNMS, BB, Hobbit,etc by falzbro · · Score: 2, Informative

    Since we're on the subject, others have mentioned Nagios and MRTG of course. Be sure to check out JFFNMS (Just For Fun Network Management System). It's a horrible name for what it does, since it's quite powerful. For Big Brother users, I would recommend checking out Hobbit Monitor as a replacement for the server portion. It's compatible with the BB client, but has far more features and includes some basic MRTG graphs.

    I have yet to find an all-in-one integrated open source solution for monitoring (cpu, processes, port reachability) and alerts (email, sms, etc). The closest I've found is JFFNMS, but writing alert rules and such is difficult, to say the least.

    While on the subject, if it's not too terribly off-topic, what do people use to bill based on network usage (MRTG, RRD). Both claim that you should NOT bill off of that information, but I have yet to find any other open source solution.

    --falz

    1. Re:JFFNMS, BB, Hobbit,etc by fimbulvetr · · Score: 2, Informative

      Back when I admin'd an ISP that billed by usage, we used mrtg and the mrtg 95th-percentile scripts. On more than one occasion, we had customers inquire about our billing. Fortunately, most of our customers were technically literate, so I stepped through the code and procedures with them. All of them were happy with the explanations and were satisfied after they saw the methods. That's not to say mrtg and the 95th-percentile scripts are bulletproof, but they held up under our scrutiny.

      http://www.seanadams.com/95/
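      95th-percentile billing works like this: take the month's traffic samples (commonly one per 5 minutes), sort them, discard the top 5%, and bill at the highest remaining sample. That way short bursts are free. A minimal nearest-rank sketch follows. The exact rounding convention used by the scripts linked above is not confirmed here, so treat this as an assumption.

```python
def percentile_95(samples_mbps: list[float]) -> float:
    """Nearest-rank 95th percentile: sort, drop the top 5%, return the highest left."""
    if not samples_mbps:
        raise ValueError("no samples")
    ordered = sorted(samples_mbps)
    # ceil(0.95 * n) - 1, computed with integers to avoid float rounding surprises.
    index = (95 * len(ordered) + 99) // 100 - 1
    return ordered[index]


print(percentile_95([float(x) for x in range(1, 101)]))  # 95.0
print(percentile_95([2.0] * 19 + [50.0]))                 # 2.0: one burst in 20 is free
```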

    2. Re:JFFNMS, BB, Hobbit,etc by simishag · · Score: 1
      While on the subject, if it's not too terribly off-topic, what do people use to bill based on network usage (MRTG, RRD). Both claim that you should NOT bill off of that information, but I have yet to find any other open source solution.

      Both are correct: you should not bill off plain RRD-based formats, as old data is removed over time, meaning your "95th percentile" isn't valid anymore. The main reasons this is acceptable in most cases are: 1) most people just want pretty graphs and don't need to do usage based billing; 2) the RRD file will remain a constant size forever and won't fill up the disk.

      For real billing, I use RTG (http://rtg.sourceforge.net/) which stores all samples in MySQL. It should be safe to run it side by side with MRTG or Cacti or whatever. It still has the same averaging functionality for graphing, but it never removes old samples from the database so you can run complete usage reports. The downside is that the data will grow without bound, so you'll need to keep an eye on the DB (although mine's only growing about 4 MB/month).
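      The problem with billing from an RRD file can be shown with a toy model. Once old 5-minute samples are consolidated into averages, bursts get smoothed away and the 95th percentile drops. This sketch models only an AVERAGE consolidation into 30-minute buckets (factor 6). It ignores rrdtool details such as the xfiles factor and heartbeat, and the traffic pattern is made up for the example. rrdtool can also keep MAX archives, which preserve peaks but still lose the individual samples.

```python
def percentile_95(samples: list[float]) -> float:
    """Nearest-rank 95th percentile (same convention as a typical billing script)."""
    ordered = sorted(samples)
    return ordered[(95 * len(ordered) + 99) // 100 - 1]


def consolidate_average(samples: list[float], factor: int) -> list[float]:
    """Average each run of `factor` consecutive samples, like an AVERAGE archive."""
    return [sum(samples[i:i + factor]) / factor
            for i in range(0, len(samples) - factor + 1, factor)]


# 10 hours of 5-minute samples: 2 Mbps baseline with a 50 Mbps burst once an hour.
raw = [50.0 if i % 12 == 0 else 2.0 for i in range(120)]
coarse = consolidate_average(raw, 6)  # 30-minute averages

print(percentile_95(raw))     # 50.0
print(percentile_95(coarse))  # 10.0
```

      This is why storing every raw sample, as RTG does in MySQL, matters for billing even though the averaged RRD data is fine for graphs.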

  23. yawn, pimping his multi-page, ad filled... by Anonymous Coward · · Score: 0

    Yay.

    So he's pimping his average quality guides on his Web site on Slashdot, split amongst six or so pages, as an obvious ad whoring tactic?

    They aren't even that good.

  24. restart by packetmill · · Score: 0

    and takes the appropriate action such as a restart if it finds a service is not behaving as expected.

    Why would you have to do that, apart from to piss the user off?

  25. Very nice! by ngunton · · Score: 2, Insightful

    I hadn't heard of this before. I liked the sound of pretty graphs, and I particularly liked how easy the article made it sound to install and get things working. So I tried it (I'm running Sarge AMD64 on the server) and it worked fine. In fact, it was up and running in a couple of minutes. Very nice!

    I have to say it is refreshing to see something that "just works" out of the box with sensible defaults. Truth be told, I am sick and tired of these holier-than-thou OSS zealots who keep pushing bloated, complex toolkits which have every option under the sun, but it doesn't all "just work" out of the install, no, that would be too easy, wouldn't it. You have to read through reams of distributed, fragmented documentation, forum posts and other sources to get the damn thing working properly, not to mention cobbling together all these !@#$ing plugins that are sooooo wonderful and yet just end up being a pain in the butt because you have to track them all down individually.

    Why can't geeks grasp a simple fact: People don't necessarily have the time or inclination to spend days learning the arcane innards of your toolkit. I don't care if people say "well if you can't be bothered taking the time then you're not a real admin" or whatever; if I had to spend a lot of time on every package tuning it and writing a sendmail.cf-esque config file just to get it working *the way it should by default*, then I'm probably just going to look for something else. That something else may be simpler and not as "pure" as your baby, but you know what? I'll use it, because it *just works* and does *most* things in a simple, intuitive way.

    That's why MySQL became successful, and why PostgreSQL didn't - sure, PostgreSQL was more powerful (in theory anyway) and had a bunch more features, but it isn't optimized out of the box. Whenever I see people complain about how slow PostgreSQL turns out to be when they finally try it, the inevitable reply is "Well, you need to spend time tuning it - if you don't do that then you don't deserve to be running a server". Whatever. As far as I'm concerned these "Tuning required by default" and "You aren't a *real* x if you don't learn these reams of config options just to get it working" people just don't get it. Make it work out of the box with sensible defaults, and let people delve into stuff further *if they want to*, not by requirement.

    I think the snobs are like this because they did go and learn all that stuff, and so they feel deep down that they have to justify that it was all worth it by putting down those who have a life and don't feel like dedicating days and weeks of effort to getting some stupid software package to function in the most basic way.

    So, great job Munin. My hat is off to you - I have a graphical monitoring system for my server, and it took me about two minutes to get it working. Fantastic.

    1. Re:Very nice! by Tim+C · · Score: 1

      While I agree with most of your comments, I'm a little perplexed by your use of the past tense with regards to Postgres. Sure, it's not as popular as MySQL, but it's in no way dead...

    2. Re:Very nice! by ngunton · · Score: 1

      Sorry, I didn't mean to make it sound like PostgreSQL is dead.

      However, I don't think it will ever attain real popularity until the developers and zealots get over themselves and make it more straightforward... which may never happen, but whatever.

      It's not dead. There will always be some people using it, just as some people will always love Lisp - it's a purist thing.

    3. Re:Very nice! by killjoe · · Score: 1

      Damn those mean OSS zealots. I wish they would all die so I can go back to paying for crappy software again.

      --
      evil is as evil does
    4. Re:Very nice! by Anonymous Coward · · Score: 0

      > However, I don't think it will ever attain real popularity until the developers and zealots get over themselves and make it more straightforward... which may never happen, but whatever

      How is it not straightforward? I installed the package and an instance was running. My only gripe is that it's fairly insecure by default, letting localhost in with no password. On Windows, it installs with an MSI.

      Getting things like fulltext and good, fast CIDR support (sacrificing IPv6 for speed with IPv4) was a matter of running ONE SQL script through psql.

      Getting replication working sucks and forget about multi-master. Other than that, what's hard?

    5. Re:Very nice! by TheViewFromTheGround · · Score: 1

      I largely agree with you, but as usual, it just depends, and that's part of the power of free software. I don't need Nagios and its crazy configuration to do my monitoring, so I use monit and munin much as described here; it worked without any significant configuration and did what I wanted. Very nice. But some people do need something as complex as Nagios. And there you go -- there are multiple projects that fill different niches. Sure, there are people who will be elitist about such things, but screw 'em if they can't take a joke (or be pragmatic). Use the right tool. But remember that in a monoculture, there's typically only one tool -- and that's far worse than the "zealots".

      --
      Online citizen journalism from the inner city: The View From The Ground
    6. Re:Very nice! by Rincewind42 · · Score: 1

      I am a system admin, but I only admin 1 server. It's not worth my time to learn every in and out of all the tools that are out there. There are just so many tools and so many options. So I depend upon threads like this to tell me which scripts and tools are actually useful. I don't have the time to test every script just to handle 1 server. I can see that many of you already have your favorite monitoring software, but very few of you say why your choice is better. It seems that your choice is only better because you have already installed it. Same reason people use Windows so much: it's already installed, so why change?

  26. Damn Straight! by Anonymous Coward · · Score: 1, Insightful

    I'm with you on that one. I just can't understand why so many people keep re-inventing the wheel rather than simply learning a bit of SNMP. SNMP and its tools provide all of this functionality and more. Why does everyone keep doing their own protocol and server and agent software? There are already several standard methods for handling this via DMTF WBEM, CIM and good old SNMP. Also, why are so many people willing to run agents from obscure packages that are likely full of bugs and certain to be abandoned in the not-so-distant future? Why can't we just have more SNMP agents and instrumentation?

    Some people deride SNMP over its security issues but, how is the security of all these funky apps and agents any better? Additionally, even with SNMP security being as "weak" as it is claimed to be, it has yet to create a significant problem. Yes, there have been some scares when vulnerabilities were discovered but, the internet has yet to collapse because of scary old SNMP.

    The last thing I want to do is add yet another flaky process to my systems. It's pretty embarrassing when your monitoring agent brings down the server! Or your management console decides to poll it to death! SNMP is almost always already there and running, why not just leverage it?

    P.S. Yes, I know that Munin can use SNMP but, that is a side note and not its primary operating mode.

    1. Re:Damn Straight! by Anonymous Coward · · Score: 0

      > the internet has yet to collapse because of scary old SNMP.

      The continued operation of the rest of the internet is scant comfort to someone who's had their servers haxx0red. I would never expose SNMP to the public internet. Ever. Tunnel it if you have to manage remotely.

  27. Similar to MonAMI by CunningPike · · Score: 1

    Sounds similar to a project I'm working on called MonAMI, which aims to be more flexible, but is currently less mature.

    --
    | What, you were expecting
    -O_O- +---- something witty?
  28. Speaking of those databases... by Anonymous Coward · · Score: 1, Interesting

    Slashdotters and possible Wikipedia users:

    Is there a MySQL -> PostgreSQL FAQ list out there? If not, would it be appropriate to make one in, say, Wikipedia? I have some ideas I wouldn't mind sharing with other users who "grew up" with MySQL and got used to all its particular features.

  29. Zabbix by Anonymous Coward · · Score: 0

    One word... Zabbix does it all...

    1. Re:Zabbix by gullevek · · Score: 1

      Zabbix is crap IMHO. I tried to get something out of it by following the Step by Step guide, but I could see nothing. With munin, on the other hand, I had immediate success. Even Cacti is easier to use than Zabbix.

      --
      "Freiheit ist immer auch die Freiheit des Andersdenkenden" - Rosa Luxemburg, 1871 - 1919
  30. But can it track slashdot dupes? by Anonymous Coward · · Score: 0

    Anyone know of a system that will track and display a graphical history of slashdot dupes? Preferably with 3d viz because I don't think anything else would cut it.

  31. What Digg Uses by philovivero · · Score: 2, Informative

    At Digg, we use Nagios to alert (with all the warts that go along with that). We use Cacti to monitor and graph. It's a relatively nice front-end to RRDtool.

    I'm the MySQL DBA and I spent a long, long time (in concert with Peter Zaitsev of MySQL AB fame) tweaking the existing Cacti MySQL templates to add InnoDB graphing support (and a new MemcacheD set of graphing templates) and put them all over here: my mysqlUtils page.

    I'd never heard of this pair of monitoring/alerting software before. Hopefully it improves on the state of monitoring and alerting, because I feel Nagios and Cacti (and Ganglia) leave a fair bit to be desired.

    (By the way, that page includes a fair bit of other utilities, too, not just Cacti templates)

  32. Munin and restarts. by jafo · · Score: 2, Informative

    Munin is nice because it's just so simple to install and configure it. We used to use some scripts I had written to track server statistics, but have entirely switched to munin. However, munin also has some "monitoring" capabilities, which I usually disable. I wish they just stuck to graphing and didn't try to add monitoring to munin.

    Also, generating a lot of graphs can impact the system load. Not that you shouldn't use it, but I have definitely seen times where the system was getting hit particularly hard and munin seemed to be using up a lot of resources at the same time. You probably don't want to install it on an already overloaded system...

    Also, munin's design is such that if the system gets hit particularly hard, munin may not be able to run and capture this information. It doesn't lock itself into memory or run at an escalated priority, so if the system is thrashing particularly hard, you will often get empty samples in munin instead of clues as to whether the problem was due to high load, high disk activity, high swap activity, etc. So it's really better suited to long-term capacity planning than to tracking down short-term load problems.

    As far as setting up service restarts goes, I totally agree that it's the lazy way out. The ideal solution is to track the problem to its root cause and prevent it from happening. However, unlike the other respondent, I'm fine with that.

    As a sys admin, your job is to keep the system and services available. A brain-dead restart of Apache or bind once a week is much preferable to leaving it down for hours from 3am to 9am and then trying to track down a bug in bind or some random PHP application.

    So, by all means fix the real cause if possible. However, I recommend setting up automatic restarts with alerts going to appropriate people so you can keep an eye on when restarts happen. For one of my machines an apache restart happens about once every 2 weeks, and a bind restart happens once every other month. I'm not particularly inclined to spend significant resources debugging bind to prevent a 60 second outage of one of my two name servers once every 60 days. At least not today, I have other higher priority tasks to work on.

    Sean

  33. Zabbix by HermanAB · · Score: 1

    Anyone using Zabbix? (http://www.zabbix.com/)

    --
    Oh well, what the hell...
  34. Ganglia vs Munin by burtonator · · Score: 1

    Munin is pretty damn nice... Ganglia is also pretty decent. Both allow you to write custom scripts as well which is very important.

    If you have a bigger cluster you might want to check out Ganglia too. They use UDP for machine discovery.

    This might be bad for some people who rent hardware.
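    Writing a custom Munin plugin is mostly about output format. Run with the argument `config`, the plugin prints the graph definition (`graph_title`, `<field>.label`, ...). Run with no argument, it prints `<field>.value N`. Here is a minimal sketch in Python, assuming a Unix host (for `os.getloadavg`). The field name `load1` and the graph text are hypothetical, and Munin already ships its own load plugin. On Debian-style installs, plugins are typically enabled by symlinking them into /etc/munin/plugins and restarting munin-node.

```python
import os
import sys

FIELD = "load1"  # hypothetical field name


def config_lines() -> list[str]:
    """Graph definition, printed when munin-node calls the plugin with 'config'."""
    return [
        "graph_title Load average (1 min)",
        "graph_vlabel load",
        "graph_category system",
        f"{FIELD}.label load",
    ]


def fetch_lines() -> list[str]:
    """Current value, printed when the plugin is called without arguments."""
    one_minute, _, _ = os.getloadavg()
    return [f"{FIELD}.value {one_minute:.2f}"]


def plugin_output(argv: list[str]) -> str:
    lines = config_lines() if argv[:1] == ["config"] else fetch_lines()
    return "\n".join(lines)


def main() -> None:
    """Entry point when installed as an executable plugin."""
    print(plugin_output(sys.argv[1:]))
```

    As an installed plugin, the script would end with a call to `main()` and be marked executable.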

    KEvin

  35. Too Bad... by the+MaD+HuNGaRIaN · · Score: 1

    they lack originality when it comes to names. And too bad nobody else on slashturd recognized this.

    Munin is a Distributed Shared Memory system that was developed in 1991 at Rice University by John Bennett, John Carter, and Willy Zwaenepoel. Munin was unique in that it used a release-consistency model of coherence.

    Release-consistency attempts to increase performance by minimizing the amount of communication required to maintain consistency. Release-consistency works by buffering updates between synchronization points. In order to accomplish this, Munin requires that synchronization of processes be strictly controlled by using synchronization objects.

    These synchronization objects are not accessed in the same way that shared data objects are accessed. Instead, Munin employs a synchronization manager, which is simply comprised of the Munin server on each node interacting with each other. In order for release-consistency to work, Munin distinguishes between acquire synchronization objects and release synchronization objects.

    Munin also requires that all synchronizations are visible to the system as a whole. This way, a global ordering can be determined based on the totality of the partial-orderings between each synchronization. Munin provides queue based locks, and barriers for synchronization.

    But, I am talking Computer Science now and I know I've already lost the myspace-esque crowd that slashturd has become. If by some chance you've made it this far in the post, and have mod points, do the right thing and mark it insightful. Anything less merely proves my point.

  36. What would be the point? by toadlife · · Score: 1

    You only need such cruft for those unreliable pinko-commie backed *nix type systems. I mean, when is the last time you heard of a Windows server going down??

    --
    I don't always use unix-like operating systems; but when I do, I prefer FreeBSD.
  37. By CmdrTaco... by evilviper · · Score: 1

    from the I-forgot-to-put-a-department dept.

    --
    Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant
  38. -1, Arrogant Ass by jabbo · · Score: 1

    Munin is also a nanosatellite project in Sweden.

    Munin, the DSM project, seems to be dead. Must not have been quite as earth-shattering as you think. Mach seemed like a good idea at the time, too.

    Remember Wolfpack? More DSM. More complexity. Not widely used.

    Get off your soapbox before you fall and hurt yourself.

    --
    Remember that what's inside of you doesn't matter because nobody can see it.
    1. Re:-1, Arrogant Ass by qwp · · Score: 1

      I agree.. this guy is an ass.

  39. Munin And Monit? by AndroidCat · · Score: 2, Funny

    Shouldn't that be Hugin and Munin?

    --
    One line blog. I hear that they're called Twitters now.
  40. collectd by Anonymous Coward · · Score: 1, Interesting

    If you have multiple *NIX servers to monitor, check out collectd: http://collectd.org/

    The client reports various system statistics to a central collection server, which dumps the information into RRD files. Because it's a push sort of thing, there's no hassling with opening ports or running additional network accessible services on the clients. (UCD-SNMP has always made me nervous.)

    Monitoring a new machine is as simple as installing collectd and pointing it at your collectd server. The server automatically creates RRD files for the new host, and you're off and running. No configuration changes are required on the server. Make yourself a pre-configured package, and monitoring a new machine is a snap.
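    A rough sketch of the push model described above, assuming a hypothetical in-process `CollectionServer` and `push_report` helper; this is not collectd's network protocol or its RRD plugin. A report from an unknown host creates that host's storage on the spot, so the server needs no configuration change for a new machine, and each series is a fixed-size ring buffer, which loosely mimics how an RRD file stays a constant size.

```python
from collections import defaultdict, deque


class CollectionServer:
    """Hypothetical central server: hosts push samples, nothing is polled."""

    def __init__(self, slots=5):
        self.slots = slots  # fixed number of samples kept per series
        # host -> metric -> bounded deque, created on first report
        self.series = defaultdict(dict)

    def receive(self, host, metric, value):
        per_host = self.series[host]  # unknown host: auto-registered here
        if metric not in per_host:
            per_host[metric] = deque(maxlen=self.slots)
        per_host[metric].append(value)  # oldest sample drops off when full

    def hosts(self):
        return sorted(self.series)


def push_report(server, host, stats):
    """Hypothetical client side: stands in for sending stats over the network."""
    for metric, value in stats.items():
        server.receive(host, metric, value)


server = CollectionServer(slots=3)
for load in [0.5, 0.7, 0.9, 1.2]:
    push_report(server, "web1", {"load": load})
push_report(server, "db1", {"load": 2.0, "mem_used_mb": 512})

print(server.hosts())  # ['db1', 'web1']
print(list(server.series["web1"]["load"]))  # [0.7, 0.9, 1.2]
```

    Because the client initiates every exchange, the monitored machine never has to listen on a port, which is the security argument made above.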

    1. Re:collectd by Anonymous Coward · · Score: 0

      Agreed, looks like this package follows the methodology described at http://www.infrastructures.org/bootstrap/pushpull.shtml. But collectd.org plugins have fewer features than net-snmpd + OpenNMS...

  41. Oh look, a fork! by hacker · · Score: 1

    Looks like someone repackaged HotSaNIC and rebranded it as their own. Graphs are IDENTICAL. I knew something looked mighty familiar when I saw them, because I've been running HotSaNIC on our servers for a while now. Great stuff.

    1. Re:Oh look, a fork! by atomic-penguin · · Score: 1

      Both have graphs generated by RRDTool, that's why they look the same. It has nothing to do with the codebase being the same. The graphing tools are the same in both.

      --
      /^([Ss]ame [Bb]at (time, |channel.)){2}$/
  42. Have you tried Zabbix? by pdwalker · · Score: 1

    This tool beats the doors off of many I have tried to use. The setup is simple and the ability to monitor and graph the data is unmatched.

    http://www.zabbix.org/

    Give it a try.

  43. SNMP Support on Linux by Ponga · · Score: 1

    You know, I find SNMP support on Linux is pretty weak.
    We have several Windoze servers running SQL, IIS, and other services, all of which we were able to find MIBs for and monitor via SNMP very easily. We keep track of MANY aspects of these servers and log them historically via our SNMP clients.
    We have recently been introducing many Linux servers, and upon trying to monitor them in a similar fashion, I have found that several things just are not possible!
    For example, Apache is REALLY hard to monitor with SNMP. You have to custom compile with mod_snmp. Postfix... same situation, although I can't recall being able to monitor that at all with SNMP.
    I wish that were not the case. And ya, I know. A lot of this has to do with the specific package in question, not Linux per se. Still, it reflects poorly on the OS for which it was designed.
    At any rate, it has influenced our decisions about installing new Linux platforms :(
    I wish it were not the case, but we WANT to use SNMP!

  44. Re: Apps for Windoze by Ponga · · Score: 1

    SNMP is the answer to your question.

    WhatsUP (http://www.ipswitch.com/Products/WhatsUp/) - Kinda pricey. I don't know, there may be a FOSS solution, but I have never seen one.

    SnmpInformant (http://www.snmp-informant.com/) - The seller of this product is pretty lame, but the MIBs (if even needed) work just fine.

    Prtg (http://www.paessler.com/prtg/) - *GREAT* little app (Windoze version of MRTG... on steroids) for only $40 that collects SNMP data and presents it in graphs using its own HTTP server. *GREAT* little app!

    -Ponga