Server Monitoring With Munin And Monit
hausmasta writes "In this article I will describe how to monitor your server with munin and monit. Munin produces nifty little graphics about nearly every aspect of your server (load average, memory usage, CPU usage, MySQL throughput, eth0 traffic, etc.) without much configuration, whereas monit checks the availability of services like Apache, MySQL and Postfix and takes the appropriate action, such as a restart, if it finds a service is not behaving as expected. The combination of the two gives you full monitoring: graphics that let you recognize current or upcoming problems (like "We need a bigger server soon, our load average is increasing rapidly."), and a watchdog that ensures the availability of the monitored services."
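Munin's "without much configuration" claim rests on small plugins. As a rough sketch, assuming Munin's usual plugin convention (the node runs a plugin with the argument `config` to get graph metadata, and with no argument to get current values printed as `field.value N`), a load-average plugin could look like the Python below. The field name `load` and the category are our own choices, and Munin already ships a load plugin, so treat this only as an illustration of the output format.

```python
#!/usr/bin/env python3
"""Sketch of a Munin-style plugin that reports the 1-minute load average.

Assumes the convention: "config" argument -> graph metadata, no argument -> values.
The field name "load" is an illustrative choice, not something Munin requires.
"""
import os
import sys


def config_output() -> str:
    # Graph metadata: title, vertical-axis label, category, and one field label.
    return "\n".join([
        "graph_title Load average",
        "graph_vlabel load",
        "graph_category system",
        "load.label load",
    ])


def fetch_output(one_minute_load: float) -> str:
    # One "field.value N" line per field.
    return f"load.value {one_minute_load:.2f}"


def main(argv: list[str]) -> None:
    if len(argv) > 1 and argv[1] == "config":
        print(config_output())
    else:
        # os.getloadavg() returns the 1-, 5- and 15-minute averages on Unix systems.
        print(fetch_output(os.getloadavg()[0]))


if __name__ == "__main__":
    main(sys.argv)
```

Running it with `config` prints the four metadata lines; running it bare prints something like `load.value 0.42`.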
How is this different from Cacti?
However, making graphs and monitoring your services is a very good thing. Graphs are invaluable in determining trends, such as memory leaks or steadily increasing load. Monitoring saves lots of downtime and unhappy customers ;-)
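To make the point about trends concrete: on a graph, a steadily rising metric is a line with a positive slope. The sketch below is not taken from Munin or any other tool mentioned here. It fits a least-squares line through evenly spaced samples and flags the series as rising when the slope exceeds a threshold. The sample values and the 0.05 threshold are made up for illustration.

```python
"""Sketch: flag a steadily increasing metric (load, memory use) via a least-squares slope."""


def slope(samples: list[float]) -> float:
    # Ordinary least-squares slope of the samples against their index 0..n-1.
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


def is_rising(samples: list[float], threshold: float) -> bool:
    # "Rising" means the fitted increase per sample exceeds the chosen threshold.
    return slope(samples) > threshold


if __name__ == "__main__":
    daily_load = [0.8, 0.9, 1.1, 1.2, 1.4, 1.5, 1.7]  # hypothetical daily averages
    print(f"slope per day: {slope(daily_load):.3f}")
    print("rising" if is_rising(daily_load, threshold=0.05) else "flat")
```

For the hypothetical series above the slope is 0.15 per day, so it is reported as rising. A real setup would read the samples from the grapher's stored data rather than a hard-coded list.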
Personally I use Nagios for monitoring and DIY scripts for graphing. The latter mostly because I started making graphs before decent off-the-shelf software was available ;-)
P.S. What's this subject got to do with Debian?
It always bothers me when people use utilities to restart services that die or have been killed. Shouldn't a daemon be designed to run indefinitely? Doesn't the fact that a process died mean that something is wrong and needs to be fixed? For instance, if my Apache daemon dies because the logfile is larger than it can handle, what good is restarting it going to do? It's just going to beat the crap out of the server: the process dies, the watcher daemon starts it up, the process dies again, and so on.
Or, if the OOM killer kills my FTP server because it's hogging the memory, doesn't that mean I have bigger problems than a restart can fix (I need more memory, the FTP server has a memory leak, etc.)?
None of my hundreds of critical daemons die for no reason whatsoever; all of them require some type of human interaction if they have died. It doesn't happen very often, maybe once every several months.
Not that I care about this software in general (I use Hobbit for my trending, graphing and service availability), but I hate to see bad admin'ing, even if I'm not involved.
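The crash-loop objection above suggests a middle ground: restart a dead service a few times, then stop and leave it for a human. The Python sketch below expresses that as a sliding-window limit. It is not monit's implementation or configuration syntax, and the `RestartGuard` class and the example limit of 3 restarts in 300 seconds are hypothetical.

```python
"""Sketch of a restart guard: allow a few restarts, then give up so a crash loop
gets escalated to a human instead of hammering the server."""
from collections import deque


class RestartGuard:
    def __init__(self, max_restarts: int, window_seconds: float) -> None:
        self.max_restarts = max_restarts
        self.window_seconds = window_seconds
        self.history = deque()  # timestamps (seconds) of restarts inside the window

    def should_restart(self, now: float) -> bool:
        # Forget restarts that have slid out of the time window.
        while self.history and now - self.history[0] >= self.window_seconds:
            self.history.popleft()
        if len(self.history) >= self.max_restarts:
            return False  # crash loop: leave the service down and alert someone
        self.history.append(now)
        return True
```

With `RestartGuard(3, 300)`, crashes at t = 0, 10 and 20 seconds are restarted, a fourth at t = 30 is refused, and a crash at t = 400 is restarted again because the earlier ones have aged out of the window.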
I'm a happy user of Orca, which I use to graph all kinds of aspects of the system that runs Simpy's cluster.
I've tried a number of these monitoring apps as they've come out. To date, I still can't find a combination better than MRTG and Nagios. If you know a bit about SNMP and how to find the OID of what you are interested in (and where to get MIBs), it's hard to find a simpler, cleaner pair of monitoring products.
Although in all honesty, Nagios' only real benefit is the ability to send out alerts. I'm more fortunate than others, I know, in that I've had the resources available to build redundancy in at every level of our production networks so when something does die (and with modern platforms this is becoming a once every two years event) it doesn't create a major catastrophe.
Other than that, all the trending info I want or need on bandwidth, CPU, disk space, user loads, etc., I can pull out of any device via SNMP and track with MRTG. Plus each MRTG release doesn't require me to rewrite umpteen config files to match the author's latest and greatest idea of how they should be formatted (my only real gripe about Nagios/NetSaint).
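For readers new to the SNMP/MRTG approach: interface counters such as `ifInOctets` only ever count up, so a grapher polls them periodically and plots the difference divided by the poll interval. A minimal sketch of that arithmetic, assuming a 32-bit `Counter32` that wraps at most once between polls (the function name is our own, and this is not MRTG's code):

```python
"""Sketch: turn two SNMP octet-counter readings into a bit rate, MRTG-style."""

COUNTER32_MODULUS = 2**32  # a Counter32 wraps from 4294967295 back to 0


def octet_rate_bps(prev: int, curr: int, interval_s: float) -> float:
    if interval_s <= 0:
        raise ValueError("poll interval must be positive")
    # If the counter went "backwards", assume it wrapped exactly once.
    delta = curr - prev if curr >= prev else curr + COUNTER32_MODULUS - prev
    return delta * 8 / interval_s  # octets -> bits per second
```

With a 300-second interval, readings of 1000 and 76000 octets give 75000 * 8 / 300 = 2000 bit/s. Fast links can wrap a 32-bit counter more than once per interval, which is why the 64-bit `ifHCInOctets` counters exist.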
In the end I guess you use what you are familiar with, and I cut my teeth on these.
I hadn't heard of this before. I liked the sound of pretty graphs, and I particularly liked how easy the article made it sound to install and get things working. So I tried it (I'm running Sarge AMD64 on the server) and it worked fine. In fact, it was up and running in a couple of minutes. Very nice!
I have to say it is refreshing to see something that "just works" out of the box with sensible defaults. Truth be told, I am sick and tired of these holier-than-thou OSS zealots who keep pushing bloated, complex toolkits that have every option under the sun but don't "just work" after installation. No, that would be too easy, wouldn't it? You have to read through reams of scattered, fragmented documentation, forum posts and other sources to get the damn thing working properly, not to mention cobbling together all these !@#$ing plugins that are sooooo wonderful and yet just end up being a pain in the butt because you have to track them all down individually. Why can't geeks grasp a simple fact? People don't necessarily have the time or inclination to spend days learning the arcane innards of your toolkit.
I don't care if people say "well, if you can't be bothered taking the time then you're not a real admin" or whatever. If I had to spend a lot of time on every package, tuning it and writing a sendmail.cf-esque config file just to get it working *the way it should by default*, then I'd probably just look for something else. That something else may be simpler and not as "pure" as your baby, but you know what? I'll use it, because it *just works* and does *most* things in a simple, intuitive way.
That's why MySQL became successful and PostgreSQL didn't. Sure, PostgreSQL was more powerful (in theory, anyway) and had a bunch more features, but it isn't optimized out of the box. Whenever I see people complain about how slow PostgreSQL turns out to be when they finally try it, the inevitable reply is "Well, you need to spend time tuning it; if you don't do that, then you don't deserve to be running a server". Whatever. As far as I'm concerned, these "tuning required by default" and "you aren't a *real* x if you don't learn these reams of config options just to get it working" people just don't get it.
Make it work out of the box with sensible defaults, and let people delve into stuff further *if they want to*, not by requirement.
I think the snobs are like this because they did go and learn all that stuff, and so they feel deep down that they have to justify that it was all worth it by putting down those who have a life and don't feel like dedicating days and weeks of effort to getting some stupid software package to function in the most basic way.
So, great job Munin. My hat is off to you - I have a graphical monitoring system for my server, and it took me about two minutes to get it working. Fantastic.
I'm with you on that one. I just can't understand why so many people keep reinventing the wheel rather than simply learning a bit of SNMP. SNMP and its tools provide all of this functionality and more. Why does everyone keep inventing their own protocol and server and agent software? There are already several standard methods for handling this via DMTF WBEM, CIM and good old SNMP. Also, why are so many people willing to run agents from obscure packages that are likely full of bugs and certain to be abandoned in the not-so-distant future? Why can't we just have more SNMP agents and instrumentation?
Some people deride SNMP over its security issues, but how is the security of all these funky apps and agents any better? Additionally, even with SNMP security being as "weak" as it is claimed to be, it has yet to create a significant problem. Yes, there have been some scares when vulnerabilities were discovered, but the internet has yet to collapse because of scary old SNMP.
The last thing I want to do is add yet another flaky process to my systems. It's pretty embarrassing when your monitoring agent brings down the server! Or your management console decides to poll it to death! SNMP is almost always already there and running, why not just leverage it?
P.S. Yes, I know that Munin can use SNMP, but that is a side note and not its primary operating mode.