Server Monitoring With Munin And Monit
hausmasta writes "In this article I will describe how to monitor your server with munin and monit. munin produces nifty little graphics about nearly every aspect of your server (load average, memory usage, CPU usage, MySQL throughput, eth0 traffic, etc.) without much configuration, whereas monit checks the availability of services like Apache, MySQL, Postfix and takes the appropriate action such as a restart if it finds a service is not behaving as expected. The combination of the two gives you full monitoring: graphics that let you recognize current or upcoming problems (like "We need a bigger server soon, our load average is increasing rapidly."), and a watchdog that ensures the availability of the monitored services."
...been waiting a while to say that. FTFA: The easiest way to follow this tutorial is to use a command line client/SSH client (like PuTTY for Windows)
Doesn't swatch already do the job of monit? It works very nicely for me, watching servers as well as processes that generate log files
How is this different from cacti?
What advantage does this have over Nagios?
RMTTFFL (Read More Than The First Fucking Line) ....never used that one.
Server Monitoring on Windows != "follow this tutorial is to use a command line client/SSH client (like PuTTY for Windows)"
Having said that, a good, free, open source server monitoring solution (including Windows servers) is MRTG.
The dude will definitely need a bigger server now that every Slashdot geek is rushing to view his website.
I'm not sure I follow why this is newsworthy. Nagios is OSS and is an extremely mature product with a community writing modules, plugins, etc. to monitor any aspect you want of your servers, routers, networks, room temperatures, I mean anything. Why would anyone bother?
I don't think he meant the *client* side. He wants to monitor Windows servers. Or at least that's what I would be asking in that situation.
---- Booth was a patriot ----
Don't forget about the big brother clone, hobbit.
SF.net at: http://hobbitmon.sourceforge.net/
Live example at: http://www.hswn.dk/hobbit/
However, making graphs and monitoring your services is a very good thing. Graphs are invaluable in determining trends, such as memory leaks or steadily increasing load. Monitoring saves lots of downtime and unhappy customers ;-)
Personally I use nagios for monitoring and DIY scripts for graphing. The latter mostly because I started making graphs before decent off-the-shelf software was available ;-)
PS. What's this subject got to do with Debian?
This is your sig. There are thousands more, but this one is yours.
It always bothers me when people use utilities to restart services that die/have been killed. Shouldn't a daemon be designed to run indefinitely? Doesn't the fact that a process died mean that something is wrong and needs to be fixed? For instance, if my apache daemon dies because the logfile is larger than it can handle, what good is restarting it going to do? It's just going to beat the crap out of a server - process dies - watcher daemon starts it up - process dies...etc.
Or, if the OOM killer kills my ftp server because he's hogging the memory, doesn't that mean I have bigger problems than just doing a restart (I need more memory, the ftp server has a mem leak, etc)?
None of my hundreds of critical daemons die for no reason whatsoever - all of them require some type of human interaction if they have died. It doesn't happen very often, maybe once every several months.
Not that I care about this software in general, I use hobbit for my trending/graphing/service availability, but I hate to see bad admin'ing, even if I'm not involved.
Performance monitor is one of the best utilities on windows. It is very detailed, and most MS apps have additional counters for other detailed views. It also does remote logging, basic graphing, alerts etc.
I host 2 websites (LAMP), some other assorted stuff (DNS, some perl scripts, screen + irssi), and sometimes a gameserver (half life or counterstrike or something similar) off of a low horsepower box here. This program seems to be something I could have really used all along, but never thought about.
:) My stats look somewhat bland now, but I'm sure they'll be very pretty in a day or two.
Now I can really see what is really hogging most of that machine's limited resources.
Cheers on an informative article and simple to install program
Registered Linux user #421033
I'm a happy user of Orca, which I use to graph all kinds of aspects of the system that runs Simpy's cluster.
Simpy
The one thing that annoys me about them is that, out of the box, they don't have much configured, and to install/configure stuff, you have to jump through a lot of hoops.
In the case of cacti, it's mostly through a web-based GUI, which is OK if you have one server with one thing you want to measure, say %CPU usage, but if you want to do it for a server farm or even a couple machines, it's a pain in the butt. They do have a templating system, but you still have to do a lot through the GUI. I've posted on their forums before to this effect, and they have suggestions for making changes like this en masse, but again, it doesn't work out of the box. Bottom line, the designers of cacti seem to be focused on the Web GUI, which is kinda nice for newbies, but a huge pain for people like me that like to script things.
It's the same thing with Nagios, although at least they let you change text files for the settings. Although the number (about 20) of files is reflective of how feature rich it is, it also makes it a hassle to set up. Here's an article at samag.com that illustrates the process you need to go through... imagine this for a couple hundred servers, and you can see how arduous setting up nagios could be.
So, although munin may not be as mature and well known as cacti, and monit not as popular as nagios, I think they're still worth trying out..
I don't know anything about Munin, but the guys that wrote Munin absolutely rock! The company is Linpro, and they've been doing Linux and open source for over 10 years now. They do hosted management, remote management, development and Linux and OSS training. They have also begun to package Linux and OSS based solutions for groupware, voip, management etc.
The point is, they've been doing server management for years (using Nagios) and wrote Munin to -complement- it, not compete with it.
Check them out, they absolutely rock..
I've tried a number of these monitoring apps as they've come out. To date, I still can't find a combination better than MRTG and Nagios. If you know a bit about SNMP and how to find the OID of what you are interested in (and where to get mibs), it's hard to find a simpler, cleaner pair of monitoring products.
Although in all honesty, Nagios' only real benefit is the ability to send out alerts. I'm more fortunate than others, I know, in that I've had the resources available to build redundancy in at every level of our production networks so when something does die (and with modern platforms this is becoming a once every two years event) it doesn't create a major catastrophe.
Other than that, all the trending info I want/need on bandwidth, cpu, disk space, user loads, etc, etc, I can pull out of any device via snmp and track it with MRTG. Plus each MRTG release doesn't require me to rewrite umpteen config files to match the author's latest greatest idea of how they should be formatted (my only real gripe about nagios/netsaint).
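For anyone wondering what tracking bandwidth "via snmp" amounts to in practice, here is a minimal Python sketch of the usual counter-to-rate arithmetic: two readings of an interface octet counter are turned into bits per second, allowing for one counter wrap. The function name, the sample numbers and the 32-bit default are assumptions for illustration; this is not MRTG's actual code, and fetching the counter over SNMP is left out.

```python
def counter_rate_bps(prev_octets, curr_octets, interval_s, counter_bits=32):
    """Convert two readings of a monotonically increasing octet counter
    into an average rate in bits per second.

    Assumes at most one wrap of the counter between the two readings.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    modulus = 2 ** counter_bits
    # If the counter wrapped past its maximum, the modulo recovers the true delta.
    delta_octets = (curr_octets - prev_octets) % modulus
    return delta_octets * 8 / interval_s


# Hypothetical readings taken 300 seconds apart.
print(counter_rate_bps(1_000_000, 4_750_000, 300))   # plain increase
print(counter_rate_bps(4_294_967_000, 704, 300))     # counter wrapped once
```

On a fast link or with a long polling interval a 32-bit counter can wrap more than once between polls, which this cannot detect; 64-bit counters avoid that where the device offers them.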
In the end I guess you use what you are familiar with, and I cut my teeth on these.
"We need a bigger server soon, our load average is increasing rapidly."
I'm a bit unclear on this...is server performance now measured directly by the amount of space it takes up?
Add OpenNMS to the list of stuff that this duplicates or overlaps with. Not that anyone in OSS needs permission to reinvent the wheel. You've got an itch - you scratch as it pleases you.
Yeah, I've been running cacti and nagios for a year now and Nagios seems a little superior to this monitoring prog. The grapher is just an RRD poller, same as cacti it seems. Have you tried cacti or nagios as well?
Since we're on the subject, others have mentioned Nagios and MRTG of course. Be sure to check out JFFNMS (Just for fun). Horrible name for what it does, since it's quite powerful. For Big Brother users, I would recommend checking out Hobbit Monitor as a replacement of the server portion. It's compatible with the BB client, but has far more features and includes some basic MRTG graphs.
I have yet to find an all in one integrated open source solution for monitoring (cpu, processes, port reachability), alerts (email, sms, etc). The closest I've found is JFFNMS, but writing alert rules and such is difficult to say the least.
While on the subject, if it's not too terribly off-topic, what do people use to bill based on network usage (MRTG, RRD)? Both claim that you should NOT bill off of that information, but I have yet to find any other open source solution.
--falz
Yay.
So he's pimping his average quality guides on his Web site on Slashdot, split amongst six or so pages, as an obvious ad whoring tactic?
They aren't even that good.
and takes the appropriate action such as a restart if it finds a service is not behaving as expected.
Why do you have to do that, apart from to piss the user off?
I hadn't heard of this before. I liked the sound of pretty graphs, and I particularly liked how easy the article made it sound to install and get things working. So I tried it (I'm running Sarge AMD64 on the server) and it worked fine. In fact, it was up and running in a couple of minutes. Very nice!
I have to say it is refreshing to see something that "just works" out of the box with sensible defaults. Truth be told, I am sick and tired of these holier-than-thou OSS zealots who keep pushing bloated, complex toolkits which have every option under the sun, but it doesn't all "just work" out of the install, no, that would be too easy wouldn't it. You have to read through reams of distributed, fragmented documentation, forum posts and other sources to get the damn thing working properly, not to mention cobbling together all these !@#$ing plugins that are sooooo wonderful and yet just end up being a pain in the butt because you have to track them all down individually. Why can't geeks grasp a simple fact: People don't necessarily have the time or inclination to spend days learning the arcane innards of your toolkit. I don't care if people say "well if you can't be bothered taking the time then you're not a real admin" or whatever, if I had to spend a lot of time on every package tuning it and writing a sendmail.cf-esque config file just to get it working *the way it should by default* then I'm probably just going to look for something else. That something else may be simpler and not as "pure" as your baby, but you know what? I'll use it, because it *just works* and does *most* things in a simple intuitive way. That's why MySQL became successful, and why PostgreSQL didn't - sure, PostgreSQL was more powerful (in theory anyway) and had a bunch more features, but it isn't optimized out of the box. Whenever I see people complain about how slow PostgreSQL turns out to be when they finally try it, the inevitable reply is "Well, you need to spend time tuning it - if you don't do that then you don't deserve to be running a server". Whatever. As far as I'm concerned these "Tuning required by default" and "You aren't a *real* x if you don't learn these reams of config options just to get it working" people just don't get it. 
Make it work out of the box with sensible defaults, and let people delve into stuff further *if they want to*, not by requirement.
I think the snobs are like this because they did go and learn all that stuff, and so they feel deep down that they have to justify that it was all worth it by putting down those who have a life and don't feel like dedicating days and weeks of effort to getting some stupid software package to function in the most basic way.
So, great job Munin. My hat is off to you - I have a graphical monitoring system for my server, and it took me about two minutes to get it working. Fantastic.
I'm with you on that one. I just can't understand why so many people keep re-inventing the wheel rather than simply learning a bit of SNMP. SNMP and its tools provide all of this functionality and more. Why does everyone keep doing their own protocol and server and agent software? There are already several standard methods for handling this via DMTF WBEM, CIM and good old SNMP. Also, why are so many people willing to run agents from obscure packages that are likely full of bugs and certain to be abandoned in the not so distant future? Why can't we just have more SNMP agents and instrumentation?
Some people deride SNMP over its security issues, but how is the security of all these funky apps and agents any better? Additionally, even with SNMP security being as "weak" as it is claimed to be, it has yet to create a significant problem. Yes, there have been some scares when vulnerabilities were discovered, but the internet has yet to collapse because of scary old SNMP.
The last thing I want to do is add yet another flaky process to my systems. It's pretty embarrassing when your monitoring agent brings down the server! Or your management console decides to poll it to death! SNMP is almost always already there and running, why not just leverage it?
P.S. Yes, I know that Munin can use SNMP, but that is a side note and not its primary operating mode.
Sounds similar to a project I'm working on called MonAMI, which aims to be more flexible, but is currently less mature.
| What, you were expecting
-O_O- +---- something witty?
Slashdotters and possible Wikipedia users:
Is there a MySQL -> PostgreSQL FAQ list out there? If not, would it be appropriate to make one in, say, Wikipedia? I have some ideas I wouldn't mind sharing with other users who "grew up" with MySQL and got used to all its particular features.
One word... Zabbix does it all...
Anyone know of a system that will track and display a graphical history of slashdot dupes? Preferably with 3d viz because I don't think anything else would cut it.
At Digg, we use Nagios to alert (with all the warts that go along with that). We use Cacti to monitor and graph. It's a relatively nice front-end to RRDtool.
I'm the MySQL DBA and I spent a long, long time (in concert with Peter Zaitsev of MySQL AB fame) tweaking the existing Cacti MySQL templates to add InnoDB graphing support (and a new MemcacheD set of graphing templates) and put them all over here: my mysqlUtils page.
I'd never heard of this pair of monitoring/alerting software before. Hopefully it improves on the state of monitoring and alerting, because I feel Nagios and Cacti (and Ganglia) leave a fair bit to be desired.
(By the way, that page includes a fair bit of other utilities, too, not just Cacti templates)
fifth sigma, inc.
Munin is nice because it's just so simple to install and configure it. We used to use some scripts I had written to track server statistics, but have entirely switched to munin. However, munin also has some "monitoring" capabilities, which I usually disable. I wish they just stuck to graphing and didn't try to add monitoring to munin.
Also, generating a lot of graphs can impact the system load. Not that you shouldn't use it, but I have definitely seen times where the system was getting hit particularly hard and munin seemed to be using up a lot of resources at the same time. You probably don't want to install it on an already overloaded system...
Also, munin's design is such that if the system gets hit particularly hard, munin may not be able to run and capture this information. It doesn't lock itself into memory, or run at an escalated priority, so if the system is thrashing particularly hard, you often will get empty samples in munin instead of getting pointers to whether the problem was due to high load, high disc activity, high swap activity, etc... So it's really better suited to long-term capacity planning than to tracking down short-term load problems.
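The empty-sample problem can at least be made visible after the fact. A minimal Python sketch, assuming samples carry timestamps and arrive on a fixed polling interval (the 300 seconds used here is an assumption, as are the function name and the data): it reports stretches where the poller evidently did not run.

```python
def find_gaps(timestamps, interval_s=300, tolerance=1.5):
    """Return (start, end) pairs where consecutive samples are further
    apart than tolerance * interval_s, i.e. at least one poll was missed."""
    ordered = sorted(timestamps)
    gaps = []
    for earlier, later in zip(ordered, ordered[1:]):
        if later - earlier > tolerance * interval_s:
            gaps.append((earlier, later))
    return gaps


# Hypothetical sample times in seconds; the polls at 900 and 1200 never ran.
samples = [0, 300, 600, 1500, 1800]
print(find_gaps(samples))
```

A gap that lines up with a known slowdown is itself a hint that the box was too busy to run the collector, even though it says nothing about the cause.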
As far as setting up service restarts, I totally agree that it's the lazy way out. The ideal solution is to track the problem to root cause and prevent it from happening. However, unlike the other respondent, I'm fine with that.
As a sys admin, your job is to keep the system and services available. A brain-dead restart of Apache or bind once a week is much preferable to leaving it down for hours from 3am to 9am and then trying to track down a bug in bind or some random PHP application.
So, by all means fix the real cause if possible. However, I recommend setting up automatic restarts with alerts going to appropriate people so you can keep an eye on when restarts happen. For one of my machines an apache restart happens about once every 2 weeks, and a bind restart happens once every other month. I'm not particularly inclined to spend significant resources debugging bind to prevent a 60 second outage of one of my two name servers once every 60 days. At least not today, I have other higher priority tasks to work on.
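To make the "restart, but tell someone" idea concrete, here is a minimal Python sketch of a watchdog that restarts a dead service, alerts on every restart, and stops restarting (alerting only) when the service keeps dying within a short window, so it cannot loop forever. The class name, the thresholds and the callback signatures are all assumptions for illustration; this is not how monit itself is implemented or configured.

```python
import time
from collections import deque


class RestartWatchdog:
    """Restart a failed service, alert on every restart, and give up
    (alert only) if too many restarts happen within a time window."""

    def __init__(self, is_up, restart, alert,
                 max_restarts=3, window_s=300.0, clock=time.monotonic):
        self.is_up = is_up            # callable returning True if the service is healthy
        self.restart = restart        # callable that restarts the service
        self.alert = alert            # callable taking a message string
        self.max_restarts = max_restarts
        self.window_s = window_s
        self.clock = clock
        self.recent = deque()         # timestamps of recent restarts

    def poll(self):
        """Run one check; returns 'ok', 'restarted' or 'gave_up'."""
        if self.is_up():
            return "ok"
        now = self.clock()
        # Forget restarts that fall outside the window.
        while self.recent and now - self.recent[0] > self.window_s:
            self.recent.popleft()
        if len(self.recent) >= self.max_restarts:
            self.alert("service keeps dying; not restarting, needs a human")
            return "gave_up"
        self.recent.append(now)
        self.restart()
        self.alert("service was down; restarted")
        return "restarted"


# Hypothetical service that never comes back up.
dog = RestartWatchdog(is_up=lambda: False, restart=lambda: None,
                      alert=print, max_restarts=1)
print(dog.poll())
print(dog.poll())
```

The give-up branch is the answer to the restart-loop worry raised earlier in the thread: an occasional crash gets papered over with an alert, a crash loop gets escalated instead of hammering the box.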
Sean
Anyone using Zabbix? (http://www.zabbix.com/)
Oh well, what the hell...
Munin is pretty damn nice... Ganglia is also pretty decent. Both allow you to write custom scripts as well which is very important.
If you have a bigger cluster you might want to check out Ganglia too. They use UDP for machine discovery.
This might be bad for some people who rent hardware.
KEvin
they lack originality when it comes to names. And too bad nobody else on slashturd recognized this.
Munin is a Distributed Shared Memory system that was developed in 1991 at Rice University by John Bennett, John Carter, and Willy Zwaenepoel. Munin was unique in that it used a release-consistency model of coherence.
Release-consistency attempts to increase performance by minimizing the amount of communication required to maintain consistency. Release-consistency works by buffering updates between synchronization points. In order to accomplish this, Munin requires that synchronization of processes be strictly controlled by using synchronization objects.
These synchronization objects are not accessed in the same way that shared data objects are accessed. Instead, Munin employs a synchronization manager, which is simply comprised of the Munin server on each node interacting with each other. In order for release-consistency to work, Munin distinguishes between acquire synchronization objects and release synchronization objects.
Munin also requires that all synchronizations are visible to the system as a whole. This way, a global ordering can be determined based on the totality of the partial-orderings between each synchronization. Munin provides queue based locks, and barriers for synchronization.
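The buffering idea can be shown with a toy model. This is a single-process Python sketch, assuming a plain dictionary stands in for the copies of shared data seen by other nodes; the class and method names are hypothetical and nothing here reflects how the Munin DSM system was actually implemented.

```python
class SharedStore:
    """Stands in for the view of shared data that other nodes see."""

    def __init__(self):
        self.data = {}


class Node:
    """Buffers writes locally and propagates them only at release."""

    def __init__(self, store):
        self.store = store
        self.pending = {}      # updates made since the last acquire
        self.holding = False

    def acquire(self):
        self.holding = True

    def write(self, key, value):
        if not self.holding:
            raise RuntimeError("write outside an acquire/release pair")
        self.pending[key] = value   # buffered, not yet visible to others

    def release(self):
        # One batched propagation instead of one message per write.
        self.store.data.update(self.pending)
        self.pending.clear()
        self.holding = False


store = SharedStore()
node = Node(store)
node.acquire()
node.write("x", 1)
node.write("x", 2)
print(store.data)   # nothing propagated yet
node.release()
print(store.data)   # only the final value was sent
```

Two writes to the same item cost a single update at the release point, which is the communication saving the description above is getting at.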
But, I am talking Computer Science now and I know I've already lost the myspace-esque crowd that slashturd has become. If by some chance you've made it this far in the post, and have mod points, do the right thing and mark it insightful. Anything less merely proves my point.
You only need such cruft for those unreliable pinko-commie backed *nix type systems. I mean, when is the last time you heard of a Windows server going down??
I don't always use unix-like operating systems; but when I do, I prefer FreeBSD.
from the I-forgotting-to-put-a-department dept.
Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant
Munin is also a nanosatellite project in Sweden.
Munin the DSM project seems to be dead. Must not have been quite as earthshattering as you think. Mach seemed like a good idea at the time, too.
Remember Wolfpack? More DSM. More complexity. Not widely used.
Get off your soapbox before you fall and hurt yourself.
Remember that what's inside of you doesn't matter because nobody can see it.
Shouldn't that be Hugin and Munin?
One line blog. I hear that they're called Twitters now.
If you have multiple *NIX servers to monitor, check out collectd: http://collectd.org/
The client reports various system statistics to a central collection server, which dumps the information into RRD files. Because it's a push sort of thing, there's no hassling with opening ports or running additional network accessible services on the clients. (UCD-SNMP has always made me nervous.)
Monitoring a new machine is as simple as installing collectd and pointing it at your collectd server. The server automatically creates RRD files for the new host, and you're off and running. No configuration changes are required on the server. Make yourself a pre-configured package, and monitoring a new machine is a snap.
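The push model is easy to picture in code. A minimal Python sketch, assuming a made-up one-line text format and in-memory lists in place of RRD files; the function and class names are hypothetical, and this is not collectd's real wire protocol or storage. The network hop is left out so the example stays self-contained.

```python
from collections import defaultdict


def encode_sample(host, metric, value, timestamp):
    """Client side: serialise one reading as a plain-text line (made-up format)."""
    return f"{host} {metric} {value} {timestamp}".encode("ascii")


class Collector:
    """Server side: stores whatever hosts send, creating a series
    for a host the first time it is seen."""

    def __init__(self):
        # host -> metric -> list of (timestamp, value)
        self.series = defaultdict(lambda: defaultdict(list))

    def handle(self, datagram):
        host, metric, value, timestamp = datagram.decode("ascii").split()
        self.series[host][metric].append((int(timestamp), float(value)))


collector = Collector()
collector.handle(encode_sample("web1", "load", 0.42, 1000))
collector.handle(encode_sample("db1", "load", 1.7, 1000))   # new host, no server-side config
print(sorted(collector.series))
```

Because the clients initiate every transfer, the server needs no list of hosts and the clients need no listening port, which is the appeal described above.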
Looks like someone repackaged HotSaNIC and rebranded it as their own. Graphs are IDENTICAL. I knew something looked mighty familiar when I saw them, because I've been running HotSaNIC on our servers for a while now. Great stuff.
This tool beats the doors off of many I have tried to use. The setup is simple and the ability to monitor and graph the data is unmatched.
http://www.zabbix.org/
Give it a try.
You know, I find SNMP support on Linux is pretty weak. :(
We have several Windoze servers running SQL, IIS, and other services - all of which we were able to find MIBs for and monitor via snmp very easily. We keep track of MANY aspects of these servers and log historically via our snmp clients.
We have recently been introducing many Linux servers and upon trying to monitor them in a similar fashion, I have found that several things just are not possible!
For example, Apache is REALLY hard to monitor with SNMP. You have to custom compile with mod_snmp. Postfix... same situation, although I can't recall being able to monitor that at all with snmp.
I wish that were not the case. And ya, I know. A lot of this has to do with the specific package in question, not Linux, per se. Still, it reflects poorly on the OS for which it was designed.
At any rate, it has influenced our decisions about installing new Linux platforms.
I wish it were not the case, but we WANT to use snmp!
SNMP is the answer to your question.
http://www.ipswitch.com/Products/WhatsUp/WhatsUP - Kinda pricy. I don't know, there may be an FOSS solution, but I have never seen one.
http://www.snmp-informant.com/SnmpInformant - The seller of this product is pretty lame, but the mibs (if even needed) work just fine.
http://www.paessler.com/prtg/Prtg - *GREAT* little app (Windoze version of MRTG... on steroids) for only $40 that collects SNMP data and presents it in graphs using its own HTTP server. *GREAT* little app!
-Ponga