Server Monitoring With Munin And Monit
hausmasta writes "In this article I will describe how to monitor your server with Munin and Monit. Munin produces nifty little graphs of nearly every aspect of your server (load average, memory usage, CPU usage, MySQL throughput, eth0 traffic, etc.) without much configuration, whereas Monit checks the availability of services like Apache, MySQL and Postfix and takes the appropriate action, such as a restart, if it finds a service is not behaving as expected. The combination of the two gives you full monitoring: graphs that let you recognize current or upcoming problems (like 'We need a bigger server soon; our load average is increasing rapidly.'), and a watchdog that ensures the availability of the monitored services."
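Monit is driven by its own control file rather than by code, so the following is not its implementation. It is only a minimal Python sketch of the idea behind one watchdog cycle: probe a service's TCP port and, if nothing answers, call a restart hook. The function names and the `restart` callback are hypothetical; in a real setup the hook might run the service's init script.

```python
import socket
from typing import Callable


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, host unreachable, etc.
        return False


def check_service(name: str, host: str, port: int,
                  restart: Callable[[str], None]) -> str:
    """Run one watchdog cycle for a single service.

    Returns "ok" if the port answered, or "restarted" after calling the
    (hypothetical) restart hook, so the caller can log or alert on it.
    """
    if port_is_open(host, port):
        return "ok"
    restart(name)
    return "restarted"
```

A port check only proves that something is listening; a stricter check (for example, fetching a page from Apache) catches more failure modes at the cost of more configuration.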
He was simply playing on the "But does it run on Linux?" post that appears in tons of threads. He doesn't need to RTFA. :)
Munin isn't at all different from Cacti, really, except that Cacti is 100% web based and perhaps a bit more mature (I use Cacti and like it a lot more than at least 4-5 other similar products out there). Cacti won't do service-testing though; maybe this is a good walkthrough for people who just want something up and running in 15 minutes (I wouldn't know, I'm not inclined to read the whole thing since a cursory glance shows there's nothing here that I don't have a running alternative for already).
Don't forget about the Big Brother clone, Hobbit.
SF.net at: http://hobbitmon.sourceforge.net/
Live example at: http://www.hswn.dk/hobbit/
Performance Monitor is one of the best utilities on Windows. It is very detailed, and most MS apps add their own counters for more detailed views. It also does remote logging, basic graphing, alerts, etc.
The one thing that annoys me about them is that, out of the box, they don't have much configured, and to install/configure stuff, you have to jump through a lot of hoops.
In the case of Cacti, it's mostly through a web-based GUI, which is OK if you have one server with one thing you want to measure, say %CPU usage, but if you want to do it for a server farm or even a couple of machines, it's a pain in the butt. They do have a templating system, but you still have to do a lot through the GUI. I've posted on their forums to this effect before, and they have suggestions for making changes like this en masse, but again, it doesn't work out of the box. Bottom line: the designers of Cacti seem to be focused on the web GUI, which is kinda nice for newbies but a huge pain for people like me who like to script things.
It's the same thing with Nagios, although at least it lets you edit text files for the settings. The number of files (about 20) reflects how feature-rich it is, but it also makes it a hassle to set up. Here's an article at samag.com that illustrates the process you need to go through... imagine this for a couple hundred servers, and you can see how arduous setting up Nagios could be.
So, although Munin may not be as mature and well known as Cacti, and Monit not as popular as Nagios, I think they're still worth trying out.
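Because Nagios reads plain-text object files, the per-server boilerplate can at least be generated by a script instead of typed by hand. A minimal Python sketch, assuming a simple inventory of (name, address) pairs and a host template named `linux-server` (the name used in the Nagios Core sample configuration; yours may differ). The function name and inventory are hypothetical, and services, contacts and host groups are left out.

```python
from typing import Iterable, Tuple

# One Nagios host object; doubled braces are literal braces in str.format.
HOST_TEMPLATE = """define host {{
    use        {template}
    host_name  {name}
    alias      {name}
    address    {address}
}}
"""


def nagios_host_defs(hosts: Iterable[Tuple[str, str]],
                     template: str = "linux-server") -> str:
    """Render one 'define host' block per (name, address) pair."""
    return "\n".join(
        HOST_TEMPLATE.format(template=template, name=name, address=address)
        for name, address in hosts
    )


if __name__ == "__main__":
    # Hypothetical inventory using documentation-range addresses.
    inventory = [("web01", "192.0.2.10"), ("web02", "192.0.2.11")]
    print(nagios_host_defs(inventory))
```

The output can be written to a file listed in the main Nagios configuration, so adding a server means adding one inventory line rather than another hand-edited block.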
I don't know anything about Munin, but the guys who wrote Munin absolutely rock! The company is Linpro, and they've been doing Linux and open source for over 10 years now. They do hosted management, remote management, development, and Linux and OSS training. They have also begun to package Linux and OSS based solutions for groupware, VoIP, management, etc.
The point is, they've been doing server management for years (using Nagios) and wrote Munin to -complement- it, not compete with it.
Check them out; they absolutely rock.
Add OpenNMS to the list of stuff that this duplicates or overlaps with. Not that anyone in OSS needs permission to reinvent the wheel. You've got an itch - you scratch as it pleases you.
Since we're on the subject, others have mentioned Nagios and MRTG, of course. Be sure to check out JFFNMS (Just For Fun Network Management System). Horrible name for what it does, since it's quite powerful. For Big Brother users, I would recommend checking out Hobbit Monitor as a replacement for the server portion. It's compatible with the BB client, but has far more features and includes some basic MRTG graphs.
I have yet to find an all-in-one, integrated open source solution for monitoring (CPU, processes, port reachability) and alerts (email, SMS, etc.). The closest I've found is JFFNMS, but writing alert rules and such is difficult, to say the least.
While on the subject, if it's not too terribly off-topic: what do people use to bill based on network usage? MRTG and RRD both say that you should NOT bill off of their data, but I have yet to find any other open source solution.
--falz
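Whatever tool ends up doing the billing, one building block is turning raw interface octet counter readings into a byte total. A minimal Python sketch, assuming readings from a 32-bit counter (such as SNMP `ifInOctets`) polled often enough that it wraps at most once between polls. It is not how MRTG or RRDtool compute their figures, and the function name is hypothetical.

```python
from typing import Sequence

COUNTER_MAX = 2 ** 32  # assumes a 32-bit counter; 64-bit counters wrap far later


def bytes_transferred(readings: Sequence[int],
                      counter_max: int = COUNTER_MAX) -> int:
    """Sum the deltas between consecutive raw counter readings.

    A reading lower than its predecessor is treated as a single wrap.
    A device reboot (counter reset to zero) looks the same and would be
    overcounted, which illustrates why raw graphing data is risky for billing.
    """
    total = 0
    for prev, cur in zip(readings, readings[1:]):
        delta = cur - prev
        if delta < 0:
            delta += counter_max
        total += delta
    return total
```

Keeping the raw readings, rather than the averaged values a round-robin database retains, is what makes a figure like this auditable later.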
This sounds a lot like Nagios. From TFA I couldn't see anything Munin and Monit would do that you can't do with Nagios and a few plugins. Just a plug: Nagios is beautiful; it makes nice graphical representations of load, hits, throughput, and just about anything else you can think of.
At Digg, we use Nagios to alert (with all the warts that go along with that). We use Cacti to monitor and graph. It's a relatively nice front-end to RRDtool.
I'm the MySQL DBA, and I spent a long, long time (in concert with Peter Zaitsev of MySQL AB fame) tweaking the existing Cacti MySQL templates to add InnoDB graphing support (plus a new set of memcached graphing templates) and put them all over here: my mysqlUtils page.
I'd never heard of this pair of monitoring/alerting software before. Hopefully it improves on the state of monitoring and alerting, because I feel Nagios and Cacti (and Ganglia) leave a fair bit to be desired.
(By the way, that page includes a fair bit of other utilities, too, not just Cacti templates)
fifth sigma, inc.
Munin is nice because it's just so simple to install and configure it. We used to use some scripts I had written to track server statistics, but have entirely switched to munin. However, munin also has some "monitoring" capabilities, which I usually disable. I wish they just stuck to graphing and didn't try to add monitoring to munin.
Also, generating a lot of graphs can add to the system load. Not that you shouldn't use it, but I have definitely seen times when the system was getting hit particularly hard and Munin seemed to be using up a lot of resources at the same time. You probably don't want to install it on an already overloaded system...
Also, Munin's design is such that if the system gets hit particularly hard, Munin may not be able to run and capture this information. It doesn't lock itself into memory or run at an elevated priority, so if the system is thrashing, you will often get empty samples in Munin instead of clues as to whether the problem was due to high load, high disk activity, high swap activity, etc. So it's really better suited to long-term capacity planning than to tracking down short-term load problems.
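Since a starved poller leaves a hole in the data rather than a spike, the holes themselves are worth flagging. A minimal Python sketch that finds them in a list of sample timestamps; the 300-second interval (Munin's usual five-minute poll) and the 1.5x tolerance are assumptions, the function name is hypothetical, and it does not read Munin's RRD files directly.

```python
from typing import List, Sequence, Tuple

INTERVAL = 300  # seconds; assumes a five-minute polling interval


def find_gaps(timestamps: Sequence[int],
              interval: int = INTERVAL) -> List[Tuple[int, int]]:
    """Return (start, end) pairs where consecutive samples are more than
    1.5 intervals apart, i.e. where at least one poll was missed."""
    ordered = sorted(timestamps)
    gaps = []
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev > 1.5 * interval:
            gaps.append((prev, cur))
    return gaps
```

A gap that lines up with a user complaint is itself a hint that the box was too busy to be measured, which is exactly the case graphs alone will not show.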
As far as setting up service restarts goes, I totally agree that it's the lazy way out. The ideal solution is to track the problem to its root cause and prevent it from happening. However, unlike the other respondent, I'm fine with that.
As a sys admin, your job is to keep the system and services available. A brain-dead restart of Apache or bind once a week is much preferable to leaving it down for hours from 3am to 9am and then trying to track down a bug in bind or some random PHP application.
So, by all means fix the real cause if possible. However, I recommend setting up automatic restarts with alerts going to appropriate people so you can keep an eye on when restarts happen. For one of my machines an apache restart happens about once every 2 weeks, and a bind restart happens once every other month. I'm not particularly inclined to spend significant resources debugging bind to prevent a 60 second outage of one of my two name servers once every 60 days. At least not today, I have other higher priority tasks to work on.
Sean
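The restart-with-alert policy described above (restart automatically, but always tell a human, and stop if restarts become frequent) can be sketched in Python. The class, the default threshold of 3 restarts per hour, and the `restart`/`alert` hooks are all hypothetical; Monit and similar tools express such limits in their own configuration files, which are not shown here.

```python
import time
from collections import deque
from typing import Callable, Deque, Optional


class RestartPolicy:
    """Restart a failed service and alert on every restart, but stop
    restarting (and say so) once restarts exceed a rate limit."""

    def __init__(self, restart: Callable[[], None], alert: Callable[[str], None],
                 max_restarts: int = 3, window: float = 3600.0) -> None:
        self.restart = restart
        self.alert = alert
        self.max_restarts = max_restarts
        self.window = window
        self.history: Deque[float] = deque()  # times of recent restarts

    def on_failure(self, now: Optional[float] = None) -> bool:
        """Handle one failed check. Returns True if a restart was attempted."""
        now = time.time() if now is None else now
        # Forget restarts that fall outside the sliding window.
        while self.history and now - self.history[0] > self.window:
            self.history.popleft()
        if len(self.history) >= self.max_restarts:
            self.alert(f"giving up: {len(self.history)} restarts "
                       f"in the last {self.window:.0f}s")
            return False
        self.restart()
        self.history.append(now)
        self.alert("service restarted")
        return True
```

The alert on every restart is what keeps the lazy fix honest: a restart every couple of weeks can be tolerated, while a burst of restarts stops the loop and gets a person involved.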