Slashdot Mirror


Managing Huge Networks with Open Source Tools?

An anonymous reader asks: "I work for a large multinational firm with a network that spans the globe and am responsible for evaluating the software we use to monitor our network. Our department has a lot of money, and we're usually willing and able to spend it on good commercial software. Recently, though, I find myself evaluating and approving more and more open source software. We are actually in the process of replacing some of our commercial tools with software like Nagios, LooperNG and syslog-ng. We are also evaluating MRTG, RRDTool, ntop and a host of other tools. The problem is that there are just too many of them, most of which are no longer maintained. Here's my question: What other open-source tools do you use to monitor your networks? I'm not just looking for names, but also how long you've been using them, how easy or hard they are to administer, and how well they scale as the network grows. More importantly, are their respective projects still alive and kicking?"

16 of 45 comments

  1. JFFNMS by szysz · · Score: 5, Informative


    I created and use JFFNMS (Just for Fun Network Management System) to monitor my network at work.
    It's also used by a lot of people to monitor big and medium-sized networks.
    It's fully maintained and customizable.

    --
    - Smells Like Open Source Code
  2. Experience with Nagios by DDumitru · · Score: 2, Informative

    We used to use Nagios here to monitor a large number of services on a large number of servers. We eventually abandoned it and replaced our "is the server up" monitoring with simple scripts that call fping. The problem with Nagios is that its process model starts to fall apart at several hundred monitored servers/services, and we really did not want to dedicate a farm to monitoring.
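    A minimal sketch of that kind of "is the server up" script, assuming fping is installed and on the PATH. The function name and the addresses are hypothetical, and the runner is injectable so the output handling can be exercised without fping. It relies on `fping -u`, which prints only the unreachable targets, one per line; fping exits non-zero when any target is down, so the exit code is deliberately not treated as an error. One fping process covers the whole host list, which is the point of the approach.

```python
import subprocess
from typing import Callable, List


def unreachable_hosts(hosts: List[str], runner: Callable = subprocess.run) -> List[str]:
    """Return the hosts that fping reports as unreachable (hypothetical helper).

    Runs a single fping process for the whole list instead of one
    process per host.
    """
    if not hosts:
        return []
    # -u: print only unreachable targets, one per line on stdout.
    result = runner(["fping", "-u", *hosts], capture_output=True, text=True)
    # fping exits non-zero when some targets are down, so read stdout
    # rather than raising on the return code.
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]
```

    Usage would be something like `unreachable_hosts(["192.0.2.10", "192.0.2.11"])` from a cron job, alerting on any non-empty result.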

    1. Re:Experience with Nagios by msuzio · · Score: 2, Informative

      Hmm... we've just started using Nagios at my site, and it seems to be doing quite well at that scale.

      We *are* using 3 separate machines as data collectors, but that was also done so we could continue to grow in the future. We have requirements to monitor many different OS and platform combinations, as well as several services on those platforms. In all our searching (about a 6 month process), only Nagios seemed to fit the bill for that. So, if you can dedicate some modest hardware to it, Nagios seems to do quite well.

    2. Re:Experience with Nagios by Anonymous Coward · · Score: 1, Informative

      I have 230 active checks in Nagios, and it works really well. You just need to tune max_concurrent_checks and service_reaper_frequency in nagios.cfg after a while.

      My settings are:
      max_concurrent_checks=20
      service_reaper_frequency=2

      This is on a 2 GHz P3 w/ 512MB.
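      If you manage several monitoring hosts, applying those two settings can be scripted. A minimal sketch, assuming the usual nagios.cfg layout of one `key=value` per line with `#` comments; the helper name is hypothetical, and the values are simply the ones quoted above. It replaces the first uncommented occurrence of a key, or appends the key if it is missing. Nagios has to be restarted or reloaded afterwards to pick up the change.

```python
import re


def set_cfg_option(cfg_text: str, key: str, value) -> str:
    """Set key=value in nagios.cfg-style text (hypothetical helper).

    Commented-out lines ('#key=...') are left alone because the pattern
    is anchored at the start of the line.
    """
    pattern = re.compile(rf"^{re.escape(key)}\s*=.*$", re.MULTILINE)
    line = f"{key}={value}"
    if pattern.search(cfg_text):
        # Use a function replacement so backslashes in the value are literal.
        return pattern.sub(lambda _m: line, cfg_text, count=1)
    if cfg_text and not cfg_text.endswith("\n"):
        cfg_text += "\n"
    return cfg_text + line + "\n"
```

      For example, `set_cfg_option(text, "max_concurrent_checks", 20)` followed by `set_cfg_option(text, "service_reaper_frequency", 2)`.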

  3. Gkrellm by ptaff · · Score: 4, Informative

    A slick tool is Gkrellm, which shows real-time graphical status for memory, temperatures, network and disk. It can be run in "server mode" (so there's no need for X on the monitored server). Lots of plugins are also available, from SNMP to ping tools. The project is alive and well. I don't know if it floats your boat, though, since you mention huge networks.

    Feel ready to own one or many Tux stickers?

  4. RRFW by Blaze74 · · Score: 3, Informative

    rrfw.sf.net is a nice gem of an app. It can automatically discover a lot of high-end SNMP equipment and set itself up to monitor that equipment. It's a mix of XML and Perl, and it's really easy to add support for more hardware.

  5. NetFlow. by Mordant · · Score: 3, Informative
  6. Forget MRTG by Judg3 · · Score: 4, Informative

    Well, don't forget it so much as get something better: Cacti. Cacti is a frontend to MRTG & RRDTool and offers a lot of awesome improvements, such as a web frontend for adding devices, device "profiles" that apply a common monitoring set to things such as Cisco routers and servers, and a whole lot more. We used MRTG on our (Windows-only) network, and I'm slowly moving it all over to Cacti for all of the above.

    --
    Looking for hardware (Currently need: Large Etch-a-Sketch) Have one? See my journal!
    1. Re:Forget MRTG by Judg3 · · Score: 2, Informative

      Oi, hate to reply to myself, but another good thing about Cacti is its speed. It has a compiled version of the daemon, which claims it can do over 50 checks a second; in a big network, that's worth it.

      --
      Looking for hardware (Currently need: Large Etch-a-Sketch) Have one? See my journal!
  7. Internode NodeMap by sr180 · · Score: 4, Informative
    Developed and used by the Australian national ISP Internode, who gave it to the community. Kudos to them: NodeMap

    --
    In Soviet Russia the insensitive clod is YOU!
  8. Nagios and RRDTool by Karora · · Score: 4, Informative
    We're using Nagios (multiple redundant geographically diverse installations) and RRDTool fairly successfully, but that's for maybe 200 machines, tops.

    From looking at what we've achieved with these I would say that you will need to be careful trying to scale them to large networks. They can start huge numbers of processes each minute, when monitoring many servers.

    It depends on what you're monitoring, of course. In our case we are monitoring maybe 20-30 operational parameters on each server. If we were only monitoring a single parameter, then we could probably watch around 1,000-2,000 machines from a single P4-based monitoring box without any real problems. Using a 2.6 kernel on the monitoring box would also dramatically improve scalability.
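    The comparison is easier to see as arithmetic. A rough sketch, assuming every (server, parameter) pair costs one process per one-minute check interval (real check intervals are configurable, so treat this as an approximation): 200 servers at 30 parameters start 6,000 processes per minute, about 100 per second, while 2,000 servers at a single parameter start 2,000 per minute, about 33 per second. That is why the single-parameter case scales to many more machines.

```python
def process_rate(servers: int, params_per_server: int, interval_s: float = 60.0) -> float:
    """Approximate check processes started per second.

    Assumes one process per (server, parameter) pair per check interval.
    """
    return servers * params_per_server / interval_s
```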

    Scalability issues bite similarly with RRDTool: the number of parameters monitored per server can ramp up the load on the monitoring machine(s) quite quickly. Again, that is process load, not CPU load, and a 2.6 kernel will be significantly better in this area. It can also be resolved by scripting the collection process better, rather than just running some collect-the-statistics routine from cron every minute.
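    One way to script collection better is to keep a single long-lived rrdtool process instead of spawning one per update. A minimal sketch, assuming rrdtool's pipe mode (`rrdtool -`), which reads one command per line from stdin; the function names and RRD file names are hypothetical, and the RRD files are assumed to exist already. Each update uses the `N` (now) timestamp. For simplicity rrdtool's per-command status output is discarded so the pipe cannot fill up and block the writer; a production collector would read and check it instead. File names containing spaces are not handled.

```python
import subprocess
from typing import Iterable, Sequence, TextIO, Tuple


def format_update(rrd_file: str, values: Sequence[float]) -> str:
    """Build one pipe-mode 'update' command timestamped 'N' (now)."""
    return "update " + rrd_file + " N:" + ":".join(str(v) for v in values) + "\n"


def feed_updates(stream: TextIO, samples: Iterable[Tuple[str, Sequence[float]]]) -> int:
    """Write one update line per (rrd_file, values) sample; return the count."""
    count = 0
    for rrd_file, values in samples:
        stream.write(format_update(rrd_file, values))
        count += 1
    stream.flush()
    return count


def start_rrdtool_pipe() -> subprocess.Popen:
    """Start one 'rrdtool -' process that reads commands from stdin.

    Its stdout is sent to DEVNULL so unread status lines never block us.
    """
    return subprocess.Popen(["rrdtool", "-"], stdin=subprocess.PIPE,
                            stdout=subprocess.DEVNULL, text=True)
```

    A collector loop would call `start_rrdtool_pipe()` once, then `feed_updates(proc.stdin, samples)` each minute, and finally close `proc.stdin` and `proc.wait()` on shutdown.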

    If you're looking at monitoring thousands of systems, though, maybe you have enough budget to plan around these issues.

    I'm sure that ultimately all monitoring apps run into issues with how many (parameters * servers) each monitoring system can monitor too.

    --

    ...heellpppp! I've been captured by little green penguins!
  9. LooperNG with netcool by Anonymous Coward · · Score: 1, Informative

    We use LooperNG to do most of our SNMP collection and have it integrated with Netcool, which is our commercial platform. We have been using it since the early days of Looper (before NG).

    It has scaled very well: we now have almost four Looper collection stations collecting traps from over 6,000 elements.

    Netcool is our main platform, but we also use NNM and eHealth.

    We tried Nagios when it was known as NetSaint and found its performance poor at the time.

  10. http://www.opensims.org by elsie_moo · · Score: 2, Informative

    http://www.opensims.org

  11. MRTG is pretty standard by Anonymous Coward · · Score: 1, Informative

    Though alternatives do exist.

    If you've got a cluster or otherwise "clumpy" network, Ganglia is the ideal end-user-visible monitoring tool: lots of pretty and informative graphs, multicast-based so there's no heavy network load, and no particularly sensitive information unless you choose to reveal it.

    For filesystem security, Samhain is mature, secure and IMHO very nice, though the good web frontend is non-free.

    For network security, run a Nessus scan daily.

  12. F/OSS Tools by bastardadmin · · Score: 3, Informative

    Not sure how helpful this will be in huge environments, I live in the small to midsize market, but here are some tools that I have found useful in the past:
    Not exactly a monitoring tool, but definitely the most versatile all around auditor I have ever found: Nessus.
    Ettercap is a good sniffer.
    MRTG has been a godsend when I have had managed devices to deal with, and I have heard very good things about RRDTool and Cacti.
    Tripwire is freely available for Linux and the BSDs, though the Win32 version has not been open-sourced.
    One tool I have not been able to find in F/OSS is a Windows event log monitor (though believe me I'm still looking).

  13. Rancid is your friend by gclef · · Score: 2, Informative

    Seriously, it's hugely useful. It's very nice to be able to show management that you not only have a config backup system for your network devices, but your backup system is also doubling as a change control system. It's at http://www.shrubbery.net/rancid . I tend to use something like webCVS with it, to let folks browse through the CVS configs (you will, of course, want to use authentication to restrict access to webcvs).
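    The change-control side boils down to diffing each new config backup against the previous one. RANCID itself logs in to the devices and commits the configs to CVS; the sketch below only illustrates the idea with Python's standard `difflib`, and the function name and sample configs are hypothetical. An empty result means the device config has not changed since the last backup.

```python
import difflib
from typing import List


def config_changes(old_cfg: str, new_cfg: str, device: str) -> List[str]:
    """Return unified-diff lines between two saved configs of one device.

    An empty list means nothing changed since the previous backup.
    """
    return list(difflib.unified_diff(
        old_cfg.splitlines(), new_cfg.splitlines(),
        fromfile=f"{device} (previous)", tofile=f"{device} (current)",
        lineterm="",
    ))
```

    For example, comparing a config where an interface had ` shutdown` with one where it has ` no shutdown` yields `- shutdown` and `+ no shutdown` lines, which is the kind of change report you can hand to management.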