Network Monitoring Options?
Nom du Keyboard asks: "We have a LAN network of 7 servers and about 400 PCs. Every so often I'll notice immense slowdowns, from minutes to occasional delays of a couple hours, while getting data from various servers, and it happens from more than just my PC. So far we haven't had any way of determining if a server has suddenly gotten tied up, or if there is some failure in the communications backbone. Without a lot of money to spend on this (I think it's more important than others right now), what cheap or free monitoring options are there available that can map and isolate problems in a network of this size?"
Here are some of the ones I have more recent experience with. All of these require some reading and planning before you set them up.
OpenNMS - Probably the most trouble-free NMS I've found so far. No, not "trouble-free". But the closest to it.
Nagios - The most flexible, but also the biggest royal pain in the ass to set up & maintain. Almost infinitely scalable, though, if you are willing to take the time to write some perl scripts to automate most administrative tasks and divide the monitoring work up (several "slave" hosts can harvest monitoring data for a subset of your network and push it to your central Nagios server which greatly lessens the load on your main monitoring server). Some really great monitoring possibilities are out there if you look into NRPE with Nagios.
OpManager - We bought this commercial solution at my last job. Great for monitoring Windows servers. A real pain in the ass to monitor anything else with any level of sophistication. It also has some fatal bugs that cause it to quietly orphan nodes if it misses a scheduled poll!
Isn't this precisely the job for ping? Just write a script to warn you when the ping time to some server is greater than it should be.
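For what it's worth, a rough sketch of that kind of script in Python might look like the following. The 50 ms threshold and the function names are made up for illustration, and it assumes a Unix-style ping that accepts `-c` (Windows ping uses `-n` instead).

```python
import re
import subprocess

# Matches the "time=12.3 ms" (or "time<1ms") field that common ping
# implementations print for each reply.
RTT_PATTERN = re.compile(r"time[=<]([\d.]+)\s*ms")


def parse_rtts(ping_output):
    """Return the round-trip times (in ms) found in ping's text output."""
    return [float(value) for value in RTT_PATTERN.findall(ping_output)]


def check_host(host, threshold_ms=50.0, count=4):
    """Ping a host; return a warning string, or None if it looks healthy.

    threshold_ms is an arbitrary illustrative default, not a recommendation.
    """
    # Assumption: Unix-style ping where "-c" sets the number of probes.
    result = subprocess.run(
        ["ping", "-c", str(count), host],
        capture_output=True,
        text=True,
    )
    rtts = parse_rtts(result.stdout)
    if not rtts:
        return f"{host}: no replies"
    worst = max(rtts)
    if worst > threshold_ms:
        return f"{host}: worst RTT {worst:.1f} ms exceeds {threshold_ms:.1f} ms"
    return None
```

Run `check_host()` from cron against each server and mail yourself whatever it returns. Keep in mind that ping only tells you about round-trip latency and reachability; it won't tell you whether the server itself is busy.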
Cacti? ettercap/ethereal/whatever? Have you run Snort to see what kind of traffic is on your network? You left out an awful lot of information. I'm assuming you are running switches, but who knows? You never said the speed of your network either, or whether this is all in one building or spread across many, with routers in the middle, etc. Without knowing any details I will suggest Cacti, and leave it at that.
Also, set up a mirror port on your switch and run "etherape" on a machine connected to that port. You'll get a real-time graphical representation of where the traffic is going on your network, and some indications of what kind of traffic you're looking at.
Then NTOP (http://www.ntop.org/) is your best bet. It breaks down all traffic on your network and should allow you to see who's being naughty and who's being nice.
what cheap or free monitoring options are there available . . .
If the network is the issue, the cheapest and simplest is a good laptop running Ethereal or Snort. Also pick up (or scrounge up) a dumb hub and if possible a fiber tap, since you're probably running in a mixed-media switched infrastructure (or maybe you're not - hence the problems :) ). If you want to get fancy you can buy span or rspan capable switches which will let you mirror traffic from individual ports or Vlans to a single management station port (in which case you can just use a desktop).
This should go without saying, but those packet captures will be useless unless you know WHERE each MAC address is on the network. That said:
1) maintain reliable L1/L2/L3 mappings
2) Tag both ends of long cables and make sure all wallports are numbered, and
3) beat the shit out of anyone who brings personal equipment in and plugs it in. It screws up your records and is probably less secure.
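To make point 1 concrete: once you have the records, looking up a MAC address from a capture is trivial. Here's a minimal Python sketch. The inventory entries, switch names and wall-port labels are all invented for the example; in practice you'd load them from wherever you keep your records.

```python
def normalize_mac(mac):
    """Reduce a MAC address to 12 lowercase hex digits, ignoring separators.

    Accepts forms such as 00:11:22:AA:BB:CC, 00-11-22-aa-bb-cc
    and 0011.22aa.bbcc.
    """
    digits = "".join(ch for ch in mac.lower() if ch in "0123456789abcdef")
    if len(digits) != 12:
        raise ValueError(f"not a MAC address: {mac!r}")
    return digits


# Hypothetical inventory: MAC -> (switch, switch port, wall port, machine).
INVENTORY = {
    normalize_mac("00:11:22:AA:BB:CC"): ("sw-floor2", "Gi0/14", "W2-031", "accounting-pc-07"),
    normalize_mac("00:11:22:DD:EE:FF"): ("sw-core", "Gi0/2", "rack-3", "fileserver-1"),
}


def locate(mac, inventory=INVENTORY):
    """Return the inventory record for a MAC, or None if it is unknown."""
    return inventory.get(normalize_mac(mac))
```

A MAC that comes back as `None` is either a hole in your records or something that shouldn't be plugged in, and both are worth knowing about.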
"Tied up"? "Failure in the communications backbone"?
My suggestion is to hire somebody who knows things about computers. Your language suggests you are a layperson. Not being a computer expert isn't bad, but not being a computer expert and trying to do the job of a computer expert is.
Besides the regular Ethereal suggestions, if you're trying to do something on the cheap, consider installing a lightweight Snort on each of the clients. If something is up it's bound to at least trigger some sort of Snort log, and the log entries will cluster around your incidents. Although, hands down, Ethereal on a span port or network tap is a better option. -Pk
ntop
Nagios
MRTG
Cacti
A bit off-topic, but I'm curious if anyone can answer this for me:
At work our network setup recently changed from static-IP based to DHCP based. I run a Debian machine, and not all that much seems different for me, just that the machine gets its info from a server at bootup.
However, running various network sniffing tools shows that all the Windows machines on the network have become insanely chatty -- every Windows machine seems to be constantly sending out packets, regardless of whether they're actually doing anything or not. Given that there are hundreds of Windows machines on the (Ethernet) network, this means A Lot of Packets.
I find this quite annoying because it horribly clogs up the results if I run some tool to look at network activity (usually to see if something's wrong with one of my machines). I don't know if it actually degrades the network performance appreciably (the packet size seems to be fairly small), but I assume that having zillions of pointless packets getting sent can't be a good thing for performance on an Ethernet...
Anyone know WTF those machines are doing? Is this some "feature" gone berserk?
[I don't recall Windows machines doing this in the past; although the change seems to coincide with our move to DHCP, I suppose maybe it could also be due to people upgrading to newer versions of Windows.]
What does your network look like? What type of gear is in the backbone? Are the switches managed? Is the network flat or is it segmented into VLANs?
It would seem like one of the first things to do is to look at the status lights or the management console to see which ports are loaded down with network activity during slowdowns...
Perhaps it's bad network equipment? Are the switches maintaining uptime?
Of course, all the network monitoring software already mentioned here would help diagnose the details of the network traffic.
If your intent is to detect network troubles, I recommend using some system like Cricket or MRTG to graph the interfaces as well as the Errors on the interfaces within the network. This may require some finesse in setting up for the first time.
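In case it helps to see what those graphing tools are doing under the hood: they poll SNMP counters (things like ifInOctets and ifInErrors) at a fixed interval and plot the difference. A rough Python sketch of that arithmetic, assuming 32-bit counters that wrap around and made-up function names:

```python
def counter_delta(previous, current, bits=32):
    """Difference between two readings of a wrapping SNMP counter.

    The modulo handles a single wrap between polls; if the counter can
    wrap more than once per polling interval the result is meaningless.
    """
    return (current - previous) % (2 ** bits)


def rate_bps(prev_octets, curr_octets, interval_s, bits=32):
    """Average bits per second between two octet-counter readings."""
    return counter_delta(prev_octets, curr_octets, bits) * 8 / interval_s


def utilization(prev_octets, curr_octets, interval_s, link_bps, bits=32):
    """Fraction of link capacity used over the polling interval."""
    return rate_bps(prev_octets, curr_octets, interval_s, bits) / link_bps
```

Note that a 32-bit octet counter on a saturated 100 Mbit/s link wraps in a little under six minutes, so poll more often than that or use the 64-bit counters if your gear has them. The same delta works for error counters: a steadily climbing error count on one port usually points at a bad cable, port or duplex mismatch.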
Aside from that, Sysmon was written primarily to monitor hosts and host-based services, but was later extended to monitor networks as well. It may fit your needs, as you can set up SNMP thresholds on network errors and other things.
If you want to be super-lazy, I would download the trial of Intermapper. It has auto-discovery and may be able to find these troubles for you if you can SNMP poll the devices. I've not used it in a while, so hopefully it has support for the platforms that you are using.
That's a pretty vague question, and you didn't provide enough information to really answer it right, but here are some recommendations.
Assuming you have managed switches, collecting per-port data with SNMP is a great first start. I think Cricket (http://cricket.sourceforge.net/) is a great system for collecting this data, but I prefer Drraw (http://web.taranis.org/drraw) for graphing the data. For an example of the power available by combining these two tools, see http://stats.net.cmu.edu/
Once you've got that, install Net-SNMP's snmpd on your host and collect & graph interface stats for your unix servers as well. If you don't have managed switches this may be good enough on its own. You can also graph load average, memory usage, etc.
For actually analyzing your network traffic I suggest Argus, http://www.qosient.com/argus. It's a network traffic auditing tool, think of it as tcpdump for flows instead of packets, or as netflow on crack. You can easily record complete flow statistics for your entire network for later perusal. All you need is a network topology that allows you to sniff most/all of the traffic. A span port on a switch is usually sufficient. If you've already got a snort server and it has enough processing capacity you can just run argus on the same host.
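If the "flows instead of packets" distinction isn't obvious, here is a toy Python illustration. It is not Argus's record format or algorithm, just the general idea of collapsing packets that share the same endpoints into one summary record. The packet tuples are a made-up layout, and the flows here are one-directional for simplicity.

```python
from collections import defaultdict


def aggregate_flows(packets):
    """Group packet records into per-flow packet and byte totals.

    Each packet is a tuple (src, sport, dst, dport, proto, length).
    A flow is keyed on the first five fields, one direction only.
    """
    flows = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for src, sport, dst, dport, proto, length in packets:
        key = (src, sport, dst, dport, proto)
        flows[key]["packets"] += 1
        flows[key]["bytes"] += length
    return dict(flows)


def top_talkers(flows, n=5):
    """Return the n source addresses that sent the most bytes."""
    totals = defaultdict(int)
    for key, stats in flows.items():
        totals[key[0]] += stats["bytes"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:n]
```

Thousands of packets collapse into a handful of records like this, which is why you can afford to keep flow data for the whole network and go back through it after a slowdown.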
Speaking of which, if you don't have a snort server you probably want one. Nessus as well.
For monitoring/alerting I recommend Mon (http://www.kernel.org/software/mon), but then I'm biased.
And once you've tracked down what machine(s) are causing the problem, do you have records of which machines belong to which users? (Insert plug here for CMU's NetReg system for management of DNS and DHCP, which provides that. (http://www.net.cmu.edu/netreg) I'm biased on this one as well...)
Oh, and my money would be on poorly timed overlapping network backups, saturating a switch uplink. Just a guess...
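If you want to check that guess before installing anything, write down when each backup job starts and finishes and see which ones collide. A trivial Python sketch, with invented job names and times given as minutes since midnight (it doesn't handle jobs that run past midnight):

```python
from itertools import combinations


def overlapping_jobs(jobs):
    """Return pairs of job names whose [start, end) windows overlap.

    Each job is a tuple (name, start_minute, end_minute).
    """
    pairs = []
    for (name_a, start_a, end_a), (name_b, start_b, end_b) in combinations(jobs, 2):
        # Two half-open intervals overlap when each starts before the other ends.
        if start_a < end_b and start_b < end_a:
            pairs.append((name_a, name_b))
    return pairs
```

If the overlapping pairs line up with the times people complain, stagger the jobs and see whether the slowdowns go away.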
I take it you don't play well with others, but you play well with money.
ssh and top.
In the short term you need to break out a sniffer. A few people suggested this. What most people are suggesting are server/service monitoring tools. These really won't help your problem. I use many of these myself, including Nagios. In fact I'm getting another page from Nagios right now. I use it heavily at multiple customers' sites. These types of tools will help you find out when a particular service on a given server goes down, but they're not going to help much in troubleshooting this problem.
What does your network troubleshooting skillset look like? Are you familiar with sniffers (network protocol analyzers)? Can you handle Linux? 95% of the free tools that you'll get recommendations on are going to need a Linux, Solaris, AIX, etc. install to run on. Some of these tools have Windows ports; some can run in Cygwin. The most basic tool I'm going to recommend is tcpdump. Once you've mastered tcpdump then you can make the switch to CLI Snort. Ntop is also a very handy utility in that it gives you more historical data than most other tools. You need to position your sniffer in a place on the network where it can see the most traffic. If your switches aren't managed then you can't do any port mirroring (spanning, monitoring, etc., whatever terminology your vendor uses). Placing a hub at a critical point will severely hurt your network throughput and may mask the problem. Network taps aren't too expensive. This would likely be your best bet.
Really you haven't given enough information for us to give you a solid recommendation. What does your physical network topology look like? What does your IP network topology look like? What brand(s) and model(s) of switches (or hubs) do you have? Are you currently graphing network I/O on your critical network or server links? Give us some more information and we can give you better suggestions. I'm really reserving any suggestions until I know more about your network. Chime in and I'll try to help.
Try NetMRG (http://www.netmrg.net/).
Are all servers affected? Have you bothered measuring the load on your servers? The problem might not have anything to do with the network.
My first guess would be that all machines are set to take their NetBIOS setting from the DHCP server, which is on by default. NetBIOS is very chatty and useless unless you have some 16-bit network apps that need it. I would look there first.
thank you
7 servers and 400 PCs sounds like a small shop, one prone to growth-by-accretion. Are you daisy-chaining hubs? Breaking the 5-4-3 rule anywhere? Using crappy cabling that's at (or over) the distance limit? Are you all on a switch or switches? Do they suck? Try some network partitioning: if you can swing it, drop a PC-based router in (Linux, Win2K, whatever) and DHCP all the PCs off onto a separate subnet.
Are all the servers Windows-based? Set up 1 master Perfmon screen with NIC and CPU usage stats for each server - manually correlate the slowdowns with what you see on Perfmon.
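If eyeballing gets old, the same correlation is easy to script once you've logged the counters somewhere. A hedged Python sketch follows: the sample layout, field names and limits are all assumptions for illustration, not anything Perfmon produces as-is, and times are plain epoch seconds.

```python
def samples_near(samples, incident_time, window_s=300):
    """Return the samples taken within window_s seconds of an incident."""
    return [s for s in samples if abs(s["time"] - incident_time) <= window_s]


def suspects(samples_by_server, incident_time, cpu_limit=90.0, nic_limit=0.8, window_s=300):
    """Map each server to its samples that look overloaded near the incident.

    cpu_limit is a CPU percentage and nic_limit a fraction of link capacity;
    both defaults are arbitrary illustrative values.
    """
    flagged = {}
    for server, samples in samples_by_server.items():
        hot = [
            s for s in samples_near(samples, incident_time, window_s)
            if s["cpu_pct"] >= cpu_limit or s["nic_util"] >= nic_limit
        ]
        if hot:
            flagged[server] = hot
    return flagged
```

Feed it the times users reported slowdowns. If no server ever shows up as a suspect, that's a decent hint the problem is in the network rather than on the hosts.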
Just some ideas.
If you need more complex system/router data, Cacti is a really good way to centralize the collection of SNMP data.
Start -> Programs -> Administrative Tools -> Performance
If you don't know how this tool works, please resign and hire a high school MCSE who does. But just in case you do want to use /. as a means to make yourself appear more competent at support than you actually are, here's what you do with it. Place counter logs on servers experiencing poor performance. Observe any thresholds that are exceeded that shouldn't (poor disk, cpu, memory, network performance). Upgrade/fix deficient performers. If you don't see any problems, it is likely an issue with network infrastructure (But don't run straight to blaming the network if you haven't fully investigated server performance).
I don't mean to flame but monitoring performance is not complicated and certainly not something that should qualify for an Ask Slashdot.
What will we see next on Ask Slashdot?
"I am an Administrator for a medium sized busines with 100 workstations and 8 servers. We have a new employee starting next week, and I have been told this employee does not wish to use an existing user account, instead management wants the new starter to have an account with her own name on it. I have read through all the manuals but I want to know, is it possible to have a new user account on the network? Management don't want to spend any more money on licenses so this should be a cheap solution."
"I am running a local area network with about 10 desktops and 2 servers. Suddenly last week all the computers stopped communicating. I looked at the core network switch but it appears normal, although all the lights have turned off. Management would like this fixed as soon as possible but they are on a tight budget. Are there any open source solutions, or any readers who have seen similar problems?"