Trying to Help a Troubled Network with Linux?
vmehta asks: "I was recently put in a situation where I am trying to help a troubled network with many students accessing it. There are issues with broadcast packets and random outages which seem to be plaguing the network. What tools and methods are the best practice when trying to use Linux and Open Source to analyze and fix a network?"
The first step isn't to blunder in and migrate; the first step is to work out what's causing the outages. Use Ethereal or some other packet sniffer to establish where the broadcast floods are coming from, use nmap to find insecure hosts, and investigate what kind of routers are being used and what rules they employ.
Basically, OSS/Linux are great, but don't rush in without establishing the issues first.
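To make the "where are the broadcast floods coming from" step concrete, here is a minimal Python sketch that tallies broadcast frames per source MAC. It assumes you already have raw Ethernet frames as bytes from whatever capture tool you use; the function names and the sample addresses are made up for illustration, and this is not how Ethereal itself does it.

```python
from collections import Counter

BROADCAST = b"\xff" * 6  # Ethernet broadcast destination ff:ff:ff:ff:ff:ff


def format_mac(raw: bytes) -> str:
    """Render a 6-byte MAC address as colon-separated hex."""
    return ":".join(f"{b:02x}" for b in raw)


def top_broadcast_sources(frames, limit=5):
    """Count broadcast frames per source MAC and return the busiest senders.

    `frames` is any iterable of raw Ethernet frames (bytes), however captured.
    """
    counts = Counter()
    for frame in frames:
        if len(frame) < 14:  # too short to hold an Ethernet header
            continue
        dst, src = frame[0:6], frame[6:12]
        if dst == BROADCAST:
            counts[format_mac(src)] += 1
    return counts.most_common(limit)


if __name__ == "__main__":
    # Hand-built sample frames with hypothetical addresses, not real capture data.
    noisy = bytes.fromhex("020000000001")
    quiet = bytes.fromhex("020000000002")
    arp = b"\x08\x06"  # EtherType field for ARP
    sample = [BROADCAST + noisy + arp] * 3 + [BROADCAST + quiet + arp]
    sample.append(quiet + noisy + arp)  # unicast frame, ignored by the tally
    for mac, n in top_broadcast_sources(sample):
        print(mac, n)
```

Whichever MAC tops the list is the box to go and find on the switch.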
Almost any time I see this, it's some random box flooding the network. Just go to your switches...the light that is on solid continuously will point you in the right direction.
No use fixing symptoms; go after the root cause.
What's next, "How do I produce PDF files, using Linux and Open Source?" "How can I leverage Open Source to surf the web?"
Christ, this is like the late 90's, when everything suddenly had "e" in front of it. Dude, get Ethereal, slap it on any Windows box, and be done. No need to get nerdy with Linux. If you know enough that it's broadcast traffic, you're halfway there.
I want to delete my account but Slashdot doesn't allow it.
The first step in troubleshooting is knowing the network topology. How are network segments separated? How are they connected? Where are routers, hubs, switches, etc.? Which switches are managed, and how are the VLANs set up on them? Where are the DHCP servers, and what do they serve? Where are all your network drops?
Do your network segments have multiple subnets attached to them?
Is everything subnetted properly?
The first set of questions are ones YOU should be able to answer. After all, it's YOUR network, and YOU should know how it's set up. The last two are harder to deal with, because these settings may be on computers not in your control.
Answer the first questions first, then when you are looking at packet traces, TCP/IP dumps, logs, etc. and you see a problem, you'll have a better idea where the problem is physically located, saving much time and energy.
And then there's the "dumb questions" I shouldn't have to ask: Do you have a loop? Are your cables wired to T568A or T568B standards? Are all your cables in good repair?
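Part of the "is everything subnetted properly?" question can be checked on paper before touching a sniffer. The sketch below uses Python's standard `ipaddress` module; the subnet list, host names and addresses are hypothetical placeholders for whatever your own documentation says.

```python
import ipaddress
from itertools import combinations


def overlapping_subnets(cidrs):
    """Return pairs of declared subnets whose address ranges overlap."""
    nets = [ipaddress.ip_network(c) for c in cidrs]
    return [(str(a), str(b)) for a, b in combinations(nets, 2) if a.overlaps(b)]


def misconfigured_hosts(hosts, cidrs):
    """Return hosts whose configured address/prefix matches no declared subnet.

    `hosts` maps a host name to its interface setting, e.g. "10.0.2.15/24".
    """
    nets = {ipaddress.ip_network(c) for c in cidrs}
    bad = {}
    for name, setting in hosts.items():
        iface = ipaddress.ip_interface(setting)
        if iface.network not in nets:
            bad[name] = str(iface.network)  # the network the host thinks it is on
    return bad


if __name__ == "__main__":
    # Hypothetical plan: the /23 accidentally swallows the first /24.
    plan = ["10.0.0.0/23", "10.0.1.0/24", "10.0.2.0/24"]
    print(overlapping_subnets(plan))

    # Hypothetical hosts: the second one has the wrong netmask.
    hosts = {"lab-pc-1": "10.0.2.15/24", "lab-pc-2": "10.0.2.16/16"}
    print(misconfigured_hosts(hosts, plan))
```

A host with too wide a netmask believes far more machines are on its local segment than really are, which is one way to end up with stray ARP broadcasts.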
Give me my freedom, and I'll take care of my own security, thank you.
Without any more information, you've got a bad NIC, almost certainly. Look on the switch for the port whose light is always on. As you've described it, software has almost nothing to do with it. This is a NIC, or a bad switch, or bad cabling, or something.
"He who would learn astronomy, and other recondite arts, let him go elsewhere. " -- John Calvin, commenting on Genesis 1
Step 1) Map the network both logically (which networks, what is the routing, etc.) and physically... the "tug test". Label everything, and put it all in a spreadsheet. Tools are nmap, pen and paper, and a label printer. Access to the routers, or being friendly with the router admin, is a must.
Step 2) Isolate the problem protocols and hosts. Be on the lookout for AppleTalk, IPX, or old NetBIOS. All very chatty protocols. Look for old hubs and replace them with switches. Look for compromised boxen. Try to VLAN things logically (by department or by usage, whichever is best for the environment). Tools are snort, ethereal, ntop, and syslog (any managed switches should be sending to a syslog server; I've used syslog-ng).
Step 3) Trend as much as you can. Even before the network is cleaned up, start to collect statistics from the switches and/or hosts on your network. Any gateways should be monitored as well. This will let you see if there are problems correlated to a particular time of day, if you're going over your bandwidth, etc. Tools are MRTG, or for more in depth try Cacti http://www.cacti.net/
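For anyone wondering what the trending in Step 3 boils down to: graphing tools of this kind poll a port's octet counter at intervals and turn the difference into a rate. A rough Python sketch of that arithmetic follows, assuming a 32-bit wrapping counter (such as SNMP's ifInOctets) and at most one wrap between samples. The function names and numbers are illustrative, not taken from MRTG or Cacti.

```python
def counter_delta(previous, current, bits=32):
    """Difference between two readings of a wrapping counter.

    Assumes the counter wrapped at most once between the two readings.
    """
    return (current - previous) % (1 << bits)


def utilisation(prev_octets, curr_octets, seconds, link_bps, bits=32):
    """Fraction of link capacity used between two samples."""
    if seconds <= 0:
        raise ValueError("samples must be taken at increasing times")
    bits_sent = counter_delta(prev_octets, curr_octets, bits) * 8
    return bits_sent / (seconds * link_bps)


if __name__ == "__main__":
    # Hypothetical readings taken 300 seconds apart on a 100 Mbit/s port.
    print(counter_delta(4294967000, 704))           # counter wrapped in between
    print(utilisation(0, 3_750_000, 300, 100_000_000))
```

On a busy fast link a 32-bit counter can wrap more than once per polling interval, in which case the result is an underestimate; polling more often or using 64-bit counters where the switch offers them avoids that.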
There is much more after you get to this point, but people will be much happier the faster you get here.
Good luck
You're attempting to help diagnose a (presumably) large network. Very honourable, but attempting to do this gung-ho with a few responses from Slashdot is very silly.
Grab a consultant from a local small Linux shop for a few days. Someone with good knowledge about system/network architecture.
Get them to poke around on your network. Provide all documentation you have available.
After the first day, you should have all the information necessary to write up a document regarding your existing issues. Make notes while he's using tools to investigate. From there you work with the consultant to come up with a separate document for resolutions with a criticality rating.
From there, you want systems in place to monitor the health of your network. Have a chat to him about it, but I'd be inclined to build a solution which was centered around using Nagios.
While consultants can (and frequently do) suck when you come to specifics, they are a valuable resource for pointing you in the right direction. And experience counts! They've done this stuff before, they know the pitfalls and proven solutions.
Low-tech is often a faster and more efficient way to find these sorts of problems. For surveillance and diagnosis, I recommend walking around and watching over students' shoulders. For corrective measures, a couple of taps with a ball-peen hammer usually suffices.
--
Twoflower
If you want to use a PC running Ethereal to monitor 802.11 traffic to or from other machines, rather than using Ethereal only to look at traffic to and from the machine on which you're running Ethereal, you should seriously consider running it on a recent version of Linux or of one of the free-software BSDs, rather than on Windows.
Go on, mod me 'insightfull' or mod me 'flamebait', it's one or the other.
"A language that doesn't affect the way you think about programming, is not worth knowing" - Alan Perlis
More tools than you could learn in a reasonable timeframe can be found here: http://www.insecure.org/tools.html
I would have posted sooner, but T-Mobile's data coverage has been spotty since Wilma hit. Still no power or fuel, but at least I can get my geek-fix now.
- Posted via Danger HipTop2 / T-Mobile Sidek!ck II -
Troubleshooting a network is a matter of experience, not of some particular tools. But these things help:
;-) Let ethereal make statistics over the traffic.
...
* Put your box on the monitor/mirror/analysing port of the switch and read the traffic with tcpdump/tethereal/ethereal (if you just want to check the broadcasts, it does not have to be a monitoring port). Edit the packet filter expression until you no longer see the legal/uninteresting traffic but only the suspects. (They are students? Have fun filtering out all the p2p traffic.)
* Watch out for ICMP errors, especially ICMP-redirects. Watch out for TCP-resets. Watch out for fragments. Watch out for malicious Spanning-Tree packets. Watch for SMTP to many IPs (spamming trojans), IRC (zombies), weird packets eg. fragmented UDP (zombies attacking a target)
* Check the MAC addresses in the Ethernet frame header ('tcpdump -e'): are they constant? If there are packets IP_A<-->IP_B, are the corresponding MACs really MAC_A<-->MAC_B, or MAC_A-->MAC_B and MAC_B-->MAC_C instead?
* Install an arpwatcher. Stealing the default-gateway's MAC is an effective DoS attack on a network.
* Put 2 NICs into a fast linux box, bridge ('brctl') them together, put this linuxbridge in front of the default-gateway. Dump again. Install a snort on it and let it see the traffic - what does the snort log say?
* Do the switches have the feature to log to a remote syslog daemon? Do so and read those logs! Check all the snmp-variables on the switches, especially the "errors". Read the logs of the default-gateway.
* Watch the amount of traffic (snmpget the port-counters of the switches and make mrtg-graphs of the results). Maybe the problem only strikes if some switch ports are under high load?
* Scan the network with nessus. Maybe you'll find some bindshells.
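The MAC-consistency and arpwatcher points above come down to one idea: remember which MAC each IP was last seen with and shout when it changes. A minimal Python sketch of that bookkeeping is below. The class name, addresses and message format are invented for illustration, and feeding it real sightings (from ARP replies or 'tcpdump -e' output) is left out.

```python
class ArpWatch:
    """Remember which MAC each IP was last seen with and report changes."""

    def __init__(self):
        self.seen = {}  # ip -> last MAC observed for that ip

    def observe(self, ip, mac):
        """Record one (ip, mac) sighting; return a warning string on change."""
        mac = mac.lower()
        old = self.seen.get(ip)
        self.seen[ip] = mac
        if old is not None and old != mac:
            return f"{ip} moved from {old} to {mac}"
        return None


if __name__ == "__main__":
    # Hypothetical sightings: the gateway's IP suddenly answers from a new MAC.
    watch = ArpWatch()
    sightings = [
        ("192.168.0.1", "00:11:22:33:44:55"),
        ("192.168.0.1", "00:11:22:33:44:55"),
        ("192.168.0.1", "02:00:00:00:00:66"),
    ]
    for ip, mac in sightings:
        warning = watch.observe(ip, mac)
        if warning:
            print(warning)
```

A change is not always an attack (replaced NICs and failover pairs do it too), but a change on the default gateway's IP deserves an immediate look.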
Hope this helps.
g.
I have a better idea. Get Linux and slap it on all your windows boxes and be done. For good.
The difference between Canada and the USA is that in Canada healthcare is a right and gun ownership is a privilege.