Slashdot Mirror


Trying to Help a Troubled Network with Linux?

vmehta asks: "I was recently put in a situation where I am trying to help a troubled network with many students accessing it. There are issues with broadcast packets and random outages which seem to be plaguing the network. What tools and methods are the best practice when trying to use Linux and Open Source to analyze and fix a network?"

18 of 68 comments

  1. Assess the problem by madaxe42 · · Score: 4, Insightful

    The first step isn't to blunder in and migrate; the first step is to work out what's causing the outages. Use ethereal or some other packet sniffer to establish where the broadcast floods are coming from, and use nmap to find insecure hosts. Also investigate what kind of routers are being used and what rules are being employed.

    Basically, OSS/Linux are great, but don't rush in without establishing the issues first.
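
A minimal sketch of the "where are the broadcast floods coming from" step, assuming you have already exported raw Ethernet frames from a sniffer as byte strings. The function names and the sample frames are hypothetical; only the Ethernet header layout (destination MAC in bytes 0-5, source MAC in bytes 6-11) is standard.

```python
from collections import Counter

BROADCAST = b"\xff" * 6  # Ethernet broadcast destination address


def format_mac(raw: bytes) -> str:
    """Render 6 raw bytes as a colon-separated MAC address."""
    return ":".join(f"{b:02x}" for b in raw)


def broadcast_sources(frames):
    """Count broadcast frames per source MAC.

    `frames` is an iterable of raw Ethernet frames (bytes). In an Ethernet
    header the destination MAC is bytes 0-5 and the source MAC is bytes 6-11.
    """
    counts = Counter()
    for frame in frames:
        if len(frame) < 14:  # shorter than an Ethernet header, skip it
            continue
        if frame[0:6] == BROADCAST:
            counts[format_mac(frame[6:12])] += 1
    return counts


# Hypothetical sample: two hosts, one of them much noisier than the other.
noisy = bytes.fromhex("ffffffffffff" "001122334455" "0806") + b"\x00" * 28
quiet = bytes.fromhex("ffffffffffff" "66778899aabb" "0806") + b"\x00" * 28
unicast = bytes.fromhex("66778899aabb" "001122334455" "0800") + b"\x00" * 28

sample = [noisy] * 5 + [quiet] + [unicast] * 3
for mac, n in broadcast_sources(sample).most_common():
    print(mac, n)
```

The source MAC at the top of the list is the first machine worth tracing back to a switch port.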

    1. Re:Assess the problem by tverbeek · · Score: 4, Informative
      Did you read the part of the question where he explained that he was looking for tools to analyze and fix the problem? And did you notice that he didn't mention or imply any kind of migration?

      Here's an idea: Before you blunder in with an answer, the first step is to work out what the question is. :)

      --
      http://alternatives.rzero.com/
    2. Re:Assess the problem by moro_666 · · Score: 2, Informative

      You can attempt a scan & sniff at first, with plenty of tools to choose from, but if your 100mbit network is being overloaded, it's quite difficult to isolate the individual hosts responsible.

      I guess you will probably end up doing this:

      1) Get rid of cheap hubs (made in paiwan) and get some real network switches in place, like those from SMC. Having an old buggy hub talking to several cheap NICs in several machines ends up in massive packet collisions, resulting in a network that doesn't carry much but is totally jammed.

      2) Scan all those windows machines attached to the network; they are probably bloated with viruses and spyware. Sniff on the windows machines for a 24h period: some viruses and spyware only work at certain hours, so a midday scan doesn't show anything. If the windows machines can't be cleaned up, unplug them (yes, as simple as that).

      3) If none of the above helps, isolate the subnets and firewall them all, creating appropriate rules on in-house firewalls, so that the flooding stops.

      4) Most important, tell the windows users to scan their machines regularly. Unless they do that, you will have to start all of the above again the day after tomorrow.

      I recently found a virus on a windows box in my own office sending me the "worm" emails ... luckily thunderbird on my ubuntu laptop didn't quite figure out what to do with "report.pif" files ... Anyway, the biggest problem was how to draw the "big red picture" for the whole office that they *have* to scan their machines all the time. I found out that 66% of our machines were infected by worms, some of them even by multiple worms. After the cleanout the network performance increased dramatically and I could even do non-lagging X sessions from my home to the office (which I couldn't do before).

      Long story short: Replace old crap, scan & sniff, isolate subnets, unplug m$ powered machines.
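
The point about sniffing for a full 24h period can be illustrated with a small sketch that buckets captured packets by hour for each host, so that a machine which only gets noisy at 3 a.m. still stands out. This assumes you have reduced the capture to (unix timestamp, host) pairs; the function names and the sample data are made up, and hours are taken in UTC for simplicity.

```python
from collections import defaultdict
from datetime import datetime, timezone


def hourly_profile(events):
    """Map each host to a 24-slot list of packet counts, one slot per UTC hour.

    `events` is an iterable of (unix_timestamp, host) pairs.
    """
    profile = defaultdict(lambda: [0] * 24)
    for ts, host in events:
        hour = datetime.fromtimestamp(ts, tz=timezone.utc).hour
        profile[host][hour] += 1
    return dict(profile)


def busiest_hours(counts, top=3):
    """Return up to `top` hours with the most packets, busiest first."""
    ranked = sorted(range(24), key=lambda h: (-counts[h], h))
    return [h for h in ranked[:top] if counts[h] > 0]


# Hypothetical data: one host bursts at 03:00, another is evenly spread.
events = [(3 * 3600 + i, "10.0.0.8") for i in range(50)]
events += [(13 * 3600, "10.0.0.8")]
events += [(h * 3600, "10.0.0.9") for h in range(24)]

profile = hourly_profile(events)
for host, counts in sorted(profile.items()):
    print(host, busiest_hours(counts))
```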

      --

      I'd tell you the chances of this story being a dupe, but you wouldn't like it.
    3. Re:Assess the problem by nocomment · · Score: 2, Interesting

      I'm replying to this comment but my response is directed toward the OP

      I agree with madaxe42: first things first. Diagram the network. Figure out where the hubs and switches are. Figure out where the firewalls are. Figure out how packets traverse the network(s). If it's a single network with a single point of access to the internet this should be (relatively) easy. If you are looking to save the day with linux, what you could do is set the switches to use "port mirrors" to capture every packet on the network to a snort DB. Read up on creating snort rules and you can capture literally everything that goes on. Also run samba with no-password access and log everything to see which IPs are delivering viruses to your machine. Turn on snmp at every gateway and graph the network traffic. This should tell you which segments are most prone to excessive traffic (across networks).

      Chances are with this combo you will find most of the viruses, and especially the p2p abusers.

      I've had to do this before and this works for me.

      One of our remote sites has a T1 to the internet but also needs to access our financial system. It wasn't working. In theory they had plenty of bandwidth, but the system was unstable: they were able to connect...sometimes, but once they did it was almost unusable, and we're just talking about a lightweight telnet (over a VPN) session. I initially started with an mrtg graph on the router that is the last hop to the Internet. I saw normal traffic interrupted by periods of maxed-out bandwidth.

      I've seen this type of pattern before..."Kazaa," I thought. I set up one of the company laptops with snort and mandrake linux and sent it down there with instructions to put it on a switch on the same network as the router, with port mirroring, so I could figure out which network it was coming from. Once I knew that, I repeated the process and had them gradually move the laptop down the chain until it was on the same subnet as the offender. 2 days later I had the IP and a list of mp3s that were being shared out and downloaded from that machine.

      All the while the VP of that location was harping on us that we needed to spring for a second T1 just to support their 12 users running telnet.

      I returned to them with the information I had gathered and they responded with an "I know who that is". The traffic stopped immediately and they have been running fine for over a year now with no hiccups.

      Just think logically and you will have it figured out in pretty short order.
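
The "find the offender" step in the story above boils down to ranking sources by how much traffic they send. A rough sketch, assuming the capture has already been summarised as (source IP, byte count) pairs; the function name, addresses and sizes are invented for illustration and are not taken from snort's own output format.

```python
from collections import defaultdict


def top_talkers(records, limit=3):
    """Return the `limit` source IPs that sent the most bytes.

    `records` is an iterable of (source_ip, byte_count) pairs, for example
    one pair per captured packet.
    """
    totals = defaultdict(int)
    for src, size in records:
        totals[src] += size
    # Sort by bytes descending, then by IP string so ties are deterministic.
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:limit]


# Hypothetical capture summary; the addresses and sizes are made up.
capture = [
    ("10.0.3.17", 1500), ("10.0.3.17", 1500), ("10.0.3.17", 1500),
    ("10.0.1.5", 64), ("10.0.2.9", 400), ("10.0.1.5", 64),
]
for ip, total in top_talkers(capture):
    print(f"{ip:<12} {total} bytes")
```

Running the same summary at each hop down the chain narrows the search to one subnet, then to one host.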

      --
      /* oops I accidentally made a comment, sorry */
      /* http://allyourbasearebelongto.us */
  2. You are infected with viruses most likely by Anonymous+Crowhead · · Score: 5, Insightful

    Almost any time I see this, it's some random box flooding the network. Just go to your switches...the light that is on solid continuously will point you in the right direction.

  3. Shoot the students by Usquebaugh · · Score: 4, Funny

    No use fixing symptoms; go after the root cause.

  4. OSS? Linux? WHY? by Gothmolly · · Score: 4, Interesting

    What's next, "How do I produce PDF files, using Linux and Open Source?" "How can I leverage Open Source to surf the web?"

    Christ, this is like the late 90's, when everything suddenly had "e" in front of it. Dude, get Ethereal, slap it on any Windows box, and be done. No need to get nerdy with Linux. If you know enough that it's broadcast traffic, you're halfway there.

    --
    I want to delete my account but Slashdot doesn't allow it.
  5. Know your network. Document it! by Webmoth · · Score: 4, Insightful

    The first step in troubleshooting is knowing the network topology. How are network segments separated? How are they connected? Where are routers, hubs, switches, etc.? Which switches are managed, and how are the VLANs set up on them? Where are the DHCP servers, and what do they serve? Where are all your network drops?

    Do your network segments have multiple subnets attached to them?

    Is everything subnetted properly?

    The first set of questions are ones YOU should be able to answer. After all, it's YOUR network, and YOU should know how it's set up. The last two are harder to deal with, because these settings may be on computers not in your control.

    Answer the first questions first, then when you are looking at packet traces, TCP/IP dumps, logs, etc. and you see a problem, you'll have a better idea where the problem is physically located, saving much time and energy.

    And then there's the "dumb questions" I shouldn't have to ask: Do you have a loop? Are your cables wired to T568A or T568B standards? Are all your cables in good repair?
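
Once the topology is written down, the "is everything subnetted properly?" question can be partly checked mechanically. The following is only a sketch using Python's standard `ipaddress` module; the subnet list, host list and function names are hypothetical stand-ins for whatever your own documentation contains.

```python
import ipaddress
from itertools import combinations


def overlapping_subnets(cidrs):
    """Return pairs of subnets whose address ranges overlap."""
    nets = [ipaddress.ip_network(c) for c in cidrs]
    return [(str(a), str(b)) for a, b in combinations(nets, 2) if a.overlaps(b)]


def misplaced_hosts(hosts, cidrs):
    """Return host addresses that fall outside every documented subnet."""
    nets = [ipaddress.ip_network(c) for c in cidrs]
    return [h for h in hosts
            if not any(ipaddress.ip_address(h) in n for n in nets)]


# Hypothetical documentation: one /24 accidentally carved out of a /16.
documented = ["10.1.0.0/16", "10.1.4.0/24", "192.168.10.0/24"]
seen_hosts = ["10.1.4.20", "192.168.10.7", "192.168.11.7"]

print("overlaps:", overlapping_subnets(documented))
print("outside documented subnets:", misplaced_hosts(seen_hosts, documented))
```

A host that shows up outside every documented subnet is either misconfigured or a sign that the documentation is out of date; both are worth knowing before reading packet traces.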

    --
    Give me my freedom, and I'll take care of my own security, thank you.
  6. It's a NIC by Fished · · Score: 3, Insightful

    Without any more information, you've got a bad NIC, almost certainly. Look on the switch for the port whose light is always on. As you've described it, software has almost nothing to do with it. This is a NIC, or a bad switch, or bad cabling, or something.

    --
    "He who would learn astronomy, and other recondite arts, let him go elsewhere. " -- John Calvin, commenting on Genesis 1
  7. map, isolate, trend by grattwood · · Score: 5, Informative

    Step 1) Map the network both logically (which networks, what is the routing, etc.) and physically... the "tug test". Label everything, and put it all in a spreadsheet. Tools are nmap, pen and paper, and a label printer. Access to the routers, or being friendly with the router admin, is a must.

    Step 2) Isolate the problem protocols and hosts. Be on the lookout for appletalk, IPX, or old netbios. All very chatty protocols. Look for old hubs and replace them with switches. Look for compromised boxen. Try to VLAN things logically (by department or usage, whichever is best for the environment). Tools are snort, ethereal, ntop, and syslog (any managed switches should be sending to a syslog server; I've used syslog-ng).

    Step 3) Trend as much as you can. Even before the network is cleaned up, start to collect statistics from the switches and/or hosts on your network. Any gateways should be monitored as well. This will let you see if there are problems correlated to a particular time of day, if you're going over your bandwidth, etc. Tools are MRTG, or for more in-depth trending try Cacti http://www.cacti.net/

    There is much more after you get to this point, but people will be much happier the faster you get here.
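
For the trending step, the arithmetic that graphing tools perform on polled interface counters is simple enough to sketch. This assumes the standard 32-bit SNMP octet counters (such as `ifInOctets`), which wrap back to zero after 2^32; the function names and sample readings are hypothetical, and the sketch can only account for a single wrap between two polls.

```python
COUNTER32_MODULUS = 2 ** 32  # 32-bit SNMP counters wrap at this value


def counter_delta(previous, current, modulus=COUNTER32_MODULUS):
    """Difference between two counter readings, allowing for one wraparound."""
    return (current - previous) % modulus


def bits_per_second(previous, current, interval_seconds):
    """Average rate in bits per second between two octet-counter samples."""
    if interval_seconds <= 0:
        raise ValueError("interval must be positive")
    return counter_delta(previous, current) * 8 / interval_seconds


# Hypothetical readings taken 300 seconds apart (a common polling interval).
samples = [0, 37_500_000, 75_000_000]
for before, after in zip(samples, samples[1:]):
    print(f"{bits_per_second(before, after, 300):.0f} bit/s")

# A reading that wrapped past 2**32 between polls still gives a sane delta.
print(counter_delta(4_294_967_290, 10))
```

On a fast link a 32-bit counter can wrap more than once between polls, which is one reason to poll often or to use 64-bit counters where the device offers them.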

    Good luck

  8. Take a step back by tmasky · · Score: 3, Informative

    You're attempting to help diagnose a (presumably) large network. Very honourable, but attempting to do this gung-ho with a few responses from slashdot is very silly.

    Grab a consultant from a local small Linux shop for a few days. Someone with good knowledge about system/network architecture.

    Get them to poke around on your network. Provide all documentation you have available.

    After the first day, you should have all the information necessary to write up a document regarding your existing issues. Make notes while he's using tools to investigate. From there you work with the consultant to come up with a separate document for resolutions with a criticality rating.

    From there, you want systems in place to monitor the health of your network. Have a chat to him about it, but I'd be inclined to build a solution which was centered around using Nagios.

    While consultants can (and frequently do) suck when you come to specifics, they are a valuable resource for pointing you in the right direction. And experience counts! They've done this stuff before, they know the pitfalls and proven solutions.

  9. Low-tech by twoflower · · Score: 3, Funny

    Low-tech is often a faster and more efficient way to find these sorts of problems. For surveillance and diagnosis, I recommend walking around and watching over students' shoulders. For corrective measures, a couple of taps with a ball-peen hammer usually suffices.

    --
    Twoflower
  10. Re:OSS? Linux? WHY? by Anonymous Coward · · Score: 3, Informative
    From the readme.win32:

    If you want to use a PC running Ethereal to monitor 802.11 traffic to or from other machines, rather than using Ethereal only to look at traffic to and from the machine on which you're running Ethereal, you should seriously consider running it on a recent version of Linux or of one of the free-software BSDs, rather than on Windows.

  11. Fuck by jericho4.0 · · Score: 3, Insightful
    You should not be 'helping' anyone with a network.

    Go on, mod me 'insightfull' or mod me 'flamebait', it's one or the other.

    --
    "A language that doesn't affect the way you think about programming, is not worth knowing" - Alan Perlis
  12. top 75 list by LinuxGeekMobile · · Score: 2, Informative


    More tools than you could learn in a reasonable timeframe can be found here: http://www.insecure.org/tools.html

    I would have posted sooner, but T-Mobile's data coverage has been spotty since Wilma hit. Still no power or fuel, but at least I can get my geek-fix now. :) (at least until my battery dies)

    --
    - Posted via Danger HipTop2 / T-Mobile Sidek!ck II -
  13. read your network by graf0z · · Score: 2, Informative

    Troubleshooting a network is a matter of experience, not of some particular tools. But these things help:

    * Put your box on the monitor/mirror/analysing port of the switch and read the traffic with tcpdump/tethereal/ethereal (if you just want to check the broadcasts, it does not have to be a monitoring port). Edit the packet filter expression until you no longer see the legal/uninteresting traffic but only the suspects. (They are students? Have fun filtering all the p2p traffic ;-) Let ethereal make statistics over the traffic.

    * Watch out for ICMP errors, especially ICMP-redirects. Watch out for TCP-resets. Watch out for fragments. Watch out for malicious Spanning-Tree packets. Watch for SMTP to many IPs (spamming trojans), IRC (zombies), weird packets eg. fragmented UDP (zombies attacking a target)

    * Check the MAC addresses in the etherframe-header ('tcpdump -e'): are they constant? If there are packets IP_A<-->IP_B, are the corresponding MACs really MAC_A<-->MAC_B, or MAC_A-->MAC_B and MAC_B-->MAC_C instead?

    * Install an arpwatcher. Stealing the default-gateway's MAC is an effective DoS attack on a network.

    * Put 2 NICs into a fast linux box, bridge ('brctl') them together, put this linuxbridge in front of the default-gateway. Dump again. Install a snort on it and let it see the traffic - what does the snort log say?

    * Do the switches have the feature to log to a remote syslog daemon? Do so and read those logs! Check all the snmp-variables on the switches, especially the "errors". Read the logs of the default-gateway.

    * Watch the amount of traffic (snmpget the port-counters of the switches and make mrtg-graphs of the results). Maybe the problem only strikes if some switch ports are under high load?

    * Scan the network with nessus. Maybe you'll find some bindshells.

    * ...
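
The MAC-consistency and arpwatcher items in the list above share one core idea: remember which MAC each IP was last seen with and complain when it changes. A minimal sketch of that idea, assuming (IP, MAC) pairs have already been pulled out of a capture; it is not how any particular arpwatch tool is implemented, and the function name and addresses are invented.

```python
def find_mac_changes(observations):
    """Report IP addresses that were seen with more than one MAC address.

    `observations` is an iterable of (ip, mac) pairs in the order seen.
    Returns a list of (ip, old_mac, new_mac) tuples, one per change.
    """
    current = {}
    changes = []
    for ip, mac in observations:
        mac = mac.lower()  # compare MACs case-insensitively
        known = current.get(ip)
        if known is not None and known != mac:
            changes.append((ip, known, mac))
        current[ip] = mac
    return changes


# Hypothetical observations: the gateway's IP suddenly answers from a new MAC.
seen = [
    ("10.0.0.1", "00:11:22:33:44:55"),
    ("10.0.0.20", "66:77:88:99:AA:BB"),
    ("10.0.0.1", "de:ad:be:ef:00:01"),
    ("10.0.0.20", "66:77:88:99:aa:bb"),
]
for ip, old, new in find_mac_changes(seen):
    print(f"{ip} moved from {old} to {new}")
```

A change on the default gateway's address is the one to look at first; a change on a DHCP client address may just be a lease handed to a different machine.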

    Hope this helps.

    g.

  14. Re:OSS? Linux? WHY? by smartin · · Score: 2, Insightful

    I have a better idea. Get Linux and slap it on all your windows boxes and be done. For good.

    --
    The difference between Canada and the USA is that in Canada healthcare is a right and gun ownership is a privilege.