Slashdot Mirror


Trying to Help a Troubled Network with Linux?

vmehta asks: "I was recently put in a situation where I am trying to help a troubled network with many students accessing it. There are issues with broadcast packets and random outages which seem to be plaguing the network. What tools and methods are the best practice when trying to use Linux and Open Source to analyze and fix a network?"

68 comments

  1. Assess the problem by madaxe42 · · Score: 4, Insightful

    First step isn't to blunder in and migrate - the first step is to work out what's causing the outages etc. use ethereal or some other packet sniffer to establish where the broadcast floods are coming from - use nmap to find insecure hosts - also, investigate what kind of routers are being used, and what rules are being employed.

    Basically, OSS/Linux are great, but don't rush in without establishing the issues first.

    1. Re:Assess the problem by SpaceLifeForm · · Score: 1, Informative
      And then clean up the windows boxes. It sure sounds like there are many pwned machines.

      --
      You are being MICROattacked, from various angles, in a SOFT manner.
    2. Re:Assess the problem by tverbeek · · Score: 4, Informative
      Did you read the part of the question where he explained that he was looking for tools to analyze and fix the problem? And did you notice that he didn't mention or imply any kind of migration?

      Here's an idea: Before you blunder in with an answer, the first step is to work out what the question is. :)

      --
      http://alternatives.rzero.com/
    3. Re:Assess the problem by moro_666 · · Score: 2, Informative

      You can attempt a scan&sniff at first, plenty of stuff to choose from,
      but if your 100mbit network is being overloaded, it's quite difficult
      to isolate single responsible instances.

      I guess that probably you will end up doing that :

      1) Get rid of cheap hubs (made in paiwan) and get some real network switches in place, like those from SMC. Having an old buggy hub talking to several cheap NICs in several machines ends up in massive packet collisions, resulting in a network that doesn't carry much but is totally jammed.

      2) Scan all those windows machines attached to the network; they are probably bloated with viruses and spyware. Sniff on the windows machines for a 24h period: some viruses/spyware only work at certain hours, and a midday scan doesn't show anything. If the windows machines can't be healed from their stuff, unplug them (yes, as simple as that).

      3) If none of the above helps, isolate subnets and firewall them all, creating appropriate rules on in-house firewalls, so that the flooding stops.

      4) Most important, tell the windows users to scan their machines regularly. Unless they do that, you will have to start all the above again the day after tomorrow.

      I recently found a virus in a windows box in my own office sending me the "worm" emails... luckily thunderbird on my ubuntu laptop didn't quite figure out what to do with "report.pif" files... Anyway, the biggest problem was how to draw the "big red picture" for the whole office that they *have* to scan their machines all the time. I found out that 66% of our machines were infected by worms, some of them even by multiple worms. After the cleanout the network performance increased dramatically and I could even do non-lagging X sessions from my home to the office (which I couldn't do before).

      Long story short: Replace old crap, scan & sniff, isolate subnets, unplug m$ powered machines.

      --

      I'd tell you the chances of this story being a dupe, but you wouldn't like it.
    4. Re:Assess the problem by nocomment · · Score: 2, Interesting

      I'm replying to this comment but my response is directed toward the OP

      I agree with madaxe42, First things first. Diagram the network. Figure out where hubs and switches are. Figure out where the firewalls are. Figure out how packets traverse the network(s). If it's a single network with a single point of access to the internet this should be (relatively) easy. If you are looking to save the day with linux what you could do is set the switches to use "port mirrors" to capture every packet on the network to snort DB. Read up on creating snort rules and you can capture literally everything that goes on. Also run samba with no password access and log everything to see what ip's are delivering viruses to your machine. Turn on snmp at every gateway and graph the network traffic. This should tell you what segments are most prone to excessive traffic (across networks).

      Chances are with this combo you will find most viruses, and especially the p2p abusers.

      I've had to do this before and this works for me.

      One of our remote sites has a T1 to the internet but also needs to access our financial system. It wasn't working. In theory they had plenty of bandwidth, but the system was unstable: they were able to connect... sometimes, but once they did it was almost unusable, and we're just talking about a lightweight telnet (over a VPN) session. I initially started with an mrtg graph on the router that is the last hop to the Internet. I saw normal traffic interrupted by periods of maxed-out bandwidth.

      I've seen this type of pattern before... "Kazaa", I thought. I set up one of the company laptops with snort and mandrake linux and sent it down there with instructions to put it on a switch on the same network as the router, with port mirroring, so I could figure out which network it was coming from. Once I knew that, I repeated the process and had them gradually move the laptop down the chain until it was on the same subnet as the offender. 2 days later I had the IP and a list of mp3s that were being shared out and downloaded from that machine.

      All the while the VP of that location was harping on us that we needed to spring for a second T1 just to support their 12 users running telnet.

      I returned to them with the information I had gathered and they responded with an "I know who that is". The traffic stopped immediately and they have been running fine for over a year now with no hiccups.

      Just think logically and you will have it figured out in pretty short order.

      --
      /* oops I accidentally made a comment, sorry */
      /* http://allyourbasearebelongto.us */
    5. Re:Assess the problem by siplus · · Score: 1
      He did specifically mention 'Linux'

      Chances are, he's not going to install linux under vmware to solve the problem ;)

    6. Re:Assess the problem by Phisbut · · Score: 1
      He did specifically mention 'Linux'
      Chances are, he's not going to install linux under vmware to solve the problem ;)

      Maybe he just happens to have a Linux laptop that he's willing to plug into the network to do the scan/diagnostic stuff. Maybe he wants OSS because he doesn't want to use / can't afford expensive software solutions.

      I use Linux at work, while most of my coworkers have either WinXP or OSX. Although sometimes a task is better accomplished from one of my linux boxes, I'm not rushing into trying to convert everybody to Linux...

      Sometimes, Linux is a desktop, sometimes, Linux is a server, and sometimes, Linux is a tool.

      --
      After 3 days without programming, life becomes meaningless
      - The Tao of Programming
    7. Re:Assess the problem by cnelzie · · Score: 1

      Yep and anyone that doesn't understand your last bit is... well... a tool.

      --
      If you ignore the other uses of a tool, does that make the tool less useful, or you less useful?
    8. Re:Assess the problem by tverbeek · · Score: 1
      You know what's worse than someone who answers a question without understanding it?

      Someone who tries to justify the first person's cluelessness by trying to posit some alternate universe where it would make sense.

      --
      http://alternatives.rzero.com/
    9. Re:Assess the problem by Anonymous Coward · · Score: 0

      the first step is to work out what the question is. :)

      Unless you're playing Jeopardy.

    10. Re:Assess the problem by Braino420 · · Score: 1

      I think their responses are helpful. I also find the link in your sig helpful. ez man

      --
      They call me the wookie man, I guess that's what I am
  2. You are infected with viruses most likely by Anonymous+Crowhead · · Score: 5, Insightful

    Almost any time I see this, it's some random box flooding the network. Just go to your switches... the light that is on solid continuously will point you in the right direction.

    1. Re:You are infected with viruses most likely by Anonymous Coward · · Score: 1, Funny
      Ok, I found it and unplugged it.

      Now people are shouting at me, something about an Oracle.

      Who the fuck is this Oracle dude and why has he hacked into our network? Is he like that Mitchick character? I hope they never let him out of prison!

  3. Shoot the students by Usquebaugh · · Score: 4, Funny

    No use fixing symptoms go after the root cause.

  4. OSS? Linux? WHY? by Gothmolly · · Score: 4, Interesting

    What's next, "How do I produce PDF files, using Linux and Open Source?" "How can I leverage Open Source to surf the web?"

    Christ, this is like the late 90's, when everything suddenly had "e" in front of it. Dude, get Ethereal, slap it on any Windows box, and be done. No need to get nerdy with Linux. If you know enough that it's broadcast traffic, you're halfway there.

    --
    I want to delete my account but Slashdot doesn't allow it.
  5. Know your network. Document it! by Webmoth · · Score: 4, Insightful

    The first step in troubleshooting is knowing the network topology. How are network segments separated? How are they connected? Where are routers, hubs, switches, etc.? Which switches are managed, and how are the VLANs set up on them? Where are the DHCP servers, and what do they serve? Where are all your network drops?

    Do your network segments have multiple subnets attached to them?

    Is everything subnetted properly?

    The first set of questions are ones YOU should be able to answer. After all, it's YOUR network, and YOU should know how it's set up. The last two are harder to deal with, because these settings may be on computers not in your control.

    Answer the first questions first, then when you are looking at packet traces, TCP/IP dumps, logs, etc. and you see a problem, you'll have a better idea where the problem is physically located, saving much time and energy.

    And then there's the "dumb questions" I shouldn't have to ask: Do you have a loop? Are your cables wired to T568A or T568B standards? Are all your cables in good repair?

    --
    Give me my freedom, and I'll take care of my own security, thank you.
    1. Re:Know your network. Document it! by dr.+greenthumb · · Score: 1

      Are your cables wired to T568A or T568B standards?

      It makes no functional difference which standard you use for a straight-thru cable. You can start a crossover cable with either standard as long as the other end is the other standard. It makes no functional difference which end is which. Despite what you may have read elsewhere, a 568A patch cable will work in a network with 568B wiring and 568B patch cable will work in a 568A network. The electrons couldn't care less.

    2. Re:Know your network. Document it! by jnewmano · · Score: 1

      Just make sure that you're only using two twisted pairs and not all four or else you'll have all sorts of apparently random problems. To keep it simple you want a pair on each end of the plug another in the middle with a pair straddling that one. The wiring really is important, when done incorrectly you will have problems, even if it does seem to work

  6. Toolsets by Caydel · · Score: 1
    I agree with one of the earlier posters; it is probably an infected system or 10.

    The best thing you can do is use a tool such as Ethereal to find the IP of the system or systems causing it, and subject them to a good cleanup.

    For a good toolset, check out the Auditor Security Tools LiveCD for a collection of tools you can take with you wherever you go...

    Auditor tools

  7. It's a NIC by Fished · · Score: 3, Insightful

    Without any more information, you've got a bad NIC, almost certainly. Look on the switch for the port whose light is always on. As you've described it, software has almost nothing to do with it. This is a NIC, or a bad switch, or bad cabling, or something.

    --
    "He who would learn astronomy, and other recondite arts, let him go elsewhere. " -- John Calvin, commenting on Genesis 1
    1. Re:It's a NIC by Beatbyte · · Score: 1

      To the parent:

      Or it could be arp flooding, or it could be a virus, or it could be a greedy student downloading music, or it could be too much bittorrent traffic, or it could be a million other things.

      Troubleshooting these things for a living, trust me, nothing is certain until you've figured out what it is.

      To the poster:

      Use ethereal and watch where the traffic is coming from. Use the management built into your switches to watch for ports going down when there are outages. Use traceroutes to find a dead hop (if you're on different subnets). Think about the path it takes (logically and physically). Get into all your routers and ping or telnet to the other routers, ensuring that they don't have a screwed up route, subnet mask, or cabling issue. Try changing switch ports for key cables.

      You will most likely need more than 1 body to do all of this efficiently. Good luck!

  8. Of the top of my head by max+born · · Score: 1

    See man command for further info on these commands.

    Use ping ip-address to see if you can get to the router and beyond. Make sure "allow ICMP" is enabled in the router.

    Use traceroute -n ip-address to see where the traffic is failing.

    Is it a DNS problem? Try host some.host.name to make sure you can resolve names.

    Is it a DHCP problem? Try dhclient to see if you can get an IP address. (maybe pump on some systems.)

    Connect a hub (not a switch) to some strategic place on the network. Give yourself an IP address and check for excessive traffic with iptraf. This will give you a breakdown of what bandwidth is being used by what services.

    You can use commands like nc and telnet to connect to specific ports, e.g. nc -v dns-server 53 to see if the DNS server's TCP port is open.

    You can also automate these commands in a bash script run via a cronjob every minute. Something like:

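    for x in router1 router2 router3   # .... add the rest of your hosts
    do
        # -c 1: send one probe and exit; log a timestamped line on failure
        ping -c 1 "$x" >/dev/null 2>&1 || echo "$(date) $x unreachable" >>/tmp/failures.txt
    done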


    See man bash for details.

    Good luck.

    1. Re:Of the top of my head by Anonymous Coward · · Score: 0

      for x in router1 router2 router3 ....
      do ping $x || echo >>/tmp/failures.txt
      done

      In zsh, all that mess is just

      for x in router{1..3} ; ping -c 1 $x || echo $x >>/tmp/failures.txt

    2. Re:Of the top of my head by jtev · · Score: 1

      Not if router1, router2 and router3 have names like kirk, spock, and mccoy. Don't assume everyone uses generic names like he just did.

      --
      That which is done from love exists beyond good and evil
    3. Re:Of the top of my head by Will2k_is_here · · Score: 1

      Connect a hub (not a switch) to some strategic place on the network. Give yourself an IP address and check for excessive traffic with iptraf. This will give you a breakdown of what bandwidth is being used by what services.

      I'm only a student, not a systems administrator so I wouldn't pretend to suggest I know what's acceptable and what's not, but this would piss me off if I knew someone was doing this to me. I imagine this kind of behaviour should be kept under one's hat

      Further, random unplugging of cables which cause seemingly excessive traffic on the switches/routers can cause outrages as well. It all depends on the nature of the network, but it might make more sense to pay the owner of the machine or subnet a visit before yanking the plug.

      To contribute something to the discussion: Of course this depends on the kind of network we're talking about here, but I think the best approach is to document who, what, and why on your network. Get a handle on who is on your network, what they are doing on your network, and why they are doing it. When you have this all documented, it is simple to define policies (or prices?) for the kind of behaviour going on on the inside, and for what is allowed from the outside.

    4. Re:Of the top of my head by AmigaBen · · Score: 1
      I'm only a student, not a systems administrator so I wouldn't pretend to suggest I know what's acceptable and what's not, but this would piss me off if I knew someone was doing this to me. I imagine this kind of behaviour should be kept under one's hat

      This would be because you're a student. Students tend to think they have some right to a network and every network resource they can imagine. They don't. On the other hand, the administrator has the responsibility of making sure the network and its resources are working and available for ALL of its appropriate uses.

      --
      +5 Insightful, really!
    5. Re:Of the top of my head by Will2k_is_here · · Score: 1

      Provided my usage is within the policies handed down from the administrator, I am granted the right to private Internet usage.

      If my activities are suspect, an administrator can and should investigate and this should be mandated in the policy.

    6. Re:Of the top of my head by AmigaBen · · Score: 1
      I doubt the AUP you're governed by grants you the right to "private" internet usage. If it does, it's in the minority.

      Just because you wish it, doesn't make it so.

      --
      +5 Insightful, really!
  9. map, isolate, trend by grattwood · · Score: 5, Informative

    Step 1) Map the network both logically (which networks, what is the routing, etc.) and physically... the "tug test". Label everything, and put it all in a spreadsheet. Tools are nmap, pen and paper, and a label printer. Access to the routers, or being friendly with the router admin, is a must.

    Step 2) Isolate the problem protocols and hosts. Be on the lookout for appletalk, IPX, or old netbios. All very chatty protocols. Look for old hubs and replace them with switches. Look for compromised boxen. Try to VLAN things logically (by department or by usage, whichever is best for the environment). Tools are snort, ethereal, ntop, and syslog (any managed switches should be sending to a syslog server (I've used syslog-ng))

    Step 3) Trend as much as you can. Even before the network is cleaned up, start to collect statistics from the switches and/or hosts on your network. Any gateways should be monitored as well. This will let you see if there are problems correlated to a particular time of day, if you're going over your bandwidth, etc. Tools are MRTG, or for more in depth try Cacti http://www.cacti.net/

    There is much more after you get to this point, but people will be much happier the faster you get here.
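
    For the spreadsheet in Step 1, a minimal sketch of turning an nmap ping sweep into an inventory list. The function name, the address range and the output file are made up for illustration, and it assumes the usual "grepable" (-oG) line layout of recent nmap versions:

```shell
#!/usr/bin/env bash
# Turn nmap "grepable" output (-oG) on stdin into "ip,hostname" lines for
# hosts reported as up. Hosts without a reverse DNS name get an empty name.
nmap_up_hosts() {
    awk '$1 == "Host:" && /Status: Up/ {
             name = $3
             gsub(/[()]/, "", name)   # hostname arrives wrapped in parentheses
             print $2 "," name
         }'
}

# Example sweep (192.168.1.0/24 and inventory.csv are placeholders):
#   nmap -sn -oG - 192.168.1.0/24 | nmap_up_hosts > inventory.csv
```

    This only covers the logical half of the map; the "tug test" and the labels still have to be done by hand.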

    Good luck

  10. Take a step back by tmasky · · Score: 3, Informative

    You're attempting to help diagnose a (presumably) large network. Very honourable, but attempting to do this gung-ho with a few responses from slashdot is very silly.

    Grab a consultant from a local small Linux shop for a few days. Someone with good knowledge about system/network architecture.

    Get them to poke around on your network. Provide all documentation you have available.

    After the first day, you should have all the information necessary to write up a document regarding your existing issues. Make notes while he's using tools to investigate. From there you work with the consultant to come up with a separate document for resolutions with a criticality rating.

    From there, you want systems in place to monitor the health of your network. Have a chat to him about it, but I'd be inclined to build a solution which was centered around using Nagios.

    While consultants can (and frequently do) suck when you come to specifics, they are a valuable resource for pointing you in the right direction. And experience counts! They've done this stuff before, they know the pitfalls and proven solutions.

  11. Re:It's a NIC - YES! by schon · · Score: 1

    Bingo. First thing I thought of when I heard this.

    While it's *possible* this is a virus (as others have said), I'd look at hardware first. A bad transceiver will generate more bad traffic than a virus could ever hope to.

  12. Re:OSS? Linux? WHY? by nri · · Score: 1

    " when everything suddenly had "e" in front of it. Dude, get Ethereal,"

    you mean eThereal don't you :-)

    --
    if :w! doesn't work, try :!cvs commit -m""
  13. Map the physical/logical network. by khasim · · Score: 1

    Find out what is connected to what and how. More than 90% of the "network problems" I encounter are basic cable issues.

    Remember, when a NIC is connected to a switch, they only auto-negotiate if both are set to auto-negotiate. If someone sets them to a certain configuration, but doesn't get the pair correctly matched, you will have a lot more collisions and such.

    Make sure that your collision domain is set up correctly. Pay attention to the length of the cables. This is where the physical map comes in. You can check each section to make sure it's good. Then move to the next.

    Start at the physical layer and work your way up.
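
    On a Linux box the negotiated link settings can be read with ethtool. A minimal sketch of flagging the two symptoms mentioned above, assuming ethtool's usual "Duplex:" and "Auto-negotiation:" output lines; the function name and the interface name are made up for illustration:

```shell
#!/usr/bin/env bash
# Read `ethtool <iface>` output on stdin and print a warning for the two
# classic mismatch symptoms: half duplex, or auto-negotiation switched off.
link_warnings() {
    awk -F': *' '
        /^[ \t]*Duplex:/           && $2 == "Half" { print "half duplex" }
        /^[ \t]*Auto-negotiation:/ && $2 == "off"  { print "auto-negotiation off" }'
}

# Example (eth0 is a placeholder interface name):
#   ethtool eth0 | link_warnings
```

    No output means neither symptom was seen on that end of the cable; the switch port still has to be checked separately.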

  14. It's still a NIC by Fished · · Score: 1
    I strongly qualified my response in several ways. And, for what it's worth, I've been diagnosing networks for 15 years, so I feel qualified to have a strong opinion. When I see a network exhibiting the kind of erratic behavior described by the questioner, first thing I check is for a bad NIC, because 90% of the time that's the problem.

    It certainly could be any of the things you mention. With the vagueness of the original post, it could even be a layer 7 problem (i.e. a crappy Windows server.) But with the piss-poor information provided, my money is still on an NIC.

    --
    "He who would learn astronomy, and other recondite arts, let him go elsewhere. " -- John Calvin, commenting on Genesis 1
    1. Re:It's still a NIC by Anonymous Coward · · Score: 0

      I call bullshit.

      You've been troubleshooting for 15 years? What are you troubleshooting, layer 2 networks? Christ.

    2. Re:It's still a NIC by Fished · · Score: 1
      My first "real" job was at Christopher Newport University in 1991 (okay, 14 years--but prior to that I worked at a local computer store for several years and did fiddle with networks a little bit.) There I managed an integrated, campus-wide IP and IPX network. This was back when Gopher was state of the art.

      Go back to your hole, troll.

      --
      "He who would learn astronomy, and other recondite arts, let him go elsewhere. " -- John Calvin, commenting on Genesis 1
  15. Low-tech by twoflower · · Score: 3, Funny

    Low-tech is often a faster and more efficient way to find these sorts of problems. For surveillance and diagnosis, I recommend walking around and watching over students' shoulders. For corrective measures, a couple of taps with a ball-peen hammer usually suffices.

    --
    Twoflower
  16. Network Traces by yancey · · Score: 1

    Start using tcpdump along with ethereal. Put the Linux box on different parts of the network to see what is happening. If you're in a switched environment, you will see mostly broadcasts. Some broadcasts are required and good (like the necessary ARP requests and possibly DHCP requests when a computer boots and initializes its network devices). However, unnecessary broadcasts are very bad for network performance and can cause "packet storms" which cause outages.

    Start tracking those broadcasts down and find out what's going on in this network: find out what machines are sending the broadcasts and why. Learn why various services broadcast and what services are available to minimize the broadcasting. For example, Windows boxes configured for a workgroup will typically broadcast until you set up a WINS system. Once properly configured, all the Windows boxes talk to the WINS box directly without broadcasting.

    The same is true with Windows domains. If all the boxes are joined to an Active Directory domain, the workstations should switch from broadcasting to unicasting (talking directly) to the Active Directory servers.

    This is also true of Service Location Protocol. If you've got a bunch of boxes trying to use SLP, they're probably multicasting (or broadcasting if the network switches and routers don't properly support multicast). Once you start an SLP Directory Agent, all the servers register their services with the DA and all the clients ask the DA where to find those services -- all with unicast instead of broadcast.

    Certain older protocols are very "chatty" -- AppleTalk, IPX, NetBEUI, and NetBIOS are good examples. Work toward eliminating these protocols. In a properly configured network, you should be able to do everything you need without them.
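
    To see which machines are sending the broadcasts, a minimal sketch that counts broadcast frames per source MAC. It assumes tcpdump's usual -e line layout (timestamp, source MAC, ">", destination MAC); the function name and interface name are made up for illustration:

```shell
#!/usr/bin/env bash
# Count frames per source MAC from `tcpdump -e` text on stdin.
# Assumes the usual -e line layout: <time> <src-mac> > <dst-mac>, ...
top_broadcasters() {
    awk '$3 == ">" { count[$2]++ }
         END { for (mac in count) print count[mac], mac }' | sort -rn
}

# Example capture (needs root; eth0 is a placeholder interface name):
#   tcpdump -i eth0 -n -e -c 1000 ether broadcast | top_broadcasters | head
```

    The busiest MAC is only a starting point: a DHCP or WINS server will legitimately show up near the top, so compare against what that box is supposed to be doing.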

    --
    Ouch! The truth hurts!
  17. Re:OSS? Linux? WHY? by Anonymous Coward · · Score: 3, Informative
    From the readme.win32:

    If you want to use a PC running Ethereal to monitor 802.11 traffic to or from other machines, rather than using Ethereal only to look at traffic to and from the machine on which you're running Ethereal, you should seriously consider running it on a recent version of Linux or of one of the free-software BSDs, rather than on Windows.

  18. Fuck by jericho4.0 · · Score: 3, Insightful
    You should not be 'helping' anyone with a network.

    Go on, mod me 'insightfull' or mod me 'flamebait', it's one or the other.

    --
    "A language that doesn't affect the way you think about programming, is not worth knowing" - Alan Perlis
    1. Re:Fuck by ChrisJones · · Score: 1

      Shame there's no 'doesn't make the first bit of sense' mod ;)

      --
      Chris "Ng" Jones
      cmsj@tenshu.net
      www.tenshu.net
    2. Re:Fuck by geminidomino · · Score: 1

      I think his point was that if the network admin was competent, the questioner wouldn't even be needed.

      Of course, if they were competent, there would be no market at all for conslutants in the first place.

    3. Re:Fuck by MarkGriz · · Score: 1

      "there would be no market at all for conslutants in the first place"

      Best. Freudian slip. ever.

      --
      Beauty is in the eye of the beerholder.
    4. Re:Fuck by chivo243 · · Score: 1

      I work in a school as a catch-all tech guy, from hell desk to maintaining the damn *xchange/file cluster. It sounds like he was hired for one thing, and has been told this 'help' is his real job.... it happened where I am at, but not to me :-/ I was twisted enough to be already involved.

      You are right though, he should seek assistance.

      --
      Sig Hansen?
  19. Baselining by Bios_Hakr · · Score: 1

    The first step is to document and baseline the systems.

    For baselining, I'd enable SNMP for all the managed devices. Then use something like MRTG with RRD Tool and chart every port for every switch for a week or so.

    While that's happening in the background, start mapping your LAN. Use something like Visio on a laptop and start visiting switches and routers. Confirm the connections between all the routers and switches. Then use good labels (no, not scotch tape and paper) to document those connections with FROM: and TO: information.

    FROM:bldg1024 rm201 sw3 p4
    TO: bldg2048 rm906 sw17 p33

    Now, labeling to and from may seem dumb at first. But the first time you unplug something to move it and then forget where it was supposed to go, you'll thank me.

    Once everything is labeled and documented, you can go back to your MRTG graphs and start analyzing the data.

    Look at your core switches. Which ports have the highest graphs? Look at your documentation and see what switch is connected to that port. On that switch, which port is highest? Wash, rinse, repeat.

    Once you have the access-device that is concentrating all the bad data, set up a clone port and then use a packet sniffer (I use Sniffer Pro) to figure out what the bad data is.

    Anyway, after you "shave off the peaks", you can re-baseline the system and start again. Once traffic is semi-reasonable, then it's time for hardware analysis.

    Using MRTG, look at the CPU, memory, and other nifty stats from the switches and routers themselves. Target devices in need of an upgrade. One word of caution: Cisco switches always have high CPU and memory usage. Just because a device shows 85%CPU does not mean it's working hard. Look at a switch with nothing connected to see what I mean.

    1. baseline
    2. document
    3. analyze
    4. fix
    5. upgrade
    6. re-baseline
    7. re-document
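
    The per-port graphs come from polling interface octet counters and turning the difference between two readings into a rate. A minimal sketch of that arithmetic, assuming a 32-bit counter such as ifInOctets and at most one wrap between samples; the function name, host name, community string and interface index are made up for illustration:

```shell
#!/usr/bin/env bash
# Average bits per second between two readings of a 32-bit SNMP octet counter
# (e.g. ifInOctets), allowing for one counter wrap between the samples.
counter_bps() {
    local first=$1 second=$2 seconds=$3
    local delta=$(( second - first ))
    if (( delta < 0 )); then
        delta=$(( delta + 4294967296 ))   # counter wrapped past 2^32
    fi
    echo $(( delta * 8 / seconds ))
}

# Hypothetical use with net-snmp (host "switch1", community "public" and
# interface index 4 are placeholders):
#   a=$(snmpget -v2c -c public -Oqv switch1 IF-MIB::ifInOctets.4); sleep 60
#   b=$(snmpget -v2c -c public -Oqv switch1 IF-MIB::ifInOctets.4)
#   counter_bps "$a" "$b" 60
```

    On fast links a 32-bit counter can wrap more than once per polling interval, in which case the result is too low; the 64-bit ifHCInOctets counters avoid that where the device supports them.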

    --
    I'd rather you do it wrong, than for me to have to do it at all.
  20. top 75 list by LinuxGeekMobile · · Score: 2, Informative


    More tools than you could learn in a reasonable timeframe can be found here: http://www.insecure.org/tools.html

    I would have posted sooner, but T-Mobile's data coverage has been spotty since Wilma hit. Still no power or fuel, but at least I can get my geek-fix now. :) (at least until my battery dies)

    --
    - Posted via Danger HipTop2 / T-Mobile Sidek!ck II -
  21. Eh? by yahwotqa · · Score: 0, Offtopic

    I thought this was slashdot, where news topics are discussed, not community support forum.

  22. Open source network analysis tools by SgtChaireBourne · · Score: 1
    What tools and methods are the best practice when trying to use Linux and Open Source to analyze and fix a network?
    These are some of the tools to consider, in no particular order:

    You'll have to read the descriptions to decide which ones to try.
    --
    Beta is broken and the link to classic doesn't work. Stop wasting our time or there won't be anybody left here.
    1. Re:Open source network analysis tools by doon · · Score: 1

      I'll add ntop to the list. Plug a box running that into a monitor port and watch the traffic for a while.

      As others have said, good documentation of the network is a must. I was thrown into a similar situation a year or 2 back at my highschool (I graduated in 94, so it wasn't as a student). After doing a walk through of the network and finding every single hub (there were 2 switches) and what was attached to it, we could then easily locate some of the problems. In some cases they had hubs chained 8 deep (with 60-70 computers) and there was a ton of broadcast traffic. We isolated some of the labs, replaced some of the chained hubs with switches (temp fix), removed some worms and virii, and located a bad nic, and we got the network running a lot better. My next step was to replace the little NAT box they were using with a Netra that I had sitting at home, running OpenBSD and Squid. This way I could transparently proxy all of the net traffic and cut back on a lot of the stuff the school didn't want to come in. It also had the added benefit of speeding up some of their classes, since most of them were like, "ok kids, everybody click on this url", and 30+ requests for a graphics-heavy page will bog down a 1mb/s DSL connection. We are finally upgrading the network (should be done soon) to use a bunch of fully managed cisco switches with a Gig Fiber backbone and much needed vlans and firewalls (which is cool since I can get my Netra back).

      --
      To E-mail me, replace the first period in my domain with an @
    2. Re:Open source network analysis tools by Anonymous Coward · · Score: 0

      Cittio WatchTower is a linux based Network Monitoring Tool also. http://www.cittio.com/

  23. read your network by graf0z · · Score: 2, Informative

    Troubleshooting a network is a matter of experience, not of some particular tools. But these things help:

    * Put your box on the monitor/mirror/analysing port of the switch and read the traffic with tcpdump/tethereal/ethereal (if you just want to check the broadcasts, it does not have to be a monitoring port). Edit the packet filter expression until you no longer see the legitimate/uninteresting traffic, only the suspects. (They are students? Have fun filtering all the p2p traffic ;-) Let ethereal compute statistics over the traffic.

    * Watch out for ICMP errors, especially ICMP-redirects. Watch out for TCP-resets. Watch out for fragments. Watch out for malicious Spanning-Tree packets. Watch for SMTP to many IPs (spamming trojans), IRC (zombies), weird packets eg. fragmented UDP (zombies attacking a target)

    * Check the MAC addresses in the etherframe-header ('tcpdump -e'): are they constant? If there are packets IP_A<-->IP_B, are the corresponding MACs really MAC_A<-->MAC_B, or MAC_A-->MAC_B and MAC_B-->MAC_C instead?

    * Install an arpwatcher. Stealing the default-gateway's MAC is an effective DoS attack on a network.

    * Put 2 NICs into a fast linux box, bridge ('brctl') them together, put this linuxbridge in front of the default-gateway. Dump again. Install a snort on it and let it see the traffic - what does the snort log say?

    * Do the switches have the feature to log to a remote syslog daemon? Do so and read those logs! Check all the snmp-variables on the switches, especially the "errors". Read the logs of the default-gateway.

    * Watch the amount of traffic (snmpget the port-counters of the switches and make mrtg-graphs of the results). Maybe the problem only strikes if some switch ports are under high load?

    * Scan the network with nessus. Maybe you'll find some bindshells.

    * ...
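
    The port-counter bullet in the list above comes down to turning two counter readings into a rate. A minimal Python sketch, assuming 32-bit octet counters that wrap at 2^32 (as ifInOctets does) and that you already have two readings from snmpget; the function name and the sample values are made up for illustration.

```python
COUNTER32_MODULUS = 2 ** 32  # a Counter32 wraps back to 0 after this many octets


def bits_per_second(first_octets, second_octets, interval_s):
    """Average rate between two octet-counter samples taken interval_s apart.

    Assumes the counter wrapped at most once between the two samples.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    # The modulo keeps the delta correct when the counter wrapped in between.
    delta_octets = (second_octets - first_octets) % COUNTER32_MODULUS
    return delta_octets * 8 / interval_s


# Plain case: 1000 octets in 10 seconds.
print(bits_per_second(1000, 2000, 10))
# Wrapped case: the counter rolled over between the two samples.
print(bits_per_second(2 ** 32 - 100, 400, 5))
```

    On a fast port a 32-bit counter can wrap more than once between polls, so keep the polling interval short.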

    Hope this helps.

    g.
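
    The MAC check and the arpwatcher idea from the list above can be sketched in a few lines of Python. This assumes you have already pulled (source IP, source MAC) pairs out of a capture, e.g. from 'tcpdump -e' output; the parsing is not shown, and the function name and sample addresses are hypothetical.

```python
from collections import defaultdict


def find_ip_mac_conflicts(observations):
    """Return {ip: [macs]} for every IP that was seen with more than one MAC."""
    seen = defaultdict(set)
    for ip, mac in observations:
        seen[ip].add(mac.lower())  # normalise case so the same MAC is not counted twice
    return {ip: sorted(macs) for ip, macs in seen.items() if len(macs) > 1}


# Made-up sample: the gateway address shows up with two different MACs.
sample = [
    ("192.168.1.1", "00:11:22:33:44:55"),
    ("192.168.1.20", "00:aa:bb:cc:dd:01"),
    ("192.168.1.1", "00:11:22:33:44:55"),
    ("192.168.1.1", "00:DE:AD:BE:EF:02"),
]
print(find_ip_mac_conflicts(sample))
```

    This only means something for hosts on the local segment: anything arriving through a router carries the router's MAC, whatever the source IP is.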

  24. Consultants by dolmen.fr · · Score: 1

    Grab a consultant from a local small Linux shop for a few days. Someone with good knowledge about system/network architecture.

    You should read between the lines. He said: I was recently put in a situation...
    Which means he is the consultant. Of course, thanks to a fake résumé made up by the sales representative of the consultancy firm, they sent him even though he has no clue about network administration.

  25. Re:OSS? Linux? WHY? by smartin · · Score: 2, Insightful

    I have a better idea. Get Linux and slap it on all your windows boxes and be done. For good.

    --
    The difference between Canada and the USA is that in Canada healthcare is a right and gun ownership is a privilege.
  26. Step 1 by OeLeWaPpErKe · · Score: 1

    Get someone who does this for a living. I am sure there are a few in your local Linux shop. Someone who works at an ISP should have experience with the problems you cite.

    Step 2

    Follow his/her recommendations (which will probably be splitting the network into more L3 domains): get a 6500, or a few 3750s, or, if you really can't afford much, a few 3550 switches (which will leave you out of luck when IPv6 starts getting used, but are otherwise a fine choice).

    This is about having L3 switches closer to the end user than you have now, as far as I know there are no acceptable products that are cheaper.

    You should (probably) split up the L2 network into a lot of separate L3 domains. Do not implement firewalling and none of the students will mind. Get an IGP running between the L3 domains, and provide multiple, geographically separate uplinks (10 * ADSL exporting 0.0.0.0/0 into the IGP is a lot better than 1 E3 if you don't really know what you're doing).
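
    To make the "lots of separate L3 domains" idea concrete, here is a small Python sketch that carves one address block into per-segment subnets. The 10.20.0.0/16 block, the /24 size and the segment names are all assumptions for illustration; a real plan depends on the site.

```python
import ipaddress

# Assumed campus address block (hypothetical).
campus = ipaddress.ip_network("10.20.0.0/16")

# One /24 per segment keeps each broadcast domain at 254 hosts or fewer.
subnets = list(campus.subnets(new_prefix=24))

# Hypothetical segment names, one L3 domain each.
segments = ["lab-a", "lab-b", "dorm-1", "dorm-2"]
plan = dict(zip(segments, subnets))

for name, net in plan.items():
    usable = net.num_addresses - 2  # minus network and broadcast addresses
    print(f"{name}: {net} ({usable} usable hosts, broadcast {net.broadcast_address})")
```

    A broadcast in one of these subnets stays inside it instead of reaching every machine on campus.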

    In short, if you have to ask, you don't know how to fix this; no analysis tool can help you without a serious and deep understanding of the technology. If you don't want to pay someone for this, a lot of people will say they can fix it, but you'll need to be extremely lucky to actually find someone who can.

    Perhaps if you decide to go the cheap route, go the old-fashioned way, trust someone with a CCNP and a CS degree more than a 17-year-old.

    1. Re:Step 1 by RazorJ_2000 · · Score: 1

      So, let me quickly summarize your solution:


      1. Get a consultant.


      2. Blow $50K in Crisco hardware (yah, you heard me, Crisco, not Cisco)


      3. Put a bunch of snot-nosed barely literate retards, err, sorry, students on a L3 network where they can run fscking kazaa all day.



      I haven't laughed this hard in a while :)


      --
      pi=sigma{n:0-infinity}[(1/16)^n][(4/(8n+1))-(2/(8n +4))-(1/ (8n+5))-(1/(8n+6))]
  27. Gee, thanks. by Anonymous Coward · · Score: 0

    "This is a NIC, or a bad switch, or bad cabling, or something"

    Gee, thanks for pinpointing the problem.

  28. check the obvious by vaseyandco · · Score: 1

    Make sure your students haven't started looping back the cables from one network socket to another. Always make sure an unconnected network point isn't connected at the patch panel/switch end; leaving it live is just asking for trouble. The more physical restrictions you have on your network, the easier the rest will be to manage. Rogue access points may also be a downfall of your network, so check for them!

    --
    You bought her a Kentucky Fried Chicken Franchise!!!
  29. Loopback by phorm · · Score: 1

    Actually, another cause could be a looped network connection. We have problems with students who will connect two network jacks together, thus creating a loopback in the switch they are connected to. Generates a whole lot of network traffic. Basically they were doing this when they had exams requiring computers, because bringing down the network ensured no exam...
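
    A loop like this tends to show up as a sudden jump in the share of broadcast frames on the wire. A minimal Python sketch, assuming you have a list of destination MACs taken from a capture; the 50% threshold is an arbitrary assumption, not a standard value, and the function names are made up.

```python
BROADCAST_MAC = "ff:ff:ff:ff:ff:ff"  # Ethernet broadcast destination address


def broadcast_share(dest_macs):
    """Fraction of frames whose destination is the Ethernet broadcast address."""
    macs = [mac.lower() for mac in dest_macs]
    if not macs:
        return 0.0
    return macs.count(BROADCAST_MAC) / len(macs)


def looks_like_storm(dest_macs, threshold=0.5):
    """Flag a capture when broadcasts reach the (assumed) threshold share."""
    return broadcast_share(dest_macs) >= threshold


# Made-up capture: two of four frames are broadcasts.
capture = [
    "ff:ff:ff:ff:ff:ff",
    "00:11:22:33:44:55",
    "FF:FF:FF:FF:FF:FF",
    "00:11:22:33:44:66",
]
print(broadcast_share(capture), looks_like_storm(capture))
```

    Compare against a capture from a quiet period first, so you know what share is normal for your network.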

    1. Re:Loopback by vwjeff · · Score: 1

      If you have managed switches, you should enable STP.

    2. Re:Loopback by phorm · · Score: 1

      Some are, some aren't. We're in the process of putting in managed at the trouble sites.

  30. Insightful eh? by Darius+Jedburgh · · Score: 1

    Tell me. What problems, exactly, would this solve?

    1. Re:Insightful eh? by Anonymous Coward · · Score: 0

      "Tell me. What problems, exactly, would this solve?"

      No more viruses clogging the network?

  31. First and always first in troubleshooting networks by deejer · · Score: 1

    Always check the physical layer first.

    Just this summer I tracked down an error that was caused by a cisco wireless access point trying to pull electricity from the cat5. It was UNPLUGGED from the power! It took down a whole segment of the network.

    The way we found it was from the solid light on the switch.

  32. Documentation by staticsage · · Score: 1

    If you are going to try to write up docs....

    If they run Cisco equipment, a show cdp neighbor will help you a lot. Keeping up to date documentation on a network (especially a large one) is a difficult task, but it will make solving future problems much easier.

  33. All sorts of good tools by canuck57 · · Score: 1

    Here are some tools I use for just about the same thing you're about to do, with a brief reason why I use each. Start with one, then once you've mastered it, move on to another.

    • ethereal - good for interactive network monitoring and also for analyzing captures from tcpdump
    • tcpdump - You can use this to capture the network traffic. Have the router bridge a port so you can monitor everything. It can be a fundamental component of a network recorder that records while you're not there.
    • snort - if a packet matches known goofy stuff, it logs it. Chances are bad nodes generate a lot of snort alert entries.
    • arpwatch - when someone plugs in something new you know what/where it is
    • openbsd - the OS I prefer running most of this stuff on. Log the firewall violations with logwatch, so if a virus scans for port 135, something that should never happen on a BSD system, you have a valid concern
    • router access - so you can shut down offending ports/ips to enforce policy as well as block broadcasts through the routers except for dhcp. Look at QoS in the routers.
    • policy - management supplied, so you can kick off the offenders without kick back. And don't ask, do kick them off until remediation is complete.
    • email (any kind) - let everyone know why the LAN segment was cut off, and use peer pressure to get conformance. Students whose access gets cut off because of a roommate with a WAP, no WEP and a virus/worm-infested PC will talk to that roommate in ways you can't
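
    The snort bullet above suggests that bad nodes stand out by sheer alert volume. A minimal Python sketch of that ranking, assuming you have already extracted the source IP of each alert from the log; the log parsing is not shown, and the function name and sample addresses are hypothetical.

```python
from collections import Counter


def noisiest_sources(alert_sources, top_n=3):
    """Return the top_n (source_ip, alert_count) pairs, busiest first."""
    return Counter(alert_sources).most_common(top_n)


# Made-up alert sources: one host is clearly noisier than the rest.
alerts = ["10.0.0.5"] * 3 + ["10.0.0.9"] + ["10.0.0.7"] * 2
print(noisiest_sources(alerts, top_n=2))
```

    The hosts at the top of that list are the first candidates for a closer look, or for having their port shut down.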

    Be patient. It could just be that the LAN/WAN/Internet connection is too slow for what the population expects. University and college management rarely understand this until a prof or bigwig can't watch a football game or something.