Slashdot Mirror


Linux Token Ring Support Bringing Down Corporate Nets?

mjh asks: "I've been running Debian GNU/Linux on my company supplied laptop for 3 months now. I got permission from my manager to run it on the network, but I did not go through the somewhat rigorous process of getting the software certified. I have legitimate business reasons for using it on the corporate network (which is why my manager approved it). I even managed to get Lotus Notes to run under wine so I never had to boot into Winders at all (unless someone sent me a PPT doc). I was pretty happy...until I brought the entire network down." Anyone else running Linux on a Token Ring network who would care to talk about their own experiences?

"My company runs Token Ring at the office (puke!) I got drivers from the card manufacturer (Madge), and I'd been happily churning along. Then last week, we started seeing a bunch of errors on the network. These errors would bring everyone on the ring down. After a week of this kinda stuff, they eventually isolated it to me.

Reboot the laptop into Windows and the network card works just fine and they don't see any ring errors. Reboot into linux, and suddenly they start seeing ring errors. I don't really grok token ring, so I'm not entirely certain that I know exactly what the problem is. But, whenever I brought the token ring on line under linux, they saw ring errors, which eventually (as I understand it) would bring down the entire ring. Switch cards (same model) and it continues to happen. It looked to me (and the network analysts) that the Linux driver was causing the problem.

I tried switching to an IBM token ring card, but there's a bug that I hadn't patched for. The people with the Fluke would not wait around while I tried to figure this out. I didn't have any other token ring cards that I could try.

In the end, I agreed not to boot into Linux unless I went into the conference room (which is one of the only rooms in the building with ethernet ports). How should I have done this differently so that using Linux would have been a more positive experience for my company?"

14 of 354 comments

  1. Contact the developers by Fluffy+the+Cat · · Score: 5, Informative

    If you have a reproducible problem that causes the entire network to fall over, the only thing you can do is pull the machine. On the other hand, you should really get in touch with the developer of the driver you were using. It's possible that this bug is known and a fixed version of the driver exists, or it's possible that nobody's ever seen it before. Either way, doing what you can to help the developer get this fixed will help prevent other people from having the same problems in the future. You should be able to find out who's responsible for the driver by looking at either /usr/src/linux/MAINTAINERS or the source for the driver itself (it'll probably be under /usr/src/linux/drivers/net/tokenring).
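
    As a rough illustration of the MAINTAINERS lookup suggested above, here is a minimal Python sketch that pulls out the entries mentioning a keyword. It assumes only that entries are separated by blank lines; the function name and the sample entries (names, addresses, driver titles) are made up for the example and are not taken from a real kernel tree.

```python
def find_maintainer_entries(maintainers_text, keyword):
    """Return the blank-line-separated entries that mention keyword."""
    entries = [block.strip() for block in maintainers_text.split("\n\n")]
    keyword = keyword.lower()
    return [entry for entry in entries if entry and keyword in entry.lower()]


# Hypothetical sample in the general shape of a MAINTAINERS file;
# every name and address here is invented for illustration.
SAMPLE = """\
EXAMPLE ETHERNET DRIVER
P:	Alice Example
M:	alice@example.org

EXAMPLE TOKEN RING DRIVER
P:	Bob Example
M:	bob@example.org
"""

if __name__ == "__main__":
    for entry in find_maintainer_entries(SAMPLE, "token ring"):
        print(entry)
```

    To try it on a real tree, read /usr/src/linux/MAINTAINERS into a string and pass that instead of SAMPLE.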

  2. Re:Imagine if this was Windows... by freebase · · Score: 2, Informative

    Actually, back when I was doing Token Ring on a regular basis, there was a Win driver issue with Madge TR NICs that would take a ring down.

    Somehow, some way, every Madge NIC on a particular ring would decide at almost the same time that it wanted to be the RPS (ring parameter server) and/or the active monitor. Not a very nice thing on a large ring, nor is it easy to troubleshoot.

    We eventually figured out the problem when (for the third time) we shut every machine on the ring down, and brought them up one by one. The machine that started having the problem changed every time, but every machine that started the problem had the same driver loaded. We replaced the cards with Olicom, got the current drivers, and never had that problem again.

    Notice I didn't say never had A problem again. When Token Ring worked, it was fairly good... when it didn't, almost by design it was a pain in the (insert your choice here).

    Anyhow, my 2 cents.

    --
    Sig??? I don't need no stinkin Sig!
  3. Token Ring by kune · · Score: 5, Informative

    We have used Linux and Token Ring for years in our company network. The biggest problem has been finding reliable drivers. We settled for Olicom adapters and their driver. The driver works under kernel version 2.2.19. We used it on our central CVS server with more than 50 users. Olicom has been bought by Madge, the other non-IBM producer of Token Ring adapters.

    We switched the whole network to 100 MBit Ethernet, so we will not look into the issue in the future.

    The drivers in the kernel have some problems, particularly for PCMCIA.

    Here are some useful links:

    Linux Token-Ring page, with updated drivers, but a discouraging news entry from 9/14/2001:
    http://www.linuxtr.net/

    Linux software for Olicom drivers (recommended):
    http://www.madge.com/connect/downloads/software/olicom/

    Linux software for Madge adapters:
    http://www.madge.com/Connect/Downloads/Software/

  4. Re:Brought down the ENTIRE network? by Zerotheos · · Score: 2, Informative
    >> I don't see how one lone NIC can bring down an entire network...

    You should pull out your 'network protocol design books' and read up on the fundamental differences between token ring and ethernet. On a token ring network, each node plays an active part in passing the token. If one node is misbehaving, it _can_ seriously affect the rest of the network.
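
    To make that concrete, here is a toy Python sketch (not a model of the real 802.5 protocol) of a ring where every node has to hand the token on. The function name and the (name, forwards_token) representation are assumptions made for the example.

```python
def circulate_token(nodes, max_hops):
    """Pass the token around the ring; return the names of nodes that held it.

    nodes is a list of (name, forwards_token) pairs in ring order.
    A node with forwards_token=False keeps the token and the ring stalls.
    """
    holders = []
    position = 0
    for _ in range(max_hops):
        name, forwards_token = nodes[position]
        holders.append(name)
        if not forwards_token:
            break  # the token never leaves this node, so nobody else can send
        position = (position + 1) % len(nodes)
    return holders


if __name__ == "__main__":
    healthy = [("A", True), ("B", True), ("C", True)]
    faulty = [("A", True), ("B", False), ("C", True)]
    print(circulate_token(healthy, 6))
    print(circulate_token(faulty, 6))
```

    With every node cooperating the token keeps going around; with one node that never forwards it, node C never gets a turn at all.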

  5. Ran tr0 for over two years by DenialS · · Score: 5, Informative

    I ran Token Ring on my personal desktop and a server at work for over two years without any incidents requiring sysadmin intervention.


    Here's how I did it:

    • check out the development site: http://www.linuxtr.net. This site is quite good about posting patches and information for the tr module.
    • get a static IP! The Linux token ring driver was not at all happy with DHCP
    • double-check your network settings--we had dire threats about setting our MTU right (yep, even in Windows), so I ensured that I knew what I was doing before plugging into the network
    • get recent linux kernels! I used Red Hat, who didn't ship token ring support, so needing to recompile the kernel anyways, I always picked up the latest kernels. For a driver that few people apparently use, there have been a lot of patches that made their way into the kernel. I ran 2.2.19 on the server, and 2.4.8 on my desktop. I don't know what Debian gives you, but I would consider recompiling your kernel.
    • read the docs! Unfortunately the Token Ring HOWTO appears to have forked down two paths: out of date (http://www.linuxtr.net), and further out of date (http://www.linuxdoc.org). I wrote the author of the newer version (Tom Gall), asking him to submit the updates to the LDP, but despite his assurances it never happened. Sigh.
    • with IBM TR cards, you have to run some program that changes the firmware settings to get it to run in Turbo 16/4 mode, or some such arcana

    So, it worked for me, as I said, for a couple of years. But then I moved to a new site with pure Ethernet, and I have to admit that life is much simpler now.
  6. A few hints for Token Ring Troubleshooting by HRB · · Score: 5, Informative

    What I see here is that a lot of people no longer have a clue about Token Ring. Am I getting really old (I am 25)?

    This is generally bad, because TR is really a cool technology (except that it was always too expensive and proprietary).

    But superior technology was never the point.
    (See also: Donald Becker's comment on NE2000 clones.)

    The network card is not the only possible error source. Token Ring is an active network, where a lot of the logic is within the NIC and the cabling (e.g. M(S)AU = Multistation Access Unit).

    All stations are assembled in a physical double ring, even though the cabling is a star topology.

    If you connect your station to a MAU (= TR hub),
    your plug is connected to the MAU, but you are not yet connected to the ring. If you turn on your computer, the network driver opens a relay in the MAU (signaled via the adapter cable) to switch you into the ring.
    If you turn off the computer you get disconnected.
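
    The plug-in versus insertion distinction above can be sketched as a toy Python model. The class and method names are invented for the example, and the relay behaviour is only what the paragraph above describes, not a specification of any real MAU.

```python
class Mau:
    """Toy model of a MAU: a fixed set of ports, each with a bypass relay."""

    def __init__(self, port_count):
        # None means nothing is plugged in; otherwise [station_name, inserted]
        self.ports = [None] * port_count

    def plug_in(self, port, station):
        # Cable attached, but the relay still bypasses the port.
        self.ports[port] = [station, False]

    def power_on(self, port):
        # The adapter signals the MAU, which switches the port into the ring.
        self.ports[port][1] = True

    def power_off(self, port):
        # The relay drops back and the port is bypassed again.
        self.ports[port][1] = False

    def ring(self):
        """Stations currently part of the ring, in port order."""
        return [p[0] for p in self.ports if p is not None and p[1]]


if __name__ == "__main__":
    mau = Mau(4)
    mau.plug_in(0, "alpha")
    mau.plug_in(2, "beta")
    print(mau.ring())   # plugged in, but nobody inserted yet
    mau.power_on(0)
    mau.power_on(2)
    print(mau.ring())
    mau.power_off(0)
    print(mau.ring())
```

    The point of the model is only that a cabled station and an inserted station are two different states, and that powering off shrinks the ring.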

    All data on the ring passes all the NICs in the ring (exception: Early Token Release). The NIC acts as a repeater (it amplifies the signal to the next ring segment).

    Since the unamplified distance between two NICs is limited, this can lead to the "Token Ring Sleeps at Night" problem, where the Token Ring refuses to work at night (simply because too many employees turn off their PCs after work).

    This can simply be overcome by replacing passive MAUs with active TR Switches.

    One should also keep in mind that the cable to the network card is part of the ring after activation of the card. A faulty cable can disturb the ring (even though it should be automatically removed from the ring).

    I would try your laptop directly on a TR switch.
    This way you can eliminate driver/TR component interaction (a driver which aggressively tries to connect to a ring with a faulty cable).

    I personally implemented many servers with Linux and never had problems with disturbing ring operation. I used IBM and Olicom adapters and they always worked well.

  7. Re:Debug before, gateway after. by Simba · · Score: 3, Informative

    Token Ring runs over many sorts of cable, among them coax and good ol' UTP. The real fun is the ancient wall-drop plugs in use by IBM, and others.

    I used to work for IBM, and was only too exposed to the hell that is TR. The most common way rings go boom is when unix-type users add machines to a ring at the wrong speed. E.g., bringing a card up at 4 Mbit on a ring running at 16 will usually drop it.

    My solution was to build a gateway running a very hacked up debian. IBM's 300PL worked great, as it came out of the box with both ethernet and token ring cards. I ran 2.2.19 on it and used iproute2 to do various NAT and address forwarding tricks to an ethernet switch.

    Worked great until the hard drive melted. Damned Fujitsus.

    Lessons learned:

    1. Test first. A nice way of doing this is to get one of IBM's (or someone else's) TR hubs, and use it for your testing. That way, if you blow something up, you'll only blow up your testing network. The downside is that these things are expensive as all hell, and hard to come by. Even at IBM they required much bribery to nab for... 'unofficial uses'.

    2. Use IBM hardware if at all possible. Someone else said something to the effect of "IBM - Hardware built like a tank". Very true.

    3. Ask someone first. Chances are, at least in larger tech-oriented companies, someone else will have tried alternative operating systems before, and have advice (or horror stories) to share.

    --
    Hippies smell.
  8. A little TR background. by r2ravens · · Score: 5, Informative

    I work with broken... sorry, Token Ring every day. I work for a state agency with nearly 5,000 nodes (server, workstation, printer, etc) which until just last year were all on TR. The switch to Ethernet has only just started and is going office by office as budget allows.

    I came from an all-Ethernet environment prior to this job and have had some experience with ARCNet as well. (How's that for you old /.'ers out there?)

    Token Ring is a logical ring topology, usually implemented in a physical star or bus topology. Some of our rings have upward of 200 nodes with thousands of feet of cabling connecting them. We have MAUs (Multistation Access Units - hubs) connected to each other with copper and fiber. Most of the cabling that runs to the workstations is type I - 4 conductor, big gauge stuff that comes to large data connectors at the wall. If you haven't seen these, you'd love them, about 1 1/4" square and 2 1/2" long. Then a lobe cable goes to a db-9 connector on the NIC card.

    TR works by passing a token (electrically) to each node in sequence. When a node has data to be transmitted, it hangs the data on the token and sends it on its way. All subsequent cards check to see if the data is for them and then pass it all on if it's not. The intended recipient strips the data and sends the token on its way. In a 4Mb ring, there is one token and on a 16 Mb ring there are two, 180 deg. to each other (timing-wise) on the ring. I don't know how the 100 Mb version does it, but almost nobody uses that.
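
    Here is a small Python sketch of the hand-off just described, following this simplified account in which the recipient takes the data off the ring. In IEEE 802.5 itself the frame travels on around to the sender, which removes it, so treat this purely as an illustration. The function name and return shape are assumptions made for the example.

```python
def send_frame(ring, sender, recipient, payload):
    """Carry payload from sender around the ring until recipient takes it.

    ring is a list of station names in ring order. Returns (path, delivered):
    path lists the stations the loaded token visited after leaving sender,
    delivered is the payload, or None if it came back around unclaimed.
    """
    start = ring.index(sender)
    path = []
    for step in range(1, len(ring)):
        station = ring[(start + step) % len(ring)]
        path.append(station)
        if station == recipient:
            return path, payload  # recipient strips the data
        # otherwise this station just passes everything along
    return path, None


if __name__ == "__main__":
    ring = ["A", "B", "C", "D"]
    print(send_frame(ring, "B", "A", "hello"))
    print(send_frame(ring, "B", "Z", "hello"))
```

    Every station between sender and recipient has to relay the data correctly, which is why one bad station matters to everyone downstream of it.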

    This has an advantage in that there are no such things as collisions like on Ethernet. This allows for the massive number of nodes per ring and high efficiency in data transfer - perhaps 80 - 90%. For comparison, Ethernet starts having problems due to collisions at 40% or so - depending on the number of nodes.

    It also has the disadvantage that a single break at any point in the ring breaks the whole ring. (Think Christmas lights in series rather than parallel.) Another disadvantage is exactly the problem the poster reports - timing errors. I don't know if the problem was just timing errors, but the other problem - beaconing - would have brought the whole ring down right away and he said that it was just noise with the potential to bring the ring down.

    Indeed, timing is critical. Beacon errors are worse, as the NIC puts out a spurious signal that doesn't allow any node to hear the token as they attempt to pass it around.

    Early in my employment, I attempted to put a linux box on the ring, but couldn't get the TR drivers to work with a Madge or old IBM card. About a year in, they got all tight-assed and concerned about security and prohibited all alternate OS's. We're an all M$ house, how's that for irony. Security, what security? At least we're behind a pretty good firewall.

    As far as the problem with this particular installation, I agree with other posters who have said that the author of the driver needs to be contacted to report the bug and maybe get a fix. It would be good to set up a separate ring with just the two nodes (and the Fluke) to try to ID the problem. But he may also be facing administrative/political issues as well. Those are hard to overcome, especially in a large organization, and even more in a government agency - as I have found.

    I'm not karma whoring, I just thought that since this technology (TR) is so ancient and in use by so few places, readers unfamiliar with it might like a little info.

    BTW, the aforementioned ARCNet is also a token passing design that runs on a bus or a star and runs at 2 Mb. It can run on UTP or 93 ohm coax (RG-62). It's relatively robust, if slow. A boss of mine went to a Novell Admin class where the instructor hooked a server and workstation together on ARCNet with BNC connectors crimped to a piece of barbed wire. It passed data acceptably.

    Hope this all helps a bit.

    --
    War is Peace. Freedom is Slavery. Ignorance is Strength. - George Orwell or George Bush?
  9. Voodoo debugging by RobertGraham · · Score: 5, Informative
    Token Ring came out in the late 1980s as a more "reliable" technology because it replaced the old "bus" topology of coax Ethernet with a "star" topology that isolated errors to a single port. Adapters even had the ability to detect that they were causing errors and would automatically pull themselves off the network.

    The promised reliability never materialized. In the early days, the TR connector was the same as that for DB9 serial ports and EGA (pre-VGA) video. L-users would frequently connect the cables incorrectly, taking down the entire LAN. In the later days, 10BaseT Ethernet replaced coax, and became slightly more reliable than Token Ring. These days, we use switched Ethernet, which is infinitely more reliable than Token Ring.

    Keeping Token Ring networks running has become like voodoo management. Stories like yours are common. Nobody knows exactly WHY things are going wrong, so they are quick to point the finger at oddball stuff. There is so little support for Token Ring that nobody can figure out how to solve even basic problems. The only solution is to remove the offending products from the network.

    Here is some background for what might be going wrong. First of all, your card has its own microprocessor. As a kid in the early 1980s I owned a TI-99/4a home computer/game-console: it is roughly the same CPU in your card. It runs its own embedded OS. This means that under normal conditions, your card will run fine, regardless of the driver: all the intelligence is on the adapter, not in the driver.

    I point this out because you never specified exactly the types of errors you are receiving. In theory, all such errors are related to the hardware, and there is nothing the driver can do to cause them. Specifically, I don't know how it can be possible for something to "cause ring errors that eventually bring down the entire net". There are really no progressive failures like this in Token Ring.

    If you mentioned the precise ring error and/or the method in which the ring goes down, it might be helpful. Here are some possible ring errors.

    A burst-error is caused when an adapter inserts itself into or removes itself from the ring. This might be caused because, for some reason, Linux might be re-initializing the card. For example, you may have DHCP set to renew the lease every minute which may cause this to happen. I have no knowledge of how Linux deals with Token Ring, but if the problem is "Burst Errors", then it is because of some higher-layer interaction like this.

    A "receiver congestion" error is caused when the Linux driver doesn't remove packets from the card's buffers fast enough. In theory, they are supposed to indicate that packets are coming in too fast for the machine to handle. In practice, you see this happen when machines "hang" and fail to empty their queues. You might be running some sort of libpcap packet-sniffer on the system or have the adapter running in promiscuous mode (do an ifconfig to check) that is having some sort of pathological condition.
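
    Besides eyeballing ifconfig output, the promiscuous flag can be checked programmatically. The sketch below assumes a reasonably current Linux system that exposes /sys/class/net/<iface>/flags (kernels of the 2.2/2.4 era discussed in this thread did not have sysfs), and the function names are made up for the example. The interface name tr0 is taken from the thread and may differ on your machine.

```python
IFF_PROMISC = 0x100  # promiscuous-mode bit in the Linux interface flags


def is_promiscuous(flags_text):
    """Interpret the contents of /sys/class/net/<iface>/flags, e.g. "0x1103"."""
    return bool(int(flags_text.strip(), 16) & IFF_PROMISC)


def read_interface_flags(iface):
    """Read the flags file for iface; this sysfs path is Linux-specific."""
    with open(f"/sys/class/net/{iface}/flags") as handle:
        return handle.read()


if __name__ == "__main__":
    # Parsing is shown on literal strings so the example runs anywhere.
    print(is_promiscuous("0x1103\n"))
    print(is_promiscuous("0x1003\n"))
```

    On a machine that actually has the interface, is_promiscuous(read_interface_flags("tr0")) would do the live check.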

    Maybe you are getting "FC errors" which indicate that somebody has the same MAC address as you. This won't happen if you use the standard MAC address built into the card, but it could happen if the Linux driver has a bug setting a locally administered address. Maybe it's setting it to all zeroes, causing a conflict with some other card that has a similar bug.

    None of these errors really cause problems. Burst errors will nuke a frame as it passes by (maybe one out of a thousand) -- the hardware auto-retransmits, so it doesn't cause performance problems. Receiver congestion errors only cause problems for YOU and nobody else on the ring. A duplicate address will only cause problems with the other machine that shares your MAC address.

    My guess is that your admins are just getting testy over the fact that your Linux box re-inserts itself more often than Windows boxen, causing a higher number of relatively harmless burst-errors. When they diagnose problems with the ring, they notice that your machine causes the highest number of errors, and therefore blame any ring failure on you.

    If your machine is truly causing a problem, the only thing I can think of is that your port on the hub gets "stuck" (this happens a lot). The process of re-inserting has a small chance of getting stuck, so if your Linux box re-inserts 100 times more often than Windows, you'd see this.

    BTW, Token Ring is a good lesson in Zen. A burst-error is defined as 5 half-bit times without a transition. What this really means is that a station has entered or left the ring. I point this out because if you try to debug this problem yourself, you'll have to hunt down Token Ring references. Go quickly to the definition of burst-errors: if it has the "technical" definition, discard the reference and move on. If it has the "practical" definition, then you'll be in luck.
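
    For anyone who does want the "technical" definition spelled out, here is a minimal Python sketch of it. It assumes the line signal has already been sampled once per half-bit time into a list of levels; that sampling step, the function names, and the example data are all assumptions made for illustration.

```python
def longest_quiet_run(half_bit_levels):
    """Length of the longest run of identical consecutive samples."""
    longest = current = 0
    previous = object()  # sentinel that equals no sample
    for level in half_bit_levels:
        current = current + 1 if level == previous else 1
        longest = max(longest, current)
        previous = level
    return longest


def has_burst_error(half_bit_levels, threshold=5):
    """True if the signal stays flat for threshold or more half-bit times."""
    return longest_quiet_run(half_bit_levels) >= threshold


if __name__ == "__main__":
    normal = [1, 0, 0, 1, 1, 0, 1, 0]     # never flat for long
    disturbed = [1, 0, 0, 0, 0, 0, 1]     # five flat samples in a row
    print(has_burst_error(normal), has_burst_error(disturbed))
```

    The threshold of 5 comes straight from the definition quoted above; everything else is scaffolding.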

  10. Re:Tolken Ring Networks.... by Negadecimal · · Score: 2, Informative

    What's more, TR has collision avoidance built into the protocol, where Ethernet networks have to be architected in star topologies to avoid collision, because Ethernet responds horribly.

    However, TR is horribly inefficient when one machine produces a disproportionate amount of traffic (which is the case for pretty much all corporate networks). Unless each machine in your ring produces a steady stream of packets, the old ALOHA collision model still wins.

  11. Re:Token Ring sucks, Linux TR REALLY sucks by Anonymous Coward · · Score: 3, Informative

    Wow. everyone's bashing token ring so much you'd think it was a Microsoft product.
    I work for a large bank that is _SURPRISE_ largely an IBM shop. the vast majority of our network is token ring and thanks to relatively clueful network design is very stable. Token ring is not ancient technology, it is a mature technology that has a lot of advantages over ethernet for wide area networking; especially in a Mainframe environment. Source Route Bridging anyone? try that with ethernet.
    as far as MAUs are concerned, gimme a break. now that _is_ ancient. hit up ebay and grab a nice synoptics hub and a token ring switch. follow IBM's standards on lobe length, number of stations per ring and cabling type and verify the buffer sizes on your adapters. most of the non-ibm adapter drivers i've seen set them far too low and you end up with gobs and gobs of receiver congestion errors. if you have enough stations broadcasting these errors, bam. you have a beaconing ring.
    my experience so far has been best with the IBM PCI token Ring adapter 2 and the IBM auto 16/4 PC cards (the older ones with the hologram-y label, not version 2). The lanstreamers are kinda junky but work well in windows and Novell.
    to sum up, use your head, follow standards and understand how token ring actually works. grab an IBM redbook on it while you're at it...

    Token Ring lover in WI,

    sixty4k

  12. Sounds like you are Beaconing. This is a NIC or dr by Anonymous Coward · · Score: 1, Informative

    Hello,
    I used to work as a network engineer at T.Rowe Price and associates. Before our ethernet conversion I was responsible for over 400 PC's on a token ring network.

    To understand how this happens, you need to understand how the token ring topology works. Token ring avoids data collisions by sending data based on a token passing scheme. Only one computer, the one with the token, is allowed to send data at a time. This prevents collisions and actually speeds a large network up.

    Ethernet, on a large flat shared media network, would go extremely slow because of collisions. At the time, token ring was the best solution for a very large network. Ethernet switching has fixed this problem with ethernet, but that is another subject.

    When your network card beacons, it does not let go of the token. Therefore no other computers can transmit data.

    My advice is to first check every inch of wire between you and the MAU (where you are plugged into the network in the closet). If the wiring is good (you can plug a different PC in and it, or a tester, works... the only way to be sure), try another network card in a lab (so you don't shut the network down). If that does not fix it, then try a different kind of card (or a known good card taken out of a working PC, only way to be sure). If your setup worked at one time, it is definitely a NIC (network card) hardware failure or port problem. Most likely it is the card. Try reinstalling the drivers (make sure they are really deleted before reinstalling) and get the latest Linux drivers for the card. If you compiled your kernel, do a make clean and a make dep. Find the correct module and copy it over the existing one.

    They were probably pissed because it takes forever to find a beaconing problem. I mean hours. There is no easy way to do it (or at least there didn't use to be). They blamed the OS because they don't know Linux :). Sounds like you need a Linux admin (and a switched Ethernet network) in your camp.

    This problem is totally OS independent and the primary hazard of using the token ring topology.

    Don't let them shut you down. Simply point them at http://www.cisco.com/univercd/cc/td/doc/cisintwk/itg_v1/tr1906.htm

    And search for "beacon" in the page, under "Fault Management Mechanisms"; this explains beaconing. There is a piece of equipment you can put on the network to prevent this, but it does not always work.

  13. Ring Speed? by z84976 · · Score: 2, Informative

    It sounds very much like you're trying to insert into the ring at 4 Mbps on a 16 Mbps ring or vice versa. That will freak out other things on the ring pretty badly. I know of successful Olicom usage, but I always used an IBM PCMCIA Token Ring 16/4 card. Check your ring speeds.
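
    The "check your ring speeds" advice boils down to a comparison you make before inserting. A trivial Python sketch follows; the function name is invented, and how you actually learn the ring speed and the adapter's configured speed depends on your hardware and driver, which this does not cover.

```python
VALID_SPEEDS_MBPS = (4, 16)  # the two classic Token Ring speeds in this thread


def safe_to_insert(adapter_speed_mbps, ring_speed_mbps):
    """True only when the adapter is configured for the ring's speed."""
    for speed in (adapter_speed_mbps, ring_speed_mbps):
        if speed not in VALID_SPEEDS_MBPS:
            raise ValueError(f"unexpected Token Ring speed: {speed} Mbps")
    return adapter_speed_mbps == ring_speed_mbps


if __name__ == "__main__":
    print(safe_to_insert(16, 16))
    print(safe_to_insert(4, 16))
```

    Anything other than a match means: do not plug in until the adapter has been reconfigured.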

  14. Rationale for TR is outdated. by hey! · · Score: 3, Informative

    I was involved as a consultant some years ago (ca 1996) with the decision of a college to migrate from TR to Ethernet. At the time the situation was very cloudy; TR was clearly on the way out, but there was some question as to whether we should go FDDI or CDDI or perhaps ATM for the network backbone. I recommended the use of Ethernet throughout the system. Ethernet, in its most primitive and naive implementations, has a number of problems when used as a basis for a large network. The answer, of course, is not to use it primitively or naively.

    Ethernet evolved from networking schemes used for packet radio. The original idea was you had a single medium (a long cable) that was shared by a number of hosts. As in radio, they were supposed to listen before talking (Carrier Sense Multiple Access/Collision Detection or CSMA/CD) so they didn't garble each others' messages (collisions).

    CSMA/CD networks have two problems: (1) throughput begins to collapse somewhere between 40% and 50% of the nominal speed due to collisions and retransmissions and (2) packet delivery cannot be guaranteed within a fixed time (although at low loads latencies tend to be very low).

    However, Ethernet switching technology has taken care of the throughput problem by reducing the number of machines sharing a medium for purpose of collision detection, to the point where a single workstation on a full duplex switched port can never have a collision. A combination of switches with huge backplane capacity, spanning tree, trunking, VLAN and powerful routers give the administrator great flexibility in delivering network capacity to every port on his network, along with excellent scalability.

    The only thing that remains is guaranteed delivery times for packets; although stations needn't worry about collisions, there is still queuing time within the switches to consider. This might affect people attempting to stream broadcast quality video over their network to several workstations, who might choose to go with 100Mbit token ring. In theory QoS is supposed to address this, but I haven't seen it used much. Most streaming media applications are Internet centric, and buffer their data to prevent problems due to the much more random nature of the Internet. It is possible to contrive scenarios where you need QoS or isochronous packet delivery (e.g. high quality video conferencing over a LAN) but these haven't proved to be very important. If they were, then ATM would probably be a better choice than TR.

    Of course TR still has to be supported for places that have too much human inertia to switch, but I don't think there is any technology that is superior to Ethernet in its cost effectiveness for the widest range of corporate applications.

    --
    Post may contain irony: discontinue use if experiencing mood swings, nausea or elevated blood pressure.