Slashdot Mirror


Linux Token Ring Support Bringing Down Corporate Nets?

mjh asks: "I've been running Debian GNU/Linux on my company supplied laptop for 3 months now. I got permission from my manager to run it on the network, but I did not go through the somewhat rigorous process of getting the software certified. I have legitimate business reasons for using it on the corporate network (which is why my manager approved it). I even managed to get Lotus Notes to run under wine so I never had to boot into Winders at all (unless someone sent me a PPT doc). I was pretty happy...until I brought the entire network down." Anyone else running Linux on a Token Ring network who would care to talk about their own experiences?

"My company runs Token Ring at the office (puke!) I got drivers from the card manufacturer (Madge), and I'd been happily churning along. Then last week, we started seeing a bunch of errors on the network. These errors would bring everyone on the ring down. After a week of this kinda stuff, they eventually isolated it to me.

Reboot the laptop into Windows and the network card works just fine and they don't see any ring errors. Reboot into Linux, and suddenly they start seeing ring errors. I don't really grok Token Ring, so I'm not entirely certain that I know exactly what the problem is. But whenever I brought the Token Ring card online under Linux, they saw ring errors, which eventually (as I understand it) would bring down the entire ring. Switch cards (same model) and it continues to happen. It looked to me (and the network analysts) that the Linux driver was causing the problem.

I tried switching to an IBM Token Ring card, but there's a bug that I hadn't patched for. The people with the Fluke would not wait around while I tried to figure this out. I didn't have any other Token Ring cards that I could try.

In the end, I agreed not to boot into Linux unless I went into the conference room (which is one of the only rooms in the building with ethernet ports). How should I have done this differently so that using Linux would have been a more positive experience for my company?"

9 of 354 comments

  1. Contact the developers by Fluffy the Cat · · Score: 5, Informative

    If you have a reproducible problem that causes the entire network to fall over, the only thing you can do is pull the machine. On the other hand, you should really get in touch with the developer of the driver you were using. It's possible that this bug is known and a fixed version of the driver exists, or it's possible that nobody's ever seen it before. Either way, doing what you can to help the developer get this fixed will help prevent other people from having the same problems in the future. You should be able to find out who's responsible for the driver by looking at either /usr/src/linux/MAINTAINERS or the source for the driver itself (it'll probably be under /usr/src/linux/drivers/net/tokenring).
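    A rough sketch of that MAINTAINERS lookup, for anyone who would rather script it than grep. It assumes the file is a series of blank-line-separated entries, each with a title line followed by one-letter tagged lines such as "M:" for a mail address. The sample entries, names and addresses below are made up for illustration and are not taken from any real kernel tree.

```python
def find_maintainer_entries(text, keyword):
    """Return (title, fields) for each entry whose title mentions keyword.

    Assumes a MAINTAINERS-style layout: entries separated by blank lines,
    a title on the first line, then lines of the form "X: value".
    """
    matches = []
    for block in text.split("\n\n"):
        lines = [ln for ln in block.splitlines() if ln.strip()]
        if not lines:
            continue
        title = lines[0].strip()
        if keyword.lower() not in title.lower():
            continue
        fields = {}
        for ln in lines[1:]:
            tag, sep, value = ln.partition(":")
            # Only accept single-letter tags such as "M" or "L".
            if sep and len(tag.strip()) == 1:
                fields.setdefault(tag.strip(), []).append(value.strip())
        matches.append((title, fields))
    return matches


# Hypothetical sample data, NOT real maintainer information.
SAMPLE = (
    "EXAMPLE ETHERNET DRIVER\n"
    "P: Some Person\n"
    "M: someone@example.org\n"
    "S: Maintained\n"
    "\n"
    "EXAMPLE TOKEN RING DRIVER\n"
    "P: Another Person\n"
    "M: another@example.org\n"
    "L: example-list@example.org\n"
    "S: Maintained\n"
)

entries = find_maintainer_entries(SAMPLE, "token ring")
for title, fields in entries:
    print(title, fields.get("M"))
```

    To run it against a real tree, read the text from /usr/src/linux/MAINTAINERS instead of SAMPLE; the exact tags in use may differ between kernel versions.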

  2. Token Ring sucks, Linux TR REALLY sucks by Syberghost · · Score: 5, Insightful

    Token Ring is horribly sensitive to timing issues, especially when using Cat5 in a physical bus instead of a physical coax ring.

    I have seen a TR network where a single machine could develop a problem, and this would cause a group of 8 machines to all lose the net. Any one of those machines could bring them all down, and the only thing that would get them back up was shutting them all off (completely power-down, even rebooting didn't do it) and then bringing them back up one by one. Something as simple as shutting down Windows NT to the "click to reboot" prompt was enough to cause the problem to develop; eventually one of them would lose its mind, and they'd all go.

    Throw into that mix, the fact that Linux Token Ring drivers are bastard stepchildren that get 1/1,000th of the use of the Ethernet drivers (if that much) and you end up with real problems.

    Bottom line: come in on a weekend and try that other NIC out; maybe its drivers are more mature. But other than that, don't dick with the company network. Token Ring is too damn sensitive.

    You might try putting a few NT boxes into the "click to reboot" state, and see if they screw up the company network too. Works best with 3COM TR NICs, which is ironic since they also seem to recover the best from having their cable pulled and replaced while live.

    If they see the problem is Token Ring specific, and just exacerbated by a bad Linux driver, perhaps they'll switch to Ethernet. If they trade their TR NICs in to somebody like CablExpress, they might break even or make a small profit on the switchover, and they'll certainly recover the costs in a short period of buying Ethernet NICs instead of new TR ones; they're horribly expensive, and the infrastructure gear (CAUs, LAMs, MAUs, switches, routers, etc.) is even worse.

    An even better suggestion might be to find a job in a shop that prefers the more-manageable problems of Ethernet to the problems of Token Ring.

  3. Token Ring by kune · · Score: 5, Informative

    We have used Linux and Token Ring for years in our company network. The biggest problem has been finding reliable drivers. We settled on Olicom adapters and their driver. The driver works under kernel version 2.2.19. We used it on our central CVS server with more than 50 users. Olicom has since been bought by Madge, the other non-IBM producer of Token Ring adapters.

    We switched the whole network to 100 MBit Ethernet, so we will not look into the issue in the future.

    The drivers in the kernel have some problems, particularly for PCMCIA.

    Here are some useful links:

    Linux Token-Ring page, with updated drivers, but a discouraging news entry from 9/14/2001:
    http://www.linuxtr.net/

    Linux-Software for Olicom-Drivers (recommended):
    http://www.madge.com/connect/downloads/software/olicom/

    Linux-Software for Madge-Adapters on:
    http://www.madge.com/Connect/Downloads/Software/

  4. Ran tr0 for over two years by DenialS · · Score: 5, Informative

    I ran Token Ring on my personal desktop and a server at work for over two years without any incidents requiring sysadmin intervention.


    Here's how I did it:

    • check out the development site: http://www.linuxtr.net. This site is quite good about posting patches and information for the tr module.
    • get a static IP! The Linux token ring driver was not at all happy with DHCP
    • double-check your network settings--we had dire threats about setting our MTU right (yep, even in Windows), so I ensured that I knew what I was doing before plugging into the network
    • get recent linux kernels! I used Red Hat, who didn't ship token ring support, so needing to recompile the kernel anyways, I always picked up the latest kernels. For a driver that few people apparently use, there have been a lot of patches that made their way into the kernel. I ran 2.2.19 on the server, and 2.4.8 on my desktop. I don't know what Debian gives you, but I would consider recompiling your kernel.
    • read the docs! Unfortunately the Token Ring HOWTO appears to have forked down two paths: out of date (http://www.linuxtr.net), and further out of date (http://www.linuxdoc.org). I wrote the author of the newer version (Tom Gall), asking him to submit the updates to the LDP, but despite his assurances it never happened. Sigh.
    • with IBM TR cards, you have to run some program that changes the firmware settings to get it to run in Turbo 16/4 mode, or some such arcana

    So, it worked for me, as I said, for a couple of years. But then I moved to a new site with pure Ethernet, and I have to admit that life is much simpler now.

  5. A few hints for Token Ring Troubleshooting by HRB · · Score: 5, Informative

    What I see here is that a lot of people no longer have a clue about Token Ring. Am I getting really old (I am 25)?

    This is generally bad, because TR is really a cool technology (except that it was always too expensive and proprietary).

    But superior technology was never the point.
    (see also: Donald Becker's comment on NE2000 clones)

    The network card is not the only possible error source. Token Ring is an active network, where a lot of the logic is within the NIC and the cabling (e.g. M(S)AU = Multistation Access Unit).

    All stations are assembled in a physical double ring, even though the cabling is a star topology.

    If you connect your station to a MAU (= TR hub)
    your plug is connected to the MAU, but you are not yet connected to the ring. If you turn on your computer, the network driver opens a relay in the MAU (signaled via the adapter cable) to switch you into the ring.
    If you turn off the computer you get disconnected.

    All data on the ring passes all the NICs in the ring (exception: Early Token Release). The NIC acts as a repeater (it amplifies the signal to the next ring segment).

    Since the unamplified distance between two NICs is limited, this can lead to the "Token Ring Sleeps at Night" problem, where the Token Ring refuses to work at night (simply because too many employees turn off their PCs after work).

    This can simply be overcome by replacing passive MAUs with active TR Switches.

    One should also keep in mind that the cable to the network card is a part of the ring after activation of the card. A faulty cable can disturb the ring (even though it should be automatically removed from the ring).

    I would try your laptop directly on a TR switch.
    This way you can eliminate driver/TR component interaction (a driver which aggressively tries to connect to a ring with a faulty cable).

    I personally set up many servers with Linux and never had problems with disturbing ring operation. I used IBM and Olicom adapters and they always worked well.

  6. A little TR background. by r2ravens · · Score: 5, Informative

    I work with broken... sorry, Token Ring every day. I work for a state agency with near 5,000 nodes (server, workstation, printer, etc.) which until just last year were all on TR. The switch to Ethernet has only just started and is going office by office as budget allows.

    I came from an all-Ethernet environment prior to this job and have had some experience with ARCNet as well. (How's that for you old /.'ers out there?)

    Token Ring is a logical ring topology, usually implemented in a physical star or bus topology. Some of our rings have upward of 200 nodes with thousands of feet of cabling connecting them. We have MAUs (Multistation Access Units, i.e. hubs) connected to each other with copper and fiber. Most of the cabling that runs to the workstations is type I - 4 conductor, big gauge stuff that comes to large data connectors at the wall. If you haven't seen these, you'd love them, about 1 1/4" square and 2 1/2" long. Then a lobe cable goes to a DB-9 connector on the NIC card.

    TR works by passing a token (electrically) to each node in sequence. When a node has data to be transmitted, it hangs the data on the token and sends it on its way. All subsequent cards check to see if the data is for them and then pass it all on if it's not. The intended recipient strips the data and sends the token on its way. In a 4Mb ring, there is one token and on a 16 Mb ring there are two, 180 deg. to each other (timing-wise) on the ring. I don't know how the 100 Mb version does it, but almost nobody uses that.
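    The token-passing description above can be sketched as a toy simulation. This is only a model of the simplified single-token picture given in this comment: station names, the outbox structure and the hop counter are all invented for illustration, and real adapters do this in hardware with details (for instance, which station removes a frame from the ring) that the sketch does not try to reproduce.

```python
from collections import deque


def simulate_ring(nodes, outbox, max_hops=100):
    """Pass a single token around `nodes` (listed in ring order).

    outbox maps a node name to a deque of (destination, payload) tuples.
    Returns a list of (hop, source, destination, payload) deliveries.
    """
    deliveries = []
    frame = None  # None means the token is free (carries no data)
    pos = 0
    for hop in range(max_hops):
        node = nodes[pos]
        if frame is None:
            # Free token: a node with queued data may attach one frame.
            if outbox.get(node):
                dest, payload = outbox[node].popleft()
                frame = (node, dest, payload)
        elif frame[1] == node:
            # Busy token addressed to this node: take the data, free the token.
            deliveries.append((hop, frame[0], node, frame[2]))
            frame = None
        # Otherwise the node just repeats what it received to its neighbour.
        pos = (pos + 1) % len(nodes)
        if frame is None and not any(outbox.values()):
            break  # nothing left to send
    return deliveries


nodes = ["A", "B", "C", "D"]
outbox = {
    "A": deque([("C", "hello")]),
    "D": deque([("B", "hi")]),
}
deliveries = simulate_ring(nodes, outbox)
for hop, src, dest, payload in deliveries:
    print(f"hop {hop}: {src} -> {dest}: {payload}")
```

    Because only the holder of the free token may transmit, two stations never talk over each other, which is the no-collisions property discussed next.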

    This has an advantage in that there are no such things as collisions like on Ethernet. This allows for the massive number of nodes per ring and high efficiency in data transfer - perhaps 80-90%. For comparison, Ethernet starts having problems due to collisions at 40% or so - depending on the number of nodes.

    It also has the disadvantage that a single break at any point in the ring breaks the whole ring. (Think Christmas lights in series rather than parallel.) Another disadvantage is exactly the problem the poster reports - timing errors. I don't know if the problem was just timing errors, but the other problem - beaconing - would have brought the whole ring down right away and he said that it was just noise with the potential to bring the ring down.

    Indeed, timing is critical. Beacon errors are worse, as the NIC puts out a spurious signal that doesn't allow any node to hear the token as they attempt to pass it around.

    Early in my employment, I attempted to put a linux box on the ring, but couldn't get the TR drivers to work with a Madge or old IBM card. About a year in, they got all tight-assed and concerned about security and prohibited all alternate OS's. We're an all M$ house, how's that for irony. Security, what security? At least we're behind a pretty good firewall.

    As far as the problem with this particular installation, I agree with other posters who have said that the author of the driver needs to be contacted to report the bug and maybe get a fix. It would be good to set up a separate ring with just the two nodes (and the fluke) to try to ID the problem. But he may also be facing administrative/political issues as well. Those are hard to overcome, especially in a large organization, and even more in a government agency - as I have found.

    I'm not karma whoring, I just thought that since this technology (TR) is so ancient and in use by so few places, readers unfamiliar with it might like a little info.

    BTW, the aforementioned ARCNet is also a token passing design that runs on a bus or a star and runs at 2 Mb. It can run on UTP or 93 ohm coax (RG-62). It's relatively robust, if slow. A boss of mine went to a Novell Admin class where the instructor hooked a server and workstation together on ARCNet with BNC connectors crimped to a piece of barbed wire. It passed data acceptably.

    Hope this all helps a bit.

    --
    War is Peace. Freedom is Slavery. Ignorance is Strength. - George Orwell or George Bush?

  7. Voodoo debugging by RobertGraham · · Score: 5, Informative

    Token Ring came out in the late 1980s as a more "reliable" technology because it replaced the old "bus" topology of coax Ethernet with a "star" topology that isolated errors to a single port. Adapters even had the ability to detect that they were causing errors and would automatically pull themselves off the network.

    The promised reliability never materialized. In the early days, the TR connector was the same as that for DB9 serial ports and EGA (pre-VGA) video. L-users would frequently connect the cables incorrectly, taking down the entire LAN. In the later days, 10BaseT Ethernet replaced coax, and became slightly more reliable than Token Ring. These days, we use switched Ethernet, which is infinitely more reliable than Token Ring.

    Keeping Token Ring networks running has become like voodoo management. Stories like yours are common. Nobody knows exactly WHY things are going wrong, so they are quick to point the finger at oddball stuff. There is so little support for Token Ring that nobody can figure out how to solve even basic problems. The only solution is to remove the offending products from the network.

    Here is some background for what might be going wrong. First of all, your card has its own microprocessor. As a kid in the early 1980s I owned a TI-99/4a home computer/game-console: it is roughly the same CPU in your card. It runs its own embedded OS. This means that under normal conditions, your card will run fine, regardless of the driver: all the intelligence is on the adapter, not in the driver.

    I point this out because you never specified exactly the types of errors you are receiving. In theory, all such errors are related to the hardware, and there is nothing the driver can do to cause them. Specifically, I don't know how it can be possible for something to "cause ring errors that eventually bring down the entire net". There are really no progressive failures like this in Token Ring.

    If you mentioned the precise ring error and/or the method in which the ring goes down, it might be helpful. Here are some possible ring errors.

    A burst-error is caused when an adapter inserts itself into or removes itself from the ring. This might be caused because, for some reason, Linux might be re-initializing the card. For example, you may have DHCP set to renew the lease every minute which may cause this to happen. I have no knowledge of how Linux deals with Token Ring, but if the problem is "Burst Errors", then it is because of some higher-layer interaction like this.

    A "receiver congestion" error is caused when the Linux driver doesn't remove packets from the card's buffers fast enough. In theory, they are supposed to indicate that packets are coming in too fast for the machine to handle. In practice, you see this happen when machines "hang" and fail to empty their queues. You might be running some sort of libpcap packet-sniffer on the system or have the adapter running in promiscuous mode (do an ifconfig to check) that is having some sort of pathological condition.
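    For those who would rather script the promiscuous-mode check than eyeball ifconfig output, here is a minimal sketch. It assumes a reasonably modern Linux kernel that exposes interface flags as a hex string under /sys/class/net/<interface>/flags (kernels of the era discussed here did not have sysfs), and it uses the IFF_PROMISC bit value 0x100 from <linux/if.h>. The interface name "tr0" in the usage note is only an example.

```python
IFF_PROMISC = 0x100  # promiscuous-mode bit, as defined in <linux/if.h>


def is_promiscuous(flags_text):
    """Return True if a sysfs-style hex flags string has IFF_PROMISC set."""
    return bool(int(flags_text.strip(), 16) & IFF_PROMISC)


def read_flags(iface):
    """Read the raw flags string for an interface (requires Linux sysfs)."""
    with open(f"/sys/class/net/{iface}/flags") as fh:
        return fh.read()


# Parsing can be checked without any hardware:
print(is_promiscuous("0x1003\n"))  # IFF_PROMISC bit clear
print(is_promiscuous("0x1103\n"))  # IFF_PROMISC bit set
```

    On a live system the two pieces combine as is_promiscuous(read_flags("tr0")), substituting whatever name the interface actually has.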

    Maybe you are getting "FC errors" which indicate that somebody has the same MAC address as you. This won't happen if you use the standard MAC address built into the card, but it could happen if the Linux driver has a bug setting a locally administered address. Maybe it's setting it to all zeroes, causing a conflict with some other card that has a similar bug.
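    A small sketch of how one might look for that kind of conflict, given a list of station addresses collected from the ring (for example, copied out of an analyzer report). The station names and addresses are hypothetical, and the check only covers the two cases mentioned above: an all-zero address and two stations sharing the same address.

```python
from collections import defaultdict


def normalize(mac):
    """Lower-case a MAC address and use ':' as the separator."""
    return mac.replace("-", ":").lower()


def find_address_problems(stations):
    """stations maps a station name to its MAC address string.

    Returns (duplicates, all_zero): a dict of MAC -> sorted station names
    for addresses used more than once, and a sorted list of stations whose
    address is all zeroes.
    """
    by_mac = defaultdict(list)
    for name, mac in stations.items():
        by_mac[normalize(mac)].append(name)
    duplicates = {
        mac: sorted(names) for mac, names in by_mac.items() if len(names) > 1
    }
    all_zero = sorted(
        name for name, mac in stations.items()
        if set(normalize(mac)) <= {"0", ":"}
    )
    return duplicates, all_zero


# Hypothetical stations, invented for this example.
stations = {
    "laptop": "00:00:00:00:00:00",
    "printer": "00-00-00-00-00-00",
    "server": "00:00:F6:12:34:56",
}
duplicates, all_zero = find_address_problems(stations)
print(duplicates)
print(all_zero)
```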

    None of these errors really cause problems. Burst errors will nuke a frame as it passes by (maybe one out of a thousand) -- the hardware auto-retransmits, so it doesn't cause performance problems. Receiver congestion errors only cause problems for YOU and nobody else on the ring. A duplicate address will only cause problems with the other machine that shares your MAC address.

    My guess is that your admins are just getting testy over the fact that your Linux box re-inserts itself more often than Windows boxen, causing a higher number of relatively harmless burst-errors. When they diagnose problems with the ring, they notice that your machine causes the highest number of errors, and therefore blame any ring failure on you.

    If your machine is truly causing a problem, the only thing I can think of is that your port on the hub gets "stuck" (this happens a lot). The process of re-inserting has a small chance of getting stuck, so if your Linux box re-inserts 100 times more often than Windows, you'd see this.

    BTW, Token Ring is a good lesson in Zen. A burst-error is defined as 5 half-bit times without a transition. What this really means is that a station has entered or left the ring. I point this out because if you try to debug this problem yourself, you'll have to hunt down Token Ring references. Go quickly to the definition of burst-errors: if it has the "technical" definition, discard the reference and move on. If it has the "practical" definition, then you'll be in luck.
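    The "technical" definition is easy enough to express in code. The sketch below assumes the line signal has already been sampled once per half-bit time into a list of levels (0 or 1), and treats a burst error as any stretch of at least five consecutive samples with no change in level. That sampling model is a simplification made up for this example, not how an adapter actually measures it.

```python
def find_burst_errors(half_bit_levels, threshold=5):
    """Return (start_index, length) for each run of unchanged signal level
    lasting at least `threshold` half-bit times."""
    bursts = []
    run_start = 0
    n = len(half_bit_levels)
    for i in range(1, n + 1):
        # A run ends at the end of the data or when the level changes.
        if i == n or half_bit_levels[i] != half_bit_levels[run_start]:
            if i - run_start >= threshold:
                bursts.append((run_start, i - run_start))
            run_start = i
    return bursts


# Normal traffic keeps changing level; the six flat samples in the middle
# stand in for the gap left while a station inserts or removes itself.
levels = [0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1]
bursts = find_burst_errors(levels)
print(bursts)
```

    Counting how often such runs show up per station is roughly what the admins' tools report, which is why a machine that re-inserts itself often ends up at the top of the error list.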

    1. Re:Voodoo debugging by mjh · · Score: 5, Interesting
      A "receiver congestion" error is caused when the Linux driver doesn't remove packets from the card's buffers fast enough. In theory, they are supposed to indicate that packets are coming in too fast for the machine to handle. In practice, you see this happen when machines "hang" and fail to empty their queues. You might be running some sort of libpcap packet-sniffer on the system or have the adapter running in promiscuous mode (do an ifconfig to check) that is having some sort of pathological condition.

      Wow! This has *got* to be what the problem was. This problem started showing up right around the time of the big Code Red hubbub. So I installed snort just to watch and see what was going on. Snort, of course, uses libpcap and puts the card into promiscuous mode. Right afterwards is when we started seeing problems on the network.

      My guess is that your admins are just getting testy over the fact that your Linux box re-inserts itself more often than Windows boxen, causing a higher number of relatively harmless burst-errors. When they diagnose problems with the ring, they notice that your machine causes the highest number of errors, and therefore blame any ring failure on you.

      Holy schnikies! You must have been in the room! That is *exactly* what happened. They discovered these errors and basically said that the errors were the *only* thing that they could see that was wrong with the network. From this they concluded that the problem must have been caused by my running Linux.

      About the only thing that does not fit is that since I've stopped running Linux on the network at work, the problem has completely gone away. Not a single recurrence in several weeks' time (I actually submitted this article to /. many weeks ago. Why it took so long to get accepted, I dunno.) They did, as part of their process of troubleshooting, replace all of the TR equipment in the closet. But even after they did that, we were still having problems. So far the only thing that seems to have fixed this problem was me staying out of Linux.

      Thanks for your very informative post!

      --
      Key to financial independence: Spend less than you earn. Save and invest the difference. Do it for a long time.

  8. Tolkien Rings Suck? by tshak · · Score: 5, Funny

    One ring to rule them all, One ring to find them.
    One ring to bring them all and in the darkness bind them.

    --

    There is no longer anything that can be done with computers that is nontrivial and clearly legal. -- Paul Phillips