Slashdot Mirror


Cisco Routers to Blame for Japan Net Outage

An anonymous reader passed us a link to a Network World article filling in the details behind the massive internet outage Japanese web users experienced earlier this week. According to the site, faulty Cisco routers were to blame for the lapse, which left millions of customers without service from late evening Tuesday until early in the morning on Wednesday. "NTT East and NTT West, both group companies of Japanese telecom giant Nippon Telegraph and Telephone (NTT), are in the process of finalizing their decisions on a core router upgrade, according to the report. The routing table rewrite overflowed the routing tables and caused the routers' forwarding process to fail, the CIBC report states."

6 of 78 comments

  1. Underspec routers by ReidMaynard · · Score: 5, Interesting
    Phrases like

    The routing table rewrite overflowed the routing tables
    and

    router capacity was partly responsible for the failure
    lead me to think this was a problem that was probably reported numerous times to middle management and perpetually postponed.
    --
    -- www.globaltics.net

    Political discussion for a new world

  2. Re:CEF and the routers. by Anonymous Coward · · Score: 1, Interesting

    If that were the only breakage in CEF...
    I have a Cisco with a complex config with tunnels and there really is no way it will work reliably with CEF enabled.

  3. Re:Should have used Junipers by Anonymous Coward · · Score: 5, Interesting

    We're a Cisco shop but are seriously looking into Juniper due to some negative service-impacting experiences. Juniper, especially the M series, looks like it was designed very intelligently from the ground up, with superior hardware architecture with separation of routing engine/packet forwarding/control plane, much more powerful CLI/config error checking/timed roll-back, wire-rate granular filtering, one train of code to follow and so on. Unlike the Cisco 6500/7600 Sup720-3B/3BXL, with hardware limits of 256K and 512K IPv4 routes respectively, even Juniper's older M20 platform has been tested with upwards of 1 million routes. As for stability, Juniper is found in the core of most service providers, government, academia and research (Internet2 high speed network http://www.abilene.iu.edu./). I see Juniper as the Unix of routers and Cisco the Windows of routers. If you desire stability, security, performance and flexibility, go Juniper. Cisco still has a place, such as in enterprises that still run legacy IPX.

  4. Re:TCAM exhaustion by Anonymous Coward · · Score: 1, Interesting

    TCAM (ternary content-addressable memory) exhaustion sounds plausible. We were looking into the 6500/7600 to upgrade our 7200 platforms and were quoted the Sup720-3B supervisor/routing engine. Being doubtful of Cisco these days, I did some double checking on my own, since they seem to have put their focus on being a marketing gorilla instead of a technological leader. It turns out the Sup720-3B has limited TCAM that only supports 256K IPv4 routes and even fewer IPv6 routes. The current BGP routing table is just shy of that mark. One important thing to keep in mind is that other functions and features within the router will also use up TCAM, so do your research so that it doesn't bite you hard in the butt where your only fix is a hardware upgrade. I don't know if they were just uninformed or if they were trying to pawn off near-obsolete hardware on us and force us to upgrade in the near future. We're now looking into Juniper and, as I found out, even their old M20 platform was tested to upwards of 1 million routes. Juniper seems to be a much superior platform all-around.
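
    The headroom argument in the comment above can be sketched as a quick calculation. This is only an illustration: the function name, the reading of "256K" as 256 * 1024 entries, the table size of 240,000 routes and the 30,000 entries set aside for other features are all assumptions, not figures from the thread or from Cisco documentation.

```python
def fib_headroom(capacity_routes, table_routes, reserved_routes=0):
    """Return (free_entries, utilization) for a fixed-size hardware route table.

    capacity_routes: total entries the hardware can hold
    table_routes:    routes that must be installed (e.g. the full BGP table)
    reserved_routes: entries consumed by other features (hypothetical figure)
    """
    usable = capacity_routes - reserved_routes
    if usable <= 0:
        raise ValueError("no usable entries left after reservations")
    free = usable - table_routes          # negative means the table overflows
    utilization = table_routes / usable   # above 1.0 means the table overflows
    return free, utilization


# Assumption: "256K" and "512K" are read as multiples of 1024 entries.
SUP720_3B_IPV4 = 256 * 1024
SUP720_3BXL_IPV4 = 512 * 1024

# Illustrative table size only; the comment says just "shy of that mark".
ASSUMED_BGP_TABLE = 240_000

if __name__ == "__main__":
    for name, capacity in (("3B", SUP720_3B_IPV4), ("3BXL", SUP720_3BXL_IPV4)):
        free, util = fib_headroom(capacity, ASSUMED_BGP_TABLE, reserved_routes=30_000)
        status = "OVERFLOW" if free < 0 else "ok"
        print(f"Sup720-{name}: free={free} utilization={util:.1%} {status}")
```

    With these assumed numbers the smaller table fits the routes on paper but overflows once other features take their share, which is the point the comment makes about doing the research before buying.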

  5. Having worked at Cisco, I strongly disagree by Anonymous Coward · · Score: 3, Interesting

    You're doing an Apples and Oranges comparison. Cisco's IOS is far more dedicated to a specific set of tasks than the other notable OS's. So yes, one would expect far fewer bugs to be visible. That doesn't mean they aren't there; just that they haven't been discovered.

    Having worked at many of the companies which supply OS's, Cisco is, IMHO, the worst. They go for lots of cheap talent. The common theme is to hire lots of low paid talent rather than focusing on getting the best and the brightest. And it shows. Things which shouldn't happen, do. And the general level of code quality is below average.

    The general development infrastructure sucks badly as well. So much so, that they've actually developed bandaids to make it semi-palatable.

    This isn't to say that they don't have some good talent there. They do. But they are a minority, and are hindered by the general red-tape which keeps those folks from having a greater impact.

    Sun, on the other hand, had the best development environment, talent and infrastructure that I've ever seen, back in the 90's. I've heard that things have fallen off a bit since then, but I really can't say.

    Anyway, the bottom line here is that I wouldn't at all be surprised if Cisco screwed up on the basics. The cheap talent is biting them daily in ways the top management can't see, and it all adds up eventually. Things like this are to be expected, and I also expect it to get worse over time, not better.

  6. Re:TCAM exhaustion by anticypher · · Score: 3, Interesting

    you've done a major design mistake

    Not one of MY designs, but you are right about the mistake part. I know of a carrier with CRS-1s struggling with a poor design coupled with an out of control sales force that will not ever say "NO!" to a customer doing bad things to their MPLS service. That's the origin of the idea of a maximum of four instances of 512K routes in 4 separate TCAMs per chassis (or per line card, or per virtual machine, or something). Not really my job any more, so I learn this over beers next to the data centre and extend my sympathies to those stuck in the Cisco world.

    hopefully IPv6 might stifle that a bit

    Well, the IPv6 table is ~850 routes right now, growing by 10 to 20 new routes per month. Just like the early days of the internet as BGP rolled out. Now I can toss out the obligatory "You kids get off my LAN".
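
    As a rough worked example of the growth rate quoted above, a straight-line projection from ~850 routes at 10 to 20 new routes per month looks like this. Linear growth is an assumption made only for illustration (the function name is hypothetical too); real routing tables need not grow this way.

```python
def project_routes(current_routes, routes_per_month, months):
    """Linear projection of table size after the given number of months."""
    return current_routes + routes_per_month * months


if __name__ == "__main__":
    current = 850  # approximate IPv6 table size quoted in the comment
    for months in (12, 60, 120):
        low = project_routes(current, 10, months)
        high = project_routes(current, 20, months)
        print(f"after {months:3d} months: {low} to {high} routes")
```

    At that pace the IPv6 table would stay in the low thousands for years, far from the hundreds of thousands of IPv4 routes discussed elsewhere in the thread.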

    Problems are already starting to be seen by the RIRs, where speculative companies have started grabbing IPv4 allocations with no intention of using them, betting on a market for buying and selling prefixes and forcing the RIRs out of business. Exactly what happened to the DNS market when it became apparent that second level domains could be rented for yearly fees for a large profit.

    If companies start buying and selling prefixes in an unregulated free market frenzy, aggregation will become a fond memory and expect every router to need several Gigabytes to hold the 2 million+ routes on the old IPv4 internet. At RIPE meetings, there is a hope that this is a worst case scenario, but it seems to be a business plan for some less altruistic people at ICANN.

    the AC

    --
    Hemos is like...sci-fi fans;he thinks technology is cool, but he hasn't bothered to understand the science it's based on