Slashdot Mirror


How a Router's Missed Range Check Nearly Crashed the Internet

Barlaam writes "A bug by router vendor A (omitting a range check from a critical field in the configuration interface) tickled a bug from router vendor B (dropping BGP sessions when processing some ASPATH attributes with length very close to 256), causing a ripple effect that caused widespread global routing instability last week. The flaw lay dormant until one of vendor A's systems was deployed in an autonomous system whose ASN, modulo 256, was greater than 250. At that point, the Internet was one typo away from disaster. Other router vendors, who were not affected by the bug, happily propagated the trigger message to every vulnerable system on the planet in about 30 seconds. Few people appreciate how fragile and unsecured the Internet's trust-based critical infrastructure really is — this is just the latest example." Vendor A, in this case, is a Latvian router vendor called MikroTik.
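
The summary's "ASN, modulo 256" detail can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not any vendor's actual code: it assumes the prepend count lives in an 8-bit field, that the typo was an ASN entered where the count belongs, and that a sane upper bound (`MAX_PREPEND`, a made-up value) would have caught it. All function names are hypothetical.

```python
# Hypothetical sketch: names and limits are illustrative, not vendor code.
MAX_PREPEND = 16  # assumed sane upper bound for AS-path prepending


def stored_prepend_count(user_value: int) -> int:
    """Mimic an unchecked 8-bit config field: only the low byte survives."""
    return user_value & 0xFF  # equals user_value % 256 for non-negative ints


def checked_prepend_count(user_value: int) -> int:
    """The kind of range check the summary says was omitted."""
    if not 1 <= user_value <= MAX_PREPEND:
        raise ValueError(f"prepend count {user_value} outside 1..{MAX_PREPEND}")
    return user_value


def build_as_path(asn: int, prepend_count: int) -> list:
    """Return an AS path with the local ASN repeated prepend_count times."""
    return [asn] * prepend_count


if __name__ == "__main__":
    typo = 47868  # illustrative ASN typed into the count field by mistake
    path = build_as_path(typo, stored_prepend_count(typo))
    print(len(path))  # path length when the field silently truncates
    try:
        checked_prepend_count(typo)
    except ValueError as err:
        print("rejected:", err)
```

With the illustrative value 47868, the low byte is 252 (47868 = 186 × 256 + 252), so the unchecked path ends up a few entries short of 256, which is the range the summary says upset vendor B. The checked version rejects the same input outright.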

17 of 196 comments

  1. Vendor B by CSFFlame · · Score: 5, Informative

    Vendor B is Cisco btw. Dunno why they were being vague.

    1. Re:Vendor B by mysidia · · Score: 5, Insightful

      It seems like we live in a world now where media go ridiculously out of their way to soften the blow and protect the parties who screwed up and shipped software that had mistakes in it, by playing PR on their behalf and hiding their name.

      They had a bug; they deserve to be called on that fact, authors should be honest and direct, and always mention them by name. ESPECIALLY in this case, so people who bought their product KNOWM they need to update, even if they didn't notice the fact that they were impacted by the bug (not everyone impacted necessarily knows what caused their problems, a lot of people may still be wide open to the bug but not know about it).

      Seriously, if you develop an implementation of an exterior routing protocol that untrusted devices participate in BY DESIGN...

      How do you justify NOT taking basic steps to validate what happens in your implementation if another party decides to play dirty, and hit you with a ridiculously long or corrupt entry in a field (like AS path) ?

      How does your QA team miss the potential consequences of how such a case can impact your re-advertisements of that long path? And miss testing that the result you send is still valid, or that you at least block it properly.

      It doesn't mean they're totally inept; I'm sure their QA team does a lot of good work. But something fundamental seems to be missing if these sorts of elementary bugs slip through the cracks.

      It may be hard on them PR wise, but the public deserves to know the facts, without the names being changed to protect the guilty.

    2. Re:Vendor B by afidel · · Score: 5, Informative

      The Cisco bug had been fixed for about forever, so anyone running an affected version probably had a million other known bugs as well; most just didn't bring their primary function to a screeching halt. Some of the time admins choose to run with the devil they know rather than finding all the new bugs waiting in new code; this time it bit a bunch of them hard, and hence bit their customers. They will now upgrade to newer software or implement a workaround for this bug; if they upgrade, their customers will probably have some additional downtime while the new bugs are found and worked around. Unfortunately this is how IT works: it's a complex web of systems built, programmed, and administered by fallible humans.

      --
      There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
    3. Re:Vendor B by Shag · · Score: 5, Funny

      so people who bought their product KNOWM

      WTF does that mean?

      It means some people don't know how to spell GNOME.

      --
      Village idiot in some extremely smart villages.
  2. No more routers...think of the children by Mrs. Grundy · · Score: 5, Funny

    I'm sure nobody here would argue with me if I suggested that the internet would be a much safer place without routers.

  3. Re:Gee, known Cisco bug causes problems by Shakrai · · Score: 5, Funny

    From long experience most people agree... if it isn't broken, don't fix it.

    Reminds me of an old "offensive" fortune quote: Working computer hardware is a lot like an erect penis. It stays up as long as you don't fuck with it.

    If you have no clue what offensive fortunes are try 'fortune -o'. They are great when you are stoned, drunk or just bored at work. If you don't have fortune installed then you are clearly on the wrong website ;)

    --
    I want peace on earth and goodwill toward man.
    We are the United States Government! We don't do that sort of thing.
  4. didnt kdawson post this last week by gad_zuki! · · Score: 5, Insightful

    except in the kdawson style it was a single link to a message board posting about a router "taking out half the internet." Dupe? Correction? I don't care as long as kdawson is kept away from the site for a while.

    1. Re:didnt kdawson post this last week by Bryan Ischo · · Score: 5, Insightful

      That explains a lot.

      I complained to CmdrTaco a year ago or so about kdawson's terrible editing and article judgement. The site would be SOOO much better without him. But CmdrTaco stood up for him, arguing that he does "a pretty good job".

      I lost a lot of faith in Slashdot that day. I only continue to read out of habit. But I skip more articles now and I get a chuckle when I see lame stories posted by lame editors with sub-100 comments. I only wish that *no one* would read and comment on the lame stories (I should be taking my own advice here!) so that maybe the Slashdot editor cabal would get the hint.

  5. Re:Same story, different spin??? by Anthony_Cargile · · Score: 5, Insightful

    It just amazes me how differently presented this story is compared with the previous.

    Previous story: kdawson. Current story: Timothy. Do you need any more explanation than that?

  6. Fragile Internet by tick-tock-atona · · Score: 5, Funny

    Few people appreciate how fragile and unsecured the Internet's trust-based critical infrastructure really is - this is just the latest example.

    Yeah. Like how everyone is trusted not to google "google".

  7. Re:Gee, known Cisco bug causes problems by Kaboom13 · · Score: 5, Informative

    You have to have a support agreement with Cisco to get the latest IOS. They won't even give you the last version when your support contract ran out. Also, older routers do not always have upgrades available for various reasons: either they do not have enough space, or they have hardware limitations, or Cisco End-of-Lifed them and hasn't bothered.

    There's also the "if it isn't broke don't fix it" mentality in the networking world. A new version may fix some bugs but it might add some bugs as well. An upgrade, even if minor, generally means a lot of work testing and reconfiguring before you roll it out. Network engineers are expensive and that time isn't free. Sometimes the devil you know is better than the devil you don't.

    In an ideal world it wouldn't be an issue, but when it comes to networking it's NEVER an ideal world. There's always too much to do and never enough budget/manpower to do it. Every network admin probably has 10 things on his mental wishlist right now, upgrades he would like to make, redundant hardware he would like to purchase, failover contingencies he needs to test, etc. Upgrading IOS on an old router in a rack somewhere (and hoping it doesn't blow up in your face) can be pretty far down the list.

  8. Cisco to Blame, not Mikrotik by DeadboltX · · Score: 5, Informative

    The critical bug is with the Cisco routers; a Mikrotik router merely triggered the bug.
    It would be possible to trigger this bug with any routing software that does not do range checking on the number of times the ASN is prepended.

    The summary is spreading FUD by making Mikrotik, the only named vendor in the summary, look like the vendor at fault.

  9. I love this article's summary. by Korey Kaczor · · Score: 5, Funny

    The next time someone needs you to fix a computer problem and asks what went wrong, simply give them this article's summary as the reason why, replacing "router" and "Internet" with the defective part in question. You're guaranteed to look a bit sharper, too.

    "A bug by power supply vendor A (omitting a range check from a critical field in the configuration interface) tickled a bug from power supply vendor B (dropping BGP sessions when processing some ASPATH attributes with length very close to 256), causing a ripple effect that caused widespread global routing instability last week. The flaw lay dormant until one of vendor A's systems was deployed in an autonomous system whose ASN, modulo 256, was greater than 250. At that point, the power supply was one typo away from disaster. Other power supply vendors, who were not affected by the bug, happily propagated the trigger message to every vulnerable system on the planet in about 30 seconds. Few people appreciate how fragile and unsecured the power supply's trust-based critical infrastructure really is - this is just the latest example."

  10. GPL violators by Anonymous Coward · · Score: 5, Informative

    Mikrotik are known GPL violators who use a modified Linux (they re-branded that as "RouterOS") and a terribly bad implementation of the BGP protocol.

    In some custom community network, where MikroTik has been deployed internally, that stolen-Linux is being hacked to use Quagga instead of MikroTik's BGP.

    In short: that "RouterOS" has been highly unsuitable for the Internet. I can't believe somebody was so stupid as to trust it.

  11. Reminds me of a story by ShakaUVM · · Score: 5, Interesting

    Reminds me of a story that Keith Marzullo told our class in a graduate level reliability class. This was back in the days of using UUCP to send email, and the vendor that he worked for had just released a "failsafe" product they were very proud of -- essentially, it was a mail router that could detect if a path went down, and would try an alternate router instead. The company touted it as a bulletproof solution.

    So they go to a conference, and set up some routers, unplug some of them, etc., and everything is going fine until they ask an audience member for his UUCP address. UUCP addresses are in the form of host1!host2!host3!username, with the routing for the username explicitly specified... the addresses could thus get quite long. In this case, the guy's email address was over the buffer limit the company's routers used.

    Guess what happened?

    The mail server tried sending an email to the next router in the chain. The router's buffer overflowed and it crashed. The reliable server then tried another router... and crashed it. It then went through the entire network, and crashed every single one of the nodes, turning a bug that would have been a single point of failure into a total network collapse.

    =)

    Yeah, one of my favorite stories from UCSD.

  12. Should have updated IOS in 2003 when fixed. by Anonymous Coward · · Score: 5, Insightful

    Maybe if they updated their IOS back in 2003 when Cisco came out with the fix they wouldn't have these problems. You wouldn't give an XP user a pass on not updating for 6 years and having a problem, don't give these upstreams any.

    -zifr

  13. Re:Gee, known Cisco bug causes problems by geirnord · · Score: 5, Interesting

    Untrue. Cisco TAC will give you the latest firmware for free, provided you tell them you need it due to security flaws discovered in your current version. You may need to point to their bulletin about the bug, but that should be trivial (http://www.cisco.com/en/US/products/products_security_advisories_listing.html)

    Since Cisco almost exclusively patches current versions due to security bugs, all their IOS are belong to us for free.