How a Router's Missed Range Check Nearly Crashed the Internet
Barlaam writes "A bug by router vendor A (omitting a range check from a critical field in the configuration interface) tickled a bug from router vendor B (dropping BGP sessions when processing some ASPATH attributes with length very close to 256), causing a ripple effect that caused widespread global routing instability last week. The flaw lay dormant until one of vendor A's systems was deployed in an autonomous system whose ASN, modulo 256, was greater than 250. At that point, the Internet was one typo away from disaster. Other router vendors, who were not affected by the bug, happily propagated the trigger message to every vulnerable system on the planet in about 30 seconds. Few people appreciate how fragile and unsecured the Internet's trust-based critical infrastructure really is — this is just the latest example." Vendor A, in this case, is a Latvian router vendor called MikroTik.
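For anyone trying to picture the vendor A half of the summary, here is a minimal Python sketch of how a missing range check plus "ASN modulo 256" could play out. It assumes the prepend count ends up stored in a single byte; the function names, the 16-prepend limit, and the example ASN (a private-range number) are all hypothetical and are not taken from MikroTik's actual code.

```python
MAX_PREPEND = 16  # hypothetical sane upper bound, chosen only for illustration


def prepend_count_unchecked(user_value: int) -> int:
    """Assumed buggy behavior: the value is silently truncated to one byte."""
    return user_value % 256


def prepend_count_checked(user_value: int) -> int:
    """The kind of range check the summary says was missing."""
    if not 0 <= user_value <= MAX_PREPEND:
        raise ValueError(f"prepend count {user_value} outside 0..{MAX_PREPEND}")
    return user_value


# An operator types an ASN (hypothetical, private range) where a count belongs.
typo = 65020
print(prepend_count_unchecked(typo))  # 252, i.e. 65020 modulo 256
```

With the check in place the same typo raises an error instead of producing a 252-entry prepend, which is the "one typo away" part of the story.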
I wonder why the summary went out of its way to use company A & B, then tagged a small Latvian vendor for their range-check bug, but didn't name the much larger vendor that also has a range-check bug, namely Cisco...
Sleep your way to a whiter smile...date a dentist!
"timothy" is actually kdawson's alter ego from which he posts the same crap
If you mod me down, I will become more powerful than you can imagine....
I don't know about it nearly crashing the Internet. How many people actually noticed a difference that day, for that matter?
A lot of admins, especially after the alert went out over the NANOG list, set their routers to reject long ASPATHs (or so I assume from what I saw on that list; I am not a BGP admin myself). Many routers simply rejected these ASPATHs as well; correct me if I'm wrong, but weren't old versions of IOS the only ones affected? It was a serious issue, but I'm not sure it came anywhere near a disaster scenario.
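To make "reject long ASPATHs" concrete, a rough Python sketch of the idea follows. The limit of 50 and the function name are assumptions made up for illustration; real routers do this in their own configuration language, not in Python.

```python
from typing import List

MAX_AS_PATH_LEN = 50  # hypothetical limit; operators pick their own value


def accept_update(as_path: List[int], max_len: int = MAX_AS_PATH_LEN) -> bool:
    """Return True if the update's AS path is short enough to accept."""
    return len(as_path) <= max_len


normal_path = [64512, 64513, 64514]      # private-range ASNs, illustration only
prepended_path = [64512] + [65020] * 252  # one AS prepended 252 times

print(accept_update(normal_path))     # True
print(accept_update(prepended_path))  # False
```

A filter like this drops the odd update without tearing down the session, which is the difference between a nuisance and an outage.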
They had a bug; they deserve to be called on that fact. Authors should be honest and direct, and always mention them by name.
The writer is probably trying to facilitate discussions, instead of playing the blame game.
Names trigger emotions in us (right brain). Identifiers trigger logic in us (left brain).
The writer is probably relying on us to suggest how to get top-level ISPs to implement filtering. It's a human and business issue ... not a technical issue.
I always thought they did.
Most already do. The problem was not the ASPATH itself, it was the length of it. The routers affected did not handle updates for a prefix which required more than one AS_SEQUENCE segment in order to carry the full AS path. The existence of the additional AS_SEQUENCE segment is what triggered the bug, causing the receiving router to treat the update as invalid and drop the BGP session.
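Why a second segment shows up at all: in the BGP AS_PATH encoding (RFC 4271), each segment carries a one-octet type and a one-octet count of ASNs, so a single segment holds at most 255 of them. A minimal Python sketch, assuming 2-octet ASNs and only AS_SEQUENCE segments; the helper names are made up for illustration and this is not any vendor's implementation.

```python
import struct
from typing import List

AS_SEQUENCE = 2             # path segment type code defined in RFC 4271
MAX_ASNS_PER_SEGMENT = 255  # the per-segment count field is a single octet


def encode_as_path(asns: List[int]) -> bytes:
    """Encode 2-octet ASNs as one or more AS_SEQUENCE segments."""
    out = bytearray()
    for start in range(0, len(asns), MAX_ASNS_PER_SEGMENT):
        chunk = asns[start:start + MAX_ASNS_PER_SEGMENT]
        out += struct.pack("!BB", AS_SEQUENCE, len(chunk))  # type, count
        out += struct.pack(f"!{len(chunk)}H", *chunk)       # the ASNs
    return bytes(out)


def count_segments(encoded: bytes) -> int:
    """Walk the encoded value and count how many segments it contains."""
    count, pos = 0, 0
    while pos < len(encoded):
        _seg_type, seg_len = struct.unpack_from("!BB", encoded, pos)
        pos += 2 + 2 * seg_len  # skip the 2-octet header and the ASNs
        count += 1
    return count


print(count_segments(encode_as_path([65020] * 255)))  # 1
print(count_segments(encode_as_path([65020] * 256)))  # 2
```

A path of 255 ASNs still fits in one segment; add one more hop and the sender has to open a second segment, which is the case the affected routers reportedly choked on.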
When I worked for *unnamed nw regional backbone here* we had peering agreements with everyone we connected to except uunet, and it was pretty well known that if we spat out a bad BGP route we could bring down the whole net by hitting enter ('cept uunet, although I'm pretty sure uunet woulda gone down from everyone else routing around them to us).
How is this new? That was the '90s, when we spent 100k+ on a Cisco 7513 with 64 megs of RAM so it could hold the BGP tables...
We even wrote our own manual ('cause none existed) on how to deal with BGP tables so junior admins working for us wouldn't fuq it up. (and on top of that, we wouldn't let them touch the routers either)
-meetme room in the westin in Seattle-
Should be obvious, hm? Because Vendor B is the one really to blame: as far as I can see, one router from Vendor A misbehaved, but thousands or more from Vendor B. Unfortunately, Vendor B is also the one with deep pockets for legal action, so you cannot possibly put the blame on them. Oops, hope I do not get sued.
Reminds me of a story that Keith Marzullo told our class in a graduate level reliability class. This was back in the days of using UUCP to send email, and the vendor that he worked for had just released a "failsafe" product they were very proud of -- essentially, it was a mail router that could detect if a path went down, and would try an alternate router instead. The company touted it as a bulletproof solution.
So they go to a conference, and set up some routers, unplug some of them, etc., and everything is going fine until they ask an audience member for his UUCP address. UUCP addresses are in the form of host1!host2!host3!username, with the routing for the username explicitly specified... the addresses could thus get quite long. In this case, the guy's email address was over the buffer limit the company's routers used.
Guess what happened?
The mail server tried sending an email to the next router in the chain. The router's buffer overflowed and it crashed. The reliable server then tried another router... and crashed it. It then went through the entire network and crashed every single one of the nodes, turning a bug that would have been a single point of failure into a total network collapse.
=)
Yeah, one of my favorite stories from UCSD.
With Cisco you can choose between:
- Known, often workaroundable Bugs in older Versions
or
- new unknown fancy Bugs w/o workarounds that can hit you like a truck in the groin any minute now.
As long as the first choice does not include show-stopper bugs like the BGP one, there is usually no reason to use the latest IOS image.
Actually, the stability of your network is often a good reason /not/ to use the latest, shiniest version with lots of new features and even more new bugs.
Consider that.
Untrue. Cisco TAC will give you the latest firmware for free, provided you tell them you need it due to security flaws discovered in your current version. You may need to point to their bulletin about the bug, but that should be trivial (http://www.cisco.com/en/US/products/products_security_advisories_listing.html).
Since Cisco almost exclusively patches current versions due to security bugs, all their IOS are belong to us for free.
...A Slashdot "Editor" notices these posts and mods them into oblivion.
But is that better or worse than having them modded down by sycophantic Slashdot readers?
My Slashdot login - a four-digit userid - is worthless now.
It's been stuck on Karma:-1, Terrible for a couple of years.
What did I do to deserve that terrible fate?
My sin was to post a message critical of dear Michael Sims and his editing methods and practices here on Slashdot.
He said safer, not better.
Game! - Where the stick is mightier than the sword!
But CmdrTaco stood up for him, arguing that he does "a pretty good job".
I see the old "should a boss side with his subordinates or customers" argument.
I only wish that *no one* would read and comment on the lame stories (I should be taking my own advice here!) so that maybe the Slashdot editor cabal would get the hint.
What's the reason for not filtering out kdawson and timothy in Preferences > Index > Authors? (I'm not saying you're a complainer, I'm just wondering if "not wanting to miss out on the news" is the reason.)
Of course, I agree that it's important to present a better Slashdot with higher quality news to the casual visitor.
A bug by device vendor A (twiddling a framis panel instead of sparting the glinbo interface) patted a bug from device vendor B (elevating ALP packets when deferring some GALAS modifiers with size beneath 176), yielding a domino effect that caused widespread universal switching instability last week. The flaw lay dormant until one of vendor A's systems was deployed in an autonomous system whose LKM, divisor 965, was less than 1250. At that point, the Internet was one typo away from disaster. Other router vendors, who were not affected by the bug, happily propagated the trigger message to every vulnerable system on the planet in about 30 seconds. Few people appreciate how fragile and unsecured the Internet's trust-based critical infrastructure really is -- this is just the latest example.
Reads just about the same to me. I can't make any sense of either description of the bug.
Trouble is, you can't just go and download Cisco updates... Even if you own their hardware, they make it difficult to download anything... You need a support contract and a valid account to download most stuff, and their website is absolutely horrendous to navigate. It's pretty stupid; just about every other vendor makes the updates freely downloadable.
Cisco is where they are because they monetize everything.
The higher the technology, the sharper that two-edged sword.
Cisco update policy? Isn't that called Juniper or Huawei?
Cisco used to be the best option (they weren't that great in product terms, but everyone else was worse, and Cisco had good service and support).
They're getting squeezed from both the top and bottom.