How a Router's Missed Range Check Nearly Crashed the Internet
Barlaam writes "A bug by router vendor A (omitting a range check from a critical field in the configuration interface) tickled a bug from router vendor B (dropping BGP sessions when processing some ASPATH attributes with length very close to 256), causing a ripple effect that caused widespread global routing instability last week. The flaw lay dormant until one of vendor A's systems was deployed in an autonomous system whose ASN, modulo 256, was greater than 250. At that point, the Internet was one typo away from disaster. Other router vendors, who were not affected by the bug, happily propagated the trigger message to every vulnerable system on the planet in about 30 seconds. Few people appreciate how fragile and unsecured the Internet's trust-based critical infrastructure really is — this is just the latest example." Vendor A, in this case, is a Latvian router vendor called MikroTik.
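For anyone wondering how a missing range check plus "ASN modulo 256" turns into a huge value, here is a minimal sketch. It assumes the field was stored in a single byte, which the summary implies but does not state. The function names, the upper bound of 16, and the example number 65532 are all made up for illustration and are not taken from any vendor's code.

```python
# Hypothetical sketch: a one-byte configuration field with and without a range check.

MAX_COUNT = 16  # assumed sane upper bound for the field, not a figure from the story


def store_unchecked(value: int) -> int:
    """Mimic writing any integer into an 8-bit field: only the low byte survives."""
    return value & 0xFF  # equivalent to value % 256 for non-negative input


def store_checked(value: int, upper: int = MAX_COUNT) -> int:
    """Reject out-of-range input instead of silently truncating it."""
    if not 1 <= value <= upper:
        raise ValueError(f"value {value} is outside 1..{upper}")
    return value


if __name__ == "__main__":
    typo = 65532  # an ASN typed into the wrong field (arbitrary example number)
    print(store_unchecked(typo))  # low byte only: 252
    try:
        store_checked(typo)
    except ValueError as err:
        print("rejected:", err)
```

With the check in place the typo is refused at the configuration prompt. Without it, the stored value lands just under 256, which is the range the summary says triggered the second bug.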
It seems like we live in a world now where the media go ridiculously out of their way to soften the blow and protect the parties who screwed up and shipped buggy software, by playing PR on their behalf and hiding their names.
They had a bug, and they deserve to be called on it. Authors should be honest and direct, and always mention them by name. ESPECIALLY in this case, so people who bought their product KNOW they need to update, even if they didn't notice that they were impacted by the bug (not everyone impacted necessarily knows what caused their problems, and a lot of people may still be wide open to the bug without knowing it).
Seriously, if you develop an implementation of an exterior routing protocol that untrusted devices participate in BY DESIGN...
How do you justify NOT taking basic steps to validate what happens in your implementation if another party decides to play dirty and hits you with a ridiculously long or corrupt entry in a field (like AS path)?
How does your QA team miss the potential consequences of how such a case can impact your re-advertisements of that long path? And miss testing that the result you send is still valid, or that you at least block it properly.
It doesn't mean they're totally inept; I'm sure their QA team does a lot of good work. But something fundamental seems to be missing if this sort of elementary bug slips through the cracks.
It may be hard on them PR-wise, but the public deserves to know the facts, without the names being changed to protect the guilty.
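The kind of check being asked for above could look roughly like this: validate the AS path on receipt, then validate again after adding your own ASN, since the path you re-advertise is one entry longer than the one you accepted. This is only a sketch. The limit of 255 is an assumption based on the "very close to 256" figure in the summary, and `accept_path` and `readvertise` are hypothetical names, not any vendor's API.

```python
# Hypothetical sketch: check AS path length on receipt and again before re-advertising.

MAX_AS_PATH_LEN = 255  # assumed limit for this sketch


def accept_path(as_path, limit=MAX_AS_PATH_LEN):
    """Return True if the received path is short enough and every ASN looks sane."""
    if len(as_path) > limit:
        return False
    return all(isinstance(asn, int) and 0 < asn < 2**32 for asn in as_path)


def readvertise(as_path, local_asn, limit=MAX_AS_PATH_LEN):
    """Prepend our own ASN; return None if the input or the result is invalid."""
    if not accept_path(as_path, limit):
        return None
    new_path = [local_asn] + list(as_path)
    # The outgoing path is one hop longer, so it needs its own check.
    if len(new_path) > limit:
        return None
    return new_path


if __name__ == "__main__":
    print(len(readvertise([65001] * 254, 65002)))  # fits: 255 entries
    print(readvertise([65001] * 255, 65002))       # would be 256 entries: None
```

The point of the second check is that a path can be perfectly acceptable when it arrives and still be too long once you have added yourself to it.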
except in the kdawson style it was a single link to a message board posting about a router "taking out half the internet." Dupe? Correction? I don't care as long as kdawson is kept away from the site for a while.
It just amazes me how differently this story is presented compared with the previous one.
Previous story: kdawson. Current story: Timothy. Do you need any more explanation than that?
Maybe if they updated their IOS back in 2003 when Cisco came out with the fix they wouldn't have these problems. You wouldn't give an XP user a pass on not updating for 6 years and having a problem, don't give these upstreams any.
-zifr