Slashdot Mirror


Microsoft Worms and Global Routing Instability

James Cowie writes: "Fresh analysis here indicates that worm propagation periods correlate very strongly with global BGP routing instability, as measured by sustained exponential increases in the number of prefix announcements and withdrawals seen in BGP message traces."

15 of 215 comments

  1. Good report, but what's the point? by disc-chord · · Score: 3, Interesting

    Very fascinating read, with lots of graphs that really strike the message home. But what is the point? Anyone with an internet connection will have no doubt experienced the instability.

    I've personally had a particularly poor router losing my packets for the last week, and have been tracerouting it from all over the country to triangulate the problem. Doing a tracert from Maine, California and Texas seems to provide a reasonable picture of what's going on by triangulating in on the offending router... so I'm a bit unclear on why this study was called for, unless it's just to point fingers at Microsoft...

    1. Re:Good report, but what's the point? by MemeRot · · Score: 2, Interesting

      Well I think the point to the researchers is just to find out what was causing what they saw. This is what researchers do :) This was not about one router, it was about global routing.

      To me, the point of research like this is to point fingers at Microsoft. Microsoft can claim not to have a problem with security all they want. But if it is shown that security vulnerabilities in their system are causing instability in global internet routing, that could provide a way to show liability. Because dammit no software company should be doing anything that could degrade global internet routing.

      Currently it's hard to argue in court that a reasonable programmer might not leave some of those vulnerabilities. But if those vulnerabilities were responsible for crippling the net? I think any court would hold that any reasonable programmer would make sure their program can't cripple the internet. Meaning the billions of dollars it costs everyone attached to the net when these viruses spread, not just MS users, could be recovered from MS and give them a real impetus to build security into their systems, which is currently missing. Many of you hold spammers to be responsible when they use your network resources without your permission. Microsoft is doing the same thing by leaving these holes. Why haven't the limited patches they do have been pushed through Critical Update? Why has Microsoft come out in the press to say that millions are unnecessarily downloading these patches, in an apparent attempt to dissuade people from downloading them? In the same week that Critical Update kept insisting I download patches for Win2k that are only relevant to servers, when I only use my box as a workstation?

  2. Re:Msft is definitely guilty by sien · · Score: 5, Interesting

    Ha. Someone mod this up as funny please!
    But seriously, if a company makes a product that costs large numbers of other companies money, they get fined. If a company's negligence causes a public resource to be degraded, they get sued. Has anyone heard anything about some of the major service providers or any of the major users launching a class action against MSFT? It seems that they would have at least a start for a case here.

  3. Here's a great idea! (word association) by Uttles · · Score: 5, Interesting

    OK, everyone knows that word association is a powerful marketing tool. Example: Microsoft Office. When you say "office suite of programs" to the average person, they automatically think Microsoft Office. Well this article sure gives us a great one:

    In this online note, we summarize our preliminary analysis of the surprisingly strong impact of the Internet propagation of Microsoft worms (such as Code Red and Nimda) on the stability of the global routing system.

    Look on AP, Yahoo, MSNBC, CNN, and you always see "the Nimda virus" or "the Code Red virus," but I prefer the way the article said it. So from now on in your conversations with others, refer to each virus in this category as a "Microsoft Virus" and hopefully by word of mouth word association we can sway public opinion away from this crappy MS software.

    --

    ~ now you know
  4. What will be done? by Anonymous Coward · · Score: 2, Interesting

    I have followed this problem extensively in my local area... When Code Red came out, mrtg and numerous sites around the city showed large spikes in bandwidth usage. I have discussed this with several large corporations (Nationwide, Bank-One.. and telecoms Time Warner and AT&T) and I have heard very little about how to approach what are application-layer exploits at layer 2 or 3...
    I understand that to serve people, telecom and internal IT departments can't very well restrict ports and such based on response to each and every exploit that causes problems...
    so what can telecoms and large corporations do to cut down on meaningless uses of bandwidth?

  5. Oh, wow. by jd · · Score: 3, Interesting
    You mean, if the Internet gets saturated by bizarre routing requests, it puts its feet in the air and dies?


    I'd never have guessed.


    Seriously, though, this does strongly suggest that merely using NAT and crude approximations of hierarchical routing is not enough. The networks aren't capable of tolerating the kinds of loads even a humble skript can put on them.


    In short, we need a better routing system, better IP stacks, a more streamlined structure, and better load-balancing. That is, we need IPv6, if we're to survive anything but these relatively feeble virus attacks.


    (And they are feeble! In comparison to what could be done. The world is very, very lucky.)


    Oh, and we also need a stronger backbone. T3s don't cut it, in a world where T4s are "standard items" and high-speed optics of up to 4 Tb/s are potentially usable tomorrow.


    When you start upping the bandwidth across the board by 2-3 orders of magnitude, the impact of a few flea-bag packets will not be noticeable. For that matter, a major world event (such as the Starr Report, or the WTC disaster) would not bring the information infrastructure to its knees.


    *Orator Mode On* Now, more than ever in the history of humanity, our society, our economy and our security depend on good lines of communication. No expense is too great, because the price of failure is greater still. This truth has tragically shown itself these past few weeks, and no amount of money can undo a single death, reverse a single bereavement, or heal a single injury.


    Forty billion dollars has been allocated to the cause of chasing shadows, yet we know that shadows can never be caught. A mere four billion, on shining the light of information around the world, would have gone a long way to prevent the shadows from being there to start with.


    Terror, fear - these are weapons that rely on ignorance and superstition. Without ignorance, terror has nothing to hold onto. Yet ours is a society that lives in ignorance. We have computers on our desks that are many hundreds of times more powerful than the ones used to put man on the moon. Yet those computers can be crippled by a simple forwarder virus, and the users of those computers do not wish to know. The dark is much more comforting than the light, even though it is the dark, not the light, that these viruses can grow in. Perhaps, because in the light, you do not need comforting. There is no fear to be comforted over.


    Someday, maybe, people will become less frightened of living in understanding. When that day comes, the terrors of the night will no longer threaten.


    *Orator mode off*

    --
    It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
  6. Doesn't that... by Hammer · · Score: 2, Interesting

    ...put him in a funny spot. He has publicly vowed to destroy those who harbour terrorists and also said that MS is good for America.
    So, does he go after the hand that fed him? Or will he leave MS alone and thereby in effect harbour someone who's harbouring terrorism. We all know what he promised to do to those ;-D

  7. Re:Caching and port-scanning by mdouglas · · Score: 5, Interesting

    first off, i'd just like to say, i love it when a hardcore networking article gets posted to slashdot, the number of responses is so much lower due to the userbase having no experience with the subject; and mindless pontificating and chest beating (as in anti microsoft/pro linux articles) doesn't cut it with this subject matter.

    as an aside, i don't mean the above preamble as a negative statement about the specific poster i'm responding to.

    "Consequently, since routes time out after a while
    ...This would logically increase the load on route discovery protocols such as BGP."

    well...not exactly. when 2 routers are set up in BGP partnership they exchange an initial set of routes which are statically set by the AS administrator, there's no dynamic discovery process. those routes are only changed under a few specific conditions : explicit changes announced by the BGP partner, or the loss of connectivity to the partner (too many missed hello packets). BGP route exchange is not based on some kind of dynamic route timeout/refresh algorithm as that would be horrifyingly inefficient.

    a few words on how routing and route caching work (this is assumed to be on a default-free internet backbone router) :

    a packet enters the router destined for some ip address, a lookup against the routing table is done, the appropriate outbound interface is selected (this step is known as path determination), the packet is then sent to the appropriate outbound interface, re-framed, and sent out to the next hop (this step is known as switching); route caching associates a destination ip address with a next hop interface, thus bypassing the redundant route table lookup. a definite gain in efficiency, cisco makes a number of advanced caching/switching engines that are used in their high end core routers.
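
    The two steps above (path determination, then a cache that skips the repeated table lookup) can be sketched roughly as follows. This is only an illustration: the `Router` class, the interface names and the prefixes are made up for the example, and real routers do this in specialised hardware or far more elaborate data structures.

```python
import ipaddress


class Router:
    """Toy default-free router: longest-prefix match plus a route cache."""

    def __init__(self, routes):
        # routes: iterable of (prefix string, outbound interface name).
        self.table = [(ipaddress.ip_network(p), iface) for p, iface in routes]
        # Longest prefix first, so the first match is the most specific one.
        self.table.sort(key=lambda entry: entry[0].prefixlen, reverse=True)
        self.cache = {}          # destination address -> outbound interface
        self.table_lookups = 0   # counts full routing-table lookups

    def path_determination(self, dst):
        """Full routing-table lookup (the expensive step)."""
        self.table_lookups += 1
        addr = ipaddress.ip_address(dst)
        for network, iface in self.table:
            if addr in network:
                return iface
        return None  # no default route: an unmatched packet is dropped

    def forward(self, dst):
        """Return the outbound interface, consulting the cache first."""
        if dst not in self.cache:
            self.cache[dst] = self.path_determination(dst)
        return self.cache[dst]


# Hypothetical routing table, for illustration only.
router = Router([
    ("10.0.0.0/8", "serial0"),
    ("10.1.0.0/16", "serial1"),
    ("192.0.2.0/24", "ether0"),
])
first = router.forward("10.1.2.3")   # full lookup, picks the /16 over the /8
second = router.forward("10.1.2.3")  # served from the cache, no table lookup
```

    After the two calls above, `router.table_lookups` is 1: the second packet to the same destination never touches the routing table.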

    to summarize/explain the BGP/worm paper : worms generate excessive traffic; the traffic overwhelms some routers and wan links; thus, BGP hello packets get lost or never sent depending upon traffic or router load; consequently the BGP routes are being announced/withdrawn at a high rate (this is known as route flapping). this is bad. having a route fail is not a problem, as long as it stays failed. rapidly changing states creates extra load on the router. route dampening policies help, but with a worm creating these conditions everywhere at once the cumulative effect is instability.
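
    For readers who have not met route dampening: the usual idea is a per-route penalty that grows with every flap and decays exponentially over time, with the route suppressed while the penalty is high. A minimal sketch, assuming illustrative numbers (penalty of 1000 per flap, suppress at 2000, reuse below 750, 15 minute half-life); the `DampenedRoute` class and these values are assumptions for the example, not taken from the paper or from any particular router's configuration.

```python
class DampenedRoute:
    """Toy route-flap dampening: penalty per flap, exponential decay."""

    def __init__(self, half_life=900.0, flap_penalty=1000.0,
                 suppress_limit=2000.0, reuse_limit=750.0):
        # All four parameters are illustrative assumptions (times in seconds).
        self.half_life = half_life
        self.flap_penalty = flap_penalty
        self.suppress_limit = suppress_limit
        self.reuse_limit = reuse_limit
        self.penalty = 0.0
        self.last_update = 0.0
        self.suppressed = False

    def _decay(self, now):
        # Halve the penalty for every half_life seconds that have passed.
        elapsed = now - self.last_update
        self.penalty *= 0.5 ** (elapsed / self.half_life)
        self.last_update = now
        # A suppressed route is re-advertised once the penalty has decayed.
        if self.suppressed and self.penalty < self.reuse_limit:
            self.suppressed = False

    def flap(self, now):
        """Record one announce/withdraw flap at time `now`."""
        self._decay(now)
        self.penalty += self.flap_penalty
        if self.penalty >= self.suppress_limit:
            self.suppressed = True

    def is_suppressed(self, now):
        self._decay(now)
        return self.suppressed


route = DampenedRoute()
for t in (0.0, 10.0, 20.0):   # three flaps within twenty seconds
    route.flap(t)
```

    With these numbers the third quick flap pushes the route over the suppress limit, and it stays suppressed for roughly half an hour of quiet. That protects one router from one noisy neighbour; it does not help much when a worm makes thousands of routes flap at once.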

    check these sites out to learn networking :
    http://www.cisco.com/univercd/cc/td/doc/cisintwk/ito_doc/index.htm
    http://www.merit.edu/mail.archives/nanog/

    anyone who writes a wise ass follow up to this had better include a CCIE number.

  8. No: Microsoft worms are NOT "web/email viruses"! by Jens · · Score: 3, Interesting
    ... but professionals and those writing formal papers need to steer clear of this sort of propaganda ...

    What's propaganda here? They are telling the truth. Those viruses only propagate on and damage Microsoft systems. They are there because Microsoft systems are so vulnerable. If it weren't for IIS, Windows 2000 etc. those worms wouldn't exist. (And don't "but others would" me - I don't see any reason why Unices, Apache, etc. would be less safe without Windows.)

    Tell the truth. Don't hide behind words. That's a journalist's job, isn't it? And anyway, now with Microsoft distributing reports that claim Apache is also vulnerable, citing relatively harmless directory listing bugs from 1999, why should we not try to educate the public?

  9. Re:Viruses, terrorism and Microsoft by thrig · · Score: 4, Interesting

    Windows has anti-virus software, for windows.

    Linux has anti-virus software, for windows.

    FreeBSD has anti-virus software, for windows.

    Solaris has anti-virus software, for windows.

    Open, exploitable ports are nothing compared to the design flaws inherent in the Office document format and the Outlook family, which cause wave after wave of new viruses to saunter past anti-virus software, laughing.

  10. A Simple Solution by Anonymous Coward · · Score: 5, Interesting

    One of the inherent problems with all routing protocols is that they rely on in-band announcements and updates, and communicate state purely by reachability. This is clearly a flawed approach on heavily loaded links and routers. This problem has already been addressed worldwide on the telephone network with the introduction of SS7. One of the key aspects of SS7 is that it is transported over an out-of-band network (the actual transport may be on a dedicated timeslot on a SONET link, but the basis is that the link is dedicated to management).

    By implementing a low-throughput (say 64K - 256K; this requires more analysis) management network, the ISPs could be certain that the state of the BGP peering sessions and the integrity of the UPDATE messages are always intact.

    One of the key aspects/benefits of BGP is that, unlike other routing protocols, it does not advertise routes in simple "here's my routing table" messages the way RIP and, to a lesser extent, OSPF and IS-IS do. BGP relies on TCP sessions between peers. On connection the entire known (or filtered via policies) shortest-path routing table is exchanged. After this the link stays idle, with the exception of TCP keepalives, until an UPDATE message is sent to communicate that a new route is added or an existing route is removed from that peer's routing table. Also, BGP does not assign any significance to the port that receives the information - merely the peer. This all makes BGP inherently scalable, stable and reliable - unless resources are not available (CPU, memory, buffers or links). TCP is the reliability mechanism here. The presence of the TCP session validates all the routes learned via that session. The absence of the TCP session invalidates all the routes and causes them to be withdrawn for that TCP session.
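
    The session-scoped behaviour described above can be sketched in a few lines. This is a toy model, not a BGP implementation: the `BgpSpeaker` class, its method names, and the peer and prefix values are all hypothetical, and there is no path selection or policy at all.

```python
class BgpSpeaker:
    """Toy model: every route is valid only while its peer session is up."""

    def __init__(self):
        self.routes_by_peer = {}  # peer name -> {prefix: attributes}

    def session_up(self, peer, initial_routes):
        # On connection the peer sends its full (policy-filtered) table.
        self.routes_by_peer[peer] = dict(initial_routes)

    def update(self, peer, announce=None, withdraw=()):
        # Incremental UPDATE: only the changes, never the whole table again.
        if peer not in self.routes_by_peer:
            raise RuntimeError("no established session with " + peer)
        table = self.routes_by_peer[peer]
        for prefix in withdraw:
            table.pop(prefix, None)
        table.update(announce or {})

    def session_down(self, peer):
        # Losing the TCP session invalidates everything learned over it.
        lost = self.routes_by_peer.pop(peer, {})
        return sorted(lost)

    def reachable(self, prefix):
        return any(prefix in table for table in self.routes_by_peer.values())


speaker = BgpSpeaker()
speaker.session_up("peer-a", {"192.0.2.0/24": "via peer-a",
                              "198.51.100.0/24": "via peer-a"})
speaker.update("peer-a", announce={"203.0.113.0/24": "via peer-a"})
withdrawn = speaker.session_down("peer-a")  # all three prefixes go at once
```

    The point of the sketch is the last line: one dropped session on a congested link withdraws every prefix learned through it, and every neighbour then has to process those withdrawals.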

    Maintenance of the TCP session stability is key to the stability of the routing table. With over 80,000 routes in any full BGP update, the processing needed to cope with multiple TCP sessions failing or starting is immense (and probably better served by a UNIX platform than by a router, to be honest).

    SS7 uses a mechanism whereby UNIX servers process the routing information and create the core routing table - note: table is the key - it is not the path the data or calls follow. Building a similar architecture within the Internet would allow routers to have one or two TCP sessions to BGP servers (a concept already grasped with route reflector servers) and dedicate their CPU to forwarding packets etc. The dedicated servers never need to see a packet to be forwarded - it's just not that important to BGP, so they have no need to be on the same physical cables/links as user packets. This architecture would take some rethinking but would not be outside the plans of most ISPs, and definitely not outside their skillsets.

    Clearly the next problem then becomes low speed customer connections. Again the Telco industry has addressed this problem with ISDN - with the B channels. For these lower speed connections, there is no need to change the existing model. Losing one customer here or there is nothing (UPDATEs on BGP are typically well over 100 a second at NAPs) and would be catered for simply.

    The NAPs could merely serve as routing table peering points, and not data transfer points - again another area of congestion.

    The Internet is proving to be reliable and a trustworthy international communications medium, the next step is to make it even more robust, and truly scalable. Using OOB management is the obvious next step to this goal.

    GMPLS is being touted as the next step for ISPs in terms of exchanging routing information in an OOB network. This is only one aspect of the work that is being done there.

    1. Re:A Simple Solution by darkonc · · Score: 3, Interesting
      The in-band nature of the Hello packets, loss of which causes the 'flapping' is not an accident or an error. It is a feature. If you lose the hello packets, then chances are that you're losing other packets as well. This means that this branch of the network is overloaded and you should try another path.

      Lost packets cause retries -- which cause even more traffic. If your problem is overload, you are far better to try another path than to lose packets and generate (overall) more packets through retries on the shorter path.. If all inbound paths to a network are overloaded, then the whole network is overloaded, anyways. You might as well just drop the packet, and give the overloaded routers that 30 second flap time to catch up to the backlog.

      If you took those packets out of band, then you'd need another method to measure packet loss... This would require more CPU and/or more packets (bandwidth) -- thus making the whole problem even worse.

      --
      Sometimes boldness is in fashion. Sometimes only the brave will be bold.
  11. Re:Funny stuff! by PolaRis75 · · Score: 3, Interesting

    The reason for this is more than obvious. There are a lot of small ISPs and companies that do BGP over links as slow as T1s and fractional T1s. This recent M$ worm caused a lot of connectivity issues for a lot of people with links even faster than that. A company with just a few unpatched IIS boxes could easily produce more than 1.54 Mb of traffic per second, which would cause massive latency and packet loss across their T1. This, in turn, would cause timeouts of TCP sessions like FTP downloads, web browsing, and yes, BGP sessions.

    This would then cause the session to start flapping, the upstream provider to dampen the session and routes being advertised, and their address space being removed from the global routing table.

    This doesn't mean that there was routing instability due to the worm, it just means that a lot of networks running unpatched IIS boxes became unreachable.

  12. SOMEONE WRITE AN ANTIBIOTIC WORM!!!! by mallsop · · Score: 1, Interesting

    I had a stupid idea... write a worm that enters through the backdoor set by the Code Red and Nimda worms, fixes all the Code Red and Nimda boxes and then, after a few months, removes itself from the box it's on (to stop looking for infected boxes). Unfortunately I don't think I could write something like that anytime soon. Call it "Early Bird" since the early bird gets the worm. he he.

    --

    Moving at the speed of government.
  13. Re:Yeah Well, Except... by anomaly · · Score: 5, Interesting

    It's easy to say this, but speaking as one who works for an enterprise, it's not easy to do.

    We've got tens of thousands of PCs running hundreds of applications - some internally developed, some externally developed.

    For MS security patches (or anything else) that we release into "production" we need to engineer the build to make sure it works with our OS build, then test against Tier 1 applications.

    Once that is complete, the development groups need to sign off saying that their application runs with that code.

    Specifically in terms of IE 5.5 SP2, Quicktime is no longer compatible. Sure, there's an update to Quicktime, but my point is this - how many other things stop working? Which of our internal apps are dependent on IE or subcomponents that no longer work with IE5.5 SP2?

    We don't know. Frankly, even if we thought that we knew, we couldn't be sure outside of testing.

    IE has seen 7 security patches in the last 8 months. Particularly in this economy, we can't afford the testing staff to nail each of these as they are released.

    Of course we're at risk. Now is the time to question our continued use of MS products. I'm doing that.

    Regards,
    Anomaly

    --
    But Herr Heisenberg, how does the electron know when I'm looking?