Slashdot Mirror


Blackout Shows Net's Fragility

It doesn't come easy wrote to mention a ZDNet article discussing a recent outage between Level 3 Communications and Cogent Communications. A business feud inadvertently highlighted the fragility of the Internet's skeleton. From the article: "In theory, this kind of blackout is precisely the kind of problem the Internet was designed to withstand. The complicated, interlocking nature of networks means that data traffic is supposed to be able to find an alternate route to its destination, even if a critical link is broken. In practice, obscure contract disputes between the big network companies can make all these redundancies moot. At issue is a type of network connection called 'peering.' Most of the biggest network companies, such as AT&T, Sprint and MCI, as well as companies including Cogent and Level 3, strike 'peering agreements' in which they agree to establish direct connections between their networks."

29 of 287 comments

  1. The small should pay for the big? by hkmwbz · · Score: 5, Interesting
    As I understand it, these were about the same size and had an agreement, or didn't bother to bill each other. Then suddenly one of them figured out that "hey, we are bigger, so they should pay us!"... And the smaller one cut off the connection because they didn't want to pay since they considered themselves to be as big as their rival.

    What I don't get is why one of them would suddenly want the other to pay up. What's changed now, and why does the smaller company have to pay the big one's bills?

    Am I missing something here?

    --
    Clever signature text goes here.
    1. Re:The small should pay for the big? by hkmwbz · · Score: 3, Funny

      Ah, thanks! That certainly made it clearer ;)

      --
      Clever signature text goes here.
    2. Re:The small should pay for the big? by Daniel+Boisvert · · Score: 5, Informative

      NANOG has been on fire with posts about this issue over the past few days. The following two from Leo Bicknell do a good job of explaining why this sort of thing would happen, why nobody in particular is The Bad Guy[tm], and why this issue has no relevance to the issue of internet resilience in the case of natural or manmade disaster:

      http://www.merit.edu/mail.archives/nanog/msg12302.html
      http://www.merit.edu/mail.archives/nanog/msg12350.html

    3. Re:The small should pay for the big? by Cally · · Score: 4, Informative

      Check the NANOG archive over the last few days for far, far more than you ever wanted to know about "The Art of Peering: The Peering Playbook"... or read the book yourself.

      --
      "None are more hopelessly enslaved than those who falsely believe they are free." -- Goethe
    4. Re:The small should pay for the big? by NanoGator · · Score: 3, Funny

      "NANOG has been on fire with posts about this issue over the past few days."

      What?! No! I haven't said a word about it!

      --
      "Derp de derp."
  2. No worries by WormholeFiend · · Score: 5, Funny

    The pr0n industry was designed to find alternative routes of delivery in case of Internet outages.

    1. Re:No worries by JPriest · · Score: 3, Funny

      And slashdot runs redundant stories on the same thing in case the first one is lost on the way.

      --
      Saying Java is nice because it works on all OS's is like saying that anal sex is nice because it works on all genders.
  3. Background info by NicolaiBSD · · Score: 3, Funny

    Hey, I've found some interesting background info on this novel story here.

  4. Efficiency can be the enemy of robustness by dpilot · · Score: 5, Interesting

    This statement popped up in some of my security readings. It's most "efficient" to have one path between two places, and it's most "efficient" to set up peering agreements to route packets. But these efficient measures can introduce single points of failure.
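    To make the single-point-of-failure idea concrete, here is a minimal Python sketch (the topology and node names are made up for illustration). It removes each link in turn and checks, with a breadth-first search, whether two endpoints can still reach each other. In the "efficient" single-path design every link is a single point of failure; adding a second backbone path leaves only the customer's own access link.

```python
from collections import deque

def reachable(links, src, dst, down=frozenset()):
    """Breadth-first search over undirected links, ignoring any link in `down`."""
    adj = {}
    for a, b in links:
        if (a, b) in down or (b, a) in down:
            continue  # this link has failed
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    seen, queue = {src}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for nxt in adj.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

def single_points_of_failure(links, src, dst):
    """Links whose loss, on its own, cuts src off from dst."""
    return [link for link in links if not reachable(links, src, dst, down={link})]

# Hypothetical topologies: one "efficient" chain, one with a second backbone.
efficient = [("home", "isp"), ("isp", "backbone1"), ("backbone1", "server")]
robust = efficient + [("isp", "backbone2"), ("backbone2", "server")]

print(single_points_of_failure(efficient, "home", "server"))  # every link in the chain
print(single_points_of_failure(robust, "home", "server"))     # [('home', 'isp')]
```

    The extra backbone costs money and mostly sits idle, which is exactly the efficiency-versus-robustness trade-off described above.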

    On a similar note, that's why there are 13 root DNS servers, and why most of us aren't supposed to use them. The DNS example, though, is one where efficiency and robustness agree. It's more efficient, at least in terms of net bandwidth, to use a DNS server closer to you than the root servers.
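    A toy illustration of why a nearby caching resolver saves traffic: after the first lookup, answers are served from cache until their TTL expires. This is a hypothetical sketch, not a real DNS library; the `upstream` callable stands in for the full walk from the root servers down, and the clock is faked so the example is deterministic.

```python
class CachingResolver:
    """Toy caching resolver (hypothetical class, not a real DNS library).

    `upstream` stands in for the expensive walk from the root servers down;
    answers are reused until their TTL expires.
    """

    def __init__(self, upstream, ttl, clock):
        self.upstream = upstream   # callable: name -> address
        self.ttl = ttl             # seconds an answer stays cached
        self.clock = clock         # callable returning the current time
        self.cache = {}            # name -> (address, expiry time)
        self.upstream_queries = 0

    def resolve(self, name):
        now = self.clock()
        cached = self.cache.get(name)
        if cached is not None and cached[1] > now:
            return cached[0]       # cache hit: no upstream traffic
        self.upstream_queries += 1
        address = self.upstream(name)
        self.cache[name] = (address, now + self.ttl)
        return address

fake_time = [0.0]
resolver = CachingResolver(lambda name: "192.0.2.1", ttl=300.0,
                           clock=lambda: fake_time[0])
for _ in range(1000):
    resolver.resolve("example.com")
print(resolver.upstream_queries)   # 1
fake_time[0] = 301.0               # after the TTL runs out...
resolver.resolve("example.com")
print(resolver.upstream_queries)   # 2
```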

    --
    The living have better things to do than to continue hating the dead.
  5. Call the helpdesk...wait, THEY don't even know! by digitaldc · · Score: 4, Interesting

    http://www.gamergod.com/article_display.cfm?article_id=329
    Good article on this situation here

    This situation has adversely affected various users of both companies' services. The inability of Level 3 to handle this situation in a manner that is fair and equitable to consumers has alienated many customers and will continue to do so until the current situation is remedied. At what point is it good customer service to discontinue services through no fault of the consumer base? Market history shows us that the single worst thing a company can do is to let influences beyond the consumers' control arbitrarily degrade services that consumers regard as the status quo, without any warning or notification. If left unresolved and unaddressed, the current situation could set a dangerous precedent for internet users across the country by allowing service providers to instantly discontinue services the moment they feel they are not being adequately compensated by outside companies.

    On a side note, I was listening to Howard Stern (oh no!) this morning and he said that his Time Warner internet connection at home didn't work. Howard then called a tech guy to come and fix the problem, who in turn had to call a help desk to figure out what happened. The help desk didn't even know what was wrong. It sounds like Level 3 just pulled the plug and didn't notify ANYONE. Or maybe it was Cogent; the point is that nobody outside of that dispute KNEW what was going on.
    This sounds like a good way to alienate your customers and/or ruin your business model. But that is just my opinion.

    --
    He who knows best knows how little he knows. - Thomas Jefferson
    1. Re:Call the helpdesk...wait, THEY don't even know! by peragrin · · Score: 4, Funny

      You want scary, I can show you scary. I emailed Roadrunner saying I would drop them if they couldn't do something.

      I got a semi-canned response, but it did have some technical details. It also stated that if you wish to discuss the technical nature of the problem, go to www.ask.slashdot.org, with a full link to the other article.

      Yep Roadrunner sent me to slashdot to get more information.

      --
      i thought once I was found, but it was only a dream.
  6. Peering by Neurotoxic666 · · Score: 5, Funny

    At issue is a type of network connection called 'peering.'

    In other news, the RIAA announced they've stopped an extremely large P2P network.

    --
    You are more than the sum of what you consume. Desire is not an occupation.
  7. Internet can route against natural calamities by anandsr · · Score: 4, Informative

    The Internet cannot route your traffic when your providers do not want you to communicate. Nothing can protect you in this case. If, on the other hand, there were a natural calamity and everyone were trying to get you access, then you would get it, as happened during Katrina. This is not a natural calamity.

    The best option is to ditch your provider if they are not a monopoly, and if they are, lobby your government to allow multiple providers.

  8. It's dupealicious! by mrpotato · · Score: 3, Funny

    But for easy karma, just go get a +5 comment in the other thread, and repost it here without attribution.

    Not that I would ever do such a thing...

    --

    cheers
  9. It always will be fragile by squoozer · · Score: 4, Insightful

    The Internet will IMVHO always be quite fragile. While the design lends itself to robustness, the reality is that there is only money for a few very big connections, and therefore a disaster that affects one of these connections is going to cause widespread outages.

    Take, for instance, the connections running between Europe and America. I bet most of them run in almost exactly the same place on the sea bed because it's the cheapest / shortest path to take. A fairly localized geological disaster (at least in geological terms) could cut all the cables at once, or at least enough to make a difference.

    If we wanted the network to be robust we would need to run cables up over the North Pole and round the equator, and probably stick in some satellite links as well. There just isn't money for that. People are willing to accept the risk that it might fail in extreme situations.

    FWIW I think the problem is worse on the global scale than the country scale. I imagine most developed countries probably have enough redundancy in their own country. It's the interconnects between countries that are probably the biggest problem.

    --
    I used to have a better sig but it broke.
    1. Re:It always will be fragile by brunes69 · · Score: 3, Informative
      Take, for instance, the connections running between Europe and America. I bet most of them run in almost exactly the same place on the sea bed because it's the cheapest / shortest path to take. A fairly localized geological disaster (at least in geological terms) could cut all the cables at once, or at least enough to make a difference.

      This isn't a good example, because in this case most traffic would automatically be re-routed through Asia and the trans-Pacific cables. And if those went down, it would go via South America and Oceania.

      It would get much slower, sure, but would not cause an outage.

      There is no *technical* reason this peering relationship breaking down should be causing an outage either. If they both also peered with some third party that could service them both, like MCI or something, then the traffic would still get through. The companies are just being bull-headed.
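      The automatic re-routing described above is essentially a shortest-path computation over whatever links are still up. A minimal sketch using Dijkstra's algorithm, with invented city names and latencies (not real cable data): when the trans-Atlantic link is cut, the path gets much longer but still exists. Note that this models only physical connectivity; it says nothing about whether the operators' routing policies will actually carry the traffic.

```python
import heapq

def shortest_path(links, src, dst, down=frozenset()):
    """Dijkstra's algorithm over undirected links weighted by latency (ms)."""
    adj = {}
    for (a, b), ms in links.items():
        if (a, b) in down or (b, a) in down:
            continue  # cable cut
        adj.setdefault(a, []).append((b, ms))
        adj.setdefault(b, []).append((a, ms))
    queue = [(0, src, [src])]
    settled = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == dst:
            return cost, path
        if node in settled:
            continue
        settled.add(node)
        for nxt, ms in adj.get(node, []):
            if nxt not in settled:
                heapq.heappush(queue, (cost + ms, nxt, path + [nxt]))
    return None  # no route at all: an actual outage

# Made-up latencies in milliseconds, for illustration only.
links = {
    ("London", "NewYork"): 70,       # trans-Atlantic
    ("London", "Singapore"): 170,
    ("Singapore", "Tokyo"): 70,
    ("Tokyo", "LosAngeles"): 100,    # trans-Pacific
    ("LosAngeles", "NewYork"): 65,
}
print(shortest_path(links, "London", "NewYork"))
# (70, ['London', 'NewYork'])
print(shortest_path(links, "London", "NewYork", down={("London", "NewYork")}))
# (405, ['London', 'Singapore', 'Tokyo', 'LosAngeles', 'NewYork'])
```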

  10. Re:Didn't notice at all. by lostlogic · · Score: 3, Informative

    You would only notice if you are on one of these two networks. I am personally on UUNet at home and MCI at work, and my server is on SprintLink (via Schlund, which I am not familiar with). As a result, all of my traffic is completely unaffected. Customers on a single-homed connection through Cogent, or through L3, cannot see single-homed customers on the other network. The rest of us don't know the difference. The dumb thing that this article points out is that both Cogent and L3 are refusing to route packets destined for each other through the rest of the internet (probably for fear of fucking up other peering agreements by dumping too much traffic on their other peers). I believe there was a comment in the previous thread about this issue saying that traffic in one direction could be routed, but that even return packets were being null-routed at some point, preventing any type of connection from being established.

    --
    --Brandon
  11. Not a redundancy issue... by boldtbanan · · Score: 4, Interesting

    As I understood the problem, redundancy wasn't an issue. Level 3 was actively filtering out requests to Cogent, however they came in. The redundancy was working, but Level 3 was playing NetNanny and blacklisting all Cogent IPs.

  12. Re:Ask Slashdot by AlexTheBeast · · Score: 3, Interesting

    The problem with web services is that they need for the internet to be completely secure and completely reliable. The internet of today is neither.

    Physicians trying to use the internet to take care of critically ill patients are already experiencing this. Radiologists sitting home reading films are seeing this as well.

    Is 100% uptime necessary? Hell, VoIP is making money like crazy over this unstable network of ours.

    My suggestion is to test with people that will understand the limitations of your service. Then get a little VC money to spread your servers out.

  13. The fragility of the net by elfguygmail.com · · Score: 5, Informative

    It's very true, and anyone can see how a few big companies basically make the net work in North America. Simply do traceroutes to various big web sites, and you'll notice the packets always go across the same networks. The biggest one seems to be alter.net (MCI), with others including Level3, above.net, AT&T and UUnet. Remove any one of these and the North American part of the Internet would be in chaos. This happens because most ISPs do the same thing: they pick a primary provider and get a backup one. The problem is that they all pick the same few primary companies, and their backup links are much smaller pipes.
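    If you want to tally this yourself, a rough sketch like the following counts which networks show up across several traceroutes. It assumes you already have the reverse-DNS hop names (the ones below are invented), and it guesses the operator from the last two DNS labels, which is crude: it mishandles names under suffixes like .co.uk and hops without reverse DNS.

```python
from collections import Counter

def network_of(hostname):
    """Crude guess at the operator: the last two DNS labels, e.g. 'alter.net'."""
    labels = hostname.rstrip(".").split(".")
    return ".".join(labels[-2:])

def tally(traces):
    """Count how many traceroutes pass through each network at least once."""
    counts = Counter()
    for hops in traces:
        counts.update({network_of(hop) for hop in hops})
    return counts

# Hypothetical reverse-DNS names from three traceroutes to different sites.
traces = [
    ["gw.example-isp.net", "so-0-0-0.xr1.nyc1.alter.net", "www1.example.com"],
    ["gw.example-isp.net", "ge-2-1.car1.nyc1.level3.net", "www2.example.org"],
    ["gw.example-isp.net", "so-1-0-0.xr2.chi1.alter.net", "www3.example.net"],
]
print(tally(traces).most_common(2))
# [('example-isp.net', 3), ('alter.net', 2)]
```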

  14. ah peering by bigpat · · Score: 3, Interesting

    The only time peering should involve an ongoing exchange of money for bandwidth should be when a network is primarily serving as an intermediary between other networks, such as long haul or backbone networks.

    But if most of the traffic from other networks is going to customers that are connected and already paying for your network's service then it makes no sense and is simply wrong for a network to start charging other network providers. It breaks the end to end communication model and is providing your customers with less than the service they are paying for. People pay for internet connectivity so they can transfer data between other users on the internet, not just the ones on your company's network.

    If money exchanging hands is at all appropriate in this case it might be for the actual installation of routing equipment which establishes the physical connection between networks.

  15. This was predictable by PhilipPeake · · Score: 4, Interesting
    The Internet was designed to be resilient to malfunctions and to automatically take appropriate action to ensure connectivity.

    Unfortunately, that is not the Internet that we have today. In the original Internet, every router knew about every network connected to the Internet. Most networks had connectivity to many other networks. Discovery protocols allowed alternative routes to be discovered if one failed.

    Today, we don't have a (mostly) fully connected net, we have ISPs who don't know anything about networks which they don't "own", only that certain IP prefixes need to be passed to ISP x, y or z.

    This makes the infrastructure much more fragile than it was originally intended to be. We ended up with this for a few reasons. First, the wimpy routers in use at the time had limited memory available to hold the network maps. The answer chosen was to no longer attempt to hold a full world view, but to divide the world into regions: certain IP prefixes would "belong" to each region, and a router would only need to know about the networks in its own region, plus how to route traffic to other regions, which would take care of routing internally. This led to "backbone" connections, high-capacity links needed because traffic between regions no longer "diffused" through the network but was channeled into specific connections. It also set the scene for the net to be commercialised: those regional centers were obvious "choke points" that an enterprising company could own, pretty much dictating the pricing to lower-level enterprises who would do the dirty work of dealing with end users.
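    The "only know your own region, plus how to reach the others" scheme is usually implemented with aggregated prefixes and longest-prefix matching. A minimal sketch with Python's standard ipaddress module; the forwarding table and next-hop labels are hypothetical and use documentation address ranges. The most specific matching prefix wins, and everything else falls through to the default route toward an upstream provider.

```python
import ipaddress

# Hypothetical forwarding table: a few aggregated prefixes plus a default route.
TABLE = [
    (ipaddress.ip_network("0.0.0.0/0"), "upstream provider (default)"),
    (ipaddress.ip_network("198.51.100.0/24"), "regional peer A"),
    (ipaddress.ip_network("203.0.113.0/24"), "local customer"),
    (ipaddress.ip_network("203.0.113.128/25"), "local customer, branch office"),
]

def next_hop(address):
    """Longest-prefix match: the most specific matching route wins."""
    addr = ipaddress.ip_address(address)
    matches = [(net, hop) for net, hop in TABLE if addr in net]  # default always matches
    net, hop = max(matches, key=lambda m: m[0].prefixlen)
    return hop

print(next_hop("203.0.113.200"))  # local customer, branch office
print(next_hop("203.0.113.5"))    # local customer
print(next_hop("192.0.2.10"))     # upstream provider (default)
```

    The table stays small because everything outside the router's own area collapses into a handful of aggregates and one default, which is also why all that traffic ends up funneled toward the same upstream.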

    Slowly but surely the Internet evolved into a system dependent upon a few companies with high-speed links between them (prime candidates, BTW, as locations for government control to be imposed). The self-healing nature of the original Internet was lost because all traffic HAS to pass via the top-level companies' infrastructure and over their interconnecting backbone links.

    The "self healing" Internet is long gone.

  16. Re:The small should pay for the big? (mod this up) by gskouby · · Score: 5, Informative

    About 4 months ago I got a call from a sales critter at Cogent saying "We will knock 50% off of the price you are paying for your L3 connectivity if you drop them and come be our customer." I was kind of surprised at the boldness of this proposition because they were specifically targeting current L3 customers. I was even more surprised to find out from others that this sales pitch from Cogent was company wide. Of course this pissed off L3 and that was the start of this pissing contest.

  17. Monitor it yourself by dereference · · Score: 4, Interesting
    I found this site while trying to research the problem. I wish I had known of it earlier; it provides a very nice (near) real-time snapshot of all the Tier 1 peering:

    http://www.internetpulse.net/

    I'm not affiliated with them in any way, and I'm sure there are other similar sites, but I thought it was worth mentioning.

  18. Re:A New Approach by BeBoxer · · Score: 4, Informative

    I know there's been talk of wireless mesh networks where everybody is both an end point and a router. This would work in populated areas but I'm not sure how well it would work for "long haul" connections which is what the issue is here.

    If by "work in populated areas" you mean "slow the network to a crawl" then yes, it would work. Mesh networking is cool stuff, but you aren't going to build a backbone out of it. Wireless is really fast compared to your DSL line or cable modem. But it isn't even in the same ballpark as what you can do on fiber. Backbone links are running at 10Gbps or even 40Gbps. Full duplex, so that is 20Gbps or 80Gbps of "marketing bandwidth". Compared to what, 22Mbps or 54Mbps half-duplex for your wireless? You aren't going to build a comparable backbone out of wireless links running at roughly 1/1000th of the speed. Physics pretty much guarantees that fiber links will always be faster than wireless.
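    A quick check of the "roughly 1/1000th" figure, using the nominal rates quoted above. Real-world wireless throughput is typically well below the nominal rate, which would widen the gap further.

```python
def speed_ratio(fiber_mbps, wireless_mbps):
    """How many times faster the fiber link is, per direction, at nominal rates."""
    return fiber_mbps / wireless_mbps

for fiber in (10_000, 40_000):   # 10 Gbit/s and 40 Gbit/s backbone links, in Mbit/s
    for wireless in (22, 54):    # nominal wireless rates quoted above, in Mbit/s
        print(f"{fiber // 1000} Gbit/s vs {wireless} Mbit/s: "
              f"{speed_ratio(fiber, wireless):.0f}x")
```

    Per direction that works out to roughly 185x to 1818x, so "about a thousand times" is the right order of magnitude; counting the fiber's full duplex against wireless half duplex doubles it.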

  19. Re:It's Nobody's Fault by Alioth · · Score: 3, Informative

    Cogent COULD route around the damage - if they wanted to, but they don't.

    If the peering point had been taken out by a bomb, the re-routing would have been performed in fairly short order. However, this is not the case here.

    Level3 think that Cogent is taking the piss and is not a real peer. Level3 want Cogent to buy transit to reach Level3, either directly from them (or from someone else) because at the moment the peering is very lopsided, and costing Level3 a bucketload of money and giving Cogent a boatload of free bandwidth.
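    "Lopsided" here usually refers to the traffic ratio between the two networks. A hypothetical sketch of the kind of ratio test some published peering policies use; the 2:1 cap is an illustrative assumption, not Level 3's actual term, and the function names are made up.

```python
def traffic_ratio(bytes_in, bytes_out):
    """Heavier direction divided by lighter direction (always >= 1.0)."""
    heavy, light = max(bytes_in, bytes_out), min(bytes_in, bytes_out)
    return float("inf") if light == 0 else heavy / light

def meets_peering_policy(bytes_in, bytes_out, max_ratio=2.0):
    """Hypothetical policy check; max_ratio=2.0 is an illustrative assumption."""
    return traffic_ratio(bytes_in, bytes_out) <= max_ratio

print(meets_peering_policy(bytes_in=150, bytes_out=100))  # True  (1.5:1)
print(meets_peering_policy(bytes_in=100, bytes_out=400))  # False (4:1)
```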

    Cogent on the other hand doesn't want to pay for transit to Level3.

    Right now, Cogent could route all their traffic for Level3 over transit they pay for. They don't want to do that because it won't force Level3 back into the peering agreement. So what they do is leave the link severed and do not re-route so that Level3 customers cannot get to sites hosted by Cogent. This means Level3 customers will grumble at Level3. Additionally, they offer a year's free transit to single homed Level3 customers just to raise the brinkmanship with Level3 a notch higher. Basically it's war between L3 and Cogent.

    If Cogent re-routes their traffic, they are defeated and L3 will never re-peer. What Cogent are hoping is that enough angry customers on the L3 end will whine at L3 so L3 will be forced to re-peer.

    For the rest of us in the peanut gallery (i.e. those of us who aren't single homed customers of Cogent or Level3) we can just watch the fun and games and throw peanut shells at the squabbling combatants because we don't see any black hole at all.

  20. Re:So the internet is breaking down by the_real_bto · · Score: 3, Insightful

    "Privatization strikes again. You put the infrastructure into the hands of a few powerful people and this is what you will get."

    Are you arguing that government control moves power from the few to the many? That is backwards to my way of thinking. The quickest way I can think of to concentrate power is to put the government in charge of it.

  21. Fixed now? by dereference · · Score: 3, Informative

    The availability grid for the past 4 hours shows ~40% and the grid for the past 1 hour shows 100%. As noted by "Cally" below, I honestly have no idea how exactly this grid has been generated (hence my original disclaimer) but this certainly seems to indicate, from a practical standpoint, that the L3/Cogent issue has been very recently resolved. Indeed, from my (single-homed) L3 server I can now traceroute directly to a (single-homed) Cogent host.

  22. That's not how peering works - here's the diff by billstewart · · Score: 3, Informative
    There are two basic ways that networks connect to each other - peering and transit. In a transit arrangement, one network (typically the big one) agrees to deliver any traffic the other network hands it, in return for a bunch of money, and it typically either advertises a default route (telling a small customer that they can send it all their packets) or a bunch of detailed routes and a default (telling a dual-homed medium-large customer how good its connections are to lots of places, but that customer might use another carrier for destinations that are closer with that carrier.) If you're an end customer, or a small ISP buying service from a big ISP, that's usually what you buy.

    Peering arrangements are different. Two networks that have a lot of traffic for each other will set up direct connections, split the direct costs of the connections, and not charge for accepting packets from the other carrier. But they'll only advertise the routes for their *own* customers. If two small ISPs peer with each other, typically they're each also buying transit service from big ISPs, but it's cheaper for them to dedicate a connection or put bits on a public peering point like MAE-West than to both pay their upstream ISPs.

    The biggest ISPs in the US are called "Tier 1" ISPs, and they all peer with each other rather than buying transit, though they might buy transit for international connections, if they can't get the other side to buy transit from them. It seems flaky, but it makes business sense, or at least it did for a while. In some sense, being big enough that all the other Tier 1s will peer with you is what defines Tier 1, and aside from technical issues, it's a marketing thing - "See, we're one of the big players!" Peering and Transit don't mix very well - you either connect to a given carrier by peering, or by transit, or else you spend a long time hammering out custom arrangements about exactly which routes you'll accept and tweaking routing tables.

    Cogent is a Wannabe-Tier-1. Their main business model is to put fiber into big multi-tenant office buildings and sell everybody 100-meg Ethernet for about the price other carriers charge for one or two T1s. If I were a customer, I wouldn't expect there to be enough upstream to really get that much bandwidth all the time, but I'd expect to get more than a T1 all the time, and a lot more than a T1 almost all the time. Level 3 has apparently decided they're not getting enough value out of the relationship (i.e. not sending Cogent enough packets to make it worth their while) to keep peering, and wants Cogent to either pay them for service or get transit from somebody else. They gave them about 50 days to make other arrangements, but Cogent decided to play chicken with them.
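    The rule that peers only advertise their *own* customers' routes can be modeled in a few lines. Below is a simplified sketch of that export policy (often described as the Gao-Rexford guidelines): a route learned from a customer goes to every neighbor, while a route learned from a peer or provider goes only to customers. The network names are hypothetical labels, and real BGP has many more knobs (local preference, prepending, communities). With the L3-Cogent peering link removed, a customer homed only on L3 no longer learns any route to a customer homed only on Cogent, even though both big networks still peer with a third one; a customer connected to both still gets through.

```python
from collections import deque

def routes_reach(origin, transit, peering):
    """Return the set of networks that learn a route to `origin`'s addresses.

    transit: (provider, customer) pairs; peering: (a, b) settlement-free pairs.
    Simplified export rule: routes learned from a customer (or originated
    locally) go to every neighbor; routes learned from a peer or a provider
    go only to customers.
    """
    role = {}  # role[(me, other)] = what `other` is to `me`
    for provider, customer in transit:
        role[(provider, customer)] = "customer"
        role[(customer, provider)] = "provider"
    for a, b in peering:
        role[(a, b)] = role[(b, a)] = "peer"
    neighbors = {}
    for me, other in role:
        neighbors.setdefault(me, []).append(other)

    # Search state: (network, may this network export the route to everyone?)
    start = (origin, True)
    seen, queue = {start}, deque([start])
    while queue:
        me, to_everyone = queue.popleft()
        for other in neighbors.get(me, []):
            if not to_everyone and role[(me, other)] != "customer":
                continue  # peer/provider routes are only passed down to customers
            # `other` may re-export widely only if it heard the route from its customer
            state = (other, role[(other, me)] == "customer")
            if state not in seen:
                seen.add(state)
                queue.append(state)
    return {network for network, _ in seen}

# Hypothetical topology: three big networks, two single-homed customers, one dual-homed.
transit = [("L3", "L3Cust"), ("Cogent", "CogentCust"),
           ("L3", "Dual"), ("Cogent", "Dual")]
peering_before = [("L3", "Cogent"), ("L3", "MCI"), ("Cogent", "MCI")]
peering_after = [("L3", "MCI"), ("Cogent", "MCI")]  # L3-Cogent link cut

print(sorted(routes_reach("CogentCust", transit, peering_before)))
# ['Cogent', 'CogentCust', 'Dual', 'L3', 'L3Cust', 'MCI']
print(sorted(routes_reach("CogentCust", transit, peering_after)))
# ['Cogent', 'CogentCust', 'Dual', 'MCI']
```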

    --

    Bill Stewart
    New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks