Slashdot Mirror


Electricity Outage Puts Routing to a Tough Test

infofarmer writes "Today at about 11:30 MSD (GMT+4) a major electricity outage in Moscow, Russia brought new meaning to words like "uninterruptible", "redundant" and "uptime" for network administrators, who haven't experienced such harsh and unexpected power failures since the USSR got its Internet connection. Half of the city is totally without electricity, including the subway and the most important traffic exchange point. Half of the top Russian sites went down, including www.mail.ru, www.rambler.ru, www.lenta.ru; some of them haven't been brought up yet. IP packets going from ADSL users in Moscow to some local sites got rerouted to somewhere in London and then back to Scandinavia, where they met their "No route to host" dead end. Other routers found themselves in a loopback, which caused many packets to be dropped with TTL expired. The point is that most popular servers have two or three mainline Internet connections, but a lack of BGP/RIP2/whatever configuration resulted in packets losing their way to hosts."

18 of 233 comments

  1. Probably unrelated by SIGALRM · · Score: 5, Funny
    half of the top russian sites went down, including www.mail.ru, www.rambler.ru, www.lenta.ru, some of them haven't been brought up yet.
    And in other news, spam volumes suddenly and unexpectedly plummeted.
    --
    Sigs cause cancer.
  2. In soviet russia... by Winckle · · Score: 5, Funny

    Oh nevermind...

    1. Re:In soviet russia... by ravenspear · · Score: 5, Funny

      Power fails you.

  3. no more allofmp3.com by acomj · · Score: 4, Funny

    No more allofmp3.com.

    Obviously the MPAA/RIAA are to blame..

  4. Re:LOL by MyLongNickName · · Score: 4, Funny

    Iron? Aluminum? Which one do you prefer?

    --
    See my journal for slashdot ID's by year. Mine created in 2005. http://slashdot.org/journal/289875/slashdot-ids-by-year
  5. Yes, but... by rewt66 · · Score: 5, Funny

    Did kremvax stay up?

  6. In all seriousness... by tgd · · Score: 4, Informative

    For the last three or four weeks my gmail account has been POUNDED by 100-200 cyrillic spam messages every day. The filters catch them, but I have to clean out my spam folder pretty often.

    I've gotten none in the last couple hours.

  7. Odd.... by MarkGriz · · Score: 4, Funny

    I can't seem to log into my bank account to update my out-of-date account information.
    Wonder if these are somehow related.

    --
    Beauty is in the eye of the beerholder.
  8. in times like these, the 'net is a godsend by ChipMonk · · Score: 4, Informative

    I think you need to check your priorities. How do you think geeks all over the world just found out about the power failure?

  9. Internet... works! by Cyberax · · Score: 4, Informative

    I live in Russia, about 1000 km from Moscow. We were hit by a network outage; nothing worked (even Slashdot :( ) for about 30 minutes. The number of routes announced by both of our peers was about 700 instead of the normal 150,000.

    But then routes began to appear again! I was amazed: the Internet routed itself around the damaged segments, and packets were routed through Japan (!), Finland and Holland instead of Moscow. The funniest part was when I traced the route to a computer in the next building: it went through Saint-Petersburg :)

    I was able to access Slashdot and most Russian sites (http://newsru.com/ , http://ntv.ru/ , http://nbc.ru/) that were not directly affected by the outage.
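
    To illustrate the "routed itself around damaged segments" part, here is a rough sketch. It assumes a tiny made-up topology (the node names and links are hypothetical, not the real 2005 peering graph) and uses a plain fewest-hops search. Real BGP picks paths by policy and AS-path attributes rather than by breadth-first search, so treat this only as a picture of why a second peer gives you somewhere else to go.

```python
from collections import deque

# Hypothetical topology: names and links are illustrative assumptions only.
LINKS = {
    "Home": ["RegionalISP"],
    "RegionalISP": ["Home", "Moscow", "Saint-Petersburg"],
    "Moscow": ["RegionalISP", "NextBuilding", "Saint-Petersburg"],
    "Saint-Petersburg": ["RegionalISP", "Moscow", "Finland", "NextBuilding"],
    "Finland": ["Saint-Petersburg"],
    "NextBuilding": ["Moscow", "Saint-Petersburg"],
}


def shortest_path(links, src, dst, down=frozenset()):
    """Return a fewest-hop path from src to dst, skipping nodes in `down`.

    Returns None when no path survives the outage.
    """
    if src in down or dst in down:
        return None
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == dst:
            return path
        for neighbour in links.get(node, []):
            if neighbour not in seen and neighbour not in down:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None


# Normal day versus a day with the Moscow node powered off.
print(shortest_path(LINKS, "Home", "NextBuilding"))
print(shortest_path(LINKS, "Home", "NextBuilding", down={"Moscow"}))
```

    With "Moscow" marked as down, the search falls back to the Saint-Petersburg link, which is the same kind of detour as the traceroute to the next building. A site with only one upstream (like "Finland" in this toy graph) has no such fallback.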

    1. Re:Internet... works! by lheal · · Score: 4, Interesting
      both of our peers

      That's why.

      TCP/IP and the Internet anticipate cooperation among sites. You and your neighbors should all happily route each other's packets.

      The trouble is that in many places it doesn't work that way. There are rural "leaf" nodes, of course, but there are many more sites which have only one connection because of what I consider to be petty business decisions.

      Two competing ISPs in the same area should share a direct link to each other. If they have different upstream providers, then when one provider goes down the other picks up the slack. In any case local traffic should stay local.

      The fear, of course, is that one ISP will choose a bad provider and take advantage of the other. That has an easy fix: if the other one starts to abuse you, pull the plug.

      Single points of failure are not supposed to exist.

      --
      Raise your children as if you were teaching them to raise your grandchildren, because you are.
  10. Re:No spam for 4 hours! by joeytmann · · Score: 5, Funny

    What's your email address?

    --
    Insert funny smart-ass comment here.
  11. Re:The submitter has to have his priorities checke by keraneuology · · Score: 4, Insightful
    Considering that sewage, power and medical processes could all rely on the internet...

    There's more traffic on the 'net than pr0n, wazrez, mpEs and /.

    Some of it actually matters.

    --
    If the g'vt kept the data on you that google does you'd better believe you'd be calling it "doing evil"
  12. Re:The submitter has to have his priorities checke by josecanuc · · Score: 4, Interesting
    I already read the news that sewer water is being dumped into the Moscow river because of a plant failure.

    This is what is supposed to happen. All (nearly all?) sewage treatment plants have a bypass to send the input straight to the output, which is usually a river or lake.

    They do it because when a treatment plant cannot accept any more sewage, whether due to excessive water input by rain, or by power loss, the customers are better served by *NOT* letting the sewage back up into their houses. The stuff has to go *somewhere* when all their holding tanks are full. This is the last-resort method of dealing with problems at such plants.

  13. Re:The submitter has to have his priorities checke by venicebeach · · Score: 4, Insightful

    Yes, but slashdot is concerned with the internet, and so this is an appropriate forum to discuss how an event like this affects the internet. I don't think someone who runs an ISP in Russia should be trying to figure out how to get the sewer working, they should be figuring out how to get the internet up.

  14. UES Management Faces Criminal Investigation by Anonymous Coward · · Score: 4, Informative

    http://mosnews.com/news/2005/05/25/chubaiscriminalcase.shtml

    From the article:

    Russian prosecutors on Wednesday opened a criminal case against the management of power monopoly Unified Energy System (UES) after a major power outage in Moscow, agencies reported Wednesday.

    The case was opened to investigate possible negligence, the Interfax agency quoted the Prosecutor General's Office as saying.

  15. I think this is a "political outage" of some sort by melted · · Score: 5, Interesting

    There's a Russian politician of the Yeltsin era, Anatoly Chubais, who is in charge of RAO UES Russia (which is an uber-organization controlling production and distribution of energy in Russia).

    While the guy is not as powerful as he was a few years ago, he still poses a significant threat to Putin's third (and fourth, and so on) term presidency, and further concentration of power in Putin's hands.

    So within hours of the outage, Putin blamed Chubais directly for this, and the Russian justice dept opened up a criminal case against him. If you know anything about Russia, you know that the Russian DOJ (Prokuratura) doesn't start criminal cases against wealthy and powerful businessmen and politicians unless instructed to do so by Putin.

    So I'd bet dollars against donuts that this outage was caused by folks from Lubyanka (FSB aka KGB) purely to remove Chubais, and if the cards play well maybe even give him a lengthy prison term.

  16. lack of BGP/RIP2/whatever configuration by g-san · · Score: 4, Informative

    I doubt it was the lack of RIP2 configuration that caused this. You don't use RIP in the core; you use BGP as the exterior protocol and most likely OSPF or IS-IS as the interior protocol.

    UPS: in at least one place in MSK-IX they did have proper UPS backups. You can tell from the routing tables that some BGP connections have an uptime of 4 weeks plus. They did bounce one of their core routers (or it had a power failure), as all of its peering connections only have an uptime of 8.5 hours. I'd rather not provide a link to this, as the last thing they need is their core routers slashdotted with BGP table summary requests.

    Connectivity: it appears MSK-IX is peered with at least 12 other sites that are also peered with another major IX. For example they are connected to three other sites that are also connected to AMS-IX and four other sites that are also peered with LINX, among a few others with only 1 connection to another Internet Exchange. Many of these were thru Informtelecom XXI, so if they also had power problems everything was running on 50% normal capacity. There should have been enough connections to keep things running (i.e. no single point of failure), but that is assuming everything is working/powered, and assuming these guys in the middle could/would handle all the traffic (unlikely).

    BTW, packets don't lose their way; routers lose their routes to destinations. When all the crap started the routes began to "flap", i.e. go up and down as routers were reset, power came back on, routers went back down under the heavy load, admins manually tried to route around the problem, etc. When your peer sees your routes flapping, they usually put a holddown on them for a period of time, meaning they won't readvertise your route updates to other routers on the internet (said flaps propagate all over the world, putting undue stress on other routers). So even once you get everything working again, the internet waits for a little bit to accept your routes. Well, some do and some don't, or some wait longer. That's why you see routers still forwarding packets to London: apparently London thinks it can still get to Moscow, so it's still advertising routes. You don't get the count-to-infinity problem with BGP, but loops are still possible, especially during major outages and route flapping. And routers get "routing loops," not "found themselves in a loopback."

    I provided as much detail as I could; it's lacking in a few places because I can't follow Russian websites.
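
    For anyone wondering what the holddown on flapping routes looks like mechanically, here is a minimal sketch of penalty-based route flap damping. The class name and the numbers (1000 penalty per flap, suppress above 2000, reuse below 750, 15-minute half-life) are assumptions picked to resemble commonly cited router defaults; nothing here is taken from the MSK-IX routers, and real implementations track more state than this.

```python
# Illustrative parameters: assumptions for this sketch, not measured values.
PENALTY_PER_FLAP = 1000.0   # added each time the route goes up/down
SUPPRESS_LIMIT = 2000.0     # above this the route is no longer readvertised
REUSE_LIMIT = 750.0         # below this a suppressed route is accepted again
HALF_LIFE_S = 15 * 60       # penalty halves every 15 minutes of stability


class DampedRoute:
    """Tracks the flap penalty for one route as seen by a peer (hypothetical helper)."""

    def __init__(self):
        self.penalty = 0.0
        self.suppressed = False
        self.last_update = 0.0

    def _decay(self, now):
        # Exponential decay of the penalty since the last time we looked.
        elapsed = now - self.last_update
        self.penalty *= 0.5 ** (elapsed / HALF_LIFE_S)
        self.last_update = now
        if self.suppressed and self.penalty < REUSE_LIMIT:
            self.suppressed = False

    def flap(self, now):
        """Record one flap at time `now` (seconds)."""
        self._decay(now)
        self.penalty += PENALTY_PER_FLAP
        if self.penalty > SUPPRESS_LIMIT:
            self.suppressed = True

    def is_usable(self, now):
        """True if the peer would currently accept and readvertise the route."""
        self._decay(now)
        return not self.suppressed


route = DampedRoute()
for t in (0, 60, 120):          # three flaps, one minute apart
    route.flap(t)
for minutes in (0, 15, 25, 30):  # how long after the last flap we check
    now = 120 + minutes * 60
    print(minutes, "min after last flap, usable:", route.is_usable(now))
```

    With these assumed numbers, three flaps a minute apart push the penalty over the suppress limit, and the route stays ignored for roughly half an hour after it has stopped flapping. That is the "internet waits for a little bit to accept your routes" effect, and different peers with different parameters will wait different amounts of time.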