Electricity Outage Puts Routing to a Tough Test
infofarmer writes "Today at about 11:30 MSD (GMT+4) a major electricity outage in Moscow, Russia brought new meaning to words like "uninterruptible", "redundant" and "uptime" for network administrators, who haven't experienced such harsh and unexpected power failures since the USSR got its Internet connection. Half of the city is totally without electricity, including the subway and the most important traffic exchange point. Half of the top Russian sites went down, including www.mail.ru, www.rambler.ru and www.lenta.ru, and some of them haven't been brought back up yet. IP packets going from ADSL users in Moscow to some local sites got rerouted to somewhere in London and then back to Scandinavia, where they met their "No route to host" dead end. Other routers found themselves in a loopback, which made many packets get dropped with TTL expired. The point is that most of the popular servers have two or three mainline Internet connections, but lack of BGP/RIP2/whatever configuration resulted in packets losing their way to hosts."
Sigs cause cancer.
The MSK-IX went down, and now that it's back up, you're going to have it slashdotted?
Can someone give this guy a metal?
You are confusing me with someone who cares.
Oh nevermind...
No more allofmp3.com
Obviously the MPAA/RIAA are to blame..
must have been males- didn't stop to ask for directions
bride.ru is still up.
Good thing I saved all my russian pr0n.
Pulp Audio Weekly - Geek News and Reviews
That's crazy. These sites were down before they hit Slashdot.
Web Design Tips
An alternate headline should be:
Correct router configuration can be difficult!
Online Starcraft RPG? At
Dietary fiber is like asynchronous IO-- Non-blocking!
Last night I lost power for about 3 hours. My laptop worked. My cable modem stayed connected on battery backup. My router is plugged in and died.
Russia's connections at least made a couple of hops before dying. Mine died on 1 hop. It did illustrate the uselessness of a battery-backed up modem on my network, however.
/. ++
Did kremvax stay up?
For the last three or four weeks my Gmail account has been POUNDED by 100-200 Cyrillic spam messages every day. The filters catch them, but I have to clean out my spam folder pretty often.
I've gotten none in the last couple hours.
Knowing how things are done in Russia, you should be a lot more concerned with things OTHER than the Internet. Everything is such a fucking mess over there that I really hope no serious injuries happen. I already read the news that sewer water is being dumped into the Moscow river because of a plant failure. In times like these, who gives a shit about the Internet?
I can't seem to log into my bank account to update my out-of-date account information.
Wonder if these are somehow related.
Beauty is in the eye of the beerholder.
but lack of BGP/RIP2/whatever configuration resulted in packets losing their way to hosts."
Those mischievous packets. Unraveling networks where'er they roam.
Someone should blackmail the Moscow electrical grid. "If you ever want to send spam again, fork over $200 and send it to this address..."
The RIAA's crack anti-mp3 commando teams are rumored to be cutting a bloody swath through street markets and datacenters across the city.
I think you need to check your priorities. How do you think geeks all over the world just found out about the power failure?
It was interesting that news.google.com, cnn.com, msnbc.com, etc. do not have this story on their front pages. I guess the outage isn't as severe as the one in New York a few years ago.
Ant(Dude) @ Quality Foraged Links (AQFL.net) & The Ant Farm (antfarm.ma.cx / antfarm.home.dhs.org).
I live in Russia, about 1000 km from Moscow. We were hit by a network outage; nothing worked (even Slashdot :( ) for about 30 minutes. The number of routes announced by both of our peers was about 700 instead of the normal 150000.
:)
But then routes began to appear again! I was amazed: the Internet routed itself around the damaged segments, and packets were routed through Japan (!), Finland and Holland instead of Moscow. The funniest part was when I traced the route to a computer in the next building: it went through Saint Petersburg.
I was able to access Slashdot and most of the Russian sites (http://newsru.com/ , http://ntv.ru/ , http://nbc.ru/ ) that were not directly affected by the outage.
That explains why there was less spam in my inbox today.
What's your email address?
Insert funny smart-ass comment here.
There's more traffic on the 'net than pr0n, warez, mp3s and /.
Some of it actually matters.
If the gov't kept the data on you that Google does, you'd better believe you'd be calling it "doing evil"
This is what is supposed to happen. All (nearly all?) sewage treatment plants have a bypass to send the input straight to the output, which is usually a river or lake.
They do it because when a treatment plant cannot accept any more sewage, whether due to excessive water input by rain, or by power loss, the customers are better served by *NOT* letting the sewage back up into their houses. The stuff has to go *somewhere* when all their holding tanks are full. This is the last-resort method of dealing with problems at such plants.
Are there any technical reports out on what happened to the network? What is the Russian equivalent of NANOG?
Yes, but slashdot is concerned with the internet, and so this is an appropriate forum to discuss how an event like this affects the internet. I don't think someone who runs an ISP in Russia should be trying to figure out how to get the sewer working, they should be figuring out how to get the internet up.
Nobody cares that this was in russia, that people can't get their email, or that it was because of a power outage.
The reason this is on the front page is that the internet is supposed to be able to handle things like this. People will be watching how the routers automatically dealt with the outage (there's one response like that already), and what manual intervention it needed. Hopefully this information will be used for training the next generation of router admins.
Even if this was because of a meteor strike or nuclear bomb, we'd still be interested in how the net took it. We'd be more interested in everything else about the event, but the response of the network would still be interesting.
Perhaps these guys touched live wires ^^
:)
Off-topic, but an interesting read
http://mosnews.com/news/2005/05/25/chubaiscriminal case.shtml
From the article:
Russian prosecutors on Wednesday opened a criminal case against the management of power monopoly Unified Energy System (UES) after a major power outage in Moscow, agencies reported Wednesday.
The case was opened to investigate possible negligence, the Interfax agency quoted the Prosecutor General's Office as saying.
There's a Russian politician of the Yeltsin era, Anatoly Chubais, who is in charge of RAO UES Russia (which is an uber-organization controlling production and distribution of energy in Russia).
While the guy is not as powerful as he was a few years ago, he still poses a significant threat to Putin's third (and fourth, and so on) term presidency, and further concentration of power in Putin's hands.
So within a few hours of the outage, Putin blamed Chubais directly for this, and the Russian justice dept opened up a criminal case against him. If you know anything about Russia, you know that the Russian DOJ (Prokuratura) doesn't start criminal cases against wealthy and powerful businessmen and politicians unless instructed to do so by Putin.
So I'd bet dollars to donuts that this outage was caused by folks from Lubyanka (FSB aka KGB) purely to remove Chubais, and if the cards play well maybe even give him a lengthy prison term.
I'm sure Putin will exploit the power outage to weaken and possibly get rid of Chubais.
:)
Whether the FSB caused the outage directly to prompt an attack on Chubais is another matter. Maybe they were working on a plan but it wasn't ready yet. They have a lot to do.
Even Putin sometimes just exploits opportunities.
In any case, the outcome is the same.
I doubt it was the lack of RIP2 configuration that caused this. You don't use RIP in the core, you use BGP as the exterior protocol and most likely OSPF or ISIS as the interior protocol.
UPS: at least in one place in MSK-IX they did have proper UPS backups, you can tell from routing tables that some BGP connections have an uptime of 4 weeks plus. They did bounce (or it had a power failure) one of their core routers as all those peering connections only have an uptime of 8.5 hours. I'd rather not provide a link to this as the last thing they need is their core routers slashdotted with BGP table summary requests.
Connectivity: it appears MSK-IX is peered with at least 12 other sites that are also peered with another major IX. For example they are connected to three other sites that are also connected to AMS-IX and four other sites that are also peered with LINX, among a few others with only 1 connection to another Internet Exchange. Many of these were thru Informtelecom XXI, so if they also had power problems everything was running on 50% normal capacity. There should have been enough connections to keep things running (i.e. no single point of failure), but that is assuming everything is working/powered, and assuming these guys in the middle could/would handle all the traffic (unlikely).
BTW, packets don't lose their way, routers lose their routes to destinations. When all the crap started, the routes began to "flap", i.e. go up and down as routers were reset, power came back on, routers went back down under the heavy load, admins manually tried to route around the problem, etc. When your peer sees your routes flapping, they usually put a holddown on them for a period of time, meaning they won't readvertise your route updates to other routers on the internet (said flaps propagate all over the world, putting undue stress on other routers). So even once you get everything working again, the internet waits for a little bit to accept your routes. Well, some do and some don't, or some wait longer. That's why you see routers still forwarding packets to London: apparently London thinks it can still get to Moscow, so it's still advertising routes. You don't get the count-to-infinity problem with BGP, but loops are still possible, especially during major outages and route flapping. And routers get "routing loops," they don't "find themselves in a loopback."
I provided as much detail as I could; it's lacking in a few places because I can't follow Russian websites.
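For anyone who wants to see the holddown idea in concrete terms, here is a minimal sketch in Python. It loosely follows the penalty-and-decay scheme that RFC 2439 (BGP Route Flap Damping) describes: each flap adds a penalty, the penalty decays exponentially over time, and the route is only readvertised once the penalty falls back under a reuse threshold. The class name, method names and every constant below are assumptions made up for illustration; they are not taken from any real router configuration or from MSK-IX.

```python
# Illustrative constants only (assumed values, not vendor defaults).
PENALTY_PER_FLAP = 1000.0   # added each time the route flaps
SUPPRESS_LIMIT = 2000.0     # stop readvertising at or above this penalty
REUSE_LIMIT = 750.0         # resume readvertising below this penalty
HALF_LIFE_S = 900.0         # penalty halves every 15 minutes


class DampedRoute:
    """Tracks the flap penalty for one route learned from a peer."""

    def __init__(self):
        self.penalty = 0.0
        self.last_update = 0.0
        self.suppressed = False

    def _decay(self, now):
        # Exponential decay of the accumulated penalty since the last update.
        elapsed = now - self.last_update
        self.penalty *= 0.5 ** (elapsed / HALF_LIFE_S)
        self.last_update = now

    def flap(self, now):
        # Called whenever the peer withdraws and re-announces the route.
        self._decay(now)
        self.penalty += PENALTY_PER_FLAP
        if self.penalty >= SUPPRESS_LIMIT:
            self.suppressed = True

    def is_advertised(self, now):
        # A suppressed route is reused only after the penalty decays enough.
        self._decay(now)
        if self.suppressed and self.penalty < REUSE_LIMIT:
            self.suppressed = False
        return not self.suppressed


route = DampedRoute()
for t in (0, 60, 120):          # three flaps, one minute apart
    route.flap(t)
print(route.is_advertised(180))   # still held down shortly after the flaps
print(route.is_advertised(3600))  # readvertised once the penalty has decayed
```

With these assumed numbers the route stays suppressed for roughly half an hour after the last flap even if it is perfectly stable, which matches the "internet waits for a little bit to accept your routes" behavior described above.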
The same can be said of the electrical grid. And the cellular network. And the water network. And the sewage system. Or the public road infrastructure. Or the food distribution chain. Face it - virtually every aspect of modern life requires you to rely almost completely on infrastructures that you do not own.
One has to remember what the internet actually is - a system to transport data. For me it has proven to be far more reliable than the power grid - when the lights go out the internet connection at my house remains active. Should a system go down the first time a packet is dropped? Absolutely not. But that isn't the case here. What Russia is seeing is a massive widespread power failure that is probably beyond the designed tolerance. And keep in mind what else -could- be happening. Why was the sewage shunted to the river? Was it to keep it out of the basement of the local hospital?
This is interesting news coming as it does in the week that the CIA is scheduled to run a 3 day netwar exercise called Silent Horizon.
2 .html
http://apnews.myway.com/article/20050525/D8AAFUIO
Am I just blowing smoke here...?
Intolerance for ambiguity is the mark of the authoritarian personality.