How Peer1 Survived Sandy
Nerval's Lobster writes "When hurricane Sandy knocked out the electricity in lower Manhattan, data-center operator Peer1 took extreme measures to keep its servers humming, assembling a bucket brigade that carried diesel fuel up several flights of stairs. Ted Smith, senior vice president of operations for Peer1, talks about the decisions made as the floodwaters rose and the main generators went offline, as well as the changes his company has made in the aftermath of the storm. He said, 'When the water got to a point that it had flooded the infrastructure and the basement, we were then operating under the reserves the building had on the roof, and our own storage tanks. Literally, at that point we had to do calculations as to how long we could run. And we believed we had enough diesel fuel—between what is in the building, and in our tanks, to about 9 AM the following day. ... You know the bucket brigade—it’s something I’ve never asked the team to do. If you think about what that was at that time, you’re talking about carrying fuel up 17 flights, in total darkness, throughout a whole evening. We had informed our data center manager that we were shutting down, but he kind of took on it himself to say, ‘Not on my watch.’ And he organized himself, got a temporary solution and then more customers jumped in. And at peak I think we had about 30 people helping.'"
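For anyone curious what "calculations as to how long we could run" might look like, here is a rough back-of-the-envelope sketch. Every number in it (tank sizes, burn rate, bucket size) is a made-up placeholder, not a figure from Peer1, and a constant burn rate is itself a simplifying assumption.

```python
import math


def hours_of_runtime(fuel_gallons, burn_rate_gph):
    """Hours a generator can run on the given fuel at a constant burn rate."""
    if burn_rate_gph <= 0:
        raise ValueError("burn rate must be positive")
    return fuel_gallons / burn_rate_gph


def buckets_per_hour(burn_rate_gph, bucket_gallons):
    """Whole buckets that must reach the generator each hour to break even."""
    if bucket_gallons <= 0:
        raise ValueError("bucket size must be positive")
    return math.ceil(burn_rate_gph / bucket_gallons)


# Hypothetical figures for illustration only; the article gives no volumes.
roof_reserve_gal = 500.0   # building reserve on the roof (assumed)
own_tanks_gal = 300.0      # operator's own storage tanks (assumed)
burn_rate_gph = 50.0       # generator consumption in gallons/hour (assumed)
bucket_gal = 5.0           # size of one carried bucket (assumed)

runtime = hours_of_runtime(roof_reserve_gal + own_tanks_gal, burn_rate_gph)
print(f"Estimated runtime on stored fuel: {runtime:.1f} hours")
print(f"Buckets needed per hour to keep up: {buckets_per_hour(burn_rate_gph, bucket_gal)}")
```

The second function is the part that matters for a bucket brigade: once the stored fuel is gone, the carriers have to match the burn rate just to stand still.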
I would have thought that they barricaded the doors and windows with wicker baskets and throw pillows. Wait...
"A casual stroll through the lunatic asylum shows that faith does not prove anything." ~Friedrich Nietzsche
Carrying diesel up stairs in the dark sounds like it might not exactly be reasonable to ask your employees to do...
From what I hear, based on the StackExchange podcast and the tweets that went out from SquareSpace and StackExchange during the whole thing, Peer1 had a complete failure, and it was only due to the hard work of their customers (SE and SquareSpace) that the datacenter was able to remain operational. If your customers have to start carrying buckets of diesel up 17 flights of stairs, you, as a datacenter, have failed. Peer1, left to their own devices, would have just let the thing shut down, and apparently head office wasn't aware of how bad things even were.
Anthropic principle: We see the universe the way it is because if it were different we would not be here to see it.
http://xkcd.com/705/
Why would you ever bother?
The myriad regulations that were probably broken during all this should draw a watchful eye. Luckily no one was hurt and nothing bad happened, but it was just that: luck.
In total darkness, up 17 flights of stairs, with a flooded basement? Sounds like a recipe for a potentially fatal fire. People's lives are more important than a freaking data center. Sorry, but I don't see this as a heroic story about people trying to keep critical infrastructure running, but as a desperate failure that could easily have turned into a disaster. They never should have gotten to the point where they're continually carrying fuel up stairs. It also sounds like they then decided to pump fuel up a pipe they installed in the stairwell. That doesn't sound terribly safe either, especially when done in a mad rush like I'm sure it was.
Gee... couldn't someone have planned for this contingency rather than this sort of haphazard, dangerous-sounding plan that was thrown together?
AccountKiller
-----Original Message-----
From: PEER 1 Hosting NOC [mailto:@peer1.com]
Sent: Wednesday, December 05, 2012 4:04 PM
To: @peer1.com
Cc: @peer1.com
Subject: DAILY UPDATE - NYC DATA CENTER - December 5, 2012
Dear Customer,
Our facility engineers have identified an electrical explosion located in basement 2 that caused the building to flip to generator. Commercial power is available but the building advised that we stay on generator until it is safe to do so.
More updates to follow once available.
--
Network Operations Center
PEER 1 Hosting | Ping & People
1000-555 West Hastings Street
Vancouver, BC, Canada V6B 4N5
All I can say is you damn well better reward all your employees that helped. They kept you up and kept your revenue stream moving. You need to give them some kickass holiday bonuses or you're all major douche bags.
The original post sounds like a snippet from that Cory Doctorow end-of-the-world novel. Did they have to find parts to fashion a rudimentary lathe along the way? I applaud the efforts of the server team, but as one commenter stated, it sounds like a failure of the company's business continuity/disaster recovery plan. The cost of dealing with employees and customers in a burn ward should overshadow revenue flow.
Idiots are +5 Insightful for lambasting hosting companies for not maintaining DR and remote site capabilities throughout Sandy.
Seriously Peer1's efforts are all one can ask for and I applaud their efforts to stay online during what has to be a worst case scenario for them aside from Pandemic.
---Up Up Down Down Left Right Left Right B A START
The StackExchange Podcast had an excellent review of the events at Peer1.
http://blog.stackoverflow.com/2012/11/se-podcast-36-we-got-hit-by-a-hurricane/
http://www.podtrac.com/pts/redirect.mp3/feeds.soundcloud.com/stream/66762703-stack-exchange-stack-exchange-podcast-36.mp3
I don't find the bucket brigade thing that interesting. And I have little sympathy for a company that chooses to put a data center in a flood plain (and in very expensive real estate at that). What I find interesting is that the data center apparently was able to keep a connection to the rest of the world. I would have expected the power outages and the flooding to disconnect it, even if it could power itself.
I'm an American. I love this country and the freedoms that we used to have.
Having your employees stay in an emergency-stricken zone that is flooded, carrying open canisters of diesel fuel to keep a data center running so that someone in California can share pictures of their cat, is really not worth it IMHO.
I am sure some people were probably a little more worried about the lives of their families and themselves than about some digital data.
I am not going to call someone a hero for this. At some point out there, people using cloud services and online storage are going to have to accept the fact that during emergency situations, their data just isn't accessible, period.
The fundamental problem I have with all this, and what Sandy has highlighted, is that the Internet was designed to be decentralized solely for the purpose of surviving natural or man-made disasters. Why is it, then, that a data center company creates a single centralized storage site instead of having an auxiliary site somewhere else, even on the other side of the country?
I think this is an epic fail in planning and execution. Anyone using Peer1 shouldn't be happy that people's lives were put in danger when common sense could have had the company build redundancy into its infrastructure, allowing people to worry about their families more than your data.
Also, just like with Japan, don't build your backup generators at or below sea level.
I haven't thought of anything clever to put here, but then again most of you haven't either.
How many of these asinine data center advertisements are we going to get? This is at least the 3rd "How such and such data center survived Sandy!" I don't care... it's not news. You told your employees to stand in knee deep water in the middle of tons of electronic equipment and bail water? You're a god damned fool and lucky no-one got killed.
Sacking workers on Dec 26 rather than Dec 24.
Thus, resulting in the obligatory XKCD.
Some people don't know how to live. Switch off the computers, go home and take care of your personal life, for heaven's sake. Work to live, don't live to work. Took me a long time to finally realize this. I am MUCH happier now.
Haven't read the details of Peer1's trials and tribulations, but the situation reminds me of the Interdictor blog, about keeping DirectNIC running during Hurricane Katrina. That was one of the most thrilling blogs I've ever read.
load "windows7"
Why on Earth would you have a diesel generator up seventeen flights?
If they are a distributed company with data centers located in other parts of the country, why was this data not replicated or transferred as virtual machines to the other locations, and the NY data center shut down?
Nice way to solve the problem.
...is with all these repeated 'data center survives Sandy' stories? I remember reading this post 2+ weeks ago!
...bucket brigade that carried diesel fuel up several flights of stairs...
Wow, their servers are diesel powered? Awesome!
Proverbs 21:19
When taking the decision to keep the emergency backup fuel pumps in the basement, did no one think of what would happen in the event of flooding?
AccountKiller
I agree that in a flat, well-lit environment they'd probably be okay for gas, and even safer for diesel, especially outdoors where plants and soil would cushion the fall and vapors would dissipate quickly. Falling down an enclosed concrete stairwell though... even if the bucket itself didn't break I seriously doubt the lid would stay on, and then you'd have a real problem on your hands.
--- Most topics have many sides worth arguing, allow me to take one opposite you.
Not being ready for the problems outlined above is not new for Peer 1.
Their main facility in Vancouver failed a number of years back because they relied upon a single generator that was not properly tested on a regular basis. Did they install a second generator? Nope -- vulnerability remains four years later.
(see http://forums.peer1.com/viewtopic.php?f=37&t=52 -- in fact, a quick review of their forums shows many many power problems at Peer 1 facilities)
"Probably someone thought about it, but decided that other potential hazards (e.g., a leak in the tank causing fuel to be soaked all through several floors of the building) were more important to deal with"
..
...
Specious logic: the position of the fuel pumps doesn't contribute to fuel tank leaks.
"You can guess what the likelihood of each particular risk is, but that's definitely guesswork;"
No need to guess; certain people are paid a lot of money to analyse the risks.
"the whole of New York really wasn't set up with this sort of storm surge in mind"
NYC Hurricane History: 1821, 1893, 1938, 1954, 1955, 1960, 1985, 1995, 1996, 1999, 1999, 2011
Sep 2007: 4 Million Gallon Diesel Fuel Tank Fire
AccountKiller
What the SE crew did to keep their site up was amazing. They got no help from their site until way late in the game.
They counted on Peer1 to handle facilities... and they dropped the ball.
I have had multiple 72-hour outages due to local power going away. Having to plan to have fuel delivered to the backup gen-set, and having to ration power (I knew the burn rates and what equipment was non-essential)... blah blah blah.
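To make the rationing point concrete, here is a minimal sketch of the arithmetic, assuming a simple linear model in which fuel burn rises with electrical load. The load list, the idle burn, and the per-kW coefficient are all hypothetical placeholders; a real gen-set's fuel curve comes from its datasheet.

```python
from dataclasses import dataclass


@dataclass
class Load:
    name: str
    kw: float
    essential: bool


def burn_rate_gph(load_kw, idle_gph=2.0, gph_per_kw=0.07):
    """Assumed linear fuel curve: a fixed idle burn plus a per-kW term."""
    return idle_gph + gph_per_kw * load_kw


def runtime_hours(fuel_gal, loads):
    """Hours of runtime on fuel_gal while powering every load in the list."""
    total_kw = sum(load.kw for load in loads)
    return fuel_gal / burn_rate_gph(total_kw)


# Hypothetical equipment list and fuel on hand.
loads = [
    Load("core routers", 10.0, True),
    Load("storage", 30.0, True),
    Load("office HVAC", 40.0, False),
    Load("test lab", 20.0, False),
]
fuel_gal = 500.0

everything = runtime_hours(fuel_gal, loads)
rationed = runtime_hours(fuel_gal, [load for load in loads if load.essential])
print(f"All loads on:      {everything:.1f} h")
print(f"Essential only:    {rationed:.1f} h")
print(f"Covers a 72 h outage when rationed: {rationed >= 72}")
```

With these invented numbers, shedding the non-essential loads is the difference between falling short of a 72-hour outage and covering it without a fuel delivery.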
The bottom line is that the customers are not in the NOC, so if the organization cares about its customers... what needs to be done gets done.
Peer1 abandoned ship for two days and left THEIR customers to fend for themselves... and then treated them poorly when they came back to restore the services they had failed to maintain during a crisis that, while bad, was not insurmountable.
Peer1 should get shut down for failing to cover their end of the bargain.