Slashdot Mirror


Multiple Sites Down In SF Power Outage

corewtfux writes with word of a major outage apparently centered on 365 Main, a datacenter on the edge of San Francisco's Financial District. Valleywag initially claimed that a drunken person had gotten in and damaged 40 racks, but an update from Technorati's Dave Sifry says the problem is a widespread power outage. Sites affected include Technorati, Netflix (these display nice "We're Dead" pages), Typepad, LiveJournal, Sun.com, and Craigslist (these just time out).

15 of 423 comments

  1. Redundant? by DogDude · · Score: 5, Insightful

    Don't these large sites have failover-capable, redundant servers in multiple physical locations? Why should a failure in one rack, one room, or heck, even one state for the giant sites, affect them?

    --
    I don't respond to AC's.
    1. Re:Redundant? by Anonymous Coward · · Score: 2, Insightful
      This reminds me of other sites I have worked on. On more than one occasion someone wanted to move the physical location from Texas to SF or some other silly place. For some reason it made perfect sense to them to move the computers from a stable location with inexpensive labor and cheap, reliable power (Texas is on its own grid, with a plethora of power plants, and energy executives always give themselves cheap power) to a location that was earthquake-ridden and unreasonably expensive in living costs and power. Even when Enron was taking California for a ride, and in Texas we knew this, people still thought it made good business sense to have the servers in the Bay Area. All in one place. With no redundant location.

      I am not surprised that one little power blip took everything out.

    2. Re:Redundant? by raehl · · Score: 3, Insightful

      I'm certainly forwarding this article to my boss, who abruptly decided to put an end to planning for a backup site on the basis of "aw, nothing is going to happen".

      The thing is, letting something happen may be a better decision than trying to stop it.

      If you're going to have a fully-redundant setup, it's going to cost you twice as much as having just one setup. And if you're not going to have a fully-redundant setup, your backup site is going to buckle under the full load of normal traffic anyway.

      The correct business decision might just be "I just saved a bunch of money on my data center insurance," and if you lose a day's business, oh well, that was still cheaper than keeping a backup data center around.
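A rough way to frame that trade-off is to compare the expected yearly loss from outages with the yearly cost of keeping a backup site. The sketch below is only an illustration: the function names and every number in it are hypothetical, not figures from any of the affected sites.

```python
def expected_annual_outage_loss(outages_per_year, days_lost_per_outage, revenue_per_day):
    """Expected revenue lost to outages in one year (simple linear model)."""
    return outages_per_year * days_lost_per_outage * revenue_per_day


def backup_site_pays_off(annual_backup_cost, outages_per_year,
                         days_lost_per_outage, revenue_per_day):
    """True if the expected outage loss exceeds the cost of the backup site."""
    loss = expected_annual_outage_loss(
        outages_per_year, days_lost_per_outage, revenue_per_day
    )
    return loss > annual_backup_cost


# Hypothetical numbers: one day-long outage every two years,
# a backup site costing 500,000 per year.
small_site = backup_site_pays_off(500_000, 0.5, 1.0, 100_000)
large_site = backup_site_pays_off(500_000, 0.5, 1.0, 2_000_000)
```

With these made-up inputs the smaller site is better off eating the outage and the larger one is not. The model ignores reputation damage and SLA penalties, which can push the answer the other way.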

  2. Re:Redundant power supply? by Anonymous Coward · · Score: 1, Insightful

    I've been told there was no fuel left at the time.

    Now, the only remaining question is: How did the drunk guy get in there?

  3. Re:No Generators? by eln · · Score: 5, Insightful

    Any data center that advertises high availability should be testing that sort of thing on a regular basis. It's possible that they could fail switchover even if they are being regularly tested, but it is unlikely.

    If the "power outage" theory is correct and the "drunken employee" theory is incorrect, as a customer I'd be pissed that the data center I pay tons of money to can't keep my site up in the event of a power outage, which is one of the main perks of hosting at a data center in the first place.

  4. Re:From Technorati: by Cervantes · · Score: 2, Insightful

    Where's the +1 "100% fucking right" mod option?

    Whaddya bet some poor mid-level admin gets blamed and tossed for this? And the upper-management guy who ignored the recommendations for testing or redundancy still gets his bonus for good fiscal performance.

    --
    If I knew the wedgies I gave you back in 6th grade would have resulted in this . . . I might have taken a moment's pause.
  5. Re:No Generators? by Frosty Piss · · Score: 4, Insightful

    If the "power outage" theory is correct and the "drunken employee" theory is incorrect, as a customer...

    For me it would be the other way around. A technology failure I could understand. Letting a drunk employee near my server rack, I could not.

    --
    If you want news from today, you have to come back tomorrow.
  6. Re:No Generators? by Anonymous Coward · · Score: 2, Insightful

    Wait, you think it's OK to advertise five nines reliability, UPS backup, and generator backup, only to find out that the systems were not being properly tested to meet the advertised capability?

  7. Re:No Generators? by Anonymous Coward · · Score: 1, Insightful

    What is "high availability"? Per year, 99% uptime is about 3.65 days down. 99.9% is about 9 hours down. 99.99% is nearly an hour down. Certainly these sites can still be considered 3 nines high availability.
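The arithmetic behind those "nines" is easy to check. Here is a minimal sketch, assuming a 365-day year and counting all downtime against a single yearly budget; the function name is made up for illustration.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def downtime_hours_per_year(availability_percent):
    """Hours of downtime per year allowed at a given availability percentage."""
    return (100 - availability_percent) / 100 * HOURS_PER_YEAR


for pct in (99.0, 99.9, 99.99, 99.999):
    hours = downtime_hours_per_year(pct)
    print(f"{pct}% uptime allows {hours:.2f} hours ({hours * 60:.1f} minutes) down per year")
```

So an outage of a few hours blows a four-nines budget on its own but still fits inside three nines, provided nothing else goes wrong that year.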

  8. Re:No Generators? by Anonymous Coward · · Score: 1, Insightful

    Much of Europe uses 220V/50Hz.

  9. Re:No Generators? by Sancho · · Score: 2, Insightful

    The drunk thing is way outside the control of the administrators. Testing the failover is something they can do, and if something doesn't work, they can fix it.

  10. Re:No Generators? by Not The Real Me · · Score: 3, Insightful

    "...I would think these large sites are going to pitch a bitch..."

    I would think these large sites would understand the concept of not putting all your eggs (servers) in one basket. There is a reason why smart companies use replication and clustering, and datacenters spread across the country.

  11. Re:Gross malfeasance by suresk · · Score: 2, Insightful

    Now, now... LiveJournal is back up.

  12. Re:No Generators? by gujo-odori · · Score: 2, Insightful

    Have you ever been in a data center? The cabinets are all locked. To get the key, you have to sign it out from security. Ditto for the cages. It wouldn't just require a drunken/disgruntled employee, it would require a conspiracy of them: security staff to hand over the keys and the disgruntled employees to do the misdeeds.

    Well, there is one way around that: you walk over to the EPO button and give it a whack. It'll take down the whole floor. Rinse, lather, repeat on other floors. How many do you think you can do before someone stops you?

    Anyway, my employer has a lot of stuff in 365 Main. We're not one of the companies mentioned in TFA, but we're certainly one of the ones affected. Within a couple minutes of the outage, we knew we'd lost everything we had there and several of our sysadmins grabbed their gear and headed for the city to go join that line outside of 365. By the time they left the building we had confirmation that it was a power outage.

    Power was already back on when they got inside and they immediately brought up anything that wasn't already up and tested it all to make sure it was OK. To say the least, this is inconsistent with (tall) tales of somebody going apeshit on 40 racks.

  13. Re:Insane level of backup... by HeroreV · · Score: 2, Insightful

    If the heater is really that important, it should be reporting back at regular intervals that it's on, and when the signal isn't being received anymore there should be a process so that somebody calls and asks what's up. If somebody wanted to turn it off and couldn't, they'd just unplug it.
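The "report back at regular intervals" idea is a plain heartbeat check. Below is a minimal sketch, assuming the device reports a timestamp in seconds and that a five-minute silence is the threshold for phoning someone; the class name, method names, and threshold are all hypothetical.

```python
class HeartbeatMonitor:
    """Tracks the last 'I'm on' report and flags when it has gone quiet."""

    def __init__(self, max_silence_seconds):
        self.max_silence_seconds = max_silence_seconds
        self.last_seen = None  # no report received yet

    def record_heartbeat(self, timestamp):
        """Call this each time the device reports in."""
        self.last_seen = timestamp

    def needs_follow_up(self, now):
        """True if nobody has heard from the device within the allowed window."""
        if self.last_seen is None:
            return True
        return (now - self.last_seen) > self.max_silence_seconds


monitor = HeartbeatMonitor(max_silence_seconds=300)
monitor.record_heartbeat(timestamp=1000)
quiet_too_long = monitor.needs_follow_up(now=1301)  # 301 seconds of silence
```

The point of the design is that silence itself is the alarm: unplugging the device stops the reports, and that is exactly what triggers the phone call.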