Slashdot Mirror


Multiple Sites Down In SF Power Outage

corewtfux writes with word of a major outage apparently centered on 365 Main, a datacenter on the edge of San Francisco's Financial District. Valleywag initially claimed that a drunken person had gotten in and damaged 40 racks, but an update from Technorati's Dave Sifry says the problem is a widespread power outage. Sites affected include Technorati, Netflix (these display nice "We're Dead" pages), Typepad, LiveJournal, Sun.com, and Craigslist (these just time out).

18 of 423 comments

  1. I work in the Financial District by slug_bait · · Score: 5, Interesting

    I can verify that it affected much of the Financial District here in SF. We had the power go out 3 times. Seems to be back now. Haven't heard any explanation yet.

    1. Re:I work in the Financial District by halfloaded · · Score: 5, Funny

      Phew... I was worried the internet got slashdotted.

  2. Oblig.... by Anonymous Coward · · Score: 5, Funny

    im in ur datacentr
    trashin ur racks

    1. Re:Oblig.... by Tackhead · · Score: 5, Funny
      > im in ur datacentr
      >
      > trashin ur racks

      Lizzie Borden did teh h4x,
      Got drunk and unplugged 40 racks.
      When she saw what she had done,
      She unplugged number 41.

      (Lawn. Off. Git.)

    2. Re:Oblig.... by MsGeek · · Score: 5, Funny

      I felt a great disturbance in the Internet, as if millions of geeks suddenly cried out in terror and were suddenly silenced. I fear something terrible has happened.

      --
      Knowledge is power. Knowledge shared is power multiplied.
  3. Redundant? by DogDude · · Score: 5, Insightful

    Don't these large sites have failover-capable, redundant servers in multiple physical locations? Why should a failure in one rack, one room, or heck, even one state for the giant sites, affect them?

    --
    I don't respond to AC's.
    1. Re:Redundant? by Anonymous Coward · · Score: 5, Informative

      They do, but one of the dirty little secrets of most data centers is that they don't have enough generator capacity for all the cooling. They'll woo you with the generator, the 2,000 gallons of diesel, and N+1 array of UPSes, but when utility power dies, it gets hot very quickly. And some racks must go down.

  4. Re:No Generators? by eln · · Score: 5, Insightful

    Any data center that advertises high availability should be testing that sort of thing on a regular basis. It's possible that they could fail switchover even if they are being regularly tested, but it is unlikely.

    If the "power outage" theory is correct and the "drunken employee" theory is incorrect, as a customer I'd be pissed that the data center I pay tons of money to can't keep my site up in the event of a power outage, which is one of the main perks of hosting at a data center in the first place.

  5. zombies .... by taniwha · · Score: 5, Funny

    There's a report here that "Flesh-eating zombies are prowling the streets"

  6. From Technocrati: by Darth_brooks · · Score: 5, Funny

    We are working with our co-location facility managers to assess why it is back-up power generators failed to provide the necessary back-up power to prevent our site going down. We apologize for any inconvenience caused by our site being unavailable this afternoon.

    I think that's admin speak for:

    I warned these idiots eight months ago during my review that the datacenter had outgrown its generator capacity. But did they listen? Fuck no, they just kept counting money and worrying about the bottom line. The beancounters looked at me like I'd asked them for a blowjob from their grandmothers when I submitted the workup for additional generator capacity. And now that the shit's hit the fan, whose ass are they screaming for? Screw this, I'm applying at Taco Bell.

    --
    There are some people that if they don't know, you can't tell 'em.
    1. Re:From Technocrati: by Soko · · Score: 5, Funny

      Thanks for the laughs, even if they led to a sad realization. Cancel, or Allow?

      --
      "Depression is merely anger without enthusiasm." - Anonymous
    2. Re:From Technocrati: by RealGrouchy · · Score: 5, Funny

      "... to assess why it is back-up power generators failed ..." I've been a grammar nazi for many years, but it looks like the enemy has unleashed new weapons.

      Tell my family I loved them.

      - RG>
      --
      Hey pal, this isn't a pleasantforest, so don't waste my time with pleasantries!
  7. Re:Redundent power supply? by aaarrrgggh · · Score: 5, Interesting

    It takes diesel a few years to go bad. That site has fuel polishing systems to prevent that. Because of earthquake risk, they are contractually obliged to many of their clients to keep 24-48 hours of backup fuel.

    They have the HiTec rotary UPSs in all their facilities, which link a generator to a flywheel UPS. It's stupid to not have backup fuel for that type of system; you can only run for 13 seconds before the load crashes.

    It is possible that they got a number of small hits and the generators failed to re-start after a few. Good procedures are to stay on generator until utility stabilizes if you have more than one "hit."
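
    The "stay on generator until utility stabilizes" rule can be sketched roughly like this. It is only an illustration of the procedure described above, not anything known about 365 Main's actual controls: the function name, the way hits are recorded, and the 30-minute stability period are all assumptions.

```python
def should_return_to_utility(hit_times_s, now_s, utility_ok, stable_s=1800.0):
    """Decide whether to transfer load from generator back to utility.

    hit_times_s: times (seconds) of utility disturbances seen in this event.
    stable_s:    hypothetical quiet period required after repeated hits.
    """
    if not utility_ok:
        return False              # never go back to a dead feed
    if len(hit_times_s) <= 1:
        return True               # a single hit: return once utility is back
    # More than one hit: hold on generator until the utility has been
    # quiet for the whole stability period.
    return now_s - max(hit_times_s) >= stable_s


print(should_return_to_utility([0.0], 10.0, True))
print(should_return_to_utility([0.0, 30.0, 55.0], 60.0, True))
```

    With one hit the sketch goes back to utility as soon as it is healthy; with three hits in a minute it stays on generator, which avoids asking the engines to restart over and over.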

    Be interesting to find out what happened.

  8. About Emergency Power by linuxwrangler · · Score: 5, Informative

    It's been a long time since I went on a tour of several data centers to locate a new facility for our dot-com. I believe that 365 Main was a facility that does not use a battery UPS. Instead, they have an engine-backed flywheel UPS system (see http://www.enterprisenetworksandservers.com/monthly/art.php?2813 for a description). At the time, they had 10 2-megawatt generators on the roof in an N+2 configuration. The engines are kept heated and are spec'd to go from stop to engage-clutch/deliver-power in 3 seconds. The flywheel can deliver 11 seconds of power so they can fail through a couple of bad engines before running out of flywheel power. They periodically do a 20-hour load test into a pair of 500,000 watt heat-sinks. Time will tell if this outage was a failure of design, failure of maintenance, or outright malfeasance. But it wasn't supposed to happen. They've got some 'splainin' to do.
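
    The arithmetic behind "fail through a couple of bad engines" can be sketched as follows. This is a rough illustration only: it assumes start attempts happen one after another, that each takes the full 3 seconds, and that N+2 means two of the ten units are spares. The function names are made up for the example, and the numbers are the ones quoted above from memory, not published specs.

```python
def failed_starts_tolerated(ride_through_s, start_time_s):
    """Number of engine starts that can fail before the flywheel runs out.

    Assumes sequential attempts, each taking start_time_s, with the last
    (successful) attempt finishing within ride_through_s.
    """
    attempts_that_fit = int(ride_through_s // start_time_s)
    return max(attempts_that_fit - 1, 0)


def capacity_with_spares_mw(units, unit_mw, spares):
    """Load that can be carried while `spares` units are held in reserve."""
    return (units - spares) * unit_mw


print(failed_starts_tolerated(11, 3))     # 11 s flywheel, 3 s engine start
print(capacity_with_spares_mw(10, 2, 2))  # ten 2 MW sets in N+2
```

    Under those assumptions the flywheel covers two failed starts plus one good one, and the plant can carry 16 MW with two generators out.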

    As to diesel storage, use of diesel is widespread for emergency use everywhere from hospitals to emergency services. Those systems are run regularly - typically weekly. The use of biocides, stabilizers, mobile fuel-scrubbing services, and extra filtration systems can maintain the fuel quality. Our colo currently maintains a 1-week fuel supply and has multiple quick-refuel contracts in place. I can't imagine any colo having less than 24-48 hours in-the-tank with quick-refill on-call.

    But one thing that is missing is cooling. Our colo has a typical contract that says something like blah-blah won't exceed 80F for more than 4 hours blah blah. OK, but a rack full of blade servers can crank out 15-20kW of heat load and a data center can heat up real quick without AC. By contract, 150F for 3.5 hours would be in-spec.
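
    To see why 150F for 3.5 hours passes, here is a minimal sketch of how a clause like that could be checked against a temperature log. The clause wording is my paraphrase, and the function names, the sample format, and the readings themselves are invented for illustration.

```python
def longest_excursion_hours(readings, limit_f):
    """Longest continuous stretch above limit_f.

    readings: list of (time_in_hours, temp_f) samples, sorted by time.
    The stretch is measured from the first to the last consecutive
    sample above the limit, so it is only as precise as the sampling.
    """
    longest = 0.0
    start = None
    for t, temp in readings:
        if temp > limit_f:
            if start is None:
                start = t
            longest = max(longest, t - start)
        else:
            start = None
    return longest


def in_spec(readings, limit_f=80.0, max_hours=4.0):
    """True if the room never stayed above limit_f longer than max_hours."""
    return longest_excursion_hours(readings, limit_f) <= max_hours


hot_room = [(0.0, 72), (0.5, 150), (4.0, 150), (4.5, 72)]
print(in_spec(hot_room))
```

    The made-up log sits at 150F for 3.5 hours and still comes back in-spec, because the clause only limits how long the room is over 80F, not how far over.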

    --

    ~~~~~~~
    "You are not remembered for doing what is expected of you." - Atul Chitnis
  9. Re:No Generators? by MichaelSmith · · Score: 5, Interesting

    Stuff happens

    No kidding. Years ago in my former job on traffic systems we had a great UPS with a generator on site and the ability to keep it fueled up indefinitely. A security contractor came in on the weekend to install something and tried to wire up a new circuit hot. He slipped with a screwdriver and shorted the white phase to the chassis of the breaker panel. I don't think the tip of the driver actually touched ground, but the burn mark is still there to show how close he got.

    The resulting current spike blew the 100A fuses (heavy metal strips) both going in to and out of the UPS. With the UPS effectively broken the generator set failed to start and the system gracefully shut down 40 minutes after the incident. That's not bad. The batteries were only specified to work long enough for the genny to settle at 50Hz.

    In the process of blowing the fuses a spike got back into the power supply of one of our DEC Alphas and took out the power supply. The system was redundant at the software level so I didn't notice immediately.

    The UPS guy came out and didn't have enough fuses to replace the blown one, but we found that with a bit of brute force and filing attacks some others could be made to fit.

    Please type the word in this image: problems

  10. Re:The Scoop from SFGate.com by RealGrouchy · · Score: 5, Funny

    It was hard to read through that block of text, but looking closely, it explains why:

    "Officials say the power outage may affect some websites, including the site that hosts Slashdot.org's preview button."

    It all seems to be back up now.

    - RG>

    --
    Hey pal, this isn't a pleasantforest, so don't waste my time with pleasantries!
  11. Re:No Generators? by apoc.famine · · Score: 5, Funny

    I tried to mod the article "-1 Not Redundant" but it wasn't an option. And I didn't have mod points. At least my inability to function only warrants a comment, rather than a slashdot article.

    --
    Velociraptor = Distiraptor / Timeraptor
  12. Insane level of backup... by SmoothTom · · Score: 5, Interesting

    ...until the commercial power fails and doesn't come back for days.

    The only places I've actually seen the insane levels of backup that some would like is in some telco central offices. The one I was associated with the longest had eight-hour-plus battery backup and 8 days of fuel for the diesels. Some of our really remote microwave sites had 24 hour battery and 30 day diesel.

    Of course one of those sites failed high up in a mountain range in a mid-winter storm (Tieton, 1978) when the commercial power failed, and the starter battery for the diesel froze. When one of the techs finally got there (after burying his Sno-Cat and walking the last couple miles), he had to chip ice off the steel door to get inside, where he was able to get the diesel started with a little "rewire" of one of the backup battery sets. Oh, his two-way radio also failed during his hike, since it was outside his snowsuit, and the lack of communication caused the company to start two more Sno-Cats and a helicopter in that direction.

    The site was out for nearly six hours, IIRC.

    Even the BEST designs are subject to failure. :o(

    --
    Tomas