Slashdot Mirror


Explosion At ThePlanet Datacenter Drops 9,000 Servers

An anonymous reader writes "Customers hosting with ThePlanet, a major Texas hosting provider, are going through some tough times. Yesterday evening at 5:45 pm local time an electrical short caused a fire and explosion in the power room, knocking out walls and taking the entire facility offline. No one was hurt and no servers were damaged. Estimates suggest 9,000 servers are offline, affecting 7,500 customers, with ETAs for repair of at least 24 hours from onset. While they claim redundant power, because of the nature of the problem they had to go completely dark. This goes to show that no matter how much planning you do, Murphy's Law still applies." Here's a Coral CDN link to ThePlanet's forum where staff are posting updates on the outage. At this writing almost 2,400 people are trying to read it.

431 comments

  1. Server/customer ratio? by gardyloo · · Score: 1, Interesting

    9000::7500?

    So I guess a "customer" in this case is a company or business, not an individual? Unless many of the individuals have several servers each.

    1. Re:Server/customer ratio? by ChowRiit · · Score: 2, Insightful

      Only a few people need to have a lot of servers for there to be 18 servers for every 15 customers. To be honest, I'm surprised the ratio is so low; I would have guessed most hosting in a similar environment would be by people who'd want at least 2 servers for redundancy/backup/speed reasons...

    2. Re:Server/customer ratio? by 42forty-two42 · · Score: 5, Insightful

      Wouldn't people who want such redundancy consider putting the other server in another DC?

    3. Re:Server/customer ratio? by p0tat03 · · Score: 4, Insightful

      ThePlanet is a popular host for hosting resellers. Many of the no-name shared hosting providers out there host at ThePlanet, amongst other places. So... Many of these customers would be individuals (or very small companies), who in turn dole out space/bandwidth to their own clients. The total number of customers affected can be 10-20x the number reported because of this.

    4. Re:Server/customer ratio? by bipbop · · Score: 3, Informative

      At my last job, BCP guidelines required both: a minimum of four servers for anything, two of which must be at a physically distant datacenter.

    5. Re:Server/customer ratio? by wirelessbuzzers · · Score: 2, Insightful

      I'm guessing that most of the customers are virtual-hosted, and therefore have only a fraction of a server, but some customers have many servers.

      --
      I hereby place the above post in the public domain.
    6. Re:Server/customer ratio? by AtomicSnarl · · Score: 1

      One of the fatalities was Blank Label Comics, which hosts a large collection of web comics and their supporting forums. They get multi-millions of hits daily, so multiple servers for them would be expected.

      --
      Pacifist paratroopers yell, "Gandhi!" when they jump.
    7. Re:Server/customer ratio? by toadlife · · Score: 1

      I am a customer of a reseller that happens to host at theplanet. My sites are all down. :(

      --
      I don't always use unix-like operating systems; but when I do, I prefer FreeBSD.
    8. Re:Server/customer ratio? by billcopc · · Score: 2, Insightful

      What? I run 4 servers myself. The small firm I work for runs maybe 70-80 boxes in our cage.

      In fact I find it odd that this facility has so many individual customers. Seems like a lot of administrative overhead... If I were running that DC, I'd much rather lease out full or half racks than individual units, then let those people sublet to the small fry.

      That's how most of the big hosting companies operate. They don't own their own datacenters, they just lease a cage or two, cram it full of gear and sell you that godawful oversold web space you love to hate. That's also why colocating a single server can be so goddamned expensive - datacenters set per-unit pricing high to scare away the Joe Blows, and the resellers make a lot more money selling crap hosting than subletting their precious space. This is especially true in the USA/Canada.

      --
      -Billco, Fnarg.com
    9. Re:Server/customer ratio? by cowscows · · Score: 2, Insightful

      I think it depends on just how mission critical things are. If your business completely ceases to function if your website goes down, then remote redundancy certainly makes a lot of sense. If you can deal with a couple of days with no website, then maybe it's not worth the extra trouble. I'd imagine that a hardware failure confined to a single server is more common than explosions bringing entire data-centers offline, so maybe a backup server sitting right next to it isn't such a useless idea.

      --

      One time I threw a brick at a duck.

    10. Re:Server/customer ratio? by stonecypher · · Score: 1

      If you have twelve servers, they're probably in three datacenters. Lots of naive setups are webserver, webserver, webserver, database, repeat across location, or things like that.

      --
      StoneCypher is Full of BS
    11. Re:Server/customer ratio? by peragrin · · Score: 1

      Blank Label Comics and its forums are on there. While individual comics might be virtually hosted, with the big draws getting their own servers, you're talking about millions upon millions of hits.

      --
      i thought once I was found, but it was only a dream.
    12. Re:Server/customer ratio? by arbiter1 · · Score: 1

      Yeah, I'd never heard of this happening to a data center before, only power outages. When it comes to major data centers, they pick areas that rarely ever have outages. This is just a classic case of "shit happening."

    13. Re:Server/customer ratio? by Dhalka226 · · Score: 1

      The web hosting company Site5 recently moved their servers to The Planet. I'm not sure offhand how many servers they have, but I'm sure it jumps that ratio significantly.

    14. Re:Server/customer ratio? by Anonymous Coward · · Score: 0

      I was thinking about this ratio... Netcraft's research data puts the number of active websites at about 168,000,000. If you assume an average of 800 websites hosted per server (this can be higher if you think about VPSes), it's possible to estimate that 5% of Internet websites are offline at this moment. Impressive.
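
      The arithmetic behind that estimate can be sketched in a few lines of Python, using only the figures assumed in the comment (Netcraft's 168 million active sites, an assumed 800 sites per server) and the story's 9,000 servers. Taken at face value, those inputs work out to about 7.2 million sites, a little over 4% of the total, which is in the same ballpark as the 5% quoted:

```python
# Back-of-envelope check of the estimate above. Every input is the
# commenter's assumption or the story's headline figure, not measured data.
TOTAL_ACTIVE_SITES = 168_000_000  # Netcraft figure quoted in the comment
SITES_PER_SERVER = 800            # assumed average per shared-hosting server
SERVERS_OFFLINE = 9_000           # server count from the story


def share_offline(servers: int, sites_per_server: int, total_sites: int) -> float:
    """Fraction of all active sites hosted on the offline servers."""
    return servers * sites_per_server / total_sites


sites_down = SERVERS_OFFLINE * SITES_PER_SERVER
fraction = share_offline(SERVERS_OFFLINE, SITES_PER_SERVER, TOTAL_ACTIVE_SITES)
print(f"{sites_down:,} sites offline, about {fraction:.1%} of the total")
```
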

    15. Re:Server/customer ratio? by Anonymous Coward · · Score: 0

      When you pay for redundancy you expect to have it already. That is why we pay to host in a professional data center instead of running it out of the basement. Our mistake was assuming they have their Emergency Disaster Plan in place and working.

    16. Re:Server/customer ratio? by Anonymous Coward · · Score: 0

      I have two servers - one is at ThePlanet, but it was not affected by this outage. The other is with Netsonic. So it's not just two datacenters, it's two completely independent providers.

      One server is in Texas, the other is in Wisconsin. I'm not even in the USA.

    17. Re:Server/customer ratio? by dknj · · Score: 1

      the disgruntled employee did it.

    18. Re:Server/customer ratio? by dw604 · · Score: 1

      I host my main business site and a number of client sites on 2 servers affected by this. Sucks.

    19. Re:Server/customer ratio? by ChameleonDave · · Score: 1

      Me too.

      In fact, I didn't even know my site was in Texas. I thought that my local Australian hosting company had machines on site.

      It's a little worrying. What if something like a crash or a powercut (or, let's be crazy, an explosion) occurred? The people whom I'm paying would have to trust someone on the other side of the world to fix it. No pressure on them could make the problem be fixed faster.

      It also raises the question of why I'm letting a middleman grab a chunk of my cash instead of going directly with these nerds in Stetsons.

    20. Re:Server/customer ratio? by PenguSven · · Score: 1

      I got a call this morning from the agency I contract through (in Canberra, AU).
      It turns out their online timesheet system and their mail are all located in this data center. Well, unless another US data center just happened to catch fire on the weekend?

      --
      What is...?
    21. Re:Server/customer ratio? by gnuman99 · · Score: 2, Informative

      How about catching on fire and burning down??

      http://lists.debian.org/debian-devel/2002/11/msg01926.html

    22. Re:Server/customer ratio? by bladesjester · · Score: 1

      Same here.

      I'm just wondering which floor the server that hosts my site is on. According to their forum, it's the difference between probably having it back on in the morning or not until tomorrow night.

      I'm glad nobody was hurt and none of the servers were damaged, but I have to say that I'm not looking forward to seeing "downloading 1 of 1500 (or more) messages"

      --
      Everything I need to know I learned by killing smart people and eating their brains.
    23. Re:Server/customer ratio? by bladesjester · · Score: 1

      According to their forum, The Planet has several sites. This was an explosion at just one of them.

      --
      Everything I need to know I learned by killing smart people and eating their brains.
    24. Re:Server/customer ratio? by Anonymous Coward · · Score: 0

      I've got seven Web sites that I designed, all down, including all e-mail service for each of those Web sites.

    25. Re:Server/customer ratio? by Ethan+Allison · · Score: 1

      So what would be a less naive setup then?

    26. Re:Server/customer ratio? by kflat · · Score: 1

      Yay, stereotypes!

      Ironically, in 5 years in the Texas (DFW/Houston) hosting industry, the only person I have seen who wore any sort of cowboy wear was a recent California (San Jose) transplant.

      The guy was definitely a huge nerd, though, so you nailed that one... but really, do you wrestle with kangaroos and hunt for crocodiles, just like we all wear Stetsons and shitkickers and ride horses to work?

    27. Re:Server/customer ratio? by D'Sphitz · · Score: 1

      Well, I think everyone in the affected data center is a legacy EV1 customer on old, outdated hardware. I guess it would make sense that the big companies would have decided to upgrade hardware in the last 4 years and subsequently been moved out of that data center.

      Luckily I just canceled my last legacy server a few weeks ago so this didn't affect me.

    28. Re:Server/customer ratio? by CharlieHedlin · · Score: 1

      Not every disaster can be contained if you have a single location.

      If you want to have full protection, you need more servers, and to pay more $ accordingly.

      If they have a fire, the fire department isn't going to let them energize ANYTHING. Even if the backup generator automatically turned on (the hardware could have been damaged), the fire department won't let them leave anything energized while they inspect.

    29. Re:Server/customer ratio? by ChameleonDave · · Score: 1

      but really, do you wrestle with kangaroos and hunt for crocodiles

      Nah, I'm vegetarian. I cuddle koalas and Tasmanian devils instead.
    30. Re:Server/customer ratio? by stonecypher · · Score: 1

      Varies according to the application, the environment, the tools used to build it. Something which can distribute both database and web load across all nodes (such as you'd get in an Erlang, Twisted Python or Mozart-Oz setup) is a good first step.

      --
      StoneCypher is Full of BS
    31. Re:Server/customer ratio? by Pollardito · · Score: 1

      maybe they have the other server in their office and not in a dedicated data center at all

  2. 9 Volts of Love by Anonymous Coward · · Score: 5, Funny

    Electricity is a fickle mistress, one moment she's gently caressing your genitals through gingerly applied electrodes the next she's blowing up your data centers.

    1. Re:9 Volts of Love by milsoRgen · · Score: 2, Funny

      I got your 9 Volts of Love right here

      --
      I'm sick of following my dreams. I'm just going to ask where they're goin' and hook up with 'em later.
    2. Re:9 Volts of Love by Anonymous Coward · · Score: 0

      Transformer Software powered by Microsoft !

    3. Re:9 Volts of Love by Anonymous Coward · · Score: 0

      Electricity is a fickle mistress, one moment she's gently caressing your genitals through gingerly applied electrodes the next she's blowing up your data centers.

      Electricity is a fickle mistress: one moment she's gently powering your data center, the next she's blowing your genitals off through gingerly applied electrodes.
    4. Re:9 Volts of Love by Anonymous Coward · · Score: 0

      You know what they say: it's all fun and games until someone loses a data center.

  3. Kudo to their support team by QuietLagoon · · Score: 5, Insightful

    ... for posting frequent updates to the status of the outage.

    1. Re:Kudo to their support team by Anonymous Coward · · Score: 1, Informative

      Yes.

      Shit happens. The question then becomes how you deal with it.

      As above, see below. Will follow with interest.

    2. Re:Kudo to their support team by imipak · · Score: 3, Interesting

      Little-known fact: The Planet were the first ever retail ISP offering Internet access to the general public - from 1989. Hmmm, so the longest-established ISP in the world is not only working hard to get that DC back online, but also posting pretty open summaries of the state of play... coincidence? I don't think so.

    3. Re:Kudo to their support team by larien · · Score: 4, Insightful

      It's probably less effort to spend a few minutes updating a forum than it would be to man the phones against irate customers demanding their servers be brought back online.

    4. Re:Kudo to their support team by QuietLagoon · · Score: 3, Insightful
      man the phones against irate customers

      It does not sound like the type of company that thinks of its customers as an enemy, as your message implies.

    5. Re:Kudo to their support team by SSpade · · Score: 5, Informative

      It's little known mostly because it's not actually true. I think you're confusing theplanet with the world, aka world.std.com.

    6. Re:Kudo to their support team by Anonymous Coward · · Score: 4, Funny

      Not sure I want to go to a std.com domain, might get infected...

    7. Re:Kudo to their support team by Angostura · · Score: 0

      Agreed,

      Putting aside for one moment any shortcomings in their infrastructure, those posts are a textbook example of how to communicate status clearly and concisely to customers.

    8. Re:Kudo to their support team by c_forq · · Score: 2, Funny

      Or more likely it sounds like someone who has worked tech support (this is slashdot).

      --
      Computers allow humans to make mistakes at the fastest speeds known, with the possible exception of tequila and handguns
    9. Re:Kudo to their support team by Anonymous Coward · · Score: 1, Insightful

      If you host all DNS servers for your customers' domains in the same data center, you better have excellent support staff to make up for this rookie mistake.

    10. Re:Kudo to their support team by cheater512 · · Score: 1

      Apart from the DNS and control panel stuff-ups, I don't think they had any other flaws.

      When the firemen tell you to turn the power off, you really need to do it.

      Although I'm not sure why they can't get the generators back online now.
      The explosion must have knocked out some important equipment.

    11. Re:Kudo to their support team by larien · · Score: 1
      Mostly just an expectation that if their clients have lost their website, they're not going to be happy. Enemy or not, the customers are going to be a tad peeved that they've vanished off the face of the internet...

      I've never had any dealings with ThePlanet, so can't comment on how they view/treat their customers, but my point about relative effort still stands; it's easier to provide hourly updates on a website than it is to deal with hundreds of customer complaints/questions.

    12. Re:Kudo to their support team by Anonymous Coward · · Score: 1, Insightful

      Yes, the updates are better now. However, in the hours immediately after the servers went down, communication was terrible.

      Their website had no indication of the fact that there was a problem, and no one was responding when I called their 1-866 customer service number. After waiting for half an hour, their customer service number was disconnected, and you couldn't even call the number any more.

      They're better now, but the ETA they gave for having things working by "mid-afternoon" Sunday looks unlikely now, and in the meantime, my business is hemorrhaging users...

      They've had serious problems since I joined a few months ago, and always with absolutely no communication (I have to call their customer service to learn that all packets to their data center are being lost, etc). I am definitely backing up my stuff and switching providers the moment they come back online.

    13. Re:Kudo to their support team by Anonymous Coward · · Score: 0

      I can understand those customers being unhappy; I'd be unhappy too if my server went offline for a day.

      But the state of the art is replication and redundancy. Services should not go down even if some of the servers involved in providing those services go down. Google and Amazon provide services to help address this requirement. Anybody seriously affected by the outage at ThePlanet needs to think seriously about replication and redundancy.

      I note that ThePlanet themselves are in this category - their EV1 customer support systems went offline because they were housed completely in the H1 datacenter. Outages like this may not be preventable, but are hopefully very rare. ThePlanet are doing their best to bring all the servers back online. I hope that every affected customer comes to understand that they personally are responsible for keeping their service running, even if their hosting provider goes down.

    14. Re:Kudo to their support team by foobat · · Score: 1

      Yeah, you would have thought that sending out an email wouldn't be too hard either, but I had to notice my site was down for a few hours, log in to their site to check what was going on, submit a support ticket, and end up poking around some forum post I don't normally look at. Then they post "don't submit a support request on it, we're inundated with calls, we know about it." Yeah, the only problem being that your customers don't know, and are left guessing.

    15. Re:Kudo to their support team by hendridm · · Score: 1

      I think you're confusing theplanet with the world, aka world.std.com.

      Or perhaps Everyone's Internet, which merged with The Planet in 2006.

    16. Re:Kudo to their support team by Fred_A · · Score: 2, Funny

      Just answer the phone with a recording that has a background of screams, fires, stuff falling down and cracking, electrical buzzing and a few sirens...

      "Hello, this is the Planet, our servers are down for the moment but we're working on it, thank you for your comprehension... Oh no, Smith is on fire ! Someone get him !!! *click*"

      --

      May contain traces of nut.
      Made from the freshest electrons.
    17. Re:Kudo to their support team by Jellybob · · Score: 1

      It's possible their e-mail notification system was in that data centre, preventing them from getting at the customer database and notifying the people who were affected.

      If that wasn't the case, though, I totally agree that it's sloppy not to be a bit proactive about things and let the customers know what's going on.

    18. Re:Kudo to their support team by Isaac-Lew · · Score: 1
      I am definitely backing up my stuff and switching providers the moment they come back online.


      And you don't already have backups because...?

    19. Re:Kudo to their support team by Anonymous Coward · · Score: 0

      You're obviously not a customer...

    20. Re:Kudo to their support team by Pollardito · · Score: 1

      i think people would see right through your ploy by the 4th or 5th call...

    21. Re:Kudo to their support team by Caspian · · Score: 1

      I call bullshit.

      I maintain two servers at this facility. One was down for a few days and then came back up; it then went down again after their generator failed, but now it is back up again. The second server has been down since last Saturday. Theplanet.com refuses-- absolutely REFUSES-- to give the slightest amount of specific information on the status of this second server. They won't tell me why it's down but the other server is up. They won't give me an ETA for when it will come up. They won't tell me jack shit. I've submitted several trouble tickets, called at least three or four times, and talked to their techs via the chat feature on their website three or four times. They refuse to provide any specifics whatsoever. Meanwhile, I've not had access to my email for around five days now...

      These assholes are just as customer-hostile as any other company. All they give you is nice-sounding bullshit like 'we are working on the problem as quickly as we can'; they refuse to give any specific information whatsoever. I'm starting to completely loathe these people.

      --
      With spending like this, exactly what are "conservatives" conserving?
    22. Re:Kudo to their support team by Caspian · · Score: 1

      Ultimately, after a phone call to the CEO, I got service restored quickly. Evidently, during the problems, one of my hard drives developed a problem. I recovered the data from my RAID and everything seems to be getting back to normal now. So, in the end, theplanet.com came through for me.

      --
      With spending like this, exactly what are "conservatives" conserving?
  4. explosion? by Anonymous Coward · · Score: 5, Funny

    Lesson learned: don't store dynamite in the power room.

    1. Re:explosion? by Gazzonyx · · Score: 4, Funny

      Lesson learned: don't store dynamite in the power room.

      But they told me to take it out of the room with the fuel for the generators, the management offices, and HR department...

      --

      If I mod you up, it doesn't necessarily mean I agree with what you've said, sorry.

    2. Re:Explosion? by wirelessbuzzers · · Score: 1

      It's also possible that they had a generator, and the gas or diesel fuel exploded.

      --
      I hereby place the above post in the public domain.
    3. Re:Explosion? by Gazzonyx · · Score: 4, Informative

      Actually, modern batteries should be sealed, valve-regulated or Absorbed Glass Mat (AGM) types that don't vent (too much) hydrogen. During a thermal runaway, they vent a tiny bit before killing themselves, but hydrogen doesn't become explosive until its concentration in an enclosed space reaches ~4%. 4% of a data center's air volume is a lot of hydrogen. I've heard of this happening in one data center where the primary and failover (IIRC) HVAC units failed and no one had been on site for well over a month. IOW, every battery in the place started venting, and it took over a month without any air circulation for the concentration to reach 4%.

      --

      If I mod you up, it doesn't necessarily mean I agree with what you've said, sorry.

    4. Re:Explosion? by RGRistroph · · Score: 3, Insightful

      Haven't you ever seen one of those gray, garbage-can-sized transformers on a pole explode? I used to live in a neighborhood right across the tracks from some sort of electrical switching station; they had rows of those things in a lot covered with white gravel. Explosions violent enough to feel like a grenade going off a hundred yards away were not uncommon. I think most of them were simply high-voltage arcing vaporizing everything and producing a shock wave, but sometimes the can-type transformers, which are filled with cooling oil, exploded and sprayed burning oil everywhere.

      At one place I worked, during every lightning storm my boss would rush to move his shitty old truck underneath the can on the power pole, hoping the thing would blow and burn it so he could get insurance to replace it.

    5. Re:Explosion? by kyriosdelis · · Score: 1

      Geordi was probably trying to reverse the polarity as usual...

      --
      I don't mind dating a girl that has been with everybody, as long as she had a good shower afterwards.
    6. Re:Explosion? by kylegordon · · Score: 1

      "Early indications are that the short was in a high-volume wire conduit."

    7. Re:explosion? by guruevi · · Score: 3, Funny

      As always, you should've left it with support, they usually know what to do with it and that's where all the junk ends up anyway.

      --
      Custom electronics and digital signage for your business: www.evcircuits.com
    8. Re:Explosion? by womenwantmefishfearm · · Score: 5, Interesting
    9. Re:Explosion? by Eil · · Score: 1

      According to what I've heard/read (large shaker of salt required), a transformer inside the data center exploded due to a short in one of the high-voltage circuits. The transformer was in some kind of electrical closet, and the explosion knocked down three brick walls around it.

    10. Re:explosion? by BillTheKatt · · Score: 1

      Put it where it belongs...with marketing.

    11. Re:Explosion? by Bill,+Shooter+of+Bul · · Score: 1

      Yeah, I haven't killed AGMs yet, but the older lead-acid batteries would sometimes catch fire. It's a very scary sight to see a plastic container of acid burning in a third-world country with no fire extinguisher. If you do see it, please check that the first rag you find to extinguish the fire is not soaked in diesel. That's even more frightening.

      --
      Well.. maybe. Or Maybe not. But Definitely not sort of.
    12. Re:Explosion? by John+Hasler · · Score: 1

      Large oil-cooled transformers are quite capable of exploding.

      --
      Warning: this article may contain humor, sarcasm, parody, and perhaps even irony. Read at your own risk.
    13. Re:explosion? by $0.02 · · Score: 2, Funny

      They didn't store dynamite. They used Sony batteries.

      --
      If enithin kan gow rong it whil. (Murfey)
    14. Re:Explosion? by Anonymous Coward · · Score: 0

      I'm no expert in the field, but if the HVAC units failed, wouldn't all the hardware overheat and get somebody over there long before all that hydrogen built up?

    15. Re:Explosion? by Gazzonyx · · Score: 1

      That's what I thought, too, but I'm not sure of the details at all.

      --

      If I mod you up, it doesn't necessarily mean I agree with what you've said, sorry.

    16. Re:Explosion? by Gazzonyx · · Score: 1

      Worse. Day. Ever!

      I'm sure in retrospect, it's kind of like a funny Three Stooges moment. I'd bet at the time you didn't see the humor in it, though.

      Another really nice thing about the AGMs is that you can leave them at any angle (upside down, on their side, etc) and they don't build up internal 'sludge' on the plates and die like the lead acid could. Disclaimer: Or so I've heard, I've not seen it nor have I used a Lead Acid battery.

      --

      If I mod you up, it doesn't necessarily mean I agree with what you've said, sorry.

    17. Re:Explosion? by Gazzonyx · · Score: 1

      *sigh*

      Worst. Day. Ever!

      --

      If I mod you up, it doesn't necessarily mean I agree with what you've said, sorry.

    18. Re:Explosion? by Bill,+Shooter+of+Bul · · Score: 1

      Well, worst day, not involving fire arms, ever.

      --
      Well.. maybe. Or Maybe not. But Definitely not sort of.
    19. Re:Explosion? by Viceroy+Potatohead · · Score: 1

      I'm a little late on this thread, but...

      That's exactly the failure point, IMO: a transformer. Years ago I worked for an electrical company, and we did a major service on a pharmaceutical plant. The garbage can sized ones are nothing. There were 10 sub-panels with transformers ranging from refrigerator sized to 2-3x refrigerator sized (that's 0.00439 Libraries of Congress, FYI), and another 7 of roughly garbage can to range size.

      I really don't know the power requirements of a DC this size, but I have my doubts it's measurably more (and probably less). The wall of input electricity monitoring/conditioning and breakers in the "power feed" room was huge (probably 30 ft long, 7 ft high, 3 ft deep). Some of the sub-panels (with transformers) were hundreds of yards away. The building had several "power rooms".

      In the "power feed" room, though, there was one transformer, separated from the rest of the room by a stub wall, which was the feed to the mechanical room (which is where you do all the building-related control: HVAC, environmental logic, pumps, etc.).

      I've witnessed the results of a smallish transformer blowing up. It was under the street, and it blew a manhole cover off. It took about 6-8 seconds for the manhole cover to land after the transformer exploded. That's a lot of power. In the place I worked in, I suspect the transformer would have smeared $1+ million worth of equipment into the walls, knocked over the one wall that wasn't buried in the earth, and forced someone to dig up the feed and re-lay it from the service point.

      If it was a small enough place to only require a couple of transformers, I suspect they would have landed them in the power feed room, if bigger, they may easily have put one or more in the vicinity of the room. Anybody know the wattage requirements for a DC this size? It would be interesting to get a feel of the scale of this...

    20. Re:Explosion? by Alpha830RulZ · · Score: 1

      9000 servers * 500 watts/server =~ 4.5 megawatts, just for server power. AC probably adds 25-50%. So, for a high level estimate, 6 megawatts, or, at 120 volts, a 50,000 amp service.
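
      The same estimate as a small Python sketch, using only the commenter's assumed inputs (500 W per server, 25-50% added for cooling, a 120 V service, which is a simplification for the sake of the estimate). The two overhead cases bracket the 6 MW figure: 5.625 MW (about 46,900 A at 120 V) and 6.75 MW (56,250 A), while 6 MW at 120 V comes to exactly 50,000 A:

```python
# Back-of-envelope sizing from the comment above. The 500 W per server,
# the 25-50% cooling overhead and the 120 V service voltage are all the
# commenter's assumptions, not figures published for this facility.
SERVERS = 9_000
WATTS_PER_SERVER = 500
VOLTS = 120


def service_estimate(servers: int, watts_per_server: float,
                     overhead: float, volts: float) -> tuple[float, float]:
    """Return (total watts, amps at the given voltage) for a cooling overhead fraction."""
    total_watts = servers * watts_per_server * (1 + overhead)
    return total_watts, total_watts / volts


it_load_watts = SERVERS * WATTS_PER_SERVER  # 4.5 MW of server load alone
for overhead in (0.25, 0.50):
    watts, amps = service_estimate(SERVERS, WATTS_PER_SERVER, overhead, VOLTS)
    print(f"+{overhead:.0%} for cooling: {watts / 1e6:.2f} MW, {amps:,.0f} A at {VOLTS} V")
```
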

      --
      I was taught to respect my elders. The trouble is, it's getting harder and harder to find some.
  5. trying to read it by z_gringo · · Score: 5, Funny

    At this writing almost 2,400 people are trying to read it.

    Posting it on slashdot should help speed it up.

    --
    -- -- Warning. Do not stare directly at the sun.
    1. Re:trying to read it by Lorcas · · Score: 5, Funny

      Here's a new update from Urvish Vashi: To keep you up-to-date, some idiot posted this forum page on slashdot. Expect some slowdowns and interruptions trying to access this page. ps: **** you slashdot.

    2. Re:trying to read it by Flamora · · Score: 2, Funny

      That's not from Urvish, that's from the guys having to maintain the servers we run our forums off of.

    3. Re:trying to read it by Anonymous Coward · · Score: 0

      Let's blow up the next one! We can do it!

    4. Re:trying to read it by Anonymous Coward · · Score: 0

      Hey, you leave the Dallas data centers out of this.

    5. Re:trying to read it by Ksevio · · Score: 1

      Fortunately the bandwidth that would be used by 9000 servers is available for updating customers on the condition of their servers

    6. Re:trying to read it by CBravo · · Score: 1

      so people actually read tfa?

      --
      nosig today
  6. Recovery costs by Scuzzm0nkey · · Score: 5, Funny

    I wonder what the repairs will cost? I'm sure insurance covers this kind of thing, but I'd love to see hard figures, like in one of those MasterCard commercials:

    Structural damage: $15,000
    Melted hardware: $70,000
    Halon refill: $however much halon costs
    Real-life Slashdot effect: Priceless

    --
    People are like slinkies; useless but fun to watch when you push them down the stairs
    1. Re:Recovery costs by 42forty-two42 · · Score: 1

      Not to mention the cost of pulling all those consultants in, overnight, on a weekend... Also, only the electrical equipment (and structural stuff) was damaged - networking and customer servers are intact (but without power, obviously).

    2. Re:Recovery costs by macx666 · · Score: 4, Insightful

      Not to mention the cost of pulling all those consultants in, overnight, on a weekend...

      Also, only the electrical equipment (and structural stuff) was damaged - networking and customer servers are intact (but without power, obviously).

      I read that they pulled in vendors. Those types would be more than happy to show up at the drop of a hat for some un-negotiated products that insurance will pay for anyway, and they'll even throw in their time for "free" so long as you don't dent their commission.
    3. Re:Recovery costs by 42forty-two42 · · Score: 1

      The support thread talks about both, so I'd assume they (or their insurance, anyway) are paying through the nose for dozens of contractors to come in on short notice right about now.

    4. Re:Recovery costs by Deliveranc3 · · Score: 1

      But your insurance goes up when you call them in... over the long term it makes sense to pay for it yourself.

    5. Re:Recovery costs by Yetihehe · · Score: 2, Interesting

      So maybe it would make more sense to just skip their insurance?

      --
      Extreme Programming - Redundant Array of Inexpensive Developers
    6. Re:Recovery costs by the_B0fh · · Score: 1

      Exactly!

      And, after a bit of thinking, you realize that since you always pay for it yourself, you don't even need insurance anymore, and can save that money!

      Hey, did you know that 40% of all sick leave is taken on Mondays and Fridays! We should do something about that.

    7. Re:Recovery costs by aaarrrgggh · · Score: 1

      Insurance doesn't usually do much in these cases; it might cover equipment but not labor at double overtime. It is easy to spend $500k just in engineering and stop-gap repairs on an electrical-only incident. A fire would usually cause twice that much damage for stop-gap repairs.

    8. Re:Recovery costs by Geak · · Score: 3, Funny

      Maybe they'll just haul the servers to another datacenter:

      Dollys - $500, Truck rentals - $5000, Labour - $10000, Sending internets on trucks - Priceless

    9. Re:Recovery costs by zippthorne · · Score: 2, Funny

      We are doing something about that. Now sick days and personal days are pooled into one unit. So your vacations have to compete with your potentially contagious illnesses. Everybody wins!

      --
      Can you be Even More Awesome?!
    10. Re:Recovery costs by Deliveranc3 · · Score: 1

      Well at some point you stop being able to pay,

      Dr.: " I need a loan."
      Bank: "Why?"
      Dr.: "I killed someone and they're suing me I don't have insurance but if you lend me the money I can settle, $5,000,000 should do it."
      Bank: "Cough..."

    11. Re:Recovery costs by GigsVT · · Score: 1

      Halon (well, its CFC-free modern equivalents) would run you about $10,000 for a medium-sized room. Not cheap. Inergen is more costly to install, but cheaper to refill, since it's just nitrogen, argon, and CO2. Both are about equally safe for occupants in a room where they go off (i.e. extremely safe).

      --
      I've had enough abrasive sigs. Kittens are cute and fuzzy.
    12. Re:Recovery costs by Jellybob · · Score: 1

      From the looks of it, that's what they did for some of their critical services.

      Moving an entire data centre overnight would be a bit of a crazy thing (although I've heard of it being done), but if you've got a few servers that your business can't run without, it makes sense.

      Having a hot spare already in the other data center would make even more sense, but sometimes it takes a crisis to realise just how much you depend on something.

    13. Re:Recovery costs by Anonymous Coward · · Score: 0

      Halon? You think they use Halon? HAHAHAHAHAHBWAAHAHAHAHAHA. How naive.

    14. Re:Recovery costs by jesboat · · Score: 1

      They also are going to have to reimburse all the affected customers (assuming they all have SLAs). That's not too cheap either.

  7. Murphy's Law by Anonymous Coward · · Score: 1, Interesting

    While they claim redundant power, because of the nature of the problem they had to go completely dark. This goes to show that no matter how much planning you do, Murphy's Law still applies.

    And then they put it on the front page of Slashdot.

    It was Sunday, June 1, 2008. Xeon, my children, just don't belong in some places.

    (About the only thing missing from this real-world version of the story is a YouTube video of a halon fire suppression system going off. Damn ozone-protection regs :)

  8. This is BAD KARMA!! by Izabael_DaJinn · · Score: 5, Funny

    Clearly this is bad karma resulting from all their years of human rights violations....especially Tiananmen Square...oh wait--

    --
    Careful What You Wish For....
    1. Re:This is BAD KARMA!! by Anonymous Coward · · Score: 0

      Didn't George Bush come from Texas?

  9. What does a server room by iminplaya · · Score: 2, Funny

    have that can explode like this? All I can think of are all those cheap electrolytic caps. They really do put on quite a show, don't they? Put the transformer up on the roof, ok?

    --
    What?
    1. Re:What does a server room by Hijacked+Public · · Score: 3, Insightful

      Probably less traditional explosion and more Arc Flash.

      --
      "Sacrifice for the good of The State" - The State
    2. Re:What does a server room by masonc · · Score: 1

      Nothing. The electrical room lost three walls. No servers were damaged but the fire department wanted to play safe and did not allow them to power up the backup systems. I have servers there, and I agree with the fire people. The internet will survive without us for a day or so.

      --
      CM www.cometenergysystems.com Blog: http://caribbeanrenewable.blogspot.com/
    3. Re:What does a server room by plantman-the-womb-st · · Score: 1

      My first guess was fuel tanks for the backup generators. Though, those are usually diesel, and diesel fuel doesn't explode unless it's squeezed really hard. And also, this is just a guess. After all, I don't *actually* give two shits, but the comments are fun to read.

      --
      Say bad words about my book, in cold oatmeal, or I shall sue!
    4. Re:What does a server room by pclminion · · Score: 1

      So it wasn't Explosion Traditional, but more like Explosion Pro? Perhaps Explosion Pro Gold Edition?

    5. Re:What does a server room by legoman666 · · Score: 1

      have that can explode like this? All I can think of are all those cheap electrolytic caps. They really do put on quite a show, don't they? Put the transformer up on the roof, ok?

      You know the big cans you see on utility poles on the side of the road? Those are filled with mineral oil. It is flammable.

      Like this: http://www.youtube.com/watch?v=fzbQjd_Oo4Q

      You can hear the arcing, then see the oil fly everywhere as the transformer bursts, and then it catches on fire. Something like this, albeit on a smaller scale, probably happened.

    6. Re:What does a server room by CptNerd · · Score: 2, Interesting

      From what they were saying (I'm a customer, with both servers in that datacenter), it was a high-voltage transformer, so it might very well have been one that size. They did say it was much larger than the kind on power poles, but gave no indication of exactly how much it was handling. This is probably one of those times when architecture and aesthetics took priority over safety when the building was built. I would have thought a transformer as large as the one that blew up would be outside the building proper. At any rate, it's a major fustercluck that's going to take time to fix.

      Maybe in the post-mortem, someone will figure out it's time to start looking at ways to use less power, maybe switching to servers that use the lower-power CPUs that are coming out, so that the very high power infrastructure isn't as necessary. I have a feeling there'll be a "fire sale" on server subscriptions once a lot of customers leave (I'm not one of them, but I will likely swap one of mine for another at another location, much much later).

      --
      By the taping of my glasses, something geeky this way passes
    7. Re:What does a server room by enosys · · Score: 1

      Actually I think that since PCBs have been outlawed transformers use flammable oils and so there could have been an explosion.

    8. Re:What does a server room by Rub1cnt · · Score: 1

      Well, if they used the same UPS batteries that ATT Uverse did, I find it quite easy to see an explosion like this. ATT's streetside boxes in Houston had faulty batteries that couldn't handle the thermal stress inside them. One of the boxes exploded and threw its steel door across a 4-lane street, severely damaging a car parked on the other side in the path of the flying door. This prompted ATT to replace all the UPS batteries in the boxes, a project that is still under way in our area. The explosion bowed the wall in the bunker, so it must have been pretty big. Man... I wonder if the seismographs at the universities around the DC picked up the explosion... :)

      --
      Remember, it's not paranoia if they really ARE out to get you... :)
    9. Re:What does a server room by Lodragandraoidh · · Score: 1

      You are forgetting the UPSes: essentially batteries, which can give off explosive hydrogen gas...

      --

      Lodragan Draoidh
      The more you explain it, the more I don't understand it. - Mark Twain
  10. Kevin Hazard? by Pyrex5000 · · Score: 3, Funny

    I blame Kevin Hazard.

    1. Re:Kevin Hazard? by Anonymous Coward · · Score: 0

      I blame Kevin Mitnick. Remember, he could start a war from a telephone so it's not a complete stretch that he could manage something like this.

  11. OVER 9000?!?! by Anonymous Coward · · Score: 0

    What does the scouter say about the servers affected? IT'S OVER 9000!!!!

    1. Re:OVER 9000?!?! by Anonymous Coward · · Score: 0

      WHAT, NINE THOUSAND?

  12. Helpful Slashdot! by quonsar · · Score: 5, Funny

    At this writing almost 2,400 people are trying to read it

    and as of this posting, make that 152,476.

    1. Re:Helpful Slashdot! by Amigori · · Score: 1

      And you know it's bad when the Coral Cache is running slower than the nearly slashdotted forum itself. 3100+ users right now in the official forum.

      --
      "The quality of life is determined by its activities."--Aristotle
    2. Re:Helpful Slashdot! by FinchWorld · · Score: 1
      and as of this posting, make that 152,476.

      That many people bother to read more than the summary!?!

      --
      "I may be full of crap about this game, and I may be wrong, and that's fine." -Jack Thompson
    3. Re:Helpful Slashdot! by Volatar · · Score: 1

      Db/. (Death by Slashdotting) strikes again.

    4. Re:Helpful Slashdot! by Rub1cnt · · Score: 1

      I dunno, I still think taking the crippled New Orleans ISP offline by slashdotting was hilarious... That resulted in a total blackout of all internet in the NOLA area when someone linked a video hosted by DirectNic directly to Slashdot's front page. :) How to spike a T1 in two easy steps:

      1. Post video for NOLA residents on front page of NOLA website.
      2. Link video to Slashdot's front page.
      Optional step 3. Sit back and sip caffeinated bev of choice.

      --
      Remember, it's not paranoia if they really ARE out to get you... :)
  13. Who's hosted on ThePlanet? by ZipK · · Score: 1

    So who's missing from Al Gore's Internet? Who do we know who's hosted on ThePlanet?

    1. Re:Who's hosted on ThePlanet? by gmack · · Score: 1

      keenspot for anyone who likes online comics...

    2. Re:Who's hosted on ThePlanet? by kevinbr · · Score: 1

      Me! I have had a server there for years, since the RackShack days. I never expected such an outage from one of the world's largest hosting companies. My machine was up three years straight, and then a year ago it hung. But it has been up since then.

    3. Re:Who's hosted on ThePlanet? by strredwolf · · Score: 1

      Keenspot's hosted at Hurricane Electric in California. We're still up, but we're doing some work to the servers and consolidating them a bit along with Comic Genesis (Keenspot's free-for-all service). So you may not find many of your comics for a short (one day) while.

      --
      # Canmephians for a better Linux Kernel
      $Stalag99{"URL"}="http://stalag99.net";
    4. Re:Who's hosted on ThePlanet? by Anonymous Coward · · Score: 0

      FamilyReunion.com, the world's largest social networking site for family gatherings, is dark since the explosion. Weekends are usually very busy for us, too, while families gather online to plan their reunions. Of course, there's no good day of the week for something like this to happen. :)

    5. Re:Who's hosted on ThePlanet? by gmack · · Score: 1

      That explains that, then. Let's hear it for twisted coincidence.

    6. Re:Who's hosted on ThePlanet? by imipak · · Score: 1
      b3ta.com!

      Oh, and thousands of dull corporate brochureware sites.

    7. Re:Who's hosted on ThePlanet? by bit+trollent · · Score: 1

      A startup company I used to work for hosted their servers at The Planet. I took a tour of The Planet's server room back when I was working for that company. It's a pretty cool place. I wonder what it's like after an explosion.

    8. Re:Who's hosted on ThePlanet? by linal · · Score: 1

      That would explain why I haven't been able to get my fix! Where else am I meant to go to find images that have been photoshopped using Paint?

    9. Re:Who's hosted on ThePlanet? by Flamora · · Score: 1

      The server room itself is fine since it's pretty far away from the power room.

      The power room is the one you'd wanna see, what with the three missing walls.

    10. Re:Who's hosted on ThePlanet? by hostyle · · Score: 1

      Eh, you left it up but hung for the last year? Now that's a well-hung server... in a nice rack too, I suppose!

      --
      Caesar si viveret, ad remum dareris.
    11. Re:Who's hosted on ThePlanet? by bcrowell · · Score: 1

      I've had a couple of my sites (lightandmatter.com, theassayer.org) hosted there since they were Rackshack. IIRC, they renamed themselves to EV1, then got bought by The Planet. EV1 is the company that made themselves unpopular with the open-source community by paying protection money to SCO. I thought about changing hosts due to the SCO episode, but ended up not wanting to go through that kind of hassle and expense on a matter of principle. Their reliability has generally been pretty good. Support has been miserable on the few occasions when I've had to use it, but that seems par for the course with $100/mo webhosting. Now that The Planet has bought them, and SCO is essentially history, I'm starting to feel like the cooties have worn off, and I'll probably stick with them.

    12. Re:Who's hosted on ThePlanet? by ribit · · Score: 1

      Our site is down... Car Design News http://www.cardesignnews.com/ Our server is up (in Datacenter 2), but unfortunately our legacy EV1 nameservers are in Datacenter 1... We'll look at getting redundant nameservers set up after this (never thought we would need them, and probably never will, actually).

    13. Re:Who's hosted on ThePlanet? by jacquesm · · Score: 1

      /me raises hand...
      ww.com and a bunch of other stuff...

    14. Re:Who's hosted on ThePlanet? by Noctilux · · Score: 1

      I'm a customer of theirs. I've had my little Sun Cobalt RaQ 4i since 2001, when I got it from RackShack (who became EV1, who became The Planet). I share space with lots of my artist friends. No big loss. :)

    15. Re:Who's hosted on ThePlanet? by aronschatz · · Score: 1

      I'm hosting in that DC that went down.

      www.aselabs.com
      www.ase.cc

      I'm pretty pissed. I'm going to get geographically redundant servers after this. I never expected ThePlanet to crap out like they did.

      Another thing, it took HOURS before a response was given to the situation. I can't stand people saying the communication was good. It wasn't for the first few hours.

      When you call support and get a fast busy when your server is down, how are you to know that the place didn't blow up or go bankrupt and cut its losses? Very troubling.

      I never had ANY issue like this with EV1 (which ThePlanet ate).

    16. Re:Who's hosted on ThePlanet? by aronschatz · · Score: 2, Insightful

      ThePlanet dropped the ball on redundant DNS. They had all the EV1 nameservers at that DC which is completely ridiculous...
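      The situation ribit and aronschatz describe (servers up in one datacenter, every nameserver in the other) is the single-site failure that RFC 2182 warns about. A crude first check is whether all of a domain's nameservers sit in the same network block. This is a minimal sketch, assuming you already have the nameservers' IPv4 addresses; the addresses below are documentation-range placeholders, not EV1's real ones, and grouping by /24 is an arbitrary heuristic. Two different /24s can still live in the same building, so a False result does not prove geographic diversity.

```python
import ipaddress


def nameserver_networks(ns_ips, prefix=24):
    """Map each nameserver IPv4 address to its enclosing /prefix network."""
    return {ipaddress.ip_network(f"{ip}/{prefix}", strict=False) for ip in ns_ips}


def lacks_network_diversity(ns_ips, prefix=24):
    """True if every nameserver falls in the same /prefix block.

    Heuristic only: sharing a block strongly suggests a single site,
    but different blocks do not guarantee different sites.
    """
    return len(nameserver_networks(ns_ips, prefix)) < 2


# Placeholder addresses from the RFC 5737 documentation ranges.
same_site = ["192.0.2.10", "192.0.2.11"]
split_sites = ["192.0.2.10", "198.51.100.53"]

print(lacks_network_diversity(same_site))    # True
print(lacks_network_diversity(split_sites))  # False
```

      Running the same check against your own registrar's NS records before an outage is cheaper than discovering the problem during one.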

    17. Re:Who's hosted on ThePlanet? by TheNeticule · · Score: 1

      I know statcounter.com and proboards.com are 2 high profile sites hosted there.

    18. Re:Who's hosted on ThePlanet? by Anonymous Coward · · Score: 0

      b3ta.com!


      Ahhh, righty, I thought it was just my connection.

      Bugger.
    19. Re:Who's hosted on ThePlanet? by CptNerd · · Score: 1

      Nobody knows who I am, but I've been there since they were RackShack. I used to run a web-hosting company, and I had my customers on there, but I moved most of them off to other companies when I shut my company down in '04. Maybe this will be the incentive to get the rest moved off. Mules and 2x4's and all that...

      --
      By the taping of my glasses, something geeky this way passes
    20. Re:Who's hosted on ThePlanet? by thyrf · · Score: 1

      I'll second that. It hasn't even been two days and I'm already going through withdrawal.

    21. Re:Who's hosted on ThePlanet? by LostCluster · · Score: 2, Interesting

      RackShack was also the company with the "screwdriver incident," where a tech working in the power room dropped the tool into a UPS and shorted out the facility. No customer data was lost, but the power outage caused them to be offline for more than a day.

    22. Re:Who's hosted on ThePlanet? by IDreamInCode · · Score: 1

      I had a bunch of sites affected by the DNS going out as well. For the important ones, I went in, changed their DNS at the registrar to use GoDaddy's DNS, then pointed the records directly at the server's IP address. Works fine as a temporary fix. For my smaller clients, I told them what was going on, and they gladly said they'd just wait it out.

    23. Re:Who's hosted on ThePlanet? by Anonymous Coward · · Score: 0

      InternetDJ.com

      Forgot my slashdot pw, can't receive it due to mail servers down.
      I was at RackShack, then moved over to ThePlanet. My site generates about 3MM pvs/month. About 5 years ago, they formatted my primary HDDs, where all of our members' MP3s and videos are stored. It took days to restore from backup. They credited me something like 10 bucks.

      When TP goes down, they take me down hard. Is Serverbeach any good? I'm preparing my offsite restore scripts if TP is not up by tomorrow afternoon.

    24. Re:Who's hosted on ThePlanet? by Jellybob · · Score: 1

      Another thing, it took HOURS before a response was given to the situation.


      From the sounds of it their support department is based in the data centre that went down, so I'd guess they were all stood outside waiting for the fire department to let them back in.
    25. Re:Who's hosted on ThePlanet? by MrLizardo · · Score: 1

      When you call support and get a fast busy when your server is down, how are you to know that the place didn't blow up or go bankrupt and cut its losses? Very troubling.

      Sorry to point out the obvious, but the place did blow up!
      --
      ^I'm with stupid.^
  14. Photos or informaton on building? by PPH · · Score: 3, Insightful

    Being in the power systems engineering biz, I'd be interested in some more information on the type of building (age, original occupancy type, etc.) involved.

    To date, I've seen a number of data center power problems, from fires to isolated, dual-source systems that turned out not to be. It raises the question of how well the engineering was done for the original facility, or for the refit of an existing one, or whether proper maintenance was carried out.

    From TFA:

    electrical gear shorted, creating an explosion and fire that knocked down three walls surrounding their electrical equipment room.

    Properly designed systems should never allow a fault to become uncontained in this manner.
    --
    Have gnu, will travel.
    1. Re:Photos or informaton on building? by Anonymous Coward · · Score: 0

      Blah blah blah. Can you say "hindsight is always perfect"?

      It's easy to blame the victim for not having taken precautions after something bad happens, but, let's face it, while we're supposed to learn from our mistakes, it's still human to make mistakes, and nobody will ever be perfect.

      Not you, either, FWIW, so get off of your high horse.

    2. Re:Photos or informaton on building? by gmack · · Score: 1

      Yeah, but as we both know, in these days of excessive growth, infrastructure tends to lag behind more visible changes.

      I'm sure at one point it was well designed.. but that was, I'm guessing, a few years ago and at a lot lower current and more than a few modifications ago.

      That's of course not counting the possibility of contractor stupidity.

      I don't know what makes people so freaking stupid when it comes to electricity. But then, I'm still annoyed by a roofing contractor having two employees' lives saved in three days by the fact that whatever screw they were using melted before they got electrocuted.

    3. Re:Photos or informaton on building? by Gazzonyx · · Score: 1

      The short happened in a conduit (behind a wall, I'm assuming), FWIW.

      --

      If I mod you up, it doesn't necessarily mean I agree with what you've said, sorry.

    4. Re:Photos or informaton on building? by p0tat03 · · Score: 4, Informative

      I'm a mechanical/electrical engineer by training, and what you're saying makes no sense to us. Mistakes are made in the laboratory, where things are allowed to blow up and start fires. Once you hit the real world the considerations are *very different*. While it's possible that this fire could be caused by something entirely unforeseeable (unlikely given our experience in this field), it's also possible that this was due to improperly designed systems.

      I don't suppose you'd be singing the same tune if this was a bridge collapse that killed hundreds. There's a reason why engineering costs a lot, and that's directly correlated to how little failure we can tolerate.

    5. Re:Photos or informaton on building? by Anonymous Coward · · Score: 0

      It's easy to blame the victim for not having taken precautions after something bad happens

      The engineering firm they hired is paid not to make mistakes. Sure, they're allowed mistakes too, as long as they're "novel". But an engineering firm is up shit's creek if there was an onsite accident and they didn't follow best practices.

    6. Re:Photos or informaton on building? by Burdell · · Score: 1

      Properly designed systems should never result in any fault to become uncontained in this manner.

      That's nice in theory, but this is the real world. I live in Huntsville, Alabama, and we had a power failure a couple of weeks ago when the water treatment facility had a power problem that resulted in a fire, blowing a transformer and taking a whole substation (that feeds a big chunk of the south end of town) off-line (and leaving us without 36 million gallons per day of water pumping capacity as well). Basically, you can design and plan all you like, but unexpected stuff still happens and causes bigger problems than you ever expected.
    7. Re:Photos or informaton on building? by xaxa · · Score: 2, Interesting

      I was very impressed that a new bridge that was being extended over a busy railway line didn't cause any damage when they dropped it (they were lucky no trains were going under the bridge at the time, it's a very busy railway line -- about 40 trains in the next hour on a Sunday night, so you can imagine what it's like on a weekday. It did cause massive disruption, as they closed the line. And I don't know why they didn't have backup jacks if the failure of one left it unsupported.)

      I know it's not really relevant, but I didn't realise I was so interested in construction/engineering before reading about the past year's worth of posts on that blog (well, the construction ones. Not the "I was first on the new train!" ones. Though I admire the guy's dedication, to be awake at 4.00 to get the first ever train from the new Heathrow Airport station or whatever).

    8. Re:Photos or informaton on building? by PPH · · Score: 1

      I'm asking the questions from the point of view of a root cause analysis.

      Bridges just don't 'fall down' and switchgear doesn't just 'blow up'. It was either designed improperly or poorly maintained.

      The cost of proper engineering and maintenance over the life of a bridge, for example, can be factored in, given the requirement that we will not tolerate loss of life. If that turns out to be too high, we can place a value on each potential life lost to set a cap on those costs.

      Likewise, a server center can be designed and operated to some failure probability level and guaranteed minimum downtime. Failures will still occur, but at least we can go back and look at each one to see if it fell within the budget (proper design and operation should have prevented this) or outside it (we can't afford a system that will truly never fail). From the point of view of the customer, are they getting what they paid for?

      Based on the (limited) description of the failure effects (an uncontained fault), this might fall within the realm of an unsafe system, not just an unreliable data center. An inspection authority (the fire department, for example) could have mandated repairs, had the condition been detected prior to the fault. Or even had the facility closed, pending repairs.

      --
      Have gnu, will travel.
    9. Re:Photos or informaton on building? by aaarrrgggh · · Score: 2, Insightful

      This isn't that uncommon with a 200 kAIC board with air power breakers if there is a bolted fault and the breakers rely on time delays instead of instantaneous trips. Newer insulated-case style breakers all have an instantaneous override, which limits fault energy.

      The other possibility was that a tie was closed and the breakers over-dutied and could not clear the fault.

      Odd that nobody was hurt though; spontaneous shorts are very rare-- most involve either switching or work in live boards, either of which would kill someone.

    10. Re:Photos or informaton on building? by Anonymous Coward · · Score: 0

      engineers are allowed mistakes? even novel ones?

      can i have a fax number i can send my resume to? my job's been hell ever since i made it through 8 years of university whilst being told that no mistake is ever acceptable. ever.

    11. Re:Photos or informaton on building? by dissy · · Score: 1

      Properly designed systems should never result in any fault to become uncontained in this manner.

      So when lightning strikes your home, and hits the power line as it enters the house, or on the pole if it happens to be behind your back yard, nothing bad will happen?

      Wow, you're hired!
    12. Re:Photos or informaton on building? by fishbowl · · Score: 1

      >So when lightning strikes your home, and hits the power line as it enters the house, or on the pole if it happens
      >to be behind your back yard, nothing bad will happen?

      There should be another home with totally separate power lines, and a different family living in it, in another city far away.

      Redundancy, man.

      --
      -fb Everything not expressly forbidden is now mandatory.
    13. Re:Photos or informaton on building? by John+Hasler · · Score: 1

      > So when lightning strikes your home, and hits the power line as it enters the house,
      > or on the pole if it happens to be behind your back yard, nothing bad will happen?

      If you want to spend enough money (as you would were you, for example, operating a 7500 server datacenter that claimed "redundant power") I could design such a system for you.

      --
      Warning: this article may contain humor, sarcasm, parody, and perhaps even irony. Read at your own risk.
    14. Re:Photos or informaton on building? by dissy · · Score: 1

      >So when lightning strikes your home, and hits the power line as it enters the house, or on the pole if it happens
      >to be behind your back yard, nothing bad will happen?

      There should be another home with totally separate power lines, and a different family living in it, in another city far away.

      Redundancy, man.

      Of course! But that wasn't what the person I replied to was saying. Perhaps you should reply to them to let them know :)

      Properly designed systems should never result in any fault to become uncontained in this manner.

      Everyone else expects such things to be possible, so they plan around them (aka redundancy), but the GP stated this is not needed because a properly designed system can handle it.

      And while I am sure there are electrical systems out there that could handle the hundreds of thousands of volts, and god knows how much amperage, that lightning contains, a simple three-phase 480-volt power grid is not one of them :)

    15. Re:Photos or informaton on building? by aaarrrgggh · · Score: 1

      As much as it shouldn't happen, the law of unintended consequences rears its ugly head often enough when systems are designed for reliability first and safety a close second.

      Per the NEC (National Electrical Code), this type of consideration is acceptable where losses associated with a failure could exceed the value provided by the safety improvement. The key is that only qualified people work on the system.

      Ten to fifteen years ago, designing a main-tie-main switchboard for closed-transition operation, you would assume the duty rating to be that of the single-ended configuration, not parallel operation of both feeders. It was standard practice. Today, an overlapping transfer switch (performing the same basic function, only 1,000x faster) is considered to be a permanent tie between systems, so the short-circuit duty rating of the equipment must be considerably higher.

      The reason for the change is that a theoretical potential for a problem turned into a real problem one or two times, and "best practices" changed.

      This happens constantly. Underlying assumptions change in subtle ways, and an unforeseen (or purely academic) risk becomes real.

      I have a client that won't do an upgrade that we have recommended, at least in a time-line I am comfortable with. I know that this failure is likely to lead to a major outage, and I cross my fingers that nobody gets hurt when it happens. Other clients have similar nagging problems that they are aware of, but you can't fix everything in a day. Some problems take several years to get repaired. Welcome to the world of capital planning.
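      To put rough numbers on the main-tie-main point above: a common back-of-envelope model for a transformer's available bolted-fault current is the infinite-bus approximation, full-load amps I_FLA = kVA x 1000 / (sqrt(3) x V), and I_sc = I_FLA / Z (per unit). This is a minimal sketch under stated assumptions: the 2,500 kVA, 480 V, 5.75% impedance transformer is hypothetical, not ThePlanet's actual gear, and the model ignores utility source impedance, motor contribution, and cable impedance, so it overstates fault current somewhat. It only illustrates why a permanently tied board sees roughly double the duty of a single-ended one.

```python
import math


def full_load_amps(kva: float, volts_ll: float) -> float:
    """Three-phase full-load current (A) from rating (kVA) and line-to-line voltage (V)."""
    return kva * 1000 / (math.sqrt(3) * volts_ll)


def bolted_fault_amps(kva: float, volts_ll: float, z_percent: float) -> float:
    """Infinite-bus estimate of symmetrical fault current at the transformer secondary."""
    return full_load_amps(kva, volts_ll) / (z_percent / 100)


def tied_fault_amps(sources) -> float:
    """Fault current at a bus fed by several sources paralleled through a closed tie.

    Under the infinite-bus assumption each source's contribution simply adds.
    """
    return sum(bolted_fault_amps(kva, v, z) for kva, v, z in sources)


# Hypothetical transformer: 2,500 kVA, 480 V, 5.75 % impedance.
xfmr = (2500, 480, 5.75)

single = bolted_fault_amps(*xfmr)
tied = tied_fault_amps([xfmr, xfmr])

print(f"single source:    {single / 1000:.1f} kA")
print(f"two sources tied: {tied / 1000:.1f} kA")
```

      With these assumed figures it prints about 52.3 kA for one source and 104.6 kA with both tied. Gear whose duty rating was chosen for the single-ended case can end up applied well beyond its interrupting rating once the tie is treated as permanent.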

    16. Re:Photos or informaton on building? by Code.C6 · · Score: 1
      The Houston facilities had quite a few problems due to the EV1 startup not really having the organization a company with its rate of growth would need. They didn't acquire the best equipment money could buy. It seemed more like they would spend as little as possible on everything, from network equipment and its configuration to its employees.

      Doug is pretty charismatic, and likes to seed big thoughts into the minds of those around him, but all in all - it's about the bottom line with him. Quantity over quality.
      This event really doesn't surprise anyone.

      Since partnering with The Planet, the bar was raised. Legacy EV1 equipment will eventually be decommissioned and new top notch equipment installed as well as a more professional configuration of the network schema.

      I'm sure that after this little issue there will be more proactive monitoring in the power rooms at each facility, beyond the temperature readings taken from the CRAC units.

    17. Re:Photos or informaton on building? by Anonymous Coward · · Score: 0

      Yes, once you hit the *real world* you have to worry about things like actually having enough capacity to run all the servers and having only limited space, limited money and (potentially) limited time to deal with the power supply. Engineering every single component for full redundancy and full isolation so there's absolutely no way the massive amounts of energy being pumped through there could possibly cause any kind of damage or cause any downtime would be monumentally expensive. It'd be so expensive, that almost nobody would build datacentres at all - it just wouldn't be financially viable unless you're charging every customer millions of dollars per year for a rack.

      I don't suppose you'd be singing the same tune if this was a bridge collapse that killed hundreds.

      Well no, but then nobody died in this incident either, did they? Even in a worst case scenario, only a few employees had the possibility of being killed by this accident. Expecting the same engineering standards as would be used in a bridge construction is stupid. Regardless, the fact that bridges do collapse and that the US continues to severely underfund bridge maintenance clearly demonstrates that even in areas where there's a high risk of loss of life, the cost/benefit analysis always comes out in favour of "doing the best you can with what you can afford to spend" rather than "it's too expensive to do it perfectly so let's not do it at all".

      I think you might be confusing the Real World with your lab. In a lab, you have complete control over every variable and there's no business pressure to produce a workable solution that actually fits within a budget. The Real World doesn't work like that.

    18. Re:Photos or informaton on building? by PPH · · Score: 1

      I think you might be confusing the Real World with your lab. In a lab, you have complete control over every variable and there's no business pressure to produce a workable solution that actually fits within a budget. The Real World doesn't work like that.

      More than likely, your Real World is the one where Bob's One Van Electric Service bids the job cheap. I work in a world where, when things like this happen, we ask what went wrong, what can we do to see that it doesn't happen again. We don't sell our van, close the business and stop answering the phone. In other words, one where customers pay licensed professional engineers to spec the parts correctly so they are suited to the available fault currents.

      Neither you nor I know if building or maintaining this system correctly would really have cost the customer any more. I've seen gold-plated systems delivered that didn't do what they were spec'd to do. In those cases the customer could afford to pay, but they got crap anyway.

      --
      Have gnu, will travel.
    19. Re:Photos or informaton on building? by p0tat03 · · Score: 1

      I think you might be confusing the Real World with your lab. In a lab, you have complete control over every variable and there's no business pressure to produce a workable solution that actually fits within a budget. The Real World doesn't work like that.

      The licensed professional engineer has a moral, ethical, and legal obligation to produce safe products. Regardless of job security pressures, the engineer can, and *must* push back against any attempts by management to compromise the safety of a system for a quick buck. Cheap shit is allowed to fail (you get what you pay for), but under no circumstances should it EXPLODE.

      Doing things on the cheap does not excuse the engineer from improper practice. Yes, using inferior designs, materials, and methodologies can get you a cheaper product, but in the end its failure modes still need to be enumerated and controlled.

  15. sergeant schlock by Anonymous Coward · · Score: 0

    schlockmercenary.com is down. apparently they lost.

    1. Re:sergeant schlock by Anonymous Coward · · Score: 0

      Or rather, someone got even.

  16. Blank Label Comics, Schlock Mercenary by strredwolf · · Score: 1, Informative

    Schlock Mercenary, the popular webcomic, as well as most of the Blank Label Comics collective, is down. Schlockmercenary.com now points to a placeholder site, and Sunday's comic is on the LiveJournal community at http://schlocktroups.livejournal.com/.

    --
    # Canmephians for a better Linux Kernel
    $Stalag99{"URL"}="http://stalag99.net";
    1. Re:Blank Label Comics, Schlock Mercenary by strredwolf · · Score: 2, Informative
      --
      # Canmephians for a better Linux Kernel
      $Stalag99{"URL"}="http://stalag99.net";
    2. Re:Blank Label Comics, Schlock Mercenary by dhanes · · Score: 1

      Egads! Thank you. Today hasn't been complete without my Schlock fix.

      --
      Wait, What?
  17. Explosion? by mrcdeckard · · Score: 3, Insightful


    The only thing that I can imagine that could've caused an explosion in a datacenter is a battery bank (the data centers I've been in didn't have any large AC transformers inside). And even then, I thought the NEC had some fairly strict codes about fire walls, explosion-proof vaults and the like.

    I just find it curious, since it's not unthinkable that rechargeable batteries might explode.

    mr c

    --
    "Physics is like sex. Sure, it may give some practical results, but that's not why we do it." - R. Feynman
  18. Coral cached LOFI status page by martyb · · Score: 4, Informative

    Kudos to them for their timely updates on system status. Having their status page listed on /. doesn't help them much, but I was encouraged to see a Coral Cache link to it. In that light, here's a link to the Coral Cache lo-fi version of their status page:

    • http://forums.theplanet.com.nyud.net:8080/lofiversion/index.php/t90185.html
    1. Re:Coral cached LOFI status page by pcgabe · · Score: 1

      I'm clicking on your link, but nothing is happening. Am I doing it wrong?

      --
      Don't put advice in your sig.
  19. Lithium Batteries in their UPS setup?? by Zymergy · · Score: 2, Interesting

    I wonder what UPS/generator hardware was in use.
    Where would the failure (short/electrical explosion) have to be to cause everything to go dark?
    It sounds like the power distribution circuits downstream of the UPS/generator were damaged.

    Whatever vendor provided the now-vaporized components is likely praying that the specifics are not mentioned here.

    I recall something about Lithium Batteries exploding in Telecom DSLAMs... I wonder if their UPS system used Lithium Ion cells?
    http://www.lightreading.com/document.asp?doc_id=109923
    http://tech.slashdot.org/article.pl?sid=07/08/25/1145216
    http://hardware.slashdot.org/article.pl?sid=07/09/06/0431237

    1. Re:Lithium Batteries in their UPS setup?? by Anonymous Coward · · Score: 2, Informative

      If you'd read the linked status report, you'd see that there was a short in a high voltage line. They are dark because the fire department told them not to power up their back-up generators.

    2. Re:Lithium Batteries in their UPS setup?? by RGRistroph · · Score: 1

      A stationary, installed UPS would never use lithium batteries. Unless weight is a factor, they do not compete with lead-acid batteries.

    3. Re:Lithium Batteries in their UPS setup?? by ajlitt · · Score: 1

      A short that causes an arc to form in a conduit carrying high voltage at high currents is enough to cause quite an explosion without any solid or liquid explosives.

    4. Re:Lithium Batteries in their UPS setup?? by Anonymous Coward · · Score: 1, Informative

      Having worked there previously, I can tell you their battery systems use (literally) tons of deep-cycle lead-acid batteries. Once a year they get this badass huge shipment of Sears Craftsman deep-cycle car batteries. Each bank of batteries was... eh... roughly the size of a Mini Cooper. The process of replacing them was pretty amusing to watch, if only because the UPSes were so incredibly heavy that they needed their own reinforced concrete flooring.

  20. kaboom by rarel · · Score: 2, Funny

    Clearly these Sony batteries had to be replaced one way or another...

  21. Re:More planning could have prevented this by AudioInfecktion · · Score: 1

    They should also have had two separate demarcation points for power, with a throw switch on both sides of the backup to keep it physically separated from the farm and the grid, only to be connected when something like this happens. When you have that many servers, it's the only thing that makes sense.

  22. Re:More planning could have prevented this by Hijacked+Public · · Score: 5, Informative

    It is often the case that transformers are kept apart from all other components, and that appears to have been the case here. Had you read the article, or even the unusually accurate headline, you would know that the 9,000 servers were 'dropped' rather than 'blown apart'. They are still physically with us, they are just dropped from service because they don't have any power because the power supply blew up.

    Further, the 9,000 servers were physically, geographically, isolated enough from the power supply (which is what exploded) to be protected. We know this to be the case because we read the article and headline and understood them and they indicate that the 9,000 servers were not blown up.

    To put it another way, only the power supply was damaged by the explosion, the servers were not. Probably there was no way to isolate the power from its own explosion. The servers, however, were protected.

    So, in summary, the 9,000 servers were not blown up. Only the power.

    The power is off due to the explosion, but the servers themselves are A-OK.
    --
    "Sacrifice for the good of The State" - The State
  23. FamilyReunion.com by Anonymous Coward · · Score: 0

    We run the largest social networking site for family gatherings and saw all our domains and email services go down as it happened. The issue was indeed a power transformer that exploded and caused the fire. Thankfully no one was hurt and no servers were damaged including FamilyReunion.com, but the fire department ordered all power to the data center cut to reduce the possibility of subsequent fire hazard.

  24. 5 servers, 5 cities, 5 providers by Anonymous Coward · · Score: 2, Insightful

    I have 5 servers. Each of them is in a different city, on a different provider. I had a server at The Planet in 2005.

    I feel bad for their techs, but I have no sympathy for someone who's single-sourced, they should have propagated to their offsite secondary.

    Which they'll be buying tomorrow, I'm sure.

    1. Re:5 servers, 5 cities, 5 providers by aronschatz · · Score: 4, Insightful

      Yeah, because everyone can afford redundancy like you can.

      Most people own a single server that they back up in case it crashes, or have two servers in the same datacenter in case one fails.

      I don't see how you can easily do an offsite switchover without a huge supporting infrastructure, which most people don't have the time or money to build.

      Get off your high horse.

    2. Re:5 servers, 5 cities, 5 providers by Anonymous Coward · · Score: 0

      If you can't afford two dedicated servers in different locations, you should not be running around convincing people to make you a critical process in their business.

      If you're that small, get out of the business.

  25. The Planet explodes by Anonymous Coward · · Score: 1, Funny

    Never thought I'd see that headline.

    1. Re:The Planet explodes by CptNerd · · Score: 1

      Never thought I'd see that headline. You never read Superman comics? Planets exploding all the time, even The Daily Planet.

      --
      By the taping of my glasses, something geeky this way passes
  26. So that's why the explosion didn't wake me up. by Flamora · · Score: 1

    I live right across the street from one of The Planet's Dallas data centers, so when I saw this article, I was like, "So why wasn't I woken by an exploding generator?" Makes sense now.

    Of course, I still have to go to work on Wednesday now, too. Bah.

  27. Re:More planning could have prevented this by Gazzonyx · · Score: 2, Informative

    No, the power was off because the fire department told them to shut it off (during an investigation, I assume). The explosion was in a high power conduit - I'm sure it severed all the lines inside the conduit itself. This is one of those things that couldn't easily be avoided at a single site. But, if your server is of any importance, you do have a colo, right?

    --

    If I mod you up, it doesn't necessarily mean I agree with what you've said, sorry.

  28. Servers by Rich2k · · Score: 1

    My main server is located there, and it's killing me waiting for it to come back up. The abuse some Planet customers are getting for not having 'switched to their backup data centres' is amazing, though. Some of us are small fry; we can't afford to run multiple hosting infrastructures.

  29. More details on the outage by 1sockchuck · · Score: 2, Informative

    Data Center Knowledge has a story on the downtime at The Planet, summarizing the information from the now Slashdotted forums. Only one of the company's six data centers was affected. The Planet has more than 50,000 servers in its network, meaning that roughly one in five of its customers' servers are offline.

    1. Re:More details on the outage by filmotheklown · · Score: 2, Informative

      Not Totally True.

      Many customers also use their DNS service, (the EV1 DNS), so while there are 9000 servers physically 'off' there are many more effectively 'black' as the conical names no longer resolve.

      I'm one of those customers. We're a very small business as are many of the other customers of The Planet (formerly Everyones Internet -- EV1.net)

      I can still access our server via the IP address, but not via the conical name.

      While we host our site on a private server, many of the other customers' servers belong to resellers, and with the DNS service down I could easily see how tens of thousands of actual sites are down beyond the 9,000 physical servers.

      --
      Filmo The Klown
    2. Re:More details on the outage by hostyle · · Score: 1

      ... as the conical names no longer resolve. ThePlanet is of conical persuasion? And there was me thinking those Flat Earth people were strange ...
      --
      Caesar si viveret, ad remum dareris.
    3. Re:More details on the outage by Anonymous Coward · · Score: 0

      Hosting DNS for second level domains at only one location is an unforgivable mistake. The "at least two servers in separate networks" recommendation is supposed to prevent exactly this kind of problem. Even so, by now there should be route announcements to another data center for the networks with the name servers, so that at least the customers who have redundant setups can switch to their alternate servers. You can't really prevent outages, but you can limit the scope of the damage. I guess this will teach them not to claim 100% uptime in their online data center tour...
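      The "at least two servers in separate networks" rule can be checked mechanically. Below is a minimal Python sketch, assuming you already have the IPv4 addresses of a zone's nameservers. It treats a shared /24 prefix as a rough, hypothetical proxy for "same network"; real diversity also means separate sites, upstreams and power feeds, which no address check can prove. The addresses are documentation examples, not ThePlanet's.

```python
import ipaddress


def covering_networks(ns_addrs, prefix=24):
    """Return the set of /prefix networks that contain the given addresses."""
    return {
        ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
        for addr in ns_addrs
    }


def has_network_diversity(ns_addrs, prefix=24):
    """True if there are at least two nameservers spread over two or more networks.

    A differing /prefix is only a crude stand-in for "different location":
    two prefixes can still sit in the same building on the same power feed.
    """
    return len(set(ns_addrs)) >= 2 and len(covering_networks(ns_addrs, prefix)) >= 2


if __name__ == "__main__":
    # Hypothetical addresses taken from the RFC 5737 documentation ranges.
    same_site = ["192.0.2.10", "192.0.2.11"]
    split_sites = ["192.0.2.10", "198.51.100.7"]
    print(has_network_diversity(same_site))    # False
    print(has_network_diversity(split_sites))  # True
```

      Passing a smaller prefix (for example 16) makes the check stricter, since more addresses collapse into the same network.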

    4. Re:More details on the outage by Anonymous Coward · · Score: 0

      We have 8 servers with The Planet and 3 of them (including our primary off-site authentication) were located in the H1 center. Better believe that I'm gonna find out where each one of the other servers are so I can have two backups just in case.

    5. Re:More details on the outage by gnuman99 · · Score: 2, Informative

      Shouldn't they provide, you know, primary AND secondary DNS? And in that case, wouldn't the primary AND secondary be hosted in *different* data centers?

      DNS is *THE* *MOST* critical part of infrastructure. If the HTTP server fails, OK. If mail fails, OK. If the data center explodes, you still have DNS, so anyone sending email will just be stuck for a few days. But if DNS is offline, then email is offline. You are off the internet.

      I've had a server motherboard die, and it took a few days to get a new one installed and running. But my DNS kept running because the backups were on different IPs and in different places.

      I have to say, this is a BIG no-no for them not to provide proper DNS services.

    6. Re:More details on the outage by MadMidnightBomber · · Score: 1

      Yeah - funny how everyone feels the need for a backup MX but no-one has offsite DNS. If they can't contact your mail server, they will queue mail for you. If there's no MX record, mail starts bouncing RIGHT NOW.

      At my last place they tended to put the same kind of servers next to each other in racks, sharing power and an Ethernet switch (i.e. both of the campus DNS servers), so the "racks were tidy".

      --
      "It doesn't cost enough, and it makes too much sense."
  30. Houston affected, Dallas data center unaffected by MaineCoon · · Score: 1

    Fortunately I'm hosted at the Dallas facility, and this event was at the Houston one.

    --
    Hunt your preferred prey at Aliens vs Predator MUD. Join the war at avpmud.com port 4000
  31. I know this site is hosted on there by Anonymous Coward · · Score: 0

    http://www.vcdquality.com

  32. Huh??? No amount of planning? by www.sorehands.com · · Score: 1, Informative

    Really? What about a little known thing called colocation?

    At least with colocation, if the building gets blown up by terrorists, the servers are still running somewhere else.

    1. Re:Huh??? No amount of planning? by Flamora · · Score: 1

      Colocation requires having another set of hardware that you own (instead of renting from a server provider like The Planet), paying for colocation space, power, bandwidth, etc.

      Most people aren't big enough or don't consider their web presence important enough to have a colocated solution.

    2. Re:Huh??? No amount of planning? by Anonymous Coward · · Score: 0

      I modded you flamebait because there's no -1, Stupid moderation option.

      Where exactly do you think your "colocated" servers are running? In a datacentre somewhere, perhaps? You mean like... ThePlanet's datacentre?

      Or are you implying that ThePlanet should also be responsible for implementing fully redundant mirrored failover for every single one of the 9,000 customer servers in that particular datacentre?

      Most of the people there would much rather have a day or two of downtime once in a blue moon (or indeed, much less frequently) than pay what it'd cost for ThePlanet to provide full redundancy to another facility. Further, most of the people that do want full redundancy badly enough to pay for it would want to use two different providers (which also provides redundancy for administrative mistakes).

    3. Re:Huh??? No amount of planning? by Alioth · · Score: 1

      The datacentre involved *is* a colocation site (i.e. somewhere you can rent a rack, or at least it was when it was EV1Servers).

      Having multiple sites isn't called colocation, by the way; colocation is the act of putting your kit in a commercial datacentre rather than self-hosting. Colocation only means that if your building burns down, your server is OK. However, if the colo burns down... well, what you actually need is two sets of servers at two geographically distant sites.

  33. a bit wrong by unity100 · · Score: 2, Insightful

    It's not the 'no name' hosting resellers who host at The Planet. No-name resellers don't rent an entire server; they just use a WHM reseller panel handed out by a company that hosts servers there.

    1. Re:a bit wrong by GraZZ · · Score: 1

      True and false. I used to run a "no name", but we still managed our own server. It depends on how hands on you want to be. Also how many add-on services you want to be able to offer (custom services, game servers, etc).

    2. Re:a bit wrong by billcopc · · Score: 2, Funny

      Hey hey! I'm a no-name reseller, but I run my own servers, none of this turnkey reseller bullshit. I am root, and I'm goddamned proud of it :)

      --
      -Billco, Fnarg.com
  34. Did Peter Gabriel move his server to The Planet? by CrimsonScythe · · Score: 1

    If so, that could explain the cause of the explosion...

    --
    The view was horrible and the smell was even worse; Julie severely regretted becoming a proctologist.
  35. And I suppose... by Mopar93 · · Score: 0
    ... the price of gasoline will now jump at the pumps!

    -Maurice

    --
    FixingTheWeb.com Helping to keep the bad guys out...
  36. Correction by Gazzonyx · · Score: 4, Informative

    Sorry for replying to myself; I don't think I made my post clear. The backup power is not on (the mains were blown to bits) because the fire department told them to shut it off.

    --

    If I mod you up, it doesn't necessarily mean I agree with what you've said, sorry.

  37. This ain't funny, I'm affected. by SoundGuyNoise · · Score: 1
    I'm nervous because we have a major event in one week, our TV ads are starting to run tomorrow (Monday) and my website is down!

    Everyone please let me know when iyfwrestling.com is back up and running!

    --
    You never expect irony, do you?
    Want to be a professional wrestler? Visit www.iyfwrestling.com
    @iyfwrestling
    1. Re:This ain't funny, I'm affected. by osssmkatz · · Score: 1

      Monitor it yourself using the status link provided in this thread. Monday is barely enough time to find another host. Consider stopping the ad run.

      --Sam

    2. Re:This ain't funny, I'm affected. by SoundGuyNoise · · Score: 1

      Can't stop the ad run. It's for an event, meaning paid admission.

      --
      You never expect irony, do you?
      Want to be a professional wrestler? Visit www.iyfwrestling.com
      @iyfwrestling
    3. Re:This ain't funny, I'm affected. by SoundGuyNoise · · Score: 1

      My site is back up. Now the TV ad can run tonight without any glitches.

      --
      You never expect irony, do you?
      Want to be a professional wrestler? Visit www.iyfwrestling.com
      @iyfwrestling
  38. Knocking down three walls... by youthoftoday · · Score: 1

    If this were DreamHost there would be a few flippant words in the official statement but pages and pages of photos...

    --
    -1 not first post
    1. Re:Knocking down three walls... by ajlitt · · Score: 2, Funny

      Hopefully an explosion would jostle out the clog that makes their Rails pipes run slowly.

    2. Re:Knocking down three walls... by maxume · · Score: 1

      Ruby?

      I kid, I kid...

      --
      Nerd rage is the funniest rage.
  39. Crazy rasberry ants maybe? by Fallen+Andy · · Score: 1
    See e.g. here. Yet another reason to move datacenters to more northern (and colder) climes...

    Andy

    1. Re:Crazy rasberry ants maybe? by pfleming · · Score: 1

      Except that with global warming the datacenters will have to be manned by polar bears...

  40. No servers were damaged by cptnapalm · · Score: 4, Funny

    They need to build the building out of whatever they build the servers out of.

  41. It must have been HACKERS by Eudial · · Score: 4, Funny
    --
    GAAH! MY PRINTER IS ON FIRE!!! PUT IT OUT! PUT IT OUT!
    1. Re:It must have been HACKERS by labalicious · · Score: 1

      Thanks. I need a new pair of underwear. *Disclosure: My speakers were on and the volume was pretty high.

  42. Oh, that's why Darklyrics.com is down by Cyberax · · Score: 1

    This morning I was wondering what has happened with Darklyrics.com

    Turns out they were hosted on ThePlanet!

    http://toolbar.netcraft.com/site_report?url=http://www.darklyrics.com

  43. Re:More planning could have prevented this by Anonymous Coward · · Score: 0

    so let me get this straight.. the servers were /not/ blown up? ;)

  44. Re:More planning could have prevented this by ottawanker · · Score: 5, Insightful

    so you're agreeing with me. The servers getting blown up was a huge mistake, one that certainly could have been avoided with a little proper planning. you are a fucking moron

  45. Re:More planning could have prevented this by Anonymous Coward · · Score: 0

    This is just another example of someone writing a critical response without reading the story.

    Well done.

  46. Transformers by BovineSpirit · · Score: 1

    From what I remember from uni, the coils of wire in a transformer want to be straight. When a transformer has power flowing through it, the coils can exert some fairly serious pressure. Big transformers tend to be encased in concrete for this reason. Maybe there was a short, a big current flowed through the secondary coil, and the force was enough to overcome some weak restraints.

    Can anyone give a less hand-wavy description of this? Or have I misunderstood?

  47. ev1servers explosion / fire by slashkitty · · Score: 1
    I remember a transformer fire about 5 years ago at ev1servers (not sure if this is the same datacenter), where a few of my servers were located. Luckily, that one was outside the building, and backup power took over. They had additional backup generators shipped in and kept the whole datacenter running for 4(?) days or so while the transformer was replaced.

    The power to the server was never lost, and I didn't even find out about it till a couple of days into it.

    --
    -- these are only opinions and they might not be mine.
    1. Re:ev1servers explosion / fire by Anonymous Coward · · Score: 0

      Yes, it must be the same, we also had our server back then with ev1servers and have not changed anything since ev1 was bought by The Planet.

    2. Re:ev1servers explosion / fire by VGPowerlord · · Score: 1

      Not necessarily, EV1 migrated machines from one datacenter to another at some point in the past. I just couldn't tell you when.

      --
      GLaDOS for President 2016! "Well here we are again. It's always such a pleasure." -- GLaDOS, 2011
    3. Re:ev1servers explosion / fire by 3seas · · Score: 1

      Yes, the two ns.ev1servers.net nameservers are affected.

  48. Burning Mail Servers? by Sadsfae · · Score: 1

    I once set a production Exchange server on fire due to a faulty wire connected to one of the DLT drives.

    The motherboard was scorched so badly that when you tapped it, burnt flakes that used to be transistors fell off.

    I knew something was up as soon as I smelled burning. The smoke pretty much gave it away, though.

    I don't work there any more.

    --
    Have a squat over at the hobo house.
  49. Monty Python by Sentry21 · · Score: 4, Funny

    electrical gear shorted, creating an explosion and fire that knocked down three walls surrounding their electrical equipment room. But the fourth wall stayed up! And that's what you're getting, son - the strongest data centre in all of Texas!
    1. Re:Monty Python by Anonymous Coward · · Score: 0

      electrical gear shorted, creating an explosion and fire that knocked down three walls surrounding their electrical equipment room. But the fourth wall stayed up! And that's what you're getting, son - the strongest data centre in all of Texas! Damn, now that's funny!
  50. Re:More planning could have prevented this by kylegordon · · Score: 1

    The power is off due to the explosion, but the servers themselves are A-OK.

    And more to the point, the rest of the backup power systems were taken offline at the request of the fire brigade.

    It's a common feature to have power shut off in the event of a fire. The Fire Service don't want to be hosing down live cabling, after all. It's also why you shouldn't use lifts. Everyone thinks it's "in case the fire reaches the lift". It's actually 'cos the power is likely to be cut off at any moment (the office I work in cuts the power after 3 minutes).
  51. Re:More planning could have prevented this by Anonymous Coward · · Score: 0

    So, in summary, the 9,000 servers were not blown up. Only the power.

    The power is off due to the explosion, but the servers themselves are A-OK. Brought to you by the Department of Redundancy Department.
  52. Obligatory INSOC by Stormwatch · · Score: 1



    Turn up the power
    This is the hour
    From every tower
    A million watts of love

    There comes a time when
    You need a good friend
    But all that you have
    Is that glowing screen

    You know you could fly
    Your hate a run high
    But you've been squeezed in
    To that same old scene

    You know what I mean

    Turn up the power
    This is the hour
    From every tower
    Shout it from above

    Turn up the power
    This is the hour
    From every tower
    A million watts of love

    By turning that switch
    You're finding your niche
    And you could tell them
    Where to put the advice

    You should get back in
    It's time to jack in
    We'll help you hack in
    To that glowing life

    You won't have to think twice

    Turn up the power
    This is the hour
    From every tower
    Shout it from above

    Turn up the power
    This is the hour
    From every tower
    A million watts of love


  53. First ISP by Anonymous Coward · · Score: 2, Informative

    You're thinking of The World. See http://www.theworld.com/about/internet.shtml.

  54. Re:More planning could have prevented this by cecil_turtle · · Score: 4, Informative

    ThePlanet has 5 or more datacenters. Building a full-blown, physically separated 2N power system at every datacenter is far more expensive than taking the chance of having to issue a credit against an SLA. Not to mention that when a fire is involved, the fire department has full authority and may instruct you to cut all power anyway; they are coming into an unknown situation and won't risk their own people just because you say the other power system is isolated.

    Another issue is that the complexity of a full-blown 2N power system is likely to cause more outages from human error during routine maintenance than an N+1 system would. Complete 2N power systems from grid and backup sources all the way to the servers, with no single point of failure (transformers, wiring, switching, PDUs, UPSs, etc.), are enormously complex and expensive, so it's not "the only thing that makes sense". I assure you issuing a one-day pro-rated credit to all your customers is cheaper.
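    To put a rough number on that one-day credit, here is a minimal Python sketch. The 7,500 affected customers comes from the story; the $100/month fee and the 30-day month are hypothetical assumptions, and real SLA terms vary by contract. The credit is capped at 100% of the monthly fee, matching the cap mentioned in the next comment.

```python
def prorated_credit(monthly_fee, outage_days, customers=1, days_in_month=30):
    """Total SLA credit for an outage, pro-rated by calendar days.

    Assumes every customer pays the same hypothetical monthly fee and that
    the credit is capped at 100% of that fee (outage_days <= days_in_month).
    """
    credited_days = min(outage_days, days_in_month)
    return customers * monthly_fee * credited_days / days_in_month


if __name__ == "__main__":
    # 7,500 customers is from the story; $100/month is a hypothetical fee.
    credit = prorated_credit(100, 1, customers=7500)
    print(f"${credit:,.0f}")  # $25,000
```

    Whatever the real fee, the credit scales linearly with outage length and customer count, which is the figure to weigh against the capital cost of a full 2N build-out.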

  55. Service Sucked for those affected by Anonymous Coward · · Score: 0

    The Planet buried the news about this in their forum instead of publishing something in the service logins or on the home page. Customers could not log in to the system to see what was wrong because it went down as well. Only after midnight (6 hours later) did they email details to the owners of affected servers. We were lucky: some of our more attentive customers informed us a few hours after it happened, and we found out more through the online sales chat, which was trying to get us to buy a new server when we visited the site. Very effective use of personnel.

    The ETA has continually been pushed back. Now that the SLA credit is maxed out at 100%, they seem to be in no rush to get back online. We get a few-hundred-dollar credit next month (as long as they don't claim force majeure), while our client whose server is affected will lose over $5000 in sales.

    Service has suffered ever since the merger with EV1 and this is just another such example.

    1. Re:Service Sucked for those affected by clare-ents · · Score: 2, Insightful

      SLA is not a substitute for business insurance.

      If your business loses $1000/minute while it's offline, get a quote for insurance that pays out $1000/minute while you're offline. Alternatively if you're happy self insuring take the loss when it happens.

      It's almost as if people believe that SLAs are a form of service guarantee instead of a free very bad insurance deal.
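To put rough numbers on the SLA-versus-insurance point, here is a minimal Python sketch. The SLA model (a fixed percentage of the monthly fee credited per hour of downtime, capped at 100%), the 5%-per-hour rate, and the $300 monthly fee are all hypothetical assumptions for illustration; only the $1000/minute loss rate comes from the comment above. `sla_credit` and `business_loss` are made-up helper names.

```python
# Hypothetical SLA model: credit a fixed percentage of the monthly fee per
# hour of downtime, capped at 100% of the fee. Real SLAs vary; these
# parameters are illustrative assumptions, not ThePlanet's actual terms.

def sla_credit(monthly_fee: float, outage_hours: float,
               credit_pct_per_hour: float = 5.0) -> float:
    """Return the credit owed for an outage, never more than the monthly fee."""
    pct = min(outage_hours * credit_pct_per_hour, 100.0)
    return monthly_fee * pct / 100.0


def business_loss(loss_per_minute: float, outage_hours: float) -> float:
    """Revenue lost while offline, assuming a constant loss rate."""
    return loss_per_minute * outage_hours * 60


if __name__ == "__main__":
    fee, hours = 300.0, 48
    credit = sla_credit(fee, hours)        # capped at the full monthly fee
    loss = business_loss(1000.0, hours)    # $1000/minute, as in the comment
    print(f"SLA credit: ${credit:,.2f}  lost revenue: ${loss:,.2f}")
    print(f"credit covers {credit / loss:.4%} of the loss")
```

Whatever the exact SLA terms, the credit is bounded by what you pay the host, while the business loss keeps growing with downtime; that gap is what insurance (or self-insuring) is meant to cover.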

      --
      Only two things are infinite, the universe and human stupidity, and I'm not sure about the former. (Einstein)
    2. Re:Service Sucked for those affected by NuclearDog · · Score: 1

      If you're making/losing that much money due to your web presence, then it's your own fault for not having a redundant server set up. If you're making $1000/day in sales, you'd better consider shelling out another $80 or so for a second box for exactly this kind of situation.

      There's only so much planning that can be done, because every so often a meteor's gonna come down and put a hole right through the middle of your server, and it's not up to your host to have 6" titanium-reinforced roofing or anything. If your hosting is that important, BUY SOME REDUNDANCY.

      As well, I've found the service and support has become significantly better since The Planet took over, but maybe it's just because I have reasonable expectations. Most of the people complaining seem to be the "OMG I'M LOSING TEN THOUSAND DOLLARS A DAY ON MY $80 HOSTING PLAN! YOU GUYS NEED TO MAKE IT WORK! NOW!" types.

      ND

      --
      This statement is forty-five characters long.
  56. Ignorant firemen = single point-of-failure by JoeShmoe · · Score: 4, Interesting


    Everyone loves firemen, right? Not me. While the guys you see in the movies running into burning buildings might be heroes, the real-world firemen (or more specifically fire chiefs) are capricious, arbitrary, ignorant little rulers of their own personal fiefdom. Did you know that if you are getting an inspection from your local fire chief and he commands something, there is no appeal? His word is law, no matter how STUPID or IGNORANT. I'll give you some examples later.

    I'm one of the affected customers. I have about 100 domains down right now because both my nameservers were hosted at the facility, as is the control panel that I would use to change the nameserver IPs. Whoops. So I learned why I obviously need to have NS3 and NS4 and spread them around, because even though the servers are spread everywhere, without my nameservers none of them currently resolve.
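A quick way to catch the "both nameservers in one building" mistake is to check whether a domain's nameserver addresses at least fall in different networks. The sketch below is a hypothetical helper built on Python's standard `ipaddress` module. Treating distinct /24 networks as "spread around" is a rough assumption: two subnets can still sit in the same facility, so this only flags the most obvious single point of failure. The addresses are RFC 5737 documentation ranges, not real servers.

```python
import ipaddress

# Rough heuristic (an assumption, not a guarantee): nameservers whose
# addresses fall in different /24 networks are *less likely* to share a
# single facility. Distinct subnets can still live in the same building.

def distinct_networks(ns_addrs: list[str], prefix: int = 24) -> int:
    """Count the distinct networks of the given prefix length the addresses span."""
    nets = {
        ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
        for addr in ns_addrs
    }
    return len(nets)


def is_diverse(ns_addrs: list[str], prefix: int = 24, minimum: int = 2) -> bool:
    """True if the nameservers span at least `minimum` distinct networks."""
    return distinct_networks(ns_addrs, prefix) >= minimum


if __name__ == "__main__":
    # Documentation-range addresses standing in for NS1..NS4 (hypothetical).
    same_dc = ["192.0.2.10", "192.0.2.11"]
    spread = ["192.0.2.10", "192.0.2.11", "198.51.100.5", "203.0.113.7"]
    print(is_diverse(same_dc), is_diverse(spread))
```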

    It sounds like the facility was ordered to cut ALL power because of some fire chief's misguided fear that power flows backwards from a low-voltage source to a high-voltage one. I admit I don't know much about the engineering of this data center, but I'm pretty sure the "Y" junction where AC and generator power come together is going to be as close to the rack power as possible to avoid lossy transformation. It makes no sense why they would have 220 or 400 VAC generators running through the same high-voltage transformer when it would be far more efficient to have 120 or even 12 VDC (if only servers would accept that). But I admit I could be wrong, and if it is a legit safety issue...then it's apparently a single point of failure for every data center out there because ThePlanet charged enough that they don't need to cut corners.

    Here's a couple of times that I've had my hackles raised by some fireman with no knowledge of technology. The first was when we switched alarm companies and required a fire inspector to come and sign off on the newly installed system. The inspector said we needed to shut down power for 24 hours to verify that the fire alarm would still work after that period of time (a code requirement). No problem, we said, reaching for the breaker for that circuit.

    No no, he said. ALL POWER. That meant the entire office complex, some 20-30 businesses, would need to be without power for an entire day so that this f***ing idiot could be sure that we weren't cheating by sneaking supplementary power from another source.

    WHAT THE FRACK

    We ended up having to rent generators and park them outside to keep our racks and critical systems running, and then rent a conference room to relocate employees. We went all the way to the county commissioners pointing out how absolutely stupid this was (not to mention, who the HELL is still going to be in a burning building 24 hours after the alarm's gone off?) but we were told that there was no override possible.

    The second time was at a different place when we installed a CO alarm as required for commercial property. Well, the inspector came and said we need to test it. OK, we said, pressing the test button. No no, he said, we need to spray it with carbon monoxide.

    Where the HELL can you buy a toxic substance like carbon monoxide, we asked. Not his problem but he wouldn't sign off until we did. After finding out that it was illegal to ship the stuff, and that there was no local supplier, we finally called the manufacturer of the device, who pointed out that the device was void the second it was exposed to CO because the sensor was not reusable. In other words, when the sensor was tripped, it was time to buy a new monitor. You can see the recursive loop that would have developed if we had actually tested the device and then promptly had to replace it and get the new one retested by this idiot.

    So finally we got a letter from the manufacturer that pointed out the device was UL certified and that pressing the test button WAS the way you tested the device. It took four weeks of arguing before he finally found an excuse that let him save face and

    --
    -- I wonder which will go down in history as the bigger failure: the War on Drugs or the War on Filesharing
    1. Re:Ignorant firemen = single point-of-failure by Anonymous Coward · · Score: 1, Informative

      Where the HELL can you buy a toxic substance like carbon monoxide, we asked. Not his problem but he wouldn't sign off until we did. After finding out that it was illegal to ship the stuff, and that there was no local supplier,

      Not that I don't sympathize with your predicament, but carbon monoxide is routinely used in chemistry labs around the country (I did in graduate school). Call up Aldrich and they will happily ship you just about any chemical.

      That being said, carbon monoxide has the potential to be extremely toxic and should not be used without proper training and safety equipment. That's why you have carbon monoxide detectors!

    2. Re:Ignorant firemen = single point-of-failure by Anonymous Coward · · Score: 0

      the fire marshal shut it down to protect the electricians fixing it you idiot!

      but you think it's okay to backfeed with a generator and kill people because your website is down.

      read up on electricity and how it works.

    3. Re:Ignorant firemen = single point-of-failure by Anonymous Coward · · Score: 0

      You know Joe, maybe if you'd had firefighter personnel that you cared about killed by electricity that everyone 'thought was off', you'd be a little more understanding when a chief says something needs to be turned off.

      I work with firefighters all the time (as a volunteer/communications specialist), and I have NO problem with them erring on the side of caution during an incident (I grant that your inspection scenario may be valid over use of power).

      You sound a lot like the ignorant frigtards in the recent Summit Wildfire in the Santa Cruz Mtns. of CA that were upset when a chief wouldn't send his crews up to save their homes (they had already evacuated safely) in a raging firestorm fueled by 50+ MPH winds. The chief's response: "I'm not sending my crews up there to be killed". Note that he got a standing ovation from almost everyone else in the room.

      These guys (and gals) get paid to keep their crews and residents safe. If your fracking computer system got burned up, but a firefighter or crew was protected, I'll take that any day, and just tell you to STFU and quit your whining.

    4. Re:Ignorant firemen = single point-of-failure by aaarrrgggh · · Score: 1

      Standard procedure on killing power to the site in a fire. All fire chiefs will require this, and given the situation it isn't a bad policy, especially if there is a structural failure or a major short. Only way around it is to make independent data halls with separate infrastructure.

      As for your fire alarm story, your engineer or installing contractor didn't do their job; you coordinate it in advance and make sure everybody is on the same page. Same goes for the CO sensor-- you just use a cup around the sensor and inject the gas. (There are often alternate trigger gases that can be used as well.)

    5. Re:Ignorant firemen = single point-of-failure by enosys · · Score: 1

      Probably the easiest and cheapest way to obtain carbon monoxide is to synthesize it from other more easily obtainable chemicals. One way would be mixing formic and sulfuric acid. Yes, that is dangerous.

    6. Re:Ignorant firemen = single point-of-failure by enosys · · Score: 1

      What a troll. The generator wouldn't be set up to feed power back toward the grid. Even in just a power failure that would be unacceptable and it could kill people working on power lines.

    7. Re:Ignorant firemen = single point-of-failure by Anonymous Coward · · Score: 0

      You see, the thing is, these people probably have to deal with many idiots who think they know better than everyone else to the point where they are hostile and can no longer be trusted, so they make blanket policies. But I guess it makes you feel better if you can find someone to vent your anger onto, rather than deal with the situation, right? All the time spent writing that big rant could have been used for some valuable introspection about why you are so angry with yourself.

    8. Re:Ignorant firemen = single point-of-failure by Anonymous Coward · · Score: 0

      You are forgetting that the generators also have to power the air conditioning, which means that the Y junction isn't going to be as close to the racks as you seem to think.

    9. Re:Ignorant firemen = single point-of-failure by evilviper · · Score: 0, Troll

      You sound a lot like the ignorant frigtards in the recent Summit Wildfire in the Santa Cruz Mtns. of CA that were upset when a chief wouldn't send his crews up to save their homes (they had already evacuated safely)

      Not trying to feed a troll or slander any firefighters, but I would have more sympathy for the chief if they would STOP telling people, at every opportunity, that they should (and later, ordering them to) all evacuate their homes and leave everything to the fire dept., rather than staying behind and trying to protect their property.

      Personally, I got to watch as fire fighters sat on the bumpers of their trucks for a good 4+ hours, letting a small brush fire (that two men could have put out with shovels) burn towards homes, and horribly pollute the air for 20 miles around with thick ash. I was absolutely mystified at their utter inaction, until I heard an announcement on the radio that the fire depts. were asking for donations from all the surrounding towns... Good work men!

      What's more, the area home-owners were all under mandatory evacuation orders for the entire time the dept. was extorting "donations" and other hand-outs for their crews. Additionally, because we have a fire dept., the police will stop all private citizens from approaching the area of the fire, and the possibility of a non-official fire-fighter putting the damn thing out with a garden hose...

      But my biggest criticism of fire fighters is how utterly helpless they are in the face of forest fires. A fire hose may be able to put out a burning couch in 5 seconds flat, but up against a raging wild inferno, it's like spitting into a bonfire. What fire fighters really need is not water and fire retardant, but chain saws and bulldozers so they can actually make a decent fire break and stop a forest fire dead, far faster than back-burning, and no risk of loss of control. And don't bother telling me that bulldozers can't operate on an incline... Many millions have been spent to retrofit a 747 to carry fire retardant, it would be far less difficult or expensive to pivot-mount a bulldozer's engine, or even opt for an aircraft turbine (that can handle any angle) instead of pistons.

      --
      Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant
    10. Re:Ignorant firemen = single point-of-failure by markttu · · Score: 1

      It sounds like the facility was ordered to cut ALL power because of some fire chief's misguided fear that power flows backwards from a low-voltage source to a high-voltage one. I admit I don't know much about the engineering of this data center, but I'm pretty sure the "Y" junction where AC and generator power come together is going to be as close to the rack power as possible to avoid lossy transformation. It makes no sense why they would have 220 or 400 VAC generators running through the same high-voltage transformer when it would be far more efficient to have 120 or even 12 VDC (if only servers would accept that). But I admit I could be wrong, and if it is a legit safety issue...then it's apparently a single point of failure for every data center out there because ThePlanet charged enough that they don't need to cut corners. -JoeShmoe

      I'm glad you admit you don't understand the engineering. As an EE, let me assure you that EVERY data center has a single point of failure in its power system. The idea is to minimize the chance that anything could go wrong at that single point. Once walls are blown apart in an electrical room, all the power distribution plans are completely shot, and the only thing you can do that is safe for both humans and equipment (servers don't react well to 480 V instead of 120 V) is to shut everything down and go in with a meter to see what is connected to what. Explosions have a nasty habit of rewiring things in ways never intended (though commonly imagined and just as commonly never imagined). As for the generators, they are going to run as few big 3-phase 480 V generators as they can, because fewer generators are much more reliable than many generators AND they NEED all that power to keep everything, including the AC, up and running.
    11. Re:Ignorant firemen = single point-of-failure by Anonymous Coward · · Score: 0

      Yet, those of you morons that want to 'fight the fire and protect your property' end up suing the fire departments and state when you get injured trying to do a job you are ill-equipped to do. Nice work frigtard - maybe your kind will get purged from the gene pool by doing stupid shit like this.

  57. Re:More planning could have prevented this by slimjim8094 · · Score: 0, Flamebait

    ahahaha

    sweet job reading troll. you didn't even need to read all those big words. here's a bunch of tiny words (they happened to be at the end of his post so you couldn't miss them)

    > So, in summary, the 9,000 servers were not blown up. Only the power.

    yeah, I know, I'm being a prick right now, but I've got karma to burn and people who don't read piss me off

    --
    I have developed a truly marvelous proof of this comment, which this signature is too narrow to contain.
  58. Printer ignition source by kmahan · · Score: 4, Funny

    Last message on the linux console before the explosion:

            lp0 printer on fire!

    --
    Invalid Checksum. Retrying.
    1. Re:Printer ignition source by jd · · Score: 4, Interesting

      *wonders how many remember the live incident at the BBC, many years ago, when the Grandstand teleprinter stopped displaying match results and started printing updates on a fire running through the building.

      --
      It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
    2. Re:Printer ignition source by moosesocks · · Score: 2, Informative

      For those of you who don't get the joke, there's actually an entire Wikipedia article devoted to it.

      In short, most unix printing systems understand a very small number of printer status codes, usually consisting of "READY, ONLINE, OFFLINE, and PRINTER ON FIRE"

      The latter status message was actually semi-serious, and was thrown whenever the printer was encountering a serious error, but for some reason was continuing to print anyway. In the case of a high-speed mainframe printer, if the printer jammed but continued attempting to print, a fire could easily start due to the amount of friction created by the high-speed motors.

      --
      -- If you try to fail and succeed, which have you done? - Uli's moose
    3. Re:Printer ignition source by Pollardito · · Score: 1

      you left out PC LOAD LETTER, whatever that means

    4. Re:Printer ignition source by moosesocks · · Score: 1

      Put some letter-sized paper into the tray!

      Seriously.... that one always seemed rather obvious, not to mention that the paper tray is typically the first thing people check when the printer's not working.

      --
      -- If you try to fail and succeed, which have you done? - Uli's moose
  59. Re:More planning could have prevented this by Zebra_X · · Score: 3, Interesting

    "The power is off due to the explosion but there servers themselves are A-OK."

    Physically OK maybe... let's see how many of them come back up when the power is restored ^ ^

  60. "short in a high-volume wire conduit."? by Animats · · Score: 2, Informative

    They supposedly had a "short in a high-volume wire conduit." That leads to questions as to whether they exceeded the NEC limits on how much wire and how much current you can put through a conduit of a given size. Wires dissipate heat, and the basic rule is that conduits must be no more than 40% filled with wire. The rest of the space is needed for air cooling. The NEC rules are conservative, and if followed, overheating should not be a problem.

    This data center is in a hot climate, and a data center is often a continuous maximum load on the wiring, so if they do exceed the packing limits for conduit, a wiring failure through overheat is a very real possibility.

    Some fire inspector will pull charred wires out of damaged conduit and compare them against the NEC rules. We should know in a few days.
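The fill rule mentioned above is simple arithmetic: total conductor cross-section divided by the conduit's interior area. A minimal Python sketch follows. The percentage limits are the NEC Chapter 9, Table 1 maximums as commonly cited (53% for one conductor, 31% for two, 40% for more than two); the conductor and conduit areas are caller-supplied, and the example numbers are hypothetical, not values from the NEC tables. `fill_limit` and `conduit_fill` are made-up names, and this is no substitute for an electrician's calculation.

```python
# Sketch of a conduit-fill check. Limits follow NEC Chapter 9, Table 1 as
# commonly cited; areas must be supplied by the caller (including insulation).

def fill_limit(n_conductors: int) -> float:
    """Maximum fraction of the conduit's cross-section conductors may occupy."""
    if n_conductors < 1:
        raise ValueError("need at least one conductor")
    if n_conductors == 1:
        return 0.53
    if n_conductors == 2:
        return 0.31
    return 0.40


def conduit_fill(conductor_areas: list[float], conduit_area: float) -> tuple[float, bool]:
    """Return (fill fraction, within limit?) for conductors in one conduit.

    All areas must use the same units, e.g. square inches.
    """
    fill = sum(conductor_areas) / conduit_area
    return fill, fill <= fill_limit(len(conductor_areas))


if __name__ == "__main__":
    # Hypothetical: 0.25 sq in conductors in a conduit with 2.0 sq in interior area.
    for count in (3, 4):
        fill, ok = conduit_fill([0.25] * count, 2.0)
        print(f"{count} conductors: {fill:.1%} fill, within limit: {ok}")
```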

  61. Re:More planning could have prevented this by Anonymous Coward · · Score: 0

    soooo...

    what about all the servers that were blown apart?

  62. How many servers did you say were down? by Chris+Mattern · · Score: 0

    It's over NINE THOUSAAAAAND!

  63. Re:More planning could have prevented this by NewbieProgrammerMan · · Score: 5, Funny

    I wish I had mod points...I think this is the first time I ever wanted to mod those 5 words up.

    --
    [b.belong('us') for b in bases if b.owner() == 'you']
  64. Is this why YouTube is down? by Animats · · Score: 2, Interesting

    YouTube's home page is returning "Service unavailable". Is this related? (Google Video is up.)

  65. _The_ Power Room? by John+Hasler · · Score: 2, Insightful

    > ...they claim redundant power...

    How the hell could they claim redundant power with only one power room?

    --
    Warning: this article may contain humor, sarcasm, parody, and perhaps even irony. Read at your own risk.
    1. Re:_The_ Power Room? by sciencewhiz · · Score: 2, Insightful

      They are not running backup power because the fire department told them not to, not because it doesn't exist.

    2. Re:_The_ Power Room? by CFD339 · · Score: 2, Insightful

      Redundant power they have. Redundant power distribution grids they do not. This is common. The level of certification in redundancy on power for fully redundant grids is (I think) called 2N, whereas they only claim N+1, which I understand means failover power. It's more than enough 99.9% of the time. To have FULLY redundant power plus distribution from the main grid all the way into the building, through the walls, and to every rack is ridiculously more expensive. At that point, it is more sensible to buy another server at another facility for failover than to spend what it would cost to host a server with that kind of power redundancy; on top of which, the server itself could still blow up, and then where are you?
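The difference between redundant power and a redundant distribution path can be sketched with standard series/parallel availability arithmetic. This assumes failures are independent, which is exactly what a single explosion in a shared power room violates, and the 99% and 99.9% figures are hypothetical. What the numbers illustrate: whatever you put in parallel, overall availability can never exceed that of the single path everything shares.

```python
from math import prod

# Availability arithmetic under the (strong) assumption that failures are
# independent. Parallel (redundant) parts fail only if all of them fail;
# series parts, such as one shared distribution path, must all be up.

def parallel(*avail: float) -> float:
    """Availability of redundant components: 1 - P(all fail)."""
    return 1.0 - prod(1.0 - a for a in avail)


def series(*avail: float) -> float:
    """Availability of a chain where every component must be up."""
    return prod(avail)


if __name__ == "__main__":
    # Hypothetical: two power sources at 99% each, feeding one shared
    # distribution path at 99.9%.
    sources = parallel(0.99, 0.99)
    overall = series(sources, 0.999)
    print(f"redundant sources: {sources:.4%}, with shared distribution: {overall:.4%}")
```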

      --
      The problem with quotes on the internet, is that nobody bothers to check their veracity. -- Abraham Lincoln
    3. Re:_The_ Power Room? by evilviper · · Score: 1

      Redundant power they have. Redundant power distribution grids they do not.

      Thanks for the insight, Yoda.

      --
      Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant
  66. Exploding transformers by Iphtashu+Fitz · · Score: 1

    I remember seeing a series of photos on a website a few years ago showing the remains of a transformer outside a commercial office building that housed another datacenter. Unfortunately I forget which company it was (I want to say Hurricane Electric but I'm not 100% sure). Those photos were pretty impressive. After the fire department put the fire out there wasn't much of anything left on the concrete slab where the transformer once was...

  67. Re:More planning could have prevented this by Anonymous Coward · · Score: 0

    But surely there must've been some way to keep the servers from being damaged in the explosion?

  68. Re:More planning could have prevented this by njcoder · · Score: 2, Insightful

    I assure you issuing a one-day pro-rated credit to all your customers is cheaper.

    But not cheaper than losing 7500 accounts to another DC that can handle this type of event gracefully. The fact that it's complex doesn't mean you shouldn't expect it in a data center that claims to be "World Class".

    In related news, I was wondering why I wasn't getting much spam today and my sites didn't have strange spiders hitting them.
  69. oblig meme by Anonymous Coward · · Score: 0

    You sure it wasn't...over 9,000?

  70. Time + Material by Anonymous Coward · · Score: 0

    Time it takes to get a new transformer (assuming a supply house has one lying around in their warehouse and is willing to do business on a Sunday): 1 day
    Time it takes to get a new switchboard: a few weeks usually (although possibly 2 or 3 days at the earliest, assuming GE/SquareD/CutlerHammer have one lying around in their warehouse and the building doesn't have any special needs)
    Time it takes to order all new copper wire and pull it in and terminate it (assuming no problems are encountered with the way the pipe is run, etc.): probably about 3 to 5 days at the earliest (remember, their main board exploded, the wire ends are more than likely all burnt up and the wire is now worthless)
    Time it takes to rehang/fireseal three walls: dunno, could take a while. Sheetrockers are funny like that, LOL.

    But in all seriousness, it sounds like ThePlanet are downplaying this. They are estimating stuff coming online today, when it looks like sometime next weekend at the earliest (assuming they're not trying to rig stuff up).

    1. Re:Time + Material by Jesus_666 · · Score: 1

      They still have backup power; the fire department just told them not to use it. They will probably patch things up until the backup power is safe to use again and then use that while repairing the main system.

      --
      USE HOT GRITS WITH STATUE OF NATALIE PORTMAN (NAKED AND PETRIFIED)
  71. Re:More planning could have prevented this by Loibisch · · Score: 1

    The cost and complexity of doing a full blown physically separated 2N power system at every datacenter...

    Hehe, he said "full blown" :P
  72. Re:More planning could have prevented this by Anonymous Coward · · Score: 0

    yhbt. yhl. hand.

  73. could be nice! by AV4TAr · · Score: 1

    It would be nice to remove the link, so we (clients) can stay up to date.

  74. fire inspectors are funny by Anonymous Coward · · Score: 0

    We put up a large two-car barn at the back of the parking lot to house snow removal equipment and for staging storage.

    The outside had to have cement board siding, and the inside had to have TWO layers of 5/8 inch firecode drywall. Why? Because the fire house, which was next door, might be delayed in hooking up to a hydrant that was 30 feet from the building.

    Whatever.

  75. ThePlanet outage by Anonymous Coward · · Score: 0

    umm.... the initial article on Slashdot misspells people.... but I think the article on SE is more informative than this one on Slashdot...

    http://www.syndicatedelitist.com/2008/06/theplanets-h1-datacenter-encounters.html

  76. Strange... by Anonymous Coward · · Score: 0

    Usually the server blows up *after* a link is posted on Slashdot...

  77. Woo... by Anonymous Coward · · Score: 0

    Not very fun being a developer on a game site that has been down for around 24 hours now...

  78. omg by Anonymous Coward · · Score: 0

    "I feel a great disturbance in the Force, as if thousands of servers suddenly cried out in NAGIOS and were suddenly silenced. I fear something terrible has happened."

  79. Quite a feat by Jesrad · · Score: 1

    Blank Labels WebComics are hosted there, and this catastrophe could be the very first thing in over eight years straight to cause Howard Taylor to not update his daily comic Schlock Mercenary.

    --
    Maybe we deserve this world ?
    1. Re:Quite a feat by Jesus_666 · · Score: 1

      Actually, as someone else has pointed out, he did update - on LiveJournal.

      --
      USE HOT GRITS WITH STATUE OF NATALIE PORTMAN (NAKED AND PETRIFIED)
  80. Time + Material by godlkwrth · · Score: 1

    Time it takes to get a new transformer (assuming a supply house has one lying around in their warehouse and is willing to do business on a Sunday): 1 day

    Time it takes to get a new switchboard: a few weeks usually (although possibly 2 or 3 days at the earliest, assuming GE/SquareD/CutlerHammer have one lying around in their warehouse and the building doesn't have any special needs)

    Time it takes to order all new copper wire and pull it in and terminate it (assuming no problems are encountered with the way the pipe is run, etc.): probably about 3 to 5 days at the earliest (remember, their main board exploded, the wire ends are more than likely all burnt up and the wire is now worthless)

    Time it takes to rehang/fireseal three walls: dunno, could take a while. Sheetrockers are funny like that, LOL.

    But in all seriousness, it sounds like ThePlanet are downplaying this. They are estimating stuff coming online today, when it looks like sometime next weekend at the earliest (assuming they're not trying to rig stuff up).

  81. May the Force be with them! by BookRead · · Score: 1
    Good luck to them! Getting the power back is only the beginning. All sorts of problems are likely to crop up with 9000 servers. They'll be at it for a week, most likely.

    I've been in a DC when the power dropped. It's surprising how physical it is when it gets quiet. I imagine the initial bang and then the silence broken by the alarms was quite an experience.

  82. Well... by hermia · · Score: 1

    Count me as one affected. They've been great about notifying us and posting about it though. Thankfully my site is just one I run for fun, and not a business site. Sheesh.

  83. Re:More planning could have prevented this by jacquesm · · Score: 2, Informative

    I'm one of their customers, and it takes more than a single instance in 5 years of hosting to make me switch. That said we'll see how long it takes to get things back up. Unfortunately *both* my dns servers are in that DC, I thought they were in physically distant locations... so much for ass-um-ing things...

  84. Re:More planning could have prevented this by cecil_turtle · · Score: 1

    But not cheaper than losing 7500 accounts to another DC that can handle this type of event gracefully

    Part of my point that you apparently missed was that even a full 2N power system end-to-end doesn't guarantee uptime. There are very few (and I'd even go so far as to say "if any") datacenters in the world that could handle an explosion / fire without going down. Again, even if the system technically supported it, once fire authorities are on site their responsibility is safety, not uptime. Then you have the issue of smoke damage, which is almost as damaging as the fire itself; even if power were available you wouldn't want your servers circulating that air. The argument that "the power should be physically far from the datacenter" is invalid as well, as anybody who knows anything about power knows that once you get to those lower voltages there are significant losses when traveling any distance, so the transformers, and therefore UPSs and PDUs, need to be nearby.

    The fact is in the real world there is no such thing as 100% guarantee, datacenters no matter how well designed can, do and will go down, and it doesn't mean there is a design flaw or that another datacenter is superior.
  85. YouTube back up. Probably a different problem by Animats · · Score: 1

    YouTube is back up, after a few minutes of outage, so it was probably a different problem.

  86. Sadists! by STFS · · Score: 2, Funny

    as if they haven't been through enough with the explosion and fire and all... you just had to rub it in and slashdot their forum as well... kudos!

    --
    You don't think enough... therefore you better not be!
  87. Damnit... by Anonymous Coward · · Score: 0

    This is rough... Crucial time for our game and -WHAMMO- the server is dead for a day and a half minimum.

  88. I'm a customer in that DC, and I'm a firefighter by CFD339 · · Score: 4, Insightful

    My servers dropped off the net yesterday afternoon, and if all goes well they'll be up and running late tonight. At 1700PST they're supposed to do a power test, then start bringing up the environmentals, the switching gear, and blocks of servers.

    My thoughts as a customer of theirs:

    1. Good updates. Not as frequent or clear as I'd like, but mostly they didn't have much to add.

    2. Anyone bitching about the thousands of dollars per hour they're losing has no credibility with me. If your junk is that important, your hot standby server should be in another data center.

    3. This is a very rare event, and I will not be pulling out of what has been an excellent relationship so far with them.

    4. I am adding a fail over server in another data center (their Dallas facility). I'd planned this already but got caught being too slow this time.

    5. Because of the incident, I will probably make the new Dallas server the primary and the existing Houston one the backup. This is because I think there will be long term stability issues in this Houston data center for months to come. I know what concrete, drywall, and fire extinguisher dust does to servers. I also know they'll have a lot of work in reconstruction ahead, and that can lead to other issues.
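The primary/backup arrangement in points 4 and 5 boils down to trying servers in priority order and using the first healthy one. A minimal sketch, assuming a caller-supplied health check; `pick_server` and the host names are hypothetical placeholders, and in practice the switch would happen through DNS or a load balancer rather than a single Python function.

```python
from typing import Callable, Iterable, Optional

# Minimal failover sketch: walk the servers in priority order (primary
# first) and return the first one the health check accepts. A real check
# would probe HTTP/TCP with timeouts; this one is supplied by the caller.

def pick_server(servers: Iterable[str],
                is_healthy: Callable[[str], bool]) -> Optional[str]:
    for server in servers:
        if is_healthy(server):
            return server
    return None  # every server is down


if __name__ == "__main__":
    priority = ["dallas-primary.example.net", "houston-backup.example.net"]
    down = {"dallas-primary.example.net"}  # simulate the primary being offline
    print(pick_server(priority, lambda s: s not in down))
```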

    For now, I'll wait it out. I've heard of this cool place called "outside". Maybe I'll check it out.

    --
    The problem with quotes on the internet, is that nobody bothers to check their veracity. -- Abraham Lincoln
  89. "Murphy's Law" != "Shit Happens" by fm6 · · Score: 2, Insightful

    This goes to show that no matter how much planning you do, Murphy's Law still applies.

    I am so tired of hearing that copout. Does the submitter know for a fact that ThePlanet did everything it could to keep its power system from exploding? I don't have any evidence one way or the other, but if they're anything like other independent data center operators, it's pretty unlikely.

    The lesson you should be taking from Murphy's Law is not "Shit Happens". The lesson you should be taking is that you can't assume that an unlikely problem (or one you can con yourself into thinking unlikely) is one you can ignore. It's only after you've prepared for every reasonable contingency that you're allowed to say "Shit Happens".
    1. Re:"Murphy's Law" != "Shit Happens" by unity100 · · Score: 1

      "I am so tired of hearing that cop-out. Does the submitter know for a fact that ThePlanet did everything it could to keep its power system from exploding? I don't have any evidence one way or the other, but if they're anything like other independent data center operators, it's pretty unlikely."

      They aren't anything like the other independent data center operators. They are the world's biggest independent DC operator.
    2. Re:"Murphy's Law" != "Shit Happens" by Anonymous Coward · · Score: 0

      Besides, it is clear that Murphy's second law would apply in this case - that the worst possible set of circumstances would come together at the worst possible time to do the most harm.

      Or not.

      Amazing, though, that a place this big didn't have backup DNS systems ready to go at a secondary location.

    3. Re:"Murphy's Law" != "Shit Happens" by lena_10326 · · Score: 1

      "Please read 1984 before you talk about 1984. Thank you."

      I read half of 1984 (the first half). Can I talk about half of it?

      --
      Camping on quad since 1996.
    4. Re:"Murphy's Law" != "Shit Happens" by fm6 · · Score: 1

      You can talk about the 19 part.

    5. Re:"Murphy's Law" != "Shit Happens" by davmoo · · Score: 1

      I am not the original submitter, but I am a long-time Planet customer. Obviously the only way I could answer your question 100 percent would be if I were a Planet employee. But speaking from a decade of (satisfied) customer experience, including several other "events" of this nature that were handled so well that no one outside the data centers would have known anything happened if they had not announced it, yes, they probably did do everything they could to prevent such things. I can accept that possibly this was a problem that no one thought of. I'd laugh hysterically at the idea that this was a problem they knew about but convinced themselves wouldn't happen.

      I agree that in many instances "shit happens" is a total copout. But this is not one of them.

      --
      I want a new quote. One that won't spill. One that don't cost too much. Or come in a pill.
    6. Re:"Murphy's Law" != "Shit Happens" by fm6 · · Score: 1

      Well, I was a satisfied customer of DreamHost for several years. Then they began to have a long series of outages that made them pretty useless.

      But let's just say that Planet is different, and this incident doesn't represent their normal competency. The fact is, their power system blew up. Literally! Not a routine event. Maybe, just maybe, it was an accident they couldn't have anticipated. But it's a lot more likely that this represents somebody's incompetence. If not the systemic incompetence that drove me away from DreamHost, then the isolated incompetence you see at even the best-run companies.

      Of course, I don't know for sure one way or the other how this thing happened. I will tell you that if this "Murphy's Law" nonsense had been spouted by somebody working at Planet, I would have taken it as a sign that they didn't have their act together.

    7. Re:"Murphy's Law" != "Shit Happens" by davmoo · · Score: 1

      Not only have I not heard anyone officially with The Planet plead Murphy, I've got a pile of "sorry we fucked up" and "we're going to make it right" messages from them...and my servers are not even in the affected data center.

      --
      I want a new quote. One that won't spill. One that don't cost too much. Or come in a pill.
  90. Re:More planning could have prevented this by njcoder · · Score: 3, Interesting

    "Part of my point that you apparently missed was that even a full 2N power system end-to-end doesn't guarantee uptime. There are very few - and I'd even go so far as to say 'if any' - datacenters in the world that could handle an explosion / fire without going down."

    The DC didn't explode, just the power room. It seems there was just one power room. I've been to data centers around here, even small ones, that have 2 power rooms.

    While it may be the fire dept that is erroneously preventing them from bringing up their back-up power, it's part of a poor disaster recovery plan to not engage with the fire dept, electric co, etc. before a disaster happens, so that everyone is on-board with your disaster recovery plans and that you have the ability to implement that plan.

    The explosion was isolated to the power room. The servers are fine, the backup generators and batteries are fine. The servers should have been back online if they had a good disaster recovery plan. The whole point of disaster recovery is being able to handle a disaster. You can't say "oh there was a disaster, you can't help that". This is exactly what their plan should have been able to handle. The power room goes offline. It shouldn't matter if it was because of an explosion, a fire, equipment failure or being beamed into outer space.

    It also shouldn't matter who is telling them to keep the power off. Part of the disaster recovery plan should have been making sure local authorities allowed them to carry it out. Fine, they have to shut off all power when firemen are in there with hoses. I understand that. But once the fire is out your plan should allow you to bring up backup power. It didn't. So I don't see how they can call themselves a "World Class Data Center". Part of what they sell and what customers expect is disaster recovery. And there are data centers that can provide this.

    ThePlanet is pretty cheap compared to datacenters like NAC that have more redundancy and security. But ThePlanet wants to advertise that they are just as good. Now they were caught with their pants down when there was actually a disaster and their disaster recovery plan failed.
  91. UMM.. USE STATIC PAGE?? by kyoorius · · Score: 5, Insightful

    There's no reason to use the forum software when they've locked the thread and are only using it to disseminate information. A Pentium 1 running lighttpd and serving a static HTML page would be sufficient to handle the flood of requests.
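    A rough sketch of the static-page approach in Python, assuming status updates are kept as (time, message) pairs and that lighttpd (or any other static file server) is simply pointed at the output file. The function names and the status.html path are made up for illustration. Each update rewrites one file, so every request costs the web server a single file read:

```python
import html
import os


def render_status_page(updates):
    """Render (time, message) pairs as one static HTML page, newest first."""
    items = "\n".join(
        f"<li><b>{html.escape(when)}</b> {html.escape(msg)}</li>"
        for when, msg in reversed(updates)
    )
    return (
        '<!DOCTYPE html>\n<html><head><meta charset="utf-8">'
        "<title>Outage status</title></head>\n"
        f"<body><h1>Outage status</h1>\n<ul>\n{items}\n</ul></body></html>\n"
    )


def publish(updates, path="status.html"):
    """Write the page to a temp file, then swap it into place in one step,
    so the web server never hands out a half-written page."""
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        f.write(render_status_page(updates))
    os.replace(tmp, path)
```

    Staff would call publish() with the full list of updates each time something changes; html.escape keeps a stray "<" in a message from breaking the page.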

  92. The creature by pascalpp · · Score: 0, Offtopic

    The creature is driven by rage, and pursued by an investigative reporter. The creature is wanted for a murder he didn't commit. David Banner is believed to be dead, and he must let the world think that he is dead, until he can find a way to control the raging spirit that dwells within him.

  93. Re:More planning could have prevented this by Anonymous Coward · · Score: 0

    Yup, he is a complete fucking moron.

  94. Re:More planning could have prevented this by cecil_turtle · · Score: 1
    You seem to be confused about what the term "disaster recovery" means. They are recovering from this disaster right now, if they're not already done. Getting a datacenter online within 24 hours of an event like an explosion/fire (which was their initial timeline) is an example of a disaster recovery plan working successfully (especially over a weekend). Disaster recovery does NOT mean 100% uptime; it is what you do in the event of downtime. Without a pre-defined plan, this type of outage would require 5-10 days to recover from.

    "I've been to data centers around here, even small ones that have 2 power rooms"

    I'm not sure what level of experience you have, but this means nothing. 2 power rooms does not in any way imply a 2N power design.

    "The explosion was isolated to the power room."

    While we're both obviously limited in our knowledge of the actual events that occurred here, an explosion/fire that "took out 3 walls", whatever that means, is not limited to one room. Presumably at least one of those walls was shared with another room, unless this was a standalone building.

    "ThePlanet is pretty cheap compared to datacenters like NAC that have more redundancy and security."

    Ahh, I see now, you're somehow affiliated with Net Access Corporation (in NJ, and you're njcoder) and somehow believe that they are substantively different from any other datacenter (e.g., ThePlanet) and this type of outage could never happen to them/you. Good luck with that. I hope you're hosting with multiple data centers. There is simply no amount of security / redundancy that can be done at a single location that will provide 100% uptime (regardless of whether you define uptime as application uptime or just power/network uptime). Did you catch the article about Google's datacenters the other day? Clearly they recognize that fact and design around it.
  95. Re:More planning could have prevented this by cecil_turtle · · Score: 3, Informative

    You may also be interested in a pretty positive write-up from SANS about ThePlanet's response and handling of the situation thus far.

  96. Emergency Disaster Plan: FAILED by Anonymous Coward · · Score: 0

    Somebody didn't get their Emergency Preparedness Merit Badge. This is what you get when you don't hire a Boy Scout!

    When I signed up for hosting, I was given the propaganda speech on redundancy, security and all the rhetoric. The bottom line is: THEY WERE NOT PREPARED FOR THIS. The data center engineers dropped the ball. I have seen facilities that have monitors on their transformers. They could have and should have known before the transformer hit "Critical Mass," shut it down and switched to emergency backup. In this day and age it is possible. Somewhere there was a post saying that a "gear jamming" caused the transformer to blow. Has anyone ever seen a gear in a transformer?
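    For what it's worth, here is a crude sketch of the kind of over-temperature check such monitoring might run, assuming transformer temperature is sampled at regular intervals. The limit and the sample count are invented numbers, and real transformer protection watches far more than a single temperature:

```python
def first_trip_index(readings_c, limit_c, consecutive):
    """Return the index of the sample at which `consecutive` readings in a row
    have exceeded limit_c, or None if that never happens."""
    run = 0
    for i, temp in enumerate(readings_c):
        run = run + 1 if temp > limit_c else 0  # any normal reading resets the count
        if run >= consecutive:
            return i
    return None


# Hypothetical readings in degrees C: trips on the third reading in a row above 90.
print(first_trip_index([60, 70, 95, 96, 97, 80], limit_c=90, consecutive=3))  # 4
```

    Requiring several readings in a row filters out a single noisy spike, at the cost of reacting a few samples later.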

    Run to the local pet shop and buy all the hamsters and hamster wheels and get me back online.

    Assumption is the mutherF#@&3R of all!

  97. Re:I'm a customer in that DC, and I'm a firefighte by Anonymous Coward · · Score: 0

    Same here. I've been happy with them for 3+ years, but it looks like we haven't seen the end of this. Man, I just need to get access to my box to move everything out to a server in Dallas!

    So what time is it anyway... 1800 and no news, so it looks like the 1700 power-up didn't do much good.

  98. Found out what caused the transformer explosion! by Anonymous Coward · · Score: 0

    This was taken from the data center security cameras.

    http://www.youtube.com/watch?v=esymRl_0C2s

  99. I'm getting a kick out of these replies... by w1cked5mile · · Score: 1

    We moved three racks of servers out of ThePlanet last Thursday. Timing is everything.

    1. Re:I'm getting a kick out of these replies... by masonc · · Score: 1

      That's not timing, it's luck.

      --
      CM www.cometenergysystems.com Blog: http://caribbeanrenewable.blogspot.com/
    2. Re:I'm getting a kick out of these replies... by koalapeck · · Score: 1

      Please share your magical crystal ball with everyone else next time.

    3. Re:I'm getting a kick out of these replies... by jacquesm · · Score: 1

      hehe, you're entitled to one wicked smile :) next time you decide to switch datacenters please call me...

    4. Re:I'm getting a kick out of these replies... by w1cked5mile · · Score: 1

      Will do... Just wait by the phone.

  100. Re:More planning could have prevented this by Jesus_666 · · Score: 0, Troll

    Building an orphanage into the server room was unforgivable as well. All those children who were burnt crisp when the entire datacenter erupted in a huge mushroom cloud...

    --
    USE HOT GRITS WITH STATUE OF NATALIE PORTMAN (NAKED AND PETRIFIED)
  101. Re:I'm a customer in that DC, and I'm a firefighte by Anonymous Coward · · Score: 0

    "2. Anyone bitching about the thousands of dollars per hour they're losing has no credibility with me. If your junk is that important, your hot standby server should be in another data center."

    Many of those same customers bitching about thousands of dollars per hour lost not only lack redundancy in another data center, they don't have redundancy on the only server they have. No RAID, no backups, no idea what ssh is (or, if it's a Windows server, no idea what Remote Desktop is), and they're too dumb to hire an admin to manage their server... I could go on, but the most important point is:

    The vast majority of those bitching customers have the world's most POINTLESS and STUPID websites. The internet is better off for having those sites down.
  102. Re:I'm a customer in that DC, and I'm a firefighte by aliens · · Score: 1

    How are you going to handle the failover to Houston? Round-robin DNS? Or just a very low TTL on your records, so that if something goes wrong you can update your nameservers and have them point elsewhere?

    Oh and that reminds me, make sure your nameservers are spaced out as well. Learned that one the hard way.
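    One quick sanity check for "spaced out" nameservers, sketched in Python: flag any two whose addresses fall in the same /24. This is only a heuristic (different subnets do not prove different buildings), and the addresses below are documentation examples, not real servers:

```python
import ipaddress


def nameservers_share_subnet(addresses, prefix=24):
    """Return True if any two nameserver IPv4 addresses fall in the same /prefix.

    Sharing a /24 is a rough hint that servers sit on the same network, and
    possibly in the same building; different subnets prove nothing about
    physical separation.
    """
    seen = set()
    for addr in addresses:
        net = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
        if net in seen:
            return True
        seen.add(net)
    return False


print(nameservers_share_subnet(["192.0.2.10", "192.0.2.53"]))     # True: same /24
print(nameservers_share_subnet(["192.0.2.10", "198.51.100.53"]))  # False
```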

    --
    -- taking over the world, we are.
  103. This affected one of my favorite sites by lena_10326 · · Score: 1

    I feel like crying. A whole weekend without it.

    I have no social life. It's true.

    --
    Camping on quad since 1996.
    1. Re:This affected one of my favorite sites by Anonymous Coward · · Score: 0

      Knotnice.com ?

  104. I'm affected too. by Killshot · · Score: 1

    I have about 100 sites hosted there right now that are offline, but none is worth so much that a day or two of downtime will hurt me much. The worst part is getting phone calls from the people I host and trying to explain, in as many ways as possible, that no, I cannot personally go to Texas and fix it, and that there is absolutely no alternate way for them to get their email right now.

  105. Re:More planning could have prevented this by njcoder · · Score: 1

    "Disaster recovery does NOT mean 100% uptime"

    Then don't put out marketing that claims 100% uptime, when you can't back it up.

    "I'm not sure what level of experience you have, but this means nothing. 2 power rooms does not in any way imply a 2N power design."

    I don't see where I said anything about 2N anything. I don't even know enough about data centers to comment on that. All I know from shopping around is that other data centers claim to be able to keep going if one of their power rooms goes down.

    "Ahh, I see now, you're somehow affiliated with Net Access Corporation (in NJ, and you're njcoder)"

    My only affiliation is that I'm considering their services. So far I haven't made a decision one way or another, but I was very impressed by what they provide and by the comments others have made about them. If you're in this area you know who NAC is. Equinix is harder to generalize about because they have multiple locations and I've only been to one.

    Other data centers I've been to have been smaller and primarily designed to host mainframes for large corporations: multiple power feeds coming in to two different rooms, multiple backup generators, adequate UPS, etc.

    I'm no expert in data centers, but like I previously mentioned, if someone is making claims of 100% uptime I would expect them to have some reasonable way of backing that up. They didn't. Their power room caught on fire. They didn't have a second power room that could be used and they couldn't bring backup power online.

    Electrical systems fail, and sometimes catastrophically. There was a transformer that blew up on a utility pole directly across the street from me. The whole house shook. If that was in an enclosed room I can picture walls being blown down. There was also an underground fire in the wiring at one point. In both cases power was brought back online relatively fast.

    I don't care what anyone says. This is poor performance compared to their marketing claims. It looks like they couldn't bring up their back-up systems because they didn't work with the local authorities when they came up with that plan.

    I'm not saying ThePlanet sucks. But I wouldn't call it "World Class", and I doubt anyone's "100% uptime" claims.

    "There is simply no amount of security / redundancy that can be done at a single location that will provide 100% uptime"

    Someone should tell ThePlanet to stop marketing something so impossible, then.
  106. Schlock Mercenary by Velocir · · Score: 1

    Anyone know if this is why www.schlockmercenary.com is down?

  107. Re:More planning could have prevented this by Viflux · · Score: 3, Informative

    From the status update thread... "Today at approximately 5:45 p.m., a transformer in our H1 data center in Houston caught fire, thus requiring us to take down all generators as instructed by the fire department. All servers are down." I read this as the fire department ordering them to kill *all* the power for safety reasons, rather than the explosion knocking the whole thing out.

  108. For the failover... by CFD339 · · Score: 1

    What I do isn't just a web site; it's also a PBX and some other stuff.

    The client software that does the automation is easy. I wrote it to handle the need for a failover server so it will just try the other if the first one fails.

    The PBX failover is easy: the DID provider will route to both, and only one will pick up first. If the second does catch a call, it will have a flag on it that lets it know whether the primary is still up, and in that case it will try to transfer the call over.

    The Web Site is actually the least important part of the process for me, and I'll likely handle that with a low ttl on that one particular address.

    The inbound mail is easy because I use Postini and it is good at failover.

    The data is stored in a database that knows how to sync in near real time between the servers, and on disk as files. I use unison to keep the file directories in sync within just a few minutes.

    Overall, I think it should work just fine. There are more elegant solutions -- and more expensive. This one will work for me.
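    The client-side "try the other if the first one fails" logic might look roughly like this in Python. It is a sketch, not the poster's actual code: the hostnames are hypothetical, and `request` stands in for whatever call the client really makes:

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def call_with_failover(servers: Sequence[str], request: Callable[[str], T]) -> T:
    """Send `request` to each server in order and return the first success.

    Connection-level failures (refused, timed out, unreachable, DNS errors)
    are OSError subclasses in Python, so those move on to the next server;
    any other exception is treated as a real error and propagates.
    """
    failures: List[str] = []
    for server in servers:
        try:
            return request(server)
        except OSError as exc:
            failures.append(f"{server}: {exc}")
    raise ConnectionError("all servers failed: " + "; ".join(failures))


if __name__ == "__main__":
    def fake_request(server: str) -> str:
        # Pretend the primary is dark and the backup answers.
        if server == "dallas.example.com":
            raise ConnectionRefusedError("connection refused")
        return f"ok from {server}"

    print(call_with_failover(["dallas.example.com", "houston.example.com"], fake_request))
    # prints "ok from houston.example.com"
```

    Listing the primary first keeps normal traffic there; the backup only sees requests while the primary is unreachable.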

    --
    The problem with quotes on the internet, is that nobody bothers to check their veracity. -- Abraham Lincoln
  109. Has this happened before? by fastpage · · Score: 1

    Well, my server has been offline, and even though I only use it for personal use, I am considering ordering another server from a different hosting company, restoring from my remote backups, repointing my DNS and canceling my service with The Planet. I'm glad I don't host my DNS with The Planet, for this and other reasons.

    Also, has this happened before at this data center? Wasn't this data center owned by Rackshack? That's who I originally signed up with, before they were bought by The Planet. Here is a link from 2003 about a transformer exploding at Rackshack's Houston data center:

    http://www.carrierhotels.com/wiredspace/archives/000010.html

    Is this the same data center?

    1. Re:Has this happened before? by multipartmixed · · Score: 1

      A little bit of googling reveals you are almost certainly correct.

      What's REALLY interesting, though, is that the article you linked to may, in fact, be the only page on the web where Netcraft confirmed that something wasn't dying.

      --

      Do daemons dream of electric sleep()?
  110. 1700 test not necessarily a failure by CFD339 · · Score: 2, Insightful

    First, that time was an estimate -- a target. Second, even if the initial power test passes, it will take hours to bring up the a/c systems, the switches, and the routers.

    The initial draw from each new bank of gear to be given power will be very high so it will need to go slow.

    The battery systems (be they on each rack or in large banks serving whole blocks) will try to charge all at once. If they're not careful, that'll heat those new power lines up like the filaments in a toaster. Remember, the battery plan they have was built with the idea that they'd be used very briefly during transition to generator power -- not drained down all at once.

    Only once all the switches and routing gear are back up can they start updating the network paths (do they use BGP for this -- that's not my area of expertise) so that peering data starts flowing.

    Only once the network is all up and stable (no small task on a site with dozens of high end peering points) can they even start doing banks of servers.

    It's also probable that each bank of servers will need its own new power lines (and eventually replaced conduit) in the distribution center that was destroyed.

    Bank by bank they'll have to bring up all these servers, each of which will draw its maximum load during boot as disks are scanned and checked.
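    The bank-by-bank idea can be sketched as a greedy grouping: switch banks on in order, and start a new wave whenever the next bank's boot draw would push the current wave past a limit. The per-bank loads and the limit are hypothetical numbers, and a real bring-up would also wait on cooling and network gear between waves:

```python
def plan_power_up(bank_loads_kw, max_step_kw):
    """Group server banks, in order, into power-up waves.

    Each wave's combined boot-time draw stays at or under max_step_kw, so the
    next wave is only switched on after the previous one has settled.
    """
    waves, current, current_kw = [], [], 0.0
    for bank, load in enumerate(bank_loads_kw):
        if load > max_step_kw:
            raise ValueError(f"bank {bank} alone exceeds the step limit")
        if current and current_kw + load > max_step_kw:
            waves.append(current)  # close this wave and start a new one
            current, current_kw = [], 0.0
        current.append(bank)
        current_kw += load
    if current:
        waves.append(current)
    return waves


print(plan_power_up([100, 100, 100, 100], max_step_kw=250))  # [[0, 1], [2, 3]]
```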

    Most of these servers probably haven't been shut down in months or years. Some drives may not spin up: tired motors that run fine can find spinning up from cold just too much now. Other servers may have boot configuration problems that went undiscovered because the machines have been running without a reboot for a long time -- Linux ones anyway :-)

    This isn't something out of Young Frankenstein where they'll yell across the room "throw za main svitch!" and watch the lights dim briefly while 9000 servers boot up with the deafening sound of system beeps. If they did try such a thing -- as if such a thing were possible -- it would immediately blow at least another transformer, if not more.

    Think about it. 9000 servers @ an average of what, 300 watts, plus the networking gear, plus the air conditioning, plus charging all those batteries....you're talking megawatts.
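    Spelling out that arithmetic with the poster's own rough 300 W average (an estimate, not a measured figure), the servers alone come to about 2.7 MW before networking, cooling, or battery charging is added:

```python
def it_load_mw(servers, avg_watts):
    """Steady-state server draw in megawatts (no network, cooling or battery charging)."""
    return servers * avg_watts / 1_000_000


print(f"{it_load_mw(9000, 300):.1f} MW")  # 2.7 MW for the servers alone
```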

    Without a Mr. Fusion, or Harry Mudd stumbling in with some chicks wearing dilithium crystal jewelry, this is going to take a while.

    --
    The problem with quotes on the internet, is that nobody bothers to check their veracity. -- Abraham Lincoln
  111. Feat not accomplished by mdmkolbe · · Score: 1

    The explosion may have tried hard to stop it, but the Schlock site is already back up (albeit a minimal version of the site). So he still hasn't missed a daily update.

  112. Ultimate response to "Why is my site down?" by RobertLTux · · Score: 0

    The extreme example is, of course, a DC in the WTC, but "our power room went BOOM" is a good nth-level version.
    (on or about 09/16/2001)

    irate customer: Why is my site down AND WHY DID IT TAKE SO [redacted] LONG FOR YOU [redacted] TO GET ON THE [redacted] PHONE

    support tech: Well sir, as you should know, our primary data center and primary support center were located in Tower 1 of the World Trade Center. [tone type=subzero chilling]You may have heard on the news THAT SAID BUILDING NO LONGER EXISTS[/tone] Thank you for calling. Please see our website for updates. Goodbye. (click)

    --
    Any person using FTFY or editing my postings agrees to a US$50.00 charge
  113. ARGH by dw604 · · Score: 1

    2 servers down. Burn.

  114. summary of events by v1 · · Score: 1

    My first impression would be something battery-related. If a short trips something that shuts off your incoming AC, it kicks you over to batteries and generators. If something is then reset and brings you back online a little later, your hardware switches back to AC and all the batteries start charging. If the electrical fault wasn't really FIXED (think sparks spraying from a nearby electrical box) but merely tripped something that you reset, then it can set off a hydrogen explosion from the hydrogen and oxygen the batteries give off while being recharged. THAT would require you to take things totally offline to fix, since it's the point where your redundant power sources converge.

    The support forum posts were not as heavy on detail as I would like to have seen, but better than about any I have ever seen under such circumstances. (something is always better than nothing) Looks like a transformer went out and did some structural damage. Probably not so much of an explosion. If you've ever seen a substation transformer go, that's probably about what happened here.

    Their main concern besides getting power restored seems to be repairing networking equipment. Hard to say how that was damaged; the event may have power-spiked their routers and switches (or there could be other related causes: physical damage, or a soaking with transformer coolant when it vented). At any rate, hazmat crews and firemen in general don't like working on live wires, so they basically told them, "We don't care if you can turn some of it back on, you're going to leave it all off until we're done." Looks like they made good use of that time to gather replacement hardware and build an action plan. At this point it appears that they've been given the OK to get in there and start replacing hardware and fixing power.

    There is a good video of a substation problem on youtube. This isn't necessarily what happened here, but you get the idea. Not really an explosion so much as a fire.

    --
    I work for the Department of Redundancy Department.
  115. this is how the virtual world by Anonymous Coward · · Score: 0

    connects to the real world

  116. Re:More planning could have prevented this by sjames · · Score: 1

    Not to mention that no number of redundant power systems helps you at all when the fire department orders the power off.

    Next, after an explosion due to an electrical fault blows 3 walls down, you have a lot of checking to do before just powering the redundant system up unless you'd like more smoke and fire.

  117. I'm a firefighter AND a geek. You, not so much. by CFD339 · · Score: 4, Insightful

    Look, when I go into a building in gear and carrying an axe and an extinguisher, breathing bottled air, wading through toxic smoke I couldn't give crap number one about your 100 sites being down.

    I have a crew to protect. In this case, I'm going into an extremely hazardous environment. There has already been one explosion. I don't know what I'm going to see when I get there, but I do know that this place is wall-to-wall danger. Wires everywhere to get tangled in when it's dark and I'm crawling through the smoke. Huge amounts of current. Toxic batteries everywhere that may or may not be stable. Wiring that may or may not be exposed.

    If it's me in charge, and it's my crew making entry, the power is going off. It's getting a lock-out tag on it. If you won't turn it off, I will. If I do it, you won't be turning it on so easily. If need be, I will have the police haul you away in cuffs if you try to stop me.

    My job, as a firefighter -- as a fire officer -- is to ensure the safety of the general public, of my crew, and then if possible of the property.

    NOW -- As a network guy and software developer -- I can say that if you're too short-sighted or cheap to spring for a secondary DNS server at another facility, or if your servers are so critical to your livelihood that losing them for a couple of days will kill you but you haven't bothered to go with hot spares at another data center, then you, sir, are an idiot.

    At any data center - anywhere - anything can happen at any time. The f'ing ground could open up and swallow your data center. Terrorists could target it because the guy in the rack next to yours is posting cartoon photos of their most sacred religious icons. Monkeys could fly out of the site admin's [nose] and shut down all the servers. Whatever. If it's critical, you have off-site failover. If not, you're not very good at what you do.

    End of rant.

    --
    The problem with quotes on the internet, is that nobody bothers to check their veracity. -- Abraham Lincoln
  118. Redundancy? yes sure! by Anonymous Coward · · Score: 0

    What about all the losses? (sales, ad income, lost opportunities, etc.)

    What a shame for a service that advertises REDUNDANCY, deceiving all its customers... we should take massive legal action.

    2 days already for my server and my 10 sites offline. What a crap service.

    1. Re:Redundancy? yes sure! by putaro · · Score: 1

      Any datacenter is subject to some form of catastrophe that will take it completely offline. They had an explosion in the power room - that's pretty wild and pretty nasty. Similarly, a good fire could take the whole place down, or a plane could crash into the building. If you're making enough money to miss the revenue, perhaps you should have a second server at another DC (preferably with a different provider).

      I've had a server in that building for years now, and this is the first major outage we've had. If I were losing enough money to bitch about, I'd be kicking myself for not making sure that things would fail over to a completely different DC.

    2. Re:Redundancy? yes sure! by Flamora · · Score: 1

      Except for the fact that The Planet DOES have a redundant power supply that WAS ready to be switched over to - did you RTFA and see the part where the Houston fire department informed them that they were not allowed to switch over?

      This is in no way an intentional deceit.

  119. Re:More planning could have prevented this by sjames · · Score: 1

    "Then don't put out marketing [theplanet.com] that claims 100% uptime, when you can't back it up."

    In that case, it cannot be advertised at all. Go ahead and install your 1024N power system and nuclear powered UPS and put it in a nuclear bunker. Fill the room with argon to make fires impossible. But if you can't promise that an asteroid will never ever strike the bunker, you better not advertise 100% uptime.

    Of course, if you read the fine print, they probably have something in there about not being responsible for actions of civil authorities...such as the fire dept. ordering them to shut down all power.

    I'm all for stricter requirements for truth in advertising, but honestly, it's not like this is a common problem they're having. Talk is dirt cheap. It's easy to CLAIM you could put together a system that could sail right through this sort of problem, but another matter to actually DO it. You'll never be 100% sure it'll work until something like this actually happens.

  120. Thousands / Hour by Anonymous Coward · · Score: 0

    I should think only a tiny fraction (maybe .1%) of the customers are in the thousands of dollars per hour of loss category.

    Most little online stores hardly break the tens of dollars per hour.

    If your data is THAT important, multiple hot-swap hosts in different colos are the answer.

    The common site selling earmuffs, or hosting java crapplets doesn't need that redundancy, nor can they properly pay for it.

  121. ant damage? by giantgeek · · Score: 1

    Ants are causing a lot of damage to electrical devices in Texas: http://urbanentomology.tamu.edu/ants/exotic_tx.cfm

    --
    new letter/phrase: hex-u means "www"
  122. Not halon... by parasonic · · Score: 1

    They don't let you use halon anymore these days. Back in '06 when my company was upgrading its datacenter, we had a similar fire issue. We had just gotten all of the servers moved into the new racks, and everything was running nicely for a few weeks.

    Well, what do you know? The UPS blew up. Due to improper assembly of the UPS, one of the main cables was stretched and chafed along the chassis. Imagine, let's say, an array of 12 car batteries in series, in a parallel arrangement of 12 strings (144 total, of course). The cable finally wore through late at night, and the arc gouged several inches of 12-gauge sheet metal chassis before it finally shorted and destroyed the batteries (from what I heard).
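    The arithmetic behind that picture, as a small Python sketch: voltages add along a series string, and capacity adds across parallel strings. The 12 V nominal rating is standard for a car battery, but the 50 Ah capacity per battery is a made-up placeholder:

```python
def battery_bank(series, parallel, unit_volts=12.0, unit_amp_hours=50.0):
    """Nominal figures for `parallel` strings of `series` batteries each.

    Voltages add along a series string; amp-hour capacity adds across
    parallel strings. unit_amp_hours is a hypothetical placeholder value.
    """
    volts = series * unit_volts
    amp_hours = parallel * unit_amp_hours
    return {
        "batteries": series * parallel,
        "volts": volts,
        "amp_hours": amp_hours,
        "kwh": volts * amp_hours / 1000,
    }


print(battery_bank(12, 12))
# {'batteries': 144, 'volts': 144.0, 'amp_hours': 600.0, 'kwh': 86.4}
```

    That is a 144 V nominal bus, which helps explain how a chafed cable could sustain an arc long enough to cut through sheet metal.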

    It was a very nasty mess, and half of the UPS was this blob of burnt-out circuitry. No fun. My company opted to spend several thousand dollars (over 10k IIRC) on a small FM-200 system.

    My guess is that something like this happened to ThePlanet, and that their suppression agent was most likely a chemical like FM-200, which is less environmentally harmful than halon, though not as cool.

    Anyway, the incident at my company was completely avoidable, and I would wager that the one at ThePlanet was completely avoidable as well. I am just glad that ThePlanet lost my business about a year ago for price-gouging a loyal customer after three years of service.

  123. Re:More planning could have prevented this by njcoder · · Score: 1

    "In that case, it cannot be advertised at all. Go ahead and install your 1024N power system and nuclear powered UPS and put it in a nuclear bunker. Fill the room with argon to make fires impossible. But if you can't promise that an asteroid will never ever strike the bunker, you better not advertise 100% uptime."

    I should have said "when you can't reasonably back it up."

    One power room doesn't seem to qualify as "fully redundant power system" or a "complete redundant power management system" that "assures 100% uptime".

    With only one power room, you have to wonder how thoroughly and how often they perform maintenance on that equipment.

    While a transformer explosion might be rare, it is not unheard of. I don't think it's too much to expect a data center that talks about "100% uptime" and "fully redundant power systems" without "a single point of failure" not to go offline because of a fire in the power room.

  124. Comedy by Anonymous Coward · · Score: 0

    The two latest press releases from The Planet:

    "The Planet Announces Server Blowout Sale!"
    and
    "The Planet's Jeff Lowenberg Named Data Center Manager of the Year"

    aaah good times. I'll be here all week, folks.

    No, really, I'll be here all week, reloading. Checking for updates. Wishing I'd backed up my database more often.

  125. Fueling the Fire by BlkStormy · · Score: 1

    They have enough to worry about. Why post a link to their forums and cause more problems?

    As you may have already noticed, our forum servers continue to lag due to very heavy load. This is in part due to the fact that our outage is now being carried on several sites (including Slashdot). Even though we added servers to our forums last night, we are looking at alternatives at this time to provide simple status updates quickly.

  126. Re:I'm a firefighter AND a geek. You, not so much. by JoeShmoe · · Score: 0, Flamebait


    You forget that part of your job description is to protect property. It can't always be running in to save the cute blond kid who got left behind in the panic to evacuate the home.

    And yes, that means that sometimes firefighters are expected to risk their life to save just a building and prevent that fire from spreading to other empty buildings. If there wasn't a point where we expected firefighters to take an acceptable risk, we would just evacuate a town when a fire started, wait for everything to burn down in absolute safety, then rebuild.

    And yet, I personally have seen a fire break out at a self-storage facility and watched four fire crews sit there and watch the entire structure (with, I remind you, countless families' personal possessions and sentimental collectibles) burn to the ground. Why? Because the fire chief thought there might be one person there storing ammunition and he wasn't going to risk anyone to stop the fire.

    To me that's complete cowardice. If that was the job some firefighters think they are supposed to do, I want my taxes back. I can hire an $8/hr security guard to cordon off an area and wait for a fire to burn itself out. We pay someone $98K plus benefits to actually try to stop the fire even if there is an eensy chance that something could hurt him.

    You point out that anything can happen at any time. Ditto for firefighters. You could get into an accident on the way to the scene. So why not just stay there in the safety of the firehouse if there are no lives at stake?

    Electricians have to work with LIVE power on a routine basis. Nobody apparently gives a shit if an electrician has to risk death because they don't want to have to power down their racks while he replaces a fuse or adds a new circuit. And I have electrician friends that have the scar tissue to prove that it doesn't always turn out well.

    Risk management. Do I want to see a firefighter die so that my customer can get his email today instead of tomorrow? Of course not. But do I think it makes any sense to have 9000+ people and all the systems they may depend upon go down because the fire chief doesn't think his employees can resist clutching every exposed wire they run across?

    I obviously never expected that an entire facility with redundant power supplies would lose power. Whoops, yes, my bad, but I assure you I can't be the only one who never thought this was a likely possibility. I've never seen "Fireman independent power systems" advertised as a selling feature when shopping for server hosting. Maybe someone will see a market opportunity, I don't know.

    But I expect that people who are paid to minimize damage to property will MINIMIZE damage, and not throw the baby out with the bathwater. The problem was clearly with high voltage. So cut the feed from the power company, case closed. Shutting down the redundant power generators that are DOWNSTREAM from the problem? That's idiotic. It's right up there with the voodoo and witchcraft method of personal protection.

    Neither of us is right, dude. But the point is that I've never heard of generators being ordered off when the whole point of having generators is to provide power when the main fails. So this completely bugs me and I'm skeptical it has anything to do with erring way too far on the side of caution. Otherwise, what's next:

    "Say chief, I know we have all these people in this hospital on life support, but someone left a cigarette in a trash can and I'd sure as heckfire feel safer if we just shut the whole thing down"

    -JoeShmoe

    --
    -- I wonder which will go down in history as the bigger failure: the War on Drugs or the War on Filesharing
  127. Re:More planning could have prevented this by kesuki · · Score: 1

    "Did you catch the article about Google's datacenters the other day? Clearly they recognize that fact and design around it."

    I wonder (http://www.informationweek.com/news/storage/showArticle.jhtml?articleID=202400961): how does a 'data center' in a box go down when its 'power room' explodes?

    Complicated electrical devices, especially those where varying current can cause undesirable operation, are the kinds of electrical devices that make a big bang when they go up in smoke. The conventional data center can put these parts far away from the servers, so even with 3 walls going down, no servers were harmed. But if it's all tightly integrated into a 'box', what happens to all the servers and the data?

    I suppose if the thing is as big as a semi trailer, it could have a blast barrier between the servers and the power unit. Otherwise, a data center in a box is potentially a less safe way of implementing a data center than the conventional approach.

  128. MOD PARENT UP by Anonymous Coward · · Score: 0

    mod parent up!

  129. Re:I'm a firefighter AND a geek. You, not so much. by CFD339 · · Score: 2, Insightful

    You sir, don't know what you're talking about. Reaching for ridiculous examples of someone doing their job wrong doesn't change that.

    Our S.O.G. (standard operating guidelines) are actually very specific about risk.

    We will risk our lives to save a human life.
    We will take reasonable risk to save the lives of pets and livestock.
    We will take minimal risks to save property.

    Sorry, but your building isn't worth the risk of my crew. That's reality.

    Don't you DARE tell me what is and isn't brave or cowardly until you put 50 pounds of gear on and crawl into a pitch-black house that's burning over your head.

    Don't you DARE tell me that you think you understand the difference between saving the blonde girl and saving your computer server.

    This isn't TV World. This is the real world. Fire on TV doesn't look like real fire. You know why? Because a real house on fire doesn't look like anything but pitch black and that makes for lousy TV.

    Get over yourself and go volunteer at your local fire department. 86% of the men and women in this country who will risk their lives for yours are volunteers. We could use your help if you have the guts for it. We'll teach you what you need to know -- and we'll keep you as safe as we can so you can go home to your family when it's done.

    Your examples are stupid and insulting to the 800,000 brave men and women who volunteer to risk death in the most painful way possible to save your sorry butt.

    --
    The problem with quotes on the internet, is that nobody bothers to check their veracity. -- Abraham Lincoln
  130. "100% uptime" promise from The Planet by Animats · · Score: 1

    The Planet's video tour of why this wouldn't happen is up and working. Click on "Take the tour", which has many data center pictures. I like the "100% uptime" part.

    It turns out they didn't have all the redundancy they said they had. Their central server management system and the DNS servers for those hosts were all in that data center. So customers couldn't get in and switch the DNS to another location for hours.

    They now claim to have the server management system back up.

  131. Re:I'm a firefighter AND a geek. You, not so much. by Anonymous Coward · · Score: 0

    Andrew, your comment is one of the most asinine I've encountered on slashdot.

    DON'T YOU DARE sound self-righteous when an observer is making his point cogently and politely.

    I'm sick of these obey-authority fools telling us what we are allowed to criticize and not criticize.

    We all respect fire departments. That's a given. The fact that you should attack the messenger undermines the credibility of any point you have made.

  132. Re:I'm a firefighter AND a geek. You, not so much. by Silver+Gryphon · · Score: 1

    I applaud your response, CFD. It's the responsibility of the data center management to ensure redundancy in the event of fire, flood, earthquake, tornado or curious squirrel. In this case, there was an explosion and fire. I've seen fire up close when we invited the local volunteer fire department to use an old house as training. Once they set the fire, the entire thing was engulfed in 26 seconds. They intended to enter and practice saving occupants, but it was 60-year-old, dry wood and just went poof, so they just waited a minute and turned the hoses on. It smoldered for about 24 hours and was a pile of ash.

    In this datacenter, there are all kinds of things that could smolder and cause secondary fires if the generators were turned on and something unknown happened (i.e. short from inside?). Plus, don't firefighters need to check for hot spots? Isn't that easier if power's off? Don't get me started on structural stability, either... 3 of 4 walls collapsed. That has to count as added risk.

    So yes, if ThePlanet was willing to take the risk that their building was destroyed by earthquake, they can accept 24 hours downtime at the insistence of the fire chief. Redundant data centers for critical operations; acceptable tactical losses for whatever doesn't have redundancy. Murphy's law happens, and nothing is truly redundant. If their 9,000 customers expected full redundancy, those customers will need to re-evaluate what exact kind of redundancy they're getting. Not everyone needs multi-datacenter stability, which is horrifically expensive. After reading this story, I'll be getting a second server for my 19 domains, on a different provider in a different city. Just in case.

  133. Re:More planning could have prevented this by Twinbee · · Score: 1

    Since it was posted AC the second time, it may have been someone else pretending to be him. You never know...

    Otherwise, I'm curious as to the probability of whether he meant that (and completely missed Hijacked Public's point), or is just trolling.

    --
    Why OpalCalc is the best Windows calc
  134. Update 11:14 PM CST by Solokron · · Score: 3, Informative

    As previously committed, I would like to provide an update on where we stand following yesterday's explosion in our H1 data center. First, I would like to extend my sincere thanks for your patience during the past 28 hours. We are acutely aware that uptime is critical to your business, and you have my personal commitment that The Planet team will continue to work around the clock to restore your service.

    As you have read, we have begun receiving some of the equipment required to start repairs. While no customer servers have been damaged or lost, we have new information that damage to our H1 data center is worse than initially expected. Three walls of the electrical equipment room on the first floor blew several feet from their original position, and the underground cabling that powers the first floor of H1 was destroyed.

    There is some good news, however. We have found a way to get power to Phase 2 (upstairs, second floor) of the data center and to restore network connectivity. We will be powering up the air conditioning system and other necessary equipment within the next few hours. Once these systems are tested, we will begin bringing the 6,000 servers online. It will take four to five hours to get them all running. We have brought in additional support from Dallas to have more hands and eyes on site to help with any servers that may experience problems. The call center has also brought in double staff to handle the increase in tickets we're expecting. Hopefully by sunrise tomorrow Phase 2 will be well on its way to full production.

    Let me next address Phase 1 (first floor) of the data center and the affected 3,000 servers. The news is not as good, and we were not as lucky. The damage there was far more extensive, and we have a bigger challenge that will require a two-step process. For the first step, we have designed a temporary method that we believe will bring power back to those servers sometime tomorrow evening, but the solution will be temporary. We will use a generator to supply power through next weekend, when the necessary gear will be delivered to permanently restore normal utility power and our battery backup system. During the upcoming week, we will be working with those customers to resolve issues. We know this may not be a satisfactory solution for you and your business, but at this time, it is the best we can do.

    We understand that you will be due service credits based on our Service Level Agreement. We will proactively begin providing those following the restoration of service, which is our number one priority, so please bear with us until this has been completed.

    I recognize that this is not all good news. I can only assure you we will continue to utilize every means possible to fully restore service. I plan to have an audio update tomorrow evening.

    Until then,
    Douglas J. Erwin
    Chairman & Chief Executive Officer

    --
    30% off web hosting. Coupon code "SLASHDOT".
  135. Re:I'm a firefighter AND a geek. You, not so much. by Twinbee · · Score: 1

    As much as I agree with minimizing safety risk, his main point (in my opinion) was:

    "So cut the feed from the power company, case closed. Shutting down the redundant power generators that are DOWNSTREAM from the problem?"

    In other words, if the backup power *to* the servers is kept on, and just the primary power (i.e. from the power company) is turned off, then surely that's 99.999% safe? After all, the room for the main power is separate from the backup power's room.

    Maybe I'm missing something, and power can leak from the backup supply through the servers and back into the broken main power setup. I doubt that, but at the very least they should ask the servers' technicians whether that's even theoretically possible.

    --
    Why OpalCalc is the best Windows calc
  136. Re:More planning could have prevented this by moosesocks · · Score: 1

    Mind you, the fact that ThePlanet owns 5+ datacenters lends credence to what you're saying.

    Put your servers in two geographically-isolated datacenters, and you'll be considerably more protected against virtually any sort of calamity that could occur to your servers.

    There are so many things that a datacenter simply cannot be 100% prepared for. Would you really blame the provider if a plane crashed into their building?

    It's far cheaper to simply colocate in 2+ locations than it is to prepare for every single event that can possibly occur, no matter how remotely unlikely it may be.
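
    The two-location argument can be put in rough numbers. This Python sketch assumes the sites fail independently and that either site alone can carry the load; the 99.9% per-site availability is a hypothetical figure.

```python
# Rough sketch: availability of a service hosted at several independent sites.
# Assumptions: sites fail independently, and any single surviving site can
# serve everything. The 0.999 per-site figure below is hypothetical.

HOURS_PER_YEAR = 365 * 24

def combined_availability(site_availabilities):
    """Probability that at least one site is up, given independent failures."""
    prob_all_down = 1.0
    for a in site_availabilities:
        prob_all_down *= (1.0 - a)
    return 1.0 - prob_all_down

for sites in ([0.999], [0.999, 0.999]):
    a = combined_availability(sites)
    downtime_hours = (1.0 - a) * HOURS_PER_YEAR
    print(f"{len(sites)} site(s): availability {a:.6f}, ~{downtime_hours:.2f} h/year down")
```

    Real failures are rarely fully independent (DNS or a management system living in one building, as described elsewhere in this thread, couples the sites), so the two-site figure is best read as an upper bound.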

    --
    -- If you try to fail and succeed, which have you done? - Uli's moose
  137. Wonder if there is a really stupid ass moderation by Anonymous Coward · · Score: 0

    Wonder if there is a really stupid ass moderation for people who mod without a clue?

    The point is that this type of thing could be avoided with a colocated system, not that ThePlanet would be doing it with every one of their systems. If the owner of a system hosted by ThePlanet felt the expense of colocating was worth it to avoid downtime, it could have been done.

  138. Re:I'm a firefighter AND a geek. You, not so much. by gnuman99 · · Score: 0, Flamebait

    firefighter != electrician

    Electricians don't work in areas where an explosion just happened and some wires may or may not be live.

    Both electricians and firefighters would shut down the power. Electricians may take their time to understand the situation. Firefighters have no time. So in case of fire and shit, firefighters will demand power is shut down before going in, especially in cases of ELECTRICAL FIRE!!

    How stupid can you be? Electricity was what fueled the explosion and/or fire! So you shut it off! All of it. You don't know what state it is in, so you shut it down. You know, electricity "flows" from one end of a transformer to another *both* ways. And hell, I would not trust some relay switch with my life after an explosion. Hell, the railing or stairs or support column could be live and who the hell knows what is live and what is not??

    You are like some idiot bitching that your natural gas supply was cut because there was an underground pipe explosion 2 miles down from your house.

  139. I'm an obey-authority fool? LOL by CFD339 · · Score: 1

    At first I thought you must know me to use my name, then I realized it was just a cheap trick of looking at my email address or profile.

    You know how I knew? Because nobody who knows me would make the mistake of calling me an obey-authority anything. You've got the wrong fool on that one.

    The arrogant insult dripping through the troll's post to which I responded deserved all the ridicule and self-righteousness it contained.

    --
    The problem with quotes on the internet, is that nobody bothers to check their veracity. -- Abraham Lincoln
  140. Damn! I cant think of a proper conspiracy theory! by rootpassbird · · Score: 0, Offtopic

    bah! the feds, the cia, the chinese, the....
    who would use a short!!
    We're under attack!
    Declare War on [insert free power source here]

    --
    Hackers have long memories. It works both ways.
  141. Not so simple. by CFD339 · · Score: 4, Informative

    While it sounds like a reasonable approach at first, it makes assumptions that I can't make as an officer on scene.

    1. It assumes that the only problem is with the original transformer. When I arrive on scene I don't know what the problem was -- even if you tell me you do know, I can't believe it. I also don't know what the secondary problems are.

    2. Feeding power into a building that has been physically damaged is very very dangerous. We're not talking about a transformer "failing to work" we're talking about something that blew the walls off the room it was in.

    3. We already know that things didn't go the way they were supposed to. Something failed. Some safety plan didn't work. We have to assume that we're dealing with chaos until proved otherwise.

    So, as a fire officer I arrive on scene and have a smoke-filled building with reports of an explosion and MAYBE a report that everyone is out. I need to go in and find out what happened, if anything is still burning or in immediate danger, and if anyone is still inside. To do that safely, the first thing I want to do is secure the power to the building (shut it off) as well as any other utility feeds (oil, steam, liquefied petroleum or natural gas).

    The gear I carry -- even the radio -- is designed to never create even the tiniest spark in its operation. We call it "intrinsically safe". It's one of a great many precautions we take.

    We go in to a place like this not knowing the equipment, not knowing its condition.

    My final proof point --

    If in fact The Planet had powered up their generators, they'd have fried a lot more stuff and caused more fire. They may have destroyed their chances of salvaging the grid within 48 hours at all. Why? It turns out (we now know) that the force of the initial explosion moved three walls in the power distribution center more than a foot (I think I heard 3 feet) off their base. This tore out electrical connections, cables, conduits and power switches. Just now, after 28 hours, they've figured out how to get power to the servers on the second floor, but for the first-floor servers they're having to rig up a line from the generators to that floor, and it will take until tomorrow to do that. Why? Because the electrical connections from that distribution room to the first-floor servers are destroyed. They're going to be running 3000 servers on the first floor off those generators for a week while they get the equipment to rebuild the connectivity to the main distribution room.

    What does this prove?

    1. It proves the fire marshal was right in not allowing them to feed power in there.

    2. It proves that that big dumb fireman you see (who may be a volunteer who's also a network guy and software developer with an IQ above 95% of the world) may in fact have a good reason for the way they do things on scene.

    Look, as a firefighter I don't set out to ruin someone's day. I set out to keep them safe. If that sounds paternalistic, well, it is paternalistic. It very much feels that way. In my small town, it's how I feel. I wonder every time I walk into a building how I would protect MY PEOPLE in this building if a fire broke out or a hazmat incident started or whatever. You can't help it; it's what you're trained to do.

    --
    The problem with quotes on the internet, is that nobody bothers to check their veracity. -- Abraham Lincoln
    1. Re:Not so simple. by Artuir · · Score: 1

      Thank you for doing the job you do. I've always had a ton of respect for the police and firefighters (and even hospital staff, as my mother was a nurse and I got to meet a great many of them as a kid), but the clarity of thought you guys have is always impressive to me.

    2. Re:Not so simple. by ZOP · · Score: 1

      I've never had a problem with the way any fireman has handled an active incident scene. I have had *many* issues with inspections, though. That's one area that seems quite arbitrary, depending on the inspector, the locale, and how much of a risk he or she feels you might be to his/her crews.

      In terms of incident management, it is standard and well accepted that *ALL* utilities will be shut off before any crew does anything, and this includes emergency utilities. This is why many locales have EPO (emergency power off) requirements for anything larger than X -- and why all large gear has EPO provisions.

    3. Re:Not so simple. by Jay+L · · Score: 1

      3. We already know that things didn't go the way they were supposed to. Something failed. Some safety plan didn't work. We have to assume that we're dealing with chaos until proved otherwise.

      A-freaking-men to ALL your posts. I think point #3 is the part your (now-modded-below-my-reading-level) antagonist isn't getting.

      Yes, we know how electrical circuits work. We know the theory. We know which way current flows. We know logic.

      We also know that when things are working the way we know them to work, nothing blows up. Therefore, we know that, somewhere in the blown-up, on-fire building, reality has diverged from that theory. And, usually, it's not one thing: it's a confluence of events. There's no reason to place lives at risk; turn the power off, and find out exactly what happened and why.

    4. Re:Not so simple. by jesboat · · Score: 1

      > What does this prove?
      >
      > 1. It proves the fire marshal was right in not allowing them to feed power in their.

      That the results of the decision the fire marshal made ended up being better than the results if they'd powered on proves absolutely nothing; it's like saying "well, I pointed the revolver at my head and pulled the trigger, but that chamber was empty. Therefore, I made the right decision." (Understand, I'm not attempting at all to claim that the marshal's decision was wrong, just that the decision can't be judged accurately using more information than was known at the time.)

      > 2. It proves that when that big dumb fireman you see (who may be a volunteer who's also a network guy and software developer with an IQ above 95% of the world) may in fact have a good reason for the way they do things on scene.
      >
      > Look, as a firefighter I don't set out to ruin someone's day. I set out to keep them safe. If that sounds paternalistic, well, It is paternalistic. It very much feels that way. In my small town, its how I feel. I wonder ever time I walk into a building, how I would protect MY PEOPLE in this building if a fire broke out or a hazmat incident started or whatever. You can't help it, its what you're trained to do.

      I should hope that, once the fire was out, you'd get the fuck away and let the owners of the building do whatever they wanted, because if not, you're seriously impinging on their rights without a pressing reason. As others have said below, if there isn't an immediate threat to the public (i.e. there's no fire), you should have no ability to force yourself upon the building's owners.

    5. Re:Not so simple. by CFD339 · · Score: 1


      Re: Once the fire is out --

      There are a few steps, actually:

      1st -- it isn't just "fire out", it's "scene is safe", which may include hazmat cleanup or other hazards.

      2nd -- Sometimes we have to stay and keep people out for a police investigation. Not often; they usually do that, but we have lots of fancy lighting and such that can help.

      3rd -- Often, we do much more. For example, in a house fire we may have put a hole in the roof to vent, pipes may have burst or broken (or their solder melted, as is frequent), windows are broken, etc. For a home, we usually stay and try to get boards up to keep the weather out. We usually also stay (in winter) to get a plumber on scene to get the heat going where possible, or the rest of the pipes drained when it isn't. That keeps them from freezing and creating more damage.

      4th -- sometimes we have to stay on scene overnight for a fire watch, especially if the alarm systems are damaged or if there was a very large fire and we're watching for hotspots.

      5th -- Once we're finally done with the "Emergency" part, the law in most states requires that the structure be inspected and occupancy requirements are met just as if it were new construction.

      While "YOU" may want to go back into "YOUR" building, that's a long way from a public building or a building where employees have to work. You can't just order your staff back into a building that hasn't got an occupancy certificate or is dangerous -- or tell them they don't have to go, but then they're fired.

      "Forcing myself upon the building's owners" is really just a rather nasty way of saying you don't like the permitting, building code, and other such laws.

      A fire or explosion essentially makes any previous occupancy permit or code enforcement inspection unreliable. In some states they become instantly invalid. In all states, they should. OSHA would never let you put people back to work in a building that can't pass code inspection. That's basic workers' rights.

      The word "Owner" is the problem. I seriously doubt the "Owner" of that data center was on scene demanding to be allowed to personally start repairs in HIS building.

      --
      The problem with quotes on the internet, is that nobody bothers to check their veracity. -- Abraham Lincoln
    6. Re:Not so simple. by jesboat · · Score: 1

      You wrote a long comment, so I won't reply to every part individually. The main things I'd like to say:

      > Additional cleanup, "scene is safe"

      Again, I would hope, in such situations (any in which the building is no longer presenting a danger to the community around it), you'd do so only at discretion of the building's owner.

      > Fire watches, hotspots

      Hotspots are fair cause for concern; granted.

      > Police investigations, workplaces, public buildings, building codes

      Those shouldn't be the fire department's responsibility. Leave it to the police, labor regulators, commerce regulators, etc.

      > "Owner"

      Semantics; s/owner/owner or legal occupant/

  142. No, not necessarily by Sycraft-fu · · Score: 4, Informative

    You are probably thinking of auto insurance. Yes, it usually goes up when used. The reason is that when you use it, it is usually because you did something that changed your risk level. If you get in an accident, that makes you a higher risk. Continue to get in accidents, you are a higher risk still. Thus the companies want more money. It's all based on risk calculation. That's also why they want more money when you are under 25. Statistically speaking, young people are at a much higher risk of accidents.

    Well, with building insurance, that's not the case. You aren't really a significant risk factor. Risk is instead calculated off of things like what kind of structure it is, how far it is from the fire department, what it's used for, what it contains (that determines what they are on the hook for), etc. So when something happens, unless it was because of a previously unknown risk factor, your rates don't necessarily change. Nothing changed with regards to risk.

    Insurance is really all just risk based. They take the probability of having to make a payout and the amount of said payout vs time and come up with a rate. If something changes the risk, the rate will change as well, but if not then it doesn't change. It isn't as though your one single payout is of any significance to their overall operation.
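
    A toy version of that rate calculation, as a Python sketch: the premium is the yearly probability of a claim times the payout, plus a proportional loading for expenses and profit. The loading, probabilities and amounts are all hypothetical, and real insurers price far more carefully than this.

```python
# Toy sketch of risk-based pricing. All figures are hypothetical; the 25%
# loading stands in for the insurer's expenses and profit.

def annual_premium(claim_probability, expected_payout, loading=0.25):
    """Expected annual loss plus a proportional loading."""
    expected_loss = claim_probability * expected_payout
    return expected_loss * (1.0 + loading)

# Hypothetical house: 0.2% chance per year of a total loss of $200,000.
before = annual_premium(0.002, 200_000)

# A claim caused by bad luck alone leaves the inputs unchanged, so the rate does too.
after_random_claim = annual_premium(0.002, 200_000)

# A claim that reveals a new risk factor (say, faulty wiring) raises the probability.
after_new_risk = annual_premium(0.004, 200_000)

print(f"before: ${before:,.2f}, after random claim: ${after_random_claim:,.2f}, "
      f"after new risk: ${after_new_risk:,.2f}")
```

    As the post says, the premium only moves when an input moves: a claim from a random event leaves the inputs alone, while a claim that reveals a new risk factor raises the probability.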

    Also, the idea of "Just pay for it yourself," is extremely silly. It smacks of someone who's never owned something of any significant value. The reason behind insurance is that you CAN'T just pay for it yourself. For example, I have insurance on my house. The reason is that if I lost it, I can't afford to replace it. I don't have a couple hundred grand just lying around in the bank. That's the point of insurance. You are insuring that if something happens that you can't afford, someone will pay for it. The insurance company is then betting, of course, that it isn't likely to happen and they get to keep the money.

    1. Re:No, not necessarily by Anonymous Coward · · Score: 0

      You are insuring that if something happens that you can't afford, someone will pay for it. The insurance company is then, of course, that it isn't likely to happen and they get to keep the money.

      Damn, you were so close to getting all the words in the right spot.

      I'm pretty sure that you meant to say:

      You are ensuring that if something happens that you can't afford, someone will pay for it.

  143. Re:I'm a firefighter AND a geek. You, not so much. by jacquesm · · Score: 1

    someone mod this up please.

    I'm in DC1 at The Planet, I'm down and I'm not pointing any fingers. I'm pretty sure they did a reasonable job of setting up their systems in such a way that the chances of this happening were small to begin with, and when it did happen they seem to have things under control as much as possible.

    There are a lot of 'armchair' specialists and complainers around here and all I would like to say to them is we'll see how it goes when *you* operate 5 large datacenters for years. Accidents do happen (and by their nature are caused by the unforeseeable), how you deal with them is what matters.

    And firefighters' lives are more precious than *any* amount of hardware.

    The only person I blame for not verifying that what I thought was redundant DNS really was in two locations is me, and I really thought I had it set up that way :(

  144. Yep by Sycraft-fu · · Score: 3, Insightful

    For example, someone like Newegg.com probably has a redundant data centre. The reason is that if their site is down, their income drops to 0. Even if they had the phone techs to take the orders, nobody knows their phone number, and since the site is down, you can't look it up. However, someone like Rotel.com probably doesn't. If their site is down it's inconvenient, and it might cost them some sales from people who can't research their products online, but ultimately it isn't a big deal even if it's gone for a couple of days. Thus it isn't so likely they'd spend the money on being in different data centres.

    You are also right on in terms of type of failure. I've been in the computer support business for quite a while now, and I have a lot of friends who do the same thing. I couldn't count the number of servers I've seen die. I wouldn't call it a common occurrence, but it happens often enough that it is a real concern, and thus important servers tend to have backups. However, I've never heard of a data centre being taken out (from someone I know personally, that is; I've seen it on the news). Even when a UPS blew up in the university's main data centre, it didn't end up having to go down.

    I'm willing to bet that if you were able to get statistics for the whole of the US, you'd find my little sample is quite representative. There'd be a lot of cases of servers dying, but very, very few of whole data centres going down, and then usually only because of things like hurricanes or the 9/11 attacks. Thus, a backup server makes sense, but unless it is really important, a backup data centre may not.

    1. Re:Yep by Anonymous Coward · · Score: 0

      Actually, (big bits of) data centers going down is more common than you may think. In 10 years working for a major service provider I've seen it three times, excluding a number of times where just one or a few racks were affected.

      The first (and worst) was when a cooling system failed and began to leak. Water ran down the wall and under the raised floor, and it also made the whole datacenter (one room, maybe 100 full racks) foggy. The water managed to find a high-voltage line (an old one from a removed mainframe) and BOOM, no power. They had it restored in 45 minutes, but all the systems needed to be restarted and some had disk and other problems.

      The second wasn't our datacenter; it belonged to a large rail organisation. Again it was a cooling failure, but it wasn't properly monitored, and the room got to some 65°C. Our server alerted me when its hard disks got REALLY hot.

      The third was because a datacenter was put into service a little early. The local power company couldn't supply the power yet, so the datacenter's power was supplemented with diesel generators. When the utility power was finally ready, they wanted to do a power redundancy check and switch over to the grid, and something went wrong. There was a brownout and most servers went down. This was maybe 1000 racks.

      We've also had various data segments go down for different reasons. With the problem above, a surprising number of the c* routers forgot their NVRAM and had to be manually reconfigured.

      Redundancy is seldom "true"; there's always a straw that can break the camel's back. Someone forgot to put the routers on different power phases, or UPSs. Blown servers take out networks and therefore fail to fail over. Backups are bad and untested. Most customers want redundancy until they realise the true price (maybe 20 times a single solution if you count hardware, networks, training, implementation, testing).

      This in 10 years working in and with many datacenters. So it happens, and I know for a fact that it also happened often to our competitors.

  145. Re:I'm a firefighter AND a geek. You, not so much. by jacquesm · · Score: 1

    After an explosion you simply cannot assume that the original wiring diagram still matches reality. Any discrepancy between the two translates into a serious elevation of the risk...

    In other words, what you think is a 'dead' wire could easily be a live one because one or more cables that used to be insulated are now connected.

    Until you have inspected the situation and seen that things are good, you're better off not risking powering up.

    In fact, now that some of the dust has cleared, it seems the damage was much more serious than initially assumed, and trying to power up the servers using emergency generators would not have helped one little bit (whether it would have worked at all, or even made matters worse, is another question).

  146. 9000???!!! by Anonymous Coward · · Score: 0

    It's over 9000!!!!!

  147. Re:I'm a firefighter AND a geek. You, not so much. by jacquesm · · Score: 1

    never before was 'anonymous coward' more appropriate.

    if you're sick of these 'obey authority' fools telling you what you are allowed to criticize and not criticize I suggest you set up your own commercial firefighting service under your new and enlightened guidelines.

    If you can get so much as 1 single person working for you under those guidelines I'll be very amazed.

    Attacking the messenger is perfectly acceptable if the messenger tells you how to do your job for you and what risks you should take to save their property. No amount of property is worth the life of a firefighting crew.

    Btw, we lost three firefighters in a flashover nearby recently, they were in fact 'just' trying to save some property.

  148. BeOS by ProfessionalCookie · · Score: 1

    Remember the Computer on Fire function in the BeOS kernel?

  149. But.....What about Ilmari's interview? by redspike · · Score: 1

    How will it be affected by the extended fireball, and what are the ramifications of a positive or negative response?

  150. I wonder if they changed their data center? by deeny · · Score: 1
    1. Re:I wonder if they changed their data center? by KoReE · · Score: 1

      or if it still looks like this. That data center is not the one that is having problems. I believe that picture is a Dallas data center. The Houston DC known as hstntx1 is the one having issues.
      --
      Instant Karma's gonna get you...
  151. World of Warcraft? by ATMosby · · Score: 1

    I wonder if that's why World of Warcraft is down?

  152. Location by Anonymous Coward · · Score: 0

    So hstntx1 exploded... That datacenter is located in Greenpoint (Guns point), Houston.

  153. NEW service update page for ThePlanet by martyb · · Score: 1

    I'm clicking on your link, but nothing is happening. Am I doing it wrong?

    I click the link and it DOES bring up the page. Unfortunately, since it is a cached copy of the page, it is sometimes out of date. I.e., there have been updates to the actual page that are not reflected in what the Coral Cache copy displays. :/

    As of this writing (Monday morning, 06/02/08), it appears that ThePlanet Datacenter folks have created a NEW STATUS PAGE to lessen the load on their servers:

    1. Re:NEW service update page for ThePlanet by pcgabe · · Score: 1

      I'm clicking on your link, but nothing is happening. Am I doing it wrong?
      I click the link and it DOES bring up the page.
      Hi, sorry, I was teasing you. You pasted a URL, but did not make it a link. Do you perhaps have a linkifier? Because, not everyone does.
      --
      Don't put advice in your sig.
  154. Not entirely true by phorm · · Score: 1

    Insurance companies around here have generally looked at two things when I was applying for home insurance:

    a) Have I been a previous customer of them or their affiliates (discount points)

    b) Have I been a customer of other companies *without a claim* (discount points based on claimless time)

    You can get further points for things such as security bars on windows (anti-theft), a home alarm system, fire extinguishers, properly placed fire alarms, etc.

    Your overall discount is then based on your final number of points.

    So, while your "base rate" doesn't change much based on this, the final price can vary quite a bit depending on which discounts you're eligible for, with claims-free time being one of the factors in this.

  155. If they're down, how are they posting? by Anonymous Coward · · Score: 0

    The point I think everyone is missing is.. If their power is down then who are they using to post the updates. I think everyone affected by this outage should use that hosting co as their provider. lol!

  156. omg by Anonymous Coward · · Score: 0

    IT'S OVER 9000!!!11

  157. The Note sent to Account Holders by stimuli_ii · · Score: 1
    Got this note in my control panel (Orbit).

    06/01/2008 11:00 PM CDT Update
    As previously committed, I would like to provide an update on where we stand following yesterday's explosion in our H1 data center. First, I would like to extend my sincere thanks for your patience during the past 28 hours. We are acutely aware that uptime is critical to your business, and you have my personal commitment that The Planet team will continue to work around the clock to restore your service.

    As you have read, we have begun receiving some of the equipment required to start repairs. While no customer servers have been damaged or lost, we have new information that damage to our H1 data center is worse than initially expected. Three walls of the electrical equipment room on the first floor blew several feet from their original position, and the underground cabling that powers the first floor of H1 was destroyed.

    There is some good news, however. We have found a way to get power to Phase 2 (upstairs, second floor) of the data center and to restore network connectivity. We will be powering up the air conditioning system and other necessary equipment within the next few hours. Once these systems are tested, we will begin bringing the 6,000 servers online. It will take four to five hours to get them all running.

    We have brought in additional support from Dallas to have more hands and eyes on site to help with any servers that may experience problems. The call center has also brought in double staff to handle the increase in tickets we're expecting. Hopefully by sunrise tomorrow Phase 2 will be well on its way to full production.

    Let me next address Phase 1 (first floor) of the data center and the affected 3,000 servers. The news is not as good, and we were not as lucky. The damage there was far more extensive, and we have a bigger challenge that will require a two-step process. For the first step, we have designed a temporary method that we believe will bring power back to those servers sometime tomorrow evening, but the solution will be temporary. We will use a generator to supply power through next weekend when the necessary gear will be delivered to permanently restore normal utility power and our battery backup system. During the upcoming week, we will be working with those customers to resolve issues.

    We know this may not be a satisfactory solution for you and your business but at this time, it is the best we can do.

    We understand that you will be due service credits based on our Service Level Agreement. We will proactively begin providing those following the restoration of service, which is our number one priority, so please bear with us until this has been completed.

    I recognize that this is not all good news. I can only assure you we will continue to utilize every means possible to fully restore service.

    I plan to have an audio update tomorrow evening.

    Until then,

    Douglas J. Erwin
    Chairman & Chief Executive Officer
  158. 1995 Humor by Paulus+Wolfe · · Score: 1

    Hack the Planet!

    Showing my age

    --
    Did they find Earth? Tell them to turn around. We'll likely blow them outta the sky before they had a chance to talk.
  159. Re:More planning could have prevented this by sjames · · Score: 1

    Given that they shut down under orders from the fire department and that an explosion that knocked 3 walls down didn't damage the batteries, I'd guess they have more than one power room. They have a fair portion of machines powered up now. Since it would take longer than that to clean up after an exploded transformer and install a new one in its place, that room must not have been a single point of failure.

    Honestly, it doesn't matter how well isolated your power systems are from each other, if there is a fire, particularly one with explosions, the fire department WILL order all power shut down. If you don't do it, they will (perhaps not so nicely). You will not be permitted to restore power at all until they are satisfied that it is perfectly safe. They will not be the least bit concerned with your uptime. Their priorities will be 1. nobody gets hurt and 2. no more fires.

    Believe me, I am well past sick and tired of advertising that is synonymous with a big fat lie, but I don't think this is an example of it.

  160. Oblig quote by Astadar · · Score: 1

    (With all due respect to the brave men and women of the fire department, couldn't resist)

    Ray: "Everything would have been fine if dickless here hadn't shut off the main power grid!"
    Walter Peck: "These men caused an explosion!"
    Mayor: "Is this true?"
    Peter Venkman: "Yes, it's true... this man has no dick."

    --
    --Coming up with something clever... please wait...
  161. Re:I'm a firefighter AND a geek. You, not so much. by Anonymous Coward · · Score: 0

    Guess you look a little like a fool, eh? Not only did you design a single point of failure into your system with DNS, but you ranted against not turning on the generators as being, "right up there with the voodoo and witchcraft method of personal protection," and it turns out that powering up the generators would have indeed made things worse.

    Unless the data center was designed to explode, clearly we were in a situation where things were not operating as designed. Trying to figure out what's safe in the middle of an event where you're already in unknown territory isn't very easy.

  162. soon to be former customer by Anonymous Coward · · Score: 0

    This has been handled very poorly. I don't understand the "praise" for communication.

    1. The first notification went out 6 hours after the incident.

    2. The update messages didn't provide any real info - just "we're continuing to work on restoring power."

    I found out about the plan to get the 2nd floor up and running on another site hours before the Planet communicated that. There were messages that "more important" customers were having their servers moved to the second floor to be brought back up sooner, as well as other message boards of "important customers" having their machines moved to other locations. Aren't all customers important?

    3. The tech staff was virtually useless - precanned responses.

    When I first learned of the issue, I asked about switching machines or changing DNS and the tech support person said I was taking a gamble as they'd be up by noon Sunday. I switched DNS as I didn't want my sites down any longer.

    The next day, when I asked again whether it was better to just get a new machine, they agreed, but the best part is that their offer was for a more expensive machine with fewer features than the one I got just 2 weeks ago. They wouldn't even match the specs of the machine or the price. What kind of customer service is that?

    Whoever out there thinks The Planet is handling this great is crazy.

    At this point as it closes in on 2 days downtime with really no vote of confidence in being fully restored, they probably could have packed all the servers up and moved them to other locations during the same time period.

    I too will dump these guys as soon as I get all the data off.

    Anyone with recommendations on server solutions should post them to these forums.

  163. Re:More planning could have prevented this by njcoder · · Score: 1

    Given that they shut down under orders from the fire department and that an explosion that knocked 3 walls down didn't damage the batteries, I'd guess they have more than one power room. They have a fair portion of machines powered up now. Since it would take longer than that clean up after an exploded transformer and install a new one in it's place, that room must not have been a single point of failure. They are running on backup generators last I read. Everything they put out talked about one power room.

    Honestly, it doesn't matter how well isolated your power systems are from each other, if there is a fire, particularly one with explosions, the fire department WILL order all power shut down. That's my point. They should have had the local fire authorities involved in their disaster recovery plan so that the fire dept would know that it was safe to turn on the backup generators sooner.

  164. Re:More planning could have prevented this by sjames · · Score: 1

    That's my point. They should have had the local fire authorities involved in their disaster recovery plan so that the fire dept would know that it was safe to turn on the backup generators sooner.

    Ever tried to do that? Unless you are a life critical operation like an intensive care unit they aren't likely to be all that interested and anything they might agree to is non-binding.

    As for running on backup generators, as long as they are willing to sustain that long enough to replace the transformer, what's the problem? They have demonstrated the ability and willingness to provide power in the event that the transformer and power room are blown to bits.

  165. Re:More planning could have prevented this by njcoder · · Score: 1

    Ever tried to do that? Unless you are a life critical operation like an intensive care unit they aren't likely to be all that interested and anything they might agree to is non-binding. I don't know how hard it is in Texas but around here, if you want to have a fire engine standing by on-site for a day you just have to fork over a couple grand.

    Once a quarter, when they claim they test their backup systems, they should have a fire engine standing by, with an inspector or chief there too, and walk them through what's happening. That way, if there's a real disaster, either the fire dept paid attention and knows your backup system is isolated from the main power system, or you've shown them enough competence during your drills that they believe you even if they didn't pay attention.

    As for running on backup generators, as long as they are willing to sustain that long enough to replace the transformer, what's the problem? They have demonstrated the ability and willingness to provide power in the event that the transformer and power room are blown to bits. The problem is they didn't have a fully redundant power system without a single point of failure like they claim. It took them more than 40 hours to start powering systems up. This is not what most people would expect from a "world class data center" that's capable of reliably handling your IT infrastructure.

  166. Re:I'm a firefighter AND a geek. You, not so much. by Anonymous Coward · · Score: 0

    Hm, the fire department is always poking around our power equipment. I hope they are actually looking for something and know what to look for. Or are they just poking around for the hell of it?

  167. Jumping Ship by waderoush · · Score: 1

    Xconomy was one of the sites hosted at "H1," as The Planet calls it. After waiting all day Sunday to see whether we'd be back up on Monday, we decided to move the site back over to Media Temple, our previous hosting provider, at least temporarily. (Ironically, one of the reasons we left Media Temple in the first place was that they couldn't handle the traffic when our flying car stories got slashdotted.) We published a post about our experience with the outage this morning.

  168. Re:I'm a firefighter AND a geek. You, not so much. by JoeShmoe · · Score: 1


    I'm not going to fellate you because you are a firefighter, sorry. There is no such thing as a sacred elephant and there are pros and cons to everything.

    As a previous poster accurately points out, we have been forbidden from taking our own actions when we consider the risks and costs to be worth it. I would have no problem if I had the choice to hire a firefighting service, with the understanding that the greater the risk, the more it will cost me. What I have a problem with is there being a monopoly of only ONE legally allowed provider, and then having that provider refuse service.

    I can't stand the way if you criticize troops or firemen, you are considered flamebait. It's a volunteer service. I know plenty of macho asshats who sign up specifically because they know they are basically going to be beyond reproach if they make it in.

    People risk their lives every day. I already pointed out that the average electrician (who makes a journeyman's wages of about $25K) faces more danger and risk in his daily job than a firefighter may face in a week. Hell, someone manning a counter at a convenience store in a bad neighborhood has a chance at getting shot.

    You aren't special. You aren't magical. You aren't beyond reproach. You don't have a monopoly on heroism. I can point to countless tales of ordinary people who risked their lives or even died trying to pull someone from a canal or even a burning building.

    In fact, I have more respect for the average joe who does it than the person who collects a paycheck to do it.

    And if you don't like it, then I expect you to support the privatization of social services so that if some chief decides that a 0.000001% chance of death is an unacceptable risk for saving property, I can pick up the phone and call someone who will.

    PS: for SOGs that are "very specific" about risk, you have some pretty vague and unspecific terms like "reasonable" and "minimal".

    -JoeShmoe
    .

    --
    -- I wonder which will go down in history as the bigger failure: the War on Drugs or the War on Filesharing
  169. Re:I'm a firefighter AND a geek. You, not so much. by JoeShmoe · · Score: 1


    OK, so it sounds like an easy way to kill everyone at a major metropolitan hospital is to set off a smoke bomb in the main electrical room, because then they will unquestioningly shut down ALL ELECTRICITY, including the backup generators that are keeping people on life support alive.

    What, that's stupid, you say? Well so is your blanket assertion.

    And you are only confirming what I said... firefighter != electrician which is why I used the term IGNORANT. Regardless of the facts and 20/20 hindsight in this particular scenario, the question remains...is it considered standard operating procedure at ANY hosting facility to shut down all power, INCLUDING backup power, regardless of the size or nature of the threat?

    Or was this an accurate, analyzed response and not just a knee-jerk reaction? Because otherwise, I hope the terrorists don't learn we are ten smoke bombs away from having our entire telecommunications infrastructure turned off to "avoid risking firefighters".

    -JoeShmoe
    .

    --
    -- I wonder which will go down in history as the bigger failure: the War on Drugs or the War on Filesharing
  170. Re:I'm a firefighter AND a geek. You, not so much. by JoeShmoe · · Score: 1


    Because you probably wouldn't accept what I said, here is an article with some interesting stats:

    http://money.cnn.com/2006/08/16/pf/2005_most_dangerous_jobs/index.htm

    I don't see firefighters on that list. But I do see electrical workers.

    Who's waving the flags for them?

    -JoeShmoe
    .

    --
    -- I wonder which will go down in history as the bigger failure: the War on Drugs or the War on Filesharing
  171. Re:More planning could have prevented this by silas_moeckel · · Score: 1

    Actually, the fact that they have to run off generators shows they did not have N+1 redundancy in their power feed. Reading between the lines, it looks like they had a single power feed into the building, with generators/UPS after that. I've designed and built both "world class" data centers and data centers used by hosting providers; don't get the two confused. Hosting is a cost-centric business in general, and insurance is cheaper than gear. Now, the funny bit about world class data centers is that they are still expected to fail. Services running out of them are hosted at 2 locations if at all possible, with a primary and backup set of gear per location. It's all rather expensive, but that's how you get to 5 nines or better. Nothing will help you if you have a single DC and the fire trucks roll in and start cutting power.
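    For a sense of scale on "5 nines": assuming a 365.25-day year (525,960 minutes), 99.999% availability allows only about five minutes of downtime per year. By the same arithmetic, the roughly 40 hours before H1 started powering systems up (the figure njcoder gives earlier in the thread) would on its own cap that year's availability at about 99.54%:

    $$(1 - 0.99999) \times 365.25 \times 24 \times 60\ \text{min} \approx 5.26\ \text{min/year}$$

    $$1 - \frac{40 \times 60\ \text{min}}{525\,960\ \text{min}} \approx 0.9954$$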

    --
    No sir I dont like it.
  172. I'm glad you place so high a value on my life by CFD339 · · Score: 1

    It's really sad and funny.

    By the way, last year I earned less than $2000 as a firefighter. We're volunteers (or, in the case of most of us, the term is sort of a misnomer: we're paid a minimal amount of money to meet some legal requirements in our towns).

    Here's some numbers for you - I believe these are a year or two out of date, but I'm not going to look for newer:

    Of US firefighters, ~300,000 are full time career firefighters, while ~800,000 are your neighbors who have regular jobs and respond when called.

    Of US fire departments, 86% are all call-responders (volunteers) while 92% have at least half call-responders. Last I knew, FDNY had at least one call-responder station out on Long Island, but that may be out of date.

    I did not and do not ask to be held up as a shining knight of irreproachable perfection. It wouldn't fit well anyway. I did ask that you not ridicule and insult an honorable vocation and the people who, like me, spend hundreds of hours a year training to be aware of how to deal with emergency situations ranging from a toaster oven fire to a train derailment with toxic chemicals or a data center fire with massive hazards.

    So no, I'm no Bruce Willis. I'm a network guy, a business owner, and a software developer -- and a volunteer firefighter who spends as much time training for that field as in computers and technology. You may be surprised, but both are extremely technical fields.

    Your statements do not accurately reflect the real danger of the situation in general, nor do they reflect a solid understanding of this incident in particular. You seem to think this was a transformer outside the building and that power could be supplied from the generators at minimal risk to the firefighters and the workers in the building. That just isn't the case here, and usually isn't.

    Finally, you are prevented from doing things which can cause you harm in cases where -I- am obligated to save you, in cases where you endanger other people (including me), or in cases where you risk damage to other people's property.

    At this particular scene, the walls of the room containing this transformer were blown several feet from where they'd been. Virtually all the power conduits and lines in that room were totally mangled. It took 28 hours for the electricians to repair enough of that wiring and infrastructure for second floor power to begin to be restored, with generators. I'm told a few hours of that was spent waiting for the fire marshal to return to inspect and certify the work as safe. It has taken almost 20 more hours to create a temporary new power distribution infrastructure to feed the first floor. I'm hearing reports that people's 1st floor servers are starting to come online, several hours ahead of the expectations laid out last night. The second floor will be on regular power soon if it's not already, but the first floor will have to run on those backup generators for a week while the entire power grid for that floor is rebuilt from scratch before it can be connected permanently.

    So, it seems the fire marshal was right. It also seems very unlikely that, even if he had not required the power be kept off, TP would have brought the generators up without first inspecting the damage, and as soon as they had inspected it they'd have known it was impossible.

    Your frustration at having your servers offline has led you to declare that they are worth more to you than the lives of human beings. That, I find disgusting.

    --
    The problem with quotes on the internet, is that nobody bothers to check their veracity. -- Abraham Lincoln
  173. As far as SOG's being non-specific by CFD339 · · Score: 1

    SOGs are guidelines and not procedures or rules for reasons like the one we're arguing about.

    Take any specific incident and we can pick apart its details, especially in the light of day with more facts. A firefighting crew arrives on scene knowing only that there's something really wrong. You may have heard "explosion" or "fire", but often what you hear en route is dead wrong. You don't know what caused the explosion. You don't know what exploded. You don't know if people are trapped or injured. You don't know if the ripped and stripped wiring is carrying enough voltage to kill you when you touch a chair leaning against a wire you didn't see. You don't know if the explosion was from a gas leak and there's more gas leaking now. You don't know what toxic chemicals are in whatever blew up. These days, we're also trained to think explosions may be bombs. In that case you may have a secondary bomb: people who set bombs like to make a small first one that draws police and fire, then a large second one that kills them.

    You go into a dark building carrying every kind of tool you can carry to deal with whatever variety of broken thing you may find. One group is doing a search for people. One group is doing a search for fire - or other hazards. Fire may look out but be in the walls, or overhead in the drop ceiling. You can't tell.

    In Prince William County, a very well trained crew entered a house where fire was visible on the outside back wall. It was before 7am, there were cars in the garage, and nobody was on the front lawn to say whether anyone had gotten out yet. They made the second floor and found temperatures not over 90 degrees and a light haze in the air. We call that a tenable environment, so we search. They got down the hallway when the fire dropped down on them from the attic space. Of the two upstairs, one got out; the other didn't. The reports say the temperature at face height in that upstairs hallway went from 90 degrees to over 700 degrees in a few seconds.

    I do my work in a small town hundreds of miles away. Still, we studied that incident like we study any other where men are killed. In most cases, they're killed because something got a lot more dangerous than it looked -- even to trained firefighters -- very very fast.

    This is why I ask you not to insult firefighters by pretending to know what is and isn't dangerous without the years of training and practice that go with it.

    --
    The problem with quotes on the internet, is that nobody bothers to check their veracity. -- Abraham Lincoln
  174. Depends on the firefighter, but generally.... by CFD339 · · Score: 1

    If its me doing the poking, I probably have a good idea what I'm looking at -- and actually that would make it harder to do my job as a firefighter because I'd be too interested in the stuff.

    What they're looking for are the basics of good electrical line management: no blocked vents, no exposed wiring, no extension cords through walls (common), no extension cords used as permanent wiring (common), no extension cords coiled up while carrying power (heat builds up and they catch fire), no chains of power strips, and so on.

    Also, are the fire doors operational and not blocked open? Are the sprinkler or other fire suppression systems in order? Are exit signs lit and accurate? Are emergency battery lighting units charged and ready?

    They don't care if your stuff doesn't work. They care if you get trapped because the fire code stuff isn't right.

    --
    The problem with quotes on the internet, is that nobody bothers to check their veracity. -- Abraham Lincoln
  175. Re:More planning could have prevented this by Anonymous Coward · · Score: 0

    How does a 'data center' in a box go down when its 'power room' explodes? I don't know, but I bet it looks cool when it 'splodes.
  176. Ignorant reseller = single point-of-failure by Anonymous Coward · · Score: 0

    Sigh. Best practice for DNS is to put the servers on distinct networks -- you had your two servers hanging off the same switch, didn't you?

    Re: the fire chief, you are also wrong. All fire suppression systems are linked to the power system and kill all power when the fire suppression system is triggered (so that the firefighters are not electrocuted while fighting the fire). The tech you talked to on the phone, who gave you the info your rant is based on, likely knows very little about how the data center _facilities_ are operated (they only support the software). The NOC and the data center are very different places, and the techs who staff them have different skills and experience.

  177. vitaminDXX by Anonymous Coward · · Score: 0

    Some pics a friend sent me from the site of his Ventrilo provider, which hosts with The Planet... looks pretty intense.

    http://www.pure-voice.net/forums/viewtopic.php?t=608

  178. More than just H1 servers were affected by newsblaze · · Score: 1

    The disruption went beyond the servers in H1. NewsBlaze.com is in another datacenter, but it seems both NS1 and NS2 are in the same place, in H1. So even though our webserver equipment wasn't involved, we were down too. Depending on DNS cache times, traffic slowly dropped off to a trickle and then eventually to nothing. It took them a long time to get that fixed, a lot longer than it should have. They never appeared to listen to what I told them when I called, and I tried hard to get the support staff to pass the message along. This is one of the problems with big companies: they don't listen to their customers at the times they most need to. We couldn't have been the only ones in that situation. It would be interesting to know how many servers outside H1 were affected. Thank you to the few slashdotters who visited the NewsBlaze story before the server became inaccessible. I'll be writing more about this. The Planet Houston Data Center Goes Up in a Puff of Smoke

    --
    Daily News http://newsblaze.com
  179. Re:More planning could have prevented this by sjames · · Score: 1

    The problem is they didn't have a fully redundant power system without a single point of failure like they claim. It took them more than 40 hours to start powering systems up. This is not what most people would expect from a "world class data center" that's capable of reliably handling your IT infrastructure.

    What single point failed and took everything down? They had TWO failures, one was the explosion and the other was a shutdown order from the FD.

    Considering that there was an electrical explosion, the latter is only to be expected. Had they somehow not done that, someone (quite probably you) would be griping about how they risked sending 9000 servers and a dozen employees up in flames just to help their stats.

    Funny thing about explosions: things move around as a result (cables, conduits, walls and parts thereof). After that there WAS a non-zero chance that a new hazard had developed and that it could have threatened human life. No amount of redundancy would have avoided that. This isn't a spacecraft, it's a data center. It's not as if the threat to life from a shutdown was as great as or greater than the threat of not shutting down.

  180. Re:More planning could have prevented this by sjames · · Score: 1

    Actually, the fact that they have to run off generators shows they did not have N+1 redundancy in their power feed.

    Into the building? No, they did not, but so what? Into the servers? Yes, they did, as advertised.

  181. linkifier by martyb · · Score: 1

    I'm clicking on your link, but nothing is happening. Am I doing it wrong?
    I click the link and it DOES bring up the page.
    Hi, sorry, I was teasing you. You pasted a URL but did not make it a link. Do you perhaps have a linkifier? Because not everyone does.

    HEY! Thanks for that! I recently stopped making explicit links in posts here because they seemed to turn into links automagically anyway. I just figured it was some /. feature. I have no idea how long it would have taken me to discover it was my linkification Firefox add-on if you hadn't pointed it out! Thanks again!

  182. Re:More planning could have prevented this by silas_moeckel · · Score: 1

    The point is that redundant power feeds are a fairly cheap and common practice. They did not seem to have that in their "world class" data centers. They compare themselves to IBM datacenters, and the three of those I've worked at all have n+2 power feeds into the building from multiple substations.

    --
    No sir I dont like it.
  183. More stupidity, and you're wrong about ignorant... by CFD339 · · Score: 1

    Life support gear is, as I understand it, built with battery powered redundancy and regularly tested. I don't work with that gear, but I believe it is the case.

    I also believe that there are some circuits in hospitals which are specially labeled and are not shut off unless absolutely proved critical or already damaged. In these cases, a lot of money is spent building safety conduits for their cabling and other precautions so that they can handle major damage to the building without becoming a hazard.

    Hospital emergencies are their own unique events and there are pre-plan documents and procedures in place for dealing with exactly the issues you describe.

    Finally, I would point out that many firefighters are in fact electricians. You see, even career firefighters are not paid well, and most have a second job. Of those, a majority are in the construction and/or contracting trades. It is a good fit for them.

    People misunderstand the role of a firefighter, thinking we just show up and put water on things that are hot. Surely that's the fun part of the job.

    In reality, we also have to be many, many other things. We have to be truck drivers (have you ever driven fast in a 40,000-pound truck carrying a thousand gallons of moving liquid?). We have to be experts in building construction. We have to understand electrical work. We have to be certified in hazmat operations. More recently, we also have to be certified in NIMS (National Incident Management System), which allows us to interoperate using the same language and procedures. We have to be experts at high and low angle rope rescue, confined space rescue, below ground rescue, first responder medical support, mechanical rescue (man vs. meat grinder), traffic control, flood control, crowd control, tree removal, bees, wasps, & snakes, vehicle rescue (we don't take patients from cars, we remove cars -- in chunks -- from around patients) and anything else risky or scary you might want help with.

    Even plumbing and water supply -- just imagine showing up on a scene without a fire hydrant for miles, and being able to organize a tanker shuttle, dump tanks, pumpers, and lines to supply more than 1,000 gallons of water a minute within 5 minutes of arriving on scene. That's enough water to fill your pool faster than you can fill your bathtub.

    A firefighter crew is a small group of men very much like MacGyver (not as smart, maybe, but better equipped), with every kind of tool imaginable that they can carry around with them (especially on a heavy rescue unit). You put these guys into ANY situation and within seconds they'll organize around a safe plan for getting to the best possible resolution with the least risk to life and property.

    I think the only ignorance I see here is that which you are demonstrating in your examples.

    I'll give you credit for one thing, however. When you state that you are not going to fellate me, you are 100% correct. No matter how nicely you ask.

    --
    The problem with quotes on the internet, is that nobody bothers to check their veracity. -- Abraham Lincoln
  184. Re:Kevin Hazard? Was JUST speaking 2 his subadmins by Anonymous Coward · · Score: 0

    I was just speaking with Mark Causa, a forums admin of his, this weekend in fact!

    (Kevin Hazard's their "SUPER ADMIN" in fact).

    (It was regarding an "IPS Driver Error" I was CONSTANTLY seeing on a posting of mine there whenever I tried to update/edit it, on THEPLANET's forums (about securing Windows))...

    WoW! I was trying to point them to security issues too... & they were VERY helpful guys too, trying to help ME out (& going overboard imo in some ways)

    Today, in fact, before seeing this, I was also going to note that they were being listed as a site with problems with hacker/cracker types abusing them, per one of these sites:

    http://www.castlecops.com/

    http://mtc.sri.com/

    http://www.spamhaus.org/sbl/latest.lasso

    http://www.phishtank.com/

    (or, one of the numerous others I look @ daily, like SANS, PacketStorm, etc.)

    They have been listing theplanet as being abused by hacker/cracker/spammer types for the past few weeks now, in fact.

    APK

    P.S.=> I doubt this is due to "hacker/crackers" though, personally... just bad setup in the server room! apk

  185. Re:More planning could have prevented this by sjames · · Score: 1

    I understand that, but I don't see where it matters one whit so long as they have some means to provide power with the main transformer blown up (in this case diesel).

    What that amounts to is that they decided the extra diesel would cost them less than the second grid tie. Perhaps they were right, perhaps not, but their duty is to somehow (barring orders from civil authorities) be able to provide power to customers when the primary is down. They are doing that now.

  186. Re:I'm a firefighter AND a geek. You, not so much. by jesboat · · Score: 1

    Did you even bother to read the comment you just replied to? It wasn't written by the author of the comment you originally ranted at; on the contrary, he agreed with you.

    Your first post was quite reasonable. This one made me think you're an asshat who thinks he's entitled to decide everything that might remotely affect him, and *not* just in your job.

  187. Haha by Anonymous Coward · · Score: 0

    So sorry to do this, but it must be done....

    IT'S OVER NINE THOUSAND!!!