Slashdot Mirror


LiveJournal Blackout Analysis Online

Hakubi_Washu writes "LiveJournal has posted their official analysis of what happened last Friday. Apparently someone "accidentally" pushed the emergency power off (EPO) button (which cuts all power, even from the UPS), reset it, and ran off. They had trouble coming back up quickly because of "9 machines with faulty motherboards with embedded NICs that don't do auto-negotiation properly", machines not fully rebooting for analysis reasons, and a few other issues. "

13 of 333 comments

  1. Re:Where was the switch? by grub · · Score: 2, Informative


    They're usually in the server room. They're for emergencies. Ours have red cages around them and a BIG RED SIGN; you basically have to punch them.

    --
    Trolling is a art,
  2. Auto-negotiation by stilwebm · · Score: 3, Informative

    When I first moved company servers into a new colo four years ago, their engineers advised me to turn auto-negotiation off on every port, including our switches and host NICs. I asked why they recommended this and they replied, "trust us, auto-negotiation causes problems when you least expect it." I went ahead and hard-set the port speeds everywhere. Now I understand why.

  3. Credit by XorNand · · Score: 4, Informative

    Anyone who's a paid member of LJ can get a 2-week credit here.

    --
    Entrepreneur : (noun), French for "unemployed"
  4. The reason why some NICs don't auto-neg by phaetonic · · Score: 2, Informative

    I have run across this issue in data centers numerous times. It still occurs with the latest hardware, no matter what vendor or OS; I have this problem on Sun Fire 280Rs and Compaq DL360s. What it comes down to is the switch used in the data center and the settings in the OS. Typically, data centers set their switches to forced 100/full (unless, of course, they are using fibre or gigabit). The OS must be set to force its NICs into the same mode, or they will drop a lot of packets. Sounds like a disconnect in communication between the NOC and the customer.

    1. Re:The reason why some NICs don't auto-neg by caluml · · Score: 2, Informative

      That's what Compaq Lights-Out cards are for. Lovely things. Very handy.
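    A minimal sketch of the consistency check that the forced-settings advice implies, assuming a hypothetical inventory that records each link's switch-port and host-NIC settings (`PortConfig`, `link_is_consistent` and `audit` are illustrative names, not any real tool). Following the colo engineers' conservative advice, it flags every link where one end is forced and the other auto-negotiates, even though a forced half-duplex peer would happen to work.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PortConfig:
    """One end of a link; speed/duplex are left as None when autoneg is on."""
    autoneg: bool
    speed: Optional[int] = None   # Mb/s, e.g. 100
    duplex: Optional[str] = None  # "full" or "half"


def link_is_consistent(switch: PortConfig, host: PortConfig) -> bool:
    """True if both ends auto-negotiate, or both are forced identically."""
    if switch.autoneg and host.autoneg:
        return True
    if not switch.autoneg and not host.autoneg:
        return (switch.speed, switch.duplex) == (host.speed, host.duplex)
    # One side forced, the other auto-negotiating: flagged as a policy choice.
    return False


def audit(links: dict[str, tuple[PortConfig, PortConfig]]) -> list[str]:
    """Return the names of links whose two ends are configured inconsistently."""
    return [name for name, (sw, host) in links.items()
            if not link_is_consistent(sw, host)]


forced_100_full = PortConfig(autoneg=False, speed=100, duplex="full")
auto = PortConfig(autoneg=True)
links = {"web1": (forced_100_full, forced_100_full),
         "db1": (forced_100_full, auto)}
print(audit(links))  # ['db1']
```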

  5. Re:Wait a second! by rah1420 · · Score: 2, Informative

    Technically, yes. But if LJ decides to implement such a scheme (let's call it "LEPO" for "Leisurely Emergency Power Off"), I hope they run it past the fire marshal or the code inspectors first, who may have a different opinion about how smart the idea is.

    "If it's stupid and it works, it's not stupid."

    --
    Against stupidity, the gods themselves contend in vain.
  6. Not millions of paying accounts. by EvilStein · · Score: 4, Informative

    Actually, most of the accounts don't pay. They're just freeloading whiners.

    This is a paste from the LiveJournal stats:

    * Free Account: 5713743 (98.3%)
    * Early Adopter: 14220 (0.2%)
    * Paid Account: 94857 (1.6%)
    * Permanent Account: 1632 (0.0%)
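    A quick recomputation of the shares from the pasted counts, assuming these four categories cover every account. The counts sum to 5,824,452, and on that base the free share rounds to 98.1% rather than the pasted 98.3% (the other three shares match); the paste doesn't say which total LJ divided by.

```python
# Account counts as pasted from the LiveJournal stats in the comment above.
counts = {
    "Free Account": 5713743,
    "Early Adopter": 14220,
    "Paid Account": 94857,
    "Permanent Account": 1632,
}

# Assumption: these four categories together make up all accounts.
total = sum(counts.values())
shares = {kind: 100 * n / total for kind, n in counts.items()}

print(f"Total accounts: {total}")
for kind, n in counts.items():
    print(f"{kind}: {n} ({shares[kind]:.1f}%)")
```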

  7. Re:Also, by Scott+Laird · · Score: 2, Informative

    "Why do we even have that button?" Because it's basically required by law. Covering them with a plastic cover doesn't seem to help either--Internap did that the *last* time someone hit the EPO button in this datacenter.

  8. Re:Lesser OS... by ergo98 · · Score: 2, Informative

    Power failed to get to the computers. It was a power failure - whether it was the electric grid, the UPS blowing up, or all the wires in the wall, or in this case the EPO button, it's a bloody power failure.

  9. You, sir, are an idiot. by Anonymous Coward · · Score: 5, Informative

    Go ahead and read up on how auto-negotiation works. I'll wait...

    No, really. Go read up on it...

    Okay, since you didn't bother reading up on it, and since you claim that someone's being cheeky because they *document* what happens when you misconfigure a connection, I must conclude that you, sir, are indeed an idiot.

    (To summarize for those of you who won't bother to look it up: a NIC can sense a 100 Mb/s carrier, so it can tell 10 from 100 without negotiating. Full and half duplex, however, are actively negotiated by the two sides of the connection. If side 'A' is hard-set to 100/full, it won't negotiate with the other side. Hearing no negotiation, side 'B' will assume the NIC doesn't support full-duplex connections and fall back to half duplex. This is the proper, standardized, documented behavior. Anything else would require the psychic interface spec that *still* hasn't been finalized.)
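    A simplified model of the behavior summarized above, written as a sketch rather than a faithful IEEE 802.3 implementation: it assumes 10/100 copper, assumes a port supports every speed up to the one it advertises, and ignores gigabit. `Port` and `resolve` are hypothetical names.

```python
from typing import NamedTuple


class Port(NamedTuple):
    autoneg: bool  # True if the port auto-negotiates
    speed: int     # forced speed in Mb/s, or the highest speed it advertises
    duplex: str    # forced duplex, or "full" if it advertises full duplex


Mode = tuple[int, str]


def resolve(a: Port, b: Port) -> tuple[Mode, Mode]:
    """Return the (speed, duplex) that side A and side B each end up using."""
    if a.autoneg and b.autoneg:
        # Both sides advertise, so both settle on the best mode they share.
        speed = min(a.speed, b.speed)
        duplex = "full" if a.duplex == b.duplex == "full" else "half"
        return (speed, duplex), (speed, duplex)
    if not a.autoneg and not b.autoneg:
        # Nothing is negotiated; each side uses exactly what it was told.
        return (a.speed, a.duplex), (b.speed, b.duplex)
    # One side forced, one auto-negotiating.
    forced = a if not a.autoneg else b
    # Parallel detection: the auto side senses the forced side's speed from
    # the carrier, but hears no duplex advertisement and falls back to half.
    forced_mode: Mode = (forced.speed, forced.duplex)
    auto_mode: Mode = (forced.speed, "half")
    return (forced_mode, auto_mode) if not a.autoneg else (auto_mode, forced_mode)


forced_full = Port(autoneg=False, speed=100, duplex="full")
auto_port = Port(autoneg=True, speed=100, duplex="full")
print(resolve(forced_full, auto_port))  # ((100, 'full'), (100, 'half'))
```

    Forcing one side to 100/full while the other auto-negotiates gives a link that comes up at 100 Mb/s but with mismatched duplex, which shows up as errors and dropped packets rather than a dead link.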

  10. Re:No! by SenorChuck · · Score: 2, Informative

    On all of the (actual) servers I've worked with, the onboard NICs are exactly the same hardware that you get with the server-grade PCI NICs.

    --
    A wise person makes his own decisions, a weak one obeys public opinion. -- Chinese proverb
  11. Re:No UPSes before? by Nonesuch · · Score: 2, Informative

    > I'm surprised that they didn't have their own little UPSes to bring the systems down cleanly before. Sure, the facility is supposed to provide power at all times, even if there's a power grid interruption, but that doesn't get tested very often and isn't under your control. Furthermore, if the facility's power is actually going to go out, there isn't any way for the machines to find out and shut down cleanly.

    Unfortunately, this would defeat the purpose of the "Big Red Button", which is there to quickly and definitively cut off all power to all line-powered devices in the data center.

    When you've got an analyst smoking and twitching next to one of the racks as 110VAC courses through her veins, you don't want to have to go hunting to figure out which UPS is supplying the juice.
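    The clean-shutdown behavior the parent comment wants from a local UPS can be expressed as a simple policy: once on battery, shut down while enough runtime remains to finish. A minimal sketch, with a hypothetical `UpsStatus` snapshot and made-up thresholds; as the reply points out, a local UPS that kept machines running after the EPO was pressed would defeat the EPO's purpose.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UpsStatus:
    """Hypothetical snapshot of what a rack UPS reports to its host."""
    on_battery: bool
    runtime_left_s: int  # estimated seconds of battery remaining


def should_shut_down(status: UpsStatus, shutdown_needs_s: int = 120,
                     margin_s: int = 60) -> bool:
    """True once the host is on battery and the remaining runtime only just
    covers a clean shutdown plus a safety margin (both values are assumptions)."""
    return status.on_battery and status.runtime_left_s <= shutdown_needs_s + margin_s


print(should_shut_down(UpsStatus(on_battery=True, runtime_left_s=900)))  # False
print(should_shut_down(UpsStatus(on_battery=True, runtime_left_s=150)))  # True
print(should_shut_down(UpsStatus(on_battery=False, runtime_left_s=10)))  # False
```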

  12. Nothing wrong with onboard NICs in "real" servers. by Nonesuch · · Score: 2, Informative

    > Does not mean it's a good idea! Not a single machine where I work uses the onboard NIC, from servers down to desktops. And all of our machines have a two-year lifecycle, tops. We generally plug in a 3Com card of some type.

    The smallest of Sun's 1U rackmount SPARC servers don't even have a PCI slot for a NIC (no expansion at all), but their two onboard 100 Mb/s interfaces are plenty for most data center deployments of these small boxes.