Slashdot Mirror


LiveJournal Servers Go Down

Wind writes "According to any journal hosted off of LiveJournal.com, the LiveJournal data center Internap has suffered a critical power failure, leaving all of LiveJournal and its content temporarily offline and requiring the revival of 100+ servers. Perhaps Six Apart wasn't quite prepared for the responsibilities of a website of this size? Updated information is posted here."

94 of 596 comments

  1. Lights out by r_glen · · Score: 5, Funny

    Sounds like someone was taking a nap over at Internap

  2. The Pain ... by webfiend · · Score: 5, Funny

    You can't imagine the withdrawals I'm going through. It's like the great Slashdot brownouts of '98.

    I need my fix, man!

    1. Re:The Pain ... by thoughtcriminal87 · · Score: 4, Funny

      Or the great Slashdot 503s of '04

    2. Re:The Pain ... by DrEldarion · · Score: 5, Funny

      Honestly! Now we have to wait a day or so to find out what MelissaMinx492 ate for breakfast today!

    3. Re:The Pain ... by MikeXpop · · Score: 4, Funny

      Don't worry, it was waffles. But her dad used the last of the syrup. Man, he never can think of her needs, can he? I mean really, what's his problem?

      --
      Etiquette is etiquette. He kills his mother but he can't wear grey trousers.
    4. Re:The Pain ... by RichardX · · Score: 4, Funny

      Man, I need them to get those servers back up! I've got a whole pile of journals to not read. I'm getting behind on my ignoring

      --
      Curiosity was framed. Ignorance killed the cat.
  3. Elsewhere by jm92956n · · Score: 4, Funny

    In related news, 6,000 teen-age girls were heard yelling "OMG! WTF! How will John know I life him if I can't blog about it!"

    --
    An effective signature identifies a particular user amongst a base of thousands.
    1. Re:Elsewhere by barryman_5000 · · Score: 3, Funny

      That's right, sometimes I go by the name John.

    2. Re:Elsewhere by LGagnon · · Score: 4, Funny

      This is LiveJournal we're talking about; I think his spelling came out pretty accurate.

  4. In other news... by Anonymous Coward · · Score: 5, Funny

    ...the collective IQ of the internet has raised about 20 points.

    1. Re:In other news... by Anonymous Coward · · Score: 2, Funny

      ...the collective IQ of the internet has raised about 20 points.

      So just imagine the result of when Slashdot dies!

  5. Sucks for LiveJournal... by whiteranger99x · · Score: 2, Funny

    but that's ONE HELL of a Slashdotting! :)

    --
    Join the TWIT army now!
  6. slashdot has repeated 503 errors, by Anonymous Coward · · Score: 5, Insightful

    and search.pl is constantly being trashed by distributed xanga botnets. perhaps michael wasn't quite prepared to be an editor of slashdot?

    1. Re:slashdot has repeated 503 errors, by stupidfoo · · Score: 5, Insightful

      How is this a troll? It's funny that an "editor" at a site with as many problems as slashdot has feels that it isn't amazingly hypocritical to mock another site that is currently having problems. People in glass houses indeed.

      Slashdot has semi-major problems almost every day. 503 errors, "nothing for you to see here" annoyances, and a search engine that goes down more than a Thai hooker.

    2. Re:slashdot has repeated 503 errors, by webfiend · · Score: 2, Funny
      Slashdot has semi-major problems almost every day. 503 errors, "nothing for you to see here" annoyances, and a search engine that goes down more than a Thai hooker.

      Oh man, that would be one *fantastic* search engine! Is there a Google beta for this?

    3. Re:slashdot has repeated 503 errors, by elemental23 · · Score: 3, Informative

      Perhaps you're new here, but italicized text in Slashdot stories is written by the story submitter. Editorial comments, if any, are not in italics. In other words, Michael didn't say anything at all in this story.

      That said, the story submitter is clearly trolling himself, as neither 6A's nor LJ's staff had anything to do with the massive power failure at their co-lo.

      --
      I like my women like my coffee... pale and bitter.
  7. Internap is *down*? by MightyTribble · · Score: 5, Informative
    Internap *down*?
    Bush just appointed Internap's CEO to his National Infrastructure Advisory Council, yet the man can't keep a co-lo facility switched on.

    I'm not sure what that says of Bush or of Internap. And it certainly doesn't seem to have anything to do with SixApart.

    1. Re:Internap is *down*? by gl4ss · · Score: 2, Interesting

      replying to myself. mod my other post down or something.

      "Our data center (Internap) lost all its power, including redundant backup power, for some unknown reason. (unknown to me, at least) We're currently dealing with bringing our 100+ servers back online. Not fun. We're not happy about this. Sorry... :-/ More details later.

      Update #1, 7:35 pm PST: we're up on 'dirty' power for now (it works, but it's unreliable), and we're working to assess the state of the databases. The worst thing we could do right now is rush the site up in an unreliable state. We're checking all the hardware and data, making sure everything's consistent. Where it's not, we'll be restoring from recent backups and replaying all the changes since that time, to get to the current point in time, but in good shape. We'll be providing more technical details later, for those curious, on the power failure (when we learn more), the database details, and the recovery process. For now, please be patient. We'll be working all weekend on this if we have to."

      soo.. EVEN THE BATTERY BACKUP WAS FUCKED. GOOD GOING.

      --
      world was created 5 seconds before this post as it is.
    2. Re:Internap is *down*? by slashrogue · · Score: 4, Interesting

      Sorry, but one man can't control power to an entire co-location. I used to work for a local telecom and one day our fiber went down for about 1/3 of our customers. The reason? Some guy shootin' squirrels blew the fiber lines apart. It was on a Sunday too, when there was minimal phone tech support, I think one guy ended up fielding 350+ calls by himself.

    3. Re:Internap is *down*? by Cramer · · Score: 4, Interesting

      I recall a story from a CP&L (now Progress Energy) lineman about how someone had shot the fiber line strung on the high voltage tower -- between the million volt lines. As he put it, you put a pencil through the hole and line up where the "hick" (his word) propped up the rifle to shoot what he must've thought was a power line. It's no fun working up there. And even less fun having to make fusion splices surrounded by million volt lines. (linemen work up there, not fiber techs.)

    4. Re:Internap is *down*? by ces · · Score: 2, Informative

      The co-location facility in question has plenty of backup UPS power with plenty of generator capacity behind that. Supposedly there is enough generator capacity to fully power everything in the building including the network TV station even with one generator out.

      The UPS gear in Internap's space is all top-of-the-line big datacenter grade stuff. Apparently there was some sort of wiring fault in one of the new UPSes they were bringing online that caused both building power to fail and the self-protection circuits in all of their UPSes to trip.

      IOW it was either a faulty UPS or a faulty wiring job by the electrical contractor.

      Livejournal isn't the only one who got burned by this outage. The colocation facility in question is supposedly one of the most solid in the state and nothing short of a direct strike from a comet is supposed to be able to take it offline. My company was in the same boat as our gear is in the same facility as LiveJournal's.

      Sure both LiveJournal and the company I work for could have hedged our bets by having redundant gear in another facility in another state, but that is a pain in the ass especially when backend databases are involved. To tell you the truth it probably isn't really worth the bother unless you truly have a need for six nines of uptime.

      --
      Happy Fun Ball is for external use only.
    5. Re:Internap is *down*? by slavemowgli · · Score: 3, Insightful

      That's debatable, and depending on what database you use, having more than one database server (or pool of database servers) in different physical locations that are kept in sync at all times is definitely possible. I'm not sure whether MySQL allows this, but I think if you have a site that has nearly 6 million users, more than 100000 of which are paying you for the service you provide (I'm one of those, one might add), then you really should look into doing just that - or at least I hope the LJ people will do now (I don't really want to blame them for the problem).

      That being said, I think you didn't quite understand what I was trying to say. I really don't care whether they have "plenty of backup power", "plenty of generator capacity" and "top-of-the-line big datacenter grade stuff" (which really sounds more like a collection of buzzwords than anything else, anyway). If a wiring fault (of whatever kind) can bring down the entire UPS system as well as the "generator capacity behind that" and all other safeguards they supposedly had in place as well, then it's just worthless and a waste of money - a UPS is supposed to be an *uninterruptible* power supply.

      And while I admit that it's not possible to guard against *all* problems, saying that the colo facility is "one of the most solid in the state" and supposedly can't be taken offline by something "short of a direct strike from a comet" is just silly when a "wiring failure" can bring down the whole thing, and even more so when it's not the first time that happens.

      Really, this just stinks of an attitude that's all too prevalent in parts of the IT industry - just piecing together the components of a reliable system won't necessarily give you one, and if you can't build one properly, then don't go advertising that you have one. Don't you think the fact that the LJ people are now planning to buy their own UPS equipment to use on top of the facility's should tell you something?

      Oh, and regarding six nines of uptime - I don't think you actually realize for how little downtime that actually would allow. It's about 30 seconds per year, and Livejournal has been down for at least 16 hours, which corresponds to an uptime of about 99.8% - only two nines left. They probably (hopefully!) won't fall down to one, but things are bad enough as it is, and I, at least, fully blame Internap for that (and, again, I'm a paying user on LJ, so I reserve the right to do just that. ^_~)

      --
      quidquid latine dictum sit altum videtur.
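For anyone who wants to check the "nines" arithmetic in the comment above, here is a minimal sketch. It assumes a 365-day year and counts every second of outage against the yearly budget; the function names are made up for illustration.

```python
# Back-of-the-envelope availability arithmetic (assumes a 365-day year).
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000 seconds, leap years ignored


def allowed_downtime_seconds(nines):
    """Yearly downtime budget for N nines of uptime (3 -> 99.9%, 6 -> 99.9999%)."""
    unavailability = 10 ** (-nines)
    return SECONDS_PER_YEAR * unavailability


def availability_percent(downtime_hours):
    """Uptime percentage over one year given total hours of downtime."""
    hours_per_year = SECONDS_PER_YEAR / 3600
    return 100.0 * (1 - downtime_hours / hours_per_year)


if __name__ == "__main__":
    for n in range(2, 7):
        print(n, "nines:", round(allowed_downtime_seconds(n), 1), "seconds/year")
    print("16 hours down:", round(availability_percent(16), 2), "% uptime")
```

Six nines works out to roughly 31.5 seconds a year, and a single 16-hour outage leaves about 99.82% for that year, which matches the figures quoted.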
  8. What a cock by realdpk · · Score: 5, Insightful

    "Perhaps Six Apart wasn't quite prepared for the responsibilities of a website of this size?"

    Perhaps shit happens, and a blog service doesn't warrant the necessary investment to survive whatever caused this outage?

    1. Re:What a cock by qcubed · · Score: 2, Insightful

      i'm sorry, how exactly does this reflect poorly on sixapart?

      THIS doesn't reflect poorly on them. their licensing scheme for movabletype does.

    2. Re:What a cock by Cramer · · Score: 2, Interesting

      Power can only be so redundant. More than once, the entire server room at a previous employer went dark. Multiple circuits and multiple power supplies won't do any good when a battery in the UPS explodes and blows the main (150A) breaker -- the entire floor went dark... disconnected from UPS, generator, and utility.

      (Once from a battery failure, and once again from a phase mismatch coming off generator that no one caught before the UPS was drained -- the alarm panel was outside the break room and no one that knew what it was walked past it.)

    3. Re:What a cock by casparianaremi · · Score: 2, Insightful

      I was prepared for my friends page on LJ to be full of "It's SixApart's fault" but didn't expect to see it here! Six Apart bought the company but have made no changes, the servers would be down whether they'd done it or not, so to claim it's their fault is just pretty dumb, IMO.

    4. Re:What a cock by mlefevre · · Score: 2, Insightful

      "half at one co-lo and the other half at another co-lo"

      Then they'd either need multi-gigabit bandwidth between the two co-los (which would probably cost for a week what they make per year), or they'd have to make separate, semi-independent communities. Google's servers don't stay in sync - you get different results according to which servers you hit, which isn't something you can do with "live" journals.

  9. i don't get it by dazedyugo · · Score: 2, Funny

    so it's deadjournal now ?

  10. sounds like good news by freakybob · · Score: 4, Funny

    Well now the millions (?) of users might actually have something to write about when the servers are back up. "Today I went outside. My pupils have never been tinier..."

  11. Was that really called for? by Anonymous Coward · · Score: 3, Insightful

    Perhaps Six Apart wasn't quite prepared for the responsibilities of a website of this size?

    Ok, I understand that you don't like Six Apart; I'm no fan of their new licensing scheme either. However, I really doubt that SixApart has any control over any power failures that might occur at Internap.

    1. Re:Was that really called for? by Peter+Cooper · · Score: 2, Informative

      To be honest, a deal was announced what.. a week ago? I seriously doubt Six Apart has control over anything at this point.

  12. ANGST!!! by Anonymous Coward · · Score: 2, Funny

    Where will I write about my depression over this event?

    Oh. Slashdot.

  13. Please, Please, Please! by La+Camiseta · · Score: 3, Informative

    Use the Coralized link. No sense in crashing their status page. Plus it'll respond a lot quicker than loading the actual web page.

  14. A disturbance I feel by philkerr · · Score: 4, Funny

    I feel a great disturbance in the force..... It's as if a million bloggers cried out all at once..... and became silent.

  15. Tomorrow's news by Anonymous Coward · · Score: 2, Funny

    The population of depressed pre-teens has just dropped by 20%

    1. Re:Tomorrow's news by Miss_Saturnine · · Score: 2, Funny

      ...in other news, Bic announces all-time high profits in sales of disposable razors...

  16. Melodramatic by CypherXero · · Score: 2, Funny

    It's not like most LiveJournal users have enough to worry about, here's something for most LJ users to get melodramatic about. I'm serious, randomly pick 5 LiveJournal blogs, and I guarantee 4 out of 5 are going to be "Fuck the World" posts.

  17. poor internap by Indy1 · · Score: 2, Interesting

    sounds like all the fucking spammers they host overtaxed spammer-nap's power resources and brought it all down.

    Seriously though, spammer-nap is a massive spam haus, see for yourself

    --
    Lawyers, MBA's, RIAA? A jedi fears not these things!
  18. Disclaimer: I am Not an Electrical Engineer by ebooher · · Score: 5, Informative

    I know nothing of how InterNap is set up. I just want to throw that out there ahead of time. Now, it's time for my patent pending "Bull Shit Theory of the Day."

    Ok, here is the rant. I used to work for a Colocation facility. Nothing special, small by Telco terms. The whole facility only had about 1500 cabinets. (Though I hear they are now full, and going to be expanding.)

    We had a main power draw off of the local grid. We had a backup power draw off of the *next* city's power grid. (ie, when all the offices around us went dark, we still had power.) And you don't even want to know the kind of red tape we had to go through for *that* pull. I'm still not sure how they did it. We had fly wheel kinetic electricity storage systems, battery backups, and a diesel engine from a train so large it had its own building.

    We used to joke that if we lost power, we had more important things to worry about. And again, we were small time compared to some of the massiveness that is out there. *cough*AADS Chicago*cough*

    So I'm kind of in agreement with the statement currently on LiveJournal. It's unknown to me how any self respecting colo facility can say "We've had a power outage that also took our redundant systems."

    I have to call bullshit on that entire train of thought. If that's true then they don't *have* any redundant systems, and I'd be looking for a new provider. The most likely thing (at least in my mind) is that someone, somewhere got mad at something specific and decided to make a point by popping the main breaker to their portion of the facility.

    Oh, that was another thing, each room had several "main" breakers. It took a hell of a power surge to pop all of them, and the Liebert systems had power filters of some kind, really really big capacitors or something I think, so a surge really never made it to the other side anyway, it got stored in the cap and then trickled out like the rest of the power.

    But I was a UNIX admin, not the EE that was planning the power generation aspects of the facility. So take some of it with grains of whatever white powdered spice you prefer.

    --
    "Genius may shine aloof and alone, like a star, but goodness is social, and it takes two men and God to make a Brother."
    1. Re:Disclaimer: I am Not an Electrical Engineer by DrBlubGut · · Score: 2, Insightful

      Happens some days. A key breaker at a data center some of my friends work at went bad and took the whole floor with it. Generators didn't even get a shot at taking the load because the breakage happened later in the circuit. No matter how big and bad your infrastructure some points in the design will not be 100% resistant to all problems. We do our best, make plans, design good systems but the world teaches us Shit happens.

    2. Re:Disclaimer: I am Not an Electrical Engineer by CrankyFool · · Score: 4, Interesting

      Back in the great days of the .com boom, people were building their colo facilities to insane (in a beautiful way) standards. I remember touring Exodus and Above.net (I don't know who you're referring to, though I only ever heard of above.net adopting flywheels) and being just very amused at the cool stuff they were putting in place.

      I recently (~8 months back) did some contract work for a small company whose servers were based in some colo facility in San Francisco. One of the first things I noticed was a damn heavy UPS at the bottom of their rack. Weird, I thought -- why not rely on the colo's battery system?

      Because they don't have one.

      Mind you, this was also the colo that had a cardkey system that had long ago stopped being usable, so when you needed access you used a Radio Shack $29.99 wireless intercom system and someone would come to open the door, and when you checked in they carefully wrote your name on a little nametag.

      I think standards have slipped, significantly. In some respects, this is likely a good thing -- it means you have more options now, because you can choose either the super duper "we hook up to two countries' power grids, have eight flywheels and a direct feed from microwaves in orbit" or the "err, here's your cabinet. We'll give you decent power until we don't" options.

      So ... how much are all these people paying LiveJournal again? Couldn't they request some sort of partial refund of their monthly fee?

      Oh, wait...

    3. Re:Disclaimer: I am Not an Electrical Engineer by Anonymous Coward · · Score: 5, Informative

      My friend's company is hosted by internap. Today he messaged me when the power went down. It was only power to the second floor, my friend's servers, while cut off to the internet were still running (on the 3rd floor). Internap has redundancy and backup generators (and enough fuel onsite to run for 30 days without external power). Apparently there was construction occurring on the second floor... my guess is that some dipshit contractor cut through a power cable or 3 and took the whole floor down.

      To all the people accusing LJ of being stupid for not having UPS systems, Internap has 3 fully redundant power systems (yes, I know, didn't help much) so most people probably don't feel the need to run their own ups.

    4. Re:Disclaimer: I am Not an Electrical Engineer by Xoder · · Score: 2, Informative

      Actually, about a year ago, they had some months of bad performance and gave all paid members an additional 2mo (or so, I forget exactly) of paid member-level service, free of charge.

      --
      The previous sig has been removed due to /. protecting your best interests
    5. Re:Disclaimer: I am Not an Electrical Engineer by mizalaina · · Score: 2, Informative

      I work at a co-lo facility now. The problem is probably that what people call redundant power often isn't highly available, nor is usage distributed correctly across the primary and redundant circuits. If one half of your power fails and you've mis-used or overloaded your redundant circuit then the redundant circuit is going to fail when it can't take the load that gets switched over to it. This is a result of poor planning.

      Keep in mind that often people have back-up power that's not conditioned, which is what is indicated by LJ's message. If the power were redundant and both sides were through UPSes, there would be no dirty power at all. A lot of co-lo facilities go on the cheap and their back-up power is just another circuit from a different transformer or a different Hydro company. So think about it: if the grid, transformer or power switching infrastructure fails, and you only have one back-up generator that also fails, or your UPS batteries can't take the pressure, or any of two dozen other things, your power has gone bye-bye.

      My prediction (which we are already seeing at my job) is that power and cooling are the Next Big Problems for co-lo. With blade servers demanding 220V, 30A 3-phase power and pulling 8kVA in 6U of space, no data centre as currently designed will be able to handle that on the scale we're going to see develop in the next year or two. People assumed power and cooling were unlimited resources. We were wrong. Oops!

      BTW, if what LJ is saying is true, this has little to do with Six Apart or Danga. It's Internap's fault within that particular data centre. The sales engineers/technical consultants/whatever they're called at Internap should have thought about this and pushed for audits, but they probably didn't. I doubt Danga knew enough about the potential problem to make good decisions about it: they're just a customer and assumed that the power would work. It's an infrastructure thing, and while the customer should educate themselves, they often don't. It's why I bug my customers constantly with power audits and suggestions.

      Just something to think about. :)
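The overloaded-redundant-circuit failure described above can be checked with simple arithmetic. This is only a sketch under stated assumptions: two feeds (A and B) with identical breakers, and a rule-of-thumb 80% continuous-load derating. The function name and the numbers are hypothetical and not taken from any particular facility.

```python
# Sketch: does an A/B power pair survive losing one feed?
# Assumption: when a feed fails, ALL load lands on the surviving feed,
# and a breaker should carry only `derate` of its rating continuously.


def survives_single_feed_loss(load_a_amps, load_b_amps, breaker_amps, derate=0.8):
    """True if either feed alone can carry the combined load."""
    usable_amps = breaker_amps * derate
    return (load_a_amps + load_b_amps) <= usable_amps


if __name__ == "__main__":
    # 10 A on each 30 A feed: 20 A combined fits under 24 A usable.
    print(survives_single_feed_loss(10, 10, 30))
    # 14 A on each feed looks fine per feed, but 28 A combined does not fit.
    print(survives_single_feed_loss(14, 14, 30))
```

The second case is the trap: each feed sits under half its rating plus a bit, so nothing looks wrong until one side fails and the other trips as well.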

    6. Re:Disclaimer: I am Not an Electrical Engineer by nettdata · · Score: 2, Interesting

      Maybe, but sometimes shit just happens, regardless of design.

      I find this Risk site to be very interesting reading, especially when it talks about some failure issues and scenarios.

      My favourite was about the squirrel that took down the Nasdaq. (I've also heard squirrels/mice/rats etc called "self propelled short circuits", but that's another story)

      Now, I've been involved in systems architecture design, planning, and management for years, and I think that a lot of people drastically underestimate just how fscking complex and difficult proper planning, execution, testing, maintenance, and administration of these systems can be... especially when faced with budgetary restrictions.

      The cost of a system rises almost exponentially as you approach 100% uptime... even 99.999 is freaking expensive to implement and manage. Never mind the complexity and administrative requirements.

      Who knows... maybe dealing with the PR issues of this outage is still orders of magnitude cheaper in the long run than putting in the systems required to achieve the uptime.

      At the end of the day, what are the business impacts of this outage? For that matter, they seem to have received more exposure than if they were operating normally.

      A lot of people are aware of the fact that sometimes things break, and we're not landing planes in the fog here. The fact that shit broke and they're bringing it back in an informed and somewhat timely manner may HELP them, in that some people may get a stronger sense of "these guys can deal with problems that hit them".

      --
      $0.02 (CDN)
  19. Gee, I wonder... by Rie+Beam · · Score: 4, Funny

    Update from the site:

    "Update #1, 7:35 pm PST: we're up on 'dirty' power for now (it works, but it's unreliable)".

    Congrats to LiveJournal for assembling a coal generator in record time.

    1. Re:Gee, I wonder... by Rie+Beam · · Score: 2, Informative

      Sir, I will fight your advice until my grave.

  20. Update by TrevorB · · Score: 3, Interesting

    On the Livejournal main page:

    Update #1, 7:35 pm PST: we're up on 'dirty' power for now (it works, but it's unreliable), and we're working to assess the state of the databases. The worst thing we could do right now is rush the site up in an unreliable state. We're checking all the hardware and data, making sure everything's consistent. Where it's not, we'll be restoring from recent backups and replaying all the changes since that time, to get to the current point in time, but in good shape. We'll be providing more technical details later, for those curious, on the power failure (when we learn more), the database details, and the recovery process. For now, please be patient. We'll be working all weekend on this if we have to.

    Lovely. I just bought another year's subscription for my wife, figuring the change to Six Apart wouldn't change anything for a few months at least. LJ could lose a lot of subscribers with an outage just after the takeover.

  21. Look at me! Look at me! by Cyburbia · · Score: 4, Funny

    live journal is dark like my soul like my heart a void its link is cut just like i'll be doing to my arm i blame my parents

  22. Re:Bad IDea. by jamie · · Score: 2, Informative
    It is. Slashdot gets about 1/10th the pageviews of LJ.

    The Slashdot effect is more visible because we send all our readers to one place at the same time, while LJ is highly distributed.

  23. A great disturbance in the Force... by YowzaTheYuzzum · · Score: 5, Funny

    ... as if millions of teenage girls suddenly cried out in terror and were suddenly silenced.

    1. Re:A great disturbance in the Force... by cHiphead · · Score: 2, Funny

      shouldn't that be "millions of lonely emo teenage boys"?

      hopefully the 'power outage' that took out lj was actually something cool like someone sneaking an emp bomb into the datacenter. and not some dipshit power company employee hooking up something wrong on a transformer outside and melting the lines.

      cheers.

      --
      This is my sig. There are many like it, but this one is mine.
    2. Re:A great disturbance in the Force... by Council · · Score: 2, Informative

      Livejournal is something like 65:35 female:male.

      --
      xkcd.com - a webcomic of mathematics, love, and language.
    3. Re:A great disturbance in the Force... by drsquare · · Score: 3, Funny

      >>Livejournal is something like 65:35 female:male.
      >It's cute to see such naivety still on the internet. Never played any MMORPGs, huh?


      This is different to MMORPGs, MMORPGs are generally a male domain, with men pretending to be women to get favours. On the other hand, blogs involve things women like doing, i.e. fucking going on and on about shit no-one cares about.

  24. Re:./ed !!!! Server Reboot Time? by ebooher · · Score: 4, Interesting

    This is another thing that bothers me about this scenario. I can't say that I've ever admined 100 servers, the most I've ever had was about 30, but if we had a power loss of any kind, you'd just repower them and walk away. Most of them were DEC Alpha gear running Tru64. Why would you spec out a box that has to be handheld every reboot? The only time you should have to handhold a server is during an upgrade. A power cycle without proper SIGHUP or term signals should just run fsck on its way back up. (K, so it might take an hour for the server to go live again, but still.) I mean, am I missing something here? Maybe since nothing I've admined got the traffic these things do .... I'm just lost. Someone hit me with the clue by four.

    The only thing I can even think of is they have explicit services that must be started manually ..... but why would you want that? If you have a power hiccup in the middle of the night, you want it to come back up, and be live and happy again *before* you even get the first page. I mean sure, if there was a surge, and that destroyed components, and those components have to be replaced ..... but ..... a reboot is a reboot, man. Here, smoke some source. It's the good stuff.

    --
    "Genius may shine aloof and alone, like a star, but goodness is social, and it takes two men and God to make a Brother."
  25. Six Apart is hosting them already? by MattW · · Score: 3, Insightful

    Er, they just announced Six Apart was buying them like days ago. I doubt they transitioned the servers in the first week.

  26. Re:./ed !!!! Server Reboot Time? by bradfitz · · Score: 5, Insightful

    They all came back up when the power came back.

    But we intentionally don't have databases come back up on boot because if there was a blip, we want to do an integrity check first. (we run InnoDB, so it's ACID, but we're paranoid ...)

    We have clusters of 2 identical databases in separate cabinets, separate switches, separate Internap power feeds... so normally losing one database in each cluster doesn't matter: the other one gets used. But when we lose every single database, in all clusters, all at once... that's the time to be paranoid and double check stuff.
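The policy described above (use the surviving database of a pair, but hold everything when a whole cluster went down uncleanly) can be sketched roughly as follows. This is an illustration only, not LiveJournal's actual code; the `Db` class, the `verified` flag and `choose_db` are hypothetical names.

```python
# Sketch of "use the healthy twin, but refuse unverified databases".
# All names here are hypothetical; this is not LiveJournal's code.
from dataclasses import dataclass


@dataclass
class Db:
    name: str
    up: bool               # process is running and reachable
    verified: bool = True  # integrity check passed since last unclean shutdown


def choose_db(cluster):
    """Return the first usable database in a cluster, or None to hold the cluster offline."""
    candidates = [db for db in cluster if db.up and db.verified]
    return candidates[0] if candidates else None


if __name__ == "__main__":
    # Normal case: one twin died, the other carries on.
    print(choose_db([Db("a1", up=False), Db("a2", up=True)]))
    # Total power loss: both are back up but neither has been checked yet.
    print(choose_db([Db("b1", up=True, verified=False), Db("b2", up=True, verified=False)]))
```

Returning `None` in the second case is the "be paranoid" branch: nothing serves traffic until a human (or a check job) flips `verified` back on.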

  27. "LiveJournal Servers Go Down" by PornMaster · · Score: 4, Funny

    LiveJournal Servers Go Down

    With thousands of teenage girls unable to ponder in an open forum whether or not to blow their boyfriends, thousands of teenage girls go down.

  28. Where's my irony stick? by gmhowell · · Score: 4, Insightful

    Because michael needs a beating. The site that rolls beta (alpha?) code onto live servers complaining and making jokes because another site goes down through no fault of its own?

    --
    Jesus was all right but his disciples were thick and ordinary. -John Lennon
    1. Re:Where's my irony stick? by elmegil · · Score: 2, Funny

      Oh no. Not a beating. There's a much better use to which we should be putting that irony stick to help Michael out.

      --
      7 November 2006: The day Americans realized corruption and incompetence weren't addressing 11 September 2001
    2. Re:Where's my irony stick? by ces · · Score: 2, Interesting

      Really, talk about the pot calling the kettle black.

      If the datacenter that hosts Slashdot was to have a massive power failure how long would /. be down for?

      That said my company has gear in the same datacenter as LJ, our servers were back up 10 minutes after power was restored. Then again we use Oracle on HP-UX with nice SAN RAID boxes for storage for our database. So our stuff tends to recover from a sudden power loss a little better than a MySQL derivative running on clone hardware.

      --
      Happy Fun Ball is for external use only.
    3. Re:Where's my irony stick? by gmhowell · · Score: 2, Interesting

      Everything in italics is the exact comment from the submitter. Everything else NOT IN italics, is any of the additional comments by the editors. Haven't you ever noticed that????

      Long before you showed up here.

      Michael, as an editor, could easily rewrite the summary (and perhaps he did). Or he could choose the most inflammatorily written piece, and pretend to have presented the article without bias.

      Certainly none of these theories are more tin-foil-hattish than 95% of the stories on YRO.

      --
      Jesus was all right but his disciples were thick and ordinary. -John Lennon
  29. No... by EdMcMan · · Score: 4, Insightful

    Perhaps Six Apart wasn't quite prepared for the responsibilities of a website of this size?

    What does Six Apart have to do with Internap? Livejournal has been using - and wanting to switch from - Internap for a long time.

  30. Re:./ed !!!! Server Reboot Time? by TrevorB · · Score: 2, Informative

    For those people who might not know, Brad Fitzpatrick is Livejournal User #1.

    I'd have to agree with the AC: Brad, stop posting to Slashdot and hover over that DB rebuild a bit more.

    (Yes, posting to slashdot relieves tension... Whatever it takes, Brad.)

  31. Bush supporter too dumb to understand datacenters? by alienmole · · Score: 2, Interesting

    It says nothing of Bush or Internap. It says everything about cheapskate blog admins who think they can run servers without paying for battery backup.

    The LiveJournal status page claims "Our data center (Internap) lost all its power, including redundant backup power". This is nothing to do with "cheapskate blog admins" and everything to do with a serious and quite likely unacceptable problem at Internap.

    Of course, that's why Anonymous Cowards start out with zero points. Guilty of idiocy until proven innocent.

  32. Re:./ed !!!! Server Reboot Time? by bradfitz · · Score: 5, Informative

    At this point all my whiteboards are full of boxes for each database cluster: the machines in that cluster, which have passed their checksum tests (InnoDB checksums each 16k page), which have replayed their replay/undo logs, where in the binlogs each was writing/reading/executing, etc...

    So lots of waiting now on the checksum validators. I don't want to put a machine back in and find out in a week there was a database page that was corrupt because the battery-backed write-back cache on the RAID card didn't work as advertised. (which happens on about 95% of RAID cards, in my experience, because they're mostly crap, even the most expensive ones...)

    Also, whenever there's any doubt about something's integrity, we back up or snapshot the potentially corrupt version before operating on it. That operation can take time too.

    It's going to be a fun night.
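
    The per-page checking described above might look something like this sketch. It is an assumption-heavy illustration: CRC-32 from `zlib` stands in for whatever checksum InnoDB really uses, and the expected checksums are kept in a separate list rather than inside each page.

```python
import zlib
from typing import Iterator, List

PAGE_SIZE = 16 * 1024  # 16k pages, as mentioned in the comment


def pages(data: bytes, size: int = PAGE_SIZE) -> Iterator[bytes]:
    """Split a raw data file into fixed-size pages."""
    for offset in range(0, len(data), size):
        yield data[offset:offset + size]


def page_checksums(data: bytes) -> List[int]:
    # CRC-32 is a stand-in here; it is not InnoDB's real page checksum.
    return [zlib.crc32(page) for page in pages(data)]


def corrupt_pages(data: bytes, expected: List[int]) -> List[int]:
    """Indexes of pages whose checksum differs from the recorded one.

    Assumes `expected` has one entry per page; extra pages are not compared.
    """
    actual = page_checksums(data)
    return [i for i, (a, e) in enumerate(zip(actual, expected)) if a != e]
```

    A machine would only go back into rotation once `corrupt_pages` comes back empty for every file, which is why the waiting takes so long on large databases.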

  33. Value of Livejournal - "Open Source Philosophy" by DemonWeeping · · Score: 5, Interesting

    For those who don't know what's so hot about it and for those who think Livejournal is just a bunch of teenage girls whining.... Livejournal has just about four years of my life documented. The ease of use and the ability to "vent" is comforting, but the real value comes in the interaction. My friends see my life at their convenience and I see theirs at mine. We can choose to ignore the whining of others or we can choose to relate and comment on our own experience. Think of it this way: Open-source philosophy, emotion, and life. I put my own out there and others add to it. I add mine to others. Granted ... those quiz/meme things HAVE TO GO. I do not want to read about "what frog best resembles me" or "which 80's hair band song is me." Grrr.

    1. Re:Value of Livejournal - "Open Source Philosophy" by TiggsPanther · · Score: 2, Insightful

      Oh yes. If I ever feel the need to post any of those quiz-things I make good use of the <lj-cut> tag. So if anyone on my Friends list (or a random person finding my Journal) doesn't want to see the results they don't have to.

      Actually, one of the more useful LJ features I know of is one that allows you to screen out images over a set size from your Friends list. So you need to view the entry in question to see the image, which is good for your bandwidth and/or narrow page layout.

      --
      Tiggs
      "120 chars should be enough for everyone..."
  34. Before you get all down on LJ... by Bloodlent · · Score: 4, Informative

    Just remember it's not ALL obnoxious, over-emotional teen-angst teenage girls. I use mine to showcase (non-depressing) poetry and make intelligent comments about intelligent topics. Basically, if someone makes an LJ about their own life, it sucks. If you can manage to write an LJ and make it about things that matter to more people than just you (i.e., "Why Bush's Iraqi war is unjust" vs. "Why this babe I know should bang me"), and at the same time make it funny and enjoyable to read, then you have a good LJ. Most LJs DO suck, but there are some diamonds in the rough.

    1. Re:Before you get all down on LJ... by Anonymous Coward · · Score: 2, Interesting

      Agreed, there are plenty of useful uses for a LiveJournal. LiveJournals are also a great outlet for creative content. For example, my friend and I started a satirical blog on LiveJournal dubbed The Rhubarb. (Of course the link won't work until LJ is back up).

  35. Blog, blog, blog, blog... by alienmole · · Score: 2, Funny

    Blog blog blog blog.
    Lovely blog!
    Wonderful blog!
    Blog blo-o-o-o-o-og blog blo-o-o-o-o-og blog.
    Lovely blog! Lovely blog!
    Lovely blog! Lovely blog!
    Lovely blog!
    Blog blog blog blog!

    -- The Viking Blog Song

  36. Not related to Six Apart by wersh · · Score: 3, Insightful

    From the article write-up (and reflecting the thoughts of quite a few of the comments I just read):

    Perhaps Six Apart wasn't quite prepared for the responsibilities of a website of this size?

    I'd love to know what makes you think this has anything to do with Six Apart. The very first line at http://www.livejournal.com states:

    Our data center (Internap, the same one we've been at for many years)...

    They've been with Internap for years, predating Six Apart's takeover. Unless LJ staff is lying, the fault here sounds like it lies entirely with Internap.

    And as far as I can tell, Six Apart didn't ditch the LJ team when they bought them out, so you probably have the exact same people working on bringing the site back up now as you would have if Six Apart had never got involved.

  37. Re:Slashdot blogging for a fix by Buran · · Score: 2, Interesting

    I know the feeling. I have an LJ (for friends to read) in which I relay news, ramble about things that interest me, and write mini-essays from time to time. I don't whine about my parents or people at school or whatever (well, I do, but it's grumbling about idiots at work, since I work at a university) and the people I know are generally much the same. But I can't stand those typically teen idiotic ramblings either.

    But I too find it irritating that a service I use, one that is supposed to be backed up, goes down like this (my account was bouncing up and down numerous times in the past week, too). For a paid service, I'd have expected there to be a lot more backups to make it more difficult for power problems to wipe out the entire site. If the hosting facility doesn't have a UPS, why wasn't one installed?

  38. The Big Red Switch by rah1420 · · Score: 3, Funny

    Someone probably hit the big red switch on the wall, the one covered in a plastic case

    That does happen. I remember working at Purolator Courier's data center in NJ back in -- oh, geez, mid-80s some time. I was a third shift print operator, helped out with the mag tape library too. One night the trouble alarm went off on the fire suppression panel. We'd been having trouble with it all week, and the alarm guy was due in in the morning. One of the newbie operators -- the only one at the console at the time, the others being on a smoke break or asleep in the tape library -- panicked and went over to the annunciator panel. He opened it as I watched him from the console area. I think he thought the halon was about to dump because he reached around the panel and instead of hitting the halon dump abort, he hit the emergency power cutoff.

    BLAM! It was as if a firecracker went off as all the breakers tripped and the fans came to a sighing halt. Both on this floor -- the one with the console and the tape drives -- and the floor above, with the CPU and the disk farms. Dead as a doornail.

    Now, this was Purolator COURIER. We had AIRPLANES coming in to land at Indy center and as of this moment, no way to tell the crews which gate to go to, where to unload their stuff, or how to sort it.

    Not only that, but this was an IBM mainframe shop -- S/390, the Big Iron, with 3380 disk drives. You don't just flip the power switch back on. An emergency power cutoff blows breakers in the power supplies on those DASD strings. The IBM Field Engineer was duly dispatched and arrived with cases of breakers the next morning. But we were still dark when I got off shift the following morning.

    The next night a brand new plexiglass cover was mounted over the Big Red Switch.

    --
    Mit der Dummheit kämpfen Götter selbst vergebens.
  39. bigger explination by moosesocks · · Score: 4, Insightful

    I'm surprised to see that Internap's main servers are back up. It's pretty irresponsible to bring up your corporate servers before those of your clients.

    That being said, LJ's servers are back up now, but they're making sure that the databases are all in sync -- LiveJournal has one of the most massive distributed MySQL clusters in existence along with a complete caching system.

    They need to make sure that the database is all synchronized before bringing it back up -- chances are they're going to rebuild the cache too. If they didn't, the initial strain on the DB servers would probably bring the site down again.

    This does, however, bring up some questions about LiveJournal's network infrastructure. Danga (the creators of LJ, recently purchased by Six Apart) are heavy users of Perl and MySQL. Needless to say, they have made numerous contributions to both projects and have developed an innovative memory caching system for Linux.

    The questions raised, however, come from Perl and MySQL. Both are questionable in terms of scalability. Although I'm not qualified to comment on this, I believe that the general consensus is that MySQL is one of the least efficient databases today. LiveJournal has 100+ servers. I honestly don't think that a system the size of LiveJournal should require a server cluster that big. It seems that they are trying to solve their performance/reliability problems by blindly throwing hardware at them.

    Of course, I love livejournal. It's simple, easy to use, and is a great tool for building communities. Just as it is simple, it can also be incredibly nerdy (there's actually a command prompt!). They're also completely open source.

    Hopefully, Six Apart can make their network infrastructure more 'professional' while still maintaining the community spirit that has made it so successful.

    --
    -- If you try to fail and succeed, which have you done? - Uli's moose
    1. Re:bigger explination by andfarm · · Score: 2, Informative

      InnoDB *is* MySQL.

      --
      TANSTAAFI: There Ain't No Such Thing As A Free iPod.

    2. Re:bigger explination by Kyrrin · · Score: 4, Insightful

      As we've said a bunch of times in the past, moving away from MySQL would be prohibitive. By now we know how to make it work for us; switching away from MySQL would not only involve massive rewriting of stuff and alterations on the existing DB, it'd take the next five years before we got as comfortable with the flaws and advantages of another DB package.

      Sure, MySQL has its flaws -- some of them pretty big -- but we can work around them.

      As for the "not needing a server cluster that big" -- do you have any clue how much data we push in an average day? We maintain so many DB clusters to improve reliability, and we maintain so many web nodes because we push a screaming shitload of traffic.

  40. This will be bad PR... by Sparks23 · · Score: 2

    There were already lots of LiveJournal users who were upset and confused and unhappy with the idea that LJ and Danga (the company which made LJ) had been bought by SixApart. No doubt, as there have been no downtimes of this magnitude at LJ before, doomsayers will be claiming that it's SixApart's fault.

    Never mind common sense; it won't matter that if SixApart can be held responsible for failures at InterNAP's colocation facilities, they're a much bigger -- and more powerful -- company than most people have ever given them credit for...

    --
    --Rachel
  41. Update 2 by KinkifyTheNation · · Score: 2, Informative

    Update #2, 10:11 pm: So far so good. Things are checking out, but we're being paranoid. A few annoying issues, but nothing that's not fixable. We're going to be buying a bunch of rack-mount UPS units on Monday so this doesn't happen again. In the past we've always trusted Internap's insanely redundant power and UPS systems, but now that this has happened to us twice, we realize the first time wasn't a total freak coincidence. C'est la vie.

  42. The cause of the outage? by supersat · · Score: 2, Informative

    According to some LiveJournal employees, a massive UPS exploded. From IRC:

    <rahaeli> As far as we can tell, a UPS exploded.

    Their site now says that they're buying their own UPSes, because this is the second time that the entire data center has lost power. Details on the first outage can be found here (a Google cache since LJ is down).

    For the paranoid: This has nothing to do with Six Apart buying LJ. They're still in the same "world-class" data center they've been in for years.

  43. Re:./ed !!!! Server Reboot Time? by cHiphead · · Score: 3, Funny

    you want beer and pizza? email me an address/zipcode at the sig email and ill do my part to support restoring lj.

    if my wife cant post this weekend, im gonna hear about it. and not even be able to post my lj about getting yelled at about lj being down as if i caused the power outage myself. ;)

    not really.

    well maybe.

    Cheers.

    --
    This is my sig. There are many like it, but this one is mine.
  44. It's strange by AndroidCat · · Score: 3, Funny

    Remember when teenagers were happy when people couldn't read all the personal details in their diary?

    --
    One line blog. I hear that they're called Twitters now.
  45. Re:./ed !!!! by Hooded+One · · Score: 3, Informative

    You do realize that LiveJournal handles far more traffic than Slashdot, and when Slashdot got linked on the front page of LJ, Slashdot started spewing out errors (more than normal).

    Oh hey, Slashdot just went down as I was typing this. Smooth.

  46. what Obi Wan would say if he were here by insomnyuk · · Score: 2, Funny

    "I have felt a great disturbance in the force; as if a million voices suddenly cried out in terror."

    Those poor, poor children.

  47. russia in shadows by schweller · · Score: 2, Interesting

    i won't exaggerate if i tell that in recent years most of "social life" in .ru zone moved to livejournal. it's 10 a.m. in russia now, and most of russian lj-addicts still don't know about apocalypse in lj. i hope everything will be turned up in the nearest future. brad, we believe in you! :)

  48. I call bull on all this by Anonymous Coward · · Score: 3, Insightful

    There seems to be a lot of latent hostility towards teenage girls. WTF? Your outlet is geeking out on Slashdot. Theirs is LJ. And how do you all know so much about the content of LJ anyway?

  49. Re:Bringing servers back is hard why? by Lew+Payne · · Score: 2, Funny

    | ...the poor APC UPS batteries weren't able to hold up the 150 servers I run.
    |
    | When the power came back on, we had 143 servers back on-line in ten minutes.
    | We had 149 on line in fifteen minutes. We had two servers (leased dedicateds)
    | that requires some file system repairs before they would come back on-line, but
    | that task was finished 30 minutes after power restoration.
    |
    | What's so hard about that?

    What's so hard about that? Well... not everyone who has 150 servers can get 151 of them back online in 30 minutes.

  50. Re:Bush supporter too dumb to understand datacente by RollingThunder · · Score: 2, Interesting

    Unless it means that the "cheapskate blog admins" were too cheapskate to buy proper dual-power supply boxes so that they can have dual power paths right to the servers.

    You can have all the great redundant mains and backups you want, and it's for shit if you only have one power line to the system and that power bus loses juice.

  51. Makes me wanna laugh by Jesus+IS+the+Devil · · Score: 2, Interesting

    It's funny, I just met with some Internap salespeople a few months ago. They were bragging about how their network infrastructure was superior to most others, since it intelligently routes traffic along the path of shortest response time (not fewest hops).

    They even bragged to me how their network uptime SLA is 100%! I mean good god, now I find out this is the SECOND time it's happened (from the livejournal update site)???

    I'm glad I didn't go with them...

    --
    eTrade SUCKS
  52. Like, what's wrong with you, people? by 21mhz · · Score: 2, Interesting

    The comments seem to be full of contempt for the inane teenage-angst ramblings that are common on LJ. Come on. It's not like you are forced to read through this stuff.
    I have a few "friends" there at LJ, some of them net.celebs, and I like their posts. It's a matter of whose writings you find interesting, and you are free to be completely unaware of the rest. Why all the vitriol?

    --
    My exception safety is -fno-exceptions.
  53. Re:./ed !!!! by Hooded+One · · Score: 4, Informative

    The Alexa link was the only tangible example I could find. I distinctly recall seeing a post by Brad himself mentioning how much more traffic LJ handles, but obviously I can't link to it at the moment.

    Anyway, as of Google's last crawl of the stats page (shortly before the outage), there were almost 6 million LJ users, a little under half of those "active." I don't know if /. has any stats available, but skimming through this page, the highest UID I see is in the 800,000 range. I'm not going to even attempt to guess what the relative activity level of LJ users is compared to /., or which has bigger pages or whatever, but I would offhand say that LJ probably handles more image traffic (user pictures, and now the in-testing photo hosting service). I know they used to use Akamai for that, but I seem to recall that fairly recently they switched over to doing something else. (I think they handle it themselves again, but I'm not sure.) There's also the audio files from phone posts. I'd say there's little question that LJ is the more heavily trafficked site.

    Besides, a lot of the DB load on Slashdot is eased tremendously by Memcached, developed by... Danga Interactive, i.e. LJ. Wikipedia uses it too, and just started using Perlbal. (And I do mean "just") Ditto for Audioscrobbler/Last.fm. So /. isn't in much of a position to pooh-pooh the technical ability of Brad/LJ.
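
    For anyone wondering how a memory cache eases DB load, here is a minimal cache-aside sketch. It is not the memcached client API: a plain dict stands in for the cache pool, and the `CacheAside` class, `load_from_db` callback, and `db_hits` counter are hypothetical names invented for the example.

```python
from typing import Callable, Dict


class CacheAside:
    """Read-through helper: check the cache first, fall back to the database."""

    def __init__(self, load_from_db: Callable[[str], str]) -> None:
        self._cache: Dict[str, str] = {}  # stand-in for a memcached pool
        self._load_from_db = load_from_db
        self.db_hits = 0  # counts how often the database is actually queried

    def get(self, key: str) -> str:
        if key in self._cache:
            return self._cache[key]
        # Cache miss: query the database once and remember the answer.
        self.db_hits += 1
        value = self._load_from_db(key)
        self._cache[key] = value
        return value

    def invalidate(self, key: str) -> None:
        """Drop a key after the underlying row changes."""
        self._cache.pop(key, None)
```

    Repeated reads of the same key only touch the database once, which is also why an empty cache after a restart puts so much initial strain on the DB servers.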

  54. Internap Sucks by Nurgled · · Score: 3, Interesting

    I seem to remember that a few years back they had a similar problem (Internap lost all power) and it turned out that some idiot had hit the big red "shut down all power to the entire datacenter" emergency button. This isn't the first time this has happened, and last time it wasn't under Six Apart's management.

    I'd say it's Internap's incompetence that caused this problem. If they can't keep their datacenter running even though they have multiple redundant power supplies then something is very wrong. I see from the outage page that LJ people are now planning to buy their own UPS so that they don't have to trust Internap anymore.

    For power outages, my house has a better record than Internap right now, and I don't even own a UPS!

    1. Re:Internap Sucks by Scott+Laird · · Score: 2, Informative

      A couple points. First, there's *nothing* that you can do about the "idiot hit the big red button" problem--you're required by law to have the button, because it's a safety issue. It has to be accessible--you can't lock it in a closet. And everyone knows that if you put a big red button on a wall, sooner or later someone's going to hit it.

      I don't know what happened this time, but the ~2002 Internap Seattle outage was caused by an idiot Speakeasy tech who couldn't figure out how to use the exit door, so he hit the Big Red Button instead.

      I worked for Internap at the time, and I spent weeks stuck inside that colo facility. It was basically the only "dot-com" grade thing that Internap built (they were usually somewhat thrifty, at least pre-2001). It sparkled. Everything was over-engineered. You had to go through multiple rounds of security to get access to anything.

      The last I heard last night, no one quite knew what'd happened yet. Apparently, multiple redundant power systems all failed at the same time. This facility was designed by a company that already had ~5 years experience running high-end colo facilities, and it was designed as the flagship facility for showing off to potential customers. This isn't a hole-in-the-wall hosting place, it's more of a bunker hiding in the shadow of the Space Needle. So, frankly, it'll be very interesting to see what happened, because no money was spared to keep this sort of thing from happening.

      (Disclaimer: I haven't worked for Internap since 2002. I still own a bit of stock, because it's not worth the hassle of selling it for what little it's worth. It's not really the same company now that it was when I started in '98, and only a handful of my former coworkers are still with the company. I'm not even going to *start* with my opinion of the current management.)

  55. Re:This is why I use Blogger.... by This+Is+Ridiculous · · Score: 2, Interesting

    Personally, I'll trade a subdomain for the elegant simplicity of the friends system, post security, threaded comments, communities, user images, easy and powerful customization, an open-source backend with some seriously useful software contributed to the community, clients, and a site that, during the 99% of the time it's running properly, is ridiculously fast.

    Actually, I won't trade a subdomain for all that. I'm a paid user, so I get one anyway.

    (And there's a simple solution to the emo teens: ignore them.)

    --
    Hey, you try to find an open nick these days!