Slashdot Mirror


LiveJournal Servers Go Down

Wind writes "According to any journal hosted off of LiveJournal.com, the LiveJournal data center Internap has suffered a critical power failure, leaving all of LiveJournal and its content temporarily offline and requiring the revival of 100+ servers. Perhaps Six Apart wasn't quite prepared for the responsibilities of a website of this size? Updated information is posted here."

34 of 596 comments

  1. Lights out by r_glen · · Score: 5, Funny

    Sounds like someone was taking a nap over at Internap

  2. The Pain ... by webfiend · · Score: 5, Funny

    You can't imagine the withdrawals I'm going through. It's like the great Slashdot brownouts of '98.

    I need my fix, man!

    1. Re:The Pain ... by thoughtcriminal87 · · Score: 4, Funny

      Or the great Slashdot 503s of '04

    2. Re:The Pain ... by DrEldarion · · Score: 5, Funny

      Honestly! Now we have to wait a day or so to find out what MelissaMinx492 ate for breakfast today!

    3. Re:The Pain ... by MikeXpop · · Score: 4, Funny

      Don't worry, it was waffles. But her dad used the last of the syrup. Man, he never can think of her needs, can he? I mean really, what's his problem?

      --
      Etiquette is etiquette. He kills his mother but he can't wear grey trousers.
    4. Re:The Pain ... by RichardX · · Score: 4, Funny

      Man, I need them to get those servers back up! I've got a whole pile of journals to not read. I'm getting behind on my ignoring.

      --
      Curiosity was framed. Ignorance killed the cat.
  3. Elsewhere by jm92956n · · Score: 4, Funny

    In related news, 6,000 teen-age girls were heard yelling "OMG! WTF! How will John know I life him if I can't blog about it!"

    --
    An effective signature identifies a particular user amongst a base of thousands.
    1. Re:Elsewhere by LGagnon · · Score: 4, Funny

      This is LiveJournal we're talking about; I think his spelling came out pretty accurate.

  4. In other news... by Anonymous Coward · · Score: 5, Funny

    ...the collective IQ of the internet has risen by about 20 points.

  5. slashdot has repeated 503 errors, by Anonymous Coward · · Score: 5, Insightful

    and search.pl is constantly being trashed by distributed xanga botnets. perhaps michael wasn't quite prepared to be an editor of slashdot?

    1. Re:slashdot has repeated 503 errors, by stupidfoo · · Score: 5, Insightful

      How is this a troll? It's funny that an "editor" at a site with as many problems as Slashdot doesn't see how hypocritical it is to mock another site that is currently having problems. People in glass houses indeed.

      Slashdot has semi-major problems almost every day. 503 errors, "nothing for you to see here" annoyances, and a search engine that goes down more than a Thai hooker.

  6. Internap is *down*? by MightyTribble · · Score: 5, Informative
    Internap *down*?
    Bush just appointed Internap's CEO to his National Infrastructure Advisory Council, yet the man can't keep a co-lo facility switched on.

    I'm not sure what that says of Bush or of Internap. And it certainly doesn't seem to have anything to do with Six Apart.

    1. Re:Internap is *down*? by slashrogue · · Score: 4, Interesting

      Sorry, but one man can't control power to an entire co-location facility. I used to work for a local telecom, and one day our fiber went down for about 1/3 of our customers. The reason? Some guy shootin' squirrels blew the fiber lines apart. It was on a Sunday, too, when there was minimal phone tech support; I think one guy ended up fielding 350+ calls by himself.

    2. Re:Internap is *down*? by Cramer · · Score: 4, Interesting

      I recall a story from a CP&L (now Progress Energy) lineman about how someone had shot the fiber line strung on the high-voltage tower, between the million-volt lines. As he put it, you could put a pencil through the hole and line up where the "hick" (his word) propped up the rifle to shoot what he must've thought was a power line. It's no fun working up there, and even less fun having to make fusion splices surrounded by million-volt lines. (Linemen work up there, not fiber techs.)

  7. What a cock by realdpk · · Score: 5, Insightful

    "Perhaps Six Apart wasn't quite prepared for the responsibilities of a website of this size?"

    Perhaps shit happens, and a blog service doesn't warrant the necessary investment to survive whatever caused this outage?

  8. sounds like good news by freakybob · · Score: 4, Funny

    Well now the millions (?) of users might actually have something to write about when the servers are back up. "Today I went outside. My pupils have never been tinier..."

  9. A disturbance I feel by philkerr · · Score: 4, Funny

    I feel a great disturbance in the force..... It's as if a million bloggers cried out all at once..... and became silent.

  10. Disclaimer: I am Not an Electrical Engineer by ebooher · · Score: 5, Informative

    I know nothing of how Internap is set up. I just want to throw that out there ahead of time. Now, it's time for my patent-pending "Bull Shit Theory of the Day."

    Ok, here is the rant. I used to work for a Colocation facility. Nothing special, small by Telco terms. The whole facility only had about 1500 cabinets. (Though I hear they are now full, and going to be expanding.)

    We had a main power draw off of the local grid. We had a backup power draw off of the *next* city's power grid (i.e., when all the offices around us went dark, we still had power). And you don't even want to know the kind of red tape we had to go through for *that* pull. I'm still not sure how they did it. We had flywheel kinetic energy storage systems, battery backups, and a diesel engine from a train so large it had its own building.

    We used to joke that if we lost power, we had more important things to worry about. And again, we were small time compared to some of the massiveness that is out there. *cough*AADS Chicago*cough*

    So I'm kind of in agreement with the statement currently on LiveJournal. I don't understand how any self-respecting colo facility can say "We've had a power outage that also took out our redundant systems."

    I have to call bullshit on that entire train of thought. If that's true then they don't *have* any redundant systems, and I'd be looking for a new provider. The most likely thing (at least in my mind) is that someone, somewhere got mad at something specific and decided to make a point by popping the main breaker to their portion of the facility.

    Oh, that was another thing: each room had several "main" breakers. It took a hell of a power surge to pop all of them, and the Liebert systems had power filters of some kind (really, really big capacitors or something, I think), so a surge never really made it to the other side anyway; it got stored in the cap and then trickled out like the rest of the power.

    But I was a UNIX admin, not the EE who was planning the power generation aspects of the facility. So take some of it with grains of whatever white powdered spice you prefer.

    --
    "Genius may shine aloof and alone, like a star, but goodness is social, and it takes two men and God to make a Brother."
    1. Re:Disclaimer: I am Not an Electrical Engineer by CrankyFool · · Score: 4, Interesting

      Back in the great days of the .com boom, people were building their colo facilities to insane (in a beautiful way) standards. I remember touring Exodus and Above.net (I don't know who you're referring to, though I only ever heard of above.net adopting flywheels) and being just very amused at the cool stuff they were putting in place.

      I recently (~8 months back) did some contract work for a small company whose servers were based in some colo facility in San Francisco. One of the first things I noticed was a damn heavy UPS at the bottom of their rack. Weird, I thought -- why not rely on the colo's battery system?

      Because they don't have one.

      Mind you, this was also the colo that had a cardkey system that had long ago stopped being usable, so when you needed access you used a Radio Shack $29.99 wireless intercom system and someone would come to open the door, and when you checked in they carefully wrote your name on a little nametag.

      I think standards have slipped, significantly. In some respects, this is likely a good thing -- it means you have more options now, because you can choose either the super duper "we hook up to two countries' power grids, have eight flywheels and a direct feed from microwaves in orbit" or the "err, here's your cabinet. We'll give you decent power until we don't" options.

      So ... how much are all these people paying LiveJournal again? Couldn't they request some sort of partial refund of their monthly fee?

      Oh, wait...

    2. Re:Disclaimer: I am Not an Electrical Engineer by Anonymous Coward · · Score: 5, Informative

      My friend's company is hosted by Internap. Today he messaged me when the power went down. It was only power to the second floor; my friend's servers, while cut off from the internet, were still running (on the 3rd floor). Internap has redundancy and backup generators (and enough fuel onsite to run for 30 days without external power). Apparently there was construction occurring on the second floor... my guess is that some dipshit contractor cut through a power cable or three and took the whole floor down.

      To all the people accusing LJ of being stupid for not having UPS systems: Internap has 3 fully redundant power systems (yes, I know, they didn't help much), so most people probably don't feel the need to run their own UPS.

  11. Gee, I wonder... by Rie+Beam · · Score: 4, Funny

    Update from the site:

    "Update #1, 7:35 pm PST: we're up on 'dirty' power for now (it works, but it's unreliable)".

    Congrats to LiveJournal for assembling a coal generator in record time.

  12. Look at me! Look at me! by Cyburbia · · Score: 4, Funny

    live journal is dark like my soul like my heart a void its link is cut just like i'll be doing to my arm i blame my parents

  13. A great disturbance in the Force... by YowzaTheYuzzum · · Score: 5, Funny

    ... as if millions of teenage girls suddenly cried out in terror and were suddenly silenced.

  14. Re:./ed !!!! Server Reboot Time? by ebooher · · Score: 4, Interesting

    This is another thing that bothers me about this scenario. I can't say that I've ever admined 100 servers (the most I've ever had was about 30), but if we had a power loss of any kind, you'd just repower them and walk away. Most of them were DEC Alpha gear running Tru64. Why would you spec out a box that has to be hand-held every reboot? The only time you should have to hand-hold a server is during an upgrade. A power cycle without proper shutdown signals should just run fsck on its way back up. (OK, so it might take an hour for the server to go live again, but still.) I mean, am I missing something here? Maybe since nothing I've admined got the traffic these things do .... I'm just lost. Someone hit me with the clue-by-four.

    The only thing I can even think of is they have explicit services that must be started manually ..... but why would you want that? If you have a power hiccup in the middle of the night, you want it to come back up, and be live and happy again *before* you even get the first page. I mean sure, if there was a surge, and that destroyed components, and those components have to be replaced ..... but ..... a reboot is a reboot, man. Here, smoke some source. It's the good stuff.

    --
    "Genius may shine aloof and alone, like a star, but goodness is social, and it takes two men and God to make a Brother."
  15. Re:./ed !!!! Server Reboot Time? by bradfitz · · Score: 5, Insightful

    They all came back up when the power came back.

    But we intentionally don't have databases come back up on boot because if there was a blip, we want to do an integrity check first. (we run InnoDB, so it's ACID, but we're paranoid ...)

    We have clusters of 2 identical databases in separate cabinets, separate switches, separate Internap power feeds... so normally losing one database in each cluster doesn't matter: the other one gets used. But when we lose every single database, in all clusters, all at once... that's the time to be paranoid and double check stuff.
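    The failover rule described above (use the surviving twin in a cluster, but stop and verify everything when every database drops at once) can be sketched roughly as follows. This is a minimal illustration under assumptions: each cluster is modeled as a list of two replicas with hypothetical `up` and `verified` flags, and `Replica`, `choose_replica`, `total_outage`, and `replicas_to_check` are made-up names, not LiveJournal's actual code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Replica:
    name: str
    up: bool = True
    # Hypothetical flag: True once the replica has passed an integrity
    # check since its last unclean stop.
    verified: bool = True


Cluster = List[Replica]


def choose_replica(cluster: Cluster) -> Optional[Replica]:
    """Return the first replica that is both up and verified, or None."""
    for replica in cluster:
        if replica.up and replica.verified:
            return replica
    return None


def total_outage(clusters: List[Cluster]) -> bool:
    """True when every replica in every cluster is down at the same time."""
    return all(not r.up for cluster in clusters for r in cluster)


def replicas_to_check(clusters: List[Cluster]) -> List[str]:
    """Names of replicas that must pass an integrity check before their
    database is started again (no automatic start on boot)."""
    return [r.name for cluster in clusters for r in cluster if not r.verified]


# One node in cluster "a" lost power; its twin keeps serving.
a = [Replica("a1", up=False, verified=False), Replica("a2")]
b = [Replica("b1"), Replica("b2")]
print(choose_replica(a).name)     # a2
print(total_outage([a, b]))       # False
print(replicas_to_check([a, b]))  # ['a1']
```

    In the whole-site case every replica is down and unverified, so `choose_replica` returns None for each cluster and `replicas_to_check` lists every machine: nothing is served until someone has checked it.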

  16. "LiveJournal Servers Go Down" by PornMaster · · Score: 4, Funny

    LiveJournal Servers Go Down

    With thousands of teenage girls unable to ponder in an open forum whether or not to blow their boyfriends, thousands of teenage girls go down.

  17. Where's my irony stick? by gmhowell · · Score: 4, Insightful

    Because michael needs a beating. The site that rolls beta (alpha?) code onto live servers complaining and making jokes because another site goes down through no fault of its own?

    --
    Jesus was all right but his disciples were thick and ordinary. -John Lennon
  18. No... by EdMcMan · · Score: 4, Insightful

    Perhaps Six Apart wasn't quite prepared for the responsibilities of a website of this size?

    What does Six Apart have to do with Internap? LiveJournal has been using - and wanting to switch from - Internap for a long time.

  19. Re:./ed !!!! Server Reboot Time? by bradfitz · · Score: 5, Informative

    At this point all my whiteboards are full of boxes: each database cluster, the machines in that cluster, which have passed their checksum tests (InnoDB checksums each 16 KB page), which have replayed their redo/undo logs, where in the binlogs each was writing/reading/executing, etc...

    So lots of waiting now on the checksum validators. I don't want to put a machine back in and find out in a week there was a database page that was corrupt because the battery-backed write-back cache on the RAID card didn't work as advertised. (which happens on about 95% of RAID cards, in my experience, because they're mostly crap, even the most expensive ones...)

    Also whenever there's any doubt about something's integrity, we backup or snapshot the potentially corrupt version before operating on it. That operation can take time too.

    It's going to be a fun night.
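    Page-by-page checksum validation of the kind described above can be sketched in a few lines. This is a rough illustration that assumes a made-up page layout (a CRC32 of the page body stored in the last 4 bytes of each 16 KB page); InnoDB's real on-disk checksum format is different, and `make_page` and `corrupt_pages` are hypothetical helpers, not MySQL's `innochecksum` tool.

```python
import struct
import zlib
from typing import List

PAGE_SIZE = 16 * 1024  # 16 KB pages, matching the size mentioned above
TRAILER = 4            # hypothetical layout: last 4 bytes hold a CRC32 of the rest


def make_page(payload: bytes) -> bytes:
    """Build one page: payload zero-padded, followed by its CRC32 trailer."""
    if len(payload) > PAGE_SIZE - TRAILER:
        raise ValueError("payload too large for one page")
    body = payload.ljust(PAGE_SIZE - TRAILER, b"\0")
    return body + struct.pack(">I", zlib.crc32(body))


def corrupt_pages(data: bytes) -> List[int]:
    """Return the indices of pages whose stored checksum does not match."""
    if len(data) % PAGE_SIZE:
        raise ValueError("data is not a whole number of pages")
    bad = []
    for i in range(len(data) // PAGE_SIZE):
        page = data[i * PAGE_SIZE:(i + 1) * PAGE_SIZE]
        body, trailer = page[:-TRAILER], page[-TRAILER:]
        (stored,) = struct.unpack(">I", trailer)
        if zlib.crc32(body) != stored:
            bad.append(i)
    return bad


pages = make_page(b"row 1") + make_page(b"row 2")
damaged = bytearray(pages)
damaged[PAGE_SIZE + 10] ^= 0xFF       # flip bits inside the second page's body
print(corrupt_pages(pages))           # []
print(corrupt_pages(bytes(damaged)))  # [1]
```

    Following the snapshot habit above, a check like this would be run against a copy of the data file, so any repair attempt never touches the only copy of a possibly corrupt page.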

  20. Value of Livejournal - "Open Source Philosophy" by DemonWeeping · · Score: 5, Interesting

    For those who don't know what's so hot about it and for those who think LiveJournal is just a bunch of teenage girls whining.... LiveJournal has just about four years of my life documented. The ease of use and the ability to "vent" is comforting, but the real value comes in the interaction. My friends see my life at their convenience and I see theirs at mine. We can choose to ignore the whining of others or we can choose to relate and comment on our own experience. Think of it this way: Open-source philosophy, emotion, and life. I put my own out there and others add to it. I add mine to others. Granted ... those quiz/meme things HAVE TO GO. I do not want to read about "what frog best resembles me" or "which '80s hair band song is me." Grrr.

  21. Before you get all down on LJ... by Bloodlent · · Score: 4, Informative

    Just remember it's not ALL obnoxious, over-emotional, teen-angst teenage girls. I use mine to showcase (non-depressing) poetry and make intelligent comments about intelligent topics. Basically, if someone makes an LJ about their own life, it sucks. If you can manage to write an LJ and make it about things that matter to more people than just you (e.g., "Why Bush's Iraqi war is unjust" vs. "Why this babe I know should bang me"), and at the same time make it funny and enjoyable to read, then you have a good LJ. Most LJs DO suck, but there are some diamonds in the rough.

  22. bigger explination by moosesocks · · Score: 4, Insightful

    I'm surprised to see that Internap's main servers are back up. It's pretty irresponsible to bring up your corporate servers before those of your clients.

    That being said, LJ's servers are back up now, but they're making sure that the databases are all in sync -- LiveJournal has one of the most massive distributed MySQL clusters in existence, along with a complete caching system.

    They need to make sure that the database is all synchronized before bringing it back up -- chances are they're going to rebuild the cache too. If they didn't, the initial strain on the DB servers would probably bring the site down again.

    This does, however, bring up some questions about LiveJournal's network infrastructure. Danga (the creators of LJ, recently purchased by Six Apart) are heavy users of Perl and MySQL. Needless to say, they have made numerous contributions to both projects and have developed an innovative memory caching system for Linux.

    The questions raised, however, come from Perl and MySQL. Both are questionable in terms of scalability. Although I'm not qualified to comment on this, I believe that the general consensus is that MySQL is one of the least efficient databases today. LiveJournal has 100+ servers. I honestly don't think that a system the size of LiveJournal should require a server cluster that big. It seems that they are trying to solve their performance/reliability problems by blindly throwing hardware at it.

    Of course, I love livejournal. It's simple, easy to use, and is a great tool for building communities. Just as it is simple, it can also be incredibly nerdy (there's actually a command prompt!). They're also completely open source.

    Hopefully, Six Apart can make their network infrastructure more 'professional' while still maintaining the community spirit that has made it so successful.
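    The cache-rebuild worry above is about the cache-aside pattern: check the cache first and fall back to the database on a miss. A minimal sketch, assuming a plain dict in place of memcached and a hypothetical `CacheAside` class (not LiveJournal's code), shows where the database reads land with and without warming the cache first. Being single-threaded, it understates the real problem: with many concurrent requests, a cold cache lets many of them miss and hit the database at the same moment.

```python
from typing import Callable, Dict, Iterable


class CacheAside:
    """Cache-aside reads; a plain dict stands in for memcached."""

    def __init__(self, load: Callable[[str], str]) -> None:
        self.load = load                 # fetches one key from the database
        self.cache: Dict[str, str] = {}
        self.db_reads = 0                # requests that fell through to the database

    def get(self, key: str) -> str:
        if key not in self.cache:
            self.db_reads += 1
            self.cache[key] = self.load(key)
        return self.cache[key]

    def warm(self, hot_keys: Iterable[str]) -> None:
        """Preload hot keys before the site takes traffic."""
        for key in hot_keys:
            self.get(key)


db = {"journal:1": "entry 1", "journal:2": "entry 2", "journal:3": "entry 3"}
traffic = list(db) * 100  # 300 page views spread over 3 journals

cold = CacheAside(db.__getitem__)
for key in traffic:
    cold.get(key)
print(cold.db_reads)  # 3 database reads, all while users are waiting

warm = CacheAside(db.__getitem__)
warm.warm(db)         # 3 reads happen before the site opens
before = warm.db_reads
for key in traffic:
    warm.get(key)
print(warm.db_reads - before)  # 0 database reads during traffic
```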

    --
    -- If you try to fail and succeed, which have you done? - Uli's moose
    1. Re:bigger explination by Kyrrin · · Score: 4, Insightful

      As we've said a bunch of times in the past, moving away from MySQL would be prohibitive. By now we know how to make it work for us; switching away from MySQL would not only involve massive rewriting of stuff and alterations on the existing DB, it'd take the next five years before we got as comfortable with the flaws and advantages of another DB package.

      Sure, MySQL has its flaws -- some of them pretty big -- but we can work around them.

      As for the "not needing a server cluster that big" -- do you have any clue how much data we push in an average day? We maintain so many DB clusters to improve reliability, and we maintain so many web nodes because we push a screaming shitload of traffic.

  23. Re:./ed !!!! by Hooded+One · · Score: 4, Informative

    The Alexa link was the only tangible example I could find. I distinctly recall seeing a post by Brad himself mentioning how much more traffic LJ handles, but obviously I can't link to it at the moment.

    Anyway, as of Google's last crawl of the stats page (shortly before the outage), there were almost 6 million LJ users, a little under half of those "active." I don't know if /. has any stats available, but skimming through this page, the highest UID I see is in the 800,000 range. I'm not going to even attempt to guess what the relative activity level of LJ users is compared to /., or which has bigger pages or whatever, but I would offhand say that LJ probably handles more image traffic (user pictures, and now the in-testing photo hosting service). I know they used to use Akamai for that, but I seem to recall that fairly recently they switched over to doing something else. (I think they handle it themselves again, but I'm not sure.) There's also the audio files from phone posts. I'd say there's little question that LJ is the more heavily trafficked site.

    Besides, a lot of the DB load on Slashdot is eased tremendously by Memcached, developed by... Danga Interactive, i.e. LJ. Wikipedia uses it too, and just started using Perlbal. (And I do mean "just") Ditto for Audioscrobbler/Last.fm. So /. isn't in much of a position to pooh-pooh the technical ability of Brad/LJ.