LiveJournal Servers Go Down
Wind writes "According to any journal hosted off of LiveJournal.com, the LiveJournal data center Internap has suffered a critical power failure, leaving all of LiveJournal and its content temporarily offline and requiring the revival of 100+ servers. Perhaps Six Apart wasn't quite prepared for the responsibilities of a website of this size? Updated information is posted here."
sounds like all the fucking spammers they host overtaxed spammer-nap's power resources and brought it all down.
Seriously though, spammer-nap is a massive spam haus, see for yourself
Lawyers, MBA's, RIAA? A jedi fears not these things!
On the Livejournal main page:
Update #1, 7:35 pm PST: we're up on 'dirty' power for now (it works, but it's unreliable), and we're working to assess the state of the databases. The worst thing we could do right now is rush the site up in an unreliable state. We're checking all the hardware and data, making sure everything's consistent. Where it's not, we'll be restoring from recent backups and replaying all the changes since that time, to get to the current point in time, but in good shape. We'll be providing more technical details later, for those curious, on the power failure (when we learn more), the database details, and the recovery process. For now, please be patient. We'll be working all weekend on this if we have to.
Lovely. I just bought another year's subscription for my wife, figuring the change to Six Apart wouldn't change anything for a few months at least. LJ could lose a lot of subscribers with an outage just after the takeover.
This is another thing that bothers me about this scenario. I can't say that I've ever admined 100 servers, the most I've ever had was about 30, but if we had a power loss of any kind, you'd just repower them and walk away. Most of them were DEC Alpha gear running Tru64. Why would you spec out a box that has to be handheld every reboot? The only time you should have to handhold a server is during an upgrade. A power cycle without proper SIGHUP or term signals should just run fsck on its way back up. (K, so it might take an hour for the server to go live again, but still.) I mean, am I missing something here? Maybe since nothing I've admined got the traffic these things do .... I'm just lost. Someone hit me with the clue by four.
The only thing I can even think of is they have explicit services that must be started manually ..... but why would you want that? If you have a power hiccup in the middle of the night, you want it to come back up, and be live and happy again *before* you even get the first page. I mean sure, if there was a surge, and that destroyed components, and those components have to be replaced ..... but ..... a reboot is a reboot, man. Here, smoke some source. It's the good stuff.
"Genius may shine aloof and alone, like a star, but goodness is social, and it takes two men and God to make a Brother."
Back in the great days of the .com boom, people were building their colo facilities to insane (in a beautiful way) standards. I remember touring Exodus and Above.net (I don't know who you're referring to, though I only ever heard of above.net adopting flywheels) and being just very amused at the cool stuff they were putting in place.
... how much are all these people paying LiveJournal again? Couldn't they request some sort of partial refund of their monthly fee?
I recently (~8 months back) did some contract work for a small company whose servers were based in some colo facility in San Francisco. One of the first things I noticed was a damn heavy UPS at the bottom of their rack. Weird, I thought -- why not rely on the colo's battery system?
Because they don't have one.
Mind you, this was also the colo that had a cardkey system that had long ago stopped being usable, so when you needed access you used a Radio Shack $29.99 wireless intercom system and someone would come to open the door, and when you checked in they carefully wrote your name on a little nametag.
I think standards have slipped, significantly. In some respects, this is likely a good thing -- it means you have more options now, because you can choose either the super duper "we hook up to two countries' power grids, have eight flywheels and a direct feed from microwaves in orbit" or the "err, here's your cabinet. We'll give you decent power until we don't" options.
So
Oh, wait...
replying to myself. mod my other post down or something.
:-/ More details later.
"Our data center (Internap) lost all its power, including redundant backup power, for some unknown reason. (unknown to me, at least) We're currently dealing with bringing our 100+ servers back online. Not fun. We're not happy about this. Sorry...
Update #1, 7:35 pm PST: we're up on 'dirty' power for now (it works, but it's unreliable), and we're working to assess the state of the databases. The worst thing we could do right now is rush the site up in an unreliable state. We're checking all the hardware and data, making sure everything's consistent. Where it's not, we'll be restoring from recent backups and replaying all the changes since that time, to get to the current point in time, but in good shape. We'll be providing more technical details later, for those curious, on the power failure (when we learn more), the database details, and the recovery process. For now, please be patient. We'll be working all weekend on this if we have to."
soo.. EVEN THE BATTERY BACKUP WAS FUCKED. GOOD GOING.
world was created 5 seconds before this post as it is.
Sorry, but one man can't control power to an entire co-location. I used to work for a local telecom and one day our fiber went down for about 1/3 of our customers. The reason? Some guy shootin' squirrels blew the fiber lines apart. It was on a Sunday too, when there was minimal phone tech support, I think one guy ended up fielding 350+ calls by himself.
The LiveJournal status page claims "Our data center (Internap) lost all its power, including redundant backup power". This is nothing to do with "cheapskate blog admins" and everything to do with a serious and quite likely unacceptable problem at Internap.
Of course, that's why Anonymous Cowards start out with zero points. Guilty of idiocy until proven innocent.
For those who don't know what's so hot about it and for those who think Livejournal is just a bunch of teenage girls whining.... Livejournal has just about four years of my life documented. The ease of use and the ability to "vent" is comforting, but the real value comes in the interaction. My friends see my life at their convenience and I see theirs at mine. We can choose to ignore the whining of others or we can choose to relate and comment on our own experience. Think of it this way: Open-source philosophy, emotion, and life. I put my own out there and others add to it. I add mine to others. Granted ... those quiz/meme things HAVE TO GO. I do not want to read about "what frog best resembles me" or "which 80's hair band song is me." Grrr.
I know the feeling. I have an LJ (for friends to read) in which I relay news, ramble about things that interest me, and write mini-essays from time to time. I don't whine about my parents or people at school or whatever (well, I do, but it's grumbling about idiots at work, since I work at a university) and the people I know are generally much the same. But I can't stand those typically teen idiotic ramblings either.
But I too find it irritating when a service I use, one that is supposed to be backed up, goes down (my account was bouncing up and down numerous times in the past week, too). For a paid service, I'd have expected there to be a lot more backups to make it more difficult for power problems to wipe out the entire site. If the hosting facility doesn't have a UPS, why wasn't one installed?
i am a soviet space shuttle
I recall a story from a CP&L (now Progress Energy) lineman about how someone had shot the fiber line strung on the high voltage tower -- between the million volt lines. As he put it, you put a pencil through the hole and line up where the "hick" (his word) propped up the rifle to shoot what he must've thought was a power line. It's no fun working up there. And even less fun having to make fusion splices surrounded by million volt lines. (linemen work up there, not fiber techs.)
Power can only be so redundant. More than once, the entire server room at a previous employer went dark. Multiple circuits and multiple power supplies won't do any good when a battery in the UPS explodes and blows the main (150A) breaker -- the entire floor went dark... disconnected from UPS, generator, and utility.
(Once from a battery failure, and once again from a phase mismatch coming off generator that no one caught before the UPS was drained -- the alarm panel was outside the break room and no one that knew what it was walked past it.)
Maybe, but sometimes shit just happens, regardless of design.
I find this Risk site to be very interesting reading, especially when it talks about some failure issues and scenarios.
My favourite was about the squirrel that took down the Nasdaq. (I've also heard squirrels/mice/rats etc called "self propelled short circuits", but that's another story)
Now, I've been involved in systems architecture design, planning, and management for years, and I think that a lot of people drastically underestimate just how fscking complex and difficult proper planning, execution, testing, maintenance, and administration of these systems can be... especially when faced with budgetary restrictions.
The cost of a system rises almost exponentially as you approach 100% uptime... even 99.999 is freaking expensive to implement and manage. Never mind the complexity and administrative requirements.
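To put rough numbers on the "five nines" point, here is a minimal sketch of how little downtime each uptime target leaves per year. It assumes a 365-day year, and the function name is made up for illustration. It only shows the downtime budget, not what it costs to hit it.

```python
# Downtime budget per year for a given availability target.
# Assumption: a 365-day year (leap years ignored).
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the minutes of downtime per year permitted by an uptime target."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99, 99.999):
        minutes = downtime_minutes_per_year(target)
        print(f"{target}% uptime leaves about {minutes:.1f} minutes of downtime per year")
```

Each extra nine cuts the budget by a factor of ten, so 99.999% leaves a little over five minutes a year, which a single power failure like this one would blow through many times over.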
Who knows... maybe dealing with the PR issues of this outage is still orders of magnitude cheaper in the long run than putting in the systems required to achieve the uptime.
At the end of the day, what are the business impacts of this outage? For that matter, they seem to have received more exposure than if they were operating normally.
A lot of people are aware of the fact that sometimes things break, and we're not landing planes in the fog here. The fact that shit broke and they're bringing it back in an informed and somewhat timely manner may HELP them, in that some people may get a stronger sense of "these guys can deal with problems that hit them".
$0.02 (CDN)
i won't be exaggerating if i say that in recent years most of the "social life" in the .ru zone has moved to livejournal.
it's 10 a.m. in russia now, and most of russian lj-addicts still don't know about apocalypse in lj.
i hope everything will be turned up in the nearest future. brad, we believe in you! :)
Agreed, there are plenty of useful uses for a LiveJournal. LiveJournals are also a great outlet for creative content. For example, my friend and I started a satirical blog on LiveJournal dubbed The Rhubarb. (Of course the link won't work until LJ is back up).
Unless it means that the "cheapskate blog admins" were too cheapskate to buy proper dual-power supply boxes so that they can have dual power paths right to the servers.
You can have all the great redundant mains and backups you want, and it's for shit if you only have one power line to the system and that power bus loses juice.
It's funny, I just met with some Internap salespeople a few months ago. They were bragging about how their network infrastructure was superior to most others, since it intelligently routes traffic to the path of shortest response (not hops).
They even bragged to me how their network uptime SLA is 100%! I mean good god, now I find out this is the SECOND time it's happened (from the livejournal update site)???
I'm glad I didn't go with them...
eTrade SUCKS
The comments seem to be full of contempt for the inane teenage-angst ramblings that are common on LJ. Come on. It's not like you are forced to read through this stuff.
I have a few "friends" there at LJ, some of them net.celebs, and I like their posts. It's a matter of whose writing you find interesting, and you are free to be completely unaware of the rest. Why all the vitriol?
My exception safety is -fno-exceptions.
I seem to remember that a few years back they had a similar problem (Internap lost all power) and it turned out that some idiot had hit the big red "shut down all power to the entire datacenter" emergency button. This isn't the first time this has happened, and last time it wasn't under Six Apart's management.
I'd say it's Internap's incompetence that caused this problem. If they can't keep their datacenter running even though they have multiple redundant power supplies then something is very wrong. I see from the outage page that LJ people are now planning to buy their own UPS so that they don't have to trust Internap anymore.
For power outages, my house has a better record than Internap right now, and I don't even own a UPS!
Really, talk about the pot calling the kettle black.
If the datacenter that hosts Slashdot were to have a massive power failure, how long would /. be down for?
That said my company has gear in the same datacenter as LJ, our servers were back up 10 minutes after power was restored. Then again we use Oracle on HP-UX with nice SAN RAID boxes for storage for our database. So our stuff tends to recover from a sudden power loss a little better than a MySQL derivative running on clone hardware.
Happy Fun Ball is for external use only.
Personally, I'll trade a subdomain for the elegant simplicity of the friends system, post security, threaded comments, communities, user images, easy and powerful customization, an open-source backend with some seriously useful software contributed to the community, clients, and a site that, during the 99% of the time it's running properly, is ridiculously fast.
Actually, I won't trade a subdomain for all that. I'm a paid user, so I get one anyway.
(And there's a simple solution to the emo teens: ignore them.)
Hey, you try to find an open nick these days!
Everything in italics is the exact comment from the submitter. Everything else NOT IN italics, is any of the additional comments by the editors. Haven't you ever noticed that????
Long before you showed up here.
Michael, as an editor, could easily rewrite the summary (and perhaps he did). Or he could choose the most inflammatorily written piece, and pretend to have presented the article without bias.
Certainly none of these theories are more tin-foil-hattish than 95% of the stories on YRO.
Jesus was all right but his disciples were thick and ordinary. -John Lennon