LiveJournal Servers Go Down
Wind writes "According to any journal hosted off of LiveJournal.com, the LiveJournal data center Internap has suffered a critical power failure, leaving all of LiveJournal and its content temporarily offline and requiring the revival of 100+ servers. Perhaps Six Apart wasn't quite prepared for the responsibilities of a website of this size? Updated information is posted here."
Sounds like someone was taking a nap over at Internap
You can't imagine the withdrawals I'm going through. It's like the great Slashdot brownouts of '98.
I need my fix, man!
Oops?
I prefer a void in conversation to a vacuous one.
In related news, 6,000 teen-age girls were heard yelling "OMG! WTF! How will John know I like him if I can't blog about it!"
An effective signature identifies a particular user amongst a base of thousands.
Well, it wasn't slashdot at least... Bringing 100+ servers back online isn't an easy task lol ^^.
Good luck to them.
...the collective IQ of the internet has risen about 20 points.
but that's ONE HELL of a Slashdotting! :)
Join the TWIT army now!
and search.pl is constantly being trashed by distributed xanga botnets. perhaps michael wasn't quite prepared to be an editor of slashdot?
Bush just appointed Internap's CEO to his National Infrastructure Advisory Council, yet the man can't keep a co-lo facility switched on.
I'm not sure what that says of Bush or of Internap. And it certainly doesn't seem to have anything to do with SixApart.
Man I am sooo putting this in my LiveJournal!
Why did you have to go and cause a power outage?
I meta-mod all positive moderation Unfair, because it's abuse of the system.
"Perhaps Six Apart wasn't quite prepared for the responsibilities of a website of this size?"
Perhaps shit happens, and a blog service doesn't warrant the necessary investment to survive whatever caused this outage?
Internap.com is still up, they aren't stupid enough to use their own servers.
so it's deadjournal now ?
Well now the millions (?) of users might actually have something to write about when the servers are back up. "Today I went outside. My pupils have never been tinier..."
Perhaps Six Apart wasn't quite prepared for the responsibilities of a website of this size?
Ok, I understand that you don't like Six Apart; I'm no fan of their new licensing scheme either. However, I really doubt that SixApart has any control over any power failures that might occur at Internap.
What a sad, lonely, despicable person you must be.
Where will I write about my depression over this event?
Oh. Slashdot.
That's what you get when you hire Tim Allen as your electrician at a Data Center.
Al Borland was nailing Heidi behind the stage when the outage occurred.
Where were the APC backups?
Just leave your computer for a minute, go outside and socialize with some people. You'll find out that the shit you mentioned also exists outside the Net.
This is one lame signature, please read the message above instead.
Use the Coralized link. No sense in crashing their status page. Plus it'll respond a lot quicker than loading the actual web page.
It was an on-site power failure -- I don't see how you can blame them (new owners) on that...
I think it was a bad idea to have a site slashdotted while it's down . . . . it doesn't stand a chance. No really, I wish they would have waited a little while. Now the admins are wondering why they are suddenly getting 200,000 hits.
Well the power outage was not a person's or organisation's fault, it just happened. I wonder what Danga would have done that Six Apart is not doing to bring the servers back online. By the way, don't they have diesel generators for backup?!
I feel a great disturbance in the force..... It's as if a million bloggers cried out all at once..... and became silent.
The population of depressed pre-teens has just dropped by 20%
are servers with LOM (lights out management) superior in this case?
Of course it runs NetBSD. BTC: 1NT7QvbetmANwaMzhpVL6
It's not like most LiveJournal users have enough to worry about; here's something for most LJ users to get melodramatic about. I'm serious, randomly pick 5 LiveJournal blogs, and I guarantee 4 out of 5 are going to be "Fuck the World" posts.
Mood: anxious...
Check out this page on the Internap site for a real laugh. The flash page is a real hoot too.
Anyone seen my jagged little pill?
LiveJournal's offsite status page is status.livejournal.org.
"Update #1, 7:35 pm PST: we're up on 'dirty' power for now (it works, but it's unreliable), and we're working to assess the state of the databases. The worst thing we could do right now is rush the site up in an unreliable state. We're checking all the hardware and data, making sure everything's consistent. Where it's not, we'll be restoring from recent backups and replaying all the changes since that time, to get to the current point in time, but in good shape. We'll be providing more technical details later, for those curious, on the power failure (when we learn more), the database details, and the recovery process. For now, please be patient. We'll be working all weekend on this if we have to."
- Captbaritone
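The recovery approach in that update (restore from a recent backup, then replay every change made since) can be sketched roughly as follows. This is only an illustration under stated assumptions, not LiveJournal's actual tooling: the `restore_and_replay` name, the dictionary "database", and the `(sequence, key, value)` log format are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical change-log entry: (sequence number, key, new value or None for a delete).
LogEntry = Tuple[int, str, Optional[str]]

def restore_and_replay(backup: Dict[str, str],
                       backup_seq: int,
                       log: List[LogEntry]) -> Dict[str, str]:
    """Start from a backup taken at backup_seq and apply later changes in order."""
    state = dict(backup)  # work on a copy so the backup itself stays untouched
    for seq, key, value in sorted(log, key=lambda entry: entry[0]):
        if seq <= backup_seq:
            continue  # this change is already reflected in the backup
        if value is None:
            state.pop(key, None)  # replay a delete
        else:
            state[key] = value    # replay an insert or update
    return state
```

The point of the sequence number is that replay is safe to repeat: anything at or before the backup point is skipped, so only changes made after the backup are applied.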
sounds like all the fucking spammers they host overtaxed spammer-nap's power resources and brought it all down.
Seriously though, spammer-nap is a massive spam haus, see for yourself
Lawyers, MBA's, RIAA? A jedi fears not these things!
I know nothing of how InterNap is set up. I just want to throw that out there ahead of time. Now, it's time for my patent pending "Bull Shit Theory of the Day."
Ok, here is the rant. I used to work for a Colocation facility. Nothing special, small by Telco terms. The whole facility only had about 1500 cabinets. (Though I hear they are now full, and going to be expanding.)
We had a main power draw off of the local grid. We had a backup power draw off of the *next* city's power grid. (i.e., when all the offices around us went dark, we still had power.) And you don't even want to know the kind of red tape we had to go through for *that* pull. I'm still not sure how they did it. We had flywheel kinetic electricity storage systems, battery backups, and a diesel engine from a train so large it had its own building.
We used to joke that if we lost power, we had more important things to worry about. And again, we were small time compared to some of the massiveness that is out there. *cough*AADS Chicago*cough*
So I'm kind of in agreement with the statement currently on LiveJournal. It's unknown to me how any self respecting colo facility can say "We've had a power outage that also took our redundant systems."
I have to call bullshit on that entire train of thought. If that's true then they don't *have* any redundant systems, and I'd be looking for a new provider. The most likely thing (at least in my mind) is that someone, somewhere got mad at something specific and decided to make a point by popping the main breaker to their portion of the facility.
Oh, that was another thing, each room had several "main" breakers. It took a hell of a power surge to pop all of them, and the Liebert systems had power filters of some kind, really really big capacitors or something I think, so a surge really never made it to the other side anyway, it got stored in the cap and then trickled out like the rest of the power.
But I was a UNIX admin, not the EE that was planning the power generation aspects of the facility. So take some of it with grains of what ever white powdered spice you prefer.
"Genius may shine aloof and alone, like a star, but goodness is social, and it takes two men and God to make a Brother."
Update from the site:
"Update #1, 7:35 pm PST: we're up on 'dirty' power for now (it works, but it's unreliable)".
Congrats to LiveJournal for assembling a coal generator in record time.
On the Livejournal main page:
Update #1, 7:35 pm PST: we're up on 'dirty' power for now (it works, but it's unreliable), and we're working to assess the state of the databases. The worst thing we could do right now is rush the site up in an unreliable state. We're checking all the hardware and data, making sure everything's consistent. Where it's not, we'll be restoring from recent backups and replaying all the changes since that time, to get to the current point in time, but in good shape. We'll be providing more technical details later, for those curious, on the power failure (when we learn more), the database details, and the recovery process. For now, please be patient. We'll be working all weekend on this if we have to.
Lovely. I just bought another year's subscription for my wife, figuring the change to Six Apart wouldn't change anything for a few months at least. LJ could lose a lot of subscribers with an outage just after the takeover.
live journal is dark like my soul like my heart a void its link is cut just like i'll be doing to my arm i blame my parents
Looks like the angst over at Livejournal is no longer limited to the database.
... as if millions of teenage girls suddenly cried out in terror and were suddenly silenced.
half of the newest entries.....oh wait, I see someone's already got the emo lj user-base jokes covered. I have so much room to talk because I have an lj too, haha. But, seriously, how can this have anything to do with six apart?
This sig is o Unfunny o Funny
This is another thing that bothers me about this scenario. I can't say that I've ever admined 100 servers, the most I've ever had was about 30, but if we had a power loss of any kind, you'd just repower them and walk away. Most of them were DEC Alpha gear running Tru64. Why would you spec out a box that has to be handheld every reboot? The only time you should have to handhold a server is during an upgrade. A power cycle without proper SIGHUP or term signals should just run fsck on its way back up. (K, so it might take an hour for the server to go live again, but still.) I mean, am I missing something here? Maybe since nothing I've admined got the traffic these things do .... I'm just lost. Someone hit me with the clue by four.
The only thing I can even think of is they have explicit services that must be started manually ..... but why would you want that? If you have a power hiccup in the middle of the night, you want it to come back up, and be live and happy again *before* you even get the first page. I mean sure, if there was a surge, and that destroyed components, and those components have to be replaced ..... but ..... a reboot is a reboot, man. Here, smoke some source. It's the good stuff.
"Genius may shine aloof and alone, like a star, but goodness is social, and it takes two men and God to make a Brother."
Er, they just announced Six Apart was buying them like days ago. I doubt they transitioned the servers in the first week.
That's one 1 down...
And a massive cheer was heard across the land...
Wow. Can't the poor guy do anything right?
While I'm not exactly a security expert, I feel pretty confident saying that the DDOS attack was probably completely unrelated to the power outage... sorry SpitStatic (yes, I capitalized your name so it looks better).
Before you mod me funny, think, perhaps I was insightfully funny?
I'm not an electrical engineer, either, but I'm wondering what Dirty Power is? Is that the unfiltered power that tends to anomilate, per the Monster Cable surge protectors advertising? Or am I thinking of something else?
watch it'll come back up as a subscription site and all of your journals are erased if you don't pay..
Regards, Joseph
They all came back up when the power came back.
But we intentionally don't have databases come back up on boot because if there was a blip, we want to do an integrity check first. (we run InnoDB, so it's ACID, but we're paranoid...)
We have clusters of 2 identical databases in separate cabinets, separate switches, separate Internap power feeds... so normally losing one database in each cluster doesn't matter: the other one gets used. But when we lose every single database, in all clusters, all at once... that's the time to be paranoid and double check stuff.
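The policy described above (one surviving database per cluster is enough to keep going, but when every copy went down at once nothing starts until it has been checked) might be modelled like this. It is a minimal sketch; the `Replica` class, its fields, and the status names are assumptions made up for illustration, not anything from LiveJournal's code.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Replica:
    name: str
    clean_shutdown: bool             # False if the machine lost power mid-write
    integrity_checked: bool = False  # set once a manual check has passed

def may_serve(replica: Replica) -> bool:
    """A replica is trusted if it shut down cleanly or has since been verified."""
    return replica.clean_shutdown or replica.integrity_checked

def cluster_status(replicas: List[Replica]) -> str:
    trusted = [r for r in replicas if may_serve(r)]
    if len(trusted) == len(replicas):
        return "healthy"
    if trusted:
        return "degraded"  # the other copy carries the load meanwhile
    return "hold"          # every copy is suspect: verify before starting anything
```

In this model an ordinary single-machine failure yields "degraded", while a facility-wide power loss yields "hold" for every cluster until at least one copy per cluster is marked as checked.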
OMGWTFBBQ!
Reviews with a twist! http://www.sardonicbastard.com
Brad, get back to work! I need my friends page!
LiveJournal Servers Go Down
With thousands of teenage girls unable to ponder in an open forum whether or not to blow their boyfriends, thousands of teenage girls go down.
500GB of disk, 5TB of transfer, $5.95/mo
Because michael needs a beating. The site that rolls beta (alpha?) code onto live servers complaining and making jokes because another site goes down through no fault of its own?
Jesus was all right but his disciples were thick and ordinary. -John Lennon
lj == lockjaw
tell me that many will blog about how they couldn't blog. Some will complain about the stress of not being able to express themselves, others will question the engineering prowess of LiveJournal and wildly speculate about the cause of the power outage followed by plans to re-engineer the data center, the LiveJournal infrastructure and 93% of the Internet to ensure this never happens again (X.25 over barbed-wire will be suggested).
Several individuals will join together to file a class action lawsuit against LiveJournal and the data center citing their inability to express themselves due to negligence and will seek real and punitive damages totalling over $2.5 billion.
Perhaps Six Apart wasn't quite prepared for the responsibilities of a website of this size?
What does Six Apart have to do with Internap? Livejournal has been using - and wanting to switch from - Internap for a long time.
Ah I always thought there was redundant power backups for just such an occasion ?
No doubt that something went wrong there, but that doesn't change the fact that it's the data centre's responsibility to supply power, so only a complete moron would suggest SixApart were to blame.
I found it ridiculously ironic that as soon as I wanted to bitch about livejournal being down in my livejournal..... I couldn't.
For those people who might not know, Brad Fitzpatrick is Livejournal User #1.
I'd have to agree with the AC, Brad, stop posting to slashdot and hover over that DB rebuild a bit more.
(Yes, posting to slashdot relieves tension... Whatever it takes, Brad.)
We're out of combinations of phonetic sounds.
That's almost as silly as saying we're out of combinations of musical notes. It's so silly that it just might be true.
"like, how m i suppozd 2 tell meh bf bout dat par-t?"
Note that it is called LiveJournal, not LiveBlog.
I meta-mod all positive moderation Unfair, because it's abuse of the system.
The LiveJournal status page claims "Our data center (Internap) lost all its power, including redundant backup power". This is nothing to do with "cheapskate blog admins" and everything to do with a serious and quite likely unacceptable problem at Internap.
Of course, that's why Anonymous Cowards start out with zero points. Guilty of idiocy until proven innocent.
If other LiveJournal users ever found out you post here Brad, /. might end up with 5000+ replies to everyone of your posts :p
:)
Other than that, cheers and keep up the good work
Yeah, but they left out "paradigm" and "synergy" - upper management will never take them seriously without those!
That happened like 6mo ago.
Can you tell I read status.livejournal.org too often?
The previous sig has been removed due to
"Ted, it seems that we the LiveJournal outage has caused a massive wave of young emo writing singers who just want to be heard."
In Minneapolis, Unisys has I believe two or three large diesel generators. One time when their part of the city lost power, they fired them up and had a lot of juice left over. Northwest Airlines bought some of their power and they still had electricity to spare, and ended up powering thousands of homes in the southeastern suburbs, if I remember the story right.
;)
A friend who worked for Exxon once told me about their power backups...I think it was almost cheaper for them to run on diesel than on the local grid.
"I'm blogging this locally, and will post it when the servers come back up"
Assume I was drunk when I posted this.
Yeah, just like on LiveJournal.com. Thanks for the heads-up though! =)
RTJKJAS
At this point all my whiteboards are full of boxes of each database cluster, the machines in that cluster, which have passed their checksum tests. (innodb checksums each 16k page), which replayed their replay/undo logs, where in binlogs each was writing/reading/executing etc...
So lots of waiting now on the checksum validators. I don't want to put a machine back in and find out in a week there was a database page that was corrupt because the battery-backed write-back cache on the RAID card didn't work as advertised. (which happens on about 95% of RAID cards, in my experience, because they're mostly crap, even the most expensive ones...)
Also whenever there's any doubt about something's integrity, we backup or snapshot the potentially corrupt version before operating on it. That operation can take time too.
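For readers curious what per-page checksum validation and "snapshot before operating" look like in principle, here is a toy sketch. It assumes a made-up file layout in which each 16 KiB page begins with a 4-byte CRC32 of the rest of the page; this is not InnoDB's real on-disk format, and `make_page`, `find_bad_pages`, and `snapshot` are hypothetical helper names.

```python
import shutil
import struct
import zlib
from pathlib import Path
from typing import List

PAGE_SIZE = 16 * 1024  # 16 KiB pages, matching the page size mentioned above

def make_page(payload: bytes) -> bytes:
    """Build one page in the toy layout: 4-byte CRC32 followed by zero-padded payload."""
    if len(payload) > PAGE_SIZE - 4:
        raise ValueError("payload does not fit in one page")
    body = payload.ljust(PAGE_SIZE - 4, b"\0")
    return struct.pack(">I", zlib.crc32(body)) + body

def find_bad_pages(path: Path) -> List[int]:
    """Return the indexes of pages whose stored checksum does not match their contents."""
    bad = []
    with path.open("rb") as f:
        index = 0
        while True:
            page = f.read(PAGE_SIZE)
            if not page:
                break
            if len(page) < PAGE_SIZE:
                bad.append(index)  # truncated (torn) final page
                break
            (stored,) = struct.unpack(">I", page[:4])
            if zlib.crc32(page[4:]) != stored:
                bad.append(index)
            index += 1
    return bad

def snapshot(path: Path) -> Path:
    """Copy a possibly corrupt file aside before any repair touches the original."""
    copy = path.with_name(path.name + ".snapshot")
    shutil.copy2(path, copy)
    return copy
```

Scanning every page is slow on large tables, which is consistent with the waiting described above; the snapshot step costs extra time and disk but keeps the original evidence if a repair goes wrong.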
It's going to be a fun night.
What am I gonna do, now that I can't update my blog!
For further reference, you'd probably want to use Google News, or just check reliable sources. I'm pretty sure Google only updates their indexes weekly on... is it Tuesdays or Thursdays? I forget.
Information wants to be free.
Entertainment wants to be paid.
You just want to be cheap.
I just went there for some information before coming here. Didn't think much about it but apparently somebody else did.
-Tim Louden
Hey, good luck with this whole thing. I hate it when it happens. Did you have both of your clusters in the same center?
(Yes, I really am Simson Garfinkel)
A long night indeedy.
Is there some sort of load threshold you're willing to live with? Perhaps 50% or 80% of all servers up before starting a cluster? You know your system load distribution based on time of day better than anyone though...
An update #2 on the status page might be called for at this point. People might appreciate some reflection of how many checkboxes are checked off on that whiteboard. It would also give the impression you're busting ass for LJ, which would go over well after the panic some users had over SixApart. "As long as Brad is still around, we're in good hands", that sort of thing.
Good Luck tonight. SysAdmin crises nights suck, but they do actually pass.
I only hope it lasts forever.
For those who don't know what's so hot about it and for those who think Livejournal is just a bunch of teenage girls whining.... Livejournal has just about four years of my life documented. The ease of use and the ability to "vent" is comforting, but the real value comes in the interaction. My friends see my life at their convenience and I see theirs at mine. We can choose to ignore the whining of others or we can choose to relate and comment on our own experience. Think of it this way: Open-source philosophy, emotion, and life. I put my own out there and others add to it. I add mine to others. Granted ... those quiz/meme things HAVE TO GO. I do not want to read about "what frog best resembles me" or "which 80's hair band song is me." Grrr.
The livejournal servers are provided by Internap. Anyways, Bush just appointed Internap's CEO to his National Infrastructure Advisory Council (http://www.tmcnet.com/usubmit/2005/Jan/1104954.htm), and I'm wondering if this is some sort of terrorist attack, and then I thought, that is what they want you to think. Rather, it's just another step towards the republican squashing of the independent media. Perhaps the republicans are following me. Or you. They know that without live journal, the teenage adolescent girls will surely flock to forums such as this one and post. The sheer amount of posts will crash another server, and therefore, we have a domino effect (also a technique used by the RIAA to crush peer to peer services, such as BitTorrent, by causing more strain on a website not meant to handle it). So with livejournal going down like a Korean hooker, and Slashdot in hot pursuit due to the flock of teen girls, we will be unable to communicate our ideas. And without communication, the left wing majority of this website will be unable to unite, thus ensuring the republicans remain in power and control of the "free" world.
-------
Support Indy Music. Buy
Thank the gods for user settings.
Sorry about the writing. Robot fingers, you know? Cliff Steele in DOOM PATROL #23
Just remember it's not ALL obnoxious, over-emotional teen-angst teenage girls. I use mine to showcase (non-depressing) poetry and make intelligent comments about intelligent topics. Basically, if someone makes an LJ about their own life, it sucks. If you can manage to write an LJ and make it about things that matter to more people than just you (i.e., "Why Bush's Iraqi war is unjust" vs. "Why this babe I know should bang me"), and at the same time make it funny and enjoyable to read, then you have a good LJ. Most LJs DO suck, but there are some diamonds in the rough.
Blog blog blog blog.
Lovely blog!
Wonderful blog!
Blog blo-o-o-o-o-og blog blo-o-o-o-o-og blog.
Lovely blog! Lovely blog!
Lovely blog! Lovely blog!
Lovely blog!
Blog blog blog blog!
-- The Viking Blog Song
All right, who did it? Who pressed the shiiiny, candy-like history eras... I mean emergency stop button?
ROFL on the hugs...
That is so LJ.
I'm getting those too now. And they're not all that nice.
--
# Canmephians for a better Linux Kernel
$Stalag99{"URL"}="http://stalag99.net";
For god sakes people, it's a Friday night! If Google went down I could see people panicking, but LiveJournal? Whatever... I'm going out.
The last time I lost my journal, it was accompanied by a loud popping sound and the smell of ozone coming from the power supply of the linux box in the corner of my living room.
As much as I like having an easy interface to the online writings of all my friends, I miss the days of having my own pet web server, of being able to do something myself other than twiddle my thumbs and wait when something breaks.
(sorry ... I know ... random meanderings ... I feel this sudden urge to post a few memes, links to personality tests and a few "what happened at work today!" comments).
From the article write-up (and reflecting the thoughts of quite a few of the comments I just read):
I'd love to know what makes you think this has anything to do with Six Apart. The very first line at http://www.livejournal.com states:
They've been with Internap for years, predating Six Apart's takeover. Unless LJ staff is lying, the fault here sounds like it lies entirely with Internap.
And as far as I can tell, Six Apart didn't ditch the LJ team when they bought them out, so you probably have the exact same people working on bringing the site back up now as you would have if Six Apart had never got involved.
Dear editors, please don't let morons submit news. It's bad.
LiveJournal has not changed any of its hosting or any other infrastructure yet. This has nothing, nothing, nothing to do with the recent acquisition.
Sorry, I'm just cranky because of the withdrawal.
I passed the Turing test.
I just think it's ironic (this is irony, right? I remember that all the things in that Alanis song were in fact not irony) that Wikipedia has been experiencing some rather major server issues, recently resolved but not really explained to anyone outside of the server maintenance IRC channel.
While it was down, the OpenFacts status page was the place for immediate info, but the log of activity was kept on the 'wikitech' account on, you guessed it, LiveJournal.
Sweet, sweet irony.
--grendel drago
Laws do not persuade just because they threaten. --Seneca
I am curious to know what location went down? What janitor/sanitation engineer plugged the Buffer into what wrong socket? What electrician was fired over this? Who maintains all of the back-up systems? Who was too drunk to find the Actual Light Switch? Who is getting fired for this? Who are they hiring in his place? Do they need talented replacements? If so, reply with your e-mail address and I will send you my resume, or if you need a good person and are not related to this, please still reply. Last but not least, who got stuck in the elevator (doing what), and who got stuck in the bathroom and how did they make it out? If none of the above applies, I am curious to know what the ROOT cause analysis is for this situation.. gk
Wait, what new licensing scheme? I didn't even think they'd rolled out the new lawyer-friendly TOS yet.
--grendel drago
Laws do not persuade just because they threaten. --Seneca
I use an ISP that peers with internap for upstream connectivity to its tier 1 ip network.
I noticed a few unreachable hosts earlier, as well as DNS delays. Didn't think much of it, but now those sites are all back up. They are all in the LA area, and I suspect you are as well.
Is this the case? (I can't tell if it's LA or not from current traces, we appear to be using Level(3) to get there at the moment however)
Also, what about redundant power? Internap is huge, they must have redundant systems in place.
Internap animated site status
"At this point all my whiteboards are full of boxes of each database cluster, the machines in that cluster, which have passed their checksum tests. (innodb checksums each 16k page), which replayed their replay/undo logs, where in binlogs each was writing/reading/executing etc..." But at least there is still time to read and post on slashdot. :-)
------
insert sig here,here, and here
Someone probably hit the big red switch on the wall, the one covered in a plastic case
That does happen. I remember working at Purolator Courier's data center in NJ back in -- oh, geez, mid-80s some time. I was a third shift print operator, helped out with the mag tape library too. One night the trouble alarm went off on the fire suppression panel. We'd been having trouble with it all week, and the alarm guy was due in in the morning. One of the newbie operators -- the only one at the console at the time, the others being on a smoke break or asleep in the tape library -- panicked and went over to the annunciator panel. He opened it as I watched him from the console area. I think he thought the halon was about to dump because he reached around the panel and instead of hitting the halon dump abort, he hit the emergency power cutoff.
BLAM! It was as if a firecracker went off as all the breakers tripped and the fans came to a sighing halt. Both on this floor -- the one with the console and the tape drives -- and the floor above, with the CPU and the disk farms. Dead as a doornail.
Now, this was Purolator COURIER. We had AIRPLANES coming in to land at Indy center and as of this moment, no way to tell the crews which gate to go to, where to unload their stuff, or how to sort it.
Not only that, but this was an IBM mainframe shop -- S/390, the Big Iron, with 3380 disk drives. You don't just flip the power switch back on. An emergency power cutoff blows breakers in the power supplies on those DASD strings. The IBM Field Engineer was duly dispatched and arrived with cases of breakers the next morning. But we were still dark when I got off shift the following morning.
The next night a brand new plexiglass cover was mounted over the Big Red Switch.
Mit der Dummheit kämpfen Götter selbst vergebens.
I'm surprised to see that Internap's main servers are back up. It's pretty irresponsible to bring up your corporate servers before those of your clients.
That being said, LJ's servers are back up now, but they're making sure that the databases are all in sync -- LiveJournal has one of the most massive distributed MySQL clusters in existence along with a complete caching system.
They need to make sure that the database is all synchronized before bringing it back up -- chances are they're going to rebuild the cache too. If they didn't, the initial strain on the DB servers would probably bring the site down again.
This does, however, bring up some questions about LiveJournal's network infrastructure. Danga (the creators of LJ, recently purchased by Six Apart) are heavy users of Perl and MySQL. Needless to say, they have made numerous contributions to both projects and have developed an innovative memory caching system for linux.
The questions raised, however, come from Perl and MySQL. Both are questionable in terms of scalability. Although I'm not qualified to comment on this, I believe that the general consensus is that MySQL is one of the least efficient databases today. Livejournal has 100+ servers. I honestly don't think that a system the size of LiveJournal should require a server cluster that big. It seems that they are trying to solve their performance/reliability problems by blindly throwing hardware at it.
Of course, I love livejournal. It's simple, easy to use, and is a great tool for building communities. Just as it is simple, it can also be incredibly nerdy (there's actually a command prompt!). They're also completely open source.
Hopefully, Six Apart can make their network infrastructure more 'professional' while still maintaining the community spirit that has made it so successful.
-- If you try to fail and succeed, which have you done? - Uli's moose
A cam whore gently weeps.
Dude, where's my packet?
Finally a break from the never ending angsty teenage bullshit.
This space is powered by Google Ad-nauseam.
I hope they do a look-back analysis on this and publish the results.
It will be interesting to see what caused the failure, what they didn't do that could've mitigated the failure, and whether such mitigation makes economic sense.
Should make interesting reading.
Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
There were already lots of LiveJournal users who were upset and confused and unhappy with the idea that LJ and Danga (the company which made LJ) had been bought by SixApart. No doubt, as there have been no downtimes of this magnitude at LJ before, doomsayers will be claiming that it's SixApart's fault.
Never mind common sense; it won't matter that if SixApart can be held responsible for failures at InterNAP's colocation facilities, they're a much bigger -- and more powerful -- company than most people have ever given them credit for...
--Rachel
hey, you sound cool. can I add you to my journal? kthx. ;-)
"PC Load Letter? What the $@#% does that mean?!"
This makes me wonder if we will see a slashdot article if slashdot ever goes down for more than 2 minutes.
Video Production Support
Update #2, 10:11 pm: So far so good. Things are checking out, but we're being paranoid. A few annoying issues, but nothing that's not fixable. We're going to be buying a bunch of rack-mount UPS units on Monday so this doesn't happen again. In the past we've always trusted Internap's insanely redundant power and UPS systems, but now that this has happened to us twice, we realize the first time wasn't a total freak coincidence. C'est la vie.
According to some LiveJournal employees, a massive UPS exploded. From IRC:
<rahaeli> As far as we can tell, a UPS exploded.
Their site now says that they're buying their own UPSes, because this is the second time that the entire data center has lost power. Details on the first outage can be found here (a Google cache since LJ is down).
For the paranoid: This has nothing to do with Six Apart buying LJ. They're still in the same "world-class" data center they've been in for years.
you want beer and pizza? email me an address/zipcode at the sig email and ill do my part to support restoring lj.
;)
if my wife cant post this weekend, im gonna hear about it. and not even be able to post in my lj about getting yelled at about lj being down as if i caused the power outage myself.
not really.
well maybe.
Cheers.
This is my sig. There are many like it, but this one is mine.
Are they saying that someone deliberately sabotaged LJ?
Remember when teenagers were happy when people couldn't read all the personal details in their diary?
One line blog. I hear that they're called Twitters now.
I find nothing wrong with the word "blog", as do a significant number of other people in this world. Don't act as though your opinion is the only one that matters.
"I have felt a great disturbance in the force; as if a million voices suddenly cried out in terror."
Those poor, poor children.
i won't exaggerate if i tell that in recent years most of "social life" in .ru zone moved to livejournal.
it's 10 a.m. in russia now, and most of russian lj-addicts still don't know about apocalypse in lj.
i hope everything will be turned up in the nearest future. brad, we believe in you! :)
It was down for about a half hour, maybe a little longer. Most obnoxious for the colo facility that *is never supposed to go down*
If you look at his userpage here, you'll see he only posts a couple times a year.
Be relentless!
there was info in some russian online media, that this turnout was organized by russian officials who thacked down opposition in internet. conspirology rules :)))))
Or, as in the immortal cartoon "Dexter's Laboratory"...
Dee-Dee: OOOOOH! What does THIS button do?!?!?!?
Dexter: GET OUT AUF MY LAH-BOR-AH-TOR-EEE!!!!!
Knowledge is power. Knowledge shared is power multiplied.
Authorize.net (a fairly popular credit card gateway) is also an Internap client - I wonder how many sites (like ours) potentially lost revenue as a result of this outage.
http://www.theboyz.biz/
If you're not living on the edge, you're just taking up space!
What's so unusual about Brad posting here, I thought everyone posted here.
What Would Scooby Do?
Sure, go to www.livejournal.com when it's back up. It's fairly self-explanatory.
What's wrong with log?
When $DAYJOB had a present from Sierra Pacific Power of a two-hour blackout, and we discovered there were major problems with our generator, the poor APC UPS batteries weren't able to hold up the 150 servers I run.
When the power came back on, we had 143 servers back on-line in ten minutes. We had 149 on line in fifteen minutes. We had two servers (leased dedicateds) that required some file system repairs before they would come back on-line, but that task was finished 30 minutes after power restoration.
What's so hard about that?
(With the addition of a three-phase power transformer, our generator is working properly.)
Customers kept calling asking us why it took us so long!
Damn you fuckers move quick.
-- The doctor said I wouldn't get so many nose bleeds if I just kept my finger out of there!
Given the fact that a pyramid scheme is guaranteed to leave the vast majority of the people who get sucked into it with absolutely nothing, do you actually expect you have a good chance to get your free Mac Mini? What makes you luckier than the next guy?
Mod down posts with a "Free Mac Mini/iPod" sig, they're spam!
I mean having the uber-redundant, diesel-powered backup power in the server room fail.
Except the power didn't fail outside the server room, just inside it. There was a faulty breaker that died unexpectedly. Now, we had 100+ servers go down as a result of that, but we were pissed just the same, right along with about 20 other companies.
I'd post a link to the livejournal entry about the incident, but...
"No problem. I have the capacity to do infinite work so long as you don't mind that my quality approaches zero."-Dilbert
What bothers me is that you don't have separate data centres. I run a reasonably large web site, but it's nowhere near the size of LJ. Yet we have multiple geographic sites, so even if the (N+1) power fails completely in one hosting centre, we're only down on capacity, not out completely. I can't believe a site the size of LJ doesn't do the same...
"The invisible and the non-existent look very much alike." -- Delos B. McKown
Unless it means that the "cheapskate blog admins" were too cheapskate to buy proper dual-power supply boxes so that they can have dual power paths right to the servers.
You can have all the great redundant mains and backups you want, and it's for shit if you only have one power line to the system and that power bus loses juice.
Yeah yeah, it's funny and all, but it's pretty fucking uncool on a number of levels. People cutting themselves is really bad news; please don't make fun of it.
Do not pass go, do not collect $200.
Look, Perl rubs me the wrong way. I loathe it, and it makes me wanna hurl. More than that - it's Postgres that rocks my DB world. But personally, I think I'd at least read up on LJ's infrastructure before bashing it.
I mean they've got what? 2.5 million active users?
And how many hits are DB-backed?
Sweet fuck, man. How many servers do you think they're wasting? Assuming no redundancy (ha!), right now they're sitting at an approximate ratio of about 25,000 users per server! What morons they must be to not be squeezing more out of them. (And yes I know that I'm way oversimplifying, but... really?)
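For anyone who wants to check the back-of-envelope ratio above, here is a rough sketch. Both inputs are assumptions: the 2.5 million active users is the guess from this comment, and the 100 servers is taken from the "100+ servers" in the story summary. Neither is an official LJ figure.

```python
def users_per_server(active_users: int, servers: int) -> float:
    """Return the average number of active users carried by each server."""
    if servers <= 0:
        raise ValueError("servers must be positive")
    return active_users / servers


# Assumed inputs: 2.5 million active users (the comment's guess) and
# 100 servers (from the "100+ servers" in the story summary).
ratio = users_per_server(2_500_000, 100)
print(f"{ratio:,.0f} users per server")
```

With more than 100 servers, or with some of them held back for redundancy, the real ratio moves accordingly, so treat this as an order-of-magnitude figure only.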
What does this button d$#%* NO CARRIER
Hopefully as lambda switching becomes more common, it will be perfectly feasible to run a SAN spreading across 2 or more datacentres.
It's funny how I just met with some Internap sales people a few months ago. They were bragging about how their network infrastructure was superior to most others, since it intelligently routes traffic to the path of shortest response (not hops).
They even bragged to me how their network uptime SLA is 100%! I mean good god, now I find out this is the SECOND time it's happened (from the livejournal update site)???
I'm glad I didn't go with them...
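The "shortest response, not hops" idea from the sales pitch above can be illustrated with a toy comparison. This is only a sketch of the general concept; the route names, hop counts, and latencies are made up, and nothing here describes how Internap's routing actually works.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Route:
    name: str          # hypothetical upstream carrier label
    hops: int          # number of hops along the path
    latency_ms: float  # measured round-trip time in milliseconds


def pick_by_hops(routes: Sequence[Route]) -> Route:
    """Classic choice: prefer the path with the fewest hops."""
    return min(routes, key=lambda r: r.hops)


def pick_by_latency(routes: Sequence[Route]) -> Route:
    """Response-based choice: prefer the path with the lowest measured latency."""
    return min(routes, key=lambda r: r.latency_ms)


# Made-up measurements: the short path is congested, the long one is fast.
routes = [
    Route("carrier-a", hops=4, latency_ms=80.0),
    Route("carrier-b", hops=7, latency_ms=35.0),
]

print(pick_by_hops(routes).name)
print(pick_by_latency(routes).name)
```

None of this helps when the power goes out, of course, which is rather the point of the complaint.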
eTrade SUCKS
The comments seem to be full of contempt for the teenage-angst inane ramblings that are common on LJ. Come on. It's not like you are forced to read through this stuff.
I have a few "friends" there at LJ, some of them net.celebs, and I like their posts. It's the matter of whose writings do you find interesting, and you are free to be completely unaware of the rest. Why all the vitriol?
My exception safety is -fno-exceptions.
That's why you have redundancy in your payment gateways. Use two or more. Use anet and plugnpay.
Anet was hosed a few months ago due to DoS attacks. But all was good because we had a backup provider.
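The backup-provider setup described above could look roughly like the sketch below. The gateway interface is hypothetical (a plain callable that takes an amount in cents and returns a transaction id); it does not reflect the real Authorize.Net or plugnpay APIs, and the two demo gateways are stand-ins.

```python
from typing import Callable, List, Sequence


class GatewayError(Exception):
    """Raised by a gateway callable when it cannot process a charge."""


# Hypothetical interface: amount in cents in, transaction id out.
Gateway = Callable[[int], str]


def charge_with_failover(amount_cents: int, gateways: Sequence[Gateway]) -> str:
    """Try each gateway in order and return the first successful transaction id."""
    errors: List[Exception] = []
    for gateway in gateways:
        try:
            return gateway(amount_cents)
        except GatewayError as exc:
            errors.append(exc)  # remember the failure and try the next provider
    raise GatewayError(f"all {len(gateways)} gateways failed: {errors}")


def primary_down(amount_cents: int) -> str:
    """Stand-in for a primary provider that is offline."""
    raise GatewayError("primary gateway unreachable")


def backup_ok(amount_cents: int) -> str:
    """Stand-in for a backup provider that is working."""
    return f"backup-txn-{amount_cents}"


print(charge_with_failover(1999, [primary_down, backup_ok]))
```

A real integration would also have to deal with ambiguous failures, such as a timeout after the charge actually went through, so that failing over does not bill the customer twice.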
Seconded.
I hope this is one word that we refuse to have foisted on us.
I mean, no one I know says "blog" besides print journalists commenting on those crazy kids with their kooky intarweb.
Hopefully I didn't put any [] around my words.
In South Korea, only Old People blog on LiveJournal
s'wut i sed.
http://www.cafepress.com/blogwhine
:)
Can't whine about it in your blog if your blog isn't there for you to whine about it in.
I can just imagine the huge pile of traffic that LiveJournal is going to get hit with once everything *does* come back up online.
Hrm. Is there any way that they can blame this on Microsoft?
Unfortunately a couple machines had lying hardware that didn't commit to disk when asked, so InnoDB's durability wasn't so durable (though no fault of InnoDB).
Um, yeah. That happens when you configure the raid cards for write-back instead of write-through but forget to buy the cards with batteries.
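To make the "didn't commit to disk when asked" part concrete: this is roughly what "asking" looks like from an application, sketched in Python rather than anything InnoDB actually does internally. The file name is made up. The catch discussed above is that `os.fsync` is only a request; if a controller acknowledges writes from a volatile write-back cache with no battery, the call can return successfully and the data can still be lost on a power cut.

```python
import os
import tempfile


def durable_write(path: str, data: bytes) -> None:
    """Write data, then ask the OS to push it to stable storage before returning."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()             # move data from Python's buffer into the OS
        os.fsync(f.fileno())  # ask the OS to flush its cache to the device


# Demo against a throwaway directory; "commit.log" is a hypothetical name.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "commit.log")
    durable_write(target, b"txn 42 committed\n")
    with open(target, "rb") as f:
        print(f.read())
```

Software can do everything right here and still lose the last few transactions, which is why the battery-backed cache (or write-through mode) matters.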
Moderating "-1, Disagree" is simple censorship. Have the guts to post your opinion.
I have more arguments, but I'll let it drop. No sense in arguing over something that neither of us can prove ;-)
I'll admit though, LJ is a major undertaking and they have produced some nice code for the community at large to use.
I seem to remember that a few years back they had a similar problem (Internap lost all power) and it turned out that some idiot had hit the big red "shut down all power to the entire datacenter" emergency button. This isn't the first time this has happened, and last time it wasn't under Six Apart's management.
I'd say it's Internap's incompetence that caused this problem. If they can't keep their datacenter running even though they have multiple redundant power supplies then something is very wrong. I see from the outage page that LJ people are now planning to buy their own UPS so that they don't have to trust Internap anymore.
For power outages, my house has a better record than Internap right now, and I don't even own a UPS!
I wonder if this was the same outage that took down Geocaching.com? Talk about your worst case scenarios...
Happiness is like peeing yourself. Everybody can see it but only you can feel its warmth.
You can read Brad's presentations on LiveJournal's setup. The LISA one is the most recent, I think.
Ironically, I had just finally got around to acquiring a consumer grade UPS for my own system, installed it, and posted on LJ about it shortly before all of this happened. Go figure.
It looks like you're running on Linux...what filesystem are you using? Reiser(4(?)) would be a big help here.
ROMANES EUNT DOMUS
For the humour impaired... that was a joke. * sigh *
People that believe in their opinions don't post AC.
I don't post here. Oh, wait...
Pope Felix the Scurrilous.
Computer Geek by day, religious Icon by night.
Internap has been hosting LJ since long before Six Apart thought about taking over LJ. If you bother to read you'll know this.
"There is a way that seems right to a man, but its end is the way of death." Proverbs 16:25 (NKJV)
Anyone know of other sites that have been affected by the outage?
2. Not filled with teenage girls and emo boys. Despite the fact I like emo...a bit...
3. OK, so maybe there are teenage girls on Blogger but they don't go "OMFG LYK i GoT kIsSeD bY jOhN tOdAy!!!111"
4. You get a subdomain, not some crappy
5. It can be exported to sites.
6. It never went through a temporary phase in which you had to buy a damn "invite code" or one from a fucking friend.
Also according to Wikipedia:
In America, you spam computers In Soviet Russia, computers spam you!
Yeah, the hosting center that they've been at for years has a power failure 3 days after a new place buys LJ. Must be Six Apart's fault, they're not "ready to handle this." Or, maybe it's a coincidence?
I just wanna rant about the doofus who followed me around my mom's condo building muttering "9/11", "you just don't get it", "do you live here", etc.
Ok, there, whew.
Mix the failings of Usenet with the shortcomings of the World Wide Web and the result is slashdot.
I KNEW that this story would be on slashdot, and I knew someone would make a crack at Six Apart (who are brand new to the scene) and imply that somehow magically the power wouldn't have gone out if it weren't for Six Apart owning Livejournal.
Oh, whatever. What I'm saying is you get the subdomains for free on Blogger... on LiveJournal you have to pay to get them I'm sure.
In America, you spam computers In Soviet Russia, computers spam you!
Happy to be of service, happily.
-Xoder, Minister of the Department of Redundancy Department.
The previous sig has been removed due to
Hi. Perhaps it's more than just the power failure? Perhaps BradFitz's last cached LJ entries may hold some additional clues as to why it's taking so long after the power failure to bring LJ back up? Yes, perhaps they do. Enjoy:
Jan. 12th, 2005 @ 02:31 am
*yawn* It's one of those database nights. Watching 3 sets of progress bars move way too slowly. Wish there were something on Tivo at least to kill some time.
Jan. 11th, 2005 @ 09:15 pm: smart migrator
I just wrote a database migration script which, after each chunk of data moved, asks the load balancer (Perlbal) the free user queue depth. If more than 10 (less is just noise), it sleeps a second and asks again. Only once the queues are empty does it migrate more data. End result: it moves data as fast as it can, without affecting page response times. I've been meaning to make a generic wrapper for any utility, where the wrapper parent watches the load balancers, and the child does its work at full speed, but the parent will occasionally SIGSTOP/SIGCONT it...... haven't got around to that yet. The generic wrapper could even have another pluggable-child as its rate-limit determiner, so anybody could use it.
Jan. 11th, 2005 @ 01:24 pm: Parallel compression
Are there any multi-threaded compression algorithms, or at least wrappers/formats for interleaved compression, with variable interleave size? It'd be nice to take advantage of multiple processors when gzipping 380 GB, while still doing sequential reads, even if the resultant file was non-standard.
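The "smart migrator" entry quoted above describes a simple back-pressure loop. Here is a minimal sketch of that idea in Python, not Brad's actual script (which is not shown anywhere in this thread). The `move_chunk` and `queue_depth` callables are hypothetical stand-ins for "copy one chunk of data" and "ask Perlbal for the free user queue depth"; the threshold of 10 and the one-second sleep come from the entry itself.

```python
import time
from typing import Callable, Iterable


def throttled_migrate(
    chunks: Iterable[object],
    move_chunk: Callable[[object], None],
    queue_depth: Callable[[], int],
    max_depth: int = 10,
    pause_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Move chunks one at a time, pausing while the load balancer is backed up."""
    moved = 0
    for chunk in chunks:
        move_chunk(chunk)
        moved += 1
        # After each chunk, wait until the queue depth drops to the noise level.
        while queue_depth() > max_depth:
            sleep(pause_seconds)
    return moved


# Demo with fake data: the queue is busy after the first chunk, then drains.
depths = iter([25, 12, 3, 0, 0])
moved_chunks = []
count = throttled_migrate(
    chunks=["a", "b", "c"],
    move_chunk=moved_chunks.append,
    queue_depth=lambda: next(depths),
    sleep=lambda seconds: None,  # skip real sleeping in the demo
)
print(count, moved_chunks)
```

Passing `sleep` in as a parameter is just a convenience for trying the loop without waiting; the generic SIGSTOP/SIGCONT wrapper mentioned in the entry would be a separate piece and is not attempted here.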
LJ has always been a somewhat cash-starved operation; they make a significant amount of money from their paid users, but they also have a lot of expenses--full-time employees, an ever-expanding user base on a technology that isn't easy on hardware, bandwidth use...
As it is, most (all?) of their employees are in Portland, so they keep all their servers there, where they can quickly get at them if something happens. Having a second datacenter would be hard on their employees, hard on their budget, and hard on their architecture--for a site that, in the end, isn't critical to have running 24/7.
Hey, you try to find an open nick these days!
Cut the poor bastards some slack, at least they have the excuse of "teen hormones".
Nothing, on the other hand, can excuse Taco's lame blog:
Why is it that my personal value as a human being is always tied 100% to the status of my server? Since last week the box has been cranky (a blown power supply resulted in the harddrive being happily moved to a machine with 128 megs less RAM), which means the whole thing is just sluggish as hell today. And suddenly I feel like shit. I feel tired, unhealthy, and burnt out. A few weeks ago, I was on top of the world: the machine was stable, kicking out 640,000 pages in one day, and performing snappy for everyone. And I was cheerful. It's really strange that a chunk of steel and silicon 3 time zones away defines my mood.
The airline lost my luggage... it contained 4 pairs of boxers.
So ya know that annoying ad with the damn taco bell dog and the cops that keep saying 'Drop the Chalupa' over and over again? I hate that ad.
Good gawd... Taco can put most teenage girls to shame when it comes to lame personal details publicized.
Personally, I'm finally switching from postgres to mysql after 8 years of happy use of the former because it's finally let me down.
They are hoping to have limited capacity on the site in a few hours. They have not slept very much and called Six Apart the minute everything went to hell. Plans for the holiday weekend for the Fitzpatricks went to hell and no one has had much rest. They are testing everything and all the rumors out there are just that - rumors. Mena and her crew were notified the minute it went down. So the people calling up LJ's customer service threatening to slit their throats, saying Six Apart got punk'd, and everything else - can you be more emo? Come on now, if you need to journal that bad - head over to GJ to get your fix. Otherwise, give them time. This has only happened one time before and that was confirmed by Sandy when she called me back (I called asking for an interview and more information as I am doing a story on blogging and this outage for a site I write for.) My god, you would think someone is killing kittens or something the way people are crying in chat rooms. http://www.mandelion.com
OK...I just couldn't resist doing this...In keeping with the American tradition of compassionate response to disaster...
Get your souvenir T-shirts and coffee cups HERE!
this is loaner...my sig is in the shop
Those LiveJournal users are everywhere!
Yes, we do want our fix.
Go, Team LJ, Go!
I agree!
A-ha! Our credit card processor -- Authorize.Net also went down hard yesterday. I did a quick traceroute just now and see that they're also located at Internap.
:)
Whoops
Now somebody's going to start a ribbon campaign.
It's back up. Unless you're on Filet Mignon or Madcow (like myself.)
Been a bad 24 hours without LJ :(
It's back now though, yay! :D
Speaking as a former teenage self-harmer, they only do it because they take it seriously and believe others will.
Though perhaps not everyone is the same as you. In my experience, many if not most self-harmers (both teenage and older) do not do it because of what they think others will think of it.
When I was a kid -- before email, jackasses -- there were people like you. They mailed "family updates" every Christmas, long letters about what they named their dog's puppies and who their cousin saw in Orlando. We mocked those people. No one took them seriously, and their "updates" always ended up in the trash.
Then you have completely missed one of the points of something like LiveJournal. With email, the sender decides whether the recipient gets sent the information. With a webpage, the user decides whether to read it (and I'm sorry that no one cares about you, but some of us have these things called "friends").
Btw, why did you post this comment? After all, if no one cares about what you have to say..
Bizarrely enough we had major downtime for both WoW and LJ this week.
Maybe it's a conspiracy to see if they can force all the computer geeks out of the house and into the sunshine for a few hours this month?
Sara
Designer, Gamer, Macgrrl in an XP World
Amen to that...I hate the word blog...and it seems so....artificial?, as well..
True, one man can't control it. One man with a backhoe, however, is a different story entirely. ;)
He who can destroy a thing, controls a thing.
The REAL jabber has the user id: 13196
What you do today will cost you a day of your life
Oh fuck you.
In America, you spam computers In Soviet Russia, computers spam you!
Perhaps Six Apart wasn't quite prepared for the responsibilities of a website of this size?
Last time I checked, LiveJournal wasn't experiencing 503 Service Not Available error downage for 1 - 2 hours every day (unlike certain other web sites).
Speed/Duplex negotiation is an OS configuration issue, not a hardware NIC issue.
If the OS can't configure the negotiation, that's still the OS, not the hardware. It just means that the driver isn't capable of properly configuring the NIC. Just because your workaround was in hardware, does not mean that that is the cause of the problem.
As for adding your own UPSes which ignore the EPO, surely that defeats the object of the EPO. I don't know USAian requirements, but if, as you say, the EPO is required, is it legal to bypass it with your own UPSes?
LJ clearly have not heard of DR; although a true DR configuration is probably overkill for this type of site, this report gives the strong impression that basic sysadmin competencies were not followed when there was time available - during design and deployment, and then later during normal running. These problems had apparently not occurred to anyone until it happened. Isn't "what's the worst-case scenario" a common-enough question? Wouldn't "total power failure" be one of those answers?
Even with write-through caches, a small battery in the array can flush data to disk after a power failure. This isn't rocket science - buy the right kit for the job, understand what you're buying, and how to configure it. If you don't understand what it is, what on earth made you decide to buy it?!! You've got dual-powered systems, but didn't use that feature - why did you buy it then? It wasn't a conscious decision to take the risk, and it wasn't a conscious decision to get dual-powered hardware for resilience. No thought was made about power. Most colos provide dual-sourced power supplies for this type of problem - power from separate grids, so even if the grid providing power to the datacentre goes down, the alternate grid continues running.
Sigh. I often have customers nearly as daft as this, though I don't think I've come across such a poorly considered deployment for a long, long time.
Author, Shell Scripting : Expert Re
It isn't that someone did a breakfast journal. I could see someone doing that as a joke. It doesn't shatter my worldview.
What gets me is the sheer volume of comments! Not just from a small group of people, but if you look, you see lots of different people doing it.
That DOES shatter my worldview.
Slashdot. It's Not For Common Sense