Handling the Loads
I woke up and it seemed like a normal day. Around 8:30 I got to the office and made a pot of coffee. I hopped on IRC, started rummaging through the submissions bin, and of course, began reading my mail. Within minutes, just moments after the impact of the first plane, someone on IRC told me what had happened. Just a minute or 2 later, submissions started streaming into the bin. And at 9:12 a.m. Eastern Time, I made the decision to cancel Slashdot's normal daily coverage of "News for Nerds, Stuff that Matters," and instead focus on something more important than anything we had ever covered.
I couldn't get to CNN, and MSNBC loaded only enough to show me my first picture of the tragedy. I posted whatever facts we had: these were coming from random links over the net, and from Howard Stern, who syndicates live from NY, even to my town. Over the next hour I updated the story as events happened. I updated when the towers collapsed. And the number of comments exploded as readers expressed their outrage, sadness, and confusion following the tragedy.
Not surprisingly, the load on Slashdot began to swell dramatically. Normally at 9:30 a.m., Slashdot is serving 18-20 pages a second. By 10 we were up to 30 and spiking to 40. This is when we started having problems.
At this point Jamie and Pudge were online and we started trying to sort out what we could do. The database crashed and Jamie went into action bringing it back up. I called Krow: he's on West Coast time, but he knows the DB best, and I had to wake him up. But worst of all, I had to tell him what had happened in New York. It was one of the strangest things I've ever done: it still hadn't settled in. I had seen a few grainy photos, but I don't have a TV in my office and hadn't yet seen any of the footage. After I hung up the phone I almost broke down. It was the first time, but not the last.
The DB problem was a known bug, and the decision was made to switch to the backup box. This machine was a replicated mirror of Slashdot, but running a newer version of MySQL. We hadn't switched the live box simply because it meant taking the site down for a few minutes. Well, we were down anyway, and the box was a complete replica of the live DB, so we quickly moved.
At this point the DB stopped being a bottleneck, and we started to hit limits on the performance of the 6 web servers themselves. Recently we fixed a glitch with Apache::SizeLimit: it kills httpd processes that use more than a certain amount of memory, but the size limit was too low and processes were dying after serving just a few requests. This was complicated by the fact that the first story quickly swelled to more than a thousand comments ... we've tuned our caching to Slashdot's normal traffic: 5000-6000 comments a day, with stories having 200-500 comments. And this was definitely not the normal story. Our cache simply wasn't ready to handle this.
Our httpd processes cache a lot of data: this reduces hits to the database and just generally makes everything better. We turned down the number of httpd processes (from 60 on each machine to 40) and increased the RAM that each process could use (from 30 to 40, and later 45, megs). We also turned off reverse hostname lookups, which we use for geotargeting ads: the time required to do the rDNS is fine under normal load, but under huge loads we need that extra second to keep up with the primary job: spitting out pages as fast as possible.
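Apache::SizeLimit itself is a Perl module for mod_perl; the Python below is only a toy simulation of the effect described above, not that module's API. It assumes, hypothetically, that each child starts at `base_mb`, that every request grows its in-process cache by a fixed `growth_mb`, and that after each request the child exits once it is over `limit_mb`. Bigger pages (a thousand-comment story) mean faster growth, so a limit tuned for 200-500 comment stories recycles children after only a few requests, throwing their caches away each time.

```python
def requests_before_recycle(base_mb: int, growth_mb: int, limit_mb: int) -> int:
    """Toy model of a SizeLimit-style check (hypothetical numbers, not Slash's).

    The child starts at base_mb, its cache grows by growth_mb per request, and
    after each request it exits if its size is over limit_mb.
    """
    if growth_mb <= 0:
        raise ValueError("with no cache growth the child would never be recycled")
    size = base_mb
    served = 0
    while True:
        size += growth_mb      # cache fills with the page just rendered
        served += 1
        if size > limit_mb:    # size check after the request completes
            return served


# Hypothetical: 20 MB baseline, 5 MB of cache per huge comment page.
print(requests_before_recycle(20, 5, 30))  # 3
print(requests_before_recycle(20, 5, 45))  # 6
```

With these made-up figures, a 30 MB limit recycles a child after 3 requests, while raising the limit to 45 MB stretches that to 6.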
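One way to read the process and memory change above is as a per-box worst-case budget: the number of httpd children times the size each may reach before being killed. This sketch assumes every child actually grows to its limit and ignores memory shared between mod_perl children, so it gives a ceiling, not a measurement.

```python
def worst_case_mb(processes: int, limit_mb: int) -> int:
    """Upper bound on httpd memory per box if every child hits its size limit."""
    return processes * limit_mb


settings = {
    "before": (60, 30),
    "first change": (40, 40),
    "final": (40, 45),
}
for name, (processes, limit_mb) in settings.items():
    print(f"{name}: {processes} x {limit_mb} MB = {worst_case_mb(processes, limit_mb)} MB")
```

Under that assumption the final setting keeps the same 1800 MB ceiling as before, just spread over fewer, larger processes that can each hold a bigger cache.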
This was around noon or so. I was keeping a close eye on the DB and we noticed a few queries that were taking a little too long. Jamie went in and switched our search from our own internal search to hitting Google: search is a somewhat expensive call on our end right now, and this was necessary just to make sure that we could keep up. We were serving 40-50 pages/second ... twice our usual peak loads of around "just" 25 pages a second. I drove the 10 minutes to get home so I could watch CNN and keep up better with what was happening.
We trimmed a few minor functions out temporarily just to reduce the number of updates going to frequently read tables. But it was just not enough: The database was now beginning to be overworked and page views were slowing down. The homepage was full of discussions that were 3-4x the average size. The solution was to drop a few boxes from generating dynamic pages to serving static ones.
Let me explain: most people (around 60-70%) view the same content. They read the homepage and the 15 or so stories on the homepage. And they never mess with thresholds and filters and logins. In fact, when we have technical problems, we serve static pages. They don't require any database load, and the Apache processes use very little memory. So for the next few hours, we ran with 4 of our boxes serving dynamic pages and 2 serving static. This meant that 60-70% of people would never notice, and the others would only be affected when they tried to save something ... and then only if they hit a static box, which would happen only one in 3 times. It's not the ideal solution, but at this point we were serving 60-70 pages a second: 3x our usual traffic, and twice what we designed the system for. We got a lot of good data and found a lot of bottlenecks, so the next time something causes our traffic to triple, we'll be much more prepared.
At the end of the day we had served nearly 3 million pages -- almost twice our previous record of 1.6M, and far more than our daily average of 1.4M. During the peak hours, average page serving time slowed by just 2 seconds per page ... and over 8000 comments were posted in about 12 hours, and 15,000 in 48 hours.
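The "one in 3 times" figure follows from the box counts if the load balancer spreads requests evenly over all 6 machines (an assumption; the post doesn't say how requests were distributed). A request that needs a dynamic page, such as saving a comment, only runs into trouble when it lands on one of the 2 static boxes. The round-robin assignment below is a hypothetical stand-in for the real balancer.

```python
from fractions import Fraction

DYNAMIC_BOXES = 4
STATIC_BOXES = 2


def chance_save_hits_static(dynamic: int = DYNAMIC_BOXES, static: int = STATIC_BOXES) -> Fraction:
    """Probability that a request needing a dynamic page lands on a static box,
    assuming requests are spread evenly across all boxes."""
    return Fraction(static, dynamic + static)


def round_robin(n_requests: int, boxes: list[str]) -> list[str]:
    """Assign requests to boxes in strict rotation (hypothetical balancer)."""
    return [boxes[i % len(boxes)] for i in range(n_requests)]


boxes = ["dynamic"] * DYNAMIC_BOXES + ["static"] * STATIC_BOXES
assignments = round_robin(600, boxes)
print(chance_save_hits_static())                              # 1/3
print(assignments.count("static"), "of", len(assignments))    # 200 of 600
```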
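Converting the daily totals into average rates gives a sense of scale. This sketch treats "nearly 3 million" as an even 3,000,000 and spreads each total over a full 24 hours, so it understates the daytime peaks (the post cites 60-70 pages a second at the worst).

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def pages_per_second(pages_per_day: float) -> float:
    """Average serving rate needed to reach a daily page count."""
    return pages_per_day / SECONDS_PER_DAY


days = {
    "Tuesday (rounded to 3M)": 3_000_000,
    "previous record": 1_600_000,
    "daily average": 1_400_000,
}
for label, pages in days.items():
    print(f"{label}: {pages_per_second(pages):.1f} pages/s")
```

That comes to roughly 34.7 pages a second averaged over the whole day, against about 18.5 for the previous record day and 16.2 for a normal day.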
On Wed. we started to put additional web servers into the pool, but that ended up not being necessary. We stayed dynamic and had no real problems on all 6 boxes all day. We peaked at around 35-40 pages/second. We served about 2 million pages. Thursday traffic loads were high, but relatively normal.
Summary
So here is what we learned from the experience.
- We have great readers. I had only one single flame emailed to me in 24 hours, and countless notes of thanks and appreciation. We were all frazzled over here and your words of encouragement meant so much. You'll never know.
- Slashteam kicks butt. Jamie, Pudge, Krow, Yazz, Cliff, Michael, Jamie, Timothy, CowboyNeal, you guys all rocked. From collecting links to monitoring servers, to fixing bits of code in real time. It was good seeing the team function together so well ... I can't begin to describe the strangeness of seeing 2 separate discussions in our channel: one about keeping servers working, and another about bombs, terrorists, and war. But through it all these guys each did their part.
- Slash is getting really excellent. With tweaks that we learned from this, I think that our setup will soon be able to handle a quarter million pages an hour. In other words, it should handle 3x Slashdot's usual load, without any additional hardware. And with a more monstrous database, who knows how far it could scale.
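As a quick check, a quarter million pages an hour works out to about the 60-70 pages a second the site was pushing at the worst of Tuesday, and to roughly three times a usual peak of 20-25 pages a second:

```latex
$$
\frac{250{,}000\ \text{pages/h}}{3600\ \text{s/h}} \approx 69.4\ \text{pages/s},
\qquad
3 \times (20\text{ to }25)\ \text{pages/s} = 60\text{ to }75\ \text{pages/s}.
$$
```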
- Watch out for Apache::SizeLimit if you are doing caching.
- Writing and reading to the same InnoDB MySQL tables can be done since it does row-level locking. But as load increases, it can start being less than desirable.
- A proxy layer is desirable so we can send static requests to a box tuned for static pages. For a long time now we've known that this was important, but it's a tricky task. It is super necessary for us to increase the size of caches in order to ease DB load and speed up page generation time ... but along with that we need to make sure that pages that don't use those caches don't hog precious Apache forks that have them. Currently only images are served separately, but anonymous homepages, XML, RDF, and many other pages could easily be handled by a stripped-down process.
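The routing decision such a proxy layer would make can be sketched as a small function. This is a hypothetical rule in Python, not Slash code: the pool names, the file suffixes, and keying on a `logged_in` flag (in practice, probably a session cookie) are all assumptions. In production the same rule would more likely live in the reverse proxy's own configuration than in application code.

```python
from urllib.parse import urlparse

# Hypothetical backend pools, not Slashdot's real hostnames.
LIGHT_POOL = "stripped-down-pool"   # small processes, no big caches
HEAVY_POOL = "modperl-pool"         # full mod_perl children with large caches

# Requests assumed never to need the heavy caches.
LIGHT_SUFFIXES = (".gif", ".jpg", ".png", ".xml", ".rdf")
HOMEPAGE_PATHS = ("", "/", "/index.html")


def choose_pool(url: str, logged_in: bool) -> str:
    """Pick a backend pool for one request (illustrative routing rule only)."""
    path = urlparse(url).path
    if path.endswith(LIGHT_SUFFIXES):             # images, XML, RDF feeds
        return LIGHT_POOL
    if path in HOMEPAGE_PATHS and not logged_in:  # anonymous homepage
        return LIGHT_POOL
    return HEAVY_POOL                              # everything personalized or dynamic


print(choose_pool("http://slashdot.org/", logged_in=False))
print(choose_pool("http://slashdot.org/comments.pl?sid=1", logged_in=False))
```

The point of the design is that an anonymous homepage hit or an RDF fetch never ties up one of the 45 MB processes whose caches matter for dynamic pages.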
What happened on Tuesday was a terrible tragedy. I'm not a very emotional person, but I still keep getting choked up when I see some new heartbreaking photo, or a new camera angle, learn some new bit of heartbreaking information, or read about something wonderful that somebody has done. This whole thing has shaken me like nothing I can remember. But I'm proud of everyone involved with Slashdot for working together to keep a line of communication open for a lot of people during a crisis. I'm not kidding myself into thinking that what we did is as important as participating in the rescue effort, but I think our contribution was still important. And thanks to the countless readers who have written me over the last few days to thank us for providing them with what, for many, was their only source of news during this whole thing. And thanks to the whole team who made it happen. I'm proud of all of you.
I know that a lot of shackers and other people on the net aren't Christian or don't even believe in God. That's fine. Tomorrow (now today) you will hear a lot of people praying, asking you to pray, etc. This isn't the Snickers commercial where they bring in a representative of every religion before the big game. It will feel weird. I feel that a week ago, if NBC was showing a service, someone would whine. Today, I ask ya, just let it slide. When they say pray, interpret that as 'do what makes you feel comfortable.' Please just be respectful like your mama would want you to be. But for today, just kinda chalk it up to all those people burned, crushed, flattened, choked, suffocated, etc. to death.
Thank you
The ultimate network admin tool needs HELP!
Not only with Slashdot (did that REALLY say 2-thousand-something comments on the front page?!?!), but with CNN, ABCNews, the NY Times, and just about every other major news source I can think of. Tuesday afternoon was tough. By Tuesday evening all these sites were responding as though I was the only connected user. The server power that must have been thrown at some of these sites is staggering.
I've spent the last few days in something of a daze, waiting for the real ramifications of Tuesday's horror to sink in. Many of my colleagues up here in Canada are not sure what to make of the events and the possible response, but we're sure it will be bad.
That said, in all my experiences on the net over the last couple of days, it was Slashdot I came back to for my info feed/dump. Who had their site up and running in the face of the massive demand? Slashdot.
CNN was there during the Gulf War. Slashdot was there for the start of this new era, and I'm sure will be there in the face of whatever is to come. You guys are just another indication of the strength the US can have in the face of adversity.
Thank you.
Beware the Whyte Wolf.
With a gun barrel between your teeth, you speak only in vowels...
It is not common for people to receive thanks for the great service that they do for a community, but I am going to go ahead and give you thanks for feeding us the information that I was not able to get through TV and the basically non-existent other news sites.
I am normally a critic of /. and the editors, but this entire week I felt that they did an extraordinary job of keeping us informed. For once I am going to applaud you.
/. for making sure we got the news we needed.
I got links to personal experiences on the tragedies, movies, images not seen on TV, and personal reflection on the entire ordeal by people that seem to have valid ideas (not the crap that you hear from most people about the attacks).
Thank you again
Could you please cache the stories in NESTED mode instead of threaded? When the site is being hammered, I would imagine it is far better to have guys grab a single large, cached page than a smaller cached page, and then have the system try to survive thousands of clicks for more information.
I really do thank you guys for this site and your decision to carry the news. I have a new respect for the amount of bandwidth you throw around with impunity on a daily basis.
One final request: get search back online so I can get to the old stories! Google doesn't have them (even now!)
Slashdot itself did very, very well in my experience. I experienced far fewer delays and errors than on other sites. Thanks to everyone who worked so hard to keep it running. You've made a huge difference for thousands of people.
sulli
RTFJ.
That was really well said.
During my life I've always taken "bow your head and pray" as "shut up and look serious".
And I thought Slashdot did a very good job this week. I woulda emailed Taco that, but I figured there was enough traffic over the Internet.
Really good job guys. Between Slashdot and Drudge I felt as informed as a guy can be.
First off kudos to Slashteam. You kept a valuable news source up and running while most people were too stunned to do anything other than watch, horrified, at the TV. Good work. You provided a valuable service to many people in this crisis.
I was actually pretty impressed with how they handled the load... it was a little slower than /.'s recovery, but it was rather impressive given the HUGE load they were experiencing. First, they stripped down the page content to low-bandwidth versions, then phased in their site. I'm not sure about CNN, but MSNBC added static mirrors to their pool, and got Akamai servers to serve all their media. By around noon, both sites were running their normal full-content versions, even though they were probably still getting hammered to high heaven.
Also, to those who are getting down on CNN and MSNBC... From what I've heard, those sites are already tuned to serve (and regularly do serve) around 45 pages per second... even with loads of media.
Crashing them was likely no small feat, either. Likely every person with internet typed in the very familiar cnn.com or msnbc.com just on instinct. It probably didn't help MSNBC or CNN that the MSN and AOL/Netscape portals, respectively, link to them directly.
Personally, I give many thanks to all the techs for all the news sites who worked like mad to ensure that people were able to understand what was happening. It must not have been easy to work in conditions like that: especially considering the stress that was put on them.
-Jayde
What's a sig?
When I started reading this, I was disgusted. I was expecting something like CNN's ads after the Gulf war, touting the fact that they were the ones who got most of the scoops.
By the time I got half-way through the actual content (not the front-page piece) I was in awe of how much went on. Usually when a massive load spike happens on my watch, I try to get everyone's fingers out of the pie so that we have a good chance of the machines just doing their jobs. The fact that these folks were able to make emergency changes in real time to compensate for the load is just astounding.
CNN should be rolling out a Slash-based discussion forum for top stories. Heck, so should Whitehouse.gov!
Thanks guys, and good luck with your ongoing coverage of News For Nerds, Stuff That Matters!
I first heard what was going on from Slashdot, and I had to turn the television on to believe it - it sounded too much for a prank story when I first read it.
For me, the television was more important than Slashdot for receiving information on what was going on as and when it happened.
But for me, Slashdot has been much more important as a place where I could see what other people from all over the world were thinking about this tragedy. I hope that the different perspectives and posts which I have read have allowed me to handle how I feel about the situation more maturely than I otherwise would have been able to.
While my heartfelt thanks go to
Slashdot was serving 50 pages per second, CNN was peaking at about an estimated 50,000 hits per second.
In light of this it was amazing that CNN was up at all, slow as it was.
My only gripe is I think it was very out of place and a bit insensitive that right in the middle of this (around 12pm if IIRC) Jon Katz took this tragedy as an opportunity to post some rant about how technology led us to this evil situation we were in and how technology was changing the way people get news or some such. I'm normally not a Katz-basher, but I think this was WAAAY out of place and insensitive to the people that died that day. Not only that, but it was unnecessary noise while people were still scrambling to get to the FACTS of what was going on. We really didn't need some insensitive wanna-be journalist's opinion on technology, of all things, in the middle of all of this. Maybe it would have been more appropriate on Wednesday or Thursday, but (to me) it was out of line at 12pm on Tuesday. Not to mention the whole crux of his article was off base (people killed people Tuesday, not technology).
Okay, I'll stop bitching now. Thanks again Slashdot, for stepping up to the plate and knocking one out of the park!
Shayne
Today I didn't even have to use my AK; I got to say it was a good day -- Ice Cube
I've read a few reports about how the Internet failed during this disaster since almost all news sites were too busy to respond. I disagree with that. Slashdot was here, as well as things like IRC.
On the channel I've frequented for years I got more up to the minute information than anyone in the office. Everyone was wondering where my news was coming from, especially since it was so accurate. While some people were sitting around watching CNN we were discussing and talking about what was going on with people very close (too close) to the events.
This doesn't even take into consideration email. With cell phones and land lines too congested, people were sending emails back and forth to get word on loved ones or just to talk about the events.
I think the Internet did a great job.
While the television remained my primary news feed, Slashdot was my primary web feed. It provided the community side of the equation: a finger on the pulse of the world and, particularly, America.
Thanks to the Slashdot crew for scrambling to provide the best possible service during a time when many other people were in emotional and occupational shutdown.
And thank-you to the people who form this community. On the whole, the discussions have been remarkably insightful and rational.
I'm hopeful that this web community is representative of the American population, and that we will see your political and military leaders taking sane action. This tragedy could all too easily throw us into devastating war with continuing long-term consequences.
I'll also take this opportunity to apologise for the several postings where I lost my head. While most of what I've written has attempted to educate a broadly ill-informed public as to why this attack took place, and to preach sanity in dealing with the attack, I have also lost my head in responding to some of the more dreadfully ignorant folk. For that, I am sorry: I should have been more patient and tolerant.
In closing, I'd like to assure our American friends that this has been a global tragedy. The outpouring of support, and demonstrations of grief and sorrow, have encircled the globe. Every nation mourns with you, and every nation feels a sense of shock and loss.
You are not alone.
--
Don't like it? Respond with words, not karma.
In the future, you might consider making the "HTML Light" mode the default mode under heavy load.
Granted, it doesn't alleviate the DB problem, but it does limit the images sent down the pipe.
(more ideas pulled out of the ass) Perhaps another Apache instance or a Perl script (horrors!) to watch traffic and to ratchet the options down as traffic increases, based on a weighted system (level 1: no sigs, level 2: drop journals, level 3: no search, ... level N serve only static HTML)
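The ratchet idea could look something like this. It's a minimal sketch with made-up thresholds in pages per second, using simple cumulative thresholds rather than the weighted system the comment mentions; the feature names follow the comment's own examples.

```python
# Hypothetical ladder: (threshold in pages/second, restriction switched on).
LADDER = [
    (30, "drop sigs"),
    (40, "drop journals"),
    (50, "drop search"),
    (60, "serve only static HTML"),
]


def active_restrictions(pages_per_second: float) -> list[str]:
    """Return every restriction in force at this load; levels are cumulative."""
    return [name for threshold, name in LADDER if pages_per_second >= threshold]


for load in (20, 35, 55, 70):
    print(load, active_restrictions(load))
```

A real controller would probably also want some hysteresis (for example, stepping back down only after load has stayed below a threshold for a while) so the site doesn't flap between levels.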
This is an interesting problem, and I'm impressed with y'all's ability to handle it.
Potato chips are a by-yourself food.
Couldn't the switch to static pages happen _automatically_ if the database goes down? The only difference to most users would be inability to post comments.
Hmm, that is actually quite a problem (though still better than just having the site go down). Maybe a 'comment spool' where the comments can be saved as flat files, ready to be inserted when the DBMS comes back up?
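The comment-spool idea can be sketched like this. Everything here is hypothetical: `insert` stands in for whatever writes a comment to the database, a `ConnectionError` stands in for "the DBMS is down", and comments are stored as one JSON file each, named by timestamp so a replay roughly preserves posting order.

```python
import json
import os
import time
import uuid
from pathlib import Path
from typing import Callable

Insert = Callable[[dict], None]  # hypothetical "write one comment to the DB" hook


def post_comment(comment: dict, insert: Insert, spool_dir: Path) -> str:
    """Store a comment in the DB, or spool it to a flat file if the DB is down."""
    try:
        insert(comment)
        return "stored"
    except ConnectionError:
        spool_dir.mkdir(parents=True, exist_ok=True)
        name = f"{time.time_ns()}-{uuid.uuid4().hex}"
        tmp = spool_dir / f"{name}.tmp"
        tmp.write_text(json.dumps(comment))
        os.replace(tmp, spool_dir / f"{name}.json")  # atomic rename: no half-written files
        return "spooled"


def replay_spool(insert: Insert, spool_dir: Path) -> int:
    """Insert spooled comments once the DB is back; returns how many were replayed."""
    if not spool_dir.exists():
        return 0
    replayed = 0
    for path in sorted(spool_dir.glob("*.json")):  # timestamp prefix ~ posting order
        insert(json.loads(path.read_text()))
        path.unlink()  # delete only after the insert succeeded
        replayed += 1
    return replayed
```

Because a file is removed only after its insert succeeds, a replay interrupted by the database going down again leaves the remaining comments in the spool for the next attempt.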
Anyway, kudos to Taco and the gang for keeping Slashdot up. Three million pages in 24 hours... how does that compare with the really big sites like Yahoo, AOL and CNN?
-- Ed Avis ed@membled.com
Folks, I've got a very very bad feeling, and if it's true then the worst is yet to come and President Bush is going to need all the support he can get. The other day when he got off the phone with the mayor and governor of New York (neither of whose names I can spell), President Bush started speaking off the cuff (undoubtedly to the horror of his PR people) and after rambling a little he said, "...I'm a likable guy....but I've got a job to do...and I'm going to do it..." and he said this with tears in his eyes. Several people in my office, including me, think they've decided to use a nuke and Bush is getting shook up about how HE is the one who is going to go down in history for authorizing it. This is a terrible burden for him, no matter what. He deserves your thoughts and support....
Jerry Falwell and Pat Robertson blaming the events on liberals, feminists, etc. etc. etc.
While I wholly condemn the actions of the terrorists, I do have to critically ask, "Did the government of the United States of America have this coming?"
You'd have to be blind not to see that the U.S. government has been supplying arms and training and money to factions around the world for over 50 years. You'd have to be blind not to see the American government change its mind mid-stride -- first by supporting a group (again, with weapons and money), then by turning face, cutting off support or even condemning the actions of the group it supported.
You'd have to be insane to believe the 1973 crap propaganda article by Gordon Sinclair is a clear and frank view of the United States of America and its leaders and their policies.
The government of the United States of America has been bullying and harassing nations for a very long time, flaunting themselves as a superpower which is untouchable. They've stuck their noses in other nations' business too many times and someone had decided to cut it off.
I don't agree entirely with this Guardian article, but it does raise a very strong and important point: The U.S. must change the way it carries itself in foreign affairs. The American people must stand up and take an active interest in their nation's government. The American media must stop downplaying foreign affairs.
An aside: the Canadian people aren't much (any) better in this regard. Canadian readers: How much interest do you show in your government??
I do not believe that this is the act of one nation, or even of a nation. And I am frightened because I do not think this is the last.
The U.S. government and media are running around crying "Why me? Why us?" and you have the President standing frail and shaken, telling his nation that "He's gotta do what he's gotta do" instead of analyzing the situation properly and keeping cool.
I must give Bush credit -- he did not spout off about Arabs or "them guys" as Clinton did with OKC -- Bush remained calm and rational. I fear that this is quickly fizzling out because his anger is taking over and as President, he is not allowed to have those emotions. He is a man with the power of a very large, wealthy and military nation. He is not allowed to be angry. I think he is grappling with those emotions and his reserve is failing.
As a Canadian, I demand retribution for what happened in the United States this week. I am not saying "forgive and forget." Blood will be shed, and rightly so. Check out my /. userpage for views on what I personally feel is acceptable for retaliation. I also think the President should send a strong message that it is not acceptable to hate the middle eastern people -- Just as there was no witchhunt against all white people with OKC, there should be no anger towards the Arab, Muslim and other middle-eastern people within or outside the U.S. This is not an attack by the middle eastern people nor their religion; this is an attack by terrorists and cowards too cowardly to stand up and fight.
And I fear that we will be brought into a world war because of it.
In Executive Orders, terrorists flew a plane into the White House, killing most of the major gov't officials that were there for a ceremony. Sound familiar?
Don't remember which book it is (Never finished it), but there's one about a small terrorist group gathering materials and machining a nuclear bomb to set off in a football stadium in Colorado. The possibility of it seemed to be almost zero...but so did the possibility of a hijacked plane being flown into the Pentagon. What's scary is that it's not as hard as it seems. With bin Laden's resources, he could have procured the materials and gotten people to carry a nuke-in-a-briefcase into the WTC, and taken out all of Manhattan. Not that I'm saying what happened isn't terrible - it is - but the possibilities for terrorists these days are WAY past frightening.
Terrorists don't care about diplomacy, they don't care about the politics, they care about causing pain, terror, death, and destruction. How are we supposed to fight an enemy that is free to use weapons many times more destructive than our government will dare to use? How can you guarantee safety?
Look at me...I'm turning into a paranoid conspiracy theorist.