Surviving Slashdotting with a Small Server
S.BartFarst writes "Our little departmental server has been slashdotted twice in the last year and survived! Implementing a two-headed redundant hardware scheme using Linux Virtual Server, with backup and failover capabilities enhanced by the Linux High-Availability tools, has produced a nifty low-cost solution. Gotta love those little white boxes!
(also having a university-supplied BIG PIPE doesn't hurt). More interesting is the documentation of the apparent exponentially decaying attention span of slashdotters. Anybody else observed similar phenomena?"
I was under the impression that you'd need a 20k fiber link, or a 100 Mbps one that can dynamically shift traffic.
http://saveie6.com/
My server has been slashdotted a few times and I can tell you it's pretty simple to not get overloaded.
The first time I learned my lesson. The server was on a T1 line that was 2/3 full already, and slashdot linked to a page full of large photos. That'll kill your link pretty quickly. Low-budget solution: sign up for a burstable web hosting account somewhere and just put all your large images there.
Later, when we got some actual office space for the business, I moved the main server up to a colo facility in Fremont. All slashdottable content is hosted there on a fast server with a 100 Mbps Ethernet link. Other oddball services that need their own machine are hosted from the other end of a point-to-point T1 line going directly back to the office from the colo.
So depending on your budget it's really not hard to set up your site to survive a slashdotting. If you don't have a lot of dough to spend but you want to run your own server for configurability/security reasons, just host the static stuff somewhere else. Or if you're serving enough to make it economical, get a colo account with a burstable link.
There's a widespread misconception here that slashdotting is caused by server overload. In reality this is almost never the case. It's caused by insufficient bandwidth. This in turn may cause server overload, because too many slow clients stay connected at once, but that is purely a secondary effect.
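To put rough numbers on the bandwidth argument, here's a back-of-envelope sketch. The inputs are assumptions borrowed from elsewhere in this thread, not measurements: about 10 hits per second, about 80 KB per hit, a T1 that is already 2/3 full, and 1 KB counted as 1000 bytes. The function names are made up for illustration.

```python
T1_MBPS = 1.544  # nominal T1 line rate in megabits per second


def link_demand_mbps(hits_per_sec, kilobytes_per_hit):
    """Bandwidth needed to serve the given hit rate, in Mbps (1 KB = 1000 bytes)."""
    return hits_per_sec * kilobytes_per_hit * 8 / 1000


def is_saturated(hits_per_sec, kilobytes_per_hit, link_mbps, used_fraction=0.0):
    """True if demand exceeds whatever capacity is still free on the link."""
    free_mbps = link_mbps * (1 - used_fraction)
    return link_demand_mbps(hits_per_sec, kilobytes_per_hit) > free_mbps


# Assumed load: 10 hits/sec at 80 KB each.
t1_two_thirds_full = is_saturated(10, 80, T1_MBPS, used_fraction=2 / 3)
colo_100_mbps = is_saturated(10, 80, 100.0)
```

Under those assumptions the demand is several times what the T1 has left, while the 100 Mbps link has plenty of headroom, and the server's CPU never enters into it.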
I really don't think the Slashdotter attention span is any different from that of the average Internet user (or if it is different, it is longer).
When articles appear on the first page, they get attention. As they scroll to the bottom they get less, and as they move to background pages they get significantly less.
While I often look beyond the front page, I am less likely to delve into the articles or discussions there, since almost everything that needs to be said HAS been said by then.
I've carried on conversations with people regarding Slashdot articles long after the article appears. This can take place in journal entries or via e-mail where the discussion material can be easily kept as opposed to Slashdot comments which ultimately disappear anyway.
The fact that people don't continue to click on the original source URLs doesn't mean anything.
With static web pages, server power is rarely a problem; it's all about the pipes. However, if the pages are dynamically generated and don't have a lot of caching, then you've got yourself a big problem.
So take, for example, loading a forum page in UltimateBB. AntiOnline handily tells you how many db requests it takes to create a page, and how long it took. This one over here says 61 requests and .3 seconds. Now, the poster claims to be peaking at ~37000 page views/hour, which is about 10 hits per second. In that .3 seconds, while those 61 database connections were being established, another 3 requests came in, making it an average of roughly 240 database connections every .3 seconds. That's not an unreasonable number of connections, but what if your DB server can't keep up? What if, due to the load, the queries take 10x longer than usual? At that point, over .3 seconds, you get 240 connections, but only service 24 of them. Over the next .3 seconds, you get another 240 requests, but service only another 24, leaving you with 432 pending. After 30 seconds, you've serviced 2400 requests, but have another 21,600 pending. Before too long, you're out of possible TCP ports.
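If you want to play with that backlog arithmetic, here's a minimal sketch. It assumes a constant 240 DB requests arriving and 24 being serviced per .3 second interval, as in the example above. Real load is burstier, and the function name is just for illustration.

```python
def backlog_after(seconds, interval=0.3, arrivals=240, serviced=24):
    """Return (serviced_total, pending) after `seconds` of sustained load.

    Assumes `arrivals` DB requests come in and `serviced` of them complete
    during every `interval` seconds, with nothing ever timing out.
    """
    intervals = round(seconds / interval)
    serviced_total = intervals * serviced
    pending = intervals * (arrivals - serviced)
    return serviced_total, pending


two_intervals = backlog_after(0.6)   # (48, 432)
thirty_seconds = backlog_after(30)   # (2400, 21600)
```

The backlog grows linearly for as long as arrivals outpace service, so the only open question is which limit you hit first.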
There are ways to keep your servers from crapping out under heavy load. One is to buy a studly, fire-breathing DB server that can process requests faster than your web servers can send them. Another (cheaper) solution would be to pool and marshal your DB requests, being sure to remove requests from the queue when the remote user times out (either by clicking the stop button or running up against a built-in limit of their browser). This way, your site may get slow, but nothing will crash. A final method is to use enough caching on the web server that your pages are, essentially, static. This is, for instance, what Vignette does, which is why all the major news sites use it. This method combines the flexibility of a database-backed CMS with the database load of static web pages.
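Here's a rough sketch of the second option: a bounded queue that refuses new work when it's full and throws away requests whose user has already given up. The class and method names are hypothetical, and a real setup would put this in front of an actual connection pool. Passing `now` explicitly is only there to make it easy to try out.

```python
import time
from collections import deque


class DeadlineQueue:
    """FIFO of pending DB requests; expired ones are dropped, not executed."""

    def __init__(self, max_pending):
        self.max_pending = max_pending
        self._items = deque()  # (deadline, request) pairs, oldest first

    def submit(self, request, timeout, now=None):
        """Queue a request. Returns False (shed load) if the queue is full."""
        now = time.monotonic() if now is None else now
        if len(self._items) >= self.max_pending:
            return False
        self._items.append((now + timeout, request))
        return True

    def next_live(self, now=None):
        """Return the oldest request whose user is still waiting, or None."""
        now = time.monotonic() if now is None else now
        while self._items:
            deadline, request = self._items.popleft()
            if deadline > now:
                return request
            # Deadline passed: the user timed out, so skip the query entirely.
        return None


q = DeadlineQueue(max_pending=2)
accepted_a = q.submit("page for user A", timeout=1, now=0)
accepted_b = q.submit("page for user B", timeout=10, now=0)
accepted_c = q.submit("page for user C", timeout=10, now=0)  # queue is full
first = q.next_live(now=5)  # A's deadline has passed, so B comes out
```

Because the queue is bounded and stale entries never reach the database, the site gets slow under load instead of piling up pending connections.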
So, essentially, there are many ways to let your database-backed web site survive a slashdotting, but embedding a bunch of PHP SQL queries against a locally-running installation of MySQL is not one of them. Unless you have a big honkin' cluster.
Congratulations on surviving /.ing. I have a few questions.
How were LVS and HA configured? With two systems, I can only guess that each was a real server (using the LVS terminology). Also both would be load balancers, with one being selected as active using HA.
How did using HA or LVS help survive a /.ing? Were there failovers? How many? When? Why? If surviving /.ing consisted of a high rate of failovers then the hardware wasn't up to the job.
What is the "automated backup system?" Are you rsyncing the contents? From each other? From another system? Or does it refer to regular "tar" backups to tape?
Having separate UPSs is overkill, unless the one UPS could not handle the load of both systems.
Is there any dynamic content on the servers? Databases? How was keeping these synchronized handled?
What I'd really like to see would be a graph of a BIG site when we Slashdot them now. It would be very interesting to see the subscribers and what they do before the /.ing public sees it. I couldn't seem to see one on the graph that they posted. Is it just that small? Just wondering.
Comment forecast: Bits of genius surrounded by a sea of mediocrity.
It simply stops serving images at anything but a really slow rate
What's the point? Either way you're slashdotted.
Besides, I think in the case of server overload (as opposed to network overload), throttling will only exacerbate the problem by increasing the number of slow clients you have to deal with. This is the #1 bottleneck in web servers: the more clients you have, the longer it takes to deal with each one of them. Lots of processes to switch between, long arrays in and out of select(), etc.
Also, when a user doesn't get a page in his browser, what does he do? He clicks the link again and again.... even more connections to handle.
Really the only way to cure an overloaded server is to drop incoming SYNs. Any other measure is just pouring gasoline on the fire.
Even though only a small percentage of Slashdot readers look at the comments, Slashdot's readership is so huge that the number of people reading the comments is still significant. It's not enough to kill a server, but I posted links to three images, around 80KB each, on my home server a few days ago fairly deep down in the discussion and got 3904 hits from it. It didn't kill my server (Pentium 133MHz, 64MB RAM, Debian 3.0, Apache 1.3.26, 3000/256 cable) and didn't result in any nasty letters from my ISP.
OT: It was interesting reading the logs. There are quite a few Linux users on here (but even more Windows users), and I saw lots of people using Mozilla, Opera, Safari, etc. Compare that to sites aimed at the average user where 95% of visitors are using IE or AOL and don't know that there's anything better out there.
It's an operating system, not a religion.