Slashdot's Setup, Part 1- Hardware

Posted by ryuzaki0 on Friday October 19, 2007 @04:00AM from the lookit-all-them-wires-in-there dept.

As part of our 10-year anniversary coverage, we intend to update our insanely dated FAQ entry that describes our system setup. Today is Part 1, where we talk mostly about the hardware that powers Slashdot. Next week we'll run Part 2, where we'll talk mostly about software. Read on to learn about our routers, our databases, our webservers and more. And as a reminder, don't forget to bid on our charity auction for the EFF, and if you are in Ann Arbor, our anniversary party is tomorrow night.

CT: Most of the following was written by Uriah Welcome, famed sysadmin extraordinaire, responsible for our corporate intertubes. He writes...

Many of you have asked about the infrastructure that supports your favorite time sink... err news site. The question even reached the top ten questions to ask CmdrTaco. So I've been asked to share our secrets on how we keep the site up and running, as well as a look towards the future of Slashdot's infrastructure. Please keep in mind that this infrastructure not only runs Slashdot, but also all the other sites owned by SourceForge, Inc.: SourceForge.net, Thinkgeek.com, Freshmeat.net, Linux.com, Newsforge.com, et al.

Well, let's begin with the most boring and basic details. We're hosted at a Savvis data center in the Bay Area. Our data center is pretty much like every other one. Raised floors, UPSs, giant diesel generators, 24x7 security, man traps, the works. Really, once you've seen one class A data center, you've seen them all. (CT: I've still never seen one. And they won't let us take pictures. Boo savvis.)

Next, our bandwidth and network. We currently have two Active-Active Gigabit uplinks; again nothing unique here, no crazy routing, just symmetric, equal-cost uplinks. The uplinks terminate in our cage at a pair of Cisco 7301s that we use as our gateway/border routers. We do some basic filtering here, but nothing too outrageous; we tier our filtering to try to spread the load. From the border routers, the bits hit our core switches/routers, a pair of Foundry BigIron 8000s. They have been our workhorses throughout the years. The BigIron 8000s have been in production since we built this data center in 2002 and actually, having just looked at it... haven't been rebooted since. These guys used to be our border routers, but alas... their CPUs just weren't up to the task after all these years and growth. Many machines plug directly into these core switches; however, for certain self-contained racks we branch off to Foundry FastIron 9604s. They are basically switches and do nothing but save us ports on the cores.

Now onto the meat: the actual systems. We've gone through many vendors over the years. Some good, some... not so much. We've had our share of problems with everyone. Currently in production we have the following: HP, Dell, IBM, Rackable, and I kid you not, VA Linux Systems. Since this article is about Slashdot, I'll stick to their hardware. The first hop on the way to Slashdot is the load balancing firewalls, which are a pair of Rackable Systems 1Us; P4 Xeon 2.66GHz, 2GB RAM, 2x80GB IDE, running CentOS and LVS. These guys distribute the traffic to the next hop, which are the web servers.

Slashdot currently has 16 web servers, all of which are running Red Hat 9. Two serve static content: javascript, images, and the front page for non-logged-in users. Four serve the front page to logged-in users. And the remaining ten handle comment pages. All web servers are Rackable 1U servers with 2 Xeon 2.66GHz processors, 2GB of RAM, and 2x80GB IDE hard drives. The web servers all NFS mount the NFS server, which is a Rackable 2U with 2 Xeon 2.4GHz processors, 2GB of RAM, and 4x36GB 15K RPM SCSI drives. (CT: Just as a note, we frequently shuffle these 16 servers from one task to another to handle changes in load or performance. Next week's software story will explain in much more detail exactly what we do with those machines. Also as a note: the NFS is read-only, which was really the only safe way to use NFS around 1999 when we started doing it this way.)
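To make the 2 / 4 / 10 split concrete, here is a minimal sketch of how requests could be mapped onto the three groups of web servers and rotated within each group. Only the pool sizes come from the description above; the hostnames, the URL rules in `classify`, and the use of plain round-robin are assumptions for illustration, not a description of how LVS or Slash is actually configured.

```python
from itertools import cycle

# Hypothetical hostnames; only the pool sizes (2 / 4 / 10) come from the article.
POOLS = {
    "static": [f"web{i:02d}" for i in range(1, 3)],
    "front_page": [f"web{i:02d}" for i in range(3, 7)],
    "comments": [f"web{i:02d}" for i in range(7, 17)],
}

# One independent round-robin rotation per pool.
_rotations = {name: cycle(hosts) for name, hosts in POOLS.items()}

# Assumed file suffixes for "javascript and images".
STATIC_SUFFIXES = (".js", ".png", ".gif", ".jpg")


def classify(path: str, logged_in: bool) -> str:
    """Pick a pool for a request (assumed URL rules, not Slashdot's real ones)."""
    if path.endswith(STATIC_SUFFIXES):
        return "static"
    if path in ("/", "/index.pl"):
        # Anonymous readers all get the same pre-built front page.
        return "front_page" if logged_in else "static"
    # Simplification: everything else is treated as a comment page.
    return "comments"


def pick_server(path: str, logged_in: bool) -> str:
    """Return the next server in the pool that handles this request."""
    return next(_rotations[classify(path, logged_in)])
```

Because each pool is just a list, shuffling a machine from one task to another (as CT describes) would amount to moving a hostname between lists in this sketch.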

Besides the 16 web servers, we have 7 databases. They currently are all running CentOS 4. They break down as follows:

2 Dual Opteron 270s with 16GB RAM, 4x36GB 15K RPM SCSI drives. These are doing multiple-master replication, with one acting as Slashdot's single write-only DB, and the other acting as a reader. We have the ability to swap their functions dynamically at any time, providing an acceptable level of failover.

2 Dual Opteron 270s with 8GB RAM, 4x36GB 15K RPM SCSI drives. These are Slashdot's reader DBs. Each derives data from a specific master database (listed above). The idea is that we can add more reader databases as we need to scale. These boxes are barely a year old now, and still are plenty fast for our needs.

Lastly, we have 3 Quad P3 Xeon 700MHz with 4GB RAM, 8x36GB 10K RPM SCSI drives, which are sort of our miscellaneous 'other' boxes. They are used to host our accesslog writer, an accesslog reader, and Slashdot's search database. We need this much for accesslogs because moderation and stats require a lot of CPU time for computation.
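The master and reader arrangement described above can be sketched as a small router: writes go to whichever master currently holds the writer role, reads rotate over the other master plus the dedicated readers, and the two masters can trade roles. This is an illustration under stated assumptions, not how Slash talks to its databases: the class name `DbRouter`, the hostnames in the usage note, and the rule that anything starting with `SELECT` is a read are all hypothetical.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class DbRouter:
    """Send writes to one master and spread reads over the remaining databases."""
    writer: str               # master currently taking all writes
    standby: str              # second master, serving reads until swapped in
    readers: list[str]        # dedicated reader databases
    _counter: count = field(default_factory=count, repr=False)

    def route(self, sql: str) -> str:
        # Assumed classification: statements starting with SELECT are reads.
        is_read = sql.lstrip().upper().startswith("SELECT")
        if not is_read:
            return self.writer
        pool = [self.standby, *self.readers]
        return pool[next(self._counter) % len(pool)]

    def swap_masters(self) -> None:
        """Failover: the standby master becomes the writer and vice versa."""
        self.writer, self.standby = self.standby, self.writer
```

With hypothetical hosts, `DbRouter("db1", "db2", ["db3", "db4"])` sends every `INSERT` or `UPDATE` to `db1` until `swap_masters()` is called, after which writes go to `db2`. Adding a reader to scale out is just appending a hostname to `readers`.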

And that is basically it, in a nutshell. There isn't anything too terribly crazy about the infrastructure. We like to keep things as simple as possible. This design is also very similar to what all the other SourceForge, Inc. sites use, and has proved to scale quite well.

CT: Thanks to Uriah and Chris Brown for the report. Now if only we remember to update the FAQ entry...

7 of 273 comments

Re:Interesting by CatPieMan · 2007-10-19 04:33 · Score: 2, Insightful

Non-logged-in users see the same page, so it's basically a static page that gets updated every couple of minutes.

Logged in users can have a bunch of customization options on the front-end, which would take more resources.

I find it just as interesting that the logged-in readers use up that much more CPU.

--
---You're all I need, When the water runs deep, You're all I need, Now I cry my soul to sleep -- Collective Soul, Needs
Re:Interesting by ZachPruckowski · 2007-10-19 04:34 · Score: 3, Insightful

Who said subscribers have two dedicated servers to read the main page? The article/summary says that two servers serve for ACs reading the main page, and 4 for logged-in users. I saw no subscriber/non-subscriber distinction.
Re:Possibly obtuse question by saterdaies · 2007-10-19 04:42 · Score: 3, Insightful

Usually these decisions are made based on familiarity, availability, and the like. If your staff are all really familiar with RedHat, why would you force them to run BSD or Debian? Each system has pros and cons, but to be honest, the largest pro or con is usually familiarity. It's really easy to get familiar enough with any *nix to get Apache running. The issue is whether you have the knowledge to deal with it when your live webserver suddenly stops responding to requests.

Stability and familiarity are more important than the latest cool distro. Is there a reason that they should have picked BSD over RedHat? Of course there are some. There are others to pick RedHat over a BSD. In the end, you have to go with what you're comfortable and familiar with in order to ensure that you can deal with sudden, unexpected problems.
One way it's better by Synn · 2007-10-19 04:59 · Score: 3, Insightful

It's familiar to people who are used to working with Red Hat.
Re:Redhat 9 by MisterFuRR · 2007-10-19 06:10 · Score: 4, Insightful

If it works, and there's no need to change, why introduce unknown incompatibility? It's a production network, not your home box.
Re:Active-Active Gigabit uplinks by cecil_turtle · 2007-10-19 09:59 · Score: 3, Insightful

If you're on an OC-24 or above loop, your providers may be able to sell you bandwidth in gigabit increments. If you're on an OC-3 or OC-12 loop, you can normally buy in 100 megabit increments. Otherwise "legacy" OC-3 is something like 155 Mbps and OC-12 is 622 Mbps if you're using the whole line (not breaking it off into T-1s, DS-3s, etc).
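The "something like" figures follow from the SONET hierarchy, where an OC-n line runs at n times the OC-1 rate of 51.84 Mbit/s. A quick check (these are gross line rates, so the usable payload is somewhat lower; the function name is just for illustration):

```python
OC1_MBPS = 51.84  # SONET OC-1 line rate in Mbit/s


def oc_line_rate_mbps(n: int) -> float:
    """Gross line rate of an OC-n circuit in Mbit/s."""
    return n * OC1_MBPS


for n in (3, 12, 24):
    print(f"OC-{n}: {oc_line_rate_mbps(n):.2f} Mbit/s")
```

This gives 155.52 Mbit/s for OC-3, 622.08 for OC-12, and 1244.16 for OC-24, which is why gigabit-sized increments only become possible at OC-24 and above.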

Once you have multiple uplinks from different providers, you would typically use BGP to announce your IP space on both providers; then when people try to get to your site, they will come in through the provider that is the fewest hops away for them. If one of the lines goes down, the other one is still active, and everybody will come in over the running line until the broken one gets fixed.
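As a toy model of that failover behavior: each remote network prefers the live uplink with the shorter path and falls back to the other when its first choice goes down. Real BGP compares several attributes, with AS-path length being only one of them, so treat this as a rough sketch; the `Uplink` class and its fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Uplink:
    provider: str      # hypothetical provider name
    as_path_len: int   # AS-path length as seen by one remote network
    up: bool = True


def best_uplink(uplinks):
    """Return the live uplink with the shortest AS path, or None if all are down."""
    live = [u for u in uplinks if u.up]
    return min(live, key=lambda u: u.as_path_len, default=None)
```

With two uplinks of path length 2 and 3, traffic arrives over the first; mark it down and `best_uplink` returns the second, which is the "everybody goes down the running line" case.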
Re:backup? by shokk · 2007-10-20 01:40 · Score: 2, Insightful

At which point, Slashdot is the least of people's worries. This is a news entertainment site, not a critical care facility.

--
"Beware of he who would deny you access to information, for in his heart, he dreams himself your master."