Building Big Sites on a Budget
Joe Mamma writes: "There is a good article on Anandtech.com about how they upgraded their backend. They are running a bunch of AMD chips in their servers and make good use of the Linux Virtual Server Project software for their load balancers. Anyway, it's a good read for those who are looking to expand their web backend on a budget."
Enjoy! :-)
"We have the right to believe at our own risk any hypothesis that is live enough to tempt our will."
http://gabrielcain.com/
Just check their Advertising page. They claim more than 40 million page views a month, and they wouldn't lie on that page: it would give them a bad reputation with advertisers if they were found out.
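To put a monthly page-view figure in perspective, a rough back-of-the-envelope conversion to an average request rate may help. This is a sketch assuming a 30-day month and perfectly even traffic; the function name `avg_rate_per_second` is made up for illustration, and real peaks (a Slashdotting, say) will sit well above the average.

```python
# Back-of-the-envelope: convert a page-view count over some number of days
# into an average rate per second. Assumes traffic is spread evenly, which
# real traffic never is, so treat the result as a floor, not a peak.
SECONDS_PER_DAY = 24 * 60 * 60


def avg_rate_per_second(page_views: int, days: int) -> float:
    """Average page views per second over a period of `days` days."""
    return page_views / (days * SECONDS_PER_DAY)


if __name__ == "__main__":
    rate = avg_rate_per_second(40_000_000, 30)
    print(f"{rate:.1f} page views/second on average")  # prints 15.4 ...
```

The same arithmetic puts 250,000 dynamic page views a day at roughly 2.9 per second on average, which helps explain why a couple of well-tuned boxes can carry that kind of load.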
"I love my job, but I hate talking to people like you" (Freddie Mercury)
We serve between 200,000 and 250,000 dynamic page views a day from one single-CPU front-end box and one dual-CPU mod_perl box, and have room for roughly 3 times more traffic. In the end it's all down to programming, page size and cacheability...
For a good example of a very scalable configuration, look at Google. Their software must be extremely well designed.
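Cacheability is the lever this comment points to. As a hedged illustration (not the poster's actual mod_perl setup), here is a minimal time-to-live cache for rendered pages in Python; `PageCache`, the `render` callback, and the injectable `clock` are hypothetical names. Within the TTL, repeat requests for the same URL skip the expensive dynamic render entirely.

```python
import time
from typing import Callable, Dict, Tuple


class PageCache:
    """Tiny in-memory cache for rendered pages with a fixed time-to-live.

    Hypothetical sketch: keys are URLs, values are rendered HTML strings.
    An entry older than `ttl` seconds is re-rendered on its next access.
    """

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self.clock = clock  # injectable so the cache can be tested deterministically
        self._entries: Dict[str, Tuple[float, str]] = {}
        self.hits = 0
        self.misses = 0

    def get(self, url: str, render: Callable[[str], str]) -> str:
        now = self.clock()
        entry = self._entries.get(url)
        if entry is not None and now - entry[0] < self.ttl:
            # Fresh entry: serve the stored page without touching the backend.
            self.hits += 1
            return entry[1]
        # Miss or stale entry: do the expensive dynamic render once and store it.
        self.misses += 1
        html = render(url)
        self._entries[url] = (now, html)
        return html
```

With, say, a 60-second TTL, a front page requested many times a second is rendered at most once a minute per server; the trade-off is that visitors may see content up to a minute old.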
For five front-end web servers, I'm looking at 1GHz AMD Athlons with 512MB RAM on an ASUS KT133A motherboard. A 3Com 905C NIC and an IBM Deskstar drive round out the package nicely, at $900 each with a 2U case.
The backend database server is trickier. The FastTrak 100, a nice IDE RAID controller, is not supported under Linux. What are the good (and cheap) alternatives for RAID 1 or 10? Will dual 1GHz Pentiums really beat a 1.3GHz Athlon? How about a nice NIC that works under Linux? $2000 would be fine...
I suspect I'm not alone in wishing there were some good, solid sites with recommended systems for Linux. As it is, I end up wading through a lot of Windows tech sites only to find that something is not supported under Linux. The hardware compatibility lists don't discriminate between worthwhile products and overpriced junk. And it would be great to know which products work well and which manufacturers actively track kernel development, etc.
Pointers, comments and experiences welcome.
This is kinda useless. Yes, they tell us that they are running 15 servers in total, all on 1GHz PCs, but they don't tell you what kind of hits they take.
K5, for example, has been able to take several direct Slashdottings on one VA FullOn box. That one box runs MySQL, Apache with mod_perl, and a plain Apache for serving images (DNS is handled by other boxen). We handle about 65,000 to 70,000 hits a day on average (mod_perl only, no image traffic) with that one dual-processor box. Compare that with Anandtech's two dedicated dual-processor DB servers, 11 web servers, two load balancers, etc. And we're at 8 months of uptime with our single server. Sounds a bit better than requiring a load balancer that has to remove downed NT servers from the pool.
I could theorize on how well their Cold Fusion/NT solution stacks up against my Slackware/Apache/mod_perl/MySQL solution IF they were kind enough to give info on hits. Without that, this is just another point-and-drool at some rackmount stuff which performs a job somewhere, somehow.
--
Internet Explorer (n): Another bug -- that is, a feature that can't be turned off -- in Windows.
Rather than upgrading all their hardware ($$$), they could've just switched to Apache/PHP and upped the efficiency of their existing hardware. It is hardly surprising that they were having trouble with Solaris Cold Fusion implementations. The engine is flaky enough on its native platform!
;)
IMO, PHP, Perl, or perhaps Python (I have no experience with it, though) would all make better alternatives to Cold Fusion (aka Server Side Scripting For Dummies, not that it doesn't have its place).
I suppose they could also install a BSD or Linux to cut down on useless cruft (like a GUI, etc.) running on their web servers. (Flame-retardant suit on.) But then again, I suppose many would argue that Linux contains a huge amount of cruft in the first place.
Almost every car has a water pump. Any schmuck can design a $400 water pump that does its job. A real engineer can design one that does its job for 20 bucks.
As a network engineer, coder, or architect, your job is to make your client's project a success and deliver the most value possible. Your job is not to cover your ass, or to get the project done with as little of your sweat as possible.
That's the unstated contract that lets tech jobs command the salaries they do, and the "Let's just throw Solaris at it" or "Microsoft is easier - they say so!" attitude is undermining the tech industry.
For web hosting, networking and network servers, Linux or a BSD on x86/AMD gives more bang for the buck than any other offering right now, even though they have less froofy tools and not as much in the clicky GUI department. Yes, you have to be careful with hardware choices - you don't with NT? Yes, you need to know what you're doing - why do you have this job if you don't know what you're doing?
MySQL does a great job, no matter how much it may suck "theoretically", or how it may fail the ACID test. Compare the number of MySQL horror stories to the number of MySQL positive anecdotes you hear. It works, it's fast, it does the job.
Building solid, cheap boxen and deploying them in a stable configuration at low cost is an art. Call them 'Frankenstein boxes' if you want, but they get the job done, they let me come in with quotes at half of what my competition charges, and the stability and value get me return business.
Buying Solaris and having them do all the heavy lifting for you generally means you haven't done the value calculation for your clients. Solaris is good, and Sun hardware is good - but neither is *that* good.
Buying Microsoft - I hate to bash MS for the sake of bashing MS. Win2k *is* their best offering yet. But it still bluescreens, DLL Hell still exists, it still needs to be reformatted every few months, it costs too much, and it eats RAM. Not being able to maintain it remotely or to see the source so I can figure out what's causing strange behavior, coupled with MS' predatory business practices, just makes it a non-option for me. YMMV.
The days of the Mercedes dot-bombs are over, and tossing cash at problems isn't how things are done today. Look around you and start gearing up for the Toyota world of working fast, smart and as cheaply as possible to get the job done.
Under a different brand name, Microsoft proudly presents Clustering on a Budget, and shows what "on a budget" means in Microsoft's dictionary: $28,000 for two clustered PCs.
Nice little chunk of money saved by using Linux Virtual Server over ArrowPoint. However, I would like to know how a high-content site would hold up with a lot of those Perl scripts running to cache; that's one of the possible problems you won't find with ArrowPoint, Alteon Ad Directors, or NetApps. That's not to say they're better, but the article did mention "Big Budget". Aside from that, some information on traffic handling would have been nice; e.g. the amount of data passed into the network would give insight into why they may have chosen certain routes (not routers or routing protocols, I mean choices) over others.
;) But then again, this was a semi "Big Budget" article, not a Poor Man's Network, which in my case would be my Cisco 1xxx series running Zebra and GBGP (what you know about that.. Ghettotized BGP, werd), a 400MHz i386 running OpenBSD for the website, my spanking U1 for db stuff, ghettotized RJ45s I found, stolen bandwidth running out of "Moving Day to Day Networks" from my garage, and a C64 for DNS (fear).
I remember some of the guys where I work did some overhauling. When we were doing the firewalls, instead of ordering 4-5 Nokias or looking into other firewalls, we ended up getting one Nokia 650. Since we were running FreeBSD, we threw ipf on all the boxes and created rules to eliminate the ACL load and the firewall load, which was actually cheaper than buying x number of new firewalls. And since we jumpstarted most of the machines, we had a slew of tightened security scripts for Sun and the BSDs, giving us an automatically locked-down network no matter how much shit was upgraded.
One of the things I wonder about, though, is the "dual processor" factor, which has many people going gaga. Dual 700MHz chips may sound nice, but for only serving up web content, I wonder how that is better than a single 700MHz chip, or a 1GHz Athlon for that matter (anyone care to comment?)
As for switching from Oracle to SQL7, it sounds like a good move. However, again there's no mention of how much data goes into their database, so while it may suit them, what about mega sites like Yahoo? I wonder how they would stand up with SQL on the BSDs or Linux versus a nice Sun E10k running Oracle.
Well, they certainly have a pretty cool network. I wish they had included actual network information as well, such as router info and traffic stats. They would have blown my mind had they said they're running strictly Zebra on a *nix box versus a Cisco or Juniper.
Blackbox Themes
360 degrees of Karma
Load balancing via IP routing tricks is kind of nice. Mosix goes one step further and allows live processes to migrate across a cluster. Experimental add-ons will also do socket migration and have distributed file system support. I think that's the kind of approach to clustering you are going to see in the long run.
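For readers wondering what the "IP routing tricks" balancer actually decides, here is a toy model of one common policy, least-connection scheduling (LVS offers a scheduler by that name among several others). This is a hedged sketch in Python, not LVS code: LVS makes this decision in the kernel on incoming packets, while the class and method names below (`LeastConnectionBalancer`, `pick`, `release`, `mark_down`) are hypothetical and only model the choice of real server and the removal of failed servers from the pool.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class RealServer:
    name: str
    active: int = 0       # connections currently assigned to this server
    healthy: bool = True  # set to False when a (hypothetical) health check fails


class LeastConnectionBalancer:
    """Toy least-connection scheduler: send each new connection to the
    healthy server with the fewest active connections."""

    def __init__(self, names: Iterable[str]):
        self.servers: Dict[str, RealServer] = {n: RealServer(n) for n in names}

    def mark_down(self, name: str) -> None:
        # Take a failed server out of rotation without forgetting its state.
        self.servers[name].healthy = False

    def mark_up(self, name: str) -> None:
        self.servers[name].healthy = True

    def pick(self) -> Optional[str]:
        candidates = [s for s in self.servers.values() if s.healthy]
        if not candidates:
            return None  # whole pool is down
        # Ties go to the earliest-registered server (min keeps the first minimum).
        best = min(candidates, key=lambda s: s.active)
        best.active += 1
        return best.name

    def release(self, name: str) -> None:
        # Called when a connection to `name` closes.
        server = self.servers[name]
        if server.active > 0:
            server.active -= 1
```

Least-connection adapts to uneven request costs better than plain round-robin, since a server stuck on slow dynamic pages accumulates connections and stops being chosen until it catches up.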