High-Performance Web Server How-To
ssassen writes "Aspiring to build a high-performance web server? Hardware Analysis has an article posted that details how to build a high-performance web server from the ground up. They tackle the tough design choices and what hardware to pick and end up with a web server designed to serve daily changing content with lots of images, movies, active forums and millions of page views every month."
I'd suggest everybody with the need of a high-performance web server to try out
fnord. It's extremely small, and pretty fast (without any special performance hacks!), see here.
A monkey is doing the real work for me.
The guys use 10'000 RPM drives for "reliability" and "performance" ... 10k drives are LESS reliable, since they move faster. Moreover, they're not even necessarily that much faster.
Computer hardware is so fast relative to the amount of traffic coming to almost any site that any web server is a high-performance web server, if you are just serving static pages. A website made of static pages would surely fit into a gigabyte or so of disk cache, so disk speed is largely irrelevant, and so is processor speed. All the machine needs to do is stuff data down the network pipe as fast as possible, and any box you buy can do that adequately. Maybe if you have really heavy traffic you'd need to use Tux or some other accelerated server optimized for static files.
With dynamically generated web content it's different of course. But there you will normally be fetching from a database to generate the web pages. In which case you should consult articles on speeding up database access.
In other words: an article on 'building a fast database server' or 'building a machine to run disk-intensive search scripts' I can understand. But there is really nothing special about web servers.
-- Ed Avis ed@membled.com
.. if their webservers are as reliable as the ones in the article..
:P
i guess there's only one way to find out..
slashdotters! advance!
There is no useful information in that infomercial. They seem to have judged "reliability" through vendor brochures and in a couple days; reliability is when your uptime is > 1 year.
This article should be called "M. Joe-Average-Overclocker Builds A Web Server".
This quote is funny:
That brings us to the next important component in a web server, the CPU(s). For our new server we were determined to go with an SMP solution, simply because a single CPU would quickly be overloaded when the database is queried by multiple clients simultaneously.
It's well known that single CPU computers can't handle simultaneous queries, eh!
If we were to use, for example, Microsoft Windows 2000 Pro, our server would need to be at least three times more powerful to be able to offer the same level of performance.
"three times?" Can somebody point me to some evidence for this sort of rather bald assertion?
The article seemed way too focused on hardware.
Anyone who's ever worked on a big server in this cash-strapped world will know that squeezing every last ounce of capacity out of apache and your web applications needs to be done.
* I prefer SCSI over IDE
* RedHat is a pain to strip down to a bare minimum web server, I prefer OpenBSD. Sleek and elegant like the early days of Linux distros.
* I've used Dell PowerEdge 2650 rackmount servers and they're VERY well made and easy to use. Redundant power supplies, SCSI removable drives, good physical security (lots of locks).
Yes, their performance server is so good and fast that it has been slashdotted within minutes of posting the article...
Every time that you click on a link and get bumped back to the front page here on Slashdot, it's a failure of mysql. So much for high-performance.
Why hasn't Slashdot changed to postgresql?
I thought this was a good question, if slightly off-topic.
I know that in the server market you often go for tried-and-tested rather than latest-and-greatest, and that the Pentium III still sees some use in new servers. But 1.26GHz with PC133 SDRAM? Surely they'd have got better performance from a single 2.8GHz Northwood with Rambus or DDR memory, and it would have required less cooling and fewer moving parts. Even a single Athlon 2200+ might compare favourably in many applications.
SMP isn't a good thing in itself, as the article seemed to imply: it's what you use when there isn't a single processor available that's fast enough. One processor at full speed is almost always better than two at half the speed.
-- Ed Avis ed@membled.com
If their servers are so good, why is their site down after only 20 of being /.ed?
Quidquid latine dictum sit altum viditur
Step one: Submit story on high performance web servers.
Step two: ???
Step three: Die of massive slashdotting, loss of reputation and business
Still, if someone has a link to a cache...
Karma:This parrot is dead! (and so is the joke.)
... Don't forget to post an article on /. so you can actually measure high-volume bulk traffic.
[~] edwin@topaz>time telnet www.hardwareanalysis.com 80
Trying 217.115.198.3...
Connected to powered.by.nxs.nl.
Escape character is '^]'.
GET /content/article/1549/ HTTP/1.0
Host: www.hardwareanalysis.com
[...]
Connection closed by foreign host.
real 1m21.354s
user 0m0.000s
sys 0m0.050s
Do as we say, don't do as we do.
bash$
Maybe it's their idea of a stress test. It's kinda like testing a car's crash durability by parking it in front of an advancing tank.
An article about creating high performance webservers being slashdotted
Microsoft IIS is to webserving as KFC is to healthy eating
Many other people will likely post a comment like mine, if they haven't already. But hey, karma was made to burn!
According to my computer clock and the timestamp on the article posting, it's only been about 33 minutes (since the article was posted). Even so, it took me over a minute to finally receive the "Hardware Analysis" main page. The top of that page has:
Draw your own conclusions.
Furry cows moo and decompress.
ping www.hardwareanalysis.com
Pinging www.hardwareanalysis.com [217.115.198.3] with 32 bytes of data:
Request timed out.
Reply from 217.115.198.3: bytes=32 time=765ms TTL=240
Reply from 217.115.198.3: bytes=32 time=1038ms TTL=240
Reply from 217.115.198.3: bytes=32 time=2036ms TTL=240
Ping statistics for 217.115.198.3:
Packets: Sent = 4, Received = 3, Lost = 1 (25% loss),
Approximate round trip times in milli-seconds:
Minimum = 765ms, Maximum = 2036ms, Average = 1279ms
> One processor at full speed is almost always better than two at half the speed.
You can safely drop that 'almost'.
Soo slow!
"With Microsoft, you get Windows. With Linux, you get the full house" - unknown
1. goto here :)
2. click buy
3. upon delivery open box and plugin
4. turn on Apache with the click of a button
5. happily serve up lots of content
6. (optional) wait for attacks from ppl for suggesting Apple hardware...
These guys got taken down a few weeks back:
Hard Drives Evaluated for Noise, Heat and Performance
I'm sure spreading out their content over nine pages is definitely helping their server load.
I don't understand.
Their article is about building a high performance web server, and they tell people to use Apache.
Apache is featureful, but it has never been designed to be fast.
Zeus is designed for high performance.
The article supposes that money is not a problem. So go for Zeus. The Apache recommendation is totally out of context.
{{.sig}}
Why hasn't Slashdot changed to postgresql?
Or better yet, SlashSQL?
"With Microsoft, you get Windows. With Linux, you get the full house" - unknown
Watch our high performance webserver get slashdotted, in real time!
How long until it melts? Let's see if those aftermarket heatsinks really paid off.
There are 3 registered and 1643 anonymous users currently online. Current bandwidth usage: 1215.81 kbit/s
There are moments in our lives, especially when one is under some strain (due to the imminent locality of others predominantly), when we might utter a nonsensical word. Eg. I found myself saying the word "unrelentless" just this evening. What I meant to say was "unrelenting" or "relentless". The ghost in the machine is a busy guy nowadays with 6 billion of us.
Ace's Hardware has (IMHO) a better article about this subject: http://www.aceshardware.com/read.jsp?id=45000240
Have a good weekend,
Sander Sassen
Email: ssassen@hardwareanalysis.com
Visit us at: http://www.hardwareanalysis.com
I know there are faster webservers than Apache, but you can't beat the price/performance ratio...
Warning: Too many connections in /web/admin.hardwareanalysis.com/include/db.php on line 9
Unable to connect to database. Too many connections
what does the hardware mean anyway... if the software is not configured right?
Query error: Commands out of sync; You can't run this command now
Creationists are a lot like zombies. Slow, but powerful and numerous. And they all want to eat our brains.
Here's a quicker howto.
Get the fastest AthlonXP out there.
Get a motherboard with onboard SCSI.
Get 15,000RPM SCSI 160MB/s drives
Get a NIC
Install linux
Install apache
Install mysql, php, perl, etc.
And there you have it. Is it really necessary to write a long article when all you're basically saying is "get the fastest hardware out there and slap it into one machine"? Come on folks.
I guess the ppl running the webserver with the article should have used the info in it, cuz I just can't access it due to high load :)
1) use multiple machines / round robin DNS
2) use decent speed hardware but stay away from
'top of the line' stuff (fastest processor,
fastest drives) because they usually are not
more reliable
3) replicate your databases to all machines so
db access is always LOCAL
4) use a front end cache to make sure you use
as little database interaction as you can
get away with (say flush the cache once per
minute)
5) use decent switching hardware and routers, no
point in having a beast of a server hooked up
to a hub now is there...
that's it ! reasonable price and lots of performance
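Point 4 (a front end cache flushed about once per minute) can be sketched roughly like this. It is only a minimal in-process illustration in Python; the names TTLCache and render_front_page are made up for the example, and a real setup would more likely put a caching proxy in front of the web server.

```python
import time


class TTLCache:
    """Tiny front-end cache: an entry is rebuilt once it is older than ttl seconds."""

    def __init__(self, ttl=60.0, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._store = {}  # key -> (expiry_time, value)

    def get(self, key, loader):
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]              # still fresh: no database work at all
        value = loader(key)            # expired or missing: rebuild once
        self._store[key] = (now + self.ttl, value)
        return value


db_hits = 0


def render_front_page(key):
    # Hypothetical stand-in for the expensive database-backed page build.
    global db_hits
    db_hits += 1
    return f"<html>page for {key}</html>"


cache = TTLCache(ttl=60.0)
for _ in range(1000):
    cache.get("/", render_front_page)
print(db_hits)  # the database is touched once for the whole burst
```

With a one-minute lifetime, the database load for a hot page stays at roughly one query per minute no matter how many visitors request it.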
MP3 Search Engine
I was really excited to see this article, because oddly enough I am seriously considering setting up my own webserver. In fact am thinking of running slashcode. So far everyone has been saying that the article generally sucks. So the question remains where should I start? I was thinking of buying a few of my company's used PCs and building a cluster... that scares me a bit, as I'm not a computer genius, but I can get a great deal on these computers (between 5 and 10 500mhz wintel computers)
OK, I know that was rambling, so to recap simply: is it better to go with an expensive single MP solution like the article, or with a cheaper cluster of slow/cheap computers?
Business News and Resources: www.usasource.net
What kind of 'high performance' web server uses back-leveled software? Apache 2.x may not be totally API compliant, but it certainly provides more than 1.3x in terms of performance.
I am glad they used an IDE RAID, however. The SCSI myth can now go on the shelf.
- Use lots, and I mean lots of graphics. Cute ones, animated ones, you name it and people expect to see them. Skimping here will hurt your image.
- CSS style sheets may be the way of the future, but just for now make sure you include dozens or even hundreds of font tags, color tags, and tables in your site. Trust us. This has the added benefit of increasing your page file size by at least 30%. You do want a robust site right?
- Make sure you are serving plenty of third party ads! Their bandwidth matters also, and you know the way to make money on the web is be serving lots of "fun" animated ads. This will not slow down the user experience of your site one bit! Those ad people are slick, they know that you are building a high bandwidth / high performance site and will be expecting the traffic.
- A site is not a high performance site until it has withstood the infamous Slashdot effect. You will want to post a link to your site on /. post haste to begin testing.
That should be enough to get you started. Now you too can build a rocking 200K per page site, and having read our hardware guidelines, you can expect it to perform just as well as ours did. One more free tip: Placing a cool dynamic hit counter or traffic meter on your site in a prominent position will encourage casual visitors to hit the reload button again and again, driving the performance of your site through the roof.
Guy who didn't read the article makes an uninformed M$ bash and gets modded to four...
(they're running linux, and there must have been some other problem because it's usable now)
It's a shame I'm banned from moderation for my failure to jump on the Linux bandwagon; vast numbers of readers of this site are using IE6.0, and that doesn't come in any linux distro I know of. I'm just honest about my use of software from the beast of Redmond.
Death to Argument by Slogan!! (This post twice-encrypted with ROT-13. Replies not using same will be ignored)
Thanks, I wish I hadn't posted early in this article so I could use my mod points. Now, my only question is how fast is decent speed? I'm about to build my own server (actually I'm going to have some help, but I want to at least sound like I know what I'm doing), nothing fancy. I don't expect a huge hit count or anything, so would using older (500-750 MHz) second hand computers, with properly upgraded memory and storage, work? Also, would you recommend replacing the power supply? One of the guys who's helping me swears that will save me money in the long run on energy costs, but I don't know if it's worth the cost.
Business News and Resources: www.usasource.net
Does building this high performance web server prevent you from being slashdotted?
at about 2:05 gmt
...that a simple slashdotting took down this "monster" server?
Their "high performance server" seems to be fixed now... I'm getting a 500 error almost instantly! Good work, guys!
Check out the SEDA architecture with Haboob as web-server. It seems to outperform Apache.
Haven't got the traffic myself to test it though. :-)
Could not connect to server..... 14:18 GMT
This looks like either a PostNuke or PHPNuke web site. And while I was visiting it was serving up at a rate of around 500k before it ran out of DB connections. Guess they should have done some research on expanding the DB connections to MySQL from PHP. I'm sure the slashdotting will give them some insight into that. I'm sure they will also come and read all the constructive comments here on /. so give em some good ones.
Draw your own conclusions.
How nice of them to share that information.
The obvious conclusion is that my cable modem could take a minor slashdoting if Cox did not crimp the upload and block ports. Information could be free but thanks to the local Bell's efforts to kill DSL things will get worse until someone fixes the last mile problem.
The bit about IDE being faster than SCSI was a shocker. You would think that some lower RPM SCSIs set to stripe would have greater speed and equivalent heating. The good IDE performance is good news.
Friends don't help friends install M$ junk.
3) replicate your databases to all machines so
db access is always LOCAL
This is probably a bad idea. Accessing the database over a socket is going to be much less resource intensive than accessing it locally. With the database locally, the database server uses up CPU time and disk I/O time. Disk I/O on a web server is very important. If the entire database isn't cached in memory, then it is going to be hitting the disk. The memory used up caching the database cannot be used by the OS to cache web content. A separate database server with a lot of RAM will almost always work better than a local one with less RAM.
This Apache nonsense of cramming everything into the webserver is very bad engineering practice. A web server should serve web content. A web application should generate web content. A database server should serve data. These are all separate processes that should not be combined.
The company I work for successfully runs our webserver(php & MySQL) on an old pentium 166. We have several thousand visitors every month & use it for an ftp site for suppliers, a router, firewall, gateway & squid server.
:)
I think that your 700mhz machine would work fine for just web pages.
http://www.hardwareanalysis.com/ slashdotted at : UTC Sun Oct 20 00:46:49 2002 -0.796378 seconds
instead of pointing us to these hypocrites why don't the slashdot server admins themselves write some good stuff and put it up for us to see. if you people are too busy then ask the google geeks.
I'd post sooner, but it took forever to get to the article.. here are my thoughts...
First off SCSI.
IDE drives are fast in a single user/workstation environment. As a file server for thousands of people sharing an array of drives? I'm sure the output was solid for a single user when they benched it... looks like /. is letting them know what multiple users do to IDE. 'Overhead of SCSI controller'... Methinks they do not know how SCSI works. The folks who share this box will suffer.
Heat issues with SCSI. This is why you put the hardware in a nice climate controlled room that is sound proof. Yes, this stuff runs a bit hot. I swear some vendors are dumping 8K RPM fans with ducting engineered to get heat out of the box and into the air conditioned 8'x19" chassis that holds the other 5-30 machines as well.
I liked the note about reliability too... it ran, it ran cool, it ran stable for 2 weeks. I've got 7x9G Cheetahs that were placed into a production video editing system and ran HARD for the last 5+ years. Mind you, they ran about $1,200 each new... but the down time costs are measured in minutes... Mission critical, failure is not an option.
OS
Let's assume the Windows 2000 Pro was service packed to at least SP2... If that is the case, the TCP/IP stack is neutered. Microsoft wanted to push people to Server and Advanced Server... I noticed the problem when I patched my counter strike server and performance dogged on w2kpro w/sp2 - you can find more info in Microsoft's KB... (The box was used for other things too, so be gentle) Nuking the TCP/IP stack was the straw that cracked my back to just port another box to Linux and run it there.
Red Hat does make it easy to get a Linux box up and running, but if this thing is going outside the firewall, 7.3 was a lot of work to strip out all the stuff that is bundled with a "server" install. I don't like running any program I did not actually install myself. For personal boxes living at my ISP, I use slackerware (might be moving to gentoo however). Not to say I'm digging through the code or checking MD5 hashes as often as I could, but the box won't even need an xserver, mozilla, tux racer, or anything other than what it needs to deliver content and get new stuff up to the server.
CPU's (really a chassis problem):
I've owned AMD's MP and Intel's Xeon dually boards. These things do crank out some heat. Since web serving is usually not processor bound, it does not really matter. Pointing back to the over heating issues with the hard drives, these guys must have a $75 rack mount 19" chassis. Who needs a floppy or CD-ROM in a web server? Where are the fans? Look at the cable mess! For god's sake, at least spend $20 and get rounded cables so you have better airflow.
+++ UGUCAUCGUAUUUCU
A Tualatin P3-S 1.26GHz does not even need a heat sink, just a low RPM fan blowing over it.
A P4, even worse an Athlon XP/MP, produce much more heat than a P3-S, requiring heat sinks and very loud fans. Want a 1U solution?
And a faster processor is not going to give you better performance in a web server.
Wonder why the P3-S did not make it above 1.4GHz? Because it was outperforming P4s 1.7GHz.
Remember this article: http://slashdot.org/article.pl?sid=02/10/01/1637253&mode=thread&tid=137
The owner of Hardware Analysis is Sander Sassen. He apparently has two usernames and is posting articles to his own site. Does anybody see anything wrong with that?
It's funny that an article about setting up a high performance web server is on a server that can't even handle the slashdot effect.
No better way than to get the Slashdot crowd to do a quick bandwidth, hardware and security test! ../../ paths :)
All that free NMAPing, clicking, and trying
Get your own free personal location tracker
Not so...
You can cache with technologies like Sleepycat's DBM (db3).
We have a PHP application that caches lookup tables on each local server. If it can't find the data in the local cache, then it hits our Postgresql database. The local DBM cache gets refreshed every hour.
Typical comparison
-------------------
DB access time for query: .02 secs
Local cache (db3) time: .00003 secs
Our server load dropped from a typical 0.7 to an acceptable 0.2, and the load on the DB server dropped like a rock! This is with over a million requests (no graphics, just GETs to the PHP script) every day.
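For anyone who wants to try the same trick, here is a rough sketch of the "look in the local DBM file first, fall back to the database" pattern. It is written in Python with the standard dbm module rather than PHP and db3, query_database is a hypothetical stand-in for the real lookup, and it expires each entry after an hour instead of refreshing the whole file, so treat it as an illustration of the idea and not as the setup described above.

```python
import dbm
import os
import tempfile
import time

REFRESH_SECONDS = 3600  # one hour, matching the refresh interval in the post


def query_database(key):
    # Hypothetical stand-in for the slow remote database lookup.
    return f"value-for-{key}"


def cached_lookup(cache_path, key, now=None):
    """Return (value, source), where source is 'cache' or 'database'."""
    now = time.time() if now is None else now
    with dbm.open(cache_path, "c") as cache:
        raw = cache.get(key.encode())
        if raw is not None:
            stamp, _, value = raw.decode().partition("|")
            if now - float(stamp) < REFRESH_SECONDS:
                return value, "cache"          # fresh local copy
        value = query_database(key)            # miss or stale: go to the DB
        cache[key.encode()] = f"{now}|{value}".encode()
        return value, "database"


cache_file = os.path.join(tempfile.mkdtemp(), "lookup_cache")
print(cached_lookup(cache_file, "country:NL", now=0))     # filled from the database
print(cached_lookup(cache_file, "country:NL", now=10))    # served from the local file
print(cached_lookup(cache_file, "country:NL", now=4000))  # stale, so the database again
```

The now parameter only exists so the expiry can be demonstrated without waiting an hour.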
We also tuned the heck out of Apache (Keepalive, # of children, life of children etc).
Some other things we realized after extensive testing:
1. Apache 2.0 sucks big time! Until modules like PHP and mod_perl are properly optimized, there's not much point in moving there.
2. AolServer is great for Tcl, but not for PHP or other plugin technologies
Because of all these changes, we were able to switch from a backhand cluster of 4 machines, back down to a single dual processor machine, with another machine available on hot standby. Beat that!
They tackle the tough design choices and what hardware to pick and end up with a web server designed to serve daily changing content with lots of images, movies, active forums and millions of page views every month.
Yeah, but how about millions of page views per day?
I'll just mention a couple of items:
1) For a high performance web server one *needs* SCSI. SCSI can handle multiple requests at one time and performs some disk-related processing itself, whereas IDE can only handle requests one at a time and uses the CPU for disk-related processing a lot more than SCSI does.
SCSI disks also have higher mean times to failure than IDE disks. The folks writing this article may have gotten benchmark results showing their RAID 0+1 array matched the SCSI setup *they* used for comparison, but most of the reasons for choosing SCSI are what I mention above -- not the comparative benchmark results.
2) For a high performance webserver, FreeBSD would be a *much* better choice than Redhat Linux. If they wanted to use Linux, Slackware or Debian would have been a better choice than Redhat Linux for a webserver. Ask folks in the trenches, and lots will concur with what I've written on this point due to maintenance, upgrading, and security concerns over time on a production webserver.
3) Since their audience is US based, It would make sense to co-lo their server in the USA. Both from the standpoint of how many hops packets take from their server to their audience, and from the logistical issues of hardware support -- from replacing drives to calling the data center if there are problems. Choosing a USA data center over one in Amsterdam *should* be a no brainer. Guess that's what happens when anybody can publish to the web. Newbies beware!!
What's truly funny is now that they've tuned the ONE page that's linked in the /. article, the rest of the site is unavailable.
Just try going to their main page or to an old article. Pretty sad really.
Ooh! Ooh! I really want you guys to teach me how to build a high performance webserver! What's that? You can't, because your webserver is down? Curses!
(Obligatory disclaimer for humor-impaired: yes I understand that the slashdot effect is generally caused by lack of bandwidth rather than lack of webserver performance.)
Karma: Incomprehensible (Mostly affected by posting at +5, reading at -1, and metamoderating everything unfair.)
"Disk I/O on a web server is very important"
Maybe if you are running a porn site or something that's very static content-heavy. However, I imagine that many sites (think of slashdot for example) are a relatively small number of scripts that fit neatly into memory cache, with all of the disk i/o happening on the db-level.
...because the link can't take the slashdot effect :)
I like the new header on their site: Please register or login. There are *a few* registered and *quite a few* anonymous users currently online. Current bandwidth usage: 350.79 kbit/s
-- mg
we serve up between 5 and 7 million pageviews daily to up to 100,000 individual IP's
Decent speed to me is one in which the server is no longer the bottleneck, in other words serving up
dynamic content you should be able to saturate the pipe that you are connected to.
I have never replaced the power supply because of energy costs, it simply isn't a factor in the
overall scheme of things (salaries, bandwidth, amortization of equipment)
500-700 Mhz machines are fine for most medium volume sites, I would only consider a really fast machine to break a bottleneck, and I'd have a second one on standby in case it burns up
MP3 Search Engine
Really? There was an earlier discussion on this topic. (Related to 9/11 or some other day with extremely high traffic.)
From that discussion I got the impression that what happens when you are bumped to the front page is that you have tried to access a story with non-standard setup. (What you get if you are logged in and change your view preferences.) The system is setup so that some servers only serve static content. (Because that's what most users view.)
During high load situations a dynamic request is sometimes sent to a static serving server. This is when you are bumped to the front page. (Unfortunately I couldn't find anything about this in the FAQ/About, so I can't verify it.)
Too bad "millions of page views every month" is simply not even in the realm that would require "High-Performance Web Server"(s). These guys need to come back and write an article once they've served up 5+ million page views per day. Not hits. Page views.
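To put numbers on that, a quick back-of-the-envelope calculation in Python. The 3 million per month figure is only an assumed reading of "millions of page views every month"; the 5 million per day is the figure from the comment above. These are averages, so peak load will be higher.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_views_per_second(views, days):
    """Average request rate if the views were spread evenly over the period."""
    return views / (days * SECONDS_PER_DAY)


monthly = average_views_per_second(3_000_000, 30)  # assumed "millions per month"
daily = average_views_per_second(5_000_000, 1)     # 5 million page views per day
print(f"{monthly:.1f} views/s vs {daily:.1f} views/s")  # 1.2 views/s vs 57.9 views/s
```

So the monthly figure works out to about one page view per second on average, against nearly sixty for the daily one.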
As of 9:37AM PST, the site seems to be down (connection refused).
it's an excellent idea ! you have a lot more reliability like that and you can incrementally increase your database capacity.
Nothing worse than having your one 'monster' database server go down on you...
Also there usually are limits as to how big that 'monster' server can get in practice, whereas by breaking it up and replicating you can scale as large as you want, and you also avoid trouble by slowing down that one machine when you do your backups.
(Simply replicate once more and have your tapedrive in a machine that you can take 'offline' without hurting your app). The
replication mechanism will take care of bringing it back into synch once you are done making your backup.
If you don't want to have the db and the www server residing on the same box you can always break that up into pairs of machines, but I really have not yet found a need for that (and I
have done quite a bit of *really* high volume web serving to back that up)
MP3 Search Engine
"Unable to connect to database. Can't connect to MySQL server on '217.115.193.148' (111)" ...
What real world comparisons? The software section only mentions Win2k Pro, but this is patently not a real world comparison. It's designed to run on the desktop and it's been limited to 10 concurrent TCP/IP connections by MS so that anyone looking to set up a proper server will need to get the Server or Advanced versions.
The article goes on to say
With Linux however you can basically turn it on and walk away, provided you got a system administrator that knows what he's doing and has set up everything correctly.
maybe their sysadmin didn't/doesn't know what he was doing!!
How I understand it to work is that under high load, the database servers melt down and are toast until manually restarted, an event which is apparently a frequent occurrence, going by hints let loose by the editors.
So yes. Under load, your dynamic requests will get sent to the static server. This is because the load has killed mysql... Every two or three weeks there'll be a time where I can't get a dynamic page for a period of more than three hours. (Who knows how long they last, but I'm rarely webbrowsing for any longer than that)
1. load it full of pr()n
2. post the link on /.
3. check back in 30 seconds
if it still works, it's high-performance
Then use slackware. You can't get more "off" than that. Gives you the control to squeeze all the power out of your hardware.
Well the MS solution does have one thing in common with that article about planes. Throw enough engine into it and even a brick will fly. Throw enough hardware into the problem and even a MS site will fly. Bang for the buck MS loses.
There is a huge bottleneck in this configuration not to mention the limits of the tests (load tests, scalability). This is probably one of the worst configs for a web server I have ever seen.
Save the World! Use a Quote!
The mere fact that they recommended 7200 rpm Western Digital drives for their high performance system gives me the impression they haven't a clue.
I disagree with the assertion that a 10,000 rpm SCSI drive is more prone to failure than a 7,200 IDE drive because it "moves faster". I've had far more failures with cheap IDE drives than with SCSI drives. Not to mention that IDE drives work great with minor loads, but when you start really cranking on them, the bottlenecks of IDE start to haunt the installation.
1)/5) For the front end, you might be better off with a weighted load balancer (or LVS on the cheap). Also consider a specialized HTTP multiplexer like NetScaler/Redline (these typically give content encoding, SSL acceleration for free).
3)This is probably a bad idea
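For what "weighted" means in practice, here is a deliberately naive sketch in Python. The backend names and weights are made up, and this is not the scheduling algorithm LVS or any commercial balancer actually uses; it only shows requests being handed out in proportion to each machine's weight.

```python
import itertools


def weighted_round_robin(backends):
    """Return an endless iterator of backend names, repeated by weight."""
    pool = [name for name, weight in backends for _ in range(weight)]
    return itertools.cycle(pool)


# Hypothetical pair of machines: the newer box takes three requests
# for every one the older box gets.
picker = weighted_round_robin([("fast-box", 3), ("old-box", 1)])
first_eight = [next(picker) for _ in range(8)]
print(first_eight)
```

Plain round robin DNS is the special case where every weight is 1, which is why it copes badly with a mix of fast and slow machines.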
Business plan:
1. Build a beowulf cluster of webservers.
2. Put "First post!!!" in the index.html file.
3. Announce it on Slashdot.
4. Get a free bandwidth usage and server reliability test.
5. Change hostname and I.P. address to stop Slashdot effect.
6. Upload real content.
7. ???
8. Profit.
No, seriously, you could look in to the possibility of using the webserver built in to the Linux kernel - it is still an experimental feature, and probably not ready for production use yet, but in a few months it could be.
Someone already has. It's called APC, for Alternative PHP Cache. It's an open source PHP bytecode cache. I don't know if it works with PHP running as a CGI program or not, but the website doesn't say that it doesn't, so...
All I want is a kind word, a warm bed and unlimited power.
This setup doesn't account for HA or scalability. With hardware as cheap as it is today there is no excuse for not using multiple servers to avoid downtime and to allow for maintenance without taking the site down. Also, what about backup? It's not even mentioned. Lastly, I don't fully agree with the RAID 0+1 choice. For a large database, yes, but on a small setup like this I wouldn't. The article seems to imply the data is read more than it is written, and RAID 5 has better read performance.
So the article was missing a lot for a professional setup.
That motherboard runs *REALLY* slowly with the Redhat 7.3/ 2.4.18 linux kernel in a 4 gig configuration. My company bought about 8 of these machines, and our vendors don't have a solution. I did a small write up about this.
Wondering...
Would the idea of replicating databases to servers only be viable for web sites that have 99%-100% read-only contents?
Suppose you have a high volume ordering/inventory system. Wouldn't the replicated database raise the possibility that two orders will collide?
But the problems were all software related and Apache took the bulk of it. The problem with Apache is that the per-connection overhead is too high. It's a couple Megs per connection generally, and if you use keepalives (enabled by default), then each connection process will be tied up for as long as 30 seconds (which is the default I think) after the request has been completed.
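A rough way to see why that hurts, sketched in Python. The 2 MB per process and the 30 second hold time are taken from the paragraph above at face value; the 50 new connections per second is purely an assumed load for illustration.

```python
def apache_process_estimate(new_conns_per_sec, hold_secs, mb_per_process):
    """Little's law: processes in use ~= arrival rate * time each one is held."""
    processes = new_conns_per_sec * hold_secs
    return processes, processes * mb_per_process


procs, mem_mb = apache_process_estimate(50, 30, 2)
print(procs, mem_mb)  # 1500 processes, 3000 MB
```

Even a modest arrival rate ends up pinning well over a thousand processes and several gigabytes of RAM if every connection is allowed to sit idle for the full keepalive period.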
Additionally, since Apache works with a pool of individual processes to handle connections, there is no way to have a global shared resource between all processes. So, in the case of your database connections, you have a 1:1 relationship between db connections and HTTP processes. The result is that you have HTTP processes with open db connections serving images and so forth that don't even need db links. So, you end up using a lot more db connections than you actually need.
The thing we need to do to be able to handle such loads in the future is change from Apache to something that uses a worker thread model within a single process. Apache 2.0 may be setup to work like this, but I think it uses a hybrid model that still uses processes for dynamic stuff like PHP. Apache 2.0 will definitely help a little, though.
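The worker thread idea with a shared resource looks roughly like this. It is a minimal Python sketch, not how Apache 2.0 or any particular server does it; FakeConnection, ConnectionPool and handle_request are hypothetical names, and the pool size of 4 and the 32 workers are arbitrary.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor


class FakeConnection:
    """Hypothetical stand-in for a real database connection."""
    opened = 0
    _lock = threading.Lock()

    def __init__(self):
        with FakeConnection._lock:
            FakeConnection.opened += 1

    def query(self, what):
        return f"result of {what}"


class ConnectionPool:
    """A fixed set of connections shared by every worker thread."""

    def __init__(self, size):
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(FakeConnection())

    def run(self, what):
        conn = self._idle.get()        # blocks until a connection is free
        try:
            return conn.query(what)
        finally:
            self._idle.put(conn)       # hand it back for the next request


pool = ConnectionPool(size=4)


def handle_request(path):
    if path.endswith(".gif"):
        return "static image"          # image requests never touch the pool
    return pool.run(f"lookup {path}")


with ThreadPoolExecutor(max_workers=32) as workers:
    paths = ["/forum", "/logo.gif"] * 50
    results = list(workers.map(handle_request, paths))

print(FakeConnection.opened)  # 4 connections for 32 workers
```

Because the threads live in one process, 32 workers share 4 database connections, and the workers serving images hold none at all, which is exactly what a process-per-connection model cannot do.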
But anyway, what's also happening is that MySQL is only able to handle so many requests and then you're getting HTTP processes piling up waiting for it. So if we can cut down on the number of requests per page that will make a pretty significant difference when spread across thousands of users.
So yes, I think the Apache keep-alives 'did us in' for the most part and the pool of child processes you create becomes unmanageable at some point with many 1000s of connections at the same time. The worst part is that optimizations such as this can't be found in the manual, you'd need to have been in the 'trenches' to know about things like this. Fortunately we have a great team and Vitaliy, our CTO, is really on top of things, and actually had a great time this weekend, or as he put it 'this is better than simulation'.
Overall I'm more than happy with the performance of the server, it was never designed to handle such loads, and yet it kept on running, it never faltered and it certainly did not turn into a smoking heap of rubble as some suggested. We just were a little slow with serving out those pages and must've been unreachable to some with a slower connection.
If anybody else has some additional comments or insights I'd be happy to discuss this further, or go into greater detail. After all we're all here to learn right?
Sander Sassen
Email: ssassen@hardwareanalysis.com
Visit us at: http://www.hardwareanalysis.com
Sorry your site got /.ed ;) everybody's wish...
Anyway, you can find a lot at Google; do a search on mysql server optimization. But here is a good starting point; it's a bit further down in the article, about setting the server during operation. A very big gain in performance can be realized through the compilation options of mysql. http://atmail.nl/docs/mysqloptimize.html
So which fork of phpnuke did you use?
Regards,
Rod Longhofer
It is practically impossible to teach good programming style to students
that have had prior exposure to BASIC: as potential programmers they are
mentally mutilated beyond hope of regeneration.
-- Edsger W. Dijkstra, SIGPLAN Notices, Volume 17, Number 5
- this post brought to you by the Automated Last Post Generator...