Scaling Server Performance
An anonymous reader writes "When Ace's Hardware's article Hitchhiker's Guide to the Mainframe was posted on Slashdot, they got 590,000 hits and over 250,000 page requests during one day. This kind of traffic caused only a 21% average CPU load to their Java-based web server, which is powered by a single 550MHz UltraSparc-II CPU. In their newest article, Scaling Server Performance, Ace's Hardware explains how this was possible."
I would be more interested in stats on a webserver that took a puke. It would be interesting to see what started the dominoes falling and what ultimately brought it down. It would be as good a learning experience as this article is.
Not to be cynical, but serving (nearly) static pages shouldn't be a huge load by any standard. Even with dynamic (fully dynamic) pages, 250,000 isn't a huge number.
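For a sense of scale, the daily totals quoted in the story can be turned into average per-second rates. This is only back-of-the-envelope arithmetic on the two figures from the summary; a real burst right after a link goes up will peak well above the daily average, and the function name is just an illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_rate(requests_per_day):
    """Average requests per second if the load were spread evenly over a day."""
    return requests_per_day / SECONDS_PER_DAY


pages_per_second = average_rate(250_000)  # page requests quoted in the story
hits_per_second = average_rate(590_000)   # total hits quoted in the story

print(f"{pages_per_second:.1f} pages/s, {hits_per_second:.1f} hits/s")
# prints "2.9 pages/s, 6.8 hits/s"
```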
As an example, I run a pretty popular site that pumps out about 250,000 as well, all CGI-created and database-fed pages. This is being served by two 1 GHz web heads and a 1 GHz db server. Granted, those three machines run at 100% load during peak hours, but it's still not a huge deal (this is because I haven't finished the local caching mechanism yet). Did I mention that the two webservers also toss 1 million images a day too?
Of course, I don't want to belittle the article that much -- if anything, it shows the performance gains you get when you use efficient hardware (I have no doubt that their 550 MHz UltraSparc II has nearly the same horsepower as a 1 GHz x86) and efficient caching (caching data in RAM and serving from there, avoiding disk access penalties, is a huge performance increase).
Hilary Rosen's speech was about her love of money and her desire to roll around naked in a pile of money.
If you limit yourself to static stuff then you can cram any amount of traffic out of very limited boxes.
Even google does everything they can to cache stuff and turn dynamic requests into static ones, and they actually have a reason (lotsa traffic, complicated requests).
The fact that you can use java to write speedy code doesn't prove a thing either, it only says that it is now no longer a bottleneck.
You can probably saturate a decent sized pipe using -- aaarghh -- VB or something asinine like that as long as you do 'pictures and pages'.
MP3 Search Engine
Lots of people could use this type of performance. I only had a chance to use JSP on one project, a while back. Tomcat was notoriously difficult to install back then. But when it was up, the difference between a JSP application server and PHP became apparent. Application servers can make quite the difference.
Just having an application scope for variables saved us a trip to the LDAP server per request. PostNuke, SquirrelMail, and lots of other large PHP apps could be sped up drastically if some of those features were available in the PHP engine.
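To illustrate the application-scope idea, here is a minimal sketch in Python rather than JSP. It assumes a single long-lived server process whose request handlers share one object; `ApplicationScope`, `fetch_user_from_ldap` and `handle_request` are hypothetical names, and the LDAP call is a stand-in that only counts how often it is invoked.

```python
import threading


class ApplicationScope:
    """Process-wide store shared by every request handler (hypothetical)."""

    def __init__(self):
        self._values = {}
        self._lock = threading.Lock()

    def get_or_load(self, key, loader):
        # Holding the lock while loading keeps the sketch simple, but it
        # serializes loads; a real server would likely want finer locking.
        with self._lock:
            if key not in self._values:
                self._values[key] = loader()
            return self._values[key]


LDAP_CALLS = {"count": 0}


def fetch_user_from_ldap(uid):
    """Stand-in for a real directory lookup; counts each call."""
    LDAP_CALLS["count"] += 1
    return {"uid": uid, "mail": f"{uid}@example.org"}


app_scope = ApplicationScope()


def handle_request(uid):
    """Serve one request, hitting the directory only on the first lookup."""
    user = app_scope.get_or_load(("user", uid), lambda: fetch_user_from_ldap(uid))
    return f"Hello, {user['mail']}"
```

Calling `handle_request` repeatedly for the same user triggers one directory lookup instead of one per request. The sketch never refreshes an entry, so it only suits data that rarely changes.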
Based on upvotes, Ageism is the only "-ism" Slashdotters care about and think isn't SJW
Of course, there are more complex applications where data caching can be implemented, such as discussion forums where multiple users can be adding, editing, and deleting messages simultaneously. But that's a topic for another article.
Most of the applications I write involve updating data almost as often as fetching it from the database. In an environment like Apache where you have individual processes serving content (and database connections are process-centric), implementing caches that are updatable becomes a very complex exercise, without implementing an additional layer.
eToys used a b-tree (Sleepycat?) database layer situated in front of the database layer - they would store objects in the b-tree, and fetch them from there if they had not expired. One cache amongst all the servers made this worth doing; a Java web server can do something similar, since the objects are stored in memory shared between the various serving threads. The end result is similar to what Ace's Hardware has done.
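The "fetch from the cache if it has not expired" pattern described above can be sketched in a few lines. This is an in-process illustration only, not the eToys design (which shared one cache across servers) and not what Ace's Hardware runs; `ExpiringCache` and `load_from_db` are hypothetical names, and the class is not thread-safe as written.

```python
import time


class ExpiringCache:
    """Read-through cache: serve a stored object until its time-to-live runs out."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock      # injectable so expiry can be tested without sleeping
        self._entries = {}       # key -> (expires_at, value)

    def get(self, key, load_from_db):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]      # still fresh: no database hit
        value = load_from_db(key)  # missing or expired: go to the database
        self._entries[key] = (now + self._ttl, value)
        return value
```

A short time-to-live bounds how stale a page can get without any explicit invalidation, which is the appeal for read-mostly content. It does nothing for the write-heavy case raised above.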
What have other people done? Since I use Apache, I'm leaning towards a disk-based caching system.
Reloading their page a couple times (2nd page of the article, not the one slashdot linked to), I'm getting occasional 503 errors, and the rest are taking a very long time to load. Usually the page comes up with some "broken" images that didn't load.
At the bottom of each page, there's a number that seems to indicate the time they believe their server spent serving the page. Usually it says something like "2 ms" or "3 ms"... That may be how long their code spent creating the html, but the real world performance I see (via a 1.5 Mbit/sec DSL line) is many seconds for the html and many more for the images, some of which never show up, and sometimes a 503 error instead of anything useful at all.
So, Brian, if you're reading this comment (which will probably be worthy of "redundant" moderation by the time I hit the Submit button)... it ain't workin' as well as you think. Maybe the next article will be an explanation of what went wrong this time, and you can try again???
PJRC: Electronic Projects, 8051 Microcontroller Tools
True enough, I meant to say that too.. Dynamic web pages are relatively 'new'.
The difference between just showing a page and creating one is like the difference between a pre-rendered .avi file and rendering it realtime in hardware.
I still figure bandwidth is the big killer. I mean you can only stuff watermelons through a garden hose so fast.
I don't need no instructions to know how to rock!!!!
Unfortunately, the comment on the processor isn't quite right.
The UltraSparc II only goes up to 480MHz, and the UltraIII starts at 750. In between is the grey area of the IIe and IIi, and the ONLY Sun box with a 550MHz processor is the SunBlade 100/150.
If that's their web server, then the CPU is the least of their worries--the thing has internal IDE drives, two (only) 33MHz narrow PCI slots, and not much else. Assuming that one of the PCI slots is used for a faster and/or redundant network connection (QFE card most likely), then the other one is the only connection to SCSI disks. That CPU, low-end as it is (for Sun), is definitely going to spend its time waiting for the rest of the system.
(And yes, I know that was your second point--I just wanted to back it up with some detail)
"People who do stupid things with hazardous materials often die." -- Jim Davidson on alt.folklore.urban
I remember reading that original article, and yes, I was impressed at the responsiveness of the server. But before they are congratulated so much, consider this. The original story was posted on Slashdot at 1AM, so the initial spike of activity resulting from the link being in the top few on Slashdot was directly proportional to the number of people on Slashdot at the time. As you can see from their graphs (if they're showing up for you), traffic spiked, then continued on during the day.
This time around, the link got posted at 2PM not 1AM, and so far as I can see, they handle this flurry of hits much less gracefully than the previous ones! There are a lot more people online at 2PM than 1AM (all arguments of nocturnal nighthawks and people in other time zones aside).
perhaps. perhaps id be impressed if their cpu could keep up with the hits IF THEIR BANDWIDTH COULD KEEP UP
:(
*REQUEST TIMED OUT*
My 1 GHz server with 3 terabytes of RAM can handle any traffic you can throw at it!!! Now to upgrade that 56k....
Burning karma
[I can picture a world without war, without hate. I can picture us attacking that world, because they'd never expect it]
I am familiar with serving dynamic content of very high information density, and let me tell you, Ace's doesn't compare. The data I serve from work is updated every second; the stories on Ace's (and most other hardware-review sites) change every couple of days.
138974 ms
A little over their 4 ms goal. Specifically, 138,970ms.
________________________________________________
suwain_2
Basically, the whole article has 1 message: you should cache stuff. I couldn't agree more. Why do a database request every time a page is hit, even if you're going to show the same information, say, 1000 times? By combining dynamic and static elements, the "server load" part of the slashdot effect can be eliminated. I think slashdot also does this, but differently.
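A rough sketch of that "combine dynamic and static elements" idea: cache the part of the page that is the same for everyone and build only the per-visitor part on each hit. It uses Python's standard `functools.lru_cache`; `query_article`, `render_article_body` and `render_page` are hypothetical names, and the database query is a stand-in that only counts calls.

```python
from functools import lru_cache

DB_QUERIES = {"count": 0}


def query_article(article_id):
    """Stand-in for the expensive database request."""
    DB_QUERIES["count"] += 1
    return f"<h1>Article {article_id}</h1><p>body text</p>"


@lru_cache(maxsize=128)
def render_article_body(article_id):
    """The shared part of the page: built once per article, then reused."""
    return query_article(article_id)


def render_page(article_id, username):
    """The per-visitor greeting is built on every hit; the body is not."""
    return f"<p>Welcome, {username}</p>" + render_article_body(article_id)
```

Serving the same article a thousand times this way costs one database query. Nothing here invalidates the cached body when the article is edited, so a real site would also need `render_article_body.cache_clear()` or an expiry policy.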
Obviously, if you don't have enough bandwidth, you are screwed anyway, but usually it's the server load that is the problem.
MfG shurdeek
The article pretty much said two things:
-- Caching objects saves a database hit and makes things fast.
-- Resin scales better than Apache.
That's great and everything, but it really doesn't help anyone else. Ok, so now I want to apply this object caching to my own application. Where does this cache live? If I'm not running Resin, then I guess every Apache process has one. How do I handle dirty objects which need to be written back to the DB? What if they have been dirtied in two different processes? If I am using some sort of service external to the web application to do the caching, how fast is that? Faster than the database? Perhaps, but now it has to scale too, and it STILL has to consult the database, if only for writes, which are worse than reads.
This happened to work for their application, but in order to be applied more generally, it needs lots more explanation.
Any object persistence mechanism which is smart enough to handle caching in a read-write system with any level of configurability is going to be a large, complicated piece of software itself, and will have its own issues to bring to the table.
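One common way to at least detect the "dirtied in two different processes" case is a version check on write (optimistic concurrency). The sketch below is an assumption-laden illustration, not something the article describes: `VersionedStore` is an in-memory stand-in for a table with a version column, where a real database would do the same check with an `UPDATE ... WHERE version = ?` statement.

```python
class StaleWriteError(Exception):
    """Raised when a writer's copy is older than what the store now holds."""


class VersionedStore:
    """In-memory stand-in for a database table with a version column."""

    def __init__(self):
        self._rows = {}  # key -> (version, value)

    def read(self, key):
        """Return (version, value); a missing row reads as version 0."""
        return self._rows.get(key, (0, None))

    def write(self, key, expected_version, value):
        """Store value only if nobody else has written since we read."""
        current_version, _ = self._rows.get(key, (0, None))
        if current_version != expected_version:
            raise StaleWriteError(
                f"{key}: expected version {expected_version}, "
                f"found {current_version}"
            )
        self._rows[key] = (current_version + 1, value)
        return current_version + 1
```

If two processes both cache version 1 of an object and both try to write it back, the first write succeeds and bumps the version to 2, and the second raises `StaleWriteError` so that process can reload and retry. This only detects the conflict; where the cache lives and how it scales are still open questions.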
It's not about hits per second, it's about efficiency, and I think he (Brian, I guess) did it very well. Some stupid, idiotic Slashdotters say "my servers handle this and that," but the point is: HOW MUCH DOES IT COST? HOW MANY MACHINES DO YOU NEED? WHAT OS WILL YOU USE? HOW MUCH BANDWIDTH IS SPENT? Let's not forget the attitude of optimizing the content for modem users; they also need to be taken into consideration. As for the cache use, very intelligent. WE USE CACHE EVERY DAY, all the time. Let's use it efficiently. The two points that maybe some people are missing are that APACHE GOT ITS ASS KICKED by this server, RESIN (honestly, I'd never heard of it; I've been out of web development for a long time), and that PHP and APACHE can't be a good combination when things get ugly and you need scalability. That's what I think was the most useful information, along with the IPC info. Well, I'm impressed. It loads fast for me and I have a half T1, so I could see if it was slashdotted. Like somebody said, if it's /.ed it's a bandwidth problem, never the CPU SETUP.
Kudos to Solaris with its threads and to Sun servers (the Blade isn't even a server!! It's a workstation) with their architecture.
And I thought JAVA was slow. Maybe for other uses, but in web development it seems to be the best option yet.