Building a Better Webserver
msolnik writes: "The guys over at Aces' Hardware have put up a new article going over the basics, and not-so-basics, of building a new server. This is a very informative, I think everyone should devote 5 minutes and a can of Dr Pepper to this article."
Is quick enough ?
good read
That was beautiful, man.
No posts yet. Looks like most are taking the advice.
But don't.
Actually a very interesting article. To be honest, in my 1 year of building webserver applications, I haven't gone through a process like this once. Usually we make a rough guess about how the application has performed (or, more usually, underperformed) on existing servers, and just scale by a percentage. As you can imagine, this is hardly realistic. Thanks for the read!
Let's see exactly how long their lovely new webserver stands up to a slashdotting... :)
(Maybe they just sent this so they could test it? Plan.)
Instant traffic to your site, no advertising!
The above post is an editorial; the poster cannot and will not be held responsible, in whole or in part, for its contents
going on 5 minutes after the initial posting, and still no slashdotting...
Seems to me that these guys might be onto something here...or maybe they just really know what they're talking about...
Well it did load quickly.
It was a good read and I wish we could do something vaguely similar with our web servers here. Not that we get the server load to demand such improvements at the moment, but I figure it's best to spend the money early on, get a good setup going that can handle high volumes, that way you're not caught with your pants down when things take off for you. It's unfortunate bean counters never think this way.
Of course I don't think I'll be taking this approach at home - even if it would be fun to have a Sun Blade sitting in the living room purring along answering the 1 or 2 web hits we get a day.
In a row???
Note this article for information on connecting USB keyboards and mice, what shorting the debug pins does on the keyboard, and replacing that measly ATA33 hard drive cable with an ATA100 (surprise, surprise: it actually increased performance :) ).
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
is a hardware store really the right place to shop for a server?
They should have used Zope.
Microsoft has written several white papers of this sort already. Of course, they're Microsoft, so that means I can kiss my +2 bonus goodbye. Seriously, though.
If you fall off a building, go real limp, because maybe you'll look like a dummy and people will be like hey, free dummy
Why would they go with the desktop version when they want a rackmount server? You can get the Netra X1 for 50$ less and it comes with the exact same hardware but in rackmount case. Check it here.
Already /.ed. Next.......
Considering Slashdot is one of the slowest sites on the Net, and crashes frequently, I think that the Slashdot owners should really read this.
/.
aceshardware.com _JUST_ fell over. I guess it couldn't keep up with
--fatboy
Well, it looks like the new server was unable to handle the load from us. It's not responding at all...
I usually get a decent speed processor PIII 800, a really fast scsi drive or raid (Depending on the site), 512MB of ram ( Or more depending on the site ), and a copy of Slackware.
--
FearLinux.com
It is 15 minutes after the article post and the site is dead. I got to the part about calculating how much RAM was required per visitor and multiplying by the expected number of visitors.
:-)
Maybe they need to adjust their constants.
It is those d*mn modem users that drive up the RAM use. They stay connected longer on their GET and tie up resources longer.
who couldn't figure out at first why Ace Hardware put up information about a new webserver?
It would be interesting to see an update from them tomorrow with the same graphs as on the Servers in Practice page with today's data.
Their site is slowing down under the /. load.
http://www.thehungersite.com
Hmm.... New server, huh? Can't connect, so apparently it can't take being slashdotted....
Me fail English? That's unpossible!
While I can hardly argue with the MySQL bit, if you think a site this big does (or possibly could) run via CGI, you're incredibly inexperienced.
Weblogic, complete with native performance pack (since the stock java one-thread-per-connection model could never scale) is still slower than a well tuned Apache server..
Apache is fairly slow, but Java is slower. What it comes down to is that Apache is always fast enough to saturate whatever bandwidth you have. If you have a large amount of bandwidth, you've already clustered for reliability.
If you're worried about truly high performance on a single CPU (by far the most common case for webservers), an event-driven single-process model will destroy both a multithreaded and a multiprocess model.
Optimizing your webserver for SMP machines is stupid, since webserving is trivially parallelizable, and clustering gets you better reliability and is usually cheaper in these days of inexpensive 1U servers...
<I think the part about Java/Resin is the most crucial. Anybody can throw hardware at a problem, but their programming methodology makes tremendous sense (ie: dump this Apache/CGI garbage in favor of real multithreading)>
Funny. What was our next closest competitor spent several million dollars on Sun hardware and everything done in Java. We spent less than $40,000 on some dual-proc Intel machines, doing everything with Postgres, Perl, and Apache. The result? Our servers have many times the capacity that theirs do, and they're almost completely out of business.
steve
Oh, you're not stuck, you're just unable to let go of the onion rings.
Should we really be taking advice on building a web server from someone whose server crashes under /.'s load?
...the one with a lot of mirrors.
"Those who have never entered upon scientific pursuits know not a tithe of the poetry by which they are surrounded."
"Real multithreading" is really no panacea. See the notes from John Ousterhout's talk, Why Threads Are A Bad Idea (for most purposes).
It really feels like they only made a token gesture towards using an x86 box. To be honest, my next box will probably be a Sun Blade too (but hey, I'm gonna use it for a desktop ;) Mind you, this was a really good article, but I think they should have said that they were just more comfortable with SPARC and that was that. There was another good article on a similar subject not long ago, on Anandtech's new server. For that article they benchmarked different configs (mobo, proc, etc.), then did a price/performance comparison, as far as I recall. And they chose AMD ;)
You are only young once, but you can stay immature indefinitely.
of Dr. Pepper? I expect to go through a whole case waiting for the slashdot effect to wear off...
Once upon a time, we had 1 web server that did everything, so it needed to be able to do everything. Now every time we do something new we toss out a new webserver (or 2 or 10 of 'em). And they all basically need to do one thing (webmail, portal, whatnot) and do it well, and that's it.
So we've got a whole bunch of Apache servers with a bucketload of Apache processes which basically spend all day doing little more than exec'ing the same CGI over and over (and copying the data around a couple of extra times).
I'm pretty much convinced now that my next step is going to be to franken-meld my CGI with something like mini-httpd so it is a single, persistent app.
I'm certainly not redoing the whole thing in Java though! :)
Shut up, be happy. The conveniences you demanded are now mandatory. -- Jello Biafra
I was enjoying the article until it /.'ed, and I couldn't get anymore pages to load.
Therein, a stress test to the folks at Ace's Hardware.
I'm thinking that their "better webserver" isn't so hot, considering the "connection refused" messages I'm getting.
Napster-to-go says "Fill and refill your compatible MP3 player", which is a lie. It's not MP3. It's WMA with DRM.
The moral of this story:
If your website is dynamically generated from a database, and your name isn't Amazon.com, don't let Slashdot link to you.
A single $999 box isn't going to stand up to Slashdot, unless every page is static.
Well, they seem to be slashdotted, now. Great webserver. *snicker*. Oh man.
Actually, Slashdot is usually one of the fastest sites on the Net for me. I frequently use it to test if my DSL connection works properly. Their scripts/database often get hosed at high loads, though; I wonder what the bottleneck is. But Java as a replacement? Puhleeaze..
I'd call this buying a web server rather than building one...
"I want peace on earth and good will toward men." "We're the U.S. government. We don't do that sort of thing!!"
Dr. Pepper's website. Check it out, yOU!
While you're at it, why not purchase a Dell computer?
/.ed Well, so much for THAT idea . . .
I devoted a can of Dr. Pepper to this article... had to replace my keyboard. I guess that wasn't a good idea, huh?
It's easy to stand out when the general level of competence is so low.
Aha! Realtime load dependant hardware upgrades! That's gotta be the plan!
Now let's just see...
--This isn't a man who is leaving with his head between his legs.
Is it just me, or do most folks confuse these two. If a popular website only has a 9 Mbps pipe to the Internet, it doesn't matter how many Crays they have running their webserver farm, they're only going to be able to churn out 9 Mbps (minus overhead). Granted that the converse is possible... gobs of bandwidth, but a slow server... but I would imagine that bandwidth is the limiting factor of at least 99% of websites.
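To put a number on the bandwidth ceiling described above, here is a rough sketch. The 9 Mbps figure comes from the comment; the 200 KB page size and the function name are assumptions made purely for illustration, and protocol overhead is ignored.

```python
def max_pages_per_second(link_mbps, page_kb):
    """Upper bound on page deliveries per second for a given uplink."""
    bytes_per_second = link_mbps * 1_000_000 / 8   # megabits -> bytes
    return bytes_per_second / (page_kb * 1000)     # KB taken as 1000 bytes


# 9 Mbps is the figure from the comment; 200 KB per page is an assumption.
ceiling = max_pages_per_second(9, 200)
```

Under those assumptions the pipe tops out at a handful of full page loads per second, no matter how much CPU sits behind it.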
Bah, speak for yourself. Java relies on the virtual machine, so that's your bottleneck (as in beer and performance). With proper software (like the new version of Apache still in beta) and tuning, or other threaded servers like aolserver or Xitami and PHP or modperl instead of Java I bet my money _that_ configuration will scale better.
Also, don't confuse the CGI protocol with short-lived CGI binaries. Slashdot uses modperl, which is NOT a short-lived process, but Apache is still a forking server in the 1.3.x branch.
It's pretty easy to just do:
...
for (;;) {
n = select(...);
perConnStructPtr = getPerConnPtrByFd(anActiveFd);
}
after all.
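For anyone who wants to see the shape of that loop in something runnable, here is a minimal sketch using Python's standard `selectors` module. The echo handler and the `serve_once`/`demo` names are hypothetical stand-ins for illustration, not anything from the servers discussed here; the per-connection state is simply the handler stored alongside each registered socket.

```python
import selectors
import socket


def serve_once(sel, timeout=1.0):
    """One pass of the event loop: wait for ready sockets, dispatch each."""
    for key, _events in sel.select(timeout):
        handler = key.data              # per-connection state, found via the fd
        handler(sel, key.fileobj)


def echo(sel, conn):
    """Hypothetical handler: echo bytes back, clean up on EOF."""
    data = conn.recv(4096)
    if data:
        conn.sendall(data)
    else:
        sel.unregister(conn)
        conn.close()


def demo():
    """Drive the loop over a local socket pair instead of a real listener."""
    sel = selectors.DefaultSelector()
    server_side, client_side = socket.socketpair()
    server_side.setblocking(False)
    sel.register(server_side, selectors.EVENT_READ, echo)
    client_side.sendall(b"GET / HTTP/1.0\r\n\r\n")
    serve_once(sel)
    reply = client_side.recv(4096)
    sel.unregister(server_side)
    server_side.close()
    client_side.close()
    sel.close()
    return reply
```

A real server would wrap `serve_once` in an endless loop and register a listening socket whose handler accepts new connections; the single-process structure stays the same.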
Shut up, be happy. The conveniences you demanded are now mandatory. -- Jello Biafra
I read through those slides, and the biggest drawback to threads given was that people are too dumb to use them properly. While I certainly agree with that sentiment, it doesn't make using threads bad in and of itself.
I also disagree with the assertion that concurrent execution is an exceptional case rather than a typical case. It may be exceptional in terms of individual steps being performed, but it's been my experience that it's a common occurrence on an application level. Simplistic GUIs work well with event-driven approaches. Most tasks could be 90% handled by event-driven approaches. But a lot of operations have some subtasks that would be better handled by concurrent execution.
Consider a user with a typical analog modem that has an average maximum downstream throughput of, say, 5 KB/s. If this user is trying to download the general message board index page, about 200 KB in size (rather small by today's standards), it will require a solid 40 seconds to complete this single download.... To maximize the efficiency of the network itself, we can compress the output stream and thus, compress the site. HTML is often very repetitive, so it's not impossible to reach a very high compression ratio. The 200 KB request mentioned above required 40 seconds of sustained transfer on a 5 KB/s link. If that 200 KB request can be compressed to 15 KB, it will require only 3 seconds of transfer time.
Except that 56 Kbps modems get 5 KBps throughput by compressing the data! If the client and server compress, the modems won't be able to; the net effect is lots of extra work on the server side, and probably no increased throughput for the modem user.
The server might or might not see a decrease in latency, and in the number of sockets needed simultaneously; it depends on how much it can "stuff" the intermediate "pipes". The server will see an overall decrease in bandwidth needed to serve all the pages.
Ironically, broadband customers (who presumably don't have any compression between their clients and Internet servers) will see pages load faster. (And the poor cable modem providers from the previous story will be happy.)
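The quoted article's arithmetic is easy to check. This sketch uses only the figures from the quote (200 KB uncompressed, 15 KB compressed, a 5 KB/s link); the function name is made up for illustration, and it deliberately ignores the modem-level compression raised above.

```python
def transfer_seconds(size_kb, link_kb_per_s):
    """Time to move a payload over a link at a sustained rate."""
    return size_kb / link_kb_per_s


uncompressed = transfer_seconds(200, 5)   # the 200 KB index page
compressed = transfer_seconds(15, 5)      # the same page squeezed to 15 KB
```

Whether a real modem user sees the full difference depends on how much the modem was already compressing the uncompressed stream.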
Stupid job ads, weird spam, occasional insight at
article discussing how to design better webserver
software -- something which would have been
very interesting since it has been ages since I
saw a fresh take on that.
instead: another article on piecing together hardware. *sigh*
Yep, I can't get to the site, it's slashdotted. Hmm, I wonder if some people try to get sites posted just to have them taken down. It would be kind of like a poor man's DoS attack.
SPARCs come from Sun, everybody makes a PC - so guess which is cheaper? We see some reasons why they went for the Blade (a nice machine, but rather more expensive than a couple of PCs).
Please get this right, I'm no x86 fan, but I love the competition going through the line from the processors, chip-sets, mother-boards, etc. This has got to ensure that unless you really want the 2GHz Pentium 4, you have plenty of choice.
As for reliability, I don't know the Blade, but the SPARC 20s used to give some headaches over their internal construction. It always seemed a little complicated with the daughter boards and they seemed to lose contact after machines were moved around.
See my journal, I write things there
One thing that does seem to work against the onslaught is a throttling webserver. If you haven't got the bandwidth etc to serve a sudden onslaught of requests, probably the best thing to do is to just start 503'ing -- at least people get a quick message 'come back later' instead of just dead air.
Shut up, be happy. The conveniences you demanded are now mandatory. -- Jello Biafra
Now I see why Ace's crapped out while I was reading the article --- I witnessed the /. effect live. In fact, I was a victim of it. Boo hiss.
The SPARCstation 20 was one heck of a great machine back in the day, especially for its size (a low profile pizzabox). The design was a lot like its older brother (the SPARCstation 10 from 1992)... that is, two MBUS slots (for up to 4 CPUs) and 4 SBUS slots (Sun Expansion cards, 25 MHz x 64 bit = ~ 200 MB/sec each, but 400 MB/sec bus total).
I remember using a Sun evaluation model at Rice many years ago... the machine had two 150 MHz HyperSPARC processors (though 4 were available for more $$), a wide SCSI + fast ethernet card, two gfx cards for two monitors, and some sort of strange high speed serial card (for some oddball scanner, I think). Not to mention 512 MB of ram, in 1994! The machine was a pretty decent powerhouse and sooo small! I sort of wish the concept would have caught on, given how large modern workstations are in comparison. Heck, back then an SBUS card was about 1/3 the size of a modern 7" PCI card.
Then there's the other end of the spectrum... one department bought a Silicon Graphics Indigo2 Extreme in 1993. The gfx cardset was three full size GIO-64 cards (64 bit @ 100 MHz = about 800 MB/sec), one of which had 8 dedicated ASICs for doing geometry alone. 384 MB of RAM on that beast. Pretty wild stuff for the desktop.
Ahh, technology. I love you!
500 Servlet Exception
java.lang.NullPointerException
at com.caucho.server.http.Application.getAttribute(Application.java:2779)
at AcesHardware.tags.DiscussionTag.doStartTag(DiscussionTag.java:47)
at _site._sidebar_0articles__jsp._jspService(/site/sidebar_articles.jsp:17)
at com.caucho.jsp.JavaPage.service(JavaPage.java:87)
at com.caucho.jsp.JavaPage.subservice(JavaPage.java:81)
at com.caucho.jsp.Page.service(Page.java:474)
at com.caucho.server.http.FilterChainPage.doFilter(FilterChainPage.java:166)
at com.caucho.server.http.Invocation.service(Invocation.java:277)
at com.caucho.server.http.CacheInvocation.service(CacheInvocation.java:129)
at com.caucho.server.http.QRequestDispatcher.include(QRequestDispatcher.java:338)
at com.caucho.server.http.QRequestDispatcher.include(QRequestDispatcher.java:247)
at com.caucho.jsp.QPageContext.include(QPageContext.java:467)
at _read__jsp._jspService(_read__jsp.java:84)
at com.caucho.jsp.JavaPage.service(JavaPage.java:87)
at com.caucho.jsp.JavaPage.subservice(JavaPage.java:81)
at com.caucho.jsp.Page.service(Page.java:474)
at com.caucho.server.http.FilterChainPage.doFilter(FilterChainPage.java:166)
at ToolKit.GZIPFilter.doFilter(GZIPFilter.java:22)
at com.caucho.server.http.FilterChainFilter.doFilter(FilterChainFilter.java:87)
at com.caucho.server.http.Invocation.service(Invocation.java:277)
at com.caucho.server.http.CacheInvocation.service(CacheInvocation.java:129)
at com.caucho.server.http.HttpRequest.handleRequest(HttpRequest.java:216)
at com.caucho.server.http.HttpRequest.handleConnection(HttpRequest.java:158)
at com.caucho.server.TcpConnection.run(TcpConnection.java:140)
at java.lang.Thread.run(Thread.java:484)
Ousterhout says threads are bad for apparent concurrency but good for taking advantage of multiple processors, and for building scalable servers.
In other words, with the right hardware architecture, threads could be very useful for sites such as Ace's Hardware (though they happened to go with a uniprocessor) and Slashdot.
Java threads are also easier to program than C and C++ threads, though not easy. (Manual memory management is hard; thread programming is hard; manual memory management in a threaded program is very hard. I'm not speaking hypothetically on the last point; I've really envied Java programmers the last few weeks.)-:
Stupid job ads, weird spam, occasional insight at
Looks like their article was their server's Achilles' heel....
very ironic.
Ace's Hardware? Why do I see a nasty trademark violation in some poor webmaster's future?
*sigh* Probably because we've seen enough of it in the past...
--Fesh
Kill -9 'em all, let root@localhost sort 'em out.
Of course, the integrated video and sound is not very important to us, as the system runs headless mounted in a rack.
Just a thought, but video would be really handy for the install, i'm guessing!
So... anyway, in short it looks like they took a workstation and dropped a shitload of ram into it... big deal
In a part about databases and persistent connections they confuse the issues more than a bit. The real problem is not too many processes, which automatically makes threads look better, but the symmetry among processes: any request should be possible to serve by every process, so all processes end up with database connections. This is a problem particular to Apache and Apache-like servers, not a fundamental issue with processes and threads.
In my server (fhttpd) I have used a completely different idea: processes are still processes, but they can be specialized, and requests that don't run database-dependent scripts are directed to processes that don't have database connections, so reasonable performance is achieved if the webmaster defines different applications for different purposes. While I haven't posted any updates to the server's source in the last two years (I was rather busy at a job that I am now leaving), even the published version 0.4.3, despite lacking the clustering and process management mechanism that I am working on now, performed well in situations where "lightweight" and "heavyweight" tasks were separated.
Contrary to the popular belief, there indeed is no God.
I find it kind of ironic that I can't read an article on building a better server because of server problems ("The page cannot be displayed") :oP
I found it somewhat funny that here I am working on a SunBlade 100 in my university's computer lab and they consider this enough to be their webserver. While it's a nice box, I certainly wouldn't use the thing as a mainstream server.
If not now, when?
It sure didn't last very long.
the best thing about working with sun vs. working with x86 is that you don't need a monitor/video card. The serial console works just fine.
If you managed to get the first page (sounds like you did) you could see they were doing it headless, most likely (hopefully) with some sort of console server. Of course, I don't know if you've ever worked with anything but x86.
So, no, they really don't care about the integrated video.
But it's a wonderful chaser for Seagram's 7...
http://www.aceshardware.com/read.jsp?id=45000241
500 Servlet Exception
java.lang.NullPointerException
at BenchView.SpecData.BuildCache.(BuildCache.java:96)
at BenchView.SpecData.BuildCache.getCacheOb(BuildCache.java:82)
at BenchView.SpecData.BuildCache.getLastModified(BuildCache.java:45)
at BenchView.SpecData.BuildCache.getLastModifiedAgo(BuildCache.java:50)
at _read__jsp._jspService(/site/sidebar_head.jsp:60)
at com.caucho.jsp.JavaPage.service(JavaPage.java:87)
at com.caucho.jsp.JavaPage.subservice(JavaPage.java:81)
at com.caucho.jsp.Page.service(Page.java:474)
at com.caucho.server.http.FilterChainPage.doFilter(FilterChainPage.java:166)
at ToolKit.GZIPFilter.doFilter(GZIPFilter.java:22)
at com.caucho.server.http.FilterChainFilter.doFilter(FilterChainFilter.java:87)
at com.caucho.server.http.Invocation.service(Invocation.java:277)
at com.caucho.server.http.CacheInvocation.service(CacheInvocation.java:129)
at com.caucho.server.http.HttpRequest.handleRequest(HttpRequest.java:216)
at com.caucho.server.http.HttpRequest.handleConnection(HttpRequest.java:158)
at com.caucho.server.TcpConnection.run(TcpConnection.java:140)
at java.lang.Thread.run(Thread.java:484)
Resin 2.0.2 (built Mon Aug 27 16:52:49 PDT 2001)
Speaking as the maintainer of a site that is periodically slashdotted...
Yes, a throttling server is a great idea. If you recognize that there will always be a load too high for you to handle (10 requests per minute for my site; yes, minute, it is a physical device), then you must either decide to deal with the load or let the load crush your machine.
Consider a typical web server. When it gets overloaded it slows down, each request takes longer to handle, there are more concurrent threads, overall efficiency drops, each request takes longer to handle.... welcome to the death spiral. (On my site-which-must-not-be-named-lest-it-be-slashdotted, everyone waiting in queue gets a periodic update; at a certain point the load of generating the updates swamps the machine. I have to limit the number of people in queue.)
The key decision is to determine how many concurrent threads you can handle without sacrificing efficiency and then reject enough traffic to stay under that limit.
This is where optimism comes in and bites you in the ass. You remember that every shunned connection is going to cost you money/fame/clicks whatever so you set the limit too high and melt down anyway.
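A minimal sketch of that admit-or-reject decision, assuming a fixed concurrency limit picked in advance. The `Throttle` class and its `handle` method are hypothetical names for illustration; a real server would wire the same check into its accept path and send a proper 503 response.

```python
import threading


class Throttle:
    """Admit at most `limit` concurrent requests; reject the rest at once."""

    def __init__(self, limit):
        self._slots = threading.BoundedSemaphore(limit)

    def handle(self, work):
        # Non-blocking acquire: excess requests are turned away immediately
        # rather than queued, so overload cannot feed the death spiral.
        if not self._slots.acquire(blocking=False):
            return 503, "Service Unavailable: come back later"
        try:
            return 200, work()
        finally:
            self._slots.release()
```

The hard part stays exactly where the comment says it is: choosing `limit` low enough that admitted requests still finish quickly.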
brag up new server
kind soul links us from slashdot
looks like we eat crow
anarchy rules
What a great webserver... I think an increase in bandwidth is in order.
Linux is like a wigwam - no windows, no gates, apache inside!
http://tomgould.com/
5:45pm EST.
Site is toast. Can't wait to read the article.
No sig.
I find it very funny that this article is about a better web server and the site is unreachable. Slashdot effect??
man oh man, it's gotta hurt their pride to write a good article like that just to succumb to /.'s wrath...
Good article but i didn't see any pictures because their server buckled....
This presentation is barely junior level CS work. I'm horrified that this is even being referred to as the support for an argument concerning threading vs. non-threading. Ousterhout sees threading everything as a problem, but that should be obvious to anyone who has looked at the drawbacks of threading in the first place. As an alternative, he proposes THE EXACT PROBLEM THAT THREADING TRIES TO SOLVE.
Event-driven software is a dead-obvious design for anyone who has spent any time looking at code, and threading is a godsend to solve the problems that event-driven design can't handle. But I don't think anyone really involved in the science of computer science sees the two ideas at odds with each other so much as they are complementary programming techniques.
Since you are obviously a non-programmer, let me put it into these terms: Event-driven computing is like addition. After a while, you realize that saying 2 + 2 + 2 + 2... is a pain in the ass so someone teaches you multiplication (threading in our analogy). So now it's easier to do certain types of addition, but that DOESN'T mean that everything you do should be multiplication. Sometimes it's easier to simply say 2 + 3 instead of (2 * 1) + (3 * 1). Just like these two math functions, threading/events are just programming techniques, one harder than the other, but not mutually exclusive.
My brain is still reeling that this guy actually bothered to put together a powerpoint presentation on the subject (provided he's not a programming teacher).
How ironic is it that the site authoring a guide to 'Building a Better Webserver' is apparently 'slashdotted'?
neophyte@tumeke:~$ telnet www.aceshardware.com 80
Trying 216.87.214.213...
telnet: connect to address 216.87.214.213: Connection refused
telnet: Unable to connect to remote host
Hence, beyond conserving some or a lot of bandwidth (depending on the nature of a web site), gzip actually improves perceived download speed for modem users. Just compare a modem's 2-3x compression with the possibly huge ratios gzip may get on big files with a lot of similar patterns. To highlight my point, I just compressed a specific web page (an atypical case, but... :-) from an initial size of 117,340 bytes to 4,466 bytes :-D
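The figures above work out to roughly 26:1. The same kind of measurement can be repeated with Python's standard `gzip` module; the sample markup below is an invented, deliberately repetitive stand-in (an assumption, not the page that was measured), so its ratio will be far more flattering than a typical page.

```python
import gzip


def compression_ratio(payload: bytes) -> float:
    """Original size divided by gzip-compressed size."""
    return len(payload) / len(gzip.compress(payload))


# Hypothetical, highly repetitive table markup standing in for a big index page.
page = b"<tr><td class='post'>comment</td></tr>\n" * 3000
ratio = compression_ratio(page)
```

Running `compression_ratio` on a saved copy of a real page gives a more honest number for a particular site.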
At first glance, I would say a 500MHz Sun BL100 might be a tad on the underpowered side for a web server with broad user appeal, but I'm really interested in a deep and meaningful way to hear the followup on their design article with all of the Slashdot Effect taken into account.
Why 64bit? Is there a lot of big integer math going on here? Does the web server/jvm do a lot of memory-copy operations on data that is 64 or more bits large? What kind of stuff does ldd(1) tell you about the 64bit implications of the web server?
How *is* the disk IO on the blade? That's the traditional bottleneck for any system design to tackle first. How about their Internet provider's network? That's the first culprit in low-cost systems' ability to handle more concurrent users. What is the CPU spending most of its time doing?
--- Nothing clever here: move along now...
... it took almost five minutes to load the splash page.
Ya Sure! You Betcha!, The_THOMAS
Gotcha, makes sense not to have a video/sound card then. If with the Sun system you can use a serial console, you can save the resources that an unused video/sounds system takes.
I've used mostly x86, and get too used to having a monitor hooked up to any machine 'cuz if i need to do anything major(i.e. reinstall), a remote terminal just dont work.
Looks like the server about building a better server is actually a really crap server.
Slashdot - Beating the living shit out of servers, everyday.
At the bottom of the page are load times. I visited while this was still the top story here. First page load time was 3 ms. Page 2 was 59,000 and change. Page 3 timed out. :-)
Dear Slashdot: next time you want to mess with the site, add a rich-text editor for comments.
* I assume that's what they're claiming -- I couldn't read beyond the second page because all I got was errors
The server apparently is spending a lot of time dealing with the .jsp!
I've seen pentium 166's cope better than this site.
It's back; it already recovered from the load. That seems like a pretty good recovery time after a /.ing to me.
"A witty saying proves nothing." - Voltaire
Wow! New server, new configuration. Hey George, let's find a way to stress-test the configuration.
Good plan, Stan. Let's write an interesting article and post it on Slashdot. I bet it can handle all them readers!
Yup, so do I, George. So do I!
[And the rest is history...]
Privacy is terrorism.
There are more factors than just CPU and bandwidth... like what's between the two. A new coworker recently told me of his major learning experiences in the mid 1990s running several popular news websites during the beginning of the web boom. One of the more popular sites he ran originally had a T1 routed through a Cisco 4000 router. Things worked great until he had an additional, load balancing T1 added for added thruput and redundancy. Things didn't feel much faster; in fact, they were almost slower. After much investigation he learned that the router didn't have enough RAM or CPU to handle the packet shuffling that intelligent multihoming routing requires. A similar instance happened with a friend's company when they tried to run a T3 through their existing router. While the old Cisco had enough CPU and RAM in theory, its switching hardware and thruput couldn't handle the full number of packets the T3 was providing thru the shiny new HSSI high speed serial card.
Now, I realize modern hardware (Cisco 3660 and 7x00 series, and pretty much any Juniper) can route several T3s (at 45mbps each direction) worth of data, but older routers and minimally configed routers do exist.
There are MANY bottlenecks in hosting a website. Server daemon, CPU, router, routing and filtering methods, latency and hops between server and internet backbones, overall bandwidth thruput, and much more.
It's not as simple as "lame server, overloaded CPU, should have installed distro version x+1".
This one always seems to confuse people.
An HTTP request is a request for a single "document" from a web server. But it normally results in many subsequent requests. For example, on the "Post Comment" page here at
So counting the number of hits on an HTML document gives no real indication of the server load, since my one "hit" is using 9x the resources of loading a
To get your metrics wrong by a factor of 9 (probably worse, the HTML I downloaded was, say, 20k, the 8 GIFs I've downloaded are presumably much more), means that if you've done your maths correctly, then you're going to get 8/9 users failing to load a page - and then retrying, causing more load on the server.
Thankfully, all these pseudo-equations are meaningless.
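The factor of 9 is just one HTML document plus the 8 GIFs mentioned above. As a sketch (function names are invented for illustration, and caching of images between page views is ignored):

```python
def requests_per_page_view(html_docs=1, embedded_objects=8):
    """HTTP requests triggered by loading one page with its embedded images."""
    return html_docs + embedded_objects


def server_requests(page_views, embedded_objects=8):
    """Total requests the server sees for a given number of page views."""
    return page_views * requests_per_page_view(embedded_objects=embedded_objects)
```

So a capacity estimate based on 1,000 "hits" on the HTML document is really an estimate for 9,000 requests under these numbers.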
Author, Shell Scripting : Expert Re
...is crap crap crap crap crap. The Blade 100 is one of Sun's IDE-based machines. Because Sun expects that such boxes are going to be used either as low-end workstations (the Blade 100) or disk-avoiding compute farm servers (the Netra X1), their disk subsystem is painfully below par: Sun routinely ships under-specced IDE disks (remember the 4800 RPM drive? Sun does, and they've got a whole warehouseful to refresh your memory with!) and compounds the problem with Solaris 8's ATA/IDE drivers, which are the worst in the known universe -- watch your entire system drag to a halt as the OS attempts to write out a .5MB file! Whee!
I don't know what Ace's traffic numbers are normally like, but using a Blade 100 for anything other than a small, personal website is flat-out folly. At a minimum, they should have been using a Netra T1/AC200 ($3k, nicely configured, and a 1U rackmount machine to boot), and I would probably have thought seriously about scrounging a used E250 or E220R off of Ebay.
News for Nerds. Stuff that Matters? Like hell.
You should not envy them.
In java if I have an array "arr" and
for (int i = 0; i < arr.length; i++)
then the arr.length will be evaluated once for each loop because of the _possible_ concurrent access. The compiler cannot be sure nobody else is modifying your array so it cannot optimize it by taking it out of the loop.
Contrast this with C/C++/whatever when you know what is shared and guard it and the compiler is free to optimize the rest.
Mebbe they really needed a v880 or summat before they started getting posted on /. :)
If you haven't noticed by now, Ace's Hardware has a neat little indicator on each page that shows the processing time and queue time spent getting it to you (very bottom left-hand corner of each page). Most are about 74ms - 112ms for me. This, plus the result of some pings and traceroutes, leads me to believe they're heavily BANDWIDTH bound right now, not CPU bound. I do hope Ace puts up a summary of the Slashdot effect as well as some other data for us to pore over. Some MRTG router graphs of the bandwidth usage would be *really* nice, too.
They discuss on xxx the possibility of adding a new machine as a database server, leaving the current as a webserver, but say that this adds an additional point of failure.
Au contraire, with two servers in a Cluster, the worst-case scenario is that the newer (more powerful) machine goes down, in which case the database flips over to the old SS20, giving them their original config back automatically while they deal with the problem on the database server.
The other scenario, that the SS20 fails, gives them the current configuration, of apache and database running on the SunBlade100.
I'm sure they'll get nothing for the old SS20, so there's no additional cost involved (apart from the Cluster software, and configuring it), they get better performance than either their previous or current configuration, with the worst-case scenario being their old config for a while until they fix the other server.
Oh - and they get network failover, disk mirroring (which they hopefully have already), and such like bundled too.
(note: I work for Sun, but would be *very* surprised to see such a crude web server)
Author, Shell Scripting : Expert Re
Steve, are you saying that your $40,000 server platform is better than their multi-million dollar platform? I doubt that is true. I think that it is more likely that their developers just didn't know how to effectively use the tools they were given. As an analogy, a Dodge Viper is a fast car but you have to be able to use a stick shift to drive it effectively.
That is one reason you should write it this way:
for (int i = arr.length; i > 1; --i) {
...
}
Or like this:
int l = arr.length;
for (int i = 0; i < l; ++i) {
...
}
In fact, I think your post really illustrates why Java is good at multithreading: Java defaults to being safe but lets you be unsafe if you want. In C++, for comparison, everything is unsafe unless you make it so.
Score this -1 redundant, but...
I've been out of "the business" of programming for the last 7 years but this article was the single most practically useful article I've read in that time as far as explaining the whys and hows of web server operation. Kudos to the Ace's Hardware guys for this post.
Gushing praise and all that...
These guys truly bought the farm, didn't they. Sheesh. I wish them luck with their setup.
Pushin' 'n dealin', shovin' 'n stealin'
In a server, a great deal of the Java will be translated into native processor instructions for you automatically, and the VM will also perform significant dynamic code optimizations (such as inlining). In addition, you have true database connection pooling and easy access to shared memory between threads. A high-end application server will dynamically configure itself into a hybrid multi-process, multi-threaded server and will scale out your application as necessary.
AFAIK, that doesn't happen with PHP or Perl.
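As a rough illustration of the pooling idea mentioned above, here is a minimal sketch of a fixed-size pool shared between threads. It is not how any particular application server implements pooling; the class name SimplePool and its methods are hypothetical, and a real pool would hold database connections and add validation and timeouts.

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch: a fixed set of reusable resources shared by many threads.
final class SimplePool<T> {
    private final BlockingQueue<T> idle;

    // The list must be non-empty; every element starts out idle.
    SimplePool(List<T> resources) {
        idle = new ArrayBlockingQueue<>(resources.size(), false, resources);
    }

    // Blocks until a resource is free, then hands it to the caller.
    T borrow() throws InterruptedException {
        return idle.take();
    }

    // Returns a resource to the pool so another thread can reuse it.
    void release(T resource) {
        if (!idle.offer(resource)) {
            throw new IllegalStateException("pool is already full");
        }
    }

    // Number of resources currently idle.
    int available() {
        return idle.size();
    }
}
```

Because all request threads live in one process, they can share the same pool object directly, which is the "shared memory between threads" advantage the comment is pointing at.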
Finally, some application servers (e.g. Oracle9i AS) actually use Apache as the web server.
If you can't live without Apache, there's always mod_bandwidth.
Not quite as elegant a solution, but it's nice for preventing your web server from taking all of your bandwidth (if, say, you run it off your cable modem, and wish to continue gaming...).
You're completely wrong, you can't resize arrays in Java and arr.length is an integer constant not a method so the compiler *can* optimize it out.
It's clear you really don't know what a "Server" is.
"lack of a server class PC in their price range, the $1000 price tag"
Interesting, because the Sun Blade is *NOT* a server class machine. It uses IDE and non-ECC SDRAM, has no redundant power, not even mirrored harddrives.
It's a cheap desktop computer from Sun. They also take the same internals and put it into a rack mount case and call it the Netra.
Succhiare su un rubinetto!
It's Java... It's supposed to be cross-platform!
Sucez sur un robinet!
Occasionally you'll find a web page that's got several hundred KB of actual text, but it's usually not that way - most of the bits are decorative GIFs or JPGs which your modem won't compress. So you've got to pay attention to it upfront - use image formats that are already compressed (compare GIFs, JPGs, newer formats like DjVu, different resolutions), and pay attention to how much you want to clutter up the pages with them. Are they fundamental content? Nice but could be lighter weight? Unnecessary clutter when you could use a nice solid-color background instead? How often do you reuse them? Can you cache them effectively, either in the user's browser or ISP, or does the browser think each one of those customized bullets is a different dynamically generated file that it needs to download?
Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
I think I read:
...
Building a Butter Webserver
Maybe you should read the article before babbling.
It's still Java, with the VM and JIT compiler overhead. What if I do not want to use Java? What choices of application servers do I have in such a case? Furthermore, that application server will probably cost you an arm and a leg more than a bunch of clustered x86s running open software.
As for the connection pooling, it might not even be a requirement for a site. If it's really needed, it will be implemented either in PHP or somewhere else. Also, it's nice to know that the new version of Apache is a hybrid threaded and multi-process, and it even has state threads module from SGI.
The Extreme boardset has (8) GL processors, not three.
:)
Must have got carried away!
man tunefs | grep fish
Speed shouldn't be the reason you switch to Java. If anything, I've found that PHP has been faster for simple web applications and page serving (and loads faster to develop applications with), while Java stands out as being more robust and stable.
> VM and JIT compiler overhead
And interpreting a perl/php/asp... script has no overhead?
> What if I do not want to use Java?
Then use a converter (Woo, JCC :)
or
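class ICantBeBuggeredToRedoThisBitInJava { ...
// Runtime.exec(), since there is no System.exec() in Java
Process perl = Runtime.getRuntime().exec(new String[] {"/bin/perl", "/wwwroot/MyIttlePerlApp.pl"});
perlIn = perl.getInputStream();
out = perlIn; ...
}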
But hey, at the end of the day just use what you feel does the job best, then write an article, and have your website 'load tested' ;-)
CS!
Insightful but Overrated Troll
quote:
This means an Apache web server using keepalives will need to have more child processes running than connections. Depending upon the configuration and the amount of traffic, this can result in a process pool that is significantly larger than the total number of concurrent connections. In fact, many large sites even go so far as to disable keepalives on Apache simply because all the blocked processes consume too much memory.
::end quote::
Let's see, has anyone here heard of COW (copy on write)? Linux uses this idea to save time on fork'd child processes: they get the ::same:: address space, and only get new memory when they try to write to a shared page (i.e. they can run and run and run, but once they try to write to their memory they get a new page with the contents copied). It saves time and memory.
The only setback is that when a process forks a child, its current time slice is cut in half, with half given to the child, so the main proc will run aground if too many requests come in and the server has more processes to worry about.
:(
-ShadoeLord
this is my sig, there are many like it, but this one is mine.
Bzzt, while array length is constant, the array itself need not be! You can't optimize the length check away unless you know that the array reference can't change (pretty hard unless the array is stack local).
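A small sketch of the point above: the length of any one array is fixed, but a non-final field can be pointed at a different array while the loop is running, so the bound is only stable once the reference has been copied into a local. The class and method names are made up for illustration, and the reassignment inside the loop stands in for what another thread might do.

```java
// Sketch: the array's length is fixed, but the field holding the reference is not.
final class ArrayRefDemo {
    // Non-final field: another thread (or code called from the loop) may reassign it.
    static int[] arr = new int[4];

    // Re-reads the field on every iteration, so a reassignment changes the bound mid-loop.
    static int countWithFieldBound() {
        int iterations = 0;
        for (int i = 0; i < arr.length; i++) {
            if (i == 1) {
                arr = new int[2]; // simulates the reference changing during the loop
            }
            iterations++;
        }
        return iterations;
    }

    // Copies the reference into a local first; the local cannot change underneath the loop.
    static int countWithLocalCopy() {
        final int[] local = arr;
        int iterations = 0;
        for (int i = 0, n = local.length; i < n; i++) {
            if (i == 1) {
                arr = new int[2]; // the field changes, but 'local' still refers to the old array
            }
            iterations++;
        }
        return iterations;
    }
}
```

Starting from a four-element array, the first method stops early because the bound shrinks under it, while the second runs over all four elements of the array it captured.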
Actually, you should make your test >=0 if you want to do it faster. Going backwards and testing for >1 actually runs slower than going through the array forwards on some machines! This somewhat depends on your processor (platform efficient branch tests are your friends). It's true on the Athlon I'm typing this on, though.
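For reference, here is a sketch of the two loop shapes being compared, with bounds that visit every element. The backward loop quoted earlier in the thread starts at arr.length and tests i > 1, which would run past the end and skip the first elements if i were used directly as the index; the version below starts at the last valid index and compares against zero. The class name is hypothetical, and no claim is made here about which direction is faster on a given CPU or JIT.

```java
// Sketch: forward and backward traversal that both cover indices 0 .. length-1.
final class LoopBounds {
    // Forward loop: visits indices 0 .. length-1.
    static int sumForward(int[] arr) {
        int sum = 0;
        for (int i = 0; i < arr.length; i++) {
            sum += arr[i];
        }
        return sum;
    }

    // Backward loop: starts at the last valid index and compares against zero.
    static int sumBackward(int[] arr) {
        int sum = 0;
        for (int i = arr.length - 1; i >= 0; --i) {
            sum += arr[i];
        }
        return sum;
    }
}
```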
When I discussed this issue with Thau (or to be precise, he did most of the talking) he gave the reason for using processes over threads as the awful state of the then pthreads packages. If Apache was to be portable it could not use threads. He even spent some time writing a threads package of his own.
I am tempted to suggest that rather than abandon apache for some java server (yeah lets compile all our code to an obsolete byte code and then try to JIT compile it for another architecture), it should not be a major task to replace the Apache hunt group of processes with a thread loop.
The other reason Thau gave for using processes was that the scheduler on UNIX sux and using lots of threads was a good way to get more resources, err quite.
Now that we have Linux I don't see why the design of applications like apache should be compromised to support obsolete and crippled legacy O/S. If someone wants to run on a BSD Vaxen then they can write their own Web server. One of the liabilities of open source is that once a platform is supported it can end up with the application supporting the platform long after the O/S vendor has ceased to. In the 1980s I had an unpleasant experience with a bunch of physicists attempting to use an old MVS machine, despite the fact that the vendor had obviously ceased giving meaningful support for at least a decade. In particular they insisted that all function names in the fortran programs be limited to 6 characters since they were still waiting for the new linker (when it came it turned out that for functions over 8 characters long it took the first four characters and the last four characters to build the linker label... lame, lame, lame).
Looking for an Information Security student project suggestion?
Try http://dotcrimeManifesto.com/
Once a server has handled its initial requests there is very little overhead from the JIT compiler because everything has already been compiled. So, as long as you keep good uptime (you don't restart the server), the JIT won't be a factor.
Since almost everything will be compiled to native processor instructions, there is very little VM overhead. I bet the VM overhead is very competitive with the overhead of a Perl or PHP interpreter.
There are free application servers (e.g. Apache Tomcat). There are low-cost application servers (e.g. Orion). And there are expensive, but well-supported application servers (e.g. Websphere, iPlanet). So I don't think absolute software price should be an issue.
I don't have anything bad to say about Apache or PHP or Perl. I'm just trying to reduce the spread of anti-Java FUD.
"lots of big iron gets crushed by the slashdot effect too. This thing is running on a piddly little Sun."
first post at 4:46pm
/.'ed at 5:05pm
There seems to be no discussion about the use of asynchronous I/O. I'm no Apache expert, but I would think that a single Apache process using a select() loop could serve many clients simultaneously. Current implementations of Java have to allocate a thread per connection, which is extremely inefficient. Granted, Java 1.4 introduces asynch I/O, but it is not a production release yet. But this will be a significant enhancement to Java once it is.
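To make the select() idea concrete, here is a minimal sketch of a single-threaded echo server built on the java.nio Selector API, the non-blocking I/O that arrived with Java 1.4. It is not how Apache or any application server works internally; the class name, the loopback bind, and the echo behaviour are all assumptions chosen to keep the example small.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.ArrayList;
import java.util.Iterator;

// Sketch: one thread multiplexes many connections through a Selector.
final class SelectEchoServer implements Runnable {
    private final Selector selector;
    private final ServerSocketChannel server;
    private volatile boolean running = true;

    SelectEchoServer() throws IOException {
        selector = Selector.open();
        server = ServerSocketChannel.open();
        server.configureBlocking(false);
        // Port 0 asks the OS for any free port; loopback only, for the demo.
        server.socket().bind(new InetSocketAddress("127.0.0.1", 0));
        server.register(selector, SelectionKey.OP_ACCEPT);
    }

    int port() {
        return server.socket().getLocalPort();
    }

    // Asks the loop to exit and wakes it if it is blocked in select().
    void stop() {
        running = false;
        selector.wakeup();
    }

    @Override
    public void run() {
        ByteBuffer buffer = ByteBuffer.allocate(4096);
        try {
            while (running) {
                selector.select(); // blocks until a channel is ready or wakeup() is called
                Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
                while (keys.hasNext()) {
                    SelectionKey key = keys.next();
                    keys.remove();
                    if (key.isAcceptable()) {
                        accept();
                    } else if (key.isReadable()) {
                        echo((SocketChannel) key.channel(), buffer);
                    }
                }
            }
        } catch (IOException e) {
            // A selector or accept failure ends the loop in this sketch.
        } finally {
            closeAll();
        }
    }

    // Registers a newly connected client for read events.
    private void accept() throws IOException {
        SocketChannel client = server.accept();
        if (client != null) {
            client.configureBlocking(false);
            client.register(selector, SelectionKey.OP_READ);
        }
    }

    // Reads whatever is available and writes it straight back.
    private void echo(SocketChannel client, ByteBuffer buffer) {
        try {
            buffer.clear();
            int read = client.read(buffer);
            if (read < 0) {
                client.close(); // peer closed the connection
                return;
            }
            buffer.flip();
            while (buffer.hasRemaining()) {
                client.write(buffer); // simplification: retries until fully written
            }
        } catch (IOException e) {
            try { client.close(); } catch (IOException ignored) { }
        }
    }

    // Closes the listener, any remaining clients, and the selector.
    private void closeAll() {
        try {
            for (SelectionKey key : new ArrayList<>(selector.keys())) {
                key.channel().close();
            }
            selector.close();
        } catch (IOException ignored) { }
    }
}
```

Running it with new Thread(new SelectEchoServer()).start() serves every client from that one thread; a production server would also register for write readiness instead of spinning on write(), which this sketch skips for brevity.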
Great Windows SFTP Server!
well the best server (foreign) that i have seen so far is www.eperolehan.com.my, i bet they must've beefed it up with a large array of sun pizza boxes.
I don't know about PHP, so I won't make any comment about that language, but Perl has an adequate interface for sharing persistent objects in the form of the CPAN module IPC::Shareable:
I think this day trading weasel has just bought Dr. Pepper stock and he is trying to use the /. effect to bring up the stock price!!
Hmmmmmm.
You did it correct.
Links and a short description what you link.
And it helps if you say... this costs me karma but...
Slashdot's new advertising scheme!
-J
Sure, but I feel Java VM still has a higher overhead, as in the case of Perl and PHP they lack a VM, and are straight interpreted languages.
What if I do not want to use Java at _all_. Never! Ever! What application servers can I choose from? Not that many, eh?
Ok, so I may be FUDding. The only way to really compare is to compare real-life application running on both implementations on the same hardware. It sucks, however, that most if not all application servers (not tied to a web server) are Java-based.
They took a "powerful" desktop and made it the new server. When I look at the Sun site, you can see this is a desktop that is not very upgradable; they have already taken it almost to the top.
Memory: max 2 GB, 2 GB installed (a 4x increase over the old memory). That may sound like a lot, but remember "we will never need more than 640Kb"; already 50% is used and "not growing."
Processor: 500 MHz, now 25% used. But no extra processors are possible. (I know 1 Sun MHz != 1 Athlon MHz, but 25% load is far from near idle.)
They can work around these limitations by adding an extra server and moving some functions onto it, but they started out by saying that in their case an extra server would be an extra point of failure.
In other words, if they keep developing their site we will see an article like this again in one or two years. I guess that one will be about load balancing on cheap (Sun or x86) hardware.
I am a little bit surprised they didn't use x86 hardware, since that is what they review all the time. They looked further than you would expect.
It's not just the size of the dictionary, but with a data stream the dictionary has to be dynamically built and adapted. With a static file the whole of the data can be analysed at once for the optimum dictionary, which can then be appended to the compressed data.
Phillip.
Property for sale in Nice, France
A classic example of idiot moderation. I just took two karma away from two idiots.
This isn't a troll, and it isn't flamebait; you may disagree with it, but it's a legitimate technical opinion.
It's arguable (I don't think Slashdot is anywhere near the slowest site on the net), but it does crash frequently, and MySQL is probably a contributing factor in this.
It's got abrasively-presented technical opinions in it (Apache garbage? That's a bit extreme) but they're all defensible positions.
Learn to moderate, or stop doing it, children.
I agree. I have come across this many times, including a rather extreme example where I reduced a stock import time for a book retailer from 12 hours to under 30 seconds by (a) redoing the tables and throwing away useless joins and (b) rewriting the Perl in C using better algorithms, e.g. pre-building the rows and doing one insert instead of doing a two-pass insert then update. Another example: when displaying results for about 20 books, instead of doing for (i=1...20) select and running 20 queries, do select where in (list), which gets the same results in one query. Designing web applications requires more than just the ability to code; you really need to know the architecture of the whole system and how its parts interact.
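The "one query instead of twenty" trick above can be sketched like this. The table name books and the columns id and title are hypothetical, as is the helper itself; it only builds the parameterized SQL text and leaves the database call to the caller.

```java
import java.util.Collections;

// Hypothetical sketch: build one IN (...) query instead of issuing one SELECT per id.
final class BatchQuery {
    // Returns SQL with one '?' placeholder per id, suitable for a prepared statement.
    static String selectBooksByIds(int idCount) {
        if (idCount < 1) {
            throw new IllegalArgumentException("need at least one id");
        }
        String placeholders = String.join(", ", Collections.nCopies(idCount, "?"));
        return "SELECT id, title FROM books WHERE id IN (" + placeholders + ")";
    }
}
```

With JDBC, the resulting string would go to Connection.prepareStatement, and each id would be bound with PreparedStatement.setInt using 1-based parameter indices, giving one round trip to the database instead of twenty.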
Phillip.
Property for sale in Nice, France
If you don't want to use Java at all, then use Apache/Tomcat solution.
Insightful but Overrated Troll
The only way to really compare is to compare real-life application running on both implementations on the same hardware.
This is a difficult comparison. First of all, the architecture of a Perl-based application is certainly going to be different from that of a Java-based one. Furthermore, many vendors have high-performance (sometimes transparent) J2EE add-ons for their application servers. This makes comparisons amongst Java App Servers nearly impossible; you might get great performance on Oracle9iAS due to all the automatic optimizations (e.g. ESI, DBCache), but the exact same application might run very poorly on another vendor's product.
There is now a standard benchmark application for J2EE called ECPERF. Perhaps somebody could write a Perl application with the same functionality. Then you could have some head-to-head comparison. You'd probably also have to take a Perl-specific benchmark application and port it over to Java to validate the ECPERF scores.
Tomcat is still Java.
There is a SPARC organisation but attempts to produce non SUN SPARC systems have had a number of problems. The x86 architecture is crude/rude compared with the newer UltraSPARC designs but, so what, there is competition!!
See my journal, I write things there