At least in the case of Linux, high-end performance scales down to the low end as well. But these higher-end benchmarks are more of an indicator: 'look, this is how far we can take your business today'. You don't have to actually hit this limit, but it's good to know that you could, e.g. if your business grows.
And please do not ignore the lower-end benchmarks either: a single-Xeon server with 2 GB of RAM is not all that uncommon these days.
1) The maximum file size in the SPECweb99 benchmark is 900 KB; this is why a 1 MB limit was set. Your claim that there are 1 MB objects in the benchmark is false.
2) The CGI executable is mandated by the SPECweb99 Run Rules: a process must be created and destroyed for each CGI request. But CGI requests make up only 0.1% of the total! The other 99.9% of the workload was handled by IIS 'low application priority' modules, which are DLLs loaded into IIS's address space, not .EXEs.
3) The IIS object cache was set to 2 GB (not 2 MB). It's set to 2 GB because Windows 2000 + IIS has a serious limitation: a process, and thus all of its threads (such as the IIS threads), can only address 2 GB. This is a design flaw in Windows 2000 that is haunting Microsoft in the enterprise now.
4) Are you seriously promoting the idea that the top 4 PC OEMs (Dell, IBM, Compaq, HP) and Microsoft did not tune IIS to the max and somehow conspired to make the Linux+TUX numbers look good?
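Where the 2 GB ceiling in point 3 comes from, as plain arithmetic: a 32-bit pointer spans 4 GiB, and by default Windows 2000 reserves half of that for the kernel. A minimal sketch, assuming that default 50/50 split (boot-time options that change the split are not modeled):

```python
# Sketch (assumption): the default 32-bit Windows 2000 layout, where half of the
# 4 GiB virtual address space is reserved for the kernel. Boot-time options that
# change the split are deliberately not modeled.
GiB = 1024 ** 3


def default_user_address_space(pointer_bits: int = 32) -> int:
    """Bytes of virtual address space left to user-mode code in one process."""
    total = 2 ** pointer_bits  # 4 GiB for 32-bit pointers
    return total // 2          # default 50/50 user/kernel split


if __name__ == "__main__":
    limit = default_user_address_space()
    # Every thread in the IIS process shares this single user address space,
    # so the in-process object cache can never grow beyond this ceiling.
    print(limit // GiB)  # prints 2
```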
The fact is, the only reason the TUX result was compared to the same Dell system is that the Dell system also happened to have the fastest Windows 2000 result. Your whole line of argument obviously falls apart if you compare IBM's similar Windows 2000 SPECweb99 result to the TUX result.
So your opinion is that IBM (4 CPUs), Dell (4 CPUs), HP (2 CPUs) and Mindcraft (2 CPUs) all misconfigured their Windows 2000 Advanced Server SMP systems, intentionally (or due to lack of expertise), degrading Windows 2000 SPECweb99 performance? That the top 4 PC OEMs, who account for 60% of all Windows sales, all mess up Windows 2000 tuning in a similar way? Don't you think it's in Microsoft's basic interest to actively help these companies tune their Windows 2000 systems properly?
The reality is that these results are all well tuned. The differences you noticed exist because OS tuning is very different, even on the same hardware: one OS works better with large buffers, another with smaller ones. Some parameters might not make any difference to performance but are set to a non-default value (and thus have to be reported to SPEC).
If you think those Windows 2000 systems are not tuned well enough, then more power to you: I'm sure you'll be hired immediately by any of these companies, since good SPECweb99 performance is a top priority for every hardware vendor.
No, I don't think there is any such divide, and I don't think TUX contradicts Unix concepts. CPUs get faster and protocols get more complex every day. Right now HTTP is common enough to be accelerated in kernel space, just like TCP/IP became common enough 10-15 years ago to move into the kernel in many other OSs.
The question thus is not 'should we put HTTP into the kernel', but rather '*when* and *how* should we put HTTP into the kernel'. Think of it as a form of 'caching': the OS caches, and should cache, 'commonly used protocols'.
Where is the limit? There is no clear limit, but it is definitely being pushed outwards every day. HTTP is becoming a universal communication standard, and with the emergence of XML I think the role of HTTP can hardly be overhyped.
And last but not least: if you don't need it, you can always turn CONFIG_TUX off.
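Turning TUX off is an ordinary kernel configuration choice. As a small illustration, this hypothetical Python helper (not part of any kernel tooling) reports whether `CONFIG_TUX` is built in, built as a module, or disabled in a `.config` file, assuming the standard Kconfig conventions `CONFIG_X=y`, `CONFIG_X=m` and `# CONFIG_X is not set`:

```python
# Hypothetical helper: report the state of a kernel config option in .config text.
# Follows the usual Kconfig convention: "CONFIG_X=y" (built in), "CONFIG_X=m"
# (module), and "# CONFIG_X is not set" (disabled).
def option_state(config_text: str, option: str = "CONFIG_TUX") -> str:
    for raw in config_text.splitlines():
        line = raw.strip()
        if line == f"# {option} is not set":
            return "disabled"
        # Matching "OPTION=" (with the '=') avoids false hits on longer names.
        if line.startswith(option + "="):
            value = line.split("=", 1)[1]
            return {"y": "built-in", "m": "module"}.get(value, value)
    return "absent"


if __name__ == "__main__":
    print(option_state("CONFIG_NET=y\n# CONFIG_TUX is not set\n"))  # disabled
```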
Sorry, but still no. He did not talk about a multithreaded TCP/IP stack; he talked about a 'forked' stack that Linux supposedly had, which is a clear misconception. Even under Windows, the concepts of a multithreaded and an SMP-threaded TCP/IP stack are different. E.g. it's possible to have a multithreaded TCP/IP stack on a single-processor system. ('Multithreaded' in this context means 'multiple threads can do socket operations'; there are OSs where only one process is allowed to execute TCP/IP operations, even if that process is not executing right now.)
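To make the 'multithreaded' definition concrete, here is a small Python sketch (illustrative only) in which several threads of one process perform socket operations concurrently. That is all 'multithreaded' means in this sense; it says nothing about whether the kernel's TCP/IP code runs in parallel on several CPUs, which is the separate SMP question. To stay self-contained it uses local `socket.socketpair()` sockets (Unix-domain on Linux) rather than real TCP connections:

```python
import socket
import threading


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes (recv() may legally return fewer)."""
    chunks = []
    while n > 0:
        chunk = sock.recv(n)
        if not chunk:
            raise ConnectionError("peer closed the connection early")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


def roundtrip(payload: bytes, results: list, index: int) -> None:
    # Each thread creates its own connected socket pair and does socket I/O on it.
    a, b = socket.socketpair()
    with a, b:
        a.sendall(payload)
        results[index] = recv_exact(b, len(payload))


def run_threads(count: int = 4) -> list:
    results = [None] * count
    threads = [
        threading.Thread(target=roundtrip, args=(f"msg-{i}".encode(), results, i))
        for i in range(count)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(run_threads(3))  # [b'msg-0', b'msg-1', b'msg-2']
```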
Divide the maximum number of SPECweb99 connections requested by the number of CPUs to get the maximum TCP backlog seen. Since Windows did 1598 connections, it needed a backlog of at most ~400. Linux did 4200, so a backlog limit of 1050 or more was needed. Note that there is no point in raising the backlog limit above the maximum number of connections. (We used 3000 just to be safe.) So the results are comparable. There are other W2K submissions in the same ~1600-connection range, and the PC vendors and Microsoft submitting those results surely did their homework.
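The backlog arithmetic above, worked through for the two 4-CPU results. The divide-by-CPUs rule is the rule of thumb stated in the text, not a formula from the SPEC rules:

```python
import math


def max_backlog(connections: int, cpus: int) -> int:
    """Rule of thumb from the text: peak backlog ~= connections / CPUs."""
    return math.ceil(connections / cpus)


def effective_backlog_limit(configured: int, connections: int) -> int:
    """A backlog limit above the total connection count buys nothing."""
    return min(configured, connections)


if __name__ == "__main__":
    print(max_backlog(1598, 4))                 # 400  (Windows 2000 result)
    print(max_backlog(4200, 4))                 # 1050 (TUX result)
    print(effective_backlog_limit(3000, 4200))  # 3000, comfortably above 1050
```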
There is no discrepancy here. The SPECweb99 benchmark measures the 'number of conforming connections', and the tester chooses the number of connections. The SPEC requirement is that every conforming connection must have an average bitrate of at least 320 kbit/s.
What does this mean? Vendors obviously try to maximize the number of connections, but they have to keep the bitrate above 320 kbit/s to have a valid benchmark run. You can test with 1 million connections as well, but you'll get an invalid run because the bitrate will be somewhere around 0.1 kbit/s. This is why you see almost identical bitrates (all a bit above 320 kbit/s) but different connection and ops/sec values. I hope this explains things.
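A toy model of this trade-off, assuming (hypothetically) that a server has a fixed total output bandwidth shared evenly by all connections; the real benchmark measures each connection individually, and the bandwidth figure below is made up for illustration:

```python
from typing import Iterable

MIN_KBIT_PER_SEC = 320  # per-connection floor stated above


def conforming_connections(bitrates_kbit: Iterable[float]) -> int:
    """Count connections whose average bitrate meets the 320 kbit/s floor."""
    return sum(1 for rate in bitrates_kbit if rate >= MIN_KBIT_PER_SEC)


def max_conforming(total_kbit_per_sec: int) -> int:
    """Toy model (assumption): total bandwidth shared evenly by all connections."""
    return total_kbit_per_sec // MIN_KBIT_PER_SEC


if __name__ == "__main__":
    print(conforming_connections([321.5, 319.9, 330.0]))  # 2
    # Hypothetical server pushing 1,000,000 kbit/s in total:
    print(max_conforming(1_000_000))                       # 3125
```

Under this model, a vendor raises the connection count until the per-connection rate is just above 320 kbit/s, which is why published results cluster there.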
See the SPECweb99 Run Rules enforced by SPEC; there are a lot of very strict requirements for a result to be accepted.
Your argument is flawed. Look here for an IBM/IIS SPECweb99 result done on a similar 4x 700 MHz Xeon system. Check out this IBM result as well. And there are HP and even Mindcraft submissions. Dell has the fastest Windows 2000 numbers, and it's fair to compare the fastest Windows 2000 results to the fastest TUX results, especially if they were done on similar hardware.
You assume that IBM, HP, Mindcraft and Dell are all in a big conspiracy to make Windows 2000 numbers look bad - are you kidding? The reality is that there is fierce competition for best SPECweb99 numbers, and Linux/TUX is just plain faster.
The other flaw in your argument concerns the TUX dynamic module. Check out the source code: TUX does dynamic modules. (Besides, the SPECweb99 workload includes 30% dynamic load, so every SPECweb99 webserver must support dynamic applications.)
You are confusing two completely different architectural concepts.
"Threads" (which get created) and "processes" (which get forked) are both 'contexts of execution'. Linux has both, and TUX 1.0 uses both.
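The difference between the two kinds of execution context is visible even from user space. In this POSIX-only Python sketch (an illustration, not TUX code), a created thread shares its parent's memory, while a forked process works on its own copy:

```python
import os
import threading

state = {"value": 0}


def bump() -> None:
    state["value"] += 1


def run_in_thread() -> int:
    state["value"] = 0
    t = threading.Thread(target=bump)  # a created thread: same address space
    t.start()
    t.join()
    return state["value"]  # 1: the parent sees the thread's update


def run_in_forked_child() -> int:
    state["value"] = 0
    pid = os.fork()  # a forked process: gets its own copy of memory
    if pid == 0:
        bump()       # modifies only the child's copy
        os._exit(0)
    os.waitpid(pid, 0)
    return state["value"]  # 0: the parent's copy is untouched


if __name__ == "__main__":
    print(run_in_thread(), run_in_forked_child())  # 1 0
```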
A "threaded TCP/IP stack" is a slightly misnamed thing: it means an "SMP-threaded TCP/IP stack", which in turn means that the TCP/IP stack has been "SMP-deserialized" (in Windows speak), i.e. TCP/IP code on different CPUs can execute in parallel without any interlock/big-kernel-lock overhead or other serialization.
A 'threaded TCP/IP stack' has no connection whatsoever to 'threads'.
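The difference between a serialized and a deserialized stack comes down to lock granularity. The Python sketch below is only a structural analogy (the class names are hypothetical, and Python threads do not run bytecode in parallel): one variant guards all socket state with a single big lock, the other gives each socket its own lock so that operations on unrelated connections need not wait for each other:

```python
import threading


class BigLockStack:
    """All socket state behind one global lock: every send() serializes."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._buffers = {}

    def send(self, sock_id: int, data: bytes) -> None:
        with self._lock:
            self._buffers.setdefault(sock_id, []).append(data)

    def buffered(self, sock_id: int) -> bytes:
        with self._lock:
            return b"".join(self._buffers.get(sock_id, []))


class PerSocketLockStack:
    """Each socket has its own lock: sends on different sockets need not wait."""

    def __init__(self) -> None:
        self._table_lock = threading.Lock()  # held only briefly, to look up state
        self._sockets = {}                   # sock_id -> (lock, list of chunks)

    def _state(self, sock_id: int):
        with self._table_lock:
            if sock_id not in self._sockets:
                self._sockets[sock_id] = (threading.Lock(), [])
            return self._sockets[sock_id]

    def send(self, sock_id: int, data: bytes) -> None:
        lock, chunks = self._state(sock_id)
        with lock:  # contends only with users of the same socket
            chunks.append(data)

    def buffered(self, sock_id: int) -> bytes:
        lock, chunks = self._state(sock_id)
        with lock:
            return b"".join(chunks)


def exercise(stack, workers: int = 4, sends: int = 100) -> list:
    """Each worker thread sends on its own socket; return bytes buffered per socket."""
    def work(sock_id: int) -> None:
        for _ in range(sends):
            stack.send(sock_id, b"x")

    threads = [threading.Thread(target=work, args=(i,)) for i in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [len(stack.buffered(i)) for i in range(workers)]
```

Both variants produce the same results; the point of the second one is that, on real parallel hardware, its critical sections for different sockets can overlap.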
FYI, the Linux TCP/IP stack was completely redesigned and deserialized during the 2.3 kernel cycle; this redesign/deserialization was done by David Miller and Alexey Kuznetsov. The TUX webserver of course relies heavily on the deserialization, but this is not the only architectural element TUX relies on.
I'm the one who designed and wrote most of TUX, and here are some facts about it.
'TUX' comes from 'Threaded linUX webserver', and it is a kernel-space HTTP subsystem. TUX was written by Red Hat and is based on the 2.4 kernel series. TUX is under the GPL and will be released in a couple of weeks. TUX's main goal is to enable high-performance webserving on Linux, and while it's not as feature-rich as Apache, TUX is a full-fledged HTTP/1.1 webserver supporting HTTP/1.1 persistent (keepalive) connections, pipelining, CGI execution, logging, virtual hosting, various forms of modules, and many other webserver features. TUX modules can be user-space or kernel-space.
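For readers unfamiliar with the two HTTP/1.1 features mentioned: a persistent (keepalive) connection carries several requests, and pipelining means the client sends them without waiting for each response. A minimal Python sketch of the bytes a pipelining client would write (the host and paths are placeholders; nothing here is TUX-specific):

```python
def pipelined_requests(host: str, paths: list) -> bytes:
    """Serialize several HTTP/1.1 GETs to be written back-to-back on one connection."""
    # HTTP/1.1 connections are persistent by default, so no "Connection:" header
    # is needed; the client simply does not wait for each response before
    # sending the next request (pipelining).
    requests = [f"GET {path} HTTP/1.1\r\nHost: {host}\r\n\r\n" for path in paths]
    return "".join(requests).encode("ascii")


if __name__ == "__main__":
    print(pipelined_requests("www.example.com", ["/index.html", "/logo.gif"]))
```

A server must then send the responses back on the same connection in the order the requests arrived.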
The SPECweb99 test was done with a user-space module; the source code can be found here. We expect TUX to be integrated into Apache 2.0 or 3.0, as TUX's user-space/kernel-space API is capable of supporting a mixed Apache/TUX webspace.
TUX uses an 'object cache', which is much more than a simple 'static cache'. TUX objects can be freely embedded in other web replies and can be used by modules, including CGIs. You can 'mix' dynamically generated and static content freely.
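As a rough illustration of 'mixing' cached and generated content (this is not the TUX API; `ObjectCache`, `render_page` and the object names are hypothetical), a Python sketch in which a dynamic reply embeds objects that are loaded once and then served from the cache:

```python
class ObjectCache:
    """Hypothetical object cache: loads an object once, then serves it from memory."""

    def __init__(self, loader) -> None:
        self._loader = loader  # callable: name -> bytes (e.g. reads a file)
        self._objects = {}
        self.loads = 0         # how many times the loader was actually called

    def get(self, name: str) -> bytes:
        if name not in self._objects:
            self._objects[name] = self._loader(name)
            self.loads += 1
        return self._objects[name]


def render_page(cache: ObjectCache, user: str) -> bytes:
    # Dynamic part generated per request; static parts embedded from the cache.
    header = cache.get("header.html")
    greeting = f"<p>Hello, {user}</p>".encode()
    footer = cache.get("footer.html")
    return header + greeting + footer


if __name__ == "__main__":
    cache = ObjectCache(lambda name: f"[{name}]".encode())
    render_page(cache, "alice")
    print(render_page(cache, "bob"), cache.loads)
```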
While written by Red Hat, TUX relies on many scalability advances in the 2.4 kernel made also by kernel hackers from SuSE, Mandrake and the Linux community as a whole. TUX is not one single piece of technology, but rather a final product that 'connects the dots' and proves the scalability of Linux's high-end features. I'd especially like to highlight the role of extreme TCP/IP networking scalability in 2.4, which was a months-long effort led by David Miller and Alexey Kuznetsov. We'd also like to acknowledge the pioneering role of khttpd: while TUX is independent of khttpd, it was an important experiment we learned a lot from.
Other 2.4 kernel advances TUX uses are: async networking and disk IO, wake-one scheduling, interrupt binding, process affinity (not yet merged patch), per-CPU allocation pools (not yet merged patch), big file support (the TUX logfile can get bigger than 5 GB during SPECweb99 runs), highmem support, various VFS enhancements (thanks Al Viro), the new IO scheduler done by the SuSE folks, buffer/pagecache scalability and many other Linux features.
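Why a 5 GB logfile needs big file support: on 32-bit Linux, file offsets were traditionally a signed 32-bit `off_t`, which tops out just below 2 GiB, so larger files require 64-bit offsets. The arithmetic in Python:

```python
GiB = 1024 ** 3
MAX_OFF_T_32 = 2 ** 31 - 1  # largest offset in a signed 32-bit off_t (just under 2 GiB)
MAX_OFF_T_64 = 2 ** 63 - 1  # largest offset in a signed 64-bit off_t


def needs_large_file_support(size_bytes: int) -> bool:
    """True if a file this large cannot be addressed with 32-bit offsets."""
    return size_bytes > MAX_OFF_T_32


if __name__ == "__main__":
    logfile = 5 * GiB  # the logfile size mentioned above
    print(needs_large_file_support(logfile))  # True
    print(logfile <= MAX_OFF_T_64)            # True
```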
The file size limit difference does not matter because the maximum file size in SPECweb99 is 900 KB, i.e. both IIS and TUX cache all files.
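A quick check of this claim against a 1 MB per-file cache limit, assuming the usual description of the SPECweb99 file set: four size classes of nine files each, where file i of class c is i × 0.1 KB × 10^c, from about 0.1 KB up to 900 KB. Directory counts and access frequencies are not modeled:

```python
KiB = 1024
MiB = 1024 * KiB
CACHE_FILE_LIMIT = 1 * MiB  # per-file cache limit discussed above


def specweb99_file_sizes() -> list:
    """Assumed file-set layout: classes c = 0..3, files i = 1..9, size i * 0.1 KB * 10**c."""
    return [i * KiB * 10 ** c // 10 for c in range(4) for i in range(1, 10)]


if __name__ == "__main__":
    sizes = specweb99_file_sizes()
    print(max(sizes) // KiB)                          # 900
    print(all(s <= CACHE_FILE_LIMIT for s in sizes))  # True: every file is cacheable
```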
The backlog limit does not make a difference, because neither IIS nor TUX hits the backlog limit.
IIS is not using a mirrored disk for logging; where did you get that from?
The other questions are answered here.
8 GB of RAM was used in the server. Could it be that you misread the SPEC sheet and saw the amount of RAM the clients have?
It says 4 CPUs in the server, 2 CPUs on every client. Could you be confusing those two numbers?
(Sorry for the messed-up sentence; it should read: 'Even under Windows, the concepts of a multithreaded and an SMP-threaded TCP stack are different.')
Sure, more info is here. (which happens to be a comment in this thread :-) )
Oops, wrong URL; the right link for the source code is here. It's standard user-space code, and you can output any dynamic page with TUX as well.