High Performance Network Applications
An Anonymous Coward sent in this: "An article over at SysAdmin magazine seeks the truth while comparing network application performance under RH Linux, Solaris x86, FreeBSD 4.2, and Windows 2000. I'm a little suspicious of the writer's results, but you be the judge."
They say:
> At Lyris Technologies, we write high-performance, cross-platform,
> email-based server applications. Better application performance is
> a competitive advantage, so we spend a great deal of time tuning all
> aspects of an application's performance profile (software, hardware,
> and operating system). Our customers frequently ask us which operating
> system is best for running our software. Or, if they have already chosen
> an OS, they ask how to make their system run our applications faster.
> Additionally, we run a hosting (outsourcing) division and want to reduce
> our hardware cost while providing the best performance for our hosting
> customers.
What crap! They're claiming to be experts! Ha!
They just don't know how to tune Solaris or FreeBSD properly.
The results would be completely different if they had tuned them well.
Solaris Tuning Guide.
1) Apply latest recommended patches from http://sunsolve.sun.com
2) Add the following to the end of /etc/system:
* Raise TCP connection hash table size
set tcp:tcp_conn_hash_size=262144
* Increase various kernel buffers
set maxusers=2048
* Set hard limit on file descriptors
set rlim_fd_max=1024
* Set soft limit on file descriptors
set rlim_fd_cur=1024
* Increase directory name lookup cache
set ncsize=100000
* Should be the same as setting above
set ufs_ninode=100000
* Enable priority paging
set priority_paging=1
(These settings are based on information taken from:
http://docs.iplanet.com/docs/manuals/messaging/nms415/patch1/TuningGuide.html )
3) The following should be at the bottom of /etc/init.d/inetinit:
# TCP stack tuning
# default is 7200000
ndd -set /dev/tcp tcp_keepalive_interval 30000
# default is 240000
# change to "tcp_close_wait_interval" on Solaris 2.6
ndd -set /dev/tcp tcp_time_wait_interval 15000
# default is 128
ndd -set /dev/tcp tcp_conn_req_max_q 1024
# default is 1024
ndd -set /dev/tcp tcp_conn_req_max_q0 1024
# default is 8192
ndd -set /dev/tcp tcp_xmit_hiwat 32768
# default is 8192
ndd -set /dev/tcp tcp_recv_hiwat 32768
4) Speed up filesystem access under Solaris 2.7 and later.
Add logging to filesystem mount options in /etc/vfstab, like this:
/dev/dsk/c0t1d0s7 /dev/rdsk/c0t1d0s7 /opt ufs 2 yes logging,noatime
I have added noatime - this is another setting that might help
on a very busy filesystem, but not as much as logging.
FreeBSD Tuning Guide
Recompile the kernel with an increased MAXUSERS (a good number
to start with is 256) and NMBCLUSTERS (I use 10000; check netstat -m
under load to find a number that works for you).
You might want to play with "options HZ=1000".
Add this to /etc/sysctl.conf:
kern.maxfiles=65536
kern.maxfilesperproc=32768
net.inet.tcp.delayed_ack=0
net.local.stream.recvspace=65535
net.local.stream.sendspace=65535
net.inet.tcp.sendspace=65535
net.inet.tcp.recvspace=65535
Turn on softupdates on all filesystems
using tunefs -n enable (noatime might help as well).
Vadim Mikhailov
Agreed -- it's been a long time since I've seen a "benchmark" as poor as this one. But I don't think Windows was treated any more poorly than the other OSes. It wasn't a fair test of any of them.
The "tuning" for the Unix systems consisted in bumping up the maximum number of file descriptors. That's it. The FreeBSD system in particular was left completely mistuned and clearly running out of socket resources -- they report that it was logging errors but seem entirely ignorant of what those errors were (beyond their being load-related) and how to correct them.
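For reference, the file descriptor bump is the one piece of tuning a process can inspect and partly do for itself. A minimal sketch, assuming a Unix system with Python's standard resource module; the function name raise_fd_soft_limit is made up for illustration and has nothing to do with the benchmarked product:

```python
import resource

def raise_fd_soft_limit(target):
    """Raise the soft RLIMIT_NOFILE toward `target`, capped at the hard limit.

    Returns the (soft, hard) pair in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # RLIM_INFINITY means "no hard cap"; otherwise never ask for more than hard.
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)
    # Nothing to do if the soft limit is already unlimited or high enough.
    if soft == resource.RLIM_INFINITY or soft >= target:
        return soft, hard
    resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)

if __name__ == "__main__":
    print(raise_fd_soft_limit(4096))
```

This only moves the per-process soft limit. The hard limit and the system-wide tables (rlim_fd_max on Solaris, kern.maxfiles on FreeBSD) still have to be raised by the administrator, and it does nothing for socket buffers or mbuf clusters.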
Polling is hardly the best system interface for multiplexing TCP connections on either Windows or FreeBSD. As you mention, completion ports are best for Windows. Kqueue is best for FreeBSD. It just happens that polling is used in the crappy commercial SPAM program they "benchmarked". (All the OSes support scatter/gather, BTW, so you can't claim Windows was treated unfairly by its omission.)
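To make the multiplexing point concrete, here is a rough sketch in Python rather than in whatever the product is written in. It assumes the standard selectors module, whose DefaultSelector picks kqueue on the BSDs, epoll on Linux, and falls back to poll or select elsewhere. The function echo_once and the use of socket pairs as stand-in connections are invented for the example:

```python
import selectors
import socket

def echo_once(pairs):
    """Register many sockets with one selector and answer only the ready ones.

    `pairs` is a list of (server_side, client_side) connected sockets.
    Returns how many sockets actually needed service.
    """
    sel = selectors.DefaultSelector()  # kqueue, epoll, poll or select, per platform
    for server_side, _client in pairs:
        server_side.setblocking(False)
        sel.register(server_side, selectors.EVENT_READ)
    handled = 0
    # One call reports only the sockets that are readable right now.
    for key, _events in sel.select(timeout=1.0):
        data = key.fileobj.recv(4096)
        key.fileobj.sendall(data.upper())
        handled += 1
    sel.close()
    return handled

if __name__ == "__main__":
    pairs = [socket.socketpair() for _ in range(100)]
    for _server, client in pairs[:3]:
        client.sendall(b"ping")
    print(echo_once(pairs), "of", len(pairs), "sockets needed service")
    for a, b in pairs:
        a.close()
        b.close()
```

With kqueue or epoll the kernel hands back just the ready descriptors, so the cost per wakeup tracks the number of active connections, not the total number registered. That is the difference a plain poll() loop cannot show.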
None of the systems were tested in a way that shows their actual capabilities. The article is just a thinly disguised commercial for a (barely-)cross-platform "bulk email" product.
While your point that this benchmark is somewhat flawed is correct, you also point out a large problem with Windows:
You are forced to use proprietary MS-only extensions rather than straight, standardized POSIX calls to achieve the best performance. That means you have to suffer proprietary lock-in if you want to code high performance network applications for Windows.
I think this is deliberate: there is no reason why calls like malloc, creat, mmap, poll, whatever, couldn't have been tuned to get similar performance to the Windows specific VirtualAlloc, CreateFile, etc. Microsoft wants you to trade off portability for speed.
> I think this is deliberate: there is no reason why calls like malloc, creat, mmap, poll, whatever, couldn't have been tuned to get similar performance to the Windows specific VirtualAlloc, CreateFile, etc.
... apart from the fact that they expose different paradigms entirely?
malloc - heap-based allocation
VirtualAlloc - allocates entire pages from the VMM. Allows you to reserve or commit pages as and when you need them.
fopen - opens a file handle
CreateFile - allows you to open a file handle, specifying buffers to use, etc.
poll - you sit there waiting and doing nothing most of the time because you're asking all your connections "are we there yet?"
Completion ports - the OS comes back to you when it's done, and tells you that it's finished. You can now use those spare cycles doing something else - like another 1000 network connections.
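The heap versus page distinction can be seen from a portable language too. A small sketch, assuming Python's standard mmap module: an anonymous mapping is handed out in whole pages by the virtual memory manager, which is the same layer VirtualAlloc talks to. It is only an analogy, since the reserve/commit split is not exposed this way, and page_backed_buffer is an invented name:

```python
import mmap

def page_backed_buffer(nbytes):
    """Return an anonymous memory mapping rounded up to whole pages."""
    if nbytes <= 0:
        raise ValueError("nbytes must be positive")
    page = mmap.PAGESIZE
    npages = (nbytes + page - 1) // page   # ceiling division
    return mmap.mmap(-1, npages * page)    # fileno -1 means anonymous memory

if __name__ == "__main__":
    buf = page_backed_buffer(10_000)
    print(len(buf), "bytes mapped, page size", mmap.PAGESIZE)
    buf[:5] = b"hello"
    print(buf[:5])
    buf.close()
```

A heap allocator like malloc sits on top of this layer and carves pages into smaller chunks, which is why the two calls are not interchangeable.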
Simon
Coming soon - pyrogyra
Nice! So in other words, they used straight BSD sockets for their
implementation - which is NOT the way to get performance from Windows. You
need to use:
1. Asynchronous, Event based socket handling.
2. Completion ports.
3. Scatter/Gather buffering.
Polling is lousy no matter what way you do it. You'll lose most of your
performance spent going round a small loop.
Similarly you can infer that they used straight malloc() for their memory
handling, and most likely file handling - again very lousy
performance-wise on Windows compared to the alternatives, such as
VirtualAlloc, CreateFile(), scatter-gather file handling and more.
As for the second test, we can guess (from their comments) that they're
using straight C++/C file operations under Windows instead of tuning them to
the architecture, so of course performance is going to be lousy -- they're
benchmarking Microsoft's C runtime implementation, nothing more, nothing
less.
Also note that:
1. They don't provide details of which compiler they're using.
2. They don't provide details of the actual benchmark code for test 2.
3. They only tuned the Linux, FreeBSD and Solaris setups -- they should have
tuned Win2k server as well.
Sheesh. Talk about a crappy way to benchmark.
Simon
Coming soon - pyrogyra
The method used here for programming Windows 2000 is almost certain to guarantee slow results. Assuming he's written his code to use select() or even WaitForSingleObject(), he's significantly slowing down the system.
If you want to write high performance socket applications on Windows you MUST use I/O completion ports (something this article failed to mention at all). Most high load applications I've written using sockets have shown a 50% to 100% improvement in throughput for the same CPU load when switching to I/O completion ports from a traditional (Unix-style) asynchronous I/O model.
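The completion model is easier to show than to describe. A hedged sketch in Python, not in the Win32 API: on Windows, asyncio's default event loop since Python 3.8 is the proactor loop, which is built on I/O completion ports, while on Unix the same code runs on a readiness-based selector loop. So this shows the programming style, not the speedup. The names serve_one and round_trip are invented, and a socket pair stands in for a real client connection:

```python
import asyncio
import socket

async def serve_one(reader, writer):
    """Wait for one request, reply in upper case, then close."""
    data = await reader.read(4096)   # suspended here until the read has completed
    writer.write(data.upper())
    await writer.drain()
    writer.close()
    await writer.wait_closed()

async def round_trip(payload):
    left, right = socket.socketpair()          # stand-in for a client connection
    server_streams = await asyncio.open_connection(sock=right)
    server_task = asyncio.create_task(serve_one(*server_streams))
    reader, writer = await asyncio.open_connection(sock=left)
    writer.write(payload)
    await writer.drain()
    reply = await reader.read()                # read until the server closes its end
    writer.close()
    await writer.wait_closed()
    await server_task
    return reply

if __name__ == "__main__":
    print(asyncio.run(round_trip(b"hello")))
```

The code never asks "is this socket ready?". It starts an operation and gets control back when the operation is finished, which is the part select() and WaitForSingleObject() loops cannot express.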
I'm not saying in this case that Win2k would beat Linux, just that the tests were skewed by the author's inadequate knowledge of writing high performance code on Windows 2000.
Fear: When you see B8 00 4C CD 21 and know what it means
I read this a couple of weeks back when a linux-centric friend sent it to me... my main observation: This is obviously a commercial masquerading as a 'test'. When the 'device' being used to do this so-called 'benchmark' is a software application written by the testers for something else, there is nothing else to call it. Maybe the title of the article is a bit misleading; the meat clearly says all they are doing is showing which OS they have optimized their application for. They then use that as the FLAWED basis for determining which OS is 'best'? Give me a break.
It's clear from their comments that they did not turn on Softupdates on the filesystems when they set up their FreeBSD machine for the testing. It's no wonder, therefore, that they found disk I/O to be slower on FreeBSD.
Traditionally, Linux has traded safety for speed in filesystem metadata handling. FreeBSD has always refused to do so, insisting that metadata be updated synchronously. With softupdates, the metadata is cached, but the cache is flushed in the right order. The upshot is that you get the speed and the safety.
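"Flushed in the right order" can be pictured as a dependency sort. The following is a toy model only, assuming Python 3.9+ for graphlib; the block names and the dependency set are hypothetical simplifications, and this is not how the FreeBSD kernel implements softupdates:

```python
from graphlib import TopologicalSorter

def flush_order(pending):
    """Order cached metadata writes so each block follows the blocks it depends on.

    `pending` maps a block name to the set of blocks that must reach disk first.
    """
    return list(TopologicalSorter(pending).static_order())

if __name__ == "__main__":
    # Hypothetical updates cached while creating one file in a directory.
    pending = {
        "inode_bitmap": set(),
        "new_inode": {"inode_bitmap"},       # allocate before initializing
        "directory_entry": {"new_inode"},    # never point at an uninitialized inode
    }
    print(flush_order(pending))
```

Because the directory entry is only written after the inode it points to, a crash between the two writes leaves at worst an unreferenced inode, never a name pointing at garbage. That is why the updates can sit in the cache without giving up consistency.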
In short (too late), I am sure that their opinion of FreeBSD would improve markedly if they would set it up properly.
From what I see, just about every other OS represented has a defender saying exactly the same thing. That doesn't speak well for the thoroughness of the testing. I'll leave it at that.
I was going to read this article and make an informed comment about it. But, because I'm too lazy to wait forever for it to load, I'm just going to post this summary of comments to come:
Linux users: Linux is better, Windows is unstable.
Win users: Windows is better, Linux is hard.
BSD users: You're both wrong.
Mac users: Hey, look at us. We are pretty.
Top 3: Mac, shut up.
BeOS users: We're better but y'all will never know it.
Bill Gates: All your $$ is belong to me.
---
Solaris is much more finely grained in its locking than any of the other OSes mentioned. Because of that, comparisons with other OSes running on one or two CPUs (usually on PCs) do not do Solaris its due justice. Sure, Linux or FreeBSD, which aren't very finely grained in their locking (but are working towards changing that) spend less overhead in locking calls, so they run faster.
But how fast can they run on a 32-cpu machine? Or a 64-cpu machine? According to some public documents I saw, Sun will release a 72-cpu machine this summer. They currently support 64 cpus on their E10000 machines. Solaris is a highly scalable OS. Linux is not. FreeBSD is most certainly not. Windows 2000 may like to style itself scalable, but come on, we all know they are dreaming. Maybe scalable to 4 CPUs (if you own Pentium Xeons), and maybe in someone's wet dream it could scale to 16 CPUs or so, but none, I repeat none, of these OSes can scale like Solaris.
Solaris' strength isn't the fact that it's blazing fast on a single CPU, because a lot of tests can show Linux is faster. But Solaris *is* blazing fast on massively parallel machines. Solaris shows time and again an amazing ability to scale performance with the addition of more CPUs. The overhead required to build that scalability into the OS penalizes Solaris on single or dual-cpu machines, and that *must* be taken into account by people.
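The locking trade-off can be sketched in user space. This is an illustration of the idea only, not of Solaris kernel code: the class name StripedCounter is invented, and because CPython threads share a global interpreter lock the sketch shows the structure, not a measurable speedup. With stripes=1 it degenerates into the single big lock:

```python
import threading

class StripedCounter:
    """Counters guarded by several locks ("stripes") instead of one global lock."""

    def __init__(self, stripes=16):
        self._locks = [threading.Lock() for _ in range(stripes)]
        self._counts = [dict() for _ in range(stripes)]

    def _stripe(self, key):
        # Keys that hash to different stripes never contend with each other.
        return hash(key) % len(self._locks)

    def increment(self, key):
        i = self._stripe(key)
        with self._locks[i]:
            self._counts[i][key] = self._counts[i].get(key, 0) + 1

    def get(self, key):
        i = self._stripe(key)
        with self._locks[i]:
            return self._counts[i].get(key, 0)

if __name__ == "__main__":
    counter = StripedCounter(stripes=4)
    workers = [threading.Thread(target=counter.increment, args=(n % 3,))
               for n in range(30)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print([counter.get(k) for k in range(3)])
```

Every operation pays for picking a stripe and taking its lock even when nobody else is running, which is the single-CPU overhead. The payoff only appears when many CPUs would otherwise queue on one lock.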
And don't even talk about 64-bit. Sure, Solaris for Intel is limited to 32-bit address spaces due to the constraints of the CPU architecture on which it runs, but Solaris the OS is built through and through as a 64-bit OS, and Solaris running on UltraSparc hardware supports zillions of bytes of RAM. The new SunFire 6800s can support in the hundreds of gigabytes of RAM.
Can Windows 2000 do that? Can Linux do that? Can FreeBSD do that? Really we are talking about different markets here, that's all. You really need to test the OSes in the areas they are designed to operate, and then you'll see who the real champ is.