High-Performance Programming Techniques on Linux
Dejected @Work writes "A senior IBM developer has come out with a series of articles on high-performance Linux programming techniques using pipes, sockets, threads, and processes. The series has been running for a while and compares these high-performance techniques on Linux and Windows. Guess who wins?"
Slashdot makes no guarantees as to the accuracy of the above information. All data provided are for entertainment purposes only, and any damage to hardware as a result of following the above advice is the sole responsibility of the user.
Slashdot's Attorney
Linux, always known for its fast context switches, is faster than Windows, which is pretty slow at context switches as far as OSes go. It is also faster at running a minimal-load multi-threaded application. Well, it's not quite a shocker, but glad to know that we are still ahead.
That's the most ludicrous statement. How can one call a normal send/recv loop high performance socket code?
I, for one, remain totally unconvinced by this article (at least the guy who wrote it admits he doesn't know anything about Windows). How can one possibly compare "high performance" I/O on Windows without using overlapped I/O, and possibly even completion ports?
From the article about pipes:
The number 24 in the first executable line of code above was determined experimentally. I found no mention of it anywhere in the Platform SDK. If it is not present, the program doesn't work. Apparently, the pipe facility requires a 24-byte header on each write to the pipe.
If this were Linux, we'd be able to find out what those 24 bytes were.
If tits were wings it'd be flying around.
This article is a typical case of benchmark bullshit. The author has taken a deliberately Unix-centric view of computing, and ignored design and implementation concepts that are normal for Windows-based systems.
In the synchronisation article (the /. poster missed the link for that one) only Mutexes, Semaphores and Critical Sections are evaluated. It is well known that mutex performance on Windows is poor compared to *nix, but that is mitigated by a number of benefits in the Windows threading model.
Here's a brief intro, to show why they CAN'T be compared:
Windows has processes and threads as first class citizens, and they have fair (multi-level round-robin) scheduling. Mutexes, semaphores and critical sections are the primary locks, but there are also atomic check-and-increment functions as well as events/signals (long lasting flags). Every object (mutex, sem, section, event, thread, process, file, socket, etc) in Windows can be waited on, and you can wait on any number and combination of objects at once, in either an AND or OR configuration. e.g. wait for a mutex AND an async socket IO; or wait for a semaphore OR a thread to end OR an event
Linux's options are far more limited - to achieve the same results you have to use a different architecture (not that this is necessarily a bad thing); on the other hand, Linux's primitives and context switching are faster than the Windows equivalents. Linux has kernel-scheduled processes, userland threads (kernel threads are available), a fair but not deterministic scheduler, mutexes, semaphores and condition variables.
A condition variable is similar to an event, but is instantaneous - if no thread is waiting on the condition variable, nothing happens. An event stays set until it releases a thread (auto-reset events) or until explicitly reset (manual-reset events). A condition variable is one of the few time-waitable objects in Linux (all objects are time-waitable in Windows; mutexes and semaphores are not time-waitable in Linux).
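To make the "instantaneous vs. sticky" difference concrete, here is a minimal sketch in Python rather than C. It assumes that `threading.Condition` stands in for a pthreads condition variable and `threading.Event` for a Windows manual-reset event; the function name `signal_then_wait` is made up for the illustration.

```python
import threading

def signal_then_wait(timeout=0.1):
    """Signal while nobody is waiting, then wait; report who still sees it."""
    cond = threading.Condition()
    event = threading.Event()

    with cond:
        cond.notify()   # no thread is waiting, so this wake-up is simply lost
    event.set()         # the flag stays set until something clears it

    with cond:
        cond_woke = cond.wait(timeout)   # nothing pending: times out
    event_woke = event.wait(timeout)     # flag still set: returns at once
    return cond_woke, event_woke

print(signal_then_wait())   # (False, True)
```

This is why condition variables are normally paired with a predicate checked under the lock, while an event can carry the state by itself.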
The comparative power of the Windows threading and synchronisation model may not be obvious to long-time Unix programmers, but consider the wider range of architectural possibilities when you can wait (with a timeout) on any combination of any objects in the system.
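For comparison, the nearest Unix idiom I know of is multiplexing file descriptors. The sketch below is only that: it assumes everything you want to wait on can be expressed as a readable descriptor (here a pipe and a socket pair), it gives you the OR case only, and `wait_any` is a hypothetical helper, not a standard call.

```python
import os
import selectors
import socket

def wait_any(sources, timeout):
    """Block until any named descriptor is readable or the timeout expires.
    Returns the sorted names of the ready sources (empty list on timeout)."""
    with selectors.DefaultSelector() as sel:
        for name, fileobj in sources.items():
            sel.register(fileobj, selectors.EVENT_READ, data=name)
        return sorted(key.data for key, _ in sel.select(timeout))

pipe_r, pipe_w = os.pipe()
sock_a, sock_b = socket.socketpair()
sources = {"pipe": pipe_r, "socket": sock_a}

nothing_ready = wait_any(sources, timeout=0.05)  # nothing written yet
os.write(pipe_w, b"x")
pipe_ready = wait_any(sources, timeout=1.0)      # only the pipe has data
sock_b.send(b"y")
both_ready = wait_any(sources, timeout=1.0)      # pipe data is still unread
print(nothing_ready, pipe_ready, both_ready)

for fd in (pipe_r, pipe_w):
    os.close(fd)
sock_a.close()
sock_b.close()
```

Mutexes and thread handles do not fit this model directly, which is the architectural difference being argued about.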
In the socket article, the author compares the BSD socket API on Linux with the WSASocket API on Windows, which is meant primarily for asynchronous operation. Despite claiming techniques for "high performance" sockets, he fails to mention /dev/poll, POSIX AIO, or Windows' I/O completion ports. POSIX AIO can be reasonably compared to Windows' async socket/file support, but it is impossible to make a valid comparison between /dev/poll (or kqueue, etc.) and I/O completion ports because they require significantly different architectures to function at peak efficiency.
On to processes and threads. CreateProcess() has the combined functionality of fork() and exec(), so the article starts off on the wrong foot. It also supports security attributes, so the equivalent Linux example should have had a larger function starting with fork(), then dropping permissions in the child and exec()ing another binary.
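Roughly what that larger Unix-side function would look like, sketched in Python instead of C. The name `spawn` and its `uid`/`gid` parameters are invented for the illustration, it is Unix-only, and actually dropping privileges only works when the caller starts out privileged.

```python
import os
import sys

def spawn(argv, uid=None, gid=None):
    """fork(), optionally drop privileges in the child, then exec argv.
    Returns the child's exit status, or -1 if it did not exit normally."""
    pid = os.fork()
    if pid == 0:                      # child process
        try:
            if gid is not None:
                os.setgid(gid)        # drop the group first, while still privileged
            if uid is not None:
                os.setuid(uid)
            os.execvp(argv[0], argv)  # replaces the process image on success
        finally:
            os._exit(127)             # reached only if setgid/setuid/exec failed
    _, status = os.waitpid(pid, 0)    # parent: wait for the child to finish
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1

print(spawn([sys.executable, "-c", "raise SystemExit(3)"]))   # 3
```

The point is only that the fork/drop/exec sequence is several calls where CreateProcess() is one, so timing fork() alone against it is not like for like.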
The author incorrectly asserts that Linux threads are scheduled by the CPU - he is using the pthreads library, which is userland threading. pthreads is also far from "fair"; Windows uses a multi-level round-robin algorithm, which makes thread scheduling very deterministic; pthreads is far more prone to thread starvation in a system where processing cascades between threads. e.g. an input thread, processing thread and output thread, which use mutex-protected queues to communicate; this is an excellent architecture for Windows, but performs poorly by comparison on *nix because a sudden heavy load will see the input thread scheduled more often than other threads, until its load dies down, at which point the processing thread will get the load, and so on - throughput stays much the same as a Windows system, but latency near-triples.
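For anyone who hasn't built one, the three-thread pipeline I mean looks something like the sketch below. It is Python, not the pthreads code under discussion; `queue.Queue` is assumed as the mutex-protected queue (it locks internally), and `reader`, `processor`, `writer` and `STOP` are illustrative names. It shows the shape of the architecture only and says nothing about scheduling or latency.

```python
import queue
import threading

STOP = object()   # sentinel that tells the next stage to shut down

def reader(items, outbox):
    """Input thread: push raw items into the first queue."""
    for item in items:
        outbox.put(item)
    outbox.put(STOP)

def processor(inbox, outbox):
    """Processing thread: transform each item and pass it on."""
    for item in iter(inbox.get, STOP):
        outbox.put(item.upper())
    outbox.put(STOP)

def writer(inbox, results):
    """Output thread: collect finished items."""
    for item in iter(inbox.get, STOP):
        results.append(item)

def run_pipeline(items):
    raw, cooked, results = queue.Queue(), queue.Queue(), []
    threads = [
        threading.Thread(target=reader, args=(items, raw)),
        threading.Thread(target=processor, args=(raw, cooked)),
        threading.Thread(target=writer, args=(cooked, results)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

print(run_pipeline(["get /", "get /index.html"]))
```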
Benchmarking thread creation is a load of crap. Few seriously high-performance servers use a thread-per-connection architecture anymore; and at the very least they use thread pools.
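A minimal sketch of the thread-pool alternative, assuming Python's `concurrent.futures` as a stand-in for whatever pool a real server would use; `handle` and `serve` are hypothetical names and the "connections" are just integers.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

def handle(conn_id):
    """Pretend to service one connection; report which worker did it."""
    return conn_id, threading.current_thread().name

def serve(connection_ids, workers=4):
    # The pool creates at most `workers` threads and reuses them for every
    # connection, so thread-creation cost is paid a handful of times at most.
    with ThreadPoolExecutor(max_workers=workers,
                            thread_name_prefix="worker") as pool:
        return list(pool.map(handle, connection_ids))

results = serve(range(100))
distinct_threads = {name for _, name in results}
print(len(results), "connections handled by", len(distinct_threads), "threads")
```

With a pool like this, how fast the OS can create a thread barely shows up in the numbers.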
The entire article is unfair to both sides: on Windows, threads are first-class citizens; on Linux you are more likely to use multiple processes for stability and performance.
I've already covered everything necessary to dispute the bullshit in the Scheduling article.
Conclusion: this is an excellent case of "don't believe the FUD". You can't compare apples and apples when some of the apples are growing on an orange tree. The only way to achieve a meaningful comparison of these platforms is to construct applications with equivalent functions, but designed and implemented for the target platform.
i-name =twylite [http://public.xdi.org/=twylite], see idcommons.net
For those of us who write cross-platform code, we need to know these numbers in order to see if we want to try to get by with the most common code possible, or do we spend the extra time on the Windows platform and write some sort of non-portable crap to work around the fact that Windows POSIX support truly, truly sucks.
And don't look for this to change anytime soon. It is in MS's best interest to force people to use their APIs instead of open APIs developed with 35 years of experience as to what works and what doesn't.
multi-threading is why, for example aolserver can do with one process what apache needs a bunch of processes to do. (though i digress, aolserver only has to run tcl interps, where apache is much more versatile.)
meanwhile, both FreeBSD and NetBSD are trying to get SMP and scheduler activations into their kernels. this would improve their support for multi-threading substantially. there's a paper which explains this better than i ever could.
The validity of the exercise is compromised by his assumption that multiple processes, as opposed to multiple threads, were the best choice for whatever his benchmark is supposed to model, and that if they are, RPC, COM or shared memory are not more appropriate to the IPC task. Windows has many ways of doing IPC and concurrent tasking, and most applications use IPC methods other than pipes. This failure of choice is an important reason why such like-for-like benchmarks are of little value.
In short, these "high-performance techniques" are high-performance on Linux only, the way he does it. On Windows, other methods, not available on Linux, are more commonly used.
NO ID: BEING FREE MEANS NOT HAVING TO PROVE IT