Linux Kernel Benchmarking: 2.4 vs. 2.6-test
frooyo pastes from kerneltrap: "Cliff White recently posted some re-AIM multiuser benchmark results comparing the stable 2.4.23-pre5 kernel against the 2.6.0-test5 and 2.6.0-test5-mm4 development kernels. In his conclusion he makes reference to earlier scheduler tests posted by Mark Wong saying, "Short summary: we mostly rock.""
I run 2.4.22 at work and 2.6.0-testX at home. The 2.6.0-test (vanilla) series feels much more responsive, especially in X. I have not done any real benchmarks of my systems, but after working with 2.4 all day, 2.6 seems to fly.
Just my observation
-the_crowbar
Have you read the Moderator Guidelines
AIM (now at version 7) is not an instant messenger client. It's a benchmarking tool. Click on the link in the story to see what it is/does/etc.
A better comparison would have been against Solaris x86. Solaris scales very linearly with every added processor.
Gnumeric (which I have on KDE at least) is a non-sucky spreadsheet. In fact, in the course I was TAing last spring the prof had to switch to it from Excel because it could handle the operations better. The only complaint I have about it is that I can't (or at least haven't figured out how to) cut and paste into a text document (and vice versa). ...But that was point #4 as opposed to #3, so you can strike one off.
The fix to #5 is easy.
I have something in common with Stephen Hawking...
Since it seems you're running Debian, and all those CPU-intensive operations are also HD-intensive operations, have you checked hdparm -d /dev/hda? I know it is simple, but it is so simple that I forgot to check for about a month. Debian appears to have DMA off by default.
Devphaeton, you hit the nail on the head about 2.6.0. Its main advantage over 2.4.x (for this luser anyway) is the smoother multitasking even on a uniprocessor system. I'm running a tweaked 2.6.0-test5 on my laptop, and jobs that would make 2.4.x unusable are barely detectable (from the standpoint of moving the mouse around, typing up slashdot articles, and the like).
:-)
Of course, the ACPI support and swsusp don't hurt either
Of course, there is the possibility of trimming cycles from the process of switching contexts. Linux, though, already had that pretty low. That's why Linus is so resistant to shared-memory, shared-context threads: the cost of processes is so low that the benefits are small. However, some speed was gained in context switches.
Overall, though, more switching means slower performance, even though the user feels like the system is faster. It's not faster. It's actually slower. It's just more responsive.
Confused yet? :)
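To put rough numbers on the throughput versus responsiveness trade-off, here is a minimal Python sketch of a plain round-robin scheduler. The 5-microsecond switch cost, the timeslice lengths and the function names are made-up illustrative assumptions, not measurements or code from either kernel.

```python
def useful_fraction(timeslice_us, switch_cost_us):
    """Fraction of CPU time spent on real work rather than on switching."""
    return timeslice_us / (timeslice_us + switch_cost_us)


def worst_case_wait_us(timeslice_us, switch_cost_us, runnable_tasks):
    """Longest time a task waits for its turn under simple round-robin:
    every other runnable task gets one full slice plus one switch."""
    return (runnable_tasks - 1) * (timeslice_us + switch_cost_us)


if __name__ == "__main__":
    SWITCH_COST_US = 5   # assumed cost of one context switch
    TASKS = 10           # assumed number of runnable tasks
    for timeslice_us in (100_000, 10_000, 1_000):
        work = useful_fraction(timeslice_us, SWITCH_COST_US)
        wait = worst_case_wait_us(timeslice_us, SWITCH_COST_US, TASKS)
        print(f"slice {timeslice_us} us: useful work {work:.4%}, "
              f"worst-case wait {wait / 1000:.1f} ms")
```

Shorter slices cut the worst-case wait (the system feels snappier) while the useful-work fraction drops slightly (the system gets less done overall).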
Linux IT Consulting and Domino Development in Michigan
I've tried the new kernel, and I got more responsiveness issues than improvements. But besides that (I might very well have misconfigured something), I'd like to point out that the kernel itself isn't all that matters: the new drivers that accompany it are just as important. I noticed a significant increase in X's launch time as well as a whopping 250 FPS with glxgears, compared with the 150 FPS I got with my 2.4.22 setup. This is probably due to major improvements that were brought to the drivers for my i830M chipset.
This is a simulation of a database load. Basically, larger numbers are better. The numbers are tasks per minute and peak user count. The load adds users each iteration until a max is reached. See http://developer.osdl.org/cliffw/reaim/index.html for more
The workload simulates a multi-user system by running an increasing number of users. Each user does a list of tasks. We keep adding users, until the load reaches a max. The score shows tasks per minute, and peak user count. Bigger is better. http://www.osdl.org/stp
Yes, the number for dual is not 1017, but more like 1545.
Here are the actual numbers for 2.6.0-test5 and the compute workload:
1 - 992.06 - 100%
2 - 1545.03 - 155%
4 - 5175.28 - 521%
As for why the 4-processor case is actually 5 times better than the single-CPU case, I do not know enough about the benchmarks to comment.
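For anyone wanting to redo the arithmetic on the figures quoted above, here is a small Python sketch. The function name is just illustrative. It computes the speedup relative to the single-CPU score and the per-CPU efficiency (speedup divided by CPU count).

```python
# Scores for 2.6.0-test5, compute workload, as quoted above (CPUs -> score).
scores = {1: 992.06, 2: 1545.03, 4: 5175.28}


def scaling(scores):
    """Return (cpus, speedup, efficiency) rows relative to the 1-CPU score."""
    base = scores[1]
    rows = []
    for cpus in sorted(scores):
        speedup = scores[cpus] / base
        efficiency = speedup / cpus  # 1.0 would be perfect linear scaling
        rows.append((cpus, speedup, efficiency))
    return rows


if __name__ == "__main__":
    for cpus, speedup, efficiency in scaling(scores):
        print(f"{cpus} CPU(s): speedup {speedup:.2f}x, "
              f"per-CPU efficiency {efficiency:.0%}")
```

Efficiency comes out below 100% for two CPUs and above 100% for four, which is exactly the oddity in these numbers.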
Genebrew
Notice that while the new kernel 'kicks ass' on SMP systems, on uniprocessor systems the 2.4 kernel is the one kicking ass. Anyone benchmarked 2.4 against some of the pre-SMP kernels on a uniprocessor machine?
Yeah, they missed an important test - latency for interactive processes. A lot of scheduler work went into improving this, and it makes a huge difference when you have large memory processes working hard.
This aspect is improved across the board in 2.6, as well as the SMP issues. Sure, the uniprocessor machine may be a little slower, but response latencies in X are a lot better, and this makes more of a difference to users.
a quad cpu more performant than 4 * single cpu?
Odd but not impossible.
For example, if in the single-CPU config the processes incur a lot of cache misses, then having 4 CPUs (with 4 times the total amount of cache) could reduce the number of cache misses and so could make the quad configuration more than 4 times faster.
The same reason could explain why 2 CPUs are not much faster than one: if 2 caches are not large enough and the processes have very bad locality, then you may get as many cache misses with the dual-CPU system as with the single-CPU system.
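The cache argument above can be played with in a toy Python model. Everything here is an assumption for illustration: the hit and miss costs, the 512 KB cache, the 2 MB working set, and the crude rule that hit rate equals cache size divided by the data each CPU touches. It says nothing about what the benchmark actually does.

```python
def relative_throughput(cpus, working_set_kb, cache_kb_per_cpu,
                        hit_cost=1.0, miss_cost=10.0, shared_data=False):
    """Toy throughput estimate: CPUs divided by average cost per access.

    shared_data=False: the data splits cleanly, so each CPU touches only
    its 1/cpus share. shared_data=True: bad locality, so every CPU
    touches the whole working set.
    """
    per_cpu_set = working_set_kb if shared_data else working_set_kb / cpus
    hit_rate = min(1.0, cache_kb_per_cpu / per_cpu_set)
    avg_cost = hit_rate * hit_cost + (1.0 - hit_rate) * miss_cost
    return cpus / avg_cost


def speedup(cpus, working_set_kb, cache_kb_per_cpu, **kwargs):
    """Throughput of `cpus` processors relative to a single processor."""
    one = relative_throughput(1, working_set_kb, cache_kb_per_cpu, **kwargs)
    many = relative_throughput(cpus, working_set_kb, cache_kb_per_cpu, **kwargs)
    return many / one


if __name__ == "__main__":
    # Assumed figures: 2 MB working set, 512 KB of cache per CPU.
    print("4 CPUs, data partitions cleanly:", speedup(4, 2048, 512))
    print("2 CPUs, every CPU touches everything:",
          speedup(2, 2048, 512, shared_data=True))
```

In this model the quad case goes well past 4x once each CPU's share fits in its own cache, while the bad-locality case gets no cache benefit at all and is at best linear. Anything that pushes it below linear (memory bus contention, locking) is outside this sketch.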
Desktop Linux kicks ass. With 2.6, interactivity on an unloaded system is close to WinXP, and on a heavily loaded one (the steady state of my machine :) kicks XP's ass all over the place.
A deep unwavering belief is a sure sign you're missing something...
The issue with hyperthreading's performance drop comes from the fact that both logical threads are contending for the same cache. Thus, code has to be rewritten for an HT-equipped machine to use only half the cache it would normally take. So on your typical 512k cache machine, you've got to profile your loops, etc., so that they use only half that cache. The typical program is not written with specific requirements on how much cache it uses, so it throws as much data as possible into cache, causing the two logical threads to fight over the cache and degrading performance. Pretty much any program will act this way, unless compilers get smart enough to have compile-time control of a cache model so that one can recompile everything to take advantage of HT.
Marxism is the opiate of dumbasses
but it doesn't. GTK+ for Windows is very very buggy.
I don't post charts when sending to a text-only mailing list such as linux-kernel. Not much point to that. If you'd like charts, see the full reports here: http://developer.osdl.org/cliffw/reaim/index.html
I use Gaim 0.68 on w2k and FreeBSD. It works quite well.