5 Years of Linux Kernel Releases Benchmarked
An anonymous reader writes "Phoronix has published benchmarks of the past five years' worth of Linux kernel releases, from the Linux 2.6.12 through Linux 2.6.37 (dev) releases. The results from these benchmarks of 26 versions show that, for the most part, new features haven't affected performance."
What about running the same study on the Windows kernel from XP to 7?
Phoronix is an embarrassment to open source and Linux devs. Worst site ever.
They tested in a VM. Now where's the proof that the VM by itself doesn't affect performance in an unpredictable way?
Obviously... they forgot the bloat feature.
WHAT?!!? Nobody cares about old Linux benchmarks?!
It would seem that disk performance was better on 2.6.28; then something introduced in 2.6.29 was never fixed and has been the legacy from 2.6.30 onward.
The question is: what prioritized process arrived in 2.6.30 that causes disk access to be inferior?
Also of note: THE BENCHMARK IS IN A VIRTUAL MACHINE, not an actual installation. Failure.
It seems almost every benchmark that had any difference was slower in more modern kernels. It's not all sunshine and roses.
It seems that Phoronix needs a faster kernel on their server...
Seriously though, some of the performance drops (and how they have been sustained in later kernel versions) make me wonder whether there is adequate load testing as part of the kernel QA process.
What more Linux benchmarking do you need besides bogomips? Jeez.
I find this hard to believe, what with 2010 being the year of Linux on the desktop and all.
Where are the kernel-level tests that do more than exercise the filesystem and network driver (singular) and the scheduler? More than half of those charts were flat, which could mean they weren't making appropriate measurements.
For example, show how mutexes have improved, or copy-on-write, or interrupt handlers, or timers, or workqueues, or kmalloc, or anything else that a system and kernel programmer would care about. I like the user-centric perspective: it's very good information to have and share, but don't call what you've done a kernel benchmark. Maybe call it a kernel survey of its impact on users.
Who's your user, program?
I love that Phoronix is willing to take the time to run tests like this. I just wish they'd learn how to run meaningful tests. For instance, why are they testing a bunch of CPU-bound things? Kernel won't affect that unless we're talking about SMP performance. If you want to test the kernel, test how well it handles SMP, network I/O and disk I/O. And bear in mind that disk I/O will be hugely affected by which filesystem is used and its configurable settings.
Another problem with their article is that it tests individual kernels. Most folks don't use a vanilla kernel. They use one provided by their distro, which may have distro-specific patches that address some of the performance problems (or add new ones). What I would have preferred to see is a comparison of different distro releases over the last 5 years, focusing on the most popular ones (say Ubuntu, Fedora and SuSE).
The meaningful tests (and their results) were:
1. GnuPG: avoid 2.6.30 and later.
2. Loopback TCP: avoid 2.6.30 and later.
3. Apache Compilation: avoid 2.6.29 and earlier.
4. Apache static content: avoid 2.6.12, 2.6.25, 2.6.26, then 2.6.30 and later.
5. PostMark: avoid 2.6.29 and earlier.
6. FS-Mark: avoid 2.6.17 and earlier, 2.6.29, then 2.6.33 to 2.6.36.
7. ioZone: unless you're willing to run 2.6.21 or earlier, avoid 2.6.29 and you're fine.
8. Threaded I/O: avoid 2.6.20 and earlier, 2.6.29, then 2.6.33 to 2.6.36.
Based on these results, #1 and #2 seem to be testing the same thing, and tests #3 and #5 seem to be testing the inverse of whatever that thing is. 2.6.29 seems to be especially crappy, performing worse than the kernels immediately before and immediately after it on tests #6, #7 and #8. In terms of recent kernels, tests #6 and #8 suggest a regression in 2.6.33 that has been resolved in 2.6.37.
If it were me, I'd look at either running 2.6.37 (when it's released) or falling back to 2.6.32 if my hardware was supported.
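For anyone who wants to check the disk-test part of that list (#6, #7 and #8) rather than eyeball it, here is a minimal Python sketch. The "avoid" sets are transcribed from the list above, not from the Phoronix data itself, and the ioZone entry is read as flagging only 2.6.29, which is an assumption about an ambiguous line. The names `AVOID` and `unflagged` are made up for illustration.

```python
# Minor version numbers for 2.6.12 through 2.6.37, the range that was benchmarked.
KERNELS = range(12, 38)

# "Avoid" sets transcribed from the list above for the three disk tests.
# Assumption: the ioZone entry is read as flagging only 2.6.29.
AVOID = {
    "FS-Mark": set(range(12, 18)) | {29} | set(range(33, 37)),
    "ioZone": {29},
    "Threaded I/O": set(range(12, 21)) | {29} | set(range(33, 37)),
}


def unflagged(avoid=AVOID, kernels=KERNELS):
    """Return the minor versions that none of the given tests say to avoid."""
    flagged = set().union(*avoid.values())
    return [k for k in kernels if k not in flagged]


if __name__ == "__main__":
    print(", ".join(f"2.6.{k}" for k in unflagged()))
```

Among recent kernels this leaves 2.6.30 through 2.6.32 and 2.6.37, which is consistent with the 2.6.37-or-2.6.32 suggestion above. It says nothing about tests #1 through #5, which pull in the opposite direction.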
Or so trolls would like to believe.
What's next, we all believe Eugenia from OSNews when she spews about BeOS? These guys are just page-view leeches, ignore them and they'll wither and die.
I want to delete my account but Slashdot doesn't allow it.
This comes as no surprise. In any activity which is mostly limited by CPU in user mode, not much changes; you can track that over a number of operating systems. What has gotten slower is disk I/O and network transfer time, and some tests, such as web serving, may be using all or mostly pages in memory, so the slowdown is not as obvious as it might be.
In addition, the test was run in a virtual machine, so to some extent the huge host memory provided more resources, and the very fast disk hides poor choices in the I/O scheduling and provides additional write cache and buffers. In other words, neither the tests chosen nor the environment used was typical for a small server or a generous desktop.
For a meaningful test, no more than four CPUs (or two with hyperthreading) should be used, and all I/O should go to a real rotating disk, like a $100 1TB WD or Seagate, and the filesystems should be on that, not some fancy large SSD. Then some numbers can be identified which reflect the performance on machines in the small server or fast desktop price range of a motivated home user or budget-limited small business. Then the limitations of the CPU and I/O scheduler changes will be more evident, and perhaps the performance using the deadline scheduler should be included, since discussions on the Linux-RAID mailing list indicate that many of us find the default scheduler is a bottleneck for typical loads (particularly raid-[56]).
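To record which I/O scheduler a benchmark run actually used, something like the following Python sketch could help. It relies on the sysfs file `/sys/block/<device>/queue/scheduler`, which lists the available schedulers with the active one in square brackets. The device name `sda`, the function names and the sample line in the usage note are assumptions for illustration, not anything taken from the Phoronix setup.

```python
from pathlib import Path


def parse_scheduler(text):
    """Split a sysfs scheduler line into (active, available).

    The kernel marks the active scheduler with square brackets,
    e.g. "noop anticipatory deadline [cfq]".
    """
    active = None
    available = []
    for name in text.split():
        if name.startswith("[") and name.endswith("]"):
            name = name[1:-1]
            active = name
        available.append(name)
    return active, available


def read_scheduler(device="sda"):
    """Read the scheduler line for one block device (device name is a placeholder)."""
    path = Path(f"/sys/block/{device}/queue/scheduler")
    return parse_scheduler(path.read_text())
```

For a sample line such as `noop anticipatory deadline [cfq]`, `parse_scheduler` returns `cfq` as the active scheduler along with all four names. Switching to deadline for a test run is normally done by writing `deadline` to the same file as root.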