5 Years of Linux Kernel Releases Benchmarked

Posted by samzenpus on Wednesday November 3, 2010 @06:49PM from the line-them-up dept.

An anonymous reader writes "Phoronix has published benchmarks of the past five years' worth of Linux kernel releases, from the Linux 2.6.12 through Linux 2.6.37 (dev) releases. The results from these benchmarks of 26 versions show that, for the most part, new features haven't affected performance."

Windows Kernels by Anonymous Coward · 2010-11-04 01:32 · Score: 5, Interesting

What about running the same study on the Windows kernel from XP to 7?
Virtual machine, really? by edelholz · 2010-11-04 01:47 · Score: 5, Insightful

They tested in a VM. Now where's the proof that the VM by itself doesn't affect performance in an unpredictable way?
1. Re:Virtual machine, really? by mtippett · 2010-11-04 02:05 · Score: 4, Informative
  
  Considering the effort going into virtualization these days and the massive deployments in Fortune 500 companies, the performance of VM-based systems is predictable. All testing with Phoronix Test Suite is repeated until there is less than 3% variance between the results, or the result set is discarded.
  Realistically, looking at older kernels on modern hardware is actually a very critical dimension for corporate server environments. There are applications in that space that are deployed and supported only on some old distribution. Being able to understand how Red Hat 7.1 will act vs Red Hat 5 is critical for some environments.
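For anyone wondering what a "repeat until under 3% variance, or discard" rule might look like, here is a rough sketch. It is not the Phoronix Test Suite's actual code. It assumes "variance" means the sample standard deviation divided by the mean, and the names `run_until_stable`, `run_once`, `min_runs` and `max_runs` are made up for illustration.

```python
import statistics


def run_until_stable(run_once, threshold=0.03, min_runs=3, max_runs=10):
    """Repeat a benchmark until the results agree, or give up.

    run_once  -- hypothetical callable returning one positive measurement
    threshold -- maximum allowed spread (stdev / mean); 0.03 means 3%
    Returns the list of results if they settle, or None if the set
    should be discarded.
    """
    results = []
    for _ in range(max_runs):
        results.append(run_once())
        if len(results) < min_runs:
            continue  # too few samples to judge the spread yet
        mean = statistics.mean(results)
        spread = statistics.stdev(results) / mean
        if spread < threshold:
            return results
    return None  # never settled within max_runs: discard the set


if __name__ == "__main__":
    # Fake measurements standing in for real benchmark timings.
    samples = iter([100.0, 101.0, 99.0])
    print(run_until_stable(lambda: next(samples)))
```

A rule like this bounds run-to-run noise on one configuration. It says nothing about whether the VM shifts results in a consistent direction, which is the separate question raised in the parent comment.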
Overkill by TrailerTrash · 2010-11-04 02:35 · Score: 3, Funny

What more Linux benchmarking do you need besides bogomips? Jeez.
You call those kernel benchmarks? by m4c+north · 2010-11-04 03:15 · Score: 5, Insightful

Where are the kernel-level tests that do more than exercise the filesystem and network driver (singular) and the scheduler? More than half of those charts were flat, which could mean they weren't making appropriate measurements.
For example, show how mutexes have improved, or copy-on-write, or interrupt handlers, or timers, or workqueues, or kmalloc, or anything else that a system and kernel programmer would care about. I like the user-centric perspective: it's very good information to have and share, but don't call what you've done a kernel benchmark. Maybe call it a kernel survey of its impact on users.

--
Who's your user, program?
ugh by buddyglass · 2010-11-04 04:15 · Score: 5, Informative

I love that Phoronix is willing to take the time to run tests like this. I just wish they'd learn how to run meaningful tests. For instance, why are they testing a bunch of CPU-bound things? Kernel won't affect that unless we're talking about SMP performance. If you want to test the kernel, test how well it handles SMP, network I/O and disk I/O. And bear in mind that disk I/O will be hugely affected by which filesystem is used and its configurable settings.
Another problem with their article is that it tests individual kernels. Most folks don't use a vanilla kernel. They use one provided by their distro, which may have distro-specific patches that address some of the performance problems (or add new ones). What I would have preferred to see is a comparison of different distro releases over the last 5 years, focusing on the most popular ones (say Ubuntu, Fedora and SuSE).
The meaningful tests (and their results) were:
1. GnuPG: avoid 2.6.30 and later.
2. Loopback TCP: avoid 2.6.30 and later.
3. Apache Compilation: avoid 2.6.29 and earlier.
4. Apache static content: avoid 2.6.12, 2.6.25, 2.6.26, then 2.6.30 and later.
5. PostMark: avoid 2.6.29 and earlier.
6. FS-Mark: avoid 2.6.17 and earlier, 2.6.29, then 2.6.33 to 2.6.36.
7. ioZone: unless you're willing to run 2.6.21 or earlier, avoid 2.6.29 and you're fine.
8. Threaded I/O: avoid 2.6.20 and earlier, 2.6.29, then 2.6.33 to 2.6.36.
Based on these results, #1 and #2 seem to be testing the same thing, and tests #3 and #5 seem to be testing the inverse of whatever that thing is. 2.6.29 seems to be especially crappy, performing worse than the kernels immediately before and immediately after it on tests #6, #7 and #8. In terms of recent kernels, tests #6 and #8 suggest a regression in 2.6.33 that has been resolved in 2.6.37.
If it were me, I'd look at either running 2.6.37 (when it's released) or falling back to 2.6.32 if my hardware was supported.
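The avoid lists above can be tabulated in a few lines of Python to see which releases each test flags. This is only a sketch of the list as written in this comment, not of the Phoronix data itself. Two assumptions: versions are encoded by their 2.6.x minor number, and test #7 (ioZone) is read as flagging only 2.6.29.

```python
# Kernel minor versions covered by the article: 2.6.12 through 2.6.37.
VERSIONS = range(12, 38)

# For each test, a predicate that is True when the list above says to
# avoid that 2.6.x release.
AVOID = {
    "GnuPG": lambda v: v >= 30,
    "Loopback TCP": lambda v: v >= 30,
    "Apache compilation": lambda v: v <= 29,
    "Apache static content": lambda v: v in (12, 25, 26) or v >= 30,
    "PostMark": lambda v: v <= 29,
    "FS-Mark": lambda v: v <= 17 or v == 29 or 33 <= v <= 36,
    "ioZone": lambda v: v == 29,  # assumption: only 2.6.29 is flagged
    "Threaded I/O": lambda v: v <= 20 or v == 29 or 33 <= v <= 36,
}

IO_TESTS = ["FS-Mark", "ioZone", "Threaded I/O"]


def flagged_by(v):
    """Names of the tests that say to avoid release 2.6.v."""
    return [name for name, avoid in AVOID.items() if avoid(v)]


def clean_for(tests):
    """Releases that none of the given tests flag."""
    return [v for v in VERSIONS if not any(AVOID[t](v) for t in tests)]


if __name__ == "__main__":
    for v in VERSIONS:
        print(f"2.6.{v}: flagged by {len(flagged_by(v))} of {len(AVOID)} tests")
    print("clean on every test:", clean_for(list(AVOID)))
    print("clean on the three I/O tests:", clean_for(IO_TESTS))
```

No release is clean on every test, because #1 and #3 split the range at 2.6.29/2.6.30. On the three I/O tests alone, the clean releases are 2.6.21 through 2.6.28, 2.6.30 through 2.6.32, and 2.6.37.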
1. Re:ugh by mtippett · 2010-11-04 05:55 · Score: 3, Insightful
  
  This made me laugh - in a good way, not at you :).
  When Phoronix does a distro comparison, the crowd calls out that the tests are only really testing gcc differences, and should have fewer variables changing. When Phoronix does a fixed comparison varying only one part of the system, the crowd calls out that it isn't a good basis since people don't run it that way.
  Phoronix runs tests in different ways to explore the performance landscape. For some it gives precisely the information that they need, for others it's completely irrelevant. In this particular case, I'm glad that the data gave you enough to have some open questions about 2.6.32 vs 2.6.37. If people walk away with those sorts of first-order interpretations, the article served its purpose.
  Of course the next step would be how do we take a tighter look at the delta between 2.6.32 and 2.6.37 - any thoughts?
  Regarding meaningful vs meaningless tests. The tests Phoronix runs are a collection of tests to explore. The tests were run, and for some of them, the results yielded nothing interesting but were still reported. You don't know until you run the tests, and if the tests are run, you report on them. Some tests may be stable now, but may have sensitivity to other parts of the systems. Even CPU bound tests will yield different results in different cases (scheduler, etc).
Re:Results don't support conclusion by timeOday · 2010-11-04 04:43 · Score: 4, Informative
I would agree it's not all sunshine and roses, but let's at least look a little more closely. There are some disturbing regressions in there, although keep in mind other improvements (such as moving to a journalling filesystem) may come at a cost to performance, which may be justified.
Better
- Apache Compilation: 40% less time
- Disk Transactions: 50% less time
Worse
- GnuPG File Encryption: 60% more time
- time to transfer 10GB via the TCP network loop-back: 100% more time
- Apache static web page serving: 50% more time
- IOZone Writes: 20% more time
Same
- CAMELLIA256-ECB cipher
- OpenSSL
- NASA's NPB
- TTSIOD 3D renderer
- C-Ray multi-threaded ray-tracing
- Crafty, an open-source chess engine
- MAFFT multiple-sequence alignment test that deals with molecular biology
- Himeno Poisson Pressure Solver
- Blowfish performance with John The Ripper
- LAME MP3 encoding
- 7-Zip compression
- Dhrystone 2
- FS-Mark
- IOZone Reads
- Threaded IO tester
- Parallel BZip2 compression
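
The "more time" and "less time" figures above are not symmetric, so it can help to turn them into throughput ratios. A small sketch, assuming a fixed amount of work per run (the function name is made up for illustration):

```python
def throughput_ratio(time_change_pct):
    """Convert a change in run time to a throughput multiplier.

    time_change_pct -- percent change in elapsed time for the same work:
                       +100 means twice as long, -50 means half as long.
    """
    return 1.0 / (1.0 + time_change_pct / 100.0)


if __name__ == "__main__":
    figures = [
        ("Apache Compilation", -40),
        ("Disk Transactions", -50),
        ("GnuPG File Encryption", 60),
        ("TCP loop-back", 100),
        ("Apache static serving", 50),
        ("IOZone Writes", 20),
    ]
    for name, pct in figures:
        print(f"{name}: {throughput_ratio(pct):.2f}x throughput")
```

So "100% more time" means half the throughput, while "50% less time" means double.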