5 Years of Linux Kernel Releases Benchmarked

Posted by samzenpus on Wednesday November 3, 2010 @06:49PM from the line-them-up dept.

An anonymous reader writes "Phoronix has published benchmarks of the past five years' worth of Linux kernel releases, from Linux 2.6.12 through the Linux 2.6.37 (dev) release. The results from these benchmarks of 26 versions show that, for the most part, new features haven't affected performance."

Virtual machine, really? by edelholz · 2010-11-04 01:47 · Score: 5, Insightful

They tested in a VM. Now where's the proof that by itself doesn't affect performance in an unpredictable way?
1. Re:Virtual machine, really? by Anonymous Coward · 2010-11-04 03:40 · Score: 2, Insightful
  
  "They tested in a VM. Now where's the proof that by itself doesn't affect performance in an unpredictable way?"
  Does it matter?
  They are after deltas, not absolutes.
  *IF* they test each kernel in the same VM on the same metal, then any change is valid. The numbers are abstract; the difference between releases is what is key.
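  A minimal sketch of the "deltas, not absolutes" idea, assuming every score was measured in the same VM on the same host. The function name and the sample numbers are made up for illustration; they are not taken from the Phoronix results.

```python
def relative_deltas(results):
    """Return (version, percent change vs. the previous version) pairs.

    `results` is an ordered list of (version, score) tuples, all measured
    in the same environment so that only the kernel changes between runs.
    """
    deltas = []
    for (_, prev_score), (version, score) in zip(results, results[1:]):
        deltas.append((version, (score - prev_score) / prev_score * 100.0))
    return deltas


# Hypothetical throughput scores (e.g. MB/s); NOT real benchmark data.
sample = [("2.6.29", 200.0), ("2.6.30", 150.0), ("2.6.31", 153.0)]

for version, pct in relative_deltas(sample):
    print(f"{version}: {pct:+.1f}% vs. previous release")
```

  The absolute scores would shift on different hardware or a different hypervisor, but the percent change between releases is what this kind of comparison is after.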
Re:wops by gmack · 2010-11-04 02:10 · Score: 1, Insightful

Keep in mind that the biggest drop was most likely due to ext4 adding data journaling, rather than the usual metadata journaling, to make file contents less likely to be corrupted after an unplanned shutdown (power outage, etc.).
I didn't see any mention of them turning that feature off to find out one way or another.
You call those kernel benchmarks? by m4c+north · 2010-11-04 03:15 · Score: 5, Insightful

Where are the kernel-level tests that do more than exercise the filesystem and network driver (singular) and the scheduler? More than half of those charts were flat, which could mean they weren't making appropriate measurements.
For example, show how mutexes have improved, or copy-on-write, or interrupt handlers, or timers, or workqueues, or kmalloc, or anything else that a system and kernel programmer would care about. I like the user-centric perspective: it's very good information to have and share, but don't call what you've done a kernel benchmark. Maybe call it a kernel survey of its impact on users.

--
Who's your user, program?
1. Re:You call those kernel benchmarks? by CAIMLAS · 2010-11-04 07:15 · Score: 2, Insightful
  
  IF you were running the tests on real hardware, I'd be more likely to agree.
  They weren't. They were running it on a virtualized host in KVM. This means that not only were their results largely determined by the specific drivers they used (network, etc., which can see significant revision between kernels and not accurately reflect the kernel itself), but any idiosyncratic behavior in how KVM treats guest interfaces may account for the discrepancies.
  
  --
  ~/ssh slashdot.org ssh: connect to host slashdot.org port 22: too many beers
Re:wops by ustolemyname · 2010-11-04 03:39 · Score: 2, Insightful

"Some of the changes noted in the Linux 2.6.30 kernel change-log that was used throughout the Linux testing process included..."
Yeah, that new EXT4 filesystem that they didn't use for obvious reasons. Huge impact on the results.
Re:Windows Kernels by coolsnowmen · 2010-11-04 04:38 · Score: 2, Insightful

While interesting, it isn't exactly the same; in Linux, you can actually just change the kernel without changing all the services and startup software.
Re:ugh by mtippett · 2010-11-04 05:55 · Score: 3, Insightful

This made me laugh - in a good way, not at you :).
When Phoronix does a distro comparison, the crowd calls out that the tests are only really testing gcc differences and should have fewer variables changing. When Phoronix does a fixed comparison varying only one part of the system, the crowd calls out that it isn't a good basis since people don't run it that way.
Phoronix runs tests in different ways to explore the performance landscape. For some it gives precisely the information that they need; for others it's completely irrelevant. In this particular case, I'm glad that the data gave you enough to have some open questions about 2.6.32 vs 2.6.37. If people walk away with those sorts of first-order interpretations, the article served its purpose.
Of course the next step would be how do we take a tighter look at the delta between 2.6.32 and 2.6.37 - any thoughts?
Regarding meaningful vs meaningless tests: the tests Phoronix runs are a collection of tests to explore. The tests were run, and for some of them the results yielded nothing interesting but were still reported. You don't know until you run the tests, and if the tests are run, you report on them. Some tests may be stable now, but may be sensitive to other parts of the system. Even CPU-bound tests will yield different results in different cases (scheduler, etc.).
Re:ugh by TheLink · 2010-11-04 08:29 · Score: 2, Insightful

I suspect the scheduler would make a bigger difference if you were running multiple processes at the same time.

e.g. multiple processes in various scenarios:
CPU intensive.
disk IO intensive.
network IO intensive, single NIC.
network IO intensive, two NICs.
network IO intensive, four NICs.
And various combinations of CPU, disk, network.

Then latency tests:
One to X processes with high CPU, while measuring latency experienced by another process.
One to X processes with high IO, while measuring latency experienced by another process.
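
A rough sketch of the first latency test above (CPU hogs plus one measuring process), assuming a plain Python 3 install. The function names are made up for illustration, and the "latency" here is only how far `time.sleep` overshoots its requested interval, which is a crude userspace proxy for scheduler wake-up latency rather than a proper kernel measurement.

```python
import subprocess
import sys
import time


def wakeup_overshoot(samples=50, interval=0.001):
    """Return the worst observed overshoot (seconds) of time.sleep(interval)."""
    worst = 0.0
    for _ in range(samples):
        start = time.perf_counter()
        time.sleep(interval)
        worst = max(worst, time.perf_counter() - start - interval)
    return worst


def latency_under_load(n_hogs, samples=50):
    """Measure sleep overshoot while n_hogs busy-loop processes are running."""
    # Each hog is a separate Python interpreter spinning on the CPU.
    hogs = [
        subprocess.Popen([sys.executable, "-c", "while True: pass"])
        for _ in range(n_hogs)
    ]
    try:
        time.sleep(0.1)  # give the hogs a moment to start spinning
        return wakeup_overshoot(samples)
    finally:
        # Always clean up the hogs, even if the measurement fails.
        for hog in hogs:
            hog.terminate()
            hog.wait()


if __name__ == "__main__":
    for n in (0, 1, 2, 4):
        worst_ms = latency_under_load(n) * 1e3
        print(f"{n} CPU hogs: worst sleep overshoot {worst_ms:.3f} ms")
```

Running the same script under each kernel, with more hogs than cores, is where scheduler differences would be most likely to show up. The disk and network variants would need different load generators.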