Improving Linux Kernel Performance
developerWorks writes "The first step in improving Linux performance is quantifying it, but how exactly do you quantify performance for Linux or for comparable systems? In this article, members of the IBM Linux Technology Center share their expertise as they describe how they ran several benchmark tests on the Linux 2.4 and 2.5 kernels late last year. The benchmarks provide coverage for a diverse set of workloads, including Web serving, database, and file serving. In addition, we show the various components of the kernel (disk I/O subsystem, for example) that are stressed by each benchmark."
Oh wait, that's not ported to *nix yet...
I'm just curious what they are quantifying performance against. Everything here seemed to be strictly on the network side of things. Are they trying to speed up the kernel's processing of the individual threads for the network applications (file serving, database, and Web serving), or are they just measuring the efficiency of processing data packets for those services?
It sounds interesting, but it looks like the tuning is done specifically on the IBM platform, which makes me wonder. Linux already blows any MS product away for these applications, so I'm curious what they are comparing the results to. Did they just take an arbitrary point (processor load) for specific applications, or are they creating a specialized measurement (like SYSmark on Windows) that is only valid in their test suite?
Anyway, it should be interesting to see where it ends up, eventually.
is here
Some HOWTOs suggest recompiling the kernel, enabling UDMA, turning off logging, and enabling MMX enhancements.
I was also surprised to see that they still use some of the old performance monitoring tools, like looking at /proc and other ASCII tools, rather than something like PCP that collects all these statistics together so that you can look at any combination of subsystems on the same time line. Then they could have graphs showing the interaction and load on the disks, CPUs, VM, network, etc.
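As a rough illustration of the "same time line" idea (this is not PCP, whose API is not shown here), a minimal Python sketch that reads the aggregate CPU counters from /proc/stat and the 1-minute load from /proc/loadavg into one timestamped record. It assumes the standard Linux /proc layout; the function names are hypothetical.

```python
import time
from pathlib import Path


def parse_cpu_line(stat_text):
    """Return (busy, total) jiffies from the aggregate 'cpu' line of /proc/stat."""
    for line in stat_text.splitlines():
        if line.startswith("cpu "):
            # user nice system idle iowait irq softirq steal; later guest
            # fields are already included in user/nice, so they are skipped.
            fields = [int(x) for x in line.split()[1:9]]
            idle = fields[3] + (fields[4] if len(fields) > 4 else 0)  # idle + iowait
            total = sum(fields)
            return total - idle, total
    raise ValueError("no aggregate 'cpu' line found")


def parse_loadavg(loadavg_text):
    """Return the 1-minute load average from /proc/loadavg."""
    return float(loadavg_text.split()[0])


def sample(proc_root="/proc"):
    """Read several subsystems at (nearly) the same instant into one record."""
    busy, total = parse_cpu_line(Path(proc_root, "stat").read_text())
    load1 = parse_loadavg(Path(proc_root, "loadavg").read_text())
    return {"t": time.monotonic(), "cpu_busy": busy, "cpu_total": total, "load1": load1}


def cpu_utilization(before, after):
    """Fraction of CPU time spent busy between two samples."""
    dt = after["cpu_total"] - before["cpu_total"]
    return (after["cpu_busy"] - before["cpu_busy"]) / dt if dt else 0.0
```

Calling `sample()` once a second and storing the records gives every metric a shared timestamp, which is what lets disk, CPU, and VM graphs be lined up against each other.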
There is folly and foolishness on the one side, and daring and calculation on the other. - Admiral Pellew, Hornblower
Why is what we compare it to the most important issue?
Sure, we want to see how the Linux kernel is performing, but that's unrelated to increasing its performance - when working on the performance of a single part, people build a test for that part and tweak it.
No benchmark or comparison is required in this case.
My other
I have to disagree. I thought Figure 3 illustrated how important it is to establish a baseline to ensure that you are heading in the right direction with each change you make (although they did not have a uniprocessor baseline result).
It also showed that with the changes in June they were able to get four times the performance with another seven CPUs. Maybe next time they will show how it scales with the number of CPUs.
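Taking the poster's reading at face value (eight CPUs delivering four times the single-CPU result), a small Python sketch of two standard scaling figures: parallel efficiency S/p, and the Karp-Flatt metric, the experimentally determined serial fraction e = (1/S - 1/p) / (1 - 1/p). These numbers are derived from the comment, not from the article's data.

```python
def parallel_efficiency(speedup, cpus):
    """Speedup per CPU: 1.0 means perfectly linear scaling."""
    return speedup / cpus


def karp_flatt(speedup, cpus):
    """Experimentally determined serial fraction (Karp-Flatt metric)."""
    if cpus < 2:
        raise ValueError("needs at least two CPUs")
    return (1 / speedup - 1 / cpus) / (1 - 1 / cpus)


print(parallel_efficiency(4, 8))   # 0.5
print(round(karp_flatt(4, 8), 3))  # 0.143
```

Under Amdahl's law, a serial fraction of about 1/7 would cap the speedup at 7x no matter how many CPUs are added, which is why a curve over several CPU counts says more than a single data point.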
There is folly and foolishness on the one side, and daring and calculation on the other. - Admiral Pellew, Hornblower
>time make clean bzImage modules
[...]
real 6m2.519s
user 5m13.950s
sys 0m20.080s
=> efficiency: 93.6%
(2.4.18,xfs,ide)
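The poster does not say how the efficiency figure was computed. One common reading is CPU time over wall-clock time, (user + sys) / real; a minimal Python sketch under that assumption follows. For the timings above it gives about 92.1%, so the 93.6% may come from a different formula.

```python
import re


def to_seconds(field):
    """Convert a bash `time` field such as '6m2.519s' to seconds."""
    match = re.fullmatch(r"(?:(\d+)m)?([\d.]+)s", field)
    if match is None:
        raise ValueError(f"unrecognised time field: {field!r}")
    minutes = int(match.group(1) or 0)
    return minutes * 60 + float(match.group(2))


def cpu_efficiency(real_field, user_field, sys_field):
    """Share of wall-clock time spent on the CPU: (user + sys) / real."""
    cpu = to_seconds(user_field) + to_seconds(sys_field)
    return cpu / to_seconds(real_field)


print(f"{cpu_efficiency('6m2.519s', '5m13.950s', '0m20.080s'):.1%}")  # 92.1%
```

On an SMP box with a parallel `make -j`, this ratio can exceed 100%, so it only reads as "efficiency" for a serial build like the one shown.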
These benchmarks, like so many you see nowadays, do not include or even mention deviation across benchmark runs. There is no evidence that the tests were run more than once in order to achieve a more statistically accurate view of the benchmark numbers.
In theory, all benchmarks should come with an average value and an error margin. Without these, the data should not be trusted. Leaving them out not only implies that the margin of error *might* be over 100%, it indicates the people running the benchmarks don't know what they are doing.
There are a lot of reasons benchmarks can have errors, one of them being that the benchmark program itself can be broken. How would you know that the numbers returned on some test weren't random if you didn't run it more than once?
Also, disk drives and networks have latencies which can make a huge difference; those differences can wash out apparent benefits of OS tweaks.
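A minimal Python sketch of reporting a benchmark as a mean plus an error margin over repeated runs, using the standard library's `statistics` module and a Student-t 95% confidence interval. The five throughput values are made-up placeholders, not results from the article, and the small t table (an assumption to stay dependency-free) only covers up to ten runs.

```python
import statistics

# Two-sided 95% Student-t critical values, keyed by degrees of freedom (n - 1).
T_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
        6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262}


def summarize(runs):
    """Return (mean, half-width of the 95% confidence interval)."""
    n = len(runs)
    if n < 2:
        raise ValueError("run the benchmark more than once")
    if n - 1 not in T_95:
        raise ValueError("extend T_95 for this many runs")
    mean = statistics.mean(runs)
    sem = statistics.stdev(runs) / n ** 0.5  # standard error of the mean
    return mean, T_95[n - 1] * sem


# Hypothetical throughput figures from five runs of the same benchmark.
mean, margin = summarize([1012.0, 998.0, 1005.0, 1021.0, 987.0])
print(f"{mean:.1f} +/- {margin:.1f} ({margin / mean:.1%})")
```

If a tuning change moves the mean by less than the margin, the runs cannot distinguish it from noise.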
Why does it seem that all these benchmarks are primarily concerned with CPU performance or network throughput or single-disk reading and writing? For a large category of enterprise applications (which this paper says it is trying to address), I/O performance is usually the most important part.
The problem is that typical PC hardware is just not designed for that. Large proprietary Unix or mainframe systems usually have multiple very high-speed buses; a single 32-bit PCI bus is rather low-end in comparison. Now of course this is not Linux's fault; but then again Linux is not just a PC operating system! So I guess my question is: if this is about benchmarking Linux for enterprise use, how about some information about Linux running on enterprise-class hardware rather than souped-up PCs? I'm sure IBM must have a few resources there.
In particular I'm interested in how the Linux kernel is designed to handle multiple independent I/O buses. Are the I/O schedulers weighed down with locking issues or interrupt contention? Or what about the allocation of memory buffers between faster and slower I/O devices? Or even its support for the advisory I/O operations (hinting) that some proprietary OSs provide? What about asynchronous I/O?
And of course Linux suffers from the general Unix philosophy when it comes to giving I/O the same level of attention as the CPU. For instance, there are lots of processor-use controls, such as process nice levels, processor affinities, real-time schedulers, threading options galore, etc. But how do you say that a given process may only use 30% of the I/O bandwidth on a particular bus? Those are things that mainframes were good at, so how does Linux on mainframes compare?
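The 30% question describes a bandwidth cap. As a user-space illustration only (this is not a Linux kernel interface), a token-bucket sketch in Python: a hypothetical `TokenBucket` grants bytes at a fixed rate, so a writer that asks before each I/O is held to that rate. The bus bandwidth is something you would have to supply yourself.

```python
import time


class TokenBucket:
    """Allow on average `rate` bytes per second, with bursts up to `capacity` bytes."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = float(rate)
        self.capacity = float(capacity)
        self.tokens = float(capacity)
        self.clock = clock
        self.last = clock()

    def _refill(self):
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def delay_for(self, nbytes):
        """Charge `nbytes` to the bucket; return how long the caller should wait."""
        self._refill()
        self.tokens -= nbytes
        # A negative balance is a debt that refills at `rate`, so wait it out.
        return max(0.0, -self.tokens / self.rate)


def throttled_write(write, chunks, bucket, sleep=time.sleep):
    """Call write(chunk) for each chunk, pausing to stay under the bucket's rate."""
    for chunk in chunks:
        sleep(bucket.delay_for(len(chunk)))
        write(chunk)
```

For example, with an assumed bus of 100 MB/s, `TokenBucket(rate=0.30 * 100e6, capacity=1 << 20)` would hold one cooperating process to roughly 30 MB/s. It cannot restrain any other process, which is exactly the gap the poster is pointing at.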
I've been reading the comments from some Mozilla people ever since Apple came out with Safari based on KHTML, and it's been suggested that the bloat and delay of Mozilla comes from too many developers. Makes me wonder if Linux will succumb to the same problem.
Of course this applies to something else, like making transfers zero-copy, too.
Benchmark junkies abound, and they have wet dreams over these articles.
I am one of them.
Please, mooore!!!
Things that are actually useful: disabling unnecessary services on startup (if you don't use atd, don't start it, to save start-up time; on many machines it is also unnecessary to detect hardware changes with kudzu at startup); and for machines with multiple hard drives, putting the swap on the faster drive.
"In physical science the first essential step in the direction of learning any subject is to find principles of numerical reckoning and practicable methods for measuring some quality connected with it. I often say that when you can measure what you are speaking about, and express it in numbers, you know something about it; but when you cannot measure it, when you cannot express it in numbers, your knowledge is of a meagre and unsatisfactory kind; it may be the beginning of knowledge, but you have scarcely in your thoughts advanced to the state of Science, whatever the matter may be."
members of the IBM Linux Technology Center share their expertise as they describe how they ran several benchmark tests
Notice: not "IBM shares benchmark testing results".
There are places where the networks are not touching, and there are places where they are. - Boeing's Lori Gunter
What about kernel developers creating a variant of the kernel (used only by those who choose to run it) that logs and relays information on its performance on various users' machines?
Is this a bad idea? Would it take too many hours of extra work?
-JB
"I love deadlines. I love the "whooshing" sound they make as they pass by." - Douglas Adams.
I wish there were some interactive workload benchmarks. I know this is history, but when the kernel went to 1.2 I found my machine really slow; the benchmarks were better, but somehow the usability had gone down. It would be neat to measure the way the mouse tracking feels, the "snap" with which menus open in an application, Netscape fetching a page and rendering it, etc. Kernel compilation and numerics are not the main use of a desktop machine these days...
On a related note, my Mac PowerBook was really sluggish until I managed to kill some unneeded processes; they weren't really eating up time by themselves, but were somehow hurting system responsiveness. The load factor hardly moved, but the system became responsive to mouse clicks.
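Mouse "feel" is hard to capture directly, but one proxy sometimes used for interactive responsiveness is wake-up latency: how late a sleeping task actually gets scheduled. A rough Python sketch follows; the 1 ms interval and 200 samples are arbitrary assumptions, and interpreter overhead inflates the numbers, so treat them as relative comparisons between kernels rather than absolute values.

```python
import statistics
import time


def wakeup_latencies(interval_s=0.001, samples=200):
    """Return how late (in seconds) each sleep of `interval_s` actually woke up."""
    late = []
    for _ in range(samples):
        start = time.perf_counter()
        time.sleep(interval_s)
        late.append(max(0.0, time.perf_counter() - start - interval_s))
    return late


def report(late):
    """Summarize latencies in ms: median, approximate 99th percentile, worst case."""
    ms = sorted(x * 1000 for x in late)
    p99 = ms[min(len(ms) - 1, int(0.99 * len(ms)))]
    return {"median_ms": statistics.median(ms), "p99_ms": p99, "max_ms": ms[-1]}


if __name__ == "__main__":
    print(report(wakeup_latencies()))
```

Running it on an idle desktop and again while something heavy runs in the background shows whether the worst-case wake-ups, which are what a user notices, degrade under load.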
This is not a signature.
I agree that I/O is a weakness of Linux currently and that it needs a lot more attention. CPU speeds and Linux's ability to make the most of the processor are very good and already well developed. CPUs have advanced so far in the past few years that the CPU is no longer the main bottleneck of the system. I/O technologies have stood pretty much still, with only small advances, so no waste or inefficiency on the part of the OS is acceptable.
It is a pity that Linux developers, like Unix developers, have become a little stuck in their ways; hopefully they will do their best to address this in the 2.6 and 3.0 kernels.
I like the idea of a modularized kernel, where people could use the I/O system that best suits their setup, but this could involve an awful lot of division and arguments, and the number of resulting bugs could be huge. Perhaps Linux itself could automatically adapt the way it works to suit its workload, which would solve the problem of Linux's widely varying performance. Does anyone else have any suggestions or comments on this?
"Some of the issues we have addressed that have resulted in the improvements shown include adding O(1) and read copy update (RCU) dcache kernel patches and adding a new dynamic API mod_specweb module to Apache."
Uhmm... isn't this considered cheating?
source code for the patch
Mozilla uses C++ (with most methods virtual) and component interfaces like XPCOM. Such things probably enhance developers' productivity, but they incur quite a bit of overhead in code size and (less so) in speed.
It is great that core developers actually care about code size and instruction-level speed (such as the recent syscall patch, or those highly optimized inline functions in headers), and there are many people sending patches to clean up code. Maybe Linux won't get as bloated as Mozilla after all...
rm -Rf /
that would be integrity. it can't be bought. it diminishes as greed/fear based .coNTracting eXPands. it has become VERY scarce DOWn here. integrity is the goaled standard/fuel oil of the gnu millennium. see you there.
tell 'em robbIE.
I wouldn't be a slashdot poster if I checked whether it's been done.
I agree
If they give full disclosure of their methods and configuration so that anyone can reproduce the results. An example of this not being the case: sysmark/bapco/intel.
There is some interesting info at the bottom of this page outlining some improvements Oracle and Red Hat have made to the Linux kernel regarding things such as SMP processor affinity and asynchronous I/O. Presumably these are open source changes; the article doesn't mention them at all.
The contest benchmark might be what you are looking for. It tests system responsiveness by running kernel compiles under different kinds of load.
Still based on kernel compiles, granted, but at least it tries to measure responsiveness. It has been used heavily to benchmark recent kernels; check KernelTrap for results.
Andrew Morton's Linux scheduling latency page might be useful as well. Alas, kernel patches tend to work on x86 before PPC.
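The idea behind contest (time a fixed job with and without competing load, then compare) can be sketched in a few lines of Python. This is not contest itself: the CPU-bound `workload` below is a hypothetical stand-in for a kernel compile, and the load is a set of busy-loop child processes started with `subprocess`.

```python
import os
import subprocess
import sys
import time


def workload(n=2_000_000):
    """Stand-in for a kernel compile: a fixed amount of CPU-bound work."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def timed(fn):
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


def slowdown_under_load(hogs=None, n=2_000_000):
    """Ratio of workload time with busy-loop processes running vs. an idle system."""
    hogs = hogs or os.cpu_count() or 1  # one hog per CPU by default
    baseline = timed(lambda: workload(n))
    procs = [subprocess.Popen([sys.executable, "-c", "while True: pass"])
             for _ in range(hogs)]
    try:
        time.sleep(0.2)  # give the hogs time to start spinning
        loaded = timed(lambda: workload(n))
    finally:
        for p in procs:
            p.kill()
            p.wait()
    return loaded / baseline


if __name__ == "__main__":
    print(f"slowdown: {slowdown_under_load():.2f}x")
```

Comparing the ratio across kernels, rather than the raw times, is what makes this a responsiveness measure: a scheduler that favors the interactive job keeps it closer to 1.0.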
Michel
Fedora Project Contribut
It's funny how these people think that running Volanomark in loopback mode stresses the Ethernet driver.
The article lacks substance, specifically what did they tune to arrive at those results they claim. None of that basic information is included in the report.
The IBM paper is interesting, but beyond doing these straightforward kinds of measurements, I can think of a lot of better approaches to improving kernel and core application performance, based on research I've seen... When I was doing profiling work on supercomputer stuff a few years back I surveyed the tools and found some systems that use really novel approaches which could definitely be adapted to this purpose. I suppose word doesn't really get out about some of this stuff; anyway, take a look and see for yourself:
S-Check
S-Check starts with your original source code and the points suspected of being bottlenecks. It adds artificial delays at those specific points throughout the parallel code. These delays can be switched ON or OFF. The switched delays generate numerous new versions of the program, with the delays simulating adjustments in code efficiency. S-Check methodically executes the many variants, recording delay settings and corresponding run times. S-Check analyzes the recorded entries against a linear response model using techniques from statistics. The result is a sensitivity analysis from which program problem areas can be identified. This provides a portable, scalable, and generic basis for assaying parallel and network-based programs.
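The core of the approach described above (toggle artificial delays at suspected points, run every combination, and estimate each point's effect on total run time) can be sketched in Python. This is a simplified illustration, not S-Check's actual code: it uses a full 2^k factorial design, for which the least-squares main effect of each delay reduces to the mean run time with that delay on minus the mean with it off. The `toy_program` cost model is hypothetical, chosen so the example is deterministic.

```python
from itertools import product


def sensitivity(run, npoints):
    """Estimate how much switching on each delay point changes total run time.

    `run(settings)` executes the program with delay i switched on when
    settings[i] is True and returns the measured run time.
    """
    results = [(settings, run(settings))
               for settings in product((False, True), repeat=npoints)]
    effects = []
    for i in range(npoints):
        on = [t for s, t in results if s[i]]
        off = [t for s, t in results if not s[i]]
        effects.append(sum(on) / len(on) - sum(off) / len(off))
    return effects


def toy_program(settings, delay=1.0):
    """Hypothetical cost model: two parallel branches, then a serial tail.

    Branch A (point 0) takes 5 units, branch B (point 1) takes 2 units,
    and the serial tail (point 2) takes 1 unit.
    """
    d0, d1, d2 = (delay if on else 0.0 for on in settings)
    return max(5.0 + d0, 2.0 + d1) + 1.0 + d2


print(sensitivity(toy_program, 3))  # [1.0, 0.0, 1.0]
```

Point 1 shows zero sensitivity because branch B is off the critical path, which is the kind of signal these tools look for. With a real program, `run` would launch the instrumented binary, and each setting would need repeated runs because timings are noisy; the 2^k run count also grows quickly with the number of points.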
Paradyn
(overview)
"...a heuristic, goal-seeking algorithm was coupled with a dynamic instrumentation package to drive an automated, systematic inquiry into the performance of a parallel application."
The upshot is tools which can instrument a running system on the fly, and use statistical techniques that identify "hot spots" by looking for the amount of "collateral damage" when adding artificial delays to a particular location. You can even go farther, mapping out relationships, etc.
These are approaches that came out of parallel supercomputing, because in that field traditional approaches to benchmarking and profiling are often useless and/or impractical, and the systems (and programming problems) have become so complex that effective hand tuning becomes nearly impossible as well. Of course the kernel isn't so simple either, and these days you have parallelism to boot... I would love to see these techniques solving a wider range of problems.
Want to Know How to Cheat the GPL? Read On!
And in conclusion, graphs are going up... so I'm happy.
Cheers
Stor
"Yeah well there's a lot of stuff that should be, but isn't"
Look at the authors' e-mail addresses: two from IBM and one from AMD. It seems AMD is looking at Intel boxes?? "The architecture used for the majority of this work is IA-32 (in other words, x86), from one to eight processors. We also study the issues associated with future use of non-uniform memory access (NUMA) IA-32 and NUMA IA-64 architectures."
Hmm, I'm sure the next Hammers could do NUMA; maybe they are trying to do it better in Linux.
developer http://flamerobin.org
well, it is.
Isn't XFree86, and in fact X11, the most successful open source project EVER??
I don't trust any article that calls it a "kernal".
One thing that's hard to measure is desktop performance.
I have a crap all-in-one mobo with shared-memory graphics and no DRI support (OK, I needed a PC quick). KDE is super clunky under 2.4, even with the CK performance patchset.
Under 2.5 the desktop is quick and smooth, applications seem to load a lot faster, and Java applets don't hog the CPU.
So, if you're running Linux on the desktop and you feel sufficiently competent, start testing 2.5.
thank God the internet isn't a human right.