Slashdot Mirror


Improving Linux Kernel Performance

developerWorks writes "The first step in improving Linux performance is quantifying it, but how exactly do you quantify performance for Linux or for comparable systems? In this article, members of the IBM Linux Technology Center share their expertise as they describe how they ran several benchmark tests on the Linux 2.4 and 2.5 kernels late last year. The benchmarks provide coverage for a diverse set of workloads, including Web serving, database, and file serving. In addition, we show the various components of the kernel (disk I/O subsystem, for example) that are stressed by each benchmark."

15 of 97 comments

  1. The Problems with Benchmarking like this... by dWhisper · · Score: 5, Insightful

    I'm just curious what they are quantifying performance against. Everything here seemed to be strictly on the network side of things. Are they trying to improve the actual kernel processing of the individual threads for the network applications (file serving, DB, and Web serving), or are they just measuring the efficiency of processing data packets for those services?

    It sounds interesting, but it looks like the tuning is done specifically on the IBM platform, which makes me wonder. Linux already blows any MS product away for these applications, so I'm curious what they are comparing the results to. Did they just take an arbitrary point (processor load) for specific applications, or are they creating a specialized measurement (like SysMarks in Windows) that is only valid in their test suite?

    Anyway, it should be interesting to see where it ends up, eventually.

    1. Re:The Problems with Benchmarking like this... by nicodaemos · · Score: 4, Informative

      Hmmm .... tpc.org is an interesting organization. It is a non-profit who is funded by memberships from the hardware/software companies on which it produces benchmarks.

      According to their website, "Full Members of the TPC participate in all aspects of the TPC's work, including development of benchmark standards and setting strategic direction. Full Membership costs $15,000 per calendar year."

      Wow, a large percentage of the benchmarks are using MS operating systems. Oh look full members get to set benchmark standards. Mmmm, the only pure OS company who is a full member is Microsoft. I wonder what kind of conclusion can be drawn .......

  2. Actually finding the performance problem? by Chatz · · Score: 5, Interesting
    It would be great to see a follow-up with some examples of how these tools are used to actually track down a performance problem. I have, and I have seen many others, take some performance data and make completely the wrong judgement about what the expected behaviour is, what the bottleneck is, and what to do to fix it.

    I was also surprised to see that they still use some of the old performance monitoring tools, like looking at /proc and other ASCII tools, rather than something like PCP, which collects all these statistics together so that you can look at any combination of subsystems on the same timeline. Then they could have graphs showing the interaction and load on the disk, CPUs, VM, network, etc.
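
    To make the "same timeline" point concrete, here is a minimal sketch (not PCP itself, and not what the article's authors did) that stamps values from two /proc files with one timestamp so they can be plotted together. The function names and the choice of fields are my own assumptions; only the /proc/meminfo and /proc/loadavg text formats are real.

```python
import time


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value}."""
    stats = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip anything that is not "Name: value"
        fields = rest.split()
        if fields and fields[0].isdigit():
            stats[name.strip()] = int(fields[0])
    return stats


def parse_loadavg(text):
    """Return the 1, 5 and 15 minute load averages from /proc/loadavg text."""
    one, five, fifteen = text.split()[:3]
    return float(one), float(five), float(fifteen)


def sample(read=None, clock=time.time):
    """Collect one timestamped record combining memory and load statistics.

    `read` and `clock` are hypothetical hooks so the sketch can be tried
    without a real /proc; by default it reads the live files.
    """
    if read is None:
        def read(path):
            with open(path) as f:
                return f.read()
    mem = parse_meminfo(read("/proc/meminfo"))
    load = parse_loadavg(read("/proc/loadavg"))
    return {"time": clock(), "mem_free_kb": mem.get("MemFree"), "load_1min": load[0]}
```

    Calling sample() in a loop with a fixed sleep would give a series of records where every subsystem shares one clock, which is the property the separate ASCII tools lack.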

    --
    There is folly and foolishness on the one side, and daring and calculation on the other. - Admiral Pellew, Hornblower
    1. Re:Actually finding the performance problem? by Chatz · · Score: 5, Informative
      That's probably a bit harsh; both IBM and SGI have worked pretty hard to get scalability improvements into the Linux kernel. The article does mention some of these things:

      Some of the issues we have addressed that have resulted in improvements include adding O(1) scheduler, SMP scalable timer, tunable priority preemption and soft affinity kernel patches.

      --
      There is folly and foolishness on the one side, and daring and calculation on the other. - Admiral Pellew, Hornblower
  3. Call me incredibly stupid, but.. by Subjective · · Score: 5, Insightful
    Wouldn't we (always) want to improve the Linux kernel performance in comparison to itself?

    Why is what we compare it to the most important issue?



    Sure, we want to see how the Linux kernel is performing, but that's unrelated to increasing its performance - when working on the performance of a single part, people build a test for that part and tweak it.

    No benchmark or comparison is required in this case.

    --
    My other .sig is also this bad
  4. Re:Huh? by Chatz · · Score: 4, Informative
    Which you need to know before interpreting the results...

    I have to disagree, I thought Figure 3 illustrated how important it is to baseline to ensure that you are heading in the right direction with each change you make (although they did not have a uni-processor baseline result).

    It also showed that with the changes in June they are able to get four times the performance with another 7 CPUs. Maybe next time they will show how it scales with the number of CPUs you have.

    --
    There is folly and foolishness on the one side, and daring and calculation on the other. - Admiral Pellew, Hornblower
  5. Is there deviation? by rufusdufus · · Score: 4, Insightful

    These benchmarks, like so many you see nowadays, do not include or even mention deviation across benchmark runs. There is no evidence that the tests were run more than once in order to achieve a more statistically accurate view of the benchmark numbers.
    In theory, all benchmarks should come with an average value and an error margin. Without this, the data should not be trusted. It not only implies that the margin of error *might* be over 100%, it indicates the people running the benchmarks don't know what they are doing.

    There are a lot of reasons benchmarks can have errors, one of them being that the benchmark program itself can be broken. How would you know that the numbers returned on some test weren't random if you didn't run it more than once?
    Also, disk drives and networks have latencies which can make a huge difference; those differences can wash out apparent benefits of OS tweaks.
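
    As a rough sketch of what reporting "an average value and an error margin" could look like, assuming you simply have a list of scores from repeated runs (the function names and the overlap rule are my own illustration, not anything from the article):

```python
import statistics


def summarize(runs):
    """Return the mean, sample standard deviation and relative deviation of repeated runs."""
    if len(runs) < 2:
        raise ValueError("need at least two runs to estimate deviation")
    mean = statistics.mean(runs)
    stdev = statistics.stdev(runs)  # sample standard deviation (n - 1 in the denominator)
    return {"mean": mean, "stdev": stdev, "rel_stdev": stdev / mean}


def clearly_different(a, b):
    """Crude check: do the mean +/- one stdev intervals of two result sets fail to overlap?"""
    sa, sb = summarize(a), summarize(b)
    low, high = sorted((sa, sb), key=lambda s: s["mean"])
    return low["mean"] + low["stdev"] < high["mean"] - high["stdev"]
```

    With made-up scores, summarize([100, 102, 98]) gives a mean of 100 and a deviation of 2. The non-overlap test is deliberately crude; a proper significance test would be better, but even this catches a "tweak" whose gain is smaller than the run-to-run noise.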

  6. More attention to IO needed by dmeranda · · Score: 5, Interesting

    Why does it seem that all these benchmarks are primarily concerned with CPU performance or network throughput or single-disk reading and writing? For a large category of enterprise applications (which this paper says it is trying to address), I/O performance can usually be the most important part.

    The problem is that the typical PC hardware is just not designed for that. Large proprietary Unix or mainframe systems usually have multiple very high speed buses; a single 32-bit PCI bus is rather low-end in comparison. Now of course this is not Linux's fault; but then again Linux is not just a PC operating system! So I guess my question is: if this is about benchmarking Linux for enterprise use, how about some information about Linux running on enterprise-class hardware rather than souped-up PCs? I'm sure IBM must have a few resources there.

    In particular I'm interested in how the Linux kernel is designed to handle multiple independent I/O buses. Are the I/O schedulers weighed down with locking issues or interrupt contention? What about the allocation of memory buffers between faster and slower I/O devices? Or even its support for advisory I/O operations (hinting) that some proprietary OS's provide? What about asynchronous I/O?

    And of course Linux suffers from the general Unix philosophy when it comes to giving I/O the same level of attention as CPU. For instance there are lots of processor use controls, such as process nice levels, processor affinities, real-time schedulers, threading options galore, etc. But how do you say that a given process may only use 30% of the I/O bandwidth on a particular bus? And those are things that mainframes were good at, so how does Linux on mainframes compare?
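
    For illustration, here is a small sketch of how reachable the CPU-side knobs are from user space, using calls that exist in Python's os module on Unix (the affinity call is Linux-only, hence the guard). The function name is my own; I am not aware of an equally simple standard call for capping a process's share of I/O bandwidth, which is the gap described above.

```python
import os


def cpu_controls():
    """Report the CPU-side settings of the calling process."""
    # An increment of 0 leaves the niceness unchanged and returns its current value.
    info = {"nice": os.nice(0)}
    if hasattr(os, "sched_getaffinity"):
        # PID 0 means "the calling process"; the result is the set of CPUs it may run on.
        info["allowed_cpus"] = sorted(os.sched_getaffinity(0))
    return info
```

    The matching setters are os.nice(increment) and os.sched_setaffinity(pid, cpus).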

    1. Re:More attention to IO needed by g4dget · · Score: 4, Informative
      In particular I'm interested in how the Linux kernel is designed to handle multiple independent I/O buses.

      By running multiple kernels. Seriously: the way to get great performance out of PC hardware is to buy lots of it and cluster it. You still end up paying less for more performance than with the high end systems.

    2. Re:More attention to IO needed by virtual_mps · · Score: 4, Informative

      The problem is that the typical PC hardware is just not designed for that. Large proprietary Unix or mainframe systems usually have multiple very high speed buses; a single 32-bit PCI bus is rather low-end in comparison.


      A single 32-bit PCI bus is anemic these days. That's why high-end servers based on ia32 processors include multiple PCI buses, increasingly PCI-X (133 MHz, 64-bit). Note that servers based on other processors are increasingly moving away from proprietary buses and using the same PCI you'll find in those Intel-based systems.
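
      To put rough numbers on that, here is a back-of-the-envelope sketch of theoretical peak bandwidth. It assumes one transfer per clock and a 33 MHz clock for the plain 32-bit bus (my assumption; only the PCI-X figures are quoted above), and it ignores protocol overhead, so real throughput will be lower.

```python
def peak_bandwidth_mb_per_s(width_bits, clock_mhz):
    """Theoretical peak: bytes per transfer times millions of transfers per second."""
    return (width_bits / 8) * clock_mhz


pci = peak_bandwidth_mb_per_s(32, 33)     # conventional 32-bit PCI, 33 MHz assumed
pcix = peak_bandwidth_mb_per_s(64, 133)   # PCI-X figures quoted above
print(f"32-bit/33 MHz PCI:    {pci:.0f} MB/s")
print(f"64-bit/133 MHz PCI-X: {pcix:.0f} MB/s ({pcix / pci:.1f}x)")
```

      A single gigabit Ethernet card or a fast RAID controller can come close to filling the 32-bit bus on its own, which is why the multiple-bus designs matter.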
  7. For simplicity's sake by r6144 · · Score: 5, Informative
    When running with multiple CPUs, the kernel instances running on these CPUs need regulation when they access shared data. Such regulation is usually implemented with locks. A simple approach is to use a small number of "big" locks (like a lock that makes sure that only one CPU can run actual kernel code). This is very simple and easy to debug, but may cause poor performance because one CPU cannot (for example) do network transfers while another is reading disk, even though this should be allowed in principle. So we should use finer-grained locks. However, as we make locks more and more fine-grained, we have more and more locks, so things get messy and hard to debug, and locking/unlocking overhead goes up, degrading performance on machines with fewer CPUs. Because of this cost, we should make locks finer-grained only when benchmarks show that it actually improves performance significantly.

    Of course this applies to something else, like making transfers zero-copy, too.
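
    The locking trade-off can be sketched in user space. This is only an analogy with Python threads, not kernel code, and the class and counter names are hypothetical. Because of Python's global interpreter lock it shows the structure of the two approaches rather than any real speedup.

```python
import threading


class CoarseCounters:
    """One "big" lock guards every counter: simple, but all updates serialize."""

    def __init__(self, names):
        self._lock = threading.Lock()
        self._counts = {n: 0 for n in names}

    def add(self, name, n=1):
        with self._lock:
            self._counts[name] += n

    def get(self, name):
        with self._lock:
            return self._counts[name]


class FineCounters:
    """One lock per counter: unrelated updates no longer contend, at the cost of more locks."""

    def __init__(self, names):
        self._locks = {n: threading.Lock() for n in names}
        self._counts = {n: 0 for n in names}

    def add(self, name, n=1):
        with self._locks[name]:
            self._counts[name] += n

    def get(self, name):
        with self._locks[name]:
            return self._counts[name]


def hammer(counters, name, times):
    """Increment one named counter repeatedly."""
    for _ in range(times):
        counters.add(name)


def run(counters):
    """Update the "disk" and "net" counters from two threads and return the totals."""
    threads = [
        threading.Thread(target=hammer, args=(counters, name, 10_000))
        for name in ("disk", "net")
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counters.get("disk"), counters.get("net")
```

    Both versions give the same totals; the difference is that in the fine-grained one the "disk" and "net" updates never wait on each other, and there are now two locks to reason about instead of one.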

  8. Usually not necessary by r6144 · · Score: 5, Insightful
    I have now installed Linux several times, on different machines (mostly Red Hat). UDMA settings are almost always right on modern machines. The only exception is an old P166 machine with a very old HD, where the original kernel 2.2 does not support DMA on it, but 2.4 does (transfer rate 5MB/s -> 10MB/s). Fussing with the kernel usually doesn't give much benefit, and is definitely not for newcomers.

    Things that are actually useful: disabling unnecessary services on startup (if you don't use atd, don't start it, to save start-up time, and on many machines it is unnecessary to detect hardware changes using kudzu upon startup); for machines with multiple HDs, put the swap on the faster HD.

  9. Measurement - Lord Kelvin said it best by Anonymous Coward · · Score: 4, Insightful

    "In physical science the first essential step in the direction of learning any subject is to find principles of numerical reckoning and practicable methods for measuring some quality connected with it. I often say that when you can measure what you are speaking about, and express it in numbers, you know something about it; but when you cannot measure it, when you cannot express it in numbers, your knowledge is of a meagre and unsatisfactory kind; it may be the beginning of knowledge, but you have scarcely in your thoughts advanced to the state of Science, whatever the matter may be."

  10. Benchmarking for interactivity by ehack · · Score: 5, Interesting

    I wish there were some interactive workload benchmarks - I know this is history, but when the kernel went 1.2 I found my machine really slow; the benchmarks were better but somehow the usability had gone down. It would be neat to measure the way the mouse tracking feels, the "snap" with which menus open in an application, Netscape getting a page and rendering it, etc. Kernel compilation and numerics are not the main use of a desktop machine these days ...

    On a related note, my Mac Powerbook was really sluggish until I managed to kill some unneeded processes; they weren't really eating up time by themselves, but were somehow impacting system reactivity: The load factor hardly moved but the system became responsive to mouse clicks.

    --
    This is not a signature.
  11. I/O vs CPU & Modularised Linux by AtomicX · · Score: 5, Interesting

    I agree that I/O is currently a weakness of Linux and that it needs a lot more attention. The ability of Linux to make the most of the processor is very good and has already been very well developed. CPUs have advanced so far in the past few years that the CPU is no longer the main bottleneck of the system. I/O technologies have stood pretty much still, with only small advances, so no wastage or inefficiency on the part of the OS is acceptable.

    It is a pity that Linux developers, like Unix developers, have become a little stuck in their ways - hopefully they will do their best to address this in the 2.6 and 3.0 kernels.

    I like the idea of a modularised kernel, where people could use the I/O system that best suited their setup - but this could involve an awful lot of division and arguments, and the number of bugs that would result could be huge. Perhaps Linux itself could automatically adapt the way it works to suit its needs - hence solving the problem of Linux's hugely varying performance. Does anyone else have any suggestions or comments on this?