Slashdot Mirror


Robert Love Explains Variable HZ

An anonymous reader writes "Robert Love, author of the kernel preemption patch for Linux, has backported a new performance-boosting patch from the 2.5 development kernel to the 2.4 stable kernel. This patch allows one to tune the frequency of the timer interrupt, defined in 2.4 as "HZ=100". Robert explains 'The timer interrupt is at the heart of the system. Everything lives and dies based on it. Its period is basically the granularity of the system: timers hit on 10ms intervals, timeslices come due at 10ms intervals, etc.' The 2.5 kernel has bumped the HZ value up to 1000, boosting performance."

12 of 62 comments

  1. It doesn't improve performance. by Professor Collins · · Score: 5, Informative
    One of the great paradoxes of computer science is that perceived performance and actual performance almost always come at a tradeoff. By raising the frequency of the timer interrupt, individual timeslices are shorter and the processor needs to make more context switches, resulting in less overall processing being performed. However, because these context switches occur more frequently, it appears to the user that apps are more responsive and fluid.

    To make a long story short, for number crunching machines, servers, and other applications which don't need much user interaction, larger timeslices are preferable because it doesn't matter how responsive the user interface is. For desktop systems, the timeslice can be decreased to improve the responsiveness of the user interface and give a better "feel" to the system at the expense of a minor performance loss. Being able to tune these parameters to meet your needs is one of Linux's great strengths.

    1. Re:It doesn't improve performance. by zenyu · · Score: 5, Informative

      One of the great paradoxes of computer science is that perceived performance and actual performance almost always come at a tradeoff. By raising the frequency of the timer interrupt, individual timeslices are shorter and the processor needs to make more context switches, resulting in less overall processing being performed.

      This is not quite true. It holds only if you have a single program running just one thread. You have to do a context switch at each tick to Ring 0 and back, which takes maybe 500 cycles, or 1/2 microsecond on a 1 GHz machine. Do this 1000 times a second and you've lost 500 microseconds of processing time per second.

      BUT once you have more than one program or thread running the situation is different. Say you have one thread running flat out and another that needs to do 100 microseconds of work. With 100 ticks per second you will lose 50 usec to context switching and up to 9900 usec to waiting for the next context switch. With 1000 ticks per second you lose 500 usec to context switching and up to 900 usec to waiting for the next context switch. So you get more work done.
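
      A back-of-the-envelope sketch of that tradeoff, assuming 500 cycles per tick on a 1 GHz clock (0.5 usec per tick) and a second thread that needs 100 usec of work. The function names are made up for illustration, and the model is only the simplified one from this post, not a measurement:

```python
def tick_overhead_us(hz, cycles_per_tick=500, cpu_hz=1_000_000_000):
    """Microseconds per second spent entering and leaving the kernel on timer ticks."""
    return hz * cycles_per_tick * 1_000_000 / cpu_hz


def worst_wait_us(hz, work_us=100):
    """Longest wait for the next tick in this model: one tick period minus the work itself."""
    return 1_000_000 / hz - work_us


if __name__ == "__main__":
    for hz in (100, 1000):
        lost = tick_overhead_us(hz) + worst_wait_us(hz)
        print(f"HZ={hz}: overhead {tick_overhead_us(hz)} usec, "
              f"wait {worst_wait_us(hz)} usec, total {lost} usec")
```

      The overhead term grows with HZ while the waiting term shrinks, and the waiting term is much the larger of the two at these values.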

      For someone who always runs at 100% processor utilization, 1000 ticks per second is probably a fine setting, since you are probably just running one thread 99% of the time and once in a while writing logs to disk or responding to some other events. If you are more like me and run at 1% processor utilization most of the time, with 100% utilization only happening when you compile, you would rather be able to continue to use the computer than save 1ms on the 5 minute compile, and an even higher value might make sense. 10000 maybe, assuming there aren't limitations in the kernel that prevent the higher value.

      Disclaimer: I've been applying Love's patches for a while now. They make a real difference in the responsiveness of X, esp if you're running stuff like Mozilla or Gnome/KDE on your box. I haven't applied it on any servers cuz the preempt patch is not quite stable.

    2. Re:It doesn't improve performance. by jquirke · · Score: 4, Informative

      Actually the last time I checked, the kernel had to be recompiled to change the HZ variable. Not trolling or anything, but it's been pointed out FreeBSD has this as a sysctl parameter. Hopefully Linux will offer this (correct me if I'm wrong!).

      Also, you don't necessarily have to increase the clock frequency by a whole order of magnitude. A fair compromise could be 200Hz, or 250Hz, or 500Hz. A typical workstation running X-Windows could use 250 or 500, for example.

    3. Re:It doesn't improve performance. by Kopretinka · · Score: 2, Informative
      AFAIK, the parent is not quite true. 8-)

      A reschedule does not happen only on the timer tick (100 or 1000 times a second depending on HZ setting), it happens on a number of occasions, timer tick being one of them. The other ones remove the concerns zenyu seems to be having:

      1. when a process sleeps - when a process calls the kernel in order to sleep, the kernel reschedules because sleeping can be handled using normal timer and in the meantime other processes may work
      2. when a process yields - when a process says that it's done its stuff in this tick, whatever that means
      3. when idle, on any interrupt - when no process wants to work, the first one that wants to work is scheduled right away

      The third point may seem a little weird, but a process can only become willing to do something as a result of some interrupt: a timer if the process was sleeping for a given amount of time; an I/O interrupt if the process handles the keyboard or the mouse. In any case, interrupts are handled by the kernel, so if a process is to wake up from its sleep or if a process gets something in some stream on which it is waiting (stdin on keyboard interrupt, socket on network card interrupt etc.), that process is just scheduled to wake up and work.

      So on an idle machine the HZ does not really have much impact, and on a utilized machine the smoothness of process interaction (like window manager vs. X server) increases with increased HZ but this also increases the overhead.

      Hope it's clearer.

      --
      Yesterday was the time to do it right. Are we having a REVOLUTION yet?
    4. Re:It doesn't improve performance. by pthisis · · Score: 4, Informative

      From the look of the article, under Linux, select actually does some sort of polling at or related to HZ. It may be on some sort of almost-run queue: a selecting process gets allocated timeslices; on its slice, it polls and either returns to userland or goes back onto the almost-run queue. I don't have time to verify that-- I don't know my way around the Linux kernel-- but it seems to be reasonable, based on the article. Can I get a Linux developer to confirm/deny my guess?

      Deny. It's actually the idle timeout that's affected by HZ. select() itself doesn't poll at all, and e.g. a select() call with an infinite timeout will be completely unaffected by HZ (select will wake up when the network gets an interrupt resulting in readable data/writeable buffer space).
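
      A small sketch of the no-timeout case, using a local socket pair in place of a network connection. The helper name and the 50 ms delay are arbitrary choices for the demo. select() with no timeout simply blocks until the descriptor becomes readable, so the wake-up is driven by the data arriving rather than by any timeout:

```python
import select
import socket
import threading
import time


def wait_for_data(delay_s=0.05):
    """Block in select() with no timeout until a helper thread writes one byte."""
    reader, writer = socket.socketpair()

    def send_later():
        time.sleep(delay_s)
        writer.sendall(b"x")

    sender = threading.Thread(target=send_later)
    sender.start()
    # No timeout argument: select() returns only when `reader` is readable.
    readable, _, _ = select.select([reader], [], [])
    data = readable[0].recv(1)
    sender.join()
    reader.close()
    writer.close()
    return data


if __name__ == "__main__":
    print(wait_for_data())
```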

      Example of the timeout effect: a game could have a select() loop that waits on user input, but also has a timeout argument so that it can go ahead and update the screen, do enemy AI, etc. The kernel, in the absence of interrupts, schedules on HZ boundaries. Suppose that you as a programmer put a 1/60 second timeout argument in the select loop (intending to update the screen with a 60 Hz refresh and figure out where everything's moving). If you call select() right after a HZ boundary, you could find yourself waiting until 1/50 second passes even on an idle machine with HZ=100; after 1/100 sec, your timeout hasn't expired yet. Next chance to schedule is at 2/100 (1/50) sec.

      With HZ=1000, you'll schedule no more than 1/1000 sec after the 1/60 sec boundary (on an idle machine).
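
      The rounding in that example can be sketched as follows, assuming the simplified model above: the call lands exactly on a tick boundary and the process is next scheduled at the first tick at or after its timeout. The function name is hypothetical, and a real kernel's timer code has more details than this:

```python
import math
from fractions import Fraction


def wakeup_delay(timeout, hz):
    """Delay until the first tick at or after `timeout`, for a call made on a tick boundary."""
    ticks = math.ceil(timeout * hz)
    return Fraction(ticks, hz)


if __name__ == "__main__":
    frame = Fraction(1, 60)
    for hz in (100, 1000):
        delay = wakeup_delay(frame, hz)
        print(f"HZ={hz}: wake after {delay} s, overshoot {delay - frame} s")
```

      Exact fractions are used so the tick arithmetic is not blurred by floating-point rounding.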

      This example is really simplified; a real-life app would adjust for scheduling creep by keeping track of wall-time. But the same concept, with more complicated apps, can cause faster HZ ticks to give you better CPU utilization (especially in e.g. video editing apps and such) because you get around to using the CPU closer to when you want it.
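
      A minimal sketch of the wall-time adjustment mentioned above: each frame's timeout is computed from a fixed start time rather than from a constant 1/60 sec, so a late wake-up shortens the next wait instead of accumulating. The names are illustrative, and time.sleep stands in for the select() timeout a real app would use:

```python
import time


def next_timeout(start, frame, period, now):
    """Seconds to wait so that frame number `frame` fires at start + frame * period."""
    deadline = start + frame * period
    return max(0.0, deadline - now)


def run_frames(n_frames, period=1 / 60, clock=time.monotonic, wait=time.sleep):
    """Run n_frames iterations paced against wall time; returns elapsed seconds."""
    start = clock()
    for frame in range(1, n_frames + 1):
        wait(next_timeout(start, frame, period, clock()))
        # Update the screen, run enemy AI, etc. would go here.
    return clock() - start


if __name__ == "__main__":
    print(run_frames(6))
```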

      The preempt kernel is an even better example of where decreasing latency can increase throughput, sometimes significantly. There you can really get around to dealing with I/O quickly, keeping CPU saturated (and saturated with cache-warm data) and benefiting things like heavily loaded web servers just as much as sound editing stations.

      Sumner

      --
      rage, rage against the dying of the light
  2. I think RedHat did this... by GreyWolf3000 · · Score: 5, Informative
    ...since one of the biggest criticisms of X is how choppy window movement is due to the networked architecture of X (a signal gets sent to the server, the server responds, etc). When the timeslices are reduced, the "lag" gets significantly decreased, since the signal gets processed sooner, the server gets the message sooner, the server can report back sooner, etc.

    I tried recompiling the stock RedHat kernel, and sure enough there was an option in there to increase the hz for the internal timer.

    --
    Slashdot: Where people pretend to be twice as smart as they really are by behaving like children.
  3. Re:Moore's law by WolfWithoutAClause · · Score: 4, Informative
    It's not quite that simple though. It's more tied to memory speed. The processors are improving at a faster rate than the memory is- and this clock tick is more related to memory speed.

    The reason is that across a scheduling tick the processor's cache gets flushed and reloaded. This means that you end up doing a burst of memory reads, and that will dominate if the clock tick is too short.

    --

    -WolfWithoutAClause

    "Gravity is only a theory, not a fact!"
  4. Good for streaming media by Anonymous Coward · · Score: 1, Informative

    My first thought is, "It's about time..." FreeBSD has had this for ages, and it struck me as strange that Linux was nailed to HZ=100 when I started porting some apps over.

    Among other things, streaming media is an important beneficiary of this change. Let's say you have a medium-bitrate video stream (about 2.5 to 5 megabits per second). That means that your packets should be spaced about 2 to 4 milliseconds apart. This is easy to schedule when your system has a 1 millisecond granularity, but is a disaster when your clock ticks are 10 milliseconds apart: your packets end up going out in clumps. Your 100bT network may not care either way, but if you are pushing video over ADSL, 802.11b, or ATM, you may find your packets getting lost along the way.
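
    To put rough numbers on the spacing, here is a sketch that assumes 1250-byte packets. That size is an assumption picked because it reproduces the 2 to 4 ms figures above, not something stated in the post, and the function names are made up:

```python
def packet_interval_ms(bitrate_bps, packet_bytes=1250):
    """Ideal gap between packets, in milliseconds, for a constant-bitrate stream."""
    return packet_bytes * 8 * 1000 / bitrate_bps


def packets_per_tick(bitrate_bps, hz, packet_bytes=1250):
    """How many packets come due within one timer tick."""
    tick_ms = 1000 / hz
    return tick_ms / packet_interval_ms(bitrate_bps, packet_bytes)


if __name__ == "__main__":
    for bitrate in (2_500_000, 5_000_000):
        print(bitrate, packet_interval_ms(bitrate),
              packets_per_tick(bitrate, 100), packets_per_tick(bitrate, 1000))
```

    When more than one packet comes due per tick, a sender that can only wake on ticks has to emit them back to back, which is the clumping described above.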

  5. Re:Moore's law by zenyu · · Score: 3, Informative

    The reason is that across a scheduling tick the processor's cache gets flushed and reloaded.

    Whoa! What architecture is that!

    That just doesn't sound right. The register files get flushed (well, swapped), but if that 2 meg cache got flushed on every context switch there wouldn't be much point in having it at all. You can get cache thrashing if too many cache-hungry programs are running simultaneously, but that's why you get a bigger cache if you run lots of those programs: so that their working set is saved across context switches.

    Perhaps you mean the L1 caches? They can get tossed out cuz it can only hold a few inner loops and a few small working sets at a time anyway, but all that stuff should still be in the L2 cache and get loaded very quickly into those puny L1 caches, the L1 data cache is practically a register file anyway on P4's, 64 bit moves to/from them happen in a cycle...

    Those L2->L1 moves might start to affect you at 1,000,000 ticks per second, but no one is proposing that, right? Even so in a typical environment the other context is just the scheduler which I can't imagine filling the L1 cache... It's not that complicated on a mostly idle machine. (Quick & Dirty schedulers have been written, some which looked through the entire process list. Erm, but on my machine there are less than 100 processes right now, still not so bad for L1 ;)

    Anyway I think 1000 is just fine; if you're doing real-time music synthesis on lotsa channels a larger number might be better. Someone in Europe is working on a music distro, so maybe they will discover that 8000 is the magic number for 16 channels at 48 kHz on a P4 at 2 GHz.

    It would be neat if someone came up with metrics so that the tick was set so that 99.999% of the time the sound systems got their slices once every 500 usec, but otherwise the timeslices were as large as possible. Then you could just tune that 500 usec thing: make it longer if you're on a 386, shorter if you really need more than half-millisecond timings. I guess any program that needed frequent time slices could write to some proc file how much more often it should be called, or if it could afford to be called less often. For example, 1.2 if it wants to be called more often, 0.8 if its time needs were met. The kernel would only have to ensure all the numbers it got were less than 1.0, and if the largest one were less than 0.95 it could even afford fewer time slices.

    The kernel might also want to ensure through process accounting that the time-sensitive processes never got more than a certain percentage of the cycles available, even if it meant they got called less often. This is to prevent a denial of service where you just always write 10 to that proc file whenever you get run, so the tick rate grows until you spend all your time in the scheduler. It might also want to set a floor, so that a human can interact with the machine. Ticks should never be less than, say, 10 on a PC (or 250 if it's my machine).

    Though for some special-purpose interstellar Linux probe you might want to sleep for a whole second at a time before checking your direction once on your way, so a tick of 1 would be acceptable once out of your solar system. (You still want 64-bit uptimes for your interstellar probe; it would be so embarrassing if it arrived and the aliens were like, "Woah, this species can't develop an operating system with more than 3 day uptime for a space probe that took like 40 years to get here, what l0s3rs!")
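
    A toy sketch of that feedback idea, just to make the rule concrete. Everything here is hypothetical: no such proc file exists, the function name is invented, scaling the rate by the largest reported ratio is only one possible reading, and a plain ceiling stands in for the process-accounting limit:

```python
def adjust_tick_rate(current_hz, requests, floor_hz=10, ceiling_hz=10_000):
    """Return a new tick rate from per-process ratios (>1.0 means 'call me more often')."""
    if not requests:
        return current_hz
    worst = max(requests)
    if worst > 1.0 or worst < 0.95:
        # Someone is starved (speed up) or everyone has slack (slow down).
        new_hz = current_hz * worst
    else:
        new_hz = current_hz
    # The floor keeps the machine interactive; the ceiling blunts the
    # "always write 10" denial of service.
    return round(min(max(new_hz, floor_hz), ceiling_hz))


if __name__ == "__main__":
    print(adjust_tick_rate(1000, [1.2, 0.8]))
    print(adjust_tick_rate(1000, [0.8, 0.9]))
    print(adjust_tick_rate(5000, [10]))
```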

  6. Great - now binaries are broken. by ProtonMotiveForce · · Score: 1, Informative

    Way to go. Any binary that used the 'HZ' variable (a constant defined in a header file) will need to be recompiled for these new kernels. Way to go, Linux. Keep it up.

    1. Re:Great - now binaries are broken. by Dan Aloni · · Score: 4, Informative

      That's not true. The kernel still reports HZ=100 to userspace, and as far as jiffies calculations visible to userspace are concerned, nothing has changed.
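
      For anyone who wants to check what their own system reports, the tick rate visible to userspace can be queried at runtime through sysconf instead of being baked in from a header. A minimal sketch, assuming a POSIX system; the wrapper names are mine and the value printed depends on the machine:

```python
import os


def userspace_ticks_per_second():
    """Clock ticks per second as reported to userspace (sysconf _SC_CLK_TCK)."""
    return os.sysconf("SC_CLK_TCK")


def ticks_to_seconds(ticks):
    """Convert a tick count, such as one read from process accounting, to seconds."""
    return ticks / userspace_ticks_per_second()


if __name__ == "__main__":
    print(userspace_ticks_per_second())
```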

      --
      0x2b or not 0x2b, the answer is -1
  7. Re:Wrong! Re:It doesn't improve performance. by WolfWithoutAClause · · Score: 3, Informative
    No, look, swapping is not the same as virtual memory. Virtual memory is useful even in the absence of any disk or swap space at all.

    The point is that virtual memory reduces the amount of real memory you need for each thread- each only takes what it really needs. Sure if memory is cheap, it may not matter so much. But even if it is cheap do you really want to give each process 1 gig of space on the off-chance that it might need it? I don't think so.

    Virtual memory is when a process thinks it has 1 gigabyte of memory, but it actually only has, say 128 megabytes. It can read or write to any bit of it, and the OS does what is necessary to ensure that it never notices the difference; obviously up to the actual system limits.

    Virtual memory and swap space go together very nicely, but one does not imply the other. You can use virtual memory to implement garbage collection for example; with no backing store at all.
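
    A small illustration of address space that is not tied to a file, using an anonymous mapping. The 64 MiB size and the function name are arbitrary choices for the demo. On a typical demand-paged system only the pages actually touched end up backed by real memory, though that behavior depends on the OS and is not something this snippet measures:

```python
import mmap


def sparse_region(size=64 * 1024 * 1024):
    """Map `size` bytes of anonymous memory and touch only the first and last byte."""
    region = mmap.mmap(-1, size)  # -1 means an anonymous mapping, not a file
    region[0] = 1
    region[size - 1] = 2
    return region


if __name__ == "__main__":
    region = sparse_region()
    print(len(region), region[0], region[len(region) - 1])
    region.close()
```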

    I guess there are other ways to do similar things- for example, don't use virtual memory, use real memory and set up the MMU so that each thread can only see its own map. But there are issues with that, and it isn't necessarily faster.

    --

    -WolfWithoutAClause

    "Gravity is only a theory, not a fact!"