Slashdot Mirror


Debate on Linux Virtual Memory Handling

xturnip sent us a good piece running over at Byte about Linux's VM. Somewhat more technical than the stuff we usually see online, this one talks about different VM systems, and the egos in the kernel. It's worth a read.

18 of 330 comments

  1. Re:It should all be configurable. by Anonymous Coward · · Score: 1, Informative

    This has come up on the kernel mailing list (see the summary at Kernel Traffic, http://kt.zork.net/). The conclusion was that it was simply too hard with the version of kbuild/config included with the 2.4 series.

  2. Re:To fork, or not to fork by scooby-doo · · Score: 2, Informative

    Essentially that is already going on. The Andrea VM is in Linus's tree now and Rik's VM is still in Alan Cox's tree. So by choosing the official Linus kernel or the -ac kernel you can choose which VM subsystem you would rather use.

  3. Re:OSS Power by BenHmm · · Score: 3, Informative

    Sure they could - provided all of the users of XP were the sort of people who don't mind downloading and recompiling a new kernel every two weeks.

    They're not. So Microsoft put these changes in point releases instead.

  4. Re:To fork, or not to fork by sshore · · Score: 3, Informative
    Why not let the 2 VM's compete and the users will decide?

    The problem is the duplication of effort and decreased manpower for each VM. Not only that, but any project that works closely with the VM has to test under twice as many conditions, and may require different code for each. Talk about a maintenance problem.

    It's certainly good to have competition to bring out the best in each system, but it would be horribly inefficient to keep it going in the long run.

    Regarding the users choosing - the users don't have the opportunity to choose only on the basis of the VM. It's not like they can apply the "VM patch" to the stock kernel to try out the other one, rather, they have to apply a fairly large -ac patch that changes a lot of unrelated things.

  5. Alan will be switching VMs soon... by rakarnik · · Score: 5, Informative
  6. Re:To fork, or not to fork by ethereal · · Score: 4, Informative

    Well, drivers eventually do get from the -ac tree into the Linus tree, you know - the whole point is that AC tries them out until they are stable enough for Linus. Not to mention that Mr. Cox does have some responsibility to provide RedHat with the best kernel he can, no matter what Linus thinks of it. The only weird thing here is that as far as the VM goes, Linus has picked up the more experimental code first. So people who always recompile the Linus kernel when they install a new distro may find that their kernel operates very differently after that.

    My naive thought is that the best way to do it would be to somehow modularize the two VMs so that it can be a compile-time or boot-time option, and let users try both on the same box to see which is better. However, I imagine this would be a ton of work to set up.

    --

    Your right to not believe: Americans United for Separation of Church and

  7. Re:It should all be configurable. by jacobito · · Score: 4, Informative
    That's not going to happen in the 2.4 series. The kernel hackers think that making the VM policy configurable would be a nightmare:
    Michael T. Babcock asked how ugly it would be to make Rik van Riel's and Andrea Arcangeli's Virtual Memory subsystem code into a compile-time option, so folks could try each one out as they pleased. Alan Cox replied simply, "Too ugly for words." Mike Fedyk suggested that it might be feasible in 2.5, and asked if there were a way to make it non-ugly. Marcelo Tosatti replied, "Even if its non-ugly, its non-easy. Way too much overhead. For 2.5 we'll probably be able to get people working together."
    This is from Kernel Traffic #139.
  8. Re:To fork, or not to fork by einstein · · Score: 3, Informative


    My naive thought is that the best way to do it would be to somehow modularize the two VMs so that it can be a compile-time or boot-time option, and let users try both on the same box to see which is better. However, I imagine this would be a ton of work to set up.

    this was discussed on the kernel mailing list (check out http://kt.zork.net). The general conclusion was that this would be really hard to do with the current build/module system, but kbuild 2.5 has the ability to apply patches before building as long as the patches don't overlap.
    ---

  9. Re:ext3 in 2.4.x by Svenne · · Score: 2, Informative

    You're looking for this.
    I've been running ext3 since kernel 2.4.11 with these patches.

    --

    Slagborr
  10. Re:Make it a build option by Rik+van+Riel · · Score: 5, Informative
    I wonder what Rik has to say about the new "blessed" VM? If he thinks it's a better all-around VM, then the debate can stop pretty quickly I would think.

    Well, since you wanted to know ;)

    First let me explain that most of the time in the beginning of 2.4 was spent making the VM stable, stopping it from crashing on highmem machines, etc... Speed improvements were a secondary thing, to do later on. Secondly, Linus is a very busy man and didn't seem to have the time even to apply critical bugfixes at times, so his kernel has had a big disadvantage over Alan's kernel.

    Around the time the VM in Alan's kernel got stable, I was finally getting the time to work on speed improvements, and Linus still lagged a few patches. Suddenly Andrea surprised us all by posting the first version of his new VM online. An even bigger surprise was that Linus integrated this into the kernel within 24 hours, without even asking Andrea!

    As to why Andrea's VM is faster for desktop use ... it was optimised for speed on low to medium loads in exactly the same way the 2.2 kernel was. Note that this also means the server falls over quicker under high load and it is basically impossible to tune the system to run decently under all loads ... just like 2.2.

    My VM was slower for desktop loads, but since the thing stabilised I put in some time to make things faster and I seem to have mostly caught up with Andrea on the speed front now. The benchmark results posted on the linux-kernel mailing list seem to indicate that Andrea's VM is faster for some things, while my VM is faster for some other things.

    Personally, I think it is easier to make a solid VM fast than it would be to make a fast VM solid. This opinion was formed because of the living hell of the Linux 2.2 VM, which was undocumented and horribly subtle.

    In the future, I know I'll always be optimising for (1) maintainability, (2) correctness/stability and (3) performance, in that order...

  11. it's been ac kernels by MenTaLguY · · Score: 3, Informative

    RedHat ships an -ac kernel with RH 7.2, I think 7.1's was also an -ac kernel.

    Not pure -ac kernel, though, like most major distributions they also pull stuff from Linus and other kernel trees (there are others) so what they actually ship is really the "RedHat" tree.

    --

    DNA just wants to be free...
  12. Linux Kernel list link by commanderfoxtrot · · Score: 2, Informative

    This is a link to the kernel-traffic discussion with details and basic benchmarks: here!.

    --
    http://blog.grcm.net/
  13. Re:Compound errors by puetzk · · Score: 4, Informative

    FWIW, I think that Andrea's setup is modeled after the 2.2 VM (which he did a fair amount of work on tuning). So this is really more of a pragmatic revive-the-old-approach than it might initially seem.

    We all know this simplistic setup had scalability problems (like much of 2.2) but at least it worked right. Hopefully given some more time, Rik can really get his to go, since it seems more sophisticated/scalable long-term.

    --
    The Matrix is going down for reboot now! Stopping reality: OK. The system is halted.
  14. Re:ok, here's the thing by derF024 · · Score: 2, Informative

    i don't know what you're doing to that poor machine, but i have a 366 mhz laptop with 128 megs of ram and i can do all of those things under linus' 2.4.12 just fine. I play video under avifile instead of mplayer, but I never lose a frame, even across 10 mbit ethernet.

  15. Re:Why VM is bad by DaveWood · · Score: 4, Informative

    Alright, I'll bite. What you say is interesting, and I believe your comments regarding the changing relative costs of traditional VM paging algorithms make sense. The problem is that I suppose I don't understand the alternative you are proposing. I am certain this is due to my own ignorance; please give tolerance to my questions, and don't let my inquisitiveness be mistaken for criticism.

    You say, "The price of having virtual memory is terrible performance once paging between active processes starts." Assuming the VM algorithm is working correctly (big assumption lately), this means basically that you are trying to run more than your memory can handle, and have reached a load-shearing point with respect to RAM. From this I surmise that we might be talking about a "smarter" VM system that would shear better, perhaps by identifying the condition, and perhaps by better communication with higher levels - in other words, a different/better application-level interface to the VM system.

    And, indeed you say, "On a server which is processing short transactions, you're much better off throttling at the transaction launch point [than thrashing]... This requires some coordination between applications and memory allocation." So I think I understand so far.

    Then you say: "A basic problem with shared libraries is that you load in the whole library, needed or not, when you need any function from it." This is where I perhaps display my ignorance of the kernel, but that's not what I have understood was going on. My impression of things was that an application was loaded into memory by mapping its data on the disk into "virtual" memory, and that the VM subsystem arbitrated between real and virtual memory by retrieving from the disk only what blocks were "necessary" (i.e. being referenced by the executing code), and that this process naturally extended to libraries, and especially shared libraries (which need only exist in "real" memory in one location, despite being mapped into multiple "virtual" memory environments). Then again, perhaps it is a minor point - if the whole SO image is loaded and then unused pieces are unloaded or vice versa, it seems less important than the contention problem already on my mind...
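    The demand-paging picture described above can be sketched from user space. This is only a minimal illustration, assuming a POSIX-like system where a read-only file mapping is backed by the file on disk; the helper names and the demo file are hypothetical and not from the thread. The program maps a whole file but touches a single byte in one page, which is roughly how an executable or shared library is expected to be brought in piece by piece.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE  # page size reported by the running system


def make_demo_file(pages):
    """Create a temporary file whose Nth page is filled with byte value N."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        for n in range(pages):
            f.write(bytes([n % 256]) * PAGE)
    return path


def read_one_byte_from_page(path, page_index):
    """Map the whole file, then touch a single byte in one page."""
    with open(path, "rb") as f:
        # Creating the mapping only reserves address space. On a
        # demand-paged system the kernel is expected to fetch a page
        # from disk when it is first touched, not at mapping time.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            return m[page_index * PAGE]


demo_path = make_demo_file(8)
touched = read_one_byte_from_page(demo_path, 5)
os.remove(demo_path)
```

    Nothing here measures which pages were actually resident; it only shows that the program addresses the whole file while asking for one byte of it.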

    You say "VM ejects [unused bits of libraries and applications] from memory. That's what VM is really used for today." Absolutely! But regardless of the relative differences, isn't this process of migrating data between different "tiers" of data storage in the computer (each with a different latency, throughput, and cost/availability) always going to be necessary? While I can certainly see a major advantage in creating/improving ways for the application to communicate with the memory management system, is there really some fundamental alternative to the block-based VM "guesswork" that takes place in absence of directives set at compile time?

    You say: "So if you're actually page-faulting, VM is hurting, not helping." I am wondering if the VM is either hurting or helping per se, since the real problem is that you don't have enough RAM even for the "active" blocks you want to run. Of course, the quality of your VM will determine how close you can get to "perfect" utilization of your RAM.
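    The "not enough RAM for the active blocks" case can be shown with a toy simulation. This is a sketch under stated assumptions: a plain LRU replacement policy and a program that loops over its pages in order. Neither is claimed to match what either Linux VM does, and the function names are made up for illustration.

```python
from collections import OrderedDict


def count_faults(references, frames):
    """Count page faults for an LRU-managed memory with `frames` page frames."""
    resident = OrderedDict()  # resident pages, least recently used first
    faults = 0
    for page in references:
        if page in resident:
            resident.move_to_end(page)  # mark as most recently used
        else:
            faults += 1
            if len(resident) >= frames:
                resident.popitem(last=False)  # evict the least recently used
            resident[page] = True
    return faults


def cyclic_references(pages, rounds):
    """A loop that touches `pages` distinct pages in order, `rounds` times."""
    return [p for _ in range(rounds) for p in range(pages)]


# Working set fits in memory: only the initial cold faults occur.
fits = count_faults(cyclic_references(pages=8, rounds=10), frames=8)

# Working set is one page too large: under LRU every access faults.
too_big = count_faults(cyclic_references(pages=9, rounds=10), frames=8)
```

    With 8 frames, the 8-page loop faults 8 times in total while the 9-page loop faults on all 90 accesses. That cliff is the thrashing behavior under discussion, and a smarter policy can soften it but cannot remove the underlying shortage.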

    Then you say, "I'd argue that it's time to go back to a swapping model - all of an app has to be in before it runs." This is where you lose me, I suspect because I do not understand what you are really proposing. You go on to say "in support of this, apps need more information about the current memory situation. And they should be able to designate parts of their space as pageable, at least at the shared object/DLL level. Only a few apps (web servers, window managers) need much memory awareness, so that's feasible. Throttling needs to occur at a smart place, just before allocating substantial resources, such as CGI process launch or connection opening. By the time the VM system becomes involved, it's too late; resources are already overcommitted."

    At first it sounds as though you are saying that you want to eliminate swap altogether. I do not doubt that for some situations this is preferable - you want to have consistent performance and a sharp failure rather than the long thrash in the case where you use up your resources (and you mention QNX). However for general-purpose computing, I'm not so sure this is a good idea, even with RAM as cheap as it is. Depending on what you're trying to do, the slight loss in predictability and overall performance is vastly preferable to sharp failures for many, I would even say, "most" applications, even on the server.

    But moving on, it seems you are saying that what you dislike about the VM is that data is broken into arbitrary blocks - and so we should rely on application programmers to designate what it would be a good idea to swap out in case of memory contention ("designat[ing] parts of their space as pageable"). The problem I see with this is that you are relying on the programmer to do something that, if they do not do it, their program will appear to run anyway.

    This is therefore automatically classified a frivolous expense by commercial software developers, and even OS people working for the love of the game may be tempted into the same pitfall. This is superficially similar to the argument between malloc/free proponents and garbage collector advocates. Giving the programmer another "lower-level" thing to worry about gives them an opportunity to optimize it, but in practice we often find that on the balance we get more mistakes and the quality of the user experience suffers.

    The compiler probably could be coaxed to do it for you. But the various tradeoffs between compile time "pre-blocking" and runtime blocking might leave compile-time computations, whether in the compiler or even in the developer's head, looking inferior to what a good VM system can do while observing actual behavior in real-time.

    Your point about throttling occurring "at a smart place" is not lost - obviously many applications could benefit from more transparency by the memory management system in managing their affairs - apache users really don't want to have to guess how many processes/concurrent users should be allowed, they want apache to determine it for them based on what the system can handle. But most application programmers are not going to do this extra work or do it right, and a VM seems like what you need as a "default behavior," even if its benefits (and its audience - those who have enough RAM that they never need fear swap) are lessening over time.
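    For reference, throttling "at the transaction launch point" could look something like the sketch below. It is a hypothetical illustration, not anything apache or the kernel provides: the class name, the fixed memory budget, and the per-transaction estimates are all assumptions, and a real server would need some way to obtain those numbers.

```python
class AdmissionThrottle:
    """Refuse new work once an assumed memory budget would be exceeded."""

    def __init__(self, budget_mb):
        self.budget_mb = budget_mb
        self.committed_mb = 0

    def try_admit(self, estimate_mb):
        """Reserve memory for one transaction; return False to shed it."""
        if self.committed_mb + estimate_mb > self.budget_mb:
            return False
        self.committed_mb += estimate_mb
        return True

    def release(self, estimate_mb):
        """Return a finished transaction's reservation to the pool."""
        self.committed_mb = max(0, self.committed_mb - estimate_mb)


throttle = AdmissionThrottle(budget_mb=100)

# Three requests that each expect to need 40 MB: the third is refused
# up front instead of being started and pushed into swap.
decisions = [throttle.try_admit(40) for _ in range(3)]

# Once one transaction finishes, there is room to admit another.
throttle.release(40)
after_release = throttle.try_admit(40)
```

    The hard part, as the comment above argues, is that somebody has to supply the estimates and call the throttle correctly, which is the extra work most application programmers will not do.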

  16. Re:What about the AIX VM? by Anonymous Coward · · Score: 2, Informative

    The AIX VM is bizarre and different - it is almost entirely unlike any other UNIX VM out there. It is hideously ugly, MP support was added as an afterthought and looks like it was written by a pascal programmer.

    It also relies heavily on a segment-register-based architecture, i.e. Power/PowerPC (each segment describing a 256MB chunk of virtual address space). You start getting into lots of fun when hitting/crossing segment boundaries.
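    The 256MB segment arithmetic can be worked through in a few lines. This is a sketch that assumes a 32-bit effective address whose top bits select the segment and whose low 28 bits are the offset; the function names are hypothetical and nothing here is taken from AIX itself.

```python
SEGMENT_BITS = 28                  # 2**28 bytes = 256 MB per segment
SEGMENT_SIZE = 1 << SEGMENT_BITS


def split_address(addr):
    """Return (segment number, offset within the segment) for an address."""
    return addr >> SEGMENT_BITS, addr & (SEGMENT_SIZE - 1)


def crosses_segment(start, length):
    """True if the byte range [start, start + length) spans more than one segment.

    Assumes length is at least 1.
    """
    return (start >> SEGMENT_BITS) != ((start + length - 1) >> SEGMENT_BITS)


# An address just inside segment 3, 4096 bytes past its start.
seg, off = split_address(0x30001000)

# An 8 KB buffer that starts 4 KB before the end of segment 2.
spans = crosses_segment(0x2FFFF000, 0x2000)
```

    Here `seg` is 3, `off` is 4096, and `spans` is true, since the buffer starts in segment 2 and ends in segment 3. Any object laid out like that has to be handled across two segments, which is presumably where the "fun" comes in.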

    I have some doubts about how well this maps to/performs on non-segment-based architectures. IBM's inability/unwillingness to put an AIX product out on the Itanium after some heavy investment *may* be related.

  17. Re:swap space? by h2odragon · · Score: 2, Informative
    This is true.

    There was a comment on LKML not all that long ago that dealt with the details; IIRC it was in response to someone wondering about using software RAID0 for swap.

  18. Re:unstable kernel, sigh.. by Anonymous Coward · · Score: 1, Informative

    re SMP: NetBSD right now has development only SMP code that works on the big-ass-lock model, like Linux 2.0.x and FreeBSD 4.x. This is not usable in any real production environment, as it wastes your second+ CPUs unless you are running 2+ 100% compute bound tasks. Linux 2.2 destroys all the freely available BSD's SMP wise, and Linux 2.4 competes very well with BSD/OS.

    re Networking: Linux 2.4 has scored some very impressive, world-beating SPECweb numbers. All we get about BSD is anecdotes. My own anecdote is that running apache, the difference between FreeBSD and Linux 2.4 on my hardware is lost below the noise floor, performance wise. I've never had any reliability problems with either. The networking comparison might have been valid in the Linux 2.0 days, but not today.

    re NFS: I will never use an NFSd that includes a lockd that simply tells every client that the lock is available. It's one thing not to implement something, but to implement it in a way that is 100% guaranteed not to work (obviously) is not acceptable.