Slashdot Mirror


Debate on Linux Virtual Memory Handling

xturnip sent us a good piece running over at Byte about Linux's VM. Somewhat more technical than the stuff we usually see online, this one talks about different VM systems and the egos in the kernel. It's worth a read.

30 of 330 comments

  1. His favorite? by LinuxGeek8 · · Score: 4, Interesting

    He seems to think a lot in favor of the Andrea VM.
    That's OK with me, but he might want to take note of the fact that Linus accepted few of Rik's patches, so 2.4.9 still actually had the VM of 2.4.5. The -ac tree was more up to date.
    So for a good comparison you'll need to compare the Linus and -ac trees.

    --
    Well, don't worry about that. We can get you back before you leave. (Dr. Who)
  2. Re:It should all be configurable. by pwagland · · Score: 4, Interesting

    Sadly, no.

    While it is nice to be ultra configurable, that leads to two separate problems:
    1) Code maintainability
    2) User maintainability

    1) Is a serious problem. Testing the impact on two different VM systems, and fully understanding the impact that any change will have, is a mammoth task.

    2) Users are not all technically literate anymore. Look at the recent Slashdot story on Microsoft losing their grip in Asia...

  3. AC kernels are not a fork by rakarnik · · Score: 5, Interesting

    Moshe Bar seems to indicate that Alan Cox is creating some kind of fork of the Linux kernel. Actually, -ac kernels are always different from Linux kernels to some extent, since they include slightly more experimental code (e.g. ext3), or code that Linus has not had a chance to review yet. This way, the experimental code gets more testing before going into official Linus kernels. You can read more about -ac kernels at KernelNewbies.Org.

    As anyone following LKML knows, Alan thinks that drastic VM changes should be reserved for 2.5, and so continues to keep Rik's VM going. This actually helps quite a bit as both VMs get tested and there have been several comparative tests conducted leading to improvements in both VMs. Competition in this case is certainly helping Linux.

    Oh, and for all you fork conspirators, here's another fact: Andrea Arcangeli also releases his own kernel releases, called -aa. I don't think any of these are considered forks; everyone understands that this way patches get more testing, and "crosstalk" between the different flavors is a given.

    Much ado about nothing, IMHO...

    -Rahul

    1. Re:AC kernels are not a fork by bwt · · Score: 5, Interesting

      I don't think any of these are considered forks; everyone understands that this way patches get more testing, and "crosstalk" between the different flavors is a given.

      Well, I disagree -- they are ALL forks. Any time you create a patch you are forking. The open source development model relies on perpetual fork and merge to accomplish its development. Most projects are forked this way into a development and a stable branch. I call this a "constructive fork". The AC kernels are perpetually different, but importantly they are generally about the same "distance" away, and "crosstalk" as you call it keeps it that way.

      As the "distance" increases, tension increases, and if it isn't resolved it will divide the development camp. If the crosstalk stops, and the idea of eventual merge is abandoned, you have a "true" fork. Developers have to pick sides, and the split can become permanent.

      I think the AC kernels have always been the former kind of constructive fork. If he never adopts the new VM, then his kernel will begin to diverge, since developing for two VMs is hard. In this way, a small perturbation can become a full-blown deviation that divides developer resources. I really doubt that the VM issue will divide the Linux kernel team permanently. As AC's kernel gets farther away from the main line, the tension on everyone will increase. Eventually, I predict, the team will force one solution, but there is no guarantee.

  4. Re:Make it a build option by iamsure · · Score: 5, Interesting

    This was actually answered on the list, and summarized in a Kernel Traffic. As Alan Cox put it "It would be horribly difficult".

    While it sounds simple enough, as they said in the KT, the "replacement" of the VM was no small feat. It took 170 patches, which touched a very large percentage of the kernel.

    Imagine doing so TWICE (or more) and trying to code 'around' the issues for each.

    No.

    This way madness lies. While it is a nice idea, the simple truth is that it doesn't belong in 2.4.

    2.5 should have branched the second that the patches were considered. Linus didn't want to deal with bitching about 2.4 not being "good enough" and was impatient.

    So be it. The differences between Linus' and Alan's kernel trees (other than the VM) are growing VERY small this week, and will probably be 'close-enough' for a handoff within the next two weeks.

    The only question is which VM will end up in the 2.4 series. (NOT when Linus hands it over, but when Alan begins his releases of it).

    I would not be shocked to see Alan disagree with Linus, and stay with the 2.4.x (x10) VM, and I also wouldn't be shocked to see him agree with Linus and use the new VM.

    As to the patch on install idea, it is actually also discussed for kbuild in the 2.5 series.

    2.5 will be very exciting, if we can only get Linus to get working on it, instead of muddying the stable-series water!!

  5. Re:It should all be configurable. by Flower · · Score: 4, Interesting
    While the idea is interesting, I don't think it is practical. From what I've read on KT and in this article, changing the VM forces design considerations on userland programs. It's additional complexity that most developers (and especially companies like Oracle) wouldn't appreciate. I also think it would raise support costs. At the very least I'd want some variable in /etc that would clearly state which VM was being used. For me at least, the issue is simplicity in favor of flexibility.

    I think the biggest bone of contention in the community is Linus replaced the VM in the current stable version instead of pushing it into 2.5. Again, not being a kernel hacker and only going from everything I've read, this was a radical change. I'd almost be willing to say the latest kernels should be labeled 2.6 but that's just me.

    Oh, and finally, to paraphrase an old saying, give any tech-savvy user enough rope and they will hang themselves.

    At least that's what I think. :)

    --
    I don't want knowledge. I want certainty. - Law, David Bowie
  6. swap space? by archen · · Score: 3, Interesting

    From the article - " All earlier 2.4 kernels (since 2.3.12) needed at least the same amount of RAM in swap and then more to give you additional virtual memory. This meant that on an 8-GB server, you needed to put aside almost a full 9-GB disk just to be able to swap"

    Is this accurate? For just about everything I've always gone with 512 MB of swap, regardless of whether I had more or less RAM (not that I'm technically proficient or anything). This would also be a shortcoming of Linux, since it would make it a pain in the ass upgrading RAM if you needed to allocate more swap space somewhere else each time. Well, I'm all for the newer VM. Simple is good.

  7. Re:But it is in the 2.4.10 linus series by LinuxGeek8 · · Score: 2, Interesting

    Well, I was actually saying that if you compare 2.4.10 with 2.4.9, you're actually comparing 2.4.10 and 2.4.5.

    Even though the kernel had gradually evolved from 2.4.0 to 2.4.9, it was evident that the VM design was more of a liability than an advantage.

    Point is, the kernel did not gradually evolve to 2.4.9, but only to 2.4.5.
    Rik's VM has problems, but in the current -ac tree it is doing quite well. Maybe as well as or better than Andrea's VM.

    Anyway, let's hope that the best VM wins, if there is a best VM.

    --
    Well, don't worry about that. We can get you back before you leave. (Dr. Who)
  8. Re:Make it a build option by sql*kitten · · Score: 3, Interesting

    Do you think that Windows 2000 DataCenter has the same VM system as Windows 2000 Professional? I severely doubt it

    It's actually probably the same algorithm, with different parameters. That's how NT4 did it, in Workstation and Server versions. The kernel would note which version it was supposed to be on startup, then initialize the VM system differently.

  9. Re:FreeBSD 4.4-STABLE vs 2.4 comparison? by ZerothAngel · · Score: 2, Interesting
    The old benchmark is here, but as the poster above noted, the new benchmark is forthcoming.

    Although it will be comparing a moving target (Linux 2.4.x) to a moving target (FreeBSD 4.x), the results will be interesting. AFAIK, there weren't any major changes (I mean like VM changes :) in FreeBSD, so comparing the old and new benchmarks would give a good indication on how much Arcangeli's VM improves things.

  10. Re:Make it a build option by felicity · · Score: 2, Interesting

    I agree with most folks that this should have waited for 2.5; 2.4 should only be bug fixes at this point. That's why it's the *stable* kernel tree. Big huge changes (and replacing the VM system is definitely in this category) are not appropriate here.

    I wonder what Rik has to say about the new "blessed" VM? If he thinks it's a better all-around VM, then the debate can stop pretty quickly I would think.

    I think it'll be interesting when the handoff occurs. Will people have to deal with different VMs constantly during official releases? 2.4.0-2.4.9, a change in 2.4.10-2.4.13, and a change back for 2.4.14 and beyond?

    I also wonder which way major distros will go (since most people don't deviate from those kernels). RedHat, for instance, usually bases its kernel on the AC kernel tree (surprise) and then additionally patches it a whole lot more, while others take the most recent blessed kernel and go with it straight. Should be interesting.

    My overall view of this is simple though: Linus is God, in relation to the kernel, until he says otherwise. (to paraphrase Eric Raymond) If someone wants to maintain a patch against the now-blessed VM to revert to the previous behavior, fine. The decision has been made for the new VM though, let's continue on with things shall we?

  11. ok, here's the thing by Velex · · Score: 5, Interesting

    I don't care if you want to swear by the Linus kernel, but it gets killed by IO. I mean, come on, I'm using 2.4.12, and I can't rip a CD and play an MP3. Under the AC series, I can rip CDs, play MP3s, watch divx movies, surf the web, untar a file, and have a compile job going at the same time. Even for more usual setups, like viewing a video without doing anything else, the Linus kernel drops frames left and right, whereas the AC series laughs at it. Don't tell me I need to use mplayer with SDL, because I do.

    Because I treat my Linux box as though it were a Windows box (one of the reasons I switched over to Linux for everything is that the widgets in GTK are prettier than the widgets in Windows -- it's nice to have people ask me how to get their desktops to look like mine and tell them they have to install Linux) and I expect it to run at least as well as a Windows machine, I must use the AC series. While I'm sure that the Linus kernel has its applications, it is simply unacceptable for replacing the Windows kernel.

    Mod me flamebait or troll if you want, but I speak the truth. I have a Thunderbird-750 with 224 MB of ram, and I find it simply unacceptable when I can't run Quake or view movies under linux because of the Linus kernel. When mp3s skip because I'm moving some data around, it tells me that something is wrong with the Linus kernel. I'm glad that I had a friend who introduced me to the AC series, or I would have given up on linux. Plain and simple, politics aside, the end user doesn't care that he's being loyal to Linus the Great, he just cares that he can view that movie. If Windows outperforms linux in multimedia, he'll use Windows.

    --
    Join the Slashcott! Stay away entirely Feb 10 thru Feb 17! Close all tabs to prevent autorefresh!
    1. Re:ok, here's the thing by choward · · Score: 5, Interesting

      I use the Linus stock kernel on a _very_ similar setup (Duron 700, 384MB ram) and I don't have the problem you mention. One thing I've noticed is that with the Linus kernel, DMA is _never_ turned on by default, you must use hdparm explicitly at startup. Once you do that, skipping mp3s are a thing of the past.

      Running the hdparm tests,
      w/out DMA: 4.01MB/sec
      with DMA: 34.96MB/sec

      Quite a change.

      Craig Howard

      --
      -- Craig Howard
  12. Re:Why does the ac tree persist? by tubby · · Score: 3, Interesting
    The article seems to come out in favour of the new VM code. It makes it sound like it works much more effectively. So, why does Alan Cox continue with the old VM code? There must be some reason why he thinks it's better, or why go through the effort of continually patching the old code into the newer kernel?


    Basically because neither of them is good in all conditions. Each of them is better than the other in some situations, e.g. big systems, little systems or whatever. While I am on the kernel mailing list, I haven't been following the discussions closely enough to say any more than that, but it's the gist of it. Also, for a while Alan continuing to run the Rik VM gave people a way to run a later version kernel without being lab rats for the new VM, which really hadn't had much testing in 2.4.10/11.

    I think that this article overrates the AA VM by a large margin. It can't really be said to have solved the Linux VM woes, which is what it implies.



    I have now used both of the .13 kernels and personally found the -ac VM to be better for my needs. On the other hand, since I bought 768MB of RAM today, my needs have just changed.

  13. Re:OSS Power by Xzzy · · Score: 5, Interesting

    > so Linus used Arcangeli's new VM code. Problem solved. Stable as ever.

    This is actually a wad of baloney. In normal applications (ie, running xmms, reading slashdot and maybe running gimp, with your glitzy desktop of choice), sure the VM works fine.

    In any SERIOUS situation though, 2.4 simply falls apart crying because the kernel handles memory so badly. One would like to think that in a low memory situation the kernel would start hacking off whatever was causing the problem so that it could survive. Well, it doesn't. It just freezes. This has been a situation I've been forced to deal with over the past month.. so while I'm not a guru on the subject, I have pieced together some bits of the story.

    Basically at my job we have a programming group that has mountains and mountains of source that they have to compile. Lazy as programmers tend to be, they also try to compile it over nfs on the machines with the biggest specs. To give a sense of scope, the resulting executable clocks in at around 500 megs. So basically, their build really stresses out the machine they're compiling on.

    The machine freezes EVERY time because of memory shortages. The kernel can't allocate pages for incoming network traffic, causing a backlog, causing processes to hang, causing further backlog.. then powie an unresponsive machine. The obvious solution would be to slim down the build but if anyone's ever worked with a developer suggesting that would be as useful as suggesting Hitler was a saint.

    From what I've gathered of the story, the 2.4 kernel was supposed to have this new grand VM that made dorking with the freepages file obsolete.. to the point where you can't even tweak the kernel with the freepages file anymore. The kernel was supposed to have this feature that would let it detect what processes were stealing all the memory and kill them off.

    NEWS FLASH they took this feature out because it was buggy.

    So what happens? The kernel just paints itself into a corner until the machine freezes. Only way to recover is to power cycle. This is why damn near every patch in the 2.4 line has the line "VM tweaks" in the changelog. Quite frankly the 2.4 VM is garbage, and only functions suitably well in non-intensive applications.

    It's been getting better with each dot release but it's still nothing you'd want to bet money on.

  14. question:malloc support? by CaptnMArk · · Score: 2, Interesting

    Does it support malloc correctly now (returning NULL when out of memory)?

  15. Re:It should all be configurable. by Anton+Anatopopov · · Score: 2, Interesting
    The design of a userspace program should not matter on what virtual memory system is in use.

    Sure locality of reference matters, but any decent VM design will take this into account.

    What I would like to see would be per-process VM algorithms. Like, you give an extra argument to the fork system call, and your new process has its virtual memory managed in a way which is optimal for your application.

    Stack based languages exhibit different behaviour to certain other types of languages, and most VM systems seem to be optimised for this general case.

  16. Re:To fork, or not to fork by Speare · · Score: 3, Interesting

    The problem is the duplication of effort and decreased manpower for each VM. Not only that, but any project that works closely with the VM has to test under twice as many conditions, and may require different code for each. Talk about a maintenance problem.

    And this would somehow not be the problem with a fork? Considering Linux vs *BSD is already a division of the pool of possibly alignable geeks, and considering both Linux and *BSD families continue to grow, innovate and expand, I think the problem is overrated.

    Organizations align on common goals and pursuits, by definition. If there were two or more unalignable goals in the VM, then either a fork or an unforked competition would be in order, and would have the same issues of reduced effort and increased maintenance chores.

    Personally, as a non-kernel developer, I think the different VM issues are probably overblown in the moment, and that the best approaches will forge ahead with some significant consensus in the mid-term. Until then, it's worth the experimentation it takes to decide what are the best approaches.

    --
    [ .sig file not found ]
  17. Re:It should all be configurable. by DarkMan · · Score: 3, Interesting

    Um, I think all the replies here I've read have missed the point. I don't think the poster was asking to be able to switch between the two VMs at compile time, but rather to have one VM that was configurable.

    That would allow the system to be tuned at compile time for the large servers, and for the small desktops, without having to have a 'one size fits all' solution.

    I've always felt that that would be the best answer. The reality is, however, that Andrea's VM would not allow for such a range of configurability, being a very simple, and thus easy to balance, system. That's not to put it down; often the simplest solution is best.

    However, Rik's VM is more complex, and can, in principle, be made more configurable at compile (or even run) time. It would be a lot of work, but I think that that's the best way to get good performance across the wide range of platforms.

    For example, if I knew that my system would have to work with millions of very small files, and only read them once, then I would configure the VM to forget about caching the files, and keep anything that is used more than once in RAM. Or, if dealing with a computation, have large pages of RAM swapped in or out that match the arrays the computation uses, so that everything is pre-fetched. Yes, there are other ways of accomplishing these goals, but I think that that would be a good way to go.

    If nothing else, it acknowledges that a system with 32 Meg of RAM and one processor has very different VM needs from an octuple-processor system with 32 Gig of RAM.

  18. Compound errors by Salamander · · Score: 5, Interesting

    IMO both Rik's code (RVM) and Andrea's (AVM) were accepted prematurely, and Linus's ADD is the root of the problem here. Everyone thought the 2.2 VM was broken, so he jumped on RVM when it really hadn't received adequate testing with various workloads. Then, when that didn't work out, he did something even worse by jumping on AVM in the middle of a "stable" kernel series when it was totally undocumented and even less thoroughly tested than RVM. That's just bad software engineering, regardless of the quality of Rik's or Andrea's work.

    Ideally, an "old-fashioned" alternative to RVM would have been maintained throughout the 2.3 process, as a fallback in case RVM turned out not to be ready for 2.4 - which was in fact the case. But this wasn't done, there was no alternative, and so RVM became the basis for 2.4. Once that decision was made it should not have been unmade by replacing RVM with AVM. Andrea's work should have been in the 2.5 tree, which should have been opened a long time ago to deal with precisely this sort of situation. 2.4 is not the last Linux kernel that will ever exist. We don't need to make it perfect. It would be far better to admit its imperfections, band-aid them as best we can, and try to get a head start on creating something better for 2.6. What we have instead is error on top of error, "not ready" replaced with "even less ready".

    To clarify, I have nothing but the highest regard for both Rik's and Andrea's work. Obviously they have different ideas and attitudes. Rik has drawn on many sources in his design, resulting in a system that is both very advanced and very complicated. The process of reining in the complexity is still incomplete, but I still have hope that some day Rik will be able to come up with something that's really awesome, and he has always documented his ideas thoroughly. Andrea, by contrast, is much more pragmatic; he wants something that works now even if it's somewhat more limited in scope (e.g. by being almost impossible to reconcile with NUMA). The dark side of that "pragmatism" is that Andrea has skimped on non-code activities such as documenting or explaining the basic ideas on which his system is based. Nonetheless, both have done great work and should continue to do great work...in the 2.5 tree.

    --
    Slashdot - News for Herds. Stuff that Splatters.
  19. Re:FreeBSD 4.4-STABLE vs 2.4 comparison? by eparusel · · Score: 2, Interesting

    Hmm, recently OpenBSD and FreeBSD (I'm not sure about NetBSD though) have added improved dirpref code (created by an OpenBSD developer(s)).

    When data is written with the new algorithm, subsequent reads and writes are on average faster (being conservative). People are seeing 6x improvements for certain tasks as well!

    So while there weren't any major changes to the VM in FreeBSD AFAIK as well, if the benchmark involves using any files on the disk, then it'll most likely be sped up...!

    Here's a link to the discussion on the FreeBSD-stable mailing list...

    and another link...

  20. Re:To fork, or not to fork by Anonymous Coward · · Score: 1, Interesting

    Too bad gcc still sucks ass. I went from using Apple's MrC on Mac OS 9 to using gcc on Mac OS X, and I'm stunned at the difference. Compile times are longer, binaries are bigger, memory usage is up, and my programs run slower. It's worse on all fronts! Now I know I'll just get modded down for this, because how dare anyone criticize an open-source project; such things are beyond criticism (at least around here). But the facts speak for themselves. I've gotten much poorer results with gcc than with just about any other compiler I've used.

  21. BSD VM by Anonymous Coward · · Score: 1, Interesting

    Why is so much time being wasted developing and re-developing a VM system when a very stable and robust VM system has existed for years in the FreeBsd system. Anyone thought about using that code (as per the terms of the license) as a starting point instead of this senseless writing and rewriting and sub-par performance?

  22. Bring yourself up-to-date by marm · · Score: 5, Interesting

    The machine freezes EVERY time because of memory shortages. The kernel can't allocate pages for incoming network traffic, causing a backlog, causing processes to hang, causing further backlog.. then powie an unresponsive machine.

    This was a common problem with kernels from about 2.4.1 up to 2.4.9 - the machine would gradually eat into swap further and further, failing to release no-longer-used swapspace, until it would go Out Of Memory (OOM) and attempt to kill the process that was eating all the memory. Frequently it would pick the wrong process to kill (sometimes even killing init) or would end up deadlocking.

    I agree with you - that is no way for a virtual memory system to behave.

    However, the Linux development process moves quickly once people get annoyed enough to actually do something about it, and that's precisely what has happened. Starting with 2.4.10, a new, simpler VM system has been used in the official Linus kernels, and I can say with some confidence that it has solved all the major problems with the 2.4 VM system, and continues to get significantly faster with every release.

    If you haven't actually tried a new kernel yet (and from your problems it seems that you haven't), I suggest that you do - it's made the world of difference for me.

    At the same time, the old 2.4 VM has lived on in the -ac series of kernels, and has become a great deal better there - some competition has made a big difference. Almost all of the major areas where it behaved badly have been fixed. However, my own impression is that it is still somewhat slower than the new VM.

    The choice is yours which you want to run - my own recommendation would be for the new VM in the official Linus kernels, but others may disagree.

    [OOM Killer]
    NEWS FLASH they took this feature out because it was buggy.

    Umm, no they didn't - it continues to exist in both the new VM in 2.4.13 and the old VM in the most recent 2.4.13-ac kernels. It does, however, now work correctly in both VMs. There are some philosophical arguments over whether killing processes is the best way of handling an Out Of Memory situation, but it is surely better than deadlocking the box, which is what most VM systems (including the famed FreeBSD's) do when OOM occurs.

    It's been getting better with each dot release but it's still nothing you'd want to bet money on.

    All I can say is that the new VM works great for me and lots of other people, even under extreme load. I can certainly understand your pain if you're using an older 2.4 kernel, but please try a recent one - the difference is astounding.

    If you're still having problems with recent kernels, then I'm sure linux-kernel@vger.kernel.org would love to hear from you - and would certainly be a lot more useful to you than ranting on Slashdot. Getting the VM right is now priority number 1 for the kernel hackers.

    1. Re:Bring yourself up-to-date by Laplace · · Score: 3, Interesting
      I will preface this by saying that I am not a kernel developer.

      Wouldn't it be possible to label some processes as OOM immune? For example, init could have this flag and would never be killed by the OOM algorithm. Similarly, users could designate some processes as more important than others. For example, my PDE solver which is crunching away at data for my thesis could be immune, but X could die if I ran out of memory.
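
The "OOM immune" idea above can be sketched in a few lines. This is only an illustration of the proposal, not the 2.4 kernel's actual selection heuristic: the `oom_immune` flag, the `Proc` record, and the "largest resident size loses" rule are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Proc:
    pid: int
    name: str
    rss_mb: int                # resident memory; the only "badness" input in this sketch
    oom_immune: bool = False   # hypothetical flag, as proposed in the comment


def pick_victim(procs):
    """Return the largest non-immune process, or None if every process is immune."""
    candidates = [p for p in procs if not p.oom_immune]
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.rss_mb)


# Hypothetical process table: init and the thesis job are marked immune.
procs = [
    Proc(1, "init", 1, oom_immune=True),
    Proc(200, "pde_solver", 180, oom_immune=True),
    Proc(300, "X", 40),
    Proc(400, "xmms", 12),
]
victim = pick_victim(procs)
print(victim.name)
```

Note the corner case the sketch makes visible: if everything that is using memory has been marked immune, there is nothing left to kill, so the flag would have to be handed out sparingly.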

      This whole situation has had an impact on my work. With all of the debate and argument flying around, I'm not sure which kernel to use, if I should upgrade, or if I should revert back to the 2.2 series. Oh well.

      --
      The middle mind speaks!
  23. How can you get around the VM problems in Linux? by cbwsdot · · Score: 2, Interesting
    Good article, but what can I, the end user, do to get around some memory management and scheduling issues? I don't know nearly enough to do any kernel tweaking and tailoring for my processor architecture and hardware configuration. Is it unwise to run a desktop system without swap space? I was thinking of giving the performance of my system a boost by leveraging the low RAM prices and eliminating the swap partition. Other /.'s and I have experienced swap space usage during times when some physical RAM was free. Is the kernel thinking "Well, these pages are *so* old and inactive that I'll just stick them in swap, regardless of the current system state"? It sounds like paragraphs 16 and 17 attempt to explain this but it flies over my head. Since you can buy 1GB of ECC PC133 RAM for about $150, I was thinking of buying 3GB of RAM, investing in some type of power backup and running my entire system in RAM. Can you do this?

    Despite this major issue, a Linux based system is still more stable and in most cases faster than Windows 2000. Also, like the article mentions, take into account that Linux runs on many different types of processors. Linux on SPARC is good and 21264 Alpha performance is mind-blowing. Keep up the good work.

  24. Why VM is bad by Animats · · Score: 4, Interesting
    Virtual memory is way overrated, and probably should be phased out, both on servers and desktops.

    In Peter Denning's classic paper, The Working Set Model of Program Behavior, Denning concluded that paged virtual memory was, at best, good for an effective 2X increase in memory size. When he wrote that paper in 1968, memory cost about a million dollars a megabyte, so a 2X increase was worth the headaches of a VM system. Today, with memory at a few hundred dollars a gigabyte, it looks less attractive. It's not that expensive to double the size of RAM today. It can be cheaper than adding a fast disk drive just for paging. Uses less power, too.

    Disk as backing store gets worse as RAM gets faster. When Denning wrote that paper, the fastest backing devices (drums) rotated at around 10,000 RPM, for a 6,000 microsecond access time, and core memory cycle times were around 4us. So main memory was 1,500 times faster than backing store. Today, RAM cycle times have dropped to around 0.020us, but disks still top out around 10,000 RPM, making main memory 300,000 times faster than backing store. Thus, the relative cost of a page fault has increased by a factor of 200. This makes VM far less attractive today than it used to be. It's not getting any better, either.
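
The ratios in the paragraph above can be checked with a few lines of arithmetic. This is just a sketch using the figures quoted in the post; it treats the "access time" as one full rotation at 10,000 RPM, which is how the 6,000 microsecond figure comes out.

```python
# Worked arithmetic for the figures quoted in the post (all inputs come from the text).
DRUM_RPM = 10_000        # rotation speed quoted for both 1968 drums and current disks
CORE_CYCLE_US = 4.0      # 1968 core memory cycle time, in microseconds
RAM_CYCLE_US = 0.020     # current RAM cycle time quoted in the post, in microseconds

# One full rotation at 10,000 RPM: 60 s / 10,000 = 6 ms = 6,000 us.
rotation_us = 60 * 1_000_000 / DRUM_RPM

ratio_1968 = rotation_us / CORE_CYCLE_US       # backing store vs. core memory
ratio_today = rotation_us / RAM_CYCLE_US       # backing store vs. current RAM
relative_increase = ratio_today / ratio_1968   # growth in the relative cost of a page fault

print(round(rotation_us), round(ratio_1968), round(ratio_today), round(relative_increase))
```

Because the backing-store time is the same in both eras, the relative increase is simply the ratio of the two memory cycle times, 4 us to 0.020 us.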

    The price of having virtual memory is terrible performance once paging between active processes starts. That's called "thrashing". On a server which is processing short transactions, you're much better off throttling at the transaction launch point (as, for example, where CGI programs launch) than going into thrashing. This requires some coordination between applications and memory allocation, but where most of the memory is used by Apache and its child processes, that's a viable option.

    The main value of VM today is getting rid of dead code at run-time. A basic problem with shared libraries is that you load in the whole library, needed or not, when you need any function from it. This wastes memory, but after a while, the VM system will notice the unused pages and quietly release them. On a larger scale, the same problem is seen with dormant applications, a problem which has gotten totally out of hand in the Windows world, where far too much unwanted stuff launches at startup. VM ejects them from memory. That's what VM is really used for today.

    So if you're actually page-faulting, VM is hurting, not helping.

    I'd argue that it's time to go back to a swapping model - all of an app has to be in before it runs. That's where UNIX started; virtual memory didn't come in until 4.1BSD. But in support of this, apps need more information about the current memory situation. And they should be able to designate parts of their space as pageable, at least at the shared object/DLL level. Only a few apps (web servers, window managers) need much memory awareness, so that's feasible. Throttling needs to occur at a smart place, just before allocating substantial resources, such as CGI process launch or connection opening. By the time the VM system becomes involved, it's too late; resources are already overcommitted.
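
Throttling "just before allocating substantial resources" can be sketched as a simple admission gate. This is a minimal illustration under assumed names (`AdmissionGate`, `try_admit`, a fixed memory budget in MB, and requests that declare their memory need up front); it is not how Apache or any particular server does it.

```python
class AdmissionGate:
    """Refuse new work once its declared memory need would exceed a fixed budget."""

    def __init__(self, budget_mb):
        self.budget_mb = budget_mb
        self.in_use_mb = 0

    def try_admit(self, need_mb):
        # Decide at the launch point, before any resources are allocated.
        if self.in_use_mb + need_mb > self.budget_mb:
            return False           # caller should queue or reject the request
        self.in_use_mb += need_mb
        return True

    def release(self, need_mb):
        # Called when the admitted work finishes.
        self.in_use_mb = max(0, self.in_use_mb - need_mb)


gate = AdmissionGate(budget_mb=100)
results = [gate.try_admit(40) for _ in range(3)]   # the third request would need 120 MB
gate.release(40)                                   # one request finishes
results.append(gate.try_admit(40))                 # now there is room again
print(results)
```

The point of the design is that the refusal happens while it is still cheap: a rejected request costs nothing, whereas a launched process that pushes the machine into paging slows down everything already running.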

    The big win from this is repeatable latency at the memory level. With all the interest in reducing kernel latency at the CPU level, it's time to address it at the memory level too.

    QNX, the real-time OS, is worth looking at in this regard.

  25. What about the AIX VM? by Sara+Chan · · Score: 5, Interesting
    The discussion so far has focussed mainly on Rik's and Andrea's VMs. For the 2.4.x series, that's fair. For 2.5, though, what about considering the AIX VM?


    IBM has said that they will open source any part of AIX that we would like. The AIX VM works well under high stress. Obviously it could not just be put as-is into Linux, but there must be a lot of good ideas/algorithms in it that could--arguably should--be moved to Linux. Why isn't anyone looking at doing this?

  26. Core design issues. by bored · · Score: 2, Interesting

    Part of the problem with the design and redesign of the Linux VM is an insistence on sticking with a few core design points that make it 100x harder to write. For instance, virtual memory overcommit spawns a whole bunch of ugly problems that must be solved in order to create a stable and fast system. If the core development team spent some time looking at past OS research then they would completely change their design criteria and a bunch of these problems would go away.


    Another perfect example is the OOM killer. If the VMM could properly balance the workload (and it didn't overcommit) then there wouldn't be a need for code to select the 'correct' process to kill. Since the VM cannot balance correctly, the kernel developers spend massive amounts of time trying to write an OOM killer that functions correctly in the case where the VMM is wedged. This time would be better spent fixing the VMM so it never got into these states.

  27. Answers to the above by Animats · · Score: 3, Interesting
    False. Any decent VM does demand paging. Only the pages that are needed are loaded from the executable.

    If you implement a VM that way, launching a program takes a very long time. You could, in theory, start out with nothing in memory and page-fault the program in. This requires one disk access per active memory page until enough is loaded for the program to run. The very first virtual memory system, for the Burroughs 5500, worked that way. It worked OK for batch programs, in an era when batch programs ran for minutes or hours, but was terrible for interactive work.

    Most operating systems today load most or all of a program at startup, let the app run for a while, then release the unreferenced pages. Deciding how much to load at startup is an interesting question. The BSD UNIX guess was the first N bytes of the executable, where N is a system tuning parameter. (What, exactly, does Linux do about this?) This is a mediocre guess, but an easy one to make. It's OK for long-running programs, but terrible for short-lived ones. Short-lived programs don't run long enough for the least-recently-used page info to become useful. If paging occurs in this situation, the pages removed are ill-chosen, since the LRU info isn't useful until the program has run for a while.

    Many of the memory-demanding things servers do look like short-lived programs. CGI programs and Java servlets are short-lived programs. So they're a bad case for a VM environment. If memory gets tight enough that short-lived programs get paged out, thrashing is almost inevitable.

    You don't want to page out at all on a server, except (maybe) under transient overload. As soon as paging activity starts, it's time to throttle back the amount of server concurrency until paging stops. This requires coordination between OS and application of a kind not usually seen in the UNIX world, though mainframe transaction systems have had it for decades, all the way back to CICS.
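
    One way to sketch that feedback loop on Linux, assuming the swap counters pswpin and pswpout from /proc/vmstat are an acceptable signal for "paging activity". The function names, the halving policy, and the floor and ceiling values are illustrative assumptions, not anything an existing server does.

```python
def swap_counters(vmstat_text: str) -> tuple[int, int]:
    """Extract cumulative (pswpin, pswpout) counts from /proc/vmstat-style text."""
    counters = {}
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in ("pswpin", "pswpout"):
            counters[parts[0]] = int(parts[1])
    return counters.get("pswpin", 0), counters.get("pswpout", 0)

def next_limit(limit: int, before: tuple[int, int], after: tuple[int, int],
               floor: int = 1, ceiling: int = 64) -> int:
    """Halve the concurrency limit if any swapping happened between two samples,
    otherwise creep back up by one (hypothetical policy)."""
    swapped = (after[0] - before[0]) + (after[1] - before[1])
    if swapped > 0:
        return max(floor, limit // 2)
    return min(ceiling, limit + 1)

# Example with canned samples; a real server would read /proc/vmstat
# periodically, e.g. swap_counters(open("/proc/vmstat").read()).
quiet = swap_counters("nr_free_pages 1000\npswpin 10\npswpout 20\n")
busy = swap_counters("nr_free_pages 200\npswpin 10\npswpout 25\n")

throttled = next_limit(16, quiet, busy)    # paging seen: back off
recovered = next_limit(16, quiet, quiet)   # no paging: allow one more worker
```

    Backing off quickly and recovering slowly is a design choice here: it keeps a transient overload from turning into sustained thrashing.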

    Desktop systems have a different set of issues, but they don't look like classic time-sharing systems either. My main point here is that in the last decade, the memory usage behavior for most programs has changed considerably, but we're still using virtual memory concepts that were developed in the 1960s and mature by 1980.

    And remember, even when everything works right, you get the effect of at best 2X the memory.

    Here's a basic tutorial on VM, with emphasis on Linux.