Tuning Linux VM swapping
Lank writes "Kernel developers started discussing the pros and cons of swapping to disk on the Linux Kernel mailing list. KernelTrap has coverage of the story on their homepage. Andrew Morton comments, 'My point is that decreasing the tendency of the kernel to swap stuff out is wrong. You really don't want hundreds of megabytes of BloatyApp's untouched memory floating about in the machine. Get it out on the disk, use the memory for something useful.' Personally, I just try to keep my memory usage below the physical memory in my machine, but I guess that's not always possible..."
First swap!
"You really don't want hundreds of megabytes of BloatyApp's untouched memory floating about in the machine. Get it out on the disk, use the memory for something useful."
I absolutely despise the way that XP swaps out applications in order to make the disk cache larger. I have 1GB of RAM on my machine precisely so I don't have to wait two minutes for it to swap my web browser back in after it's swapped out... yet if I copy a 2GB file from one drive to another, the stupid operating system will swap out all the applications it can just to make the cache larger.
Please, please, don't take Linux down the same braindead route as Microsoft has done for XP. It's utterly insane to swap out my browser so that a 2GB file can be copied two seconds faster when I then have to wait two minutes for the browser to swap back in. Or at least provide some kind of '#define STOP_VM_SWAPPING_STUPIDITY' so that I can disable it.
Memory access vs. disk access I mean?
Back when P90s were the norm, was RAM access about as fast as disk access is today?
-- jaf
She had just procured a new Sun machine with 2 GB of RAM. Mind you, disk space hadn't grown all that significantly and you could still get machines with 9 GB drives.
The original practice was to make swap 2xRAM. So when the student she had putting the machine came to her and said, "What do I make swap?" she responded "Twice the RAM."
He said, "Are you sure? That's like almost half the boot drive."
She thought about it for a second and said, "Oh, yeah. I guess just make it the same as the RAM."
So this begs the questions: What do you make your swap now? When does your rule of thumb change? And remember when you could run a "fast" linux box on a P100 with 64MB of RAM and 128MB of swap?
I talk about stuff.
Personally, I just try to keep my memory usage below the physical memory in my machine, but I guess that's not always possible..."
No it isn't possible. With today's RAM prices I almost always have more physical RAM than the system requires. But, due to aggressive VM swapping there are still hundreds of megs swapped out to disk when there is no need at all. This means that those applications, when their time does finally come, are slow because they must be retrieved from disk first. It's really annoying sometimes. Yet, even with excess RAM turning off swap is disasterous.
I think developers could do more at a library level. For example.....dare I suggest using common sub libraries within libraries, that is people like KDE and GTK get thier heads together and say "are thier functions we include in our libraies that could just as well be linked to an underlying library?"
And if you thought that was boring you obviously havn't read my Journal ;-)
Personally, I just try to keep my memory usage below the physical memory in my machine, but I guess that's not always possible..."
I keep my memory usage much below the total ram on the servers, but in real life, the machine still swaps. This is because even tho the machine NEVER needs more ram than is available at any given time, over a period of days, it will use more than the available ram. It caches out the old data that was used 12 hours ago.
Unless you reboot every day (as in a client machine) you will use swap on just about any machine. Using swap is not bad. Using swap for a currently running application is not so good. This isn't a bug, its a feature. Reading data from swap after it has been accessed is still faster than reading new data from the drives, especially if its a network drive.
Tequila: It's not just for breakfast anymore!
You really don't want hundreds of megabytes of BloatyApp's untouched memory floating about in the machine...
/. so you're not supposed to!)
Why not? BloatyApp, if it's that bloaty is probably an object oriented program with template instantiation (or is by Micro$oft); these programs are notoriously huge, but also have notoriously poor locality of reference. The user will get better perceived response if you can keep more of BloatyApp resident.
If there's space in memory, I don't see the point of pre-emptively ejecting as many LRU pages of BloatyApp. (Of course, I haven't RTFA, but this is
Ah yes. It's all the fault of bloaty apps. Apps like database daemons and high-traffic httpd daemons. We've turned swapping off on our servers because we were sick of seeing almost a GB of cache/buffer memory, while it was swapping 500MB of shit to disk. Want a bloaty app? How about the linux Kernel? I love the thing, but Jesus Tapdancing Christ it would rather swap our starting DB process to disk, than free up the fucking buffers and cache. Is there something wrong with wanting it to give precedence to not swapping?
Read: Rabbit Rue - Free serial nove
With read-only & demand code-page loading and copy-on-write even bloatware really doesn't eat memory. And bloatware has to be frequently restarted to recover the memory it leaks.
Sure, there are some jobs that needs swap -- lots of seldom used memory pages.
But not mine. I prefer to save myself the complexity and performance headaches.
Yeah there is nothing I love more than coming to an idle X console session on box I haven't touched in a while and watching it grind itself into oblivion because everything has been paged out.
At what point does VM stop meaning Virtual Machine and start meaning Virtual Memory.
Or is it just the Virtual "M"?
Another reason to gradually and pro-actively swap things out, is that when another program later needs a lot of memory, your system doesn't come to a grinding halt because suddenly a lot of stuff has to be swapped out at once (followed by zeroing all that memory, since you don't want to have one program leaking data to another).
At least, that's the rationale I've read behind OS X's strategy of swapping things out long before all physical memory is used (and of keeping a pool of zeroed memory pages ready to fulfill most requests). Note that this does not require superfluous swap-ins if your reuse strategy is balanced properly, as the fact that something is swapped out doesn't mean that the memory which contained that data will be cleared/reused immediately (i.e., if it's needed again shortly afterwards, that page can be reactivated without having to go to disk).
Under most desktop OS'es, programs can even give some hints to the system regarding their usage of a memory region using e.g. the madvise() system call.
Donate free food here
Disk corruption is handled seperately. If you don't trust your drive controller, then you should spend some time fixing up the disk writing susbsytem. Disk writes should be guaranteed if they say they're succesful, except in the case of a power failure. At that point, swap integrity doesn't matter anyway. :)
Under the performance tab you can use the slider to tune the machine for 'foreground' or 'background' apps.
WURD!!
In modern Unices (including Linux) last I heard, the sticky bit is ignored since everything is simply demand paged.
Could not sticky bit be revived with some similar meaning? As in, "don't be too keen on paging these out?"
is to do something like AIX does, where I can use "vmtune" to customize the percentages of memory I devote (hard or soft limit) to filesystem pages or computational pages. This way I can tune for my Bloatware, tune for file copying a la XP, or tune for my DBMS, whatever suits me.... The developers could take it one step further and provide a simple, understandable (as opposed to AIX's) interface for configuration......
I dont mind the kernel swapping out "old" stuff to grow a huge disk cache. Really, thats OK, it makes things faster for disk hungry processes allright.
However, what I mind is the fact that the pages that are swapped out STAY there!
Why not aging the disk cache the same way the RAM pages are aged ? On an idle machine, the disk cache would gradualy decay and be replaced by the pages back from the swap, and the machine would be all responsive again.
It means that if the user leaves for lunch and a cron wants to eat all the disk, with some luck, when the user gets back, his machine is as responsive as it was when he left.
I have a laptop with 192Mb of ram, I always hate when 2/3 of the ram is "free" while it takes 10 seconds for the kmail window to move to the front. Even if the machine has been idle for hours.
I even regularly do a "swapoff -a;swapon" to claim back the cache!
Without a swap file, the kernel has no place to stick memory segments that are rarely used. They stay in resident memory la-la land until the process is terminated. Those segments add up over time and erode the memory available to the page cache.
Page caches are wonderful. When you load an application (like Firefox), you're not just getting the web browser. You're firing up a large chain of shared objects/DLLs that support the widgets, I/O, and components of the application. All of these components must be read into memory anyhow for program operation, so the kernel tends to just leave it in there for future use (the page cache).
When you shutdown Firefox, you're also releasing the necessity of those libraries (provided nothing else is using them). Those libraries also remove themselves from memory. If you load another application (like Thunderbird) that uses the same type of libraries, the kernel will not have to go to disk in order to fetch those libraries. It will instead opt for the page cache contents.
Turning off the swap file in the historic era of VM infancy was the best way to remove the hard drive bottleneck from system. The operating systems of yester-year did not have good page cache schemes that took advantage of all that unused memory. It is a little different now.
Applications are so modularized that they are broken up into a billions of smaller libraries so that code can be shared. This increases memory efficiency by keeping a shared library resident for multiple processes. These libraries are frequently accessed, more often than many people realized. Getting THOSE into memory is better than making sure my 500+ Linux applications stay resident.Notice that on a web server with 1GB of RAM the Linux kernel is still putting things out to swap. These processes that stay asleep for long periods of time do not need to waste the memory that page cache is currently using (892309504 bytes or 753.7MB). What would be stored in that 753.7MB of memory? The database that drives the website (instead of having to seek the disk). The entire web page hierarchy used to display pages on the web site. All the scripts that are used to display dynamic content on the web site (etc. etc.)
Now, if we subtracted from the page cache the amount of memory that was stored in the swap file, we would have over 200MB less that we could keep cached in memory. That could be an entire database that the kernel would then waste needless CPU cycles to fetch from disk.
The only advantage to turning off a swap file on these modern machines would be for a machine that runs only a select few applications, and not having a lot of processes in the background doing things.
Ayup
AIX uses LRU today, so when you do a backup, the system tries to keep all filesystems in cache (well that what was read last !!), and will happily swap your apps out to disk in order to do so (with default tuning parameter).
I fondly remember the days when I was running Linux with no swap, none whatsoever...
Unfortunately the current crop of best guess VM managers end up denying the end user the experience of their computer's peak performance. Coupled with the horrible state of application bloat, modern 'state of the art' hardware and software combine to give us less and less in terms of overall performance. Software developers throw more code at the cpu to add functionality with little or no concern for performance. And hardware manufacturers add more and more 'special instructions' and 'pipelining' which the majority of software is completely unable to access. If anything it's more like a bunch of dysfunctional co-dependents than an industry that is cogent as to what really needs to be going on. If the folks dealing with processors and the application software could take a page from the gamers (look at the high levels of integration between game engines and video cards) for example, and more effort put into consolidating functionality in dlls and shared libraries; we would be amazed at how truly fast these machines could perform.
"Can there be a Klein bottle that is an efficient and effective beer pitcher?"
Actually, I haven't been very impressed by the whole swapping thing under Linux lately. I'm running 2.4.22 with a 400MB swapfile.
Some apps _can_ make the system unresponsive enough to ignore keystrokes, which is *very* annoying. At other times, xmms will stop playing while the disk goes crazy... Switching from emacs to Firefox after 10 minutes usually takes an extra 5 seconds to redraw the window and load all the stuff again.
Running GNOME2 on this laptop is also quite noisy on the disk. It swaps all the time...
This point is useful, but only if free RAM is at a premium. For the most part, on servers, there will be sufficient RAM to support the on board applications, and the amount of free RAM remaining will be able to handle the variable load of a standard workday, if the server has been sized properly ahead of time.
On desktops, however, where the number and type of apps can vary much more widely, the need for free RAM is much higher. However, pushing all apps off to swap space as a default to keep as much RAM as possible free isn't necessarily an effective solution, since the OS will spend much more time performing swap operations that it necessarily should.
Perhaps a better solution might be an application where certian portions could be swapped off earlier, as they are less used, or not even loaded at all, while maintaining the core of the application in RAM, with "hooks" to the swapped out areas, to let the OS know that swap procedures are required. If the app could tell the os "I probably won't need functions c.d & e, so go ahead and put them in the swap file", it might lead to a good balance of swapping and performance.
Your server apparently believed that it was accessing that cache and buffer more often than that half gig of random pages. Do you have real reason to believe that it was wrong, or does that just "seem" bad?
In other words, do you have actual numbers to demonstrate that your kernel was making poor decisions, or are you only fairly sure that it was?
Dewey, what part of this looks like authorities should be involved?
Whatever swapping scheme is used in Windows, I do not know, and I don't care what it's called either.
What I can't despice, is the fact that I got >300MB free physical memory, and 20MB of the kernel is still swapped. Result? Do this, do that (any minor thing) and you have to wait for it to swap in.
In the end, I have never ever seen a Windows-system without a partially swapped kernel, even with tons of free RAM available.
This is just plain stupid, or is there some sort of "smart" explanation for this?
I, for once, would hate having to turn off virtual memory, just to have the system kernel loaded at all times.... And GOD BE DAMNED if Linux takes the stame stupid design-decision.
Not Buzzword 2.0 compliant. Please speak english.
I know what you mean, but in this case, it seems like your machine is making a reasonable guess: you haven't used Kmail in hours, so the odds of you wanting to resume using it at any particular instant is pretty low. On the other hand, reading from a drive is quite a bit faster than writing, so the penalty for incorrectly swapping out old pages when the system is idle is significantly less than incorrectly not swapping out old pages before users launch giant processes that want to allocate a lot of RAM very quickly.
Dewey, what part of this looks like authorities should be involved?
So way you want to do is:
So if the guy goes to leaving a big make running, it gradually pushed the big apps out while it runs. But if the big make completes, the apps start crawling slowly back in. If it hasn't finished when he comes back from lunch, he probably wants it to carry on running the make: since the CPU is at 100% load, he is probably not surprised it is sluggish.
Consciousness is an illusion caused by an excess of self consciousness.
I have only 256MB of memory in my Linux box, and I use no swap at all.
I run Mozilla, Gimp, Quake3Arena, etc, without a problem. I use Gkrellm to monitor the system, and I see the memory meter rarely goes above the middle.
But I don't run KDE nor Gnome, just Fluxbox.
The old rule of swap size = 2 times memory size is stupid. You just have to consider how much memory the sistem is likely to require at max load, and also how much swap it makes sense to have. For example, it's insane to have 1GB swap on a 128MB machine, cause if it's to use all that swap, sure it will be trashing and slow as hell. Or slower.
http://00f.net/item/14/
describe why swapping is _good_.
{{.sig}}
I've heard on a windows box you want to set your max page file size to 1.5x RAM. So if you're running 512MB of RAM you want your page file to be 768. From what i've noticed with linux or what most distros seem to use for defaults is 1x RAM. I notice my linux box hitting the swap file more than my windows system. However the windows box is always using the page file, even under idle situations when its been doing nothing all night. All be it very low. its stil using it. When i switch over to my linux system...no swap is being used until i open up an ap. Is this consistant with what everyone is saying?
See Sig! See Sig Zig! Zig Sig Zig!!!!!
One time, I had a disk corruption in the swap partition. When I booted the machine, everything went well, until I started opening applications. The machine swapped out more and more data, until it reached the first bad sector in the swap. It crashed quite spectacular.
Once I figured out what happened, I replaced the disk.
That was in the days of the 2.0 kernel. My machine had 16 MB RAM IIRC.
WWTTD?
Yeah there is nothing I love more than coming to an idle X console session on box I haven't touched in a while and watching it grind itself into oblivion because everything has been paged out.
What's wrong with that? The machine shouldn't sit and be ready for every possibility. If you've not logged in in X hours, it shouldn't expect you to suddenly log in. It grinds for a minute or so, and then it's fast again cos it's got what you want to use in RAM. Which is the whole point of swap, stuff you're not using gets paged out.
Best performance improvement I ever got with the 2.4 series kernels was shutting off swap. My machine immediately became more responsive. From that point forward, I wouldn't come back to the machine after an hour away and encounter a jerky X mouse cursor because the instant I turned off the screensaver the kernel had to page all 128MB of my applications back into the 512MB RAM because it decided buffer cache was more important than code.
The 2.4 VM changes causing this behavior were awful, and it's too bad that I have to sacrifice a large (disk-based) physical address space, but I'm not going to put up with my applications being paged out when I have 4x as much RAM as code I'm running. Just allowing the system admin to put a limit on the size of the buffer cache would probably solve most of my problems, but instead I have to turn off swap. Too bad.
[ home ]
As I write this, my process stack (ps) lists 181 processes and KDE's window list contains 50 entries, many of which are Konqueror and Konsole each with multiple tabs open. Among the "bloatier" apps currently running are OpenOffice, Acroread, VMWare Workstation (though, a VM is not currently running), Tomcat 5/JWSDP 1.3, Umbrello (KDE's UML modeller), jEdit, and 2 database systems (Firebird and IBM U2). This, incidentally, represents a typical "snapshot" of my two workstations (both AMD XP 2200+, w/1GB), at most times. Despite all of this, my current memory usage is 990MB w/ 180MB in buffers/cache, and only 320MB in swap (which is mostly holding my WinXPPro dormant VM!).
:-p
The bottom line: "swapping" got you down? By some more RAM! Prices may be on the rise, but it's still likely cheaper than your time.
"LinuX - Dropping the c u r t a i n on Windoze." -- Vee Schade, vschade at mindless dot com
- The process needs to read the page - no problem, one copy is in RAM just read it, and keep both copies.
- The process needs to write the page - no problem, you can modify the copy in RAM and discard the copy on disk. Notice that discarding the copy on disk doesn't require any disk access, as the list of swap allocations will typically be in RAM (it is much smaller than the swapspace).
- You actually need memory - no problem, discard a not recently used RAM page, you still have a copy on disk.
The only problem is, that you need to make the page readonly, so you can trap the write and discard the on disk copy. In other words don't do this for pages that are frequently changed. But usually you don't have many pages that are frequently changed, and you certainly don't want to swap out those you have. And should you occationally happen to swap out one, it is not really a major problem. It will cost you a pagefault, but no disk I/O. And a pagefault is compared to a disk I/O. A system that behaves like I have described here would use a lot more space than Linux typically does, but still it should be faster. I wonder why this isn't done more often, it is not like the idea hasn't been known for years.Another problem that many have noticed, and that isn't easy to deal with, is heavy diskaccess causing the cache to grow and stuff getting swapped out. Yes even some Linux versions suffer from this problem. A Red Hat 9 system I had running for months was really slow in the morning, because all the programs had been swapped out while cron jobs where running during the night. But you never know when it is a good idea to swap the stuff out and when it is not. When the disk access is going on, the process page might not have been used for hours. But still you might want it to be kept in RAM. File pages that have been accessed just once shouldn't be kept in cache for long time. But of course you shouldn't remove them unless the memory was needed for something else. Removing the pages too early is also bad, because you wouldn't notice, that this was really a page that was going to be accessed frequently. Some people are fanatic, and don't want process pages to ever get swapped out to make room for cache. That isn't a good idea either. You can really have process pages that may not be needed even once, do you want such a page to be kept in ram for months just in case? And notice how disabling swap is not going to solve the problem. You still have to think about memory mapped files, that in many ways must be treated like anonymous mappings.
Do you care about the security of your wireless mouse?
The universal IT answer of "It depends" applies here as well. Yes, having Mr. Bloaty App glob onto scads of memory that are then not referenced for long periods of time can have a negative impact on other apps if the system becomes memory constrained. And, Yes, if the memory manager swaps a bunch of unreferenced memory out to disk and Mr. User has to wait a long time for Mr. Bloaty App to become responsive because it was his memory that got swapped out, that's a problem, too. The ideal is to be able to address this (haha, bad pun) at the application level and not simply at a global level. This has been the standard on the mainframe (MVS, OS/390, z/OS) operating systems for a long time, where there is a very sophisticated virtual memory manager. If there are, say, a 100 apps and 2 of them are very sensitive to response time, most of them aren't, and 10 are just dead dogs you couldn't care less about how nice is it to be able to actually tell the system that? The 2 "loved ones" then receive preferential storage treatment at the expense of the other, "less loved ones" and the dead dogs are always first on the pecking order of who to steal storage from. The memory manager then is acting to maintain the responsiveness of the applications (the reasons we run OS's in the first place) to meet the needs and expectations of the user(s) (the reasons we run the Apps). Without that ability, arguing over "more swappy" vs. "less swappy" when it's only applied at a global, default, level is not especially productive except within the context of attempting to establish, perhaps, where the best general-use default happy setting is - for the general-use default system we all use (is that you? I know it's not me).
"The bigger the lie, the more they believe." - Det. Bunk
I've seen a number of posts echoing this point, overlooking one of the key reasons for swapping. It's not just because you're out of memeory for applications, it's because sometimes there are better things to be doing with your memory. Mainstream operating systems use otherwise unused memory to cache disk access, dramatically speading things up. If you've got an process that hasn't been run for a a while it may actually be more efficient to swap it to disk. This frees up memory to cache data that may be being hit quite frequently. inetd hasn't been needed for a while? Swap it out so that your disk cache is larger, benefitting your heavily used web server.
To be fair, when to make that trade off is very tricky and will never work perfectly 100% of the time. Inevitably you'll occasionally be burned by a bad decision. But there are real benefits. The real question is not how to turn it off, the question is how to improve it and perhaps how to allow users to tune it for their needs.
Search 2010 Gen Con events
"DisablePagingExecutive"=dword:00000001
"LargeSystemCache"=dword:00000001
The combo will keep your kernel in RAM (DisablePagingExecutive)and enforce a minimum reserve (4MB) amount of memory for it (LargeSystemCache), although that amount can grow dynamically. The LargeSystemCache may cause "Delayed Write Failed" errors, if it does, reboot in SafeMode and undo the damage.
More goodies here, many adjustments that can affect how your sytem divides between swap, cache and currently-running-apps:
HKLM/System/CurrentControlSet/Control/Session Manager/Memory Management
"ClearPageFileAtShutdown" "DisablePagingExecutive" "IoPageLockLimit" "LargeSystemCache" "NonPagedPoolQuota" "NonPagedPoolSize" "PagedPoolQuota" "PagedPoolSize" "PagingFiles" "SecondLevelDataCache" "SystemPages" "PhysicalAddressExtension" "WriteWatch"
HKEY_LOCAL_MACHINE\SYSTEM\ControlSet001\Control\Se ssion Manager\Memory Management\PrefetchParameters
"VideoInitTime" "AppLaunchMaxNumPages" "AppLaunchMaxNumSections" "AppLaunchTimerPeriod" "BootMaxNumPages" "BootMaxNumSections" "BootTimerPeriod" "MaxNumActiveTraces" "MaxNumSavedTraces" "RootDirPath" "HostingAppList" "EnablePrefetcher"
Can someone please describe any adaptive algorithms that could be used. Specifically, I'm thinking of:
- dirty marking unreferenced pages when swaped. if these mem pages are not used after the swap out, no need to swap them in again. i'm prety sure this already occurs
- for process using high swap demands, increase their weighted priority for pages, with a window-averaged for swaps. so then, my database process could hog under load while my less-used apps may swap because they're used less often. could be taylored differently for code versus data segs.
- page-impage comparisons to avoid holding duplicate code segment pages in memory. this plays with the concept of shared libs a bit, but could avoid duplicate pages, especially if this information is saved in a precalc'd hash table that is stored.
just ideas.
Is a difficult dilemma, but that's because an overly complicated scheme is used.
There is a simpler and more powerful scheme that unifies swapping and disk caches, while allowing applications to persist between reboots, all with better performance than current systems!
EROS implements such a system. Generally it is referred to as "Orthogonal persistence", and functionally it behaves as though the computer is "always on", and returns to the exact state it was in after a reboot. The thing is, with orthogonal persistence, the structure on the disk is not a file system, but just the application data.
Since applications no longer work with the disk explicitly (open/read/write) but only with one type of memory (persistent memory), the OS manages all of the disk I/O, and it allows it to eliminate almost completely the largest delay in disk-work - the seek time in all writes. Since all application memory is just mapped to disk transparently, all RAM is just considered a "disk cache", and the kernel does not have to make nasty tradeoffs between disk caches (of explicit open/read/write calls) and virtual memory.
Of course there is still a problem if large work-areas of unimportant applications "swap out" smaller areas of important applications. I suggest solving that by prioritizing pages to the memory manager. In a system like *nix it is not a problem. In more secure systems however (EROS, for instance), it may create additional covert channels between applications so it was avoided.
My approach has been to start all the needed services and then run this small perl script (which I named memhog.pl) to create a process that hogs quite a bit of memory:
;)
#!/usr/bin/perl -w
use strict;
my $a = "xxxxxxxxxx" x (131 *1024*1024);
This is just a quick hack, you may want to adjust the size to suit your memory size. The server from where this script was copied has 2GB of memory. Essentially I want to page out all the stuff that doesn't get used after starting the server and the related server processes. Of course, given enough time the server would swap out those pages anyway, but this method just does it quicker. After the script has been run, the server will gradually swap in those pages it really needs. OK, doing this may be pointless but I don't care
Follow your Euro bills at EBT
The funny thing about the whole thread is this: it eventually turned out that the guy who originally complained about swapping didn't have any actual swapping going on at all. What is usually happening is that pages from an *executable* are being dropped, and people incorrectly refer to this as "swapping". In Linux, pages from an executable aren't "swapped out" but are simply dropped and read back from the original executable as needed. The trouble is that once you then access some of those pages again -- say, when you're exercising the repaint code from some bloatware app for the first time in a while, because it's been in the background for a while -- they have to be faulted in one by one, and that takes a lot of time per page.
Sorry but this is not a complex equation and I think these guys are getting wrapped up in too many details and missing the big picture.
The harddrive is really, really, really, really fscking slow. In comparison Ram is really really really fast. As a result, you want to interact with the hard drive as little as you possibly can, and interact with ram instead as much as you possibly can (the only thing which beats that is interacting with only the cpu registers and avoiding ram and harddrive altogether).
As is, linux doesn't even begin touching the disk until there is only enough ram left to turn on VM. Now this has a negative impact when that limit is reached because there is overhead turning it on.. this impact is negligable and tweakable since you can wait and see if you hitting the limit, add more memory, see again and reevaluate until you simply aren't swapping. This is a good thing.
One of the worst things windows does is swap constantly. In fact beyond a certain point (read enough ram to run an XP desktop) the system swaps MORE if you have more ram. You boot the system with all uneeded services turned off and no startup processes and all the eyecandy turned off. And you've got 4gb of ram in the system, guess what, it's already using VM.
Maybe VM management itself could be tweaked more, but it certainly shouldn't be used unless it absolutely has to (and if you don't have enough ram and it has to all the time then it's not like you suffer that performance hit more than once).
The only exception to this I've found is a linux desktop running kde or gnome with about 256mb of ram, at that point the numbers seem to work out just about right(or wrong I should say) and the system is constantly turning VM on and off, encountering the performance hit again and again and again, with pretty much every operation you perform.
Swapping out any semi-active process, especially an interactive one, is utter foolishness which should be subject to termination grounds (unless of course you work for a big iron computer company that needs to boost revenue by purposefully slowing systems down). Any time you decide to clear memory and page a task out ... you are creating two I/O's ... one out, one back in ... and doing it with shit poor locality since the swap area is probably at the other end of the disk from any other active I/O .... just discarding entries in the file cache only requires a single I/O to recover it. .... so tell me ... who is the idiot that believes two I/O's to kill the performance of an interactive process is better?
Here's a solution to the whole debate -- make the sticky bit have meaning under Linux like it does on other UNIXen -- if the sticky bit is set on the execuatble, do not swap it. If it is not set, the executable is free to be swapped. This solves the entire debate (for instance, if you don't want the 'interactive' mozilla process swapped, set the sticky bit on the executable).
We either need some way of expressing this in code in a way that's exposed to the OS so it can avoid paging it out (and don't say 'well, just lock the pages containing it', think about how GUIs code & data are arranged--i.e. OO), or, perhaps instead of LRU, paging should pay attention to when a page was last used relative to an interaction--if you use some menu to bring up a dialog that triggers a long, memory-intensive process, that menu may have been used longer-ago than stuff happening as a result, but it's going to be used again first--LRU is the wrong model. (Also consider the interactive parts of other apps that you have open windows for.) Maybe it should still get swapped out, but swapped back in when some memory is freed up--'this thing was used very soon after an interaction, so it should be in memory if possible'. I don't know if any OSes attempt to page-in from swap before something is requested. Aren't some processors these days trying to prefetch memory requests based on patterns of memory access (not just prefetching code memory)? Same sort of idea.
Tuning for interaction isn't new to OSes; the VMS operating system's process scheduler treated 'interactive' apps differently from 'batch' apps (where, if I remember correctly, an app was interactive if it paused for I/O before using up its timeslice). I dunno if Unixen or Windows do things like that.