Swap Performance in Linux
GizmoDuck writes "I'm working in a computational chemistry lab, and we find ourselves using memory and CPU hogs like Amber and Gaussian. The CPU hogging isn't a problem, thanks to Condor, but when submitting one of the jobs that request (and pretty much require) all the physical RAM in the machines, Linux promptly starts swapping so hard that the mouse pointer in X stops moving, NFS and NIS halt, and things don't get back to normal for five minutes. I've tried toying a bit with the settings in /proc/sys/vm/kswapd to no avail. I've done some poking around on the 'net looking for answers. Faster disks and swap partitions at the beginning of the drive aren't really an option at this point. I haven't found a good solution yet. I was wondering if the /. community has any input on how to keep the system from locking during periods of necessarily high swap activity?"
2.4.x
I still have no idea why Linus used 2.4 as a development tree. Go back to 2.2.x; there are no swapping problems there.
By the way, does anyone know the command to flush the swap partition?
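Flushing swap is usually done by turning it off and on again (`swapoff -a`, then `swapon -a`), which forces swapped pages back into RAM. That only works if there is enough free memory to hold them. A minimal sketch that checks this first, assuming the usual /proc/meminfo field names; the helper names and the 16MB headroom are arbitrary, illustrative choices:

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> kB."""
    fields = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        name, rest = line.split(":", 1)
        parts = rest.split()
        # Skip lines without a leading number (e.g. the 2.4 column header).
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def can_flush_swap(meminfo_text, headroom_kb=16 * 1024):
    """Return True if the data currently in swap should fit back into RAM.

    Assumption: MemFree + Buffers + Cached is a rough stand-in for memory
    the kernel can hand back; headroom_kb is an arbitrary safety margin.
    """
    m = parse_meminfo(meminfo_text)
    swap_used = m.get("SwapTotal", 0) - m.get("SwapFree", 0)
    reclaimable = m.get("MemFree", 0) + m.get("Buffers", 0) + m.get("Cached", 0)
    return reclaimable - headroom_kb >= swap_used
```

Feed it `open("/proc/meminfo").read()`; if it returns False, `swapoff -a` is likely to push the box into an out-of-memory situation instead of emptying swap.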
Maybe try out the preemptible kernel patch?
My personal experience is that it noticeably helped my workstation's interactive performance during big-ass C++ compiles and periods of heavy disk activity (big apt-get dist-upgrades). Thankfully, I'm no longer doing the big-ass C++ compiles, so it's not as big an issue as it used to be :)
It should improve interactive performance (i.e., your mouse will start moving again :) ) when load is high. Also, running your background process niced will help.
You might also consider the crazy idea of putting a swap file on NFS: you'll get (if your network is decent) almost the same bandwidth as you get from an (older) disk, but much higher latency (this will put your background process at a disadvantage compared to your interactive processes).
Hope this helps.
Paul B.
If your program(s) push Linux to the point where it actually runs out of available RAM faster than it can free it up, then "all hell breaks loose". It has to swap something out, and just about every program is eligible to be swapped out. That includes GPM (if you are on a virtual console) or X (if you are in X Windows). You need to account for all of these things to determine your RAM needs: add up the memory usage of all your active programs, plus the buffer demands they have doing disk I/O, plus the kernel, and you need that much RAM. If the program is doing a LOT of disk/file writes, you can expect the buffers to be the majority of this, too (because the kernel believes you might soon want to read back what you just wrote, so it tries to keep lots of it in RAM even if that means swapping out GPM and X).
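The accounting above is just a sum. A sketch with made-up numbers for a 256MB machine; every figure is hypothetical, and `ram_needed_kb`/`will_swap` are illustrative names:

```python
def ram_needed_kb(process_rss_kb, io_buffers_kb, kernel_kb):
    """Rough RAM budget: resident size of every active program, plus the
    buffers their disk I/O keeps in memory, plus the kernel itself.

    RSS double-counts shared pages (libc, X libraries), so the estimate
    errs on the high side.
    """
    return sum(process_rss_kb) + io_buffers_kb + kernel_kb


def will_swap(process_rss_kb, io_buffers_kb, kernel_kb, physical_kb):
    """True if the estimated working set does not fit in physical RAM."""
    return ram_needed_kb(process_rss_kb, io_buffers_kb, kernel_kb) > physical_kb


# Hypothetical 256MB box: a 230MB job, X, gpm, 30MB of I/O buffers, 4MB kernel.
jobs = [230_000, 20_000, 500]
print(ram_needed_kb(jobs, 30_000, 4_000))          # 284500
print(will_swap(jobs, 30_000, 4_000, 256 * 1024))  # True
```

The job alone nearly fits, but once X, gpm, the write buffers and the kernel are counted, the total exceeds 262144 kB and something has to go to swap.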
Is switching to FreeBSD an option? The virtual memory management there is much better than in Linux under stress.
The best way to handle this (or at least the best way I handled a similar situation) is to combine Robert Love's preempt patch and Ingo's scheduler.
They significantly increase user performance under high load and keep the system from running away with itself. If you're feeling really adventurous, you could also throw in Rik's rmap VM... I have done very little testing with it, but I hear a lot of reports that it helps.
They are all available in the authors' respective directories on kernel.org: riel, rml, mingo.
Unfortunately, you're out of luck. The current linux VM (in later 2.4 series) is fine for low to medium load systems but falls apart on high load systems. The previous VM (early 2.4 series) is a good design but isn't really ready for production.
I would suggest buying more RAM (it's cheap) if you aren't already maxed out at 4 gigs (x86). Alternatively, switch to FreeBSD, which has a very stable, efficient VM. Any source should recompile without too much trouble, and it can run Linux binaries at almost full speed!
Best slashdot comment
First of all, I would recommend trying the preemptible kernel patch and even the low-latency patch. It seems like an obvious enough suggestion, but some will tell you that these patches should not be used in servers where throughput is important, and that is correct... in some cases. It has been shown, however, that in most cases the preemptible patch increases performance and throughput. I have not heard of any such testing on the low-latency patch, as I am new to it.
Also, you say "faster disks" and repartitioning are not feasible ATM, but how about multiple small drives? As shown in this howto, the Linux kernel has support for striping data across swap disks, just by specifying multiple swap entries with equal priority in fstab.
Then again, if you're not on SCSI, trying to stripe to the swap drives won't be much help anyway, as RAID over IDE for _speed_ usually is just crap.
That last suggestion may not be for you, but definitely try the two patches. It should also be noted that preempt is a compile-time option, and there is also a compile-time option to control the low-latency patch through /proc/sys/kernel/lowlatency. An additional patch may be required to allow these two to work together, but I am unable to locate it currently.
In my testing, these two patches have been a big help, especially on my P166 system with 48MB RAM.
Just a shot in the dark here, but can you give a lower priority to those applications in order to keep the workstation usable while doing this work?
I can't recall the command-line option off the top of my head, but I know that in GTop you right-click the app, pick renice, then set it to 1 instead of 0.
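From a shell, `renice` (or `nice` when starting a job) does what the GTop menu does. A minimal Python sketch of the same operation through `os.setpriority`, which wraps setpriority(2); `lower_priority` is a hypothetical helper name. Keep in mind that niceness only affects CPU scheduling, not which pages get swapped, so on its own it will not stop the thrashing in the original question.

```python
import os


def lower_priority(pid=0, increment=1):
    """Raise a process's nice value (lower its CPU priority), like renice.

    pid=0 means the calling process. Unprivileged users may only raise
    niceness; lowering it again requires root. Values are clamped to 19,
    the largest nice value Linux allows. Returns the new nice value.
    """
    current = os.getpriority(os.PRIO_PROCESS, pid)
    target = min(current + increment, 19)
    os.setpriority(os.PRIO_PROCESS, pid, target)
    return os.getpriority(os.PRIO_PROCESS, pid)
```

For example, `lower_priority(1234, 5)` would push a (hypothetical) job with pid 1234 five steps further back in the CPU queue.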
"The Most Fun Possible on 4 wheels" is at SunBuggy in Las Vegas
Did you miss all the 2.4 Linux VM Stories?
I suggest building and installing the latest kernel with the aa VM (Andrea Arcangeli's VM, the default since 2.4.10). If you still have VM (swap) problems, then go get the latest rmap VM patch and try that.
The kernel VM (virtual memory subsystem) is what manages memory and swap, btw.
And if you did miss all the VM stories, a summary:
At the start of 2.4, a new, fancy VM (Rik van Riel's) was put into action. It was very clever, but it wasn't quite ready and had teething troubles. Then suddenly (2.4.10) Linus switched to a VM similar to that of 2.3 (with some updates and a few features from the previous 2.4 VM). This started a big fight, which raised concerns that it might split the Linux community. Rik's rmap patch, which adds something known as reverse mapping, continues that earlier line of development.
Which is better, I don't know; some swear by one, others swear by the other. But unless you're using the Red Hat 2.4.9 kernel, I would not recommend a pre-2.4.10 kernel.
However, you may need to experiment to find out which is best, the VM now in 2.4 (which is there to stay) or rmap; you should try both and see.
Steps:
1. Install 2.4.17, 2.4.18 or 2.4.19.
2. Try it.
3. If it fails, try the rmap patch.
Do you really have to be using the machine while it's calculating? If not, what about shutting down X and any other memory-hogging system components? Unlike on Windows you do have the option of turning off that expensive GUI.
What I'd like to see is something along the lines of an LRU that gently starts swapping data back into memory from swap when memory becomes free. There's nothing like having VMware sitting in swap because you stopped using it an hour ago to do some other work, then jumping back and having to wait through 5-10 seconds of heavy disk activity to resume work there.
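A toy model of the wished-for behaviour, purely illustrative: the class and method names are hypothetical, pages are just labels, and this is not how the Linux VM is implemented. Evicted pages are remembered in eviction order, and when frames free up, the most recently evicted pages are pulled back in before anyone asks for them.

```python
from collections import OrderedDict


class ToyMemory:
    """Toy LRU memory with 'gentle' swap-in when frames become free.

    Hypothetical illustration only: pages are labels and capacity is
    counted in pages. This is not a model of the real Linux VM.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.resident = OrderedDict()  # least recently used first
        self.swapped = []              # eviction order, most recent last

    def touch(self, page):
        """Access a page, evicting the LRU page to swap if RAM is full."""
        if page in self.resident:
            self.resident.move_to_end(page)
            return
        if page in self.swapped:
            self.swapped.remove(page)  # fault it back in from swap
        while len(self.resident) >= self.capacity:
            victim, _ = self.resident.popitem(last=False)
            self.swapped.append(victim)
        self.resident[page] = True

    def free(self, page):
        """A process releases a page, leaving a free frame."""
        self.resident.pop(page, None)

    def prefetch(self):
        """Fill free frames with the most recently evicted pages."""
        brought_back = []
        while self.swapped and len(self.resident) < self.capacity:
            page = self.swapped.pop()
            self.resident[page] = True
            # Prefetched pages sit at the cold end of the LRU, so they are
            # evicted first if real demand for memory returns.
            self.resident.move_to_end(page, last=False)
            brought_back.append(page)
        return brought_back


mem = ToyMemory(capacity=3)
for page in ["vmware1", "vmware2", "vmware3"]:
    mem.touch(page)
for page in ["cc1", "cc2", "cc3"]:  # a big compile pushes VMware out
    mem.touch(page)
for page in ["cc1", "cc2", "cc3"]:  # the compile finishes
    mem.free(page)
print(mem.prefetch())  # ['vmware3', 'vmware2', 'vmware1']
```

Once the compile's memory is freed, VMware's pages are already back in RAM when you switch to it. A real implementation would presumably only prefetch while the disk is otherwise idle, so it does not compete with genuine demand.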
As for those saying "don't use swap at all": that's crazy talk. I'd rather have an app or two go to swap than be killed outright by the VMM when it needs an extra meg or so. If I'm not mistaken, Linux tends to pick the big memory eaters to dump to swap over the little guys, so if you start a compile... there goes VMware... or your IM client... or Konqueror... lots of fun. :-)
If you have enough RAM, it won't even use the swap. So if it is using the swap and you take away the swap, you'll likely just run out of memory.
Why don't you submit this query to the computational chemistry mailing list (see CCL)?
Those people may be able to give you some sensible suggestions, especially with respect to those particular pieces of software.
I believe that you can restrict the amount of memory that Gaussian uses via its keywords. When it requires more, it will handle the dumping of data to disk itself. Read the manual; I haven't used Gaussian since G94 was the current version, so I can't remember.
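In Gaussian job files, the memory cap is the `%Mem` Link 0 line at the top of the input. A sketch that writes such a file; `gaussian_input` is a hypothetical helper, the route line, title and molecule are placeholders, and which `%Mem` units are accepted (words vs. MB) depends on the Gaussian version, so check the manual as suggested above.

```python
def gaussian_input(mem, route, title, charge, multiplicity, atoms):
    """Build a Gaussian job file whose memory use is capped by %Mem.

    mem is passed through verbatim (e.g. "200MB"); which units are accepted
    depends on the Gaussian version. atoms is a list of (symbol, x, y, z).
    """
    lines = [f"%Mem={mem}", route, "", title, "", f"{charge} {multiplicity}"]
    for symbol, x, y, z in atoms:
        lines.append(f"{symbol:<2} {x:12.6f} {y:12.6f} {z:12.6f}")
    lines.append("")  # a blank line ends the molecule specification
    return "\n".join(lines) + "\n"


# Placeholder job: single-point HF energy of water, capped at 200MB.
water = [("O", 0.0, 0.0, 0.0), ("H", 0.0, 0.757, 0.587), ("H", 0.0, -0.757, 0.587)]
print(gaussian_input("200MB", "# HF/6-31G(d) SP", "water, memory-capped", 0, 1, water))
```

Setting the cap below physical RAM (minus X and the other daemons) leaves Gaussian to use its own scratch files rather than the kernel's swap.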
How big is your AMBER simulation? I think I would run a smaller system... or even better... buy some more RAM given that it is dirt cheap nowadays.
AMBER's memory use is a bit heavy - you may have better luck with another MD package. Maybe NAMD? (Although I'd still vote for the "buy more RAM" option)
With that as a given, if your app needs all available memory, run top and lsmod to see what's using your memory and remove everything you don't need (usually by deleting the links to those processes in the /etc/???/rc5.d directory).
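For the "see what's using your memory" step, the RES column in top is the quickest check. A sketch that reads the same figure (VmRSS, in kB) from /proc/<pid>/status and ranks processes by it; the function names are hypothetical, and lsmod remains the tool for kernel modules, which this does not cover.

```python
import os


def parse_status(text):
    """Return (name, rss_kb) from /proc/<pid>/status-style text."""
    name, rss = "?", 0
    for line in text.splitlines():
        if line.startswith("Name:"):
            name = line.split(":", 1)[1].strip()
        elif line.startswith("VmRSS:"):
            rss = int(line.split()[1])  # value is reported in kB
    return name, rss  # kernel threads have no VmRSS line, so rss stays 0


def top_memory_users(proc_root="/proc", n=10):
    """List the n processes with the largest resident set, biggest first."""
    procs = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "status")) as f:
                name, rss = parse_status(f.read())
        except OSError:
            continue  # the process exited while we were scanning
        procs.append((rss, int(entry), name))
    procs.sort(reverse=True)
    return procs[:n]
```

Each result is an `(rss_kb, pid, name)` tuple; anything near the top that you don't need is a candidate for removal from the runlevel directory.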
If you can't remove it, scale it down. For example, /etc/inittab lists the different virtual terminals that appear when you press Ctrl-Alt and a function key. If you never use this feature, try reducing them to 1 or 2 terminals; leave some behind just in case you need them later. To do this, just comment out the higher-numbered lines that look like this:
6:2345:respawn:/sbin/mingetty tty6
(NOTE: Removing these lines might not make any difference -- it all depends on the distribution.)
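A sketch of the same inittab edit as a Python function; `trim_gettys` is a hypothetical helper name. It assumes the classic `id:runlevels:action:process` layout shown above, where a getty line's id is its terminal number. Back up /etc/inittab first, and run `telinit q` afterwards so init rereads the file.

```python
def trim_gettys(inittab_text, keep=2):
    """Comment out getty entries for virtual terminals numbered above `keep`.

    Assumes lines like '6:2345:respawn:/sbin/mingetty tty6', whose id field
    is the terminal number. All other lines pass through unchanged, and
    already-commented lines are left alone, so running it twice is harmless.
    """
    out = []
    for line in inittab_text.splitlines():
        fields = line.split(":", 3)
        is_getty = len(fields) == 4 and fields[0].isdigit() and "getty" in fields[3]
        if is_getty and int(fields[0]) > keep:
            out.append("#" + line)
        else:
            out.append(line)
    return "\n".join(out) + "\n"
```

With `keep=2`, tty1 and tty2 survive and `6:2345:respawn:/sbin/mingetty tty6` becomes `#6:2345:respawn:/sbin/mingetty tty6`.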
As for X (assuming you need it and are using XFree86), try removing any Load lines in the Module section that you don't need, and scaling down the display size, background images, and color depth. Another big area of savings is changing the window manager. FVWM is usually installed, and while it is ugly, it is also fairly lightweight compared to KDE, GNOME, and other popular full-featured WMs.
While these steps alone won't eliminate the speed problems -- the other comments might solve that -- the time you spend waiting might be cut way down.
A firewall can not protect you from yourself. Turn off what you do not need. Do not use the firewall to do your work.
If you are already maxed out on RAM, then "nice" is your friend. You can try to squeeze out as much performance as you want, but if you don't have enough RAM, or more RAM is not an option, then you just have to deal. Also make your swap partition bigger if it's filling up too quickly: if 8 people are running Gaussian and AMBER, then obviously you need more than the "recommended" swap, even with large amounts of memory. 4 gigs of memory seems like a lot until you start doing heavy shit... that's where you get a nice cheap 40, 60 or 100 gig drive and dedicate it primarily to swap. Problem solved. You might also want to write yourself a daemon that renices jobs based on arrival order, within the nice range of -20 to 19: the first job in gets the highest priority, subsequent jobs get lower priorities, and they get reniced to higher priorities as earlier jobs finish. That way, even on a dual-processor system (and Linux is pretty good with multiprocessor support), you get more efficient scaling. Worst comes to worst, lobby for a couple of blades or Netras or something and stack 'em.
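The policy core of that arrival-order daemon is a small mapping from queue position to nice value. A sketch, where `nice_by_arrival`, `base` and `step` are hypothetical names and arbitrary defaults. `base` defaults to 0 rather than -20, since -20 would put the first batch job ahead of X and the shell, the opposite of what the original question wants. A daemon would apply each value with `os.setpriority(os.PRIO_PROCESS, pid, value)`, and because moving a job from 5 back to 0 lowers its nice value, it has to run as root.

```python
def nice_by_arrival(pids_in_arrival_order, base=0, step=5, max_nice=19):
    """Map each job's pid to a nice value based on arrival order.

    The first job gets `base`; each later job gets `step` more (i.e. a lower
    CPU priority), capped at max_nice, the largest nice value Linux allows.
    """
    return {
        pid: min(base + position * step, max_nice)
        for position, pid in enumerate(pids_in_arrival_order)
    }


queue = [101, 102, 103, 104, 105]  # hypothetical job pids
print(nice_by_arrival(queue))      # {101: 0, 102: 5, 103: 10, 104: 15, 105: 19}
# When job 101 finishes, recompute over the remaining queue so 102 moves up.
print(nice_by_arrival(queue[1:]))
```

Recomputing the whole mapping whenever a job exits keeps the daemon stateless apart from the queue itself.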
You could try hdparm -u 1 which unmasks interrupts when the disk interrupt service routine is active. This often allows your mouse to continue moving even if the disk is busy dealing with swap. It's not perfect but it helps a lot. As others have suggested, also try the preemptible kernel patch but keep backups!
1) I can't seem to get on the CCL list. I couldn't find automated instructions, and when I sent an e-mail to chemistry-request, nothing happened.
2) We're already nice-ing things up the yin yang and using the 2.4.18 kernel with the preempt patch, with no noticeable results.
3) The machines must stay usable, as they are also analysis and server machines in addition to computational boxes.
4) Machines are dual P3 1400s. Unfortunately, disks are EIDE and RAM is 256MB, in the process of being upped to a gig. However, this doesn't change the fact that we'll be running some calculations that will use all of that.
5) We're not so anxious to buy 4GB of RAM for each machine until we're sure what kind of Beowulf cluster we're constructing and hence how much of our money goes to it.
The memory manager in Linux has lots of problems (as previous posters have pointed out).
Have you tried FreeBSD? Apart from being a better OS all round, the 4.x series has a brand new, revamped VM subsystem that handles high memory loads very efficiently. I never have a problem with swapping on any of my machines (which range from 32MB to 64MB to 512MB of RAM).
This isn't a troll. Sometimes a certain OS isn't the best solution for a job, and a different OS should be used. I use Linux for GUI/X type things, FreeBSD for heavily loaded servers (since it handles them much better), and even Windows 2000/XP for other things. If the programs you use are Linux binaries, FreeBSD can easily run them. If you have source, all the better: recompile with all the specific optimizations for your hardware (-O3, -mcpu=pentiumpro, -march=pentiumpro, etc.).
D.
IMHO, your box is underspecced (RAM, IDE hard disks) for the job you are doing.
Of course, you can try reading:
/usr/src/linux[name]/Documentation/sysctl/vm.txt
for some tunable /proc parameters (e.g. /proc/sys/vm/overcommit_memory should be 0 (zero)).
Since you are using IDE disks, 'man hdparm' is your friend.
Check your kernel config for DMA support for your motherboard's chipset.
Daniel Robbins (from Gentoo Linux) has written an interesting
article, "Maximum swappage": http://www-106.ibm.com/developerworks/library/swaptip2.html
Linux allows you to parallelize swap, just like a RAID 0 stripe:
/etc/fstab:
/dev/hda2 none swap sw,pri=1 0 0
/dev/hdb2 none swap sw,pri=3 0 0
/dev/hdc2 none swap sw,pri=3 0 0
E.g.: spread your swap across two disks with equal priority (hdb2 and hdc2 above). In theory, that doubles read/write throughput for swap. Further gains are possible if the swap partitions are moved off the disks that the OS and applications write to.
But read the article.
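A sketch that reads swap entries out of fstab text and groups them the way the kernel uses them: highest priority first, with equal-priority areas striped. `swap_tiers` is a hypothetical helper, and only the standard six-field fstab layout is handled.

```python
from collections import defaultdict


def swap_tiers(fstab_text):
    """Group swap devices in fstab text by their pri= value, highest first.

    The kernel fills the highest-priority swap areas first and stripes
    round-robin across areas with equal priority; lower tiers are used only
    once those fill up. Entries without pri= are listed last under None
    (the kernel gives each one its own lower, negative priority).
    """
    tiers = defaultdict(list)
    for line in fstab_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[0].startswith("#") or fields[2] != "swap":
            continue
        priority = None
        for option in fields[3].split(","):
            if option.startswith("pri="):
                priority = int(option[len("pri="):])
        tiers[priority].append(fields[0])
    explicit = sorted((p for p in tiers if p is not None), reverse=True)
    result = [(p, tiers[p]) for p in explicit]
    if None in tiers:
        result.append((None, tiers[None]))
    return result


fstab = """\
/dev/hda2 none swap sw,pri=1 0 0
/dev/hdb2 none swap sw,pri=3 0 0
/dev/hdc2 none swap sw,pri=3 0 0
"""
print(swap_tiers(fstab))  # [(3, ['/dev/hdb2', '/dev/hdc2']), (1, ['/dev/hda2'])]
```

Read against the example fstab, hdb2 and hdc2 form the striped first tier, and hda2 (pri=1) only takes pages once both are full.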
Any change in available memory can have a drastic effect. The sum total of the changes should add up to a minimum of 10MB on an untuned system. (One example: Bonobo on GNOME uses ~3.5MB by itself, while a few GNOME terminals with large history buffers chew up an additional 10MB, not all of it shared. Just switching from a heavyweight WM to a lightweight one, plus smaller helper apps, would recover the bulk of this space. Other changes would only add to the savings.)
That minimum of 10MB might be just enough to cut disk swapping down -- by how much it really depends on the application. If it's a single block of data, and no calculations are being done, no speed improvement will be noticed. If it's an in-memory array, the savings could be substantial.
Without giving it a try, or knowing the application's demands, nobody can say for certain.
A new post points out that the systems had 256MB, so recovery of 10MB should make a substantial difference.