Swap Performance in Linux
GizmoDuck writes "I'm working in a computational chemistry lab, and we find ourselves using memory and CPU hogs like Amber and Gaussian. The CPU hogging isn't a problem, thanks to Condor, but when submitting one of the jobs that request (and pretty much require) all the physical RAM in the machines, Linux promptly starts swapping so hard that the mouse pointer in X stops moving, NFS and NIS halt, and things don't get back to normal for five minutes. I've tried toying a bit with the settings in /proc/sys/vm/kswapd to no avail. I've done some poking around on the 'net looking for answers. Faster disks and swap partitions at the beginning of the drive aren't really an option at this point. I haven't found a good solution yet. I was wondering if the /. community has any input on how to keep the system from locking during periods of necessarily high swap activity?"
Have you actually tried running without a swap partition? I don't know if it makes a difference here, but I think it's worth a try if you've got enough RAM in your machine!
Life sucks.
The memory "manager" in Linux has lots of problems.
Use a different operating system.
2.4.x
I still have no idea why Linus used 2.4 as a development tree. Go back to 2.2.x, no swapping problems going on there.
By the way, does anyone know the command to flush the swap partition?
Maybe try out the preemptible kernel patch?
My personal experience is that it has helped my workstation's interactive performance noticeably for big ass c++ compiles and periods of lots of disk activity (big apt-get dist-upgrades). Thankfully, I'm no longer doing the big ass c++ compiles, so it's not as big of an issue as it used to be :)
It should improve interactive performance (i.e., your mouse will start moving again :) ) when load is high. Also, running your background process nice'ed will be helpful.
You might also consider the crazy idea of putting a swap file on NFS -- you'll get (if your network is decent) almost the same bandwidth as you get when accessing an (older) disk, but much higher latency (this will put your background process at a disadvantage compared to your interactive processes).
Hope this helps.
Paul B.
If your program(s) push Linux to the point where it actually runs out of available RAM faster than it can free it up, then "all hell breaks loose". It has to swap something out, and just about every program is eligible to be swapped out. That includes GPM (if you are on a virtual console) or X (if you are in X Windows). You need to account for all of these things to determine your RAM needs. Add up the memory usage of all your active programs, plus the buffer demands they have doing disk I/O, plus the kernel, and you need that much RAM. If the program is doing a LOT of disk/file writes, you can expect the buffer demands to be the majority of this, too (because the kernel believes what you just wrote you might soon want to read back, so it tries to keep lots of it in RAM even if that means swapping out GPM and X).
now we need to go OSS in diesel cars
You didn't actually say, but I would assume that you are not using EIDE or have a relatively high amount of RAM (512M+) in these systems. Otherwise, have you recompiled the kernel to include only what you need? Compressing the swap partition probably wouldn't give any more performance (as the gain would be wasted on CPU time).
What are the system specs?
Is switching to FreeBSD an option? The virtual memory management there is much better than in Linux under stress.
The best way to handle this (or at least the best way I handled a similar situation) is to combine Robert Love's preempt patch and Ingo's scheduler.
They will significantly increase user-facing performance under high load and keep the system from running away with itself. If you're feeling really adventurous, you could also throw in Rik's rmap VM... I have done very little testing with it, but I hear a lot of reports that it helps.
These are all available in the authors' respective directories on kernel.org: riel, rml, mingo.
Power Corrupts,Absolute Power Corrupts Absolutely, leaving one person(group)in charge is absolutely corrupt.
Unfortunately, you're out of luck. The current linux VM (in later 2.4 series) is fine for low to medium load systems but falls apart on high load systems. The previous VM (early 2.4 series) is a good design but isn't really ready for production.
I would suggest buying more RAM (it's cheap) if you aren't already maxed at 4 gigs (x86). Alternatively switch to FreeBSD which has a very stable efficient VM. Any source should recompile without too much trouble and it can run linux binaries at almost full speed!
Best slashdot comment
First of all, I would recommend trying the preemptible kernel patch and even the low-latency patch. It seems like an obvious enough suggestion, but some will tell you that these patches should not be used in servers where throughput is important, and that is correct... in some cases. It has been shown, however, that in most cases the preemptible patch increases performance and throughput. I have not heard of any such testing on the low-latency patch, as I am new to it.
In my testing, these two patches have been a big help, especially on my P166 system with 48MB RAM.
Also, you say "faster drives" and repartitioning are not feasible ATM, but how about multiple small drives? As shown in this howto, the Linux kernel has support for striping data across swap disks, just by specifying multiple swap entries in fstab.
Then again, if you're not on SCSI, trying to stripe to the swap drives won't be much help anyway, as RAID over IDE for _speed_ usually is just crap.
That last suggestion may not be for you, but definitely try the two patches. It should also be noted that preempt is a compile-time option, and there is also a compile-time option to control the low-latency patch through /proc/sys/kernel/lowlatency. An additional patch may be required to allow these two to work together, but I am unable to locate it currently.
XML is like violence. If it doesn't solve the problem, use more.
just a shot in the dark here, but can you just give a lower priority to those applications in order to keep the workstation usable while doing this work?
I can't recall the command line option off the top of my head but I know using Gtop, you right click the app, and pick renice, then set it to 1 instead of 0.
"The Most Fun Possible on 4 wheels" is at SunBuggy in Las Vegas
Are you running a recent kernel? It's got a lot better in the newer 2.4 series. We replaced the original kernel in RedHat 7.2 with the errata kernel, and it is much better!
Did you miss all the 2.4 Linux VM Stories?
I suggest building and installing the latest kernel with the aa VM (the default VM since 2.4.10). If you still have VM (swap) problems, then go get the latest rmap VM patch and try that.
The kernel VM (virtual memory subsystem) is what manages memory and swap, btw.
And if you did miss all the VM stories, a summary:
At the start of 2.4 a fancy new VM was put into action, using something known as reverse mapping. This was very clever, but it wasn't quite ready and there were teething troubles. Then suddenly (in 2.4.10) Linus switched to a VM similar to that of 2.3 (with some updates and a few features from the previous 2.4 VM). This started a big fight, which caused concerns (such as that it might split the Linux community).
Which is better I don't know; some swear by one, others swear by the other. But unless you're using the RH 2.4.9 kernel, I would not recommend a pre-2.4.10 kernel.
However, you may need to experiment to find which is best, the VM now in 2.4 (to stay) or rmap. You should try both and see.
steps
Install 2.4.[17,18,19]
try it
if it fails, try the rmap patch
Do you really have to be using the machine while it's calculating? If not, what about shutting down X and any other memory-hogging system components? Unlike on Windows you do have the option of turning off that expensive GUI.
Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
What I'd like to see is something along the lines of some kind of LRU which gently starts swapping data back into memory from swap when memory becomes free. There's nothing like having VMWare sitting in swap since you stopped using it an hour ago to do some other work and then jumping back and having to wait the 5-10 seconds of heavy disk activity to resume work there.
As for those saying "don't use swap at all" -- that's crazy talk. I'd rather have an app or two go to swap instead of being outright killed by the VMM when it needs an extra meg or so. If I'm not mistaken Linux tends to pick the big memory eaters to dump to swap over the little guys so if you start a compile... there goes VMWare... or your IM client... or Konqueror... lots of fun. :-)
... since I always get flamed or moderated down when I say things like this on Slashdot, but what you're looking for is FreeBSD (with Linux emulation if you need to run closed-source stuff).
Why don't you submit this query to the computational chemistry mailing list (see CCL)?
Those people may be able to give you some sensible suggestions, especially with respect to those particular pieces of software.
I believe that you can restrict the amount of memory that Gaussian uses via its keywords. When it requires more, it will handle the dumping of data to disk itself. Read the manual - I haven't used Gaussian since g94 was the current version, so I can't remember the details.
How big is your AMBER simulation? I think I would run a smaller system... or even better... buy some more RAM given that it is dirt cheap nowadays.
AMBER's memory use is a bit heavy - you may have better luck with another MD package. Maybe NAMD? (Although I'd still vote for the "buy more RAM" option)
With that as a given, if your app needs all available memory, run top and lsmod to see what's using your memory and remove everything you don't need (usually by deleting the links to those processes in the /etc/???/rc5.d directory).
If you can't remove it, scale it down. For example, /etc/inittab lists the different virtual terminals that appear when you press ctrl-alt and a function key. If you never use this feature, try reducing this down to 1 or 2 terminals. Leave some behind just in case you need them later. To do this, just comment out the higher-numbered lines that look like this:
6:2345:respawn:/sbin/mingetty tty6
(NOTE: Removing these lines might not make any difference -- it all depends on the distribution.)
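As a rough illustration of the inittab edit described above, here is a minimal Python sketch that comments out getty entries above a chosen terminal number. The function name `trim_gettys`, the `keep` parameter, and the regular expression are assumptions made for this example; it only transforms text and does not touch /etc/inittab itself, and the exact line format may differ between distributions.

```python
import re

# Matches sysvinit-style respawn lines such as:
#   6:2345:respawn:/sbin/mingetty tty6
# The pattern is an assumption based on the example line in the post.
GETTY_LINE = re.compile(r"^\w+:[^:]*:respawn:.*getty.*\btty(\d+)\s*$")


def trim_gettys(inittab_text, keep=2):
    """Return inittab text with getty lines for ttys above `keep` commented out."""
    out = []
    for line in inittab_text.splitlines():
        match = GETTY_LINE.match(line)
        if match and int(match.group(1)) > keep:
            line = "#" + line  # disable this virtual terminal
        out.append(line)
    return "\n".join(out) + "\n"


sample = (
    "1:2345:respawn:/sbin/mingetty tty1\n"
    "2:2345:respawn:/sbin/mingetty tty2\n"
    "6:2345:respawn:/sbin/mingetty tty6\n"
)
print(trim_gettys(sample, keep=2), end="")
```

With `keep=2` only the tty6 line in the sample gains a leading `#`; lines that are already commented are left alone because they no longer match the pattern.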
As for X (assuming you need it and are using XFree), try removing any Load lines in the modules section that you don't need and scaling down the display size, background images, and color depth. Another big area of savings is changing the window manager. FVWM usually is installed, and while it is ugly it is also fairly lightweight when compared to KDE, Gnome, and other popular full-featured WMs.
While these steps alone won't eliminate the speed problems -- the other comments might solve that -- the time you spend waiting might be cut way down.
A firewall can not protect you from yourself. Turn off what you do not need. Do not use the firewall to do your work.
I'm one of those people who don't use swap space. My rationale is this: it's sped up my computer activity a lot. I have a relatively slow computer (350) and I forked out $35 for another 256 meg of RAM; now I have 384 and I feel it's the best thing I've done. Sure, I don't use my entire RAM on one program, but for general activity I find I only end up using about 128 meg of RAM (lotsa programs working) and I get a lot of hard drive cache space, so those mp3s keep rolling in. If for a few dollars I can be a lot more comfortable with my system's performance, why not? I have a 60 gig hard drive, but I bought it to store data, not as temporary memory.
If you are already maxed out on RAM then "nice" is your friend. You can try to squeeze out as much performance as you want, but if you don't have enough RAM, or more RAM is not an option, then you just have to deal. Also make your swap partition bigger if it's getting full too quickly. If 8 people are running Gaussian and Amber, then obviously you need more than the "recommended" swap, even with large amounts of memory. 4 gigs of memory seems like a lot until you start doing heavy shit.. that's where you get a nice cheap 40-60-100 gig drive and make the whole thing primarily for swap. Problem solved. You might also want to write yourself a daemon that nices based on order, from -20 to 19. First in gets the highest priority; subsequent processes get a lower priority and get reniced to a higher priority as the first process finishes. This way, if it's only a dual system, even though Linux is pretty good with multi-processor support, you get even more efficient scaling. Worst comes to worst, lobby for a couple of blades or Netras or something and stack 'em.
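The ordering idea behind such a daemon can be sketched in a few lines of Python. This is only an illustration under assumptions: the names `nice_by_arrival` and `apply_plan`, the starting value `best=0`, and the spacing `step=5` are invented for the example, and a real daemon would also need to discover jobs and notice when they exit.

```python
import os

NICE_MIN, NICE_MAX = -20, 19  # valid nice range on Linux


def nice_by_arrival(pids, best=0, step=5):
    """Map pids (listed in arrival order) to nice values; the earliest gets `best`."""
    plan = {}
    for position, pid in enumerate(pids):
        value = best + position * step
        plan[pid] = max(NICE_MIN, min(NICE_MAX, value))  # clamp to the legal range
    return plan


def apply_plan(plan):
    """Apply the plan; lowering a nice value normally requires root."""
    for pid, value in plan.items():
        os.setpriority(os.PRIO_PROCESS, pid, value)


# Hypothetical pids, oldest job first.
print(nice_by_arrival([101, 102, 103, 104, 105]))
```

When the first job finishes, calling `nice_by_arrival` again on the remaining pids moves every job up one slot, which is the "reniced to a higher priority" step from the comment.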
while (1) fork();
Move your app to FreeBSD, OpenBSD, or Solaris x86.
/etc/security/limits.conf
I use this method. I specify default values for nice levels, amount of CPU time, amount of memory, etc.
This is a much better way. I will set up accounts with these restrictions. That way processes are running at a nice level of e.g. 5. X will be running at level 0 by default. This ensures that you can always get back into X even if the app (daemon) goes nutty with e.g. a memory leak.
No messing with command lines etc. The defaults have already been set.
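For readers who have not seen the file, the following Python sketch prints what such entries might look like. The group name `@compute` and all of the numbers are hypothetical; `priority`, `cpu` (minutes) and `as` (address space, in KB) are items understood by pam_limits, and the `-` type sets both the soft and the hard limit. Treat it as a sketch of the format rather than a recommended configuration.

```python
def limits_conf_lines(domain, nice_level, cpu_minutes, address_space_kb):
    """Render /etc/security/limits.conf entries for one user or @group."""
    rows = [
        (domain, "-", "priority", nice_level),   # default nice level for the account
        (domain, "-", "cpu", cpu_minutes),       # maximum CPU time, in minutes
        (domain, "-", "as", address_space_kb),   # address space limit, in KB
    ]
    return ["{:<12}{:<6}{:<10}{}".format(*row) for row in rows]


# Hypothetical values: nice 5, 10 hours of CPU time, 1 GB of address space.
for line in limits_conf_lines("@compute", 5, 600, 1048576):
    print(line)
```

Because the limits are attached to the account, every job started under it inherits them without any extra command-line options.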
You could try hdparm -u 1 which unmasks interrupts when the disk interrupt service routine is active. This often allows your mouse to continue moving even if the disk is busy dealing with swap. It's not perfect but it helps a lot. As others have suggested, also try the preemptible kernel patch but keep backups!
Scroogle
Don't use positive numbers! Higher priority is attained with negative nice!
Use top and renice X to -10 or -20 to get _the interface_ snappier. You may also renice other non-interactive things down with a positive count to get them out of the way.
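To make the sign convention concrete, here is a small Python sketch that builds such a renice plan. The helper names and the pids are hypothetical, and the values -10 and +10 simply follow the suggestion above; `os.setpriority` is the standard-library call that does what renice does, and negative values normally require root.

```python
import os


def renice_plan(interactive_pids, batch_pids, boost=-10, penalty=10):
    """Give interactive processes a negative nice value and batch jobs a positive one."""
    if not (-20 <= boost <= 19 and -20 <= penalty <= 19):
        raise ValueError("nice values must lie between -20 and 19")
    plan = {pid: boost for pid in interactive_pids}
    plan.update({pid: penalty for pid in batch_pids})
    return plan


def apply_plan(plan):
    """Apply the plan to live processes (negative values normally need root)."""
    for pid, value in plan.items():
        os.setpriority(os.PRIO_PROCESS, pid, value)


# Hypothetical pids: 900 is the X server, 4242 and 4243 are compute jobs.
print(renice_plan([900], [4242, 4243]))
```

A lower number always means a more favourable scheduling priority, so X at -10 runs ahead of compute jobs at +10.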
1) I can't seem to get on the CCL list. I couldn't find automated instructions and when I sent an e-mail to chemistry-request, nothing happened.
2) We're already nice-ing things up the yin yang and using the 2.4.18 kernel with pre-empt patch with no noticeable results.
3) The machines must stay useable as they are also analysis and server machines in addition to computational boxes.
4) Machines are dual P3 1400s. Unfortunately, disks are EIDE and RAM is 256MB in the process of being upped to a gig. However, this doesn't change the fact that we'll be running some calculations that will use all of that.
5) We're not so anxious to buy 4GB of RAM for each machine until we're sure what kind of Beowulf cluster we're constructing and hence how much of our money goes to it.
The memory manager in Linux has lots of problems (as previous posters have pointed out).
Have you tried FreeBSD? Apart from being a better OS all round, the 4.x series has a brand new revamped VM subsystem that handles high memory loads very efficiently. I never have a problem with swapping on any of my machines (which range from 32mb, 64mb, to 512mb ram machines).
This isn't a troll. Sometimes a certain OS isn't the best solution for a job, and a different OS should be used. I use Linux for GUI/X type things, FreeBSD for heavily loaded servers (since it handles much better), and even Windows 2000/XP for other things. If those programs you use are linux binaries, FreeBSD can easily run them. If you have source, all the better. Recompile with all the specific optimizations for your hardware. (-O3, -mcpu=pentiumpro, -march=pentiumpro, etc)
D.
You can tell how powerful someone is by the magnitude of the crime they can commit and be able to get away with.
Is the kernel tuned to match the hardware correctly? For example, if running with IDE drives, have you run hdparm to optimise the UDMA transfer mode and other IDE parameters? Is the kernel optimally compiled for your processor? Depending upon your distribution and hardware details you may need to change the kernel for maximum performance.
Simple solution to your problem. Put your swap partition on a RAM disk!
:)
Performance problem solved
You can accomplish anything you set your mind to. The impossible just takes a little longer.
Why don't you switch to a Windows-based OS? At least that way when your PCs lock up for 5 minutes nobody will notice.
IMHO, your box is underspecced (RAM, IDE hard disks) for the job you are doing.
Of course, you can try reading
/usr/src/linux[name]/Documentation/sysctl/vm.txt
for some tunable /proc parameters (e.g. /proc/sys/vm/overcommit_memory should be 0 (zero)).
Since you are using IDE disks, 'man hdparm' is your friend.
Check your kernel config for DMA support of your mobo chipset.
Daniel Robbins (from Gentoo Linux) has written an interesting
article, "Maximum swappage": http://www-106.ibm.com/developerworks/library/swa p tip2.html
Linux allows you to parallelize swap, just like a RAID 0 stripe.
/etc/fstab:
/dev/hda2 none swap sw,pri=1 0 0
/dev/hdb2 none swap sw,pri=3 0 0
/dev/hdc2 none swap sw,pri=3 0 0
E.g.: spread your swap across two disks with equal priority.
That way you should, in theory, double the read/write speed for the
swap. Some further gains could also be had if the swap partitions
were moved off the disks that the OS and apps write to.
But read the article.
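To see which partitions in an fstab like the one above would actually be striped, a short Python sketch can group swap entries by their pri= option. The function name `swap_stripe_groups` is made up for this example, and entries without an explicit pri= are simply skipped. The kernel fills higher-priority swap areas first and spreads pages round-robin across areas that share a priority.

```python
from collections import defaultdict


def swap_stripe_groups(fstab_text):
    """Group swap devices by pri= value, highest priority first."""
    groups = defaultdict(list)
    for line in fstab_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[0].startswith("#") or fields[2] != "swap":
            continue  # not a swap entry
        for option in fields[3].split(","):
            if option.startswith("pri="):
                groups[int(option[4:])].append(fields[0])
    return sorted(groups.items(), reverse=True)


fstab = """\
/dev/hda2 none swap sw,pri=1 0 0
/dev/hdb2 none swap sw,pri=3 0 0
/dev/hdc2 none swap sw,pri=3 0 0
"""
for priority, devices in swap_stripe_groups(fstab):
    print(priority, devices)
```

For the three entries shown, /dev/hdb2 and /dev/hdc2 share priority 3 and are used in parallel, while /dev/hda2 at priority 1 only comes into play once the other two are full.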
Linux knows how to deal with up to 64G of memory, as a compile-time option. It enables the Page Size Extension and the Physical Address Extension, IIRC. This gives 36-bit addressing. It's still 4G per process, however. Just that the base address of each process's physical space can be addressed with 36 bits.
But I'm not sure what kernel version first included this feature.
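The numbers quoted above follow directly from the address widths, as this tiny Python check shows (the variable names are just for illustration):

```python
GIB = 2 ** 30  # bytes in one gibibyte

physical_bytes = 2 ** 36     # 36-bit physical addressing with PAE
per_process_bytes = 2 ** 32  # 32-bit virtual addresses per process

print(physical_bytes // GIB)     # 64
print(per_process_bytes // GIB)  # 4
```

So the extra four address bits multiply the amount of installable RAM by sixteen, while each process still sees a 32-bit address space.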
i know little to nothing about freebsd memory management, so i can't comment there...but IMHO Irix was a breeze to tune for both CPU and memory intensive applications. the machines i worked on did HUGE finite element analysis (dyna3d, hypermesh, etc) and after tinkering a little with kernel parameters (low/high water marks, etc) i was able to squeeze a lot more performance out of the boxes.
That sounds nice until you realize it's your X server and xterms that'll get swapped out as soon as you leave your workstation for a moment. When you have to wait for *those* to come in over the network, you'll be crying for local swap once again.
Network swap is really only a useful option for diskless workstations.
--JoeProgram Intellivision!
These comments might be relevant on a 16MB RAM system, but I doubt you'd notice any difference on a 2GB RAM system.
Any change in available memory can have a drastic effect. The sum total of the changes should add up to a minimum of 10MB on an untuned system (one example: Bonobo on Gnome uses ~3.5MB by itself, while a few Gnome terms with a large history buffer chew up an additional 10MB -- not all of it shared. Just switching from a heavyweight WM to a lightweight one and smaller helper apps would recover the bulk of this space. Other changes would only add to the savings).
That minimum of 10MB might be just enough to cut disk swapping down -- by how much it really depends on the application. If it's a single block of data, and no calculations are being done, no speed improvement will be noticed. If it's an in-memory array, the savings could be substantial.
Without giving it a try, or knowing the application's demands, nobody can say for certain.
A firewall can not protect you from yourself. Turn off what you do not need. Do not use the firewall to do your work.
A new post points out that the systems had 256MB, so recovery of 10MB should make a substantial difference.
A firewall can not protect you from yourself. Turn off what you do not need. Do not use the firewall to do your work.
Your system is thrashing. The folks that claim that not running X or nice'ing non-essential processes will help are just plain wrong. The small amount of memory freed up will not help you, and nice'ing other processes will not help when the system is spending > 90% of its CPU time swapping or looking for swap victims. What you need is more RAM, boatloads of it. Max out your system, and if that still isn't enough, then get systems that can take more RAM. Upgrading to FreeBSD can help if you are close to having enough RAM and just need a little better efficiency to get you by. Another advantage of FreeBSD over Linux is that generally when Linux starts thrashing it never comes back; it thrashes itself into oblivion until you reboot, while FreeBSD recovers after the memory hogs finally finish running.