Linux 2.2 and 2.4 VM Systems Compared
Derek Glidden writes "I got sick of trying to figure out from other people's reports whether or not the 2.4 kernel VM system was broken or not, so I decided to run my own tests, write them up and post them online. The short conclusion is that the 2.4 VM rocks when compared with 2.2, but there's more to it than just that."
From the article:
These were pretty uninteresting - just sitting there watching the kernel compile. Except that at one point, while running the 2.4.13 kernel, the hard drive started grinding away with the drive light pegged on continuously, the display became extremely sluggish and quickly froze up entirely, and about ten minutes later, the hard drive light went off but the machine remained unresponsive, requiring a hard reboot. I don't think this was related to anything I was doing as I wasn't actually doing a compile run at the time - probably just a random occurance, but worth mentioning.
So the machine essentially BSoD'd, but it's not interesting?
it goes like this -
the 2.2 VM 'feels' better for single-user use (which i disagree with) but falls down under 'heavy' load. (which, as i've pushed 2.2 to load avgs above 250, i also disagree with)
but, anyways, that's what he's saying. i found 2.4 to be much nicer in the one userland task that frequently shows off the VM - mp3 decoding under load. 2.4 never, ever skips, 2.2, with or without ESD, skipped frequently.
YMMGV.
Indie rock lives! b-side!
As of 2.4.14pre the Andrea/Marcelo VM is definitely doing better on most workloads. Where the Riel one has advantages the right way is going to be to build on the Andrea VM.
The 2.4.14pre VM also seems to be taking the harder brutalisation test sets pretty well - something the Andrea VM didnt do in 2.4.10-13
If anyone out there has been having problems with 2.4 vm's (and there have been some problems) you should give 2.4.14-pre7 a try. Things have been moving fast on this front for a while now, but Linus thinks it's pretty much there now.
In his words, "In fact, I'd _really_ like to know of any VM loads that show bad behaviour. If you have a pet peeve about the VM, now is the time to speak up. Because otherwise I think I'm done.
This is an experimental patch to 2.4.13, and you shouldn't run it on an important machine, but the VM by all accounts is much improved.
Even Alan Cox (who has been maintaining the older Rik Van Riel version of the VM in his -ac patches) agrees that the new VM is faster and simpler, and he plans to switch to it as soon as it is reliable enough to pass his stress testing. (which should be really soon, it seems.)
(Yes, I spend an hour a day reading the kernel mailing list.)
Torrey Hoffman (Azog)
"HTML needs a rant tag" - Alan Cox
Quite often I get the feeling that Linux and BSD are doing quite a bit of "me-too"-isms in an attempt to catch up with the mainstream OSes--including MS, Apple and commercial Unixen.
I read this story and wonder if I should still be getting the same feeling -- isn't a VM subsystem mostly a solved problem? Or am I reading this wrong, and this is merely tweaking and specialization?
Since I'm no Alan Cox (I'm closer to Alan Thicke), I can't see the truth of the matter, but I get the feeling that we're doing a lot of walking in a tight circle on the path, while others have already left the forest.
Potato chips are a by-yourself food.
I don't think most people really thought the 2.4 VM was a worse performer than 2.2, especially under normal load, and in recent kernels even under high loads.
However, one thing that was not evaluated in this writeup at all was stability, especially on big boxes (as in SMP and >1GB) and heavy workloads. This is where neither VM really seems to be able to hang in there.
I admin seven such boxes that all have 2 or 4 CPUs, and 2 of 4GB of RAM, and during heavy jobs get hit pretty hard. These things run 100% rock solid with 2.2.19, I've achieved uptimes of greated than six months on all boxes simultaneaously. Basically, reboots are for kernel upgraded, nothing more.
With 2.4.x, I'm happy to get a few weeks, and sometimes much less. The machine practically always dies during heavy VM load. It has kept me from upgrading to 2.4 for several months now.
The real kicker, when 2.4 is running correctly, my jobs run as much as 35-50% faster than with 2.2, especially on the 4 CPU server, so I really wish the VM was stable enough to allow me to move.
Anyway, I'm sure it will get there sometime.
BTW, before people write about how they're 2.4 boxes have super long uptimes let me say that I too have some 2.4 based systems that have been up since 2.4.2 was released, but these machines are either single CPU, or SMP but with 512MB of RAM. 2.4 seems to run quite well in this case.
...so why does linux have 1 VM? it seems that 2 of them exist, and the BSD's have more... guys, "gimme a hunk" and "page fault" aren't exactly rocket science anymore, particularly with hardware support... the fact that there is room to make a big deal out of this is the problem, not the VMs.
so why does linux have 1 VM? it seems that 2 of them exist, and the BSD's have more... guys, "gimme a hunk" and "page fault" aren't exactly rocket science anymore, particularly with hardware support... the fact that there is room to make a big deal out of this is the problem, not the VMs.
If Linux was a microkernel I'm pretty sure this would be possible but from what I've seen of the Linux kernel code and from some discussions on the linux kernel mailing list, the virtual memory code is too entrenched in various parts of the code to be #ifdefed around with any sort of ease.
But the fact remains is that this VM holy war should have been resolved in the 2.3 series of kernels.
The number of major problems and architectural changes that are being made to the supposedly 'stable' branch of Linux kernel is really run amok.
I'm sure there's plenty of outrages to come as bad bugs are found in the volume manager and other new elements of 2.4
Conformity is the jailer of freedom and enemy of growth. -JFK
...is that Linux's warts are fully out in the open for all to see. Microsoft would never admit to such failings openly, even though anyone who has used Windows extensively is painfully aware of them.
And it's been my experience that you don't hear, "Linux never crashes" that much anymore. At least I don't say it anymore, whereas I used to. I would still say that a properly configured Linux box is more stable than any Windows box, but I've had my share of lockups. (on the desktop anyway. You'll notice my server has been up for 140+ days. The last reboot was when the power supply died [it's a patched together P166] which interrupted 243 days uptime)
All the mailing lists are public, and all of Linux's problems are there for anyone to see. This allows people to make truly informed decisions about which version of Linux to use, or whether to even use it at all. (Yes, of course these things are also true of *BSD) The current issues are why I still run 2.2.19 on my servers, since none of them get anywhere near enough load to need the newer VM's. "Stable" is definitely a relative term.
The link is in this article.
Best Slashdot Co
Heh, If I ever said anything like this at work I'd get a "Are you afraid to code" out of my boss...
Systems are getting more complex and demands on them get more complex. People have to plan harder and think harder to keep up. It's time to step up to the plate or go home. Years ago Linux didn't need a VM that worked with 4 way SMP and 2 Gig of ram. Today it does.
Claiming that modularity is (I'm paraphrasing) "too hard" comes off more as a cop-out than a reason. If it's hard, do a better job at it. I don't think anyone is claiming that the VM should only take a weekend to do... They just want it done and done right. The argument FOR modularity put forth above is a solid one. Whatever planning/archeteting/coding/testing and debugging that takes. Just do it[tm].
Contrary to popular belief, coding is not all free blow-jobs and beer. Those things cost MONEY!
Just a possibly interesting data point. I played Unreal Tournament with 2.4.12-ac5 and 2.4.10 (both from Debian). 2.4.10 always seems to work fine for extended periods of Lan play (as both a client and server), whereas the 2.4.12-ac5 choked after a few games--the swap ended up being nearly all used up.
Of course, this was hardly a scientific test, but I think I'll stick to something proven for now.
I can't speak for the differences between the two VM layers in the most recent versions of each, but I went from 2.4.7-2 (RH Roswell Beta stock kernel) to 2.4.13 (+ext3 patch), and I've noticed a serious improvement.
My notebook has 192 megs and 256 meg swap partition. I run Mozilla constantly (which seems to constantly grow in memory usage as the days pass). Prior to the upgrade (2.4.7-2, recompiled without the debugging options RH had on by default), swapping was ungodly slow. Switching between Mozilla and an xterm would literally take a few seconds waiting for the window to draw on the screen. Even switching between tabs in Moz was slow.
Since going to 2.4.13 with ext3 patch, I've noticed a serious improvement in this behavior. Under the same conditions (between 20 and 50 megs swap usage), switching between windows is quite fast. I don't know if it's faster at swapping per se, or if it's just swapping different things (eg, more intelligently deciding what to swap out), but for me it "seems" much faster for day-to-day usage.
I haven't yet tested in a server environment... but for desktop usage, 2.4.13 rocks. Can't wait for 2.4.14, to see if any noticable improvements are added...
Though it will be a non-issue once I add another 128 megs to this machine, it's nice to see such great VM performance under (relatively) low memory conditions.
NGWave - Fast Sound Editor for Windows
(Yes, I spend an hour a day reading the kernel mailing list.)
I'm too lazy to read LKML, but I am interested in the happenings of the Linux kernel development. I highly recommend Linux Weekly News' kernel news (updated every Thursday) and Kernel Traffic , an in depth summary of the week's LKML happenings (usually updated every Sunday or Monday).
cpeterso
Actually, 2^64 bytes is 1.84467E+19 bytes, or approx. 18.4 exabytes
I didn't want to get too detailed, but I always have quite a lot of things running. 100 Moz windows? Not quite, but I typically keep anywhere between 5 and 20 open at a given time...
And, anyone who uses Mozilla constantly knows that it doesn't seem to free memory, ever... it grows and grows. The "tabs" feature is great (so technically it's just the one "window" open) but unfortunately closing a tab does not free any memory. I rarely restart Moz because I'd have to then re-open all the pages I had going, etc...
Trust me, run Mozilla for a few days straight, you'll see quite a bit of memory usage (it's at 80 megs right now).
Then there's LICQ (11 megs for such a tiny lil program), KMail (10 megs), 5 terminals, BitchX, Nautilis (using 12 megs, I suppose just showing the desktop)... the list goes on.
So yes, I'm typically using the full 192 megs plus a bit of swap after running for a while, and the new kernel has, IMHO, improved performance under these conditions.
NGWave - Fast Sound Editor for Windows
Claiming that modularity is (I'm paraphrasing) "too hard" comes off more as a cop-out than a reason
Yes, let's make a fundamental, pervasive, part of the kernel hot pluggable, introduce tons of potential bugs and incompatibilities and create lots of work, all for questionable benefit. Engineering involves a series of tradeoffs and, in this case, most people see the pain as being too great for the potential payoff.
But, if you still want to, go right ahead. That's one of the cool thing about Linux: if nobody else wants to do it, you still can.
Reboot macht Frei.
Basically, the people who sided with Linus/Andrea were of the opinion that "things are so bad now [which was between 2.4.5 and 2.4.9] that a complete replacement of the VM even in a 'stable' kernel series is justified", and those who sided with Alan Cox/Ben La Haise/Rik van Riel thought that the existing VM code could be massaged and tweaked enough so that the performance would become acceptable and huge changes could be postponed until 2.5 opened.
This was complicated by the fact that between 2.4.5 and 2.4.9, the -ac series had accepted patches from Rik which weren't applied in the Linus branch and did in fact seem to be fairly successful in increasing performance through much less intrusive code changes. This was one of the main complaints of the Alan/Ben/Rik contingent; that the problems had already been largely resolved in the -ac tree, and that that approach should have been applied in Linus' tree before jumping to a complete rewrite.
At this point, a consensus seems to be forming that the Andrea VM is *much* simpler, the changes haven't had much adverse effect on other subsystems, and the performance is just as good or better than the VM in the -ac series.
The question of whether or not it should have waited until 2.5 is one that will probably never be answered to everyone's satisfaction, but at least will soon be academic.
I have some concern about how all these changes appear to businesses. Linux is supposed to be a high-quality alternative to other operating systems. Yet we've recently had a production kernel that failed to even compile, and there have been major upheavals to the "stable series" VMM, which sometime degrade reliability or performance. This isn't going to impress, and it gives competitors valid ammunition.
Almost all successful enterprises have to overcome a hurdle: how to transition from the "just for fun" start-up stage to the "managed with professionalism" stage. Ideally, fun is kept along with the professionalism. Note that even at Microsoft, B. Gates now lets S. Balmar (CEO) and R. Belluzzo (President and COO) manage things. Gates is the "chief software architect" and Microsoft is still his, but others do the managing.
Linux needs more professionalism in the management of releases, I believe.
I don't know about other shells, but tcsh has some features that provide other useless statistics. You can set a variable called "time" that can provide additional information. From the tcsh man page [edited]:
time: If set to a number, then the time builtin (q.v.) executes automatically after each command which takes more than that many CPU seconds. If there is a second word, it is used as a format string for the output of the time builtin. (u) The following sequences may be used in the format string:
%U The time the process spent in user mode in cpu seconds.
%S The time the process spent in kernel mode in cpu seconds.
%E The elapsed (wall clock) time in seconds.
%P The CPU percentage computed as (%U + %S) / %E.
%W Number of times the process was swapped.
%X The average amount in (shared) text space used in Kbytes.
%D The average amount in (unshared) data/stack space used in Kbytes.
%K The total space used (%X + %D) in Kbytes.
%M The maximum memory the process had in use at any time in Kbytes.
%F The number of major page faults (page needed to be brought from disk).
%R The number of minor page faults.
Particularly, if you could measure the number of swaps/page faults in
the different kernels it would be pretty useful. I've got $time set
to:
# have time report verbose useless statistics
set time= ( 30 "%Uuser %Skernel %Eelapsed %Wswap %Xtxt %Ddata %Ktotal %Mmax %Fmajpf %Rminpf %I+%Oio" )
He notes in his commentary that the 2.2 kernel "felt faster" or something to that effect, while still performing much worse in actual numbers. This is probably the manifestation of the a well-known effect in the world of performance: responsiveness and throughput are often mutually exclusive.
:). On paper it was fast as hell, in practice it
seemed very slow.
In other words, given fixed parameters, it's usually not the case that you can improve both responsiveness and throughput at once. If you don't change memory, CPU speed or I/O bandwidth, and your code is devoid of excess baggage which effectively reduces one of the above, it is almost a given that the two are a tradeoff. I've personally experienced this numerous times in my own performance work, and have read the research of others that corroborate it.
Here are some really interesting fundamental examples. One company I worked at lived and died by disk performance benchmarks, in particular the Neal Nelson benchmark. This test ran multiple concurrent processes, each of which read/wrote its own file. The files were prebuilt to be as contiguous on disk as possible so that sequential I/O operations wouldn't cause disk seeks. By the nature of the test, though, seeking would happen a lot because you had N processes each reading/writing a different contiguous file. So, you lost the benefit of the contiguousness. Until, that is, we came up with a way of scheduling disk I/Os which, given a choice of many pending I/Os in a queue, favored starting I/Os which were close to where the disk head happened to be. This wasn't your father's elevator sort! The disk head would hover in one spot for extended periods, even going backwards if necessary to avoid long seeks. It was a bit more sophisticated than that, but those are the basics.
The effect was, if a process started a series of sequential I/O operations, such as reading a file from beginning to end, no other process could get much of anything through until it was done. So what did this do to performance? Well throughput shot through the roof because disk seeks were nonexistent. The test performed beautifully, as it only measured throughput, and we consistently won the day. However, I/O latency for the processes that had to wait was extremely high, sometimes on the order of minutes.
Needless to say, these "enhancements" were only useful for benchmarking, or perhaps for a filesystem on which the only thing running were batch processes of some kind. It would feel slow as molasses to actual human users, verging on unusable if anyone started pounding the disk. You can't wait 60 seconds for your editor to crank up a one-page file (well, okay, we didn't use MS office in those days
One paper I read on the subject of process scheduling postulated that by increasing the max time slice of a process you could improve performance. The idea was that you would context switch less, would improve the benefits of the CPU cache, and so on. They increased the time slice to something above 5 seconds and ran some tests. Of course, the throughput improved by some nontrivial amount. Predictably, though, the system became unusable by actual human users for the same reason as in my disk test example.
The other extreme would be absolute responsiveness, in which you spend all your time making people happy but not getting any real work done. An example of this would be "thrashing", where the kernel spends most of its time context switching and not actually running any one process for an appreciable amount of time.
The sweet spot for the real world is somewhere inbetween, perhaps a little closer to the throughput side of the spectrum. It sounds like this may be the direction they've gone with the 2.4 kernel, though I'm sure they've done a lot of optimizing and rearchitecting to improve performance overall.