Debate on Linux Virtual Memory Handling
xturnip sent us a good piece running over at Byte about Linux's VM. Somewhat more technical than the stuff we usually see online, this one talks about different VM systems and the egos in the kernel. It's worth a read.
first post
for troll tuesday
Of course, there's the "state-sanctioned" version of how Rob "CmdrTaco" Malda got his most peculiar nickname ("'CmdrTaco' is a reference to a Dave Barry article where he lists places not to take a date. Among them is any place called 'The Commander Taco' or something like that.") and then there's the real reason for said nickname.
In order to explain it, we'll need to hop into the time machine and step back a few years to when Mr. Malda was still but a wee pup in college. So I'd like to take you back to the early 90's.
Rob was fresh out of Catholic high school, with dozens of years of Catholic guilt impressed upon and built up inside him. He'd snored his way through high school, tinkering around with nothing more than computers. Fact of the matter is that most girls don't like geeks and he was too repressed to figure out a way to approach those of the fairer sex. For that matter, he was even afraid to touch himself. Based on what little sex ed had been taught in school, he knew better than to engage in premarital copulation or let his seed touch the ground, lest he burn in hell or suffer the fate of Onan. It wasn't the bullying and the scornful glances that were the worst torture of high school, it was waking up in the middle of the night, his genitals throbbing, gritting his teeth, and clenching his perineum to abate the oncoming rush of verboten relief (after his mom found his stained underwear once, he had learned better).
But college represented the ultimate to a scrawny kid who wasn't quite sure how to play well with others. It was the chance to meet completely new people and to completely reinvent himself, a rebirth of sorts. And what kind of rebirth would it be? The kind that meant he would (finally) get chicks. Catholic guilt be damned! He'd heard that throbbing in his loins loud and clear and it was finally time to do something about it. But how? The answer was clear: in addition to the obvious major in computer science, he'd pick up a minor in art. Women would look at him and see not only the provider instincts that comp sci implied, but a sensitive heart and a mind with a flair for aesthetics as well, a heart with art in it. What lady could possibly resist such a formidable combination?
Unfortunately, all of them. A little scribble on paper saying you know art is no replacement for the ability to clearly communicate that you love it as he was finding out. Things at college were no different than in high school. The girls were still hung up on the football players, leaving him struggling to make a saving throw vs. pathetic geekdom. He discovered the concept of alcohol, figuring that cracking a sixer and his inhibitions meant that he'd be cracking their legs, but again, he turned into nothing but an incoherent mess.
A year went by and no luck, aside from ridding himself of some Catholic guilt: the liberal nature of campus and the wonders of the nascent world wide web meant that with a little (very little) peer interaction skirting around the subject and lonely hours in the dead of night on weekends when his roommates were out presumably dipping their wicks meant that he'd finally been able to overcome his irrational fear of masturbation. And boy, did he ever.
Saying that he took to it like a fish to water was an understatement: he masturbated as if he honestly believed that if he did it enough, he'd win a prize. Unfortunately discovering Usenet, he learned all manner of deviant masturbatory practices, of course convincing himself that it was all OK and that this was just practice for when he finally met Ms. Right, etc., etc. You can justify some things to yourself, but there shouldn't be any way to rationally justify getting your penis lodged in a beaker. Stupid stupid! What was he thinking? But the guy on alt.sex.masturbation had said that the sensation of a penis displacing a beaker full of warm olive oil was the most "realistic" feeling ever, so who was he to doubt? It was a heart pounding few minutes waiting to return to his normal, pitifully small flaccid state, hoping that his roommate wouldn't return to find him in such a grotesque state. His roommate was, of course, aware that Rob was wacking it like it was going out of style, but while that was mildly normal, there was something horribly wrong about having your member painfully lodged in a glass beaker. But things there all worked out and the beaker replaced his normal jitrag "hidden" underneath his bed. He even jokingly contemplated submitting the beaker half-full of swirled olive oil and rank seed as an art project, but thankfully thought better of it.
This was all foreplay to what would give him his nickname forever. Perusing alt.sex.masturbation after he'd mauled himself one afternoon while his roommate was still out, he came upon a life-changing post: the most realistic sex sensation, ever, guaranteed. Dozens of replies to the post over the next few days verifying that this was indeed the best thing since sliced bread assuaged his fears that this would turn into another Beaker Incident. So for the first time ever, Rob set out to the hardware store. Having picked up a small length of modestly gauged PVC piping, it was off to the supermarket to procure some liver.
When he burst back into his room, rosy-cheeked and visibly excited, his roommate and a few of his friends began to cruelly inquire about why he had some piping and liver. Malda, somewhere between stutter and a mumble, blurted out some half-assed explaination about "Maxwell's Demon" and "passive heating". They laughed and headed on out to "throw some brews back and nail some broads". Malda waited the longest five minutes of his life until he was convinced that they were gone, then snuck down to the microwave to heat up the liver for the longest 45 seconds of his life. Sprinting with the foul organ in tow back to his room, he stuffed the liver into the PVC pipe and then stuffed his foul organ inside of it. So amazing was the sensation that it provided that he copulated with the homebrewed artificial vagina multiple four more times that evening, finally passing out with the semen-laced liver-stuffed pipe leaking all manner of horrible fluids leaking onto his sheets. With a start, he woke in the middle of the night, scrambling furiously to hide the pipe, dispose of the pearly mistake-covered liver, and then wash his sheets. His roommate and his friends stumbled in while he was washing the sheets, and they cruelly inquired if he'd shat the bed or what. He responded that he'd had a bit too much to drink and had puked on it. They gave each other knowing glances, shoved him aside and went back to their respective rooms.
So Malda's love affair with a pipe and some liver continued unabated, and things were going well: in one of his art classes, he'd even managed to tell a (not even remotely attractive) girl that he was a comp sci major and an art minor, and was patting himself on the back for a job well done. He returned to his room high on life and ready for a few rounds with the liverpipe, and so thought nothing of it when his roommate invited him over to dinner at his friend's place. He accepted, thrusted to fruition in his unholy contraption, cleaned up after himself and then took a shower and a nap before getting up to head to dinner over at his roommate's friend's house.
He showed up at six prompt, and they began by cracking open a few Coronas and watching some TV. It was Mexican night, they informed him. Nachos and tacos: what would he like? Tacos, he responded.
At the dinner table (OK, huddled around the TV), Malda was talking with excitement in his voice about how he'd unearthed some of his old disks with shareware classics like Duke Nuke 'Em, Jumpman, Tapper and Commander Keen on them and had been playing them all afternoon. One of the guys snickered and he asked if they weren't into old games.
"*snicker* Hey, uh. Guys. Do these tacos taste a little funky to you?"
"*snicker* Yeah, a little bit."
Rob looked around, not quite getting the gist of it and responded "These taste fine. Why?"
As his roommate burst out laughing, one of the guys said "Yeah. I sort of... ran out of meat and I had to make your tacos with this piece of meat I found in the garbage near your roommate's room. But don't worry. It was all wrapped up and so it wasn't dirty... COMMANDER TACO!!"
It was then, with a sinking feeling in his stomach, that he realized that he'd been fed a piece of liver that he'd been intimate with only hours before. He ran out of the apartment crying and failed his classes for the rest of the semester, getting enough counseling and living in enough denial afterwords that he managed to graduate in 4 1/2 years like a real trooper.
So why would he choose such an embarassing nickname for a website he decided to run shortly thereafter, you ask? Who knows? Brainfart, Freudian slip, self-deprecation, therapy, anybody's guess, really. On the bright side, it's one less question that those pesky reporters will have to ask him about the meaning behind his name, right?
nice one my son
(last post)
Here's another reason why Open Source is not a viable alternative as far as business is concerned - when you pay a group of programmers someone can make a final determination of the right way to proceed. In Open Source, two big egos can fork the code.
So what if it's free (in either sense)? Why would you risk your business when this kind of bickering can hold up everything?
Did you guys forget to set the clock back on the Slashdot server? Looks like it's an hour ahead of the rest of the world...
I got this from The Slashdot Privacy Watch. Check out their Open Letter!!!
An Open Letter to VA Linux Concerning Privacy on Slashdot
To whom it may concern,
It has come to our attention that Slashdot is building a detailed database of every visitor and user of Slashdot. This database includes, among other personal details, an address history which permanently records every IP address assosciated with every Slashdot user and comment for all time. We are concerned that this database is a signifigant Intellectual Property asset that may be abused in the event of a sale of Slashdot by VA Linux to a third party.
In addition, we feel that keeping a permanent and indelible record of every IP address used to post every Anonymous comment on Slashdot erases whatever hopes of anonymity that endangered or threatened users may have had. To name two examples, Chinese dissidents and corporate insiders can have no expectation of anonymously revealing civil rights violations and corporate abuse.
It is our hope that given these concerns, VA Linux or Slashdot may choose to provide an opt-out option to users, whereby users could choose not to be tracked and profiled if they so request. Some discussion has been made of a Slashdot subscription service; perhaps one revenue stream for Slashdot would be to sell Privacy Rights. For a low yearly fee, a user could purchase the right not to be tracked, profiled, and logged by IP address.
Whatever steps are taken, it is our hope that Slashdot will address the current privacy concerns in public to allay our fears and to promote open discussion.
Thanks again for creating one of the most popular sites on the Internet, and all the best.
-The Slashdot Privacy Watch Team.
I don't know is this true or not?!?
FREE DEAD PENIS BIRD!! STRENGTH TO THE OPPRESSED!!
Check out this comment and that comment.
This morning, when I logged in roughly 9 AM EDT, the comments were rated 5 and 4 respectively. Now they picked up a total of 6 "Overrated" mods. What's most strange about this is that NO BAN has been tripped.
This reeks of editor abuse. How a fairly old comment can pick up FOUR "Overrated" mods in such a short span can be explained in no other way.
You'd figure that Taco and Co. would love to see a troll change his ways and post some meaningful stuff. Apparently this is not the case. "Once a troll, always a troll" is their motto.
I was looking forward to the challenge of reforming a troll. But their shortsighted ways have proved otherwise. Fuck 'em with a broomstick, I say.
He seems to think a lot in favor of the Andrea VM.
That's OK with me, but he might want to take note of the fact that Linus didn't accept many of Rik's patches and that 2.4.9 actually still had the VM of 2.4.5. The -ac tree was more up to date.
So for a good comparison you'll need to compare the Linus tree and the -ac tree.
Well, don't worry about that. We can get you back before you leave. (Dr. Who)
I say, include it all in the kernel and make it configurable by the user. After all, most Linux users are pretty tech-savvy, they are unlikely to wreck their machines (the way windoze lusers do every time they tweak their registry).
What do others think ?
Nobody has yet dared to speak of a Linux source fork, but this is dangerously close to one.
Is this truly dangerous? If so, why? Why not let the 2 VM's compete and the users will decide?
Better to split than stagnate.
I think this shows the power of open source software. Everyone thought Rik's system would be great. Unfortunately it was not, so Linus used Arcangeli's new VM code. Problem solved. Stable as ever. This major OS change happened over a couple of months. I bet Redmond couldn't make that happen if a VM bug was found now in XP.
has VD? Is it serious, or just something like crabs? If it's crabs, I will pick them out of his pubes.
enTAH selecTAH
I'm sure you're curious as to why Slashdot, OSDN, and the rest of VA Linux's network wasn't available the weekend of Friday, June 22, 2001. I was. Then I found out.
In this expose, I will inform the reader on the why and the how of Slashdot's worst outage yet, its narrow escape from death, its darkest day in history...Slashdot's
Black Friday
Innocent Beginnings
Picture this: you're young, you're gay, and you own a successful web log. But you want more. Enter buyouts by a company called VA Linux, headed by the ruddy fag ESR and his band of Open Source homosexuals, hand picked by Larry Augustin himself and charged with taking over the Linux world. Got it so far? Good.
Short of having kidnapped Linux Torvalds, VA Linux virtually was Linux. You had hit the big time. You were the loud mouth of the biggest, baddest mother of a Faggot Linux Empire ever assembled.
(Important note: VA Linux had, indeed, tried to hire Linux Torvalds away, but Linus had refused, so as not to favor any single company or distribution. VA Linux, in turn, had kidnapped Torvalds and had Rob Malda and ESR rape his mouth unil he couldn't feel his jaw. Linus also needed his stomach pumped. However, good ol' Linus, the stout Finn that he is, never gave in and so was returned to Helsinki soon thereafter.)
The IPO
December, 1999:
Stock: $253 Volume: 8,000,000
IPO time, and you were riding high. You had become a millionaire and didn't know it. ESR had been surpised by wealth. And scores of other investors and Linux nuts found themselves with swollen bank accounts. Even though the stock fell sharply soon after, you figure it was just a burp in the market, and you headed out to celebrate by sucking some cock and buying your sports cars, boy-servants, horses, bathhouses and mansions. Still with me? OK. Now fast forward a few months.
The Ugly Truth
June, 2001:
Stock: $2.53 Volume: 1,000
You have Linux companies that have lost large parts of their market valuations, Linux distros merging, IPO's cancelled...
In short, Linux was dying. If you wished to portray the worst of the present state of the Linux market, you could not do so without factoring in how the GPL works to un-employ programmers. You didn't have to be a Kreskin to see what was happening; the handwriting was on the wall: Linux faced a bleak future. Even RMS commented on the current position of his Free Software Foundation due to Linux's misfortunes, which, indeed, represented the boat everyone was in with Linux:
" I am goat-fucked! "Who Are You?
If you can sit there and read this expose and nod your head in affirmation of the events I have thus far documented, you can be only one person: Rob Malda, aka CmdrTaco, of Slashdot.
All of the events here led up to Slashdot's Black Friday, where Rob Malda almost lost everything he had left (after the VA Linux stock plummet, that is). The only thing left really was Slashdot itself, and the homosexual orgies the Slashdot staff held every Friday night. Alan Cox had since abandoned sucking the Slashdot staffs' cocks, and had returned to civillian life, disillusioned with Linux. Banner ad hits came only by means of the Slashdot staff themselves, and ESR, drunker and drunker with every stock plummet, would call and ream out Rob Malda over the phone every day, holed up in his cabin of 386s running Linux.
written: 2001/06/25
updated: n/a
Disclaimer: all content provided on this site is fiction (so far as the author knows). No claims made by the author are to be taken as fact.
trollaxor@mac.com
Moving on, we can see a whole gnu world evolving around dependable/economical information systems, filled with legitimate information, as opposed to that whoreabull ?pr? stuff we were subjected to, ALL THE TIME, when the felonious kingdumb was still in ?power?.
We will NEVER (careful with that word) again use any infactdead m$bugwear. ITs too unreliable, restrictive, etc.... Plus, part of the money (after buying yachts, mansions, etc...) that those felons get for IT, is used to attempt to asphyxiate the good GNUs guys.
He alludes to some FreeBSD vs. Linux benchmarks at the end of the piece. Anyone got any links?
So in a Linux distro install it can ask you "Are you using this computer as a (a) server, (b) multimedia, (c) desktop, (d) games, (e) mixture?" and then it can install an appropriate kernel for you with the appropriate VM and preemption patches. A multimedia computer would like the low latency preemption, for example...
The linux kernel has 5 main parts. There is no reason that each part cannot be modularised away in order to get alternative functionality depending on build options and the target environment.
Do you think that Windows 2000 DataCenter has the same VM system as Windows 2000 Professional? I severely doubt it. And I bet that MS' in-house kernel build tool will have VM type as a selectable option, as well as many other subsystems.
This article by Moshe Bar talks about the Linux kernel, and especially a recent split in the community over which (or whose) VM to use.
The "old" VM, the article says, had some relatively impressive problems: run swapoff with full memory and you could sit in front of a swap-crazy machine for 15 minutes; machines with small memory (40 MB or less) could experience sudden swap storms up to kernel 2.4.7; and the amount of swap available in the system depended on the amount of memory (as opposed to the system in 2.2, where the total memory was RAM+swap).
The VM used in 2.4 until 2.4.10 was written (it seems from the article) in large part by Rik van Riel. Later (in September) Andrea Arcangeli wrote a whole new VM, which was accepted into the kernel. Voilà, new VM in 2.4.10.
The article discusses that Alan Cox doesn't like the new VM as much, and has stuck with the old one. The author of the article seems to be in support of Andrea's new VM.
The article then goes on to discuss kernel preemption- pros (low latency), cons (lower throughput/power), and on which machines such things are important (servers, vs PCs).
Moshe Bar seems to indicate that Alan Cox is creating some kind of fork of the Linux kernel. Actually, -ac kernels are always different from Linus kernels to some extent, since they include slightly more experimental code (e.g. ext3), or code that Linus has not had a chance to review yet. This way, the experimental code gets more testing before going into official Linus kernels. You can read more about -ac kernels at KernelNewbies.Org.
As anyone following LKML knows, Alan thinks that drastic VM changes should be reserved for 2.5, and so continues to keep Rik's VM going. This actually helps quite a bit as both VMs get tested and there have been several comparative tests conducted leading to improvements in both VMs. Competition in this case is certainly helping Linux.
Oh, and for all you fork conspirators, here's another fact: Andrea Arcangeli also puts out his own kernel releases, called -aa. I don't think any of these are considered forks; everyone understands that this way patches get more testing, and "crosstalk" between the different flavors is a given.
Much ado about nothing, IMHO...
-Rahul
Genebrew
The article seems to come out in favour of the new VM code. It makes it sound like it works much more effectively. So, why does Alan Cox continue with the old VM code? There must be some reason why he thinks it's better, or why go through the effort of continually patching the old code into the newer kernel?
CmdrTaco like to molest animals*
*May not actually be true.
Roadkill is yummy.
Which is what he's talking about.
Best Slashdot Co
Who says an article can't be biased?
It's an editorial, for crying out loud.
I have been wondering when the "masses" would figure out this is a MAJOR issue with the current Linux kernel. I use the -ac patches on my workstations because of various things not in the main branch (fs support, etc.). I have to say the VM in the main 2.4.12-13 seems much more predictable and solid, so I think I'm just going to patch against the main tree for ext3 until this gets hammered out. Alan seems like a very gifted individual, as does Linus, but it seems as if Alan sometimes can't say "hey, I was wrong, this is a better way to do it," or "let's, for the goals of the effort, decide and move on to improving it as best we can." The author was very correct in his statement that you can fix something that sucks so it doesn't suck as much, but in the end it still sucks. I still don't get what Alan's problem is. Is it because he didn't write it? Or is it because it's new and less tested?
Excellent article in all.
Sig went tro...aahemmm.....fishing........
See this posting to LKML:
Alans talking about switching VMs in -ac kernels
Genebrew
From the article - " All earlier 2.4 kernels (since 2.3.12) needed at least the same amount of RAM in swap and then more to give you additional virtual memory. This meant that on an 8-GB server, you needed to put aside almost a full 9-GB disk just to be able to swap"
Is this accurate? For just about everything I've always gone with 512Mb of swap, regardless of whether I had more or less RAM (not that I'm technically proficient or anything). This would also be a shortcoming of Linux since it would make it a pain in the ass upgrading RAM if you needed to allocate more swap space somewhere else each time. Well I'm all for the newer VM. Simple is good.
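For what it's worth, the arithmetic behind the quoted claim can be sketched in a few lines. This is only a back-of-the-envelope model, not kernel code: the function names are made up, and the `max(RAM, swap)` rule for early 2.4 is an assumption read directly off the quote (swap only adds virtual memory once it exceeds RAM), while the `RAM + swap` rule for 2.2 is the one described in the article summary above.

```python
def vm_total_2_2(ram_gb: float, swap_gb: float) -> float:
    """2.2-style accounting as summarized in the thread: total VM is RAM plus swap."""
    return ram_gb + swap_gb


def vm_total_early_2_4(ram_gb: float, swap_gb: float) -> float:
    """Assumed model of the quoted claim: swap smaller than RAM adds nothing,
    so usable virtual memory is whichever of the two is larger."""
    return max(ram_gb, swap_gb)


# The article's 8-GB server with a 9-GB swap disk, then a 512-MB swap setup.
for ram, swap in [(8, 9), (8, 0.5)]:
    print(ram, swap, vm_total_2_2(ram, swap), vm_total_early_2_4(ram, swap))
```

Under that assumed model, a fixed 512 MB of swap on an 8-GB box would buy you nothing on the early 2.4 kernels, which is why the article talks about setting aside a disk larger than RAM.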
I only wish I could find an escape.
Breaking through or breaking down.
I long to stop enduring this world's torment.
Defiance, alliance, reliance...
Eliminate, eradicate, exterminate...
I never asked to be here.
As if I were human too.
And so I cry, tasting the salt.
Liberate, emancipate, defecate...
These things have consumed me.
I do not know how I will live without you.
Emptiness was the answer.
This thread reminds me of the time I got my ass caught in the toaster.. Painful and hard to get away from, yet somehow warm.
Roadkill is yummy.
The results were very interesting indeed. Since this benchmark is too much to be handled in this article, Byte.com will post it here soon for you to read.
Can't wait to read the sequel...
you want good luck to follow you and your offspring for generations to come? This troll has the solution for you?
All you have to do is copy this troll onto two to four of the discussion threads of your choice! That's right! Just copy this into a new message and click "post anonymously." That's all there is to it!
Tired of that idiot talking about geek culture! Stick one of these babies on it! And it's good for the economy!
Marge Gentry of Cambridge, Minnesota participated, and the next day she received a large fruit basket outside of her door from a secret admirer. Unfortunately, Marge was hit by a truck the next day, so she didn't get to the Granny Smith apples.
Commander Taco of Hole-in-the-ground West Virginia didn?t participate, and he was violated by a group of raging homosexuals. Since the gang was headed by Jon Katz, Taco had no recourse to the law because the entire town knew about their previous relationship. The unfortunate outcome is enshrined forever at goatse.cx.
So if you want to get the fruit basket and not get poked in the bread basket, just copy this troll onto two of the discussions threads of your choice. We could have this place blanketed by sundown!
Moshe Bar argues two points I vehemently disagree with:
(1) Alan made a mistake in not switching to Andrea's VM. Alan is trying to maintain a stable kernel. Switching out large chunks of the VM is the last thing to do to achieve those goals. Alan will switch in due time.
(2) The preemptible kernel is unfit for certain scenarios. Everyone I know loves the preemptible kernel. It gets good reports on lkml and the kernel news sites - Hell, it even got good comments here!
I realize this is an editorial, and I understand everyone has an opinion, but if it isn't true it isn't true. An opinion can't contradict fact.
Tim
Is probably like arguing over the "best text editor".
Best Slashdot Co
CommanderTaco
He thinks he has censored me
Yet I can still post
Michael Loves Me!
If they are to truly compete, then we should be allowed to choose between the Andrea VM code and the Rik VM code when we compile our beloved kernels.
However, a kernel fork would not necessarily be a bad thing, as long as the forking doesn't break the ability to run binaries. I'd hate to have to recompile my entire system just to switch between VMs.
Stop the brainwash
I don't care if you want to swear by the Linus kernel, but it gets killed by IO. I mean, come on, I'm using 2.4.12, and I can't rip a CD and play an MP3. Under the AC series, I can rip CDs, play MP3s, watch divx movies, surf the web, untar a file, and have a compile job going at the same time. Even for more usual setups, like viewing a video without doing anything else, the Linus kernel drops frames left and right, whereas the AC series laughs at it. Don't tell me I need to use mplayer with SDL, because I do.
Because I treat my Linux box as though it were a Windows box (one of the reasons I switched over to Linux for everything is that the widgets in GTK are prettier than the widgets in Windows -- it's nice to have people ask me how to get their desktops to look like mine and tell them they have to install linux) and I expect it to run at least as well as a Windows machine, I must use the AC series. While I'm sure that the Linus kernel has its applications, it is simply unacceptable for replacing the Windows kernel.
Mod me flamebait or troll if you want, but I speak the truth. I have a Thunderbird-750 with 224 MB of ram, and I find it simply unacceptable when I can't run Quake or view movies under linux because of the Linus kernel. When mp3s skip because I'm moving some data around, it tells me that something is wrong with the Linus kernel. I'm glad that I had a friend who introduced me to the AC series, or I would have given up on linux. Plain and simple, politics aside, the end user doesn't care that he's being loyal to Linus the Great, he just cares that he can view that movie. If Windows outperforms linux in multimedia, he'll use Windows.
Join the Slashcott! Stay away entirely Feb 10 thru Feb 17! Close all tabs to prevent autorefresh!
I had a similar problem a while back... The company was designing a commercial software project.... Egos ruled the engineering meetings. Noone could agree on the correct way to write the modules. We couldn't even agree on what language to use. Some wanted to use ColdFusion, some wanted to use Delphi, some wanted to use Python, a few wanted to use PHP and some wanted to use Java. There was only one tech guru who could glue the whole project together...this guy wrote glue components which tied the whole system together, and bugfixed everyone's code (all in different languages). It was all going well, but the guy started to take longer and longer lunch breaks.... eventually the CEO followed him one day when he went on his lunch break ... it turned out that he was having sex with mares on a local farm. Of course, the CEO couldn't keep him on - but he still needed his technical expertise. Unfortuantely, rule of law overruled the need to keep the guy and he was fired. After that, the project just fell to peices and the company went out of business. And you thought Open Source projects had problems. Peh.
Linux Kernel Pillow Talk
By Moshe Bar
October 29, 2001
And you thought the netherworlds of dry kernel engineering were free of politics, egos, and prima-donnas? Guess again. The events of the last four to six weeks and the e-mails flying to and from the Linux kernel mailing list show how Byzantine and complex the dynamics of decision making, feature design, and implementation can be. Go to http://www.tux.org/lkml/ to subscribe to the kernel mailing list, but be careful: This is a very high-traffic list. Subscribe only if you really want to follow every single detail of the Linux kernel, or instead read the weekly digest at Linux Kernel Cousin at http://kt.zork.net/kernel-traffic.
Sure, the lively debates have always existed. In the past there have been disputes about the Linux firewalling code, networking code, scheduler, installer, driver model, and many more. One recurrent theme has always been the Virtual Memory (VM) manager. Nothing determines the peculiar behavior, the feel -- even the ultimate success or failure of an operating system -- like its virtual memory design. Sometime during the development cycle leading up to the Linux 2.4.0 kernel, in other words in 2.3.xx times, Rik van Riel (http://www.surriel.com), a Dutch kernel hacker working for Brazil-based Conectiva (one of the smaller Linux distributions), introduced radically new VM code. It was based on what seemed to be new and advanced algorithms for efficiently finding, allocating, and disposing of virtual memory pages requested by programs. Rik later introduced an interesting new kernel feature called the "OOM killer." OOM stands for Out Of Memory. The OOM killer attempts to locate a killable process when memory runs out in the system. Without such a feature the whole machine can go nuts or enter a vicious cycle of swapping out a few pages, realizing immediately afterward that those pages are needed, and searching again for swappable page candidates, keeping the kernel busy doing only this instead of letting user processes run.
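As a rough illustration of the idea only, an OOM killer can be pictured as scoring every process and killing the most expendable one. The sketch below is not the kernel's actual heuristic: the `Proc` fields, the `badness` scoring rule, and the rule that PID 1 is never chosen are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Proc:
    pid: int
    name: str
    rss_pages: int     # resident memory, in pages
    privileged: bool   # hypothetical flag, e.g. the process runs as root


def badness(p: Proc) -> int:
    """Hypothetical score: memory hogs score higher, privileged tasks lower."""
    score = p.rss_pages
    if p.privileged:
        score //= 4    # assumption: be reluctant to kill privileged processes
    return score


def pick_victim(procs: Iterable[Proc]) -> Optional[Proc]:
    """Return the process to kill, never choosing init (PID 1)."""
    candidates = [p for p in procs if p.pid != 1]
    if not candidates:
        return None
    return max(candidates, key=badness)


procs = [
    Proc(1, "init", 50, True),
    Proc(210, "sshd", 400, True),
    Proc(977, "leaky_app", 90_000, False),
]
victim = pick_victim(procs)
print(victim.name if victim else "nothing killable")
```

The interesting design question is entirely in the scoring function: a poor choice of score is how a selector like this ends up picking a process the system cannot live without.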
Rik is a gifted hacker, and among other things he has been trying to improve the efficiency and speed of maintenance of those lists in the kernel responsible for managing all the virtual memory pages in the system. One of the main questions to address in every operating system VM code is: "How do you choose which page to steal next when there is a RAM shortage?"
In the 2.4.0 release, the Linux kernel scans the process page tables and decides which page to remove. The problem with this approach is that sometimes a lot of process tables have to be scanned to free just one page, or very few pages. Also, this approach does not guarantee that the pages stolen are only those that will not be needed again very soon. Some UNIXes introduced the notion of the working set; that is, the minimum amount of memory required by a process to function efficiently. This solution is, however, limited to per-process pages only and does not consider other kinds of pages, such as filesystem caching. Stealing from these pages might in some cases even prove counter-productive. Very often in VM theory, a solution to one problem can worsen another; that's why kernel programming is difficult.
Rik van Riel and I have variously discussed another approach, called "reverse mapping," which implements a reverse-lookup between the page and process table. Once you have reverse-mapped pages, the VM can simply scan the pages for the ones to be freed. Naturally, some extra fields need to be added to the appropriate control tables to allow this reverse mapping. My own implementation has an overhead of 14 bytes and is therefore certainly a lesser solution than Rik's -- his overhead is just 8 bytes.
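To make the reverse-mapping idea concrete, here is a minimal Python sketch. It is a toy model under stated assumptions, not kernel code and not either implementation mentioned above: the class `ReverseMap` and its methods `map_page`, `mappers`, and `unmap_all` are hypothetical names chosen for illustration. It only shows why keeping a frame-to-owners index lets the VM free a frame without scanning every process's page table.

```python
from collections import defaultdict


class ReverseMap:
    """Toy model: forward page tables plus a reverse index from frame to mappers."""

    def __init__(self):
        # Forward direction: pid -> {virtual page number -> physical frame}.
        self.page_tables = defaultdict(dict)
        # Reverse direction: physical frame -> set of (pid, virtual page number).
        self.rmap = defaultdict(set)

    def map_page(self, pid, vpage, frame):
        """Record that process `pid` maps virtual page `vpage` onto `frame`."""
        self.page_tables[pid][vpage] = frame
        self.rmap[frame].add((pid, vpage))

    def mappers(self, frame):
        """Return every (pid, vpage) that references `frame`, without a table scan."""
        return set(self.rmap.get(frame, ()))

    def unmap_all(self, frame):
        """Free a frame by removing every mapping that points at it."""
        owners = self.rmap.pop(frame, set())
        for pid, vpage in owners:
            del self.page_tables[pid][vpage]
        return len(owners)


vm = ReverseMap()
vm.map_page(pid=1, vpage=0x10, frame=7)
vm.map_page(pid=2, vpage=0x20, frame=7)  # a frame shared by two processes
vm.map_page(pid=2, vpage=0x21, frame=8)
removed = vm.unmap_all(7)
print(removed, dict(vm.page_tables[2]))
```

The extra `rmap` dictionary plays the role of the "extra fields" the article mentions: it costs memory per page, which is why the per-page overhead of a reverse-mapping design matters.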
Other extremely talented kernel hackers such as Marcelo Tosatti and Ben LaHaise have made other important contributions to the Linux VM.
However, even though all these intelligent people tried hard to make the Linux VM fast, efficient, and powerful, user reports since the 2.4.0 release indicated poor Linux kernel performance and erratic and unstable behavior. Up to kernel 2.4.7, for instance, on machines with small memory footprints (less than 40-MB RAM), sudden swap storms could erupt which would virtually freeze the system while it inexplicably started swapping pages in and out like crazy. In some cases, the aforementioned OOM Killer would choose the wrong process to kill; I have seen the all-important init process killed erroneously. Many fringe kernel projects, like my own Mosix project or others such as Win4Lin, suffered because users accused these projects of unstable operations, assuming that a released kernel like 2.4.0 must be free of such nasty bugs. Even though the kernel had gradually evolved from 2.4.0 to 2.4.9, it was evident that the VM design was more of a liability than an advantage.
Linus himself said in a recent kernel list mailing that he wasn't happy yet with the VM. These problems were enough for many Linux shops to resist the migration to the 2.4 kernels and instead continue using the 2.2.19 kind of kernels. Obviously, compared to 2.4, the 2.2 series has many shortcomings -- like no zero-copy networking, the division of page cache and buffer-cache in filesystem operations, big spinlocks (serializations of kernel execution paths for computers with more than one CPU) for many parts of the kernel, and so on.
A simple C program like the one below shows how kernels up to 2.4.9 had problems dealing with stress workloads on the VM system. If, after running this program, you turned the swap partition off with swapoff, your server or workstation would become totally unresponsive for up to 15 minutes.
Back in February 2001, I ran an informal and unscientific benchmark comparing FreeBSD 4.1.1 to kernel 2.4.0 (visit http://www.byte.com/documents/s=558/byt20010130s0010/) on exactly the same hardware and with exactly the same subsystems versions (MySQL, Sendmail, Apache, and others). The results clearly showed that, indeed, there were major problems with the efficiency and speed of the early 2.4 kernels.
A New VM
Then, on September 24, with the kernel standing at version 2.4.9, everything suddenly changed. Andrea Arcangeli, an Italian kernel hacker (read my interview with him two years ago at http://www.byte.com/documents/s=287/byt20000229s0008/) and a very prolific contributor, decided that enough was enough. He sat down and in one of those marathon hacking bouts completely rebuilt the VM from scratch. In short succession he sent Linus Torvalds over 150 patches to the 2.4.9 kernel to implement a new VM engine. This is an extremely remarkable feat. A VM is a major piece of software and by nature very complex. One needs to satisfy many opposing objectives: simultaneously efficient handling of server-type and interactive-type loads; ease of implementation and, at the same time, optimized use of every last small feature of the CPU. The VM must also be able to run well on Intel CPUs spanning 4 or 5 generations, as well as on AMD chips, Alphas, MIPSes, Sparcs, ARMs, and what have you. Andrea, by the way, does all his development on a Compaq AlphaServer with two 500-MHz CPUs and 3-GB RAM.
Out of the blue, Linus accepted the new VM and incorporated it into the official Linux kernel tree.
Recently, I spent two days with Andrea giving speeches. During the two days, over many bottles of beer, we had plenty of time to discuss his new VM. I was mainly interested in how the new VM affects Mosix. Because Mosix must migrate virtual memory pages belonging to the program's address spaces between cluster nodes, it is important to correctly understand the VM and interface efficiently to it.
Specifically, Andrea took exception to the following problems in the 2.4 VM:
The new VM is much simpler and faster. Let me explain how it works.
The old 2.4 VM had a major design problem that manifested itself mainly when freeing physically dirty pages (remember dirty pages are the frames of 4-KB memory in the RAM whose contents have been modified by one of the virtual memory pages residing in it). The last owner of the page (usually the VM, except in swapoff) has to clear the dirty flag before freeing the page. When being swapped off in swapoff it may be a little more complicated -- we may need to grab the pagecache_lock to ensure nobody starts using the page while we clear it.
So, Andrea went and did the following: All physical pages are now divided into active and inactive pages. These two are further divided into dirty and clean for both active and inactive. When the active dirty pages become about 66 percent of the total number of pages, the VM starts to scan them for the oldest ones to be put into inactive dirty and then, later still, from there to the swap when memory becomes tight. This part is very central to the new VM and its simplicity is...well, simply stunning.
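The aging step just described can be sketched in a few lines of Python. This is a simplified model built only from the description above, not the actual 2.4.10 code: the class `PageLists` and its methods are hypothetical, only the dirty lists are modeled (the clean lists are left out for brevity), and the 66 percent threshold is taken at face value from the text.

```python
from collections import OrderedDict

ACTIVE_DIRTY_LIMIT = 0.66  # the "about 66 percent" figure quoted in the article


class PageLists:
    """Toy model of the active-dirty and inactive-dirty page lists."""

    def __init__(self, total_pages):
        self.total_pages = total_pages
        # Insertion order doubles as age: the first entry is the oldest page.
        self.active_dirty = OrderedDict()
        self.inactive_dirty = OrderedDict()

    def touch_dirty(self, page):
        """A write makes the page active, dirty, and the youngest in its list."""
        self.inactive_dirty.pop(page, None)
        self.active_dirty.pop(page, None)
        self.active_dirty[page] = True
        self.balance()

    def balance(self):
        """Move the oldest active dirty pages to inactive dirty past the limit."""
        limit = int(self.total_pages * ACTIVE_DIRTY_LIMIT)
        while len(self.active_dirty) > limit:
            page, _ = self.active_dirty.popitem(last=False)  # oldest first
            self.inactive_dirty[page] = True

    def swap_out(self, count):
        """When memory is tight, evict the oldest inactive dirty pages to swap."""
        evicted = []
        while self.inactive_dirty and len(evicted) < count:
            page, _ = self.inactive_dirty.popitem(last=False)
            evicted.append(page)
        return evicted


lists = PageLists(total_pages=10)
for page in range(9):
    lists.touch_dirty(page)
print(list(lists.active_dirty), list(lists.inactive_dirty))
```

Pages only reach swap after passing through the inactive list, so a page that is written again while inactive gets a second chance: `touch_dirty` pulls it back to the young end of the active list.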
This elegant mechanism totally changes the behavior of the 2.4.10 kernel under heavy load and also makes for much better predictability of the system. Another very important change is that the swap is now additional to the RAM, just like in 2.2 times. All earlier 2.4 kernels (since 2.3.12) needed at least the same amount of RAM in swap and then more to give you additional virtual memory. This meant that on an 8-GB server, you needed to put aside almost a full 9-GB disk just to be able to swap, similar to some versions of Solaris or other UNIXes.
Finally, the page scanner doesn't page scan if there are theoretically no freeable pages, whereas before it did. Oh, and the OOM killer never really worked, so Andrea disabled it, as I did for all my kernels. In 2.4.12 it is enabled again; this time, however, it works much better. Try it with the above program to see it in action.
Arcangeli's VM is stable, acts predictably -- something that the old VM never really achieved -- and it makes the swap space look like it did in 2.2 days. Additionally, the design is much simpler and easier to understand. People will catch up fast with it.
However, many kernel hackers disagree. Upon the release of kernel 2.4.10, a virulent and sometimes aggressive debate flamed up, with many people trying to show why one of the two was a good VM and the other not. Some comments got a bit out of control, and only in the last two weeks or so has some calm been restored.
However, one nasty side effect stays. Alan Cox, the number two man after Linus Torvalds, does not yet like the new VM and in his own kernel tree (called the "ac tree") he still continues to use and patch the old VM. As a consequence, users and system administrators now find themselves facing two very different kernel trees to choose from: the official Linux tree and the Alan Cox tree. Quite often, latest patches to drivers and new features are only in Alan Cox's tree. Those who want to go with the official Linux source code may find themselves unable to apply the patches due to the different VM code all over. It is acceptable for the two trees to be different for a few days on such important subsystems like VM, but it is not acceptable to have them different for months and across many kernel versions.
Nobody has yet dared to speak of a Linux source fork, but this is dangerously close to one.
It became obvious that the VM up to 2.4.10 was a design liability. You can try to fix something that was designed badly, but it will never become a beauty. I think Linus' decision to scrap the old VM and go with the Arcangeli VM was courageous and right. Having a functioning and stable Linux box should not be deferred to 2.5 when we can do it already with 2.4.
Kernel Preemption
But apart from the VM issues, there are other lively debates in the kernel community. There was an interesting interview at http://kerneltrap.com/article.php?sid=328&mode=thread&order=0 with Robert Love, who is leading one of two projects trying to make the Linux kernel fully preemptible. Making the kernel preemptible means making it possible to interrupt whatever the kernel is doing (say, executing a system call) to process some other outstanding task and then return to its original task. Linux, as a multiprocessing OS, obviously always did that for user-land processes. However, many, just like Robert Love, feel that the fact that Linux up to now would not let itself be interrupted contributed to poor latency. Latency describes how quickly you can expect a response from your kernel when you actually need something from it. Note that Linux is not designed as a real-time OS (though there is at least one Linux real-time implementation somewhere), and therefore does not explicitly guarantee latency. User-land programs must be aware of this as, especially with kernel preemption, latencies can be very unpredictable.
Theoretically, an OS will answer faster if it can be interrupted. What does suffer from kernel preemption is the global throughput. If you have a task that needs n seconds within the kernel to complete (let's say executing a given system call takes 0.005 seconds), then all the interruptions add some overhead to switch from one kernel task to another. So, finishing the execution of that system call (in our example) will finally require n + o*p, where p is the frequency of switching and o the static overhead for one switching operation. Notice that kernel context switching does not invalidate the CPU cache, and is therefore not as expensive as process switching. However, kernel preemption will surely lead to a higher rate of switching from kernel space to user space, because upon preemption the scheduler might decide to give higher priority to a user process.
In other words, kernel preemption does decrease latency but slows down overall throughput. It's the math: nothing to be done against it.
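As a worked example of that cost model, the short Python sketch below evaluates n + o*p for the 0.005-second system call mentioned above. The overhead and switch-count values are assumptions picked purely for illustration, not measurements, and p is read here as the number of switches that occur while the call runs.

```python
def time_with_preemption(n, o, p):
    """Return n + o*p: uninterrupted time plus overhead for p switches."""
    return n + o * p


base = 0.005      # seconds for the system call, as in the article's example
overhead = 2e-6   # assumed cost of one kernel switch, in seconds
switches = 50     # assumed number of preemptions during the call

total = time_with_preemption(base, overhead, switches)
slowdown = total / base
print(f"total {total:.6f} s, {slowdown:.2%} of the uninterrupted time")
```

With these assumed numbers the call takes about 2 percent longer; the point of the formula is simply that the penalty grows linearly with how often the kernel is interrupted.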
Furthermore, in his interview, Robert Love heavily criticized Linus Torvalds for adopting Andrea Arcangeli's new VM in 2.4.10 and dropping the old van Riel VM.
Well, I did try the patch with kernel 2.4.12 and with pre13. While accurate measurement (which Robert Love provides with the preemption kernel patches) does indeed report an improvement in latency, for the life of me I have not noticed it on an empirical basis.
I really do appreciate Love's work, but I do not fully agree with some of his comments in the interview. First, as Linus himself said, if latency sucks in the kernel then we should check why it sucks, with or without preemptive scheduling. If the latency is bad in the stock kernel, then it should be fixed anyway.
The preemptive kernel 2.4.12 worked fine on my laptops and on my SGI 550 workstation where I do interactive work. The MP3 player very rarely skipped beats when doing heavy background work such as kernel compiling or opening large files in the editor. But for my servers and clusters, the decrease in performance and the unpredictability of latency is a problem. Also, some important patches will not apply to a Love-patched kernel. Mosix, the clustering kernel extension, does not patch correctly, and neither do some versions of the LIDS intrusion detection system.
It is up to each individual user to decide whether or not to use the patch, but it is important to understand the implications of using it.
Linux and FreeBSD Revisited
Upon returning home the other week after meeting with Andrea, I went to my lab and searched for the disk images of the server comparison I ran back in January of this year (of FreeBSD 4.1.1 versus Linux 2.4.0). I took the Compaq ML500 server I have been reviewing (2x 1-GHz CPUs, 2-GB RAM) and upgraded both the FreeBSD disk image to 4.4-Stable and the Linux version to 2.4.12. Then, I changed the memory down to 192-MB RAM so as to stress the VM system more. I also upgraded to the latest stable versions of Sendmail (8.12.1) and MySQL (version 3.23.42). Finally, I compiled everything with the latest version of gcc, 3.0.2, and tuned the two instances to the best of my knowledge (softupdates and increased maxusers for FreeBSD, and untouched default values for Linux).
The results were very interesting indeed. Since this benchmark is too much to be handled in this article, Byte.com will post it here soon for you to read.
The story of this article is that the 2.4 kernel has finally grown up with the 2.4.10 release. Not many users outside the relatively small kernel community realize that. Now you know about it, too. Spread the good news and immediately install 2.4.12 on your busy server. The server will thank you for it.
Moshe Bar is a systems administrator and OS researcher who started learning UNIX on a PDP-11 with AT&T UNIX Release 6, back in 1981. Moshe has a M.Sc and a Ph.D. in computer science and writes UNIX-related books.
For more of Moshe's columns, visit the Serving With Linux Index Page.
It is quite simple
Haiku should not be funny
Try a Senryu
I have a large ext3 partition to store all my data and a 256MB partition of swap. I also have 384MB of RAM. Occasionally I'll hear the hard drive grinding away like it's using the swap. I check and it is using the swap, but my real RAM isn't full yet. It's actually far from full, like 100MB free. I know if I don't have a swap partition it won't use it, but then I'll run out of memory sometimes, like when I have a huge pile of applications open at once. It really needs some work. I don't care about forks or anything, just make it work better.
The GeekNights podcast is going strong. Listen!
Does it support malloc correctly now (returning NULL when out of memory)?
The old Linux VM seemed incredibly insightful to me. It is far superior to that of any other operating system I've seen out there. It keeps good stats on how long it's been since a page was last accessed, swaps out the least recently accessed pages first, and gives a bonus to clean pages.
:) Don't cause so much strife by switching key elements of the kernel in a stable tree.
The problems they describe are a fault of the scheduler for picking a process that had a lot of pages just swapped. If the memory gets low, you need to try to give priority to processes that have all their pages in memory and aren't blocking, so they can do their thing and hopefully free some RAM up. That's why the guy made the kill patch, it sounds like, but it should rather suspend than kill. There is nothing you can do if you are just running too many processes anyhow, though. You have to swap out a bunch of pages for one process to swap in the pages from another. It's called thrashing, and the only thing you can really do is give one priority over the other, or get more RAM. Hey, RAM's cheap, buy more if you're gonna use that much.
I don't know how the guy pulled it off, given that paging is VERY architecture dependent (the kernel pretty much ignores some of the features of x86, such as segmentation, to make it more portable). I think they're both right, but maybe Linus should call it 2.6 so people can have some time to make their unofficial patches still work with it and have some time for testing before it's used in a production environment.
Karma Clown
As if recent events (attacks on the World Trade Centres, Anthrax Attacks) raising our collective consciousness into a state of terror wasn't bad enough, Halloween is just around the corner. Soon, a new terror, a spooky terror, will unfold as the souls of thousands of innocent civilians who died rise from the dead on All Hallow's Eve to terrorise yawl's neighborhood. And you people have the gall to be discussing the Linux VM subsystem???? My *god*, people, GET SOME PRIORITIES!
The angry souls of the recent dead could give a good god damn about the Linux VM subsystem, instead preferring to wander the areas where they met their untimely ends, seeking out unwitting victims for retribution. By all means, on the evening of Hallowe'en, try to avoid the area around Ground Zero of the WTC, the area near the Pentagon, and the crash site in Pennsylvania unless you don't mind becoming a victim of terror (a very spooky terror indeed), yourself.
You have been warned!
Hearing this, I want to upgrade, but the only ext3 patch(I use redhat 7.2) I have found is for the 2.2.19 kernel. Does anyone know where I can find one for 2.4.13?
I'm no kernel hacker, but this seems like a good opportunity to implement a pluggable VM architecture. This would make the debate moot, as well as allowing folks to load VMs that were tailored for their specific workloads.
The Linux kernel is already forked. Linus' and the AC kernel have never been fully synced, even in the 2.2 branch, let alone 2.4. We all know AC excels in maintaining the stable branch, but with 2.4 the handoff from Linus to AC has not been clear. AC has eroded Linus' authority. This is not in Linux's long term interests.
IMO both Rik's code (RVM) and Andrea's (AVM) were accepted prematurely, and Linus's ADD is the root of the problem here. Everyone thought the 2.2 VM was broken, so he jumped on RVM when it really hadn't received adequate testing with various workloads. Then, when that didn't work out, he did something even worse by jumping on AVM in the middle of a "stable" kernel series when it was totally undocumented and even less thoroughly tested than RVM. That's just bad software engineering, regardless of the quality of Rik's or Andrea's work.
Ideally, an "old-fashioned" alternative to RVM would have been maintained throughout the 2.3 process, as a fallback in case RVM turned out not to be ready for 2.4 - which was in fact the case. But this wasn't done, there was no alternative, and so RVM became the basis for 2.4. Once that decision was made it should not have been unmade by replacing RVM with AVM. Andrea's work should have been in the 2.5 tree, which should have been opened a long time ago to deal with precisely this sort of situation. 2.4 is not the last Linux kernel that will ever exist. We don't need to make it perfect. It would be far better to admit its imperfections, band-aid them as best we can, and try to get a head start on creating something better for 2.6. What we have instead is error on top of error, "not ready" replaced with "even less ready".
To clarify, I have nothing but the highest regard for both Rik's and Andrea's work. Obviously they have different ideas and attitudes. Rik has drawn on many sources in his design, resulting in a system that is both very advanced and very complicated. The process of reining in the complexity is still incomplete, but I still have hope that some day Rik will be able to come up with something that's really awesome, and he has always documented his ideas thoroughly. Andrea, by contrast, is much more pragmatic; he wants something that works now even if it's somewhat more limited in scope (e.g. by being almost impossible to reconcile with NUMA). The dark side of that "pragmatism" is that Andrea has skimped on non-code activities such as documenting or explaining the basic ideas on which his system is based. Nonetheless, both have done great work and should continue to do great work...in the 2.5 tree.
Slashdot - News for Herds. Stuff that Splatters.
Good, this would be an interesting benchmark.
Then, I changed the memory down to 192-MB RAM so as to stress the VM system more.
ok, this is fair, but you should also run with the same memory configuration you originally ran.
I also upgraded to the latest stable versions of Sendmail (8.12.1) and MySQL (version 3.23.42). Finally, I compiled everything with the latest version of gcc, 3.0.2, and tuned the two instances to the best of my knowledge (softupdates and increased maxusers for FreeBSD, and untouched default values for Linux).
NO!!!! Why would you do this? Don't you want to know how the earlier Linux/FreeBSD kernel compares to a later one? Now instead of modifying one variable you've modified 3,846 variables. It's going to be impossible to see if any improvements in FreeBSD/Linux are due to an updated kernel, compiler, mysql, etc etc. Go back to your original setup and only change the kernel, since I believe that's what you want to benchmark.
Why is so much time being wasted developing and re-developing a VM system when a very stable and robust VM system has existed for years in FreeBSD? Has anyone thought about using that code (as per the terms of the license) as a starting point instead of this senseless writing and rewriting and sub-par performance?
You know you love the jizz & juice!
Troll Tuesday 2001.
--The Mess
The machine freezes EVERY time because of memory shortages. The kernel can't allocate pages for incoming network traffic, causing a backlog, causing processes to hang, causing further backlog.. then powie an unresponsive machine.
This was a common problem with kernels from about 2.4.1 up to 2.4.9 - the machine would gradually eat into swap further and further, failing to release no-longer-used swapspace, until it would go Out Of Memory (OOM) and attempt to kill the process that was eating all the memory. Frequently it would pick the wrong process to kill (sometimes even killing init) or would end up deadlocking.
I agree with you - that is no way for a virtual memory system to behave.
However, the Linux development process moves quickly once people get annoyed enough to actually do something about it, and that's precisely what has happened. Starting with 2.4.10, a new, simpler VM system has been used in the official Linus kernels, and I can say with some confidence that it has solved all the major problems with the 2.4 VM system, and continues to get significantly faster with every release.
If you haven't actually tried a new kernel yet (and from your problems it seems that you haven't), I suggest that you do - it's made the world of difference for me.
At the same time, the old 2.4 VM has lived on in the -ac series of kernels, and has become a great deal better there - some competition has made a big difference. Almost all of the major areas where it behaved badly have been fixed. However, my own impression is that it is still somewhat slower than the new VM.
The choice is yours which you want to run - my own recommendation would be for the new VM in the official Linus kernels, but others may disagree.
[OOM Killer]
NEWS FLASH they took this feature out because it was buggy.
Umm, no they didn't - it continues to exist in both the new VM in 2.4.13 and the old VM in the most recent 2.4.13-ac kernels. It does, however, now work correctly in both VMs. There are some philosophical arguments over whether killing processes is the best way of handling an Out Of Memory situation, but it is surely better than deadlocking the box, which is what most VM systems (including the famed FreeBSD's) do when OOM occurs.
It's been getting better with each dot release but it's still nothing you'd want to bet money on.
All I can say is that the new VM works great for me and lots of other people, even under extreme load. I can certainly understand your pain if you're using an older 2.4 kernel, but please try a recent one - the difference is astounding.
If you're still having problems with recent kernels, then I'm sure linux-kernel@vger.kernel.org would love to hear from you - and would certainly be a lot more useful to you than ranting on Slashdot. Getting the VM right is now priority number 1 for the kernel hackers.
Despite this major issue, a Linux based system is still more stable and in most cases faster than Windows 2000. Also, like the article mentions, take into account that Linux runs many different types of processors. Linux on SPARC is good and 21264 Alpha performance is mind-blowing. Keep up the good work.
I have read most of Moshe's articles at Byte and he strikes me as the worst kind of dilettante: can't keep his mind made up, and is more interested in fashion than in really helping solve any problems. It's all about being on the inside and looking clever more than anything else.
Am I wrong? I kinda hope so.
"think of it as evolution in action"
We manage a LOT of servers and when we went gold with our new appliance, we decided on the 2.4.8 kernel (because it fixed a bunch of smbfs bugs that would slow down smb transfers to a few bytes/sec). Well, we started getting heavily loaded machines going offline. I'd arrive on site and not even be able to log in on the console. Hard drive light would be on steady and it was just thrashing itself to death.
Problem was, these machines, although supporting a lot of users, weren't supporting them simultaneously. It seemed that they would creep up to their limit of memory and once they arrived at the edge, the kernel couldn't figure out what to release, even though the machine only had one or two users at the time. It was the straw that broke the camel's back. So we'd throw more RAM in the sucker, and wonder why the hell this never happened on the 2.2 kernels. I'd mutter that it probably had something to do with the VM, but since I'm not on the kernel mailing list and I hadn't heard any grumblings on any Linux-based news sites, I figured the problem was us. Well well well! This is a HUGE bug and I'm surprised that it was not addressed more publicly.
My desktop does this too from time to time. Swaps itself to death on a 2.4.8 kernel as well. Open staroffice 6.0 beta, gimp, and bunch of other random things and it eventually goes unstable. Never happened with the 2.2 series.
A compilin' I will go, a compilin' I will go!
Toddlers are the stormtroopers of the Lord of Entropy.
of course they keep the user info in a database...
anybody who this comes as a surprise to shouldn't come here anyway, because they must be really stupid.
And they have the right to keep information like who is banned, why, and etc.
Otherwise, leave..... i see no hypocrisy. they don't make you tell them your real name address and credit card number like the real privacy abusers.
This statement is pure bullshit. FreeBSD's VM is lightyears ahead of Linux's. Same for the SCSI subsystem. I once ran Oracle for Linux on a FreeBSD box with 2MB of RAM and 8MB of swap with 1500 concurrent users with no problem at all. A Linux box with 256MB was unable to handle it. Andrea wrote the VM in one month; that's the quality of Linux kernel code: re-code every release because the last version sucked ass. Nice, really, nice.
The word you're thinking of is "THAN". How many times are you going to make this mistake? I don't think I've ever seen Taco post a story using the word "than" properly. It's always "then". Real professional site you got here, Taco-man.
Which kernel will RedHat use for their next Linux distribution?
Isn't that basically Microsoft's argument as to why it's OK for them to be a monopoly? That competition is not efficient in the software industry?
In Peter Denning's classic paper, The Working Set Model of Program Behavior, Denning concluded that paged virtual memory was, at best, good for an effective 2X increase in memory size. When he wrote that paper in 1968, memory cost about a million dollars a megabyte, so a 2X increase was worth the headaches of a VM system. Today, with memory at a few hundred dollars a gigabyte, it looks less attractive. It's not that expensive to double the size of RAM today. It can be cheaper than adding a fast disk drive just for paging. Uses less power, too.
Disk as backing store gets worse as RAM gets faster. When Denning wrote that paper, the fastest backing devices (drums) rotated at around 10,000 RPM, for a 6,000 microsecond access time, and core memory cycle times were around 4us. So main memory was 1,500 times faster than backing store. Today, RAM cycle times have dropped to around 0.020us, but disks still top out around 10,000 RPM, making main memory 300,000 times faster than backing store. Thus, the relative cost of a page fault has increased by a factor of 200. This makes VM far less attractive today than it used to be. It's not getting any better, either.
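The arithmetic in that comparison is easy to check. The Python sketch below recomputes the ratios from the figures quoted above, working in integer nanoseconds to avoid rounding; the function names are just illustrative, and the 6,000-microsecond figure is treated as one full rotation at 10,000 RPM, which is what the quoted numbers imply.

```python
def rotation_time_ns(rpm):
    """Time for one full rotation, in nanoseconds."""
    return 60 * 10**9 // rpm


def speed_ratio(backing_store_ns, memory_cycle_ns):
    """How many memory cycles fit into one backing-store access."""
    return backing_store_ns // memory_cycle_ns


backing_ns = rotation_time_ns(10_000)  # 6,000 microseconds, then and now
core_1968_ns = 4_000                   # 4 us core memory cycle
ram_2001_ns = 20                       # 0.020 us RAM cycle

ratio_1968 = speed_ratio(backing_ns, core_1968_ns)
ratio_2001 = speed_ratio(backing_ns, ram_2001_ns)
growth = ratio_2001 // ratio_1968
print(ratio_1968, ratio_2001, growth)
```

The backing store's access time is the same in both eras, so the whole factor comes from memory getting faster: 4 us down to 0.020 us.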
The price of having virtual memory is terrible performance once paging between active processes starts. That's called "thrashing". On a server which is processing short transactions, you're much better off throttling at the transaction launch point (as, for example, where CGI programs launch) than going into thrashing. This requires some coordination between applications and memory allocation, but where most of the memory is used by Apache and its child processes, that's a viable option.
The main value of VM today is getting rid of dead code at run-time. A basic problem with shared libraries is that you load in the whole library, needed or not, when you need any function from it. This wastes memory, but after a while, the VM system will notice the unused pages and quietly release them. On a larger scale, the same problem is seen with dormant applications, a problem which has gotten totally out of hand in the Windows world, where far too much unwanted stuff launches at startup. VM ejects them from memory. That's what VM is really used for today.
So if you're actually page-faulting, VM is hurting, not helping.
I'd argue that it's time to go back to a swapping model - all of an app has to be in before it runs. That's where UNIX started; virtual memory didn't come in until 4.1BSD. But in support of this, apps need more information about the current memory situation. And they should be able to designate parts of their space as pageable, at least at the shared object/DLL level. Only a few apps (web servers, window managers) need much memory awareness, so that's feasible. Throttling needs to occur at a smart place, just before allocating substantial resources, such as CGI process launch or connection opening. By the time the VM system becomes involved, it's too late; resources are already overcommitted.
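Throttling at the launch point could look something like the Python sketch below. It is only an illustration of the idea under simple assumptions: the class `AdmissionThrottle` is hypothetical, each job is assumed to declare its memory need up front, and nothing here talks to a real kernel or web server.

```python
class AdmissionThrottle:
    """Toy admission control: refuse new work once a memory budget is committed."""

    def __init__(self, budget_mb):
        self.budget_mb = budget_mb
        self.committed_mb = 0

    def try_launch(self, needed_mb):
        """Admit a job only if its declared memory still fits in the budget."""
        if self.committed_mb + needed_mb > self.budget_mb:
            return False  # caller should queue or reject instead of paging
        self.committed_mb += needed_mb
        return True

    def finish(self, needed_mb):
        """Release a finished job's memory back to the budget."""
        self.committed_mb = max(0, self.committed_mb - needed_mb)


throttle = AdmissionThrottle(budget_mb=192)
results = [throttle.try_launch(64) for _ in range(4)]  # fourth job does not fit
throttle.finish(64)
retry = throttle.try_launch(64)
print(results, retry)
```

The refusal happens before any memory is allocated, which is the point made above: by the time the VM system notices the shortage, the resources are already overcommitted.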
The big win from this is repeatable latency at the memory level. With all the interest in reducing kernel latency at the CPU level, it's time to address it at the memory level too.
QNX, the real-time OS, is worth looking at in this regard.
Well, I read his first article and I switched to FreeBSD, so I don't care about those flame wars anymore :)
While I normally take issue with the way Linus bullies a kernel issue based on what he perceives is technical merit, I have to agree with him on this one. First, if a better way of doing things has been found, even though it's in the middle of a stable series, it should be changed in order not to propagate wrong coding. I've been coding for a long time now and I still believe that if an error, bug or better coding scheme is found, it be implemented as soon as possible.
The problem with leaving the change till the 2.5 series is that the 2.5 series is nowhere in sight and development kernels usually take more than a year to cycle through (no matter what the kernel hackers say). The fact that 2.5 hasn't even begun may be an indicator of how long 2.5 will take to finish.
RedHat ships an -ac kernel with RH 7.2, I think 7.1's was also an -ac kernel.
Not pure -ac kernel, though, like most major distributions they also pull stuff from Linus and other kernel trees (there are others) so what they actually ship is really the "RedHat" tree.
DNA just wants to be free...
I really hate to say this, but I'm wondering if jumping ship to freebsd (etc) makes sense. I've been a major linux supporter for quite a long time, but I know that the *bsd guys have had their act together (good smp, good networking under load, etc) for a long time.
would it be all that crazy to adopt the VM system from the 'establishment' (bsd)? frequently the linux codebase DOES borrow from bsd. why is the VM system all that different?
--
"It is now safe to switch off your computer."
Oh, and just because there are two different VMs (Alan vs Linus) does not mean that there is an official fork. Besides, there are already companies like Redhat and Suse that release their own patches to the Linux kernel, which make their kernels unpatchable against the main tree.
Only 'flamers' flame!
IBM has said that they will open source any part of AIX that we would like. The AIX VM works well under high stress. Obviously it could not just be put as-is into Linux, but there must be a lot of good ideas/algorithms in it that could--arguably should--be moved to Linux. Why isn't anyone looking at doing this?
This is a link to the kernel-traffic discussion with details and basic benchmarks: here!.
http://blog.grcm.net/
Another area of Linux with problems (albeit mostly non-kernel related) is the threading model. For Christ's sake, grow up. "We can do kernel context switches fast enough for 1:1 threading model!". Pure hubris. Linux is unusable for any serious Java development.
Part of the problem with the design and redesign of the Linux VM is an insistence on sticking with a few core design points that make it 100x harder to write. For instance, virtual memory overcommit spawns a whole bunch of ugly problems that must be solved in order to create a stable and fast system. If the core development team spent some time looking at past OS research, they would completely change their design criteria and a bunch of these problems would go away.
Another perfect example is the OOM killer. If the VMM could properly balance the workload (and it didn't overcommit), there wouldn't be a need for code to select the 'correct' process to kill. Since the VM cannot balance correctly, the kernel developers spend massive amounts of time trying to write an OOM killer that functions correctly in the case where the VMM is wedged. This time would be better spent fixing the VMM so it never got into these states.
Why not just use the 640k limit and we wouldn't have any of these problems...
KidA
"Karma can only be portioned out by the cosmos." -Homer Simpson
I would've thought there would be many posts comparing and contrasting (and most likely flaming) the Linux and Windows VM architectures. As a primarily Windows user, I find this discussion highly interesting and would like to learn more about how these issues (VM page allocation, thrashing, kernel preemption) are handled in NT land.
If he says so.
I have never experienced such "system freeze" swapping with Linux, since I started about 4 years ago. Felt like my Win95 days with 8Mb RAM were back.
With 256Mb RAM, I only need to open up a ~1Mb pdf with Acroreader to experience some extreme disk thrashing that takes an age to give back my system. Something I've never seen happen with Linux.
War crimes, torture, lies, illegal spying... Would someone give Bush a blowjob, already, so he can be impeached?
would be hidden behind some interface, and that it's just a matter of plugging in a different implementation. If it's that difficult to switch between the two, something's badly broken in the design.
But what do I know, I'm just another Slashdolt, who couldn't code his way out of a paper bag if I tried.
I want to hear more about music videos, and computer games.
1G ram is under $100. I just turned the VM off. Some folk I know that do large clustering solutions are doing the same thing. Who is using VM these days? And is it really worth the headache, rather than spending a few dollars?
I guess he forgot to put his C program to stress test the kernel inside of a
block. Anyway, the #include line is missing a header file to include. It's supposed to read: #include Hope this helps.
A musician without the RIAA, is like a fish without a bicycle.
Gosh! When are you Slashdot people going to learn the difference between the words "then" and "than"? Learn proper English!
They mention LIDS (Linux Intrusion Detection System).
My question is..
does LIDS actually do *any* intrusion detection, or does it just prevent modification of certain files?
AIX's VMM assumes way too much about the underlying arch of the machine it's running on. Combine that with the fact that it doesn't have a hardware abstraction layer and that large pieces of the VMM drop to assembly. The result is something that would be absolutely useless for Linux, other than maybe as a case study of how not to write a VMM.
Wrong. The problem is that Java (at least pre-1.4) is unusable for serious network programming due to its stupid one (or two, depending on the situation) threads per connection model.
There is a reason that professional Java software like BEA weblogic and Websphere use JNI networking layers written in C instead of the brain-dead java.io/java.net APIs.
The linux kernel's threading support is optimized for software not written by absolute morons: software that has roughly the same order of threads as CPUs.
Get a language that doesn't suck (even if that language is Java 1.4, I have no favorites here), and stop whining because other people won't make well written software run more slowly to help out your broken software.
My father kept all his old Bytes from the late 70s and early 80s... I used to spend my afternoons going through them, one by one, reading the technical details and discussions...
... items that were absent when the paper publication finally stopped a few years ago.
That article is exactly the kind of thing that would have fit right in with the old Byte. I hope they continue such things in the future.
Hey,
I'm a newbie to Linux kernels, and I don't understand a lot of the terminology flying around. For example: What is a spinlock? What's the difference between a dirty page and a clean page? What is a page? Anyhoo, is there a page that explains some kernel terminology and maybe gives a conceptual map of it?
Thanks-
Buck
-Bucky
So why doesn't Linux just copy BSD?
The code here seems rather incidental, it's the design that is more important. But why not copy a good design? Or do one (or both) of the contending VMs do so?
I'm posting this from a 5 year old Toshiba laptop: P120/32Meg RAM (64Meg swap indeed) running Linux. It does everything I need, and I just don't see a reason to replace the machine. Linux gave that machine one or two years of extra lifetime, and I saved some cash.
Strangely enough a lot of people tell me to buy a new one because it is "obsolete"....those are Windows users, yup, no kidding. Obsolete? I can play MP3's while working (okay, I admit, in mono...)
I didn't even consider trying OpenOffice on this machine....now thanks to your post, I'll give it a shot! I really start to wonder why I bought a huge desktop machine. Besides, that one has 768Meg RAM and Windows 2000 still uses swapspace. I don't know why: it has more than enough memory available. It probably just swaps out some unused DLL's.
Ahhh...the great dumpster continuum. Many a free computer will be found there. -- sowth (748135)
I am not a kernel hacker, but I make heavy use of many of my system's resources. My target systems have at least 2 processors, 2GB of RAM, the latest NVIDIA GPU and fast SCSI disks, and I try to render 3D scenes fast, utilizing every processor, using a lot of IPC, while asynchronously loading hundreds of megabytes per minute from disks and the network.
However, everything I program runs outside the kernel, as a user mode process, and my understanding is that one of the main tasks of the kernel is to isolate (misbehaving) user processes. Yet it seems to me it was never as easy to take out my LINUX boxes as it is with the current 2.4.x kernel series: my app runs fine for days, but after I quit it there is a 50/50 chance the system survives. I get kernel oopses or the system freezes completely; sometimes just the X server hangs; when the system continues to operate, sometimes gcc freaks out with internal compiler errors, which disappear after a reboot. And I had to switch from Reiserfs to ext2 because of serious filesystem corruption, and NFS is even more troublesome than usual under LINUX. My system logs are full of kernel messages like:
Sep 25 15:02:21 dellomat kernel: Unable to handle kernel paging request at virtual address 002200dc
Sep 25 15:02:21 dellomat kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000000
Oct 30 18:13:38 dellomat kernel: kernel BUG at page_alloc.c:81!
Call Trace: [__delete_from_swap_cache+126/132] [__free_pages+27/28] [delete_from_swap_cache_nolock+106/108] [free_page_and_swap_cache+115/196]
I am not absolutely sure I can blame the kernel, but I can rule out the hardware, as the problems appear on at least five different platforms on about ten systems. And the severity of these problems varies with kernel versions; the most stable one we currently use is 2.4.8, and 2.4.12 was almost unusable for us. Before reading the article, my favorite suspect for causing these problems was the NVIDIA drivers, but now everything points to some vital parts of the kernel causing these problems. I do really appreciate the great work many people are doing on the kernel, and I think I can't criticise someone for doing me a favour by working on a free OS, but I am really concerned that all this might seriously endanger the only alternative OS that runs on a broad range of state-of-the-art hardware.
I hope that there are not too many people having the same trouble, and that I will get rid of these problems without becoming a kernel hacker.
While writing this I became aware of the "Linux Test Project", and I am going to download and run a bunch of tests now; I never thought this would be necessary as I do not do kernel development, but it seems to be the best thing I can do towards getting rid of the problems. (Except whining on slashdot, of course ;-)
p.
Without order, nothing can exist. Without chaos, nothing can be created.
You can't avoid overcommit in a usable system.
This is absolute BS. Most OSes don't allow overcommit; Linux is one of the few that does. Basically, this means that Linux allows an application (or the total system working set) to exceed the amount of RAM + pagefile available. This is completely unacceptable. When an application asks for memory that is already consumed, its allocation should fail. That is why malloc has the option to return NULL. If it returns NULL, then the application has to deal with it. If it cannot continue, then it needs to write to a log and exit. Otherwise the kernel is forced to choose (very ungracefully) which process gets killed when it is unable to find space to swap a page out to get the required page into memory. This is VERY BAD!!!
Sometimes the only pages in RAM are critical to the operating system. Say, for instance, only kernel pages and pages for init are in memory. Now which do you kill? Hmmm, pretty bad, huh? It's not a solvable problem. This is why it should be avoided. This discussion has come up in the past, but the base kernel developers always say something like "well, at that point your system is pretty much screwed up already". Duh! But it's not a fatal system reboot. A system that isn't overcommitted can unwedge itself with proper scheduling and memory management given enough time; Linux has to reboot.
Ever see W2k pop up a dialog that says "Running low on memory, increasing page file size"? That's not because the machine is 'low' on memory. In fact, the whole working set may still fit in memory. What it means is that an application just asked for more RAM than would fit into the pagefile if the OS needed to swap it out completely.
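To make the two policies concrete, here is a toy model in Python. It is only a sketch under stated assumptions: `run_workload`, the page counts, and the event strings are all made up for illustration, and real kernels account memory in far more detail. It assumes each process asks for some pages up front and later touches only part of them, and that "limit" stands for RAM plus pagefile.

```python
def run_workload(requests, limit_pages, overcommit):
    """Toy model. requests is a list of (name, pages_asked, pages_touched).

    Strict mode refuses at allocation time (the malloc-returns-NULL case).
    Overcommit mode always grants the request and only discovers the
    shortage later, when the pages are actually touched.
    """
    committed = 0  # pages promised to processes
    backed = 0     # pages actually touched and needing RAM or pagefile
    events = []
    for name, asked, touched in requests:
        if not overcommit and committed + asked > limit_pages:
            events.append((name, "allocation refused"))
            continue
        committed += asked
        if backed + touched > limit_pages:
            # Nothing left to back the page: something has to be killed.
            events.append((name, "oom kill needed"))
            continue
        backed += touched
        events.append((name, "ok"))
    return events


# Hypothetical workload: three processes each ask for 60 pages on a
# system with 100 pages of RAM + pagefile.
workload = [("A", 60, 60), ("B", 60, 10), ("C", 60, 50)]
strict_events = run_workload(workload, limit_pages=100, overcommit=False)
overcommit_events = run_workload(workload, limit_pages=100, overcommit=True)
```

In this toy run the strict policy never reaches the kill case, but it also refuses B, which would have fit because it touches only 10 of its 60 pages. The overcommit policy lets B run and then has no clean way to fail C. That is the trade-off the two sides of this thread are arguing about.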
VM isn't just "virtual memory". It is the entire memory management system for processes. So, if you want to have more than one process running at a given time, you need it. What you and your friends did is not 'disable VM' but rather 'disable swap'; the two are easily confused. Disabling swap may be a good idea, but turning off VM will give you DOS, which is a BAD idea.
Okay, I was shooting for funny, but whatever works. :)
Who swaps these days, anyway? :)
- undoware.ca
Since linus is obviously more open to sudden changes in codebases divisible by two than he used to be, I think the obvious next step is to rewrite the kernel in C# as a
I mean, I know IE has some sort of swapping system... my disk grinds every time it starts up....
- undoware.ca
In the past you probably needed a relative swap/RAM ratio. Now, it really depends on the case.
If you usually "touch" the swap partitions you can do well with 120MB swap. If you run a big webserver then you need a lot of swap just in case (it can't hurt). If you reach the point where it's not uncommon to need that load of swap, then you need more RAM!
In the end, how big your swap must be depends on what you are using the machine for. On a 1 GB Linux system for desktop use you can leave it with NO swap at all.
--
unfinished: (adj.)
Since the issue has been brought up: how do all the other "Unices" implement their VM? And how does that compare/contrast to Linux's?
I've been reading through the current threads concerning the "VM", and the different sides of the coin concerning Alan's "VM" as opposed to Rick's. It suddenly dawned on me that both have put forward commendable work, as was mentioned, and are doing a great job. But as the saying goes, "two heads are better than one". I think that if these two guys got together and formed a joint project, drawing equally on each other's positive aspects and combined ideas, then we would have a project worth writing home about.
I have tried almost all kernel versions from 2.2.13 through 2.4.14-pre5aa1, including the -ac series. When I switched from the 2.2 series to 2.4, I got the impression that my system became slower, not to mention that the 2.4 VM put every process into swap, using RAM mostly for buffers and cache (e.g., when switching to a different WindowMaker workspace you see that the xterms on that workspace already got swapped out for no reason, i.e. with no memory shortage). And now it seems like the good days of 2.2 performance are coming back. Maybe it is not perfect yet, but it is really getting there. It is again possible to run a system without swap if you have enough RAM.
If you implement a VM that way, launching a program takes a very long time. You could, in theory, start out with nothing in memory and page-fault the program in. This requires one disk access per active memory page until enough is loaded for the program to run. The very first virtual memory system, for the Burroughs 5500, worked that way. It worked OK for batch programs, in an era when batch programs ran for minutes or hours, but was terrible for interactive work.
Most operating systems today load most or all of a program at startup, let the app run for a while, then release the unreferenced pages. Deciding how much to load at startup is an interesting question. The BSD UNIX guess was the first N bytes of the executable, where N is a system tuning parameter. (What, exactly, does Linux do about this?) This is a mediocre guess, but an easy one to make. It's OK for long-running programs, but terrible for short-lived ones. Short-lived programs don't run long enough for the least-recently-used page info to become useful. If paging occurs in this situation, the pages removed are ill-chosen, since the LRU info isn't useful until the program has run for a while.
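The LRU point can be checked with a small simulation. This is a generic textbook LRU page-replacement counter in Python, not the algorithm of any particular kernel. The function name `count_faults` and the reference strings are made up for illustration.

```python
from collections import OrderedDict


def count_faults(references, frames):
    """Count page faults for a reference string under LRU replacement."""
    resident = OrderedDict()  # oldest (least recently used) entry first
    faults = 0
    for page in references:
        if page in resident:
            resident.move_to_end(page)        # mark as most recently used
        else:
            faults += 1
            if len(resident) >= frames:
                resident.popitem(last=False)  # evict least recently used
            resident[page] = True
    return faults


# A short-lived program that touches each of its 8 pages once: every
# reference faults, and the LRU history never gets a chance to help.
short_lived = count_faults(range(8), frames=4)

# A long-running loop over 3 pages: only the first pass faults.
long_running = count_faults([0, 1, 2] * 100, frames=4)
```

With 4 frames, the short-lived run faults on all 8 references, while the long-running loop faults only 3 times in 300 references. The recency information pays off only when pages are reused.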
Much of the memory-demanding things servers do look like short-lived programs. CGI programs and Java servlets are short-lived programs. So they're a bad case for a VM environment. If memory gets tight enough that short-lived programs get paged out, thrashing is almost inevitable.
You don't want to page out at all on a server, except (maybe) under transient overload. As soon as paging activity starts, it's time to throttle back the amount of server concurrency until paging stops. This requires coordination between OS and application of a kind not usually seen in the UNIX world, though mainframe transaction systems have had it for decades, all the way back to CICS.
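One way an application could do this kind of throttling by itself is to watch a swap-out counter and back off when it moves. The sketch below is in Python and rests on assumptions: it reads the `pswpout` field that current Linux kernels expose in /proc/vmstat (the kernels discussed in this thread may not have that file), and the halve-or-add-one policy with its limits is an arbitrary illustration, not a tuned controller.

```python
def parse_pswpout(vmstat_text):
    """Return the cumulative 'pswpout' counter from /proc/vmstat text."""
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] == "pswpout":
            return int(parts[1])
    return 0  # field not present: treat as "no swap-out seen"


def next_concurrency(current, pswpout_before, pswpout_after,
                     minimum=1, maximum=64):
    """Halve the worker limit if pages were swapped out, else add one."""
    if pswpout_after > pswpout_before:
        return max(minimum, current // 2)  # paging started: back off hard
    return min(maximum, current + 1)       # quiet interval: probe upward


def read_vmstat(path="/proc/vmstat"):
    """Read the counters file; returns '' where it does not exist."""
    try:
        with open(path) as handle:
            return handle.read()
    except OSError:
        return ""
```

A server loop would call `read_vmstat()` once per interval, compare the two `parse_pswpout` values, and use `next_concurrency` to cap how many CGI processes or connections it accepts during the next interval.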
Desktop systems have a different set of issues, but they don't look like classic time-sharing systems either. My main point here is that in the last decade, the memory usage behavior of most programs has changed considerably, but we're still using virtual memory concepts that were developed in the 1960s and were mature by 1980.
And remember, even when everything works right, you get the effect of at best 2X the memory.
Here's a basic tutorial on VM, with emphasis on Linux.
With a few tweaks, an inferior technology can be faster than a better written system. The fact that some select video games may run faster in Win98 than in Linux does not make Win98 superior. It could be either VM system that will work better in the future. I would hate for people to choose the (only currently) faster technology when it may not be the best....
Why don't you just boot different kernels if you want to try stuff out?
Especially if you are just testing the VMs out, the other stuff shouldn't affect your tests much.
Because Linux and some other O/Ses don't handle out of memory situations well.
To me it's better to have a gradual degradation of performance as memory runs out, than to run straight into a brickwall, wheels spinning at 133MHz and have Linux kill the wrong process or deadlock.
Another thing about your proposal: if all of the app has to be in then how do you propose to handle process forking?
Cheerio,
Link.
/proc/sys/vm/overcommit_memory=0
certainly doesn't prevent the overcommit. If it worked, this issue wouldn't come up regularly.
Personally, I want to disable overcommit. Disk is relatively cheap and I have no problem with adding 1gig of swap.
And it wouldn't decrease the efficiency at all.
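For reference, a small Python sketch of how the setting reads on later kernels. Treat it as an assumption when applied to the 2.4 kernels in this thread: the three modes and the commit-limit formula below follow the documentation of more recent Linux kernels, where mode 2 is the strict one. The helper names are made up, and the formula ignores huge pages.

```python
OVERCOMMIT_MODES = {
    0: "heuristic: only obvious overcommits are refused",
    1: "always overcommit: allocations are never refused",
    2: "strict: total commit is capped at the commit limit",
}


def describe_overcommit(raw_value):
    """Explain the value read from /proc/sys/vm/overcommit_memory."""
    mode = int(raw_value.strip())
    return OVERCOMMIT_MODES.get(mode, "unknown mode")


def commit_limit_kb(ram_kb, swap_kb, overcommit_ratio=50):
    """Commit limit used in strict mode: swap plus a percentage of RAM."""
    return swap_kb + ram_kb * overcommit_ratio // 100


def read_overcommit_mode(path="/proc/sys/vm/overcommit_memory"):
    """Return the description for this machine, or None if unreadable."""
    try:
        with open(path) as handle:
            return describe_overcommit(handle.read())
    except (OSError, ValueError):
        return None
```

Under these assumptions, a value of 0 is consistent with the complaint above: it selects a heuristic and does not switch overcommit off. With 1 GB of RAM, 1 GB of swap and a ratio of 50, `commit_limit_kb` gives 1.5 GB.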
Good. I didn't realize it could send mail when these things are attempted. That makes it okay in my books then.
I just thought it was funny to always hear about this 'intrusion detection system' that didn't actually detect anything.